
Integration Runtime in Azure Data Factory

An integration runtime (IR) is the bridge between Activities and Linked Services: it provides the compute environment in which an Activity executes the actions defined against those services. There are three types of IR:

 

Azure IR – for running data flows and moving data inside Azure.

Self-Hosted IR – for moving data from externally hosted systems (on-premises or other private networks).

Azure-SSIS IR – for executing SSIS packages.
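To make that relationship concrete, here is a minimal sketch of how a linked service definition is bound to a particular IR through its connectVia reference. The service name and IR name are hypothetical, and the connection string is elided.

```python
import json

# Sketch of an ADF linked service payload. The "connectVia" block binds the
# linked service to a specific integration runtime; activities that use this
# linked service then run on that IR's compute. "MySelfHostedIR" is a
# hypothetical IR name for illustration only.
linked_service = {
    "name": "OnPremSqlLinkedService",
    "properties": {
        "type": "SqlServer",
        "typeProperties": {
            "connectionString": "<elided>"
        },
        "connectVia": {
            "referenceName": "MySelfHostedIR",
            "type": "IntegrationRuntimeReference"
        }
    }
}

print(json.dumps(linked_service, indent=2))
```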

 

Azure Data Factory can be hosted in any Azure region of the customer's choice, and the IR location can be independent of ADF's Azure region. Generally, IRs are hosted in the Azure region where data movement, activity dispatching, etc. are required.


IR behavior with the AutoResolve region in a public network


Time To Live (TTL) in Integration Runtime


AutoResolve and ad hoc integration runtime clusters add a cluster acquisition time (approximately 4-5 minutes) every time a new cluster is spun up for a data flow. This extra compute setup time is added to the total job time of every data flow execution, which can be detrimental when optimizing the overall execution time of a batch load, for instance a data warehouse's daily ETL load.

To overcome this, Microsoft added the TTL (time to live) feature to the Integration Runtime. This feature shaves off some of the extra time each workflow spends acquiring a cluster.

This setting is designed for data flow activities. The pre-warmed cluster feature operates at the core level; its detailed working is explained in Microsoft's blog.

Setting a TTL significantly reduces overall batch timings when jobs are interdependent and executed sequentially.
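As a rough sketch of where this setting lives: in the Azure IR definition, the TTL is specified in minutes under the data flow properties of the compute properties. The payload below is illustrative only; the IR name, compute type, and core count are assumptions, not prescribed values.

```python
import json

# Sketch of an Azure IR definition with a data flow TTL. "timeToLive" is in
# minutes; the IR name, compute type, and core count are illustrative.
azure_ir = {
    "name": "AzureIR-WithTTL",
    "properties": {
        "type": "Managed",
        "typeProperties": {
            "computeProperties": {
                "location": "AutoResolve",
                "dataFlowProperties": {
                    "computeType": "General",
                    "coreCount": 8,
                    "timeToLive": 10  # keep the warm cluster alive for 10 minutes
                }
            }
        }
    }
}

print(json.dumps(azure_ir, indent=2))
```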
    
The picture below gives a non-geeky explanation of the TTL feature. Batch 1 is a set of 3 jobs running in sequence, each with 5 minutes of Spark cluster setup time and 10 minutes of data flow processing time. Batch 2 uses an integration runtime with TTL set to 10 minutes, so after the cluster is set up in Job A, the same cluster is reused by the subsequent jobs; cluster warming and acquisition time drops to 2 minutes for both B and C, and the overall execution of batch 2 takes 6 minutes less than batch 1 (39 minutes versus 45).
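A quick back-of-the-envelope check of those numbers:

```python
# Batch 1: every job pays the full 5-minute cluster setup.
# Batch 2: job A pays full setup; B and C reuse the warm cluster (2 min each).
setup_cold, setup_warm, processing = 5, 2, 10

batch1 = 3 * (setup_cold + processing)                              # 45 minutes
batch2 = (setup_cold + processing) + 2 * (setup_warm + processing)  # 39 minutes

print(batch1, batch2, batch1 - batch2)  # 45 39 6
```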



This time difference might look small in a batch of three jobs, but it has a mighty effect when the job count scales out to thousands, as in real-life DW/BI batch loads.


So the TTL feature can be a very handy add-on for optimizing batch timings.



Quick re-use


With TTL there is still a cluster grab time of approximately 2 minutes; however, the latest feature, Quick re-use, further limits cluster acquisition time to a few seconds. This is a game-changer, and it brings ADF on par with legacy data integration tools such as Informatica PowerCenter, where a job can start processing data directly without worrying about setting up compute.
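In the IR definition, quick re-use appears alongside TTL in the data flow properties: a cleanup flag set to false tells ADF not to recycle the warm cluster between data flow runs inside the TTL window. A minimal sketch, reusing the same illustrative names as above:

```python
# Same illustrative Azure IR as above, now with quick re-use switched on:
# "cleanup": False asks ADF to keep the warm cluster for the next data flow
# run within the TTL window instead of tearing it down after each run.
azure_ir_quick_reuse = {
    "name": "AzureIR-QuickReuse",
    "properties": {
        "type": "Managed",
        "typeProperties": {
            "computeProperties": {
                "location": "AutoResolve",
                "dataFlowProperties": {
                    "computeType": "General",
                    "coreCount": 8,
                    "timeToLive": 10,
                    "cleanup": False  # quick re-use: keep the cluster warm
                }
            }
        }
    }
}
```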


