Azure Data Factory (ADF) is Azure's data service for developing and orchestrating data pipelines. ADF lets cloud developers orchestrate their Databricks notebooks and various other codebases. This cloud-managed service is designed for complex hybrid ELT, ETL, and data integration solutions.
ETL Tool - Azure Data Factory
ADF is one of many data offerings in Azure and is designed to orchestrate data pipelines. Capabilities such as data flows make it a powerful ETL tool with an ever-growing list of data source integrations.
How is Azure Data Factory licensed?
Azure Data Factory is Azure's Platform as a Service (PaaS) solution.
Azure Data Factory Components
ADF has a number of components, grouped under the 'Author' and 'Manage' options in the left pane.
Author components (those generally available to date) include Pipelines, Data flows, Datasets, and Power Query.
Management components include Integration Runtimes, Linked Services, Triggers, Global Parameters, and others.
What is a Pipeline in Azure Data Factory?
A pipeline is a logical grouping of activities that together perform a task, such as data extraction, data transformation, or loading.
What is an Activity in Azure Data Factory?
An activity defines the action to be performed; for instance, the Copy activity copies data. Based on their actions, ADF activities can be split into three categories.
- Data Movement Activities: Copy Activity
- Data Transformation Activities: Mapping Data Flows, Stored Procedure Activity
- Control Activities: Until Activity, Get Metadata Activity
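To make the pipeline and activity relationship concrete, here is a minimal sketch in Python that builds a pipeline definition with a single Copy activity and prints it as JSON, similar to what the ADF code view shows. All names (CopyPipeline, CopyFromBlobToSql, InputDataset, OutputDataset) are hypothetical, and the source and sink types are assumptions that depend on the datasets actually used.

```python
import json

# Hypothetical activity: a Data Movement activity copying between two datasets.
copy_activity = {
    "name": "CopyFromBlobToSql",
    "type": "Copy",
    "inputs": [{"referenceName": "InputDataset", "type": "DatasetReference"}],
    "outputs": [{"referenceName": "OutputDataset", "type": "DatasetReference"}],
    "typeProperties": {
        # Assumed source/sink types; they vary with the dataset types.
        "source": {"type": "BlobSource"},
        "sink": {"type": "SqlSink"},
    },
}

# A pipeline is the logical grouping: a name plus a list of activities.
pipeline = {
    "name": "CopyPipeline",
    "properties": {"activities": [copy_activity]},
}

pipeline_json = json.dumps(pipeline, indent=2)
print(pipeline_json)
```

Adding further activities to the "activities" list is how a pipeline grows from a single copy step into a full extract, transform, and load task.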
What are Chaining activities in Azure Data Factory?
In the current ADF version, the dependsOn property of an activity (set through the Add activity on option in the UI) chains activities to one another. Previously, control flow had to be managed by configuring the output of one activity as the input of the downstream activity. The snippet below shows the usage and the options available.
The Add activity on option selected for the Success and Fail activities becomes the dependsOn property of the respective activities and can be viewed in the pipeline's JSON code.

    "name": "Success",
    "type": "Delete",
    "dependsOn": [
        {
            "activity": "BASE",
            "dependencyConditions": [
                "Succeeded"
            ]
        }
    ],
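To see how dependsOn drives control flow, the following is a small Python sketch that simulates which activities run for a given outcome. It is a simplified model, not ADF's actual engine: it assumes activities are listed in dependency order, that several conditions on the same upstream activity are alternatives, that dependencies on different upstream activities must all be met, and that "Completed" matches either success or failure. The function name resolve_statuses and the types given to BASE and Fail are hypothetical.

```python
def resolve_statuses(activities, outcomes):
    """Return {activity name: status} for a simplified dependsOn model.

    `outcomes` maps an activity name to "Succeeded" or "Failed" if it runs;
    activities missing from `outcomes` are assumed to succeed.
    """
    statuses = {}
    for act in activities:
        runnable = True
        for dep in act.get("dependsOn", []):
            upstream = statuses[dep["activity"]]
            conditions = dep["dependencyConditions"]
            # "Completed" matches a finished upstream, whether it succeeded or failed.
            met = upstream in conditions or (
                "Completed" in conditions and upstream in ("Succeeded", "Failed")
            )
            if not met:
                runnable = False
                break
        if runnable:
            statuses[act["name"]] = outcomes.get(act["name"], "Succeeded")
        else:
            statuses[act["name"]] = "Skipped"
    return statuses


activities = [
    {"name": "BASE", "type": "Copy"},
    {"name": "Success", "type": "Delete",
     "dependsOn": [{"activity": "BASE", "dependencyConditions": ["Succeeded"]}]},
    {"name": "Fail", "type": "Delete",
     "dependsOn": [{"activity": "BASE", "dependencyConditions": ["Failed"]}]},
]

when_base_fails = resolve_statuses(activities, {"BASE": "Failed"})
when_base_succeeds = resolve_statuses(activities, {})
print(when_base_fails)
print(when_base_succeeds)
```

In this model only one of the Success and Fail branches runs for any outcome of BASE, which is the behaviour the two dependency conditions are meant to express.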
Linked services in Azure Data Factory
Linked services are much like connection strings: they contain the connection details ADF needs to connect to the source instance of a dataset. Every dataset requires a linked service.
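The relationship between a dataset and its linked service can be sketched as two definitions, written here as Python dicts that mirror the JSON ADF stores. The names (BlobStorageLinkedService, InputDataset), the dataset and location types, and the placeholder values are assumptions for illustration; find_linked_service is a hypothetical helper, not part of any ADF SDK.

```python
# Hypothetical linked service: holds the connection details for a storage account.
linked_service = {
    "name": "BlobStorageLinkedService",
    "properties": {
        "type": "AzureBlobStorage",
        "typeProperties": {
            # Placeholder only; real values should come from a secure store.
            "connectionString": "<connection string placeholder>",
        },
    },
}

# Hypothetical dataset: describes the data and points at the linked service by name.
dataset = {
    "name": "InputDataset",
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": {
            "referenceName": linked_service["name"],
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "<container>",
                "fileName": "<file name>",
            }
        },
    },
}


def find_linked_service(dataset, linked_services):
    """Resolve the linked service a dataset refers to (raises StopIteration if absent)."""
    ref = dataset["properties"]["linkedServiceName"]["referenceName"]
    return next(ls for ls in linked_services if ls["name"] == ref)


resolved = find_linked_service(dataset, [linked_service])
print(resolved["name"])
```

The dataset carries no credentials of its own; it only names the linked service, so the connection details live in one place.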
Integration Runtime in Azure Data Factory
The Integration Runtime (IR) provides the compute for actions such as data movement and activity dispatching.
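A linked service can name the integration runtime it connects through with a connectVia reference; when that is omitted, the default Azure IR is used. The sketch below shows this in Python under stated assumptions: connect_via is a hypothetical helper, and the names OnPremSqlLinkedService and SelfHostedIR are made up for the example.

```python
import copy


def connect_via(linked_service, integration_runtime_name):
    """Return a copy of the linked service that connects through the named IR."""
    updated = copy.deepcopy(linked_service)
    updated["properties"]["connectVia"] = {
        "referenceName": integration_runtime_name,
        "type": "IntegrationRuntimeReference",
    }
    return updated


# Hypothetical on-premises SQL Server linked service with a placeholder connection string.
on_prem_sql = {
    "name": "OnPremSqlLinkedService",
    "properties": {
        "type": "SqlServer",
        "typeProperties": {"connectionString": "<connection string placeholder>"},
    },
}

# Route the connection through a (hypothetical) self-hosted IR inside the private network.
routed = connect_via(on_prem_sql, "SelfHostedIR")
print(routed["properties"]["connectVia"])
```

The original definition is left unchanged, so the same linked service can be compared with and without an explicit IR.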
What are Mapping data flows?
Mapping data flows allow cloud engineers to build data transformation logic visually. Data flows are executed inside a pipeline, so all of the pipeline scheduling capabilities are available for data flows as well.
Data flows can be leveraged to create data pipelines for loading various types of Dimension and Fact entities in a DW/BI application. They simplify the creation of complex ETL logic, which was a tedious task with native ADF activities in a pipeline before data flows were available.
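Because a data flow runs as an activity inside a pipeline, scheduling it means scheduling the pipeline. The following Python sketch shows that arrangement as dicts mirroring the JSON definitions. The names (LoadCustomerDimension, CustomerDimensionFlow, DimensionLoadPipeline, DailyTrigger), the compute settings, and the start time are all assumptions chosen for illustration.

```python
import json

# Hypothetical Execute Data Flow activity that runs a mapping data flow by reference.
data_flow_activity = {
    "name": "LoadCustomerDimension",
    "type": "ExecuteDataFlow",
    "typeProperties": {
        "dataflow": {
            "referenceName": "CustomerDimensionFlow",
            "type": "DataFlowReference",
        },
        # Assumed compute settings for the cluster that executes the data flow.
        "compute": {"coreCount": 8, "computeType": "General"},
    },
}

pipeline = {
    "name": "DimensionLoadPipeline",
    "properties": {"activities": [data_flow_activity]},
}

# Hypothetical schedule trigger: runs the pipeline (and so the data flow) once a day.
daily_trigger = {
    "name": "DailyTrigger",
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Day",
                "interval": 1,
                "startTime": "2024-01-01T02:00:00Z",
                "timeZone": "UTC",
            }
        },
        "pipelines": [
            {
                "pipelineReference": {
                    "referenceName": pipeline["name"],
                    "type": "PipelineReference",
                }
            }
        ],
    },
}

print(json.dumps(daily_trigger, indent=2))
```

The trigger never refers to the data flow directly; it only names the pipeline that wraps it.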
To enable data flows, choose 'Version 2 with Data Flows' when creating the Azure Data Factory.
Currently, there are three versions available.
- Version 1
- Version 2
- Version 2 with Data Flows
How can we prevent sensitive data from being displayed in Monitor logs when passing/receiving inputs across ADF Activities?
We can check or uncheck the Secure input and Secure output options under the General tab of an activity to keep sensitive information passed across activities out of the Monitor logs.
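In the activity's JSON, the Secure input and Secure output checkboxes correspond to the secureInput and secureOutput flags in the activity's policy section. Here is a minimal Python sketch of setting them; secure_activity is a hypothetical helper, and the Web activity with its name and URL is made up for the example.

```python
import copy


def secure_activity(activity, secure_input=True, secure_output=True):
    """Return a copy of the activity with its policy set to hide input and/or output."""
    secured = copy.deepcopy(activity)
    policy = secured.setdefault("policy", {})
    policy["secureInput"] = secure_input
    policy["secureOutput"] = secure_output
    return secured


# Hypothetical Web activity whose response contains a secret token.
get_token = {
    "name": "GetAccessToken",
    "type": "WebActivity",
    "typeProperties": {"url": "https://example.com/token", "method": "POST"},
}

secured_get_token = secure_activity(get_token)
print(secured_get_token["policy"])
```

Securing the output of the activity that produces a secret and the input of the activity that consumes it covers both ends of the hand-off.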