
How to set up source control in Azure Data Factory






Step-by-step guide

You can either configure source control at the time of ADF deployment or set it up later.

If you prefer to set it up later, check 'Configure Git later' in the Git configuration tab and proceed. In this blog, we will set it up later.
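
For completeness, if you did want to configure Git at deployment time programmatically, a minimal sketch using the azure-mgmt-datafactory Python SDK might look like the following. All names here (subscription, resource group, factory, organization, project, repository, region) are illustrative placeholders, not values from this post.

```python
# Minimal sketch: create a data factory with Git attached at deployment time.
# Assumes azure-identity and azure-mgmt-datafactory are installed.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import Factory, FactoryVSTSConfiguration

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

client.factories.create_or_update(
    resource_group_name="rg-adf-dev",            # placeholder resource group
    factory_name="adf-dev-instance",             # placeholder factory name
    factory=Factory(
        location="eastus",
        repo_configuration=FactoryVSTSConfiguration(
            account_name="my-devops-org",        # Azure DevOps organization
            project_name="my-project",           # existing DevOps project
            repository_name="adf-repo",          # repository in that project
            collaboration_branch="main",         # collaboration branch
            root_folder="/",                     # folder for ADF JSON files
        ),
    ),
)
```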




Now let us set up source control in an already deployed ADF resource. Navigate to your Data Factory UI, click the drop-down next to the Data Factory icon, and choose 'Set up code repository'.


Let us consider only Azure DevOps Git in this blog.

The next step is to choose how to link your Azure DevOps Git repository with ADF. You can choose either:

Select repository or Use repository link

The 'Select repository' option lets you pick the repository by name, while 'Use repository link' lets you paste the repository's URL from your project to connect it for ADF source control.


[Picture: Azure DevOps organization and project dropdowns in the repository settings]

You first need to set up the Azure DevOps project in an Azure DevOps organization. Without this, you will not get any options in the dropdowns shown in the picture. Check this to understand Azure DevOps organizations.

Tip: If you do not see the project name under the Azure DevOps organization you selected, clear your browser cache, then sign out and sign in again. This generally happens when you configure the DevOps project in parallel with the ADF Git setup and the browser cache is stale.
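
If you prefer to script this step rather than click through the UI, a hedged sketch using the same azure-mgmt-datafactory Python SDK is below; configure_factory_repo attaches a repository to an existing factory, and every identifier is again a placeholder.

```python
# Minimal sketch: attach an Azure DevOps Git repo to an existing factory.
# configure_factory_repo is scoped to the factory's Azure region.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import FactoryRepoUpdate, FactoryVSTSConfiguration

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

client.factories.configure_factory_repo(
    location_id="eastus",                        # region of the data factory
    factory_repo_update=FactoryRepoUpdate(
        factory_resource_id=(
            "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
            "/providers/Microsoft.DataFactory/factories/<factory-name>"
        ),
        repo_configuration=FactoryVSTSConfiguration(
            account_name="my-devops-org",        # Azure DevOps organization
            project_name="my-project",
            repository_name="adf-repo",
            collaboration_branch="main",
            root_folder="/",
        ),
    ),
)
```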


Common Errors

A common error message reads: 'Verify that your Azure DevOps account is connected to the AAD account, that the Azure DevOps repository is in the Default Directory (bla..bla..bla) tenant, and that your current ADF user account has been added to the Azure DevOps organization.'


You can fix this by connecting the Azure Active Directory tenant used by your ADF resource to your Azure DevOps organization. Follow the steps below.

 

Log in to your Azure DevOps organization → Organization Settings → Overview → Azure Active Directory → Connect directory



If you have not received any errors, you are through and have successfully configured Azure DevOps Git in your ADF. Below is a picture of how your setup should look in an ideal case.


The collaboration branch is the one used to promote code to other Data Factory instances, for instance from DEV to QA.


You can follow general best practices, such as 'always create a feature branch for new enhancements' and 'never push directly to master', or follow whatever branch policies your organization has designed.


Saving in Data Factory acts as a commit to the working branch of your Azure Repos Git repository.


The picture above shows how you can toggle between different branches.

The adf_publish branch holds the ADF ARM template, which is used for code migrations.
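
To make that concrete, here is a hedged sketch of one way to replay the published template into another factory using the azure-mgmt-resource Python SDK. ARMTemplateForFactory.json is the default file name ADF publishes to adf_publish; the resource group, deployment name, and target factory name are placeholders, and factoryName is the parameter ADF generates in the template for the target factory.

```python
# Minimal sketch: deploy the adf_publish ARM template into a QA factory.
# Assumes azure-identity and azure-mgmt-resource are installed.
import json

from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient
from azure.mgmt.resource.resources.models import (
    Deployment,
    DeploymentMode,
    DeploymentProperties,
)

client = ResourceManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Default file name ADF writes to the adf_publish branch
with open("ARMTemplateForFactory.json") as f:
    template = json.load(f)

poller = client.deployments.begin_create_or_update(
    resource_group_name="rg-adf-qa",             # placeholder QA resource group
    deployment_name="adf-dev-to-qa",
    parameters=Deployment(
        properties=DeploymentProperties(
            mode=DeploymentMode.INCREMENTAL,
            template=template,
            # Point the template at the QA factory instead of DEV
            parameters={"factoryName": {"value": "adf-qa-instance"}},
        )
    ),
)
poller.result()  # block until the deployment finishes
```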

Note: You can also disconnect the Git repository from Source control in the Management hub.



