Hi All,
I’ve recently been working on a project that makes use of Azure Data Factory and Azure DevOps. In this post I would like to explain how I went about creating my Azure DevOps pipeline. It is a multi-stage pipeline written in YAML; I am not using the classic release pipelines.
I’m going to assume you have some experience in Azure DevOps before I continue.
Repo Folder Setup
I would like to show you how my DevOps repo is set up. My repo is called “KDH-People-Workforce”, and in it I have created the following folders. Each resource-type folder under md-organisation also contains a per-environment subfolder (dev), matching the working directories used by the pipeline further below:

- data-factory
  - md-organisation
    - credentials
    - data-flows
    - datasets
    - linked-services
    - pipelines
- pipelines
  - build
    - ci-templates
    - data-factory
      - ci-release
        - dev
I’ve included an image of my repo layout as well.
The Template
Within the ci-templates folder, I created a YAML template file called DeployADFResources-template.yaml. The full code is shown below.
```yaml
parameters:
  - name: subscription
    type: string
  - name: resourceGroup
    type: string
  - name: datafactoryName
    type: string
  - name: workingDirectory
    type: string
  - name: adfResourceType
    type: string

steps:
  - task: PowerShell@2
    displayName: Read ADF JSON Files - ${{parameters.adfResourceType}}
    name: SETVAR
    inputs:
      targetType: 'inline'
      script: |
        $folderPath = "${{parameters.workingDirectory}}"
        $jsonArray = @()

        if (Test-Path $folderPath) {
          Get-ChildItem -Path $folderPath -Filter *.json | ForEach-Object {
            # Capture the path first; inside catch, $_ is the error record
            $filePath = $_.FullName
            try {
              # -Raw reads each file as a single string before parsing
              $content = Get-Content -Path $filePath -Raw | ConvertFrom-Json
              $jsonArray += $content
            }
            catch {
              Write-Error "Failed to read or parse JSON file: $filePath. Error: $_"
            }
          }

          # Combine all JSON objects into one JSON array
          $combinedJson = $jsonArray | ConvertTo-Json -Depth 10 -Compress
          Write-Host "Combined JSON Array"
          Write-Host $combinedJson

          # Encode the combined JSON as a Base64 string so it survives
          # being passed between tasks as a pipeline variable
          $encodedResult = [Convert]::ToBase64String([System.Text.Encoding]::UTF8.GetBytes($combinedJson))

          # Store the result in an output variable for later tasks
          Write-Host "##vso[task.setvariable variable=JsonResult;isOutput=true]$encodedResult"
        }
        else {
          Write-Error "Folder $folderPath does not exist."
        }

  - task: AzurePowerShell@5
    displayName: Create/Update ADF Resource - ${{parameters.adfResourceType}}
    inputs:
      azureSubscription: 'MY_SUBSCRIPTION'
      ScriptType: 'InlineScript'
      azurePowerShellVersion: 'LatestVersion'
      Inline: |
        # Decode the Base64 string produced by the previous task
        $decodedJson = [System.Text.Encoding]::UTF8.GetString([Convert]::FromBase64String("$(SETVAR.JsonResult)"))

        try {
          $jsonArray = $decodedJson | ConvertFrom-Json
          Write-Host "JSON is valid. Array contents:"
          Write-Host $jsonArray
        }
        catch {
          Write-Error "Error parsing JSON: $_"
        }

        # Acquire a token for the Azure Management API
        $token = (Get-AzAccessToken).Token

        foreach ($item in $jsonArray) {
          $url = "https://management.azure.com/subscriptions/${{parameters.subscription}}/resourceGroups/${{parameters.resourceGroup}}/providers/Microsoft.DataFactory/factories/${{parameters.datafactoryName}}/${{parameters.adfResourceType}}/$($item.name)?api-version=2018-06-01"

          # Define the headers for the API request
          $headers = @{
            'Authorization' = "Bearer $token"
            'Accept'        = 'application/json'
            'Content-Type'  = 'application/json'
          }

          # Convert the JSON body back to a string to send in the request
          $body = $item.body | ConvertTo-Json -Depth 10 -Compress

          # Make the API call using Invoke-WebRequest
          $response = Invoke-WebRequest -Method PUT -Uri $url -Headers $headers -Body $body

          # Output the response status code for logging
          Write-Host "Response for $($item.name): $($response.StatusCode)"
        }
```
Discussing The Code
Now let’s discuss the above YAML code.
1. Template Parameters
I am making use of template parameters; the template accepts the following parameters to make it reusable:

- subscription: the Azure subscription ID where the resources are managed.
- resourceGroup: the Azure resource group containing the ADF resources.
- datafactoryName: the name of the Azure Data Factory instance.
- workingDirectory: the directory path containing the JSON files for the ADF resources.
- adfResourceType: the type of ADF resource to manage (e.g., pipelines, datasets).
2. First Task: Reading and Combining JSON Files
I’m using PowerShell to read the JSON files for the different ADF resources.
- Task: PowerShell@2
- Purpose: Reads the JSON files in the specified directory, combines them into a single JSON array, and encodes the array as a Base64 string.
Key Steps in the Script
- Initialize folder path: set $folderPath from the workingDirectory parameter.
- Check if the folder exists: if it does, read all .json files; otherwise, raise an error.
- Read JSON files:
  - Use Get-ChildItem to find the .json files in the directory.
  - Use ConvertFrom-Json to parse the content of each file into a PowerShell object.
  - Collect these objects in $jsonArray.
- Combine and encode JSON:
  - Convert the JSON array to a string using ConvertTo-Json.
  - Encode the string into Base64 using [Convert]::ToBase64String.
- Store output as a variable:
  - Use Write-Host with the ##vso[task.setvariable] syntax to set an output variable (JsonResult) that can be accessed in later tasks.
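To make the hand-off between the two tasks concrete, here is a minimal standalone sketch of the same combine-and-encode logic that you can run locally. The folder path is illustrative, and a caveat worth knowing: with a single JSON file, ConvertTo-Json emits an object rather than an array.

```powershell
# Minimal local sketch of the combine-and-encode step.
# The folder path below is illustrative.
$folderPath = ".\data-factory\md-organisation\pipelines\dev"

$jsonArray = @()
Get-ChildItem -Path $folderPath -Filter *.json | ForEach-Object {
    # -Raw reads each file as a single string before parsing
    $jsonArray += Get-Content -Path $_.FullName -Raw | ConvertFrom-Json
}

# Caveat: with a single file, ConvertTo-Json emits an object, not an array;
# on PowerShell 6+ you can pass -AsArray to force array output.
$combinedJson  = $jsonArray | ConvertTo-Json -Depth 10 -Compress
$encodedResult = [Convert]::ToBase64String([System.Text.Encoding]::UTF8.GetBytes($combinedJson))

# Round-trip check: decoding should return the original JSON string
$decodedJson = [System.Text.Encoding]::UTF8.GetString([Convert]::FromBase64String($encodedResult))
$decodedJson -eq $combinedJson   # prints True
```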
3. Second Task: Creating/Updating ADF Resources
- Task: AzurePowerShell@5
- Purpose: Decodes the combined JSON array, validates it, and makes API calls to create or update the ADF resources.
Key Steps in the Script
- Decode Base64 JSON: convert the Base64 string from the first task back into a JSON string.
- Validate JSON:
  - Use ConvertFrom-Json to parse the decoded string.
  - Log the parsed JSON contents for debugging.
- Get Azure token: retrieve an access token using Get-AzAccessToken for authorising the API calls.
- Iterate over the JSON array. For each JSON object:
  - Construct the API endpoint URL from the ADF resource type and name.
  - Define the headers for the API request, including the token.
  - Convert the resource body back to JSON format.
  - Make a PUT API call to create or update the ADF resource.
- Log response: log the status code of the API call for each resource.
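As a quick sanity check after a run, the same Management API endpoint can be queried with a GET. The sketch below is my own addition rather than part of the pipeline; the resource name is a hypothetical placeholder, and it assumes the Az PowerShell module is installed and you are signed in with Connect-AzAccount.

```powershell
# Spot-check a deployed resource with a GET against the same endpoint pattern.
# Assumes Connect-AzAccount has been run; $resourceName is a placeholder.
$subscription  = "SUBSCRIPTION_ID"
$resourceGroup = "rg-kdh-dev-uks-1"
$factoryName   = "kcl-kdh-adf-dev-uks-1"
$resourceType  = "pipelines"        # pipelines, datasets, linkedservices, ...
$resourceName  = "pl-example"       # hypothetical resource name

$token = (Get-AzAccessToken).Token
$url   = "https://management.azure.com/subscriptions/$subscription/resourceGroups/$resourceGroup/providers/Microsoft.DataFactory/factories/$factoryName/$resourceType/${resourceName}?api-version=2018-06-01"

# Return the resource definition as it exists in the factory
$response = Invoke-RestMethod -Method GET -Uri $url -Headers @{ Authorization = "Bearer $token" }
$response | ConvertTo-Json -Depth 10
```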
The YAML Pipeline
With the above template in place, I now need to create the YAML release pipeline. I named this pipeline “ci-build-release-md-organisation-dev.yaml”; the code for it is shown below.
```yaml
trigger:
  branches:
    include:
      - develop
  paths:
    include:
      - data-factory/md-organisation

variables:
  # Azure Resource Manager connection created during pipeline creation
  azureSubscription: "MY_SUBSCRIPTION"
  # Agent pool image
  vmImage: "windows-latest"
  # Working directory
  workingDirectory: "$(System.DefaultWorkingDirectory)/data-factory/md-organisation"
  subscription: SUBSCRIPTION_ID
  resourceGroupName: rg-kdh-dev-uks-1
  ADFName: kcl-kdh-adf-dev-uks-1

parameters:
  - name: templateFolder
    type: string
    default: '../../../ci-templates'

stages:
  - stage: DeployADFCredentials
    displayName: Deploy ADF Credentials
    jobs:
      - job: DeployCredential
        displayName: Deploy ADF Credentials
        pool:
          vmImage: $(vmImage)
        steps:
          - template: ${{ parameters.templateFolder }}/DeployADFResources-template.yaml
            parameters:
              subscription: "$(subscription)"
              resourceGroup: "$(resourceGroupName)"
              datafactoryName: "$(ADFName)"
              workingDirectory: "$(workingDirectory)\\credentials\\dev"
              adfResourceType: credentials

  - stage: DeployADFLinkedServices
    displayName: Deploy ADF Linked Services
    jobs:
      - job: DeployLinkedServices
        displayName: Deploy ADF Linked Services
        pool:
          vmImage: $(vmImage)
        steps:
          - template: ${{ parameters.templateFolder }}/DeployADFResources-template.yaml
            parameters:
              subscription: "$(subscription)"
              resourceGroup: "$(resourceGroupName)"
              datafactoryName: "$(ADFName)"
              workingDirectory: "$(workingDirectory)\\linked-services\\dev"
              adfResourceType: linkedservices

  - stage: DeployADFDatasets
    displayName: Deploy ADF Datasets
    jobs:
      - job: DeployDatasets
        displayName: Deploy ADF Datasets
        pool:
          vmImage: $(vmImage)
        steps:
          - template: ${{ parameters.templateFolder }}/DeployADFResources-template.yaml
            parameters:
              subscription: "$(subscription)"
              resourceGroup: "$(resourceGroupName)"
              datafactoryName: "$(ADFName)"
              workingDirectory: "$(workingDirectory)\\datasets\\dev"
              adfResourceType: datasets

  - stage: DeployADFDataFlows
    displayName: Deploy ADF DataFlows
    jobs:
      - job: DeployDataFlows
        displayName: Deploy ADF DataFlows
        pool:
          vmImage: $(vmImage)
        steps:
          - template: ${{ parameters.templateFolder }}/DeployADFResources-template.yaml
            parameters:
              subscription: "$(subscription)"
              resourceGroup: "$(resourceGroupName)"
              datafactoryName: "$(ADFName)"
              workingDirectory: "$(workingDirectory)\\data-flows\\dev"
              adfResourceType: dataflows

  - stage: DeployADFPipelines
    displayName: Deploy ADF Pipelines
    jobs:
      - job: DeployPipelines
        displayName: Deploy ADF Pipelines
        pool:
          vmImage: $(vmImage)
        steps:
          - template: ${{ parameters.templateFolder }}/DeployADFResources-template.yaml
            parameters:
              subscription: "$(subscription)"
              resourceGroup: "$(resourceGroupName)"
              datafactoryName: "$(ADFName)"
              workingDirectory: "$(workingDirectory)\\pipelines\\dev"
              adfResourceType: pipelines
```
Discussing the Code
1. Triggers
- Branches: The pipeline triggers on changes to the develop branch.
- Paths: Limits the trigger to changes in the data-factory/md-organisation folder.
2. Variables
- Azure Subscription: Specifies the service connection used for deploying resources.
- Agent Pool: Uses the windows-latest image for the pipeline agent.
- Working Directory: Points to the folder containing the ADF resource JSON definitions.
- Environment Details:
  - subscription: Azure subscription ID.
  - resourceGroupName: Resource group where the ADF instance resides.
  - ADFName: Name of the target Azure Data Factory.
3. Parameters
- Template Folder: Points to the folder containing the reusable deployment template (DeployADFResources-template.yaml).
4. Stages
Each stage corresponds to a specific ADF resource type, with a similar structure for deploying:
- Credentials
- Linked Services
- Datasets
- Data Flows
- Pipelines
Example: Deploy ADF Credentials
- Stage and Job:
  - Each stage contains one or more jobs for deploying a specific resource type.
  - The job’s pool uses the $(vmImage) variable.
- Reusable Template:
  - The deployment logic is extracted into DeployADFResources-template.yaml.
  - Parameters are passed to the template, including:
    - subscription, resourceGroup, datafactoryName: environment-specific details.
    - workingDirectory: path to the JSON files for the resource type (credentials/dev for this stage).
    - adfResourceType: the type of resource being deployed.
Other Stages
The structure is repeated for:
- Linked Services: Uses the linked-services/dev directory.
- Datasets: Uses the datasets/dev directory.
- Data Flows: Uses the data-flows/dev directory.
- Pipelines: Uses the pipelines/dev directory.
The ADF JSON Files
I got the JSON for each resource from the code view in ADF Studio. I then copied this code into a new file and modified the JSON slightly to include an additional property called “body”. Below is an example of the JSON for an ADF credential.
{ "name": "cred_managedidentity_adf", "body": { "properties": { "type": "ManagedIdentity", "typeProperties": { "resourceId": "MY_RESOURCE_ID" } } } } |
I proceeded to do the same for all the other ADF resources I needed; I’ve included an image of how my repo looked.
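Since every exported definition needs the same “body” wrapping, a small helper script can apply it for you. This is a sketch of my own (not something the pipeline requires), and it assumes the code view export exposes top-level “name” and “properties” fields, which matches the credential example above.

```powershell
# Hypothetical helper (wrap-adf-json.ps1): wraps a definition copied from
# ADF Studio's code view into the { name, body } shape the template expects.
param(
    [Parameter(Mandatory)] [string] $SourceFile,  # raw JSON from ADF Studio
    [Parameter(Mandatory)] [string] $TargetFile   # wrapped JSON for the repo
)

$raw = Get-Content -Path $SourceFile -Raw | ConvertFrom-Json

# Assumption: the export has top-level "name" and "properties" fields
$wrapped = [ordered]@{
    name = $raw.name
    body = @{ properties = $raw.properties }
}

$wrapped | ConvertTo-Json -Depth 20 | Set-Content -Path $TargetFile
```

Usage would be something like: .\wrap-adf-json.ps1 -SourceFile .\export.json -TargetFile .\credentials\dev\cred_managedidentity_adf.json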
That’s all. I hope this gave you some idea of how to deploy ADF resources using Azure DevOps.