Azure Pipeline Environments and Deployment Strategies
A comprehensive guide to Azure DevOps pipeline environments and deployment strategies including rolling, canary, approvals, lifecycle hooks, and multi-environment configurations.
Overview
Azure DevOps environments are first-class resources in YAML pipelines that represent the actual targets where your code gets deployed -- a Kubernetes cluster, a set of virtual machines, or a logical grouping that gates your releases. They give you approval workflows, deployment history, traceability from commit to production, and most importantly, deployment strategies like rolling and canary that would otherwise require significant custom scripting. If you are deploying anything beyond a toy project and you are not using environments, you are leaving safety and visibility on the table.
Prerequisites
Before working through this article, you should have:
- An Azure DevOps organization and project with Pipelines enabled
- Basic familiarity with YAML pipeline syntax (stages, jobs, steps)
- A Node.js application with a build pipeline already producing artifacts
- Access to create environments in your Azure DevOps project (Project Administrator or Environment Creator role)
- Optionally, an Azure subscription with an AKS cluster or VMs for resource targets
What Are Environments in Azure DevOps?
An environment in Azure DevOps is not just a label you slap on a stage. It is a first-class resource that carries its own configuration, permissions, approval gates, and deployment history. When you reference an environment in a deployment job, Azure DevOps tracks every deployment to that environment, records which pipeline run deployed which commit, and enforces whatever checks you have configured before allowing the deployment to proceed.
This is fundamentally different from using stage-level variables or naming conventions to represent your environments. With a proper environment resource, you get:
- Deployment history -- a full audit trail of what was deployed, when, by whom, and from which commit
- Approval and check gates -- human approvals, business hours restrictions, branch control, and exclusive locks
- Resource targeting -- direct association with Kubernetes namespaces or VM pools
- Deployment strategies -- built-in rolling, canary, and runOnce strategies with lifecycle hooks
Think of environments as the deployment equivalent of service connections. They are a managed resource that you configure once and reference across pipelines.
Creating and Configuring Environments
You can create environments through the Azure DevOps UI or let your pipeline create them automatically on first reference. I recommend creating them explicitly through the UI so you can configure approvals and checks before the first deployment ever runs.
To create an environment through the UI, navigate to Pipelines > Environments > New Environment. Give it a name, an optional description, and choose whether to add resource targets immediately or leave it as a logical grouping.
In your YAML pipeline, you reference an environment in a deployment job:
```yaml
stages:
- stage: DeployStaging
  displayName: 'Deploy to Staging'
  jobs:
  - deployment: DeployStagingJob
    displayName: 'Deploy to Staging Environment'
    environment: 'staging'
    strategy:
      runOnce:
        deploy:
          steps:
          - script: echo "Deploying to staging"
```
Notice that the job type is deployment, not job. This is critical. A regular job does not support environment references, deployment strategies, or lifecycle hooks. The deployment job type is purpose-built for this.
If the environment named staging does not exist when this pipeline first runs, Azure DevOps creates it automatically. But it will be created with no approvals or checks, which is rarely what you want for anything beyond development.
Environment Approvals and Checks
Approvals and checks are the enforcement mechanism that makes environments useful for governance. You configure them on the environment itself, not in the pipeline YAML. This means that no matter which pipeline deploys to production, the same approval gates apply.
Manual Approvals
The most common check. Navigate to your environment, click the three-dot menu, select Approvals and checks, and add an Approvals check. You specify one or more approvers, set a timeout (how long the approval request stays active before auto-rejecting), and optionally allow the approver to defer the deployment.
A practical configuration for a production environment:
- Approvers: Your team lead and a senior engineer (require any one to approve)
- Timeout: 72 hours (gives people time across weekends)
- Instructions: "Verify staging deployment passed smoke tests before approving production"
Branch Control
Branch control restricts which branches can deploy to an environment. For production, you almost always want to restrict deployments to main or master:
In the environment checks, add a Branch control check:
- Allowed branches: refs/heads/main, refs/heads/release/*

This prevents feature branches from accidentally deploying to production.
Business Hours
The business hours check prevents deployments outside of specified time windows. This is useful for production environments where you want deployments to happen only when the full team is available to respond to incidents:
- Time zone: Your team's primary timezone
- Business days: Monday through Friday
- Start time: 09:00
- End time: 16:00
Exclusive Lock
The exclusive lock check ensures only one pipeline run deploys to an environment at a time. When multiple runs target the same environment, they are serialized -- the second run waits until the first completes. You can configure this with two behaviors:
- Sequential -- runs queue up and execute in order
- Latest only -- only the most recent queued run proceeds, older queued runs are canceled
For production deployments, I almost always use exclusive lock with "latest only." If three commits are waiting to deploy, I only care about the most recent one.
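The lock check itself lives on the environment, but how queued runs behave can also be declared in YAML with the lockBehavior property at the pipeline or stage level. A minimal sketch (stage and environment names are illustrative):

```yaml
stages:
- stage: DeployProduction
  lockBehavior: runLatest   # cancel older queued runs; use 'sequential' to run them in order
  jobs:
  - deployment: Deploy
    environment: 'production'   # Exclusive Lock check must be enabled on this environment
    strategy:
      runOnce:
        deploy:
          steps:
          - script: echo "Deploying latest queued run"
```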
Deployment Strategies
This is where environments become genuinely powerful. Azure DevOps supports three built-in deployment strategies, each with lifecycle hooks that let you run custom logic at specific points during the deployment.
runOnce
The simplest strategy. It deploys to all targets at once with no incremental rollout. Use this for development environments or any deployment target where you do not need gradual rollout.
```yaml
strategy:
  runOnce:
    deploy:
      steps:
      - task: AzureWebApp@1
        inputs:
          azureSubscription: 'my-azure-connection'
          appName: 'my-node-app-dev'
          package: '$(Pipeline.Workspace)/drop/*.zip'
```
Rolling Deployment
A rolling deployment updates targets in batches. If you have 10 VMs, you can deploy to 2 at a time, verify each batch is healthy, and then move to the next batch. If a batch fails, the remaining batches are not updated.
```yaml
strategy:
  rolling:
    maxParallel: 2
    deploy:
      steps:
      - script: |
          echo "Deploying to $(Environment.ResourceName)"
          npm install --production
          pm2 restart my-app
        displayName: 'Deploy application'
    routeTraffic:
      steps:
      - script: |
          echo "Routing traffic to updated instance"
        displayName: 'Route traffic'
    postRouteTraffic:
      steps:
      - script: |
          echo "Running health check on $(Environment.ResourceName)"
          curl --fail http://$(Environment.ResourceName):8080/health
        displayName: 'Validate health'
    on:
      failure:
        steps:
        - script: |
            echo "Deployment failed on $(Environment.ResourceName)"
            echo "Rolling back..."
            pm2 restart my-app --update-env
          displayName: 'Rollback on failure'
      success:
        steps:
        - script: echo "Batch deployment succeeded"
          displayName: 'Confirm success'
```
The maxParallel property controls how many targets receive the update simultaneously. You can specify an absolute number (maxParallel: 2) or a percentage (maxParallel: 25%). For a 10-node cluster, maxParallel: 2 means deploy to 2 nodes at a time, verify, then move on.
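As a sketch, the same strategy block using a percentage rather than an absolute count:

```yaml
strategy:
  rolling:
    maxParallel: 25%   # one quarter of registered targets per batch
    deploy:
      steps:
      - script: echo "Deploying to $(Environment.ResourceName)"
```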
Canary Deployment
Canary deployments route a small percentage of traffic to the new version first, validate it, and then incrementally roll out to more targets. This is the safest strategy for production deployments of critical services.
```yaml
strategy:
  canary:
    increments: [10, 25, 50, 100]
    preDeploy:
      steps:
      - script: echo "Preparing canary deployment - current increment $(Strategy.CycleSize)"
        displayName: 'Pre-deploy canary'
    deploy:
      steps:
      - script: |
          echo "Deploying canary to $(Strategy.CycleSize)% of targets"
        displayName: 'Deploy canary increment'
    routeTraffic:
      steps:
      - script: |
          echo "Routing $(Strategy.CycleSize)% of traffic to canary"
        displayName: 'Route traffic to canary'
    postRouteTraffic:
      steps:
      - script: |
          echo "Monitoring canary health for 5 minutes..."
          sleep 300
          curl --fail http://my-app-canary:8080/health
        displayName: 'Validate canary health'
    on:
      failure:
        steps:
        - script: |
            echo "Canary failed at $(Strategy.CycleSize)%. Rolling back."
          displayName: 'Canary rollback'
      success:
        steps:
        - script: echo "Canary deployment completed successfully"
          displayName: 'Canary success'
```
The increments array defines the rollout percentages. With [10, 25, 50, 100], the pipeline first deploys to 10% of targets, validates, then 25%, validates, then 50%, and finally 100%. If validation fails at any increment, the on.failure hook runs and the rollout stops.
Lifecycle Hooks in Detail
Every deployment strategy supports lifecycle hooks that execute at specific points during the deployment. Understanding these hooks is essential for building robust deployment pipelines.
| Hook | When It Runs | Typical Use |
|---|---|---|
| preDeploy | Before the deployment starts | Database backups, feature flag checks, snapshot creation |
| deploy | The main deployment step | Actual application deployment |
| routeTraffic | After deploy, before validation | Load balancer updates, DNS changes, traffic shifting |
| postRouteTraffic | After traffic is routed | Health checks, smoke tests, integration tests |
| on.success | After all steps succeed | Notifications, cleanup, metric annotations |
| on.failure | If any step fails | Rollback, alerting, incident creation |
Here is a practical example showing all hooks for a Node.js application:
```yaml
strategy:
  runOnce:
    preDeploy:
      steps:
      - script: |
          echo "Creating database backup before deployment..."
          mongodump --uri="$(MONGO_URI)" --out=/tmp/backup-$(Build.BuildId)
        displayName: 'Backup database'
    deploy:
      steps:
      - task: DownloadPipelineArtifact@2
        inputs:
          buildType: 'current'
          artifactName: 'drop'
          targetPath: '$(Pipeline.Workspace)/drop'
      - script: |
          cd $(Pipeline.Workspace)/drop
          npm install --production
          pm2 stop my-app || true
          pm2 start app.js --name my-app
        displayName: 'Deploy Node.js app'
    routeTraffic:
      steps:
      - script: |
          echo "Updating load balancer to include new deployment"
          az network lb rule update \
            --resource-group my-rg \
            --lb-name my-lb \
            --name my-rule \
            --backend-pool-name new-pool
        displayName: 'Update load balancer'
    postRouteTraffic:
      steps:
      - script: |
          echo "Running smoke tests..."
          node tests/smoke-test.js
        displayName: 'Run smoke tests'
      - script: |
          echo "Checking health endpoint..."
          for i in 1 2 3 4 5; do
            STATUS=$(curl -s -o /dev/null -w "%{http_code}" http://my-app:8080/health)
            if [ "$STATUS" = "200" ]; then
              echo "Health check $i passed"
            else
              echo "Health check $i failed with status $STATUS"
              exit 1
            fi
            sleep 10
          done
        displayName: 'Health check validation'
    on:
      failure:
        steps:
        - script: |
            echo "Deployment failed. Restoring database backup..."
            mongorestore --uri="$(MONGO_URI)" /tmp/backup-$(Build.BuildId)
            pm2 restart my-app-previous || true
          displayName: 'Rollback deployment'
        - task: SendSlackNotification@1
          inputs:
            channel: '#deployments'
            message: 'FAILED: Deployment $(Build.BuildNumber) to $(Environment.Name)'
      success:
        steps:
        - script: |
            echo "Deployment successful. Cleaning up old backups..."
            rm -rf /tmp/backup-$(Build.BuildId)
          displayName: 'Cleanup'
```
Resource Targets
Environments can be associated with specific infrastructure targets. The two supported resource types are Kubernetes and Virtual Machines.
Kubernetes Resources
When you add a Kubernetes resource to an environment, Azure DevOps connects directly to your cluster and can deploy to a specific namespace:
```yaml
environment: 'production.my-app-namespace'
```
The dot notation (environment.namespace) targets a specific Kubernetes namespace within the environment. This is powerful because it means your environment checks (approvals, branch control) apply at the namespace level.
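The dot notation is shorthand for the long form, which reads more clearly when you need additional properties. A sketch with illustrative resource names:

```yaml
environment:
  name: 'production'
  resourceName: 'my-app-namespace'   # the Kubernetes resource registered on the environment
  resourceType: Kubernetes
```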
To add a Kubernetes resource, go to your environment in the UI, click Add resource, select Kubernetes, and provide your cluster connection details. You can connect via Azure Kubernetes Service (direct integration) or a generic Kubernetes service connection.
Virtual Machine Resources
For VM-based deployments, you install the Azure Pipelines agent on each target VM and register it with an environment. This is how rolling deployments work -- the pipeline distributes the deployment across registered VMs.
To register a VM, go to your environment, click Add resource, select Virtual machines, and follow the registration script. The script installs the pipeline agent and registers the VM with your environment. You can tag VMs to target specific subsets:
```yaml
environment:
  name: 'production'
  resourceType: VirtualMachine
  tags: 'web-server'
```
This targets only VMs tagged as web-server in the production environment.
Environment History and Traceability
One of the most underrated features of environments is the deployment history view. Navigate to Pipelines > Environments > [your environment], and you get a chronological list of every deployment, including:
- Which pipeline and run number
- Which commit triggered the deployment
- Who approved it (if approvals are configured)
- Whether it succeeded or failed
- The duration of the deployment
This is invaluable during incident response. When something breaks in production, you can immediately see what was deployed recently and trace it back to the exact commit. You do not need to cross-reference build logs, deployment scripts, and git history manually -- it is all in one place.
You can also use the Azure DevOps REST API to query environment deployment history programmatically:
```javascript
var https = require("https");

var org = "my-org";
var project = "my-project";
var envId = 5;
var pat = process.env.AZURE_DEVOPS_PAT;

var options = {
  hostname: "dev.azure.com",
  path: "/" + org + "/" + project + "/_apis/distributedtask/environments/" + envId + "/environmentdeploymentrecords?api-version=7.1",
  headers: {
    "Authorization": "Basic " + Buffer.from(":" + pat).toString("base64")
  }
};

https.get(options, function(res) {
  var data = "";
  res.on("data", function(chunk) {
    data += chunk;
  });
  res.on("end", function() {
    var records = JSON.parse(data);
    records.value.forEach(function(record) {
      console.log(record.definition.name + " - " + record.result + " - " + record.startTime);
    });
  });
});
```
Environment Permissions and Security
Environments have their own permission model, separate from pipeline permissions. You can control:
- Creator -- who can create new environments
- Reader -- who can view the environment and its deployment history
- User -- who can reference the environment in their pipelines
- Administrator -- who can manage approvals, checks, and permissions
For a mature setup, I recommend:
- Restrict environment creation to Project Administrators
- Grant User role on development environments broadly (all developers)
- Grant User role on staging environments to your team leads
- Grant User role on production environments only to the release pipeline service account
- Grant Reader role on all environments to the broader team for visibility
You can also set pipeline-level permissions on environments. This restricts which specific pipelines can target an environment. Navigate to the environment, click Security, and under Pipeline permissions, add specific pipelines or choose "All pipelines."
Combining Environments with Variable Groups
A common pattern is pairing environments with variable groups to inject environment-specific configuration. Each environment references a different variable group containing connection strings, API keys, and feature flags appropriate for that stage:
```yaml
stages:
- stage: DeployDev
  variables:
  - group: 'app-config-dev'
  jobs:
  - deployment: Deploy
    environment: 'dev'
    strategy:
      runOnce:
        deploy:
          steps:
          - script: |
              echo "Deploying with DB_HOST=$(DB_HOST)"
              echo "Feature flags: $(FEATURE_FLAGS)"
            displayName: 'Deploy with env-specific config'

- stage: DeployProd
  variables:
  - group: 'app-config-prod'
  jobs:
  - deployment: Deploy
    environment: 'production'
    strategy:
      runOnce:
        deploy:
          steps:
          - script: |
              echo "Deploying with DB_HOST=$(DB_HOST)"
            displayName: 'Deploy with prod config'
```
Link your variable groups to Azure Key Vault for secrets. Never store connection strings or API keys directly in variable groups as plaintext.
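You can also fetch secrets at runtime with the AzureKeyVault task instead of (or in addition to) a linked variable group. A sketch, where the vault name and service connection are assumptions:

```yaml
steps:
- task: AzureKeyVault@2
  inputs:
    azureSubscription: 'my-azure-connection'   # assumed service connection name
    KeyVaultName: 'my-keyvault'                # assumed Key Vault name
    SecretsFilter: 'DB-PASSWORD,API-KEY'       # comma-separated secret names, or '*' for all
  displayName: 'Fetch secrets from Key Vault'
- script: echo "Fetched secrets are available as masked pipeline variables"
  displayName: 'Use secrets'
```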
Complete Working Example
Here is a complete multi-environment YAML pipeline for a Node.js application. It deploys to three environments: dev (runOnce), staging (rolling), and production (canary with health checks and automatic rollback).
```yaml
# azure-pipelines.yml
trigger:
  branches:
    include:
    - main
  paths:
    exclude:
    - '*.md'
    - 'docs/**'

pool:
  vmImage: 'ubuntu-latest'

variables:
  nodeVersion: '20.x'
  artifactName: 'node-app'

stages:
# ==========================================
# BUILD STAGE
# ==========================================
- stage: Build
  displayName: 'Build and Test'
  jobs:
  - job: BuildJob
    displayName: 'Build Node.js Application'
    steps:
    - task: NodeTool@0
      inputs:
        versionSpec: '$(nodeVersion)'
      displayName: 'Install Node.js'
    - script: npm ci
      displayName: 'Install dependencies'
    - script: npm run lint
      displayName: 'Run linter'
    - script: npm test
      displayName: 'Run unit tests'
    - script: npm run build --if-present
      displayName: 'Build application'
    - task: ArchiveFiles@2
      inputs:
        rootFolderOrFile: '$(System.DefaultWorkingDirectory)'
        includeRootFolder: false
        archiveType: 'zip'
        archiveFile: '$(Build.ArtifactStagingDirectory)/$(artifactName)-$(Build.BuildId).zip'
        replaceExistingArchive: true
      displayName: 'Archive application'
    - publish: '$(Build.ArtifactStagingDirectory)/$(artifactName)-$(Build.BuildId).zip'
      artifact: '$(artifactName)'
      displayName: 'Publish artifact'

# ==========================================
# DEV DEPLOYMENT - runOnce
# ==========================================
- stage: DeployDev
  displayName: 'Deploy to Dev'
  dependsOn: Build
  condition: succeeded()
  variables:
  - group: 'app-config-dev'
  jobs:
  - deployment: DeployDevJob
    displayName: 'Deploy to Dev Environment'
    environment: 'dev'
    strategy:
      runOnce:
        preDeploy:
          steps:
          - script: echo "Starting dev deployment for build $(Build.BuildId)"
            displayName: 'Pre-deploy notification'
        deploy:
          steps:
          - download: current
            artifact: '$(artifactName)'
          - task: ExtractFiles@1
            inputs:
              archiveFilePatterns: '$(Pipeline.Workspace)/$(artifactName)/*.zip'
              destinationFolder: '$(Pipeline.Workspace)/extracted'
            displayName: 'Extract artifact'
          - task: AzureWebApp@1
            inputs:
              azureSubscription: 'azure-dev-connection'
              appType: 'webAppLinux'
              appName: 'my-node-app-dev'
              package: '$(Pipeline.Workspace)/$(artifactName)/*.zip'
              runtimeStack: 'NODE|20-lts'
              startUpCommand: 'npm start'
            displayName: 'Deploy to Azure App Service (Dev)'
        postRouteTraffic:
          steps:
          - script: |
              echo "Running dev smoke tests..."
              RESPONSE=$(curl -s -o /dev/null -w "%{http_code}" https://my-node-app-dev.azurewebsites.net/health)
              if [ "$RESPONSE" != "200" ]; then
                echo "Health check failed with status $RESPONSE"
                exit 1
              fi
              echo "Dev deployment healthy"
            displayName: 'Dev health check'
        on:
          failure:
            steps:
            - script: echo "##vso[task.logissue type=warning]Dev deployment failed for build $(Build.BuildId)"
              displayName: 'Log deployment failure'

# ==========================================
# STAGING DEPLOYMENT - Rolling
# ==========================================
- stage: DeployStaging
  displayName: 'Deploy to Staging (Rolling)'
  dependsOn: DeployDev
  condition: succeeded()
  variables:
  - group: 'app-config-staging'
  jobs:
  - deployment: DeployStagingJob
    displayName: 'Rolling Deploy to Staging'
    environment:
      name: 'staging'
      resourceType: VirtualMachine
      tags: 'web-tier'
    strategy:
      rolling:
        maxParallel: 2
        preDeploy:
          steps:
          - script: |
              echo "Pre-deploy on $(Environment.ResourceName)"
              echo "Current app version:"
              pm2 describe my-node-app 2>/dev/null | head -5 || echo "App not yet deployed"
            displayName: 'Capture current state'
        deploy:
          steps:
          - download: current
            artifact: '$(artifactName)'
          - script: |
              echo "Deploying to $(Environment.ResourceName)..."
              APP_DIR=/opt/my-node-app
              # Backup current version
              if [ -d "$APP_DIR" ]; then
                cp -r $APP_DIR ${APP_DIR}-backup-$(Build.BuildId)
              fi
              # Extract new version
              mkdir -p $APP_DIR
              unzip -o $(Pipeline.Workspace)/$(artifactName)/*.zip -d $APP_DIR
              # Install production dependencies
              cd $APP_DIR
              npm ci --production
              # Restart application
              pm2 stop my-node-app 2>/dev/null || true
              pm2 start app.js --name my-node-app --env production
              pm2 save
            displayName: 'Deploy and restart application'
        routeTraffic:
          steps:
          - script: |
              echo "Enabling traffic to $(Environment.ResourceName)"
              # Re-register with load balancer
              az network lb address-pool address add \
                --resource-group my-rg \
                --lb-name staging-lb \
                --pool-name staging-pool \
                --name $(Environment.ResourceName) \
                --ip-address $(Environment.ResourceName)
            displayName: 'Add to load balancer'
        postRouteTraffic:
          steps:
          - script: |
              echo "Health check on $(Environment.ResourceName)..."
              RETRIES=5
              DELAY=10
              for i in $(seq 1 $RETRIES); do
                STATUS=$(curl -s -o /dev/null -w "%{http_code}" http://$(Environment.ResourceName):8080/health)
                if [ "$STATUS" = "200" ]; then
                  echo "Health check $i/$RETRIES passed"
                else
                  echo "Health check $i/$RETRIES failed (HTTP $STATUS)"
                  if [ "$i" = "$RETRIES" ]; then
                    echo "All health checks failed. Marking deployment as failed."
                    exit 1
                  fi
                fi
                sleep $DELAY
              done
            displayName: 'Validate health'
        on:
          failure:
            steps:
            - script: |
                echo "Rolling back $(Environment.ResourceName)..."
                APP_DIR=/opt/my-node-app
                BACKUP_DIR=${APP_DIR}-backup-$(Build.BuildId)
                if [ -d "$BACKUP_DIR" ]; then
                  pm2 stop my-node-app || true
                  rm -rf $APP_DIR
                  mv $BACKUP_DIR $APP_DIR
                  cd $APP_DIR
                  pm2 start app.js --name my-node-app --env production
                  echo "Rollback complete on $(Environment.ResourceName)"
                else
                  echo "##vso[task.logissue type=error]No backup found for rollback on $(Environment.ResourceName)"
                fi
              displayName: 'Rollback on failure'
          success:
            steps:
            - script: |
                echo "Cleaning up backup for $(Environment.ResourceName)"
                rm -rf /opt/my-node-app-backup-$(Build.BuildId)
              displayName: 'Cleanup backup'

# ==========================================
# PRODUCTION DEPLOYMENT - Canary
# ==========================================
- stage: DeployProduction
  displayName: 'Deploy to Production (Canary)'
  dependsOn: DeployStaging
  condition: succeeded()
  variables:
  - group: 'app-config-production'
  jobs:
  - deployment: DeployProductionJob
    displayName: 'Canary Deploy to Production'
    environment:
      name: 'production'
      resourceType: VirtualMachine
      tags: 'web-tier'
    strategy:
      canary:
        increments: [10, 25, 50, 100]
        preDeploy:
          steps:
          - script: |
              echo "========================================"
              echo "PRODUCTION CANARY DEPLOYMENT"
              echo "Build: $(Build.BuildId)"
              echo "Increment: $(Strategy.CycleSize)%"
              echo "========================================"
            displayName: 'Canary pre-deploy'
        deploy:
          steps:
          - download: current
            artifact: '$(artifactName)'
          - script: |
              echo "Deploying to $(Strategy.CycleSize)% of production targets"
              echo "Target: $(Environment.ResourceName)"
              APP_DIR=/opt/my-node-app
              # Create timestamped backup
              TIMESTAMP=$(date +%Y%m%d%H%M%S)
              cp -r $APP_DIR ${APP_DIR}-backup-${TIMESTAMP}
              # Deploy new version
              unzip -o $(Pipeline.Workspace)/$(artifactName)/*.zip -d $APP_DIR
              cd $APP_DIR
              npm ci --production
              pm2 stop my-node-app || true
              pm2 start app.js --name my-node-app --env production
              pm2 save
              echo "BACKUP_DIR=${APP_DIR}-backup-${TIMESTAMP}" > /tmp/deploy-meta.env
            displayName: 'Deploy canary increment'
        routeTraffic:
          steps:
          - script: |
              echo "Routing $(Strategy.CycleSize)% traffic to canary on $(Environment.ResourceName)"
            displayName: 'Route traffic to canary'
        postRouteTraffic:
          steps:
          - script: |
              echo "Monitoring canary at $(Strategy.CycleSize)% for 3 minutes..."
              CHECKS=6
              INTERVAL=30
              FAILURES=0
              MAX_FAILURES=2
              for i in $(seq 1 $CHECKS); do
                # Check HTTP status
                STATUS=$(curl -s -o /dev/null -w "%{http_code}" http://$(Environment.ResourceName):8080/health)
                # Check response time
                RESPONSE_TIME=$(curl -s -o /dev/null -w "%{time_total}" http://$(Environment.ResourceName):8080/health)
                echo "Check $i/$CHECKS: HTTP $STATUS, Response time: ${RESPONSE_TIME}s"
                if [ "$STATUS" != "200" ]; then
                  FAILURES=$((FAILURES + 1))
                  echo "##vso[task.logissue type=warning]Health check failed ($FAILURES/$MAX_FAILURES allowed)"
                fi
                # Check if response time exceeds threshold (2 seconds)
                SLOW=$(echo "$RESPONSE_TIME > 2.0" | bc -l 2>/dev/null || echo "0")
                if [ "$SLOW" = "1" ]; then
                  FAILURES=$((FAILURES + 1))
                  echo "##vso[task.logissue type=warning]Response time too slow: ${RESPONSE_TIME}s"
                fi
                if [ "$FAILURES" -ge "$MAX_FAILURES" ]; then
                  echo "##vso[task.logissue type=error]Too many failures. Canary is unhealthy."
                  exit 1
                fi
                sleep $INTERVAL
              done
              echo "Canary healthy at $(Strategy.CycleSize)%"
            displayName: 'Validate canary health'
          - script: |
              echo "Checking error rate in application logs..."
              ERROR_COUNT=$(pm2 logs my-node-app --nostream --lines 100 2>&1 | grep -c "ERROR" || echo "0")
              echo "Errors in last 100 log lines: $ERROR_COUNT"
              if [ "$ERROR_COUNT" -gt "10" ]; then
                echo "##vso[task.logissue type=error]Error rate too high: $ERROR_COUNT errors"
                exit 1
              fi
            displayName: 'Check error rates'
        on:
          failure:
            steps:
            - script: |
                echo "CANARY FAILED at $(Strategy.CycleSize)%"
                echo "Initiating rollback on $(Environment.ResourceName)..."
                if [ -f /tmp/deploy-meta.env ]; then
                  source /tmp/deploy-meta.env
                  if [ -d "$BACKUP_DIR" ]; then
                    pm2 stop my-node-app || true
                    rm -rf /opt/my-node-app
                    mv $BACKUP_DIR /opt/my-node-app
                    cd /opt/my-node-app
                    pm2 start app.js --name my-node-app --env production
                    echo "Rollback complete."
                  fi
                fi
              displayName: 'Rollback canary'
            - script: |
                echo "Sending failure notification..."
                curl -X POST "$(SLACK_WEBHOOK_URL)" \
                  -H "Content-Type: application/json" \
                  -d "{\"text\":\"PRODUCTION CANARY FAILED at $(Strategy.CycleSize)% - Build $(Build.BuildId) rolled back automatically\"}"
              displayName: 'Notify team of failure'
          success:
            steps:
            - script: |
                echo "Production canary deployment complete!"
                echo "All increments deployed successfully."
                curl -X POST "$(SLACK_WEBHOOK_URL)" \
                  -H "Content-Type: application/json" \
                  -d "{\"text\":\"Production deployment $(Build.BuildId) completed successfully via canary rollout\"}"
              displayName: 'Notify success'
```
The health check script for the Node.js application referenced in the pipeline:
```javascript
// health.js - Health check endpoint
var express = require("express");
var os = require("os");

var router = express.Router();
var startTime = Date.now();

router.get("/health", function(req, res) {
  var uptime = Date.now() - startTime;
  var memUsage = process.memoryUsage();
  var health = {
    status: "healthy",
    uptime: Math.floor(uptime / 1000),
    timestamp: new Date().toISOString(),
    hostname: os.hostname(),
    memory: {
      rss: Math.floor(memUsage.rss / 1024 / 1024) + "MB",
      heapUsed: Math.floor(memUsage.heapUsed / 1024 / 1024) + "MB",
      heapTotal: Math.floor(memUsage.heapTotal / 1024 / 1024) + "MB"
    },
    version: process.env.npm_package_version || "unknown",
    nodeVersion: process.version
  };

  // Check critical dependencies
  var checks = [];

  // Database connectivity check
  checks.push(checkDatabase());

  Promise.all(checks).then(function(results) {
    var allHealthy = results.every(function(r) { return r.healthy; });
    health.checks = results;
    if (allHealthy) {
      res.status(200).json(health);
    } else {
      health.status = "degraded";
      res.status(503).json(health);
    }
  }).catch(function(err) {
    health.status = "unhealthy";
    health.error = err.message;
    res.status(503).json(health);
  });
});

function checkDatabase() {
  var mongoose = require("mongoose");
  return new Promise(function(resolve) {
    var state = mongoose.connection.readyState;
    resolve({
      name: "database",
      healthy: state === 1,
      state: state === 1 ? "connected" : "disconnected"
    });
  });
}

module.exports = router;
```
And a simple smoke test script referenced in the postRouteTraffic hook:
```javascript
// tests/smoke-test.js
var http = require("http");

var BASE_URL = process.env.APP_URL || "http://localhost:8080";

var tests = [
  { path: "/health", expectedStatus: 200 },
  { path: "/", expectedStatus: 200 },
  { path: "/api/status", expectedStatus: 200 },
  { path: "/nonexistent", expectedStatus: 404 }
];

var passed = 0;
var failed = 0;

function runTest(test, callback) {
  var url = BASE_URL + test.path;
  http.get(url, function(res) {
    if (res.statusCode === test.expectedStatus) {
      console.log("PASS: " + test.path + " returned " + res.statusCode);
      passed++;
    } else {
      console.log("FAIL: " + test.path + " expected " + test.expectedStatus + " got " + res.statusCode);
      failed++;
    }
    callback();
  }).on("error", function(err) {
    console.log("FAIL: " + test.path + " error: " + err.message);
    failed++;
    callback();
  });
}

function runAll(index) {
  if (index >= tests.length) {
    console.log("\nResults: " + passed + " passed, " + failed + " failed");
    process.exit(failed > 0 ? 1 : 0);
    return;
  }
  runTest(tests[index], function() {
    runAll(index + 1);
  });
}

runAll(0);
```
Common Issues and Troubleshooting
1. Environment Not Found After Rename
Error: Environment [old-name] does not exist or has not been authorized for use.
This happens when you rename an environment in the UI but forget to update the YAML pipeline. Environment references in YAML use the name, not an internal ID. After renaming, update every pipeline that references the old name. There is no automatic redirect.
Fix: Search your repository for the old environment name and update all references:
```yaml
# Before
environment: 'old-name'

# After
environment: 'new-name'
```
2. Deployment Job Stuck Waiting for Approval
Error: The pipeline shows "Waiting for approval" indefinitely with no notification sent.
This usually means the approval check is configured but the specified approvers do not have notification subscriptions enabled, or the approvers were set to a team that has since been deleted.
Fix: Navigate to the environment, check the approvals configuration, verify the approvers still exist, and check their notification settings under User Settings > Notifications. Also verify you have not set the approval timeout to an unreasonably long value.
3. Rolling Deployment Targets Zero VMs
Error: No resources found in environment 'staging' matching the specified tags.
This happens when you specify resourceType: VirtualMachine with tags, but no VMs in the environment have matching tags, or the VM agents are offline.
Fix: Go to your environment in the UI and check the Resources tab. Verify that VMs are registered, online, and have the correct tags assigned. A common mistake is registering VMs with the agent but forgetting to add tags:
```yaml
# This requires VMs tagged 'web-tier' to exist in the environment
environment:
  name: 'staging'
  resourceType: VirtualMachine
  tags: 'web-tier'
```
4. Canary Increments Not Working as Expected
Error: The strategy 'canary' is not supported for the pool type 'vmImage'.
Canary and rolling strategies require VM or Kubernetes resource targets. They do not work with Microsoft-hosted agents (vmImage: 'ubuntu-latest'). The strategy needs actual infrastructure targets to distribute the deployment across.
Fix: Add VM resources to your environment and specify resourceType: VirtualMachine in your deployment job, or use a Kubernetes resource. If you want to simulate canary behavior with Azure App Service, use the App Service deployment slots feature instead of the pipeline-level canary strategy.
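If you go the deployment slots route, a minimal sketch of the slot-based pattern (app, resource group, and slot names are assumptions): deploy to a secondary slot, validate it, then swap it into production.

```yaml
steps:
- task: AzureWebApp@1
  inputs:
    azureSubscription: 'my-azure-connection'   # assumed service connection
    appName: 'my-node-app'                     # assumed App Service name
    deployToSlotOrASE: true
    resourceGroupName: 'my-rg'
    slotName: 'canary'                         # assumed slot name
    package: '$(Pipeline.Workspace)/drop/*.zip'
  displayName: 'Deploy to canary slot'
- task: AzureAppServiceManage@0
  inputs:
    azureSubscription: 'my-azure-connection'
    action: 'Swap Slots'
    webAppName: 'my-node-app'
    resourceGroupName: 'my-rg'
    sourceSlot: 'canary'                       # swaps canary into the production slot
  displayName: 'Swap canary into production'
```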
5. Exclusive Lock Causing Pipeline Queue Backup
Error: Multiple pipeline runs pile up in a "Waiting" state with the message Waiting for exclusive lock on environment 'production'.
When you have exclusive lock configured and frequent commits, runs stack up. If you are using the "Sequential" lock behavior, every single run will execute in order, which can create long queues.
Fix: Switch the exclusive lock to "Latest only" behavior. This cancels intermediate queued runs and only executes the most recent one, which is almost always what you want for continuous delivery pipelines.
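In YAML, this is controlled by the `lockBehavior` keyword, which can be set at the pipeline or stage level. A minimal sketch:

```yaml
# Controls how runs contend for exclusive lock checks:
# 'runLatest' (the "Latest only" behavior) cancels intermediate queued runs,
# 'sequential' executes every queued run in order.
lockBehavior: runLatest
```

With `runLatest`, if three runs queue up behind a long production deployment, only the newest of the three actually deploys.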
6. Lifecycle Hook Steps Failing Silently
Symptom: The deployment reports success even though postRouteTraffic checks should have caught issues.
This can happen if your health check script has a bug that causes it to exit with code 0 even on failure. A common mistake is using curl without --fail -- by default, curl returns exit code 0 even when the server responds with HTTP 500.
Fix: Always use curl --fail or explicitly check the HTTP status code:
# Wrong - exits 0 even on HTTP 500
curl http://my-app/health
# Correct - exits non-zero on HTTP 4xx/5xx
curl --fail http://my-app/health
# Most reliable - explicit status check
STATUS=$(curl -s -o /dev/null -w "%{http_code}" http://my-app/health)
if [ "$STATUS" != "200" ]; then
  echo "Health check failed with HTTP $STATUS"
  exit 1
fi
Best Practices
Create environments explicitly through the UI before first use. Do not rely on auto-creation from YAML. Auto-created environments have no approvals, no checks, and no security controls. Set up approvals and branch control before the first deployment ever targets the environment.
Use exclusive locks with "latest only" on production. There is almost never a reason to deploy an older commit when a newer one is already queued. The "latest only" behavior ensures you always deploy the most current version and avoids unnecessary queue buildup.
Put health checks in postRouteTraffic, not in deploy. The deploy hook should handle the actual deployment. Validation belongs in postRouteTraffic so that the pipeline's failure handling can distinguish between deployment failures and health validation failures. This separation makes rollback logic cleaner.
Back up before deploying, not after. Always create a backup or snapshot in the preDeploy hook, before any changes are made. If the deployment itself fails partway through, you need a clean state to roll back to. Backups created after deployment has started may capture a corrupted state.
Use variable groups linked to Key Vault for secrets. Never hardcode connection strings, API keys, or credentials in your YAML pipeline. Create a variable group that pulls from Azure Key Vault, and reference it at the stage level. This keeps secrets out of source control and gives you centralized secret rotation.
Start canary increments small. Begin with 5-10% of traffic, not 25%. The whole point of canary is to minimize blast radius. If your first increment is 25%, you have already exposed a quarter of your users to a potentially bad deployment. Use increments like [5, 15, 50, 100] for critical production services.
Set meaningful approval timeouts. A 30-day approval timeout is not a safety measure -- it is a forgotten deployment waiting to surprise someone. Use 24-72 hours for production approvals. If the deployment is not approved within that window, it should require a new pipeline run with fresh artifacts.
Tag your VM resources meaningfully. Use tags like web-tier, api-tier, and worker-tier to target specific subsets of your infrastructure. This lets you deploy web servers and API servers on different schedules using the same environment with different tag filters.
Monitor canary increments for at least 2-3 minutes. A quick health check that runs once is not sufficient. Production issues often manifest under load over time. Run multiple health checks with intervals between them to catch issues that only appear after the application has been handling traffic for a while.
Keep on.failure hooks simple and reliable. Your rollback logic should not depend on external services that might also be experiencing issues. A rollback that calls a third-party API to fetch the previous deployment version is fragile. Use local backups or well-known artifact locations that are guaranteed to be available.
