Advanced Atlantis: Plan Policies, Custom Workflows, and Integration Testing

The basics of Atlantis—comment atlantis plan on a PR, review the plan, apply—are well documented. What's less covered is how to enforce policy at plan time, build reusable workflow components, and test your Atlantis configuration before it breaks a production PR. This post goes deep on those topics.

Policy-as-Code with conftest

Atlantis supports policy checking as a first-class feature. After terraform plan, Atlantis runs conftest against the plan JSON and blocks apply if policies fail.

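Enable it by starting the server with --enable-policy-checks and declaring the policy sets. Atlantis reads the policies block from the server-side repo config, which in this post is the atlantis.yaml file handed to the server at startup: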

# atlantis.yaml
version: 3
policies:
  conftest_version: v0.45.0
  policy_sets:
    - name: security-policies
      path: policies/security
      source: local
    - name: tagging-policies
      path: policies/tagging
      source: local

The plan JSON that conftest receives looks like:

{
  "format_version": "1.1",
  "resource_changes": [
    {
      "address": "aws_s3_bucket.data",
      "type": "aws_s3_bucket",
      "change": {
        "actions": ["create"],
        "after": {
          "bucket": "my-data-bucket",
          "force_destroy": false
        }
      }
    }
  ]
}

Write Rego policies against this structure:

# policies/security/s3.rego
package main

import future.keywords.every

deny[msg] {
  resource := input.resource_changes[_]
  resource.type == "aws_s3_bucket"
  resource.change.actions[_] == "create"
  resource.change.after.force_destroy == true
  msg := sprintf(
    "S3 bucket '%s' has force_destroy=true. This is not allowed in production.",
    [resource.address]
  )
}

deny[msg] {
  resource := input.resource_changes[_]
  resource.type == "aws_s3_bucket"
  resource.change.actions[_] == "create"
  not resource.change.after.tags.team
  msg := sprintf(
    "S3 bucket '%s' is missing required 'team' tag.",
    [resource.address]
  )
}

# policies/security/iam.rego
package main

deny[msg] {
  resource := input.resource_changes[_]
  resource.type == "aws_iam_policy"
  resource.change.actions[_] == "create"
  doc := json.unmarshal(resource.change.after.policy)
  statement := doc.Statement[_]
  statement.Effect == "Allow"
  statement.Action == "*"
  statement.Resource == "*"
  msg := sprintf(
    "IAM policy '%s' grants full admin access (Action:* Resource:*). Use least-privilege instead.",
    [resource.address]
  )
}

Test the policies without Atlantis involved:

# Generate a plan JSON locally
terraform init
terraform plan -out=tfplan
terraform show -json tfplan > plan.json

# Run conftest directly
conftest test plan.json \
  --policy policies/security \
  --policy policies/tagging \
  --namespace main

Write unit tests for your Rego policies:

# policies/security/s3_test.rego
package main

test_deny_force_destroy {
  deny[_] with input as {
    "resource_changes": [{
      "address": "aws_s3_bucket.test",
      "type": "aws_s3_bucket",
      "change": {
        "actions": ["create"],
        "after": {
          "bucket": "test",
          "force_destroy": true,
          "tags": {"team": "platform"}
        }
      }
    }]
  }
}

test_allow_no_force_destroy {
  count(deny) == 0 with input as {
    "resource_changes": [{
      "address": "aws_s3_bucket.test",
      "type": "aws_s3_bucket",
      "change": {
        "actions": ["create"],
        "after": {
          "bucket": "test",
          "force_destroy": false,
          "tags": {"team": "platform"}
        }
      }
    }]
  }
}

Run the tests with conftest verify:

conftest verify --policy policies/security
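
Only the security directory has tests in the examples above, so it is easy for a second policy set to ship without any. The following is a minimal sketch of a guard for that, assuming each policy set lives in its own subdirectory; the function name policy_dirs_missing_tests is hypothetical and not part of conftest.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a conftest feature): print every policy directory
# directly under $1 that contains .rego files but no *_test.rego file.
policy_dirs_missing_tests() {
  local root="$1" dir
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue
    dir="${dir%/}"
    # Skip directories that hold no Rego at all
    ls "$dir"/*.rego >/dev/null 2>&1 || continue
    # Report directories that have policies but no unit tests
    ls "$dir"/*_test.rego >/dev/null 2>&1 || echo "$dir"
  done
}
```

Calling policy_dirs_missing_tests policies and failing the build on any output would flag policies/tagging until it gets its own test file.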

Custom Pre/Post-Plan Hooks

Hooks run shell commands at specific points in the Atlantis workflow. They're the escape hatch for anything the built-in workflow doesn't handle.

Pre-plan: Enforce Module Versions

# atlantis.yaml
workflows:
  production:
    plan:
      steps:
        - env:
            name: ATLANTIS_TERRAFORM_VERSION
            command: 'cat .terraform-version 2>/dev/null || echo "1.8.0"'
        - run: |
            # Ensure no module sources use latest without a version pin
            if grep -r 'source.*github.com' . | grep -v '?ref=' | grep -v '#'; then
              echo "ERROR: Found unpinned GitHub module source. All modules must specify ?ref=<tag>"
              exit 1
            fi
        - init
        - plan

  staging:
    plan:
      steps:
        - init
        - plan
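
The inline grep in the production workflow is hard to test and also scans the .terraform cache. One option is to move it into a script that the run step calls. This is a sketch under assumptions: the function names are hypothetical, only *.tf files are scanned, and a line counts as a module source only if it starts with source =.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list github.com module sources that lack a ?ref= pin.
# Anchoring on "source =" at the start of the line skips commented-out lines.
find_unpinned_sources() {
  local root="${1:-.}"
  grep -rnE --include='*.tf' --exclude-dir='.terraform' \
    '^[[:space:]]*source[[:space:]]*=.*github\.com' "$root" \
    | grep -v '?ref=' || true
}

# Returns non-zero (failing the Atlantis step) if any unpinned source is found.
check_module_pins() {
  local hits
  hits="$(find_unpinned_sources "${1:-.}")"
  if [ -n "$hits" ]; then
    echo "ERROR: unpinned GitHub module sources (add ?ref=<tag>):" >&2
    echo "$hits" >&2
    return 1
  fi
}
```

The run step would then shrink to a single call such as check_module_pins . after sourcing the script, and the same function can be exercised against fixture directories in CI.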

Post-plan: Cost Estimation

Run Infracost after planning and post the estimate as an Atlantis comment:

workflows:
  default:
    plan:
      steps:
        - init
        - plan:
            extra_args: ["-out", "$PLANFILE"]
        - run: |
            infracost breakdown \
              --path $PLANFILE \
              --format json \
              --out-file /tmp/infracost.json
            
            COST=$(jq -r '.totalMonthlyCost' /tmp/infracost.json)
            DIFF=$(jq -r '.diffTotalMonthlyCost' /tmp/infracost.json)
            
            # Atlantis includes a run step's stdout in the PR comment
            echo "## Cost Estimate"
            echo "Total monthly: \$${COST}"
            echo "Delta: \$${DIFF}"
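
Reporting the estimate is often paired with a hard limit. The sketch below shows one way to compare the delta against a threshold; the function name cost_delta_exceeds and the 500 figure in the usage comment are illustrative assumptions, not Infracost or Atlantis features.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit 0 when the monthly cost delta is above the limit,
# exit 1 otherwise. awk handles the decimal comparison that [ ] cannot.
cost_delta_exceeds() {
  local diff="$1" limit="$2"
  awk -v d="$diff" -v l="$limit" 'BEGIN { exit !(d + 0 > l + 0) }'
}

# Possible use inside the run step, after DIFF has been read with jq:
#   if cost_delta_exceeds "$DIFF" 500; then
#     echo "Monthly cost increase of \$${DIFF} is above the 500 limit"
#     exit 1
#   fi
```

A non-numeric value such as null is treated as 0 by awk, so a missing delta does not trip the gate.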

Post-apply: Slack Notification

workflows:
  production:
    apply:
      steps:
        - apply
        - run: |
            curl -s -X POST "$SLACK_WEBHOOK_URL" \
              -H 'Content-Type: application/json' \
              -d "{
                \"text\": \"Terraform apply completed in *$BASE_REPO_NAME/$REPO_REL_DIR* by $USER_NAME\",
                \"attachments\": [{
                  \"color\": \"good\",
                  \"text\": \"PR #$PULL_NUM\"
                }]
              }"
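
Building JSON by string interpolation breaks as soon as a value contains a double quote or a backslash. A small sketch of a safer payload builder follows; json_escape and slack_payload are hypothetical names, and the escaping covers only quotes and backslashes, not newlines or control characters.

```shell
#!/usr/bin/env bash
# Hypothetical helper: escape backslashes and double quotes for a JSON string.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Hypothetical helper: build the Slack message body.
# Arguments: repo name, directory, user, pull request number.
slack_payload() {
  local repo dir user pull
  repo="$(json_escape "$1")"
  dir="$(json_escape "$2")"
  user="$(json_escape "$3")"
  pull="$(json_escape "$4")"
  printf '{"text":"Terraform apply completed in *%s/%s* by %s","attachments":[{"color":"good","text":"PR #%s"}]}' \
    "$repo" "$dir" "$user" "$pull"
}
```

In the post-apply step, the output of slack_payload would replace the inline -d string, with the four arguments taken from the environment variables Atlantis exposes to run steps.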

Testing Atlantis Workflows Locally

The atlantis testdrive command spins up a local Atlantis instance with your config, but it's limited. For full workflow testing, use Docker:

# docker-compose.yml for local Atlantis testing
version: '3.8'
services:
  atlantis:
    image: ghcr.io/runatlantis/atlantis:v0.28.0
    ports:
      - "4141:4141"
    environment:
      ATLANTIS_GH_USER: your-bot-user
      ATLANTIS_GH_TOKEN: ${GH_TOKEN}
      ATLANTIS_GH_WEBHOOK_SECRET: test-secret
      ATLANTIS_REPO_ALLOWLIST: github.com/your-org/*
      ATLANTIS_ATLANTIS_URL: http://localhost:4141
    volumes:
      - ./atlantis.yaml:/atlantis.yaml
      - ./policies:/policies
    command: server --repo-config /atlantis.yaml

For workflow validation without a GitHub webhook, use the atlantis CLI to validate config:

atlantis validate --atlantis-yaml atlantis.yaml

Write an integration test script that exercises the full plan+policy cycle:

#!/bin/bash
# scripts/test-atlantis-workflow.sh

set -euo pipefail

TERRAFORM_DIR=${1:?Usage: $0 <terraform-dir>}

echo "=== Testing Atlantis workflow for: $TERRAFORM_DIR ==="

# Step 1: Init and plan
cd "$TERRAFORM_DIR"
terraform init -backend=false
terraform plan -out=tfplan.binary \
  -var-file=test.tfvars 2>&1

# Step 2: Generate plan JSON
terraform show -json tfplan.binary > plan.json

# Step 3: Run conftest
echo "--- Running policy checks ---"
conftest test plan.json \
  --policy ../../policies/security \
  --policy ../../policies/tagging \
  --namespace main \
  --output tap

# Step 4: Run tflint
echo "--- Running tflint ---"
tflint --format compact

echo "=== Workflow test passed for: $TERRAFORM_DIR ==="
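
The script above checks one directory and stops at the first error. When several Terraform directories need checking, it can help to run them all and report every failure in one pass. This is a rough sketch; run_for_each_dir is a hypothetical wrapper, not part of Atlantis.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a command once per directory, print PASS or FAIL
# for each, and return non-zero if any run failed.
run_for_each_dir() {
  local cmd="$1" dir failed=0
  shift
  for dir in "$@"; do
    if "$cmd" "$dir"; then
      echo "PASS $dir"
    else
      echo "FAIL $dir"
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]
}

# Possible use with the script above:
#   run_for_each_dir ./scripts/test-atlantis-workflow.sh terraform/iam terraform/networking
```

Because each directory runs in its own process, the cd inside the workflow script does not leak into the next iteration.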

Integrate into CI to test the testing infrastructure:

# .github/workflows/test-atlantis-config.yml
name: Test Atlantis Config

on:
  pull_request:
    paths:
      - 'atlantis.yaml'
      - 'policies/**'

jobs:
  validate-policies:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Validate atlantis.yaml
        run: |
          docker run --rm \
            -v $(pwd):/workspace \
            ghcr.io/runatlantis/atlantis:latest \
            validate --atlantis-yaml /workspace/atlantis.yaml

      - name: Test conftest policies
        run: |
          conftest verify --policy policies/security
          conftest verify --policy policies/tagging

Team Patterns for Large Orgs

Per-Team Workflows

Large organizations often need different workflows per team—some require approval from a security group, others are autonomous.

# atlantis.yaml
repos:
  - id: github.com/example/platform
    allowed_overrides: [workflow, apply_requirements]
    allowed_workflows: [production, staging, sandbox]
    apply_requirements: [approved, mergeable]

  - id: github.com/example/sandbox
    allowed_overrides: [workflow, apply_requirements]
    allowed_workflows: [sandbox]
    apply_requirements: []

workflows:
  production:
    plan:
      steps:
        - run: ./scripts/pre-plan-checks.sh
        - init:
            extra_args: ["-backend-config=prod.hcl"]
        - plan
    apply:
      steps:
        - run: ./scripts/pre-apply-checks.sh
        - apply

  sandbox:
    plan:
      steps:
        - init
        - plan
    apply:
      steps:
        - apply

Directory-Level Apply Requirements

Require security team approval only for Terraform that touches IAM:

repos:
  - id: github.com/example/infra
    projects:
      - name: iam-prod
        dir: terraform/iam
        workspace: production
        apply_requirements: [approved, mergeable]
        required_approvals: 2
        
      - name: networking-prod
        dir: terraform/networking
        workspace: production
        apply_requirements: [approved, mergeable]

Preventing Concurrent Applies

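Atlantis locks each directory and workspace while a pull request has a pending plan, but that lock does not extend across repositories. For cross-repo dependencies, add an external lock, for example a DynamoDB table with conditional writes: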

# pre-plan hook: acquire lock
run: |
  aws dynamodb put-item \
    --table-name atlantis-locks \
    --item "{\"LockId\": {\"S\": \"$BASE_REPO_NAME-$REPO_REL_DIR\"}, \"Owner\": {\"S\": \"$PULL_NUM\"}, \"TTL\": {\"N\": \"$(date -d '+1 hour' +%s)\"}}" \
    --condition-expression "attribute_not_exists(LockId)"

# post-apply hook: release lock
run: |
  aws dynamodb delete-item \
    --table-name atlantis-locks \
    --key "{\"LockId\": {\"S\": \"$BASE_REPO_NAME-$REPO_REL_DIR\"}}"
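
The lock item is easier to read and test when it is built by a function, and computing the expiry with shell arithmetic avoids date -d, which is specific to GNU date. The sketch below assumes the same LockId, Owner, and TTL attributes as the hook above; lock_item_json is a hypothetical name, and the optional fourth argument exists only so tests can fix the clock.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the DynamoDB item for a lock.
# Arguments: lock id, owner (PR number), lifetime in seconds,
# and an optional "now" epoch used for testing.
lock_item_json() {
  local id="$1" owner="$2" ttl="$3" now="${4:-$(date +%s)}"
  printf '{"LockId":{"S":"%s"},"Owner":{"S":"%s"},"TTL":{"N":"%s"}}' \
    "$id" "$owner" "$((now + ttl))"
}

# Possible use in the pre-plan hook:
#   aws dynamodb put-item \
#     --table-name atlantis-locks \
#     --item "$(lock_item_json "$LOCK_ID" "$PULL_NUM" 3600)" \
#     --condition-expression "attribute_not_exists(LockId)"
```

If another pull request already holds the lock, the conditional write is rejected and the aws command exits non-zero, which fails the step before any plan runs.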

The Atlantis policy system—conftest integration, pre/post hooks, and per-repo workflow configuration—gives you control that goes well beyond the basic PR automation. The key is testing the testing infrastructure itself: Rego unit tests for policies, local workflow scripts in CI, and atlantis validate to catch config errors before they break a production apply.
