Semgrep for Code Security Scanning: Rules, CI Integration, and Custom Patterns

Semgrep is a fast, language-agnostic static analysis tool that finds security vulnerabilities by pattern matching against source code. It ships with thousands of community rules for OWASP Top 10 vulnerabilities, secret detection, and language-specific security patterns. You can also write custom rules in YAML to catch patterns specific to your codebase. Semgrep integrates with GitHub Actions, GitLab CI, and pre-commit hooks.

Key Takeaways

Semgrep rules are readable YAML. A Semgrep rule specifies a pattern (what code to match), a message (what to tell the developer), a severity, and languages. No cryptic regex or query language.

The Semgrep registry has 5000+ community rules. Use p/owasp-top-ten, p/secrets, p/django, p/react, and other community packs before writing custom rules.

Metavariables match any expression. A name such as $X stands in for whatever code appears at that position. $X.execute($QUERY) matches a call to any method named execute, on any object, with any single argument, which is useful for finding SQL injection sinks.

Pattern combinators find complex vulnerabilities. Use pattern-not to exclude safe patterns, pattern-inside to match within a specific context, and patterns to require multiple conditions simultaneously.

Autofix generates code fixes. Add a fix field to your rule and Semgrep can automatically apply it with semgrep --autofix—useful for safe API upgrades and configuration fixes.

Why Semgrep

Most SAST tools are either too slow, too noisy, or require extensive configuration. Semgrep is different:

  • Fast: scans large codebases in seconds, not minutes
  • Accurate: high-quality community rules with low false positive rates
  • Customizable: write rules in readable YAML using the same syntax as the pattern you're matching
  • Language-aware: understands code structure (AST), not just text—catches patterns regardless of whitespace, variable names, or comment placement

Installation

# macOS
brew install semgrep

# Python (all platforms)
pip install semgrep

# Docker
docker pull semgrep/semgrep

Running Semgrep

Quick Scan with Community Rules

# OWASP Top 10
semgrep --config=p/owasp-top-ten .

# Secrets and API keys
semgrep --config=p/secrets .

# Language-specific packs
semgrep --config=p/python-security .
semgrep --config=p/javascript .
semgrep --config=p/typescript .
semgrep --config=p/django .
semgrep --config=p/react .
semgrep --config=p/flask .

# Multiple configs at once
semgrep --config=p/owasp-top-ten --config=p/secrets .

Scan Output Formats

# Human-readable (default)
semgrep --config=p/owasp-top-ten .

# JSON for CI processing
semgrep --config=p/owasp-top-ten --json . > semgrep-results.json

# SARIF for GitHub Code Scanning
semgrep --config=p/owasp-top-ten --sarif . > semgrep.sarif

# Exit with code 1 if any finding is reported (any severity)
semgrep --config=p/owasp-top-ten --error .

Scoping the Scan

# Scan specific directory
semgrep --config=p/owasp-top-ten src/

# Exclude directories
semgrep --config=p/secrets --exclude=node_modules --exclude=.venv .

# Only scan specific file types
semgrep --config=p/python-security --include="*.py" .

Writing Custom Rules

Custom rules are YAML files that describe patterns to find. Put them in a rules/ directory.

Basic Rule Structure

# rules/no-hardcoded-credentials.yaml
rules:
  - id: no-hardcoded-password
    pattern: |
      password = "..."
    message: |
      Hardcoded password detected. Use environment variables or a secrets manager instead.
      Replace with: password = os.environ.get("DB_PASSWORD")
    severity: ERROR
    languages: [python]
    metadata:
      cwe: "CWE-798"
      owasp: "A07:2021"
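
A rule like this is easier to trust once it has a test file next to it. The sketch below assumes Semgrep's test annotations (`# ruleid:` for a line that should be flagged, `# ok:` for a line that should not) and a hypothetical file name, rules/no-hardcoded-credentials.py, chosen to sit beside the rule file. The sample password is made up.

```python
# rules/no-hardcoded-credentials.py  (hypothetical test file for the rule above)
import os

# ruleid: no-hardcoded-password
password = "hunter2"

# ok: no-hardcoded-password
password = os.environ.get("DB_PASSWORD")
```

Running `semgrep --test rules/` should then report whether the rule matched exactly the annotated lines. Note that the pattern only matches a variable literally named password, so names such as db_password would need their own pattern.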

Metavariables

$VARIABLE matches any expression:

rules:
  - id: sql-injection-string-concat
    # pattern-either: a match on either form is enough.
    # (Under `patterns`, both would have to match the same code, which never happens.)
    pattern-either:
      - pattern: |
          $CURSOR.execute($QUERY + $INPUT)
      - pattern: |
          $CURSOR.execute(f"... {$INPUT} ...")
    message: "SQL injection risk: user input concatenated into SQL query. Use parameterized queries."
    severity: ERROR
    languages: [python]

  - id: eval-with-user-input
    pattern: eval($USER_INPUT)
    message: "eval() with user-controlled input can execute arbitrary code."
    severity: ERROR
    languages: [javascript, typescript]
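
To make the sql-injection-string-concat rule concrete, here is a small sketch using Python's built-in sqlite3 module. The table, the sample rows, and the two helper functions are invented for illustration. The first helper has the shape the rule is meant to flag; the second is the parameterized form the message recommends.

```python
import sqlite3

# In-memory database with made-up sample data
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE users (name TEXT, role TEXT)")
cur.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "admin"), ("bob", "user")],
)


def find_user_unsafe(cursor, name):
    # Has the shape $CURSOR.execute($QUERY + $INPUT): input is spliced into the SQL text
    cursor.execute("SELECT name, role FROM users WHERE name = '" + name + "'")
    return cursor.fetchall()


def find_user_safe(cursor, name):
    # Parameterized query: the driver passes the value separately from the SQL text
    cursor.execute("SELECT name, role FROM users WHERE name = ?", (name,))
    return cursor.fetchall()
```

Passing the string `' OR '1'='1` to find_user_unsafe returns every row in the table, while find_user_safe treats the same string as an ordinary name and returns nothing.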

Pattern Combinators

patterns — ALL must match

rules:
  - id: requests-without-timeout
    patterns:
      - pattern: requests.$METHOD(...)
      - pattern-not: requests.$METHOD(..., timeout=...)
    message: "requests.$METHOD() called without timeout. Set timeout=30 to prevent hanging connections."
    severity: WARNING
    languages: [python]

pattern-either — ANY must match

rules:
  - id: insecure-hash-algorithm
    pattern-either:
      - pattern: hashlib.md5(...)
      - pattern: hashlib.sha1(...)
    message: "MD5 and SHA1 are cryptographically broken. Use hashlib.sha256() or hashlib.sha3_256() instead."
    severity: WARNING
    languages: [python]
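
As a quick illustration of what this rule separates, the sketch below puts a flagged call next to the replacement the message suggests. Both function names are hypothetical.

```python
import hashlib


def weak_digest(data: bytes) -> str:
    # Matches the hashlib.md5(...) branch of pattern-either, so it is reported
    return hashlib.md5(data).hexdigest()


def strong_digest(data: bytes) -> str:
    # Matches neither branch, so it is not reported
    return hashlib.sha256(data).hexdigest()
```

Because pattern-either needs only one branch to match, adding another weak algorithm later is a one-line change to the rule.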

pattern-inside — match within a context

rules:
  - id: debug-mode-in-production
    patterns:
      - pattern: app.run(debug=True)
      - pattern-inside: |
          if __name__ == "__main__":
            ...
    message: "Flask debug mode enabled. Never run with debug=True in production."
    severity: ERROR
    languages: [python]

pattern-not-inside — exclude certain contexts

rules:
  - id: print-in-non-test-code
    patterns:
      - pattern: print(...)
      - pattern-not-inside: |
          def test_$NAME(...):
            ...
    message: "print() found outside test code. Use logging instead."
    severity: INFO
    languages: [python]

Taint Analysis

Track user input from source to sink:

rules:
  - id: xss-flask-template
    mode: taint
    pattern-sources:
      - pattern: request.args.get(...)
      - pattern: request.form.get(...)
      - pattern: request.json
    pattern-sinks:
      - pattern: render_template_string(...)
      - pattern: Markup(...)
    message: "User input flows into HTML rendering without sanitization. Use Jinja2 auto-escaping."
    severity: ERROR
    languages: [python]
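
The source-to-sink flow is easier to picture with a dependency-free sketch. The two functions below are hypothetical stand-ins, not Flask code, so the rule above would not match them. They only show why an unsanitized path from input to HTML output matters, using html.escape from the standard library as the sanitizer.

```python
import html


def render_greeting_unsafe(name: str) -> str:
    # Stand-in for a sink: the caller's input lands in HTML unchanged
    return f"<h1>Hello, {name}</h1>"


def render_greeting_safe(name: str) -> str:
    # Escaping between source and sink neutralizes markup in the input
    return f"<h1>Hello, {html.escape(name)}</h1>"
```

In a real Flask app the source would be something like request.args.get("name") and the sink render_template_string(...); a taint rule reports the flow only when no sanitizer sits between them.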

Autofix

Add a fix field to automatically patch the issue:

rules:
  - id: use-secrets-compare-digest
    pattern: $A == $B
    fix: hmac.compare_digest($A, $B)
    message: |
      Use hmac.compare_digest() for timing-safe comparison of secrets.
      Regular == can leak information via timing attacks.
    severity: WARNING
    languages: [python]
    metadata:
      confidence: LOW  # Only apply fix when context is clear

Apply fixes:

semgrep --config=rules/use-secrets-compare-digest.yaml --autofix .
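
For reference, this is roughly what code looks like after the fix has been applied and tidied by hand. The function name is hypothetical. The fix field only rewrites the matched expression, so the hmac import has to be added separately, and because $A == $B matches every equality check, review each change before committing it.

```python
import hmac


def tokens_match(provided: str, expected: str) -> bool:
    # Encoding to bytes avoids the TypeError that compare_digest raises
    # for str arguments containing non-ASCII characters
    return hmac.compare_digest(provided.encode("utf-8"), expected.encode("utf-8"))
```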

JavaScript/TypeScript Rules

rules:
  - id: no-dangerously-set-inner-html
    pattern: dangerouslySetInnerHTML={{ __html: $X }}
    message: "dangerouslySetInnerHTML can lead to XSS. Ensure $X is sanitized before use."
    severity: WARNING
    # .jsx and .tsx files are scanned under the javascript and typescript language ids
    languages: [javascript, typescript]

  - id: no-document-write
    pattern: document.write($X)
    message: "document.write() with user-controlled input leads to XSS. Use DOM manipulation methods instead."
    severity: ERROR
    languages: [javascript, typescript]

Organizing Rules

Structure your custom rules by category:

rules/
  injection/
    sql-injection.yaml
    command-injection.yaml
    template-injection.yaml
  secrets/
    hardcoded-credentials.yaml
    api-keys.yaml
  crypto/
    weak-algorithms.yaml
    insecure-random.yaml
  auth/
    jwt-issues.yaml
    session-config.yaml

Run all custom rules:

semgrep --config=rules/ .
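
Once rules are spread across many files, it helps to keep rule ids unique so that nosemgrep comments and --exclude-rule refer to exactly one rule. The helper below is a hypothetical sketch, not part of Semgrep. It scans the directory with a simple regular expression rather than a YAML parser, so it assumes each rule starts with a `- id:` line as in the examples above.

```python
import re
from collections import defaultdict
from pathlib import Path

# Matches list items of the form "  - id: some-rule-id"
RULE_ID = re.compile(r"^\s*-\s+id:\s*([\w.-]+)\s*$")


def find_duplicate_rule_ids(rules_dir):
    """Return {rule_id: [files]} for ids that appear more than once."""
    seen = defaultdict(list)
    for path in sorted(Path(rules_dir).rglob("*.yaml")):
        for line in path.read_text(encoding="utf-8").splitlines():
            match = RULE_ID.match(line)
            if match:
                seen[match.group(1)].append(str(path))
    return {rule_id: files for rule_id, files in seen.items() if len(files) > 1}
```

Calling find_duplicate_rule_ids("rules") returns an empty dict when every id is unique.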

CI Integration

GitHub Actions

# .github/workflows/semgrep.yml
name: Semgrep Security Scan
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  semgrep:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write  # needed by the upload-sarif step
    steps:
      - uses: actions/checkout@v4
      
      - name: Semgrep Scan
        uses: semgrep/semgrep-action@v1
        with:
          config: >-
            p/owasp-top-ten
            p/secrets
            p/python-security
          generateSarif: "1"
      
      - name: Upload SARIF to GitHub Code Scanning
        uses: github/codeql-action/upload-sarif@v3
        if: always()
        with:
          sarif_file: semgrep.sarif

With SARIF upload, findings appear as annotations in GitHub pull requests.

Custom Rules in CI

      - name: Run custom security rules
        # No --error here: this step must succeed so the next step can inspect the report
        run: semgrep --config=rules/ --json . > custom-findings.json
      
      - name: Check for critical findings
        run: |
          CRITICAL=$(jq '[.results[] | select(.extra.severity == "ERROR")] | length' custom-findings.json)
          echo "Critical findings: $CRITICAL"
          if [ "$CRITICAL" -gt "0" ]; then
            exit 1
          fi
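
If jq is not available on the runner, the same gate can be sketched in Python. The function names are hypothetical. The code assumes the JSON report layout the jq filter above relies on: a top-level results list whose entries carry extra.severity.

```python
def count_by_severity(report: dict) -> dict:
    """Count findings per severity in a parsed Semgrep JSON report."""
    counts = {}
    for finding in report.get("results", []):
        severity = finding.get("extra", {}).get("severity", "UNKNOWN")
        counts[severity] = counts.get(severity, 0) + 1
    return counts


def exit_code_for(report: dict, blocking: str = "ERROR") -> int:
    """Return 1 if any finding has the blocking severity, else 0."""
    return 1 if count_by_severity(report).get(blocking, 0) > 0 else 0
```

In a CI step this would be wired up by loading custom-findings.json with json.load and passing the result of exit_code_for to sys.exit.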

Pre-commit Hook

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/semgrep/semgrep
    rev: v1.60.0
    hooks:
      - id: semgrep
        args: ['--config', 'p/secrets', '--error']

Install:

pip install pre-commit
pre-commit install

Managing False Positives

Mark a finding as a false positive with an inline comment:

# nosemgrep: semgrep-rule-id
password = "test_password"  # nosemgrep: no-hardcoded-password

Or exclude files and directories from every scan with .semgrepignore:

# .semgrepignore
tests/
vendor/
node_modules/
migrations/
*.min.js

Use --exclude-rule to disable specific rules for a scan:

semgrep --config=p/python-security --exclude-rule=python.lang.security.audit.non-literal-import .

Severity Levels and Exit Codes

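Severity   Exit code (default)   Exit code (with --error)   Meaning
INFO       0                     1                          Finding for awareness
WARNING    0                     1                          Review recommended
ERROR      0                     1                          Should block the CI pipeline

By default Semgrep exits with code 0 even when it reports findings. The --error flag makes it exit with code 1 on any finding, regardless of severity. To fail CI only on ERROR severity, filter with --severity as well:

semgrep --config=rules/ --severity=ERROR --error .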

Summary

Semgrep provides fast, accurate static analysis with a low barrier to entry:

  1. Start with community packs: p/owasp-top-ten and p/secrets cover the most critical vulnerabilities
  2. Add language-specific packs: p/django, p/react, p/flask for framework-specific patterns
  3. Write custom rules: use YAML pattern syntax to catch codebase-specific anti-patterns
  4. Integrate in CI: run on every PR, block on ERROR severity findings
  5. Use autofix: where safe, let Semgrep apply fixes automatically

The combination of broad community coverage and easy custom rule writing makes Semgrep a practical SAST choice for developer teams that want security scanning without a dedicated security team.
