Semgrep SAST: Writing Custom Rules and Integrating Static Analysis in CI

HelpMeTest

19 May 2026 — 7 min read

Semgrep is a fast, syntax-aware static analysis tool that lets you write custom security rules in YAML using code patterns rather than regular expressions. This guide covers rule syntax, writing custom rules for your codebase, using community rulesets, CI integration, and reducing false positives through triage and rule refinement.

Key Takeaways

Semgrep rules match code structure, not text. Unlike grep-based tools, Semgrep understands syntax: a pattern for subprocess.call($X) matches regardless of whitespace, variable names, or line breaks, and never matches inside comments or strings.

Custom rules encode your team's security policies. You can write rules that enforce organisation-specific patterns, for example flagging direct db.query() calls that bypass your validated query builder, or detecting use of deprecated internal APIs.

The metavariable-pattern construct constrains what a metavariable may match. Chaining patterns with metavariable constraints approximates a source-to-sink check when the untrusted value appears directly in the sink call. Flows through intermediate variables need Semgrep's separate taint mode.

semgrep --test validates rules against fixture files. Annotate test files with # ruleid: and # ok: comments, then run the test command to verify your rule matches what it should and nothing it should not.

Community rulesets cover most OWASP Top 10 scenarios. The p/owasp-top-ten, p/javascript, and p/python rulesets provide immediate value without writing a single custom rule, so start here before investing in custom rules.

Static Application Security Testing (SAST) tools have historically suffered from two problems: high false positive rates and difficulty extending them for your specific codebase. Semgrep addresses both. Its pattern language is close enough to real code that security engineers with no compiler background can write effective rules, and its dataflow analysis is powerful enough to catch non-trivial vulnerability patterns.

This guide covers the Semgrep rule format in depth, walks through writing custom rules for common vulnerability patterns, explains how to use community rulesets, and shows how to integrate Semgrep into GitHub Actions with a sustainable triage workflow.

Understanding Semgrep's Pattern Language

Semgrep patterns use the actual syntax of the target language, with special metavariables and ellipsis operators for matching variable parts.

Core Pattern Constructs

$X: Matches any single expression, statement, or identifier. Metavariable names are written in uppercase. The same metavariable used multiple times in a pattern must match the same value.

...: Matches zero or more of anything (arguments, statements, etc.). In function arguments: func(...) matches any call to func with any number of arguments.

$...ARGS: Ellipsis metavariable. Matches zero or more arguments and captures them as a list.

rules:
  # Match any call to exec with any arguments
  - id: exec-call
    pattern: exec(...)
    message: "Avoid exec() calls"
    languages: [python]
    severity: WARNING

  # Match subprocess.call with shell=True
  - id: subprocess-shell-true
    pattern: subprocess.call(..., shell=True, ...)
    message: "subprocess.call with shell=True is vulnerable to command injection"
    languages: [python]
    severity: ERROR
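To see why structural matching behaves differently from text search, here is a minimal sketch built on Python's standard ast module. It is only an analogy for how a pattern such as subprocess.call(..., shell=True, ...) is matched, not Semgrep's implementation, and the names find_shell_true_calls and SAMPLE are hypothetical.

```python
import ast


def find_shell_true_calls(source: str) -> list[int]:
    """Return line numbers of subprocess.call(...) calls that pass shell=True."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        is_subprocess_call = (
            isinstance(func, ast.Attribute)
            and func.attr == "call"
            and isinstance(func.value, ast.Name)
            and func.value.id == "subprocess"
        )
        has_shell_true = any(
            kw.arg == "shell"
            and isinstance(kw.value, ast.Constant)
            and kw.value.value is True
            for kw in node.keywords
        )
        if is_subprocess_call and has_shell_true:
            hits.append(node.lineno)
    return sorted(hits)


SAMPLE = '''
import subprocess
# subprocess.call(cmd, shell=True)  <- a comment, not code
subprocess.call(
    cmd,
    shell=True,
)
subprocess.call(["ls", "-l"])
note = "subprocess.call(x, shell=True)"
'''

print(find_shell_true_calls(SAMPLE))  # [4]
```

Only the multi-line call is reported. The comment and the string literal contain the same characters but are not call nodes in the syntax tree, which is the property a text search cannot give you.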

Pattern Combinations

Real rules often need to match something while excluding benign variants. Semgrep provides pattern-either, pattern-not, and pattern-inside for composition:

rules:
  - id: sql-injection-python
    patterns:
      - pattern: |
          cursor.execute($QUERY, ...)
      - pattern-not: |
          cursor.execute("...", ...)
      - pattern-not: |
          cursor.execute($QUERY, ($VALUE, ...))
    message: >
      Possible SQL injection: cursor.execute called with a non-literal
      query string and no parameterized values. Use parameterized queries.
    languages: [python]
    severity: ERROR
    metadata:
      cwe: "CWE-89"
      owasp: "A03:2021 - Injection"

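This rule matches cursor.execute() calls whose first argument is anything other than a string literal (a variable, a concatenation, an f-string) and whose second argument is not a tuple of parameter values. A dynamically built query with no bound parameters is the classic SQL injection pattern. Note that the pattern only matches a receiver literally named cursor; replace it with a metavariable such as $CURSOR to cover other variable names.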
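The difference between the flagged and the excluded form can be reproduced with the standard sqlite3 module. This is a small sketch with a hypothetical two-row users table; make_cursor, lookup_unsafe, lookup_safe, and PAYLOAD are illustrative names only.

```python
import sqlite3


def make_cursor() -> sqlite3.Cursor:
    """Create an in-memory database with two sample users."""
    conn = sqlite3.connect(":memory:")
    cursor = conn.cursor()
    cursor.execute("CREATE TABLE users (id INTEGER, name TEXT)")
    cursor.executemany(
        "INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")]
    )
    return cursor


def lookup_unsafe(cursor: sqlite3.Cursor, name: str) -> list:
    # Non-literal query and no parameter tuple: the shape the rule flags.
    query = "SELECT id FROM users WHERE name = '" + name + "' ORDER BY id"
    cursor.execute(query)
    return cursor.fetchall()


def lookup_safe(cursor: sqlite3.Cursor, name: str) -> list:
    # Literal query plus a parameter tuple: excluded by the pattern-not clauses.
    cursor.execute("SELECT id FROM users WHERE name = ? ORDER BY id", (name,))
    return cursor.fetchall()


PAYLOAD = "x' OR '1'='1"

cur = make_cursor()
print(lookup_unsafe(cur, PAYLOAD))  # [(1,), (2,)] : every row leaks
print(lookup_safe(cur, PAYLOAD))    # [] : the payload is treated as data
```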

Writing Custom Rules for Your Codebase

Rule Structure

Every Semgrep rule requires these fields:

rules:
  - id: unique-rule-id          # kebab-case, unique within your ruleset
    pattern: ...                # or patterns:/pattern-either:/etc.
    message: "Description"      # shown to developers, explain the risk and fix
    languages: [javascript]     # language(s) this rule applies to
    severity: ERROR             # ERROR, WARNING, or INFO

Optional but recommended:

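    metadata:
      cwe: "CWE-79"
      confidence: HIGH
      likelihood: MEDIUM
      impact: HIGH
      subcategory: vuln         # vuln, audit, or best-practice
    # Automated fix pattern (experimental). Keep comments outside the value:
    # inside a "|" block scalar they would become part of the fix text.
    fix: sanitize($INPUT)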
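As a quick illustration of the required fields, the sketch below checks a rule that has already been loaded into a Python dict (for example with a YAML parser of your choice). The helper missing_fields and both field tuples are hypothetical and deliberately simplified; Semgrep's own rule validation remains the authority, and rules in other modes may use different pattern keys.

```python
REQUIRED_FIELDS = ("id", "message", "languages", "severity")
# Simplification: only the pattern keys used in this guide, plus pattern-regex.
PATTERN_FIELDS = ("pattern", "patterns", "pattern-either", "pattern-regex")


def missing_fields(rule: dict) -> list[str]:
    """Return the names of required fields that the rule dict lacks."""
    missing = [field for field in REQUIRED_FIELDS if field not in rule]
    if not any(field in rule for field in PATTERN_FIELDS):
        missing.append("pattern")
    return missing


complete = {
    "id": "exec-call",
    "pattern": "exec(...)",
    "message": "Avoid exec() calls",
    "languages": ["python"],
    "severity": "WARNING",
}
incomplete = {"id": "no-pattern", "message": "Example", "languages": ["python"]}

print(missing_fields(complete))    # []
print(missing_fields(incomplete))  # ['severity', 'pattern']
```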

Example: Detecting Hardcoded JWT Secrets

rules:
  - id: hardcoded-jwt-secret
    # pattern-either and pattern-not are combined under a single patterns list
    patterns:
      - pattern-either:
          - pattern: jwt.sign($PAYLOAD, "...")
          - pattern: jwt.verify($TOKEN, "...")
          - pattern: |
              const $SECRET = "...";
              ...
              jwt.sign($PAYLOAD, $SECRET)
      - pattern-not: jwt.sign($PAYLOAD, process.env.$VAR)
      - pattern-not: jwt.sign($PAYLOAD, $CONFIG.$KEY)
    message: >
      JWT secret appears to be hardcoded. Use an environment variable:
      jwt.sign(payload, process.env.JWT_SECRET)
    languages: [javascript, typescript]
    severity: ERROR
    metadata:
      cwe: "CWE-798"

Example: Detecting Dangerous `eval` Usage

rules:
  - id: eval-with-user-input
    patterns:
      - pattern: eval($EXPR)
      - pattern-not: eval("...")
      - pattern-inside: |
          function $FUNC(..., $REQ, ...) {
            ...
          }
    message: >
      eval() called inside a request handler with a non-literal argument.
      This is a critical code injection vulnerability.
    languages: [javascript]
    severity: ERROR
    metadata:
      cwe: "CWE-95"
      owasp: "A03:2021 - Injection"

Metavariable Patterns for Taint Tracking

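metavariable-pattern lets you constrain what a metavariable matches. This is not dataflow analysis, but it approximates a source-to-sink check when the untrusted value appears directly inside the sink expression. For flows that pass through intermediate variables, Semgrep provides a separate taint mode (mode: taint with pattern-sources and pattern-sinks).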

rules:
  - id: express-xss-response
    patterns:
      - pattern: res.send($OUTPUT)
      - metavariable-pattern:
          metavariable: $OUTPUT
          patterns:
            - pattern: req.$FIELD
            - pattern-not: sanitizeHtml(...)
    message: >
      Potential XSS: user input from req.$FIELD sent directly to res.send()
      without sanitization.
    languages: [javascript]
    severity: ERROR
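The limitation of a single-expression constraint is easier to see with a toy example. The sketch below is a deliberately tiny taint check written with Python's ast module: it treats input() as the source and eval() as the sink, and follows assignments across top-level statements only. It is an illustration of the idea of propagation, not Semgrep's algorithm, and tainted_eval_lines and SAMPLE are hypothetical names.

```python
import ast


def tainted_eval_lines(source: str) -> list[int]:
    """Report lines where eval() receives data derived from input()."""
    tainted: set[str] = set()
    hits: list[int] = []

    def is_tainted(expr: ast.AST) -> bool:
        for node in ast.walk(expr):
            # A direct call to the source
            if (isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Name)
                    and node.func.id == "input"):
                return True
            # A variable that was previously assigned tainted data
            if isinstance(node, ast.Name) and node.id in tainted:
                return True
        return False

    for stmt in ast.parse(source).body:
        # Sink check: eval(...) with a tainted argument
        for node in ast.walk(stmt):
            if (isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Name)
                    and node.func.id == "eval"
                    and any(is_tainted(arg) for arg in node.args)):
                hits.append(node.lineno)
        # Propagation: assigning a tainted expression taints the target
        if isinstance(stmt, ast.Assign):
            value_is_tainted = is_tainted(stmt.value)
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    if value_is_tainted:
                        tainted.add(target.id)
                    else:
                        tainted.discard(target.id)
    return hits


SAMPLE = """\
raw = input()
expr = raw.strip()
eval(expr)
safe = "1 + 1"
eval(safe)
"""

print(tainted_eval_lines(SAMPLE))  # [3]
```

A pattern equivalent to eval(input()) would miss line 3 because the source and the sink sit in different statements. Tracking the two assignments in between is what connects them.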

Using Community Rulesets

The Semgrep registry at semgrep.dev/r contains thousands of rules maintained by Semgrep and the community. Use the -c flag with a registry identifier:

# Run OWASP Top 10 ruleset
semgrep -c p/owasp-top-ten .

# Run JavaScript security rules
semgrep -c p/javascript .

# Run Python security rules
semgrep -c p/python .

# Run multiple rulesets
semgrep -c p/owasp-top-ten -c p/nodejs -c p/secrets .

# Run specific rules by ID
semgrep -c r/javascript.express.security.express-puppeteer-injection .

Recommended Starting Rulesets

Ruleset	Config identifier	Coverage
OWASP Top 10	`p/owasp-top-ten`	Injection, XSS, CSRF, XXE, deserialization
Secrets	`p/secrets`	API keys, passwords, tokens in source
JavaScript	`p/javascript`	Node.js, Express, React security
Python	`p/python`	Django, Flask, SQLAlchemy
TypeScript	`p/typescript`	Type-unsafe operations, injection
Docker	`p/dockerfile`	Dockerfile best practices

Testing Custom Rules with `semgrep --test`

Semgrep has a built-in testing framework that verifies rules against annotated test files. This prevents rules from drifting out of sync with the patterns they claim to catch.

Create a test fixture whose base name matches the rule file (here rules/sql-injection.yaml and tests/sql-injection.py). Each annotation must sit on the line directly above the code it refers to:

# tests/sql-injection.py

import sqlite3
conn = sqlite3.connect("test.db")
cursor = conn.cursor()

# ruleid: sql-injection-python
cursor.execute("SELECT * FROM users WHERE id = " + user_id)

query = f"SELECT * FROM users WHERE name = '{username}'"
# ruleid: sql-injection-python
cursor.execute(query)

# Parameterized query: should NOT match
# ok: sql-injection-python
cursor.execute("SELECT * FROM users WHERE id = ?", (user_id,))

# Literal string: should NOT match
# ok: sql-injection-python
cursor.execute("SELECT * FROM users")

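Run the test, pointing --config at the rules directory and passing the fixtures directory as the target:

semgrep --test --config rules/ tests/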

Output (illustrative; the exact format varies between Semgrep versions):

Testing rules against test fixtures...
rules/sql-injection.yaml: 2 ✓, 0 ✗
All tests passed.

Rules fail the test if they produce false negatives (missing a # ruleid: annotation) or false positives (matching a # ok: annotation). This feedback loop is essential for rule quality.
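The annotation convention can be made concrete with a short sketch that reads a fixture and lists which lines are expected to match and which are expected to stay clean. It assumes Python-style # comments and that each annotation refers to the line directly below it. expected_lines, ANNOTATION, and FIXTURE are hypothetical names, and this is not how Semgrep itself parses fixtures.

```python
import re

# Matches "# ruleid: some-rule" and "# ok: some-rule"
ANNOTATION = re.compile(r"#\s*(ruleid|ok):\s*([\w.-]+)")


def expected_lines(fixture: str) -> dict[str, dict[str, list[int]]]:
    """Map each rule id to the line numbers its annotations point at."""
    expected: dict[str, dict[str, list[int]]] = {}
    for lineno, line in enumerate(fixture.splitlines(), start=1):
        match = ANNOTATION.search(line)
        if match:
            kind, rule_id = match.groups()
            entry = expected.setdefault(rule_id, {"ruleid": [], "ok": []})
            # The annotation applies to the line immediately after it
            entry[kind].append(lineno + 1)
    return expected


FIXTURE = """\
# ruleid: sql-injection-python
cursor.execute("SELECT * FROM users WHERE id = " + user_id)

# ok: sql-injection-python
cursor.execute("SELECT * FROM users")
"""

print(expected_lines(FIXTURE))
# {'sql-injection-python': {'ruleid': [2], 'ok': [5]}}
```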

GitHub Actions Integration

name: Semgrep SAST

on:
  pull_request: {}
  push:
    branches: [main]

jobs:
  semgrep:
    runs-on: ubuntu-latest
    container:
      image: semgrep/semgrep
    permissions:
      security-events: write
      contents: read

    steps:
      - uses: actions/checkout@v4

      - name: Run Semgrep with community + custom rules
        run: |
          semgrep \
            --config p/owasp-top-ten \
            --config p/secrets \
            --config ./semgrep-rules/ \
            --sarif \
            --output semgrep.sarif \
            --error \
            --exclude node_modules \
            --exclude dist \
            --exclude build \
            --exclude "*.test.js" \
            --exclude "*.spec.js" \
            .

      - name: Upload SARIF
        uses: github/codeql-action/upload-sarif@v3
        if: always()
        with:
          sarif_file: semgrep.sarif

The --error flag causes Semgrep to exit with code 1 if any findings exist, failing the CI step. If this produces too many alerts initially, remove --error and treat the SARIF upload as informational until you have triaged the existing findings.

Suppressing False Positives Inline

// nosemgrep: hardcoded-jwt-secret
const TEST_SECRET = "test-secret-for-unit-tests-only";

The nosemgrep comment suppresses specific rule IDs inline. This is preferable to per-file exclusions because it is scoped precisely and visible during code review.

Comparison with ESLint Security Plugins

ESLint's eslint-plugin-security is often the first SAST tool JavaScript teams reach for, since it integrates with existing ESLint config. The comparison is nuanced:

Aspect	Semgrep	eslint-plugin-security
Pattern model	Syntax-tree aware, cross-statement	Single-expression, AST-based
False positive rate	Lower (structural, multi-statement matching)	Higher (coarse single-node heuristics)
Custom rules	YAML, language-agnostic	JavaScript, ESLint-specific
Multi-language	Yes (30+ languages)	JavaScript/TypeScript only
CI integration	Standalone CLI or Docker image	Requires Node.js/ESLint setup
IDE integration	VS Code extension	Native ESLint integration

For JavaScript/TypeScript projects, running both is reasonable: ESLint for code style and basic security in the editor, Semgrep in CI for structural security analysis and cross-language coverage if your codebase includes Python or Go services.

Building a Triage Workflow

A sustainable SAST program requires a triage process that prevents the backlog from growing unboundedly.

Initial triage (when first enabling Semgrep): Run in --no-error mode, generate a JSON report, and triage every finding into one of: Fix immediately (Critical/High with high confidence), Fix in current sprint (Medium with clear exploit path), Suppress with comment (false positive or accepted risk), or Open ticket (real issue but not urgent).
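For the initial pass, a small script can turn the JSON report into a per-rule summary to triage from. This sketch assumes the report has a top-level results list whose entries carry check_id and extra.severity; summarize and REPORT are hypothetical names, and the sample data is invented for illustration.

```python
import json
from collections import Counter


def summarize(report: dict) -> Counter:
    """Count findings per (severity, rule id) in a parsed JSON report."""
    return Counter(
        (result["extra"]["severity"], result["check_id"])
        for result in report.get("results", [])
    )


# Invented sample standing in for the contents of a real report file
REPORT = json.loads("""
{
  "results": [
    {"check_id": "sql-injection-python", "path": "app/db.py",
     "start": {"line": 12}, "extra": {"severity": "ERROR"}},
    {"check_id": "sql-injection-python", "path": "app/admin.py",
     "start": {"line": 40}, "extra": {"severity": "ERROR"}},
    {"check_id": "exec-call", "path": "tools/run.py",
     "start": {"line": 7}, "extra": {"severity": "WARNING"}}
  ]
}
""")

for (severity, rule_id), count in summarize(REPORT).most_common():
    print(f"{severity:8} {rule_id:24} {count}")
```

Sorting by count puts the noisiest rules first, which is usually where a first triage session pays off most.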

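Ongoing triage: Combine --error with --baseline-commit so that only findings introduced since a given Git commit fail the build. Note that --baseline-commit compares against a commit, not against a saved report file; the JSON report is still worth keeping as a record of the initial triage:

# Save current findings as a triage record
semgrep --config p/owasp-top-ten . --json > .semgrep-baseline.json

# In CI: fail only on findings introduced since the baseline commit
semgrep --config p/owasp-top-ten . --baseline-commit HEAD~1 --error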
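If you want to compare a fresh report against the saved .semgrep-baseline.json yourself, a rough sketch follows. It is a hypothetical helper, not what --baseline-commit does internally, and it keys findings on rule id, path, and start line, so code that merely moves will show up as new.

```python
def finding_key(result: dict) -> tuple[str, str, int]:
    """Identify a finding by rule id, file path, and starting line."""
    return (result["check_id"], result["path"], result["start"]["line"])


def new_findings(baseline: dict, current: dict) -> list[dict]:
    """Return findings in the current report that the baseline lacks."""
    known = {finding_key(r) for r in baseline.get("results", [])}
    return [r for r in current.get("results", []) if finding_key(r) not in known]


# Invented sample reports; in practice these come from json.load() on files
baseline = {"results": [
    {"check_id": "exec-call", "path": "tools/run.py", "start": {"line": 7}},
]}
current = {"results": [
    {"check_id": "exec-call", "path": "tools/run.py", "start": {"line": 7}},
    {"check_id": "sql-injection-python", "path": "app/db.py", "start": {"line": 12}},
]}

for finding in new_findings(baseline, current):
    print(finding["check_id"], finding["path"], finding["start"]["line"])
# sql-injection-python app/db.py 12
```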

Rule refinement: Track false positive rate per rule. Rules with >50% false positive rate in your codebase should be tightened with pattern-not clauses or removed from your CI config.
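Tracking the rate does not need special tooling. A minimal sketch, assuming you record each triaged finding as a (rule id, verdict) pair where the verdict is "fp" for a false positive or "tp" for a true positive: the function names, the verdict labels, and the sample data are all hypothetical.

```python
from collections import defaultdict


def false_positive_rates(triage: list[tuple[str, str]]) -> dict[str, float]:
    """Compute false positives divided by triaged findings, per rule."""
    totals: dict[str, int] = defaultdict(int)
    false_positives: dict[str, int] = defaultdict(int)
    for rule_id, verdict in triage:
        totals[rule_id] += 1
        if verdict == "fp":
            false_positives[rule_id] += 1
    return {rule: false_positives[rule] / totals[rule] for rule in totals}


def rules_to_tighten(triage: list[tuple[str, str]], threshold: float = 0.5) -> list[str]:
    """List rules whose false positive rate exceeds the threshold."""
    rates = false_positive_rates(triage)
    return sorted(rule for rule, rate in rates.items() if rate > threshold)


TRIAGE = [
    ("eval-with-user-input", "fp"),
    ("eval-with-user-input", "fp"),
    ("eval-with-user-input", "tp"),
    ("sql-injection-python", "tp"),
    ("sql-injection-python", "fp"),
]

print(rules_to_tighten(TRIAGE))  # ['eval-with-user-input']
```

Here sql-injection-python sits at exactly 50% and is not listed, since the guideline above is a rate strictly greater than 50%.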

Summary

Semgrep's syntax-aware pattern matching makes it practical to write security rules that are both precise and maintainable — a combination that has historically been difficult to achieve with regex-based scanners. Starting with the OWASP Top 10 community ruleset provides immediate coverage, while the custom rule framework lets you encode organisation-specific security policies that no generic ruleset could know about. The --test command ensures rules stay accurate as the codebase evolves, and the GitHub Actions SARIF integration surfaces findings in the security dashboard without requiring developers to context-switch to a separate tool.
