Semgrep SAST: Writing Custom Rules and Integrating Static Analysis in CI
Semgrep is a fast, syntax-aware static analysis tool that lets you write custom security rules in YAML using code patterns rather than regular expressions. This guide covers rule syntax, writing custom rules for your codebase, using community rulesets, CI integration, and reducing false positives through triage and rule refinement.
Key Takeaways
Semgrep rules match code structure, not text. Unlike grep-based tools, Semgrep understands syntax — a pattern for subprocess.call($X) matches regardless of whitespace, variable names, or line breaks, and never matches inside comments or strings.
Custom rules encode your team's security policies. You can write rules that enforce organisation-specific patterns — for example, flagging direct db.query() calls that bypass your validated query builder, or detecting use of deprecated internal APIs.
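Taint tracking follows untrusted data to dangerous sinks. Semgrep's taint mode (mode: taint, with pattern-sources, pattern-sinks, and pattern-sanitizers) tracks data flow from untrusted sources to dangerous sinks without a full program analysis framework, and metavariable-pattern constraints offer a lighter-weight approximation when the untrusted value appears directly in the sink call.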
semgrep --test validates rules against fixture files. Annotate test files with # ruleid: and # ok: comments, then run the test command to verify your rule matches what it should and nothing it should not.
Community rulesets cover most OWASP Top 10 scenarios. The p/owasp-top-ten, p/javascript, and p/python rulesets provide immediate value without writing a single custom rule — start here before investing in custom rules.
Static Application Security Testing (SAST) tools have historically suffered from two problems: high false positive rates and difficulty extending them for your specific codebase. Semgrep addresses both. Its pattern language is close enough to real code that security engineers with no compiler background can write effective rules, and its dataflow analysis is powerful enough to catch non-trivial vulnerability patterns.
This guide covers the Semgrep rule format in depth, walks through writing custom rules for common vulnerability patterns, explains how to use community rulesets, and shows how to integrate Semgrep into GitHub Actions with a sustainable triage workflow.
Understanding Semgrep's Pattern Language
Semgrep patterns use the actual syntax of the target language, with special metavariables and ellipsis operators for matching variable parts.
Core Pattern Constructs
$X — Matches any single expression, statement, or identifier. The same metavariable used multiple times in a pattern must match the same value.
... — Matches zero or more of anything (arguments, statements, etc.). In function arguments: func(...) matches any call to func with any number of arguments.
$...ARGS — Spread metavariable: matches zero or more arguments and captures them as a list.
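To make the "same metavariable, same value" rule concrete, here is a toy matcher for the pattern $X == $X built on Python's standard ast module, with a regex version alongside for contrast. It is only an illustration of structural matching under our own simplifications (single == comparisons, structural equality via ast.dump); it is not how Semgrep is implemented.

```python
import ast
import re

def ast_self_comparisons(source: str) -> list[int]:
    """Toy matcher for the pattern `$X == $X`: both sides must be the same expression."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Compare)
            and len(node.ops) == 1
            and isinstance(node.ops[0], ast.Eq)
            # ast.dump() omits line/column attributes by default, so this compares
            # structure only: whitespace and formatting differences are ignored.
            and ast.dump(node.left) == ast.dump(node.comparators[0])
        ):
            hits.append(node.lineno)
    return sorted(hits)

# Text-based equivalent: a backreference approximates "same value on both sides".
SELF_COMPARE_RE = re.compile(r"\b([\w.]+)\s*==\s*\1\b")

def regex_self_comparisons(source: str) -> list[int]:
    return [n for n, line in enumerate(source.splitlines(), start=1) if SELF_COMPARE_RE.search(line)]

SAMPLE = '''\
a == a
b == c
x.y  ==  x.y
# a == a (only in a comment)
s = "a == a"
'''

print(ast_self_comparisons(SAMPLE))    # [1, 3]
print(regex_self_comparisons(SAMPLE))  # [1, 3, 4, 5]: also hits the comment and the string
```

The structural version ignores the extra spaces on line 3 and never looks inside comments or string literals, which is the property the pattern language gives you for free.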
# Match any call to exec with any arguments
rules:
  - id: exec-call
    pattern: exec(...)
    message: "Avoid exec() calls"
    languages: [python]
    severity: WARNING

# Match subprocess.call with shell=True
rules:
  - id: subprocess-shell-true
    pattern: subprocess.call(..., shell=True, ...)
    message: "subprocess.call with shell=True is vulnerable to command injection"
    languages: [python]
    severity: ERROR
Pattern Combinations
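Real rules often need to match something while excluding benign variants. Semgrep composes patterns with patterns (every clause must hold), pattern-either (any one clause may match), pattern-not (exclude matching code), and pattern-inside (require an enclosing context):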
rules:
  - id: sql-injection-python
    patterns:
      - pattern: |
          cursor.execute($QUERY, ...)
      - pattern-not: |
          cursor.execute("...", ...)
      - pattern-not: |
          cursor.execute($QUERY, ($VALUE, ...))
    message: >
      Possible SQL injection: cursor.execute called with a non-literal
      query string and no parameterized values. Use parameterized queries.
    languages: [python]
    severity: ERROR
    metadata:
      cwe: "CWE-89"
      owasp: "A03:2021 - Injection"
This rule matches cursor.execute() calls whose first argument is not a string literal (a variable, a concatenation, or an f-string) and that are not given a tuple of parameterised values as the second argument, a classic SQL injection pattern. As written, it only matches a receiver literally named cursor; use $CURSOR.execute(...) in each clause to match any cursor object.
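Why the parameterised form is safe can be shown with Python's built-in sqlite3 module and an in-memory database. The table, data, and function names below are illustrative assumptions: lookup_unsafe has the shape the rule flags, lookup_safe the shape it accepts.

```python
import sqlite3

def make_db() -> sqlite3.Connection:
    # Throwaway in-memory database with two illustrative users
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])
    return conn

def lookup_unsafe(cursor: sqlite3.Cursor, name: str) -> list:
    # Flagged by sql-injection-python: non-literal query, no parameter tuple
    query = f"SELECT id FROM users WHERE name = '{name}' ORDER BY id"
    cursor.execute(query)
    return cursor.fetchall()

def lookup_safe(cursor: sqlite3.Cursor, name: str) -> list:
    # Not flagged: literal query plus a tuple of parameters; sqlite3 binds the value
    cursor.execute("SELECT id FROM users WHERE name = ? ORDER BY id", (name,))
    return cursor.fetchall()

conn = make_db()
cursor = conn.cursor()
payload = "x' OR '1'='1"
print(lookup_unsafe(cursor, payload))  # [(1,), (2,)]: the payload rewrote the WHERE clause
print(lookup_safe(cursor, payload))    # []: the payload is treated as a plain name
```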
Writing Custom Rules for Your Codebase
Rule Structure
Every Semgrep rule requires these fields:
rules:
  - id: unique-rule-id        # kebab-case, unique within your ruleset
    pattern: ...              # or patterns:/pattern-either:/etc.
    message: "Description"    # shown to developers; explain the risk and the fix
    languages: [javascript]   # language(s) this rule applies to
    severity: ERROR           # ERROR, WARNING, or INFO
Optional but recommended:
    metadata:
      cwe: "CWE-79"
      confidence: HIGH
      likelihood: MEDIUM
      impact: HIGH
      subcategory: vuln       # vuln, audit, or best-practice
    # Autofix template, applied with semgrep --autofix; metavariables are
    # substituted from the match. sanitize() is a placeholder for your own helper.
    fix: sanitize($INPUT)
Example: Detecting Hardcoded JWT Secrets
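rules:
  - id: hardcoded-jwt-secret
    patterns:
      - pattern-either:
          - pattern: jwt.sign($PAYLOAD, "...", ...)
          - pattern: jwt.verify($TOKEN, "...", ...)
          - pattern: |
              const $SECRET = "...";
              ...
              jwt.sign($PAYLOAD, $SECRET, ...)
      - pattern-not: jwt.sign($PAYLOAD, process.env.$VAR, ...)
      - pattern-not: jwt.sign($PAYLOAD, $CONFIG.$KEY, ...)
    message: >
      JWT secret appears to be hardcoded. Use an environment variable:
      jwt.sign(payload, process.env.JWT_SECRET)
    languages: [javascript, typescript]
    severity: ERROR
    metadata:
      cwe: "CWE-798"
pattern-not is only valid inside a patterns block, so the exclusions sit next to the pattern-either clause rather than at the top level. The trailing ... in each call also covers the optional options argument that jwt.sign and jwt.verify accept.
Example: Detecting Dangerous eval Usage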
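rules:
  - id: eval-with-user-input
    patterns:
      - pattern: eval($EXPR)
      - pattern-not: eval("...")
      - pattern-inside: |
          function $FUNC(..., $REQ, ...) {
            ...
          }
    message: >
      eval() called inside a request handler with a non-literal argument.
      This is a critical code injection vulnerability.
    languages: [javascript]
    severity: ERROR
    metadata:
      cwe: "CWE-95"
      owasp: "A03:2021 - Injection"
Note that $REQ binds to any parameter, so as written the pattern-inside clause is not limited to request handlers: any enclosing function with at least one parameter qualifies. Narrow it to your framework's handler signature if this produces noise.
Metavariable Patterns for Taint Tracking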
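metavariable-pattern lets you constrain what a metavariable matches, which approximates taint tracking when the untrusted value appears directly in the sink call. For flows through intermediate variables, Semgrep's dedicated taint mode (mode: taint, with pattern-sources, pattern-sinks, and pattern-sanitizers) is the better fit. A metavariable-pattern version for Express looks like this: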
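rules:
  - id: express-xss-response
    patterns:
      - pattern: res.send($OUTPUT)
      - metavariable-pattern:
          metavariable: $OUTPUT
          patterns:
            - pattern: req.$FIELD
            - pattern-not-inside: sanitizeHtml(...)
    message: >
      Potential XSS: user input from req.$FIELD sent directly to res.send()
      without sanitization.
    languages: [javascript]
    severity: ERROR
pattern-not-inside is used rather than pattern-not because the req.$FIELD match is nested inside a sanitizeHtml(...) call rather than identical to it, so pattern-not would never remove it.
Using Community Rulesets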
The Semgrep registry at semgrep.dev/r contains thousands of rules maintained by Semgrep and the community. Use the -c flag with a registry identifier:
# Run OWASP Top 10 ruleset
semgrep -c p/owasp-top-ten .

# Run JavaScript security rules
semgrep -c p/javascript .

# Run Python security rules
semgrep -c p/python .

# Run multiple rulesets
semgrep -c p/owasp-top-ten -c p/nodejs -c p/secrets .

# Run specific rules by ID
semgrep -c r/javascript.express.security.express-puppeteer-injection .
Recommended Starting Rulesets
| Ruleset | Config | Coverage |
|---|---|---|
| OWASP Top 10 | p/owasp-top-ten | Injection, XSS, CSRF, XXE, deserialization |
| Secrets | p/secrets | API keys, passwords, tokens in source |
| JavaScript | p/javascript | Node.js, Express, React security |
| Python | p/python | Django, Flask, SQLAlchemy |
| TypeScript | p/typescript | Type-unsafe operations, injection |
| Docker | p/dockerfile | Dockerfile best practices |
Testing Custom Rules with semgrep --test
Semgrep has a built-in testing framework that verifies rules against annotated test files. This prevents rules from drifting out of sync with the patterns they claim to catch.
Create a test fixture file alongside your rule:
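# tests/sql-injection.py (same base name as rules/sql-injection.yaml)
import sqlite3

conn = sqlite3.connect("test.db")
cursor = conn.cursor()
user_id = input("id: ")
username = input("name: ")

# ruleid: sql-injection-python
cursor.execute("SELECT * FROM users WHERE id = " + user_id)

query = f"SELECT * FROM users WHERE name = '{username}'"
# ruleid: sql-injection-python
cursor.execute(query)

# Parameterized query: should NOT match
# ok: sql-injection-python
cursor.execute("SELECT * FROM users WHERE id = ?", (user_id,))

# Literal query string: should NOT match
# ok: sql-injection-python
cursor.execute("SELECT * FROM users")
Each annotation sits on the line directly above the code it describes, so the second # ruleid: goes above cursor.execute(query), where the match is reported, not above the assignment. Run the test, pointing --config at the rules directory and passing the fixtures directory as the target:
semgrep --test --config rules/ tests/
Illustrative output (the exact format varies between Semgrep versions):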
Testing rules against test fixtures...
rules/sql-injection.yaml: 2 ✓, 0 ✗
All tests passed.
Rules fail the test if they produce false negatives (no match on the line below a # ruleid: annotation) or false positives (a match on the line below a # ok: annotation). This feedback loop is essential for rule quality.
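The comparison that semgrep --test performs can be approximated in a few lines. This is a simplified sketch, not Semgrep's implementation: it assumes each annotation applies to the line directly below it, takes the rule's reported line numbers as input, and treats any reported line without a # ruleid: annotation as a false positive, so # ok: lines need no special handling here.

```python
import re

# Matches "# ruleid: <id>" comments (single rule ID only, for simplicity)
ANNOTATION = re.compile(r"#\s*ruleid:\s*([\w.\-]+)")

def expected_matches(fixture: str, rule_id: str) -> set[int]:
    """1-based line numbers that must match: the line after each '# ruleid:' comment."""
    expected = set()
    for lineno, line in enumerate(fixture.splitlines(), start=1):
        m = ANNOTATION.search(line)
        if m and m.group(1) == rule_id:
            expected.add(lineno + 1)
    return expected

def check_rule(fixture: str, rule_id: str, reported: set[int]) -> dict[str, list[int]]:
    expected = expected_matches(fixture, rule_id)
    return {
        "false_negatives": sorted(expected - reported),  # annotated but not matched
        "false_positives": sorted(reported - expected),  # matched without a ruleid annotation
    }

FIXTURE = '''\
# ruleid: sql-injection-python
cursor.execute("SELECT * FROM users WHERE id = " + user_id)
# ok: sql-injection-python
cursor.execute("SELECT * FROM users WHERE id = ?", (user_id,))
'''

print(check_rule(FIXTURE, "sql-injection-python", {2, 4}))
# {'false_negatives': [], 'false_positives': [4]}
```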
GitHub Actions Integration
name: Semgrep SAST
on:
  pull_request: {}
  push:
    branches: [main]
jobs:
  semgrep:
    runs-on: ubuntu-latest
    container:
      image: semgrep/semgrep
    permissions:
      security-events: write
      contents: read
    steps:
      - uses: actions/checkout@v4
      - name: Run Semgrep with community + custom rules
        run: |
          semgrep \
            --config p/owasp-top-ten \
            --config p/secrets \
            --config ./semgrep-rules/ \
            --sarif \
            --output semgrep.sarif \
            --error \
            --exclude node_modules \
            --exclude dist \
            --exclude build \
            --exclude "*.test.js" \
            --exclude "*.spec.js" \
            .
      - name: Upload SARIF
        uses: github/codeql-action/upload-sarif@v3
        if: always()
        with:
          sarif_file: semgrep.sarif
The --error flag makes Semgrep exit with code 1 when it reports any findings, which fails the CI step; if: always() keeps the SARIF upload running even then. --exclude takes one pattern per flag, so each path or glob is passed separately. If this produces too many alerts initially, remove --error and treat the SARIF upload as informational until you have triaged the existing findings.
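While --error is off, a per-rule count from the SARIF file can make the informational run easier to act on. The sketch below assumes the standard SARIF 2.1.0 layout (runs[].results[], each result carrying a ruleId); the script itself and its output format are hypothetical, so adapt them to your pipeline, for example by appending the output to the job summary.

```python
import json
import sys
from collections import Counter

def count_by_rule(sarif: dict) -> Counter:
    """Count results per ruleId across all runs in a SARIF 2.1.0 document."""
    counts = Counter()
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            counts[result.get("ruleId", "<unknown>")] += 1
    return counts

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python sarif_summary.py semgrep.sarif
    with open(sys.argv[1], encoding="utf-8") as fh:
        for rule_id, n in count_by_rule(json.load(fh)).most_common():
            print(f"{n:5}  {rule_id}")
```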
Suppressing False Positives Inline
// nosemgrep: hardcoded-jwt-secret
const TEST_SECRET = "test-secret-for-unit-tests-only";
The nosemgrep comment, placed on the flagged line or the line directly above it, suppresses only the listed rule IDs. This is preferable to per-file exclusions because it is scoped precisely and visible during code review.
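Because nosemgrep comments live in the code, they can also be inventoried, which helps reviewers spot suppressions piling up for one rule. A rough sketch, assuming comments of the form nosemgrep or nosemgrep: id1, id2; a bare nosemgrep suppresses every rule on that line, so it is counted separately. The regex is our own approximation, not Semgrep's parser, and the file suffixes scanned are an arbitrary choice.

```python
import re
from collections import Counter
from pathlib import Path

# Approximate syntax: "nosemgrep" optionally followed by ": rule-id[, rule-id...]"
NOSEMGREP = re.compile(r"nosemgrep(?::[ \t]*([\w.\-]+(?:[ \t]*,[ \t]*[\w.\-]+)*))?")
ALL_RULES = "<all rules>"

def count_suppressions(text: str) -> Counter:
    """Count suppressed rule IDs in one file's text."""
    counts = Counter()
    for m in NOSEMGREP.finditer(text):
        ids = m.group(1)
        if ids is None:
            counts[ALL_RULES] += 1  # bare nosemgrep: every rule on that line
        else:
            counts.update(rule.strip() for rule in ids.split(","))
    return counts

def scan_tree(root: str, suffixes=(".js", ".ts", ".py")) -> Counter:
    """Aggregate suppressions across a source tree."""
    total = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            total += count_suppressions(path.read_text(encoding="utf-8", errors="ignore"))
    return total
```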
Comparison with ESLint Security Plugins
ESLint's eslint-plugin-security is often the first SAST tool JavaScript teams reach for, since it integrates with existing ESLint config. The comparison is nuanced:
| Aspect | Semgrep | eslint-plugin-security |
|---|---|---|
| Pattern model | Syntax-tree aware, cross-statement | Single-expression, AST-based |
| False positive rate | Lower (structural, cross-statement matching) | Higher (AST-based, but several rules such as detect-object-injection are broad heuristics) |
| Custom rules | YAML, language-agnostic | JavaScript, ESLint-specific |
| Multi-language | Yes (30+ languages) | JavaScript/TypeScript only |
| CI integration | Standalone binary | Requires Node.js/ESLint setup |
| IDE integration | VS Code extension | Native ESLint integration |
For JavaScript/TypeScript projects, running both is reasonable: ESLint for code style and basic security in the editor, Semgrep in CI for structural security analysis and cross-language coverage if your codebase includes Python or Go services.
Building a Triage Workflow
A sustainable SAST program requires a triage process that prevents the backlog from growing unboundedly.
Initial triage (when first enabling Semgrep): run without --error, generate a JSON report with --json, and triage every finding into one of four buckets:
Fix immediately: Critical/High severity with high confidence.
Fix in the current sprint: Medium severity with a clear exploit path.
Suppress with a comment: false positive or accepted risk.
Open a ticket: a real issue, but not urgent.
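To work through that first report in a sensible order, a short script can sort findings by severity and by the rule's confidence metadata before a human makes the bucket decision. This is a sketch: the JSON field names in the comment reflect our understanding of the semgrep --json report and should be checked against your version, and the ranking scheme is our own choice.

```python
import json
import sys

# Assumed subset of the `semgrep --json` report shape used here:
# {"results": [{"check_id": str, "path": str, "start": {"line": int},
#               "extra": {"severity": str, "metadata": {"confidence": str, ...}}}]}
SEVERITY_RANK = {"ERROR": 0, "WARNING": 1, "INFO": 2}
CONFIDENCE_RANK = {"HIGH": 0, "MEDIUM": 1, "LOW": 2}

def triage_queue(report: dict) -> list[tuple[str, str, str, int]]:
    """Return (severity, rule, path, line) rows, most severe and most confident first."""
    rows = []
    for result in report.get("results", []):
        extra = result.get("extra", {})
        severity = extra.get("severity", "INFO")
        # Rules without a confidence field are ranked as MEDIUM (our assumption)
        confidence = extra.get("metadata", {}).get("confidence", "MEDIUM")
        rows.append((
            SEVERITY_RANK.get(severity, len(SEVERITY_RANK)),
            CONFIDENCE_RANK.get(confidence, 1),
            severity,
            result["check_id"],
            result["path"],
            result["start"]["line"],
        ))
    rows.sort()
    return [row[2:] for row in rows]

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python triage_queue.py semgrep.json
    with open(sys.argv[1], encoding="utf-8") as fh:
        for severity, rule, path, line in triage_queue(json.load(fh)):
            print(f"{severity:8} {rule} {path}:{line}")
```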
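Ongoing triage: combine --error with --baseline-commit so that CI fails only on findings that are new relative to a given commit. --baseline-commit compares against git history, not against a saved JSON report:
# Optional: snapshot current findings as a triage record
semgrep --config p/owasp-top-ten . --json > .semgrep-baseline.json
# In CI: fail only on findings not present at the merge base with main
# (needs full git history, e.g. fetch-depth: 0 on actions/checkout)
semgrep --config p/owasp-top-ten . --baseline-commit "$(git merge-base origin/main HEAD)" --error
Rule refinement: track the false positive rate of each rule. Rules with a false positive rate above 50% in your codebase should be tightened with pattern-not clauses or removed from your CI config.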
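Tracking that rate needs nothing more than your triage verdicts. The sketch below assumes you can export them as (rule_id, is_false_positive) pairs, for example from the tickets or suppression comments created during triage; noisy_rules and its min_samples cut-off are hypothetical helpers of ours, not a Semgrep feature.

```python
from collections import defaultdict

def noisy_rules(verdicts, threshold=0.5, min_samples=5):
    """Return {rule_id: false_positive_rate} for rules whose rate exceeds `threshold`.

    `verdicts` is an iterable of (rule_id, is_false_positive) pairs from your own
    triage records. Rules with fewer than `min_samples` triaged findings are skipped
    so that one early false positive does not condemn a rule.
    """
    totals = defaultdict(int)
    false_positives = defaultdict(int)
    for rule_id, is_fp in verdicts:
        totals[rule_id] += 1
        false_positives[rule_id] += int(is_fp)
    rates = {rule: false_positives[rule] / totals[rule] for rule in totals}
    return {
        rule: rate
        for rule, rate in rates.items()
        if totals[rule] >= min_samples and rate > threshold
    }
```

The min_samples guard is a design choice: a rule with two findings, both false positives, has a 100% rate but too little evidence to justify removing it.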
Summary
Semgrep's syntax-aware pattern matching makes it practical to write security rules that are both precise and maintainable — a combination that has historically been difficult to achieve with regex-based scanners. Starting with the OWASP Top 10 community ruleset provides immediate coverage, while the custom rule framework lets you encode organisation-specific security policies that no generic ruleset could know about. The --test command ensures rules stay accurate as the codebase evolves, and the GitHub Actions SARIF integration surfaces findings in the security dashboard without requiring developers to context-switch to a separate tool.