Semgrep for Code Security Scanning: Rules, CI Integration, and Custom Patterns
Semgrep is a fast, language-agnostic static analysis tool that finds security vulnerabilities by pattern matching against source code. It ships with thousands of community rules for OWASP Top 10 vulnerabilities, secret detection, and language-specific security patterns. You can also write custom rules in YAML to catch patterns specific to your codebase. Semgrep integrates with GitHub Actions, GitLab CI, and pre-commit hooks.
Key Takeaways
Semgrep rules are readable YAML. A Semgrep rule specifies a pattern (what code to match), a message (what to tell the developer), a severity, and languages. No cryptic regex or query language.
The Semgrep registry has 5000+ community rules. Use p/owasp-top-ten, p/secrets, p/django, p/react, and other community packs before writing custom rules.
Metavariables match any expression. $X in a pattern stands for any expression, so $X.execute($QUERY) matches a call to any method named execute with a single argument of any form, which is useful for finding SQL injection sinks.
Pattern combinators find complex vulnerabilities. Use pattern-not to exclude safe patterns, pattern-inside to match within a specific context, and patterns to require multiple conditions simultaneously.
Autofix generates code fixes. Add a fix field to your rule and Semgrep can automatically apply it with semgrep --autofix—useful for safe API upgrades and configuration fixes.
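To make the metavariable idea concrete, the sketch below imitates the shape of $X.execute($QUERY) with Python's standard ast module: it reports every call to a method named execute that takes exactly one argument, whatever the receiver is. This is only an illustration of structural matching, not how Semgrep is implemented; the helper find_execute_calls and the sample source are hypothetical.

```python
import ast

# Hypothetical sample source; the first line of the string is empty,
# so the first statement is on line 2.
SAMPLE = '''
cursor.execute(query)
db.conn.execute("SELECT 1")
cursor.execute(query, params)
execute(query)
'''

def find_execute_calls(source: str) -> list[int]:
    """Return line numbers of calls shaped like <anything>.execute(<one argument>)."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)  # receiver.method form only
            and node.func.attr == "execute"
            and len(node.args) == 1                   # a single positional argument
            and not node.keywords
        ):
            lines.append(node.lineno)
    return sorted(lines)

print(find_execute_calls(SAMPLE))
```

The two-argument call and the bare execute(query) function call are skipped, which mirrors how a pattern with one metavariable argument and an explicit receiver would behave.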
Why Semgrep
Many SAST tools are slow, noisy, or demanding to configure. Semgrep aims to be different:
- Fast: scans large codebases in seconds, not minutes
- Accurate: high-quality community rules with low false positive rates
- Customizable: write rules in readable YAML, with patterns that use the same syntax as the code you're matching
- Language-aware: understands code structure (AST), not just text, so it catches patterns regardless of whitespace or comment placement, and metavariables abstract over variable names
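The difference between text matching and structure matching can be seen with Python's own ast module. The sketch below is an analogy only (Semgrep uses its own parsers, not this module), and same_structure is a hypothetical helper: two snippets that differ in whitespace, comments, and a trailing comma still parse to the same tree.

```python
import ast

COMPACT = "cursor.execute(query+user_input)"
SPREAD = """cursor.execute(
    query   # base SQL
    + user_input,
)"""

def same_structure(a: str, b: str) -> bool:
    """Compare two snippets by syntax tree, ignoring layout and comments."""
    # ast.dump omits line and column attributes by default, so only structure is compared.
    return ast.dump(ast.parse(a)) == ast.dump(ast.parse(b))

print(COMPACT == SPREAD)                # the raw text differs
print(same_structure(COMPACT, SPREAD))  # the parsed structure does not
```

A text-based search for the compact form would miss the spread-out form; a tool that compares syntax trees treats them as the same code.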
Installation
# macOS
brew install semgrep
# Python (all platforms)
pip install semgrep
# Docker
docker pull semgrep/semgrep
Running Semgrep
Quick Scan with Community Rules
# OWASP Top 10
semgrep --config=p/owasp-top-ten .
# Secrets and API keys
semgrep --config=p/secrets .
# Language-specific packs
semgrep --config=p/python-security .
semgrep --config=p/javascript .
semgrep --config=p/typescript .
semgrep --config=p/django .
semgrep --config=p/react .
semgrep --config=p/flask .
# Multiple configs at once
semgrep --config=p/owasp-top-ten --config=p/secrets .
Scan Output Formats
# Human-readable (default)
semgrep --config=p/owasp-top-ten .
# JSON for CI processing
semgrep --config=p/owasp-top-ten --json . > semgrep-results.json
# SARIF for GitHub Code Scanning
semgrep --config=p/owasp-top-ten --sarif . > semgrep.sarif
# Exit with code 1 if there are any findings (this does not filter by severity)
semgrep --config=p/owasp-top-ten --error .
Scoping the Scan
# Scan specific directory
semgrep --config=p/owasp-top-ten src/
# Exclude directories
semgrep --config=p/secrets --exclude=node_modules --exclude=.venv .
# Only scan specific file types
semgrep --config=p/python-security --include="*.py" .
Writing Custom Rules
Custom rules are YAML files that describe patterns to find. Put them in a rules/ directory.
Basic Rule Structure
# rules/no-hardcoded-credentials.yaml
rules:
  - id: no-hardcoded-password
    pattern: |
      password = "..."
    message: |
      Hardcoded password detected. Use environment variables or a secrets manager instead.
      Replace with: password = os.environ.get("DB_PASSWORD")
    severity: ERROR
    languages: [python]
    metadata:
      cwe: "CWE-798"
      owasp: "A02:2021"
Metavariables
$VARIABLE matches any expression:
rules:
  - id: sql-injection-string-concat
    # Either form is a finding, so the alternatives go under pattern-either.
    # Listing them under patterns would require both to match the same code.
    pattern-either:
      - pattern: |
          $CURSOR.execute($QUERY + $INPUT)
      - pattern: |
          $CURSOR.execute(f"... {$INPUT} ...")
    message: "SQL injection risk: user input concatenated into SQL query. Use parameterized queries."
    severity: ERROR
    languages: [python]

rules:
  - id: eval-with-user-input
    pattern: eval($USER_INPUT)
    message: "eval() with user-controlled input can execute arbitrary code."
    severity: ERROR
    languages: [javascript, typescript]
Pattern Combinators
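Before the individual operators, it can help to see their logic in isolation. The sketch below models the combinators as boolean functions: patterns behaves like AND, pattern-either like OR, and pattern-not like NOT. Everything here is hypothetical scaffolding (all_of, any_of, negate, contains), and the single-pattern stand-in is a plain substring check, whereas Semgrep matches against the syntax tree.

```python
from typing import Callable

Matcher = Callable[[str], bool]

def all_of(*matchers: Matcher) -> Matcher:
    """Model of `patterns`: every child must match."""
    return lambda code: all(m(code) for m in matchers)

def any_of(*matchers: Matcher) -> Matcher:
    """Model of `pattern-either`: at least one child must match."""
    return lambda code: any(m(code) for m in matchers)

def negate(matcher: Matcher) -> Matcher:
    """Model of `pattern-not`: the child must not match."""
    return lambda code: not matcher(code)

def contains(text: str) -> Matcher:
    """Toy stand-in for a single pattern: a substring check, not an AST match."""
    return lambda code: text in code

# Same logic as the requests-without-timeout and insecure-hash-algorithm rules.
requests_without_timeout = all_of(contains("requests."), negate(contains("timeout=")))
insecure_hash = any_of(contains("hashlib.md5("), contains("hashlib.sha1("))

print(requests_without_timeout("requests.get(url)"))
print(requests_without_timeout("requests.get(url, timeout=30)"))
print(insecure_hash("hashlib.sha1(data)"))
```

The real operators work on matched code ranges rather than whole strings, but the way conditions combine is the same.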
patterns (ALL must match)
rules:
  - id: requests-without-timeout
    patterns:
      - pattern: requests.$METHOD(...)
      - pattern-not: requests.$METHOD(..., timeout=...)
    message: "requests.$METHOD() called without timeout. Set timeout=30 to prevent hanging connections."
    severity: WARNING
    languages: [python]
pattern-either (ANY must match)
rules:
  - id: insecure-hash-algorithm
    pattern-either:
      - pattern: hashlib.md5(...)
      - pattern: hashlib.sha1(...)
    message: "MD5 and SHA1 are cryptographically broken. Use hashlib.sha256() or hashlib.sha3_256() instead."
    severity: WARNING
    languages: [python]
pattern-inside (match within a context)
rules:
  - id: debug-mode-in-production
    patterns:
      - pattern: app.run(debug=True)
      - pattern-inside: |
          if __name__ == "__main__":
              ...
    message: "Flask debug mode enabled. Never run with debug=True in production."
    severity: ERROR
    languages: [python]
pattern-not-inside (exclude certain contexts)
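rules:
  - id: print-in-non-test-code
    patterns:
      - pattern: print(...)
      # A metavariable stands for a whole identifier, so a partial name such as
      # test_$NAME is not a usable pattern. This version excludes unittest test classes instead.
      - pattern-not-inside: |
          class $TEST(unittest.TestCase):
              ...
    message: "print() found outside test code. Use logging instead."
    severity: INFO
    languages: [python]
Taint Analysis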
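Taint mode reports a finding only when data can travel from a source to a sink. The toy tracker below shows that idea on straight-line Python, using the same source and sink names as the Flask rule in this section. It is a deliberately small sketch built on the standard ast module (Python 3.9+ for ast.unparse), with no support for functions, branches, or sanitizers; find_tainted_sinks and is_tainted are hypothetical names, and Semgrep's real engine is far more capable.

```python
import ast

SOURCES = {"request.args.get", "request.form.get"}
SINKS = {"render_template_string", "Markup"}

def is_tainted(expr: ast.AST, tainted: set[str]) -> bool:
    """True if the expression reads a tainted variable or calls a source."""
    for node in ast.walk(expr):
        if isinstance(node, ast.Name) and node.id in tainted:
            return True
        if isinstance(node, ast.Call) and ast.unparse(node.func) in SOURCES:
            return True
    return False

def find_tainted_sinks(source: str) -> list[int]:
    """Return line numbers where tainted data reaches a sink (top-level code only)."""
    tainted: set[str] = set()
    findings = []
    for stmt in ast.parse(source).body:  # statements in program order
        # Propagation: assigning a tainted value taints the target variable.
        if isinstance(stmt, ast.Assign) and is_tainted(stmt.value, tainted):
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    tainted.add(target.id)
        # Detection: a sink call with at least one tainted argument.
        for node in ast.walk(stmt):
            if (
                isinstance(node, ast.Call)
                and ast.unparse(node.func) in SINKS
                and any(is_tainted(arg, tainted) for arg in node.args)
            ):
                findings.append(node.lineno)
    return findings

# Hypothetical sample; the string starts with an empty line, so code begins on line 2.
SAMPLE = '''
name = request.args.get("name")
greeting = "<h1>Hello " + name + "</h1>"
page = render_template_string(greeting)
safe = render_template_string("<h1>Hello</h1>")
'''

print(find_tainted_sinks(SAMPLE))
```

Only the call that receives the value derived from request.args.get is reported; the call with a constant template is not, even though both use the same sink.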
Track user input from source to sink:
rules:
  - id: xss-flask-template
    mode: taint
    pattern-sources:
      - pattern: request.args.get(...)
      - pattern: request.form.get(...)
      - pattern: request.json
    pattern-sinks:
      - pattern: render_template_string(...)
      - pattern: Markup(...)
    message: "User input flows into HTML rendering without sanitization. Use Jinja2 auto-escaping."
    severity: ERROR
    languages: [python]
Autofix
Add a fix field to automatically patch the issue:
rules:
  - id: use-secrets-compare-digest
    # $A == $B matches every equality comparison, so narrow the pattern
    # or preview the changes with --autofix --dryrun before applying them.
    pattern: $A == $B
    fix: hmac.compare_digest($A, $B)
    message: |
      Use hmac.compare_digest() for timing-safe comparison of secrets.
      Regular == can leak information via timing attacks.
    severity: WARNING
    languages: [python]
    metadata:
      confidence: LOW  # Informational only: metadata does not stop --autofix from applying the fix
Apply fixes:
semgrep --config=rules/use-secrets-compare-digest.yaml --autofix .
JavaScript/TypeScript Rules
rules:
  - id: no-dangerously-set-inner-html
    pattern: dangerouslySetInnerHTML={{ __html: $X }}
    message: "dangerouslySetInnerHTML can lead to XSS. Ensure $X is sanitized before use."
    severity: WARNING
    # .jsx and .tsx files are covered by the javascript and typescript language ids
    languages: [javascript, typescript]
  - id: no-document-write
    pattern: document.write($X)
    message: "document.write() with user-controlled input leads to XSS. Use DOM manipulation methods instead."
    severity: ERROR
    languages: [javascript, typescript]
Organizing Rules
Structure your custom rules by category:
rules/
  injection/
    sql-injection.yaml
    command-injection.yaml
    template-injection.yaml
  secrets/
    hardcoded-credentials.yaml
    api-keys.yaml
  crypto/
    weak-algorithms.yaml
    insecure-random.yaml
  auth/
    jwt-issues.yaml
    session-config.yaml
Run all custom rules:
semgrep --config=rules/ .
CI Integration
GitHub Actions
# .github/workflows/semgrep.yml
name: Semgrep Security Scan
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
jobs:
  semgrep:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write  # needed by the SARIF upload step
    steps:
      - uses: actions/checkout@v4
      - name: Semgrep Scan
        uses: semgrep/semgrep-action@v1
        with:
          config: >-
            p/owasp-top-ten
            p/secrets
            p/python-security
          generateSarif: "1"
      - name: Upload SARIF to GitHub Code Scanning
        uses: github/codeql-action/upload-sarif@v3
        if: always()
        with:
          sarif_file: semgrep.sarif
With SARIF upload, findings appear as annotations in GitHub pull requests.
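A SARIF file is plain JSON, so it can also be inspected locally before it is uploaded. The sketch below counts results per rule and level using only fields defined by the SARIF 2.1.0 format (runs, results, ruleId, level). The sample document and the helper summarize_sarif are hypothetical, and treating a missing level as "warning" is a simplification of the SARIF defaulting rules.

```python
import json
from collections import Counter

def summarize_sarif(sarif: dict) -> Counter:
    """Count results per (ruleId, level) across all runs in a SARIF document."""
    counts: Counter = Counter()
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            rule_id = result.get("ruleId", "<unknown>")
            level = result.get("level", "warning")  # simplification: assume "warning" when absent
            counts[(rule_id, level)] += 1
    return counts

# Hypothetical, heavily trimmed SARIF document for illustration.
SAMPLE_SARIF = json.loads("""
{
  "version": "2.1.0",
  "runs": [
    {
      "results": [
        {"ruleId": "no-hardcoded-password", "level": "error"},
        {"ruleId": "no-hardcoded-password", "level": "error"},
        {"ruleId": "requests-without-timeout"}
      ]
    }
  ]
}
""")

for (rule_id, level), count in sorted(summarize_sarif(SAMPLE_SARIF).items()):
    print(f"{level:8} {rule_id}: {count}")

# For a real report: summarize_sarif(json.load(open("semgrep.sarif")))
```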
Custom Rules in CI
- name: Run custom security rules
  # No --error here: it would fail this step on any finding,
  # and the severity check in the next step would never run.
  run: semgrep --config=rules/ --json . > custom-findings.json
- name: Check for critical findings
  run: |
    CRITICAL=$(jq '[.results[] | select(.extra.severity == "ERROR")] | length' custom-findings.json)
    echo "Critical findings: $CRITICAL"
    if [ "$CRITICAL" -gt 0 ]; then
      exit 1
    fi
Pre-commit Hook
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/semgrep/semgrep
    rev: v1.60.0
    hooks:
      - id: semgrep
        args: ['--config', 'p/secrets', '--error']
Install:
pip install pre-commit
pre-commit install
Managing False Positives
Mark a finding as a false positive with an inline comment:
# nosemgrep: semgrep-rule-id
password = "test_password"  # nosemgrep: no-hardcoded-password
Or ignore a file globally in .semgrepignore:
# .semgrepignore
tests/
vendor/
node_modules/
migrations/
*.min.js
Use --exclude-rule to disable specific rules for a scan:
semgrep --config=p/python-security --exclude-rule=python.lang.security.audit.non-literal-import .
Severity Levels and Exit Codes
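A CI job can also apply its own severity threshold to the JSON report instead of relying on the exit code alone. The sketch below is a Python counterpart to a jq filter on .results[].extra.severity; blocking_findings, SEVERITY_ORDER, and the sample report are hypothetical, and severities outside INFO, WARNING, and ERROR are simply treated as non-blocking.

```python
import json

SEVERITY_ORDER = {"INFO": 0, "WARNING": 1, "ERROR": 2}

def blocking_findings(report: dict, threshold: str = "ERROR") -> list[dict]:
    """Return findings whose severity is at or above the threshold."""
    minimum = SEVERITY_ORDER[threshold]
    return [
        finding
        for finding in report.get("results", [])
        # Unknown or missing severities rank below INFO and never block.
        if SEVERITY_ORDER.get(finding.get("extra", {}).get("severity"), -1) >= minimum
    ]

# Hypothetical, trimmed report in the shape produced by semgrep --json.
SAMPLE_REPORT = json.loads("""
{
  "results": [
    {"check_id": "no-hardcoded-password", "path": "app/db.py", "extra": {"severity": "ERROR"}},
    {"check_id": "requests-without-timeout", "path": "app/api.py", "extra": {"severity": "WARNING"}},
    {"check_id": "print-in-non-test-code", "path": "app/cli.py", "extra": {"severity": "INFO"}}
  ]
}
""")

for finding in blocking_findings(SAMPLE_REPORT):
    print(f'{finding["path"]}: {finding["check_id"]}')

# In CI, load the real report and exit non-zero when the list is not empty:
#   report = json.load(open("custom-findings.json"))
#   raise SystemExit(1 if blocking_findings(report) else 0)
```

Lowering the threshold to "WARNING" makes the same gate stricter without changing the scan itself.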
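| Severity | Meaning | Exit code (default) | Exit code (with --error) |
|---|---|---|---|
| INFO | Finding for awareness | 0 | 1 |
| WARNING | Review recommended | 0 | 1 |
| ERROR | Should block the CI pipeline | 0 | 1 |
The --error flag does not look at severity: semgrep exits with code 1 whenever at least one finding is reported. To fail CI only on ERROR findings, restrict the scan to that severity with --severity:
semgrep --config=rules/ --severity ERROR --error .
Summary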
Semgrep provides fast, accurate static analysis with a low barrier to entry:
- Start with community packs: p/owasp-top-ten and p/secrets cover the most critical vulnerabilities
- Add language-specific packs: p/django, p/react, p/flask for framework-specific patterns
- Write custom rules: use YAML pattern syntax to catch codebase-specific anti-patterns
- Integrate in CI: run on every PR, block on ERROR severity findings
- Use autofix: where safe, let Semgrep apply fixes automatically
The combination of broad community coverage and easy custom rule writing makes Semgrep a practical SAST choice for developer teams that want security checks without a dedicated security team.