How to Test Code You Write with Aider

  • Aider writes code at CLI speed, but it has no visibility into whether the running app behaves correctly.
  • Code that compiles and passes unit tests can still break the user experience in ways Aider will never catch.
  • HelpMeTest adds a behavioral QA layer: plain English tests that run against your actual app, not just your code.
  • You can hook HelpMeTest into Aider's /test command so verification happens automatically after every change.
  • Free tier covers 10 tests with no infrastructure to set up. It's the missing half of your Aider workflow.

Key Takeaways

Aider solves the "writing code" problem, not the "verifying behavior" problem. It edits files and runs commands, but it cannot open a browser and confirm that your checkout flow still works after the refactor.

AI-generated code regresses in non-obvious ways. Aider's edits are often structurally correct but functionally wrong — a renamed variable, a missed state update, a CSS class that disappeared. Unit tests rarely catch these.

The fix is a behavioral test layer, not more unit tests. HelpMeTest runs real browser tests against your live app — plain English scenarios that verify what your users actually experience.

You opened your terminal. You ran aider. You described the feature, added the relevant files to context, and watched Aider generate, edit, and commit the code. It took six minutes.

Now what?

If you are like most Aider users, "now what" means running the app manually, clicking around, hoping nothing broke. Maybe you run pytest. Maybe you don't. Either way, you have a nagging feeling that the code is probably fine — and "probably fine" is doing a lot of work.

This post is about closing that gap. Specifically: how to build a testing layer that works at Aider's speed, fits into your CLI workflow, and catches the class of bugs that Aider will never catch on its own.

What Aider Does Well

Aider is genuinely good at its job. If you haven't used it, here's the loop: you launch it from the terminal, pull in files with /add, describe what you want in natural language, and it edits your codebase directly. It supports GPT-4, Claude, Gemini, and a dozen other models. It has git integration, so every change is committed. You can run shell commands with /run, run your test suite with /test, and ask it to fix failing tests automatically.

For developers who live in the terminal, it feels like the right tool. No browser tabs, no copy-paste between a chat window and your editor, no context switching. You stay in the flow.

Aider is strong at:

  • Targeted refactors. "Rename user_id to account_id across the codebase" — done in seconds.
  • Feature scaffolding. "Add a /export endpoint that returns the user's data as CSV" — it writes the route, the serializer, and the test.
  • Bug fixes from stack traces. Paste an error, Aider finds the source and edits it.
  • Keeping context tight. You control exactly which files are in context, so it doesn't hallucinate changes to unrelated code.

The git integration is underrated. Every Aider change is a commit. You get a full audit trail of what the AI changed and why. Rollbacks are trivial.

So far so good. Here is where it stops.

The Gap: Code Changes vs. Behavioral Verification

Aider edits files and can run shell commands, but it does not exercise your running app. It does not open a browser. It does not click through your UI, submit forms, or verify that the API returns the right shape of data to the right kind of authenticated user.

It can run your test suite, but only if you have one, and only if that suite actually covers the behavior in question. Many don't: they cover units of code, not user flows.

This matters because AI-generated code has a specific failure pattern. It tends to be syntactically correct, structurally plausible, and behaviorally wrong. Aider will:

  • Rename a CSS class that a downstream component relied on
  • Refactor an async flow in a way that works in isolation but breaks under real network latency
  • Add a new API field while forgetting to update the form that submits it
  • Fix a backend bug while introducing a regression in the frontend that wasn't in context

These rarely show up in pytest, because a unit suite exercises functions in isolation rather than the rendered app. They show up when a user tries to check out, log in, or submit a support ticket and gets a blank screen.
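To see why a green unit suite says little about the rendered page, consider a toy sketch of the renamed-CSS-class case. Everything in it (the function names, the class names, the stylesheet string) is hypothetical and only illustrates the gap. It is not how Aider or HelpMeTest works internally.

```python
import re

# Hypothetical stylesheet: the rule still targets the old class name.
STYLESHEET = ".submit-btn { display: block; }"


def render_checkout_button() -> str:
    # After a refactor, the class was renamed from "submit-btn" to "checkout-btn".
    return '<button class="checkout-btn">Pay now</button>'


def unit_test_passes() -> bool:
    # A typical unit test: it asserts on the label, not on how the page renders.
    return "Pay now" in render_checkout_button()


def unstyled_classes(html: str, css: str) -> set[str]:
    # Classes used in the markup that no CSS rule targets.
    used: set[str] = set()
    for attr in re.findall(r'class="([^"]*)"', html):
        used.update(attr.split())
    defined = set(re.findall(r"\.([A-Za-z_][\w-]*)", css))
    return used - defined
```

Here unit_test_passes() returns True while unstyled_classes(render_checkout_button(), STYLESHEET) reports checkout-btn: the label is right, but the button has lost its styling. Only a check against the rendered app would notice.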

The gap is this: Aider can confirm that the code runs and that your existing tests pass. Nobody is verifying that the app works.

How HelpMeTest Fills the Gap

HelpMeTest is a cloud-hosted test automation platform built on Robot Framework and Playwright. You write tests in plain English. The platform runs them against your real app in a real browser.

Here's what a test looks like:

Open browser to https://myapp.com/login
Enter "user@example.com" into the email field
Enter "password123" into the password field
Click "Sign In"
Verify the dashboard heading is visible
Verify the user menu shows "user@example.com"

That's it. No selectors. No XPath. No Playwright boilerplate. You describe what a user would do, and HelpMeTest figures out how to do it.
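For contrast, this is roughly what the same flow looks like when written by hand with Playwright's Python API. It is a sketch under assumptions: the field labels ("Email", "Password"), the "Dashboard" heading text, and the function name check_login_flow are guesses about a hypothetical page, not something HelpMeTest generates.

```python
def check_login_flow(base_url: str = "https://myapp.com") -> None:
    # Imported here so the module still loads where Playwright is not installed.
    from playwright.sync_api import expect, sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(f"{base_url}/login")
        # The label and role names below are assumptions about the page markup.
        page.get_by_label("Email").fill("user@example.com")
        page.get_by_label("Password").fill("password123")
        page.get_by_role("button", name="Sign In").click()
        expect(page.get_by_role("heading", name="Dashboard")).to_be_visible()
        expect(page.get_by_text("user@example.com")).to_be_visible()
        browser.close()
```

Every locator in that function is a place where a refactor can break the test without breaking the app, which is the maintenance cost the plain English format is meant to remove.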

What this gets you that Aider's /test command doesn't:

Real browser execution. Tests run in Chromium (and optionally Firefox and Safari) against your actual deployed app. If a CSS change made your submit button invisible on mobile, this catches it.

Self-healing tests. When a selector changes — because Aider renamed a component, for instance — HelpMeTest uses AI to find the updated element instead of just failing with ElementNotFound. Your tests stay green through refactors.

Visual testing. Screenshot diffs across viewports. If Aider's edit shifted a layout, you'll see it before your users do.

Auth state persistence. You save a logged-in browser state once ("Save As Admin"), and every test that needs authentication reuses it. No test ever re-logs in from scratch.

24/7 monitoring. Tests run on a schedule, not just when you push. If something breaks in production at 2am — not because of your code, but because of a third-party dependency or a database state — you find out immediately.

CI/CD integration. Tests run on every deploy. You get a pass/fail signal before merging.

The MCP Angle: When You Use Both Aider and Claude Code

Aider users who also work with Claude Code get an additional integration point: helpmetest mcp.

HelpMeTest ships an MCP server. When you connect it to Claude Code or Cursor, your AI assistant can write, run, and iterate on HelpMeTest tests directly inside your coding session. The assistant sees test results, reads failure output, and updates tests when behavior changes intentionally.

The workflow becomes:

  1. Aider makes a code change
  2. Claude Code (with HelpMeTest MCP) runs the behavioral test suite
  3. Failing tests surface in your editor context
  4. Claude Code fixes the regression, or updates the test if the behavior change was intentional

You're not switching tools. You're not opening a browser dashboard. The full QA loop runs inside your CLI-and-editor workflow.

To set it up:

helpmetest mcp

Then add the MCP server config to your Claude Code or Cursor settings. Your AI assistant will have access to helpmetest_run_test, helpmetest_status, helpmetest_upsert_test, and the rest of the tool surface.

Practical Workflow: Aider Changes Code, HelpMeTest Verifies Behavior

Here is how this looks in practice, end to end.

Step 1: Write your behavioral tests first (or after your first working version)

Before Aider starts making changes, define what "working" means in behavioral terms. Open the HelpMeTest dashboard or use the CLI and write scenarios for the flows that matter:

  • User can log in with valid credentials
  • User sees an error message with invalid credentials
  • Authenticated user can submit the form and sees a confirmation
  • Form submission fails gracefully when the API is unreachable

These don't need to be exhaustive on day one. Five tests covering your core flows are far more useful than none.

Step 2: Run Aider normally

aider src/auth/login.js src/components/LoginForm.jsx

Describe the change. Let Aider edit the files. Let it commit.

Step 3: Trigger HelpMeTest from Aider's /test hook

Aider supports a configurable test command. Add this to your .aider.conf.yml:

test-cmd: helpmetest run --tag auth --wait

Now when Aider runs tests — either because you typed /test or because you have --auto-test enabled — it runs your HelpMeTest suite against your deployed (or proxied) app.
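If you want the unit suite and the behavioral suite to run under the same /test, one option is to point test-cmd at a small wrapper instead. Below is a minimal Python sketch. It assumes pytest is your unit runner and reuses the helpmetest command from the config line above. The wrapper and its function names are hypothetical. It relies only on the fact that Aider treats a non-zero exit status from the test command as a failure.

```python
import subprocess
import sys

# Commands run in order; the second is the HelpMeTest invocation from the config.
COMMANDS = [
    ["pytest", "-q"],
    ["helpmetest", "run", "--tag", "auth", "--wait"],
]


def run_all(commands: list[list[str]]) -> int:
    """Run each command in turn; return the first non-zero exit code, else 0."""
    for command in commands:
        print("$ " + " ".join(command), flush=True)
        try:
            result = subprocess.run(command)
        except FileNotFoundError:
            # Mirror the shell convention for "command not found".
            print(f"command not found: {command[0]}", flush=True)
            return 127
        if result.returncode != 0:
            return result.returncode
    return 0


def main() -> None:
    # Exit with the combined status so the caller sees pass or fail.
    sys.exit(run_all(COMMANDS))
```

Saved as a script (say run_checks.py, a made-up name) with a final main() call, it could be wired in as test-cmd: python run_checks.py. Stopping at the first failure keeps the output Aider receives short and focused on one problem.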

If you're testing a local dev server, run the proxy first:

helpmetest proxy start localhost:3000

This gives HelpMeTest cloud runners access to your local instance. No firewall rules, no ngrok, no configuration. Run it once before your Aider session starts.
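A proxy pointed at a dev server that has not finished starting will only produce confusing failures, so it can help to confirm the port is accepting connections first. This is a small sketch, assuming the dev server listens on localhost:3000 as in the command above. The helper name wait_for_port is made up for illustration and is not part of the HelpMeTest CLI.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 10.0) -> bool:
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(0.2)
```

Calling wait_for_port("localhost", 3000) before starting the proxy gives a clear yes or no about the dev server, separate from any test failures.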

Step 4: Read the output

If tests pass, Aider proceeds. If the test command exits with a non-zero status, Aider adds the failure output to the chat. You can then ask it: "The login test is failing with this output. Fix it." It reads the error, looks for the regression, and edits the code.

You have just closed the loop. Aider writes code. HelpMeTest verifies behavior. Aider fixes regressions. The cycle completes in your terminal without touching a browser.

What HelpMeTest Catches That Aider Can't

To make this concrete, here are failure patterns that come up regularly when using AI coding assistants:

The invisible element. Aider refactors a component and accidentally removes a CSS class that controlled visibility. The element is still in the DOM. Unit tests pass. But users can't see the button. HelpMeTest's visual testing catches this.

The broken auth flow. Aider updates the login handler to use a new session format. The backend works. But the frontend still sends the old token structure. The app appears to log in but immediately redirects back to the login page. HelpMeTest catches this because it runs the full flow end-to-end.

The silent form failure. Aider adds input validation to a form field. The validation logic is correct, but it suppresses the error message on a specific browser. HelpMeTest runs cross-browser, so it surfaces the Firefox-only failure that you wouldn't catch testing in Chrome.

The regression from three changes ago. Aider has been making changes across three sessions. A change in session one interacted badly with a change in session three. No single change was obviously wrong. HelpMeTest's scheduled monitoring catches the failure the next morning before users hit it.

These are not edge cases. They are the normal failure modes of AI-assisted development.

Pricing and Getting Started

HelpMeTest has a free tier: 10 tests, no credit card, no time limit. That covers your core flows.

Pro is $100/month for unlimited tests, CI/CD integration, and team access.

No infrastructure to set up. No Docker, no Selenium Grid, no browser farm to maintain. You write the tests. The platform runs them.

To start:

npm install -g helpmetest
helpmetest login
helpmetest init

If you're testing a local Aider project:

helpmetest proxy start localhost:3000

Then open the dashboard and write your first test. Describe a flow — login, form submission, a search result page — in plain English. Run it. See it pass or fail against your actual app.

That's the layer Aider is missing. Not more code. Not more unit tests. A behavioral verification loop that runs against the app users actually experience.

Aider is fast. Make sure what it's building actually works.

