The Vibe Coding Pre-Ship Checklist: 12 Tests Before Every Deploy

AI coding tools write the happy path beautifully. This checklist covers the 12 categories they consistently skip — the ones that get you paged at 2am or show up in a bug report from a user who just saw someone else's data.

Key Takeaways

AI optimizes for the demo, not for edge cases. Claude, Cursor, and Copilot tend to write auth code that works for the primary flow and miss the resource-level checks that prevent user A from reading user B's data.

The most dangerous bugs are invisible in dev. Empty states, API failures, mobile keyboard interactions, and email link expiry all look fine locally and break in production for specific users.

Security gaps are systematic, not random. Input validation, open redirects, file upload validation, and rate limiting are absent from AI-generated code for the same reason every time — the prompt didn't mention them.

Run this checklist before every meaningful deploy. Not just for new features — any AI-generated change to auth, forms, payments, or file handling warrants a full pass.

Before You Merge That AI-Generated PR

The code looks right. The tests Claude wrote pass. You've eyeballed the diff and it makes sense. Time to ship?

Maybe. But "looks right" and "works correctly under adversarial conditions" are different standards. AI coding tools are excellent at the primary flow. They are consistently bad at edge cases, security boundaries, and failure scenarios — not because they're dumb, but because your prompt didn't include those scenarios, so they weren't generated.

Here are 12 things to verify before every meaningful deploy. Treat the list as a gate, not a suggestion.

1. Authentication Boundaries

Why AI misses this: AI generates auth middleware correctly — it'll add a requireLogin() check to your routes without you asking. What it misses is resource-level authorization. The login check confirms you're authenticated. It doesn't confirm you're authorized to touch this specific resource.

What to test: Log in as User A, create a record (post, order, profile, document — whatever your app creates). Copy the resource ID from the URL. Log out. Log in as User B. Paste that URL directly into the browser. Can User B see User A's data?

Then go further: try the API directly.

GET /api/orders/12345
Authorization: Bearer <user-b-token>

If this returns User A's order, you have an IDOR (insecure direct object reference) vulnerability. AI-generated code produces this bug often because it writes CRUD that works but doesn't scope every path to the current user.
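The fix is to scope every lookup to the authenticated user, not just to the ID in the URL. Here is a minimal Python sketch, assuming a hypothetical in-memory `ORDERS` store and an `owner_id` field; in a real app the same rule usually becomes a `WHERE owner_id = ?` clause. Returning "not found" rather than "forbidden" also avoids confirming that the ID exists.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Order:
    id: int
    owner_id: int
    total_cents: int


# Hypothetical in-memory store standing in for your database.
ORDERS = {
    12345: Order(id=12345, owner_id=1, total_cents=4999),  # belongs to User A (id 1)
}


def get_order_for_user(order_id: int, current_user_id: int) -> Optional[Order]:
    """Return the order only if it belongs to the authenticated user.

    The vulnerable version is `return ORDERS.get(order_id)`, which trusts the ID alone.
    """
    order = ORDERS.get(order_id)
    if order is None or order.owner_id != current_user_id:
        # Treat "exists but not yours" exactly like "doesn't exist" (respond 404),
        # so User B can't even confirm that order 12345 exists.
        return None
    return order


# User A can read their own order; User B gets nothing.
assert get_order_for_user(12345, current_user_id=1) is not None
assert get_order_for_user(12345, current_user_id=2) is None
```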

Failure mode if you skip it: Data breach. Users accidentally (or intentionally) access other users' private records. Depending on your domain, this is a support ticket, a compliance incident, or a news story.

2. Input Validation and Injection

Why AI misses this: AI writes validation for the inputs you describe. Ask it to build a search feature and it'll write the search, but it usually won't add SQL injection protection unless you explicitly ask. With ORMs it often parameterizes queries correctly by default, then switches to string interpolation the moment a query gets complex enough that the ORM feels limiting.

What to test: Put these strings into every user-facing input and watch what happens:

' OR '1'='1
<script>alert(1)</script>
../../../etc/passwd
; DROP TABLE users; --
${7*7}

For XSS specifically: submit <img src=x onerror=alert(document.cookie)> into any field that gets displayed back to users. If it executes, you're serving XSS.
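To see why parameterization matters, here is a small Python sketch using the standard-library `sqlite3` module and a throwaway `products` table (both stand-ins for your real stack). The `' OR '1'='1` payload from the list above returns every row when interpolated into the SQL string and matches nothing when passed as a bound parameter. The last lines use `html.escape` to show the kind of output escaping that should happen before user text reaches HTML.

```python
import html
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO products (name) VALUES (?)", [("lamp",), ("desk",)])

payload = "' OR '1'='1"

# Vulnerable: string interpolation lets the payload rewrite the WHERE clause.
unsafe_sql = f"SELECT name FROM products WHERE name = '{payload}'"
leaked = conn.execute(unsafe_sql).fetchall()  # every row comes back

# Safe: the placeholder sends the payload as data, never as SQL.
safe = conn.execute("SELECT name FROM products WHERE name = ?", (payload,)).fetchall()

# XSS: escape user content before inserting it into HTML.
comment = "<img src=x onerror=alert(document.cookie)>"
rendered = f"<p>{html.escape(comment)}</p>"  # the <img> tag is now inert text

print(len(leaked), len(safe))
print(rendered)
```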

Failure mode if you skip it: SQL injection can expose or delete your entire database. XSS lets attackers steal session cookies and impersonate users. These aren't theoretical — they're the most common vulnerabilities in web apps and among the easiest to exploit.

3. Error States and API Failures

Why AI misses this: AI models the happy path. When you prompt "add Stripe payment processing," it writes code that assumes Stripe responds successfully. The error handlers it generates are often placeholders (catch (e) { console.error(e) }) rather than real user-facing states.

What to test: Use your network devtools to block external API calls, or use a tool like mitmproxy to return 500s from your third-party integrations. For each integration:

  • What does the user see when Stripe returns a payment error?
  • What happens when SendGrid can't deliver an email?
  • What does the UI show when your own database is slow to respond?
  • What happens if an API call hangs indefinitely — does the UI ever tell the user something is wrong?
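One way to make sure the UI always gets a real answer is to wrap each third-party call in a deadline and translate every failure into a user-facing state. This is a minimal Python sketch using `concurrent.futures`; the `Outcome` type, `call_integration`, and the messages are illustrative placeholders. Most HTTP clients also accept a timeout directly, which is usually the simpler option when you control the request.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Outcome:
    ok: bool
    user_message: str
    value: Any = None


def call_integration(fn: Callable[[], Any], timeout_s: float) -> Outcome:
    """Run a third-party call with a hard deadline and map every failure
    to a message the UI can actually show."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn)
    try:
        return Outcome(ok=True, user_message="Done.", value=future.result(timeout=timeout_s))
    except FutureTimeout:
        return Outcome(ok=False, user_message="This is taking longer than expected. Please check back before retrying.")
    except Exception:
        # Log the real error server-side; never show the user a raw stack trace.
        return Outcome(ok=False, user_message="Something went wrong on our side. Please try again.")
    finally:
        # Don't block on a hung call; the worker thread is abandoned, not killed.
        pool.shutdown(wait=False)


def provider_down():
    raise ConnectionError("simulated 500 from the provider")


def provider_hangs():
    time.sleep(0.5)  # simulated request that doesn't answer in time


print(call_integration(provider_down, timeout_s=1.0).user_message)
print(call_integration(provider_hangs, timeout_s=0.1).user_message)
```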

Failure mode if you skip it: Silent failures. Users think their action succeeded (payment went through, email was sent, data was saved) when it didn't. You find out later through a support ticket or a data audit, not in real time.

4. Form Edge Cases

Why AI misses this: AI generates forms that work for typical inputs — ASCII names, standard email addresses, short text. It doesn't test what happens when a user's last name is O'Brien, their company is <script>Corp</script>, or they paste 50,000 characters into a text field.

What to test: For every form in your app, submit:

  • Empty (no fields filled)
  • Minimum and maximum field lengths (and beyond — what happens at 10,001 characters?)
  • Special characters: ', ", <, >, &, \, /
  • Unicode: 𝓱𝓮𝓵𝓵𝓸, emojis (🎉), Arabic/Chinese/Hebrew text (RTL rendering issues)
  • Email edge cases: test+tag@example.com, user@subdomain.domain.com, very.long.email.address@really-long-domain-name.com
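Most of these cases pass if the server normalizes input, enforces a length limit, and avoids over-strict patterns. A minimal Python sketch, assuming a 100-character name limit chosen only for illustration: it accepts O'Brien, accented and RTL names, and plus-addressed emails, while rejecting empty and oversized input. The email check is intentionally loose; the only real proof an address works is delivering a message to it.

```python
import re
import unicodedata

MAX_NAME_CHARS = 100  # assumed limit; whatever you pick, enforce it server-side

# Deliberately loose: one "@", no whitespace, and a dot somewhere in the domain.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def clean_name(raw: str) -> str:
    """Normalize and validate a display name without rejecting real names."""
    name = unicodedata.normalize("NFC", raw).strip()
    if not name:
        raise ValueError("Name is required.")
    if len(name) > MAX_NAME_CHARS:  # counts code points, not bytes or visible glyphs
        raise ValueError(f"Name must be at most {MAX_NAME_CHARS} characters.")
    # Apostrophes, accents, RTL scripts, and emoji are all allowed here;
    # escaping happens at output time, not by stripping characters on input.
    return name


def is_plausible_email(raw: str) -> bool:
    return EMAIL_RE.fullmatch(raw.strip()) is not None


print(clean_name("  O'Brien "))
print(is_plausible_email("test+tag@example.com"))
```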

For more on why this matters with AI-written code, see the related article on what AI-generated test data misses.

Failure mode if you skip it: Forms that break or corrupt data for real users who have non-ASCII names, use password managers, or simply type faster than you expected. Unicode edge cases are especially common in global apps.

5. Rate Limiting and Abuse Prevention

Why AI misses this: Rate limiting is never in the prompt. When you say "add a login form," you get a login form. You don't get express-rate-limit unless you ask for it. Same for signup, password reset, API endpoints, and any action that costs you money or could be abused.

What to test: Write a loop that hits your most sensitive endpoints repeatedly:

# Fire 100 concurrent failed logins; a working rate limiter should start answering 429.
for i in $(seq 1 100); do
  curl -s -o /dev/null -w '%{http_code}\n' \
    -X POST https://yourapp.com/api/auth/login \
    -H 'Content-Type: application/json' \
    -d '{"email":"test@test.com","password":"wrong"}' &
done
wait

Can you submit the signup form 50 times in a row? Can you trigger 1,000 password reset emails to the same address? Can you hit a paid API feature 10,000 times in a minute?
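If you're on Express, `express-rate-limit` covers this. To show the mechanism itself, here is a minimal fixed-window limiter in Python; the class name and limits are illustrative. Keying on both the target account and the client IP is a common choice, not the only one, and this in-memory version assumes a single server process.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """Allow at most `limit` attempts per key in each `window_s`-second window.

    In-memory only: with several server processes you'd need a shared store
    (e.g. Redis) so counts aren't split across instances. Old windows are
    never evicted in this sketch.
    """

    def __init__(self, limit: int, window_s: float, clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock  # injectable so tests can control time
        self._counts: Dict[Tuple[str, int], int] = defaultdict(int)

    def allow(self, key: str) -> bool:
        window = int(self.clock() // self.window_s)
        bucket = (key, window)
        if self._counts[bucket] >= self.limit:
            return False  # caller should respond 429 Too Many Requests
        self._counts[bucket] += 1
        return True


now = [0.0]
limiter = FixedWindowLimiter(limit=5, window_s=60, clock=lambda: now[0])
print([limiter.allow("login:test@test.com") for _ in range(7)])
# [True, True, True, True, True, False, False]
```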

Failure mode if you skip it: Credential stuffing attacks succeed because there's no lockout. Competitors scrape your API for free. Abuse campaigns inflate your SendGrid or Twilio bill to thousands of dollars in an afternoon.

6. File Uploads

Why AI misses this: AI writes file upload code that uploads files. It frequently skips content validation (it trusts the file extension or the client-supplied MIME type instead of checking the file's actual bytes), file size limits, and filename sanitization. The happy path, a user uploading a normal photo, works fine. The attack paths don't.

What to test:

  • Upload a file with a .jpg extension that's actually a PHP or Python script. Does your server execute it?
  • Upload a file named ../../../etc/passwd or ../../config/database.yml. Where does it end up?
  • Upload a 2GB file. What happens? Does it time out gracefully or crash the server?
  • Upload a file with no extension. Upload a file with a double extension: shell.php.jpg.
  • Check that uploaded files aren't served from a path that allows directory traversal.
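A minimal Python sketch of the checks above, under stated assumptions (JPEG and PNG only, a 10 MB cap, a hypothetical `/srv/uploads` directory): the type comes from the file's leading bytes, and the stored name is generated server-side so a traversal filename never touches the filesystem. Magic-byte checks catch mislabeled scripts but aren't a complete defense against crafted polyglot files, so still serve uploads from a non-executable location, and enforce the size limit before buffering the whole body (for example at the reverse proxy).

```python
import uuid
from pathlib import Path

MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # assumed 10 MB cap
UPLOAD_DIR = Path("/srv/uploads")  # hypothetical; outside the web root, not executable

# Leading "magic bytes" of the formats this sketch accepts.
SIGNATURES = {
    b"\xff\xd8\xff": ".jpg",
    b"\x89PNG\r\n\x1a\n": ".png",
}


def plan_upload(client_filename: str, data: bytes) -> Path:
    """Decide where an upload may be stored, or raise ValueError.

    `client_filename` (and any client-sent Content-Type) is deliberately
    ignored for the security decision.
    """
    if len(data) > MAX_UPLOAD_BYTES:
        raise ValueError("File too large.")
    for magic, ext in SIGNATURES.items():
        if data.startswith(magic):
            # Random server-side name: "../../etc/passwd" or "shell.php.jpg" never reaches disk.
            return UPLOAD_DIR / f"{uuid.uuid4().hex}{ext}"
    raise ValueError("Unsupported file type.")


print(plan_upload("../../../etc/passwd", b"\xff\xd8\xff\xe0" + b"\x00" * 16))
```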

Failure mode if you skip it: Stored XSS via uploaded HTML files. Remote code execution if uploaded scripts land in an executable directory. Server crashes from oversized uploads. Path traversal exposing files outside the upload directory.

7. Redirects and URL Manipulation

Why AI misses this: AI writes redirect logic that follows the parameter without validating where it points. A common pattern: after login, redirect to the ?next= parameter. AI generates this correctly for the happy path but misses that ?next=https://evil.com is a valid value it shouldn't follow.

What to test:

https://yourapp.com/login?next=https://evil.com
https://yourapp.com/login?next=//evil.com
https://yourapp.com/login?next=javascript:alert(1)
https://yourapp.com/redirect?url=https://phishing-site.com
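For the `?next=` cases listed above, an allowlist approach is one common defense. A minimal Python sketch, assuming post-login redirects only ever need to go to paths on your own site: anything with a scheme or host is rejected, backslashes are treated as slashes (as browsers do), and whitespace or control characters are refused because browsers may strip them and turn `/\t/evil.com` into `//evil.com`. The `safe_next` name and `/dashboard` fallback are illustrative.

```python
from typing import Optional
from urllib.parse import urlsplit


def safe_next(next_param: Optional[str], default: str = "/dashboard") -> str:
    """Return a same-site path to redirect to after login, or `default`."""
    if not next_param:
        return default
    if any(c.isspace() or ord(c) < 0x20 for c in next_param):
        return default  # browsers may silently drop these characters
    # Browsers treat "\" like "/", so "/\evil.com" can behave like "//evil.com".
    candidate = next_param.replace("\\", "/")
    parts = urlsplit(candidate)
    if parts.scheme or parts.netloc:
        return default  # https://evil.com, //evil.com, javascript:..., etc.
    if not candidate.startswith("/") or candidate.startswith("//"):
        return default  # only accept paths rooted at this site
    return candidate


for value in ["https://evil.com", "//evil.com", "javascript:alert(1)", "/settings?tab=billing"]:
    print(value, "->", safe_next(value))
```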

Also check URL parameters that influence file paths or template names:

https://yourapp.com/template?name=../../etc/passwd
https://yourapp.com/file?path=/etc/shadow
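For parameters like `?name=` or `?path=`, resolving the final path and confirming it stays inside an allowed directory is one common defense; when the set of valid templates is small, an explicit allowlist of names is simpler still. A Python sketch, assuming a hypothetical `/srv/app/templates` directory and Python 3.9+ for `Path.is_relative_to`:

```python
from pathlib import Path

TEMPLATE_DIR = Path("/srv/app/templates").resolve()  # hypothetical base directory


def resolve_inside(base: Path, user_value: str) -> Path:
    """Join a user-supplied name onto `base`, refusing anything that escapes it."""
    # resolve() collapses ".." segments and follows symlinks; an absolute
    # user_value such as "/etc/shadow" replaces `base` entirely when joined.
    candidate = (base / user_value).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError("Path escapes the allowed directory.")
    return candidate


print(resolve_inside(TEMPLATE_DIR, "emails/welcome.html"))
```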

Failure mode if you skip it: Open redirect vulnerabilities are used in phishing campaigns. Users see your legitimate domain in the URL, click the link, and land on a credential-harvesting page. This is a real attack pattern, not a theoretical one.

8. Multi-User Isolation

Why AI misses this: AI builds CRUD that works. It generates getCurrentUser() in the right places for the flows you describe. It rarely adds defensive data scoping everywhere — especially for background jobs, admin views, analytics queries, and secondary resources that weren't part of the original prompt.

What to test: Create two test accounts — User A and User B — and run through the app's core flows in parallel. Then check:

  • Does User A's dashboard show any of User B's data?
  • Can User A list User B's resources via API (GET /api/items — does it scope to the current user or return everything)?
  • Do shared resources (tags, categories, templates) correctly scope to the user's organization?
  • What do admin views show — all users' data, or scoped? Is that intentional?
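Keeping the owner filter inside the data-access function, rather than repeating it in each route, is one way to make the safe path the default for background jobs and secondary views too. A Python sketch with `sqlite3` and a hypothetical `items` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, owner_id INTEGER NOT NULL, title TEXT)")
conn.executemany(
    "INSERT INTO items (owner_id, title) VALUES (?, ?)",
    [(1, "A's draft"), (1, "A's notes"), (2, "B's invoice")],
)


def list_items(conn: sqlite3.Connection, current_user_id: int) -> list:
    """Backs GET /api/items: the owner filter lives in the query, so callers can't forget it."""
    rows = conn.execute(
        "SELECT id, title FROM items WHERE owner_id = ? ORDER BY id",
        (current_user_id,),
    )
    return rows.fetchall()


print(list_items(conn, 1))  # only User A's rows
print(list_items(conn, 2))  # only User B's rows
```

An admin view that genuinely needs every row should use a separate, explicitly named function, so "unscoped" is a visible decision rather than an accident.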

Failure mode if you skip it: Users see each other's private data. This breaks trust instantly and in regulated industries (healthcare, finance) triggers compliance violations. It's also one of the most common sources of "I can see someone else's account" support tickets.

9. Mobile and Responsive Behavior

Why AI misses this: AI generates layouts that look correct in the desktop browser where you're testing. It doesn't simulate iOS virtual keyboard behavior, touch target sizes, mobile viewport edge cases, or the fact that position: fixed elements behave differently on mobile Safari.

What to test: Pull up your app on an actual phone (not just a browser resize):

  • Open any modal or drawer. Does the virtual keyboard push it off-screen?
  • Try submitting a form on iOS. Does autocorrect or autocapitalize corrupt the input (emails, passwords)?
  • Tap the smallest interactive elements. Is the tap target actually 44x44px minimum?
  • Scroll a long page. Does anything flicker, freeze, or get stuck behind fixed headers?
  • Try the app in landscape orientation. Does any layout break?

Failure mode if you skip it: A significant percentage of your users are on mobile. Forms that fail on iOS affect real people. Tap targets that are too small create friction that shows up as abandonment, not as a clear bug report.

10. Email Flows

Why AI misses this: AI generates the email-sending code correctly. It doesn't verify that the emails actually arrive, that verification links expire correctly, that reset tokens are single-use, or that the links still work 24 hours later.

What to test: Go through each email flow end-to-end with a real email address:

  • Signup verification: Do you actually receive the email? Does the link work? What happens if you click it twice (should the second click reject)?
  • Password reset: Request two resets in a row — does the first link become invalid when the second is issued? What's the expiry? Does clicking an expired link give a clear error or a confusing 500?
  • Magic links: Are they single-use? Do they expire?
  • Transactional emails: Check spam folders. Check that the sender name and reply-to are correct.
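The token properties in this list (expiry, single use, older links invalidated by a new request) are straightforward to express in code. A minimal Python sketch, assuming a hypothetical in-memory store and a one-hour lifetime; storing only a SHA-256 hash of each token means a leaked database row can't be replayed as a working link.

```python
import hashlib
import secrets
import time
from typing import Dict, Optional, Tuple

TOKEN_TTL_S = 60 * 60  # assumed 1-hour expiry

# sha256(token) -> (user_id, expires_at). Hypothetical store; use a table in production.
_tokens: Dict[str, Tuple[int, float]] = {}


def _digest(token: str) -> str:
    return hashlib.sha256(token.encode()).hexdigest()


def issue_reset_token(user_id: int, now: Optional[float] = None) -> str:
    now = time.time() if now is None else now
    for digest, (owner, _) in list(_tokens.items()):
        if owner == user_id:
            del _tokens[digest]  # a new request invalidates any older link
    token = secrets.token_urlsafe(32)
    _tokens[_digest(token)] = (user_id, now + TOKEN_TTL_S)
    return token  # goes into the emailed link; only the hash is stored


def redeem_reset_token(token: str, now: Optional[float] = None) -> Optional[int]:
    """Return the user ID if the token is valid, else None."""
    now = time.time() if now is None else now
    entry = _tokens.pop(_digest(token), None)  # pop: the token is gone after one use
    if entry is None:
        return None
    user_id, expires_at = entry
    if now > expires_at:
        return None  # show "this link has expired", not a 500
    return user_id
```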

Failure mode if you skip it: Reset links that work indefinitely are a security issue. Verification links that silently fail mean users can't activate accounts. "I never got the email" is the most common support ticket in early-stage apps, and it's usually preventable.

11. Payment and Billing Edge Cases

Why AI misses this: Payment flows are complex state machines. AI writes the happy path — card accepted, subscription created, user upgraded. The edge cases are where real money gets lost or where fraud happens: concurrent payment attempts, webhook deduplication, trial expiry behavior, and declined card recovery.

What to test:

  • Use Stripe's test card 4000 0000 0000 0002 (always declined). Does the UI handle this gracefully?
  • Try submitting the payment form twice quickly (double-click). Are you charged twice?
  • Simulate a webhook being delivered twice (replay it in the Stripe dashboard). Is it idempotent?
  • Let a free trial expire. Does the user's access change at the right time, or do they retain access indefinitely?
  • Cancel a subscription mid-billing-period. Is access retained until the end of the period? Is the billing table updated?
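Replayed webhooks are harmless only if the handler is idempotent. A minimal Python sketch, assuming each event carries a unique `id` (Stripe's event objects do). It uses an in-memory set where a real app would use a database table with a unique constraint, written in the same transaction as the state change so two concurrent deliveries can't both succeed. The double-click case is the same idea on the outgoing side: Stripe accepts an idempotency key on API requests.

```python
from typing import Callable, Dict, Set

# Hypothetical; in production, a table with a UNIQUE constraint on event_id.
processed_event_ids: Set[str] = set()


def handle_webhook(event: Dict, apply: Callable[[Dict], None]) -> str:
    """Process each provider event at most once, keyed by its event ID."""
    event_id = event["id"]
    if event_id in processed_event_ids:
        return "duplicate ignored"  # still answer 2xx so the provider stops retrying
    apply(event)  # if this raises, the ID isn't recorded and a retry can succeed
    processed_event_ids.add(event_id)
    return "processed"


upgrades = []
evt = {"id": "evt_test_1", "type": "invoice.paid"}
print(handle_webhook(evt, upgrades.append))  # first delivery
print(handle_webhook(evt, upgrades.append))  # replayed delivery
print(len(upgrades))  # the upgrade was applied once
```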

Failure mode if you skip it: Duplicate charges. Users charged after canceling. Trial users with permanent free access. Webhook failures that leave subscription state out of sync. Payment bugs directly impact revenue and trust — they're the hardest category to fix after the fact.

12. The "What If I'm Logged Out" Test

Why AI misses this: AI builds the authenticated experience. The logged-out experience is often an afterthought — and it shows. Routes that should redirect to login instead return blank pages or raw data. Cached data from a previous session leaks into the logged-out state.

What to test: Do this sequence for every major page in your app:

  1. Log in and navigate to a protected page (dashboard, settings, a specific resource)
  2. Copy the URL
  3. Clear all cookies and local storage (or open a private window)
  4. Paste the URL and load it

What happens? You should see a login redirect with the URL preserved in ?next= so users land back where they started after authenticating. Instead, you often see: a blank page, an error, the page rendered with no data, or — worst case — data that shouldn't be visible to an unauthenticated user.

Also test: bookmark a resource URL while logged in, log out in another tab, click the bookmark. Does anything break or expose data?
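Whatever middleware your framework provides, the ordering is what matters: check the session before loading anything, redirect with the original path in `next`, and mark private responses as uncacheable so a bookmark or back button after logout doesn't replay them. A framework-neutral Python sketch, where `protected`, `dashboard`, and the `(status, headers, body)` tuple are illustrative stand-ins for your framework's types:

```python
from typing import Callable, Dict, Optional, Tuple
from urllib.parse import quote

Response = Tuple[int, Dict[str, str], str]


def protected(handler: Callable[[int], str]) -> Callable[[str, Optional[int]], Response]:
    """Wrap a page handler so the auth check runs before any data is loaded."""

    def wrapped(path: str, session_user_id: Optional[int]) -> Response:
        if session_user_id is None:
            # Preserve where the user was going; validate `next` again after login.
            location = "/login?next=" + quote(path, safe="")
            return 302, {"Location": location}, ""
        body = handler(session_user_id)
        # Keep private pages out of browser caches.
        return 200, {"Cache-Control": "no-store"}, body

    return wrapped


@protected
def dashboard(user_id: int) -> str:
    return f"<h1>Dashboard for user {user_id}</h1>"


print(dashboard("/dashboard?tab=billing", None))
print(dashboard("/dashboard", 7))
```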

Failure mode if you skip it: Users who get logged out (session expiry, clearing cookies, switching devices) hit broken experiences and assume your app is broken. Worse, if protected routes render with partial data before the auth check kicks in, you're leaking information to unauthenticated users.


Run This Every Time, Not Just at Launch

The instinct with AI-generated code is to trust it more than you should. It's fluent. It references the right libraries. It handles the obvious cases. But the failure modes above aren't random — they're predictable gaps in what AI tools generate by default, because they weren't in the original prompt.

This checklist takes about an hour, which is far cheaper than a data breach, a billing bug, or a degraded reputation with mobile users who can't use your forms.

If you want to run these as automated tests that execute before every deploy — without writing test code — HelpMeTest lets you define each scenario in plain English and run it continuously. Set it up once, and the checklist runs itself.

