Testing Best Practices: 15 Rules for Reliable Test Suites
Tests only pay dividends if you can trust them. Flaky tests, tests that pass when they shouldn't, and tests that take 30 minutes to run all erode that trust until teams stop running them. These 15 practices are about building tests worth running.
Key Takeaways
Tests should be deterministic. The same test, run 100 times on 100 machines, should always produce the same result. Non-determinism is the root cause of most flaky tests.
Test behavior, not implementation. Test what the code does, not how it does it. Tests that break when you refactor without changing behavior are a tax, not an asset.
One assertion per concept, not one assertion per test. Group related assertions together — test the whole user object, not just the ID. But put unrelated behaviors in separate tests.
Prefer integration tests for workflows. Unit tests are fast; integration tests give more confidence. Most bugs happen at the seams between units.
Delete bad tests. A test that consistently fails, passes unreliably, or tests nothing meaningful should be deleted, not skipped.
Writing tests is easy. Writing tests that remain useful six months later — as the codebase grows and the team changes — requires discipline. These practices come from patterns that consistently produce reliable test suites.
1. Write Tests That Describe Behavior, Not Code
Tests are documentation. The test name should describe observable behavior, not internal implementation.
Bad:
it('calls getUserById with correct params', () => { ... })
it('sets isLoading to true then false', () => { ... })
Good:
it('returns user data when user exists', () => { ... })
it('shows loading indicator during data fetch', () => { ... })
When a test fails, the name should tell you what broke from the user's perspective. "calls getUserById with correct params" failing tells you nothing useful. "returns user data when user exists" failing tells you exactly what to investigate.
2. Structure Tests with Arrange-Act-Assert
Every test should have three clearly separated phases.
it('applies discount for premium users', () => {
  // Arrange: set up state
  const user = createUser({ tier: 'premium' })
  const cart = createCart({ items: [{ price: 100 }] })

  // Act: execute the behavior under test
  const total = calculateTotal(cart, user)

  // Assert: verify the outcome
  expect(total).toBe(85) // 15% premium discount
})
Mixing these phases produces tests that are hard to read and harder to debug. When a test fails, you want to immediately understand what was set up, what happened, and what was wrong with the result.
3. One Test Per Behavior, Not Per Function
The goal is to cover behaviors, not achieve line coverage. A single function might have ten distinct behaviors. A single behavior might involve multiple functions.
Bad: testing internal structure
it('validateEmail calls String.prototype.includes', () => { ... })
Good: testing behaviors
describe('validateEmail', () => {
  it('returns true for valid email format', () => { ... })
  it('returns false for missing @ symbol', () => { ... })
  it('returns false for missing domain', () => { ... })
  it('returns false for empty string', () => { ... })
  it('handles unicode characters in local part', () => { ... })
})
4. Make Tests Deterministic
Non-deterministic tests are the primary source of flakiness. Common sources of non-determinism:
Time:
// BAD: depends on current time
it('formats timestamp correctly', () => {
  const result = formatTimestamp(new Date())
  expect(result).toBe('today at 10:30 AM') // breaks when time changes
})

// GOOD: use a fixed time
it('formats timestamp correctly', () => {
  const fixedDate = new Date('2026-01-15T10:30:00Z')
  const result = formatTimestamp(fixedDate)
  expect(result).toBe('Jan 15, 2026 at 10:30 AM')
})
Random data:
// OK: assert on a stable property, not the exact random value
it('generates unique IDs', () => {
  const id = generateId()
  expect(id).toMatch(/^[a-z0-9]{8}$/) // pattern check doesn't depend on the value
})

// Even better: fix the seed so the output is fully deterministic
it('generates consistent ID with seed', () => {
  jest.spyOn(Math, 'random').mockReturnValue(0.5)
  expect(generateId()).toBe('nnnnnnnn')
})
External services:
// BAD: test hits a real API
it('returns user count', async () => {
  const count = await db.getUserCount() // real database call
  expect(count).toBeGreaterThan(0)
})

// GOOD: mock the dependency
it('returns user count from database', async () => {
  db.query.mockResolvedValue([{ count: 42 }])
  const count = await getUserCount()
  expect(count).toBe(42)
})
5. Isolate Tests from Each Other
Tests must not share mutable state. When test A changes global state, test B inherits that state and produces unpredictable results.
// BAD: shared mutable state
const cache = new Map()

beforeAll(() => {
  cache.set('user:1', { id: 1, name: 'Alice' })
})

it('test A modifies cache', () => {
  cache.delete('user:1')
  // ...
})

it('test B expects cache to be intact', () => {
  expect(cache.has('user:1')).toBe(true) // fails because test A deleted it
})

// GOOD: each test creates its own state
function createTestCache() {
  const cache = new Map()
  cache.set('user:1', { id: 1, name: 'Alice' })
  return cache
}

it('test A modifies cache', () => {
  const cache = createTestCache()
  cache.delete('user:1')
  // ...
})

it('test B expects its own cache', () => {
  const cache = createTestCache()
  expect(cache.has('user:1')).toBe(true) // always passes
})
6. Use Factories, Not Fixtures
Test fixtures (static data files) become a maintenance burden. When the data shape changes, every test using that fixture needs updating. Factories are functions that produce test data with sensible defaults you can override.
// GOOD: factory with defaults
let nextUserId = 1

function createUser(overrides = {}) {
  return {
    id: nextUserId++, // sequential, not random — keeps the factory deterministic (rule 4)
    email: 'user@example.com',
    name: 'Test User',
    role: 'member',
    createdAt: new Date('2026-01-01'),
    ...overrides,
  }
}

// Each test specifies only what matters
it('rejects admin operations for non-admin users', () => {
  const user = createUser({ role: 'member' })
  expect(canPerformAdminAction(user)).toBe(false)
})

it('allows admin operations for admins', () => {
  const user = createUser({ role: 'admin' })
  expect(canPerformAdminAction(user)).toBe(true)
})
7. Write the Test Name Before the Test Code
This sounds trivial but changes what you write. When you write the test body first, you tend to write implementation-specific assertions. When you write the name first, you commit to describing a behavior and then write code to verify it.
it('sends welcome email to new users within 5 minutes of registration', ...)
Now write the test to verify exactly that. The name is your specification.
8. Test Edge Cases and Error Paths
The happy path is the last thing that fails in production. Tests for error paths, edge cases, and boundary conditions are what actually catch bugs before production.
For any function or feature, ask:
- What happens with empty input (null, "", [], {})?
- What happens at the boundary (0, 1, max value)?
- What happens when a dependency fails (network error, timeout, 500)?
- What happens with invalid input (wrong type, wrong format)?
- What happens with duplicate data?
describe('divideUsers', () => {
  it('divides users evenly', () => { ... })
  it('handles uneven division by assigning remainder to first group', () => { ... })
  it('handles single user', () => { ... })
  it('throws for empty array', () => { ... }) // edge case
  it('throws for zero groups', () => { ... }) // edge case
  it('throws for negative group count', () => { ... }) // invalid input
})
9. Don't Test Implementation Details
Testing implementation details means your tests break when you refactor — even if the external behavior is unchanged. Brittle tests like these are a leading reason teams abandon their test suites.
Implementation detail (don't test this):
- Private methods or internal functions
- Which specific library function was called
- Internal state of an object
- The order of operations inside a function
Behavior (test this instead):
- What does the function return?
- What side effects does it produce?
- What errors does it throw?
- What does the user see?
// BAD: tests implementation
it('calls cache.set after fetching from database', () => {
  getUser(1)
  expect(cache.set).toHaveBeenCalled() // brittle
})

// GOOD: tests behavior
it('returns cached user on second call', async () => {
  db.query.mockResolvedValue([{ id: 1, name: 'Alice' }])
  const first = await getUser(1)
  const second = await getUser(1)
  expect(db.query).toHaveBeenCalledTimes(1) // only fetched once
  expect(second).toEqual(first) // same data returned
})
10. Keep Tests Fast
Slow tests don't get run. A test suite that takes 30 minutes stops being part of the development workflow.
Guidelines:
- Unit tests: < 1 second per test
- Integration tests: < 5 seconds per test
- E2E tests: < 30 seconds per test
What makes tests slow and how to fix it:
| Slow pattern | Fix |
|---|---|
| Real HTTP calls | Mock with jest.fn() or MSW |
| Real database | Use in-memory DB or mock |
| Sleep/arbitrary delays | Use jest.useFakeTimers() |
| File system reads | Mock fs module |
| Spinning up a real browser | Use jsdom for unit tests; reserve Playwright/Cypress for E2E |
11. Use the Testing Pyramid
Different test types have different trade-offs. The testing pyramid says: lots of unit tests, fewer integration tests, fewer E2E tests.
     /E2E\          — few, slow, high confidence
    /-----\
   / Integ \        — moderate count, medium speed
  /---------\
 /   Unit    \      — many, fast, focused
/-------------\
Unit tests test individual functions in isolation. Fast, cheap, easy to debug. Limited confidence — they test the units individually but not how they work together.
Integration tests test multiple units working together. Database queries, service interactions, API endpoints. Slower, but catch a different class of bugs.
E2E tests test the whole system from the user's perspective. Browser automation that clicks through real flows. Slowest, but highest confidence. Reserve them for critical user journeys: signup, login, checkout, key workflows.
12. Test Your Tests with Mutation Testing
Mutation testing automatically introduces bugs (mutations) into your code and checks whether your tests catch them. If a mutation survives — the code is broken but tests pass — you have a gap.
Popular mutation testing tools:
- Stryker Mutator (JavaScript/TypeScript)
- Pitest (Java)
- mutmut (Python)
npx stryker run
A mutation score below 70% usually indicates tests that assert presence without asserting correctness.
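For Stryker, a minimal configuration might look like the following (field names are from Stryker's documented options, but check the docs for your version; the threshold values here are illustrative, chosen to match the 70% bar above):

```json
{
  "testRunner": "jest",
  "mutate": ["src/**/*.js"],
  "reporters": ["clear-text", "html"],
  "thresholds": { "high": 85, "low": 75, "break": 70 }
}
```

With "break": 70, the run fails outright when the mutation score drops below the bar, turning weak assertions into a visible CI failure rather than a silent gap.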
13. Keep the Test Suite Green in CI
A failing CI test is an emergency, not background noise. When tests are allowed to fail routinely, teams start ignoring the failures — and real bugs hide in the noise.
Rules:
- Never merge code that breaks tests
- Fix or delete flaky tests within one sprint of identifying them
- Don't skip tests unless there is a documented reason and a follow-up ticket
- Review test coverage in pull requests for significant features
# Fail CI fast — don't waste runner time
jobs:
  test:
    steps:
      - run: npm run test -- --bail # stop on first failure
14. Name Test Files Consistently
Consistent naming makes tests discoverable and predictable.
Recommended conventions:
src/
  utils/
    dateUtils.js
    dateUtils.test.js        # colocated with source
  services/
    userService.js
    userService.test.js

tests/                       # OR: separate test directory
  unit/
    utils/dateUtils.test.js
  integration/
    services/userService.test.js
  e2e/
    flows/login.test.js
Pick one. Colocated tests are easier to find; a separate tests/ directory provides cleaner separation.
15. Know When Not to Test
Not everything needs a test. Testing too much creates maintenance burden without proportional confidence gains.
Don't test:
- Third-party library behavior (test your code's use of the library)
- Framework internals
- Generated code (ORMs, GraphQL schema, etc.)
- Trivial getters/setters with no logic
- Configuration files
Do test:
- Business logic and domain rules
- Edge cases that have caused bugs before
- Complex algorithms
- Any code that handles user input or money
- Integration points between your system and external systems
- Critical user flows
The right test coverage is not 100% line coverage. It is the set of tests that would catch a significant change in behavior. If you can delete 30% of your tests and remain confident the important behaviors are covered, you had 30% waste.
Putting It Together
A healthy test suite is:
- Fast — runs in under 5 minutes locally
- Deterministic — passes consistently, not sometimes
- Meaningful — failures indicate real problems
- Maintainable — doesn't break when you refactor implementation details
These 15 practices are not theoretical — they come from patterns that repeatedly distinguish codebases where teams trust their tests from codebases where tests are an obstacle. Start with practices 1-6 (structure and isolation), and the others follow naturally.
For end-to-end browser testing specifically, consider AI-powered tools like HelpMeTest that handle the hardest part of E2E maintenance: keeping selectors up to date when your UI changes. Self-healing tests that adapt to UI changes follow the spirit of practice 9 (don't test implementation details) — if a button moves, the test should still work.