
Test Estimation Techniques: Story Points, Function Points, and How to Present Estimates to PMs

HelpMeTest

22 May 2026 — 7 min read

Test estimation is notoriously difficult because the effort required to test something depends on factors that are often unknown at estimation time — complexity, environment stability, defect rate, and whether the feature actually works as described. This guide covers four practical estimation techniques, when to apply each, and how to present estimates to product managers in a way that builds trust rather than creating conflict.

Key Takeaways

No estimation technique is perfectly accurate; the goal is to be calibrated, not precise
Three-point estimation (optimistic / most likely / pessimistic) gives PMs useful range information, not false precision
Story points work best when QA and development estimate together — separate estimation breaks the conversation
Function point analysis is useful for large, well-defined projects; too heavyweight for agile sprints
Always present estimates with explicit assumptions — a changed assumption invalidates the estimate, and PMs need to know that

Why Test Estimation Is Hard

Test estimation is harder than development estimation for a structural reason: developers estimate the work of building something known, while testers estimate the work of finding problems in something unknown.

The number of bugs in a feature is not predictable at estimation time. The stability of the test environment is not predictable. Whether the feature will be ready to test when the sprint schedule says it will — not predictable. A QA engineer estimating a testing effort is making assumptions about all of these unknowns simultaneously.

This does not mean estimation is impossible. It means estimates should carry explicit uncertainty, and the techniques used should be chosen to surface that uncertainty rather than paper over it.

Technique 1: Story Points for Testing

Story points are a relative estimation unit that measures effort, complexity, and uncertainty together. They originated in development estimation but apply equally well to testing.

How It Works

Story points assign a number (typically from the Fibonacci sequence: 1, 2, 3, 5, 8, 13, 21) to each testing task. The number is relative to a reference task. If "testing a CRUD form with five fields and no integrations" is a 3, then "testing a multi-step checkout flow with payment integration" might be a 13.

The advantage of story points over hours is that they decouple the size of the work from calendar time. A 13-point task takes a senior QA engineer less calendar time than a junior one, but its relative size is the same for both. This keeps velocity calculations more stable over time.
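As a minimal sketch of the velocity calculation mentioned above, the following assumes a team tracks completed QA story points per sprint. The function names and the sample history are hypothetical and only illustrate the arithmetic.

```python
import math
from statistics import mean


def average_velocity(completed_points_per_sprint):
    """Mean story points completed per sprint."""
    return mean(completed_points_per_sprint)


def sprints_needed(backlog_points, velocity):
    """Sprints required to finish a backlog at the given velocity, rounded up."""
    return math.ceil(backlog_points / velocity)


# Hypothetical history: QA points completed in the last four sprints.
history = [21, 18, 24, 21]
velocity = average_velocity(history)

# A hypothetical 55-point testing backlog at that velocity.
print(velocity, sprints_needed(55, velocity))
```

Because the inputs are points rather than hours, the same history stays usable when the people doing the work change.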

Applying Story Points to Testing

Define your reference point first. Pick a testing task the team has done before and agrees is "medium" complexity. Assign it a baseline point value (3 or 5 works well).

Factors that increase story point estimates for testing:

Multiple integrations to verify
New feature with no existing test infrastructure
Unclear or incomplete requirements
Known unstable test environment
Feature involves third-party services (payment processors, external APIs)
Multiple user roles with different access patterns
Mobile and desktop both need testing
No existing regression suite to build on

Factors that decrease story point estimates:

Feature is a well-understood variation of something tested before
Strong unit test coverage already exists
Stable test environment with good test data
Single user role, single platform

QA Story Points in Planning Poker

The most effective use of story points for testing is to include QA in planning poker alongside developers. When QA estimates at the same time as development, two things happen:

The development estimate and QA estimate can be compared — a large gap prompts discussion about why
QA clarifying questions during estimation (what about mobile? what about the payment failure path?) improve requirement quality before work begins

This is valuable enough that many teams run planning poker with both a development estimate and a separate QA estimate on each ticket.
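One way to make the "large gap prompts discussion" rule concrete is to measure how many steps apart the two estimates sit on the point scale. This is only a sketch: the function names and the one-step threshold are assumptions, not an established planning poker rule.

```python
# The point scale used in the examples above.
FIBONACCI_SCALE = [1, 2, 3, 5, 8, 13, 21]


def scale_gap(dev_points, qa_points):
    """Number of steps between two estimates on the point scale."""
    return abs(FIBONACCI_SCALE.index(dev_points) - FIBONACCI_SCALE.index(qa_points))


def needs_discussion(dev_points, qa_points, max_gap=1):
    """Flag a ticket when the estimates differ by more than max_gap steps.

    max_gap=1 is an assumed threshold; each team should pick its own.
    """
    return scale_gap(dev_points, qa_points) > max_gap


# Development sees a 3, QA sees a 13: three steps apart, worth talking about.
print(needs_discussion(3, 13))
# Development sees a 5, QA sees an 8: adjacent values, no flag.
print(needs_discussion(5, 8))
```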

Technique 2: Function Point Analysis

Function point analysis (FPA) was developed by Allan Albrecht at IBM in the late 1970s and measures software size in terms of functionality delivered to the user. It is more rigorous than story points and produces estimates that are more consistent across teams, but it requires more setup and is better suited to larger, well-defined projects.

What Gets Counted

Function points count five types of functional components:

Component	Description	Example
External Input (EI)	User inputs that modify internal data	Form submission, file upload
External Output (EO)	Outputs generated for the user	Reports, exported files
External Inquiry (EQ)	Input/output pairs with no data modification	Search results, read-only queries
Internal Logical File (ILF)	Data maintained by the application	User table, order table
External Interface File (EIF)	Data maintained by external applications	Payment processor records, external API

Each component is rated as simple, average, or complex, and weighted accordingly. The total function point count is used to estimate testing effort.
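The weighting step can be sketched as follows. The weights are the commonly cited IFPUG values for an unadjusted function point count; the component list for a checkout feature is hypothetical and exists only to show the calculation.

```python
# Commonly cited IFPUG weights per component type and complexity rating.
WEIGHTS = {
    "EI": {"simple": 3, "average": 4, "complex": 6},
    "EO": {"simple": 4, "average": 5, "complex": 7},
    "EQ": {"simple": 3, "average": 4, "complex": 6},
    "ILF": {"simple": 7, "average": 10, "complex": 15},
    "EIF": {"simple": 5, "average": 7, "complex": 10},
}


def unadjusted_function_points(components):
    """Sum weight * count over (component_type, rating, count) tuples."""
    return sum(WEIGHTS[kind][rating] * count for kind, rating, count in components)


# Hypothetical inventory for a checkout feature.
checkout = [
    ("EI", "average", 6),   # 6 * 4  = 24
    ("EO", "simple", 2),    # 2 * 4  = 8
    ("EQ", "average", 4),   # 4 * 4  = 16
    ("ILF", "complex", 2),  # 2 * 15 = 30
    ("EIF", "average", 1),  # 1 * 7  = 7
]

print(unadjusted_function_points(checkout))  # 85
```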

FPA Testing Effort Formula

A common rule of thumb (from industry benchmarks) is that testing effort runs at 25–40% of development effort. If development is estimated at 100 function points with an average of 8 hours per function point (800 hours), testing effort is approximately 200–320 hours.

More granular benchmarks by function point type are available from organizations like ISBSG and IFPUG, though these require access to historical project data to be useful.

When to Use FPA

FPA is most useful when:

The project scope is large and well-defined (a complete system or major subsystem)
You need to compare testing effort across projects consistently
You are providing a fixed-price bid for QA services
Historical function point data is available for calibration

It is not practical for individual sprint tasks or exploratory testing estimation.

Technique 3: Analogy-Based Estimation

Analogy-based estimation uses historical data from similar past tasks to estimate current tasks. It is the most intuitive technique and the one most experienced QA engineers use implicitly, even when they call it something else.

How It Works

Identify the task to be estimated
Find one or more completed tasks that are similar
Note the similarities and differences
Adjust the historical effort up or down based on the differences
Produce the estimate

Example:

"Last sprint we tested the user profile settings page: four fields, one image upload, no integrations. That took 6 hours including regression. This sprint we're testing the company settings page: eight fields, a logo upload, and an SSO integration. The SSO integration is new territory for us. I'd estimate 15–17 hours: doubling the fields roughly doubles the basic test effort to about 12 hours, plus a 3–5 hour buffer for the SSO integration complexity."
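The adjustment in this example can be written out as a small calculation. Taking the stated inputs at face value (a 6-hour reference task, roughly double the size, plus a 3 to 5 hour buffer), the arithmetic comes to 15 to 17 hours. The function name and parameters are hypothetical.

```python
def analogy_estimate(reference_hours, size_ratio, extra_low=0, extra_high=0):
    """Scale a past task's actual hours, then add a buffer range for new risk."""
    base = reference_hours * size_ratio
    return base + extra_low, base + extra_high


# Reference: profile settings page took 6 hours.
# New task: about twice the fields, plus 3 to 5 hours for the unfamiliar SSO work.
low, high = analogy_estimate(6, 2, extra_low=3, extra_high=5)
print(f"{low} to {high} hours")  # 15 to 17 hours
```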

Building Your Reference Library

Analogy-based estimation becomes more accurate as you accumulate a reference library of past estimates and actual outcomes. After each sprint, record:

What was tested
Estimated effort
Actual effort
Key factors that caused deviation

A simple spreadsheet works. After 6–12 sprints, this data becomes the most reliable estimating resource a team has.
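If the spreadsheet is exported as CSV, the four recorded fields map onto a small record type. This is a sketch only: the column names, the sample rows, and the `overrun_ratio` helper are all hypothetical.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class EstimateRecord:
    task: str
    estimated_hours: float
    actual_hours: float
    deviation_cause: str

    @property
    def overrun_ratio(self) -> float:
        """Actual divided by estimated; 1.0 means the estimate was exact."""
        return self.actual_hours / self.estimated_hours


# Hypothetical export of the reference library.
SAMPLE_CSV = """task,estimated_hours,actual_hours,deviation_cause
Profile settings page,6,6,none
Password reset flow,8,12,email sandbox outage
"""


def load_records(csv_text):
    """Parse CSV text into EstimateRecord objects."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        EstimateRecord(
            task=row["task"],
            estimated_hours=float(row["estimated_hours"]),
            actual_hours=float(row["actual_hours"]),
            deviation_cause=row["deviation_cause"],
        )
        for row in reader
    ]


records = load_records(SAMPLE_CSV)
for record in records:
    print(record.task, record.overrun_ratio, record.deviation_cause)
```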

Technique 4: Three-Point Estimation

Three-point estimation, the estimating approach used in PERT (Program Evaluation and Review Technique), addresses the problem of uncertainty by explicitly encoding it in the estimate. Instead of a single number, you provide three:

O (Optimistic): the best case, assuming everything goes smoothly
M (Most Likely): the realistic estimate, assuming typical conditions
P (Pessimistic): the worst case, accounting for plausible obstacles

The weighted average formula is:

E = (O + 4M + P) / 6

The standard deviation is:

SD = (P - O) / 6

Worked Example

Testing a new checkout flow:

Optimistic (O): 12 hours — environment stable, feature works as specified, no blockers
Most Likely (M): 20 hours — one or two integration issues, some exploratory time on edge cases
Pessimistic (P): 36 hours — test environment instability, payment sandbox issues, significant defects requiring retest cycles

E = (12 + 4×20 + 36) / 6 = (12 + 80 + 36) / 6 = 128 / 6 ≈ 21.3 hours

SD = (36 - 12) / 6 = 4 hours

So you'd present this as: "Estimated 21 hours, ±4 hours, assuming [specific conditions]."

This is dramatically more honest than saying "20 hours" as if that number were precise.
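The two formulas are simple enough to keep in a helper. The sketch below reproduces the checkout numbers from the worked example; the function name is illustrative.

```python
def pert_estimate(optimistic, most_likely, pessimistic):
    """Return (expected, standard_deviation) using the PERT weighting."""
    expected = (optimistic + 4 * most_likely + pessimistic) / 6
    std_dev = (pessimistic - optimistic) / 6
    return expected, std_dev


# Checkout flow: O = 12, M = 20, P = 36 hours.
expected, std_dev = pert_estimate(12, 20, 36)
print(f"{expected:.1f} hours, ±{std_dev:.0f} hours")  # 21.3 hours, ±4 hours
```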

How to Present Estimates to Product Managers

The estimate is only half of the conversation. How you present it determines whether PMs trust your estimates or negotiate against them reflexively.

Lead With Assumptions

Never present a number without its assumptions. "12 hours to test the checkout flow" invites negotiation. "12 hours to test the checkout flow, assuming the staging environment is stable, test payment cards are set up, and the feature enters testing in the first half of the sprint" is a professional estimate with auditable conditions.

If an assumption is violated — the feature is not ready until day 4 of a 5-day sprint — the estimate changes. PMs need to understand that the estimate and its assumptions are a package deal.

Use Ranges, Not Point Estimates

Point estimates communicate false precision. A QA engineer who says "18 hours" sounds less uncertain than one who says "16–22 hours," but the 18-hour estimate is probably less accurate.

Present three-point estimates or ranges. "Between 16 and 24 hours depending on environment stability" is an honest representation of reality. PMs who plan around point estimates are setting themselves up for schedule failures — help them plan with ranges.

Explain the Risk Distribution

Using three-point estimation, you can tell PMs not just what the likely outcome is but roughly what the probability distribution looks like. For the checkout example above, treating the result as approximately normal: "We expect about 21 hours, but there is roughly a 1-in-6 chance this runs past 25 hours if the payment sandbox has issues; we had problems with it in March."

This kind of communication lets PMs make informed decisions about schedule buffers rather than having them imposed after the fact.
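A rough way to put numbers on the risk is to treat the three-point result as a normal distribution. That is an assumption made here for simplicity: real testing effort tends to be skewed toward overruns, so the true tail risk is probably higher than this sketch suggests. The inputs are the checkout figures from the worked example (expected about 21.3 hours, standard deviation 4).

```python
from statistics import NormalDist


def overrun_probability(expected, std_dev, threshold):
    """Chance of exceeding threshold hours under a normal approximation."""
    return 1 - NormalDist(mu=expected, sigma=std_dev).cdf(threshold)


expected, std_dev = 128 / 6, 4.0

# One standard deviation above the expected value: roughly a 16% chance.
print(f"{overrun_probability(expected, std_dev, expected + std_dev):.1%}")

# 30 hours is more than two standard deviations out: under 2% on this model.
print(f"{overrun_probability(expected, std_dev, 30):.1%}")
```

When the reference library shows that overruns are common, the historical overrun rate is a better basis for the percentage quoted to a PM than the normal approximation.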

Separate Testing Effort from Testing Time

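Effort (hours of work) and time (calendar days) are different. 20 hours of testing effort from one engineer, who also has meetings and other sprint work, typically takes 3–4 calendar days; the same 20 hours split across 4 people is about 5 hours each and can fit into 1–2 days. Clarify which you are reporting to avoid confusion.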

Also clarify: does your estimate include defect retest cycles? Bug filing time? Test planning? Documentation? Scope the estimate explicitly so PMs are not surprised when "20 hours of testing" turns into 30 hours of total QA activity.
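The conversion from effort to calendar time can be sketched as below. It assumes the work splits evenly across testers with no coordination overhead, which is rarely fully true, and the focused-hours figures are hypothetical.

```python
import math


def calendar_days(effort_hours, testers, focused_hours_per_day):
    """Calendar days needed, assuming the work divides evenly across testers."""
    daily_capacity = testers * focused_hours_per_day
    return math.ceil(effort_hours / daily_capacity)


# One tester with about 6 focused testing hours a day.
print(calendar_days(20, testers=1, focused_hours_per_day=6))  # 4

# Four testers who can each give about 5 hours a day.
print(calendar_days(20, testers=4, focused_hours_per_day=5))  # 1
```

Stating the focused-hours assumption alongside the result keeps the calendar figure as auditable as the effort figure.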

Improving Estimation Accuracy Over Time

The only reliable path to better estimates is feedback loops. After each sprint or release:

Compare estimated effort to actual effort for each test area
Identify the factors that caused the largest deviations
Update your reference library with new data
Adjust your estimation heuristics for next time

Teams that do this consistently will find their estimates converge toward accuracy over 6–12 months. Teams that skip retrospectives on estimation stay at the same accuracy level indefinitely.
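One simple way to check whether estimates are actually converging is to track the mean absolute percentage error per sprint. The metric choice and the sample figures below are assumptions for illustration, not data from a real team.

```python
def mean_absolute_percentage_error(pairs):
    """Average of |actual - estimate| / actual over (estimate, actual) pairs."""
    return sum(abs(actual - estimate) / actual for estimate, actual in pairs) / len(pairs)


# Hypothetical (estimated_hours, actual_hours) pairs from two sprints.
sprint_1 = [(10, 16), (8, 12)]
sprint_6 = [(10, 11), (8, 8)]

early_error = mean_absolute_percentage_error(sprint_1)
later_error = mean_absolute_percentage_error(sprint_6)

print(f"sprint 1: {early_error:.0%}, sprint 6: {later_error:.0%}")
```

A falling error over several sprints suggests the feedback loop is working; a flat one suggests the deviations are not being fed back into the next round of estimates.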

How Continuous Testing Changes the Estimation Equation

One underappreciated benefit of continuous test automation is that it changes what you need to estimate. When critical path tests run automatically on every deploy via a tool like HelpMeTest, you are no longer estimating "regression testing" as a sprint task — it is handled automatically.

This means QA estimation in sprints becomes smaller and more focused: you are estimating only the new test coverage needed for new features, not the maintenance of coverage for existing ones. Over time, as more of the critical path is automated and monitored continuously, sprint QA estimates shrink, velocity improves, and the estimation conversations with PMs become more predictable.

The goal is not to make estimation unnecessary — it is to reduce the amount of testing that needs to be estimated at all.
