TDD for Legacy Code: Practical Strategies That Actually Work
TDD is easy to practice on a greenfield project. You start from nothing, write the test, write the code, and the cycle works exactly as described in the textbooks. Legacy code is different. The code already exists. It was written without tests. It has dependencies that are hard to isolate, global state, functions that do five things at once, and business logic embedded in places it has no business being.
Michael Feathers defined legacy code bluntly: "Code without tests." By that definition, most production codebases are legacy codebases. Here is how to apply TDD discipline to them without tearing everything down first.
The Core Problem: Tight Coupling
The reason legacy code is hard to test is almost always the same reason it is hard to change: tight coupling. A function that directly instantiates its database connection, sends emails, and logs to disk cannot be tested without a real database, a mail server, and a filesystem. That is not a testing problem — it is a design problem that the tests reveal.
TDD does not solve tight coupling directly. It gives you the discipline to detect coupling and the tests you need to safely undo it.
Strategy 1: Characterization Tests
Before you refactor anything, you need to understand what the code actually does. Characterization tests document existing behavior — not what the code should do, but what it does do, including quirks and bugs.
Suppose you have this function in a Python billing module:
    def calculate_invoice_total(customer_id, month, year):
        db = DatabaseConnection()
        # Legacy query as found: values are interpolated straight into the SQL
        orders = db.query(
            f"SELECT * FROM orders WHERE customer_id={customer_id} "
            f"AND MONTH(created_at)={month} AND YEAR(created_at)={year}"
        )
        total = 0
        for order in orders:
            total += order['amount']
            if order['type'] == 'subscription':
                total -= order['amount'] * 0.1  # 10% subscription discount
        tax = total * 0.085  # 8.5% tax on the discounted total
        return total + tax

You cannot unit test this because it needs a real database. Write a characterization test at the integration level first:
    import pytest

    def test_invoice_total_for_known_customer():
        # Customer 42 had orders in Jan 2024 that we verified manually
        # Total was $847.23 including tax
        result = calculate_invoice_total(42, 1, 2024)
        assert result == pytest.approx(847.23, abs=0.01)  # pin to the cent

This test does not verify correctness. It pins the current behavior. If you change the function and it starts returning $800, you know you broke something. Characterization tests are your safety net before you start refactoring.
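Pinning quirks works the same way at a smaller scale. The sketch below is illustrative only: `legacy_line_total` is a hypothetical stand-in that mirrors the loop body of the billing function, and the "quirk" cases are assumptions chosen to show the idea. The expected values are recorded by running the code, not taken from a spec.

```python
import math


def legacy_line_total(order):
    """Hypothetical stand-in mirroring the loop body of the legacy function."""
    amount = order["amount"]
    if order["type"] == "subscription":
        return amount - amount * 0.1
    return amount


# Observed outputs, recorded by running the code rather than derived from a spec.
PINNED_CASES = [
    ({"amount": 100.00, "type": "subscription"}, 90.00),
    ({"amount": 50.00, "type": "one-time"}, 50.00),
    # Quirk: the type check is case-sensitive, so this row gets no discount.
    ({"amount": 100.00, "type": "Subscription"}, 100.00),
    # Quirk: negative amounts are discounted like any other subscription row.
    ({"amount": -100.00, "type": "subscription"}, -90.00),
]


def test_line_total_matches_pinned_behavior():
    for order, observed in PINNED_CASES:
        assert math.isclose(legacy_line_total(order), observed, abs_tol=0.005)
```

Whether the case-sensitive check is a bug is a separate question. The characterization test only guarantees that a refactor does not change it by accident.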
Strategy 2: Finding Seams
A seam is a place where you can alter what the code does without editing the code at that place. Michael Feathers introduced this concept in "Working Effectively with Legacy Code," and it is the most useful idea in legacy code testing.
The database connection in calculate_invoice_total is a candidate seam: if you can inject a different connection, you can test without a real database.
Open the seam by turning the dependency into a parameter:
    def calculate_invoice_total(customer_id, month, year, db=None):
        if db is None:
            db = DatabaseConnection()
        orders = db.query(...)
        # rest of the function unchanged

This is a tiny change: a default parameter. It does not change the behavior for any existing caller, but it opens a seam for testing:
    def test_invoice_total_subscription_discount():
        # MockDatabase is a test double whose query() returns these rows
        mock_db = MockDatabase([
            {"amount": 100.00, "type": "subscription"},
            {"amount": 50.00, "type": "one-time"},
        ])
        result = calculate_invoice_total(1, 1, 2024, db=mock_db)
        # subscription: 100 - 10% = 90
        # one-time: 50
        # subtotal: 140, tax: 140 * 0.085 = 11.90
        # total: 151.90
        assert result == pytest.approx(151.90, rel=1e-2)

You found the seam, opened it with a minimal change, and now you can test the business logic in isolation.
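The test above relies on a MockDatabase that is never defined. One minimal way to write it is sketched here, assuming the only method the function calls is `query(sql)`. The class body and the `queries` attribute are illustrative assumptions, and the function is a condensed copy with `db` made required so the example runs on its own.

```python
import math


class MockDatabase:
    """Hypothetical test double: returns canned rows and records each query."""

    def __init__(self, rows):
        self._rows = list(rows)
        self.queries = []  # SQL strings received, in call order

    def query(self, sql):
        self.queries.append(sql)
        return list(self._rows)


def calculate_invoice_total(customer_id, month, year, db):
    """Condensed copy of the legacy function; db is required in this sketch."""
    orders = db.query(
        f"SELECT * FROM orders WHERE customer_id={customer_id} "
        f"AND MONTH(created_at)={month} AND YEAR(created_at)={year}"
    )
    total = 0
    for order in orders:
        total += order["amount"]
        if order["type"] == "subscription":
            total -= order["amount"] * 0.1
    return total + total * 0.085


def test_invoice_total_with_mock_database():
    mock_db = MockDatabase([
        {"amount": 100.00, "type": "subscription"},
        {"amount": 50.00, "type": "one-time"},
    ])
    result = calculate_invoice_total(1, 1, 2024, db=mock_db)
    assert math.isclose(result, 151.90, abs_tol=0.005)
    # The double also lets the test check what was asked of the database.
    assert len(mock_db.queries) == 1
    assert "customer_id=1" in mock_db.queries[0]
```

Recording the queries is optional, but it gives the test a way to notice if a refactor silently changes what the function asks the database for.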
Strategy 3: Extract and Override
When you cannot inject dependencies through parameters, extract them into methods and override them in a test subclass. This is more invasive but works when you cannot change function signatures.
    class InvoiceCalculator:
        def get_orders(self, customer_id, month, year):
            db = DatabaseConnection()
            return db.query(...)  # hard to test

        def calculate_total(self, customer_id, month, year):
            orders = self.get_orders(customer_id, month, year)
            # business logic here

In your test:
    class TestableInvoiceCalculator(InvoiceCalculator):
        def __init__(self, mock_orders):
            self._mock_orders = mock_orders

        def get_orders(self, customer_id, month, year):
            # Override the extracted method so no database is touched
            return self._mock_orders

    def test_subscription_discount_applied():
        calculator = TestableInvoiceCalculator([
            {"amount": 100.00, "type": "subscription"},
        ])
        result = calculator.calculate_total(1, 1, 2024)
        assert result == pytest.approx(97.65)  # 90 + 8.5% tax (7.65)

This is not ideal design, but it gets tests in place without touching the production code path.
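If writing a test subclass feels heavy, the standard library offers another way to override the extracted method. The sketch below uses `unittest.mock.patch.object` to replace `get_orders` for the duration of one test. It is an alternative to the subclass, not the approach shown above. The `InvoiceCalculator` here is a condensed copy, and its business logic is assumed to match the earlier billing function.

```python
import math
from unittest.mock import patch


class InvoiceCalculator:
    """Condensed copy of the class above so the sketch runs on its own."""

    def get_orders(self, customer_id, month, year):
        raise RuntimeError("would hit the real database")

    def calculate_total(self, customer_id, month, year):
        orders = self.get_orders(customer_id, month, year)
        total = 0
        for order in orders:
            total += order["amount"]
            if order["type"] == "subscription":
                total -= order["amount"] * 0.1
        return total + total * 0.085


def test_subscription_discount_with_patch_object():
    canned = [{"amount": 100.00, "type": "subscription"}]
    # Replace get_orders on the class only while the with-block is active.
    with patch.object(InvoiceCalculator, "get_orders", return_value=canned) as fake:
        result = InvoiceCalculator().calculate_total(1, 1, 2024)
    assert math.isclose(result, 97.65, abs_tol=0.005)
    fake.assert_called_once_with(1, 1, 2024)
```

The trade-off is that the patch names the method as a string, so a rename will not be caught until the test runs. The subclass fails more visibly.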
Strategy 4: The Sprout Method
When you need to add new functionality to a legacy function, do not modify the existing untestable code. Sprout a new function, write tests for it first, then call it from the legacy function.
Suppose you need to add a late fee to overdue invoices:
Do not do this:
    def calculate_invoice_total(customer_id, month, year):
        # ...existing untestable code...
        if is_overdue(customer_id):
            total += total * 0.05  # late fee added inline
        return total + tax

Do this instead:
    # New tested function
    def apply_late_fee(total: float, is_overdue: bool, rate: float = 0.05) -> float:
        """Apply a late fee to overdue invoices."""
        if not is_overdue:
            return total
        return total * (1 + rate)

Write the tests first:
    import pytest

    def test_late_fee_applied_when_overdue():
        assert apply_late_fee(100.00, is_overdue=True) == pytest.approx(105.00)

    def test_no_late_fee_when_current():
        assert apply_late_fee(100.00, is_overdue=False) == pytest.approx(100.00)

    def test_custom_late_fee_rate():
        # Exact == fails here: 100.00 * (1 + 0.10) is 110.00000000000001 in floating point
        assert apply_late_fee(100.00, is_overdue=True, rate=0.10) == pytest.approx(110.00)

Make them pass. Then call apply_late_fee from the legacy function. The new code is fully tested. The legacy code is untouched apart from that one call. You have not broken anything that was working.
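The call site in the legacy function can be as small as one line. In the sketch below, the fixed `total` and the `is_overdue` lookup are hypothetical stand-ins for the existing untestable code, and the fee is applied before tax is added, as in the inline version above.

```python
import math


def apply_late_fee(total: float, is_overdue: bool, rate: float = 0.05) -> float:
    """The sprouted function, repeated so this sketch is self-contained."""
    if not is_overdue:
        return total
    return total * (1 + rate)


def is_overdue(customer_id):
    """Hypothetical stand-in; the real overdue lookup is not shown in this post."""
    return customer_id == 42


def calculate_invoice_total(customer_id, month, year):
    # ...existing untestable code would compute these; fixed values stand in...
    total = 140.00
    tax = total * 0.085
    # The only new line in the legacy function: delegate to the tested sprout.
    total = apply_late_fee(total, is_overdue(customer_id))
    return total + tax


def test_overdue_customer_pays_late_fee():
    assert math.isclose(calculate_invoice_total(42, 1, 2024), 158.90, abs_tol=0.005)


def test_current_customer_pays_no_late_fee():
    assert math.isclose(calculate_invoice_total(7, 1, 2024), 151.90, abs_tol=0.005)
```

The decision logic lives in `apply_late_fee`, where it is covered by unit tests. The legacy function only gains a delegation that is hard to get wrong.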
Strategy 5: TDD at the Integration Level First
When unit testing is impossible because of extreme coupling, start with integration tests and work inward. Use a tool like HelpMeTest or Playwright to test the user-facing behavior of a feature before you refactor its internals.
    Given a customer with overdue invoices
    When I generate their invoice for January
    Then the invoice shows a 5% late fee

This end-to-end test will pass when the feature works correctly. Now you can refactor the internal code with confidence: the integration test will catch any regression in user-visible behavior, even if you completely rewrite the underlying implementation.
Strategy 6: The Boy Scout Rule, Applied to TDD
The Boy Scout Rule in software: "Leave the code cleaner than you found it." The TDD version: every time you touch legacy code, add a test for the behavior you changed.
You are not going to retrofit tests for all 50,000 lines of your legacy codebase. That is not how this works. But every bug fix should include a test that would have caught the bug. Every new feature should be written TDD-style. Every refactor should have a characterization test before you start and pass the same test when you finish.
Over 6 months of this discipline, the tested portion of your codebase grows organically. The parts you touch most — the parts that change most often — accumulate the most tests. Those are exactly the parts that need them most.
The JavaScript Equivalent
In JavaScript, the same patterns apply. Use dependency injection through constructor parameters or module mocking. Jest's jest.mock() lets you replace modules at the test level:
    // invoice-calculator.js
    const db = require('./database');

    function calculateTotal(customerId, month, year) {
      const orders = db.getOrders(customerId, month, year);
      // business logic
    }

    // Export the function so the test file can import it
    module.exports = { calculateTotal };

    // invoice-calculator.test.js
    jest.mock('./database', () => ({
      getOrders: jest.fn(),
    }));

    const db = require('./database');
    const { calculateTotal } = require('./invoice-calculator');

    test('applies subscription discount correctly', () => {
      db.getOrders.mockReturnValue([
        { amount: 100, type: 'subscription' },
        { amount: 50, type: 'one-time' },
      ]);
      expect(calculateTotal(1, 1, 2024)).toBeCloseTo(151.90);
    });

Module mocking is a seam that Jest provides at the test runner level: you do not need to change the production code to inject the dependency.
What Not to Do
Do not write tests for code you are about to delete. Characterization tests cover behavior you are preserving. Behavior you are removing does not need tests.
Do not try to reach 80% coverage in the first sprint. Coverage targets on legacy codebases are political theater. Focus on tests for the code that changes.
Do not mock everything. Mocking is a tool for breaking dependencies, not a goal in itself. If you can use the real implementation (a fast in-memory database, a local file), prefer it.
Do not refactor without tests in place. Characterization tests first. Every time.
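As a sketch of the "prefer the real implementation" advice above, Python's built-in `sqlite3` module can create a database that lives only in memory, so a data-access function can be tested against a real SQL engine. The `fetch_order_rows` helper and the table layout are assumptions made for this example, and SQLite's SQL dialect will differ from a production database in places.

```python
import math
import sqlite3


def fetch_order_rows(conn, customer_id):
    """Hypothetical data-access helper; the table layout is an assumption."""
    cursor = conn.execute(
        "SELECT amount, type FROM orders WHERE customer_id = ?",
        (customer_id,),
    )
    return [{"amount": amount, "type": kind} for amount, kind in cursor]


def make_test_db():
    conn = sqlite3.connect(":memory:")  # real SQL engine, nothing written to disk
    conn.execute(
        "CREATE TABLE orders (customer_id INTEGER, amount REAL, type TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [
            (1, 100.00, "subscription"),
            (1, 50.00, "one-time"),
            (2, 75.00, "one-time"),
        ],
    )
    return conn


def test_fetch_order_rows_filters_by_customer():
    conn = make_test_db()
    try:
        rows = fetch_order_rows(conn, 1)
    finally:
        conn.close()
    assert sorted(row["type"] for row in rows) == ["one-time", "subscription"]
    assert math.isclose(sum(row["amount"] for row in rows), 150.00)
```

Unlike a mock, this test fails if the SQL itself is wrong, which is often the part of a data-access function most worth checking.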
The Long Game
Legacy codebases do not become well-tested overnight. The strategies in this post are all incremental — they each move the needle slightly in the right direction. Seams make individual functions testable. Sprout methods add new functionality safely. The Boy Scout Rule prevents the untested portion from growing.
End-to-end testing tools like HelpMeTest can accelerate this by giving you a coverage layer over the entire application — even before you have a single unit test. A test that says "users can log in, see their invoices, and the total is correct" is not as precise as a unit test, but it will catch a catastrophic regression while you are doing the slower work of adding unit-level coverage.
The goal is a codebase where every change you make is backed by a test that proves it works. You do not get there by declaring a "testing sprint." You get there by adding one test every time you touch the code.