Testing Translation Strings: Completeness, Pluralization, and Variables
Translation files are the source of truth for your localized UI, yet they are often the least tested part of a software project. Developers add keys in English and hand them off to translators, but nobody systematically verifies that the translation files stay in sync, that pluralization rules are correct, or that variable placeholders survive the translation process intact.
This guide covers automated testing strategies for translation files — the kind that catch real bugs before they reach production users.
The Categories of Translation Bugs
Before writing tests, it helps to enumerate what you are looking for.
Missing keys occur when a developer adds a new string to the English source file but the corresponding key is absent from one or more translation files. The result in production is usually the raw key name being shown to users (common.save_button instead of "Enregistrer") or a fallback to English — both of which look broken.
Extra keys occur in the opposite direction: a translation file contains keys that no longer exist in the source. These are usually orphaned after a refactor. They do not cause visible bugs but they waste translator effort and create maintenance noise.
Variable interpolation failures happen when a translator accidentally removes, renames, or reorders a placeholder like {name} or {count}. The interpolation library then either crashes, displays the raw placeholder, or silently substitutes an empty string.
Pluralization errors are common and subtle. English has two plural forms (one item, two items). Russian and Polish each have four CLDR plural categories, and Arabic has six. A developer who implements pluralization for English and then ships to Arabic without updating the plural rules will produce incorrect output, or a crash, for certain counts.
HTML injection occurs when translators introduce HTML markup that was not intended, or when variable content that contains user-supplied data is interpolated into a string that is then rendered as HTML without escaping.
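To make the second failure mode concrete, here is a minimal Python sketch of escaping user-supplied values before they are interpolated into a translation that carries markup. The helper name `render_html` is hypothetical, and the sketch assumes simple `{name}` placeholders in `str.format()` style rather than full ICU syntax.

```python
from html import escape


def render_html(template: str, **values: object) -> str:
    """Interpolate values into an HTML-bearing translation, escaping each value first."""
    # The translation's own markup is trusted; only the interpolated values are escaped.
    safe = {key: escape(str(value)) for key, value in values.items()}
    return template.format(**safe)


if __name__ == "__main__":
    print(render_html("Welcome, <strong>{name}</strong>!", name="<script>alert(1)</script>"))
```

The translation's `<strong>` tags pass through untouched, while the angle brackets in the user's name are turned into entities and render as text.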
Detecting Missing and Extra Keys
The simplest audit is a key comparison: compare every key in your source locale (typically English) against every target locale and report the differences.
JavaScript Implementation
// scripts/audit-translations.js
const fs = require('fs');
const path = require('path');

// Flatten a nested translation object into dot-separated key paths.
function flattenKeys(obj, prefix = '') {
  const result = [];
  for (const [key, value] of Object.entries(obj)) {
    const fullKey = prefix ? `${prefix}.${key}` : key;
    if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
      result.push(...flattenKeys(value, fullKey));
    } else {
      result.push(fullKey);
    }
  }
  return result;
}

function auditLocale(sourceLocale, targetLocale, localesDir) {
  const sourcePath = path.join(localesDir, `${sourceLocale}.json`);
  const targetPath = path.join(localesDir, `${targetLocale}.json`);
  const source = JSON.parse(fs.readFileSync(sourcePath, 'utf8'));
  const target = JSON.parse(fs.readFileSync(targetPath, 'utf8'));
  const sourceKeys = new Set(flattenKeys(source));
  const targetKeys = new Set(flattenKeys(target));
  const missing = [...sourceKeys].filter(k => !targetKeys.has(k));
  const extra = [...targetKeys].filter(k => !sourceKeys.has(k));
  return { locale: targetLocale, missing, extra };
}

function auditAllLocales(sourceLocale, localesDir) {
  // Compare the full locale name so that 'en' does not also exclude 'en-GB.json'.
  const files = fs.readdirSync(localesDir)
    .filter(f => f.endsWith('.json') && path.basename(f, '.json') !== sourceLocale);
  const results = files.map(f => {
    const targetLocale = path.basename(f, '.json');
    return auditLocale(sourceLocale, targetLocale, localesDir);
  });

  let hasErrors = false;
  for (const { locale, missing, extra } of results) {
    if (missing.length > 0) {
      console.error(`\n[${locale}] ${missing.length} MISSING keys:`);
      missing.forEach(k => console.error(`  - ${k}`));
      hasErrors = true;
    }
    // Extra keys are reported as warnings only; they do not fail the audit.
    if (extra.length > 0) {
      console.warn(`\n[${locale}] ${extra.length} EXTRA keys (orphaned):`);
      extra.forEach(k => console.warn(`  + ${k}`));
    }
  }

  if (hasErrors) {
    process.exit(1);
  }
  console.log('Translation audit passed: all target locales have complete key coverage.');
}

auditAllLocales('en', path.join(__dirname, '../locales'));
Python Implementation
# scripts/audit_translations.py
import json
import sys
from pathlib import Path


def flatten_keys(obj: dict, prefix: str = '') -> list[str]:
    keys = []
    for key, value in obj.items():
        full_key = f'{prefix}.{key}' if prefix else key
        if isinstance(value, dict):
            keys.extend(flatten_keys(value, full_key))
        else:
            keys.append(full_key)
    return keys


def audit_locale(source_path: Path, target_path: Path) -> dict:
    with source_path.open(encoding='utf-8') as f:
        source = json.load(f)
    with target_path.open(encoding='utf-8') as f:
        target = json.load(f)
    source_keys = set(flatten_keys(source))
    target_keys = set(flatten_keys(target))
    return {
        'locale': target_path.stem,
        'missing': sorted(source_keys - target_keys),
        'extra': sorted(target_keys - source_keys),
    }


def main(locales_dir: str, source_locale: str = 'en'):
    locales_path = Path(locales_dir)
    source_path = locales_path / f'{source_locale}.json'
    target_files = [
        f for f in locales_path.glob('*.json')
        if f.stem != source_locale
    ]
    has_errors = False
    for target_path in sorted(target_files):
        result = audit_locale(source_path, target_path)
        if result['missing']:
            print(f"\n[{result['locale']}] {len(result['missing'])} MISSING keys:")
            for key in result['missing']:
                print(f"  - {key}")
            has_errors = True
        if result['extra']:
            print(f"\n[{result['locale']}] {len(result['extra'])} EXTRA keys (orphaned):")
            for key in result['extra']:
                print(f"  + {key}")
    if has_errors:
        sys.exit(1)
    else:
        print('Translation audit passed.')


if __name__ == '__main__':
    main('locales')
Variable Placeholder Validation
Every translation string that contains a variable placeholder must preserve that placeholder exactly. The audit should extract all placeholders from the source string and verify they exist in the corresponding translation.
ICU MessageFormat uses {variableName} syntax. Python's str.format() uses {name}, while its older %-style formatting uses %(name)s. GNU gettext catalogs typically use printf-style %s and %d, plus numbered forms such as %1$s. Whatever your format, the pattern is the same.
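For printf-style catalogs, the same check might look like the following Python sketch. The regular expression and the helper names (`printf_placeholders`, `placeholders_match`) are assumptions made for illustration, and the pattern covers only the common conversions. Unnumbered placeholders such as `%s` are compared in order, because swapping them changes which argument goes where. Named or numbered placeholders are compared as a multiset, since translators may legitimately reorder them.

```python
import re
from collections import Counter

# Matches "%%" (a literal percent) or a conversion such as %s, %05d, %1$s, %(name)s.
PRINTF_PLACEHOLDER = re.compile(
    r"%(?:%|(?:\((?P<name>\w+)\)|(?P<pos>\d+)\$)?[-+#0]*\d*(?:\.\d+)?[sdifgxXeEcr])"
)


def printf_placeholders(text: str) -> tuple[list[str], Counter]:
    """Return (unnumbered conversions in order, multiset of named/numbered ones)."""
    ordered: list[str] = []
    keyed: Counter = Counter()
    for match in PRINTF_PLACEHOLDER.finditer(text):
        token = match.group(0)
        if token == "%%":
            continue  # escaped percent sign, not a placeholder
        if match.group("name") or match.group("pos"):
            keyed[token] += 1
        else:
            ordered.append(token)
    return ordered, keyed


def placeholders_match(source: str, translation: str) -> bool:
    """True when the translation keeps every placeholder of the source string."""
    return printf_placeholders(source) == printf_placeholders(translation)


if __name__ == "__main__":
    print(placeholders_match("%d files in %s", "%s contient %d fichiers"))
    print(placeholders_match("%1$d files in %2$s", "%2$s contient %1$d fichiers"))
```

In the usage lines, the first pair fails because the unnumbered placeholders were swapped, and the second passes because numbered placeholders can be reordered safely.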
// Check that all variables in source strings are present in translations.
// Heuristic: literal text in braces inside a plural/select branch (e.g. "other {items}")
// is also reported as a placeholder; use an ICU parser if that matters for your strings.
const ICU_PLACEHOLDER = /\{[a-zA-Z_][a-zA-Z0-9_]*(?:,\s*[^}]*)?\}/g;

function extractPlaceholders(str) {
  if (typeof str !== 'string') return new Set();
  // Extract simple variable names, stripping ICU format specs
  const matches = str.match(ICU_PLACEHOLDER) || [];
  return new Set(matches.map(m => m.match(/\{([a-zA-Z_][a-zA-Z0-9_]*)/)[1]));
}

function auditPlaceholders(sourceObj, targetObj, locale, prefix = '') {
  const errors = [];
  for (const [key, sourceValue] of Object.entries(sourceObj)) {
    const fullKey = prefix ? `${prefix}.${key}` : key;
    const targetValue = targetObj[key];
    if (targetValue === undefined) continue; // Already caught by key audit

    if (typeof sourceValue === 'string' && typeof targetValue === 'string') {
      const sourcePlaceholders = extractPlaceholders(sourceValue);
      const targetPlaceholders = extractPlaceholders(targetValue);
      const missingInTarget = [...sourcePlaceholders].filter(p => !targetPlaceholders.has(p));
      const extraInTarget = [...targetPlaceholders].filter(p => !sourcePlaceholders.has(p));

      if (missingInTarget.length > 0) {
        errors.push({
          key: fullKey,
          locale,
          issue: 'MISSING_PLACEHOLDER',
          details: `Placeholder(s) ${missingInTarget.map(p => `{${p}}`).join(', ')} present in source but absent in translation`,
          source: sourceValue,
          translation: targetValue,
        });
      }
      if (extraInTarget.length > 0) {
        errors.push({
          key: fullKey,
          locale,
          issue: 'EXTRA_PLACEHOLDER',
          details: `Placeholder(s) ${extraInTarget.map(p => `{${p}}`).join(', ')} present in translation but not in source`,
          source: sourceValue,
          translation: targetValue,
        });
      }
    } else if (typeof sourceValue === 'object' && sourceValue !== null) {
      // Recurse into nested namespaces
      errors.push(...auditPlaceholders(sourceValue, targetValue || {}, locale, fullKey));
    }
  }
  return errors;
}
Pluralization Rules and Testing
Pluralization is the most complex area of i18n testing. The CLDR (Common Locale Data Repository) defines plural categories for every language: zero, one, two, few, many, other. Most languages use only a subset of these categories.
| Language | Categories | Rules |
|---|---|---|
| English | one, other | one=1; other=everything else |
| German | one, other | Same as English |
| French | one, many, other | one=0 and 1; many=nonzero exact multiples of 1,000,000; other=rest |
| Russian | one, few, many, other | one=ends in 1 (except 11); few=ends in 2-4 (except 12-14); many=all other integers; other=fractions |
| Arabic | zero, one, two, few, many, other | zero=0; one=1; two=2; few=n mod 100 in 3-10; many=n mod 100 in 11-99; other=rest |
| Chinese | other | Single form for all quantities |
| Polish | one, few, many, other | one=1 only; few=ends in 2-4 (except 12-14); many=all other integers; other=fractions |
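The modulo rules behind the table can be written out directly, which helps when choosing counts for test cases. The Python sketch below hand-transcribes the CLDR rules for integer counts only, for a few of the languages in the table. Fractions and French are deliberately left out, and a real project would normally rely on a CLDR-backed library instead. All function names are hypothetical. The last helper is a regex heuristic that reports which plural branches an ICU message lacks for a locale.

```python
import re

PLURAL_KEYWORD = re.compile(r"\b(zero|one|two|few|many|other)\s*\{")


def plural_category(locale: str, n: int) -> str:
    """CLDR plural category for a non-negative integer count (assumption: no fractions)."""
    mod10, mod100 = n % 10, n % 100
    if locale in ("en", "de"):
        return "one" if n == 1 else "other"
    if locale == "zh":
        return "other"
    if locale in ("ru", "pl"):
        if locale == "pl" and n == 1:
            return "one"
        if locale == "ru" and mod10 == 1 and mod100 != 11:
            return "one"
        if 2 <= mod10 <= 4 and not 12 <= mod100 <= 14:
            return "few"
        return "many"
    if locale == "ar":
        if n <= 2:
            return ("zero", "one", "two")[n]
        if 3 <= mod100 <= 10:
            return "few"
        if 11 <= mod100 <= 99:
            return "many"
        return "other"
    raise ValueError(f"no rules transcribed for locale {locale!r}")


def sample_counts(locale: str, limit: int = 200) -> dict[str, int]:
    """Smallest count below `limit` that lands in each category: handy test inputs."""
    samples: dict[str, int] = {}
    for n in range(limit):
        samples.setdefault(plural_category(locale, n), n)
    return samples


def missing_plural_branches(message: str, locale: str) -> list[str]:
    """Categories the locale needs that the ICU message does not define (heuristic)."""
    required = set(sample_counts(locale)) | {"other"}  # ICU always requires "other"
    defined = set(PLURAL_KEYWORD.findall(message))
    return sorted(required - defined)


if __name__ == "__main__":
    print(sample_counts("ar"))
    print(missing_plural_branches("{count, plural, one {# элемент} other {# элемента}}", "ru"))
```

Note that this check does not credit exact-match selectors such as `=0`, so a message that covers Arabic zero with `=0` would still be reported as missing the `zero` branch.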
Testing Pluralization With ICU MessageFormat
The ICU MessageFormat syntax for pluralization:
{count, plural,
  =0 {No messages}
  one {# message}
  other {# messages}
}

import MessageFormat from '@messageformat/core';

describe('pluralization correctness', () => {
  describe('English plural forms', () => {
    const mf = new MessageFormat('en');
    const template = mf.compile('{count, plural, one {# item} other {# items}}');

    test.each([
      [0, '0 items'],
      [1, '1 item'],
      [2, '2 items'],
      [11, '11 items'], // NOT "11 item": the English "one" rule only applies to n=1
      [100, '100 items'],
    ])('count=%d → %s', (count, expected) => {
      expect(template({ count })).toBe(expected);
    });
  });

  describe('Russian plural forms', () => {
    const mf = new MessageFormat('ru');
    // Russian: 1 is "one", 2-4 are "few", 5-20 are "many", 21 is "one", 22-24 are "few", ...
    const template = mf.compile(
      '{count, plural, one {# элемент} few {# элемента} many {# элементов} other {# элемента}}'
    );

    test.each([
      [1, '1 элемент'],
      [2, '2 элемента'],
      [5, '5 элементов'],
      [11, '11 элементов'], // 11 is "many", not "one"
      [21, '21 элемент'], // 21 is "one"
      [22, '22 элемента'], // 22 is "few"
      [25, '25 элементов'], // 25 is "many"
      [101, '101 элемент'], // 101 is "one"
    ])('count=%d → %s', (count, expected) => {
      expect(template({ count })).toBe(expected);
    });
  });

  describe('Arabic: all six plural categories', () => {
    const mf = new MessageFormat('ar');
    // The exact-match selector =0 covers the count that CLDR assigns to "zero" in Arabic
    const template = mf.compile(
      '{count, plural, =0 {لا رسائل} one {رسالة واحدة} two {رسالتان} few {# رسائل} many {# رسالة} other {# رسالة}}'
    );

    test('zero has special form', () => {
      expect(template({ count: 0 })).toBe('لا رسائل');
    });
    test('count=1 uses one form', () => {
      expect(template({ count: 1 })).toBe('رسالة واحدة');
    });
    test('count=2 uses dual form', () => {
      expect(template({ count: 2 })).toBe('رسالتان');
    });
    test('count=3 uses few form (3-10)', () => {
      expect(template({ count: 3 })).toContain('رسائل');
    });
  });
});
HTML in Translations
Some translation strings contain HTML markup — bold text, links, line breaks. This creates two problems.
First, the HTML must survive the translation process without being mangled. Translators working in CAT tools sometimes rearrange or strip tags.
Second, any variable that is interpolated into an HTML-containing string must be HTML-escaped unless you explicitly intend to render HTML from user data (which is almost always a security mistake).
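The first problem lends itself to a mechanical check: compare the tags in the source string with the tags in the translation. The Python sketch below is one way to do that with a regular expression. The helper names are hypothetical, attributes are ignored (so a changed `href` would not be caught), and a project with complex markup may prefer a real HTML parser.

```python
import re
from collections import Counter

# Captures an optional closing slash and the tag name; attributes are skipped.
TAG = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>")


def html_tags(text: str) -> Counter:
    """Multiset of tags in a string, normalized so '<a href="/x">' counts as '<a>'."""
    return Counter(f"<{slash}{name.lower()}>" for slash, name in TAG.findall(text))


def tag_diff(source: str, translation: str) -> dict[str, list[str]]:
    """Tags dropped from, or added to, the translation relative to the source."""
    src, tgt = html_tags(source), html_tags(translation)
    return {
        "missing": sorted((src - tgt).elements()),
        "unexpected": sorted((tgt - src).elements()),
    }


if __name__ == "__main__":
    source = 'Read the <a href="/terms">terms</a><br/> and <strong>{name}</strong>'
    translation = 'Lisez les <a href="/terms">conditions</a> et {name}'
    print(tag_diff(source, translation))
```

Comparing multisets rather than sequences lets translators reorder tags, which is often legitimate, while still flagging tags that were stripped or invented.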
// Safe rendering of translations that contain HTML
// Use react-i18next's Trans component, not dangerouslySetInnerHTML

// WRONG:
<div dangerouslySetInnerHTML={{ __html: t('welcome.message', { name: user.name }) }} />
// If user.name = '<script>alert(1)</script>' and i18next's interpolation.escapeValue
// is disabled (the usual react-i18next setup), you have XSS

// RIGHT with React:
import { Trans } from 'react-i18next';

const name = user.name; // {{name}} below is the object literal { name }, so name must be in scope
<Trans i18nKey="welcome.message" values={{ name: user.name }}>
  Welcome, <strong>{{name}}</strong>!
</Trans>
// The Trans component handles escaping and treats name as a text node, not HTML

// Test: verify HTML is escaped in translations with user-supplied values
// (renderTranslation stands for a project-specific helper, not shown, that renders a key to an HTML string)
describe('XSS prevention in translated strings', () => {
  test('user name with HTML characters is escaped', () => {
    const output = renderTranslation('welcome.message', {
      name: '<script>alert(1)</script>',
    });
    expect(output).not.toContain('<script>');
    expect(output).toContain('&lt;script&gt;');
  });

  test('translation HTML tags are preserved', () => {
    const output = renderTranslation('welcome.message', { name: 'Alice' });
    expect(output).toContain('<strong>Alice</strong>');
  });
});
Translation File Format Validation
Translation files should be validated against a schema before being merged. For JSON files, this is straightforward with AJV or a similar validator.
// scripts/validate-translation-format.js
const Ajv = require('ajv');
const fs = require('fs');

// Recursively validate that all leaf values are strings
const translationSchema = {
  type: 'object',
  additionalProperties: {
    oneOf: [
      { type: 'string' },
      { $ref: '#' }, // Recursive nested objects are allowed
    ],
  },
};

const ajv = new Ajv({ allowUnionTypes: true });
const validate = ajv.compile(translationSchema);

// Returns true when the file parses as JSON and matches the schema.
function validateFile(filePath) {
  const data = JSON.parse(fs.readFileSync(filePath, 'utf8'));
  if (!validate(data)) {
    console.error(`${filePath}: Schema validation failed`);
    validate.errors.forEach(e => console.error(`  ${e.instancePath}: ${e.message}`));
    return false;
  }
  return true;
}
For .po files (GNU gettext format, used in many Python/Django/PHP projects), use a dedicated parser to validate the file structure and check for fuzzy entries. A fuzzy flag marks a translation that was guessed from a similar string (for example by msgmerge) or that a translator flagged for review, so it should not ship without a human check:
import polib
from pathlib import Path


def audit_po_file(po_path: Path) -> dict:
    po = polib.pofile(str(po_path))
    return {
        'file': str(po_path),
        'total': len(po),
        'translated': len(po.translated_entries()),
        'untranslated': len(po.untranslated_entries()),
        'fuzzy': len(po.fuzzy_entries()),  # Flagged as uncertain, needs human review
        'percent_done': po.percent_translated(),
    }


def check_translation_completeness(locales_dir: str, min_percent: float = 95.0):
    for po_file in Path(locales_dir).rglob('*.po'):
        result = audit_po_file(po_file)
        if result['percent_done'] < min_percent:
            print(f"FAIL {po_file}: {result['percent_done']:.1f}% translated "
                  f"({result['untranslated']} missing, {result['fuzzy']} fuzzy)")
        if result['fuzzy'] > 0:
            print(f"WARN {po_file}: {result['fuzzy']} fuzzy entries need human review")
Integrating With Linting Tools
i18n-ally (VS Code extension) provides real-time highlighting of missing translation keys while you type. It reads your translation files and highlights any t('key.that.does.not.exist') calls in red before you even run your tests.
eslint-plugin-i18n-json validates JSON translation files as part of your lint step:
{
  "plugins": ["i18n-json"],
  "rules": {
    "i18n-json/valid-json": "error",
    "i18n-json/valid-message-syntax": ["error", {
      "syntax": "icu"
    }],
    "i18n-json/identical-keys": ["error", {
      "filePath": "locales/en.json"
    }]
  }
}
eslint-plugin-i18n catches calls to t() with string literals that are not in the translation files, and also catches string literals in JSX that should be translated but are not.
CI/CD Integration
# .github/workflows/i18n-audit.yml
name: Translation Audit
on: [pull_request]
jobs:
  audit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm ci
      - name: Validate translation file format
        run: node scripts/validate-translation-format.js
      - name: Audit missing and extra keys
        run: node scripts/audit-translations.js
      - name: Audit variable placeholders
        run: node scripts/audit-placeholders.js
      - name: Run pluralization unit tests
        run: npm test -- --testPathPattern="i18n"
      - name: Check translation completeness threshold
        run: node scripts/check-completeness.js --min-percent=90
Practical Checklist
Before merging any PR that adds or modifies translation strings:
- Key added to English source file
- Key added to all target locale files (or placeholder added)
- All {placeholder} variables present in every translation
- Plural forms defined for all CLDR categories required by each target locale
- HTML content in strings uses safe rendering (Trans component, not dangerouslySetInnerHTML)
- User-supplied data interpolated as text, not HTML
- Translation file passes schema validation
- Audit script passes with zero errors in CI
- No fuzzy entries in .po files (if applicable)
Translation file bugs are the most preventable category of i18n defects. Unlike timezone edge cases or cultural formatting rules, they have a clear mechanical definition: a key exists in one file but not another, or a placeholder is present in one string but absent in its counterpart. That makes them ideal candidates for automated checking — there is no ambiguity about whether the check should pass or fail.