Testing Translation Strings: Completeness, Pluralization, and Variables

Translation files are the source of truth for your localized UI, yet they are often the least tested part of a software project. Developers add keys in English and hand them off to translators, but nobody systematically verifies that the translation files stay in sync, that pluralization rules are correct, or that variable placeholders survive the translation process intact.

This guide covers automated testing strategies for translation files — the kind that catch real bugs before they reach production users.

The Categories of Translation Bugs

Before writing tests, it helps to enumerate what you are looking for.

Missing keys occur when a developer adds a new string to the English source file but the corresponding key is absent from one or more translation files. The result in production is usually the raw key name being shown to users (common.save_button instead of "Enregistrer") or a fallback to English — both of which look broken.

Extra keys occur in the opposite direction: a translation file contains keys that no longer exist in the source. These are usually orphaned after a refactor. They do not cause visible bugs but they waste translator effort and create maintenance noise.

Variable interpolation failures happen when a translator accidentally removes, renames, or reorders a placeholder like {name} or {count}. The interpolation library then either crashes, displays the raw placeholder, or silently substitutes an empty string.

Pluralization errors are common and subtle. English has two plural forms (one item, two items). Russian has four. Arabic has six. Polish has four. A developer who implements pluralization for English and then ships to Arabic without updating the plural rules will produce incorrect output, or a crash, for certain counts.

HTML injection occurs when translators introduce HTML markup that was not intended, or when variable content that contains user-supplied data is interpolated into a string that is then rendered as HTML without escaping.

Detecting Missing and Extra Keys

The simplest audit is a key comparison: compare every key in your source locale (typically English) against every target locale and report the differences.

JavaScript Implementation

// scripts/audit-translations.js
const fs = require('fs');
const path = require('path');

function flattenKeys(obj, prefix = '') {
  const result = [];
  for (const [key, value] of Object.entries(obj)) {
    const fullKey = prefix ? `${prefix}.${key}` : key;
    if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
      result.push(...flattenKeys(value, fullKey));
    } else {
      result.push(fullKey);
    }
  }
  return result;
}

function auditLocale(sourceLocale, targetLocale, localesDir) {
  const sourcePath = path.join(localesDir, `${sourceLocale}.json`);
  const targetPath = path.join(localesDir, `${targetLocale}.json`);

  const source = JSON.parse(fs.readFileSync(sourcePath, 'utf8'));
  const target = JSON.parse(fs.readFileSync(targetPath, 'utf8'));

  const sourceKeys = new Set(flattenKeys(source));
  const targetKeys = new Set(flattenKeys(target));

  const missing = [...sourceKeys].filter(k => !targetKeys.has(k));
  const extra = [...targetKeys].filter(k => !sourceKeys.has(k));

  return { locale: targetLocale, missing, extra };
}

function auditAllLocales(sourceLocale, localesDir) {
  const files = fs.readdirSync(localesDir)
    // Exclude only the source file itself; a prefix check would also skip en-GB.json
    .filter(f => f.endsWith('.json') && f !== `${sourceLocale}.json`);
  
  const results = files.map(f => {
    const targetLocale = path.basename(f, '.json');
    return auditLocale(sourceLocale, targetLocale, localesDir);
  });

  let hasErrors = false;

  for (const { locale, missing, extra } of results) {
    if (missing.length > 0) {
      console.error(`\n[${locale}] ${missing.length} MISSING keys:`);
      missing.forEach(k => console.error(`  - ${k}`));
      hasErrors = true;
    }
    if (extra.length > 0) {
      console.warn(`\n[${locale}] ${extra.length} EXTRA keys (orphaned):`);
      extra.forEach(k => console.warn(`  + ${k}`));
    }
  }

  if (hasErrors) {
    process.exit(1);
  }

  console.log('Translation audit passed: all target locales have complete key coverage.');
}

auditAllLocales('en', path.join(__dirname, '../locales'));

Python Implementation

# scripts/audit_translations.py
import json
import sys
from pathlib import Path


def flatten_keys(obj: dict, prefix: str = '') -> list[str]:
    keys = []
    for key, value in obj.items():
        full_key = f'{prefix}.{key}' if prefix else key
        if isinstance(value, dict):
            keys.extend(flatten_keys(value, full_key))
        else:
            keys.append(full_key)
    return keys


def audit_locale(source_path: Path, target_path: Path) -> dict:
    with source_path.open(encoding='utf-8') as f:
        source = json.load(f)
    with target_path.open(encoding='utf-8') as f:
        target = json.load(f)

    source_keys = set(flatten_keys(source))
    target_keys = set(flatten_keys(target))

    return {
        'locale': target_path.stem,
        'missing': sorted(source_keys - target_keys),
        'extra': sorted(target_keys - source_keys),
    }


def main(locales_dir: str, source_locale: str = 'en'):
    locales_path = Path(locales_dir)
    source_path = locales_path / f'{source_locale}.json'

    target_files = [
        f for f in locales_path.glob('*.json')
        if f.stem != source_locale
    ]

    has_errors = False
    for target_path in sorted(target_files):
        result = audit_locale(source_path, target_path)

        if result['missing']:
            print(f"\n[{result['locale']}] {len(result['missing'])} MISSING keys:")
            for key in result['missing']:
                print(f"  - {key}")
            has_errors = True

        if result['extra']:
            print(f"\n[{result['locale']}] {len(result['extra'])} EXTRA keys (orphaned):")
            for key in result['extra']:
                print(f"  + {key}")

    if has_errors:
        sys.exit(1)
    else:
        print('Translation audit passed.')


if __name__ == '__main__':
    main('locales')

Variable Placeholder Validation

Every translation string that contains a variable placeholder must preserve that placeholder exactly. The audit should extract all placeholders from the source string and verify they exist in the corresponding translation.

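ICU MessageFormat uses {variableName} syntax. Python's str.format() uses {name}, while Python's printf-style formatting uses %(name)s. GNU gettext uses %s, %d, and positional forms such as %1$s. Whatever your format, the approach is the same: extract the placeholders from both strings and compare them.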

// Check that all variables in source strings are present in translations
const ICU_PLACEHOLDER = /\{[a-zA-Z_][a-zA-Z0-9_]*(?:,\s*[^}]*)?\}/g;

function extractPlaceholders(str) {
  if (typeof str !== 'string') return new Set();
  // Extract simple variable names, stripping ICU format specs
  const matches = str.match(ICU_PLACEHOLDER) || [];
  return new Set(matches.map(m => m.match(/\{([a-zA-Z_][a-zA-Z0-9_]*)/)[1]));
}

function auditPlaceholders(sourceObj, targetObj, locale, prefix = '') {
  const errors = [];
  
  for (const [key, sourceValue] of Object.entries(sourceObj)) {
    const fullKey = prefix ? `${prefix}.${key}` : key;
    const targetValue = targetObj[key];
    
    if (targetValue === undefined) continue; // Already caught by key audit
    
    if (typeof sourceValue === 'string' && typeof targetValue === 'string') {
      const sourcePlaceholders = extractPlaceholders(sourceValue);
      const targetPlaceholders = extractPlaceholders(targetValue);
      
      const missingInTarget = [...sourcePlaceholders].filter(p => !targetPlaceholders.has(p));
      const extraInTarget = [...targetPlaceholders].filter(p => !sourcePlaceholders.has(p));
      
      if (missingInTarget.length > 0) {
        errors.push({
          key: fullKey,
          locale,
          issue: 'MISSING_PLACEHOLDER',
          details: `Placeholder(s) ${missingInTarget.map(p => `{${p}}`).join(', ')} present in source but absent in translation`,
          source: sourceValue,
          translation: targetValue,
        });
      }
      
      if (extraInTarget.length > 0) {
        errors.push({
          key: fullKey,
          locale,
          issue: 'EXTRA_PLACEHOLDER',
          details: `Placeholder(s) ${extraInTarget.map(p => `{${p}}`).join(', ')} present in translation but not in source`,
          source: sourceValue,
          translation: targetValue,
        });
      }
    } else if (typeof sourceValue === 'object' && sourceValue !== null) {
      errors.push(...auditPlaceholders(sourceValue, targetValue || {}, locale, fullKey));
    }
  }
  
  return errors;
}
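
The same comparison can be applied to printf-style placeholders. The following is a minimal Python sketch, not a complete format parser: the regular expression and the function names are illustrative assumptions, and it counts placeholders rather than collecting a set so that a dropped duplicate (two %s in the source, one in the translation) is also reported.

```python
import re
from collections import Counter

# Matches printf-style placeholders such as %s, %d, %1$s, %(name)s and %.2f.
# The escaped percent sign %% is matched as well and skipped below.
PRINTF_PLACEHOLDER = re.compile(
    r"%(?:%|(?:\(\w+\)|\d+\$)?[-+ 0#]*\d*(?:\.\d+)?[sdifgxXeEcr])"
)


def extract_printf_placeholders(text: str) -> Counter:
    """Count every placeholder in the string, ignoring escaped %% signs."""
    found = Counter()
    for match in PRINTF_PLACEHOLDER.finditer(text):
        token = match.group(0)
        if token == "%%":
            continue  # literal percent sign, not a placeholder
        found[token] += 1
    return found


def compare_placeholders(source: str, translation: str) -> dict:
    """Report placeholders that were dropped from or added to the translation."""
    src = extract_printf_placeholders(source)
    dst = extract_printf_placeholders(translation)
    return {
        "missing": sorted((src - dst).elements()),
        "extra": sorted((dst - src).elements()),
    }


if __name__ == "__main__":
    print(compare_placeholders(
        "Hello %(name)s, you have %d new messages",
        "Bonjour %(nom)s, vous avez %d nouveaux messages",
    ))
```

Because the comparison ignores order, a translation that legitimately reorders positional placeholders (%2$s before %1$s) passes, while a renamed placeholder is reported as both missing and extra.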

Pluralization Rules and Testing

Pluralization is the most complex area of i18n testing. The CLDR (Common Locale Data Repository) defines plural categories for every language: zero, one, two, few, many, other. Most languages use only a subset of these categories.

Language   Categories                           Rules
English    one, other                           one = 1; other = everything else
German     one, other                           Same as English
French     one, many, other                     one = 0, 1; many = large numbers; other = rest
Russian    one, few, many, other                Complex modulo rules
Arabic     zero, one, two, few, many, other     All 6 categories
Chinese    other                                Single form for all quantities
Polish     one, few, many, other                Complex rules including 12, 13, 14 exceptions
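
To make the "complex modulo rules" in the table concrete, here is a minimal Python sketch of the CLDR integer rules for a few of the listed languages. The function name is hypothetical, only non-negative integers are handled (fractional numbers follow additional rules), and in practice a CLDR-backed library is a safer source for these rules than a hand-written table.

```python
def plural_category(locale: str, n: int) -> str:
    """Return the CLDR plural category for a non-negative integer count.

    Sketch only: covers en, de, ru, pl, ar and zh, and ignores fractions.
    """
    mod10, mod100 = n % 10, n % 100

    if locale in ("en", "de"):
        return "one" if n == 1 else "other"

    if locale == "ru":
        if mod10 == 1 and mod100 != 11:
            return "one"   # 1, 21, 31, 101 ... but not 11
        if 2 <= mod10 <= 4 and not 12 <= mod100 <= 14:
            return "few"   # 2-4, 22-24 ... but not 12-14
        return "many"      # integers never fall through to "other"

    if locale == "pl":
        if n == 1:
            return "one"   # unlike Russian, 21 is not "one"
        if 2 <= mod10 <= 4 and not 12 <= mod100 <= 14:
            return "few"
        return "many"

    if locale == "ar":
        if n == 0:
            return "zero"
        if n == 1:
            return "one"
        if n == 2:
            return "two"
        if 3 <= mod100 <= 10:
            return "few"
        if 11 <= mod100 <= 99:
            return "many"
        return "other"     # 100, 101, 102, 200 ...

    if locale == "zh":
        return "other"

    raise ValueError(f"no plural rules sketched for locale {locale!r}")


if __name__ == "__main__":
    for count in (1, 2, 5, 11, 21, 22, 101):
        print(count, plural_category("ru", count), plural_category("pl", count))
```

Boundary values such as 11, 12, 21 and 101 are where these rules diverge between languages, which makes them good candidates for test cases.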

Testing Pluralization With ICU MessageFormat

The ICU MessageFormat syntax for pluralization:

{count, plural,
  =0 {No messages}
  one {# message}
  other {# messages}
}

import MessageFormat from '@messageformat/core';

describe('pluralization correctness', () => {
  describe('English plural forms', () => {
    const mf = new MessageFormat('en');
    const template = mf.compile('{count, plural, one {# item} other {# items}}');

    test.each([
      [0, '0 items'],
      [1, '1 item'],
      [2, '2 items'],
      [11, '11 items'],  // NOT "11 item" — English "one" rule only applies to n=1
      [100, '100 items'],
    ])('count=%d → %s', (count, expected) => {
      expect(template({ count })).toBe(expected);
    });
  });

  describe('Russian plural forms', () => {
    const mf = new MessageFormat('ru');
    // Russian: 1 is "one", 2-4 are "few", 5-20 are "many", 21 is "one", 22-24 are "few"...
    const template = mf.compile(
      '{count, plural, one {# элемент} few {# элемента} many {# элементов} other {# элемента}}'
    );

    test.each([
      [1, '1 элемент'],
      [2, '2 элемента'],
      [5, '5 элементов'],
      [11, '11 элементов'],  // 11 is "many", not "one"
      [21, '21 элемент'],    // 21 is "one"
      [22, '22 элемента'],   // 22 is "few"
      [25, '25 элементов'],  // 25 is "many"
      [101, '101 элемент'],  // 101 is "one"
    ])('count=%d → %s', (count, expected) => {
      expect(template({ count })).toBe(expected);
    });
  });

  describe('Arabic — all six plural categories', () => {
    const mf = new MessageFormat('ar');
    const template = mf.compile(
      '{count, plural, =0 {لا رسائل} one {رسالة واحدة} two {رسالتان} few {# رسائل} many {# رسالة} other {# رسالة}}'
    );

    test('zero has special form', () => {
      expect(template({ count: 0 })).toBe('لا رسائل');
    });

    test('count=1 uses one form', () => {
      expect(template({ count: 1 })).toBe('رسالة واحدة');
    });

    test('count=2 uses dual form', () => {
      expect(template({ count: 2 })).toBe('رسالتان');
    });

    test('count=3 uses few form (3-10)', () => {
      expect(template({ count: 3 })).toContain('رسائل');
    });
  });
});
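
Unit tests like these verify individual messages, but they do not tell you whether every plural message in a translation file defines the categories its locale needs. The sketch below is one possible static check in Python. The category table is copied from the table in this guide, the function names are hypothetical, and the brace scanner is deliberately simple: it inspects only the first plural block in a message and does not handle ICU apostrophe quoting.

```python
import re

# Integer plural categories per locale, taken from the table in this guide.
REQUIRED_CATEGORIES = {
    "en": {"one", "other"},
    "ru": {"one", "few", "many", "other"},
    "pl": {"one", "few", "many", "other"},
    "ar": {"zero", "one", "two", "few", "many", "other"},
    "zh": {"other"},
}

PLURAL_START = re.compile(r"\{\s*\w+\s*,\s*plural\s*,")


def plural_selectors(message: str) -> set:
    """Return the selectors (one, few, =0, ...) of the first plural block."""
    start = PLURAL_START.search(message)
    if start is None:
        return set()
    selectors, depth, token = set(), 0, ""
    for char in message[start.end():]:
        if char == "{":
            words = token.split()
            if depth == 0 and words:
                selectors.add(words[-1])  # last word skips "offset:1" prefixes
            depth += 1
            token = ""
        elif char == "}":
            if depth == 0:
                break  # closing brace of the plural block itself
            depth -= 1
        elif depth == 0:
            token += char
    return selectors


def missing_plural_categories(locale: str, message: str) -> set:
    """Return required categories that the message does not define."""
    found = plural_selectors(message)
    if not found:
        return set()  # not a plural message
    # In Arabic the zero category applies only to n = 0, so an exact =0
    # branch covers it. This does not hold for every language.
    if locale == "ar" and "=0" in found:
        found.add("zero")
    return REQUIRED_CATEGORIES[locale] - found


if __name__ == "__main__":
    copied_from_english = "{count, plural, one {# элемент} other {# элемента}}"
    print(missing_plural_categories("ru", copied_from_english))
```

A Russian message that was copied from the English structure without adding the extra branches is reported as missing few and many, which is exactly the failure described earlier in this guide.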

HTML in Translations

Some translation strings contain HTML markup — bold text, links, line breaks. This creates two problems.

First, the HTML must survive the translation process without being mangled. Translators working in CAT tools sometimes rearrange or strip tags.

Second, any variable that is interpolated into an HTML-containing string must be HTML-escaped unless you explicitly intend to render HTML from user data (which is almost always a security mistake).

// Safe rendering of translations that contain HTML
// Use react-i18next's Trans component, not dangerouslySetInnerHTML

// WRONG:
<div dangerouslySetInnerHTML={{ __html: t('welcome.message', { name: user.name }) }} />
// If user.name = '<script>alert(1)</script>', you have XSS

// RIGHT with React:
import { Trans } from 'react-i18next';
<Trans i18nKey="welcome.message" values={{ name: user.name }}>
  Welcome, <strong>{{name}}</strong>!
</Trans>

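// The Trans component handles escaping and treats name as a text node, not HTML

// Test: verify HTML is escaped in translations with user-supplied values
// renderTranslation is a project-specific helper (not shown) that returns the rendered HTML string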
describe('XSS prevention in translated strings', () => {
  test('user name with HTML characters is escaped', () => {
    const output = renderTranslation('welcome.message', {
      name: '<script>alert(1)</script>',
    });
    expect(output).not.toContain('<script>');
    expect(output).toContain('&lt;script&gt;');
  });

  test('translation HTML tags are preserved', () => {
    const output = renderTranslation('welcome.message', { name: 'Alice' });
    expect(output).toContain('<strong>Alice</strong>');
  });
});
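
The first problem, tags being mangled during translation, can also be checked statically against the translation files. The following Python sketch compares the HTML tags in a source string with those in its translation. The regular expression and function names are illustrative assumptions; it compares tag names only (attributes are ignored) and is not a full HTML parser.

```python
import re
from collections import Counter

# Opening, closing and self-closing tags; attributes are matched but ignored.
TAG = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>")


def tag_signature(text: str) -> Counter:
    """Count each opening and closing tag name found in the string."""
    return Counter(
        f"<{slash}{name.lower()}>" for slash, name in TAG.findall(text)
    )


def tags_preserved(source: str, translation: str) -> bool:
    """True when both strings contain the same tags the same number of times."""
    return tag_signature(source) == tag_signature(translation)


if __name__ == "__main__":
    source = "Welcome, <strong>{name}</strong>!"
    print(tags_preserved(source, "Bienvenue, <strong>{name}</strong> !"))
    print(tags_preserved(source, "Bienvenue, <strong>{name}</ strong> !"))
```

The comparison ignores tag order on purpose, since a translation may need to move a link or an emphasized phrase to a different position in the sentence.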

Translation File Format Validation

Translation files should be validated against a schema before being merged. For JSON files, this is straightforward with AJV or a similar validator.

// scripts/validate-translation-format.js
const Ajv = require('ajv');
const fs = require('fs');

// Recursively validate that all leaf values are strings
const translationSchema = {
  type: 'object',
  additionalProperties: {
    oneOf: [
      { type: 'string' },
      { $ref: '#' }, // Recursive nested objects are allowed
    ],
  },
};

const ajv = new Ajv({ allowUnionTypes: true });
const validate = ajv.compile(translationSchema);

function validateFile(filePath) {
  const data = JSON.parse(fs.readFileSync(filePath, 'utf8'));
  if (!validate(data)) {
    console.error(`${filePath}: Schema validation failed`);
    validate.errors.forEach(e => console.error(`  ${e.instancePath}: ${e.message}`));
    return false;
  }
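  return true;
}

// Run against every locale file so that the script fails CI on invalid files.
// Assumes the same ../locales layout as the audit script above.
const path = require('path');
const localesDir = path.join(__dirname, '../locales');
const allValid = fs.readdirSync(localesDir)
  .filter(f => f.endsWith('.json'))
  .map(f => validateFile(path.join(localesDir, f)))
  .every(Boolean);

if (!allValid) process.exit(1);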
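
For projects that run their audits in Python, the same leaf-type rule can be checked without a schema library. This is a minimal sketch with a hypothetical function name; it additionally flags empty strings, which the schema above does not.

```python
import json


def find_invalid_leaves(node: dict, path: str = "") -> list:
    """Return (key path, problem) pairs for values that are not usable strings."""
    problems = []
    for key, value in node.items():
        full_key = f"{path}.{key}" if path else key
        if isinstance(value, dict):
            # Nested objects are allowed; check their children recursively.
            problems.extend(find_invalid_leaves(value, full_key))
        elif not isinstance(value, str):
            problems.append(
                (full_key, f"expected string, got {type(value).__name__}")
            )
        elif not value.strip():
            problems.append((full_key, "empty string"))
    return problems


if __name__ == "__main__":
    data = json.loads('{"common": {"save": "Save", "count": 3}, "title": null}')
    for key, problem in find_invalid_leaves(data):
        print(f"{key}: {problem}")
```

Reporting the dotted key path rather than a schema pointer keeps the output consistent with the key audit scripts earlier in this guide.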

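For .po files (GNU gettext format, used in many Python/Django/PHP projects), use a dedicated parser to validate the file structure and to count fuzzy entries. gettext marks an entry as fuzzy when its translation may no longer match the source, typically because the source string changed or msgmerge guessed a match from a similar string, so fuzzy entries need human review: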

import polib
from pathlib import Path

def audit_po_file(po_path: Path) -> dict:
    po = polib.pofile(str(po_path))
    
    return {
        'file': str(po_path),
        'total': len(po),
        'translated': len(po.translated_entries()),
        'untranslated': len(po.untranslated_entries()),
        'fuzzy': len(po.fuzzy_entries()),  # Flagged for review (source changed or guessed match)
        'percent_done': po.percent_translated(),
    }

def check_translation_completeness(locales_dir: str, min_percent: float = 95.0) -> bool:
    """Print a report and return True only if every .po file meets the threshold."""
    passed = True
    for po_file in sorted(Path(locales_dir).rglob('*.po')):
        result = audit_po_file(po_file)
        if result['percent_done'] < min_percent:
            print(f"FAIL {po_file}: {result['percent_done']:.1f}% translated "
                  f"({result['untranslated']} missing, {result['fuzzy']} fuzzy)")
            passed = False  # Lets the caller exit non-zero in CI
        if result['fuzzy'] > 0:
            print(f"WARN {po_file}: {result['fuzzy']} fuzzy entries need human review")
    return passed

Integrating With Linting Tools

i18n-ally (VS Code extension) provides real-time highlighting of missing translation keys while you type. It reads your translation files and highlights any t('key.that.does.not.exist') calls in red before you even run your tests.

eslint-plugin-i18n-json validates JSON translation files as part of your lint step:

{
  "plugins": ["i18n-json"],
  "rules": {
    "i18n-json/valid-json": "error",
    "i18n-json/valid-message-syntax": ["error", {
      "syntax": "icu"
    }],
    "i18n-json/identical-keys": ["error", {
      "filePath": "locales/en.json"
    }]
  }
}

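A lint plugin such as eslint-plugin-i18next (its no-literal-string rule) catches string literals in JSX that should be translated but are not. Verifying that every key passed to t() actually exists in the translation files is better left to the key audit above or to an editor tool such as i18n-ally.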

CI/CD Integration

# .github/workflows/i18n-audit.yml
name: Translation Audit

on: [pull_request]

jobs:
  audit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
      
      - run: npm ci
      
      - name: Validate translation file format
        run: node scripts/validate-translation-format.js
      
      - name: Audit missing and extra keys
        run: node scripts/audit-translations.js
      
      - name: Audit variable placeholders
        run: node scripts/audit-placeholders.js
      
      - name: Run pluralization unit tests
        run: npm test -- --testPathPattern="i18n"
      
      - name: Check translation completeness threshold
        run: node scripts/check-completeness.js --min-percent=90
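
The last step of the workflow calls a completeness script that this guide does not show. As a rough illustration of the logic such a script might implement, here is a Python sketch. All function names are hypothetical, it assumes nested JSON locale files like those used earlier, and it treats missing keys and empty strings as untranslated.

```python
def flatten(obj: dict, prefix: str = "") -> dict:
    """Flatten nested translation objects into dotted-key -> value pairs."""
    flat = {}
    for key, value in obj.items():
        full_key = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key))
        else:
            flat[full_key] = value
    return flat


def completeness_percent(source: dict, target: dict) -> float:
    """Percentage of source keys that have a non-empty string in the target."""
    source_flat = flatten(source)
    target_flat = flatten(target)
    if not source_flat:
        return 100.0
    translated = sum(
        1
        for key in source_flat
        if isinstance(target_flat.get(key), str) and target_flat[key].strip()
    )
    return 100.0 * translated / len(source_flat)


def locales_below_threshold(source: dict, targets: dict, min_percent: float = 90.0) -> list:
    """Return the locale codes whose completeness is under the threshold."""
    return sorted(
        locale
        for locale, data in targets.items()
        if completeness_percent(source, data) < min_percent
    )


if __name__ == "__main__":
    en = {"a": "A", "b": {"c": "C", "d": "D"}, "e": "E"}
    fr = {"a": "A (fr)", "b": {"c": "C (fr)", "d": ""}}
    print(completeness_percent(en, fr), locales_below_threshold(en, {"fr": fr}))
```

A CI wrapper would load each locale file, call locales_below_threshold with the value passed as --min-percent, and exit non-zero if the returned list is not empty.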

Practical Checklist

Before merging any PR that adds or modifies translation strings:

  • Key added to English source file
  • Key added to all target locale files (or placeholder added)
  • All {placeholder} variables present in every translation
  • Plural forms defined for all CLDR categories required by each target locale
  • HTML content in strings uses safe rendering (Trans component, not dangerouslySetInnerHTML)
  • User-supplied data interpolated as text, not HTML
  • Translation file passes schema validation
  • Audit script passes with zero errors in CI
  • No fuzzy entries in .po files (if applicable)

Translation file bugs are the most preventable category of i18n defects. Unlike timezone edge cases or cultural formatting rules, they have a clear mechanical definition: a key exists in one file but not another, or a placeholder is present in one string but absent in its counterpart. That makes them ideal candidates for automated checking — there is no ambiguity about whether the check should pass or fail.
