Pseudo-Localization Testing: Catch i18n Bugs Without a Real Translation

Pseudo-localization is one of the most underused techniques in software quality assurance, yet it consistently catches categories of bugs that code review and unit tests miss entirely. The core idea is simple: instead of waiting for real translations to discover that your UI breaks in German, or that your date formatting logic hardcodes English month names, you simulate the effects of translation during development using algorithmically generated fake text.

This guide covers what pseudo-localization is, how to implement it, what bugs it catches, and how to integrate it into your CI/CD pipeline.

What Is Pseudo-Localization?

Pseudo-localization replaces each letter in your source strings with a visually similar accented or extended Unicode character. The result looks vaguely foreign to an English speaker but stays readable. That matters because developers need to inspect the UI, not just run automated checks.

A typical pseudo-localization transformation does three things:

  1. Character substitution: replaces ASCII letters with accented equivalents (a → à, e → ë, o → ô)
  2. String expansion: pads strings to simulate the extra length of translated text (German commonly runs 30–40% longer than English, Finnish can run 50% longer, and short strings often grow by even more)
  3. Boundary markers: wraps each string in brackets or other delimiters, so any string that was not processed (that is, hardcoded text) stands out immediately, and a missing closing marker reveals truncation

For example, the string "Submit order" becomes something like "[Šüƀmïƭ öřđëř_____]". The brackets signal the start and end of a translatable unit. The underscores represent the expansion padding. If you see "Submit order" anywhere in a pseudo-localized build, you know that string is hardcoded and will never be translated.
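The markers mean every rendered string falls into one of a few cases. It can be fully wrapped (fine), unwrapped (hardcoded), or opened but never closed (the layout clipped it). The Python sketch below sorts strings into these cases with a hypothetical helper, classify_visible_text. It assumes each rendered UI string is available as text and that the pseudo-localizer uses "[" and "]" as its only markers. Source strings that contain brackets themselves would need a different delimiter.

```python
def classify_visible_text(text: str) -> str:
    """Classify one rendered UI string from a pseudo-localized build.

    Assumes the pseudo-localizer wraps every catalog string in "[" and "]".
    """
    text = text.strip()
    if not text:
        return "empty"
    if "[" not in text and "]" not in text:
        return "hardcoded"   # never passed through the i18n layer
    if text.startswith("[") and "]" not in text:
        return "truncated"   # closing marker was cut off by the layout
    if text.startswith("[") and text.endswith("]") and text.count("[") == 1:
        return "ok"          # one complete translatable unit
    return "mixed"           # e.g. concatenated fragments or partial hardcoding


for sample in ["[Šüƀmïƭ öřđëř_____]", "Submit order", "[Šüƀmïƭ öř", "Total: [3 ïƭëmˈš___]"]:
    print(f"{sample!r}: {classify_visible_text(sample)}")
```

The four samples classify as ok, hardcoded, truncated, and mixed, in that order.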

Why Bother? The Bugs Pseudo-Localization Catches

Before diving into implementation, it helps to understand what you are actually hunting for.

Hardcoded Strings

The most obvious target. Any developer who writes <button>Submit</button> instead of <button>{t('submit')}</button> will see a plain "Submit" on that button in a pseudo-localized build. These bugs are trivially obvious once you see the raw English string surrounded by accented neighbors.

Layout and Truncation Bugs

German UI text routinely runs 30–40% longer than English. Finnish and Dutch can be even worse. A button that fits the word "Settings" will truncate the German "Einstellungen" if you have set a fixed width. Pseudo-localization's string expansion catches this without waiting for a German speaker to file a bug report.
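A flat padding ratio can undershoot for short labels. "Settings" is 8 characters and "Einstellungen" is 13, an increase of more than 60%. One option is to scale the padding by source length. The tiers in this Python sketch are an illustrative assumption, loosely modeled on commonly cited localization guidance that short strings expand proportionally more. Tune them to the languages you actually ship.

```python
import math

# Illustrative tiers (an assumption): (max source length, expansion ratio).
EXPANSION_TIERS = [(10, 1.0), (20, 0.8), (30, 0.6), (50, 0.5), (70, 0.4)]
LONG_STRING_EXPANSION = 0.3  # anything longer than the last tier


def expansion_ratio(source: str) -> float:
    """Pick a padding ratio that shrinks as the source string gets longer."""
    for max_len, ratio in EXPANSION_TIERS:
        if len(source) <= max_len:
            return ratio
    return LONG_STRING_EXPANSION


def padding_for(source: str) -> str:
    return "_" * math.ceil(len(source) * expansion_ratio(source))


word = "Settings"
flat_total = len(word) + math.ceil(len(word) * 0.4)   # flat 40% ratio
tiered_total = len(word) + len(padding_for(word))     # length-based ratio
print(flat_total, tiered_total, len("Einstellungen"))
```

With a flat 40% ratio, "Settings" gets 4 padding characters (12 in total), which is still short of the 13 characters in "Einstellungen". The tiered version pads it to 16.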

A navigation menu that reads "[Fïlë___] [Ëđïƭ___] [Vïëω___] [Hëlƥ___]" tells you immediately whether your flex layout handles longer strings gracefully or whether items start wrapping or overflowing.

Concatenated Strings

A common anti-pattern: "Hello, " + username + "! You have " + count + " messages." This works in English but fails in many other languages, because word order differs and translators never see the sentence as a whole. When pseudo-localized, the problem becomes visually obvious. Each static fragment gets its own pair of markers while the dynamic parts remain plain, producing something like "[Ĥëĺĺô, ]alice[! Ýôû ĥàv̈ë ]3[ mˈëššàĝëš.]". Automated tools can also detect concatenation patterns in source.
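This small Python sketch shows why concatenation stands out. pseudo is a deliberately minimal, hypothetical stand-in for a real pseudo-localizer: it accents only vowels and preserves {placeholders}. The two greeting functions contrast concatenated fragments with a single templated message.

```python
import re

_VOWELS = str.maketrans("aeiouAEIOU", "àëïôûÀËÏÔÛ")


def pseudo(message: str) -> str:
    """Minimal stand-in pseudo-localizer: accent vowels, keep {placeholders}, add markers."""
    parts = re.split(r"(\{[^{}]*\})", message)
    # With one capturing group, odd indices are the {placeholder} matches.
    body = "".join(p if i % 2 else p.translate(_VOWELS) for i, p in enumerate(parts))
    return f"[{body}]"


def greeting_concatenated(username: str, count: int) -> str:
    # Anti-pattern: three separately translated fragments glued together.
    return pseudo("Hello, ") + username + pseudo("! You have ") + str(count) + pseudo(" messages.")


def greeting_template(username: str, count: int) -> str:
    # One message with named placeholders, translated and reordered as a unit.
    return pseudo("Hello, {name}! You have {count} messages.").format(name=username, count=count)


print(greeting_concatenated("alice", 3))
print(greeting_template("alice", 3))
```

The first call prints [Hëllô, ]alice[! Yôû hàvë ]3[ mëssàgës.], which has three bracket pairs in one sentence. The second prints a single unit, [Hëllô, alice! Yôû hàvë 3 mëssàgës.].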

Locale-Sensitive Comparisons and Sorts

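If your code calls str.toLocaleLowerCase() or str.localeCompare() without an explicit locale, the result depends on the runtime's default locale. In Turkish, the lowercase of I is dotless ı rather than i, so case-insensitive comparisons can silently give incorrect results. Plain toLowerCase() is locale-independent, which is right for identifiers but wrong for Turkish user-facing text. Pseudo-localization does not catch this directly, but running integration tests against a pseudo-locale forces your locale-sensitive code paths to execute with a non-English locale.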
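Python shows the same trap, because str.lower() applies Unicode's default, locale-independent mapping. The turkish_lower helper below is a hypothetical minimal sketch of Turkish-aware lowering for illustration only. In production, prefer a locale-aware library (for example, ICU bindings) over hand-rolled rules.

```python
def turkish_lower(text: str) -> str:
    """Lowercase with Turkish dotted/dotless I rules (hypothetical helper, not exhaustive)."""
    # Map the two Turkish capital I's explicitly before the default lowering.
    return text.replace("I", "ı").replace("İ", "i").lower()


# str.lower() uses Unicode's default, locale-independent mapping.
print("I".lower())                 # default mapping gives dotted i
print(turkish_lower("I"))          # Turkish expects dotless ı
print(turkish_lower("DİYARBAKIR"))
print(len("İ".lower()))            # default mapping yields i + combining dot above
```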

Missing Pluralization Logic

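English has two plural forms: singular and plural. Russian has four CLDR plural categories (one, few, many, other), and Arabic has six. Suppose your code does count + " items" or branches on count === 1 instead of using a pluralization library. A pseudo-localized test run that renders several counts (say 0, 1, 2, 5, and 21) will expose this, because the suffix appears as an unmarked or separately bracketed fragment next to the number.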
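Counts like 21 and 22 are where English-style logic (count == 1) breaks. The sketch below encodes the CLDR plural rules for Russian integers, which helps when choosing test counts. It is illustrative only: production code should take plural rules from CLDR data through its i18n library (Intl.PluralRules in JavaScript, or Babel in Python) rather than hand-writing them. Non-integer counts fall into the fourth category, other, which this integer-only sketch omits.

```python
def russian_plural_category(n: int) -> str:
    """CLDR plural category for a non-negative integer count in Russian."""
    last_digit, last_two = n % 10, n % 100
    if last_digit == 1 and last_two != 11:
        return "one"      # 1, 21, 101, ...
    if 2 <= last_digit <= 4 and not 12 <= last_two <= 14:
        return "few"      # 2-4, 22-24, ...
    return "many"         # 0, 5-20, 25-30, 111-114, ...


# Counts worth exercising in any plural test matrix.
for n in (0, 1, 2, 5, 11, 21, 22, 112):
    print(n, russian_plural_category(n))
```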

Implementing Pseudo-Localization in JavaScript

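Several npm packages implement this transformation, and their names and exported functions differ. The script below assumes a package named pseudolocalization that exports a pseudolocalize(string) function. If the package you choose exposes a different name, adjust the require line accordingly, or use the custom function from the next section. Install it with: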

npm install pseudolocalization

Then transform your i18n JSON files at build time:

const { pseudolocalize } = require('pseudolocalization');
const fs = require('fs');
const path = require('path');

function transformJsonValues(obj) {
  if (typeof obj === 'string') {
    return pseudolocalize(obj);
  }
  if (Array.isArray(obj)) {
    return obj.map(transformJsonValues);
  }
  if (obj !== null && typeof obj === 'object') {
    const result = {};
    for (const [key, value] of Object.entries(obj)) {
      result[key] = transformJsonValues(value);
    }
    return result;
  }
  return obj;
}

function generatePseudoLocale(sourceFile, outputFile) {
  const source = JSON.parse(fs.readFileSync(sourceFile, 'utf8'));
  const pseudolocalized = transformJsonValues(source);
  fs.writeFileSync(outputFile, JSON.stringify(pseudolocalized, null, 2), 'utf8');
  console.log(`Generated pseudo-locale: ${outputFile}`);
}

// Usage
generatePseudoLocale(
  path.join(__dirname, 'locales/en.json'),
  path.join(__dirname, 'locales/qps-ploc.json')
);

The conventional locale code for pseudo-localization is qps-ploc (used by Microsoft) or en-XA (used by Android). Both are syntactically valid locale tags: qps falls in the ISO 639 range reserved for local use, and XA is a user-assigned region code. As a result, your i18n framework handles the pseudo-locale like any other locale without special casing.

Custom Character Map

If you want more control over the transformation — for example, to ensure markers are always present, or to control expansion ratio — you can implement the transformation yourself:

const CHAR_MAP = {
  a: 'à', b: 'ƀ', c: 'ć', d: 'đ', e: 'ë', f: 'ƒ', g: 'ĝ',
  h: 'ĥ', i: 'ï', j: 'ĵ', k: 'ķ', l: 'ĺ', m: 'mˈ', n: 'ñ',
  o: 'ô', p: 'ƥ', q: 'q̃', r: 'ř', s: 'š', t: 'ƭ', u: 'û',
  v: 'v̈', w: 'ω', x: 'x̃', y: 'ý', z: 'ž',
  A: 'À', B: 'Ɓ', C: 'Ć', D: 'Đ', E: 'Ë', F: 'Ƒ', G: 'Ĝ',
  H: 'Ĥ', I: 'Ï', J: 'Ĵ', K: 'Ķ', L: 'Ĺ', M: 'Mˈ', N: 'Ñ',
  O: 'Ô', P: 'Ƥ', Q: 'Q̃', R: 'Ř', S: 'Š', T: 'Ƭ', U: 'Û',
  V: 'V̈', W: 'Ω', X: 'X̃', Y: 'Ý', Z: 'Ž',
};

const EXPANSION_RATIO = 0.4; // 40% expansion

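function pseudolocalizeString(str) {
  // Copy ICU arguments such as {name} or {count, plural, ...} through
  // untouched by tracking brace depth; only literal text at depth 0 is
  // transformed. (ICU apostrophe quoting is not handled in this sketch.)
  let transformed = '';
  let depth = 0;
  for (const ch of str) {
    if (ch === '{') depth++;
    transformed += depth > 0 ? ch : (CHAR_MAP[ch] || ch);
    if (ch === '}' && depth > 0) depth--;
  }

  // Add expansion padding
  const padding = '_'.repeat(Math.ceil(str.length * EXPANSION_RATIO));

  return `[${transformed}${padding}]`;
}

Note the handling of ICU placeholders like {name} and {count, plural, ...}. These must be preserved exactly as-is, or your i18n library will fail to interpolate them. A regular expression such as /\{[^}]+\}/ is not enough, because plural and select arguments nest braces; the depth counter keeps the whole argument intact. The trade-off is that branch text inside a plural (the "# items" in {count, plural, one {# item} other {# items}}) stays untransformed. If you need that text pseudo-localized too, parse messages with a full ICU MessageFormat parser instead.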

Python Implementation for Backend String Files

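If your backend uses Python with a JSON catalog, the same transformation works as a standalone script. For .po files, read and write the catalog with a library such as polib and set each entry's msgstr to the pseudo-localized msgid: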

import json
import re
from pathlib import Path

CHAR_MAP = {
    'a': 'à', 'b': 'ƀ', 'c': 'ć', 'd': 'đ', 'e': 'ë', 'f': 'ƒ',
    'g': 'ĝ', 'h': 'ĥ', 'i': 'ï', 'j': 'ĵ', 'k': 'ķ', 'l': 'ĺ',
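    'm': 'mˈ', 'n': 'ñ', 'o': 'ô', 'p': 'ƥ', 'q': 'q̃', 'r': 'ř',
    's': 'š', 't': 'ƭ', 'u': 'û', 'v': 'v̈', 'w': 'ω', 'x': 'x̃',
    'y': 'ý', 'z': 'ž',
}
# Derive uppercase entries from the lowercase map (e.g. 'à' -> 'À').
CHAR_MAP.update({k.upper(): v.upper() for k, v in CHAR_MAP.items()})

# Placeholders copied through verbatim: {name} / {0} (str.format and simple ICU
# arguments) and printf-style %(name)s, %(count)d, %s, %d. Nested ICU plural or
# select arguments need a real MessageFormat parser and are not handled here.
PLACEHOLDER_PATTERN = re.compile(r'(\{[^{}]*\}|%(?:\([^)]+\))?[sdifr])')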

def pseudolocalize(text: str, expansion: float = 0.4) -> str:
    """Transform a string for pseudo-localization, preserving placeholders."""
    parts = PLACEHOLDER_PATTERN.split(text)
    transformed_parts = []
    
    for i, part in enumerate(parts):
        if i % 2 == 1:
            # This is a placeholder match — preserve it
            transformed_parts.append(part)
        else:
            transformed_parts.append(''.join(CHAR_MAP.get(c, c) for c in part))
    
    transformed = ''.join(transformed_parts)
    padding = '_' * int(len(text) * expansion)
    return f'[{transformed}{padding}]'


def pseudolocalize_json(source_path: Path, output_path: Path) -> None:
    """Generate a pseudo-localized version of a JSON translation file."""
    with source_path.open(encoding='utf-8') as f:
        data = json.load(f)
    
    def transform(obj):
        if isinstance(obj, str):
            return pseudolocalize(obj)
        if isinstance(obj, dict):
            return {k: transform(v) for k, v in obj.items()}
        if isinstance(obj, list):
            return [transform(item) for item in obj]
        return obj
    
    result = transform(data)
    output_path.parent.mkdir(parents=True, exist_ok=True)
    
    with output_path.open('w', encoding='utf-8') as f:
        json.dump(result, f, ensure_ascii=False, indent=2)
    
    print(f'Pseudo-locale written to {output_path}')


if __name__ == '__main__':
    pseudolocalize_json(
        Path('locales/en.json'),
        Path('locales/qps-ploc.json'),
    )
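A cheap extra guard is to verify that the generated pseudo-locale contains every key from the source and keeps each placeholder intact, because a mangled {name} only fails at runtime. check_placeholder_parity is a hypothetical helper for nested JSON catalogs. Its placeholder pattern is an assumption that covers {name}-style and %(name)s-style arguments only, and JSON arrays are not inspected.

```python
import re

# Assumed placeholder syntax: {name}/{0} and printf-style %(name)s, %s, %d.
PLACEHOLDER = re.compile(r"\{[^{}]*\}|%(?:\([^)]+\))?[sdifr]")


def placeholders(text: str) -> list:
    return sorted(PLACEHOLDER.findall(text))


def check_placeholder_parity(source: dict, pseudo: dict, path: str = "") -> list:
    """Return a list of problems; an empty list means the catalogs agree."""
    problems = []
    for key, value in source.items():
        where = f"{path}.{key}" if path else key
        if key not in pseudo:
            problems.append(f"{where}: missing in pseudo-locale")
        elif isinstance(value, dict) and isinstance(pseudo[key], dict):
            problems.extend(check_placeholder_parity(value, pseudo[key], where))
        elif isinstance(value, str) and isinstance(pseudo[key], str):
            if placeholders(value) != placeholders(pseudo[key]):
                problems.append(f"{where}: placeholders differ")
        elif type(value) is not type(pseudo[key]):
            problems.append(f"{where}: value type differs")
    return problems


source_catalog = {"greet": "Hello, {name}!", "inbox": {"count": "%(count)d messages"}}
pseudo_catalog = {"greet": "[Ĥëĺĺô, {ñàmˈë}!_____]", "inbox": {}}
for problem in check_placeholder_parity(source_catalog, pseudo_catalog):
    print(problem)
```

On these sample catalogs, it reports greet: placeholders differ and inbox.count: missing in pseudo-locale. In CI, a non-empty result can fail the build.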

Platform-Specific Tools

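Android has built-in pseudo-localization support. Enable it for debug builds in your module's build.gradle (in the Kotlin DSL, the property is isPseudoLocalesEnabled = true):

android {
    buildTypes {
        debug {
            pseudoLocalesEnabled true
        }
    }
}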

This generates two pseudo-locales: en-XA (Latin with accents and expansion) and ar-XB (bidirectional text, simulating RTL languages). You get pseudo-localization for free in debug builds.

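iOS has built-in support too: in Xcode's scheme editor (Run > Options > App Language), you can select pseudolanguages such as the Double-Length and Right-to-Left pseudolanguages. If you want your own transformation instead, add a custom qps-ploc.lproj directory alongside your other .lproj directories and populate it using a script similar to the one above.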

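On Android, the pseudo-locales are generated at build time by the aapt2 resource compiler, which the Gradle setting above switches on. Standalone pseudo-localization tools also exist (Google publishes an open-source one written in Java). Check the documentation of the tool you pick for supported file formats and options, and treat the following invocation as illustrative rather than a documented interface:

java -jar pseudolocalize.jar --input res/values/ --output res/values-qps-ploc/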

Integrating Into CI/CD

The most effective approach is to generate the pseudo-locale as part of your build and run your existing end-to-end test suite against it. This does not require writing new tests — just point your existing tests at the pseudo-locale.

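# .github/workflows/i18n-check.yml
name: i18n Pseudo-Localization Check

on: [pull_request]

jobs:
  pseudo-l10n:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'

      - name: Install dependencies
        run: npm ci

      - name: Install Playwright browsers
        run: npx playwright install --with-deps chromium

      - name: Generate pseudo-locale
        run: node scripts/generate-pseudo-locale.js

      # Assumes package.json defines "build" and "start" scripts that serve
      # the app on port 3000; adjust to however your app is served.
      - name: Build and start app
        run: |
          npm run build
          npm start &
          npx wait-on http://localhost:3000

      # Assumes playwright.config defines a project named "pseudo-l10n"
      # whose tests read LOCALE to select the pseudo-locale.
      - name: Run E2E tests with pseudo-locale
        run: npx playwright test --project=pseudo-l10n
        env:
          LOCALE: qps-ploc
          BASE_URL: http://localhost:3000

      - name: Check for untransformed strings
        run: node scripts/check-hardcoded-strings.js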

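The check-hardcoded-strings.js script can scrape your app with Playwright in headless mode and look for text nodes that contain only ASCII characters. Every pseudo-localized string with letters in it contains accented characters, so an all-ASCII text node strongly suggests that the string bypassed your i18n system. Brand names, user-generated content, and code samples will still need an allowlist.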

Automated Detection of Hardcoded Strings

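const { chromium } = require('@playwright/test');

// Tags whose text content is not user-visible UI copy.
const SKIP_TAGS = ['SCRIPT', 'STYLE', 'NOSCRIPT', 'CODE', 'PRE'];

async function findHardcodedStrings(url) {
  const browser = await chromium.launch();
  try {
    // Locale is a browser-context option (it sets navigator.language and the
    // Accept-Language header), not a page.goto option. If your app picks its
    // locale from a query parameter or cookie instead, set that here.
    const context = await browser.newContext({ locale: 'qps-ploc' });
    const page = await context.newPage();
    await page.goto(url);

    // Find visible text nodes that are pure printable ASCII and contain a run
    // of letters. Pseudo-localized text with letters always has accented
    // characters, so these were not processed by the i18n system.
    const hardcoded = await page.evaluate((skipTags) => {
      const skip = new Set(skipTags);
      const results = [];
      const walker = document.createTreeWalker(document.body, NodeFilter.SHOW_TEXT);

      let node;
      while ((node = walker.nextNode())) {
        const parent = node.parentElement;
        if (!parent || skip.has(parent.tagName)) continue;
        const text = node.textContent.trim();
        // Skip whitespace-only text, bare numbers, and punctuation
        if (/^[\x20-\x7E]+$/.test(text) && /[A-Za-z]{2,}/.test(text)) {
          results.push({
            text,
            parent: parent.tagName,
            id: parent.id,
            cls: parent.getAttribute('class') || '',
          });
        }
      }
      return results;
    }, SKIP_TAGS);

    if (hardcoded.length > 0) {
      console.error('Hardcoded strings found (not routed through i18n):');
      for (const { text, parent, id, cls } of hardcoded) {
        console.error(`  "${text}" in <${parent}#${id}.${cls}>`);
      }
      process.exitCode = 1;
      return;
    }

    console.log('No hardcoded strings detected.');
  } finally {
    await browser.close();
  }
}

findHardcodedStrings(process.env.BASE_URL || 'http://localhost:3000').catch((err) => {
  console.error(err);
  process.exitCode = 1;
});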

What Pseudo-Localization Does Not Catch

It is worth being clear about the limits. Pseudo-localization catches structural and layout bugs, but it does not validate actual translations. You need native speakers for:

  • Tone and formality — many languages have formal and informal registers that English does not distinguish
  • Gender agreement — Romance and Slavic languages require gender agreement between nouns, adjectives, and verbs
  • Cultural appropriateness — icons, colors, and idioms that work in one culture may be confusing or offensive in another
  • Right-to-left layout bugs — while Android's ar-XB pseudo-locale helps, full RTL testing requires a real Arabic or Hebrew build

Think of pseudo-localization as your first line of defense — an automated, always-on check that catches the mechanical bugs early, so human translators can focus on the nuanced linguistic problems that automation cannot detect.

Practical Checklist

Before shipping any feature that touches UI strings:

  • Pseudo-locale file generated and up to date
  • App renders correctly in pseudo-locale (no overflow, no truncation)
  • No hardcoded strings detected by automated scan
  • All string concatenations replaced with ICU MessageFormat patterns
  • Plural forms handled via i18n library, not conditional logic
  • Boundary markers [...] visible and complete on all strings
  • CI pipeline runs E2E tests against pseudo-locale on every PR

Pseudo-localization is cheap to set up, free to run in CI, and consistently finds bugs that would otherwise surface only after strings reach human translators — at which point they are far more expensive to fix. Add it to your workflow once and it pays dividends on every feature you ship.
