Protobuf Testing: Validating Schemas & Serialization

Protocol Buffers (Protobuf) is not just a serialization format — it's a schema system. When you change a .proto file, you're changing a contract that potentially affects every service that uses it. Testing Protobuf schemas means validating not just that serialization works, but that your changes don't break consumers across language boundaries and service versions.

What Needs Testing in Protobuf

Before jumping to code, clarify what can go wrong:

  1. Serialization correctness — does the message serialize and deserialize without data loss?
  2. Backward compatibility — can the new schema read messages written by the old schema?
  3. Forward compatibility — can the old schema read messages written by the new schema?
  4. Validation — are invalid messages rejected before reaching business logic?
  5. Cross-language compatibility — does a Go-serialized message deserialize correctly in Python?
  6. Default values — are unset fields handled consistently?

Testing Serialization Roundtrips

The most basic test: serialize a message, deserialize it, assert equality.

Python

import unittest
from google.protobuf import json_format
import user_pb2

class TestUserSerialization(unittest.TestCase):
    def test_serialize_deserialize_preserves_all_fields(self):
        original = user_pb2.User(
            user_id="user-123",
            name="Alice",
            email="alice@example.com",
            age=30,
            tags=["admin", "beta-tester"],
            metadata={"plan": "pro", "region": "us-east-1"},
        )
        
        # Serialize to binary
        serialized = original.SerializeToString()
        
        # Deserialize
        restored = user_pb2.User()
        restored.ParseFromString(serialized)
        
        self.assertEqual(restored.user_id, "user-123")
        self.assertEqual(restored.name, "Alice")
        self.assertEqual(restored.email, "alice@example.com")
        self.assertEqual(restored.age, 30)
        self.assertEqual(list(restored.tags), ["admin", "beta-tester"])
        self.assertEqual(dict(restored.metadata), {"plan": "pro", "region": "us-east-1"})
    
    def test_json_serialization_roundtrip(self):
        original = user_pb2.User(user_id="user-456", name="Bob")
        
        # To JSON
        json_str = json_format.MessageToJson(original)
        
        # From JSON
        restored = json_format.Parse(json_str, user_pb2.User())
        
        self.assertEqual(restored.user_id, "user-456")
        self.assertEqual(restored.name, "Bob")
    
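    def test_empty_string_vs_unset_field(self):
        """Verify proto3 default-value behavior for a plain (non-optional) string field."""
        msg = user_pb2.User()
        
        # In proto3, string fields default to empty string
        self.assertEqual(msg.name, "")
        
        # Default values are never written to the wire, so an explicit ""
        # is indistinguishable from an unset field after serialization
        self.assertEqual(user_pb2.User(name="").SerializeToString(), b"")
        
        # HasField() only works for fields with presence tracking (`optional`
        # scalars, message fields, oneof members); on a plain proto3 scalar
        # it raises ValueError
        with self.assertRaises(ValueError):
            msg.HasField("name")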
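The default-value test above comes down to one wire-format rule: in proto3, a scalar field without `optional` is written only when it differs from its default. Below is a stdlib-only sketch of that rule. It is a hand-rolled encoder for illustration, not the protobuf runtime, and the field numbers (1 = user_id, 2 = name, 4 = age) are hypothetical.

```python
# Stdlib-only sketch of proto3 encoding for implicit-presence scalar fields.
# Field numbers are hypothetical; real ones come from your .proto file.

WIRE_VARINT = 0  # int32, int64, bool, enum
WIRE_LEN = 2     # string, bytes, embedded messages


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("negative ints need 10-byte two's complement; not covered")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_tag(field_number: int, wire_type: int) -> bytes:
    return encode_varint((field_number << 3) | wire_type)


def encode_user(user_id: str = "", name: str = "", age: int = 0) -> bytes:
    """Encode fields in field-number order, skipping proto3 default values."""
    out = bytearray()
    for number, text in ((1, user_id), (2, name)):
        if text:  # "" is the default and is never written
            data = text.encode("utf-8")
            out += encode_tag(number, WIRE_LEN) + encode_varint(len(data)) + data
    if age:  # 0 is the default and is never written
        out += encode_tag(4, WIRE_VARINT) + encode_varint(age)
    return bytes(out)


if __name__ == "__main__":
    print(encode_user(name="") == encode_user())
    print(encode_user(name="Alice").hex())
```

Running it prints `True`, then `1205416c696365`: tag byte `0x12` (field 2, wire type 2), length `0x05`, then the UTF-8 bytes of "Alice".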

Go

package proto_test

import (
    "testing"
    
    "github.com/stretchr/testify/assert"
    "github.com/stretchr/testify/require"
    "google.golang.org/protobuf/proto"
    
    pb "github.com/your-org/user/pb"
)

func TestUserSerialization(t *testing.T) {
    original := &pb.User{
        UserId: "user-123",
        Name:   "Alice",
        Email:  "alice@example.com",
        Age:    30,
        Tags:   []string{"admin", "beta-tester"},
    }
    
    // Serialize
    data, err := proto.Marshal(original)
    require.NoError(t, err)
    assert.NotEmpty(t, data)
    
    // Deserialize
    restored := &pb.User{}
    err = proto.Unmarshal(data, restored)
    require.NoError(t, err)
    
    // Compare using proto.Equal (not reflect.DeepEqual)
    assert.True(t, proto.Equal(original, restored))
}

func TestProtoEqual_HandlesDefaultValues(t *testing.T) {
    // Two messages with different construction but same logical value
    msg1 := &pb.User{Name: "Alice", Age: 0} // Age explicitly 0
    msg2 := &pb.User{Name: "Alice"}          // Age unset (defaults to 0)
    
    // proto.Equal handles this correctly
    assert.True(t, proto.Equal(msg1, msg2))
    
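    // reflect.DeepEqual happens to agree here, but it also compares the
    // generated structs' unexported internal state (such as sizes cached
    // during Marshal), so it can report false mismatches. proto.Equal
    // compares only protobuf field values.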
}

Java

import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.*;
import com.google.protobuf.InvalidProtocolBufferException;
import com.google.protobuf.util.JsonFormat;  // from the protobuf-java-util artifact
import com.example.UserProto.User;

class UserSerializationTest {
    
    @Test
    void serializeDeserialize_preservesAllFields() throws InvalidProtocolBufferException {
        User original = User.newBuilder()
            .setUserId("user-123")
            .setName("Alice")
            .setEmail("alice@example.com")
            .setAge(30)
            .addTags("admin")
            .addTags("beta-tester")
            .build();
        
        byte[] serialized = original.toByteArray();
        User restored = User.parseFrom(serialized);
        
        assertEquals(original, restored);
        assertEquals("user-123", restored.getUserId());
        assertEquals(2, restored.getTagsCount());
    }
    
    @Test
    void jsonSerialization_roundtrip() throws Exception {
        User original = User.newBuilder()
            .setUserId("user-456")
            .setName("Bob")
            .build();
        
        String json = JsonFormat.printer().print(original);
        
        User.Builder builder = User.newBuilder();
        JsonFormat.parser().merge(json, builder);
        User restored = builder.build();
        
        assertEquals(original, restored);
    }
}

Testing Backward Compatibility

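Proto3 compatibility rules in brief: adding a new field with a previously unused field number is safe; changing an existing field's number, or (with a few documented exceptions) its type, is not. Removing a field is only safe if you mark its number and name as `reserved` so they can never be reused with a different meaning.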
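Why is adding a field safe? Every field on the wire starts with a tag encoding its field number and wire type, and the wire type alone tells a reader how many bytes to skip, so an old reader can step over fields it has never heard of. The stdlib-only sketch below models such a reader. It is an illustration, not the protobuf runtime; it handles only the varint and length-delimited wire types and assumes hypothetical field numbers 1 = user_id, 2 = name, 3 = phone.

```python
# Sketch of how a protobuf reader tolerates fields it does not know.
# Handles wire types 0 (varint) and 2 (length-delimited) only.


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, new_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode(buf: bytes, known: dict[int, str]) -> tuple[dict[str, object], list[int]]:
    """Return (known fields by name, numbers of unknown fields that were skipped)."""
    fields, unknown, pos = {}, [], 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        number, wire_type = tag >> 3, tag & 0x7
        if wire_type == 0:
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if number in known:
            fields[known[number]] = value
        else:
            # Skipped here; real runtimes also keep the raw bytes so they
            # can be written back out on re-serialization
            unknown.append(number)
    return fields, unknown


# Bytes a "v2" writer might produce: user_id (1), name (2), plus new phone (3)
V2_BYTES = b"\x0a\x02u1" + b"\x12\x03Bob" + b"\x1a\x0b+1-555-0100"
OLD_READER = {1: "user_id", 2: "name"}

if __name__ == "__main__":
    print(decode(V2_BYTES, OLD_READER))
```

Decoding `V2_BYTES` with `OLD_READER` returns `({'user_id': b'u1', 'name': b'Bob'}, [3])`: the old reader keeps its two fields and steps cleanly over field 3.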

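# user_v1_pb2 and user_v2_pb2 are generated from the old and new versions of
# user.proto. Give the two copies different proto package names: two files
# defining the same fully qualified message (e.g. user.User) can collide in
# Python's default descriptor pool and fail at import time.
def test_new_schema_reads_old_message(self):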
    """
    Simulate: producer uses old schema (without 'phone' field),
    consumer uses new schema (with 'phone' field added).
    The consumer should read the message successfully, with phone defaulting.
    """
    # Old message — serialized without phone field
    old_message = user_v1_pb2.User(
        user_id="user-123",
        name="Alice",
    )
    serialized = old_message.SerializeToString()
    
    # New schema — includes optional phone field
    new_message = user_v2_pb2.User()
    new_message.ParseFromString(serialized)
    
    # Fields present in old schema are preserved
    self.assertEqual(new_message.user_id, "user-123")
    self.assertEqual(new_message.name, "Alice")
    
    # New field defaults to empty string (proto3 default)
    self.assertEqual(new_message.phone, "")

def test_old_schema_reads_new_message(self):
    """Forward compatibility: old reader ignores unknown fields."""
    new_message = user_v2_pb2.User(
        user_id="user-456",
        name="Bob",
        phone="+1-555-0100",  # field unknown to old schema
    )
    serialized = new_message.SerializeToString()
    
    old_message = user_v1_pb2.User()
    old_message.ParseFromString(serialized)
    
    # Known fields preserved
    self.assertEqual(old_message.user_id, "user-456")
    self.assertEqual(old_message.name, "Bob")
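    # 'phone' is not accessible through the old schema, but it is not discarded:
    # since protobuf 3.5, proto3 messages keep unknown fields and write them back
    # out when re-serialized, so the value survives a pass through an old service
    reserialized = user_v2_pb2.User()
    reserialized.ParseFromString(old_message.SerializeToString())
    self.assertEqual(reserialized.phone, "+1-555-0100")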

buf is a widely used CLI for Protobuf linting and breaking-change detection. Add it to CI to catch compatibility issues before they reach production.

Install:

brew install bufbuild/buf/buf  # macOS
# or download from https://github.com/bufbuild/buf/releases

buf.yaml:

version: v1
breaking:
  use:
    - FILE
lint:
  use:
    - DEFAULT

Check for breaking changes against main branch:

# Compare the working tree against the main branch of the local git repo
buf breaking --against '.git#branch=main'

# Compare against a module published to the Buf Schema Registry
buf breaking --against 'buf.build/your-org/your-schemas'

Add to GitHub Actions:

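- uses: actions/checkout@v4

# buf-lint-action and buf-breaking-action expect the buf CLI on the runner
- uses: bufbuild/buf-setup-action@v1

- name: Protobuf lint
  uses: bufbuild/buf-lint-action@v1

- name: Check breaking changes
  uses: bufbuild/buf-breaking-action@v1
  with:
    against: 'https://github.com/${{ github.repository }}.git#branch=main'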
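To build intuition for what `buf breaking` reports, here is a deliberately tiny, hypothetical checker in plain Python. It models each schema version as `{field_number: (name, type)}` plus a set of reserved numbers and flags a few wire-level hazards. It is not buf's implementation: buf's rule set is much larger, and the `FILE` category configured above is stricter than this sketch (it treats deleting a field as breaking even when the number is reserved).

```python
# Hypothetical mini compatibility checker (illustration only, not buf's rules).
# Each schema version: {field_number: (field_name, field_type)} + reserved numbers.
from dataclasses import dataclass, field


@dataclass
class Schema:
    fields: dict[int, tuple[str, str]]
    reserved: set[int] = field(default_factory=set)


def breaking_changes(old: Schema, new: Schema) -> list[str]:
    """List changes that could break readers or writers of the old schema."""
    problems = []
    for number, (name, ftype) in sorted(old.fields.items()):
        if number not in new.fields:
            # Without a reservation, the number could later be reused with a
            # different meaning and old data would be misread
            if number not in new.reserved:
                problems.append(f"field {number} ({name}) deleted without reserving its number")
            continue
        new_name, new_type = new.fields[number]
        if new_type != ftype:
            problems.append(f"field {number} ({name}) changed type {ftype} -> {new_type}")
        if new_name != name:
            # Binary encoding is unaffected; JSON output and generated code are not
            problems.append(f"field {number} renamed {name} -> {new_name}")
    for number in sorted(new.fields.keys() & old.reserved):
        problems.append(f"field {number} reuses a reserved number")
    return problems


V1 = Schema({1: ("user_id", "string"), 2: ("name", "string"), 3: ("age", "int32")})

if __name__ == "__main__":
    print(breaking_changes(V1, Schema({**V1.fields, 4: ("phone", "string")})))
    print(breaking_changes(V1, Schema({1: ("user_id", "string"), 3: ("age", "string")})))
```

With `V1` as the baseline, adding `phone` as field 4 yields an empty list, while dropping `name` and retyping `age` yields two findings; passing `reserved={2}` in the new schema silences the deletion finding.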

Validating Proto Messages

Protobuf does not validate field values: any value of the right type parses successfully. For field-level rules, use protovalidate or its predecessor, protoc-gen-validate (PGV):

With protovalidate:

import "buf/validate/validate.proto";

message CreateUserRequest {
  string name = 1 [(buf.validate.field).string = {
    min_len: 1,
    max_len: 100,
  }];
  
  string email = 2 [(buf.validate.field).string.email = true];
  
  int32 age = 3 [(buf.validate.field).int32 = {
    gte: 0,
    lte: 150,
  }];
}

Test the validation:

import protovalidate  # pip install protovalidate

def test_valid_user_request_passes_validation(self):
    request = user_pb2.CreateUserRequest(
        name="Alice",
        email="alice@example.com",
        age=30,
    )
    
    # validate() returns None for a valid message and raises ValidationError otherwise
    protovalidate.validate(request)

def test_invalid_email_fails_validation(self):
    request = user_pb2.CreateUserRequest(
        name="Alice",
        email="not-an-email",
        age=30,
    )
    
    with self.assertRaises(protovalidate.ValidationError) as ctx:
        protovalidate.validate(request)
    
    # ctx.exception.violations lists each failed rule. How the field path is
    # exposed on a violation has changed between protovalidate releases, so
    # check your installed version before asserting that "email" is flagged.

def test_empty_name_fails_validation(self):
    request = user_pb2.CreateUserRequest(
        name="",  # violates min_len: 1
        email="alice@example.com",
    )
    
    with self.assertRaises(protovalidate.ValidationError):
        protovalidate.validate(request)
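The tests above check one bad value per field. Rules like `min_len`, `max_len`, `gte`, and `lte` are where off-by-one mistakes hide, so it is worth testing the values on and just past each boundary. The table-driven sketch below uses a hypothetical plain-Python `check_request` that mirrors the `CreateUserRequest` rules so it runs without generated code; its email check is a crude stand-in, not protovalidate's. In a real suite you would build the message and call protovalidate instead.

```python
# Table-driven boundary tests. check_request is a hypothetical stand-in that
# mirrors the CreateUserRequest rules; swap in protovalidate for real messages.
import unittest


def check_request(name: str, email: str, age: int) -> list[str]:
    """Return the names of fields that violate the rules."""
    bad = []
    if not 1 <= len(name) <= 100:
        bad.append("name")
    if "@" not in email:  # crude placeholder for the string.email rule
        bad.append("email")
    if not 0 <= age <= 150:
        bad.append("age")
    return bad


VALID = {"name": "Alice", "email": "alice@example.com", "age": 30}

# (field, value, should_pass): values sit on and just outside each boundary
BOUNDARY_CASES = [
    ("name", "", False),
    ("name", "a", True),
    ("name", "a" * 100, True),
    ("name", "a" * 101, False),
    ("age", -1, False),
    ("age", 0, True),
    ("age", 150, True),
    ("age", 151, False),
]


class BoundaryTest(unittest.TestCase):
    def test_boundaries(self):
        for field_name, value, should_pass in BOUNDARY_CASES:
            with self.subTest(field=field_name, value=value):
                bad = check_request(**{**VALID, field_name: value})
                self.assertEqual(field_name not in bad, should_pass)
```

Run it with `python -m unittest`; `subTest` reports each failing boundary separately instead of stopping at the first one.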

Cross-Language Compatibility Testing

If your services span multiple languages, test that a message serialized in one language is readable in another:

# test_cross_language.py
# Strategy: serialize in Python, write bytes to a temp file, read it in a Go test
import os
import subprocess
import tempfile

def test_python_serialized_message_readable_by_go(self):
    user = user_pb2.User(
        user_id="cross-lang-test",
        name="CrossLang User",
        email="cross@example.com",
    )
    
    with tempfile.NamedTemporaryFile(suffix='.bin', delete=False) as f:
        f.write(user.SerializeToString())
        temp_path = f.name
    
    try:
        # Run the Go test that reads this file. The path travels in an
        # environment variable (PROTO_TEST_FILE, our own name) so other test
        # packages under ./proto/... are unaffected; -count=1 skips the cache.
        result = subprocess.run(
            ['go', 'test', '-count=1', '-run', 'TestReadPythonSerializedMessage',
             './proto/...'],
            capture_output=True, text=True,
            cwd='/path/to/go-service',
            env={**os.environ, 'PROTO_TEST_FILE': temp_path},
        )
        self.assertEqual(result.returncode, 0, result.stdout + result.stderr)
    finally:
        os.unlink(temp_path)

// Go test that reads the Python-serialized binary
package proto_test

import (
    "os"
    "testing"

    "github.com/stretchr/testify/assert"
    "github.com/stretchr/testify/require"
    "google.golang.org/protobuf/proto"

    pb "github.com/your-org/user/pb"
)

func TestReadPythonSerializedMessage(t *testing.T) {
    filePath := os.Getenv("PROTO_TEST_FILE")
    if filePath == "" {
        t.Skip("PROTO_TEST_FILE not set; run via the Python cross-language test")
    }
    
    data, err := os.ReadFile(filePath)
    require.NoError(t, err)
    
    user := &pb.User{}
    err = proto.Unmarshal(data, user)
    require.NoError(t, err)
    
    assert.Equal(t, "cross-lang-test", user.UserId)
    assert.Equal(t, "CrossLang User", user.Name)
    assert.Equal(t, "cross@example.com", user.Email)
}
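Shelling out from one test runner to another works, but it ties both toolchains into every test run. A common alternative is a golden fixture: commit the expected wire bytes (for example as hex) and have each language's test serialize the same message and compare, or parse the fixture and check field values. The stdlib-only sketch below hand-encodes the cross-language `User`, assuming hypothetical field numbers 1 = user_id, 2 = name, 3 = email. One caveat: protobuf serialization is not canonical across implementations (map ordering and unknown-field placement can differ), so byte comparison is only reliable for simple messages like this one; parse-and-compare is the safer default.

```python
# Golden-fixture sketch. In practice GOLDEN_HEX would live in a shared file
# (e.g. a hypothetical testdata/user.hex) read by every language's tests.
# Field numbers are hypothetical: 1 = user_id, 2 = name, 3 = email.

GOLDEN_HEX = (
    "0a0f63726f73732d6c616e672d74657374"      # 1: "cross-lang-test"
    "120e43726f73734c616e672055736572"        # 2: "CrossLang User"
    "1a1163726f7373406578616d706c652e636f6d"  # 3: "cross@example.com"
)


def encode_string_field(number: int, text: str) -> bytes:
    """Length-delimited field; tag and length fit in one byte for this sketch."""
    data = text.encode("utf-8")
    if number >= 16 or len(data) >= 128:
        raise ValueError("sketch only handles field numbers < 16 and short strings")
    return bytes([(number << 3) | 2, len(data)]) + data


def encode_user(user_id: str, name: str, email: str) -> bytes:
    # Written in field-number order, as protobuf runtimes typically do
    return (encode_string_field(1, user_id)
            + encode_string_field(2, name)
            + encode_string_field(3, email))


def test_matches_golden_fixture():
    produced = encode_user("cross-lang-test", "CrossLang User", "cross@example.com")
    assert produced == bytes.fromhex(GOLDEN_HEX), produced.hex()
```

In the real suites, `produced` would come from `SerializeToString()` in Python and `proto.Marshal` in Go, so every language is checked against the same committed bytes without invoking the others.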

Testing Oneof Fields

def test_payment_method_oneof(self):
    # Only one payment method should be set
    order = order_pb2.Order(
        order_id="order-1",
        credit_card=order_pb2.CreditCard(
            number="4242424242424242",
            expiry="12/28",
        )
    )
    
    # Check which oneof is set
    self.assertEqual(order.WhichOneof('payment_method'), 'credit_card')
    self.assertEqual(order.credit_card.number, "4242424242424242")
    
    # Setting another oneof field clears the first
    order.bank_transfer.CopyFrom(order_pb2.BankTransfer(account="DE89..."))
    self.assertEqual(order.WhichOneof('payment_method'), 'bank_transfer')
    self.assertEqual(order.credit_card.number, "")  # cleared

Regression Testing for Schema Changes

Create a golden file test to catch unexpected schema changes:

import hashlib
import json

def test_user_proto_descriptor_unchanged(self):
    """Detect unintended proto schema changes."""
    descriptor = user_pb2.User.DESCRIPTOR
    
    schema_info = {
        'full_name': descriptor.full_name,
        'fields': [
            {
                'name': f.name,
                'number': f.number,
                'type': f.type,
                'label': f.label,
            }
            for f in descriptor.fields
        ]
    }
    
    schema_json = json.dumps(schema_info, sort_keys=True)
    schema_hash = hashlib.sha256(schema_json.encode()).hexdigest()
    
    # Update this hash intentionally when schema changes are approved
    EXPECTED_HASH = "abc123..."
    
    self.assertEqual(
        schema_hash, EXPECTED_HASH,
        "User proto schema changed unexpectedly. If intentional, "
        "update EXPECTED_HASH and verify backward compatibility."
    )
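A hash tells you that the schema changed but not what changed, and because `descriptor.fields` follows declaration order, merely reordering fields in the .proto file also changes it. One variation is to commit the extracted field list itself as a JSON golden file, sorted by field number, and diff against it on failure. The sketch below works on plain `(name, number, type)` tuples so it runs without generated code; in the real test you would build the tuples from `user_pb2.User.DESCRIPTOR.fields`. The type strings and the `user.User` name are illustrative.

```python
# Snapshot-style golden test: store a readable schema summary, not a hash.
import json


def schema_snapshot(full_name: str, fields: list[tuple[str, int, str]]) -> str:
    """Serialize fields sorted by number so declaration order doesn't matter."""
    entries = [{"name": n, "number": num, "type": t}
               for n, num, t in sorted(fields, key=lambda f: f[1])]
    return json.dumps({"full_name": full_name, "fields": entries},
                      indent=2, sort_keys=True)


def diff_fields(golden: str, current: str) -> list[str]:
    """Describe field-level differences between two snapshots."""
    old = {f["number"]: f for f in json.loads(golden)["fields"]}
    new = {f["number"]: f for f in json.loads(current)["fields"]}
    changes = []
    for number in sorted(old.keys() | new.keys()):
        if old.get(number) != new.get(number):
            changes.append(f"field {number}: {old.get(number)} -> {new.get(number)}")
    return changes


GOLDEN = schema_snapshot("user.User", [("user_id", 1, "string"), ("name", 2, "string")])

if __name__ == "__main__":
    reordered = schema_snapshot("user.User", [("name", 2, "string"), ("user_id", 1, "string")])
    print(reordered == GOLDEN)  # True: declaration order is ignored
    retyped = schema_snapshot("user.User", [("user_id", 1, "bytes"), ("name", 2, "string")])
    print(diff_fields(GOLDEN, retyped))
```

Because the snapshot is a readable file, an intentional schema change shows up in code review as a diff of the affected field rather than as a new opaque hash.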

Monitoring Protobuf Services

gRPC services built on Protobuf schemas often expose metrics and health endpoints. HelpMeTest can check these endpoints at 5-minute intervals:

curl -fsSL https://helpmetest.com/install | bash
helpmetest health "user-grpc-service" "5m"

Schema changes that break clients surface as service errors, so continuous monitoring can catch them before users report them.

Summary

Test Type                   What It Catches                                   Tool
Serialization roundtrip     Data loss in encode/decode                        unittest, testify
Backward/forward compat     Readers failing on another version's messages     Python/Go roundtrip tests
Breaking change detection   Removed/renumbered fields, changed types          buf CLI in CI
Field validation            Invalid data reaching business logic              protovalidate
Cross-language compat       Binary format consistency                         Shared binary files
Golden file tests           Unintended schema drift                           Hash comparison

Protobuf schemas are contracts. Test them like contracts — with explicit compatibility assertions, not just "does it serialize."
