Testing Anthropic Computer Use API: UI Actions, Screenshot Assertions, and Agent Loops
Anthropic's Computer Use API lets Claude control a computer (clicking, typing, scrolling, and taking screenshots) to complete tasks autonomously. Testing a Computer Use integration is harder than testing an ordinary API client, because the system under test is an agent that drives a real or simulated UI. This guide covers testing strategies for Computer Use agent loops, tool call validation, and screenshot-based assertions.
Computer Use Architecture Overview
A Computer Use agent loop works like this:
- Send a task prompt to Claude with computer tools enabled
- Claude returns a tool_use block (e.g., screenshot, click, type)
- Your code executes the tool action on the real or virtual desktop
- You send the tool result (screenshot, success status) back to Claude
- Repeat until Claude responds with no tool calls (task complete)
Testing this system means testing each layer: the tool parsing, the tool execution, the loop control logic, and the agent's decision quality.
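The steps above can be sketched as a small loop. This is a minimal illustration rather than the SDK's API: `runAgentLoop`, `SendFn`, `ExecuteFn`, and the simplified message types are hypothetical names, and the model call and tool executor are injected as plain functions so the loop can run without a network or a desktop.

```typescript
// Simplified, hypothetical shapes; the real SDK types are richer.
type ToolUseBlock = { type: 'tool_use'; id: string; name: string; input: Record<string, unknown> };
type TextBlock = { type: 'text'; text: string };
type AssistantMessage = { content: Array<ToolUseBlock | TextBlock>; stop_reason: string };
type ToolResult = { type: 'tool_result'; tool_use_id: string; content: unknown };
type Turn = { role: 'user' | 'assistant'; content: unknown };

// Injected dependencies: one calls the model, the other performs the UI action.
type SendFn = (history: Turn[]) => Promise<AssistantMessage>;
type ExecuteFn = (call: ToolUseBlock) => Promise<ToolResult>;

async function runAgentLoop(
  task: string,
  send: SendFn,
  execute: ExecuteFn,
  maxIterations = 10,
): Promise<{ success: boolean; iterations: number; finalMessage?: string }> {
  const history: Turn[] = [{ role: 'user', content: task }];

  for (let i = 1; i <= maxIterations; i++) {
    const reply = await send(history);
    history.push({ role: 'assistant', content: reply.content });

    const toolCalls = reply.content.filter((b): b is ToolUseBlock => b.type === 'tool_use');
    if (toolCalls.length === 0) {
      // No tool calls in the reply: treat the task as complete.
      const finalMessage = reply.content
        .filter((b): b is TextBlock => b.type === 'text')
        .map((b) => b.text)
        .join('\n');
      return { success: true, iterations: i, finalMessage };
    }

    // Execute every requested action, then return all results in one user turn.
    const results: ToolResult[] = [];
    for (const call of toolCalls) {
      results.push(await execute(call));
    }
    history.push({ role: 'user', content: results });
  }

  // The model kept requesting actions; give up instead of looping forever.
  return { success: false, iterations: maxIterations };
}
```

Injecting `send` and `execute` gives tests a seam at each layer: a fake `send` drives the loop control logic, and a fake `execute` replaces the desktop.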
Mocking the Anthropic SDK
import Anthropic from '@anthropic-ai/sdk';
jest.mock('@anthropic-ai/sdk');
const mockCreate = jest.fn();
(Anthropic as jest.Mock).mockImplementation(() => ({
  beta: {
    messages: {
      create: mockCreate,
    },
  },
}));
// Helper to create mock Computer Use responses
function createComputerUseResponse(toolUses: any[]) {
  return {
    id: 'msg_test',
    type: 'message',
    role: 'assistant',
    content: toolUses.map(tool => ({
      type: 'tool_use',
      id: `toolu_${Math.random().toString(36).slice(2)}`,
      name: tool.name,
      input: tool.input,
    })),
    model: 'claude-opus-4-6',
    stop_reason: toolUses.length > 0 ? 'tool_use' : 'end_turn',
    usage: { input_tokens: 500, output_tokens: 100 },
  };
}
function createFinalResponse(text: string) {
  return {
    id: 'msg_final',
    type: 'message',
    role: 'assistant',
    content: [{ type: 'text', text }],
    model: 'claude-opus-4-6',
    stop_reason: 'end_turn',
    usage: { input_tokens: 600, output_tokens: 50 },
  };
}
Testing Tool Call Parsing
Verify that your agent correctly parses Claude's tool call responses:
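The `ComputerUseAgent` source is not shown in this guide, so as a point of reference, here is one way `parseToolCalls` and `isTaskComplete` might be written. The types are simplified assumptions for illustration, not the SDK's own definitions.

```typescript
// Hypothetical, simplified message shapes.
type ContentBlock =
  | { type: 'text'; text: string }
  | { type: 'tool_use'; id: string; name: string; input: Record<string, unknown> };
type ToolCall = { id: string; name: string; input: Record<string, unknown> };
type MessageLike = { content: ContentBlock[]; stop_reason: string | null };

// Collect every tool_use block, in order, and ignore text blocks.
function parseToolCalls(message: MessageLike): ToolCall[] {
  const calls: ToolCall[] = [];
  for (const block of message.content) {
    if (block.type === 'tool_use') {
      calls.push({ id: block.id, name: block.name, input: block.input });
    }
  }
  return calls;
}

// A turn counts as final only if the model stopped on its own and asked for no actions.
function isTaskComplete(message: MessageLike): boolean {
  return message.stop_reason === 'end_turn' && parseToolCalls(message).length === 0;
}
```

The tests below exercise the same behavior through the agent: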
import { ComputerUseAgent } from '../agents/computer-use-agent';
describe('Tool call parsing', () => {
  test('parses screenshot tool call correctly', () => {
    const response = createComputerUseResponse([{
      name: 'computer',
      input: { action: 'screenshot' },
    }]);
    // The SDK constructor takes an options object, not a bare key string.
    const agent = new ComputerUseAgent(new Anthropic({ apiKey: 'test-key' }));
    const toolCalls = agent.parseToolCalls(response);
    expect(toolCalls).toHaveLength(1);
    expect(toolCalls[0].name).toBe('computer');
    expect(toolCalls[0].input.action).toBe('screenshot');
  });
  test('parses click action with coordinates', () => {
    const response = createComputerUseResponse([{
      name: 'computer',
      input: {
        action: 'left_click',
        coordinate: [640, 480],
      },
    }]);
    const agent = new ComputerUseAgent(new Anthropic({ apiKey: 'test-key' }));
    const toolCalls = agent.parseToolCalls(response);
    expect(toolCalls[0].input.action).toBe('left_click');
    expect(toolCalls[0].input.coordinate).toEqual([640, 480]);
  });
  test('parses type action with text', () => {
    const response = createComputerUseResponse([{
      name: 'computer',
      input: {
        action: 'type',
        text: 'Hello, World!',
      },
    }]);
    const agent = new ComputerUseAgent(new Anthropic({ apiKey: 'test-key' }));
    const toolCalls = agent.parseToolCalls(response);
    expect(toolCalls[0].input.action).toBe('type');
    expect(toolCalls[0].input.text).toBe('Hello, World!');
  });
  test('identifies end_turn as task completion', () => {
    const finalResponse = createFinalResponse('Task completed successfully.');
    const pendingResponse = createComputerUseResponse([{
      name: 'computer',
      input: { action: 'screenshot' },
    }]);
    const agent = new ComputerUseAgent(new Anthropic({ apiKey: 'test-key' }));
    expect(agent.isTaskComplete(finalResponse)).toBe(true);
    expect(agent.isTaskComplete(pendingResponse)).toBe(false);
  });
});
Testing the Agent Loop
The agent loop is the critical orchestration layer:
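One piece of that layer is stuck detection, which one of the tests below exercises through a `stuckDetectionThreshold` option. The agent's implementation is not shown, so the following is only a sketch of how such a check might work, assuming byte-for-byte comparison of consecutive screenshots. `StuckDetector` and `sameBytes` are hypothetical names.

```typescript
// Byte-wise equality; works for Node Buffers too, since Buffer extends Uint8Array.
function sameBytes(a: Uint8Array, b: Uint8Array): boolean {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}

class StuckDetector {
  private readonly threshold: number;
  private last: Uint8Array | null = null;
  private repeats = 0;

  constructor(threshold: number) {
    this.threshold = threshold;
  }

  // Returns true once `threshold` consecutive screenshots have been identical.
  observe(screenshot: Uint8Array): boolean {
    if (this.last !== null && sameBytes(this.last, screenshot)) {
      this.repeats += 1;
    } else {
      // A different screenshot means progress, so start counting again.
      this.last = screenshot;
      this.repeats = 1;
    }
    return this.repeats >= this.threshold;
  }
}
```

Byte equality is deliberately strict: a blinking cursor or an on-screen clock would reset the count, so a production check might compare masked or downscaled images instead. The loop tests: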
describe('Computer Use agent loop', () => {
  let mockScreenshotFn: jest.Mock;
  let mockClickFn: jest.Mock;
  let mockTypeFn: jest.Mock;
  beforeEach(() => {
    // mockCreate is shared across tests; reset it so call counts and
    // queued responses from earlier tests do not leak into this one.
    mockCreate.mockReset();
    mockScreenshotFn = jest.fn().mockResolvedValue(Buffer.from('fake-screenshot-data'));
    mockClickFn = jest.fn().mockResolvedValue({ success: true });
    mockTypeFn = jest.fn().mockResolvedValue({ success: true });
  });
  test('completes task after screenshot and click sequence', async () => {
    // Simulate: screenshot → click → task complete
    mockCreate
      .mockResolvedValueOnce(createComputerUseResponse([{
        name: 'computer',
        input: { action: 'screenshot' },
      }]))
      .mockResolvedValueOnce(createComputerUseResponse([{
        name: 'computer',
        input: { action: 'left_click', coordinate: [100, 200] },
      }]))
      .mockResolvedValueOnce(createFinalResponse('I clicked the button successfully.'));
    const agent = new ComputerUseAgent(new Anthropic({ apiKey: 'test-key' }), {
      screenshot: mockScreenshotFn,
      click: mockClickFn,
      type: mockTypeFn,
    });
    const result = await agent.runTask('Click the Submit button');
    expect(mockCreate).toHaveBeenCalledTimes(3);
    expect(mockScreenshotFn).toHaveBeenCalledTimes(1);
    expect(mockClickFn).toHaveBeenCalledWith(100, 200, 'left');
    expect(result.success).toBe(true);
    expect(result.finalMessage).toBe('I clicked the button successfully.');
  });
  test('handles max iterations limit to prevent infinite loops', async () => {
    // Always return screenshot; never complete the task
    mockCreate.mockResolvedValue(createComputerUseResponse([{
      name: 'computer',
      input: { action: 'screenshot' },
    }]));
    mockScreenshotFn.mockResolvedValue(Buffer.from('same-screenshot'));
    const agent = new ComputerUseAgent(new Anthropic({ apiKey: 'test-key' }), {
      screenshot: mockScreenshotFn,
      click: mockClickFn,
      type: mockTypeFn,
    }, { maxIterations: 5 });
    const result = await agent.runTask('Do something impossible');
    expect(result.success).toBe(false);
    expect(result.error).toContain('max iterations');
    expect(mockCreate).toHaveBeenCalledTimes(5);
  });
  test('stops loop when consecutive screenshots are identical', async () => {
    const identicalScreenshot = Buffer.from('identical-data');
    mockScreenshotFn.mockResolvedValue(identicalScreenshot);
    mockCreate.mockResolvedValue(createComputerUseResponse([{
      name: 'computer',
      input: { action: 'screenshot' },
    }]));
    const agent = new ComputerUseAgent(new Anthropic({ apiKey: 'test-key' }), {
      screenshot: mockScreenshotFn,
      click: mockClickFn,
      type: mockTypeFn,
    }, { stuckDetectionThreshold: 3 });
    const result = await agent.runTask('Navigate somewhere');
    expect(result.success).toBe(false);
    expect(result.error).toContain('stuck');
  });
  test('retries on tool execution failure', async () => {
    mockCreate
      .mockResolvedValueOnce(createComputerUseResponse([{
        name: 'computer',
        input: { action: 'left_click', coordinate: [100, 200] },
      }]))
      .mockResolvedValueOnce(createFinalResponse('Done.'));
    // First click fails, second succeeds
    mockClickFn
      .mockRejectedValueOnce(new Error('Click failed: element not found'))
      .mockResolvedValueOnce({ success: true });
    const agent = new ComputerUseAgent(new Anthropic({ apiKey: 'test-key' }), {
      screenshot: mockScreenshotFn,
      click: mockClickFn,
      type: mockTypeFn,
    }, { retryOnError: true });
    const result = await agent.runTask('Click something');
    // Should have retried the click
    expect(mockClickFn).toHaveBeenCalledTimes(2);
    expect(result.success).toBe(true);
  });
});
Testing Screenshot Assertions
Computer Use's screenshot-based interactions need validation that the agent is looking at what you expect:
import { compareImages, PixelDiff } from '../utils/image-comparison';
describe('Screenshot assertion helpers', () => {
  test('detects when target element appears in screenshot', async () => {
    const testScreenshot = Buffer.from('fake-screenshot');
    // Stub standing in for a real element finder (e.g. template matching).
    // As written, this test only exercises the stub; swap in the real helper
    // to verify actual detection.
    const mockFindElement = jest.fn().mockResolvedValue({
      found: true,
      boundingBox: { x: 100, y: 200, width: 150, height: 50 },
      confidence: 0.95,
    });
    const result = await mockFindElement(testScreenshot, 'submit-button-template.png');
    expect(result.found).toBe(true);
    expect(result.confidence).toBeGreaterThan(0.9);
    expect(result.boundingBox.x).toBe(100);
  });
  test('validates that Claude clicks within expected region', async () => {
    const clickCoordinate = [125, 225]; // Should be inside bounding box [100, 200, 150, 50]
    const expectedRegion = { x: 100, y: 200, width: 150, height: 50 };
    // Inclusive bounds check: points on the region's edge count as inside.
    function isWithinRegion(coord: number[], region: typeof expectedRegion): boolean {
      return coord[0] >= region.x &&
        coord[0] <= region.x + region.width &&
        coord[1] >= region.y &&
        coord[1] <= region.y + region.height;
    }
    expect(isWithinRegion(clickCoordinate, expectedRegion)).toBe(true);
    expect(isWithinRegion([500, 500], expectedRegion)).toBe(false);
  });
});
Testing Tool Result Construction
When you return tool results to Claude, they must be in the correct format:
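As a reference for that format, the sketch below builds a `tool_result` block from the outcome of an action. `buildToolResult` and `Outcome` are hypothetical names, and the screenshot is assumed to arrive as already base64-encoded PNG data (in Node, `buffer.toString('base64')`, as in the tests). The `is_error` flag is set on failures so Claude can tell an error message apart from ordinary output.

```typescript
type ToolResultContent =
  | { type: 'text'; text: string }
  | { type: 'image'; source: { type: 'base64'; media_type: 'image/png'; data: string } };

type ToolResultBlock = {
  type: 'tool_result';
  tool_use_id: string;
  content: ToolResultContent[];
  is_error?: boolean;
};

// Hypothetical summary of what happened when the action was executed.
type Outcome =
  | { kind: 'screenshot'; pngBase64: string }
  | { kind: 'ok' }
  | { kind: 'error'; message: string };

function buildToolResult(toolUseId: string, outcome: Outcome): ToolResultBlock {
  switch (outcome.kind) {
    case 'screenshot':
      // Screenshots go back to the model as a base64 image block.
      return {
        type: 'tool_result',
        tool_use_id: toolUseId,
        content: [{
          type: 'image',
          source: { type: 'base64', media_type: 'image/png', data: outcome.pngBase64 },
        }],
      };
    case 'ok':
      return {
        type: 'tool_result',
        tool_use_id: toolUseId,
        content: [{ type: 'text', text: 'Action completed successfully.' }],
      };
    case 'error':
      // Report the failure to the model instead of throwing out of the loop.
      return {
        type: 'tool_result',
        tool_use_id: toolUseId,
        content: [{ type: 'text', text: `Error: ${outcome.message}` }],
        is_error: true,
      };
  }
}
```

The tests below check the same three shapes through the agent: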
describe('Tool result construction', () => {
  test('builds screenshot tool result with correct base64', async () => {
    const screenshotBuffer = Buffer.from('fake-png-data');
    const toolCallId = 'toolu_test123';
    // The SDK constructor takes an options object, not a bare key string.
    const agent = new ComputerUseAgent(new Anthropic({ apiKey: 'test-key' }), {
      screenshot: jest.fn().mockResolvedValue(screenshotBuffer),
      click: jest.fn(),
      type: jest.fn(),
    });
    const toolResult = await agent.executeToolAndBuildResult({
      id: toolCallId,
      name: 'computer',
      input: { action: 'screenshot' },
    });
    expect(toolResult).toEqual({
      type: 'tool_result',
      tool_use_id: toolCallId,
      content: [{
        type: 'image',
        source: {
          type: 'base64',
          media_type: 'image/png',
          data: screenshotBuffer.toString('base64'),
        },
      }],
    });
  });
  test('builds action result for successful click', async () => {
    const agent = new ComputerUseAgent(new Anthropic({ apiKey: 'test-key' }), {
      screenshot: jest.fn(),
      click: jest.fn().mockResolvedValue({ success: true }),
      type: jest.fn(),
    });
    const toolResult = await agent.executeToolAndBuildResult({
      id: 'toolu_click',
      name: 'computer',
      input: { action: 'left_click', coordinate: [100, 200] },
    });
    expect(toolResult).toEqual({
      type: 'tool_result',
      tool_use_id: 'toolu_click',
      content: [{ type: 'text', text: 'Action completed successfully.' }],
    });
  });
  test('builds error result for failed action', async () => {
    const agent = new ComputerUseAgent(new Anthropic({ apiKey: 'test-key' }), {
      screenshot: jest.fn(),
      click: jest.fn().mockRejectedValue(new Error('Element not found')),
      type: jest.fn(),
    });
    const toolResult = await agent.executeToolAndBuildResult({
      id: 'toolu_fail',
      name: 'computer',
      input: { action: 'left_click', coordinate: [999, 999] },
    });
    expect(toolResult.content[0].text).toContain('Error: Element not found');
  });
});
Agent Loop Reliability Testing
Test the overall reliability of your Computer Use integration:
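The first test below expects the agent to back off and retry on rate limits. The agent's retry code is not shown, so this is only a sketch of one common approach: capped exponential backoff with an injectable `sleep`, which lets tests run without real delays. `backoffDelayMs`, `withRetry`, `isRateLimit`, and the default timings are assumptions for illustration.

```typescript
// Delay doubles with each attempt (attempt 0 -> baseMs) and is capped at capMs.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 8000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

// Assumes rate-limit errors carry an HTTP status of 429.
function isRateLimit(err: unknown): boolean {
  return typeof err === 'object' && err !== null && (err as { status?: number }).status === 429;
}

async function withRetry<T>(
  fn: () => Promise<T>,
  isRetryable: (err: unknown) => boolean,
  maxRetries: number,
  sleep: (ms: number) => Promise<void>,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Give up on non-retryable errors or once the retry budget is spent.
      if (attempt >= maxRetries || !isRetryable(err)) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

With `maxRetries: 3`, two rate-limit failures followed by a success produce three calls in total, which is what the first test asserts. The reliability tests: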
describe('Agent reliability', () => {
  beforeEach(() => {
    // mockCreate is shared across tests; reset it so call counts from
    // earlier tests do not leak into the assertions below.
    mockCreate.mockReset();
  });
  test('handles API rate limit errors with exponential backoff', async () => {
    const rateLimitError = new Anthropic.RateLimitError(
      429,
      { error: { type: 'rate_limit_error', message: 'Rate limit exceeded' } },
      'Rate limit exceeded',
      new Headers()
    );
    mockCreate
      .mockRejectedValueOnce(rateLimitError)
      .mockRejectedValueOnce(rateLimitError)
      .mockResolvedValueOnce(createFinalResponse('Done after retry.'));
    const agent = new ComputerUseAgent(new Anthropic({ apiKey: 'test-key' }), {
      screenshot: jest.fn().mockResolvedValue(Buffer.from('ss')),
      click: jest.fn(),
      type: jest.fn(),
    }, { retryOnRateLimit: true, maxRetries: 3 });
    const result = await agent.runTask('Test task');
    expect(result.success).toBe(true);
    expect(mockCreate).toHaveBeenCalledTimes(3);
  });
  test('logs all tool calls for audit trail', async () => {
    const auditLog: any[] = [];
    mockCreate
      .mockResolvedValueOnce(createComputerUseResponse([{
        name: 'computer',
        input: { action: 'screenshot' },
      }]))
      .mockResolvedValueOnce(createFinalResponse('Done.'));
    const agent = new ComputerUseAgent(new Anthropic({ apiKey: 'test-key' }), {
      screenshot: jest.fn().mockResolvedValue(Buffer.from('ss')),
      click: jest.fn(),
      type: jest.fn(),
    }, {
      onToolCall: (call) => auditLog.push({ ...call, timestamp: Date.now() }),
    });
    await agent.runTask('Log all actions');
    expect(auditLog).toHaveLength(1);
    expect(auditLog[0].input.action).toBe('screenshot');
    expect(auditLog[0].timestamp).toBeDefined();
  });
});
E2E Testing with HelpMeTest
Computer Use agents need monitoring in production to catch failures in real UI interactions. HelpMeTest can test your agent wrapper:
Navigate to https://your-app.com/agent-demo
Type "Search for 'best testing tools' and click the first result" in the task input
Click Run Agent
Verify agent starts executing within 5 seconds
Verify agent completes the task within 60 seconds
Verify a success message appears
Verify the agent log shows at least 3 tool calls
This verifies that your Computer Use agent integration, UI, and backend are all working together correctly in production.
Summary
Testing Computer Use API integrations requires:
- Tool call parsing tests: verify screenshot, click, and type action parsing
- Agent loop tests: completion detection, max iteration limits, stuck detection
- Tool execution tests: success/failure handling, retry logic
- Tool result construction tests: correct base64 encoding, format validation
- Reliability tests: rate limit backoff, audit logging, error propagation
- E2E monitoring: HelpMeTest for production agent loop health
The key insight: test the loop control logic and tool parsing with mocks, then use integration tests against a real sandboxed desktop to verify the actual click/type behavior.