ARM vs AMD Screenshots | DarshanPandya

Context

This issue came up right after I finished optimizing docker-browser. Once that was done, I created a PR to update Appwrite's stack with the optimized browser and wrote tests for it. That's when I discovered this fun little quirk.

I was working on screenshot tests for Appwrite, and everything worked perfectly on my Mac. Tests passed, screenshots looked identical, life was good.

Then CI happened 🥲.

diff

Failed asserting that two strings are identical.
--- Expected: '83bf638ab604c2e15dcf6b58baeac6c16dbb56a0dc227671f95a7478497b3df3'
+++ Actual:   '0a312012c4f6caec17f67898593148974e7450065c35b8f6ac59677d04386010'

The screenshots? IDENTICAL. At least to my eyes or any other human's. But the SHA-256 hashes? Completely different.

What's Happening

After a lot of debugging, I found the culprit: the browser renders anti-aliased text edges slightly differently on ARM64 and AMD64, by exactly ±1 per RGB channel.

Yeah, you read that right: one RGB value.

Here's how we were comparing screenshots:

php

protected function assertSamePixels(string $expectedImagePath, string $actualImageBlob): void
{
    $expected = new \Imagick($expectedImagePath);
    $actual = new \Imagick();
    $actual->readImageBlob($actualImageBlob);
    
    // Normalize metadata
    foreach ([$expected, $actual] as $image) {
        $image->setImageFormat('PNG');
        $image->stripImage();
        $image->setOption('png:exclude-chunks', 'date,time,iCCP,sRGB,gAMA,cHRM');
    }
    
    // Requires every pixel to match exactly: the signature is a hash of the pixel data
    $this->assertSame($expected->getImageSignature(), $actual->getImageSignature());
}

getImageSignature() generates a SHA-256 hash of the pixel data. Even one pixel off by one RGB value changes the entire hash. That's why tests failed despite screenshots looking identical.
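The effect is easy to reproduce outside Imagick. The sketch below is a stand-in, not Appwrite's test code: it hashes a made-up 2x2 RGB buffer with Python's hashlib, nudges one channel of one pixel by 1, and hashes again. The buffer contents and the `signature` helper are assumptions for illustration only.

```python
import hashlib

# Hypothetical 2x2 RGB image as raw bytes (4 pixels x 3 channels).
pixels = bytearray([
    27, 26, 26,    80, 59, 42,
    143, 100, 60,  0, 0, 0,
])

def signature(buf) -> str:
    """SHA-256 of the raw pixel bytes, similar in spirit to an image signature."""
    return hashlib.sha256(buf).hexdigest()

original = signature(pixels)

# Change one channel of one pixel by 1, as the other architecture might.
nudged_pixels = bytearray(pixels)
nudged_pixels[0] -= 1  # 27 -> 26
nudged = signature(nudged_pixels)

# Count hex digits that still match at the same position.
matching = sum(a == b for a, b in zip(original, nudged))

print(original)
print(nudged)
print(f"{matching} of {len(original)} hex digits match")
```

Any matching digits are coincidence: nothing in the hash reflects how small the change was, which is why a hash can only ever answer "identical or not".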

The Technical Reason

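ARM64 and AMD64 both implement IEEE 754 floating point, but the code that ends up running on them is not identical. Compilers may fuse a multiply and an add into one rounding step on one target and not the other, and graphics libraries often ship architecture-specific SIMD paths (NEON on ARM, SSE/AVX on x86). When the graphics engine blends text edges with backgrounds, differences like these can produce results that are mathematically equivalent but differ in the last bits. After quantizing to 8 bits per channel, that shows up as a difference of exactly ±1 per RGB channel.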

Good to know

This isn't a bug in either architecture. It's a known issue with cross-platform rendering — different CPUs, math libraries, and rasterizers can produce slightly different results even for identical inputs.
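To see how a tiny numerical disagreement turns into a ±1 difference, here is a small sketch. It is an analogy, not the browser's actual blending code: two hypothetical functions, `blend_truncate` and `blend_round`, compute the same alpha blend with integer math but finish the division by 255 differently, standing in for two code paths that round differently.

```python
def blend_truncate(fg: int, bg: int, alpha: int) -> int:
    """Alpha-blend one 8-bit channel, truncating the division by 255."""
    return (fg * alpha + bg * (255 - alpha)) // 255

def blend_round(fg: int, bg: int, alpha: int) -> int:
    """Same blend, but rounding to nearest instead of truncating."""
    return (fg * alpha + bg * (255 - alpha) + 127) // 255

# One edge pixel: foreground 200 over background 50 at alpha 100/255.
a = blend_truncate(200, 50, 100)
b = blend_round(200, 50, 100)
print(a, b)  # 108 109

# Across every alpha value the two paths never disagree by more than 1.
deltas = {
    blend_round(200, 50, al) - blend_truncate(200, 50, al)
    for al in range(256)
}
print(sorted(deltas))  # [0, 1]
```

Both results are defensible answers to the same blend. They only disagree where the exact value falls between two integers, which is the situation on anti-aliased edges.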

Proving It

The test page I was screenshotting contained one simple, innocent-looking emoji: 🎨. Comparing the ARM64 and AMD64 captures pixel by pixel showed:

107 different pixels out of 921,600 total (0.012%)
Every difference: exactly ±1 on each RGB channel
All differences on anti-aliased edges only

Here's what the pixel comparison looked like:

Location	ARM64 RGB	AMD64 RGB	Delta
(392,542)	[27 26 26]	[26 25 25]	[1 1 1]
(393,542)	[80 59 42]	[79 58 41]	[1 1 1]
(397,542)	[143 100 60]	[142 99 59]	[1 1 1]

Every. Single. Difference. Exactly ±1.

How I Debugged This

To isolate the issue, I ran both browser containers side-by-side:

bash

# ARM64 browser on port 3000
docker run --platform linux/arm64 -p 3000:3000 appwrite/browser:0.3.2

# AMD64 browser on port 3001
docker run --platform linux/amd64 -p 3001:3000 appwrite/browser:0.3.2

Then captured identical screenshots from both:

bash

# get arm based screenshot
curl -X POST http://localhost:3000/v1/screenshots \
  -d '{"url": "http://host:8888/test.html", "theme": "dark"}' \
  --output arm64.png

# get amd based screenshot
curl -X POST http://localhost:3001/v1/screenshots \
  -d '{"url": "http://host:8888/test.html", "theme": "dark"}' \
  --output amd64.png

Finally, pixel-level analysis with Python - thanks to Claude Code 😉:

python

import numpy as np
from PIL import Image

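# Load as signed integers: uint8 subtraction would wrap -1 around to 255
arm64 = np.array(Image.open('arm64.png'), dtype=np.int16)
amd64 = np.array(Image.open('amd64.png'), dtype=np.int16)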

# Find all different pixels
diff_mask = np.any(arm64 != amd64, axis=-1)
diff_coords = np.argwhere(diff_mask)

# Analyze delta magnitude
for y, x in diff_coords:
    delta = arm64[y, x] - amd64[y, x]
    print(f"({x},{y}): {delta}")  # always ±1 per channel

This confirmed that 100% of differences were exactly ±1 per channel — floating-point rounding differences, not rendering bugs.
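For a summary instead of one line per pixel, the deltas can be tallied. The sketch below uses only the three sample pixels from the table above, so it runs without the image files. `summarize` is a hypothetical helper; with real screenshots the pairs would come from the decoded images.

```python
from collections import Counter

# The three sample pixels from the comparison table: (ARM64 RGB, AMD64 RGB).
samples = [
    ((27, 26, 26), (26, 25, 25)),
    ((80, 59, 42), (79, 58, 41)),
    ((143, 100, 60), (142, 99, 59)),
]

def summarize(pairs):
    """Tally per-channel deltas and report the largest absolute delta."""
    histogram = Counter()
    for arm, amd in pairs:
        for c_arm, c_amd in zip(arm, amd):
            histogram[c_arm - c_amd] += 1
    max_abs = max(abs(d) for d in histogram)
    return histogram, max_abs

histogram, max_abs = summarize(samples)
print(dict(histogram))  # {1: 9}
print(max_abs)          # 1

# Share of the full screenshot that differed, using the measured counts.
print(f"{107 / 921_600:.3%}")  # 0.012%
```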

[Images: ARM64 screenshot, AMD64 screenshot, and the pixel difference between them]

Interestingly, light mode happens to round identically on both architectures, while dark mode creates higher contrast at edges and exposes the floating-point precision differences.

The Fixes

There are a few ways to handle this:

1. Simplify your test HTML

If your test doesn't need complex elements, remove them. I stripped out emojis, heavy text, and multiple fonts since the test was only validating theme modes anyway.

2. Use fuzzy comparison

If you need to test complex rendering, accept small pixel differences instead of demanding a byte-perfect match:

php

// compareImages() returns [difference image, distortion value]
$result = $expected->compareImages($actual, \Imagick::METRIC_MEANSQUAREERROR);
$this->assertLessThan(0.0025, $result[1]); // threshold for visual equivalence

This tolerates the ±1 RGB differences while still catching real rendering regressions.
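A quick back-of-the-envelope check shows how much headroom that threshold leaves. The sketch assumes the reported mean squared error is normalized to the 0..1 range, with each channel difference divided by 255 before squaring. That normalization is an assumption on my part, so verify it against your Imagick build. The pixel counts are the ones measured earlier.

```python
TOTAL_PIXELS = 921_600   # pixels in the screenshot
CHANGED_PIXELS = 107     # pixels that differed between ARM64 and AMD64
THRESHOLD = 0.0025       # value used in the assertion above

# Every changed pixel is off by 1/255 on each channel; the rest contribute 0.
per_pixel_error = (1 / 255) ** 2
mse = CHANGED_PIXELS * per_pixel_error / TOTAL_PIXELS

print(f"MSE from the ±1 differences: {mse:.2e}")
print(f"Headroom below threshold: {THRESHOLD / mse:,.0f}x")

# Worst case that still passes: pixels flipping fully (error 1.0 per channel).
max_flipped = round(THRESHOLD * TOTAL_PIXELS)
print(f"Fully flipped pixels tolerated: {max_flipped}")  # 2304, about 48x48
```

Under that assumption the threshold is generous. It absorbs the ±1 noise many times over, but a small region changing completely could also slip through, so consider tightening it if the tests guard fine detail.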

Takeaways

Pixel-perfect screenshot testing across architectures can fail due to floating-point and rendering differences.
If you hit this, either use fuzzy comparison or test on the same architecture as your CI.
The ±1 RGB difference isn't a bug in your code — it's how different CPUs and graphics pipelines handle rendering.

Cross-platform visual testing needs tolerance, not perfection.