What happens when you put random data through a PNG decoder?

Skip to the results

Well, what happens when you treat random data as an image?

Consider this block of rainbow static, picked at random from the space of all possible 256x256 8-bit RGB images when you opened this page:

(Imagine there was a colored block of static here. If you can view this article in a graphical browser with JavaScript turned on it's going to be more fun.)

It contains 256×256 pixels × 3 channels/pixel × 8 bits/channel = 1,572,864 bits of random information, making for 21,572,864 possible images that could have been shown. I'm confident in calling it a "block of rainbow static", even though many of the 21,572,864 possibilities don't match that description, like these four:

Van Gogh's The Starry Night Conway's Game of Life Tiny Rocket Much doge, very example, wow

If sampling an image uniformly at random from the space of all possible images almost always gives us rainbow static, it follows that the vast majority of possible images are blocks of rainbow static. The images that aren't rainbow static - all the meaningful images that anyone might ever care to create or save or share with their friends - make up only an unimaginably tiny fraction of the whole space of possibilities.

The 1,572,864 bits that it takes to store a 256x256 image work out to 192KiB. 256x256 is a low resolution by modern standards. There are a lot of 1920x1080 images running around nowadays; an image at that resolution is going to be 5.9MiB. These files are getting impractically large.

Are you working with images sampled uniformly from the space of all possible images? Meaningless blocks of static, as it were? I'm afraid that you're going to have to deal with the large file sizes. I'm terribly sorry. You can stop reading now.

...

You're all still here?

Excellent.

Data compression

If we're working with images picked uniformly at random, it's mathematically impossible to do better than 192KiB per image, but in practice people don't pick images uniformly at random. Some images (meaningful images) are more likely to be picked than others (rainbow static). It should be possible to develop a variable-length image encoding scheme that takes fewer than 1,572,864 bits to store the more common images at the cost of taking more than 1,572,864 bits to store some of the less common images.

Someone has already done the developing for us. Behold, the PNG spec.

How does PNG work?

PNG took the general purpose DEFLATE compression algorithm and added a preprocessing step to help DEFLATE see structure in images that it couldn't see otherwise. Encoding roughly consists of three steps. To summarize, working backwards:

Taken together, these three steps find and exploit structure in intelligible images to give them shorter encodings. In exchange, the format adds a tiny amount of overhead in images where it can't find any structure, if only to say that no fancy encoding tricks were used.


By using PNG, we're saying that we expect to encounter the kind of images that PNG compresses well. (That's usually a safe assumption, but it wouldn't be true if we were working with pictures of unstructured static.) If you tilt your head and squint, PNG defines a probability distribution over the space of possible images. Images that are given shorter encodings are considered more likely to come up than images that are given longer encodings. We set out to save storage space and bandwidth, but along the way we accidentally characterized some of what it is about an image that makes it likely to come up in practice, which is to say, what makes it something other than a block of meaningless static.

Sampling from the distribution that PNG defines ought to produce more intelligible images than sampling from a uniform distribution. How do we sample from the PNG distribution? Well, optimally compressed data looks like random static, so we ought to be able to generate some random bits, assume that they're an optimally compressed image, and feed them through the PNG decoder.

Unfortunately most bit strings aren't well-formed PNG files, so we need to make a few small tweaks to the format to get a decoder that can process random bits without choking. You probably don't care about the details.

Yes, I do care about the details, tell me!

With those hopefully acceptable departures from the real PNG format, we finally have something that can run on random data without choking.


Demo

In conclusion, I am proud to present a freshly generated x pixel image, using

and

bytes of image data inflated from bytes of compressed data

For UI source code, use your browser's "view source" button. Source code for the PNG decoder WASM module lives on GitHub.