Understanding and writing a JPEG decoder in Python

yasoob · on July 14, 2020

Hi everyone! OP here.

Why write another article on JPEG when there are already hundreds of articles on the internet? Well, normally when you read articles on JPEG, the author just gives you details about what the format looks like. You don’t implement any code to do the actual decompression and decoding. Even if you do write code, it is in C/C++ and not accessible to a wide group of people. I tried to change that through this article. I give you a guided tour of JPEG encoding/decoding process and show you how it can be implemented in Python3.

I mainly focus on decoding baseline encoded JPEG images.

userbinator · on July 14, 2020

It's great to see more people complete the "JPEG decoder challenge" --- and document it too. Have you read the official T.81 spec[1]? It is one of the easier standards to read, and it has flowcharts of all the algorithms which mean you can implement without really understanding the theory.

As you mentioned there are lots of other articles about writing JPEG decoders, as well as for the other popular image formats GIF and PNG. What I haven't seen much of if at all, however, are articles about video decoding; if you're interested to try another challenge, I recommend a decoder for H.261: https://www.itu.int/rec/T-REC-H.261-199303-I/en

(I've written one --- in C --- and it is actually far simpler than JPEG in many ways, although it shares many similarities. A H.261 decoder in Python, with associated article, would definitely be very interesting to see, and AFAIK also a world-first!)

[1] https://www.w3.org/Graphics/JPEG/itu-t81.pdf

yasoob · on July 15, 2020

Thanks for the H.261 decoder suggestion! I will actually seriously consider it because I am personally curious how video decoders work as well. I will make sure to document the process if I end up working on it.

jart · on July 15, 2020

If you understand JPEG then you already understand the important bits of MPEG. It's also fun to study scaling and colorspaces. See https://justine.storage.googleapis.com/printimage.html and https://justine.storage.googleapis.com/printvideo.html

oumua_don17 · on July 15, 2020

And have more fun by doing the encoding/decoding on GPU [1]

[1] https://github.com/CESNET/GPUJPEG

enriquto · on July 14, 2020

> Even if you do write code, it is in C/C++ and not accessible to a wide group of people.

To be honest, writing a jpeg decoder in C seems easier and more natural than doing it in python.

userbinator · on July 14, 2020

As someone who has done it in C, I would both agree and disagree --- reading the file format will definitely be easier since C lets you pick bits/bytes/words/etc. off the stream directly, but on the other hand the high-level structures (looping, etc.) would probably be easier with Python.

Overall, seeing as OP's Python implementation is less than 300 LoC while my C implementation was closer to 750, the Python might be easier. Then again, I followed the algorithms/flowcharts in the spec, which do a faster table-based Huffman decoding than the bit-by-bit of this Python version, and also implemented arbitrary(!) chroma subsampling as the spec allows, so it's not a direct comparison.

a_t48 · on July 15, 2020

Python also lets you read bytes off the stream directly.

eru · on July 15, 2020

The main benefit I see here of Python over C is when your program doesn't work (yet): Python has nicer error handling built straight into the language.

If you just want to present a working implementation, both C and Python can express readable versions.

smabie · on July 15, 2020

Using Haskell and a parser combinator would probably be the easiest

enriquto · on July 15, 2020

Easier for whom? Most programmers would not be able to follow that code. On the other hand, a really simple C or Python algorithm is more or less universally readable.

eru · on July 15, 2020

It's not so much about the algorithm, but the parsing itself.

Parser combinators give you one of the most straightforward ways to express parsers.

You are right, that the language you express your parser combinators in vs the language you express an alternative solution in also will have an impact.

You can in theory express parser combinators in Python. But it's gonna be a bit ugly, because Python has relatively clunky syntax for functions.

Some people have tried writer parser combinator libraries for eg JavaScript. See https://github.com/GregRos/parjs

I just found a parser combinator library for Python https://pypi.org/project/parsita/ But it looks like they have to abuse some Python magic to make the code readable.

enriquto · on July 15, 2020

Alright, but jpeg decoding is nearly all algorithm, there's almost no parsing. Do you have a particular haskell jpeg decoder in mind that uses parser combinators?

eru · on July 15, 2020

No. But the author of the original comment mentioning Haskell https://news.ycombinator.com/item?id=23841865 might have had?

pansa2 · on July 14, 2020

If you’re trying to describe a file format or algorithm in the most accessible way, wouldn’t pseudocode be the best thing to use?

If that’s not an option (because you want the code to be executable), I think Python is the closest popular language to pseudocode.

kerkeslager · on July 15, 2020

I think being executable is really key, because one way I check my understanding of code is to make modifications to the code and see if they do what I expect them to do.

pansa2 · on July 15, 2020

IMO the ideal setup would be for an article to contain pseudocode and to have supplementary executable code. This would allow the article to explain concepts without boilerplate and unnecessary details, but also ensure that those details are available for readers who wish to investigate further.

I still think it's an advantage to write the executable code as close to pseudocode as possible.

There is further discussion of pseudocode vs executable code at https://academia.stackexchange.com/q/140986.

kerkeslager · on July 15, 2020

So, what elements of Python do you consider boilerplate?

I say this not as someone defending Python, but as someone writing a programming language.

pansa2 · on July 15, 2020

It’s not so much that Python requires boilerplate - all executable code does. For example, the article includes code to read the input data from a file, which is not relevant to the algorithm and could be omitted from a pseudocode version.

Similarly, executable code requires unnecessary detail, like the article’s `Stream` class. In pseudocode this could be replaced by simply saying “get the next N bits”.

kerkeslager · on July 15, 2020

I don't think we're defining "boilerplate" the same way, then. Reading from a file isn't something that every executable does, so when you add code for it, it's because you want your program to do it. Boilerplate is things that have to be added to every program regardless of whether they communicate anything that differentiates your program: main(), for example, or class in Java when you're not actually trying to define a class that will be instantiated by later code.

smabie · on July 15, 2020

Lack of function currying and anonymous function arguments. Also having to explicitly convert a zip, map, etc to a list. Also the lambda keyword.

kerkeslager · on July 15, 2020

Hmm, I think you're using the word "boilerplate" differently than it is usually used. The changes you're proposing are functional changes that can't be transparently applied by a compiler/interpreter without changing behavior.

Currying is a subset of partial application, which Python supports via the standard library. Universal currying would be terser in some situations, but it conflicts with Python's args/kwargs implementation. Frankly, I don't think currying is a good thing even in languages such as Haskell where it doesn't conflict, as it makes it much less clear what the flow is when functions can be called with different signatures.

Anonymous function arguments, I'm just not sure what you're talking about there.

Explicitly converting higher order function results to lists is not boilerplate, it's a meaningful difference. The explicit conversion allows you to determine when code is actually executed, instead of having it executed eagerly. Universal lazy application like in Haskell has broader implications which are generally negative, particularly with regards to performance and being able to reason about execution: production Haskell often turns this off, and for good reason.

The lambda keyword is arguably boilerplate, but I'm skeptical of whatever alternate syntax you're proposing there. C#-style anonymous functions wouldn't work with Python syntax. If you're proposing people use the λ symbol, I'm going to frankly say that's a bad idea, because most developers' keyboards don't have that symbol. Like it or not, English with a latin keyboard is the lingua franca of software--if you want to target another language that would be reasonable, but that's a big choice with broader implications.

smabie · on July 15, 2020

How about some variation of: (x, y) -> x*y for lambdas?

Scala's method for dealing with laziness through views and iterators works really well.

kerkeslager · on July 15, 2020

> How about some variation of: (x, y) -> x*y for lambdas?

Why? What problem are you trying to solve here?

That syntax is inconsistent with other syntax in Python and is harder to type (lambda gets tab-completed after typing two home-row characters).

> Scala's method for dealing with laziness through views and iterators works really well.

If you're okay with generic iterables in Scala, I'm unsure what your objection to iter types in Python is.

smabie · on July 15, 2020

lambda's take up too much space and are really ugly. They're visual noise and serve no purpose. Besides, the name 'lambda' is bad, and intuitive. There's a reason why no new languages have chosen 'lambda' for anonymous functions. Python:

map, filter, zip, etc should produce concrete values, not iterators. It's annoying to constantly turn iterators into lists. Like if scala's map or fold took in a list and produced an iterator, that would be super annoying. There's no point. If you want an iterator, turn the collection into an iterator. There's no reason for a function to take a list and return an iterator, unless that function is 'toIter.'

Ugly python:

    list(map(lambda x: 2*x, [1, 2, 3, 4]))

Julia:

   map(x -> 2*x, [1, 2, 3, 4])

Or even better Julia:

   2*.[1, 2, 3, 4]

I dunno, it's like Python tries its best to make code super verbose and ugly. I just don't understand why it is the way it is. It doesn't make sense.

kerkeslager · on July 16, 2020

> There's a reason why no new languages have chosen 'lambda' for anonymous functions.

Beware of people who say "there's a reason" and then don't say what that reason is.

I suspect the reason in this case is that languages which use C-ish syntax it would make no senes to have lambda as a keyword. Python doesn't use C-ish syntax, so that reason doesn't apply.

> It's annoying to constantly turn iterators into lists.

So don't. Why are you constantly turning iterators into lists?

> There's no reason for a function to take a list and return an iterator,

Here are 5 reasons:

1. There are many cases where the iterator will never be evaluated, meaning you save O(n) time.

2. Even in cases where you evaluate the iterator, you'll often have performed many transformations upon that iterator. Forcing 5 transformations all at once means you have 1 loop instead of 5, meaning again you save O(n) time.

3. It encourages writing functional code, which doesn't depend on order of execution. (This is why Simon Peyton Jones says Haskell has lazy evaluation: it keeps them honest about side effects).

4. Explicit is better than implicit. Guessing that a user will want a list is a reasonable guess, but you're doing a lot of work based on that guess, and some percentage of the time you'll be wrong. It's better to do the minimum work necessary and let the user explicitly determine what type of result they want.

5. Consistent extensibility: map, zip, etc. take lots of iterable types. You could attempt to return the same iterable type as you receive, but then there isn't an easy way for users to use zip on user-defined types. It's better to call the type's __iter__ function and then just use that, as this allows users to define higher-order functions on their own types.

smabie · on July 16, 2020

Look, functors (and therefore monads as well) are closed under fmap. That should be the end of the story.

kerkeslager · on July 16, 2020

It's pretty bizarre to assert that structures made up by some mathematician in a void should be the basis of a programming language intended to solve real-world problems, especially in response to giving real-world reasons for why it is the way it is.

voltagex_ · on July 14, 2020

I really really like this - especially with all the references you've listed. I think more programmers should have a go at working with various file formats. Well done on this article.

yasoob · on July 14, 2020

Thanks! And I completely agree with the idea that more programmers should have a go at working with various file formats. I learned so much about binary file layouts through this one project.

Before writing this article I had no idea that JPEGs use Huffman tables. Now all those Huffman table lectures from college algos classes make sense. I was never told in that class that this is a major use case for Huffman tables. Maybe I would have shown more interest then if I knew this fact.

bhhaskin · on July 14, 2020

This is a cool article. I have always wondered the internals of JPEG and this has lots of details. Just as an FYI it should be a "Show HN:".

Edit: NVM! I didn't realize you didn't post this.

rohan1024 · on July 15, 2020

This. I know C, C++ implementations are efficient but to see python implementation of this makes it lot easier to understand.

> You don’t implement any code to do the actual decompression and decoding

This is also true in case of face recognition. Nobody tells you internals and just tells you to use tools like OpenCV. Maybe it's too complicated but a post like OP's for face recognition would be a treat!

mehh · on July 15, 2020

Very cool!

It's not that far off going for an MPEG-2 video decoder next, highly recommend (a long time since I did, so take with a pinch of salt). That would be great to see in python to make that realm more accessible.

pjmlp · on July 15, 2020

Very good article, congratulations on doing it and choosing a more approachable language as well.

cgrealy · on July 15, 2020

> Even if you do write code, it is in C/C++ and not accessible to a wide group of people.

Eh??

There are literally millions of C++ programmers and high quality C++ compilers are available on almost every platform for free.

How is that “not accessible”?

smcl · on July 15, 2020

Equally there are millions who don't use C or C++. I think it’s pretty fair to describe this as a "wide group of people" who would consider C or C++ inaccessible.

cgrealy · on July 15, 2020

That’s a nonsense argument.

For any given programming language, the number of people not using it outweighs the number of people using it by several orders of magnitude.

It’s hard to get accurate numbers, but a quick google estimates there is roughly double the number of Python programmers than C++ programmers. I find it very difficult to imagine there isn’t some overlap there and even more difficult to imagine a competent Python programmer looking at C++ code and being completely unable to read it.

smcl · on July 15, 2020

There is definitely a substantial number of people who know Python and have had little to no exposure to C or C++. To the point where they'd really rather see something like this presented in a language they're familiar with and that they can try out in an environment they've already got setup.

There's no harm in having many articles on a given subject (think how many monad posts there are) - if you dislike one you don't have to read it, there's plenty of other interesting stuff out there :-)

cgrealy · on July 15, 2020

I don't have a problem with someone writing an article on whatever for whatever language. Go nuts.

I just think saying "articles written in C++ are inaccessible" is kinda dumb.

pratio · on July 15, 2020

This is a fantastic writeup, love the attention to detail and thank you so much for not posting it on medium.

greggman3 · on July 15, 2020

This is great!

I first learned the general ideas from this Computerphile video

https://www.youtube.com/watch?v=n_uNPbdenRs

While we're on it does anyone know of articles/videos that go into similar detail for HEVC?

Ahmd72 · on July 15, 2020

Thanks for making this, a couple of months ago I was working on a jpeg project and needed to understand how it works. I was searching wildly for a tutorial with a python implementation, unfortunately never found one. I found this one in C++ though, it goes a lot more in depth https://www.youtube.com/playlist?list=PLpsTn9TA_Q8VMDyOPrDKm...

iAmAPencilYo · on July 14, 2020

Thank you for this; lots of details, references and detailed diagrams!

One small note - the first image you mention that we'll need to reference is a bit small to read. Zooming in on the page does not zoom it in due to format of the page. I had to right click -> View image, in a separate tab in order to be able to zoom in enough to read the righthand side of it. edit: this is on the latest version of Firefox

jonny789 · on July 15, 2020

Thanks a lot for your article. I think,for me it will be a perfect start for understanding JPEG decoders & making one too. :)

subins2000 · on July 14, 2020

This was a really good read to understand JPEG with the right videos and images included. Thanks!

yasoob · on July 14, 2020

Glad to hear that. :) IMO some content is just better suited as visualizations in the form of videos and images.

vmilner · on July 16, 2020

It always puzzles me that the quantisation matrices don't have symmetry along the diagonal. I guess it suggests the distribution of vertical and horizontal lines in typical pictures differs.