"Having developed a blazing fast DCT implementation, Huffman then became a bottleneck. We innovated on that portion with tight hand-tuned assembly code that leverages special features of the ARM processor instruction set to make it as fast as possible.”
As I understood it, he first optimized the algorithm and then tuned it to be even faster on ARM?
I read it as saying that, after optimizing the DCT (lossy) compression stage as much as he could, he turned to the Huffman (lossless) encoding phase, which gave him further optimization opportunities.