The hard part here is figuring out which pixel to diff against. You would almost certainly have to look at all 64 remembered pixels, which would be a lot slower. That said, you could probably get noticeably better compression from it.
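To make the cost concrete, here is a rough sketch of the brute-force version: scan every remembered pixel and keep the one with the smallest per-channel difference. The function name `closest_remembered` and the "largest channel delta" cost are my own assumptions for illustration, not anything from an actual encoder.

```python
def closest_remembered(pixel, remembered):
    """Return (index, diff) for the remembered pixel nearest to `pixel`.

    `pixel` and each entry of `remembered` are (R, G, B, A) tuples.
    The cost used here (largest absolute channel delta) is an assumption;
    a real encoder would pick whatever its diff encoding can represent.
    """
    best_index, best_diff, best_cost = None, None, None
    for i, candidate in enumerate(remembered):
        # Per-channel difference between the new pixel and this candidate.
        diff = tuple(p - c for p, c in zip(pixel, candidate))
        cost = max(abs(d) for d in diff)
        if best_cost is None or cost < best_cost:
            best_index, best_diff, best_cost = i, diff, cost
    return best_index, best_diff


remembered = [(0, 0, 0, 255), (10, 20, 30, 255), (200, 100, 50, 255)]
index, diff = closest_remembered((11, 19, 30, 255), remembered)
```

This is O(M) work per encoded pixel, which is where the slowdown comes from.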
I think it can be done in O(log(M)), where M is the number of remembered pixels. Index them by their (R,G,B,A) quad and do a range lookup for each pixel in turn, since any valid match would fall within the same range as the earlier potential matches.
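One possible reading of that idea, as a sketch only: keep the remembered pixels in a list sorted by their (R,G,B,A) tuple, binary-search the slice whose first channel is close enough, then filter that slice on the remaining channels. The names `candidates_in_range` and `radius` are hypothetical, and this ignores the cost of keeping the list sorted as pixels enter and leave it.

```python
import bisect


def candidates_in_range(sorted_pixels, pixel, radius):
    """Return remembered pixels within `radius` of `pixel` on every channel.

    `sorted_pixels` must be a list of (R, G, B, A) tuples in sorted order.
    `radius` is an assumed maximum per-channel delta the diff can encode.
    """
    # A 1-tuple sorts before every 4-tuple that shares its first element,
    # so these keys bracket all pixels with R in [R - radius, R + radius].
    lo_key = (pixel[0] - radius,)
    hi_key = (pixel[0] + radius + 1,)
    start = bisect.bisect_left(sorted_pixels, lo_key)
    stop = bisect.bisect_left(sorted_pixels, hi_key)
    # Only the narrowed slice is checked against the other channels.
    return [
        c for c in sorted_pixels[start:stop]
        if all(abs(p - q) <= radius for p, q in zip(pixel, c))
    ]


sorted_pixels = sorted([
    (0, 0, 0, 255), (10, 20, 30, 255), (12, 20, 30, 255),
    (11, 90, 30, 255), (200, 100, 50, 255),
])
matches = candidates_in_range(sorted_pixels, (11, 19, 30, 255), 2)
```

The two binary searches are O(log(M)); the filter over the slice adds work proportional to how many pixels share a nearby first channel, so the overall bound depends on how well that first cut narrows things down.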