I made my own encoder and experimented with variable block size (with dynamic programming) plus a massive amount of brute-force search for LPC parameters. But the amount of compression gain I could achieve over libflac was small and not worth the hours of encoding time. https://www.nayuki.io/page/benchmark-of-nayukis-flac-encoder
Sometimes encoding time doesn't matter. Archiving is done once in a lifetime. I encode everything with "--lax -8Vepl32" which takes ages but since it runs as a low priority background task I don't care.
Also, you can use heuristics to choose the right blocksize instead of brute-forcing it. If the algorithm is clever enough you might achieve slightly better results at almost zero costs.
I think "If the algorithm is clever enough" is a big weasel phrase just like https://wiki.c2.com/?SufficientlySmartCompiler . Having studied and written many algorithms, I have to conservatively assume that a clever algorithm doesn't exist unless it is proven by construction.