Of course it's a herculean task to write a library with that many features. But I don't think that's the issue; it's more that the TF devs can't possibly optimize for every use case. In my case, I knew which ops I needed, so I could focus on making those as fast as possible.