I'm a professional scientist, so let me give my two cents on this matter. Being able to compare your work against SOTA (state of the art) is pretty critical in academic publications. If everyone else in your area uses framework X, it makes a lot of sense for you to do it too. For the last few years, Pytorch has been king for the topics I care about.
However .. one area where Tensorflow shone was the static graph. As our models get even more intensive and need different parts to execute in parallel, we are seeing some challenges in PyTorch's execution model. For example:
https://pytorch.org/docs/stable/notes/cuda.html#use-nn-paral...
It appears to me that high-performance model execution is a bit tricky if you want to do lots of things in parallel. TorchServe also seems quite bare-bones compared to Tensorflow's offerings. So in summary, I think Tensorflow still has some features unmatched by others. It really depends on what you are doing.
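To make this concrete, here is a rough sketch (the branch modules and sizes are made up) of what running two independent pieces of a model concurrently looks like in eager PyTorch, where you end up managing CUDA streams by hand:

  import torch

  # Hypothetical independent branches; in a real model these would be submodules
  branch_a = torch.nn.Linear(512, 512).cuda()
  branch_b = torch.nn.Linear(512, 512).cuda()
  x = torch.randn(32, 512, device="cuda")

  stream_a = torch.cuda.Stream()
  stream_b = torch.cuda.Stream()

  torch.cuda.synchronize()            # make sure x and the weights are ready
  with torch.cuda.stream(stream_a):   # launch branch A on its own stream
      out_a = branch_a(x)
  with torch.cuda.stream(stream_b):   # launch branch B concurrently on another stream
      out_b = branch_b(x)
  torch.cuda.synchronize()            # wait for both branches before using the outputs

  result = out_a + out_b

With a static graph the framework can work this kind of overlap out for you; here the stream bookkeeping is on you.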
I think Tensorflow made a bad move in academia by being so damn difficult to use in its earlier versions. Sure, its performance was always better than PyTorch's, but when you are an overworked PhD student you care less about your code being efficient and more about your code working at all.
Word got around that debugging PyTorch was relatively painless, those earlier models made it into publications, and now here we are.
Not even remotely true. Or rather, our experiences differ greatly.
The #1 problem with PyTorch is that it’s great if you want to use one videocard for training. Facebook has completely failed to support research scientists that want to do more than this.
It’s no secret that I’m a jax fanboy. But I drink the koolaid because it tastes better than anyone else’s. PyTorch is gonna have a rude wake-up call in about… oh, four years. They’ll wake up and hear everyone else comparing them to tensorflow, and it won’t be for the rosy reasons they currently enjoy. PyTorch devs are living in the dark ages without even realizing how much better it is when you have actual control over which parts of your program are JITed, along with an actual execution graph that you can walk and macroexpand lisp-style.
https://jax.readthedocs.io/en/latest/autodidax.html should be required reading for every ML dev, and I can hardly get anyone to look at it. Sometimes I wonder if people just don’t see the steamroller coming for PyTorch. Probably — jax still reads to outsiders as a toy.
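As a tiny illustration of what I mean (a made-up toy function, nothing from autodidax): you choose exactly which pieces get jitted, and jax.make_jaxpr hands you the traced program as a data structure you can print, walk, and transform:

  import jax
  import jax.numpy as jnp

  def loss(w, x):
      return jnp.sum(jnp.tanh(x @ w) ** 2)

  # You decide which pieces get compiled...
  grad_fn = jax.grad(loss)
  fast_grad = jax.jit(grad_fn)

  w, x = jnp.ones((3, 3)), jnp.ones((2, 3))
  print(fast_grad(w, x))

  # ...and the intermediate representation is right there to inspect.
  print(jax.make_jaxpr(grad_fn)(w, x))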
Jax might be faster than Pytorch, I don’t know. I’m talking about TF. When I switched from TF to Pytorch 3 years ago, I got no slowdown on any of the computer vision models at the time. And I remember looking at a couple of independent benchmarks which also showed them to be roughly the same in speed.
It really depends on what your model is doing. For a time, sequence models were easier to do with Pytorch than TF (due to control flow). On the efficiency side, for vanilla CV models, I also did not observe major differences last time I looked, but once I started doing lots of things in parallel (multi-GPU training, heavy data augmentation), I think TF has some well-engineered capabilities that are not matched yet.
> The #1 problem with PyTorch is that it’s great if you want to use one videocard for training
Incorrect information so confidently stated here. There are tons of research papers that use more than one GPU for training; not sure what you're referring to. Standard DDP works fine, for starters.
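For reference, the standard single-node DDP recipe is roughly this (toy model and data, launched with torchrun):

  import os
  import torch
  import torch.distributed as dist
  from torch.nn.parallel import DistributedDataParallel as DDP

  def main():
      # Launched with e.g.:  torchrun --nproc_per_node=4 train.py
      dist.init_process_group(backend="nccl")
      local_rank = int(os.environ["LOCAL_RANK"])
      torch.cuda.set_device(local_rank)

      model = torch.nn.Linear(10, 1).cuda(local_rank)   # stand-in for a real model
      model = DDP(model, device_ids=[local_rank])
      opt = torch.optim.SGD(model.parameters(), lr=0.1)

      for _ in range(100):                              # stand-in training loop
          x = torch.randn(32, 10, device=local_rank)
          y = torch.randn(32, 1, device=local_rank)
          loss = torch.nn.functional.mse_loss(model(x), y)
          opt.zero_grad()
          loss.backward()      # gradients are all-reduced across GPUs here
          opt.step()

      dist.destroy_process_group()

  if __name__ == "__main__":
      main()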
Indeed, Google/Alphabet is gradually making the shift to JAX, but also to ML Pathways, moving towards models that support multiple tasks and multiple sensory inputs, and sparse models instead of dense ones.
sure, I know :) I used to work on TPUs at Google in Platforms, much of my time was spent working with Jax and other teams/researchers who found novel ways to break TPU hardware.
What continues to surprise me is that for all their cleverness, Jeff Dean and the rest of the TF leadership spent the last 10 years basically recreating MPI-style high-performance computing, but threw away all the learning and rebuilt every bit (except the matrix libraries) from scratch.
TF started with parameter servers (every machine has its own copy of the weights and periodically contributes them to a common model, asynchronously) and moved to models that are sharded by data input and by model structure mapped onto the TPU topology (TPUv4 is a wrapped 3D torus). Really not that different from the T3E I used in the 90s.
Can someone please share the current state of deploying Pytorch models to production? TensorFlow has TF Serving which is excellent and scalable. Last I checked there wasn't a PyTorch equivalent.
I'm curious how these charts look for companies that are serving ML in production, not just research. Research is biased towards flexibility and ease of use, not necessarily scalability or having a production ecosystem.
There is TorchServe but I haven't used it so I'm not sure how production ready it is.
You have Nvidia's Triton Inference Server, which supports CPU and GPU with TF1, TF2, PyTorch, ONNX and TensorRT.
You have ONNX Runtime, which can run on CPU and GPU, and there are converters from TF and PyTorch to ONNX (a minimal export sketch is at the end of this comment).
Then you have cloud-based solutions like AWS SageMaker, Elastic Inference endpoints, and even Inf1 instances that use AWS Inferentia chips, which you would run with the Neuron SDK; they even have TensorFlow Serving containers with built-in support for Inferentia.
At the end of the day it really depends on your model, its size, latency, inference runtime and, obviously, the cost.
And that's before optimizations like FP16, BFLOAT16, TF32, INT8, pruning, layer rewrites, getting rid of batch normalization, etc.
Then you have up-and-coming solutions like Neural Magic's DeepSparse (not affiliated) to create sparse models for inference.
And that's just for the cloud; if you are talking about edge ML it's even further down the rabbit hole..
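For the ONNX route mentioned above, a minimal PyTorch-to-ONNX export plus an ONNX Runtime sanity check might look like this (the model here is just a stand-in):

  import torch
  import onnxruntime as ort

  # Stand-in network; substitute your trained model
  model = torch.nn.Sequential(torch.nn.Linear(16, 8), torch.nn.ReLU()).eval()
  dummy_input = torch.randn(1, 16)

  torch.onnx.export(
      model,
      dummy_input,
      "model.onnx",
      input_names=["input"],
      output_names=["output"],
      dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},  # variable batch size
  )

  # Check that the exported graph actually runs
  session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
  outputs = session.run(None, {"input": dummy_input.numpy()})
  print(outputs[0].shape)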
If I may, there's no real reason to break out ACL vs. NAACL vs. EMNLP, since they're all run by the ACL and one would be hard-pressed to say how the EMNLP community might differ from the ACL community at this point. And if you're doing NAACL you might want to do EACL and IJCNLP too.
The graph on here seems similar to what I've noticed. My lab mainly uses Tensorflow, driven largely by my knowledge. And the only reason I learned Tensorflow initially was that PyTorch was just getting started when I was choosing a framework and its documentation wasn't as established. However, when a student recently asked me which framework to use, I recommended PyTorch because of its comparative ease of implementation.
A big mistake on the side of tensorflow was trying to copy theano, including those dreadful functional loops, whereas in pytorch for loops are not a pain to use and are very well integrated with the language.
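As a toy illustration (a made-up recurrent cell, not anyone's real model), the time loop in pytorch is just ordinary Python:

  import torch

  class TinyRNN(torch.nn.Module):
      # Minimal recurrent model: the time loop is a plain Python for loop
      def __init__(self, dim):
          super().__init__()
          self.cell = torch.nn.Linear(2 * dim, dim)

      def forward(self, x):                # x: (seq_len, batch, dim)
          h = torch.zeros_like(x[0])
          for t in range(x.shape[0]):      # ordinary control flow, easy to step through
              h = torch.tanh(self.cell(torch.cat([x[t], h], dim=-1)))
          return h

  out = TinyRNN(8)(torch.randn(5, 3, 8))

In theano or TF1 you would be wrestling with scan/tf.while_loop to express the same thing.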
JAX is really cool, but still somewhat immature. I would love to see it gaining more ground and improving w.r.t. e.g. TensorBoard integration and getting all the goodies we have in tensorflow. If you are looking for a higher-level framework, I would recommend elegy [0], which is very close to the Keras API.
I'm happy to hear about Equinox being used! (I'm the author.)
I'm curious what your workloads are that you're seeing speedups of as much as 1e4? The greatest I'd heard of before was ~1e2 on some differential equation solving.
Mainly optimizers/solvers for multivariate functions.
The 1e4 speedup was on a Trust Region optimizer. The algorithm was implemented to solve "the hard case"[1] and involves multiple Cholesky factorizations, a matrix inversion, an eigenvalue decomposition on each step, and a call to scipy.linalg.solve_triangular.
Part of the speedup is likely from caching/avoiding recomputing things.
Granted, I had to rewrite a lot of the code to accommodate jax's peculiarities around python semantics, and made extensive use of jax.lax.{fori_loop, while_loop, scan, cond}.
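For anyone who hasn't used those constructs: here's a tiny sketch of the style they force on you (a toy Newton iteration for sqrt, not the actual trust-region code), with the loop written as jax.lax.while_loop so it can live under jit:

  import jax
  import jax.numpy as jnp
  from jax import lax

  @jax.jit
  def newton_sqrt(a):
      # Toy Newton iteration for sqrt(a), written to be traceable by jit
      def cond_fun(state):
          x, err = state
          return err > 1e-6                  # keep iterating until converged

      def body_fun(state):
          x, _ = state
          new_x = 0.5 * (x + a / x)
          return new_x, jnp.abs(new_x - x)

      init = (a, jnp.array(jnp.inf, dtype=a.dtype))
      x, _ = lax.while_loop(cond_fun, body_fun, init)
      return x

  print(newton_sqrt(jnp.float32(2.0)))       # ~1.4142135

The loop body and condition become pure functions over an explicit carried state, which is exactly the rewrite a plain Python while loop needs before jax will trace it.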
I mostly used Tensorflow and I'm curious what makes PyTorch models easier to implement?
With Tensorflow 2 you get the Keras API which is really easy to use.
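For example, a complete define/compile/train loop in Keras is only a few lines (toy model and random data, obviously):

  import tensorflow as tf

  model = tf.keras.Sequential([
      tf.keras.layers.Dense(64, activation="relu", input_shape=(16,)),
      tf.keras.layers.Dense(1),
  ])
  model.compile(optimizer="adam", loss="mse")

  x = tf.random.normal((256, 16))
  y = tf.random.normal((256, 1))
  model.fit(x, y, epochs=3, batch_size=32)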