I'm a professional scientist, so let me give my two cents on this matter. Being able to compare your work against SOTA (state of the art) is pretty critical in academic publications. If everyone else in your area uses framework X, it makes a lot of sense for you to do it too. For the last few years, Pytorch has been king for the topics I care about.
However .. one area where Tensorflow shone was the static graph. As our models get even more intensive and need different parts to execute in parallel, we are seeing some challenges in PyTorch's execution model. For example:
https://pytorch.org/docs/stable/notes/cuda.html#use-nn-paral...
It appears to me that high-performance model execution is a bit tricky if you want to do lots of things in parallel. TorchServe also seems quite bare-bones compared to Tensorflow's offerings. So in summary, I think Tensorflow still has some features unmatched by others. It really depends on what you are doing.
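To make this concrete, here is a rough sketch (the branch modules and sizes are made up) of what running two independent pieces of a model concurrently looks like in eager PyTorch, where you end up managing CUDA streams by hand:

  import torch

  # Hypothetical independent branches; in a real model these would be submodules
  branch_a = torch.nn.Linear(512, 512).cuda()
  branch_b = torch.nn.Linear(512, 512).cuda()
  x = torch.randn(32, 512, device="cuda")

  stream_a = torch.cuda.Stream()
  stream_b = torch.cuda.Stream()

  torch.cuda.synchronize()            # make sure x and the weights are ready
  with torch.cuda.stream(stream_a):   # launch branch A on its own stream
      out_a = branch_a(x)
  with torch.cuda.stream(stream_b):   # launch branch B concurrently on another stream
      out_b = branch_b(x)
  torch.cuda.synchronize()            # wait for both branches before using the outputs

  result = out_a + out_b

With a static graph the framework can work this kind of overlap out for you; here the stream bookkeeping is on you.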
I think Tensorflow made a bad move in academia by being so damn difficult to use in its earlier versions. Sure, its performance was always better than PyTorch's, but when you are an overworked PhD student you care less about your code being efficient and more about your code working at all.
Word got around that debugging PyTorch was relatively painless, those earlier models made it into publications, and now here we are.
Not even remotely true. Or rather, our experiences differ greatly.
The #1 problem with PyTorch is that it’s great if you want to use one videocard for training. Facebook has completely failed to support research scientists that want to do more than this.
It’s no secret that I’m a jax fanboy. But I drink the koolaid because it tastes better than anyone else’s. PyTorch is gonna have a rude wake-up call in about… oh, four years. They’ll wake up and hear everyone else comparing them to tensorflow, and it won’t be for the rosy reasons they currently enjoy. PyTorch devs are living in the dark ages without even realizing how much better it is when you have actual control over which parts of your program are JITed, along with an actual execution graph that you can walk and macroexpand lisp-style.
https://jax.readthedocs.io/en/latest/autodidax.html should be required reading for every ML dev, and I can hardly get anyone to look at it. Sometimes I wonder if people just don’t see the steamroller coming for PyTorch. Probably — jax still reads to outsiders as a toy.
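As a tiny illustration of what I mean (a made-up toy function, nothing from autodidax): you choose exactly which pieces get jitted, and jax.make_jaxpr hands you the traced program as a data structure you can print, walk, and transform:

  import jax
  import jax.numpy as jnp

  def loss(w, x):
      return jnp.sum(jnp.tanh(x @ w) ** 2)

  # You decide which pieces get compiled...
  grad_fn = jax.grad(loss)
  fast_grad = jax.jit(grad_fn)

  w, x = jnp.ones((3, 3)), jnp.ones((2, 3))
  print(fast_grad(w, x))

  # ...and the intermediate representation is right there to inspect.
  print(jax.make_jaxpr(grad_fn)(w, x))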
Jax might be faster than Pytorch, I don’t know. I’m talking about TF. When I switched from TF to Pytorch 3 years ago, I got no slowdown on any of the computer vision models at the time. And I remember looking at a couple of independent benchmarks which also showed them to be roughly the same in speed.
It really depends on what your model is doing. For a time, sequence models were easier to do with Pytorch than TF (due to control flow). On the efficiency side, for vanilla CV models, I also did not observe major differences last time I looked, but once I started doing lots of things in parallel (multi-GPU training, heavy data augmentation), I think TF has some well-engineered capabilities that are not matched yet.
> The #1 problem with PyTorch is that it’s great if you want to use one videocard for training
Incorrect information so confidently stated here. There are tons of research papers that use more than one GPU for training; not sure what you're referring to. Standard DDP works fine, for starters.
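For reference, the standard single-node DDP recipe is roughly this (toy model and data, launched with torchrun):

  import os
  import torch
  import torch.distributed as dist
  from torch.nn.parallel import DistributedDataParallel as DDP

  def main():
      # Launched with e.g.:  torchrun --nproc_per_node=4 train.py
      dist.init_process_group(backend="nccl")
      local_rank = int(os.environ["LOCAL_RANK"])
      torch.cuda.set_device(local_rank)

      model = torch.nn.Linear(10, 1).cuda(local_rank)   # stand-in for a real model
      model = DDP(model, device_ids=[local_rank])
      opt = torch.optim.SGD(model.parameters(), lr=0.1)

      for _ in range(100):                              # stand-in training loop
          x = torch.randn(32, 10, device=local_rank)
          y = torch.randn(32, 1, device=local_rank)
          loss = torch.nn.functional.mse_loss(model(x), y)
          opt.zero_grad()
          loss.backward()      # gradients are all-reduced across GPUs here
          opt.step()

      dist.destroy_process_group()

  if __name__ == "__main__":
      main()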
Indeed, Google/Alphabet is gradually making the shift to JAX, but also to ML Pathways, moving towards models that support multiple tasks and multiple sensory inputs, and sparse models instead of dense ones.
sure, I know :) I used to work on TPUs at Google in Platforms, much of my time was spent working with Jax and other teams/researchers who found novel ways to break TPU hardware.
What continues to surprise me is that for all their cleverness, Jeff Dean and the rest of the TF leadership spent the last 10 years basically recreating MPI-style high-performance computing, but threw away all the learning and rebuilt every bit (except the matrix libraries) from scratch.
TF started with parameter servers (every machine has its own copy of the weights and periodically contributes them to a common model, asynchronously) and moved to models that are sharded by data input and by model structure mapped onto the TPU topology (TPUv4 is a wrapped 3D torus). Really not that different from the T3E I used in the 90s.
Can someone please share the current state of deploying Pytorch models to production? TensorFlow has TF Serving which is excellent and scalable. Last I checked there wasn't a PyTorch equivalent.
I'm curious how these charts look for companies that are serving ML in production, not just research. Research is biased towards flexibility and ease of use, not necessarily scalability or having a production ecosystem.
There is TorchServe but I haven't used it so I'm not sure how production ready it is.
You have Nvidia's Triton Inference Server, which supports CPU and GPU with TF1, TF2, PyTorch, ONNX and TensorRT.
You have ONNX Runtime, which can run on CPU and GPU, and there are converters from TF and PyTorch to ONNX (a minimal export sketch is at the end of this comment).
Then you have cloud-based solutions like AWS SageMaker, Elastic Inference endpoints, and even Inf1 instances that use AWS Inferentia chips, which you would run with the Neuron SDK; they even have TensorFlow Serving containers with built-in support for Inferentia.
At the end of the day it really depends on your model, its size, latency, inference runtime and, obviously, the cost.
And that's before optimizations like FP16, BFLOAT16, TF32, INT8, pruning, layer rewrites, getting rid of batch normalization, etc.
Then you have up-and-coming solutions like Neural Magic's DeepSparse (not affiliated) to create sparse models for inference.
And that's just for the cloud; if you are talking about edge ML it's even further down the rabbit hole..
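For the ONNX route mentioned above, a minimal PyTorch-to-ONNX export plus an ONNX Runtime sanity check might look like this (the model here is just a stand-in):

  import torch
  import onnxruntime as ort

  # Stand-in network; substitute your trained model
  model = torch.nn.Sequential(torch.nn.Linear(16, 8), torch.nn.ReLU()).eval()
  dummy_input = torch.randn(1, 16)

  torch.onnx.export(
      model,
      dummy_input,
      "model.onnx",
      input_names=["input"],
      output_names=["output"],
      dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},  # variable batch size
  )

  # Check that the exported graph actually runs
  session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
  outputs = session.run(None, {"input": dummy_input.numpy()})
  print(outputs[0].shape)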
If I may, there's no real reason to break out ACL vs. NAACL vs. EMNLP, since they're all run by the ACL and one would be hard-pressed to say how the EMNLP community might differ from the ACL community at this point. And if you're doing NAACL you might want to do EACL and IJCNLP too.
The graph on here seems similar to what I've noticed. My lab mainly uses Tensorflow, driven largely by my knowledge. And the only reason I learned Tensorflow initially was that PyTorch was just getting started when I was choosing a framework and its documentation wasn't as established. However, when a student recently asked me which framework to use, I recommended PyTorch because of its comparative ease of implementation.
A big mistake on the side of tensorflow was trying to copy theano, including those dreadful functional loops, whereas in pytorch for loops are not a pain to use and are very well integrated with the language.
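As a toy illustration (a made-up recurrent cell, not anyone's real model), the time loop in pytorch is just ordinary Python:

  import torch

  class TinyRNN(torch.nn.Module):
      # Minimal recurrent model: the time loop is a plain Python for loop
      def __init__(self, dim):
          super().__init__()
          self.cell = torch.nn.Linear(2 * dim, dim)

      def forward(self, x):                # x: (seq_len, batch, dim)
          h = torch.zeros_like(x[0])
          for t in range(x.shape[0]):      # ordinary control flow, easy to step through
              h = torch.tanh(self.cell(torch.cat([x[t], h], dim=-1)))
          return h

  out = TinyRNN(8)(torch.randn(5, 3, 8))

In theano or TF1 you would be wrestling with scan/tf.while_loop to express the same thing.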
JAX is really cool, but still somewhat immature. I would love to see it gaining more ground and improving w.r.t. e.g. TensorBoard integration and getting all the goodies we have in tensorflow. If you are looking for a higher-level framework, I would recommend elegy [0], which is very close to the Keras API.
I'm happy to hear about Equinox being used! (I'm the author.)
I'm curious what your workloads are that you're seeing speedups of as much as 1e4? The greatest I'd heard of before was ~1e2 on some differential equation solving.
Mainly optimizers/solvers for multivariate functions.
The 1e4 speedup was on a Trust Region optimizer. The algorithm was implemented to solve "the hard case"[1] and involves multiple Cholesky factorizations, a matrix inversion, an eigenvalue decomposition on each step, and a call to scipy.linalg.solve_triangular.
Part of the speedup is likely from caching/avoiding recomputing things.
Granted, I had to rewrite a lot of the code to accommodate jax's peculiarities around python semantics, and made extensive use of jax.lax.{fori_loop, while_loop, scan, cond}.
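For anyone who hasn't used those constructs: here's a tiny sketch of the style they force on you (a toy Newton iteration for sqrt, not the actual trust-region code), with the loop written as jax.lax.while_loop so it can live under jit:

  import jax
  import jax.numpy as jnp
  from jax import lax

  @jax.jit
  def newton_sqrt(a):
      # Toy Newton iteration for sqrt(a), written to be traceable by jit
      def cond_fun(state):
          x, err = state
          return err > 1e-6                  # keep iterating until converged

      def body_fun(state):
          x, _ = state
          new_x = 0.5 * (x + a / x)
          return new_x, jnp.abs(new_x - x)

      init = (a, jnp.array(jnp.inf, dtype=a.dtype))
      x, _ = lax.while_loop(cond_fun, body_fun, init)
      return x

  print(newton_sqrt(jnp.float32(2.0)))       # ~1.4142135

The loop body and condition become pure functions over an explicit carried state, which is exactly the rewrite a plain Python while loop needs before jax will trace it.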
I mostly used Tensorflow and I'm curious what makes PyTorch models easier to implement?
With Tensorflow 2 you get the Keras API which is really easy to use.
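For example, a complete define/compile/train loop in Keras is only a few lines (toy model and random data, obviously):

  import tensorflow as tf

  model = tf.keras.Sequential([
      tf.keras.layers.Dense(64, activation="relu", input_shape=(16,)),
      tf.keras.layers.Dense(1),
  ])
  model.compile(optimizer="adam", loss="mse")

  x = tf.random.normal((256, 16))
  y = tf.random.normal((256, 1))
  model.fit(x, y, epochs=3, batch_size=32)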