As a researcher in RL & ML at a big industry lab, I would say most of my colleagues are moving to JAX [https://github.com/google/jax], which this article kind of ignores. JAX is XLA-accelerated NumPy; it's cool beyond just machine learning, but it only provides low-level linear algebra abstractions. However, you can put something like Haiku [https://github.com/deepmind/dm-haiku] or Flax [https://github.com/google/flax] on top of it and get what the cool kids are using :)
> but only provides low-level linear algebra abstractions.
Just to make sure people aren't scared off by this: jax provides a lot more than just low level linear algebra. It has some fundamental NN functions in its lax submodule, and the numpy API itself goes way way past linear algebra. Numpy plus autodiff, plus automatic vectorization, plus automatic parallelization, plus some core NN functions, plus a bunch more stuff.
Jax plus optax (for common optimizers and easily making new optimizers) is plenty sufficient for a lot of NN needs (quick sketch below). After that, the other libraries are really just useful for initialization and state management (which is still very useful; I use haiku myself).
I haven't used flax, but it seems more like pytorch. I like haiku because it's relatively minimal. The simplest transform does init and that's all. I like that.
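To make that concrete, here's a rough sketch of the jax + optax loop (the model, data, and hyperparameters are made up for illustration; it's just a linear map, not a real network):

    import jax
    import jax.numpy as jnp
    import optax

    # Toy "model": params are a plain pytree, predictions are a linear map.
    params = {"w": jnp.zeros(3), "b": 0.0}
    X, y = jnp.ones((8, 3)), jnp.ones(8)

    def loss_fn(params, X, y):
        pred = X @ params["w"] + params["b"]
        return jnp.mean((pred - y) ** 2)

    optimizer = optax.adam(1e-2)
    opt_state = optimizer.init(params)

    @jax.jit
    def step(params, opt_state, X, y):
        loss, grads = jax.value_and_grad(loss_fn)(params, X, y)
        updates, opt_state = optimizer.update(grads, opt_state)
        params = optax.apply_updates(params, updates)
        return params, opt_state, loss

    for _ in range(100):
        params, opt_state, loss = step(params, opt_state, X, y)

Haiku/Flax mostly take over the `params` bookkeeping once the model stops being a hand-rolled pytree like this.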
Indeed! Sorry, I was thinking more about the layers, but of course JAX is way more than numpy on steroids. (Although it is also that: https://dionhaefner.github.io/2021/12/supercharged-high-reso...). JAX has a very nice vmap for easy vectorization on SIMD accelerators, and pmap even allows cross-device parallelization with a single line, which is just beautiful!
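A minimal illustration of the two (the toy `predict` function and shapes are made up; the pmap call is commented out since it only does something useful with more than one device):

    import jax
    import jax.numpy as jnp

    def predict(w, x):          # written for a single example
        return jnp.dot(w, x)

    w = jnp.ones(4)
    batch = jnp.ones((32, 4))

    # vmap: vectorize the single-example function over the batch axis.
    batched_predict = jax.vmap(predict, in_axes=(None, 0))
    out = batched_predict(w, batch)                     # shape (32,)

    # pmap: same idea, but the mapped axis is split across devices.
    # sharded = batch.reshape(jax.device_count(), -1, 4)
    # out = jax.pmap(batched_predict, in_axes=(None, 0))(w, sharded)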
What I love about JAX is that it essentially just makes Python into a performant, differentiable programming language.
I'm a pretty big fan of moving away from thinking about ML/Stats/etc specifically and people should more generally embrace the idea of differentiable programming as just a way to program and solve a range of problems.
JAX means that the average python programmer just needs to understand the basics of derivatives and their use (not how to compute them, just what they are and why they're useful) and suddenly has an amazing amount of power they can add to normal code.
The real power of JAX, for me at least, is that you can write the solution to your problem, whatever that problem may be, and use derivatives and gradient descent to find an answer. Sometimes this solution might be essentially a neural network, other times the generalized linear model, but sometimes it might not fit obviously into either of these paradigms.
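As a hedged illustration of that point (the damped-oscillator model and all numbers below are invented purely for the example): you write the forward model, write a loss, and let jax.grad drive plain gradient descent, with no NN or GLM machinery involved.

    import jax
    import jax.numpy as jnp

    # Invented example: recover the parameters of a damped oscillator from
    # synthetic observations - just a forward model plus gradient descent.
    t = jnp.linspace(0.0, 10.0, 200)
    obs = 2.0 * jnp.exp(-0.3 * t) * jnp.sin(1.5 * t)    # "data" from true params

    def forward(p):
        return p["amp"] * jnp.exp(-p["decay"] * t) * jnp.sin(p["freq"] * t)

    def loss(p):
        return jnp.mean((forward(p) - obs) ** 2)

    p = {"amp": 1.0, "decay": 0.1, "freq": 1.0}
    grad = jax.jit(jax.grad(loss))
    for _ in range(2000):
        p = jax.tree_util.tree_map(lambda v, g: v - 0.1 * g, p, grad(p))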
Do any JAX experts know if there is an equivalent to https://captum.ai/ - a model interpretability library for pytorch?
In particular i want to be able to measure feature importance on both inputs and internal layers on a sample by sample basis. This is the only thing currently holding me back from using JAX right now.
Alternatively, a simple-to-read/understand/port implementation of DeepLIFT would work too.
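Not aware of a drop-in captum/DeepLIFT port, but if plain gradient-based attribution is enough, per-sample saliency (and gradient × input) is only a few lines in JAX. The `score` function and its parameters below are hypothetical stand-ins for your network; the same trick works for internal layers if you take the gradient with respect to an intermediate activation instead of the input.

    import jax
    import jax.numpy as jnp

    # Hypothetical two-layer model with a scalar output per example.
    params = {"w1": jnp.ones((5, 8)), "w2": jnp.ones(8)}

    def score(x):
        return jnp.tanh(x @ params["w1"]) @ params["w2"]

    batch = jnp.linspace(-1.0, 1.0, 15).reshape(3, 5)

    # Per-sample saliency: d(output)/d(input), vectorized over the batch.
    saliency = jax.vmap(jax.grad(score))(batch)          # shape (3, 5)
    grad_times_input = saliency * batch                  # a common attribution variant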
Most? Last I tried JAX it had no real documentation to speak of and all the tutorials you could find on the net were woefully out of date. Even simple toy examples broke with weird error messages. Maybe the situation is better now. I'd rather wait for JAX 2.0 though. :)
Give it another try. I found the docs pretty good; you need to get your head around XLA tracing and read the "sharp bits" section, and you should be pretty set!
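The classic sharp bit is Python control flow on traced values inside jit; a tiny sketch of the failure and the usual fix:

    import jax
    import jax.numpy as jnp

    @jax.jit
    def bad_relu(x):
        if x > 0:          # x is a tracer under jit, so Python can't branch on it
            return x
        return 0.0

    @jax.jit
    def good_relu(x):
        return jnp.where(x > 0, x, 0.0)   # stays inside the traced computation

    # bad_relu(1.0)                   # raises a tracer/concretization error
    print(good_relu(jnp.array(1.0)))  # 1.0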
Yep, and this approach also allows languages like Julia and Elixir to compile their expressions into valid compute graphs that target JAX/XLA. That polyglot capability opens up cutting-edge machine learning to quite a few more ecosystems, with another level of capability in distribution and fault tolerance, as is the case with Elixir + Nx.
Julia has XLA.jl [0], which interoperates with their deep-learning stack, and Elixir has Nx [1], which is higher level (basically JAX but in Elixir). I would love to see someone do something like that in Rust...
JAX is really exciting! JAX is mentioned in the Research subsection of the "Which should I pick" section. Do you think that the fundamental under-the-hood differences of JAX compared to TensorFlow and PyTorch will affect its adoption?
Haiku is really cool - I haven't used Flax. It'll be really interesting to see the development of JAX as time goes on. I also saw some benchmarks that show it's neck-and-neck with PyTorch as the fastest of the three, but I think with more optimization its ceiling is higher than PyTorch's.
Wow that's a really cool resource. Thanks for linking!
Even still, do you think researchers will want to take the time to learn all of that when PyTorch gives them no real reason to switch? Every day spent learning JAX is another day spent not reviewing literature, writing papers, or developing new models.
this definitely limits its generality relative to jax, which makes it less than ideal for anything other than 'typical' deep neural networks
this is especially true when the research in question is related to things like physics or combining physical models and machine learning, which imho is very interesting. those are use cases that pytorch just isn't good at.
> Every day spent learning JAX is another day spent not reviewing literature, writing papers, or developing new models.
Every day spent learning JAX is also another day spent not trying to fit a round peg into a square hole of other libraries. I made the leap when I was doing things that were painful in pytorch. In terms of time, I think I came out ahead.
Not everything is a nail, and pytorch is better for some things, and jax is better for others. "Every day spent learning the screwdriver is a day spent not using your hammer."
To get started, JAX is just knowing Python and adding `grad`, `jit` and `vmap` to the mix; it takes about 5 minutes to get going.
To me this is the real power of JAX, it can be viewed as a few functions that make it easy to take any python code you've written and work with derivatives using that. This gives it tremendous flexibility in helping you solve problems.
As an example, I mostly do statistical work with it, rather than NN-focused work. It took probably a few minutes to implement a GLM with custom priors over all the parameters, and then use the Hessian for the Laplace approximation of parameter uncertainty. The proper way to solve this would have been using PyMC, but this worked well enough for me, and building the model from scratch in JAX took less time than refreshing my memory of the PyMC API.
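For anyone curious, here's a rough sketch of that workflow. The logistic toy data and the standard normal prior are made up for illustration; this is not the actual model from the comment above.

    import jax
    import jax.numpy as jnp

    key = jax.random.PRNGKey(0)
    X = jax.random.normal(key, (100, 3))
    y = (X @ jnp.array([1.0, -2.0, 0.5]) > 0).astype(jnp.float32)

    def neg_log_posterior(theta):
        logits = X @ theta
        nll = jnp.sum(jnp.logaddexp(0.0, logits) - y * logits)  # logistic likelihood
        log_prior = -0.5 * jnp.sum(theta ** 2)                   # N(0, 1) priors
        return nll - log_prior

    # Crude MAP estimate via plain gradient descent.
    theta = jnp.zeros(3)
    grad_fn = jax.jit(jax.grad(neg_log_posterior))
    for _ in range(500):
        theta = theta - 0.01 * grad_fn(theta)

    # Laplace approximation: posterior covariance ~ inverse Hessian at the MAP.
    cov = jnp.linalg.inv(jax.hessian(neg_log_posterior)(theta))
    std_errors = jnp.sqrt(jnp.diag(cov))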
Having read some Jax high-performance code, I do like Jax, but it does feel a bit too abstract and low-level sometimes. Maybe there aren't good coding conventions yet, or performance trumped them? It definitely needs improvement on error messages, as well.
For example, a long chain of pmaps, each with some sort of device-partitioning logic, that fails to JIT-compile is extremely hard to understand. I basically had to binary-search the code until the compile errors disappeared.
Tensorflow is just such a classic clusterfuck google project. V2 had huge breaking changes (reminiscent of angular) and tons of the apis are clunky and don’t work well together. There are like 3 different ways to save a model. It’s almost like a bunch of teams built features with no oversight.
I’m pretty sure tf is considered in maintenance mode within google as Brain and the tf creators themselves have moved to Jax. I do think Google learned a lot from tensorflow and am excited to see Jax pan out.
Pytorch is a pleasure to debug. I think pytorch jit could close the deployment gap.
The article gives credit to TF server and TFLite and so forth as being better for deployment, but leaves out the fact that those systems don't fucking work most of the time, and support is pretty much over at this point. The same goes for model support; even the models in TF's own repository are sometimes broken or don't follow the API conventions set forth in the documentation. I honestly don't know how anyone uses TF in production at this point, unless they are frozen on a specific old version and have figured out an environment that works with their specific models already.
Yeah, TensorFlow's API has definitely gotten convoluted and confusing. I think the shift from TF1 to TF2 and then later wrapping Keras in TF just caused a lot of problems.
TensorFlow seems to be spreading itself pretty thin. Maintaining so many language bindings, TensorFlow.js, TFlite, Server, etc. seem like they could all use some focus, BUT, and this is a big but, do you think if they can get each part of their ecosystem to an easily usable point that they'll have cornered the industry sector?
PyTorch is taking a much more targeted approach as seen with PyTorch Live, but I truly think that TFLite + Coral will be a game-changer for a lot of industries (and Google will make a fortune in the process). To me it seems like this is where Google's focus has lain in the AI space for the past couple of years.
> I truly think that TFLite + Coral will be a game-changer for a lot of industries
I'd like to agree. Google was very far ahead of the curve when they released Coral. I was completely stoked when they finally added hardware video encoding to the platform with the release of the Dev Board Mini.
I want them to succeed but I fear if they don't drastically improve their Developer Experience, others will catch up and eat their lunch. TensorFlow has been hard to pick up. A few years ago when I was trying to pick this up to create some edge applications, PyTorch wasn't so much easier that it seemed worth sacrificing EdgeTPU support. But now PyTorch seems much, much easier than it did then, while TensorFlow hasn't seemed to improve in ease-of-use.
Now I'm genuinely considering sacrificing TFLite / EdgeTPU in favor of, say, Jetson-esque solutions just so that I can start doing something.
Note: I am an amateur/hobbyist in this context, I am not doing Edge machine learning professionally.
Yeah, I hear you loud and clear on a lot of those points. I think the most important thing honestly is the fact that most PhDs use PyTorch in academia, so industry will inevitably shift to tailor to this growing supply if possible. Of course, Coral/TFLite are really useful, so a balance will be found, but it'll be interesting to see how it plays out.
Totally agree on the debugging. The fact that PyTorch is more pythonic and easier to debug makes it the better choice for a lot of applications.
Are you in research? I think TensorFlow's position in industry puts it in a kind of too-big-to-fail situation at this point. It'll be interesting to see what happens with JAX, but for now TensorFlow really is the option for industry.
Do you think TFLite + Coral devices will help breathe new life into TF?
Meanwhile PyTorch doesn't follow SemVer and always has breaking changes for every minor version increment. There's always "Backwards Incompatible Changes" section for every minor version release: https://github.com/pytorch/pytorch/releases
Even TF 1 was just an extension of Google Brain: the project that took a datacenter of CPUs in Google to distinguish cats and dogs in Youtube videos with very high accuracy. I remember when Jeff Dean was talking about it the first time, it felt like magic (though it still feels like it, it’s just more optimized magic :) ).
I think the PyTorch C++ API is less mature and harder to compile into other projects. TensorFlow started with the C++ API exposed, which is why the graph format is so stable and favorable for deployment in heterogeneous environments.
At one point I lost interest in both, and in ML/AI in general. I think eventually I got frustrated with the insane amount of abuse for marketing purposes and the field never truly delivering what was promised (I know, I know, fake it till you make it). For better or worse, far too few managed to make it, so most stuck with faking. I think I lost interest completely around 2019. But even back then the two were starting to seem like twins - practically identical, with some minor but sometimes subtle differences. Looking at random sections of the documentation, all you gotta do is ignore the semantics...
Yeah, since the release of TF2 in 2019 they're a lot more similar. PT is still more pythonic and easier to debug, but TF has a way better industry infrastructure. The documentation is comically similar though! lol
Have you checked out Google's Coral devices? DL has definitely been abused for marketing purposes, but I think the lack of delivery had more to do with the fact that DL was progressing far faster than the tools around them which make their intelligence actionable.
Part of this is because so many DL applications had to be delivered in a SaaS way, when local AI makes much more sense for a lot of applications. I think the TF -> TFLite -> Coral Device pipeline has the potential to revolutionize a LOT of industries.
Now that you mention it Coral products are the only thing still of some interest to me. Though with so many alternatives around I'm not completely sure they are justified. Say the SBC - even though it is relatively smaller, it's still more expensive than the Jetson which is in a different category in terms of specs. In all fairness I'm interested to get some opinions on the jetson since I have a project where it might come in handy(automotive related). I'm still on the fence as to what I want to use and if I should consider a USFF desktop computer and have an x86 at my disposal and avoid risking having to dig deep into some library when something goes wrong. The one thing that I'm keeping my eyes on are the M.2 Coral devices, though I personally have no use for them.
Unless I missed something, I recently tried to buy a Coral. It was not available anywhere except a few places at 10X RRP. You can buy a Jetson easily. I just got one a few weeks ago.
I have done the tutorials and they all work. They seem to be very well maintained.
People I know said they never got the google Coral SDK working. Unfortunately they wouldn't give me their Corals. :(
Same here - I spent so much time spinning up all of the infrastructure and data I needed everywhere I've worked that I basically do more architecture and engineering now, and going back to TF or PyTorch, or figuring out the new framework/model architecture du jour, just lost all appeal.
Both frameworks have matured a lot and I think we're coming up on some really awesome applications in the coming years. The tools around them which make it easier to apply DL everywhere are really the bottleneck at this point!
Personally I wish we'd get beyond these low level abstractions by now. Machine learning is so wonderfully mathematical that making abstractions from the details should be incredibly easy and powerful. I can't believe like 8 years after the big ML wave e.g. frontend javascript developers aren't enjoying the fruits of these labors (despite there being no good reason for them not to be able to).
There are high level abstractions like keras and tensorflow.js (or even higher level GUI tools). All of them are fairly accessible to people with some basic programming knowledge.
I don't get your point about JS developers not enjoying the fruits of these labors - they don't need to enjoy them because they work in a different domain. And if they're interested in playing around with deep learning, the higher level APIs are easy to pick up. I'm not sure what you're expecting to see.
I do feel like Google could do better communicating all of their different tools though. Their ecosystem is large and pretty confusing - they've got so many projects going on at once that it always seems like everyone gets fed up with them before they take a second pass and make them more friendly to newcomers.
Facebook seems to have taken a much more focused approach as you can see with PyTorch Live
What do you envision that would help JavaScript devs take advantage of ML? There is tensorflow.js. Are you thinking completely different ‘building blocks’ that provide higher level apis like fast.ai geared towards frontend devs or something else?
You can work within a niche domain or an applied industry, for which ML/AI is just another tool in the bag (admittedly: sometimes revolutionary, many other times irrelevant); or you may want to do bleeding-edge research, only to find that you just cannot compete against the top dogs (even the wonderful fast.ai couldn’t follow suit without refactoring heavily every six or twelve months). What’s the point, then? Set yourself a clearly interesting and achievable target (learn and find a job, get a paper published, release an applied library, etc.) and a challenging deadline with milestones (say, 3-6-9-12 months). After that, wrap up and move forward or move on.
For any kind of research or experimental work, I cannot imagine using anything other than PyTorch, with the caveat that I do think JAX is extremely impressive and I've been meaning to learn more about it for a while.
Even though I've been working with Tensorflow for a few years now and I feel like I do understand the API pretty well, to some extent that just means I'm _really_ good at navigating the documentation, because there's no way to intuit the way things work. And I still run into bizarre performance issues when profiling graphs pretty much all the time. Some ops are just inefficient - oh but it was fixed in 2.x.yy! Oh but then it broke again in 2.x.yy+1! Sigh.
However - and I know this is a bit of a tired trope, but any kind of industrial deployment is just vastly, vastly easier with Tensorflow. I'm currently working with ultra-low-latency model development targeting a Tensorflow-Lite inference engine (C API, wrapped via Rust) and it's just incredibly easy. With some elbow grease and willingness to dive into low-level TF-Lite optimisations, one can see end-to-end model inference times on the order of 10-100us for simple models (say, a fully connected dnn with a few million parameters), and between 100us-1ms for fairly complex models utilising contemporary architectures in computer vision or NLP. Memory overhead is low, and control over inference computation semantics is easy.
As a nice cherry on top, we can take the same Tensorflow SavedModels that get compiled to TF-Lite files and instead compile them to tensorflow-js for easy web deployment, which is a great portability upside.
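For reference, the conversion step itself is small (paths and model names below are hypothetical); the web conversion goes through the separate tensorflowjs converter:

    import tensorflow as tf

    # SavedModel -> TFLite flatbuffer.
    converter = tf.lite.TFLiteConverter.from_saved_model("exported/my_model")
    with open("my_model.tflite", "wb") as f:
        f.write(converter.convert())

    # The same SavedModel can be converted for the web with the CLI from the
    # `tensorflowjs` pip package, e.g.:
    #   tensorflowjs_converter --input_format=tf_saved_model exported/my_model web_model/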
However, I know there's some incredible progress being made on what one might call 'environment-agnostic computational graph ILs' (on second thought, let's not keep that name) which should open up more options for inference engines and graph optimisations (operator fusion, rollups, hardware-dependent stuff, etc).
Overall I feel like things have been continuously getting better for the last 5 years or so. I'm pleased to see so many more options.
Agreed - JAX is really cool. It will be interesting to see how TF & JAX develop considering they're both made by Google. I also think JAX has the potential to be the fastest, although right now it's neck-and-neck with PyTorch.
Yes - a lot of TF users don't realize that the "tricks of the trade" for wrangling TF just don't apply in PT, because it works more easily.
I agree that industry-centric applications should probably use TF. TFX is just invaluable. Have you checked out Google's Coral devices? TFLite + Coral = revolution for a lot of industries.
Thanks for all your comments - I'm also really excited to see what the coming years bring. While we might debate if PT or TF is better, they're both undoubtedly improving very rapidly! So excited to see how ML/DL applications start permeating other industries
>10-100us for simple models (say, a fully connected dnn with a few million parameters)
I basically don't believe you. I'm a researcher in this area (DNNs on FPGAs) and you cannot get these latencies on real models without going to FPGA (and you're not synthesizing Verilog from TF, unless you're one of my competitors...). Just your kernel launch overheads for GPU are on the order of 10ms. For example, here's a talk given at GTC a couple of years ago where they do get down to 35us (on tensorcores) using persistent kernels, but on a mickey mouse network.
CPU (where you don't have to deal with async CUDA calls) won't save you either; again, here's a paper from USENIX (so you know it's legit) that shows the lowest times for real networks on CPU are ~2ms (and that's on resnet18, far shy of "millions" of weights).
This article says that Google and DeepMind research use TF - but they don't. DeepMind use JAX almost exclusively, and many brain researchers use JAX too.
ML eng is my area of expertise, and I would advise strongly against tensorflow.
The best thing that worked for me is Apple's TensorFlow pluggable device for Metal. It can utilize both AMD and M1 GPUs to 100% capacity. It's a shame Apple could do it as a small side project to use Metal, but AMD couldn't do the same with Vulkan.
I believe all the big frameworks have some work being done to make them compatible with AMD GPUs. Here is the relevant issue for JAX (support seems to be in alpha but viable): https://github.com/google/jax/issues/2012
"PyTorch and TensorFlow are far and away the two most popular Deep Learning frameworks today. The debate over whether PyTorch or TensorFlow is superior is a longstanding point of contentious debate, with each camp having its share of fervent supporters.
Both PyTorch and TensorFlow..."
Can an article really be any good if it starts off with such obvious SEO spam?
> Can an article really be any good if it starts off with such obvious SEO spam?
That's an interesting take. I fail to see how such mentions, in an article that compares two things and, thus, mentions both things together, are in any way SEO spam?
For example, in an article comparing apples and oranges I would expect to see a rather high number of mentions of "apples and oranges". After all, that is the topic.
It's SEO for the OP site. The parent comment is saying the words TF and PyTorch are repeated exhaustively, I think sometimes senselessly, throughout the entire article. It doesn't mean the content is not of value.
That may be the topic, but that first sentence reads like they're aiming for a high number of mentions, not quality of content. For instance, you can replace the second and third mentions of “PyTorch and Tensorflow” with something like “the two” or “they both.”
Though I agree with you it's annoying, you gotta hate the game not the player. It's Google's fault for valuing things like this in its search results. All the author is doing is trying to get seen. I'd say that if they have to irk a few people like us to get themselves higher on search results, they'd probably judge it as a good trade-off.
As a practitioner, I feel that oftentimes you are extending, fine tuning somebody else’s code or pre-trained models (DeepMind’s for example). This means that you should be able to work on whatever the platform this code came with. Basically, you should be able to work with JAX, TF or PyTorch with equal ease.
Great article. While I only had time to skim the article, I'll still offer my uninformed opinions. :) None of the hw I own is particularly great. I don't even own a decent GPU. But I don't need to because you can train your models for FREE on Google Colab's TPUs and GPUs. PyTorch's TPU support is still not that great while TensorFlow's is maturing. It's obviously a priority for Google to make TensorFlow work well on their own hardware.
So for me the choice is TF 2 because I can train models 5-10x faster using Google's TPUs than if I had used PyTorch. I know the PyTorch developers are working on TPU support but last I checked (this spring) it wasn't there yet and I wasn't able to make it work well on Google Colab.
As a layman in ML, I thought PyTorch is more (or only) geared towards researchers, while TensorFlow, despite its problems, is the _only_ one that provides commercial solutions you can deploy. Is this still true?
JAX is totally new to me, is this Google's new Tensorflow in the future?
Our sensitive data detection library is exported to iOS, Android, and Java, in addition to Python. We also run distributed and federated use cases with custom layers. All of which are better supported in TensorFlow.
That said, I’d use pytorch if I could. Simply put, it has a better user experience.
Yeah, it's pretty funny how reluctantly some people use TF because they have to, lol.
The fact that PyTorch is pythonic and easier to debug makes it better for a ton of users, but TensorFlow keeps the entire DL process in mind more, not just modeling.
I'm on that boat. Tensorflow.js is the only decent ML library for JS. Google support for TF.js has been dwindling, but new versions are still coming out AFAIK and we've just got Apple Silicon (M1) support.
No mention of Google MediaPipe (https://mediapipe.dev/), which is a mobile/edge framework for deploying TFLite models. MediaPipe has the advantage of letting you stitch together multiple models, transformations, and heuristics into a higher level computational graph. I'm not aware of any equivalent for PyTorch, although PyTorch Live seems like baby steps in that direction.
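A small taste of the Python side (this uses the prebuilt "solutions" API rather than authoring a custom graph, and the image path is hypothetical):

    import cv2
    import mediapipe as mp

    image = cv2.cvtColor(cv2.imread("person.jpg"), cv2.COLOR_BGR2RGB)

    # Run the prebuilt pose pipeline (detector + landmark model stitched together).
    with mp.solutions.pose.Pose(static_image_mode=True) as pose:
        results = pose.process(image)

    if results.pose_landmarks:
        for lm in results.pose_landmarks.landmark:
            print(lm.x, lm.y, lm.visibility)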
I am using mediapipe extensively in my day-to-day job. I've been impressed with it so far, and the ability to patch together multiple models in to, well... a media pipe, has been impressively useful. But I am also waiting for the penny to drop and google to announce they've abandoned it, or that their API has completely changed.
Do you work at my company? Because that's our biggest fear too.
I asked a friend of mine at Google to sleuth around internally and get a sense for the health of the project. He said that it's used on some internal projects and seems to have a pretty healthy internal website. So hopefully it won't be cancelled soon.
Maybe. But I only get paid in whatever snacks I can scrounge from the employee fridge and pickings have been slim of late since so few people come into the office these days. I am down to the "expired mystery mozzarella cheese sticks" and the leftover ketchup packages from Woodranch BBQ & Grill that are in the drawer with all the unused chopsticks and plastic forks.
I too spoke to a friend at Google that is part of the team, and whilst he said there were no plans to cancel it, or make radical changes, when I asked about unplanned plans, he kinda just shrugged and said "You know Google..."
I have a dual solution approach, Mediapipe for "in use now" and OpenPose for validation, slower processing and the "Google just **ed us" moment we're both anticipating. I need to build my own pose analysis system, but right now I don't have the bandwidth.
On the last day of Christmas the CEO sent to me:
Thirty-two Manfrotto Tripod extenders
Sixteen Manfrotto tripods
Sixteen high speed cables
Sixteen Manfrotto C-clamps
Sixteen Manfrotto 3/8 to 1/4-20 reducers
Sixteen Quick release mounts
Sixteen 4K cameras
Fooouuuurrrrr high-speeeeeed PCIe capture cards
Three days to hit deadline
Two triggered circuit breakers
One really huge headache
And a new VR H.M.D.
Post doesn't talk about the actual libraries, just the ecosystems surrounding them.
TF has more layer types, parallelizes better, is easier to assemble w keras, and you don't have to recreate the optimizer when loading from disk. pytorch doesn't have metrics out of the box. TF all the way.
It's challenging to get multi-dimensional layer sizes right. Torch doesn't take numpy arrays as input, which is painful for pre- and post-processing. No metrics means no History object. There's no activation attribute for Linear layers, and the Softmax layer behaves weirdly.
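For what it's worth, the numpy friction is mostly one explicit conversion each way; a minimal sketch:

    import numpy as np
    import torch

    x_np = np.random.rand(32, 10).astype(np.float32)

    # numpy -> torch (from_numpy shares memory, no copy) ...
    x = torch.from_numpy(x_np)
    y = torch.nn.Linear(10, 4)(x)

    # ... and back to numpy for post-processing.
    y_np = y.detach().numpy()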
Do you think it's a bit easier to build custom layers in PyTorch though?
Also, I think Lightning handles the issue of loading optimizers, but I'm not sure about that.
It's nice to see TF get some love, but I still think PyTorch has easier debugging and is more pythonic which lowers the barrier to entry for a lot of people.
Curious since I’ve been looking at this recently. What out of the box metrics would matter most to you? There are lots of libraries in the ecosystem for metrics but I’ve seen the request for built in metrics a few times now so it must be a clear need.
Does TF have any advantages in terms of ease of acceleration (training and inference) with multiple GPUs?
Our pipeline is all PyTorch Lightning — this made development easy but we have been having numerous issues trying to leverage multiple GPUs (this is for sequence models), keep getting strange errors.
In TF you don't have to manually specify the input size of a fully connected layer. That's kind of nice. I don't really understand why PyTorch requires me to do that. Surely this can be inferred from the previous layer.
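For what it's worth, reasonably recent PyTorch releases do have a shape-inferring option: torch.nn.LazyLinear takes only the output size and fills in in_features on the first forward pass, much like Keras' Dense. A quick sketch (layer sizes invented for illustration):

    import torch

    # The usual Linear needs in_features spelled out...
    explicit = torch.nn.Sequential(
        torch.nn.Flatten(),
        torch.nn.Linear(28 * 28, 128),   # you compute 28*28 yourself
        torch.nn.ReLU(),
        torch.nn.Linear(128, 10),
    )

    # ...while LazyLinear infers it from the first batch it sees.
    lazy = torch.nn.Sequential(
        torch.nn.Flatten(),
        torch.nn.LazyLinear(128),
        torch.nn.ReLU(),
        torch.nn.LazyLinear(10),
    )
    lazy(torch.zeros(1, 28, 28))          # shapes materialize on the first call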
The stride>1 case has been a bit more controversial within TensorFlow, and there is ongoing discussion on the correct way to implement it within PyTorch on the issue: https://github.com/pytorch/pytorch/issues/3867
Yeah true, that's also a bit annoying in PyTorch compared to TF. I remember reading on the PyTorch forums that this was a bit hard to implement for some reason, but I can't recall any details if there were any.
At this point, there's so much conceptual and syntactical overlap between the two frameworks that there isn't a "winner." Like most DS/ML tools, use which one fits your use case.
I totally agree! I think their technical differences are superseded by other practical considerations at this point. It's interesting to see how the landscape of which is better for a given use case is changing, though - PyTorch has made a lot of effort to grab more of the industry sector with TorchServe and PyTorch Live.
Do you think PyTorch can catch up here? I think Google's Coral devices give them a lock on embedded devices in the coming years
I tried and failed to use Google Coral about 8 months ago. The dev experience was terrible. Our company just went with deploying using OpenVINO on CPU, which was fast enough.
I’m not sure coral has enough of an edge to make it worthwhile relative to simpler edge deployment options like cpu
Keras is great for easy problems. But as soon as you want to color outside the lines its opinionated simple model just gets in the way. Every time I've tried to use keras for real work, it was a total pain in the butt. Do your homework in keras. Use something else for real work.
I used fastai quite a bit with PyTorch and ended up feeling the same way. Great for spinning up an effective model in a few lines, but really tough to make it do exactly what you want it to do.
As the originally-posted-link suggests, Keras is a great way to start in ML (this is how I started, with the book Deep Learning with Python by François Chollet, the creator of Keras). And after one is comfortable with the concepts and the general flow of things, and if one needs to do more advanced stuff (as you say), then move on to something like PyTorch (I love that they mention PyTorch Lightning in the article).
I will preface with the statement that my knowledge may be slightly out of date as I don't keep up on every nuanced change.
I use PyTorch and TensorFlow, and the article is spot-on about TensorFlow's "mystery operations that take a long time" with no real rhyme or reason behind them. That said, on the whole, I skew more towards TensorFlow because it is generally easier to reason about the graph and how it connects. I also find the models that are available to usually be more refined, robust and useful straight out of the box.
With PyTorch I am usually fighting a slew of version incompatibilities in the API between even minor point releases. The models often feel slapdash - research-like or toy projects thrown together - and whilst the article points out that the number of papers using PyTorch far exceeds those using TensorFlow, and the number of models for PyTorch dwarfs that of TensorFlow, there isn't a lot of quality in the quantity. "90% of everything is crap" - Theodore Sturgeon. And that goes double for PyTorch models. A lot of the models, and even some datasets, just feel like throwaway projects that someone put up online.
If you are on macOS or Linux and using Python, PyTorch works fine, but don't step outside of that boundary. PyTorch and TensorFlow both work with other operating systems, and other languages besides Python, but working with anything but Python when using PyTorch is a process fraught with pain. And yes, I expect someone to drop in and say "but what about this C++ framework?" or "I use language X with PyTorch every day and it works fine for me!" But again, the point stands: anything but Python with PyTorch is painful. The support for other languages in TensorFlow is far richer and far better.
And I will preface this with "my knowledge may be out of date," but I've also noticed the type of models and project code available for TensorFlow and PyTorch diverge wildly once you get outside of the toy projects. If you are doing computer vision, especially with video and people, and you are not working on the simplest of pose analysis, TensorFlow offers a lot more options straight out of the box. PyTorch has some good projects and models, but they are mostly Mickey Mouse hobby stuff, or abstract research projects that aren't very robust or immediately deployable.
I use TensorFlow in my day-to-day job. All that said, I like PyTorch for its quirkiness, its rapid prototyping, its popularity, and the fact that so many people are trying out a lot of different things, even if they don't work particularly well. I use PyTorch in almost all of my personal research projects.
I expect in the future for PyTorch to get more stable and more deployable and have better tools, if it can move slightly away from the "research tool" phase it is currently in. I expect Google to do the usual Google-Fuck-Up and completely change TF for the worse, break compatibility (TF1 to TF2) or just abandon the project entirely and move on to the next new shiny.