As a researcher in RL & ML at a big industry lab, I would say most of my colleagues are moving to JAX [https://github.com/google/jax], which this article kind of ignores. JAX is XLA-accelerated NumPy; it's cool beyond just machine learning, but it only provides low-level linear algebra abstractions. However, you can put something like Haiku [https://github.com/deepmind/dm-haiku] or Flax [https://github.com/google/flax] on top of it and get what the cool kids are using :)
> but only provides low-level linear algebra abstractions.
Just to make sure people aren't scared off by this: jax provides a lot more than just low level linear algebra. It has some fundamental NN functions in its lax submodule, and the numpy API itself goes way way past linear algebra. Numpy plus autodiff, plus automatic vectorization, plus automatic parallelization, plus some core NN functions, plus a bunch more stuff.
Jax plus optax (for common optimizers and easily making new optimizers) is plenty sufficient for a lot of NN needs (quick sketch below). After that, the other libraries are really just useful for initialization and state management (which is still very useful; I use haiku myself).
I haven't used flax, but it seems more like pytorch. I like haiku because it's relatively minimal. The simplest transform does init and that's all. I like that.
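To make that concrete, here's a rough sketch of the jax + optax loop (the model, data, and hyperparameters are made up for illustration; it's just a linear map, not a real network):

    import jax
    import jax.numpy as jnp
    import optax

    # Toy "model": params are a plain pytree, predictions are a linear map.
    params = {"w": jnp.zeros(3), "b": 0.0}
    X, y = jnp.ones((8, 3)), jnp.ones(8)

    def loss_fn(params, X, y):
        pred = X @ params["w"] + params["b"]
        return jnp.mean((pred - y) ** 2)

    optimizer = optax.adam(1e-2)
    opt_state = optimizer.init(params)

    @jax.jit
    def step(params, opt_state, X, y):
        loss, grads = jax.value_and_grad(loss_fn)(params, X, y)
        updates, opt_state = optimizer.update(grads, opt_state)
        params = optax.apply_updates(params, updates)
        return params, opt_state, loss

    for _ in range(100):
        params, opt_state, loss = step(params, opt_state, X, y)

Haiku/Flax mostly take over the `params` bookkeeping once the model stops being a hand-rolled pytree like this.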
Indeed! Sorry, I was thinking more about the layers, but of course JAX is way more than numpy on steroids. (Although it is also that: https://dionhaefner.github.io/2021/12/supercharged-high-reso...). JAX has a very nice vmap for easy vectorization on SIMD accelerators, and pmap even allows cross-device parallelization with a single line, which is just beautiful!
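A minimal illustration of the two (the toy `predict` function and shapes are made up; the pmap call is commented out since it only does something useful with more than one device):

    import jax
    import jax.numpy as jnp

    def predict(w, x):          # written for a single example
        return jnp.dot(w, x)

    w = jnp.ones(4)
    batch = jnp.ones((32, 4))

    # vmap: vectorize the single-example function over the batch axis.
    batched_predict = jax.vmap(predict, in_axes=(None, 0))
    out = batched_predict(w, batch)                     # shape (32,)

    # pmap: same idea, but the mapped axis is split across devices.
    # sharded = batch.reshape(jax.device_count(), -1, 4)
    # out = jax.pmap(batched_predict, in_axes=(None, 0))(w, sharded)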
What I love about JAX is that it essentially just makes Python into a performant, differentiable programming language.
I'm a pretty big fan of moving away from thinking about ML/Stats/etc specifically and people should more generally embrace the idea of differentiable programming as just a way to program and solve a range of problems.
JAX means that the average python programmer just needs to understand the basics of derivatives and their use (not how to compute them, just what they are and why they're useful) and suddenly has an amazing amount of power they can add to normal code.
The real power of JAX, for me at least, is that you can write the solution to your problem, whatever that problem may be, and use derivatives and gradient descent to find an answer. Sometimes this solution might be essentially a neural network, other times the generalized linear model, but sometimes it might not fit obviously into either of these paradigms.
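As a hedged illustration of that point (the damped-oscillator model and all numbers below are invented purely for the example): you write the forward model, write a loss, and let jax.grad drive plain gradient descent, with no NN or GLM machinery involved.

    import jax
    import jax.numpy as jnp

    # Invented example: recover the parameters of a damped oscillator from
    # synthetic observations - just a forward model plus gradient descent.
    t = jnp.linspace(0.0, 10.0, 200)
    obs = 2.0 * jnp.exp(-0.3 * t) * jnp.sin(1.5 * t)    # "data" from true params

    def forward(p):
        return p["amp"] * jnp.exp(-p["decay"] * t) * jnp.sin(p["freq"] * t)

    def loss(p):
        return jnp.mean((forward(p) - obs) ** 2)

    p = {"amp": 1.0, "decay": 0.1, "freq": 1.0}
    grad = jax.jit(jax.grad(loss))
    for _ in range(2000):
        p = jax.tree_util.tree_map(lambda v, g: v - 0.1 * g, p, grad(p))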
Do any JAX experts know if there is an equivalent to https://captum.ai/ - a model interpretability library for pytorch?
In particular i want to be able to measure feature importance on both inputs and internal layers on a sample by sample basis. This is the only thing currently holding me back from using JAX right now.
Alternatively, a simple-to-read/understand/port implementation of DeepLIFT would work too.
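Not aware of a drop-in captum/DeepLIFT port, but if plain gradient-based attribution is enough, per-sample saliency (and gradient × input) is only a few lines in JAX. The `score` function and its parameters below are hypothetical stand-ins for your network; the same trick works for internal layers if you take the gradient with respect to an intermediate activation instead of the input.

    import jax
    import jax.numpy as jnp

    # Hypothetical two-layer model with a scalar output per example.
    params = {"w1": jnp.ones((5, 8)), "w2": jnp.ones(8)}

    def score(x):
        return jnp.tanh(x @ params["w1"]) @ params["w2"]

    batch = jnp.linspace(-1.0, 1.0, 15).reshape(3, 5)

    # Per-sample saliency: d(output)/d(input), vectorized over the batch.
    saliency = jax.vmap(jax.grad(score))(batch)          # shape (3, 5)
    grad_times_input = saliency * batch                  # a common attribution variant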
Most? Last I tried JAX it had no real documentation to speak of and all the tutorials you could find on the net were woefully out of date. Even simple toy examples broke with weird error messages. Maybe the situation is better now. I'd rather wait for JAX 2.0 though. :)
Give it another try. I found the docs pretty good; you need to get your head around XLA tracing and read the "sharp bits" section, and you should be pretty set!
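The classic sharp bit is Python control flow on traced values inside jit; a tiny sketch of the failure and the usual fix:

    import jax
    import jax.numpy as jnp

    @jax.jit
    def bad_relu(x):
        if x > 0:          # x is a tracer under jit, so Python can't branch on it
            return x
        return 0.0

    @jax.jit
    def good_relu(x):
        return jnp.where(x > 0, x, 0.0)   # stays inside the traced computation

    # bad_relu(1.0)                   # raises a tracer/concretization error
    print(good_relu(jnp.array(1.0)))  # 1.0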
Yep, and this approach also allows languages like Julia and Elixir to compile their expressions into valid compute graphs that target JAX/XLA. That polyglot capability opens up cutting-edge machine learning to quite a few more ecosystems, with another level of capability in distribution and fault tolerance, as is the case with Elixir + Nx.
Julia has XLA.jl [0], which interoperates with their deep-learning stack, and Elixir has Nx [1], which is higher level (basically JAX but in Elixir). I would love to see someone do something like that in Rust...
JAX is really exciting! JAX is mentioned in the Research subsection of the "Which should I pick" section. Do you think that the fundamental under-the-hood differences of JAX compared to TensorFlow and PyTorch will affect its adoption?
Haiku is really cool - I haven't used Flax. It'll be really interesting to see the development of JAX as time goes on. I also saw some benchmarks that show it's neck-and-neck with PyTorch as the fastest of the three, but I think with more optimization its ceiling is higher than PyTorch's.
Wow that's a really cool resource. Thanks for linking!
Even still, do you think researchers will want to take the time to learn all of that when PyTorch gives them no real reason to switch? Every day spent learning JAX is another day spent not reviewing literature, writing papers, or developing new models.
this definitely limits its generality relative to jax, which makes it less than ideal for anything other than 'typical' deep neural networks
this is especially true when the research in question is related to things like physics or combining physical models and machine learning, which imho is very interesting. those are use cases that pytorch just isn't good at.
> Every day spent learning JAX is another day spent not reviewing literature, writing papers, or developing new models.
Every day spent learning JAX is also another day spent not trying to fit a round peg into a square hole of other libraries. I made the leap when I was doing things that were painful in pytorch. In terms of time, I think I came out ahead.
Not everything is a nail, and pytorch is better for some things, and jax is better for others. "Every day spent learning the screwdriver is a day spent not using your hammer."
To get started, JAX is just knowing Python and adding `grad`, `jit` and `vmap` to the mix; it takes about 5 minutes to get going.
To me this is the real power of JAX, it can be viewed as a few functions that make it easy to take any python code you've written and work with derivatives using that. This gives it tremendous flexibility in helping you solve problems.
As an example, I mostly do statistical work with it, rather than NN-focused work. It took probably a few minutes to implement a GLM with custom priors over all the parameters, and then use the Hessian for the Laplace approximation of parameter uncertainty. The proper way to solve this would have been using PyMC, but this worked well enough for me, and building the model from scratch in JAX took less time than refreshing my memory of the PyMC API.
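For anyone curious, here's a rough sketch of that workflow. The logistic toy data and the standard normal prior are made up for illustration; this is not the actual model from the comment above.

    import jax
    import jax.numpy as jnp

    key = jax.random.PRNGKey(0)
    X = jax.random.normal(key, (100, 3))
    y = (X @ jnp.array([1.0, -2.0, 0.5]) > 0).astype(jnp.float32)

    def neg_log_posterior(theta):
        logits = X @ theta
        nll = jnp.sum(jnp.logaddexp(0.0, logits) - y * logits)  # logistic likelihood
        log_prior = -0.5 * jnp.sum(theta ** 2)                   # N(0, 1) priors
        return nll - log_prior

    # Crude MAP estimate via plain gradient descent.
    theta = jnp.zeros(3)
    grad_fn = jax.jit(jax.grad(neg_log_posterior))
    for _ in range(500):
        theta = theta - 0.01 * grad_fn(theta)

    # Laplace approximation: posterior covariance ~ inverse Hessian at the MAP.
    cov = jnp.linalg.inv(jax.hessian(neg_log_posterior)(theta))
    std_errors = jnp.sqrt(jnp.diag(cov))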
Having read some Jax high-performance code, I do like Jax, but it does feel a bit too abstract and low-level sometimes. Maybe there aren't good coding conventions yet, or performance trumped them? It definitely needs improvement on error messages, as well.
For example, a long chain of pmaps, each with some sort of device-partitioning logic, that fails to JIT-compile is extremely hard to understand. I basically had to binary-search the code until the compile errors disappeared.
Tensorflow is just such a classic clusterfuck google project. V2 had huge breaking changes (reminiscent of angular) and tons of the apis are clunky and don’t work well together. There are like 3 different ways to save a model. It’s almost like a bunch of teams built features with no oversight.
I’m pretty sure tf is considered in maintenance mode within google as Brain and the tf creators themselves have moved to Jax. I do think Google learned a lot from tensorflow and am excited to see Jax pan out.
Pytorch is a pleasure to debug. I think pytorch jit could close the deployment gap.
The article gives credit to TF server and TFLite and so forth as being better for deployment, but leaves out the fact that those systems don't fucking work most of the time, and support is pretty much over at this point. The same goes for model support; even the models in TF's own repository are sometimes broken or don't follow the API conventions set forth in the documentation. I honestly don't know how anyone uses TF in production at this point, unless they are frozen on a specific old version and have figured out an environment that works with their specific models already.
Yeah, TensorFlow's API has definitely gotten convoluted and confusing. I think the shift from TF1 to TF2 and then later wrapping Keras in TF just caused a lot of problems.
TensorFlow seems to be spreading itself pretty thin. Maintaining so many language bindings, TensorFlow.js, TFlite, Server, etc. seem like they could all use some focus, BUT, and this is a big but, do you think if they can get each part of their ecosystem to an easily usable point that they'll have cornered the industry sector?
PyTorch is taking a much more targeted approach as seen with PyTorch Live, but I truly think that TFLite + Coral will be a game-changer for a lot of industries (and Google will make a fortune in the process). To me it seems like this is where Google's focus has lain in the AI space for the past couple of years.
> I truly think that TFLite + Coral will be a game-changer for a lot of industries
I'd like to agree. Google was very far ahead of the curve when they released Coral. I was completely stoked when they finally added hardware video encoding to the platform with the release of the Dev Board Mini.
I want them to succeed but I fear if they don't drastically improve their Developer Experience, others will catch up and eat their lunch. TensorFlow has been hard to pick up. A few years ago when I was trying to pick this up to create some edge applications, PyTorch wasn't so much easier that it seemed worth sacrificing EdgeTPU support. But now PyTorch seems much, much easier than it did then, while TensorFlow hasn't seemed to improve in ease-of-use.
Now I'm genuinely considering sacrificing TFLite / EdgeTPU in favor of, say, Jetson-esque solutions just so that I can start doing something.
Note: I am an amateur/hobbyist in this context, I am not doing Edge machine learning professionally.
Yeah, I hear you loud and clear on a lot of those points. I think the most important thing honestly is the fact that most PhDs use PyTorch in academia, so industry will inevitably shift to tailor to this growing supply if possible. Of course, Coral/TFLite are really useful, so a balance will be found, but it'll be interesting to see how it plays out.
Totally agree on the debugging. The fact that PyTorch is more pythonic and easier to debug makes it the better choice for a lot of applications.
Are you in research? I think TensorFlow's position in industry puts it in a kind of too-big-to-fail situation at this point. It'll be interesting to see what happens with JAX, but for now TensorFlow really is the option for industry.
Do you think TFLite + Coral devices will help breathe new life into TF?
Meanwhile PyTorch doesn't follow SemVer and always has breaking changes for every minor version increment. There's always "Backwards Incompatible Changes" section for every minor version release: https://github.com/pytorch/pytorch/releases
Even TF 1 was just an extension of Google Brain: the project that took a datacenter of CPUs in Google to distinguish cats and dogs in Youtube videos with very high accuracy. I remember when Jeff Dean was talking about it the first time, it felt like magic (though it still feels like it, it’s just more optimized magic :) ).
I think the PyTorch C++ API is less mature and harder to compile into other projects. TensorFlow started with the C++ API exposed, which is why the graph format is so stable and favorable for deployment in heterogeneous environments.
At one point I lost interest in both, and in ML/AI in general. I think eventually I got frustrated with the insane amount of abuse for marketing purposes and the field never truly delivering what was promised (I know, I know, fake it till you make it). For better or worse, far too few managed to make it, so most stuck with faking. I think I lost interest completely around 2019. But even back then the two were starting to seem like twins - practically identical, with some minor but sometimes subtle differences. Looking at random sections of the documentation, all you gotta do is ignore the semantics...
Yeah, since the release of TF2 in 2019 they're a lot more similar. PT is still more pythonic and easier to debug, but TF has a way better industry infrastructure. The documentation is comically similar though! lol
Have you checked out Google's Coral devices? DL has definitely been abused for marketing purposes, but I think the lack of delivery had more to do with the fact that DL was progressing far faster than the tools around them which make their intelligence actionable.
Part of this is because so many DL applications had to be delivered in a SaaS way, when local AI makes much more sense for a lot of applications. I think the TF -> TFLite -> Coral Device pipeline has the potential to revolutionize a LOT of industries.
Now that you mention it Coral products are the only thing still of some interest to me. Though with so many alternatives around I'm not completely sure they are justified. Say the SBC - even though it is relatively smaller, it's still more expensive than the Jetson which is in a different category in terms of specs. In all fairness I'm interested to get some opinions on the jetson since I have a project where it might come in handy(automotive related). I'm still on the fence as to what I want to use and if I should consider a USFF desktop computer and have an x86 at my disposal and avoid risking having to dig deep into some library when something goes wrong. The one thing that I'm keeping my eyes on are the M.2 Coral devices, though I personally have no use for them.
Unless I missed something, I recently tried to buy a Coral. It was not available anywhere except a few places at 10X RRP. You can buy a Jetson easily. I just got one a few weeks ago.
I have done the tutorials and they all work. They seem to be very well maintained.
People I know said they never got the google Coral SDK working. Unfortunately they wouldn't give me their Corals. :(
Same here - I spent so much time spinning up all of the infrastructure and data I needed everywhere I've worked that I basically do more architecture and engineering now, and going back to TF or PyTorch, or figuring out the new framework/model architecture du jour, just lost all appeal.
Both frameworks have matured a lot and I think we're coming up on some really awesome applications in the coming years. The tools around them which make it easier to apply DL everywhere are really the bottleneck at this point!
Personally I wish we'd get beyond these low level abstractions by now. Machine learning is so wonderfully mathematical that making abstractions from the details should be incredibly easy and powerful. I can't believe like 8 years after the big ML wave e.g. frontend javascript developers aren't enjoying the fruits of these labors (despite there being no good reason for them not to be able to).
There are high level abstractions like keras and tensorflow.js (or even higher level GUI tools). All of them are fairly accessible to people with some basic programming knowledge.
I don't get your point about JS developers not enjoying the fruits of these labors - they don't need to enjoy them because they work in a different domain. And if they're interested in playing around with deep learning, the higher level APIs are easy to pick up. I'm not sure what you're expecting to see.
I do feel like Google could do better communicating all of their different tools though. Their ecosystem is large and pretty confusing - they've got so many projects going on at once that it always seems like everyone gets fed up with them before they take a second pass and make them more friendly to newcomers.
Facebook seems to have taken a much more focused approach as you can see with PyTorch Live
What do you envision that would help JavaScript devs take advantage of ML? There is tensorflow.js. Are you thinking completely different ‘building blocks’ that provide higher level apis like fast.ai geared towards frontend devs or something else?
You can work within a niche domain or an applied industry, for which ML/AI is just another tool in the bag (admittedly: sometimes revolutionary, many other times irrelevant); or you may want to do bleeding-edge research, only to find that you just cannot compete against the top dogs (even the wonderful fast.ai couldn’t follow suit without refactoring heavily every six or twelve months). What’s the point, then? Set yourself a clearly interesting and achievable target (learn and find a job, get a paper published, release an applied library, etc.) and a challenging deadline with milestones (say, 3-6-9-12 months). After that, wrap up and move forward or move on.
For any kind of research or experimental work, I cannot imagine using anything other than PyTorch, with the caveat that I do think JAX is extremely impressive and I've been meaning to learn more about it for a while.
Even though I've been working with Tensorflow for a few years now and I feel like I do understand the API pretty well, to some extent that just means I'm _really_ good at navigating the documentation, because there's no way to intuit the way things work. And I still run into bizarre performance issues when profiling graphs pretty much all the time. Some ops are just inefficient - oh but it was fixed in 2.x.yy! Oh but then it broke again in 2.x.yy+1! Sigh.
However - and I know this is a bit of a tired trope, but any kind of industrial deployment is just vastly, vastly easier with Tensorflow. I'm currently working with ultra-low-latency model development targeting a Tensorflow-Lite inference engine (C API, wrapped via Rust) and it's just incredibly easy. With some elbow grease and willingness to dive into low-level TF-Lite optimisations, one can see end-to-end model inference times on the order of 10-100us for simple models (say, a fully connected dnn with a few million parameters), and between 100us-1ms for fairly complex models utilising contemporary architectures in computer vision or NLP. Memory overhead is low, and control over inference computation semantics is easy.
As a nice cherry on top, we can take the same Tensorflow SavedModels that get compiled to TF-Lite files and instead compile them to tensorflow-js for easy web deployment, which is a great portability upside.
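For reference, the conversion step itself is small (paths and model names below are hypothetical); the web conversion goes through the separate tensorflowjs converter:

    import tensorflow as tf

    # SavedModel -> TFLite flatbuffer.
    converter = tf.lite.TFLiteConverter.from_saved_model("exported/my_model")
    with open("my_model.tflite", "wb") as f:
        f.write(converter.convert())

    # The same SavedModel can be converted for the web with the CLI from the
    # `tensorflowjs` pip package, e.g.:
    #   tensorflowjs_converter --input_format=tf_saved_model exported/my_model web_model/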
However, I know there's some incredible progress being made on what one might call 'environment-agnostic computational graph ILs' (on second thought, let's not keep that name) which should open up more options for inference engines and graph optimisations (operator fusion, rollups, hardware-dependent stuff, etc).
Overall I feel like things have been continuously getting better for the last 5 years or so. I'm pleased to see so many more options.
Agreed - JAX is really cool. It will be interesting to see how TF & JAX develop considering they're both made by Google. I also think JAX has the potential to be the fastest, although right now it's neck-and-neck with PyTorch.
Yes - a lot of TF users don't realize that the "tricks of the trade" for wrangling TF just don't apply in PT, because it works more easily.
I agree that industry-centric applications should probably use TF. TFX is just invaluable. Have you checked out Google's Coral devices? TFLite + Coral = revolution for a lot of industries.
Thanks for all your comments - I'm also really excited to see what the coming years bring. While we might debate if PT or TF is better, they're both undoubtedly improving very rapidly! So excited to see how ML/DL applications start permeating other industries
>10-100us for simple models (say, a fully connected dnn with a few million parameters)
I basically don't believe you. I'm a researcher in this area (DNNs on FPGAs) and you cannot get these latencies on real models without going to FPGA (and you're not synthesizing Verilog from TF, unless you're one of my competitors...). Just your kernel launch overheads for GPU are on the order of 10ms. For example, here's a talk given at GTC a couple of years ago where they do get down to 35us (on tensorcores) using persistent kernels, but on a mickey mouse network.
CPU (where you don't have to deal with async CUDA calls) won't save you either; again, here's a paper from USENIX (so you know it's legit) that shows the lowest times for real networks on CPU are ~2ms (and that's on resnet18, far shy of "millions" of weights).
This article says that Google and DeepMind research use TF - but they don't. DeepMind use JAX almost exclusively, and many brain researchers use JAX too.
ML eng is my area of expertise, and I would advise strongly against tensorflow.
The best thing that worked for me is Apple's TensorFlow pluggable device for Metal. It can utilize both AMD and M1 GPUs to 100% capacity. It's a shame Apple could do it as a small side project to use Metal, but AMD couldn't do the same with Vulkan.
I believe all the big frameworks have some work being done to make them compatible with AMD GPUs. Here is the relevant issue for JAX (support seems to be in alpha but viable): https://github.com/google/jax/issues/2012
"PyTorch and TensorFlow are far and away the two most popular Deep Learning frameworks today. The debate over whether PyTorch or TensorFlow is superior is a longstanding point of contentious debate, with each camp having its share of fervent supporters.
Both PyTorch and TensorFlow..."
Can an article really be any good if it starts off with such obvious SEO spam?
> Can an article really be any good if it starts off with such obvious SEO spam?
That's an interesting take. I fail to see how such mentions, in an article that compares two things and, thus, mentions both things together, are in any way SEO spam?
For example, in an article comparing apples and oranges I would expect to see a rather high number of mentions of "apples and oranges". After all, that is the topic.
It's SEO for the OP site. The parent comment is saying the words TF and PyTorch are repeated exhaustively, I think sometimes senselessly, throughout the entire article. It doesn't mean the content is not of value.
That may be the topic, but that first sentence reads like they're aiming for a high number of mentions, not quality of content. For instance, you can replace the second and third mentions of “PyTorch and Tensorflow” with something like “the two” or “they both.”
Though I agree with you it's annoying, you gotta hate the game not the player. It's Google's fault for valuing things like this in its search results. All the author is doing is trying to get seen. I'd say that if they have to irk a few people like us to get themselves higher on search results, they'd probably judge it as a good trade-off.
As a practitioner, I feel that oftentimes you are extending, fine tuning somebody else’s code or pre-trained models (DeepMind’s for example). This means that you should be able to work on whatever the platform this code came with. Basically, you should be able to work with JAX, TF or PyTorch with equal ease.
Great article. While I only had time to skim the article, I'll still offer my uninformed opinions. :) None of the hw I own is particularly great. I don't even own a decent GPU. But I don't need to because you can train your models for FREE on Google Colab's TPUs and GPUs. PyTorch's TPU support is still not that great while TensorFlow's is maturing. It's obviously a priority for Google to make TensorFlow work well on their own hardware.
So for me the choice is TF 2 because I can train models 5-10x faster using Google's TPUs than if I had used PyTorch. I know the PyTorch developers are working on TPU support but last I checked (this spring) it wasn't there yet and I wasn't able to make it work well on Google Colab.
As a layman in ML, I thought PyTorch is more (or only) geared towards researchers, while TensorFlow, despite its problems, is the _only_ one that provides commercial solutions you can deploy. Is this still true?
JAX is totally new to me, is this Google's new Tensorflow in the future?
Our sensitive data detection library is exported to iOS, Android, and Java, in addition to Python. We also run distributed and federated use cases with custom layers. All of which are better supported in TensorFlow.
That said, I’d use pytorch if I could. Simply put, it has a better user experience.
Yeah, it's pretty funny how reluctantly some people use TF because they have to, lol.
The fact that PyTorch is pythonic and easier to debug makes it better for a ton of users, but TensorFlow keeps the entire DL process in mind more, not just modeling.
I'm on that boat. Tensorflow.js is the only decent ML library for JS. Google support for TF.js has been dwindling, but new versions are still coming out AFAIK and we've just got Apple Silicon (M1) support.
No mention of Google MediaPipe (https://mediapipe.dev/), which is a mobile/edge framework for deploying TFLite models. MediaPipe has the advantage of letting you stitch together multiple models, transformations, and heuristics into a higher level computational graph. I'm not aware of any equivalent for PyTorch, although PyTorch Live seems like baby steps in that direction.
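A small taste of the Python side (this uses the prebuilt "solutions" API rather than authoring a custom graph, and the image path is hypothetical):

    import cv2
    import mediapipe as mp

    image = cv2.cvtColor(cv2.imread("person.jpg"), cv2.COLOR_BGR2RGB)

    # Run the prebuilt pose pipeline (detector + landmark model stitched together).
    with mp.solutions.pose.Pose(static_image_mode=True) as pose:
        results = pose.process(image)

    if results.pose_landmarks:
        for lm in results.pose_landmarks.landmark:
            print(lm.x, lm.y, lm.visibility)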
I am using mediapipe extensively in my day-to-day job. I've been impressed with it so far, and the ability to patch together multiple models in to, well... a media pipe, has been impressively useful. But I am also waiting for the penny to drop and google to announce they've abandoned it, or that their API has completely changed.
Do you work at my company? Because that's our biggest fear too.
I asked a friend of mine at Google to sleuth around internally and get a sense for the health of the project. He said that it's used on some internal projects and seems to have a pretty healthy internal website. So hopefully it won't be cancelled soon.
Maybe. But I only get paid in whatever snacks I can scrounge from the employee fridge and pickings have been slim of late since so few people come into the office these days. I am down to the "expired mystery mozzarella cheese sticks" and the leftover ketchup packages from Woodranch BBQ & Grill that are in the drawer with all the unused chopsticks and plastic forks.
I too spoke to a friend at Google that is part of the team, and whilst he said there were no plans to cancel it, or make radical changes, when I asked about unplanned plans, he kinda just shrugged and said "You know Google..."
I have a dual solution approach, Mediapipe for "in use now" and OpenPose for validation, slower processing and the "Google just **ed us" moment we're both anticipating. I need to build my own pose analysis system, but right now I don't have the bandwidth.
On the last day of Christmas the CEO sent to me:
Thirty-two Manfrotto Tripod extenders
Sixteen Manfrotto tripods
Sixteen high speed cables
Sixteen Manfrotto C-clamps
Sixteen Manfrotto 3/8 to 1/4-20 reducers
Sixteen Quick release mounts
Sixteen 4K cameras
Fooouuuurrrrr high-speeeeeed PCIe capture cards
Three days to hit deadline
Two triggered circuit breakers
One really huge headache
And a new VR H.M.D.
Post doesn't talk about the actual libraries, just the ecosystems surrounding them.
TF has more layer types, parallelizes better, is easier to assemble w keras, and you don't have to recreate the optimizer when loading from disk. pytorch doesn't have metrics out of the box. TF all the way.
It's challenging to get multi-dimensional layer sizes right. Torch doesn't take numpy arrays as input, which is painful for pre- and post-processing. No metrics means no History object. There's no activation attribute for Linear layers, and the Softmax layer behaves weirdly.
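For what it's worth, the numpy friction is mostly one explicit conversion each way; a minimal sketch:

    import numpy as np
    import torch

    x_np = np.random.rand(32, 10).astype(np.float32)

    # numpy -> torch (from_numpy shares memory, no copy) ...
    x = torch.from_numpy(x_np)
    y = torch.nn.Linear(10, 4)(x)

    # ... and back to numpy for post-processing.
    y_np = y.detach().numpy()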
Do you think it's a bit easier to build custom layers in PyTorch though?
Also, I think Lightning handles the issue of loading optimizers, but I'm not sure about that.
It's nice to see TF get some love, but I still think PyTorch has easier debugging and is more pythonic which lowers the barrier to entry for a lot of people.
Curious since I’ve been looking at this recently. What out of the box metrics would matter most to you? There are lots of libraries in the ecosystem for metrics but I’ve seen the request for built in metrics a few times now so it must be a clear need.
Does TF have any advantages in terms of ease of acceleration (training and inference) with multiple GPUs?
Our pipeline is all PyTorch Lightning — this made development easy but we have been having numerous issues trying to leverage multiple GPUs (this is for sequence models), keep getting strange errors.
In TF you don't have to manually specify the input size of a fully connected layer. That's kind of nice. I don't really understand why PyTorch requires me to do that. Surely this can be inferred from the previous layer.
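For what it's worth, reasonably recent PyTorch releases do have a shape-inferring option: torch.nn.LazyLinear takes only the output size and fills in in_features on the first forward pass, much like Keras' Dense. A quick sketch (layer sizes invented for illustration):

    import torch

    # The usual Linear needs in_features spelled out...
    explicit = torch.nn.Sequential(
        torch.nn.Flatten(),
        torch.nn.Linear(28 * 28, 128),   # you compute 28*28 yourself
        torch.nn.ReLU(),
        torch.nn.Linear(128, 10),
    )

    # ...while LazyLinear infers it from the first batch it sees.
    lazy = torch.nn.Sequential(
        torch.nn.Flatten(),
        torch.nn.LazyLinear(128),
        torch.nn.ReLU(),
        torch.nn.LazyLinear(10),
    )
    lazy(torch.zeros(1, 28, 28))          # shapes materialize on the first call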
The stride>1 case has been a bit more controversial within TensorFlow, and there is ongoing discussion on the correct way to implement it within PyTorch on the issue: https://github.com/pytorch/pytorch/issues/3867
Yeah true, that's also a bit annoying in PyTorch compared to TF. I remember reading on the PyTorch forums that this was a bit hard to implement for some reason, but I can't recall any details if there were any.
At this point, there's so much conceptual and syntactical overlap between the two frameworks that there isn't a "winner." Like most DS/ML tools, use which one fits your use case.
I totally agree! I think their technical differences are superseded by other practical considerations at this point. It's interesting to see how the landscape of which is better for a given use case is changing, though - PyTorch has made a lot of effort to grab more of the industry sector with TorchServe and PyTorch Live.
Do you think PyTorch can catch up here? I think Google's Coral devices give them a lock on embedded devices in the coming years
I tried and failed to use Google Coral about 8 months ago. The dev experience was terrible. Our company just went with deploying using OpenVINO on CPU, which was fast enough.
I’m not sure coral has enough of an edge to make it worthwhile relative to simpler edge deployment options like cpu
Keras is great for easy problems. But as soon as you want to color outside the lines its opinionated simple model just gets in the way. Every time I've tried to use keras for real work, it was a total pain in the butt. Do your homework in keras. Use something else for real work.
I used fastai quite a bit with PyTorch and ended up feeling the same way. Great for spinning up an effective model in a few lines, but really tough to make it do exactly what you want it to do.
As the originally-posted-link suggests, Keras is a great way to start in ML (this is how I started, with the book Deep Learning with Python by François Chollet, the creator of Keras). And after one is comfortable with the concepts and the general flow of things, and if one needs to do more advanced stuff (as you say), then move on to something like PyTorch (I love that they mention PyTorch Lightning in the article).
I will preface with the statement that my knowledge may be slightly out of date as I don't keep up on every nuanced change.
I use PyTorch and TensorFlow, and the article is spot-on about TensorFlow's "mystery operations that take a long time" with no real rhyme or reason behind them. That said, on the whole, I skew more towards TensorFlow because it is generally easier to reason about the graph and how it connects. I also find the models that are available to usually be more refined, robust and useful straight out of the box.
With PyTorch I am usually fighting a slew of version incompatibilities in the API between even minor point releases. The models often feel slapdash - research-like or toy projects thrown together - and whilst the article points out that the number of papers using PyTorch far exceeds those using TensorFlow, and the number of models for PyTorch dwarfs that of TensorFlow, there isn't a lot of quality in the quantity. "90% of everything is crap" - Theodore Sturgeon. And that goes double for PyTorch models. A lot of the models, and even some datasets, just feel like throwaway projects that someone put up online.
If you are on macOS or Linux and using Python, PyTorch works fine, but don't step outside of that boundary. PyTorch and TensorFlow both work with other operating systems, and other languages besides Python, but working with anything but Python when using PyTorch is a process fraught with pain. And yes, I expect someone to drop in and say "but what about this C++ framework?" or "I use language X with PyTorch every day and it works fine for me!" But again, the point stands: anything but Python with PyTorch is painful. The support for other languages in TensorFlow is far richer and far better.
And I will preface this with "my knowledge may be out of date," but I've also noticed the type of models and project code available for TensorFlow and PyTorch diverge wildly once you get outside of the toy projects. If you are doing computer vision, especially with video and people, and you are not working on the simplest of pose analysis, TensorFlow offers a lot more options straight out of the box. PyTorch has some good projects and models, but they are mostly Mickey Mouse hobby stuff, or abstract research projects that aren't very robust or immediately deployable.
I use TensorFlow in my day-to-day job. All that said, I like PyTorch for its quirkiness, its rapid prototyping, its popularity, and the fact that so many people are trying out a lot of different things, even if they don't work particularly well. I use PyTorch in almost all of my personal research projects.
I expect in the future for PyTorch to get more stable and more deployable and have better tools, if it can move slightly away from the "research tool" phase it is currently in. I expect Google to do the usual Google-Fuck-Up and completely change TF for the worse, break compatibility (TF1 to TF2) or just abandon the project entirely and move on to the next new shiny.