The thing I'm looking forward to most is having Flash Attention built-in. Right now you have to use xformers or similar, but that dependency has been a nightmare: it breaks, it requires specific concoctions of dependency installs or else conda will barf, and it's impossible to pin because I have to use -dev releases, which they constantly drop from the repositories.
PyTorch 2.0 comes with a few different efficient transformer implementations built-in. And unlike 1.13, they work during training and don't require specific configurations. Seemed to work just fine during my pre-release testing. Also, having it built into PyTorch might mean more pressure to keep it optimized. As-is xformers targets A100 primarily, with other archs as an afterthought.
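For reference, the built-in kernels are exposed through `torch.nn.functional.scaled_dot_product_attention`, which picks a fused implementation (FlashAttention, memory-efficient attention, or a plain math fallback) based on device, dtype, and inputs. A minimal sketch:

```python
import torch
import torch.nn.functional as F

# Shapes: (batch, num_heads, seq_len, head_dim)
q = torch.randn(2, 8, 128, 64)
k = torch.randn(2, 8, 128, 64)
v = torch.randn(2, 8, 128, 64)

# Dispatches to a fused kernel when hardware/dtype allow,
# otherwise falls back to the straightforward math implementation.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
assert out.shape == q.shape
```

On CPU with fp32 this will usually take the math path; the fused paths kick in on supported GPUs.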
And, as promised, `torch.compile` worked out of the box, providing IIRC a nice ~20% speed up on a ViT without any other tuning.
I did have to do some dependency fiddling on the pre-release version. Been looking forward to the "stable" release before using it more extensively.
Anyone else seeing nice boosts from `torch.compile`?
I really wish compiling cuda extensions worked better out of the box. Is there a reason they can't bundle nvcc alongside pytorch outside of complexity/expense?
I work on xFormers and we definitely appreciate the candid feedback:
- We partnered with our PyTorch colleagues, and some of the PyTorch 2.0 kernels for efficient attention actually originated from xFormers, so we're glad to read that having this built into PyTorch is something users are eager to use.
- While xFormers was originally targeting a pure researcher audience, we were aware of the installation problems: starting at the end of last year, we have been gradually making the library easier to set up and use (both internally and externally). We have recently introduced non-dev conda packages and pip wheels, and are also trying to release more often.
- We very much welcome hearing about any issue with the library and would certainly love discussing more the specifics of your experience (or others' who read this) if you have time (maybe via our GitHub to start with). Thanks again for the feedback here!
What size of ViT? I’ve tried it with both a unet and an LM and didn’t see any benefit with the default args (and got a CUDA error after 30 mins of processing trying to compile an AR generation routine with all optimization turned on).
>Due to lack of Python 3.11 support for packages that PyTorch depends on, including NumPy, SciPy, SymPy, Pillow and others on the Anaconda platform. We will not be releasing Conda binaries compiled with Python 3.11 for PyTorch Release 2.0. The Pip packages with Python 3.11 support will be released, hence if you intend to use PyTorch 2.0 with Python 3.11 please use our Pip packages.
It really sucks that anaconda always lags behind. I know the reasoning*, and I know it makes sense for what a lot of teams use it for... but on our side we are now looking more and more into dropping it since we are more of an R&D team. We already use containers for most of our pipelines, so just using pip might be viable.
*Though I guess Anaconda bit off more than it could chew w.r.t. managing an entire Python universe and keeping it up to date. Conda-forge is already almost a requirement, but using the official package (with pip, in this case) has its own benefits for very complex packages like pytorch.
The Arch Linux PyTorch 2.0 packages are great if you are looking for "cutting edge," as they are compiled against CUDA 12.1 now, instead of 11.8 like the official nightly releases. You can also get AVX2-patched Python and optimized CPython packages through CachyOS or ALHP.
These are all legitimate reasons, but my personal experience (and perhaps preference?) is to use Docker for anything that is more complex than pip can handle.
> conda-forge managing builds of some really flaky binary python packages that are sometimes a nightmare to build locally
Yeah this is fair. Fortunately it's becoming rarer.
Well. To each their own, use cases differ so wildly it's hard to compare them.
The key audience for conda is the ML/DS space, where most if not all packages come from C/C++/Rust/Fortran and have to be compiled, while also requiring a consistent set of external C libraries like libblas, etc. As I said, some of those packages are a complete nightmare to build locally. Conda simplifies this a lot: you can just `conda create -n myenv some=1.0 crazy=2.0 deps=2.0` and in a few seconds (if you use mamba and not conda) you have a working Python environment, so off you go; no Docker, no local builds, etc.
Honestly, I've found that conda has made operationalizing code very difficult. We've found it much easier to simply switch back to using pip, poetry, docker, and the standard OS package management tools rather than conda. Conda's dependency resolution is also quite slow and causes our builds & CI to timeout unless we drop in mamba.
Seems docker is going from the frying pan to the fire. Have they added ‘resume download’ yet to docker? Over my slow DSL I can’t stand how docker makes me download 7G of images when I want to install something very simple, frequently it fails to do the download and I have to do it several times so it adds up to more like 28G of downloading and all that waiting.
I worked at one place where management was shocked when I told them the image build process would take 20 minutes on gigabit fiber up in Canada and we agreed to time it and I measured 18 minutes. Docker slows down “dev” to the speed of “ops.”
I don’t know how they did it but the data scientists could always find f-ed up Python images, you never got the same default character encoding twice, one time the default character set was Hungarian and I wonder how that happens…
pip with wheels doesn't deal with non-python packages. I used to be in a horrible locked down corpo laptop. Conda was invaluable in getting stuff to run, like chromedriver, etc.
I honestly don't remember which one I've used for chromedriver when I needed it for my project, but I've surely installed all the stuff with "just" pip/poetry. Larger projects are typically packaged like this, with setup.py performing the downloads, while wheels solve the problem with Python libraries with native dependencies (e.g. how psycopg-binary works).
Maybe Conda makes it slightly more convenient, but I've always treated pip as the standard Python package management tool (it ships with every Python via ensurepip these days, after all) and Conda was always "that weird non-standard thing some folks use for some odd reason" for me.
Ah, yeah, they do have a Python 3.11 release, just not on anaconda. Okay, yeah, there hasn't been a good reason to use anaconda for a couple of years now anyway.
For one of my projects, conda ‘just works’ to get working with the GPU but following the instructions for pip doesn’t work. On the other hand there’s another package I am interested in using where I need to build out of GitHub and it’s a very different story.
I see myself as interested in commercial exploitation of transformers right now and I am delighted with the results. The first time I tried clustering all the Ukraine articles lumped together, all the sports were lumped, it runs 5x faster than my LDA-based clustering system and I think does a better job. With results like this I am happy to trade ‘cutting edge’ for convenience.
I have thought about a ‘path less followed’ in Python which is a truly sound package manager like maven for Python (as opposed to Poetry which I’m not sure is sound but it sure is slow) and I can say I like the way conda works I just would rather do it with wheels. One beef I have with conda is that the bzip2 files are slow to decompress and even over a DSL line I would trade a little more downloading for faster installs.
Yes that's the issue! Most of the software is already ready, usable and just works... unless you use anaconda. Now that I think about it, is there some technical reason for that? I always thought it was mostly about stability, but I can't imagine python 3.11 being so unstable as to warrant waiting a whole year before even porting.
> It really sucks that anaconda always lags behind.
I usually just go for virtualenv (if python library versions are the only issue) or go for docker (if it's more than that). Both let you just use the latest and greatest without any friction. conda sits in a weird middle ground that I hate.
Anaconda automatically handles things like making sure the correct version of cuDNN for your graphics card is installed. When I tried doing this myself with venv it was really painful.
I use venv this way. I download and compile specific Python versions and install them in a non-system dir alongside all the other versions, then just run the specific binary to create a venv, and it seems to work as expected.
Exactly my setup. I tried `conda install` a few times, but after just a few globally installed packages the conda SAT solver always struggles, so I now live with the assumption that if an incompatible package combination doesn't throw an error in the dev environment, it is likely fine.
There's nothing wrong with this. IMO, Conda is a general-purpose "system environment" and package manager that happens to be written in Python. The fact that its package ecosystem is oriented towards machine learning with Python is almost an historical coincidence.
That's basically where we are at for tons of our pipelines, but it kind of defeats the purpose since a dockerfile with a proper base image is basically equivalent at that point.
In general you shouldn't need to "activate" a Conda environment in the shell. Things generally "just work" if you use absolute paths. Something like this:
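(The Conda paths below are hypothetical; the same trick is shown with a stdlib venv too, since the principle is identical.)

```shell
# Call a Conda env's binaries by absolute path -- no `conda activate` needed.
# With a hypothetical env at /opt/conda/envs/myenv:
#   /opt/conda/envs/myenv/bin/python train.py
#   /opt/conda/envs/myenv/bin/pip list

# The same principle, demonstrated with a stdlib venv:
python3 -m venv /tmp/demo-env
/tmp/demo-env/bin/python -c 'import sys; print(sys.prefix)'   # the env's own prefix
```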
What is a little funny is installing a consistent version of Conda inside a container, because the official Miniconda installers are rolling-release only. However you might be able to downgrade to your desired version of Conda after installation.
> *Though I guess Anaconda bit off more than it could chew w.r.t. managing an entire Python universe and keeping it up to date. Conda-forge is already almost a requirement but using the official package (with pip, in this case) has its own benefits for very complex packages like pytorch.
Yeah, I absolutely adore conda, but they really need support.
I'm hoping torch.compile is a gateway to "easy" non-Nvidia accelerator support in PyTorch.
Also, I have been using torch.compile for the Stable Diffusion unet/vae since February, to good effect. I'm guessing similar optimizations will pop up for LLaMA.
But I also compile the VAE and some other modules; I will reply again later when I can look at my local code. Some modules (like face restoration or the scheduler) still don't like torch.compile.
I tried changing the options in the config dict one by one, but TBH nothing seems to make a significant difference beyond the default settings in my benchmarks.
I haven't messed with compiling LORA training yet, as I don't train much and it is sufficiently fast, but I'm sure it could be done.
That's been my experience. However, when fallback to CPU happens, it sometimes ends up making a specific graph execution slower. But that's explicitly mentioned in the warning and pretty much expected.
Yes, this is my experience. Many off the shelf models still don't work, but several of my own models work great as long as they don't use unsupported operators.
Yes. I am not sure to what extent MPS is a viable alternative to CUDA. You seem to write a lot about ML models. Do you have a detailed write-up on this subject?
> As an underpinning technology of torch.compile, TorchInductor with Nvidia and AMD GPUs will rely on OpenAI Triton deep learning compiler to generate performant code and hide low level hardware details. OpenAI Triton-generated kernels achieve performance that’s on par with hand-written kernels and specialized cuda libraries such as cublas.