Shareable Jupyter Notebooks That Run on Free Cloud GPUs (paperspace.com)
199 points by hsikka on Oct 13, 2019 | 49 comments



Hey HN, really thrilled to see this launch. I was a Research Fellow at the Paperspace Advanced Technologies Group this summer and saw this project develop. Gradient Notebooks were an indispensable tool in the work I was doing, and had several advantages for my work over other notebook services. As a researcher, it’s great to see an emphasis being placed on starting projects with no fuss or issues around setting up infrastructure and sharing/forking models.

Giving anyone access to free GPUs and powerful tooling seems like an incredible opportunity. I'd love to hear what you all think!


Suspiciously absent is a comparison with Google Colab, which not only includes free GPU support (T4) and TPU support, but supports pretty much everything I've seen here. It allows access to the underlying VM system and has native integration with Google Drive to store and retrieve huge files (like checkpoints) and GitHub to version notebooks.

Other similar services include Kaggle (way more locked down) and Baidu's free GPU-powered notebook platform (only open to Chinese citizens).


Sharing free GPUs is amazing! GPU costs are such a huge blocker for people learning deep learning.

But as a cloud provider, I would be worried about abuse by cryptojackers. I hope that's not a problem and this is sustainable.


I don't understand this obsession with cloud-based deep learning for beginners. It creates a hyper-focus on cost: always remember to shut down your instances! The occasional slippage causes psychological (and/or financial) pain... The mood is all wrong for somebody entering a new field of work.

Google Colab is decent and free.

But you can do the computing on cheaper hardware too. A CPU is good enough for learning, and a GPU is not outside the budget of many people who presumably already own laptops and such.

I know a local group that shares an i9 / dual GTX "server" and are learning on this shared hardware. I think it's great!

I had a small budget for this learning curiosity and bought a Ryzen and a GTX with only 4GB of RAM. Got a job offer after a while which I had to turn down as it seemed to actually kill my interest in the field. Doing some small personal project now without much fuss to rekindle the fire. And using the CPU for it since it's so small.


Google has run a similar service for a while (Colab) and they have methods to detect crypto mining within their instances. I'm sure Paperspace does too.


It's a home run in my book. Perfect for "Intro to ML" courses, and it gives students instant access to a real environment. Personally I can use it for art & design experiments like GAN Breeder ;)

Am wondering what to use for the data tier? Is there a dedicated backing cloud store supporting BigQuery-style syntax? Import from S3 datasets?


Hey there ! I'm one of the co-founders of Paperspace. We currently have a couple of options for data ingest (and more coming soon!). The current system provides a single persistent mount that you can access from any notebook/experiment/etc in the /storage directory. We also mount a special directory called /artifacts where you can pipe out any models, files, etc and they will be pushed to an S3-compatible object store.

Check out the docs here -> https://docs.paperspace.com/gradient/data/storage

And happy to answer any other questions


OK, did a quick scan of the free and paid tiers, and I'm wondering about the ability to bring in specialized Python libraries. In my case that's https://github.com/opencog/link-grammar, which almost always requires hand building and migrating, plus some custom packages on top of my own even more horrible C++. Right now I have to run a JupyterHub AMI on an EC2 instance.


You linked to the OpenCog linkparser. Just out of curiosity, what are you using it for?

I used to also keep a beefy EC2 or GCP instance all set up with Jupyter, custom kernels (Common Lisp, etc.), etc., etc. Being able to start and stop them quickly made it really cost effective.

I ended up, though, getting a System76 GPU laptop a year ago, and it is so nice to be able to iterate more quickly; the improvement is especially noticeable for quick experiments.

That said, a custom EC2 or GCP instance, all set up and easily start/stop-able, is probably the most cost-effective thing to do unless vanilla Colab environments work for you.


I just hate n-grams is the short answer. Tear apart the input with Link Grammar, use the arc and modifier info as input to the statistical processes. So far it's too early to say it works better, but it sure as hell is more interesting. :-) Sadly I have to keep my instances running all the time; they eat all of the public Reuters news feeds. Side note: additional info can be obtained by taking the top N parse candidates for a given sentence and collapsing them into a graph. I get a pretty good signal out of that for "the parser has gone insane".


Yes, one of the differences in how we handle notebooks is that everything is actually run in a container behind the scenes. This means that it isn't just the .ipynb that we are hosting: if you install any dependencies, libraries, etc., all of it actually persists inside the container. This makes it much easier to share your work with others, e.g. I could fork your notebook if you made it public and get all of the installed libraries and compiled dependencies by default. Hope that helps!

Edit: we also have another tool called GradientCI (https://docs.paperspace.com/gradient/projects/gradientci) that might also be of interest. Basically it lets you connect a GitHub repo directly to a project and you can use it to build your container automatically.


Gotcha - yeah, I can containerize; it's not like I'm screwing with drivers or whatnot on the AMI. I'll keep an eye on you. Not ready for GPU yet, as I can't even rationally define the vectors I'm extracting from the Link stuff. Best of luck to you; a lot of people are piling into that battleground, and the Dunning-Kruger effect is rampant. :-)

Side question, if you're willing to entertain it. Tired me tells a notebook to checkpoint, wanders off to bed, comes back the next day and wonders why it's taking an aeon to open the notebook... oh yeah, damn, I've got gigs of images and pandas crap in it. Do you wrangle this problem, i.e. "please don't save your notebooks as gihugic files representing a mental state you can't possibly remember"?


I should also mention you can just as easily run these on CPU-backed instances as well. The GPU is not a hard requirement.

As for checkpointing data, that is still a relatively difficult problem to solve, and our current recommendation is to use a combination of the persistent /storage directory and the notebook home directory. There are definitely issues with committing 100K+ small files to the primary Docker layer.

When you get to testing it out, don't hesitate to reach out to us and we can try to see what the best solution is for your particular project. To date there isn't a "one size fits all" solution, but we are working hard on making more intelligent choices behind the scenes to unblock some of these IO constraints.


Hi!

Do you guys have a trademark for the name Gradient? I am developing a C#/.NET binding to TensorFlow with the same name: https://losttech.software/gradient.html

If you do, do you mind that clash of names?

Secondly, would you be interested in investing in .NET-powered notebooks? I just got this working on Azure Notebooks for F# ( http://ml.blogs.losttech.software/What-New-In-Preview-6.4/ ), but I feel there would be more interest from C# developers. There is a good C# Jupyter kernel out there.


Hey HN, just wanted to post here to say that if you need more than 6 hours of training (the free-tier limit), we offer a 4x GPU machine that is only $1.50/hour. Just wanted to let everybody know what kind of competition is out there while you're doing your price comparisons:

https://lambdalabs.com/service/gpu-cloud


I'd be interested in whatever you could offer that's lowest specs in every respect except GPU RAM. So 1 GPU, moderate to low everything else, but 24GB of GPU RAM. Right now there's no great way to run those cheaply, even if I want to do something that might only take a few hours.


Paperspace team here. We offer a 24GB GPU option with minimal RAM. Check out the P6000 here: https://gradient.paperspace.com/instances Hope that's what you're looking for :)


Perhaps you could try CPU training. I believe it should take about 10x longer on a many-core machine, but still might be worth it (if you require lots of RAM and can afford to wait). There are also various techniques to reduce memory usage.


Does a card like that exist? Last time I shopped for them (a few years ago) it didn't.


It does and we offer it. It's called the P6000 and it includes 24GB GPU RAM. It's one of the most popular chips we offer for any kind of CV task as you can fit a ton of images, video frames etc. in GPU memory. In any case, here's a link to the full lineup we offer: https://gradient.paperspace.com/instances


I tried browsing the documentation for several minutes, but it does not answer this question: which Jupyter kernels are supported? Is it only Python, like Google Colab?


Because everything is running in a Docker container behind the scenes, we support any kernel you would like. We have a handful of pre-built containers, and you can also add a custom container very easily or build one off a base template such as the Jupyter R stack. Here is a list of some of the containers we provide by default: https://docs.paperspace.com/gradient/notebooks/notebook-cont...


I've gone ahead and added this response to the docs here: https://docs.paperspace.com/gradient/notebooks/about#contain... :)


Awesome, thanks a lot!


Google Colaboratory works pretty well as an alternative to this; there's very little time spent on IT. Even so, my total usage of it is quite low, because so much of what I've been communicating or seen communicated with notebooks is low quality in the first place.


> so much of what I’ve been communicating or seen communicated with notebooks is low quality in the first place.

Do you think there's a format which is better suited to sharing data stories?


Question: Why use this over Google colab?


Can someone clarify how this works in terms of data? I see "5GB of persistent data", but that's nowhere near enough for my training data. Is there also a larger amount of non-persistent storage that I can download data to?


The 5GB of persistent data is available by default for all free accounts running on the free instances. You can easily upgrade your storage up to 1TB by upgrading your subscription within the console. We can also provide up to 4TB via a support ticket.


Sorry, I'm very new to this, how do notebooks/shared notebooks play with version control?


Badly. Under the hood, Jupyter notebooks are JSON that stores not only the code but all of the metadata and cell outputs as well. I know there are tools that help with integrating Jupyter and git, but I just end up going back and forth between .py files in VSCode and notebooks in JupyterLab, depending on what I'm working on.
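To illustrate why the JSON format interacts badly with git: outputs and execution counts churn on every run, so a common pre-commit trick is to strip them out. A stdlib-only sketch (assuming the standard v4 .ipynb layout; this is roughly what dedicated tools automate):

```python
import json

def strip_outputs(nb: dict) -> dict:
    """Return a copy of a v4 .ipynb structure with outputs and
    execution counts removed, so diffs show only code changes."""
    clean = json.loads(json.dumps(nb))  # deep copy via JSON round-trip
    for cell in clean.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return clean

# A minimal notebook with one "noisy" code cell:
nb = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 7,
            "metadata": {},
            "source": ["print('hello')\n"],
            "outputs": [{"output_type": "stream", "name": "stdout",
                         "text": ["hello\n"]}],
        }
    ],
}
clean = strip_outputs(nb)
```

Committing the stripped version keeps image blobs and counters out of the diff while the working copy keeps its outputs.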


Aye, notebooks truly represent a wrong turn in scientific computing. The absence of version control alone is a showstopper, but it's insane generally to launch long-running estimations interactively via browser-based notebook interfaces, with outputs that are not readily greppable. But corporate philistines continue to ooh and ah over jupyter's visuals, and companies like Gradient are more than happy to cater to their FOMO.


> Aye, notebooks truly represent a wrong turn in scientific computing.

I agree with that problem, yet somehow like the idea of notebooks. Maybe the real tragedy here is that jupyter notebooks are saved as json and not as a valid program with comments, that can be run "as is" from the command line.


> I agree with that problem, yet somehow like the idea of notebooks.

You like the tight feedback loop of REPLs, as do I. You don't need the clunky machinery of jupyter to effect REPLs with emacs and ipython.

> real tragedy here is that jupyter notebooks are not [saved] as a valid program with comments

Here's a simple way of doing that: write your code as valid programs with comments.


> write your code as valid programs with comments.

That's what I do! Then I have a script that converts my python program to shitty json that my colleague, who can only conceive of working inside a notebook, can run. Finally, another script translates the json back to readable code, and more importantly, to something that can meaningfully be put into git.

I would love it if the jupyter interface allowed saving the notebook directly as a program with comments. Then all this silly sorcery would not be necessary.


It does, nbconvert.

But that's beside the point. The jupyter ecosystem, like other widespread and unwieldy formats such as Microsoft Word and PDF, poses a brutal obstruction to Unix workflows.


> It does, nbconvert

Yes, my script simply calls the nbconvert library, with some trickery to ensure that the result is idempotent. But I would like not to need this script; I'd rather the jupyter interface worked with valid python files directly (maybe after enabling some option).

> poses a brutal obstruction to Unix workflows.

It's not so much the notebook itself as the file format the notebook chooses by default. If it were a human-editable text file there would be no problem, and there is no practical obstacle to that (other than that young programmers today cannot conceive of a "serialization" format other than json).


I respectfully disagree :) I think that the notebook environment is a nice entry-point for lots of applications. We actually use it pretty regularly to launch more sophisticated experiments / multi-node training jobs, etc using our python SDK.

You are right that versioning is still an issue, and we largely punt on it by using the Docker container (with layer commits on each notebook teardown) as the versioning mechanism. Maybe not the best solution, but it does have its advantages.


If I were you, I'd have angrily railed against GP, so I salute the restraint. I don't approve of what you're doing but I also would happily trade positions with you (unemployed versus founding principal of a company -- any company -- even one whose mission is nonsense).


It's possible but requires some foresight and strategy. Images are the first problem - they're stored in blobs. Small changes in graphics can create meaningless commits.

Data is the second problem - you usually can't run a notebook without it. If the notebook transforms data over time, it can be a real issue.

Here's a pretty complete rundown of the first problem, the .ipynb itself: https://nextjournal.com/schmudde/how-to-version-control-jupy...

An integrated solution that versions data, the notebook, and the computational environment (collaborators will make changes over time) is the ideal collaborative platform.


There are some tools in the jupyter ecosystem that are aware of the .ipynb json format and make diffing and version control easier to manage (https://github.com/jupyter/nbdime). Another option is converting the notebook file to something (.py, .md, .html) that existing git tools are better at working with (https://github.com/mwouts/jupytext, https://github.com/jupyter/nbconvert).
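As a rough stdlib-only sketch of the conversion idea: flatten the notebook JSON into a runnable .py script, with code cells as blocks and markdown cells as comments. The `# %%` marker follows the "percent" cell convention that jupytext and several editors understand; the notebook structure assumed here is the standard v4 layout:

```python
import json

def notebook_to_py(nb: dict) -> str:
    """Flatten a v4 .ipynb structure into a runnable .py script:
    code cells become `# %%` blocks, markdown cells become comments."""
    lines = []
    for cell in nb.get("cells", []):
        src = "".join(cell.get("source", []))
        if cell["cell_type"] == "code":
            lines.append("# %%")
            lines.append(src.rstrip("\n"))
        elif cell["cell_type"] == "markdown":
            lines.append("# %% [markdown]")
            lines.extend("# " + line for line in src.splitlines())
        lines.append("")  # blank line between cells
    return "\n".join(lines)

# A minimal two-cell notebook:
nb = {
    "nbformat": 4,
    "cells": [
        {"cell_type": "markdown", "source": ["## Setup\n"]},
        {"cell_type": "code", "source": ["x = 1 + 1\n", "print(x)\n"]},
    ],
}
script = notebook_to_py(nb)
```

The result is plain text that diffs cleanly in git and runs as-is from the command line, which is essentially what the `.py` export paths of those tools produce in more robust form.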


Does anyone know if this system supports Google Docs-like live editing? I'm struggling to find such a service.


Google Colab?

Disclaimer: I use Colab regularly as part of my job in Google.


I don't believe Colab is real-time live in the way that Google Docs is. I'm using it to teach a class and would love that kind of functionality, but unless I'm missing some option, when I make changes to the notebook, students who have it open don't see the changes immediately, only when they reload.


I haven't tried coding live, but comment threads do seem real-time. Probably wrong assumption on my part, sorry.


With which option? I have used Colab myself and I don't see live editing. I would like to see collaborators' cursors in real time while they edit, and also the option to share the runtime.


Can't say if it does, but at https://nextjournal.com we do. We also support running on GPUs.


fyi: your chatbot makes a "popping" noise when scrolling down and puts "(2) New Messages" in the tab title. Both distract from your content.



