Hosting Jupyter Notebooks for 100k Users [video]

ssanderson11235 · on April 30, 2018

Speaker here. If you want to follow along with the slides from the talk, you can find them at https://speakerdeck.com/ssanderson/hosting-notebooks-for-100....

Also, happy to answer any questions that people might have.

cbanek · on April 30, 2018

I'm working on building an educational environment with Jupyter, and I'm interested in the multiple hubs.

A few basic questions: Why multiple hubs? (was there some point of scale where you needed this) Did multiple hubs allow you to have better migrations? (where you drain one and move it over to the other)

Totally agree that state is the enemy of scale, so having a separate service backing your storage independent of what hub you're on seems like a big win.

Thanks for the great talk!

ssanderson11235 · on April 30, 2018

> Why multiple hubs?

A few reasons for this, most of which are related to points you mentioned:

1. Having multiple hubs makes it much easier to do zero-downtime deploys.

2. Having multiple hubs makes us more resilient to transient machine failures.

3. We were worried that having a single proxy for all our notebook traffic might become a system-wide bottleneck. Notebooks with a lot of images can get pretty large, and at the time we were rolling this out JupyterHub was pretty new. We weren't sure how well it was going to scale (the target audience for the JupyterHub team at the time was small labs and research teams), so it seemed safest to aim for horizontal scalability from the start. The JupyterHub team has since done a lot of awesome performance work to support the huge data science classes being taught at UC Berkeley, so it's possible that a single hub with the kubernetes spawner could handle our traffic today, but given points (1) and (2) plus the fact that we already have a working system, I don't have much incentive to find out :).
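The thread doesn't say how users are assigned to hubs, so what follows is only a hypothetical sketch of why a pool of hubs makes draining easy. It uses rendezvous (highest-random-weight) hashing, a technique swapped in here for illustration and not confirmed as part of Quantopian's setup. The `Hub` class and `pick_hub` function are made up. Each user goes to the non-draining hub with the highest hash score, so marking one hub as draining moves only the users who were on it.

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass
class Hub:
    """A hypothetical JupyterHub instance in a pool of hubs."""
    name: str
    draining: bool = False  # True while this hub is being taken out for a deploy


def _score(user: str, hub_name: str) -> int:
    # Deterministic score for a (user, hub) pair; SHA-256 keeps it stable across processes.
    digest = hashlib.sha256(f"{user}:{hub_name}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def pick_hub(user: str, hubs: List[Hub]) -> Hub:
    """Route a user with rendezvous hashing, skipping draining hubs.

    Only users whose top-scoring hub is draining get a new assignment;
    everyone else keeps the hub they already had.
    """
    candidates = [h for h in hubs if not h.draining]
    if not candidates:
        raise RuntimeError("no hubs available")
    return max(candidates, key=lambda h: _score(user, h.name))
```

Moving a user to another hub is only painless if their notebooks don't live on the hub's machine, which is where a separate storage service like PGContents comes in.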

cbanek · on April 30, 2018

That's great, thanks! I was also curious if you hit scale issues on just one hub. I agree, it's best practice to not have all your eggs in one basket. I'd love to see an HA hub where this would be all taken care of for me, but hopefully by the time we go live we'll have this.

porterde · on April 30, 2018

Are the cell sharing extensions mentioned in the slides open source? (Sorry if the video says either way; I didn't get a chance to watch it in full yet.) The lack of sharing/collaboration extensions for Jupyter Notebook/Lab is still a weak point, I think.

ssanderson11235 · on April 30, 2018

The sharing machinery isn't open source, mostly because it's pretty tightly coupled to our community forums, which is a custom Rails application.

I know that the jupyterhub team was working on https://github.com/jupyterhub/hubshare for a while as an open source sharing solution. I've also commented in https://github.com/jupyterhub/hubshare/issues/14 and elsewhere that I think PGContents (one of the libraries I talk about in the video) could be used as a basis for many kinds of sharing (though probably not realtime collaboration).
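A rough illustration of why database-backed contents make sharing a data operation rather than a filesystem one. This uses the stdlib `sqlite3` module as a stand-in for Postgres, and the `files` table plus the `save`/`load`/`share` helpers are hypothetical; they are not PGContents' actual schema or API. Sharing here is a snapshot copy into the recipient's namespace, which fits the "probably not realtime collaboration" caveat.

```python
import sqlite3


def make_store() -> sqlite3.Connection:
    # In-memory database standing in for a shared Postgres instance.
    # Hypothetical schema: one row per (user, path); NOT PGContents' real schema.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE files ("
        " user_id TEXT NOT NULL,"
        " path TEXT NOT NULL,"
        " content TEXT NOT NULL,"
        " PRIMARY KEY (user_id, path))"
    )
    return conn


def save(conn: sqlite3.Connection, user_id: str, path: str, content: str) -> None:
    # INSERT OR REPLACE is SQLite syntax; Postgres would use INSERT ... ON CONFLICT.
    conn.execute(
        "INSERT OR REPLACE INTO files (user_id, path, content) VALUES (?, ?, ?)",
        (user_id, path, content),
    )


def load(conn: sqlite3.Connection, user_id: str, path: str) -> str:
    row = conn.execute(
        "SELECT content FROM files WHERE user_id = ? AND path = ?",
        (user_id, path),
    ).fetchone()
    if row is None:
        raise FileNotFoundError(f"{user_id}:{path}")
    return row[0]


def share(conn: sqlite3.Connection, owner: str, path: str,
          recipient: str, dest_path: str) -> None:
    """Share by snapshot: copy the owner's file into the recipient's namespace."""
    save(conn, recipient, dest_path, load(conn, owner, path))
```

Because every hub talks to the same database, it doesn't matter which hub the owner or the recipient happens to be on; later edits by the owner don't touch the recipient's copy.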

diabeetusman · on April 30, 2018

Do you know if there's any way that we can see what's happening during the mini demo?

ssanderson11235 · on April 30, 2018

I gave an earlier version of this talk at JupyterCon 2017 https://www.youtube.com/watch?v=TtsbspKHJGo, which captured my full screen output. The pgcontents demo starts around 19:30 in that video.

chupapuma · on May 1, 2018

Hey Scott, how did you get your unlisted YouTube link for your presentation? I don't think I ever found mine from the same conference.

ssanderson11235 · on May 1, 2018

I found it in the YouTube playlist from the event: https://www.youtube.com/playlist?list=PL055Epbe6d5aP6Ru42r7h....

chupapuma · on May 7, 2018

Weird. Either mine isn't there or I'm blind. (shrugs) Though I did find it by searching under the O'Reilly user :)

fwdpropaganda · on April 30, 2018

Come on. Quantopian doesn't have 100k users... maybe 5k?

EDIT: Here we go again... downvoted, then probably flagged and reprimanded by mod dang or something. Sigh.

I actually spent hundreds of hours on Quantopian, and from the activity in the forums you wouldn't think it has 100k users. Either that, or it's the most muted community on the internet.

sandGorgon · on April 30, 2018

there is Dash (https://plot.ly/products/dash/) and there is Jupyter.

I wish there was some abstraction to generate a Dash like output from Jupyter. There are a lot of people who would pay serious money for that.

Even Airbnb built a framework to extract code from Jupyter notebooks and push them into a machine learning pipeline (https://medium.com/airbnb-engineering/using-machine-learning...).

Jupyter can be so much more by going closer to how it fits within a production pipeline versus just competing against RStudio.

ssanderson11235 · on April 30, 2018

There have been a couple attempts to add dashboarding to Jupyter:

https://github.com/jupyter/dashboards was/is a dashboard system built by a team at IBM. I think the project stalled somewhat after IBM stopped funding it.

There are a few long threads in the currently active jupyter repos about building dashboard systems as extensions: https://github.com/jupyterlab/jupyterlab/issues/1640, https://github.com/jupyter-widgets/ipywidgets/issues/2018.

deven88 · on May 1, 2018

I was searching for a dashboard utility like this and found https://github.com/oschuett/appmode. It may be useful for some use cases.

minimaxir · on April 30, 2018

> I wish there was some abstraction to generate a Dash like output from Jupyter. There are a lot of people who would pay serious money for that.

Notebook-to-dataviz is the value proposition of Mode Analytics, which has been doing well: https://modeanalytics.com/

rb808 · on April 30, 2018

Also I always thought notebooks would be a great devops tool, kind of like a super command line that has easily observable steps grouped in chunks and graphical feedback. No one else seems to think so though so maybe I'm wrong.

existencebox · on April 30, 2018

As a disclaimer/context: I'm a dev on the Azure hosted Jupyter Notebooks product.

You're not wrong! (Or at least, it's a topic that has come to our ears before, and something I certainly agree with.) Obviously I shouldn't go spouting all the pipe dreams I have in this space, but since I got my start doing Ops work and have tried to keep an eye out for things I would have liked back then, I can assure you you're not alone.

I always saw the similarity foremost as a direct upgrade to the "runbooks"/"firefighting/deploy checklists" that crop up all too often.
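To make the "executable runbook" idea concrete, here is a minimal sketch, assuming each runbook entry is a title plus one shell command. `Step`, `StepResult`, and `run_runbook` are hypothetical names, not part of Jupyter, Azure Notebooks, or any product mentioned in this thread. It runs the steps in order, keeps each step's output next to it (roughly what a notebook's cell outputs give you), and stops at the first failing command.

```python
import subprocess
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    title: str    # the prose half of a checklist entry
    command: str  # the shell command an operator would otherwise copy-paste


@dataclass
class StepResult:
    title: str
    returncode: int
    output: str


def run_runbook(steps: List[Step]) -> List[StepResult]:
    """Run steps in order, record their output, and stop at the first failure."""
    results = []
    for step in steps:
        # shell=True because runbook commands are written as shell one-liners;
        # only run trusted runbooks this way.
        proc = subprocess.run(step.command, shell=True, capture_output=True, text=True)
        results.append(StepResult(step.title, proc.returncode, proc.stdout + proc.stderr))
        if proc.returncode != 0:
            break  # like a careful operator: don't continue past a failed check
    return results
```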

alexeldeib · on April 30, 2018

Another Azure dev checking in :)

Have you seen Application Insights Workbooks [0]? Basically, you can build interactive notebooks that run analytics queries against your telemetry, generate charts, add text cells, etc. It's picking up usage for investigating outages, e.g., a Workbook with a query that looks at your dependency calls to determine which service is failing and produce a visualization.

Workbooks don't actually execute any external actions, though. It's solely an analysis tool. Runbooks skew the other direction, they are for executing scripts (more or less).

Jupyter/Python seems to fill a nice gap where the two could be bridged, especially given the existing Python support in the Azure SDK and CLI.

PS: a dev from Workbooks has seen Azure Notebooks, and was curious a while back about how he could integrate the functionality [1]

[0]: https://docs.microsoft.com/en-us/azure/application-insights/...
[1]: http://blog.my-is300.com/2017/06/what-i-work-on-application-...

gardnerjr · on April 30, 2018

wow, a link to my blog (that second link) made hackernews? that's exciting!

Anyway, yeah, workbooks in appinsights is almost like notebooks for non-programmers? kinda? you string together markdown, parameters, and analytics queries (and very soon metrics across more of azure) into reports. But the parameters stuff lets you do more interactive things to hide/show sections now. i really need to do a new blog post about all the new stuff that's in there that wasn't last june!

i've prototyped some stuff to export an AI workbook to an azure/jupyter notebook, as there's some support for querying analytics already from a python package. there just hasn't been enough demand for it so far (not as much as we expected, anyway?)
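The export format isn't described in the thread, so this is a hypothetical sketch: assume a workbook is a list of markdown sections and analytics-query sections, and emit a minimal nbformat v4 notebook using only the stdlib `json` module. The section schema and the `run_query` helper referenced in the generated cells are made up; a real export would call whatever Python package does the querying, and workbook parameters are ignored here.

```python
import json
from typing import Dict, List


def markdown_cell(text: str) -> Dict:
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(code: str) -> Dict:
    # Fields required for a code cell in nbformat 4 (before 4.5 added cell ids).
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": code,
    }


def workbook_to_notebook(sections: List[Dict]) -> str:
    """Convert hypothetical workbook sections ({"kind", "body"}) to .ipynb JSON."""
    cells = []
    for section in sections:
        if section["kind"] == "markdown":
            cells.append(markdown_cell(section["body"]))
        elif section["kind"] == "query":
            # Wrap the query in a call to a hypothetical run_query() helper.
            cells.append(code_cell(f"run_query({section['body']!r})"))
        else:
            raise ValueError(f"unknown section kind: {section['kind']}")
    notebook = {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 2}
    return json.dumps(notebook, indent=1)
```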

craigching · on April 30, 2018

How about Emacs + org mode? https://www.youtube.com/watch?v=dljNabciEGg

I thought this video was pretty cool personally!

cup-of-tea · on April 30, 2018

Very cool! Watching someone else using emacs is always a great learning experience.

amirathi · on April 30, 2018

I built a platform exactly for this. Check it out here: https://www.nurtch.com/

sytse · on April 30, 2018

Wow, it is awesome how you managed to integrate runbooks, Jupyter, and monitoring, well done and great video!

amirathi · on April 30, 2018

Thanks :)

Just saw that you are CEO of GitLab. Good job making your runbooks public [1]. I converted one of your runbooks into an executable notebook to convey my point. Check the before/after screenshot here: https://blog.amirathi.com/2018/03/27/codify-infra-runbooks-w...

You should consider using Nurtch :)

[1]: https://gitlab.com/gitlab-com/runbooks

wohlergehen · on April 30, 2018

Native Jupyter has the %%bash cell magic [0], so a lot of this should already be possible: start a cell with %%bash and the rest of the cell is executed by bash.

More generally, there is %%script <x>, which executes the cell using <x>.
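For anyone who wants similar behavior outside a notebook, here is a rough approximation of what `%%script <x>` does: the cell body is fed to the interpreter on stdin and its output is captured. This is a simplified sketch, not IPython's implementation, and `run_script_cell` is a made-up name; the real magic also has options such as `--bg` and `--out`.

```python
import shlex
import subprocess
from typing import List, Union


def run_script_cell(interpreter: Union[str, List[str]], cell: str) -> subprocess.CompletedProcess:
    """Approximate `%%script <interpreter>`: pipe the cell body to the interpreter's stdin."""
    # Accept either a command string ("bash", "python3 -u") or an argv list.
    cmd = shlex.split(interpreter) if isinstance(interpreter, str) else list(interpreter)
    return subprocess.run(cmd, input=cell, capture_output=True, text=True)
```

For example, `run_script_cell("bash", "df -h /")` runs a bash snippet much like a `%%bash` cell would, assuming bash is on the PATH.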

0: http://ipython.readthedocs.io/en/stable/interactive/magics.h...

lmeyerov · on May 1, 2018

Yep! We've been experimenting with a variant for security & fraud investigations at Graphistry. Our original experiments with just notebooks resulted in a few people on super early adopter teams getting a kick out of it, but in critical ~operational settings, having to deal with code was... not great. Mostly looked like siloed personal use.

BUT. The core concept is great. We made a form more focused on interactive investigation, so you can jump from an alert in a dashboard into a rich & interactive visual session with pre-wrangled data: https://youtu.be/B3ZZWx9WUEk?t=1m32s . Depending on what you see, can easily pull in more data, or refine what's there. And agreed, I think it'd be fun to try in big devops/netops scenarios!

jacquesm · on April 30, 2018

That makes good sense, in case it helps your confidence.

It's a middle ground between literate programming and a traditional IDE and there is nothing like that aimed at the ops space where it would actually be quite valuable.

nickbarnwell · on April 30, 2018

emacs' org-mode / org-babel is well suited for it. This blog post [1] has some fantastic examples. The problem, of course, is that it's so closely tied to emacs.

1: http://howardism.org/Technical/Emacs/literate-devops.html

craigching · on April 30, 2018

Nice, I just posted the video from that link above ;) Great minds think alike!

nickbarnwell · on April 30, 2018

Howard's blog and github are an amazing resource; I'm a huge fan of his demo-it package as well. Highly recommend checking it out if you ever give technical presentations or talks.

peatmoss · on April 30, 2018

The R community has a really nice answer for this: https://rmarkdown.rstudio.com/flexdashboard/