Hosting Jupyter Notebooks for 100k Users [video] (youtube.com)
151 points by jbredeche on April 30, 2018 | 32 comments



Speaker here. If you want to follow along with the slides from the talk, you can find them at https://speakerdeck.com/ssanderson/hosting-notebooks-for-100....

Also, happy to answer any questions that people might have.


I'm working on building an educational environment with Jupyter, and I'm interested in the multiple hubs.

A few basic questions: Why multiple hubs? (was there some point of scale where you needed this) Did multiple hubs allow you to have better migrations? (where you drain one and move it over to the other)

Totally agree that state is the enemy of scale, so having a separate service backing your storage independent of what hub you're on seems like a big win.

Thanks for the great talk!


> Why multiple hubs?

A few reasons for this, most of which are related to points you mentioned:

1. Having multiple hubs makes it much easier to do zero-downtime deploys.

2. Having multiple hubs makes us more resilient to transient machine failures.

3. We were worried that having a single proxy for all our notebook traffic might become a system-wide bottleneck. Notebooks with a lot of images can get pretty large, and at the time we were rolling this out JupyterHub was pretty new. We weren't sure how well it was going to scale (the target audience for the JupyterHub team at the time was small labs and research teams), so it seemed safest to aim for horizontal scalability from the start. The JupyterHub team has since done a lot of awesome performance work to support the huge data science classes being taught at UC Berkeley, so it's possible that a single hub with the kubernetes spawner could handle our traffic today, but given points (1) and (2) plus the fact that we already have a working system, I don't have much incentive to find out :).
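To make points (1) and (2) concrete, here is a minimal sketch of spreading users across several hubs while one is drained for a deploy. This is an illustration, not the setup described in the talk: `pick_hub`, the hub names, and the hashing scheme are all hypothetical, and it assumes notebook storage lives outside the hubs so a user can land on any of them.

```python
import hashlib
from typing import AbstractSet, Sequence


def pick_hub(user: str, hubs: Sequence[str],
             draining: AbstractSet[str] = frozenset()) -> str:
    """Deterministically map a user to one of the hubs that is not draining.

    Hypothetical helper: hubs listed in `draining` receive no new sessions,
    so they can be redeployed once their existing sessions end.
    """
    active = [hub for hub in hubs if hub not in draining]
    if not active:
        raise RuntimeError("no active hubs available")
    # A stable hash keeps a given user on the same hub between requests,
    # as long as the set of active hubs does not change.
    digest = hashlib.sha256(user.encode("utf-8")).digest()
    return active[int.from_bytes(digest[:8], "big") % len(active)]
```

With plain modulo hashing, draining a hub reshuffles many users at once; a real deployment would more likely use consistent hashing or sticky sessions at the load balancer.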


That's great, thanks! I was also curious if you hit scale issues on just one hub. I agree, it's best practice to not have all your eggs in one basket. I'd love to see an HA hub where this would be all taken care of for me, but hopefully by the time we go live we'll have this.


Are the cell sharing extensions mentioned in the slides open sourced? (Sorry if it says either way in the video, I didn't get a chance to watch it in full yet). Lack of sharing / collaboration extensions for Jupyter Notebooks / Lab is still a weak point I think.


The sharing machinery isn't open source, mostly because it's pretty tightly coupled to our community forums, which is a custom rails application.

I know that the jupyterhub team was working on https://github.com/jupyterhub/hubshare for a while as an open source sharing solution. I've also commented in https://github.com/jupyterhub/hubshare/issues/14 and elsewhere that I think PGContents (one of the libraries I talk about in the video) could be used as a basis for many kinds of sharing (though probably not realtime collaboration).


Do you know if there's any way that we can see what's happening during the mini demo?


I gave an earlier version of this talk at JupyterCon 2017 https://www.youtube.com/watch?v=TtsbspKHJGo, which captured my full screen output. The pgcontents demo starts around 19:30 in that video.


Hey Scott, how did you get your unlisted YouTube link for your presentation? I don't think I ever found mine from the same conference.


I found it in the YouTube playlist from the event: https://www.youtube.com/playlist?list=PL055Epbe6d5aP6Ru42r7h....


Weird. Mine isn't there, or I am blind. shrugs Though I did find it by searching the O'Reilly user :)


Come on. Quantopian doesn't have 100k users... maybe 5k?

EDIT: Here we go again... downvoted, then probably flagged and reprimanded by mod dang or something. Sigh.

I actually spent hundreds of hours on Quantopian, and from the activity in the forums you wouldn't think it has 100k users. Either that, or it's the most muted community on the internet.


there is Dash (https://plot.ly/products/dash/) and there is Jupyter.

I wish there was some abstraction to generate a Dash like output from Jupyter. There are a lot of people who would pay serious money for that.

Even Airbnb built a framework to extract code from Jupyter notebooks and push them into a machine learning pipeline (https://medium.com/airbnb-engineering/using-machine-learning...).

Jupyter can be so much more by going closer to how it fits within a production pipeline versus just competing against RStudio.


There have been a couple attempts to add dashboarding to Jupyter:

https://github.com/jupyter/dashboards was/is a dashboard system built by a team at IBM. I think the project stalled somewhat after IBM stopped funding it.

There are a few long threads in the currently active jupyter repos about building dashboard systems as extensions: https://github.com/jupyterlab/jupyterlab/issues/1640, https://github.com/jupyter-widgets/ipywidgets/issues/2018.


I was searching for such dashboard utility and I found this: https://github.com/oschuett/appmode It may be useful for some of the cases.


> I wish there was some abstraction to generate a Dash like output from Jupyter. There are a lot of people who would pay serious money for that.

Notebook-to-dataviz is the value proposition of Mode Analytics, which has been doing well: https://modeanalytics.com/


Also I always thought notebooks would be a great devops tool, kind of like a super command line that has easily observable steps grouped in chunks and graphical feedback. No one else seems to think so though so maybe I'm wrong.


As a disclaimer/context: I'm a dev on the Azure hosted Jupyter Notebooks product.

You're not wrong! (Or at the least, it's a topic that has come across our ears before, and is something I certainly agree with) Obviously I probably shouldn't go off spouting all the pipe dreams I have in this space, but given that I got my start doing Ops work and tried to keep an eye out for things I might have liked back then, I can assure you you're not alone.

I always saw the similarity foremost as a direct upgrade to the "runbooks"/"firefighting/deploy checklists" that crop up all too often.


Another Azure dev checking in :)

Have you seen Application Insights Workbooks [0]? Basically you can have interactive notebooks and run analytics queries against your telemetry, generate charts, add text cells, etc. It's picking up usage for investigating outages, e.g., have a Workbook with a query that looks at your dependency calls and determines what service is failing + produces a visualization.

Workbooks don't actually execute any external actions, though. It's solely an analysis tool. Runbooks skew the other direction, they are for executing scripts (more or less).

Jupyter/python seems to fit in a nice gap where this could be bridged, especially with the level of existing python support from azure sdk + cli.

PS: a dev from Workbooks has seen Azure Notebooks, and was curious a while back about how he could integrate the functionality [1]

[0]: https://docs.microsoft.com/en-us/azure/application-insights/...

[1]: http://blog.my-is300.com/2017/06/what-i-work-on-application-...


wow, a link to my blog (that second link) made hackernews? that's exciting!

Anyway, yeah, workbooks in appinsights is almost like notebooks for non-programmers? kinda? you string together markdown, parameters, and analytics queries (and very soon metrics across more of azure) into reports. But the parameters stuff lets you do more interactive things to hide/show sections now. i really need to do a new blog post about all the new stuff that's in there that wasn't last june!

i've prototyped some stuff to export an AI workbook to an azure/jupyter notebook, as there's some support for querying analytics already from a python package. there just hasn't been enough demand for it so far (not as much as we expected, anyway?)


How about Emacs + org mode? https://www.youtube.com/watch?v=dljNabciEGg

I thought this video was pretty cool personally!


Very cool! Watching someone else using emacs is always a great learning experience.


I built a platform exactly for this. Check it out here: https://www.nurtch.com/


Wow, it is awesome how you managed to integrate runbooks, Jupyter, and monitoring, well done and great video!


Thanks :)

Just saw that you are CEO of GitLab. Good job making your runbooks public [1]. I converted one of your runbooks into an executable notebook to convey my point. Check the before/after screenshot here: https://blog.amirathi.com/2018/03/27/codify-infra-runbooks-w...

You should consider using Nurtch :)

[1] https://gitlab.com/gitlab-com/runbooks


Native Jupyter has the magic %%bash command, so a lot of this should just be possible, i.e. you start a cell with that, and it will invoke bash to execute it.

More generally, there is %%script <x>, which executes the cell using <x>.

0: http://ipython.readthedocs.io/en/stable/interactive/magics.h...
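Roughly speaking, `%%script <x>` starts `<x>` as a subprocess and feeds it the cell body on stdin. A minimal plain-Python sketch of that idea follows; `run_cell` is a hypothetical helper for illustration, not part of IPython, and it ignores details such as stderr handling and background execution.

```python
import subprocess


def run_cell(program: str, cell: str) -> str:
    """Run `cell` through `program`, approximating what %%script does."""
    # The cell text is sent on stdin, the same way a script piped into an
    # interpreter would be; check=True raises if the program exits non-zero.
    completed = subprocess.run(
        [program],
        input=cell,
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout
```

For example, `run_cell("bash", "echo restarting\nuptime")` behaves much like a notebook cell that starts with `%%bash`, which is what makes a runbook step both readable and executable.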


Yep! We've been experimenting with a variant for security & fraud investigations at Graphistry. Our original experiments with just notebooks resulted in a few people on super early adopter teams getting a kick out of it, but in critical ~operational settings, having to deal with code was... not great. Mostly looked like siloed personal use.

BUT. The core concept is great. We made a form more focused on interactive investigation, so you can jump from an alert in a dashboard into a rich & interactive visual session with pre-wrangled data: https://youtu.be/B3ZZWx9WUEk?t=1m32s . Depending on what you see, can easily pull in more data, or refine what's there. And agreed, I think it'd be fun to try in big devops/netops scenarios!


That makes good sense, in case it helps your confidence.

It's a middle ground between literate programming and a traditional IDE and there is nothing like that aimed at the ops space where it would actually be quite valuable.


emacs' org-mode / org-babel is well suited for it. This blog post [1] has some fantastic examples. The problem, of course, is that it's so closely tied to emacs.

1: http://howardism.org/Technical/Emacs/literate-devops.html


Nice, I just posted the video from that link above ;) Great minds think alike!


Howard's blog and github are an amazing resource; I'm a huge fan of his demo-it package as well. Highly recommend checking it out if you ever give technical presentations or talks.


The R community has a really nice answer for this: https://rmarkdown.rstudio.com/flexdashboard/



