I'm working on building an educational environment with Jupyter, and I'm interested in the multiple-hub setup.
A few basic questions:
Why multiple hubs? (Was there some point of scale where you needed this?)
Did multiple hubs allow you to have better migrations? (Where you drain one and move it over to the other?)
Totally agree that state is the enemy of scale, so having a separate service backing your storage independent of what hub you're on seems like a big win.
A few reasons for this, most of which are related to points you mentioned:
1. Having multiple hubs makes it much easier to do zero-downtime deploys.
2. Having multiple hubs makes us more resilient to transient machine failures.
3. We were worried that having a single proxy for all our notebook traffic might become a system-wide bottleneck. Notebooks with a lot of images can get pretty large, and at the time we were rolling this out JupyterHub was pretty new. We weren't sure how well it was going to scale (the target audience for the JupyterHub team at the time was small labs and research teams), so it seemed safest to aim for horizontal scalability from the start. The JupyterHub team has since done a lot of awesome performance work to support the huge data science classes being taught at UC Berkeley, so it's possible that a single hub with the Kubernetes spawner could handle our traffic today, but given points (1) and (2) plus the fact that we already have a working system, I don't have much incentive to find out :).
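As a rough illustration of points (1) and (2), here is a minimal sketch of routing users across several hubs, where one hub can be drained for a deploy while the others keep taking new sessions. Everything in it (the `HubPool` class, the hub names, the hash-based assignment) is hypothetical: it is not the system described above and not a JupyterHub API. It also assumes notebook storage lives in a shared service, so a user can land on any hub.

```python
import hashlib


class HubPool:
    """Toy router that assigns users to one of several hubs (hypothetical)."""

    def __init__(self, hubs):
        self.hubs = list(hubs)
        self.draining = set()

    def drain(self, hub):
        # Stop sending new sessions to this hub, e.g. before redeploying it.
        self.draining.add(hub)

    def restore(self, hub):
        # Put the hub back into rotation once the deploy is done.
        self.draining.discard(hub)

    def route(self, user):
        # Only hubs that are not draining may receive new sessions.
        available = [h for h in self.hubs if h not in self.draining]
        if not available:
            raise RuntimeError("no hubs available")
        # Stable hash: the same user maps to the same hub while the pool is unchanged.
        digest = hashlib.sha256(user.encode("utf-8")).digest()
        return available[int.from_bytes(digest[:8], "big") % len(available)]


pool = HubPool(["hub-a", "hub-b", "hub-c"])
pool.drain("hub-b")          # take hub-b out of rotation for a deploy
print(pool.route("alice"))   # one of hub-a / hub-c, never hub-b
```

The same mechanism covers a transient machine failure: marking the failed hub as draining sends its users to the remaining hubs on their next request.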
That's great, thanks! I was also curious if you hit scale issues on just one hub. I agree, it's best practice to not have all your eggs in one basket. I'd love to see an HA hub where this would be all taken care of for me, but hopefully by the time we go live we'll have this.
Are the cell sharing extensions mentioned in the slides open sourced? (Sorry if it says either way in the video, I haven't had a chance to watch it in full yet.) The lack of sharing / collaboration extensions for Jupyter Notebooks / Lab is still a weak point, I think.
The sharing machinery isn't open source, mostly because it's pretty tightly coupled to our community forums, which are a custom Rails application.
I know that the JupyterHub team was working on https://github.com/jupyterhub/hubshare for a while as an open source sharing solution. I've also commented in https://github.com/jupyterhub/hubshare/issues/14 and elsewhere that I think PGContents (one of the libraries I talk about in the video) could be used as a basis for many kinds of sharing (though probably not realtime collaboration).
I gave an earlier version of this talk at JupyterCon 2017 https://www.youtube.com/watch?v=TtsbspKHJGo, which captured my full screen output. The pgcontents demo starts around 19:30 in that video.
Come on. Quantopian doesn't have 100k users... maybe 5k?
EDIT: Here we go again... downvoted, then probably flagged and reprimanded by mod dang or something. Sigh.
I actually spent hundreds of hours on Quantopian, and from the activity in the forums you wouldn't think it has 100k users. Either that, or it's the most muted community on the internet.
There have been a couple of attempts to add dashboarding to Jupyter:
https://github.com/jupyter/dashboards was/is a dashboard system built by a team at IBM. I think the project stalled somewhat after IBM stopped funding it.
Also, I always thought notebooks would be a great devops tool, kind of like a super command line that has easily observable steps grouped in chunks and graphical feedback. No one else seems to think so, though, so maybe I'm wrong.
As a disclaimer/context: I'm a dev on the Azure-hosted Jupyter Notebooks product.
You're not wrong! (Or at the least, it's a topic that has reached our ears before, and it's something I certainly agree with.) Obviously I probably shouldn't go off spouting all the pipe dreams I have in this space, but given that I got my start doing Ops work and have tried to keep an eye out for things I might have liked back then, I can assure you you're not alone.
I always saw the similarity foremost as a direct upgrade to the "runbooks"/"firefighting/deploy checklists" that crop up all too often.
Have you seen Application Insights Workbooks [0]? Basically you can have interactive notebooks and run analytics queries against your telemetry, generate charts, add text cells, etc. It's picking up usage for investigating outages, e.g., have a Workbook with a query that looks at your dependency calls, determines what service is failing, and produces a visualization.
Workbooks don't actually execute any external actions, though. It's solely an analysis tool. Runbooks skew the other direction: they are for executing scripts (more or less).
Jupyter/Python seems to fit in a nice gap where this could be bridged, especially with the level of existing Python support from the Azure SDK + CLI.
PS: a dev from Workbooks has seen Azure Notebooks, and was curious a while back about how he could integrate the functionality [1]
wow, a link to my blog (that second link) made hackernews? that's exciting!
Anyway, yeah, workbooks in appinsights is almost like notebooks for non-programmers? kinda? you string together markdown, parameters, and analytics queries (and very soon metrics across more of azure) into reports. But the parameters stuff lets you do more interactive things to hide/show sections now. i really need to do a new blog post about all the new stuff that's in there that wasn't last june!
i've prototyped some stuff to export an AI workbook to an azure/jupyter notebook, as there's some support for querying analytics already from a python package. there just hasn't been enough demand for it so far (not as much as we expected, anyway?)
Just saw that you are CEO of GitLab. Good job making your runbooks public [1]. I converted one of your runbooks into an executable notebook to convey my point. Check the before/after screenshot here: https://blog.amirathi.com/2018/03/27/codify-infra-runbooks-w...
Native Jupyter has the magic %%bash command, so a lot of this should just be possible, i.e. you start a cell with that, and it will invoke bash to execute it.
More generally, there is %%script <x>, which executes the cell using <x>.
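To make the `%%script` idea concrete, here is a rough plain-Python approximation of what such a cell magic does: it hands the cell body to an external interpreter on stdin and collects the output. The `run_cell` helper is hypothetical and is not IPython's implementation; in a notebook you would just start the cell with `%%bash` or `%%script <x>`.

```python
import subprocess
import sys


def run_cell(interpreter, body):
    """Run `body` through `interpreter`, feeding it on stdin (hypothetical helper)."""
    result = subprocess.run(
        [interpreter],
        input=body,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout


# Uses the current Python interpreter so the sketch does not depend on bash
# being installed; run_cell("bash", "echo hello") would be the analogue of
# a %%bash cell.
print(run_cell(sys.executable, "print(6 * 7)"))
```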
Yep! We've been experimenting with a variant for security & fraud investigations at Graphistry. Our original experiments with just notebooks resulted in a few people on super early adopter teams getting a kick out of it, but in critical ~operational settings, having to deal with code was... not great. Mostly looked like siloed personal use.
BUT. The core concept is great. We made a form more focused on interactive investigation, so you can jump from an alert in a dashboard into a rich & interactive visual session with pre-wrangled data: https://youtu.be/B3ZZWx9WUEk?t=1m32s . Depending on what you see, you can easily pull in more data, or refine what's there. And agreed, I think it'd be fun to try in big devops/netops scenarios!
That makes good sense, in case it helps your confidence.
It's a middle ground between literate programming and a traditional IDE, and there is nothing like that aimed at the ops space, where it would actually be quite valuable.
Emacs' org-mode / org-babel is well suited for it. This blog post [1] has some fantastic examples. The problem, of course, is that it's so closely tied to Emacs.
Howard's blog and GitHub are an amazing resource; I'm a huge fan of his demo-it package as well. Highly recommend checking it out if you ever give technical presentations or talks.
Also, happy to answer any questions that people might have.