How Heroku Works

benhalllondon · on July 17, 2013

Re: The Rap Genius Question:

"""A random selection algorithm is used for HTTP request load balancing across web dynos""

bit more....

"""and this routing handles both HTTP and HTTPS traffic. It also supports multiple simultaneous connections, as well as timeout handling."""

davidbanham · on July 17, 2013

""""If you’re using some established framework, Heroku can figure it out. For example, in Ruby on Rails, it’s typically rails server, in Django it’s python <app>/manage.py runserver and in Node.js it’s node web.js."""

Been working in node for years now, have never named anything web.js. Why not index.js, server.js or app.js?

tlrobinson · on July 17, 2013

Yeah I think that's a relic of their very early Node.js beta. Now they look for package.json (https://github.com/heroku/heroku-buildpack-nodejs/blob/maste...) and you define the command to start the app in the Procfile.

jonmountjoy · on July 17, 2013

Dang - thanks for catching. Fixed.

sksksk · on July 17, 2013

You define the start command in the package.json anyway, so presumably Herkou can just read that?

bradhe · on July 17, 2013

Does...it really matter?

happywolf · on July 17, 2013

Not trusting a company that has a record of inaccurate documentations, and a record of ignoring customer complaints until got hammered hard (enough)

psychometry · on July 17, 2013

Well, they're obviously working on improving their documentation and have made public statements about some of the high-profile complaints. Is that not what you want? Or would you prefer they do nothing so that you can continue making tired accusations against them?

happywolf · on July 17, 2013

How do you know they document accurately this time? 'Obviously' as compared to what? Except if you have internal access to their source code and be able to verify against the documentation.

I am always happy to give honest mistakes a chance, but not to a company who ignored customers complaints for years, sold inaccurate tools for big bucks, and tried to obfuscating documentation to hide the change of algorithm in request routing at dyno level. Sorry, I just find these acts are bordering evil, and my money will go to other businesses. If you are not so sure what I am talking, read the following link and see how Heroku's CTO and COO responded to complaints.

http://news.rapgenius.com/Lemon-money-trees-rap-genius-respo...

positr0n · on July 17, 2013

    For example, dynos are cycled at least once per day

Why so often? I wonder if this is just to protect against memory leaks leading to poor performance from poorly coded apps or if it's an issue with their infrastructure.

derefr · on July 17, 2013

I would bet it's horizontal migration to "compact" dynos down onto fewer servers during off-hours, so they can terminate some of their EC2 instances. Basically, generational garbage collection performed on VM containers.

CoffeeOnWrite · on July 17, 2013

For one thing, it demonstrates empirically that the app can be successfully restarted.

bifrost · on July 17, 2013

I have yet to see one rails app that doesn't leak memory like crazy, this is the easiest way to maintain sanity :)

ryan-allen · on July 17, 2013

Stop using RMagick!

bifrost · on July 18, 2013

Trust me, I don't use rails at all anymore.

_pctq · on July 17, 2013

> Terminology: Dynos are isolated, virtualized Unix containers, that provide the environment required to run an application.

... also known as "process".

codebeaker · on July 17, 2013

Actually I believe the containers are implemented with something similar to http://lxc.sourceforge.net/ which is more akin to a process level chroot, and kernel resource limits. This also the same technology that fundamentally underpins Docker and some other similar super-lightweight virtualisation technologies. With lxc and buildroot, for example it's trivial to create a self-contained 20Mb image for running a PostgreSQL server. With all the security guarantees [sic] that would be provided by paravirtualisation.

_pctq · on July 17, 2013

(this reply is for jschulenklopper too)

Certainly, it was provocative of me to imply it was that simple : when you have to deal with processes at this scale, you can't do it without any sandboxing if you want to avoid users messing with one another.

It's still the basic scheme, though : when you rent dynos, you actually rent processes. That's the heroku business model (with all the deployment sugar on top of it, of course), and I'm always surprised people are ok to pay that much to rent processes.

Of course, one could argue you can spawn multiple processes in a dyno. But 1. thoses processes have to consume very low memory because of low memory limits of dynos and 2. this could lead to boot up nightmare.

Before the "rap-genuis-gate", the company I worked for then had scalability issues with heroku, being the later described router random timeout problem. Our dyno costs were already very high, even when using hirefire, so adding more dynos sounds not a good option.

We tried to replace thin with unicorn, which was later the proposed solution from heroku. But even with two unicorn workers, we reached memory limits. Even worse : when dyno booted unicorn master process, it considered itself ready, while unicorn was still spawning its children, so it ends up with more failed requests.

Bottom line is : using (and paying for) a heroku dyno is basically the same as using a process on a dedicated server. That may not be a problem, provided heroku clients are aware of that, and are not sold dynos as if it was what we all know as virtual servers.

jschulenklopper · on July 17, 2013

> when you rent dynos, you actually rent processes

True; more dynos, more processing capacity. I was replying to the suggestion that dynos are processes, but they aren't. Agreed, in common discussions this distinction doesn't matter much, and you correctly mentioned that it is possible to run multiple processes in one dyno.

> using (and paying for) a heroku dyno is basically the same as using a process on a dedicated server

Partly true. Using an Heroku dyno is basically the same as getting a small (although 2X dynos have 1 GB memory allocated) temporary virtual server in which a limited number of processes run, often just one. No running process equals to no dyno, I think.

Paying for an Heroku dyno is basically the same as renting (and releasing) a small virtual server for the duration of a running process (and for possibly very short periods of time). That is especially true for one-off dynos: a dyno only for the duration of one command, but those are charged just like web or worker dynos.

Normally, we aren't paying for processes on (dedicated or virtual) servers, but are buying the server capacity whatever we do with it (limited by the server resource allocation).

So that is indeed one of the differences between Heroku's PaaS solution and other IaaS offerings (and we both agree on): paying for the (running) processes instead of paying for the capacity made available for you.

_pctq · on July 17, 2013

> Paying for an Heroku dyno is basically the same as renting (and releasing) a small virtual server for the duration of a running process (and for possibly very short periods of time)

> Normally, we aren't paying for processes on (dedicated or virtual) servers, but are buying the server capacity whatever we do with it (limited by the server resource allocation).

That's one of the thing that makes me say we pay for processes, on heroku (even if it's indeed oversimplified). A dedicated or virtual server is not only about process and memory. You can have access to sendmail on it, handle uploads, store long and exhaustive logs, perform specific task via third party application (like resizing images), ssh to machine if any problem need deep inspection, etc.

The heroku model is about extremely specialized services. Whatever you need that is not processor share or memory needs an external addon. I'm ok with that, but it has to be stated clearly. It's not at all like if you had a standard virtual server.

Your stance "Paying for an Heroku dyno is basically the same as renting (and releasing) a small virtual server for the duration of a running process" would perfectly do. Way better than heroku's one "Dynos are isolated, virtualized Unix containers, that provide the environment required to run an application" which could let think that it's a regular virtual server. Well, sendmail, for me, belongs to an "environment required to run an application".

jt2190 · on July 17, 2013

  > ... getting a small... temporary virtual server in which 
  > a limited number of processes run, often just one.

Is it "often just one" process running because of some limit enforced by Heroku, or is it simply because, well, many people don't bother to spawn additional processes? Do additional processes "wait" for other processes to finish before proceeding? (I'm about to run another experiment on Heroku...)

jschulenklopper · on July 17, 2013

Heroku enforces a 512 MB memory limit (or 1024 MB on the 2X dynos), so that limits the practical number of processes running in the same dyno... especially if those are (heavy) full-stack web applications. Additionally, processes share the same CPU, so you'll not get the same performance as two processes in their own dynos.

See https://devcenter.heroku.com/articles/process-model and https://devcenter.heroku.com/articles/rails-unicorn. Unicorn forks multiple OS processes within each dyno to allow a Rails app to support multiple concurrent requests. Generally no more than 3-4 processes will fit in one dyno.

jschulenklopper · on July 17, 2013

No, dynos are 'containers', not processes. Those run inside the dynos. From the documentation at https://devcenter.heroku.com/articles/dynos: "The commands run within dynos include web processes, worker processes".

jonmountjoy · on July 17, 2013

Yes absolutely. A process is something that can run within a dyno. The root process is typically the one defined in the Procfile. It may spawn sub-processes, all running within the same dyno.

tlrobinson · on July 17, 2013

Here's my outsider's analysis of how Heroku works (well, worked in 2011) from a couple years ago: https://www.quora.com/Scalability/How-does-Heroku-work

hk__2 · on July 17, 2013

Here is the full link, for those that are not on Quora: https://www.quora.com/Scalability/How-does-Heroku-work/answe...

diminish · on July 17, 2013

Has the whole or parts of the architecture been made available as open source?

I want to create a container based PAAS architecture myself, based on docker etc. Does anyone recommend a ready made solution similar to heroku, to serve ruby web apps.

opdemand · on July 17, 2013

Deis will be released soon. It's an open-source PaaS based on Docker & Chef, with a user experience modeled after Heroku. You create a "formation" which contains a configurable number of backends and Nginx proxies. After you git push your app to the formation, you can scale web=N worker=N just like Heroku. Deis will automatically balance Docker containers across the backends, reconfigure routing, etc.. all using Chef. The goal is to provide a Heroku-like platform where you control everything: Chef Server, PaaS controllers, hosting providers, routing layer, etc..

zeckalpha · on July 17, 2013

https://github.com/ddollar/foreman is what is used for the local testing, but also for the actual running of the app on the Cedar stack. One could use Foreman and Docker and AWS to make a pretty reasonable PaaS.

(I'm working on a S3 static-site-only version of Heroku so people don't have to serve static sites behind 3 layers of web servers.)

bradhe · on July 17, 2013

> S3 static-site-only version of Heroku

You mean...a CDN?

zeckalpha · on July 17, 2013

No. I mean a command line tool mimicking the Heroku Toolbelt API but for static sites.

bradhe · on July 17, 2013

Ah, s3cmd then!

zeckalpha · on July 17, 2013

s3cmd is to ec2 as what I'm working on is to Heroku.

bradleyg_ · on July 17, 2013

This looks promising https://github.com/progrium/dokku and even uses Heroku buildpacks.

tlrobinson · on July 17, 2013

Dokku is cool, but its a small piece of a full PaaS platform like Heroku. It doesn't scale beyond a single "dyno"/machine or do multitenancy.

znowi · on July 17, 2013

Since Heroku's Ugly Secret I don't trust this company.

http://news.rapgenius.com/James-somers-herokus-ugly-secret-l...