I think as a web professional I would want anything but this. I even tend to dislike using apt-get for every package on my server. Some things need a tender loving hand with careful consideration during configuration (I'm a poet and I didn't even know it).
You can't really build a one-size-fits-all distribution for everyone. Sure you can get a basic LAMP stack setup, but 1) is that what you really want in a PHP situation and 2) not everyone is rolling a simple PHP website these days.
If what you're trying to solve is the problem with "scalability" then a distro isn't gonna help. A carefully constructed architecture and implementation is what will. I personally tend to shy away from a big cluster of apt-get installations in favor of installing certain things straight from source. You can bet that all of the huge sites out there are rolling their own too. Facebook has their own version of Apache and so do companies like Yahoo.
I think something like this might be good for simple in-house development and testing, but again, not everyone's needs are the same. I totally love server configuration, and oftentimes it gives me a real nice break from coding and designing. There is nothing more fun and satisfying than getting a new server up in just a few hours configured to the best of my abilities.
Whateva hopefully these comments make sense. Been up all night and running on adderall and mcdonalds.
You can build your own packages, with whatever tweaks you want built in. The thing is once you get past a certain size, you've got too many servers to be treating them as individuals.
And if you're past 4 machines you really want to start looking at cfengine.
You acknowledge that system administration takes time and effort. If you have more than some 10-15 machines, you ought to automate the work of keeping them in sync. You want to minimize effort of administration.
You install something along the lines of puppet from http://reductivelabs.com and use that to manage your server infrastructure.
You install a monitoring suite for the infrastructure. Ganglia is popular when the servers are forming a cluster.
With the above, any reasonable operating system will work, be it .deb/.rpm based linux distros, *BSD, or OSX (I honestly don't know how to do this properly in Windows).
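To make "keeping them in sync" concrete, here is a rough Python sketch of the desired-state idea that tools in this space are built around. It is not how puppet or cfengine are implemented; the function name, the host names, and the package sets are all made up for illustration. You declare one desired package set and compute, per host, what would have to change.

```python
def plan_convergence(desired, actual):
    """Compute per-host package changes needed to match the desired set.

    desired: set of package names every host should have (hypothetical).
    actual:  dict mapping host name -> set of packages currently installed.
    Hosts that already match are left out of the result.
    """
    changes = {}
    for host, installed in actual.items():
        to_install = sorted(desired - installed)
        to_remove = sorted(installed - desired)
        if to_install or to_remove:
            changes[host] = {"install": to_install, "remove": to_remove}
    return changes


if __name__ == "__main__":
    # Illustrative inventory only; a real tool would gather this from the hosts.
    desired = {"nginx", "ntp"}
    actual = {
        "web1": {"nginx", "ntp"},
        "web2": {"nginx", "apache2"},
    }
    print(plan_convergence(desired, actual))
```

The point is that nothing is ever fixed by hand on one box: you edit the declaration once and every host gets the same diff computed against it.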
I'm actually working on something like this in my free time for Ubuntu flavors. I've been trying to figure out how to get some funding for it so I can focus on it, but I'm not sure if it's a service I should try to charge users for (automated customization of Ubuntu Linux flavor live/install CDs).
I figure that once I have a proof-of-concept I can show it to people who might be willing to invest in it so I could afford to host it somewhere and so on.
Right now I'm focusing on package installation / removal, but I plan on adding the ability to put some stuff in your home folder for first user creation, and upload specific configuration files.
Can a distro really hope to do much more than this apt-get one-liner can?
Yeah, you could install an improved collection of prewired config files for everything so that all the tools are better integrated out of the box. But given that each web 2.0 app probably has a unique configuration, which requires a sysadmin to hand-edit all of those config files anyway, is it really going to save that much time? Configuration space is kind of large. The odds that the prewired configuration is a close match to the one you want may be fairly low.
And if your code doesn't need a custom configuration of servers, but is designed to run on some kind of standard server-farm-in-a-box configuration (with, at most, minor tweaks of a couple of config files), why are you even installing your own distros? Aren't there hosting companies that run farms of standardized boxes that your app can be designed to, and that will handle the provision and administration of those boxes for a monthly fee? Kind of like the Google App Engine business model? I haven't done business with such a company, but I've been presuming they exist. Isn't this how (e.g.) Engine Yard works?
The potential problem with the distro idea is that it sits between these two business models (bespoke setups by your own sysadmin on the one hand, one-stop shopping for standardized architectures, standardized server farms, and standardized sysadmins on the other). Is there much daylight between the two?
Deprec (http://www.deprec.org/) takes you from a clean ubuntu install to a live rails site, and claims to support heartbeat/linux-ha and multiple servers, as well as some misc handy stuff like ntp. (I've only used it to manage a simple single server.)
"It's a limited market." Fantastic! That's the perfect reason to do something like this. Sadly, I don't think anyone will be doing it in one night, but I believe that this is a realistic request. The whole point of Linux (and Unix before it) is to have a group of tools, a toolbox if you will, that can be easily refactored, reorganized and redeployed to fit the problems you face. What he is asking for is a toolbox that comes with the tools he wants from the start, and he can't find a Sears that carries it. The only issue I have with the whole thing is I wish he would do it, not ask others to solve his problems for him.
Which company? One of the companies that already make a distribution of Linux? I'm sorry, but I don't understand whose responsibility you believe it to be to create this distribution.
As the author of the article alluded to, if it were to get made, it would probably get made by hobbyists.
Ya, I was not seeing this as a company thing. I don't believe that a Linux distribution is a smart way to start a company (although it has been done). I think it would be a fun project if I had the time, though.
A larger site can usually afford to staff up in ways that smaller sites can't.
So, someone needs to make a Linux distribution targeted explicitly to those companies that currently spend as little as possible on their infrastructure. I smell a hugely profitable business model.
The point has already been made that the vast majority of "small to medium businesses" do not need scalability (not even a little bit--90% of those businesses could be hosted on a $10/month shared hosting account and never have a performance problem). So, we're talking about a tiny subset of an extremely cost-conscious market. Not only that, for the small businesses that do need scaling (the Twitters and the Justin.TVs), the vast majority of their scaling problems cannot be solved by adding a few applications like memcache and a pre-tuned MySQL to the distribution. Your application has to be built to scale, or you will not scale. The tuning of MySQL is trivial (and case-by-case specific, to boot...there is no way to say any particular MySQL configuration is good for all high performance use cases that balances all of the usual tradeoffs, and as much as there can be such a configuration the one provided by most distros is already pretty close to it).
You could ask me how I know, with some confidence, that web scalability for small to medium businesses is a bad business idea, or I could just blurt out that my first business built web scalability products for small to medium businesses. It took me seven years to figure out that they don't have any money, and that they don't know enough to know to look at the niche solutions for their problems (they call Cisco, or F5 if they're really on the technical ball), so when they do spend money, they're spending it with companies that provide a huge range of products that are sort of ancillary to the task at hand...and also happen to have a few items in their product line that can sort of address the problem, if you squint right and send them enough money. Not that I'm bitter. (Seriously, I'm actually not bitter. Just amused when folks take the "pick a problem that you have, and chances are others will have it too, and there's your perfect business idea!" to an extreme that doesn't quite mesh with reality.)
There may be the seed of a good idea in there somewhere. But building yet another Linux distribution is just dumb as rocks. And I mean that in the nicest possible way.
Yeah, I kind of figured, but I've spent the last three years growing a decidedly unsexy web-app from 1 webserver and two database machines to the small fleet of machines it is today; and if there were an extant distribution that took some of the pain away I would be on top of it. But as you say it's somewhat hard to get people to spend money on server operating systems.
The real problem is that there is too much variation for a widely usable solution to come to the fore. Basic webhosting has things like webmin or (yech!) Ensim, that package up the standard tasks in a nifty web frontend. But managing a heterogeneous network of webservers and databases is still more difficult than it ought to be.
Thanks for the considered answer. I don't think anyone was thinking that this was a viable business; more just people wishing there were a solution to the problems they are dealing with.
and if there were an extant distribution that took some of the pain away I would be on top of it
The problem isn't that there isn't someone out there that would be happy to sell you "scaling in a box". The problem is that scaling is not a generic problem. You can't sell scaling in a box. Your application has to be built to scale, or it will not scale.
Tuning MySQL is trivial. Adding sharding to your application is the hard part, and no automated solution will know how to do that. The Django, RoR, and Catalyst/DBIx folks are all thinking about the problem, and they may be able to do something about it. Replication probably could be made easier. But it's a far less useful solution than it seems on the surface--if there were no latency, you could just use replication. But there is latency, and so you still need application level support to ensure you don't ask for data that doesn't exist yet.
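For a sense of what that application-level support can look like, here is a minimal Python sketch of one common approach: after writing a key, send reads for that key to the primary for a short window, and only fall back to a replica once the window has passed. The class name, the fixed two-second window, and the "primary"/"replica" labels are assumptions for illustration; a real application would have to pick a window that matches its own measured replication lag, or track replication position instead.

```python
import time


class ReplicaAwareRouter:
    """Routes reads to the primary for a short window after a write.

    lag_window_seconds is a guess at worst-case replication lag
    (hypothetical default; measure your own setup).
    """

    def __init__(self, lag_window_seconds=2.0, clock=time.monotonic):
        self.lag_window = lag_window_seconds
        self.clock = clock
        self.last_write = {}  # key -> time of the most recent write

    def record_write(self, key):
        # Call this whenever the application writes `key` to the primary.
        self.last_write[key] = self.clock()

    def choose_for_read(self, key):
        # Recently written keys may not have reached the replicas yet,
        # so read them from the primary until the window has elapsed.
        written_at = self.last_write.get(key)
        if written_at is not None and self.clock() - written_at < self.lag_window:
            return "primary"
        return "replica"
```

Note that this only works because the application knows which keys it just wrote. That knowledge lives in your code, which is exactly why a distro or a database setting can't supply it for you.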
Basic webhosting has things like webmin or (yech!) Ensim
I'm really glad there was no "yech!" for Webmin, as I take that sort of thing personally. (Check my profile for why. And, we are thinking about these sorts of problems, and we're addressing the ones that can logically be addressed in a reasonably generic way.)
But managing a heterogeneous network of webservers and databases is still more difficult than it ought to be.
We're working on this, too (as are a couple dozen of our competitors). But scaling your application is still your problem to solve. While one could build (and Jamie and I are building) an application to manage multi-server deployments, no product can generically make an app scale.
Thanks for the considered answer. I don't think anyone was thinking that this was a viable business; more just people wishing there were a solution to the problems they are dealing with.
My feeling is that it's coming at it from way off in the wrong direction (another niche Linux distro...again, just couldn't be a dumber idea), not that there won't be money spent on scaling issues and multiple server related issues. I'll also point out that from where I sit (which I humbly submit is a position of reasonable knowledge of the industry) scaling is not the biggest problem facing hosting providers--not even close. They simply are not being asked to solve scaling problems--us nerds love to talk about scaling. It feels good to think about all those millions of requests that we'll be serving when we can scale up (ah, those magic words!), but in reality, the average website gets a handful of visitors a day and the average webserver could host 250 average websites without breaking a sweat.
But, a side effect of all of the work that is going into more efficiently making use of all the machines in a data center is that a few scaling problems will go away or be mitigated. Ramping up new "servers" in a cloud environment becomes trivial enough to do on-demand--when you need ten servers, you ask for them, and when you don't need them anymore, you drop them. Databases are getting more "cloud" oriented. SimpleDB, Hadoop, BigTable, CouchDB, are all looking at the database problem from a functional zero-side effects perspective, and the end result is a system where related queries can be made in parallel and sane results can be expected.
So, there are solutions to a lot of these kinds of problems coming...but they aren't what you expect (or even know you want yet) and they aren't drop-in "MySQL now goes ten times faster" solutions (nor are they, "every new MySQL server linearly increases my apps performance and I didn't have to do anything to my app"). Those kinds of silver bullets just do not exist, and won't exist.
Of course the first thing that is going to happen is that you are going to have a batch of arguments:
Xen vs. VMWare vs. $virtualiser
cfengine vs. bcfg2 vs. $(cool_haskell_config_that_will_be_cool_as_soon_as_it_is_done)
dpkg vs. yum vs. rpm vs. $packagemanagerthatdoesnotsuck
Ubuntu vs. Fedora vs. $(my_favorite_distro) vs. BSD
emacs vs. vi #this one may be orthogonal
It would be nice if we had something that just routed around all of those arguments and helped to set up a straightforward way to manage multiple images as part of one security domain, with monitoring and rekeying baked in.
I don't understand: eucalyptus looks like an open-source implementation of EC2, so isn't the end-result a bunch of blank virtual boxes? Its server management system might make it easy to load new slices with your custom disk image, but if you have that image, why not just put it on EC2 and save yourself the hassle of running a utility company in your spare time?
(Loading pre-built images to ec2 seems like a good answer to the op's question; I know I've seen some built-to-purpose images around.)
eucalyptus is only one part of what Steve seems to be asking for.
Ideally you'd have one location where you define the services you're running and the variables you're monitoring and you'd be able to do things like rekey the entire cluster in one operation, set ACLs for resources that exist on multiple locations, schedule batch jobs by priority and deadline; and do all this while totally abstracting away CPUs, filesystems, networks and all of the messy and bothersome failure-prone hardware.
I don't understand exactly which version of Linux he's been deploying that makes this a time saver. You can install Linux with 95% of what you need (and nothing more) with plenty of distros, and one command gets you the rest of the way.
The rest of the work is custom (user access, software specific requirements, etc). I'm also the lead sys-admin for a dot-com company, and I think I can safely say I've NEVER had a Linux install that wasn't, in his words, something that I "can just deploy, configure and turn off what I don't need?"
I just had either a brilliant or a dumb (take your pick) idea that has sprouted from this thought.
Wouldn't it be cool if there was a site where you could just click the options you wanted and it would create a customised distro for you, without all the added extras?
I'm no linux guru and I dread the thought of having to set up a new system under linux. If there was a site where I could just pay a couple bucks for a customised distro for what I wanted (as it's not always the same thing) that just worked, I would absolutely pay it.
I think this kind of approach may also help to increase the adoption rate of linux amongst less technically inclined people too.
EDIT - Sorry about the stream of consciousness; I just noticed someone supplied a link for SUSE. Is there anything similar for Debian-based systems? Ubuntu is undoubtedly getting the "lion's share" of attention for linux systems, so is there anything like this for it?
Their "conary" package management system (developed by at least one of the same folks that were involved in RPM, presumably with a reasonable amount of confidence that RPM didn't really get it right) is designed for the construction of specific purpose distributions. It, obviously, hasn't taken the world by storm, so they seem to have evolved in a different direction of late, and also changed their name I'm pretty sure--though I can't remember what they used to be called (I could only remember the name "conary"). Anyway, technologically speaking conary is awe-inspiring. But, I'm not using it...so obviously, it's not providing value that I can figure out a use for.
Well, for our web-services server, stock Ubuntu Server and a 50 line script does the trick. I have one tarball that I scp over, and it handles setting up accounts, installing the right packages, applying a config-files diff, setting up rc.local and starting the web-services.
I'm planning on setting it up eventually so that it'll automatically clone from the most recent backup of our databases.
I wished there were tools to manage such a roll-out, and I looked for them, but really, it was only a few hours of work to get everything set up. VMWare is your friend there for testing the script, fixing, rolling back to the fresh-installed point, wash, rinse, repeat.
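For anyone wondering what a roll-out script like that boils down to, here is a hedged Python sketch that only builds and prints the list of commands rather than running them. This is not the poster's actual script: the user name, package names, patch path, and service name are placeholders, and the rc.local step is left out. The commands themselves (useradd, apt-get install, patch, service) are the stock Ubuntu ones.

```python
import shlex


def build_plan(users, packages, config_patch, services):
    """Return the shell commands a provisioning run would execute, in order.

    All arguments are illustrative placeholders, not a real deployment.
    """
    plan = []
    for user in users:
        plan.append(["useradd", "--create-home", user])
    if packages:
        plan.append(["apt-get", "install", "-y", *packages])
    # Apply a diff of config files relative to the filesystem root.
    plan.append(["patch", "-p1", "-d", "/", "-i", config_patch])
    for service in services:
        plan.append(["service", service, "start"])
    return plan


def render(plan):
    """Render the plan as one shell-quoted command per line (a dry run)."""
    return "\n".join(shlex.join(cmd) for cmd in plan)


if __name__ == "__main__":
    plan = build_plan(["deploy"], ["nginx", "memcached"], "/tmp/etc.diff", ["nginx"])
    print(render(plan))
```

Printing the plan first is handy for the VM snapshot loop: you can read exactly what will run before you let it touch the fresh install.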
I think that what software you use has little to do with scaling a huge site. The biggest obstacle is how your data is organized and accessed. This is different for every application. If you can't get the data schema right then what software you have doesn't matter. That's assuming you're huge. If you're not huge then just build something and don't lose focus trying to build something that can handle 100 million users when you have 100.
It lets you create customised OpenSuSE-based images for deployment/installation. It would be fairly easy to construct the distro you've got in mind, you'll just get the added benefit of having everything kept up to date for you.
Will every startup have the same requirements? What if I want nginx instead of apache and so forth? You are always going to want some customization. If you are on EC2 then you can just create an AMI, and I think there is some way to save these distributions otherwise.