Azure, EC2, and Google Cloud are overpriced in most cases. You can do the same with cheaper (and often similarly reliable) VPSes managed with tools like Puppet/Chef, Consul, Vault, Docker, etc. Plus you avoid lock-in. Your stuff is yours and can be deployed anywhere.
I don't see the allure of the costlier cloud other than the "nobody ever got fired for" factor so common in enterprise purchasing. Amazon is the only one that might have a stronger case for it based on its huge managed service stack, but much of that is not too terribly hard to duplicate with other tools and more a la carte services. There also really isn't a reason you can't use some of Amazon's stuff while also using more commodity options.
On a more principled level I'm starting to see huge proprietary cloud as a potential threat to the open Internet. It's not quite there yet but at some point I could see it, especially with the walled garden plays you see around IoT.
One unique thing about Google Cloud is that most managed services like Load Balancer, PubSub, Datastore, BigQuery, etc. do not charge you for variability and high availability. AWS and Azure WILL charge you 10x to scale up and another 3x for redundancy. Because Google's managed services are often based on Google's internal stack, they just scale. Good luck scaling Kafka to millions of messages per second; with PubSub you get it out of the box. PubSub, BigQuery, and others are geographically highly available out of the box. These things are difficult to replace on EC2, and nearly impossible on players like Digital Ocean.
Edit: BigQuery, for example, allows you to rent 10,000 cores for 5 seconds at a time. This type of stuff is impossible to do with VMs at all.
Not really related to your bigger point, which I have no opinion on, but Kafka and PubSub have different delivery contracts, and Kafka's are generally stricter. Comparing the scalability of the two is therefore somewhat problematic.
Can you elaborate on that? PubSub is a fully-managed service, which means that Google SREs are on call making sure things are up. In addition, Pubsub has "guaranteed at-least-once message delivery". In a sense, Google's SREs guarantee delivery.
PubSub is also a GLOBAL service. Not only are you protected from zone downtime, you are protected from regional downtime. Is there an equivalent to this level of service anywhere in the world?
I'm not too familiar with Kafka's fully managed service, but Kafka-on-VM is a whole other ball game. YOU manage the service. YOU guarantee delivery, not Kafka.
Kafka promises strictly ordered delivery; PubSub promises mostly ordered delivery. The difference between those promises is what drives PubSub's ability to scale throughput and offer global availability.
From an availability standpoint, I don't disagree with anything you mention, but the difference between the consistency models means that PubSub is solving a different set of problems than Kafka, thus my opinion that comparing them is problematic.
That's a fair point. But remember, Kafka promises this as long as the underlying VM infrastructure is alive and well. PubSub completely removes this worry, or even the concept of VMs.
There are several ways to look at it, but I'd opine that a "mostly ordered" fully-managed truly-global service that's easy to unscramble on the receiving end is more "guaranteed" than something that is single-zone and relies on the health of underlying VMs that YOU have to manage.
edit: Kafka and PubSub have a lot of overlap, but they each have qualities the other one doesn't. I suppose you gotta choose which qualities are more important for you.
If you can design your protocol such that it can work in a mostly ordered fashion, I'd highly recommend that you do. It opens up your choices for technology stack tremendously. But, if you require ordered delivery, your choices start shrinking dramatically.
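To make "works in a mostly ordered fashion" concrete, here is a minimal sketch of a receiver that tolerates duplicates and reordering. It assumes something the thread does not specify: that a single publisher stamps each message with a consecutive sequence number. The `Resequencer` class and its names are hypothetical and not part of any Pub/Sub or Kafka API, and the buffer is unbounded, which a real system would have to cap.

```python
from typing import Callable, Dict, Generic, TypeVar

T = TypeVar("T")


class Resequencer(Generic[T]):
    """Turn an at-least-once, mostly ordered stream into an in-order, deduplicated one.

    Assumption (not from the thread): every message carries a consecutive
    integer sequence number assigned by a single publisher.
    """

    def __init__(self, handler: Callable[[T], None], first_seq: int = 0) -> None:
        self._handler = handler
        self._next_seq = first_seq        # next sequence number we may release
        self._pending: Dict[int, T] = {}  # early arrivals, keyed by sequence number

    def receive(self, seq: int, payload: T) -> None:
        # A duplicate of something already released or already buffered: drop it.
        if seq < self._next_seq or seq in self._pending:
            return
        self._pending[seq] = payload
        # Release the longest contiguous run starting at the next expected number.
        while self._next_seq in self._pending:
            self._handler(self._pending.pop(self._next_seq))
            self._next_seq += 1


delivered = []
resequencer = Resequencer(delivered.append)
# Messages arrive out of order and with redeliveries.
for seq, payload in [(1, "b"), (0, "a"), (1, "b"), (3, "d"), (2, "c"), (0, "a")]:
    resequencer.receive(seq, payload)
print(delivered)  # ['a', 'b', 'c', 'd']
```

The cost is that one missing message stalls everything behind it, which is roughly the trade strict ordering makes inside the broker anyway.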
Also, just so we are on the same page. Kafka is a software product that can be run on hardware or VMs, not a managed service. Possibly, you are thinking of the Amazon Kinesis product which does offer a managed service with strict ordering.
No confusion on second point. My argument was that Kafka adds significant complexity and delivery risk because it's software that you must run on hardware/VMs, rather than a fully-managed service. You have to pay a whole lot of eng time to make Kafka truly "guaranteed delivery" because there's always risk of underlying hardware/VM/LB dying.
Pubsub guarantees delivery regardless of what happens with underlying infrastructure. In a sense, the bar has been raised dramatically.
> PubSub is also a GLOBAL service. Not only are you protected from zone downtime, you are protected from regional downtime. Is there an equivalent to this level of service anywhere in the world?
Could you point to some of the documentation that describes its reliability model and SLA in more detail? I glanced through the documentation and couldn't find any information about this.
It seems like a service that has this kind of global availability would have to make a trade-off in latency for writes and potentially reads. If it's a multi-region service, then all writes need to block until they're acknowledged by at least a second region, right? It seems like that will add latency to every request and may not necessarily be a good thing. Similarly, at read time, latency could fluctuate depending on which region you query, and whether your usual region has the data yet. I'm just speculating though, not having read any more about the service. It does sound nice to have the choice to fall back to another region and take the latency hit, instead of an outage. On the other hand, regions are already highly available at existing cloud providers (with zones being a more common failure point).
Is PubSub mature? The FAQ suggests that you should authenticate that Google made the requests to your HTTPS endpoint by adding a secret parameter, rather than relying on any form of HTTP-level authentication.
> If you additionally would like to verify that the messages originated from Google Cloud Pub/Sub, you could configure your endpoint to only accept messages that are accompanied by a secret token argument, for example, https://myapp.mydomain.com/myhandler?token=application-secre....
This feels rather haphazard. If I'm exposing an HTTPS endpoint in my application that will trigger actual behavior upon the receipt of an HTTP request, then of course I "would like to verify that the messages originated from Google Cloud Pub/Sub", so that they're not coming from some random bot or deliberate attacker who happened to learn my URL.
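For what it's worth, the check the FAQ describes is small. Below is a rough sketch of it, assuming the handler can see the full request URL. The function name, the example host, and the token value are all made up for illustration, and nothing here is a Pub/Sub client API. It only shows the shared-secret comparison, done in constant time so response timing does not leak how much of a guess was right.

```python
import hmac
from urllib.parse import parse_qs, urlsplit


def has_valid_token(request_url: str, expected_token: str) -> bool:
    """Return True only if the URL carries exactly one ?token= value that matches."""
    query = parse_qs(urlsplit(request_url).query)
    supplied = query.get("token", [])
    if len(supplied) != 1:
        return False
    # compare_digest takes the same time whether or not the prefixes match.
    return hmac.compare_digest(supplied[0].encode(), expected_token.encode())


# Hypothetical endpoint and secret, for illustration only.
print(has_valid_token("https://example.invalid/handler?token=s3cret", "s3cret"))
print(has_valid_token("https://example.invalid/handler?token=guess", "s3cret"))
print(has_valid_token("https://example.invalid/handler", "s3cret"))
```

This is still only a shared secret in a URL, so it can end up in access logs and proxies, which is part of why it feels haphazard.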
I didn't see anything in the docs that touches on those subjects in detail (I did skim the docs looking for sections and pages that might contain answers to my questions before I posted), but please point me to the page that does if you know of one and I'd be interested to read it! I trust that your perceptions and information are accurate, but cite-able and reference-able information is also valuable.
"For the most part Pub/Sub delivers each message once, and in the order in which it was published. However, once-only and in-order delivery are not guaranteed: it may happen that a message is delivered more than once, and out of order."
If you have a SQL query that takes 50,000 core-seconds, it's probably more useful to execute that query using 10,000 cores in 5 seconds rather than 10 cores and 5000 seconds, especially if cost is the same. Even better if you never have to spin up a VM or worry about scale. This benefit is tangible and applicable to anyone who runs SQL. The reason this isn't prevalent is because it's economically and technologically prohibitive. BigQuery tips that scale in the other direction.
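The arithmetic behind that comparison, as a tiny sketch. It assumes the query parallelizes perfectly, which is an idealization: real queries have serial stages and coordination overhead. The function name is just for illustration.

```python
def wall_clock_seconds(core_seconds: float, cores: int) -> float:
    """Idealized wall-clock time, assuming the work splits evenly across cores."""
    return core_seconds / cores


print(wall_clock_seconds(50_000, 10_000))  # 5.0
print(wall_clock_seconds(50_000, 10))      # 5000.0
```

Same 50,000 core-seconds either way, but 5,000 seconds is about 83 minutes of waiting versus 5 seconds.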
Point is, higher-level cloud-native services unlock very interesting use cases that are applicable for both small-scale startups and large companies, use cases that are impossible with just VMs.
I'm not really disagreeing (much), but very few things fit those criteria. More common are simple problems so overengineered that they sprawl across two Amazon availability zones when a straightforward implementation could serve the whole customer base off a $20 a month VPS. This is more depressingly common than you think. Also depressingly common is a 50,000 CPU-second operation that could be a 1 CPU-second operation with a few indexes and a smarter algorithm. AWS adds a lot of carbon to the atmosphere cranking through crap code. Trust me, I've seen it.
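As a small illustration of the "few indexes" point, here is a sketch using SQLite from the Python standard library. The `orders` table, its columns, and the index name are invented for the example. It does not measure time, it only shows the query plan changing from a full table scan to an index lookup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 1000, float(i)) for i in range(10_000)],
)

QUERY = "SELECT total FROM orders WHERE customer_id = ?"


def plan(connection: sqlite3.Connection) -> str:
    """Return SQLite's query plan for QUERY as one string."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY, (42,)).fetchall()
    # The last column of each row is the human-readable plan step.
    return "; ".join(row[-1] for row in rows)


before = plan(conn)  # no usable index yet: every row is scanned
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(conn)   # the planner can now look rows up through the index

print("before:", before)
print("after: ", after)
```

The exact plan wording varies between SQLite versions, so treat the printed text as indicative.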
What Amazon and kin have done is offer developers a new sexy way of over engineering. The AWS stack is the new Java OOP design patterns book. Yes, there is occasionally a time when an AbstractSingletonFactory is a good thing but I guarantee you most of those you see in the wild are not those times.
The real genius was to build a jungle gym for sophomore programmers to indulge their need to develop carpal tunnel syndrome where everything bills by the instance, hour, and transaction. If Sun had found a way to bill for every interface implemented and every use of the singleton pattern they would have been the ones buying Oracle.
Likewise, but I think you're getting into the philosophical, not the practical. You may choose to live in a single-CPU world for your database, but you're simply disqualifying yourself from a whole lot of interesting use cases. Index+algo only solves a sliver of analytic use cases. And, ultimately, I'm afraid you're creating a world where you cannot effectively understand the shape of your data and you cannot effectively test your hypotheses, so you go with gut feel. And, perhaps more importantly, you cannot create software that learns from its data.
Your argument can be summarized thus: do not give people incredible computing capacity at never-before-seen economic efficiency, because they will use it inefficiently. I'm afraid this argument gets made every time the world gets disrupted technologically (horse vs. car, anyone?).
Edit: I would argue that if "carbon footprint" is your priority, then economies of scale + power efficiency should tilt the scale towards cloud, no? AWS is certainly on the dirtier side, but there are other, greener clouds.
I'm not saying what you think I am saying. The thread was about how the cloud is immensely profitable, and I'm saying that a good chunk of that is built on waste and monetization of programmers' naive tendencies to overcomplicate problems.
I am not arguing that there are no great use cases for these systems. But I would be willing to bet that those are less than half the total load.
It's like big trucks. How many people who drive big trucks actually need big trucks? Personally I like my company's Prius of an infrastructure. :) And of course we've architected it so it can be a fleet or an armada of Priuses if need be, with maybe just a bit of work but if we get there I will be happy to have that problem.
If availability and scale are not important, and you can tolerate having to engage a human in the event of a hardware failure, then sure a $20 VPS might suffice. You could also run a single virtual machine in one zone in the cloud.
But I think you might underestimate the number of use cases that do legitimately benefit from and desire a greater degree of reliability and automation. When one of my machines dies, I don't want to be notified, and I don't want to have to do anything about it. I want a new virtual machine to come online with the same software and pick up the slack. Similarly, as my system's traffic grows over time, I want to be able to gradually add machines to a fleet, to handle my scaling problem, or even instruct the system to do that for me.
Plenty of use-cases may not require this, but I'm not convinced that the majority of systems in the cloud do not. Every system benefits from reliability, and it's great to get it cheaply and in a hands-off way. In the cloud, I can build a system where my virtual machine runs on a virtual disk, and if there's a hardware failure, my VM gets restarted on another physical machine and keeps on trucking without my involvement. As an engineer and scientist, I can accomplish a lot more with a foundation like this. I can build systems that require nearly zero maintenance and management to keep running, even over long time scales.
I don't think I disagree with you that some people overengineer systems, but I think I disagree with you about how much effort it requires to achieve solid availability and a high level of automation. It's not a lot of effort or cost, and it's a huge advantage. Once I build a system I never want to touch it again.
A certain segment of users are adopting these technologies because they want to be prepared to scale. One of the advantages of "big data" products even for small use-cases is: all successful use-cases grow over time. If you plan for success and growth, then you may exceed the capabilities of a traditional technology. If you use a "big" technology from the beginning, then you can be confident that you'll be able to solve increases in demand by scaling up, rather than by rearchitecting. As these platforms mature and become easier to use, the scales begin to tip, and they no longer require more engineering time than the alternatives; a strong hosted platform actually requires less time in total, especially when you consider setup and maintenance. Many of these technologies do an excellent job "scaling down" for simple use-cases too. While they have been difficult to use, they're getting easier. For example, MapReduce-paradigm technologies are becoming fairly easy with Apache Hive, and fast with Spark. They're becoming easier to set up due to hosted variants like AWS's ElasticMapReduce or Google Cloud Dataproc, etc.
I'm not saying the capability shouldn't be available, but I wish more people would stop to ask, "do I need this?"
Since I do data analysis and machine learning (sometimes), a common one I see is people using "big-data analytics" stacks when they don't have anything remotely in the range of a big-data problem. Everyone really seems to want to have a big-data problem, but then it turns out they have like, single-digit gigabytes of data (or less). And they want Hadoop on top of some infrastructure to scale a fleet of AWS VMs, so they can plot some basic analytics charts on a few gigs of data? They would be better served by the revolutionary new big-data solution "R on a laptop". But somehow many people have convinced themselves they really need Hadoop on AWS.
Though I haven't used it yet, BigQuery does seem interesting in comparison, because it at least seems like it doesn't hurt you much. The Hadoop-on-VMs thing is objectionable rather than merely unnecessary, because you get this complex, over-architected system for what is not a complex problem. BigQuery at least seems like, at worst you end up with basically a cloud-hosted RDBMS with scaling features you don't need, which isn't the end of the world as long as the pricing works for you.
edit: Just to clarify, I'm not the person you were replying to, just someone who also has opinions on this. :)
One is welcome to use staff that cost $20k per month (e.g. DevOps engineers who understand those four technologies well enough to use them in production) to shave ~50% off one's AWS bill, but one needs minimally two to three of them, so your friendly neighborhood insurance company should probably pay their $15k or whatever a month without blinking.
In many (perhaps most) areas of the US, DevOps staff do not cost $20k per month. For example, 92% of Boston rates are less than half that[1].
This implies that a $40k per month bill for AWS[2] could pay for three DevOps engineers and save approximately $10k per month in the vast majority of the US.
Quite true regarding company (fully loaded) costs.
IMHO, a reasonable estimate for fully loaded cost per employee (excluding facilities expenditures) is approximately 1.4 * ES, where "ES" is the employee salary.
The "three DevOps engineers and save $10k" estimation was based on working backward from the 92% of available jobs in Boston being less than half of the stated $20k per month cost. Assuming a Gaussian distribution where 0.5 * $20k per month represents the high end of two standard deviations (since Boston ranks quite highly in S/W salary nationally), most DevOps engineers will be paid roughly half of that as well.
This yielded an estimation of $6.5k per month per DevOps employee or $19.5k per month for three.
Since all of this was off-the-cuff, I figured it best to throw in a bit of "fudge factor" and present a $10k per month savings.
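For anyone who wants to poke at the numbers, here is the estimate above as a quick sketch. All figures are the ones quoted in this thread, not data. The variable names are just for illustration, and the cost of whatever hosting replaces AWS is not modeled, so the "savings" are an upper bound.

```python
AWS_BILL = 40_000           # $/month, the hypothetical AWS bill from the thread
ENGINEERS = 3               # head count used in the estimate
COST_PER_ENGINEER = 6_500   # $/month, the thread's estimate
DISPUTED_COST = 20_000      # $/month, the figure the estimate pushes back on

staff_cost = ENGINEERS * COST_PER_ENGINEER
raw_savings = AWS_BILL - staff_cost
disputed_savings = AWS_BILL - ENGINEERS * DISPUTED_COST

print(staff_cost)        # 19500
print(raw_savings)       # 20500
print(disputed_savings)  # -20000
```

The raw difference is $20.5k per month, so the quoted $10k leaves roughly half of it as fudge factor. At $20k per engineer the same three hires cost $20k per month more than the bill, which is why the whole disagreement comes down to the per-engineer figure.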
As always, YMMV and I could be completely wrong about all of this :-).
It's not that hard, and if you are so huge that your devops takes a team you have a good problem. If you are a startup then architect your software so it can be scaled but otherwise just stick it somewhere and worry about product market fit. You can decompose and refactor and distribute once your product has enough users for it to matter.
A lot of the difficulty also comes from over engineering and premature scalability obsession. You often just don't need all that. I swear over engineering is the bane of software and devops these days. We've gone from java factoryfactorysingletons to "how many distributed systems fads can I use in one stack?"