> I'm going to assume everyone experiencing this issue is using M/S. Upgrading to HRD will solve your issue.
This is the reason I abandoned AE and part of why adopting a platform that isn't standardized is incredibly dangerous. The problem is technical debt constantly accrues even when you aren't making changes.
Even though the API was unchanged, HRD differs subtly enough that breakage can occur on any non-trivial project. Edge cases (how indices behave within transactions comes to mind, but there are plenty more examples) will see new semantics compared to M/S, and so this "upgrade" involves not only thorough testing and auditing, but likely also code changes and potentially significant engineering hours.
http://goo.gl/HVuaC: These techniques are not needed with the (now deprecated) Master/Slave Datastore, which always returns strongly consistent results for all queries.
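To illustrate the kind of change those "techniques" imply, here is a rough sketch using the old google.appengine.ext.ndb API (the model names are made up): on HRD, global queries are only eventually consistent, and strong consistency requires designing entity groups and using ancestor queries.

    from google.appengine.ext import ndb

    class Account(ndb.Model):
        pass

    class Order(ndb.Model):
        total = ndb.IntegerProperty()

    # On M/S this global query was always strongly consistent; on HRD it is only
    # eventually consistent, so an Order written a moment ago may not show up yet.
    recent = Order.query().fetch(10)

    # Strong consistency on HRD requires putting related entities in one entity
    # group and reading them back with an ancestor query -- a schema-level choice.
    account_key = ndb.Key(Account, 'some-account')
    Order(parent=account_key, total=42).put()
    consistent = Order.query(ancestor=account_key).fetch(10)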
This means a project written and signed off circa 2011 requires mandatory engineering costs just to continue running in a functioning and supported fashion. An AE app will never quite resemble that ancient perl5 behemoth running uninterrupted since 1997, as the underlying implementation and recommended APIs are constantly modified and replaced (Datastore, NDB, Python major version).
"A strong test suite will save your soul!" I hear you say, tests that a small project might have survived without if targeting any other platform, and testing on AppEngine is also yet another moving target (for example, testing nested subrequests was all but impossible using the SDK until relatively recently).
The promise was a carefree life for a project willing to code against their proprietary APIs; the reality is a constantly moving target, "not quite free" autoscaling and the threat that while you're asleep an unannounced change will take down your app (I could name a few, but as many will attest this has happened regularly since launch).
> The promise was a carefree life for a project willing to code against their proprietary APIs; the reality is a constantly moving target, "not quite free" autoscaling and the threat that while you're asleep an unannounced change will take down your app (I could name a few, but as many will attest this has happened regularly since launch).
Yeah, I got sucked in with the same promise and I had the exact same sour experience. Including panicky calls from the client when the app suddenly stopped working. The maintenance windows used to plop right in the middle of my client's busy time, once a month at least and often more.
The worst part is the apologists, like lysprr@gmail.com in the original bug report:
> I got here from HackerNews, but after seeing the original poster spam the forums in multiple places and have a bad attitude, I can't blame Google for not fixing what looks to me like a non-issue.
>
> Fuck 'em.
Apologists who always reply to your requests for help while you're attempting to fix a suddenly dead application and deal with a totally screwed client.
If you've been running a 1997-era Perl app unmodified -- including the operating system of the machine it's running on, with no security patches to Perl in the meantime -- you are so owned and I really hope you're not storing any important data on that box.
I'm not saying that App Engine is a panacea, but regardless of how you write your code and what technologies you use, there'll be some sort of mandatory maintenance and system administration that you have to do every so often.
That's a fair point, however my personal expectation would be that unlike a Perl (or PHP, or Python, or ...) solution, App Engine probably won't exist in its current form as a supported product in 16 years.
Maybe it'll go the way of Wave, or hopefully the technology style itself will simply be supplanted by something newer and better. Regardless, I'd say that given an app today, the 1997 perl5 app (MySQL 3.2 and perl5.003 were already circulating) still has much better supportability prospects over this time frame than App Engine ever did or ever will.
The Master/Slave datastore has been deprecated for almost a year. [0] Beyond the latency issues, it simply wasn't reliable; both reads and writes were failing way too often. I'm glad GAE is focusing its resources on the HRD.
Salesforce have quite a nice solution for a similar problem. If you write custom Apex code, the platform will not let you deploy it to live unless it has sufficient unit test coverage. It runs the tests and calculates code coverage when you try to deploy, and if your tests aren't covering enough, no deployment happens. So, you have tests.
Then, upcoming platform changes are released to sandbox environments six months or so before they go live - you can see if your tests run, and have time to keep up with things. You do have to keep up though.
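As a rough sketch of the same idea -- this is not Salesforce's actual tooling, just a generic "no coverage, no deploy" gate in Python (assumes pytest and coverage.py >= 7 for the --format=total option; the 75% bar mirrors the minimum Salesforce enforces for Apex):

    import subprocess
    import sys

    MIN_COVERAGE = 75.0  # Salesforce enforces a similar org-wide minimum for Apex

    def deploy_allowed() -> bool:
        # Run the test suite under coverage; a failing test aborts the deploy too.
        subprocess.run(["coverage", "run", "-m", "pytest"], check=True)
        total = subprocess.run(["coverage", "report", "--format=total"],
                               capture_output=True, text=True, check=True)
        return float(total.stdout.strip()) >= MIN_COVERAGE

    if __name__ == "__main__":
        sys.exit(0 if deploy_allowed() else 1)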
I take it they then don't have any tests of their own to assure backwards compatibility?
Maybe I've been spoiled by using Windows for 20 years but I feel that we should be able to expect better from vendors than this. That goes double if you're paying them, though it sounds in Salesforce's case as if they pay you, because I can't see any other way in which this arrangement would make sense.
> This means a project written and signed off circa 2011 requires mandatory engineering costs just to continue running in a functioning and supported fashion.
Funny (?) thing is, as an engineer at Google, stuff like that happened to me ALL THE TIME. I don't even want to think about how much of my time was spent simply migrating to the "latest greatest" replacement for some critical service that was being deprecated.
Remember the old "thundering herd" problem with Apache children and things of that nature? You'd basically have a whole bunch of processes which had a listening fd from an earlier call to listen(). When a new connection would come in, the kernel would wake all of them, even though only one of them would actually have something to get. The others would go through the process for nothing. It caused a big performance hit back in the day.
Well, imagine now that you have a directory or lock service where you can store things and perform atomic updates. When you do a write to something in it, it fans out to all of its clients, and they all wake up (nearly) simultaneously and receive the update. They then have to do whatever processing you do with new data of that type.
If they all do this at the same time, then you have no processes left to service incoming requests. They're all identically busy with whatever mutexes held in order to apply those config changes safely, so no other work happens on those clients while they load in the new data.
It's not so much that it's taking a mutex and is getting stuck for a little bit, since that's going to happen no matter what. It's that all of the children do it at the same time, so there's nobody to service your hit, and you're guaranteed to get stuck. If it was spread out, then only some percentage of incoming requests would get stuck behind this. The others would get lucky and would hit another instance which either had already run it or hadn't yet run it.
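A minimal sketch of one common mitigation -- this is guesswork about the general pattern, not what App Engine actually does: add per-client jitter before applying a fanned-out update, so the whole fleet doesn't grab the same lock at the same instant.

    import random
    import threading
    import time

    config_lock = threading.Lock()
    current_config = {}

    def on_config_update(new_config, max_jitter_s=5.0):
        # Spread the expensive reload across a window instead of a single spike.
        time.sleep(random.uniform(0, max_jitter_s))
        with config_lock:
            # Parsing/validating the new data briefly blocks request handling.
            current_config.update(new_config)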
I'm not saying this is what's going on here, but it sure sounds familiar.
On what basis do you think these issues are related? The bug report provides very little insight into what's going on, only that there's a severe performance degradation at 9AM.
The thundering herd problem applied to waking up child processes is one possible explanation, but there are dozens of other explanations that are just as likely, based on the information we're provided with.
The commenter you are replying to is a former Google employee; the description of the lock service sounds like Chubby (http://research.google.com/archive/chubby.html), App Engine likely uses some sort of distributed directory service for keeping track of things like quotas.
9 AM in Brussels is midnight here on the west coast if I've done my time zone math properly. It's the perfect time to push something. Unfortunately, if that means having everything snap-to and then freeze for a couple of seconds, that's not good.
Again, I don't know if this is what happened here. I've just seen this sort of thing before.
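For what it's worth, the time zone math checks out -- a quick check with Python's zoneinfo (the date is arbitrary; in January, Brussels is UTC+1 and the US west coast UTC-8):

    from datetime import datetime
    from zoneinfo import ZoneInfo

    brussels_9am = datetime(2013, 1, 28, 9, 0, tzinfo=ZoneInfo("Europe/Brussels"))
    print(brussels_9am.astimezone(ZoneInfo("America/Los_Angeles")))
    # 2013-01-28 00:00:00-08:00 -- i.e. midnight Pacific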
Well, the bug report doesn't really invite quick attention. Simply reporting your observations is not enough: you should position yourself as a competent customer, by explaining what you have done to ensure the problem isn't on your side. Mention the code hasn't changed, that you have no database cleanup cronjobs or similar running that could be interfering, etc.
My first instinct when I see a report like this is: he probably has some cronjob running he forgot about; perhaps one whose running time grows as O(n^2).
I'm not saying Google is right not to reply for days, but I am saying that, as a customer, there are easier ways to get attention than shouting and threatening. Show it's an interesting problem and you're bound to get some techie's attention.
This is a daily outage that affects all our master/slave appengine applications. We know these applications are 'deprecated', but we're still paying significant money for the service and therefore hadn't expected 'deprecated' to mean 'won't be fixed when there are problems'.
Migration to HRD is not trivial even with the tool provided by Google. HRD has a different consistency model, blob keys and associated image serving URLs will change by migrating, and last time we checked any deletes that happen during migration (which can take days) will not make it into the migrated app.
Shedding responsibility for problems arising from continued use is pretty much the essential rationale for deprecating a piece of software. Google has never appeared particularly concerned about backward compatibility or facilitating small segments of its customer and user bases.
"Google has never appeared particularly concerned..." seems to be a recurring theme. The company was designed to work at large scale, and individual problems don't get the attention they would at more customer-oriented companies.
I may be misunderstanding GAE, but isn't the reporter's 'StayUp' servlet a minimal test case? Without any dependence on other datastores or processes, it seems to be showing that something is seriously amiss when handling trivial requests. It's like a demo "Hello World" app... that stops working in a certain time range each day.
I consider him panicking more than shouting and threatening. I couldn't imagine getting that kind of treatment as a VPS customer; I'd be moving out ASAP.
Because of this I moved away from GAE over a year ago. And not just me: when GAE was still hot, 2-3 years ago, you could read tons of blog articles from unsatisfied customers.
So it doesn't surprise me to read about weird performance degradations. GAE has suffered from such problems for years.
Maybe they don't care about small customers and love to hear about them moving to Heroku or to good old virtual servers. It would be polite to say so upfront, though.
What are we actually talking about when discussing "good support" and "bad support"? Is it just someone nice to talk to whilst someone else fixes a problem for you? There was an interesting article along these lines by the former President of Enterprise at Google written recently: http://gigaom.com/2013/01/26/the-delusions-that-companies-ha...
In this case, the GAE feature that underlies this issue is the Master/Slave (MS) datastore. It's been deprecated for ages in favour of the High-Replication Datastore (HRD).
Maybe you don't rationally need someone to talk to when someone is fixing a problem for you -- but I think you need to know that the vendor is _aware_ of the problem, and is working on fixing it.
Or you start freaking out. And I don't think that's entirely irrational.
This is, among other things, why the 'post mortem' has become somewhat popular -- because it allows us to judge "Yeah, those guys DO know what they're doing, they're on top of things, the chances of outages are getting constantly smaller, not larger."
Has Google ever published such a "post-mortem" after an outage? Has Google ever even admitted there was an outage publicly?
But also, yeah, rational or not, people like to have someone to talk to. In customer service in general, there are many studies showing that customer satisfaction will be higher when customers are treated 'nicely' _without a solution_ than when they are treated brusquely but their problem is solved. This is not actually rational, and I'm not saying I'd like vendors to strive towards that model -- but it is apparently human psychology that vendors may want to take account of.
On the other hand, Google seems to be doing pretty well the way it's going. Although I don't know how GAE is doing, really, compared to competitors.
How Google does it, though, is basically no support at all, right? It's beyond 'good support' or 'bad support' -- with the possible exception of AdWords, is there any Google product where you can ever talk to a human about any support issue at all? For email that might be fine, especially when the email product is pretty darn reliable. For enterprise critical software... it would sure make me nervous.
He falls into the trap of knowing machine behavior, but not understanding people's behavior.
Insanity #2: I need somebody to talk to when a service interruption occurs
You hear about an earthquake in California, you call your aunt to make sure she is ok.
You are getting bad weather in the area you live, your mom calls and checks on you.
The server you use disappears off the internet and your provider's status page hasn't been updated for a week, you '...'?
When something goes wrong, it's not an event that affects everybody (even if it is), it's an event that affects you. As long as humans are still involved in the purchasing and managing of servers you'll always need someone to call and yell at/be soothed by.
That's true. I think that his broader point still stands, though. Once you get beyond variants of "are you working on it or do I need to convince you to?" the role of support is basically catering to irrational desires.
Migrating is an effort but it is always possible. In fact, when you migrate you realize how independent you are. Even if you use tons of APIs, everybody has 'em even if their interfaces are different.
>Well, the bug report doesn't really invite quick attention. Simply reporting your observations is not enough: you should position yourself as a competent customer
If it was one person experiencing the problem, you would be right. But it's a number of people.
Google support borders on the farcical. It doesn't appear to be costing them too much money in the grand scheme of things, which is sort of surprising to me.
I had a support guy tell me I had to get Apple's legal team to contact Google so I could use the "Mac" trademark, because I happened to be selling a piece of software that ran on OS X. Like that's ever going to happen. My ad simply said "Try ____, a better way to _____ on Windows and Mac.", linking them to http://www.apple.com/legal/trademark/guidelinesfor3rdparties... didn't quite cut it, apparently, even though it clearly states that such use is acceptable under "2. Compatibility" near the top of the page. i.e. I can say my product runs on Mac if in fact it runs on Mac.
Approving a ten word ad takes Google over a week, in my experience. Baffling.
All this with their adwords $100 free trial. All that trial did was convince me that I should never ever in the life of the universe commit any money to Google, because they made it starkly apparent that I would never get what I paid for... running honest ads for honest products in a reasonable timeframe. I went with other ad networks in the end and had zero trouble whatsoever, and infinitely faster approval times. I suppose I may have had a smaller audience, but the headaches Google causes aren't worth the extra money.
Someone is going to come along and pull the rug out from under Google eventually. You can't rest on your laurels forever.
This customer is complaining about a service component which was deprecated almost 11 months ago. There is a tool which migrates application data from the old datastore to the new one. When you don't move off of deprecated infrastructure, I'd say you've set yourself up for problems.
11 months huh? That's barely anything for larger enterprise customers, about long enough to make it onto a project plan. Most enterprise software companies will provide support for 5-8 years.
There are tools that can migrate you from python 2 to python 3, or from Oracle to postgres. But it's not something you do lightly. Switching from M/S to HRD in AppEngine is similarly not something you do lightly.
Because we're paying customers? My bill is peanuts, but I know there are many big customers, e.g. Khan Academy. And they also offer Premier Support, which is $500/mo.
"You just have to pay for support and you get support? I don't believe it, there must be more to it than that!"
Whether you believe it or not, a Premier account is what you need if you want support. You could argue that $500/mo is too expensive, but it is what it is.
I've paid over $50,000 a year for Google Maps. I assure you, their support sucks no matter how much you pay them. We pay a small fraction of that for AWS, and Amazon's support is infinitely better.
The potential bad press Google would get if Khan Academy had daily issues due to bad GAE performance would, I'm sure, cost them much more than the $500/month that you're paying. Being big and nice (or cool) gives you an edge here, I think.
There is support if you pay for it. In general, whenever I file a ticket I get a response within hours. That's at least as good as (if not better than) working with any other vendor, in my experience. So... where's this zero support you're talking about?
A GAE user sees a problem of his service being slow, writes a frantic bug report with caps and exclamation marks and threatens to leave GAE. As a GAE user myself, two questions come to mind:
1. Is GAE outside of their 0.9995 SLA uptime? If they aren't, then it probably isn't important enough to spend time looking into it. Customers cannot expect better than the agreed-upon uptime percentage, and hosting companies are obligated to reimburse customers if they go below the SLA. Both of these are covered in the SLA doc.
2. Is it reproducible? So far, the bug report mentions 2 people out of all GAE users. Is 2 people enough to say it's a problem with GAE? One person is panicked, and the other provides few details for the bug report.
1. A 0.9995 SLA allows only about 20 minutes of downtime a month (rough arithmetic below). Since it's a daily event, I'm guessing that yes, the SLA is violated.
2. It's a problem that is occurring daily, with a test case that has pretty much no code at all. That in itself does not prove anything, but it really makes me wonder how it could be a problem on the user's side.
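Rough arithmetic behind that first point (assuming a 30-day month):

    sla = 0.9995
    minutes_per_month = 30 * 24 * 60              # 43,200
    allowed_downtime = (1 - sla) * minutes_per_month
    print(allowed_downtime)                        # ~21.6 minutes/month, ~43 s/day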
Having never used GAE, it would be nice if someone could expand M/S and HRD for me.
It looks like the OP of the bug report is using a deprecated feature which, according to the Project Member, is causing latency issues at a specific time daily. But that can't be the whole issue, since another commenter who is using the new HRD is having the same problem. It is frustrating even for people who are just reading this. All it implies is a lack of communication from Google when something goes awry. Come on Google, stop reinforcing my stereotypes about your customer support!
Selling to a consumer is different from selling to a business: you may have a great product at a great price, but if you offer terrible CS, in the B2B world everyone is going to avoid you. It is a place where support is valued more than the product itself.
Therefore, unless you start offering decent CS, you can lower your price all you want; I will be sticking with AWS.
The rates are the same ($1 per million writes and $0.70 per million reads beyond the daily free threshold), but the daily free threshold is 0.05 million of each for master/slave, and 0.01 million of each for high-replication.
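To make that concrete, a rough worked example (the daily write volume is made up):

    writes_per_day = 50_000                   # exactly the old M/S free threshold
    free_ms, free_hrd = 50_000, 10_000
    price_per_million_writes = 1.00           # dollars, as quoted above

    cost_ms  = max(0, writes_per_day - free_ms)  / 1e6 * price_per_million_writes
    cost_hrd = max(0, writes_per_day - free_hrd) / 1e6 * price_per_million_writes
    print(cost_ms, cost_hrd)                  # $0.00/day on M/S vs $0.04/day on HRD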
For small applications it costs more because the thresholds for free services are lower. In our case it costs a lot more, since some of the things we were doing need another instance on HRD that we didn't need on M/S.
I manage the 3rd line support of some of the busiest websites in the world (we provide back-end e-commerce software).
I can't say I think much of Google's response here. Nearly two weeks before the first comment, and then it's shut down after 2 days with a question directed at who knows who, and no explanation?
The analysis elsewhere on here suggests they're violating the SLA, so this should get more attention. I'm guessing support is under-resourced at Google, and the culture of support is a bit shabby (no acknowledgement of inconvenience, nor any indication or evidence of work undertaken in the background) - hardly surprising for a large-scale software business based on free services.
I'm sorry, but this is the price you pay for running a business that depends TOTALLY on a 3rd-party service. Forget Google; everyone out there is most likely the same. That's why it's important for you to run your 'apps' on something you have control over - like Linode, AWS, Rackspace, OpenShift, etc. - and also to have backup nodes from other providers for redundancy, for emergency situations, in case of storms, etc.
I would recommend trying your apps on OpenStack (OpenShift in particular), which doesn't have the vendor lock-in you face right now.
To their credit, people are apparently using something that's been deprecated and should be changed regardless. At least, that was their conclusion when it was changed to wontfix. The replies are very rare and curt though; I can't really say it's quality service when you're paying for a product.
Customer support from Google has always been like this as far as I've experienced and heard. There is no way to actually reach and converse with anyone, regardless of whether you are paying them for the service or what kind of request it is.
Once a Google employee randomly replied to a complaint of mine about Google+ (I didn't even +mention them). After a few comments and him confirming that it was added to the bugs list, I asked if it was okay to +mention him in the future with similar issues. It was okay. I did. He never showed his face again. (His profile still says "Works at Google+".)
Another Google employee I know online also never replies to anything concerning Google. I know he works on the Google+ project, but I can only hope he passes on any bugs I +mentioned him in.
For YouTube, you can post in their forums but can merely hope for a reply. Copyright complaint disputes are no priority, either.
I haven't used many paid products, but I have read that their customer support is among the very worst, and I have also never been able to find a single e-mail address or phone number for support for any service.
Edit: By the way, I would have moved away from Google App Engine a long time ago if my app went down every morning during rush hour for 10 days straight.
It is interesting that basically no one (including the news poster) noticed that there is a comment (#12) which states that this problem happens on HRD too.
This statement may be false and/or a completely different issue, but at least it should be considered here for HN comments which state "M/S is deprecated, Google is right, just use HRD."
A bug tracker seems like a horrible way to report production (or non-production) support issues. This is the same bug tracker OSS projects on Google Code use.
Is it really helpful for the public to comment on my support request? Seems like the signal to noise ratio would be quite low, and then you get inane comments like:
> I got here from HackerNews, but after seeing the original poster spam the forums in multiple places and have a bad attitude, I can't blame Google for not fixing what looks to me like a non-issue.
>
> Fuck 'em.
You have to believe that the choice of tools has some bearing on the quality of the response from Google. Seems like there is very little incentive for any "Project members" to trawl through open bug reports when no one is ever responsible.
Not surprising... the second most-voted bug in Google Code, reported exactly a year ago ( http://code.google.com/p/support/issues/detail?id=24324 ) deplores the removal of a feature that was already there (the Updates page) and was the single most useful feature in Google Code for many of us. After one year and more than 800 people registering their interest on the issue, they haven't even explained why they removed it or whether there are any plans of bringing it back.
"M/S is deprecated and there is a clear and straightforward path to migrating to HRD."
M/S was deprecated April 4, 2012, so it has been some time since the notice has been out there. The high-replication datastore has been available for over 2 years now. Whether or not less than a year is too short a deprecation period is another issue.
Ok, so here's the deal. If your app runs exclusively on GAE you've essentially tied yourself to one cloud vendor. Now disregarding the respective benefits and drawbacks of google as a hosting company for your app (I would never do that), being dependent on one cloud provider is a very bad idea. No matter if you run on EC2, Azure or GAE, if you can't seamlessly switch to another provider, you're screwed. These all go down regularly and have issues. They're big companies, you're a small company, you have no such thing as "recourse". The court of public opinion will not save your company.
I agree with this to an extent; however, the company I work for deploys on AWS and is far too cautious about vendor lock-in, to the point where we use AWS basically as a VPS, not a cloud service, and get none of the advantages (and all of the disadvantages, e.g. worse performance, higher price).
Many on the thread say the reporters are over-reacting. They are not. What would Amazon do? They would not dismiss this as a non-issue; they would respond in less than 24 hours and take complete responsibility. GAE is a pay service. I think this level of service is pathetic.
As noted, the only attempt at diagnosis is completely wrong (even the reporter is not on M/S) and very late.
A few people (who act obnoxious as hell) report a problem that can be solved by moving away from a deprecated system, yet they fail to even read the note because they're busy smashing exclamation marks into the issue tracker.
The problem apparently also occurs on the non-deprecated system. I can understand their frustration after not getting a reply for X days on what seems to be a critical issue for them. That's not "obnoxious as hell", that's customers panicking. You really don't want your customers panicking about your service.
The two datastores even have the same API. As long as your app doesn't depend on the exact performance characteristics of the old one, the migration is very straightforward. I did it for one of my apps in a morning and was done well before lunch.
The problem with saying things like "Support packages are available", is that time and again we see paying Google customers with support packages being treated awfully.
These are paying customers who are paying a non-trivial amount of money for support (though not the "Premium" support in this case, which is an extra $500 per month for GAE).
We're a paying customer of GAE. I think it's quite clear that paying for the basic service doesn't include support beyond the public issue tracker and the forums. Support packages start at $150 per month, and at that point you get a 4 hour response time. I think that's entirely reasonable. We have yet to sign up for a support level, but then again we're not really seeing any troubles with the service.
Customer support of Google really sucks!
Currently the GAE cloud has a reliability problem (also for new customers).
Instances are restarted like crazy. This leads to downtime. But that's not enough: customers even have to pay for more(!) instance hours because of this.
There is the running gag on the mailing-list: "Whenever GAE is unreliable for weeks Google needed to make revenue targets ;-)"
BTW, this issue is not simply due to M/S; it also happens on HRD. So any Google support apologists here, please read the bug thread submitted by this poor customer before dismissing it simply as a 'migration issue'.
I have had some issues with Google Docs (paid for premier commercial account). Some documents we had stored simply vanished from our account.
After getting the runaround for 3-4 days, finally a Google engineer told us they can't help us recover the documents THEY 'lost' unless we have the URL to the document ...
Thankfully someone on our team had kept the URL when I first shared that document with them (1+ year after the document had been created).
Quick tip for anyone making a system with high load and daily or hourly quotas: When an account is created, assign a random start time (e.g. 05:43 for daily quotas or minute 12 for hourly) to measure that account's quotas against. Then you can avoid this issue of the system getting a huge spike in load when everyone's quota refreshes at the same time.
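A minimal sketch of one way to do that (hypothetical names; the per-account offset is derived deterministically from the account id):

    import hashlib
    from datetime import datetime, timedelta, timezone

    def quota_window_start(account_id: str, now: datetime) -> datetime:
        # Stable per-account offset into the day, 0..86399 seconds.
        offset_s = int(hashlib.sha256(account_id.encode()).hexdigest(), 16) % 86400
        midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
        start = midnight + timedelta(seconds=offset_s)
        # If that offset hasn't been reached yet today, the window began yesterday.
        return start if start <= now else start - timedelta(days=1)

    print(quota_window_start("account-1234", datetime.now(timezone.utc)))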
It happens that 9 AM Brussels time is midnight Pacific time. I'm sure Google is running some maintenance cron at midnight thinking "This is a low-demand time," and it is, across the US, but not in Brussels. These are old instances, and Google probably doesn't want to re-time or rewrite the cron job to be more efficient.
"sentimentally is a tool that determines sentiment of your emails. Once determined, it helps you gauge your relationships with co-workers, customers, friends, or other individuals based on the tone of your conversations with these people."
Because they did not post any kind of evidence (request logs, Pingdom report, etc.), not to mention the App ID in question (so that Google would know where to look). All too often, bug reports end up being some kind of misunderstanding.
I am also a GAE user; I have had no problems like the OP's. But I'm starting to miss a fundamental feature: sockets. I have worked around it by using other services and polling.
Maybe this is the wrong forum, but are there any infrastructure templates for setting up a scalable web/db/load-balancer/memcached stack for a simple traditional web service, in my case a game?
I want to be able to sleep at night, and easily scale up by adding some more machines in case of higher load.
I could use denormalized MySQL/Postgres or MongoDB for speed. Preferred language is Python (or maybe C# or Java).
Depending on your budget (isn't it always..?), speak to Rightscale - they provide a set of frameworks to deploy infrastructure to various cloud platforms, and can handle auto-scaling and all that stuff.
Response from us was initially muted because it looked like it only affected M/S apps, but it turns out (a) it can impact HRD as well, and (b) we're pretty unhappy about the level of impact for many M/S apps, so we're looking at ways to resolve it. It's a high priority and we're looking at a number of ways to address it. It's also a pretty interesting issue, because indirectly it's caused by (a) the large scale that App Engine is running at, and (b) the large extent to which GAE is running free applications.
Regardless, apologies to those who felt support was unresponsive. We are working very hard to improve support. For the sophisticated audience that comes to these pages, please link to me on Google+ to get my attention if we are failing you (https://plus.sandbox.google.com/110401818717224273095).
I've heard a lot of people saying this is why Google can't get a lot of businesses to sign on. There's no one for the CEO to call and complain to directly when their stuff is down.