> I'm going to assume everyone experiencing this issue is using M/S. Upgrading to HRD will solve your issue.
This is the reason I abandoned AE and part of why adopting a platform that isn't standardized is incredibly dangerous. The problem is technical debt constantly accrues even when you aren't making changes.
Even though the API was unchanged, HRD differs subtly enough that breakage can occur on any non-trivial project. Edge cases (how indices behave within transactions comes to mind, but there are plenty more examples) will see new semantics compared to M/S, and so this "upgrade" involves not only thorough testing and auditing, but likely also code changes and potentially significant engineering hours.
http://goo.gl/HVuaC: These techniques are not needed with the (now deprecated) Master/Slave Datastore, which always returns strongly consistent results for all queries.
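To illustrate the kind of change those "techniques" imply, here is a rough sketch using the old google.appengine.ext.ndb API (the model names are made up): on HRD, global queries are only eventually consistent, and strong consistency requires designing entity groups and using ancestor queries.

    from google.appengine.ext import ndb

    class Account(ndb.Model):
        pass

    class Order(ndb.Model):
        total = ndb.IntegerProperty()

    # On M/S this global query was always strongly consistent; on HRD it is only
    # eventually consistent, so an Order written a moment ago may not show up yet.
    recent = Order.query().fetch(10)

    # Strong consistency on HRD requires putting related entities in one entity
    # group and reading them back with an ancestor query -- a schema-level choice.
    account_key = ndb.Key(Account, 'some-account')
    Order(parent=account_key, total=42).put()
    consistent = Order.query(ancestor=account_key).fetch(10)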
This means a project written and signed off circa 2011 requires mandatory engineering costs just to continue running in a functioning and supported fashion. An AE app will never quite resemble that ancient perl5 behemoth running uninterrupted since 1997, as the underlying implementation and recommended APIs are constantly modified and replaced (Datastore, NDB, Python major version).
"A strong test suite will save your soul!" I hear you say, tests that a small project might have survived without if targeting any other platform, and testing on AppEngine is also yet another moving target (for example, testing nested subrequests was all but impossible using the SDK until relatively recently).
The promise was a carefree life for a project willing to code against their proprietary APIs; the reality is a constantly moving target, "not quite free" autoscaling and the threat that while you're asleep an unannounced change will take down your app (I could name a few, but as many will attest this has happened regularly since launch).
> The promise was a carefree life for a project willing to code against their proprietary APIs; the reality is a constantly moving target, "not quite free" autoscaling and the threat that while you're asleep an unannounced change will take down your app (I could name a few, but as many will attest this has happened regularly since launch).
Yeah, I got sucked in with the same promise and I had the exact same sour experience. Including panicky calls from the client when the app suddenly stopped working. The maintenance windows used to plop right in the middle of my client's busy time, once a month at least and often more.
The worst part is the apologists, like lysprr@gmail.com in the original bug report:
> I got here from HackerNews, but after seeing the original poster spam the forums in multiple places and have a bad attitude, I can't blame Google for not fixing what looks to me like a non-issue.
>
> Fuck 'em.
Apologists who always reply to your requests for help while you're attempting to fix a suddenly dead application and deal with a totally screwed client.
If you've been running a 1997-era Perl app unmodified -- including the operating system of the machine it's running on, with no security patches to Perl in the meantime -- you are so owned and I really hope you're not storing any important data on that box.
I'm not saying that App Engine is a panacea, but regardless of how you write your code and what technologies you use, there'll be some sort of mandatory maintenance and system administration that you have to do every so often.
That's a fair point, however my personal expectation would be that unlike a Perl (or PHP, or Python, or ...) solution, App Engine probably won't exist in its current form as a supported product in 16 years.
Maybe it'll go the way of Wave, or hopefully the technology style itself will simply be supplanted by something newer and better. Regardless, I'd say that given an app today, the 1997 perl5 app (MySQL 3.2 and perl5.003 were already circulating) still has much better supportability prospects over this time frame than App Engine ever did or ever will.
The Master/Slave datastore has been deprecated for almost a year. [0] Beyond the latency issues, it simply wasn't reliable; both reads and writes were failing way too often. I'm glad GAE is focusing its resources on the HRD.
Salesforce have quite a nice solution for a similar problem. If you write custom Apex code, the platform will not let you deploy it to live unless it has sufficient unit test coverage. It runs the tests and calculates code coverage when you try to deploy, and if your tests aren't covering enough, no deployment happens. So, you have tests.
Then, upcoming platform changes are released to sandbox environments six months or so before they go live - you can see if your tests run, and have time to keep up with things. You do have to keep up though.
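As a rough sketch of the same idea -- this is not Salesforce's actual tooling, just a generic "no coverage, no deploy" gate in Python (assumes pytest and coverage.py >= 7 for the --format=total option; the 75% bar mirrors the minimum Salesforce enforces for Apex):

    import subprocess
    import sys

    MIN_COVERAGE = 75.0  # Salesforce enforces a similar org-wide minimum for Apex

    def deploy_allowed() -> bool:
        # Run the test suite under coverage; a failing test aborts the deploy too.
        subprocess.run(["coverage", "run", "-m", "pytest"], check=True)
        total = subprocess.run(["coverage", "report", "--format=total"],
                               capture_output=True, text=True, check=True)
        return float(total.stdout.strip()) >= MIN_COVERAGE

    if __name__ == "__main__":
        sys.exit(0 if deploy_allowed() else 1)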
I take it they then don't have any tests of their own to assure backwards compatibility?
Maybe I've been spoiled by using Windows for 20 years but I feel that we should be able to expect better from vendors than this. That goes double if you're paying them, though it sounds in Salesforce's case as if they pay you, because I can't see any other way in which this arrangement would make sense.
> This means a project written and signed off circa 2011 requires mandatory engineering costs just to continue running in a functioning and supported fashion.
Funny (?) thing is, as an engineer at Google, stuff like that happened to me ALL THE TIME. I don't even want to think about how much of my time was spent simply migrating to the "latest greatest" replacement for some critical service that was being deprecated.
Remember the old "thundering herd" problem with Apache children and things of that nature? You'd basically have a whole bunch of processes which had a listening fd from an earlier call to listen(). When a new connection would come in, the kernel would wake all of them, even though only one of them would actually have something to get. The others would go through the process for nothing. It caused a big performance hit back in the day.
Well, imagine now that you have a directory or lock service where you can store things and perform atomic updates. When you do a write to something in it, it fans out to all of its clients, and they all wake up (nearly) simultaneously and receive the update. They then have to do whatever processing you do with new data of that type.
If they all do this at the same time, then you have no processes left to service incoming requests. They're all identically busy with whatever mutexes held in order to apply those config changes safely, so no other work happens on those clients while they load in the new data.
It's not so much that it's taking a mutex and is getting stuck for a little bit, since that's going to happen no matter what. It's that all of the children do it at the same time, so there's nobody to service your hit, and you're guaranteed to get stuck. If it was spread out, then only some percentage of incoming requests would get stuck behind this. The others would get lucky and would hit another instance which either had already run it or hadn't yet run it.
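A minimal sketch of one common mitigation -- this is guesswork about the general pattern, not what App Engine actually does: add per-client jitter before applying a fanned-out update, so the whole fleet doesn't grab the same lock at the same instant.

    import random
    import threading
    import time

    config_lock = threading.Lock()
    current_config = {}

    def on_config_update(new_config, max_jitter_s=5.0):
        # Spread the expensive reload across a window instead of a single spike.
        time.sleep(random.uniform(0, max_jitter_s))
        with config_lock:
            # Parsing/validating the new data briefly blocks request handling.
            current_config.update(new_config)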
I'm not saying this is what's going on here, but it sure sounds familiar.
On what basis do you think these issues are related? The bug report provides very little insight into what's going on, only that there's a severe performance degradation at 9AM.
The thundering herd problem applied to waking up child processes is one possible explanation, but there are dozens of other explanations that are just as likely, based on the information we're provided with.
The commenter you are replying to is a former Google employee; the description of the lock service sounds like Chubby (http://research.google.com/archive/chubby.html), App Engine likely uses some sort of distributed directory service for keeping track of things like quotas.
9 AM in Brussels is midnight here on the west coast if I've done my time zone math properly. It's the perfect time to push something. Unfortunately, if that means having everything snap-to and then freeze for a couple of seconds, that's not good.
Again, I don't know if this is what happened here. I've just seen this sort of thing before.
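For what it's worth, the time zone math checks out -- a quick check with Python's zoneinfo (the date is arbitrary; in January, Brussels is UTC+1 and the US west coast UTC-8):

    from datetime import datetime
    from zoneinfo import ZoneInfo

    brussels_9am = datetime(2013, 1, 28, 9, 0, tzinfo=ZoneInfo("Europe/Brussels"))
    print(brussels_9am.astimezone(ZoneInfo("America/Los_Angeles")))
    # 2013-01-28 00:00:00-08:00 -- i.e. midnight Pacific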
Well, the bug report doesn't really invite quick attention. Simply reporting your observations is not enough: you should position yourself as a competent customer, by explaining what you have done to ensure the problem isn't on your side. Mention the code hasn't changed, that you have no database cleanup cronjobs or similar running that could be interfering, etc.
My first instinct when I see a report like this is: he probably has some cronjob running he forgot about; perhaps one whose running time grows as O(n^2).
I'm not saying Google is right not to reply for days, but I am saying that, as a customer, there are easier ways to get attention than shouting and threatening. Show it's an interesting problem and you're bound to get some techie's attention.
This is a daily outage that affects all our master/slave appengine applications. We know these applications are 'deprecated', but we're still paying significant money for the service and therefore hadn't expected 'deprecated' to mean 'won't be fixed when there are problems'.
Migration to HRD is not trivial even with the tool provided by Google. HRD has a different consistency model, blob keys and associated image serving URLs will change by migrating, and last time we checked any deletes that happen during migration (which can take days) will not make it into the migrated app.
Shedding responsibility for problems arising from continued use is pretty much the essential rationale for deprecating a piece of software. Google has never appeared particularly concerned about backward compatibility or facilitating small segments of its customer and user bases.
"Google has never appeared particularly concerned..." seems to be a recurring theme. The company was designed to work at large scale, and individual problems don't get the attention they would at more customer-oriented companies.
I may be misunderstanding GAE, but isn't the reporter's 'StayUp' servlet a minimal test case? Without any dependence on other datastores or processes, it seems to be showing that something is seriously amiss when handling trivial requests. It's like a demo "Hello World" app... that stops working in a certain time range each day.
I consider him panicking more than shouting and threatening. I couldn't imagine getting that kind of treatment as a VPS customer; I'd be moving out ASAP.
Because of this I moved away from GAE over a year ago. And not just me: when GAE was still hot, 2-3 years ago, you could read tons of blog articles from unsatisfied customers.
So it doesn't surprise me to read about weird performance degradations. GAE has suffered from such problems for years.
Maybe they don't care about small customers and love to hear about them moving to Heroku or to good old virtual servers. It would be polite to say so upfront, though.
What are we actually talking about when discussing "good support" and "bad support"? Is it just someone nice to talk to whilst someone else fixes a problem for you? There was an interesting article along these lines by the former President of Enterprise at Google written recently: http://gigaom.com/2013/01/26/the-delusions-that-companies-ha...
In this case, the GAE feature that underlies this issue is the Master/Slave (MS) datastore. It's been deprecated for ages in favour of the High-Replication Datastore (HRD).
Maybe you don't rationally need someone to talk to when someone is fixing a problem for you -- but I think you need to know that the vendor is _aware_ of the problem, and is working on fixing it.
Or you start freaking out. And I don't think that's entirely irrational.
This is, among other things, why the 'post mortem' has become somewhat popular -- because it allows us to judge "Yeah, those guys DO know what they're doing, they're on top of things, the chances of outages are getting constantly smaller, not larger."
Has Google ever published such a "post-mortem" after an outage? Has Google ever even admitted there was an outage publicly?
But also, yeah, rational or not, people like to have someone to talk to. In customer service in general, there are many studies showing that customer satisfaction will be higher when customers are treated 'nicely' _without a solution_ than when they are treated brusquely but their problem is solved. This is not actually rational, and I'm not saying I'd like vendors to strive towards that model -- but it is apparently human psychology that vendors may want to take account of.
On the other hand, Google seems to be doing pretty well the way it's going. Although I don't know how GAE is doing, really, compared to competitors.
How Google does it, though, is basically no support at all, right? It's beyond 'good support' or 'bad support' -- with the possible exception of AdWords, is there any Google product where you can ever talk to a human about any support issue at all? For email that might be fine, especially when the email product is pretty darn reliable. For enterprise critical software... it would sure make me nervous.
He falls into the trap of knowing machine behavior, but not understanding people's behavior.
Insanity #2: I need somebody to talk to when a service interruption occurs
You hear about an earthquake in California, you call your aunt to make sure she is ok.
You are getting bad weather in the area you live, your mom calls and checks on you.
The server you use disappears off the internet and your provider's status page hasn't been updated for a week, you '...'?
When something goes wrong, it's not an event that affects everybody (even if it is), it's an event that affects you. As long as humans are still involved in the purchasing and managing of servers you'll always need someone to call and yell at/be soothed by.
That's true. I think that his broader point still stands, though. Once you get beyond variants of "are you working on it or do I need to convince you to?" the role of support is basically catering to irrational desires.
Migrating is an effort but it is always possible. In fact, when you migrate you realize how independent you are. Even if you use tons of APIs, everybody has 'em even if their interfaces are different.
>Well, the bug report doesn't really invite quick attention. Simply reporting your observations is not enough: you should position yourself as a competent customer
If it was one person experiencing the problem, you would be right. But it's a number of people.
Google support borders on the farcical. It doesn't appear to be costing them too much money in the grand scheme of things, which is sort of surprising to me.
I had a support guy tell me I had to get Apple's legal team to contact Google so I could use the "Mac" trademark, because I happened to be selling a piece of software that ran on OS X. Like that's ever going to happen. My ad simply said "Try ____, a better way to _____ on Windows and Mac.", linking them to http://www.apple.com/legal/trademark/guidelinesfor3rdparties... didn't quite cut it, apparently, even though it clearly states that such use is acceptable under "2. Compatibility" near the top of the page. i.e. I can say my product runs on Mac if in fact it runs on Mac.
Approving a ten word ad takes Google over a week, in my experience. Baffling.
All this with their adwords $100 free trial. All that trial did was convince me that I should never ever in the life of the universe commit any money to Google, because they made it starkly apparent that I would never get what I paid for... running honest ads for honest products in a reasonable timeframe. I went with other ad networks in the end and had zero trouble whatsoever, and infinitely faster approval times. I suppose I may have had a smaller audience, but the headaches Google causes aren't worth the extra money.
Someone is going to come along and pull the rug out from under Google eventually. You can't rest on your laurels forever.
This customer is complaining about a service component which was deprecated almost 11 months ago. There is a tool which migrates application data from the old datastore to the new one. When you don't move off of deprecated infrastructure, I'd say you've set yourself up for problems.
11 months huh? That's barely anything for larger enterprise customers, about long enough to make it onto a project plan. Most enterprise software companies will provide support for 5-8 years.
There are tools that can migrate you from python 2 to python 3, or from Oracle to postgres. But it's not something you do lightly. Switching from M/S to HRD in AppEngine is similarly not something you do lightly.
Because we're paying customers? My bill is peanuts, but I know there are many big customers, e.g. Khan Academy. And they also offer Premier Support, which is $500/mo.
"You just have to pay for support and you get support? I don't believe it, there must be more to it than that!"
Whether you believe it or not, a Premier account is what you need if you want support. You could argue that $500/mo is too expensive, but it is what it is.
I've paid over $50,000 a year for Google Maps. I assure you, their support sucks no matter how much you pay them. We pay a small fraction of that for AWS, and Amazon's support is infinitely better.
The potential bad press Google would get if Khan Academy had daily issues due to bad GAE performance would, I'm sure, cost them much more than the $500/month that you're paying. Being big and nice (or cool) gives you an edge here, I think.
There is support if you pay for it. In general, whenever I file a ticket I get a response within hours. That's at least as good as (if not better than) working with any other vendor, in my experience. So... where's this zero support you're talking about?
A GAE user sees a problem of his service being slow, writes a frantic bug report with caps and exclamation marks and threatens to leave GAE. As a GAE user myself, two questions come to mind:
1. Is GAE outside of their 0.9995 SLA uptime? If they aren't, then it probably isn't important enough to spend time looking into it. Customers cannot expect better than the agreed-upon uptime percentage, and hosting companies are obligated to reimburse customers if they go below the SLA. Both of these are covered in the SLA doc.
2. Is it reproducible? So far, the bug report mentions 2 people out of all GAE users. Is 2 people enough to say it's a problem with GAE? One person is panicked, and the other provides few details for the bug report.
1. A 0.9995 SLA allows only about 20 minutes of downtime a month (rough arithmetic below). Since it's a daily event, I'm guessing that yes, the SLA is violated.
2. It's a problem that is occurring daily, with a test case that has pretty much no code at all. That in itself does not prove anything, but it really makes me wonder how it could be a problem on the user's side.
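Rough arithmetic behind that first point (assuming a 30-day month):

    sla = 0.9995
    minutes_per_month = 30 * 24 * 60              # 43,200
    allowed_downtime = (1 - sla) * minutes_per_month
    print(allowed_downtime)                        # ~21.6 minutes/month, ~43 s/day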
Having never used GAE, it would be nice if someone could expand M/S and HRD for me.
It looks like the OP of the bug report is using a deprecated feature which, according to the Project Member, is causing latency issues at a specific time daily. But that can't be the whole issue, since another commenter who is using the new HRD is having the same problem. It is frustrating even for people who are just reading this. All it implies is a lack of communication from Google when something goes awry. Come on Google, stop reinforcing my stereotypes about your customer support!
Selling to a consumer is different from selling to a business: you may have a great product at a great price, but if you offer terrible CS, in the B2B world everyone is going to avoid you. It is a place where support is valued more than the product itself.
Therefore, unless you start offering decent CS, you can lower your price all you want; I will be sticking with AWS.
The rates are the same ($1 per million writes and $0.70 per million reads beyond the daily free threshold), but the daily free threshold is 0.05 million of each for master/slave, and 0.01 million of each for high-replication.
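To make that concrete, a rough worked example (the daily write volume is made up):

    writes_per_day = 50_000                   # exactly the old M/S free threshold
    free_ms, free_hrd = 50_000, 10_000
    price_per_million_writes = 1.00           # dollars, as quoted above

    cost_ms  = max(0, writes_per_day - free_ms)  / 1e6 * price_per_million_writes
    cost_hrd = max(0, writes_per_day - free_hrd) / 1e6 * price_per_million_writes
    print(cost_ms, cost_hrd)                  # $0.00/day on M/S vs $0.04/day on HRD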
For small applications it costs more because the thresholds for free services are lower. In our case it costs a lot more, since some of the things we were doing need another instance on HRD that we didn't need on M/S.
I manage the 3rd line support of some of the busiest websites in the world (we provide back-end e-commerce software).
I can't say I think much of Google's response here. Nearly two weeks before the first comment, and then it's shut down after 2 days with a question directed at who knows who, and no explanation?
The analysis elsewhere on here suggests they're violating the SLA, so this should get more attention. I'm guessing support is under-resourced at Google, and the culture of support is a bit shabby (no acknowledgement of inconvenience, nor any indication or evidence of work undertaken in the background) - hardly surprising for a large-scale software business based on free services.
I'm sorry, but this is the price you pay for running a business that depends TOTALLY on a 3rd-party service. Forget Google; everyone out there is most likely the same. That's why it's important for you to run your 'apps' on something you have control over - like Linode, AWS, Rackspace, OpenShift, etc. - and also to have backup nodes from other providers for redundancy, for emergency situations, in case of storms, etc.
I would recommend trying your apps on OpenStack (OpenShift in particular), which doesn't have the vendor lock-in you face right now.
To their credit, people are apparently using something that's been deprecated and should be changed regardless. At least, that was their conclusion when it was changed to wontfix. The replies are very rare and curt though; I can't really say it's quality service when you're paying for a product.
Customer support from Google has always been like this as far as I've experienced and heard. There is no way to actually reach and converse with anyone, regardless of whether you are paying them for the service or what kind of request it is.
Once a Google employee randomly replied to a complaint of mine about Google+ (I didn't even +mention them). After a few comments and him confirming that it was added to the bugs list, I asked if it was okay to +mention him in the future with similar issues. It was okay. I did. He never showed his face again. (His profile still says "Works at Google+".)
Another Google employee I know online also never replies to anything concerning Google. I know he works on the Google+ project, but I can only hope he passes on any bugs I +mentioned him in.
For YouTube, you can post in their forums but can merely hope for a reply. Copyright complaint disputes are no priority, either.
I haven't used many paid products, but I have read that their customer support is among the very worst, and I have also never been able to find a single e-mail address or phone number for support for any service.
Edit: By the way, I would have moved away from Google App Engine a long time ago if my app went down every morning during rush hour for 10 days straight.
It is interesting that basically no one (including the news poster) noticed that there is a comment (#12) which states that this problem happens on HRD too.
This statement may be false and/or a completely different issue, but at least it should be considered here for HN comments which state "M/S is deprecated, Google is right, just use HRD."
A bug tracker seems like a horrible way to report production (or non-production) support issues. This is the same bug tracker OSS projects on Google Code use.
Is it really helpful for the public to comment on my support request? Seems like the signal to noise ratio would be quite low, and then you get inane comments like:
> I got here from HackerNews, but after seeing the original poster spam the forums in multiple places and have a bad attitude, I can't blame Google for not fixing what looks to me like a non-issue.
>
> Fuck 'em.
You have to believe that the choice of tools has some bearing on the quality of the response from Google. Seems like there is very little incentive for any "Project members" to trawl through open bug reports when no one is ever responsible.
Not surprising... the second most-voted bug in Google Code, reported exactly a year ago ( http://code.google.com/p/support/issues/detail?id=24324 ) deplores the removal of a feature that was already there (the Updates page) and was the single most useful feature in Google Code for many of us. After one year and more than 800 people registering their interest on the issue, they haven't even explained why they removed it or whether there are any plans of bringing it back.
"M/S is deprecated and there is a clear and straightforward path to migrating to HRD."
M/S was deprecated April 4, 2012, so it has been some time since the notice has been out there. The high-replication datastore has been available for over 2 years now. Whether or not less than a year is too short a deprecation period is another issue.
Ok, so here's the deal. If your app runs exclusively on GAE you've essentially tied yourself to one cloud vendor. Now disregarding the respective benefits and drawbacks of google as a hosting company for your app (I would never do that), being dependent on one cloud provider is a very bad idea. No matter if you run on EC2, Azure or GAE, if you can't seamlessly switch to another provider, you're screwed. These all go down regularly and have issues. They're big companies, you're a small company, you have no such thing as "recourse". The court of public opinion will not save your company.
I agree with this to an extent; however, the company I work for deploys on AWS and is far too cautious about vendor lock-in, to the point where we use AWS basically as a VPS, not a cloud service, and get none of the advantages (and all of the disadvantages, e.g. worse performance, higher price).
Many on the thread say the reporters are over-reacting. They are not. What would Amazon do? They would not dismiss this as a non-issue; they would respond in less than 24 hours and take complete responsibility. GAE is a pay service. I think this level of service is pathetic.
As noted, the only attempt at diagnosis is completely wrong (even the reporter is not on M/S) and very late.
A few people (who act obnoxious as hell) report a problem that can be solved by moving away from a deprecated system, yet they fail to even read the note because they're busy smashing exclamation marks into the issue tracker.
The problem apparently also occurs on the non-deprecated system. I can understand their frustration after not getting a reply for X days on what seems to be a critical issue for them. That's not "obnoxious as hell", that's customers panicking. You really don't want your customers panicking about your service.
The two datastores even have the same API. As long as your app doesn't depend on the exact performance characteristics of the old one, the migration is very straightforward. I did it for one of my apps in a morning and was done well before lunch.
The problem with saying things like "Support packages are available", is that time and again we see paying Google customers with support packages being treated awfully.
These are paying customers who are paying a non-trivial amount of money for support (though not the "Premium" support in this case, which is an extra $500 per month for GAE).
We're a paying customer of GAE. I think it's quite clear that paying for the basic service doesn't include support beyond the public issue tracker and the forums. Support packages start at $150 per month, and at that point you get a 4 hour response time. I think that's entirely reasonable. We have yet to sign up for a support level, but then again we're not really seeing any troubles with the service.
Customer support of Google really sucks!
Currently the GAE cloud has a reliability problem (also for new customers).
Instances are restarted like crazy. This leads to downtime. But that's not enough: customers even have to pay for more(!) instance hours because of this.
There is the running gag on the mailing-list: "Whenever GAE is unreliable for weeks Google needed to make revenue targets ;-)"
BTW, this issue is not simply due to M/S; it also happens on HRD. So any Google support apologists here, please read the bug thread submitted by this poor customer before dismissing it simply as a 'migration issue'.
I have had some issues with Google Docs (paid for premier commercial account). Some documents we had stored simply vanished from our account.
After getting the runaround for 3-4 days, finally a Google engineer told us they can't help us recover the documents THEY 'lost' unless we have the URL to the document ...
Thankfully someone on our team had kept the URL when I first shared that document with them (1+ year after the document had been created).
Quick tip for anyone making a system with high load and daily or hourly quotas: When an account is created, assign a random start time (e.g. 05:43 for daily quotas or minute 12 for hourly) to measure that account's quotas against. Then you can avoid this issue of the system getting a huge spike in load when everyone's quota refreshes at the same time.
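A minimal sketch of one way to do that (hypothetical names; the per-account offset is derived deterministically from the account id):

    import hashlib
    from datetime import datetime, timedelta, timezone

    def quota_window_start(account_id: str, now: datetime) -> datetime:
        # Stable per-account offset into the day, 0..86399 seconds.
        offset_s = int(hashlib.sha256(account_id.encode()).hexdigest(), 16) % 86400
        midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
        start = midnight + timedelta(seconds=offset_s)
        # If that offset hasn't been reached yet today, the window began yesterday.
        return start if start <= now else start - timedelta(days=1)

    print(quota_window_start("account-1234", datetime.now(timezone.utc)))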
It happens that 9 AM Brussels time is midnight Pacific time. I'm sure Google is running some maintenance cron at midnight thinking "This is a low-demand time," and it is, across the US, but not in Brussels. These are old instances, and Google probably doesn't want to re-time or rewrite the cron job to be more efficient.
"sentimentally is a tool that determines sentiment of your emails. Once determined, it helps you gauge your relationships with co-workers, customers, friends, or other individuals based on the tone of your conversations with these people."
Because they did not post any kind of evidence (request logs, Pingdom report, etc.), not to mention the App ID in question (so that Google would know where to look). All too often, bug reports end up being some kind of misunderstanding.
I am also a GAE user; I have had no problems like the OP's. But I'm starting to miss a fundamental feature: sockets. I have worked around it by using other services and polling.
Maybe this is the wrong forum, but are there any infrastructure templates for setting up a scalable web/db/load-balancer/memcached stack for a simple traditional web service, in my case a game?
I want to be able to sleep at night, and easily scale up by adding some more machines in case of higher load.
I could use denormalized MySQL/Postgres or MongoDB for speed. Preferred language is Python (or maybe C# or Java).
Depending on your budget (isn't it always..?), speak to Rightscale - they provide a set of frameworks to deploy infrastructure to various cloud platforms, and can handle auto-scaling and all that stuff.
Response from us was initially muted because it looked like it only affected M/S apps, but it turns out (a) it can impact HRD as well, and (b) we're pretty unhappy about the level of impact for many M/S apps, so we're looking at ways to resolve it. It's a high priority and we're looking at a number of ways to address it. It's also a pretty interesting issue, because indirectly it's caused by (a) the large scale that App Engine is running at, and (b) the large extent to which GAE is running free applications.
Regardless, apologies to those who felt support was unresponsive. We are working very hard to improve support. For the sophisticated audience that comes to these pages, please link to me on Google+ to get my attention if we are failing you (https://plus.sandbox.google.com/110401818717224273095).
I've heard a lot of people saying this is why Google can't get a lot of businesses to sign on. There's no one for the CEO to call and complain to directly when their stuff is down.