DeadlineExceededError is a really stupid problem in App Engine and the simplest fix would be to just kill the instance after such an error and start a new one. Google doesn't want to do that because it wastes resources. So, the current solution is to write Python modules which, when they're imported, will work correctly even if the execution stops in the middle of the module and then is restarted from the beginning (i.e., when a new request is received).
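To make "works correctly even if execution stops in the middle and restarts from the beginning" concrete, here is a minimal sketch in plain Python 3. It does not use App Engine at all. `SimulatedDeadlineExceeded` is a hypothetical stand-in for the real error, and the patched `save` function is a made-up example. The sketch shows why a side effect such as a monkey-patch can break when a module body runs twice. If the first run is cut off after the patch but before the module finishes, the re-run patches again and stacks a second wrapper. Guarding the patch with a marker attribute makes the re-run harmless.

```python
import types


class SimulatedDeadlineExceeded(Exception):
    """Hypothetical stand-in for App Engine's DeadlineExceededError."""


def original_save(obj):
    return "saved %s" % obj


def make_target():
    # A throwaway namespace standing in for a third-party module we patch.
    return types.SimpleNamespace(save=original_save)


def patch_naively(ns):
    # Wraps whatever is there now, so running it twice wraps twice.
    current = ns.save
    ns.save = lambda obj: "[logged] " + current(obj)


_MARKER = "_already_patched"


def patch_once(ns):
    # Idempotent: skips the patch if an earlier run already applied it.
    current = ns.save
    if getattr(current, _MARKER, False):
        return

    def wrapped(obj):
        return "[logged] " + current(obj)

    setattr(wrapped, _MARKER, True)
    ns.save = wrapped


def run_init(ns, patch, interrupt):
    """Mimic a module body that can be cut off and re-executed from the top."""
    patch(ns)
    if interrupt:
        raise SimulatedDeadlineExceeded("request hit its deadline mid-import")
    # ... the rest of the module's initialization would go here ...


def import_twice(patch):
    ns = make_target()
    try:
        run_init(ns, patch, interrupt=True)   # first request: cut off midway
    except SimulatedDeadlineExceeded:
        pass
    run_init(ns, patch, interrupt=False)      # next request: runs from the top
    return ns.save("x")


naive_result = import_twice(patch_naively)
guarded_result = import_twice(patch_once)
print(naive_result)    # [logged] [logged] saved x
print(guarded_result)  # [logged] saved x
```

The same rule applies to any module-level side effect, such as registering signal handlers or appending to global lists. Each one has to check whether it already happened, because a partially executed earlier import may have left it in place.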
It's possible that with the upcoming pre-warming requests this bug will disappear in practice because your whole project gets a chance to import and initialize all modules before the first request is received. If these pre-warming requests don't have a 30s timeout your instance will always have enough time to finish the initialization in a non-broken state.
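A rough sketch of what a pre-warming handler could look like, written as a plain WSGI callable. It assumes the warmup request arrives at `/_ah/warmup`, which is the path App Engine uses for warmup requests once they are enabled via `inbound_services` in `app.yaml`. The module list is hypothetical; stdlib modules are used only so the sketch runs anywhere. The handler imports everything eagerly so a later user-facing request finds the modules already in `sys.modules` instead of initializing them against its own deadline.

```python
import importlib

# Hypothetical list of modules your app needs; replace with your own.
MODULES_TO_PRELOAD = ["json", "decimal"]

# Assumed warmup path (App Engine routes warmup requests here).
WARMUP_PATH = "/_ah/warmup"


def preload_modules(names):
    """Import each module so later requests find it in sys.modules."""
    loaded = []
    for name in names:
        importlib.import_module(name)
        loaded.append(name)
    return loaded


def application(environ, start_response):
    path = environ.get("PATH_INFO", "")
    if path == WARMUP_PATH:
        loaded = preload_modules(MODULES_TO_PRELOAD)
        body = ("warmed up: " + ", ".join(loaded)).encode("utf-8")
    else:
        body = b"hello"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

This only helps if the warmup request itself gets enough time to finish. If it is cut off midway, the modules still need to be restart-safe in the sense described above.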
Anyway, it's your own mistake if you stay with the Django helper. The bug can be worked around. It's fixed in Django-nonrel. I still see people who try to use the helper for mission-critical projects. Don't do that. The helper is buggy. It uses monkey-patches all over the place. This makes it extremely vulnerable to bugs caused by DeadlineExceededError. Just use Django-nonrel, even if you only want to use App Engine's models.
I don't have anything running on App Engine, in large part because Google is not a service provider I can trust. There's a really tough dichotomy between selling their magic scaling beans and the fact that anyone who needs said scaling beans also needs real support—something which is simply not in Google's DNA.
I'm surprised that Google isn't just selling a raw, white-box cloud-services layer (like AWS) that other companies can then resell with convenience and support on top.
AWS grew out of Amazon's internal systems. They productized what they used every day because they thought it would probably be useful to others.
Google doesn't offer anything like a virtual server internally. Everything is built on their various abstraction layers. They are productizing what they use every day because they thought it would probably be useful to others.
Agreed. I work at a company in a similar space to App Engine, and we see this a lot: people are astounded that we reply to them quickly (often in minutes), on anything from "I can't find feature X" to "how do I do Y in SQL?"
It seems to me like support is a huge part of the value when you're selling a complex service like this to businesses.
I had a very similar experience. Unfortunately, it wasn't feasible for me to migrate the site to another vendor, so we just had to sit tight. It almost ruined the launch, which had a hard deadline. Fortunately, we too had an understanding client, but being hamstrung was no fun at all.
The issue of the App Engine status page not reflecting reality is a very real one - we had contact with a Google engineer who told me that "this is affecting all our stuff" and I know that it went on for several days, but this wasn't reflected in the status chart; the severity of the issue was definitely not conveyed well enough.
I like the promise of App Engine, particularly the "instant" scalability which is great for sites that you know are going to get hammered as soon as they go live. But it really highlights the risk you take when you allow yourself to be locked into such an all-encompassing platform.
I'm reluctant to give up too much detail, but our infrastructure is very, very scalable and secure. It's pretty solid, and allows for very fine-grained control over many instances of any running application. Load balancing is taken care of, as well as redundancy.
Once we go public, it's possible we'll be more open about the architecture, but right now we're still iterating pretty rapidly, trying to make the user experience dead-simple without sacrificing stability or scalability.
I suspect the django + appengine combination. Personally, I've had good luck sticking with tornado + appengine. Tornado is pretty raw, and at least it keeps the option open to port the code elsewhere if I have to. The datastore and DNS still perform poorly, though.
I'm looking forward to the current crop of ongoing errors being resolved. I strongly prefer App Engine for a variety of reasons (built in versioning, pay only for what you use, easy deployment).
Right now I'm just happy that some of the features that would have caused me to put all my eggs in the app engine basket weren't available, because if they had been I'd have felt a lot more pain from the outages.
Has google posted anywhere on these issues and what they are doing to resolve them? I'm working on a site for App Engine and this has me a bit concerned.
There's a good amount of fear mongering going on here. I've been using App Engine+Python exclusively for side projects since its release and never run into these problems.
It's because a little VPS won't give you a fail-safe system. For a SaaS startup you'll need to have several machines in a redundant cluster and ideally you'll also have machines in at least two data centers. The people who use App Engine for their business all have the same dream: Build the app, click deploy, and have Google handle all the annoying server stuff. Obviously, App Engine isn't quite there, yet. It's not fast enough, it's not stable enough, and it has a limited feature set. Clearly, we're early adopters. However, App Engine is going in the right direction and once the problems are gone people will ask "Why not just use App Engine?" (or whatever other PaaS) instead of "Why not just use a VPS?".
Because I'm feeling pedantic today: you meant to say "a fault-tolerant system". A fail-safe system is one which fails in a safe way. An example would be a CPU that shuts itself down if it's overheating, rather than risk permanent damage. If anything, a fail-safe system may be more prone to failure because of its cautious approach to dealing with faults.
There are some pretty cool examples on the Wikipedia page:
GAE offers massive scalability at relatively little cost, at least theoretically, but you lose flexibility. (For instance, you're limited to a very specific database backend and so on.) A cheap VPS, on the other hand, gives you free rein. But scaling isn't trivial: you have to take care of it yourself.
Learn and fiddle around on a VPS, deploy on GAE or EC2 or whatever cloud computing service suits your needs.
Alternatively, Amazon will be offering a "free usage" tier (starting November 1) that has pretty generous usage limits to help you get a project off the ground. If it outgrows those limits, you're already running on reasonable infrastructure and can move into the next paid AWS tier.
How easy or hard is it to move your code from AppEngine to something else? I avoided it because there isn't enough support for Django. And I love Django!
Did you try Django-nonrel? It supports a lot of Django features out of the box (even the admin). Also, it makes it a lot easier to move somewhere else. You might not be able to switch to SQL, but you can at least switch to MongoDB or some other NoSQL database.
Which version of Django is Django-nonrel? Did they only change the backend? I've browsed their site a bit but they don't answer the important questions...
It's based on the latest trunk code. The only changes are in the ORM layer and they're pretty small (maybe less than 100 lines). Which other questions do you want to see answered on the website?
Actually, this exact response tells me 95% of what I need to know. Another good thing to know would be how easy it is to use existing apps with it (provided they don't use JOINs, of course), but it looks like I need to give it a try, thanks!
If they don't use JOINs or aggregates your apps might just run unmodified. In some cases you'll need extra index definitions, either for djangoappengine or django-dbindexer, but you can keep those separate from your app, so you don't need to fork the code. BTW, I've updated the docs and added a short note that should answer your original questions.