Well, the 1000 item limit on offset wasn't introduced recently. It was always there, and it was always documented. The real issue is that no matter what you do, a query returns at most 1000 items; offset just works "locally" within those items. There have been workarounds floating around the net; I remember one that used filtering on timestamps.
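For what it's worth, the timestamp workaround I remember looked roughly like the sketch below. The Entry model and its created property are names I made up, and note the caveat: entities sharing the exact same timestamp can be skipped by the strict > filter.

    from google.appengine.ext import db

    class Entry(db.Model):
        created = db.DateTimeProperty(auto_now_add=True)

    def iterate_all_entries(batch_size=500):
        """Walk the whole Entry table in batches, ordering and filtering on
        'created' instead of using offset, so no single query ever needs
        more than batch_size results."""
        last_seen = None
        while True:
            query = Entry.all().order('created')
            if last_seen is not None:
                query.filter('created >', last_seen)
            batch = query.fetch(batch_size)
            if not batch:
                break
            for entry in batch:
                yield entry
            last_seen = batch[-1].created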
What is not documented is the 1MB limit on memcached entries and also on HTTP response size (I found out only because of exceptions). And there is a 5 second limit on urlfetch, with undocumented, cryptic "DownloadError: ApplicationError" + various numbers exceptions.
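In case it saves someone the surprise, about all you can do is wrap the call and treat the exception as an expected failure. A minimal sketch (fetch_with_guard is just a name I made up):

    from google.appengine.api import urlfetch

    def fetch_with_guard(url):
        """Fetch a URL, treating slow or failed fetches as an expected
        condition instead of letting the cryptic exception bubble up."""
        try:
            result = urlfetch.fetch(url)
            return result.content
        except urlfetch.DownloadError:
            # Raised on timeouts and other fetch failures; the message is
            # the unhelpful "ApplicationError: N" mentioned above.
            return None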
What is tricky is that many limits are enforced only on the production servers, so you don't discover them on the development one.
On the plus side, the limit on total memcache size is very generous, which is good because even trivial datastore operations take a lot of CPU cycles.
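That combination points to the obvious pattern: hide the expensive datastore work behind memcache wherever you can. A minimal sketch (cached_query and run_query are names of mine, not API names):

    from google.appengine.api import memcache

    def cached_query(key, run_query, seconds=60):
        """Serve a datastore result from memcache when possible, falling
        back to the (CPU-expensive) query on a miss."""
        result = memcache.get(key)
        if result is None:
            result = run_query()
            # Individual memcache values are capped around 1MB, so this
            # only works for reasonably small result sets.
            memcache.set(key, result, time=seconds)
        return result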
Still, limits can be good. They force you to architect your application in a different, hopefully more scalable, way.
This beta state is taking too long. And there isn't much communication. I dropped out a month ago and consider all those working hours as time wasted.
Don't get me wrong, they are giving a fantastic opportunity to long tail sites with a free service. It's just not what I need. My under-$10-a-month web host gives me 400GB/1TB and Python as CGI for low volumes, clustered and with 99.8% uptime.
Amazon might have downtime but they have been up and running for a while. And have a pricing system. Google needs to catch up fast.
Well written, and I learned a few new things about the limitations of the AppEngine platform. Definitely worth reading if you're considering trying something on AppEngine, so you know ahead of time what some of the less-documented limitations are.
Some of the things the article mentions that I never realized: the 1MB limit on Python variables, and the 1000 limit on offset (which, together with the 1000 limit on results, means you can only ever fetch 2000 items).
Google App Engine has all the makings of a great service, but for the reasons listed in the article, it fails...
* ...the 1MB limit also applies to uploaded files. Early on, I created a game application with Django on GAE but had to trim my /dict/words file down to words with fewer than 9 characters, because the file had to be under 1MB or it wouldn't upload...
* ...really don't mind the 1000 call limit, as you really shouldn't be doing that much DB access in a single transaction... ...but the lack of aggregate functions hurts (the GAE team's suggested "workaround" is an inelegant, non-ACID-compliant hack that looks really dreadful once you consider the next point; a sketch of that pattern follows this list), and so does the inability to run any offline process against the complete data repository (if you go over 1000 records). There have been some rather arcane AJAX hacks to work around this, but to me that's an awfully inelegant solution.
* ...I bumped up against quotas by simply performing ONE DataStore read/write with a 200-300K object pickle/unpickle... ...not the total quota; just those single calls got tagged as CPU-intensive...
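For context, the workaround referenced above is, as far as I know, the sharded-counter pattern: instead of one aggregate, you keep several counter rows and sum them on read. A minimal sketch under that assumption (the CounterShard model and function names are mine, not GAE's):

    import random
    from google.appengine.ext import db

    NUM_SHARDS = 20  # more shards = more write throughput, slower reads

    class CounterShard(db.Model):
        name = db.StringProperty(required=True)
        count = db.IntegerProperty(default=0)

    def increment(name):
        """Bump one randomly chosen shard inside a transaction."""
        index = random.randint(0, NUM_SHARDS - 1)
        shard_key_name = '%s-%d' % (name, index)
        def txn():
            shard = CounterShard.get_by_key_name(shard_key_name)
            if shard is None:
                shard = CounterShard(key_name=shard_key_name, name=name)
            shard.count += 1
            shard.put()
        db.run_in_transaction(txn)

    def get_count(name):
        """Sum all shards; this read is not transactional across shards,
        which is exactly the non-ACID part complained about above."""
        total = 0
        for shard in CounterShard.all().filter('name =', name):
            total += shard.count
        return total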
This guy doesn't understand that these limitations are not arbitrary; they stem from the architecture of the service. In order for it to scale, Google needs to make sure applications do everything in small chunks. I bet even the background batch tasks, when they come, will require the developer to break the work into small pieces, map-reduce style or otherwise.
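To make the "small chunks" point concrete, here's a trivial illustration in plain Python (nothing GAE-specific; chunks and handle_batch are names I made up): any big job gets cut into fixed-size pieces, each small enough to run inside one short request or task.

    def chunks(items, size):
        """Yield successive fixed-size slices of a work list."""
        for start in range(0, len(items), size):
            yield items[start:start + size]

    def handle_batch(batch):
        """Placeholder for the per-chunk work a single request/task would do."""
        return sum(batch)

    # Each 100-item slice fits comfortably within one request's CPU and
    # time budget; the overall job is just the sequence of slices.
    work = list(range(10000))
    results = [handle_batch(batch) for batch in chunks(work, 100)]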