Announcing Google Cloud Bigtable (googlecloudplatform.blogspot.com)
257 points by suprgeek on May 6, 2015 | hide | past | favorite | 151 comments



I've been quite frustrated with the Google Cloud Platform. Just take a look at their APIs for App Engine, CloudSQL, GCE and the like. It's pathetic compared to their direct competitor, AWS.

Let's compare creating an RDS instance on AWS vs. creating one on CloudSQL.

AWS:

1. Get AccessToken and AccessSecret from IAM

2. pip install boto

3. conn = boto.rds.connect_to_region("us-west-2")

4. db = conn.create_dbinstance("db-master-1", 10, 'db.m1.small', 'root', 'hunter2')

Done!

Google:

1. Get a Client Id and Client Secret

2. pip install google-api-python-client

3. Go through the OAuth Flow and run a server locally to capture the access token

4. Use the discovery API to generate a service object. Good luck finding this in the documentation

5. Use the uninspectable service object to create a CloudSQL instance.

The reason I don't have code for steps 3, 4 and 5 is that I gave up after wasting time trying to figure this out.

My point is that they've gotten into the habit of doing half-assed work, so I have no hope that they've improved this time. There's practically no way to automate this; the only way to use it would be through the horribly slow GUIs that Google provides.

EDIT:

I ended up using the Google Cloud SDK CLI and running the automation with subprocess.check_output(['gcloud', 'sql', 'instances' ... ])
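The workaround above can be sketched like this (a minimal, hypothetical wrapper; the instance name and tier flag are example values, not real resources):

```python
import subprocess

def gcloud_sql_create_cmd(name, tier="D1"):
    """Build the argument list for `gcloud sql instances create`."""
    return ["gcloud", "sql", "instances", "create", name, "--tier", tier]

def create_instance(name, tier="D1"):
    # check_output raises CalledProcessError on a non-zero exit,
    # so a failed gcloud invocation surfaces as a Python exception
    # instead of being silently ignored.
    return subprocess.check_output(gcloud_sql_create_cmd(name, tier))
```

Splitting the command-building from the call also lets you test the automation without touching the Cloud SDK at all.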


You missed the snakes-and-ladders game of following Google's out-of-date documentation.

"You're doing it wrong, we deprecated that last year, I know we haven't yet updated the 4 separate places in which the same thing is documented all in a different context and different way. We really should fix that but we're busy coding. Did you know we're all PhDs at Google?"

What?!?!? You're using the Google console version 3? Why? That feature isn't implemented there. Stupid you. You're MEANT to be using our new Google console version X. Why are you using our old one?

Also you missed the 16 hours you'll spend trying to work out why something isn't working, only to find it's actually been changed or taken out of the Google developer console, without any trace or notice left in the code to say 'we moved/removed the feature that you expected to be here.'

Really, you should ask Google for help on StackOverflow, which is now the official Google channel for ignoring support questions, and where your question will within seconds be downvoted, derided and deleted by the StackOverflow community, saving Google the effort of not reading and ignoring your question about how to resolve the catastrophic failure of your software.

Seriously though, why not entrust your critical systems to such capable hands?


I signed up just to reply to this post and say THIS is exactly how our development team (which writes in Go) feels. This post could not be more accurate in terms of trying to deal with the outdated documentation.

#1 The documentation is often months and many versions old.

#2 The fact that App Engine instances have to connect to back-end Compute Engine instances using a public IP is unbelievable. The suggested solution for an App Engine instance to communicate with your Compute Engine app is to use their Pub/Sub API system. To get the Pub/Sub API system to work, see #1: the examples are months outdated, and you no longer use appengine.Context to create your OAuth token but instead context.Context, which you learn by going through GitHub issue comments.

Google needs to recognize this great deficiency in documentation if it wants to be considered in the same tier as AWS.


(Google DevRel here)

We recognize that there's out-of-date stuff in there :\ This is obviously on us, but if you see something wrong, please use that little 'send feedback' link in the upper right corner, as demonstrated in this gifly: http://gifly.com/3SA5/

I just set up an alert for feedback with a table flip in the description, so those go right to my inbox.


You should remove the feedback link and instead have hard requirements in your release process for all documentation to be perfect before any code is released.

"Oh the documentation is out of date, why didn't you tell us?" illustrates the problem with Google exactly.


It looks like my previous reply vanished into the ether. Lemme try again.

I don't intend to shift any blame here. The feedback link isn't a solution. I know you can't un-see docs and reclaim your lost time.

But, there are a lot of words in there for us to fix, and that feedback does help us prioritize :)


It's the development process that Google needs to fix. Google's development process does not require accurate documentation prior to release.


Understood.

The proof of the pudding is in the eating, right? The only way I can respond to that is by delivering good stuff. It will take time for that evidence to accumulate.

For today I just wanted to communicate that we are paying attention :)


I've recently applied for a few documentation/tech writer positions at google. I'll gladly help with improving your documentation and I live about 1 mile from the googleplex.


FYI, outside of Google Cloud, it's also the things that Google doesn't document that are frustrating.

An example: Chrome on Android goes beyond the spec by adding limitations on when an Audio element will play, requiring an explicit user action. (https://code.google.com/p/chromium/issues/detail?id=178297)

In that example, the limitation (along with other Chrome limitations, such as all the features that behave differently on HTTPS sites) isn't documented anywhere. Developers have to try it on Android, notice that something is amiss, search for what is going wrong, eventually figure out that Google have made an intentional limitation, and then work around it.

I'm not a fan of Google going beyond limitations that are necessary for security when it comes to Chrome. That said, if it's going to happen, it would be nice to at least save folk some time by documenting it properly.


Great to hear they go straight to your inbox. Usually I have the feeling the stuff that is submitted through the 'Feedback' buttons is never read.


(Googler here)

In fact, every piece of feedback goes directly to our bug tracking, and a human being looks at every single one. That's not an excuse, and doesn't mean that we can't do better, but we track it all. The biggest problem is the conversion from item in the bug list to actual changes, and we're working on that!


Just to clarify, they all go to someone's inbox. Only the table flip ones go to mine.


That's clever with the table flip and all; it gives you a unique way to get just the responses from this. It's great, seriously, that you're using the tools you have to try to make this situation better.

I bet it's frustrating, though, that you can't do more to address the real problems, like lack of (apparent) incentive to keep this stuff up to date. You have to resort to clever sort-of hacks like "put this magic string in your request so someone will read it". A solution which, I might add, sounds a lot like the problems people are talking about when they talk about inscrutability.


The table flip is more about whimsy than effectiveness. :) Given many options, I usually pick the one that's most fun.

Honestly, I plan to read all of the feedback. I worked on the Firebase docs, and I've seen how effective a tight feedback loop can be for improving polish.


In case you're reading on iOS and see only a static image (webm doesn't play, even in Chrome): http://i.gifly.com//media_gifly/3/S/A/5/b/3SA5.mp4


It's a shame about the tone of this post. I think it might not suit HN. As a Google Cloud customer, though, I think you do manage to be spot on about a lot of the things Google are doing wrong when it comes to the Cloud Platform.

Today I came across an App Engine SDK bug that was reported almost 5 years ago, has a several years old patch in the comments, and still hasn't been remedied by Google. App Engine users who encounter it have to find the bug, apply the patch to the SDK, and repeat that upon each SDK update.


Could you send me a link to the issue? I can hopefully get that fixed.


Not OP but here's another example: https://code.google.com/p/googleappengine/issues/detail?id=6...

The patch is trivial


urlfetch is the worst, at least MVM will be getting rid of that. I would just use the socket api instead of urlfetch, like I do here: https://code.google.com/p/googleappengine/issues/detail?id=6...


Wow. That's a classic example of the lack of investment into App Engine. Filed 7 years ago, acknowledged, and still open with no one assigned to it.


It's true, if Google wants GCP to be good, they will have to use it themselves. I definitely don't trust public bug reports to fix issues. At least if you have a support plan they are good at responding and sometimes fixing things. There's no way anyone in Google uses URLFetch, which explains why it's so bad.



We started StaffJoy infrastructure on GCE. The per-minute billing had the potential to be a huge cost-saver with the optimization software that we use for scheduling, because doubling the number of CPU cores (at twice the hourly cost) halved the calculation time, resulting in no overall cost increase but faster results.
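The billing math here is worth spelling out; a quick sketch with an illustrative (made-up) hourly rate:

```python
def job_cost(rate_per_core_hour, cores, minutes):
    """Cost of a batch job under per-minute billing."""
    return rate_per_core_hour * cores * (minutes / 60.0)

# Double the cores at twice the total hourly cost, halve the runtime:
base = job_cost(0.05, cores=8, minutes=60)      # $0.40
doubled = job_cost(0.05, cores=16, minutes=30)  # $0.40 -- same cost, half the wait
```

Under per-hour billing (rounding the 30-minute run up to a full hour), the doubled run would cost twice as much, which is why per-minute granularity mattered for this workload.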

I switched from GCE to AWS when we had a build break because Google released a new command line utility for deploys, immediately yanked the previous CLI causing it to 404, and provided no way of pulling the "latest" CLI. (We needed to reinstall the CLI for each build because we used CircleCI). It was part of a long string of frustrations of getting basic workflows to work.


It looks like https://circleci.com/docs/deploy-google-app-engine explicitly hard-coded an old version of the appengine SDK. And by old, I don't mean not-quite-the-latest, I mean quite old. When it was yanked, it was not the previous CLI, it was the CLI from a deprecation-policy-timelimit ago.


That's just sample code to demonstrate how one would install appengine on Circle. The URL in the example is a placeholder for the one you would get from https://cloud.google.com/appengine/downloads.


Also, it seems Google's PageRank sometimes favors old documentation over the most relevant and current version. This is natural because there are more links pointing to the old one, but it needs to be fixed.


(Google Cloud Support here)

The across-the-board guidance to go to StackOverflow is not working, and we get that. It's not just that system administration questions get downvoted on StackOverflow (which I think is the parent's point) but that StackExchange isn't good for general discussion.

We (support) are trying to be much more clear that free support is not "StackOverflow or nothing." We recently updated our "community" support page at https://support.google.com/cloud/answer/3466163 to lay out all of the community support options in one place.

Bottom line: everything listed on that page has Googlers actively participating. This includes groups, StackExchange, and issue trackers. There's room for plenty of improvement; the intent is not to ignore you.


Google should have a central support forum and try to focus all the questions into one place.

StackOverflow should not be on your support list at all, for the reasons raised. Having said that, Google should still be on SO answering questions that end up there.

What Google needs to understand is that the long-standing public reputation it has built is of an organisation that actively tries to avoid providing support - ever since Google started, it has tried to avoid support. That is now the reputation that Google carries into its efforts to woo the developer community.

Google has to be extraordinarily good at developer support to dispel the baseline assumption that developers have that Google really (genuinely) wants to avoid dealing with support questions.

There's a level of cluelessness to Google's support strategy that is concerning. Why, for goodness sake, would it EVER look like a good idea to push support to StackOverflow? Who is doing the thinking behind that sort of decision? It is self evident that Google's support interests and StackOverflow are not the same thing. It's the sort of decision made without really considering the detail, and that is the point about Google's support - it's an afterthought. Kind of like washing the dishes after dinner is eaten - has to be done but we're not enthused about it.

And in the end, lack of support is a showstopper for using a cloud computing platform. If the support looks sketchy then it just isn't worth risking your business by using that platform.


You make great points. I'm new and can't really speak to your rhetorical question of "what were they thinking?!?" but I do hope for my own job security that support isn't an afterthought :-)

First, I should point out that what we are talking about here is Google's "Bronze Support." This is similar to Amazon's "Basic Support." In both cases it's not what you should be thinking about if you need a case response time SLA, or the ability to wake up engineers at midnight. If your business depends on any platform provider I really hope you buy a support plan which gets you the ability to talk to support and engineering whenever you need. Google definitely offers these. They start at $150 per month. (https://cloud.google.com/support/) End plug.

On StackOverflow: let's stop calling it "support." It's a Q&A site with good SEO. If you have a question which "belongs" there it's a fine place to ask. We're moderating our "go to StackOverflow no matter what" messaging, but I can't see tossing it completely.

Anyway, when it comes to free support, I partially agree with your point about a single forum, in that it can be confusing. Again, keep an eye on https://support.google.com/cloud/answer/3466163?hl=en&ref_to...

To your point about a single forum for everything, I respectfully disagree. Free-form discussion is different from bug tracking. Highly structured Q&A has a place too. It seems to me that every product should have:

- a discussion forum where users can discuss the product and raise issues

- an issue tracker to collect bug reports and feature requests

- a designated place for Q&A

On each of those three, support staff and (preferably) engineers should participate daily. I think the situation with Compute Engine is closest to our ideal right now: it has a lively Google Group, an actively triaged issue tracker, and a sponsored tag on ServerFault (which I hope we can agree is a better destination than StackOverflow).


While StackOverflow has a number of flaws, it is pretty good for finding things that I'm looking for. I've definitely found posts from the BigQuery team on there answering my BigQuery questions.

As for Google's support plan thing, it might be a little odd that it costs $150 per month, but if you're spending any significant money on GCP, it's well worth it. The support response times even at the lowest level are pretty good, and they sometimes fix bugs.

Of course, I think if Google employees use GCP internally, it will improve at a much faster rate.


> If your business depends on any platform provider I really hope you buy a support plan which gets you the ability to talk to support and engineering whenever you need

You can get support in 10 minutes on Linode, without any payments for support. Linode can afford it but Google can't, or doesn't care.


Stack Overflow is the Soup Nazi of technical Q&A. It's where mods compete for who can be the first to find a way to prevent you from getting an answer. ("Repeat: someone asked a question about a similar topic a couple of years ago with two contradictory answers that were both wrong, so we will not only NOT answer you, but we will prevent anyone ELSE from answering you, either! You're welcome.")

If Google doesn't want support questions getting through, forcing us to submit them through Stack Overflow should do the trick nicely.


Link to StackOverflow is equal to "RTFM". With same success you could just put some search engine input box.


Google Cloud Developer Relations here :)

Thanks for the feedback and stuff. I empathize with your frustration. We're listening to threads like these, and workin' hard to make stuff better.

Also, next time you're frustrated with Google Cloudy stuff, feel free to vent to me. It's like unfiltered usability studies.


OK, so if you're from Google and you say Google is listening, here's a question I've never heard an answer to: how come Google has such terrible support for Python 3?


This is probably outside of my area of expertise, but I'll see if I can dig up an answer.

I need more context though. Where are you trying to use Python 3 and what problem are you trying to solve with it?


I've never done much Python stuff at all (It's a great language, I just never used it on a major project). I just spent the last hour catching up on the chronicles of Python 2 -> Python 3 (not just at Google, but everywhere).

Sorry, but here's my answer: http://i.imgur.com/tZOS8.gif


I just want to say this comment has 73 upvotes/points.


Disclaimer: I work on gcloud-node [1]. :)

You're describing the OAuth 2.0 client dance, which has always been cumbersome but is necessary for browser-side auth. With Google Cloud Platform services, you should never have to do this if you're using them from the context of your server. As someone previously mentioned, Google has Service Accounts, which greatly simplify the auth process.

For example, to auth with gcloud-node it's as easy as:

  1. Download the key.json associated with your service account.
  2. var gcloud = require('gcloud')({ projectId: 'abc-123', keyFilename: './key.json' });
  3. There is no step 3.

We've also made some pretty nice docs [2] with clear examples to help you every step of the way. We have yet to integrate Cloud Bigtable, but it's on our radar.

For a pythonista like yourself, you may like to consider trying gcloud-python [3], and it should go without saying please feel free to file issues or reach out directly with concerns if something just doesn't feel quite right!

[1]. https://github.com/GoogleCloudPlatform/gcloud-node

[2]. http://googlecloudplatform.github.io/gcloud-node/#/docs/

[3]. https://github.com/GoogleCloudPlatform/gcloud-python

Edit: Formatting.


I'll try this out. Thanks.


I like the gcloud-node docs! I remember the OAuth stuff being a pain just because OAuth is kind of complicated and the terms are weird, and there was no guide that just says "go here and click this"; you had to figure that out yourself.


Well, it might be a bit more difficult to get started. The cool thing is that you only have to install google-api-python-client once, and you can communicate with lots of services by creating the appropriate service object: Cloud Pub/Sub, BigQuery, etc.

Also, Google comes with very good monitoring tools (they bought Stackdriver and integrated it).

We are running a big App Engine app with quite a few Compute Engine instances. Together with Cloud Storage and BigQuery it's a full-blown solution, without having to set up every instance yourself. I think AWS can't compete with that; you almost need a systems engineer when you work with AWS.

All in all: you are used to AWS, which makes getting started with Google Cloud feel cumbersome.

I do agree though that the documentation is often lacking.


I don't think anyone cares about having to install one or any number of client libraries for services being used.

What's more important is the ease, simplicity and clarity of using a service. The reason I like boto is that it's easy to find out what is possible and what is not.


There's also the point about vendor lock-in. In AWS you do need a systems engineer to set up and automate infrastructure, but pretty much everything (aside from a few products like Redshift, Dynamo and EMR, as far as I recall) can be ported out of AWS pretty easily if and when the time comes to have your own data center and cut some costs.

Google Cloud is a much bigger vendor lock-in; you could not set up the same environment outside of it.


I'm genuinely curious why you hold that opinion, especially in the context of a thread that discusses a service that's available through the standard HBase API.


Without support for coprocessors in Bigtable, I would hardly call it a 'standard' API. Also, the implementation of buffered mutation, a key attraction in HBase 1+ for fast non-blocking writes, is different enough in Bigtable that you might have to change application code to make it work consistently. It's nice, though, that the Bigtable API has references to HBase bugs :)

> HBase 1.0.0 has a bug in it. It returns maxSize instead of the buffer size for getWriteBufferSize. https://issues.apache.org/jira/browse/HBASE-13113


Ever tried porting a customer using CloudFormation templates and autoscaling groups over to another cloud service? Or reimplementing S3 or EBS semantics? It can be done, but it's definitely non-trivial... lockin++


I think of the three AWS services you named, only Dynamo can't easily be ported out.

Redshift uses Postgres drivers. EMR uses the HBase API.


Redshift uses Postgres drivers, but does not use Postgres semantics. Important to consider if you're using any nontrivial amount of data, because porting it even to other cstore databases will require a lot of thinking and probably some munging-at-scale.


You are doing it wrong:

  1. curl https://sdk.cloud.google.com | bash  
  2. gcloud auth login (one time thing)  
  3. gcloud sql instances create db-master-1


> 2. gcloud auth login (one time thing)

That's a royal pain in the arse to automate with something like ansible... AWS nailed it with a token & secret (not some horrible expiring oauth2 object).

And that's not to mention gsutil sleeping for a second or two every time it tells you there's an update.

It's like these guys have never heard of devops.

Updates are my problem; I don't expect software to sleep whenever they're available... because you never know, I might be running it in a loop under cron and it might just piss me off when I start to lose performance...


'gcloud auth activate-service-account' will auth using a JWT, acquired from the Cloud Console.

Also, if you are running stuff in Google Compute Engine, no auth flow is needed at all: there is a metadata service that you can connect to (and gcloud connects to by default) that provides credentials associated with the machine you're on.
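For CI-style automation, the non-interactive flow described above looks roughly like this (a hedged sketch; the account name and key path are placeholders, not real credentials):

```python
import subprocess

def activate_cmd(account, key_file):
    """Argument list for non-interactive service-account auth."""
    return ["gcloud", "auth", "activate-service-account", account,
            "--key-file", key_file]

def activate(account, key_file="key.json"):
    # One-time, promptless auth step suitable for a CI box;
    # check_call raises if gcloud exits non-zero.
    subprocess.check_call(activate_cmd(account, key_file))
```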

> It's like these guys have never heard of devops.

:\ yeah...I can't imagine trying to automate a web login flow with ansible...but no one would ever suggest you do that.

(disclosure: I wrote most of the client-side auth code for gcloud)


This was also frustrating on something like CircleCI


This is a bad practice:

  curl * | bash
and should be discouraged. It's easy, but it's like saying "hey, I'm logged in as root all the time - it's easier."


In this case, * is "https://sdk.cloud.google.com", and I don't see how it's worse than trusting a package from PyPI. If anything, the curl command offers some guarantee that you are running code endorsed by Google.


If the HTTP connection is interrupted during download (highly likely if you're doing this routinely) you'll end up with something in a broken state. Locally running a remote stream as it arrives as code is just a bad idea, unless you're talking about something like a webpage where a partial is potentially preferable to nothing at all.


Code which is meant to be piped into bash is generally written as:

    #!/usr/bin/env bash
    f () {
      ...code
      ...code
      ...code
    }
    f

Hence a partial stream will do nothing (a syntax error from the missing brace, to be precise).
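You can convince yourself of this with a small simulation (assumes `bash` is on PATH; the script only echoes, nothing is installed):

```python
import subprocess

# A wrapped script: nothing runs until the final `f` call.
wrapped = "f () {\n  echo step1\n  echo step2\n}\nf\n"
# Simulate the connection dropping mid-download.
truncated = wrapped[:20]

full = subprocess.run(["bash", "-c", wrapped], capture_output=True, text=True)
partial = subprocess.run(["bash", "-c", truncated], capture_output=True, text=True)
# full prints both steps; partial is a syntax error and runs nothing.
```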


This is a completely trivial problem to solve (wrap the logic in a function), which the script in question does solve.


It's no different from steps 2 and 3. At each step you're trusting that code you just downloaded from the web is doing what you want it to do.


You're also trusting that the Internet is going to stay up for the duration of the install script, which is an unreliable assumption. Imagine at some point the script does:

rm -rf ~/.config/google

and the connection gives out at

rm -rf ~/

Suddenly your script didn't install, and you've blown away your home directory. HTTP(S) is designed for reading documents, where it's OK if you can't read the document in its entirety.
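The hazard is easy to see with plain string truncation (nothing here executes anything; it just models the byte stream):

```python
# The destructive command from the example above, cut off mid-stream.
cmd = "rm -rf ~/.config/google"
truncated = cmd[:9]  # only the first 9 bytes arrive
# truncated is now "rm -rf ~/" -- a valid, far more destructive command.
```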


This is easily solved by putting everything in a bash function and calling the function as the last thing in the script. If you look at the Cloud SDK setup script, that is exactly what it does.


I agree it's a trivial problem to solve, I just think the right way is to download the code and store it separately, so that it's easy to add checksum/signature validation later.


And all it really does anyway is

  wget https://dl.google.com/dl/cloudsdk/release/install_google_cloud_sdk.bash
  chmod 775 install_google_cloud_sdk.bash
  ./install_google_cloud_sdk.bash


Only if you don't trust the source (or it's http).

Otherwise you might as well give up on downloading any software from the internet. Ever.


Have you ever run "./configure" on some open source code you downloaded? That's no more safe. In this case, curl | bash is probably more safe because at least it came over HTTPS.


> Have you ever run "./configure" on some open source code you downloaded? That's no more safe.

That is if you don't do a

    view ./configure
first and go through it. You do take a look at the configure scripts, don't you?


What's less readable, a ToS that you have to accept or a GNU autotools configure script?

Example: expat's autoconf script is over twenty two thousand lines long.


You could just as easily do 'curl someurl | less' before you run it, too.


You don't need root to install gcloud


I think that was just an analogy.


(I'm a Google developer on the API Platform.)

A few of the follow-ups mentioned that Service Accounts are more appropriate for this API and closer to the AWS experience. There have been some challenges using service accounts, which have been addressed with a relatively new feature set called "Application Default Credentials": https://developers.google.com/identity/protocols/application...

For most cloud APIs, you simply run "gcloud auth login" to enable auth on the development machine, and when you deploy to GCE or GAE it will use the more secure built-in service accounts there. And there is no token expiration in either case.


Or you can use the service account to avoid dancing around gcloud and others. But I agree Google has a problem building stuff for us common humans who do not work at the big G.


I think this stems from a general lack of empathy for customers in the organisation. Take a look at the issue list on App Engine, for example: https://code.google.com/p/googleappengine/issues/detail?id=1...

It seems like they get a bunch of engineers to come and build stuff then once they move on, the project is abandoned or poorly maintained until another team of engineers come and try to rebuild it from scratch.

Here's more proof: Google Wallet, for example ... https://news.ycombinator.com/item?id=9498475


Let's hope the OAuth token you get back doesn't expire after you think you've finished! And then let's hope it doesn't also take you a day or so to learn oauth2client so you know which secret method you need to call to request a refresh token. Such fun!


It took me more than half a day to figure out that I was doing things wrong even though I've used oauth2client before on other projects.


Interesting point in that you used a third party in step 2 of your AWS workflow. If you're going to use a 3rd party, have you tried Terraform?


Is there something in your comment regarding bigtable?


I think using the HBase API is a very clever move. This means that the HBase API is now supported on AWS (EMR), GCE, VMWare (Serengeti), OpenStack (Sahara), and everywhere (Hadoop, if you're willing to run it yourself).

In comparing against DynamoDB (for example), you'll have to weigh a proprietary single-vendor API against an API with a good open-source implementation (that will get even better with hydrabase), yet that is also available in managed-form on all major clouds.

Edit: although - ouch - the $1500 per month entry price-point does not compare well to DynamoDB's $5 per month minimum.


DynamoDB does not directly correlate to Cloud Bigtable; it would be more akin to Cloud Datastore (which I don't think has a minimum, and which has a free tier).


Where is $1500 per month from? I can't find it on the pricing page.


Cost per node per hour - $0.65; Minimum number of nodes per cluster - 3

0.65 x 3 x 24 x 30 = $1404 / month

And that's before any storage costs.
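Spelled out in code (using the quoted rate and the comment's 30-day month):

```python
node_hour = 0.65           # $ per node per hour, as quoted
min_nodes = 3              # minimum nodes per cluster
hours_per_month = 24 * 30  # assumes a 30-day month

monthly_minimum = node_hour * min_nodes * hours_per_month  # ~$1404
```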


Thanks!


They're pretty explicitly saying that this isn't an entry-level database—which is perfectly OK. For example, in the docs it says that Cloud BigTable is not a good solution for less than 1TB of data. If you're storing more than a terabyte of data and need the benefits that BigTable provides, you're probably already spending at least $1,500/month on your current database.


Pretty impressed with the performance metrics: 6 ms reads/writes at the 99th percentile, compared to Cassandra's 300 ms for reads and 10 ms for writes.


How is it different than Datastore? https://cloud.google.com/datastore/


Datastore is a copy of the Google Megastore service. It has indexes, SQL-like queries, transactions... and you don't need to run servers like you do with Bigtable (you pay for documents and API calls only).


Why would I want to manage instances?


I don't believe you need to manage anything with Bigtable; "instances" is just a concept for describing units of scale.


Well that's up to you to decide :)


From the docs, under "Cloud Bigtable and other storage options"

  If you need to store highly structured objects, or if 
  you require support for ACID transactions and SQL-like
  queries, consider Cloud Datastore.


I thought the same, and found this while digging through the documentation: https://cloud.google.com/bigtable/docs/#title_short_and_othe...

> If you need to store highly structured objects, or if you require support for ACID transactions and SQL-like queries, consider Cloud Datastore.


So their benchmark of Cassandra against BigTable doesn't even match their previous benchmark of Cassandra.

http://googlecloudplatform.blogspot.com/2014/03/cassandra-hi...

How did the latency for Cassandra on their cloud platform increase by 200ms from a year ago?


I wrote last year's benchmark. The clusters are completely different, and so is the workload. Last year's cluster had 300 VMs, which was a much higher price point, and the workload was write-only. This benchmark uses YCSB workloads A and B, which we thought match the usage we'll see on Bigtable. The cluster is much smaller as well. I shared my scripts from last year; it is pretty easy (although a bit expensive) to repro the numbers. Let me check if we can share this year's benchmark scripts as well.


I'm pretty surprised about the difference in latency though; throughput, as you say, will differ due to the number of nodes.

For any given replication factor in Cassandra, the overhead remains pretty much the same irrespective of whether you have 300 or 3 nodes. So should the latency.

On top of that both BigTable and Cassandra use SSTables to store the data on disk (with all the compactiony goodness that goes with them), so I'm even more surprised that the difference in latency is so huge.

Would love to see the scripts for the benchmarks! I don't want to take away from a great product launch and I'm sure BigTable kicks arse in certain areas that Cassandra doesn't... I'm just surprised at the differences in latency.


Without knowing a lot more about their benchmark environment this go around, these bold statements are just about useless. Let's hope further details follow.

Worst case, people are going to benchmark this independently and hopefully do a better job being transparent.


The gentleman who produced these benchmarks replied directly to this thread. He also has been very open with sharing his scripts and setups, so that you can reproduce it yourself. He encourages it actually!


It doesn't look like he actually shared the scripts for this year's benchmarks, unless I am missing something.

That's what I'd be looking for, not so much some basics on the clusters and the workload.


I may have missed something obvious, but can you link the reply? I'm having difficulty finding it with all of the other comments in here.



You must be looking at the median latencies. The 99th-percentile latency was, and still is, > 200ms. You can blame GC jitter for the much bigger variance. They should also show median and 95th-percentile latencies for this year's numbers as well.
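To make the median-vs-tail point concrete: with occasional GC pauses, the 99th percentile can sit far above the median even when almost every request is fast. A quick illustration with made-up latency samples (Python stdlib, nearest-rank percentiles):

```python
import statistics

# Hypothetical latency samples in ms: mostly fast, one GC-pause outlier.
latencies = [4, 5, 5, 6, 6, 7, 7, 8, 9, 250]

def percentile(samples, p):
    """Nearest-rank percentile of a list of samples."""
    ordered = sorted(samples)
    rank = max(1, round(p / 100 * len(ordered)))  # 1-indexed nearest rank
    return ordered[rank - 1]

print(statistics.median(latencies))   # 6.5  -- the "headline" number
print(percentile(latencies, 95))      # 250  -- what the unlucky 5% see
print(percentile(latencies, 99))      # 250
```

Same dataset, wildly different story depending on which number gets reported, which is why showing median, p95, and p99 side by side matters.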


Any information on pricing? I doubt they'd have specific prices ready to announce yet, but it would be good to at least know the DIMENSIONS by which it will be priced (e.g. per read and write, storage, etc?). Will it be accessible to "classic" App Engine front-end instances, or only meant for Compute Engine VM's and "App Engine 2.0" Managed VM's?

The biggest pain point with the current Datastore is how difficult it can be to predict your costs. Also, there are weird quirks in the pricing model (e.g. "writes" used to cost more than "reads", it's more expensive to delete rows than it is to flag them as tombstoned and continue storing them indefinitely, etc). These quirks have left people with a lot of technical debt from having designed around them.

If this is another database option (alongside the Datastore and CloudSQL) for "classic" App Engine apps, which aren't likely to be re-written for Managed VM's, then it might be interesting. However, if it's only for Compute Engine or Managed VM contexts, where you're not locked-in and are free to choose any technologies you want, then at this point I would need to hear some pretty amazing information on the pricing model before I could be bothered to even test it out. Google lock-in is painful... once you've gone through the trouble of breaking free from the App Engine jail, it's really difficult to even consider adding new lock-in dependencies.

EDIT: Doh. You have to click through a couple of links from the original post to find it, but they have indeed posted pricing specifics already.

https://cloud.google.com/bigtable/#pricing

Looks like it's priced by the number of VM nodes you want in your cluster, storage, and network I/O if you're using it from outside Google's datacenters. No metered pricing on "read ops" and "write ops". This model IS a significant improvement over classic Datastore pricing. Unfortunately, it doesn't look like you can use it as a Datastore-replacement on classic App Engine front-end instances... and I'm not sure that I wouldn't just use Cassandra in other contexts where I have complete control.


Cost per node per hour: $0.65. Minimum number of nodes per cluster: 3. SSD storage (GB/mo): $0.17. HDD storage (GB/mo, coming soon): $0.026. Source: https://cloud.google.com/bigtable/
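Back-of-the-envelope on those rates (my own arithmetic, assuming ~730 hours/month; storage and network are on top of the node floor):

```python
NODE_HOUR = 0.65       # $ per node per hour, per the pricing page
MIN_NODES = 3          # minimum cluster size
SSD_GB_MONTH = 0.17    # $ per GB per month (SSD)
HOURS_PER_MONTH = 730  # assumed average month

def monthly_cost(nodes=MIN_NODES, ssd_gb=0):
    """Approximate monthly bill: node-hours plus SSD storage."""
    return nodes * NODE_HOUR * HOURS_PER_MONTH + ssd_gb * SSD_GB_MONTH

print(round(monthly_cost(), 2))             # ~1423.50: the 3-node floor
print(round(monthly_cost(ssd_gb=1000), 2))  # ~1593.50: floor + 1 TB SSD
```

So roughly $1,400/month minimum before you store a byte, which is a very different shape from Datastore's metered per-op pricing.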


"To help get you started quickly, we have assembled a service partner ecosystem to enable a diverse and expanding set of Cloud Bigtable use cases for our customers. "

Any idea how the service partners were chosen?


Is this like a direct competitor to DynamoDB? How about open-source solutions, like Cassandra/HBase?


Bigtable is the original Cassandra/HBase.


The more likely DynamoDB competitor is Datastore (because of the indexes and the way it's priced).


Funny that they call it "open source" just because it supports an open source API.


It would be interesting to see a Cloud Bigtable to BigQuery connector, possibly using Cloud Dataflow.


Currently you can use the BigQuery Hadoop connector and write a MapReduce job to scan Bigtable and write everything to BigQuery. Works quickly. I'm sure dataflow support is in the works, since Google internally doesn't really use MR and therefore likely has this on the back end already.

Source--I wrote one of the whitepapers on the BigTable homepage.


Anyone willing to be dependent on this is honestly stupid when you take into account Google's history in this area: unreliability, changes in offerings, changes in pricing, discontinuation of services, hard lock-in, bad customer service, ...


As the blog post says, Bigtable internally runs virtually all of Google's big services. This means it's rock solid, and it's not about to get discontinued anytime soon.


FYI, in the case of Google Wallet for Digital Goods, Google essentially did what you are saying they wouldn't do.

Google Wallet for Digital Goods was retired a couple of months back, but they still use it on some Google properties. It's still used for the Chrome Web Store developer fee & I think Android, too. They essentially just made it private.

You might argue that BigTable going through the same thing would be too impactful, but GWDG was quite impactful as well. Businesses that had subscribers paying monthly subscriptions via Google Wallet lost those subscribers.


> Bigtable internally runs virtually all of Google's big services

Are you sure? I'm not a Googler but have been told by other Googlers that Bigtable has essentially been replaced internally (though heard it's still similar). So I wasn't sure how much Bigtable is even used anymore inside of Google.



The blog post says "the same database that drives nearly all of Google’s largest applications", and I work at Google, so yes, I'm pretty sure ;)

Of course Google has a whole slew of other storage options optimized for various use cases, but some of these are actually built on top of Bigtable.


Haha okay fair enough. Thanks!


Spanner is the successor to Bigtable. It's not like Google keeps this fact secret.


Actually, Spanner and Bigtable have different and complementary use cases.


You've obviously not tried Google Cloud Platform then.


So just pricing will change, got it :)


You can always move to HBase... Google Bigtable uses the HBase API.


Until the APIs begin to diverge, which will happen.


Right now it's a subset of HBase, but the basic operations are unlikely to change. HBase adheres pretty well to the BigTable paper.

You are intentionally forgoing things like coprocessors and the other advanced features listed here[1]

[1] https://cloud.google.com/bigtable/docs/hbase-differences


My thoughts exactly. I will look at the features, but I won't even consider it for production use in our company.


Google Cloud platform didn't have most of your problems


It had all of them. To be more specific, I'm talking about App Engine.


Sometimes I get surveys from Google about how much I like App Engine. I'm always hesitant about marking them too poorly out of fear that they'll decide to shut it down if the results come back too bad. They need a question along the lines of "would you rather that we fixed the issues preventing App Engine from being an amazing, awesome PAAS solution, or shut it down?"

Ultimately, I do love App Engine. It's just the really poor support from Google that is letting down what could be an awesome platform. Neglected bugs, outdated docs, key limitations (eg. no https on naked domains, which looks like it will hopefully be fixed soon) and such.


All is cool except one thing: it's vendor lock-in, and the vendor is known for absent customer support, frequent API deprecations, and product shutdowns.


does anyone know of a Go client?


https://github.com/google/google-api-go-client is the code-gen Go client for Google APIs.

It probably does not (yet) have the generated client for cloud bigtable checked in (but I'm sure it will), but you can always use it to generate a client. You pass it the API to use on the command line, it will go fetch the docs it needs to make your client, and put its source where you tell it to.


This uses the HBase API. You just connect to it like you would any other HBase cluster, using an HBase client. It's not like e.g., BigQuery or Datastore where you need the API client. You include a JAR and then connect to HBase like normal.


The website claims that you must use their customized version of the Java HBase client library: it does not claim network compatibility, only API compatibility with the Java client (and then describes numerous subtle differences).

> To access Cloud Bigtable, you use a customized version of the Apache HBase 1.0.1 Java client.


Believe it or not, it appears gcloud-golang package already contains go bigtable library :) https://github.com/GoogleCloudPlatform/gcloud-golang/commit/...


    go get google.golang.org/cloud/bigtable


I suppose you should be able to use any Go Hbase client.


No, but there's a Go client for AWS.


Nice, but you still can't use this for your privacy-aware customers.


Care to elaborate on why not?


Depends whether he means "You can't use any SaaS/IaaS datastore for privacy-aware customers" or "Google itself is less trustworthy than (AWS or [insert competitor here])".

The former is a valid position - if a fairly harsh one in the current tech landscape. If the latter however - I'd like to see a more fleshed out explanation as I don't see Google being much of an outlier on this front.


Storing data in a database that is managed by a third-party is something that some customers explicitly forbid.


Wouldn't that mean you can't use any cloud data services with any company? Or even cloud hosting? What kind of customers forbid this?


Are you kidding me? This is a very common requirement and one of the reasons larger institutions are building out their own clouds. It's not a requirement for everyone of course, but it's certainly a common theme.


Many in the financial services or healthcare industry. Some of those industries have requirements far in excess of both SOX and HIPAA combined due to SEC scrutiny.


> What kind of customers forbid this?

Government entities, for instance.



> We look forward to working with governments across the country on these exciting initiatives in the months ahead.

So what about foreign governments?


Heard of encryption?
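For the concrete version of that suggestion: encrypt client-side, so the managed database only ever stores ciphertext and the key never leaves your own machines. A deliberately toy, stdlib-only sketch of the idea (XOR with a random one-time key; a real deployment would use an authenticated cipher like AES-GCM and proper key management, not this):

```python
import secrets

def encrypt(plaintext: bytes):
    # One-time pad: the key must be as long as the message and never reused.
    key = secrets.token_bytes(len(plaintext))
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, key))
    return key, ciphertext

def decrypt(key: bytes, ciphertext: bytes) -> bytes:
    return bytes(c ^ k for c, k in zip(ciphertext, key))

key, stored = encrypt(b"customer row data")  # `stored` goes to the managed DB
assert decrypt(key, stored) == b"customer row data"  # key stays on your side
```

The catch, and why this doesn't fully answer the objection: once values are ciphertext, the provider-side database can no longer index, filter, or scan on their contents, which defeats much of the point of a queryable store like Bigtable.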


No Python 3? Not interested. All that BigTable development, pointless without drivers. Silly Google.


Do you actually know what hadoop & HDFS are?


I am going to read your comment as "you should be able to use the off-the-shelf drivers for HBase for Python" (I have elided the "3" as no one uses Python 3: that must have been a typo for "2" ;P). The "APIs" that Google describes as being compatible with are for Java, not the network: "To access Cloud Bigtable, you use a customized version of the Apache HBase 1.0.1 Java client.". So, no: it seems like if you are not using Java you will need to pull apart their customized Java SDK and build your own driver.



