I am currently writing a Golang client for Elasticsearch that uses the native binary protocol, and I have to say the lack of documentation about it is making the process really painful!
I tried to use the Elasticsearch Thrift plugin, but unfortunately it does not work with versions 1.1 and 1.2.
So basically I have to inspect every byte of every request and response in order to be able to send or parse data.
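For anyone curious, this kind of inspection needs nothing more than a raw TCP connection and a hex dump. A minimal sketch in Go, assuming a node with the transport port on the default localhost:9300 (the probe payload is arbitrary; the point is to look at the raw bytes the server sends back, or to see how it reacts to input it does not expect):

```go
package main

import (
	"encoding/hex"
	"fmt"
	"net"
	"time"
)

func main() {
	// Connect to the transport (native protocol) port, 9300 by default.
	conn, err := net.Dial("tcp", "localhost:9300")
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	// Send an arbitrary probe and dump whatever comes back, byte by byte.
	if _, err := conn.Write([]byte("probe")); err != nil {
		panic(err)
	}
	_ = conn.SetReadDeadline(time.Now().Add(2 * time.Second))

	buf := make([]byte, 512)
	n, _ := conn.Read(buf) // a timeout or a reset here is informative too
	fmt.Print(hex.Dump(buf[:n]))
}
```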
While developing the client I managed several times to crash the Elasticsearch server by sending malformed packets. This also brought me to review the networking part of the Elasticsearch code, and I think it needs a refactoring and a better, deeper, and cleaner usage of Netty.
I hope they will soon sort this out, along with the problems mentioned in the article, since I think that Elasticsearch is really an amazing product!
This brought me to dig deeper into the Elasticsearch code, find out more about its code quality, deal with machine endianness and byte shifting, think about how to structure code in Golang, and overall enjoy the feeling of touching the bare metal again...
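To make the endianness point concrete: the JVM writes multi-byte integers in big-endian (network) order, so a Go client has to decode them that way regardless of the host CPU. A minimal sketch with encoding/binary; the frame layout here is made up for illustration, not the actual Elasticsearch transport framing:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// readHeader decodes a length-prefixed frame header. The field layout is
// hypothetical (illustration only), but the big-endian decoding is the
// real concern: Java writes multi-byte integers in network byte order.
func readHeader(raw []byte) (size uint32, requestID uint64) {
	size = binary.BigEndian.Uint32(raw[0:4])
	requestID = binary.BigEndian.Uint64(raw[4:12])
	return size, requestID
}

func main() {
	frame := []byte{
		0x00, 0x00, 0x00, 0x2a, // size = 42
		0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x07, // requestID = 7
	}
	size, id := readHeader(frame)
	fmt.Printf("size=%d requestID=%d\n", size, id)
}
```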
I guess because theoretically someone is paying you to write applications at a fair clip. Granted, we have no idea what your role is or what the goals of the project are, so we are probably completely wrong about your situation! :)
So if Elasticsearch is a stringified (JSON/HTTP) wrapper around Lucene with simplified setup for the web app crowd, you are making a binary-fied wrapper around it. Why not skip ES altogether, or use the JSON API?
Elasticsearch is a lot more than just a "stringified wrapper around Lucene". Lucene is used for the underlying inverted indexes, the item store, and tokenization/analysis, and that's pretty much it. ES adds clustering, a query DSL, configuration, a data mapping system, "river" functionality, an HTTP API, etc.
The client actually acts as a cluster node and therefore has knowledge of the cluster state, its indexes, and its shards, because it receives notifications from the cluster once it joins.
This makes it possible to execute operations on a specific shard of a specific index on a specific node of the cluster, resulting in better performance than going through the HTTP interface.
It can be used to efficiently store large quantities of data, for instance logs, which can then be visualized with Kibana.
It's just unfortunate that Elasticsearch suffers from the problems mentioned in the article, which I also experience in production, because its series of plugins makes it a good solution for specific use cases.
I'd recommend not using the native binary protocol unless you have proof it makes a substantial difference for your application.
If you need to do bulk work, connection pooling, keep-alive, and batching on the client side over HTTP can easily deliver more throughput than the ES cluster can handle. Users of my library have confirmed this.
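For illustration, here is roughly what I mean in Go: a pooled, keep-alive HTTP client sending one _bulk request that batches several index operations. This is only a sketch, assuming a node on localhost:9200; the action-line metadata (e.g. a _type field) and the accepted content types vary across ES versions:

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"time"
)

func main() {
	// Keep-alive is on by default in net/http, but the idle pool per
	// host is small unless raised explicitly, so raise it for bulk work.
	client := &http.Client{
		Transport: &http.Transport{
			MaxIdleConnsPerHost: 32,
			IdleConnTimeout:     90 * time.Second,
		},
		Timeout: 10 * time.Second,
	}

	// One _bulk request batching two index operations as NDJSON:
	// an action line followed by a source line, each newline-terminated.
	var body bytes.Buffer
	body.WriteString(`{"index":{"_index":"logs"}}` + "\n")
	body.WriteString(`{"message":"first log line"}` + "\n")
	body.WriteString(`{"index":{"_index":"logs"}}` + "\n")
	body.WriteString(`{"message":"second log line"}` + "\n")

	resp, err := client.Post("http://localhost:9200/_bulk",
		"application/x-ndjson", &body)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("bulk status:", resp.Status)
}
```

Share one client like this across goroutines and the connection reuse comes for free; in my experience that, plus batching, is where the throughput gains actually live.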
You could use my library as a guide to the abstract data types, even though I don't use the native protocol.