There are a few people asking how to get something up quickly to play with. This was a problem for me too, and it isn't well explained, but for those interested, Alexander Kiel's datomic-free Docker image has worked well for me - https://hub.docker.com/r/akiel/datomic-free/ - and an example of it being used with Docker Compose is here: https://github.com/dazld/urlomic/blob/cc3d1218218b5f08751dfa... (on a WIP branch for some playground thing). I'm far from a pro, but I hope this helps someone interested in this tech.
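For anyone who wants to try the same route, a minimal docker-compose sketch might look like the following. The port range matches the transactor defaults, but the service name and volume path are assumptions - check the image's README for the environment variables it actually supports before relying on this.

```yaml
# Sketch of a docker-compose service for akiel/datomic-free.
# Volume path and service name are assumptions; consult the image's
# README for its supported environment variables.
version: "3"
services:
  datomic:
    image: akiel/datomic-free
    ports:
      - "4334-4336:4334-4336"   # transactor/storage ports
    volumes:
      - datomic-data:/data      # persist data between restarts
volumes:
  datomic-data:
```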
Datomic Cloud makes it much easier to get started on AWS. It takes only a few minutes to sign up, see https://www.datomic.com/videos.html for a walkthrough.
For sure! Looking forward to becoming a customer. I guess the journey I went through was missing some intermediate steps between playing at the REPL with an in-memory db (which I believe you've mentioned before as a good way to get your hands dirty from zero), having something in dev backed by storage, a toy service, and production.
Jumping into AWS, at least in my work environment, is a big jump requiring lots of supporting infrastructure etc. We’ll get there if it works out, but the dockerized setup is really helping in learning and evaluation.
Officially supported ways of doing this are something I'm sure you've considered before; it would be nice if Cognitect could reconsider.
You don't have to be very far "in" AWS to try Cloud. The bastion server will let you connect from anywhere, so the idea is that the only infrastructure you need is a laptop and an internet connection.
That said, we understand the value of a local development mode and that is possible in the future.
This is a big change in pricing approach. Previously the licensing was $5k/year, and it still is for On-Prem, although you could run Datomic for free for 1 year - and longer, as long as you never upgraded it.
https://www.datomic.com/get-datomic.html
Now they are offering an approach where one can at least start at $1/day (Every time you scale out this goes up? What is the cost of support?).
The cheapest way to get started is probably still to run what is now called "on-prem AWS" (for up to 1 year), for which there is already a CloudFormation template.
https://docs.datomic.com/on-prem/aws.html
Cloud is definitely the preferred way to get started if you are on AWS. When running the On-Prem version you also have to manage (and pay for) storage. Cloud also puts less load on DynamoDB, which can dominate your spend.
I'm not sure it's still true, but a while ago their EULA banned the publication of performance statistics. Be very careful using this, and do your own performance testing.
The Licensee hereby agrees, without the prior written consent of Cognitect, which may be withheld or conditioned at Cognitect’s sole discretion, it will not:... j) publicly display or communicate the results of internal performance testing or other benchmarking or performance evaluation of the Software;
I believe this is a similar style to the Oracle licensing model, which used to prohibit similar things.
Building your stuff on a platform (proprietary) with this attitude towards licensing... I'd say run away... fast.
Performance testing is a fickle art, and publicizing misleading or non-generalizable results can be extremely damaging to a small business like this, even when the results are neither representative of nor applicable to most people.
For a relatively novel data model with potentially unusual performance characteristics, the chance of finding a properly done, fair, and generally understandable benchmark is virtually nil.
A damaged reputation is hard to repair, if even possible.
As a small business, the choice seems to be pretty straightforward--pay multiple salaries to do PR, damage control, reputation management, etc. (and compete with huge corporations with deep pockets)...or pay those same salaries to engineers to build a better product for your customers, and get your customers to promise not to create PR headaches for you.
Tech is a ruthless meritocracy, because people face challenges constantly that exceed their capabilities.
If something sucks, they will shout it to the skies. If it performs better or saves their ass on a key feature, they will shout it to the skies.
Let those shouts echo, and the equilibrium is how we decide on what's next.
Censorship is almost always a bad idea. Furthermore, you shouldn't chill the ability of people to talk about a platform if you eat your own dogfood and believe in it.
That seems like a very negative way to approach benchmarks. It could also be a PR boon--out of nowhere, the company could get positive attention for someone else doing work on the company's behalf. Assuming the technology is performance competitive. (But I don't know if it is or isn't--I've not seen any benchmarks).
A bad benchmark is something everyone can understand and use as criteria to disqualify a technology.
A license that prohibits a public benchmark is something only a few people care about.
Having seen hordes of developers on IRC, Reddit, and HN develop outdated gut reactions that they still invoke years after the fact has led me to believe that benchmark prohibition is easily worth it.
This is absurd. All databases, at some point in time, run into scaling and performance issues. Documentation by blogging and performance metrics are the only guiding light of sanity we have when these hard problems hit your app.
Having used products that bragged about their performance benchmarks (won't name any names), I respect the Datomic team's position on this. Benchmarks lead to gamed results and you still have to test with your workload.
Indeed, I just think the ban on it makes certain things harder to find before you're using it in a production system (with proper production loads). Pagination and sorting, for example, do not exist in the usual sense.
It'd be swell if they put up a list of differences, because using the pull query API + clojure really is quite cool -- it's just good to know what you're getting into.
Our team signed a contract with Cognitect for Datomic late 2017.
This clause was in-place and stood out to me as well. I had a chance to ask their legal team about it. The clause is written in legal-ese, which always sounds overbearing.
I asked the question in the positive sense: "what if we have some really nice metrics from our use cases, and want to talk about them at a conference?" They simply asked to be consulted and for written permission to be requested before sharing. The intent, like others have noted, is to request (legally: insist) that Cognitect have a chance to review and point out potential implementation issues (good or bad) before customers make performance statements about their product.
The clause can/does put a damper on 'notes from the field' reports, which often help when deciding on tech direction. I look for community-based reports to reinforce perceptions of a tool (to a degree). Completely agree with OP: do your own performance testing.
One thing I will say is that it would be hard for someone who hasn't invested in learning the inner workings of Datomic's decoupled architecture to pick apart storage speed vs. transactor speed. For example, storage speed (SQL, DEV, Dynamo, etc.) is not a concern of Datomic itself, but it is a key dependency for measurable perf. This may change in the AWS service announced today, becoming more uniform on DynamoDB and S3 "storage resources". https://docs.datomic.com/cloud/whatis/architecture.html#stor...
Datomic is a unique product and there are many ways to make it sing (or blow up) depending on how you use it. We designed data models, streaming processes, and queries with Datomic in mind and have had success. Exactly how much success, I'm not at liberty to say just yet.
https://danluu.com/anon-benchmark/ - "That time Larry Ellison allegedly tried to have a professor fired for benchmarking Oracle" - It looks like other database vendors do this kind of thing. That doesn't excuse the behavior, though.
Then people using PostgreSQL remember that their database is free rather than $100,000 a year, despite having more flexibility and features, and you see why Larry Ellison would do this.
My experience with Datomic was that it is not appropriate for a system that has performance requirements. We developed a system using redis while the data team was developing the data service on Datomic. When we tried to integrate, the whole system slowed to a crawl and we needed to do a rewrite.
My advice is to develop with Datomic from the start and not separate it out into a service. Any other database will be so much faster that you won't know how bad your performance is until it's too late.
Another piece of advice would be to seriously consider if you need immutability in your database. If it's not a hard requirement, I would not use Datomic.
> ...it is not appropriate for a system that has performance requirements
This is an unreasonably simplistic approach to assessing performance. A useful approach would be to describe what workloads Datomic may or may not be suited for. Having used Datomic (On-Prem) 2 years in production, here's my take on it:
1. For a system that writes a lot (eventually leading to 10 billion datoms or having high peaks of write throughput), Datomic is not a great fit, because it has a single writer thread and does quite a bit of indexing.
2. Reads are horizontally scalable, and because Datomic's semantics allow for pervasive and reliable caching, reads can handle a huge load and have low latency for OLTP-style queries.
3. Datalog is relatively slow for aggregations and offers no facilities for trading accuracy for speed: if you need big, low-latency aggregations, you should offload them to a specialized store like ElasticSearch. This is especially easy to do with Datomic, because Datomic makes it trivial to implement change detection, in real time if needed (see the Log API and txReportQueue)
4. When writes are overwhelmed and become unavailable, reads stay available, which is an awesome situation to be in.
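The offloading approach in point 3 can be illustrated with a toy sketch. This is plain Python standing in for the real thing - Datomic's txReportQueue is a JVM API, and the report shape and the aggregate maintained here are assumptions for illustration, not Datomic's actual data structures:

```python
import queue

# Hypothetical stand-in for Datomic's txReportQueue: each report is a
# list of (entity, attribute, value, added?) tuples for one transaction.
tx_report_queue = queue.Queue()

# Secondary store for a fast aggregate (count of orders per customer),
# analogous to offloading aggregation to a specialized store.
order_counts = {}

def index_report(report):
    """Fold one transaction report into the aggregate store."""
    for entity, attribute, value, added in report:
        if attribute == "order/customer":
            delta = 1 if added else -1
            order_counts[value] = order_counts.get(value, 0) + delta

# Simulate two transactions arriving on the queue.
tx_report_queue.put([(1, "order/customer", "alice", True)])
tx_report_queue.put([(2, "order/customer", "alice", True),
                     (3, "order/customer", "bob", True)])

while not tx_report_queue.empty():
    index_report(tx_report_queue.get())

print(order_counts)  # {'alice': 2, 'bob': 1}
```

Because the queue delivers every committed transaction, a consumer like this can keep a secondary index continuously in sync without polling the database.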
Having built a variety of products with Datomic and performance requirements, I can only disagree. Queries in particular are insanely fast compared to other databases.
I don't know what your data team was doing, but they must have done it wrong.
My advice is to develop without a data team from the start ;)
Datomic is interesting. It is somewhat similar to object databases like ZODB. I use an object database called Durus (spiritual derivative of ZODB). Both ZODB and Datomic make use of aggressive caching in the application. That is a huge performance boost for applications that do a lot of reading and little writing. ZODB essentially puts the indexes in the application, rather than having the DB layer take care of indexes. Again, I understand this is similar to how Datomic works.
Developing an application on top of ZODB vs on top of an SQL database leads to a very different data model design. You can't just swap one database layer out for the other and expect things to work well. I'm not surprised if a team who is used to developing on top of SQL-like DBs did not have good luck when moving to Datomic. It is just a very different model.
The main advantage is that you can do time-consistent queries against any past database state across the entire database without any read locks. This frees resources and eliminates the necessity of timestamp tables.
Then there is Datomic's caching model, in combination with its query engine operating on local data. The vast majority of critical queries read only from memory, and data that is fetched doesn't block other consumers.
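The "query any past database state" property can be illustrated with a toy append-only store. This is a plain-Python simplification, not Datomic's API; the fact shape and as-of mechanics here are assumptions for illustration:

```python
# Toy append-only fact store: each fact is (tx, entity, attribute, value,
# added?). Nothing is ever updated in place; a current or historical view
# is just a filter over the log, so reads never need locks.
facts = []

def transact(tx, datoms):
    for e, a, v, added in datoms:
        facts.append((tx, e, a, v, added))

def as_of(tx):
    """Return {(entity, attribute): value} as of transaction tx."""
    view = {}
    for t, e, a, v, added in facts:
        if t <= tx:
            if added:
                view[(e, a)] = v
            else:
                view.pop((e, a), None)
    return view

transact(1, [("user-1", "email", "old@example.com", True)])
transact(2, [("user-1", "email", "old@example.com", False),
             ("user-1", "email", "new@example.com", True)])

print(as_of(1)[("user-1", "email")])  # old@example.com
print(as_of(2)[("user-1", "email")])  # new@example.com
```

Because the log is never mutated, any reader can compute an as-of view at any time without coordinating with the writer.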
What RDBMS are you comparing to? Pretty much everything is MVCC now, so writers don't block readers and readers don't block writers. Mature RDBMSs are also fairly highly optimized, written in C with critical paths hand-written in assembly, so again, it would be interesting to learn for what workload Datomic is significantly faster. The biggest performance penalty in a modern RDBMS is GC of snapshots, so if you need an append-only design, that will make a modern RDBMS more performant.
I don't care so much about optimized tight loops written in assembly; I care about the ability to scale nowadays. Datomic uses databases like the ones you describe, e.g. Postgres, as its storage, so it would be foolish to say it beats their performance at a bare-bones level. Instead, Datomic's architecture and information model make it much easier, and significantly reduce the overhead, to design and implement applications that provide insane performance. I won't argue the point: you can hand-rewrite every Datomic application against its underlying storage database and get more performance out of it, if you do your caching and coordination right. It will take you much longer, though (I'd guess tenfold at least), it will likely have some very difficult-to-find bugs, and the result won't be as easy to extend. With Datomic, I get memory-speed performance out of the box for the heavy hitters, and so much more, that it would take some very uncommon requirements for me to choose something else nowadays.
Well, I would buy the developer productivity argument, but most applications have a mix of reporting requirements that are generally extremely hard to implement on anything "distributed". Also, in a distributed system you are either running some consensus algorithm (Paxos, Raft, etc.), which will definitely not be "insane performance", or it will have issues with consistency.
Datomic uses distributed storage, writes are coordinated by a single instance (transactor). Reads don't block writers and immutability allows to query consistent snapshots. Does that address your concern?
If after three years you think so only "probably", you probably still aren't confident about your requirements.
You don't say to your girlfriend "We had fun the last three years, but probably I'm better off with my ex" unless you have no clue what you are looking for.
I'm using Postgres for a financial app, and the idea of never updating a value in place sounds very appealing compared to PostgreSQL's update-in-place model.
I love Datomic. It solves so many problems that I face daily, and you end up building a sort of Datomic on top of your SQL DB anyway.
I would have preferred a Datomic for Kubernetes, to make it cross-platform and allow it to be hosted on different and private clouds; because of my job, I cannot use AWS. I can't use Kubernetes either, but there is a future possibility that it will become an option.
The downside would probably be less integration, as that ecosystem is not as far along as AWS's. Still, an easily containerised version for development would be really nice.
Are there any open-source immutable databases yet? Or would the hegemony of SQL semantics (which seem quite tied to mutability) somehow disincentivize such an undertaking?
FSVO "database": DataScript is an in-memory database, similar to a slimmed-down Datomic. I think it's fair to say that datascript : datomic :: sqlite : postgres, or something like that. It's in-memory only, but there are third-party extensions using hitchhiker trees that make it persistent.
There's also Mozilla's Mentat and its predecessor Datomish, which talk to SQLite on the backend.
It's very early days, but we have been working on an append-only key-value store that runs on top of a distributed shared-log [0,1]. We don't have a lot of resources, but it has been a lot of fun hacking on, and I hope it can fill a niche once it's more mature.
Can I ask the same question, but focused on Datalog rather than immutability?
I really wanted to use Datomic for a side project, because what I wanted to do seemed to fit really neatly with Datomic's tuples and query language. Other than DataScript, is there a db with a similar query language and way of linking tuples?
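For readers unfamiliar with the style being asked about, here is a toy sketch of Datalog-flavoured pattern matching over (entity, attribute, value) tuples. It's a plain-Python illustration of the idea, not any real database's API; the data and attribute names are made up:

```python
# Toy Datalog-style pattern matching over (entity, attribute, value)
# tuples. Strings starting with "?" are variables; matching a pattern
# against a datom yields bindings, and joins happen via shared variables.
db = [
    (1, "person/name", "Fred"),
    (2, "person/name", "Sally"),
    (2, "person/friend", 1),
]

def match(pattern, datom, bindings):
    new = dict(bindings)
    for p, d in zip(pattern, datom):
        if isinstance(p, str) and p.startswith("?"):
            if p in new and new[p] != d:
                return None    # conflicting binding: no match
            new[p] = d
        elif p != d:
            return None        # constant mismatch
    return new

def query(patterns, bindings_list=({},)):
    for pattern in patterns:
        bindings_list = [b2 for b in bindings_list for d in db
                         if (b2 := match(pattern, d, b)) is not None]
    return bindings_list

# "Names of friends of Sally": joined on ?e and ?f via shared variables.
result = query([("?e", "person/name", "Sally"),
                ("?e", "person/friend", "?f"),
                ("?f", "person/name", "?name")])
print([b["?name"] for b in result])  # ['Fred']
```

Datomic's Datalog adds rules, aggregates, and indexes on top, but the unify-and-join core is essentially this.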
Yeah, doesn't Bitcoin use BerkeleyDB underneath or something? Superficially it's immutable (when just talking to the blockchain via an API), but underneath it's backed by BerkeleyDB, I believe, which is your everyday mutable key-value store.
Maybe for querying read values of unspent transaction balances from blockchain ledgers, but the data integrity where it matters is on the chain where people can't screw with it...
Eve had, as I understand it, an immutable Datalog-plus-time-inspired database at its core, but it looks like (as with LightTable, which preceded it) Eve's attempt to revolutionize programming fell victim to yet another pivot.
This looks great. I really like Datomic, but its licensing is prohibitive for small projects. I hope this helps it get more love and community attention.
Can you expand on this, what do you find wanting with the various choices? "Small project" can mean very different things between 0 and 10M$ budgets, but there seem to be a lot of options reaching to the low end, even the free Starter edition sounds pretty capable.
The Free Edition has limitations that impact how you would write code -- it supports only the "peer" API and not the "client" API, and is restricted in terms of backend storage choices and number of peers. So while it's very likely true there's perfect API compatibility when you want to step up to a paid edition, it makes you design to the artificial limitations they've used to set apart "free" from "paid".
Indeed, it's also not the recommended way to learn. The Datomic team says: "If you are trying Datomic for the first time, we recommend that you begin with a client library." In other words, the Free Edition doesn't include the client library that they recommend you try first.
If you step up to the Starter Edition you have full functionality and free updates but only for the first year. Beyond that, you start paying. Either you run in Cloud (as low as $1/day) or you purchase the on-prem Pro Edition ($5000/year).
Don't get me wrong. Cognitect has the right to charge whatever they like, and bundle features however they like. It's their intellectual property and they made the investment to create it.
But if you're building a hobby project then even $1/day may be more than you want to spend, and/or more than you want to impose on outside contributors to your codebase.
Seems it's only the free updates that are limited to 1 year with Starter - you can keep using it forever. I wonder how/if they limit you getting a new Starter the next year? Anyone know the details?
Even if this drives you to the $1/day edition, that should cover a lot of small projects. Especially as you don't have to keep it running 24/7 during development.
@gonewest covered why the Starter Edition isn't ideal. I have played with Starter, but compared to when I worked for a company that had a license, it's like a different product.
This makes it hard to select it for small projects because you know you'd have to pay the 5k or switch to something else if you ever wanted to scale.
$1 a day is, I think, still a bit high for a non-profit-driven side project, but it's definitely a welcome improvement.
What if the deployment model of my "small project" is on-prem operation by end users? How do I evaluate the usefulness of Datomic without forcing those users into a license purchase or going through the up-front cost of buying a license that allows redistribution?
For me a small project would generally mean a clojure web app with a couple hundred users, a clojure admin webapp used only occasionally, and a back end cron job that runs every 5 minutes or so and does the back end work. Any rough guesses what we are looking at to run this monthly using the Solo topology on aws?
Yes, but I was wondering how much memory would be needed for each of the other apps (web/admin/cron) connecting to the Solo topology. Would they require a full JVM peer, with a couple of gigs of memory each?
Cloud uses clients, not peers, so those apps can be much more lightweight. I have tested down to 128M per Clojure (JVM) client, which is an order of magnitude less mem than a typical peer.
I’ve been in love with the idea of Datomic for a few years now, but haven’t had a project to really dig into it with. The AMI makes it irresistible to try... but I’m really not clear on how the client and peer libraries work outside the JVM. Do they? It seems REST support exists, with some client libs for JS, Python, etc., but they explicitly state in multiple places that it is de facto deprecated and won’t receive further development. I highly doubt, given many of Rich’s statements, that they would ever break the API, but I’m trying to push a new app towards serverless, and I’d rather not have to mess around with JVM keep-warm hacks. A CLJS library that wasn’t orphaned 4 years ago and isn’t based on the REST API would be enough to push me over the edge.
I really wish they'd open source Datomic and just make money on hosting. I can't bring myself to spend any more of my life learning proprietary software in such a foundational domain.
If you search online, they have addressed this repeated request. In summary: they do it for money; they don't know how to make money from it if it's open source; and they already contributed Clojure to the open-source community.
I don't believe that Datomic is profitable for Cognitect or ever has been.
Given how profitable it is for my customers that truly is a shame.
I wish they'd open source it, primarily because reading Rich Hickey's code is always enlightening. But I can see how cautious they are about it, and I'm just glad they haven't already sold out to some big company that pushes tons of "business needs" into it.
IME you can't really compare Datomic to a SQL database. If you know Datomic, it is like comparing a filesystem to a hard drive.
Databases that charge at scale can be tremendously popular too. This is the profit model of databases like Neo4j. Not sure if it worked out for them, but it seems like a viable route to popularity: stay free for small users and go big at scale.
Speaking of an open-source Datomic, check out Mentat. It's not quite the same thing, because it's made for single devices rather than clusters, but it's inspired by Datomic, so in many ways it is very much like it.
The similarities to FoundationDB (a unique closed source database technology acquired by Apple, no longer publicly available) encourages additional caution.
I poked around for a few minutes trying to find any assurances Datomic offers their customers should they be acquired, but couldn't even find the current EULA accessible online (older versions didn't seem to mention anything).
It's interesting, really. Thinking back 20 years, nobody would have expected, or even dreamed of, a production-level database being open source. It started, I think, mainly with MySQL, but it was really part of a greater shift. Compilers, OSes, other tools, and most software were closed source by default. Shareware was a thing, but it usually wasn't open source.
To that effect I mostly credit the GNU licensing model. I know people love to hate it and praise BSD/Apache/MIT licensing but I believe without GNU, the proliferation would not have happened.
But back to the point, it is interesting that the expectation is reversed completely - people expect databases to be open source by default. Having said that, I still support author's decision to keep it closed source, it's their work and they intend to monetize it in a particular way. It's been around for years, presumably it works for them, which is great.
http://dtrace.org/blogs/bmc/2004/08/28/the-economics-of-soft... seems pretty prescient given the glut of open source we find ourselves enjoying today. I agree a lot of it can be credited to the GNU model, that is, aligning on values over other things, but once more and more companies started understanding the economics, that was immensely helpful too.
That's a good post from Bryan; I can't believe I hadn't seen it yet. Thanks for sharing.
> seems pretty prescient given the glut of open source we find ourselves enjoying today.
Indeed, even regarding databases:
[from blog post] > Yes, there have been traditional demand-side efforts like MySQL and research efforts like PostgreSQL, but neither of these “good enough” efforts has actually been good enough to compete with Informix, Oracle,
Look at PostgreSQL today -- it's got a variety of indexing options, scalability improvements and various other things. It hasn't displaced Oracle but it's certainly not a research effort anymore.
I'd be interested to see if people would candidly unpack their rationale a bit more in this area.
Before I get into that part, forgive me, but I must admit that I've begun to tune out when I hear the "make Datomic open source" commentary. Still, this particular line of commentary returns from time to time, so I'll weigh in.
This reminds me of "meta stories" that take over the original story. On Hacker News, I'd much rather hear about technical commentary, lessons learned, or interesting domains and applications used in Datomic projects.
I tend to be less interested in hearing armchair quarterbacking around what business model would work better for Datomic, particularly when the arguments seem:
1. largely motivated by self-interest. Open source is often perceived as "free" to software developers. It is relatively easy to say "I would rather not pay for this software, why isn't this open source?" Of course, open source is not necessarily really free, due to integration and maintenance costs. In the cases where open source projects are abandoned, projects face substantial risk and transition costs.
2. not framed around the long-term interests of Datomic (at least the arguments rarely seem to make suggestions from the perspective of Cognitect, which invests in the product and makes income from sales)
It is easy to say "make it open source". It is harder for a company to find a business model that works. It seems that Datomic's business model is working. There is a free tier and paid tiers.
I know that I cannot properly summarize all perspectives with the ideal amount of nuance. Perhaps some people think it is really in Datomic's interest to be open source.
Nevertheless, it seems to me that many arguments people make are somewhat unexamined. Let's go a level deeper.
May I ask this: How many of Amazon Web Service's offerings are based on open source software?
I ask that question because there are four follow-up points I would like to make:
1. People use AWS quite extensively.
2. AWS is based on closed source software.
3. I don't think it is a coincidence that AWS is so successful.
4. It seems totally reasonable (and arguably the smartest thing to do) for Datomic to stay with a closed source model.
Also, I have no affiliation with Datomic, but I have used it on projects.
Datomic isn't Amazon. People aren't afraid of it going away and they know that Amazon actually uses their services as a part of their core business. This makes people much more willing to use something like Lambda when they might be afraid of the long-term viability of something like Datomic.
Just look at what happened to Parse. What if Datomic one day gets bought by some company that doesn't care about offering Datomic cloud or keeping it maintained? At least with Parse, people could migrate to an open source solution.
> 4. It seems totally reasonable (and arguably the smartest thing to do) for Datomic to stay with a closed source model.
It is 100% understandable / practical / reasonable, yes. At the same time, it's fair to recognize that using Datomic as a core part of a technology stack involves some degree of additional risk, which grows as dependencies on its unique capabilities increase.
All dependencies carry risk. Open source projects are abandoned all the time. Do you intend to take over maintenance of your database software if it is abandoned? Highly unlikely.
Agreed, and it is often tough to evaluate to any exact degree or measurement.
However, uniquely irreplaceable closed-source dependencies seem classifiable as carrying comparatively greater risk than thriving open source projects. This specific case is compounded because there are few (if any?) production-quality alternatives that offer what Datomic does, whether closed or open.
> Do you intend to take over maintenance of your database software if it is abandoned?
I would phrase this as "Can I hire a domain expert to fix blocking bugs in my database software if it is abandoned?", but this doesn't really account for the uniqueness of Datomic (unnecessary to really consider further since any fix is impossible due to being closed source).
It boils down to an "is my business even possible if this disappears?" evaluation, treating Datomic as a novelty that enables and/or dramatically simplifies very specific use cases (profitable even in the short term) that basically wouldn't be possible without it -- and I believe Datomic does this! The best marketing for Datomic would reveal these use cases, but they are often a competitive advantage.
> Do you intend to take over maintenance of your database software if it is abandoned?
Yeah. Unlikely I'd be adding any additional real features or anything but why not? It isn't magic. I'd also expect a lot of other people doing the same and we can share the maintenance load.
There is also a lot of functionality that is obviously backed by very nice code a lot of people would like to see and potentially use other places. But knowing you can do necessary maintenance and tweaks, even if it is while you rewrite your whole db layer to work with another product rather than a permanent thing, is a serious benefit.
I'm not sure there's much rationale to unpack. What benefit does proprietary have over open for the user?
The reason I even bother to state the obvious in the case of Datomic is that having this 1 restricted area in the otherwise open ecosystem around Clojure makes me want to start looking for an alternative environment. I'm sure everyone else that says it feels similar and just hopes if enough people say it, it'll be opened up. Obviously, Clojure can be used without Datomic, but I can't help but feel like I'm missing out on the bigger vision by not using it and I don't like that feeling because that's exactly how vendor lock-in begins.
Several people (including Feross, who I know) have made comparisons of Datomic to gun, which I have made open source: https://github.com/amark/gun they're based on very similar principles and architecture of functional reactive programming (elm and eve are also in that camp, but not databases).
> "The regulation applies if the data controller (an organization that collects data from EU residents) or processor (an organization that processes data on behalf of data controller e.g. cloud service providers) or the data subject (person) is based in the EU."
This is explicitly supported in Datomic via an operation known as "excision":
> "Excision is the complete removal of a set of datoms matching a predicate. Excision should be a very infrequent operation, and is designed to support the following two scenarios:"
> "- removing data for privacy reasons"
> "- removing data older than some domain-defined retention period"
> "Excision should never be used to correct erroneous data, and is unsuitable for that task as it does not restore any previous view of the facts. Consider using ordinary retraction to correct errors without removing history."
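For reference, an excision request in On-Prem is just ordinary transaction data naming the entity to remove (the entity id below is a placeholder; per the docs, excision can also be narrowed by attribute or time with keys like :db.excise/attrs):

```clojure
;; Excise every datom about entity 42, e.g. for a privacy request.
;; 42 is a placeholder for a real entity id.
[{:db/excise 42}]
```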
I haven't read whether or not there's a difference here with respect to the cloud offering, as I wasn't able to find a corresponding Reference section in the cloud docs. However, given that :db/excise is a transaction operation, it's likely supported.
Edit to add: Looks like it currently isn't, see 'davidrupp below. That's important to keep in mind.
> "Furthermore the regulation also applies to organizations based outside the European Union if they collect or process personal data of EU residents."
It's really not clear to me how a company like Datomic would cope with that in any practical way. Are they supposed to know what kind of data their clients store?
I attended a GDPR session at re:Invent this year. Basically, there are two types of organizations: data controllers and data processors. Datomic, AWS, Azure, etc. would all fall into the processor category. Amazon.com, however, would fall into the data controller category.
I agree these are important questions, and they apply equally to various offerings in AWS, Google, Microsoft, and others. It's not something specific to Datomic.