Hacker News

Having used mongo in a professional context, I'm sort of amused by how much vitriol it gets. It has its flaws, but it's not that bad. I think it's been a bit misbranded as the thing you use to "scale", which ticks people off. To me, when I use mongo, I mostly use it because it's the most convenient option. It's very easy to develop against, awesome for prototyping, and a lot of the time it's good enough that you never need to replace it after the prototype phase.

Relational databases are great, but they're almost an optimization -- they're way more useful after the problem set has been well defined and you have a much better sense of data access patterns and how you should lay out your tables and so on. But a lot of times that information isn't obvious upfront, and mongo is great in that context.



> Relational databases are great, but they're almost an optimization -- they're way more useful after the problem set has been well defined and you have a much better sense of data access patterns and how you should lay out your tables and so on. But a lot of times that information isn't obvious upfront, and mongo is great in that context.

I think it's exactly the other way around. I prefer to lay out tables in a way that reflects the meaning and the relationship in the data. Only later, if there is a bottleneck somewhere, I might add a cache (i.e. de-normalize) to better fit the specific data access patterns that were causing trouble.


I think if your understanding of the domain is complete enough that you can map out a reasonably full picture of the relationships in the data, then you're probably right.

I think the advantage the parent is talking about is when you're exploring a domain that you do not have as deep of an understanding in. Schema-less data storage can be helpful in that exploration, as it allows you to dive in and get messy without much upfront consideration. Then afterwards you can step back with what you've learned/seen/experienced and build out your conceptual model of the problem domain.


Thanks for spelling that out, I see that it can be a useful exploration tool.

For me though, the act of thinking about a problem in terms of data and relationships helps a lot in exploring and understanding what I'm dealing with. Even more so in an agile setting, where things can and will be changed often - the schema is no exception, it can be changed. No need to cast in stone the first schema you came up with.

But that's just my preferred way of approaching the problem :-)


I share your preference, but I am always curious how others do things. IMO the datastore is the most important piece of a CRUD app, because it is the foundation that everything hangs off of. So from my point of view tracking changes to its structure is extremely important in avoiding major headaches. How do developers manage this without a schema definition? Mongo's popularity has always made me second guess my assumptions about how important I think this is.


See, I disagree. As a developer, the core of a system for me is not the datastore but the code. I have my domain model in code, e.g. User.java, and it gets stored transparently and correctly to the database by the ORM every time I save it. The ORM will never compromise the integrity of the database by, say, trying to store a Date in the wrong format.

So you have to ask yourself: what is the schema getting me? It doesn't help with validation or business-rule enforcement, since I do that in code anyway. And it doesn't help with integrity, since in the case of Java my data is strongly typed, so there's no chance of storing types incorrectly.


Thanks for sharing your point of view. I'm still not convinced that it's much easier to change schema with MongoDB. Do you mean before there is any production data that needs to be kept indefinitely? If that's so, then I get it. But if you have some production data and your data model changes, you have two options:

1) Don't migrate any existing data. That means you must forever handle the old form of data when you read it from the db.

2) Actually migrate and transform existing data in the database to the new data model. In which case, it seems easier to do in an RDBMS, because schema changes are at least properly transactional and you actually know what exact schema you're migrating from.
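Option 1 in practice looks something like this: a sketch in Python (the field names and the v1/v2 shapes are invented for illustration) of read-side code that has to normalize both document shapes indefinitely:

```python
def read_user(doc):
    """Normalize a user document written under either schema version."""
    # v1 documents stored a single "name" string; v2 split it into first/last.
    # Once you choose not to migrate, every reader carries this branch forever.
    if "name" in doc:
        first, _, last = doc["name"].partition(" ")
        return {"first": first, "last": last}
    return {"first": doc["first"], "last": doc["last"]}

print(read_user({"name": "Ada Lovelace"}))              # {'first': 'Ada', 'last': 'Lovelace'}
print(read_user({"first": "Ada", "last": "Lovelace"}))  # same result, new shape
```

Each new schema version adds another branch like this to every consumer of the data.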

Additionally, with the relational database, you simply don't have to make as many decisions about how to store your data early on because your data model is less dictated by your usage patterns (because you have joins and other tools at your disposal). In my eyes, that's a big advantage that relational databases have, even for prototyping.


You're putting far too much importance on the DB, when really, it's an implementation detail. It has nothing to do with your app, really, and letting your app be a slave to a schema is an anti-pattern. In the end, all a schema really is is a fancier linter.

This is one of the advantages of Mongo. No schema, no longer an issue.


I'm fascinated by this perspective because for any business, it's the data that is treasured more than anything else. For example, as a bank I can afford to lose all the app code but I cannot afford to lose a record of my customers and their balances. Therefore, I would never see data as being inferior to my app. The app can be rebuilt from logic; data not so easily.

In my perspective I don't want my app to even touch the schema. That is not the app's job.

It also means that if a dev decides the user object no longer needs the DOB field, that field will simply be discarded. Even scarier, what precisely happens in those situations varies from implementation to implementation. Someone who is handling the database directly will think many, many times before deleting any db column. Even then, he will take a backup. I don't see the same discipline among developers when dealing with the same data, just abstracted via an object programmatically.


I would certainly be happier using a schemaless database with a strongly typed language.

I guess it comes down to where you want your dynamism: in the app code, or in the persistence layer. Using a highly dynamic language with a schemaless database feels very unsafe to me. Similarly, using a strongly typed language with a relational DB is sometimes a pain.

I wonder, when one makes a large change to one's app code in the "strongly typed language + schemaless db" scenario, what happens to the existing data that was persisted before the change, which is now in a format that is incompatible with the new model code?

I'm used to writing migrations that change the schema and update the existing data if needed, all inside a transaction.
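That workflow can be sketched roughly as follows, using SQLite via Python's stdlib as a stand-in for a full RDBMS (table and column names are made up for the example): the schema change and the data backfill ride in one transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit; transactions managed by hand
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("ada",), ("grace",)])

# Schema change and data backfill commit (or roll back) together,
# so the app never observes a half-migrated state.
conn.execute("BEGIN")
conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")
conn.execute("UPDATE users SET display_name = upper(name)")
conn.execute("COMMIT")

print(conn.execute("SELECT name, display_name FROM users ORDER BY id").fetchall())
# [('ada', 'ADA'), ('grace', 'GRACE')]
```

Postgres supports transactional DDL like this too; some other databases commit implicitly on DDL, which is worth checking before relying on the pattern.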


I just had an ah ha moment, thank you for that. It looks to me like the trade off is deciding where to put business logic based on what needs to access it. The setup you describe sounds great as long as all access to it is through your ORM. I usually assume access from multiple applications is going to happen at some point.


We solve this by creating a schema definition for mongo collections, from within the app layer. Now we have easy version control where we want it, not in sqldump files like we used to. That was painful.


Most full-stack frameworks have a way to create migrations. When the schema changes over time, you just get a mess with mongo; that's just my observation. It's far easier to manage change with db migrations.


You'd be right except that we also have the ability to create data migrations based on schema changes over time. I'm glad it's not built into MongoDB or else we wouldn't have the (fairly ideal) solution we built to work with it today.


During early prototyping, when I'm still figuring out the problem and how the user thinks, that's when I try not to worry too much about a schema. For me a data model is kinda seductive; of course it's necessary when you know what you're doing, but it draws my attention away from the messy uncontrollable users.


"IMO the datastore is the most important piece of a CRUD app, because it is the foundation that everything hangs off of."

I agree. I think it is similar to what Linus Torvalds said:

"Bad programmers worry about the code. Good programmers worry about data structures and their relationships."

Ok, he is talking about low level systems programming, and databases are a level above that, but I think there is still truth in it.


IMO you can't generalize every problem domain into a schema-first approach.

I developed a real-time analytics solution where the system was supposed to receive unstructured web data. Each client of the system could send any type of data and generate an unlimited number of reports with an unlimited number of fields, all in real time. When I say real time, I literally mean all processing was supposed to be done in sub-second time. On top of that, the size of the data was 100s of GBs every day. No RDBMS would have modeled this problem as efficiently as MongoDB did.

For most web-related things like analytics and CMS systems, or when you need high performance and horizontal scalability, it's hard to beat document DBs like MongoDB.


I found it to really suck at horizontal scalability and high performance. We are paying 3k per month for mongo hosting because it's such a pain to manage, and it performs so poorly at just 150GB of data that we need 3 shards, which I find incredibly ridiculous. I would pick postgres tables with JSON fields over mongo any day of the week.


You missed the point.

Since most companies now are doing Agile development, there isn't the big upfront design process where the data model is clearly understood at the beginning. Instead you have this situation where the schema is continually evolving week by week, which is why schema-less systems can be appealing. It isn't about performance.


I would argue if you don't know how your data is going to be used then you should use the most flexible option - a normalised relational database.

This gives you the most flexibility when querying, whereas with a denormalised database you need to know how you're going to query it ahead of time. Unless you want to abandon any semblance of performance.


IMO, this is the worst argument. There are multiple schema evolution tools for SQL, there's nothing stopping your team from changing the schema every week - plus, it's not hard, certainly less hard than having to maintain code that deals with multiple document schemas at once.


Rails-style migrations (did they invent them? I have no idea. I currently use Sequel's implementation) allow you to change the schema as often as you want. I often write half a dozen of them over the course of a feature branch. The rest of the team merges them in, and when they boot the app they get a message saying they have migrations to run. It's always been supremely reliable, even after complex merges. It gets more involved when you have multiple interacting DBs, but what doesn't?

You have to write sensible migrations of course, or you can get into trouble. This is a feature.
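A minimal version of the mechanism those frameworks provide (the names and version-tracking table here are invented for the sketch; real implementations add timestamps, down-migrations, and locking) is just an ordered list of changes plus a record of how many have been applied:

```python
import sqlite3

MIGRATIONS = [  # ordered, append-only; each statement runs exactly once per database
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)",
    "ALTER TABLE users ADD COLUMN email TEXT",
]

def migrate(conn):
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER)")
    applied = conn.execute("SELECT coalesce(max(version), 0) FROM schema_version").fetchone()[0]
    for version, sql in enumerate(MIGRATIONS[applied:], start=applied + 1):
        conn.execute(sql)
        conn.execute("INSERT INTO schema_version VALUES (?)", (version,))
    conn.commit()

conn = sqlite3.connect(":memory:")
migrate(conn)
migrate(conn)  # safe to re-run: already-applied migrations are skipped
conn.execute("INSERT INTO users (name, email) VALUES ('ada', 'ada@example.com')")
```

When a teammate pulls your branch, their database is behind `MIGRATIONS`, and running the same function brings it forward deterministically.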

Obviously wide-ranging schema changes may require you to rewrite a lot of your app code, but I don't see how that's different for a schemaless database.

My bigger worry is that every "schemaless" app still has a schema, just one that isn't well defined, similarly to how an app with a big mess of flags and boolean logic still has a state machine representation, it just isn't codified, or even finite or deterministic for all the developers know.


The point is that if you're using an ORM and have domain classes, then it is an unnecessary and annoying step. You have to make changes in two places rather than just one. Most people using MongoDB I know are seasoned enterprise Java developers, and we have used many schema management tools over the last decade. It is a giant relief to finally be rid of them.


IMO, it sounds like the wrong remedy for the right diagnostic. I would never throw away the database because I'm duplicating information between ORM and domain classes. This seems more related to the particular constraints imposed by your codebase/architecture than the database.

Right now I'm writing a REST API that will be consumed by web and mobile apps. It would be impractical to duplicate validation across all codebases. Rather, I'm leveraging the database checks to get form validation on all clients for free. The application layer is thin; adding a field amounts to adding one line and running a command.

I believe it boils down to which component rules the system: application or data layer.


The reason this is not always possible is the exact reason PostgreSQL added a json field type.

Otherwise you could create a table for every XML schema you have and store it like that.

Some systems require that. It's a new schema every week/month. And this is not during the development of the system; this is the bread and butter of it.


I agree, but this is something else. The parent was talking about evolving schema. You're talking about what is effectively unstructured data. In this case, the main concern is being able to store the data first, and figuring out how to deal with it later, at the application layer, after you've added intelligence to deal with the new schema(s).


yeah, one million migrations are fine, right?

anyway...

The point is that rapid prototyping and rigid database hierarchies are diametrically opposite.

If you can maintain a flexible sql database, that's great. However, my experience has always been that the 'normalised databases are good' crowd are either a) DBAs trying to remain relevant or b) people who have never actually done that in a project; because a normalised database is not flexible and dynamic, it's performant.

It depends on your problem domain; and servers are so ridiculously overspec'd these days (a $20/month linode is a 2GB-of-RAM machine) that performance optimisation is far less important than rapidly developing functionality.


How is mongodb more RAD than MySQL or PostgreSQL?

Anyway, NoSQL or SQL, you'll still have migration issues if you change the way your application consumes data.

If you have an array of category strings in a document and then you decide you prefer categories to be a dictionary with title keys, you still need to migrate your old data. NoSQL or SQL, same thing.
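That array-to-dictionary change touches every stored document either way; a sketch of the transform (the document shape is invented for illustration, and in a real document store this would run over a batch of documents):

```python
def migrate_categories(doc):
    """Rewrite categories from a list of strings to a dict keyed by the string."""
    cats = doc.get("categories")
    if isinstance(cats, list):  # old shape: ["news", "tech"]
        doc["categories"] = {c: {"title": c.title()} for c in cats}
    return doc  # already-migrated documents pass through unchanged

doc = {"_id": 1, "categories": ["news", "tech"]}
print(migrate_categories(doc))
# {'_id': 1, 'categories': {'news': {'title': 'News'}, 'tech': {'title': 'Tech'}}}
```

The `isinstance` guard makes the migration idempotent, which matters when the batch job can be interrupted and re-run.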

I think what made MongoDB interesting in the first place is the use of JSON for storing documents, and the aggregation framework.

Then you realize simple SQL queries are almost impossible to write as aggregates, so you end up writing map/reduce code, which is slow and, god forbid, uses javascript.

At first you think it solves the impedance mismatch problem; then you realize MongoDB has its own datatypes and you still need an ORM anyway, because at some point you need to implement "single table" inheritance because your code is OO.

Now about performance. It is good, yes, but only if you use "maybe" writes.

In my opinion, CouchDB does far less, but the little it does, it does better than MongoDB. Curious about Couchbase though.

The only reason I'd use Mongo is when using nodejs, because frankly Mongoose is the most mature ORM on the platform, and that's quite a bad reason.


> most companies now are doing Agile development ... you have this situation where the schema is continually evolving week by week

I haven't seen that level of churn in any of the agile places that I have worked and it is not an inevitable consequence of agile working. If you don't want schema churn, then don't do that.


I agree it has its uses, but I feel like MongoDB (the company) sometimes puts forth use cases for MongoDB (the database) that are untenable. For example, the pre-aggregated reports use case they tout fails at any reasonable scale (e.g. hundreds of upserts a second against tens of thousands of documents).


Yeah, I can agree with that. A lot of their claims of scalability and benchmarks aren't lies, but they come with a lot of asterisks. I know in my last job we used both mongo and (mainly) microsoft sql server, and frankly, I enjoyed developing against mongodb way more, but if I needed something to be consistently fast, there was no question I'd put it in sql server. (Mongo's query speed can be really variable, and its tools for diagnosing query planning kind of suck compared to more established databases.)


"Relational databases are great ... after the problem set has been well defined ... But a lot of times that information isn't obvious upfront, and mongo is great in that context."

A lot of people get stuck on the schema because they are trying to make it perfect upfront and hit analysis paralysis. I think that comes from other database systems that make it very hard to change the schema.

In postgres, just define a very simple schema and get started with your prototype. It may just have a couple ordinary columns, and then a JSON or JSONB column as a catchall.

Then, you can add/remove/rename columns and tables (all O(1) operations) or do more sophisticated transformations/migrations. All of this is transactional, meaning that your app can still be running without seeing an inconsistent (half-migrated) state. I believe this level of flexibility far exceeds what document databases offer.
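A sketch of that catchall-column approach, using SQLite via Python's stdlib as a stand-in (in Postgres the `extra` column would be `jsonb` and queryable directly with `->>`; here the JSON is unpacked in Python, and the table and field names are invented):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# A couple of columns you are sure about, plus a catchall for fields still in flux.
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT, extra TEXT)")
conn.execute("INSERT INTO events (kind, extra) VALUES (?, ?)",
             ("signup", json.dumps({"utm_source": "hn", "plan": "free"})))

kind, extra = conn.execute("SELECT kind, extra FROM events").fetchone()
print(kind, json.loads(extra)["plan"])  # signup free
```

As fields in the catchall stabilize, they can be promoted to real columns with an `ALTER TABLE` plus a backfill, inside one transaction.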


Most of the vitriol stems from the fact that MongoDB's default settings were unsafe for a very long time and people were burned by it. Imo anyway.


Which is nonsense since every driver since the beginning set the default to be safe.

Most of the vitriol comes from people who never even used it.


https://blog.rainforestqa.com/2012-11-05-mongodb-gotchas-and...

"MongoDB allows very fast writes and updates by default. The tradeoff is that you are not explicitly notified of failures. By default most drivers do asynchronous, 'unsafe' writes - this means that the driver does not return an error directly, similar to INSERT DELAYED with MySQL. If you want to know if something succeeded, you have to manually check for errors using getLastError."

We have very, very different definitions of safe defaults. Especially when this fact was poorly documented early on and caused people to be surprised by it.

You may wish to consider that other people have different perspectives of what 'safe' means. Safe, to me, means:

1) Fsyncs to journal every write.

2) Returns an error immediately if an error occurred.


The Python drivers never made it safe by default.


To me this whole discussion seems equivalent to the static/dynamic typing war in programming languages. And so far, none of the camps have "won" as has been predicted over and over again. There are tradeoffs to both and people will prefer one over the other for what they will believe are rational reasons but probably really just a series of coincidences.


Informed, perhaps even constructive, criticism is not "vitriol".


A lot of the vitriol seems to come from the PostgreSQL camp. The enemy used to be MySQL, but since they lost that fight they seem to enjoy spreading FUD about MongoDB. If they just focused on improving their own product, i.e. ease of use, sharding, clustering, etc., there wouldn't be a need for MongoDB or even HBase and Cassandra.


Postgres is winning the war with MySQL, if you're going to classify it that way. And from everything I've read and heard on HN, Postgres is constantly improving on clustering/HA fronts.


No, MySQL won many years ago. Every shared hosting provider uses the LAMP stack. And PostgreSQL has a pretty average scalability story. It lacks the simplicity of MongoDB and the flexibility of Cassandra or HBase. It needs a lot of work.


> No MySQL won many years ago.

Yeah, and Internet Explorer won many years ago. Then it lost. Because it stagnated while the others forged ahead.


I wouldn't exactly call that winning. It's like saying that Justin Bieber won because he's so popular. The article seems to focus on features rather than popularity. Yes, MySQL will be used mostly by amateurs for a long time, because for all its flaws it somewhat compensates with ease of use. As a professional, however, I will celebrate the day my company finally finishes the move from MySQL to PostgreSQL.


> Every shared hosting provider uses the LAMP stack.

The world is a LOT bigger than just shared hosting providers.



