More

jgeraert · on Sept 24, 2018

Well, there are multiple services involved in an email stack so I guess it does what you ask for.

JoshuaAshton · on Sept 26, 2018

What I mean is, what a person would honestly consider a service (such as e-mail, a website, a git server), not a technical definition of a service (a database, dovecot, php).

I should have been more clear.

jgeraert · on June 1, 2018

I guess the same goes for Lambda.

jgeraert · on Dec 18, 2017

I'm so glad someone finally confirms my idea about using hibernate. I've been in constant battle with developers that map full object graphs coming Frome some endpoint to hibernate/jpa annotated classes and then throw it at hibernate. Here you save this... It just doesn't work that way very well. It's always a mess with relationships. Whereas you run your logic on entities attached to the session things work out nicely.

jgeraert · on Dec 5, 2017

Isn't the log of deleted users subject to the GDPR then?

hobofan · on Dec 5, 2017

You can make a log of deleted users without it containing personally identifiable information, by just storing the IDs.

zaarn · on Dec 5, 2017

No since the GDPR exempts things you need for legal compliance, thus a list of users who have asked to be deleted is fine if it's being used to ensure compliance.

jgeraert · on May 2, 2017

Seeing this as well in proprietary software sold by one of the biggest erp software vendors... And it's still not fixed and likely won't be fixed anytime soon.

moron4hire · on May 2, 2017

There is a perception amongst lower programmers that FKs are only a thing you enable in production, that they are an impediment to rapid iteration.

I mean, it's completely wrong, I know that. But try being the new guy telling a team of 10 people that.

There is also a perverse corruption of "if it ain't broke, don't fix it" that goes on. If you can spend 100 hours manually validating every relationship in your application code, that's 100 hours you can put into your estimate and you know you can complete. If you only have a cursory understanding of SQL, then "learn more about FKs and implement them across the DB" seems like a big, unknowable blob of time that is impossible to estimate. It doesn't matter that it might only be 5 hours of work, at least we can be certain about 100 hours and bill the client for it.

Finally, there are a small set of things that are fundamentally wrong about all modern RDBMS implementations. For example: it's nonsensical to have a foreign key that isn't indexed. You always want an index on foreign keys, there is never a scenario where you don't want them. But while primary keys are indexed by default, foreign keys are not. And those sorts of things give the anti-RDBMS crowd enough of a foot-hold to argue for continued ignorance.

mattbroekhuis · on May 2, 2017

I have found out the hard way that foreign keys are indexed by default in some implementations including MySQL innodb. Not in oracle.

My experience has been the opposite of yours -- I've heard of disabling foreign keys in production only for performance reasons, but not the other way around.

I like them turned on all the time as well. Valuable safety rail

aidos · on May 2, 2017

Curious, this doesn't fit my mental model: it's nonsensical to have a foreign key that isn't indexed

I guess you normally want indexes on parent - child tree relationships (book -> chapter), but you don't want them all the time. What about when you have an 'article' with a 'status and a 'category' and you only ever find all articles by combination of category and status? In that case you'd be maintaining 3 indexes, category, status and status_category, but the only one you'd need would be status_category.

Indexes are a tool to to allow for optimising lookups while foreign keys are a tool to allow you to keep your data consistent.

moron4hire · on May 2, 2017

I don't see how your example doesn't require an index, specifically because, as you said, indexes are a tool to allow for optimizing lookups.

I imagine that maybe you're suggesting you first query the DB for the ID of the combined status_category, and then query the article by that ID. That's not a good idea, for a couple of reasons. First, you're making two round trips to the DB when you could, with no additional effort (just effort in a different place) be doing one. Second, you've introduced a data race condition. If someone deletes that status_category after you've queried for it but before you've queried the articles, you aren't going to get the results you want.

It would be better to do a join across articles to status_category to status and category, then query based on the status and category values you want. Without an index on the FKs between status_category and status and category, a relatively small table can have a big impact on query performance.

Finally, while I know your example is arbitrary, it's a little hard to argue against a design that is probably wrong. I doubt the suggested schema for articles and categories is a good one. If I argue "you should never have to arbitrarily subtract '1' from a result just to get the results you want", it would not be a good counterargument to say, "yes, but sometimes you want to add 2 and 2 and get 5, so then you need to subtract 1". The problem isn't where you see it.

FKs aren't just a consistency tool. Consistency and referential integrity are features that results from having an FK, but the FK is a signal that data can be searched in a certain way.

dragonwriter · on May 2, 2017

> I don't see how your example doesn't require an index, specifically because, as you said, indexes are a tool to allow for optimizing lookups.

If you never lookup by a column alone, you don't need a single-column index on that column. An FK need not ever be a lookup target (the target column[s] it references are necessarily a lookup target, but not necessarily vice versa.)

An FK is probably usually going to want some kind of index, but it's not nonsensical to have a non-indexed FK.

aidos · on May 2, 2017

Touché! An index signals the data should be searched in a certain way!

I don't think I explained my arbitrary scenario particularly well :-) I'm definitely not suggesting 2 queries.

If I have articles that could be category: math|science and source: website_a|website_b|... and I only ever query for source and category together then the other indexes aren't used.

It's a contrived example, but in my mind the existence of a foreign key doesn't imply an index is required.

irishsultan · on May 2, 2017

I assumed they meant an index on multiple columns (status, category), not a separate table for status_category with an index on that.

jrochkind1 · on May 2, 2017

I think in general most rdbms require indexes to properly enforce foreign key constraints in a reasonable way.

dspillett · on May 2, 2017

If the parent entities are immutable, or are at least immortal and have primary key values that are immutable, then the DB engine will only need the index on the target primary/unique key/index to maintain the constraint.

Though in general this is not the case, as you say, it is the case often enough that enforcing "FK means an index" could be an annoyance.

jrochkind1 · on May 2, 2017

The number and kind of bad ideas a bad development team can have is virtually limitless.

scriptkiddy · on May 2, 2017

> There is a perception amongst lower programmers that FKs are only a thing you enable in production, that they are an impediment to rapid iteration.

Really? I've been working as a web programmer for about 5 years now. I use the term Web Programmer because I feel like the term Web Developer carries with it the connotation that you only work with Javascript.

My specialty is definitely with Python and more specifically with Django. Django's ORM is the first ORM I ever used. Maybe it's because I self taught, but I never had the inclination that FKs were an impediment to rapid iteration. In fact, quite the opposite. FKs are a fantastic way to enforce relationship constraints between tables/objects/models/whatever you want to call them.

Whenever I start on a new project, the first thing I do is start defining my data structure. In Django, this mostly involves using the ORM layer to define Model Classes. I usually define the core sets of models necessary for the application. As an example, if I were building a simple blog, I'd start by overriding Django's built in user model so I have the flexibility to add columns or place constraints on existing columns. Then I would define the post model which includes an `author` and a `category` FK column. Then I would define the `Category` model. I generate a migration script and run it. Every time I need to make a change to the Schema, I simply add or change whatever I need to and generate a new migration script alongside the old one. These migrations are dependent upon previous migrations and have version numbers. If I need to, I can roll back to a previous schema. Django makes this all very simple and actually separates the migrations out by which app the model definitions reside in. This means that I can make changes to multiple model definitions and only apply the changes to the Db for one of those model classes. It's very flexible.

So, after my long-winded explanation above, I don't see how people could find FKs to be restrictive when there are so many tools like Django's ORM and migration system that makes altering your schema so simple.

TimJYoung · on May 2, 2017

Re: FKs being indexed. This isn't always the case. Our embedded/workgroup-level database engine, ElevateDB, always uses a system-generated index for all PK, FK, and UNIQUE constraints. They're needed for constraint enforcement, anyways, and offer good "default" performance improvements by ensuring that the query optimizer can flip (most) inner joins in order to improve nested-loop join performance.

dspillett · on May 2, 2017

> it's nonsensical to have a foreign key that isn't indexed. You always want an index on foreign keys

Incorrect, though cases where you don't want the index are rarer than those where you do especially when thinking about simple examples.

> there is never a scenario where you don't want them.

If the parent entities never have their primary key values changed and are never deleted, then the database engine itself will never use the index to enforce the key. If you never need to join from the parent entities to the child entities then your queries are unlikely to make use of it either.

The extra index takes space (maybe a fair amount of space if the key is wide and/or you have a [bad] design with a wide clustering key), potentially space in your in-memory page pool, and processing time & IO during inserts, updates and deletes. If you are unlikely to need the index then why take that hit?

> But while primary keys are indexed by default, foreign keys are not.

Primary keys need to be indexed to avoid a full table scan on every insert (or update of a key value) to that table or any table that refers to the key via a foreign key constraint. Foreign keys will only cause a scan with no index present if a key value is changed (or a row deleted) in the parent table. As some entities shouldn't be deleted and primary key values should be immutable, it follows that this sort of situation can happen.

It would be possible to make the index optional by other means then requiring you to declare it if you want it, but SQL's modelling language tends towards declaring what you want not what you don't want so it fits better with the syntax to have you add the index if you want it rather than deleting it if you don't or having something in the syntax like "ADD CONSTRAINT fk_key_name ON (<field(s)>) REFERENCES <target>(<field(s)>) WITHOUT INDEX".

> And those sorts of things give the anti-RDBMS crowd enough of a foot-hold to argue for continued ignorance.

This isn't purely a relational problem. noSQL data stores have indexes too, and not having an index on a referencing key that might be needed to check on referential integrity (looking for orphan records) can be a problem there too. We shouldn't change the behaviour of relational stores because some people who use noSQL don't understand good data modelling. Many people using noSQL do understand good data modelling of course, but some use noSQL because they don't want to try understand SQL rather than because it is the wrong tool for the job at hand, and those people probably don't understand noSQL either but get away with not doing so in the short term).

moron4hire · on May 2, 2017

> If the parent entities never have their primary key values changed and are never deleted, then the database engine itself will never use the index to enforce the key.

Enforcing referential integrity is not the only thing we can do with foreign keys. Once you have an FK, you're going to want to query against it. To do that without data races requires a join. The join can be optimized better if the FK is indexed.

Come up with an example of an FK you never want to join on and then maybe we talk about not wanting an FK indexed. I've never seen anyone make a convincing argument that this is even as much as a 1% case. It should be so incredibly, almost inconceivably rare to not want an FK indexed that yes, I'm saying it should go against the traditional grain and just be the default. Traditions are not sacred.

dragonwriter · on May 2, 2017

> Once you have an FK, you're going to want to query against it.

Not necessarily. You are definitely (at a minimum, implicitly for referential integrity) going to want to query from the table using the FK to the table it references, but you may or may not want to query by the FK column.

> Come up with an example of an FK you never want to join on and then maybe we talk about not wanting an FK indexed.

Joining on an FK doesn't require the FK to be indexed, it requires the target to be indexed. You need an index on the FK of the FK is used in a simple equality or inequality filter criteria other than a join to it's target, or if it's used in a join where the other criteria are filtering it's target table rather than the table with the FK. But if you filter the table with the FK and join to it's target, which is a fairly common case, you don't need an FK index.

dspillett · on May 3, 2017

> Once you have an FK, you're going to want to query against it.

Not necessarily, and even where you do there are many cases where an index just on the foreign key is not the best choice because the queries are filtering on other properties too so a wider index is worth defining instead.

> Come up with an example of an FK you never want to join on

Recording tables with many fact dimensions, where you need to enforce a limited range of values in each dimension column but only ever use the data afterwards for aggregating after filtering/grouping by other properties. The storage needed for the unnecessary indexes structures could balloon significantly here. This is most commonly seen in warehousing situations, but the pattern is far from unheard of in OLTP workloads.

> it's nonsensical to have a foreign key that isn't indexed

> and "there is never a scenario where you don't want them

It may be uncommon to prefer not to have an index on a foreign key, but absolute statements like those are simply wrong because there are cases where you don't want (or at least done need) them and there are cases where you want something other than what would be generated automatically.

Useful rules-of-thumb perhaps but only if presented as such ("in general X" or "X is true except when it isn't due to Y or Z") rather than absolutes.

mannykannot · on May 2, 2017

The relational model provides a pretty general and unified way to represent information, together with a cohesive and powerful set of primitives for working with it, yet many systems architects insist on hiding it behind an ad-hoc interface that looks like a throwback to pre-relational days.

jgeraert · on Feb 11, 2017

Great idea. I will start by developing my to-do app...

traviscj · on Feb 11, 2017

Legitimately gone down this path and hack on it to procrastinate. It's now redis-backed, supports dependent todos, and has a decent command-line interface.

No idea how I am still employed.

jgeraert · on Jan 7, 2017

> One thing that absolutely needs library/drivers is persistence of the event store. I say this having built my own as well as contributing to open source ones in the past. It is still a hard and not well solved problem to do this well. Again it can be simple, but in the real world it isn't usually that simple.

Would it also be such a hard thing to do if you can delegate the actual persistence to something like a rdbms? What are the typical pitfalls?

dm3 · on Jan 7, 2017

Having built a few RDBMS-based event stores - it's pretty easy to do. There are two parts that I can think of that are not completely trivial - appending events to the stream while respecting "expected version" modifiers with optimal concurrency and allowing for fast and light subscriptions to new events (e.g. LISTEN/NOTIFY in Postgres, ringbuffer on the client).

The good thing is that you can adapt the event store to your performance requirements and do the simplest thing possible in a huge amount of cases.

jgeraert · on Aug 4, 2015

I tend to abstract away my queries using database views. This way the query in application becomes very simple (select * from view where $simplepredicate).

In my test database (sqllite,hsqldb,derby,...) i create actual tables containing test data with the same definition as the views.

This allows me to have rather complex queries that are database specific and still have fast running database queries.

The views themselves i test separately. They are also easier to fix in production than code releases, and can also be replaced by materialized views if necessary.

Inserts/updates are typically simple too.

jgeraert · on June 8, 2015

Also on windows.

ekianjo · on June 8, 2015

Yeah. Well most Linux software is kind of multiplatform anyway, or at least portable. My reason for citing Linux here more than Windows is that Evince was included by default in many distros.

jgeraert · on Oct 4, 2013

Clever. We did something similar for a friend getting married. Instead of giving his present directly we created a text file encrypted with his public pgp key. We printed out the ascii-armored cryptotext and handed it over. He had lots of fun typing it back into his computer.

jzzskijj · on Oct 4, 2013

I'm sure FBI didn't have as much fun.

Anyway, as he decided to give the SSL key, pulling this kind of prank seems bit childish. On the other hand, he must have been under a heavy pressure, so can't blame the guy for not thinking 100% straight.

Sad, that people are sidetracked to talking about the font size instead of warrantless wiretapping.

cynwoody · on Oct 4, 2013

That's sadistic.

You should have at least hinted at viable OCR solutions.