Yagni Exceptions (2021) (lukeplant.me.uk)
290 points by bubblehack3r on Oct 17, 2022 | 207 comments



Rarely do I read something so specific about software development that I agree with 100%.

One thing I would add: soft deletes in relational databases. There's a really good chance that eventually you'll need it for customer support, for debugging, or to fix nasty performance issues caused by cascading deletes.

For some types of businesses, I wonder if "right to be forgotten" should be designed in from the beginning as well. This can be a problem with hard deletes and with soft deletes. With soft deletes, well, it's hard to figure out how to actually delete things if you've been growing a data model with soft deletes for a couple of years. With hard deletes, well, after you apply some hacks to prevent cascading deletes from killing your performance, now you're in the same place. Maybe worse if your foreign keys are no longer an exhaustive guide to relationships. "Right to be forgotten" will be a nightmare if it hasn't been designed in from the start. Obviously not every kind of business will have to worry about this, but I think the ones that do should consider putting some effort into making sure their design supports it.


I'm far more partial to the "deleted" table approach, which simply moves deleted entries from one table into another.

Slightly more hassle on the recovery side (in the rare event), way less risk on the read side, and it covers the 90% use case, where most of the time it's more useful for things like debugging. Only rarely in my career have I seen soft-deleted data get restored, but you don't lose that this way; you simply mitigate against accidentally showing it.

All in all, it's a better compromise than soft deletes.
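
In its simplest form it's something like this (a sketch, assuming Postgres and a hypothetical users / users_deleted pair, where users_deleted has the same columns plus a trailing deleted_at):

    -- Move the row out of the live table instead of flagging it; one transaction.
    BEGIN;
    INSERT INTO users_deleted
    SELECT u.*, now() FROM users u WHERE u.id = $1;
    DELETE FROM users WHERE id = $1;
    COMMIT;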

EDIT: I'm pretty tempted to just convert this to dumping out to a SQLite 3 database for all soft deletes. I wonder if actual DB separation is better somehow for backup / compliance


> I'm far more partial to the "deleted" table approach, which simply moves deleted entries from one table into another

Or you could use a system versioned temporal table (or the equivalent in non-MSSQL databases), then the DB does the work for you, and allows recovery of UPDATEs too. Not sure exactly what the performance implications are.
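
Roughly what that looks like in SQL Server, as a sketch (table and column names made up):

    -- System-versioned temporal table: the DB keeps every prior row version.
    CREATE TABLE dbo.Users (
        Id INT PRIMARY KEY,
        Name NVARCHAR(100) NOT NULL,
        SysStartTime DATETIME2 GENERATED ALWAYS AS ROW START,
        SysEndTime   DATETIME2 GENERATED ALWAYS AS ROW END,
        PERIOD FOR SYSTEM_TIME (SysStartTime, SysEndTime)
    ) WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.UsersHistory));

    -- Recover deleted or overwritten rows by querying the past:
    SELECT * FROM dbo.Users FOR SYSTEM_TIME AS OF '2022-10-01T00:00:00';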


Temporal tables can get pretty bloated when the table sees a lot of updates, since they essentially save a copy of the row every time it's modified. You could of course add a reaper system to only keep the last N versions, or only versions less than N days old, etc. Unfortunately, IIRC there's no built-in support to do that automatically.


I've actually not used them in anger, just looked into them at one point; the only reason we didn't go with them was that we didn't want to tie ourselves too closely to SQL Server. I'm assuming they can only work sensibly if you have an ID-generation/usage mechanism where an ID is never re-used, even after records are deleted, which is arguably the most common scenario anyway. But I have worked with at least a few database tables in the past where this wasn't the case (IDs were randomly generated, but with a sufficiently small range of possible values that there was a reasonable chance of generating the same ID twice within, say, 6 months of heavy usage).


Our current system uses soft deletes, and I wish we had done a "deleted" table. The biggest downside to soft deletes for us is that it ruins the default indexes, because every query has a !IsDeleted WHERE clause added to it.


This works better if you split your whole schema into physical and logical layers. That entails at least a view with instead-of-delete triggers and making all indexes filtered indexes, for every table, so it's a lot of boilerplate; but it seems like a PAGNI as well, since physical concerns that shouldn't change logical semantics always eventually creep in.


Wouldn’t partition indexes solve this?


Yes, we use filtered indexes to mitigate it. It's a lot of additional boilerplate though.
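
For anyone unfamiliar, the boilerplate per index looks something like this (SQL Server syntax; Postgres calls these partial indexes):

    -- Index only the live rows, so soft-deleted rows don't bloat it
    -- and the planner can use it for queries with the !IsDeleted filter.
    CREATE INDEX IX_Orders_CustomerId
        ON Orders (CustomerId)
        WHERE IsDeleted = 0;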


Uh, that's a really interesting idea that I've never used before -- like a graveyard for data.

Have you done it before? How did you implement it?


Yes, I have done it, and I've implemented it in two ways:

A temporary "marked for deletion" (I guess temporary soft_delete) field was used to mark rows for deletion (we had a user trash type deal) anything older than 30 days just got dump into a new table, and it recorded some simple metadata, but the data itself was dumped as JSONB. The important thing is we always recorded a way to figure out the customer / user without having to dig into the JSON blob, so if they ever actually left the platform we could still delete the data in full quite easily. This was key. It preserved the PK system we had in place for stuff like this.

In another, more naive implementation, we just dumped rows out into JSON blobs with special "$metadata" fields describing data types, deletion time, and other associated metadata I can't quite recall. If I recall correctly, those were then encrypted and stored elsewhere (I can't recall where exactly, but likely S3 or equivalent). I don't think this was a good idea, but it's what was done.


There is at least some mismatch between relational databases and soft deletes.

While for some fact tables (time series) it is easy to implement soft deletes, for other, more dynamic workloads (e.g. when both fact tables and dimension tables are updated) it can become a messy nightmare to keep referential integrity up to date with the dangling soft-deleted records.

Sometimes it will be difficult to decide whether a long-ago soft-deleted record should be kept when the dimension it references is still maintained.



Relatedly, I've worked with systems that would maintain a "deleted" or "archived" table for every normal table, with an identical schema. I nowadays prefer it over a deleted_at field for many of the same reasons as the article.

I also nowadays don't care much for created_at or updated_at; IMO it's preferable to maintain an actual transaction log (which is something I'd add to the YPGNI list, given how invaluable that tends to be for auditing purposes).


How about using YEGNI for “you’re eventually gonna need it”? Not exactly the same as “probably”, but more pronounceable.


I guess YUGNI (You're Usually Gonna Need It) would be close to the ideal intersection between pronounceability and the original meaning.


Thanks for the reminder -- I forgot one of the great benefits of soft deletes that I posted in that conversation. If your customer support can quickly investigate and resolve false reports of "my data disappeared without me deleting it" then they can pass along the rarer more mysterious cases to engineering, a lot of which will be real bugs. With hard deletes, all the reports look the same to customer support, so nothing gets passed on to engineering (or worse, everything does.)


Too late to edit my original comment, but I'm surprised not to see more discussion about "right to be forgotten." It's legally mandated for some types of data in some jurisdictions, and it's hard to implement in a schema that has evolved without it. It doesn't affect the company I work at now, so I haven't had to think much about it, and I was hoping to hear from people who have.


My company doesn't deal with personal information, so ymmv, but the way we deal with soft deletes is deleting the password, setting their name to "Deleted user" and setting the e-mail address of the user to deleted-<random_hash>@ourdomain. That's all personal information, all other information they produced on our platform is property of the company they work for so it's not their or even our call to make if it should be deleted.

Keeping the user record in the database helps bring complexity down tremendously.
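
Concretely, it's something like this (a sketch; the column names are made up):

    -- Scrub the PII but keep the row, so foreign keys stay intact.
    UPDATE users
    SET password_hash = NULL,
        name  = 'Deleted user',
        email = 'deleted-' || md5(random()::text) || '@ourdomain.example'
    WHERE id = $1;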


Can't agree with you there. I've never seen a situation in which it was necessary to recover "optimistically" soft-deleted data. I have seen multiple situations in which soft-deleted data was accidentally included in live queries.


Not recover, but explain to the users that the data was deleted, and when it was deleted. If you have to provide support to customers, you will get a steady trickle of confused and angry customers wondering why their data disappeared after they (or their coworker, etc.) deleted it. Soft deletes let you answer those inquiries confidently with very little effort. With hard deletes, complaints like that can eat up support time with no satisfactory resolution.

I haven't experienced soft-deleted data being accidentally included, but I think that's because I've worked with systems that included soft deletes from the very start.


Perhaps tombstones would be a better choice, where the data is deleted but a record of when and why is kept instead?


The difference being that a tombstone doesn't contain enough information to recreate the record should that be necessary. I guess it's a tradeoff whether you want a graveyard you can use digital necromancy on, or simply an indicator of why the data got yeeted.


From a customer support PoV, soft deletes have the wonderful property that you can assuage the upset / irate individual on the other end of the phone that their data can indeed be recovered from whatever mishap they just engineered and do so with a simple query.

Yes - backups can help - but without live backups (WAL shipping or equivalent) you still risk some form of data loss. And if you have those, you just have soft deletes at a system level rather than an in-DB level, plus a vastly more complex system of partial restores.


Soft deletes can be thought of as a combination of versioning and timestamps. By using the Sixth Normal Form (6NF) with timestamps in the key, you get those for free, but this kind of schema may be a bit too complex for many simpler applications.


Can you point to an example please?


Wikipedia has a couple of non-timeseries examples[1].

I guess with timeseries you'd have something along the lines of:

   user |   time_range |  status 
  ======+==============+=========
      1 | [2018, 2021) | ACTIVE  
      2 | [2019, )     | ACTIVE  
      1 | [2021, )     | BLOCKED 
Where the PK is (user, time_range). (The time_range field being shortened to just year here for simplicity's sake.)

In PostgreSQL you can use the tstzrange[2] type. You could also only store one of the timestamps, but that would come at the cost of more complex queries.
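
A sketch of the DDL for the example above (my guess at it; the exclusion constraint, which forbids overlapping ranges per user, needs the btree_gist extension):

    -- 6NF-style status table with the range in the key.
    -- Requires: CREATE EXTENSION btree_gist;
    CREATE TABLE user_status (
        user_id    bigint    NOT NULL,
        time_range tstzrange NOT NULL,
        status     text      NOT NULL,
        PRIMARY KEY (user_id, time_range),
        EXCLUDE USING gist (user_id WITH =, time_range WITH &&)
    );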

[1]: https://en.wikipedia.org/wiki/Sixth_normal_form#Examples

[2]: https://www.postgresql.org/docs/current/rangetypes.html


For compliance reasons, you can always hard delete data. It's also not a terrible idea to hard delete data that has been soft deleted x time ago.
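
E.g. a scheduled job along these lines (a sketch, assuming a nullable deleted_at column and a 90-day retention window):

    -- Purge anything soft-deleted more than 90 days ago.
    DELETE FROM users
    WHERE deleted_at IS NOT NULL
      AND deleted_at < now() - interval '90 days';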


This is very nearly off topic, but "Ain't" is the correct conjugation to pair with "Gonna". This form comes from Britain and is preserved in some dialects of American English, but "Aren't Gonna" is a partial hypercorrection; if we want acrolect, we must go all the way to "Aren't Going To".

"You gonna be at the thing this weekend?"

"Nah man I aren't gonna" <-- this is wrong

For the Commonwealth, to whom this is no longer part of the language, "You Aren't Going to Need It" is fine, no need to include the T.

Edit: I was drawing further attention to the actual grammatical distinction between "ain't" and "aren't" by showing that "ain't" conjugates the first person and "aren't" doesn't.

It might help to know that "going to" is usually pronounced "gonna" unless it ends a clause, while "gonna" is more normally realized in speech as "goan".

Even more: if you want to reify a partial hypercorrection, go ahead, I'm a descriptive linguist, say the 't' in often while you're at it.

One of my grandfathers would have said something sounding like "yain go need it", the other "you aren't goin a need it".

The latter I'm sure never said the word "ain't" in his life. Nor would he have ever spelled "going to" as "gonna", no matter what it sounds like when spoken.


> "Nah man I aren't gonna" <-- this is wrong

Well, yes, because the first person singular of to be is am, not are. It works fine with I am or you are.

> I'm not gonna do that <--- looks good to me

> You aren't gonna need that <--- also looks good to me

'You aren't gonna need it' looks weird because it deviates from the stock phrase YAGNI, but it's not wrong.


> if we want acrolect, we must go all the way to "Aren't Going To".

but "gonna" is a contraction of "going to"

"I'm gonna go to the thing" is short for "I'm going to go to the thing".

> "Nah man I aren't gonna" <-- this is wrong

but "Nah man, I'm not gonna go" is fine - the problem is not with "gonna" here, it's with "I aren't" (which as others have said, is incorrect, and should be "I am not" rather than "I are not").

I don't see a problem with "You Aren't Gonna Need It" or even "You Ain't Gonna Need It"


This is flat-out wrong pseudogrammatical criticism. I say "aren't gonna" all the time. "Gonna" is a very common spoken contraction, used much more broadly than "ain't" in AmE.


We're not gonna need it!

No, we ain't gonna need it!

We're not gonna need it anymore!


This is a really weirdly specific set of prescriptivism that is easily countered by plenty of native English speakers who absolutely do use "aren't" and "gonna" together (myself, for example).


“You aren't gonna need it” is ordinary colloquial British English.

“I aren't…” is not.


You're not gonna need it

I'm not gonna need it

You aren't gonna need it

I amn't gonna need it

You ain't gonna need it

I ain't gonna need it


She's not gonna need it

They're not gonna need it

She isn't gonna need it

They aren't gonna need it

She ain't gonna need it

They ain't gonna need it

"Ain't" is wonderfully versatile.


You'ren't gonna need it.


yagni, yagnas, yagnat...


I used to argue for “amn’t” as in “I amn’t going to need it” but none of my elementary school teachers were persuadable.


Fully in use in Scotland and Ireland.


Shouldn't the "ain't" form require double negation, as in "you ain't gonna need no exceptions"?

Then YAGNI would need to be transformed to something like YAGNNOI, "you ain't gonna need none of it".

(An offtopic technicality comment being at the top on HN always amuses me.)


> Shouldn't the "ain't" form require double negation, as in "you ain't gonna need no exceptions"?

Only in some dialects.


And in some of those dialects, "no" expands to "none of them there doggone"


> This form comes from Britain

All forms of English, ultimately, come from Britain.


You goan dun bikesheddin'


* Having all user-facing strings in a common place.

It doesn't take much effort, but it makes things so much easier in the future when you suddenly need internationalization/translation (both the tech side of making the switch, and gathering all the strings to send to some translator), or when it's requested that these strings be managed by a CMS of some sort instead of hardcoded all over the place.


Sure, but what if you never need to internationalize? Introducing a level of indirection has a cost. This is exactly why YAGNI applies.

Refactor the code to support a requirement when the need arises.


Exactly.

Better IMO: don't use bare strings but a proper data structure/wrapper. Later, if you want to add i18n, change `Wrapper("Hello, welcome to the future")` into `Wrapper("Hello, welcome to the future", "welcome-user")` and just look for all wrappers to find all the places.

Benefits: it's easy to find the code with a simple ctrl+F from the English version of the site, and you don't have any disadvantage except for having the English translation (or whatever primary language you choose) inside the code and the other translations in a mapping somewhere else.


This is still not YAGNI enough for me. Why introduce a wrapper you don't need?

The thinking behind anti-YAGNI "future proofing" seems to be that it is cheaper to change something now than in the future. But I reject this premise. Inserting `Wrapper($string)` is exactly as much work now as in the future, but it will add additional overhead to all development going forward.

Perhaps the wrapper is even less work to introduce in the future, since you can add it with a simple search-replace, instead of inserting it manually every time you type a string.


I don't think so. That's just a semantic description of what you are doing. Call it `UserMessage("....")` or whatever. If you go without that and simply have plain text for every kind of thing, you might at some point mix up passwords with user messages and stuff like that. It has nothing to do with future-proofing.

You need user-messages? Then create user-messages and not strings. You don't need translations? Then don't create them.

Or in other words: make things as simple as you can, but not simpler than they inherently are.

> Perhaps the wrapper is even less work to introduce in the future, since you can add it with a simple search-replace, instead of inserting it manually every time you type a string.

No you cannot, unless literally all strings in your program must be translated.


Let's say I write an HTML page.

   <h1>Hello world</h1>
Is less work and complexity than:

   <h1>{{ UserMessage("Hello world") }}</h1>
Introducing this abstraction before it is needed is a waste of resources and an opportunity cost. Let's say you have the first solution, and someone decides "world" should be a link. Easy: you insert an <a> tag. In the second example such a change would be much more complicated. You have suddenly made your work much more difficult for no benefit at all.

Sure, if you discover at some point in the future that you actually need translations, then you have to work through all the text and decide which strings need to be translatable. Yes, that is work, but you would have had to do it anyway. And every piece of text deleted between when the abstraction was introduced and when translation became necessary would have carried an extra cost that is wasted anyway.


Oh, if you write HTML then you are already using semantic wrappers, i.e. you already used <h1>, which clearly indicates your intent.


Agreed, lots of things never need to be internationalised. Pretty much everything in institutional finance will never need it.


Have you done that refactor? There is no timeline you can commit to once people are just injecting static text everywhere in the app.

That project is 'done' when the interval between bug reports grows past some threshold where people decide it's done. But if you tell your boss "next month", then you're going to be proven wrong.


Are you asking if I have ever added internationalization/translation to an app which initially didn't have it? Yes, I have done that.

It sounds like you have had a bad experience?


I once worked on a large app that had always tried to support translation, but that had never actually been used by clients (after more than a decade running in production) in a language other than English. Well, what do you know, after all those years, finally a big client signed on, and this client needed the whole UI not in English.

Trouble was, in practice there were gaps in the translation support all over the place, and so several rounds of patching those gaps and performing manual testing were needed, which took a lot of time and manpower. So, was it worth half-maintaining the translation support for so many years? I guess it would have been even more work had it needed to be added from scratch; but it was still a lot of work cleaning it up.

Ironically, the client in question then cancelled on us, shortly before going live, and after we'd finished all that work. Although another non-English client did go live with us the year after that, so it wasn't wasted effort in the end.


There's hard coding strings, and then there's hard-coding sentence fragments.

If you hard code strings, you can't translate at all. If you hard-code sentence structure ("There are " + n + " copies of this book in stock") then you're going to sound like Yoda in a number of languages.

Adjectives are a particularly bad sticking point. Is it a large red book (English), a large book that is red (French), or a book that is large and red? But verb-object-noun order also varies just within first-world nations, so even the lame classist/racist/nationalistic excuse of "We don't need their money anyway" doesn't really fly. One of your competitors will be very happy to take Japan's money.


I think this is one of the biggest problems with violating YAGNI. They're effectively dead code paths that are never truly being executed, and developers might assume they aren't full of bugs. But they oftentimes are.


Not too ironic I think, more common than not.

I wish I had a box of nickels for every feature that cost $100k+ to implement that ended up being a dead end (i.e., sales benefit never materialized) or, even more commonly, the benefit materialized and was less than $100k and resulted in a feature that required continued maintenance and ongoing support cost forever.


I haven't worked with a lot of i18n systems, but on the contrary I have found them to be a huge amount of work.

I have used the Angular one and the Laravel one, and both are a chore to use, because they interrupt your flow. If I'm writing the view of my component, I don't want to have to switch to another file, think of a key that identifies that text well enough and then translate the text, thinking of potential plurals and others. I just want to write my damn text.

I'd rather spend days doing the mind-numbing work when needed rather than slow down my development process and remove all enjoyment from it ad vitam aeternam because of i18n-ing on the fly.


These patterns and practices generally aren't intended for _you_ as the developer of the app right now.

They are intended for the poor shmuck 18 months later who has to work with a tangled mess of copy text stored in JS, server side yaml files and a CMS.

I've been that person at least 3-4 times now. I've waded through changes where having the static strings pulled out of the UI would have simplified things immensely, and _none_ of them had anything to do with internationalisation.

They have been far more prosaic (legal requests, product changes, rebranding, just general things like "where did this text under the interest rate calculator come from?") but have still taken significantly longer than they should have.

When you are writing these apps it can feel like it's not that hard to go back later and pull out strings, just some grunt work.

But once you munge your code through three refactors and a framework change it becomes so much harder.


I might have something for you that reduces context switching. I am working on an IDE extension that shows the content of a localized message inline. A screenshot can be found here https://github.com/inlang/inlang#vs-code-extension.


On the other side we have the individuals inheriting such codebases and being told to add multilingualism when a bunch of other soft dependencies and assumptions exist which make this entire thing a disaster to implement.

It is fairly small and trivial to add, so please: if you foresee an event where the system has to support multiple languages, and odds are you won't work on it yourself, prepare the damn codebase for it.


> I have used the Angular one and the Laravel one, and both are a chore to use, because they interrupt your flow. If I'm writing the view of my component, I don't want to have to switch to another file, think of a key that identifies that text well enough and then translate the text, thinking of potential plurals and others. I just want to write my damn text.

In django (python), the key is just the English text (with optional extra context), and if it doesn't find a translation for the target language it dumps that text back out. You don't need to go anywhere else until actually doing the translating.


It does the same in one of the two that I mentioned (Angular I believe), but you still need to mark plurals and other stuff like that.

Here is an example about a basic case that you will encounter multiple times per page:

  <span i18n="Pretty update timestamp|Update timestamp prettyfied accompanied by user gender@@prettyUpdateTimestamp">Updated: {minutes, plural,
    =0 {just now}
    =1 {one minute ago}
    other {{{minutes}} minutes ago by {gender, select, male {male} female {female} other {other}}}}
  </span>


Does this work well?

What happens when you need to make a change in the english version? Do you just search/replace through all the translations?

Does django do anything to make this easier, or does it work with any i18n library?


There's also "translate to English", so you can instead treat it directly as a key and use that, or treat the English text as a key and change/add to the file if it's just something like a typo.

Don't recall about the second, we only had translations on one site and it's been a few years.


You don't need to bake in a complete i18n solution, though. Just create a simple map of keys -> strings, and then a function along the lines of

    getStringForKey(key, language) {
        return map[key];
    }
and then whenever you actually need it, just expand on it. But at least you then have all strings in one place, and a common way of accessing them. (And easy lookup of all places using these strings.)


Your function doesn't solve any of the issues I mentioned, you still need to switch to another file, still need to think of a sufficiently explicit key, still need to handle plurals and masculine / feminine variations, etc.


YAGNI ;) You can skip all those things the first time. You don't need to have a separate file or anything. Just have a stupid function returning its input or whatever. The important part is having an entry-point, and an easy way to find all relevant uses in the future. You could then later, when i18n is needed, write a script finding those usages and automatically extract them. That's what I've done once before. And it was a thousand times easier than the time before where we had to hunt through all of frontend, backend, cms etc. to find stuff.


> The important part is having an entry-point, and an easy way to find all relevant uses in the future.

You already have it, it's everything that's in between {{ }} and > </.


Do you want to translate numbers between ><? What about alt-texts not in those tags? Things passed as props multiple layers? What about strings coming from backend? Do you read some things from the environment?


Agreed. I don’t really know what problem these i18n tools solve that this doesn’t.


But if you end up in a situation where i18n tools are needed, this function can be the single entry point anyways, abstracting most of it away.


True, although at least on mobile I think this is a button press in the IDE. Maybe Javascript/non-mobile native apps have it harder.


Yes, I can think of Android Studio which pushes you into using Android's string resources[0], which have out-of-the-box support for i18n and are essentially big XML files.

[0]: https://developer.android.com/guide/topics/resources/string-...


I haven't worked in Android, but a quick perusal at least passes my giggle test, which frankly too many languages do not.

This is essentially a refinement of the i18n support that has been in Java for ages. I probably would not have stuck with Java as long as I did if their localization support wasn't as good as it was (not to say it's perfect, because it's a bit clunky). Google has made a few of the examples into a more concrete requirement, which is nice.


I would go further with your idea. You should have all I/O separated and in one place. UI is just a special case of I/O for apps.

In services you also want to keep all RPC handlers in one place (or any interface through which others interact with your binary). You also want to keep each interface through which you make IPC calls separate and in one place as well. I use IPC (inter-process communication) broadly here, as it includes service clients, database clients, disk access, peripheral access, etc.


> I'm essentially a believer in You Aren't Gonna Need It — the principle that you should add features to your software — including generality and abstraction

YAGNI is not a principle. It is a contextual rule of thumb, a codification of expert intuition. Exceptions to rules of thumb are quite the norm. Conflating rules of thumb with principles is a sign of sloppy thinking. Often engineers will misuse terminology, thinking that it "doesn't matter". But it does. Think twice before claiming that something is a principle. The words you use highly influence your thought process[0].

[0]: https://en.wikipedia.org/wiki/Linguistic_relativity


> Conflating rules of thumb with principles is a sign of sloppy thinking

Multiple dictionaries and thesauruses would disagree with you there. Rule of thumb is often defined in terms of "rules, procedures, principles, or... ...derived from..."

Some sources say that "rule of thumb is a principle or procedure..."

So, you know, going straight to "sloppy thinking" reminds me of OldManYellsAtCloud.jpg.

Anyway, I'm really not sure what difference you perceive - principles are often derived from experience, as well as theory, and they often have exceptions too.

Many people have a principle of not committing violence, for example. _Except_ when (multiple clauses follow).


That they include "rules" and "procedures" indicates those definitions are based around the "what" rather than the "why". A rule of thumb is something passed down, while a principle is something you come to from experience. A rule of thumb likely started as a principle from someone else.

Rules and procedures are a further watering down of the original idea, where even the rule-of-thumb's justification isn't paid attention to or even has been lost over time.


Your definitions are very interesting there mate. In some contexts, yep, what's often called a rule of thumb (for example, sparkies use a multitude of <X>-hand rules[0][1] and call them rule of thumb, because you know, there's a thumb involved) are passed down.

But people are entirely able to derive their own rules of thumb, if we hark to the common definition that a rule of thumb is a principle/procedure/process derived from practical experience.

For example, I rapidly developed a rule of thumb when dating post-divorce that any person who said "I hate drama" in their dating profile is in actual fact the source of any drama, but it took me a couple of disastrous dating attempts to develop that rule of thumb.

[0]: https://en.wikipedia.org/wiki/Right-hand_rule#Electromagneti...

[1]: https://en.wikipedia.org/wiki/Fleming%27s_left-hand_rule_for...


Pragmatism vs Dogmatism. For example DRY vs YAGNI. The first is more dogmatic: refactor everything that can be refactored. The second is more pragmatic: the refactor might well turn out not to be necessary. An even more pragmatic rule is You Ain't Gonna Need It Yet.


> Pragmatism vs Dogmatism. For example DRY vs YAGNI.

DRY often has exceptions to the "rule"/"principle" too though. And people often blog on these exceptions.

And pragmatism and dogmatism aren't an xor; they're just convenient labels for the -X and +X ends of the axis.

(Admittedly, the fact that I've seen more "It's okay to have exceptions to DRY" blog posts than "It's okay to have exceptions to YAGNI" indicates you're right about their relative weighting on that axis.)


That is because DRY sets us a puzzle: how to cleanly reuse some code. YAGNI, on the other hand, denies us a puzzle. We like puzzles.


I like your thinking on this


Your semantic argument of "principle" vs "contextual rule of thumb" is incorrect. They mean the same thing in almost every context, according to any dictionary/thesaurus you find out there.

In fact, the ultimate defining characteristic of a "principle" is that it has exceptions and should not be used as the "ultimate word of God".

Even in physics, where most would think a "principle" means something "without exception", principles only ever hold according to current evidence. There have been countless times when a scientific principle was later disproven.

Lastly, you link Sapir–Whorf, which, quite ironically, is considered a "principle" yet has had much criticism over time, which contradicts your own argument.

However, I will agree with you that one should always be careful of the language they use since it can affect how others view your thoughts, feelings, and intentions.


> The words you use highly influence your thought process

Given that the article is literally about exceptions to YAGNI, and explicitly calls out that there are probably more, their use of the term "principle" doesn't appear to have caused them any harm.


You say

> The words you use highly influence your thought process[0].

Yet in the very article you link it says (emphasis mine)

> The strong version, or linguistic determinism, says that language determines thought and that linguistic categories limit and determine cognitive categories. This version is generally agreed to be false by modern linguists.[3]

> The weak version says that linguistic categories and usage only influence thought and decisions.[4] Research on weaker forms has produced positive empirical evidence for a relationship.[3]

Your use of 'highly' would suggest a strong relativity, but that's discredited. Signed, a disgruntled linguist who's tired of people banging on about Sapir-Whorf.


So what would be an example of a software design principle?


Never write a function called destroyBaghdad(). Instead name it destroyCity and pass the target as a parameter.


If Baghdad is the city your users destroy most often, I would allow a destroyBaghdad() function that calls destroyCity() with all parameters set correctly.


This is why the US destroyed Baghdad.

The contract is called CongressAuthorizesInvadingCountry(country :Country): Invasion

Congress made a thunk with Iraq in it, so despite the lack of any real connection to Recent Events, the White House called it in 2003.


Ugh.

Try writing destroyKiev instead and see what happens.


Last time I tried, I got a 404 Competent Military Not Found error. :/


My point was that neither routine should ever be written, and the Middle East would've been much better off if the world had started an all-out economic war against the US in 2003.

In case you are wondering, I'm Russian, I have friends in Ukraine, and a destroyed Kiev is the last thing I want.

Jokes about destroying Baghdad are just callous. Especially coming from Americans.


Good thing I'm not American then.

(And the joke I'm referencing is making fun of software ethics, so you're in violent agreement.)


> This can apply to protocols, APIs, file formats etc. It is good to think about how, for example, a client/server system will detect and respond to different versions ahead of time (i.e. even when there is only one version),

A blog post about YAGNI manages to suggest that a rather nuanced feature that isn't needed should always be included, right up front.


Indeed. A rule of thumb is something you're supposed to use, at most, in the absence of any better information. I've written abstractions that didn't end up needed, sure. I've also written abstractions that I've been extremely glad for in hindsight.


Regarding the point about having a relational database:

I was of the same opinion, but recently it has been challenged. We were working on a very simple application and one of the first requirements was:

> User should be able to have a list of skills (e.g., Golang, Java, OOP, etc.). Users can be filtered by list of skills as well (e.g., "give me all the users with the skills "Java" and "OOP" but not ".net")

So, the non-relational model fits perfectly (we ended up using MongoDB, and the `skill` attribute of "User" is just an array). I know it's possible to use, let's say, MySQL and build a couple of tables to achieve the same, but it just "didn't feel right" (e.g., we cannot filter anymore by querying only one table... and if that requirement arises, we would need to build a view. But the view needs to be updated regularly, and it just feels like yet another stone in the road to achieving our requirements. The document model, on the other hand, felt just right).


> let's say, MySQL

There's your mistake. Using Postgres with an Array column you can just search for records where the column contains a value if you define it like "character varying(100)[]", eg

    SELECT * FROM users WHERE skills @> '{"databases", "sql"}';


This doesn't look that hard to model relationally:

    select distinct user_id
    from user_skills
    where skill_id in @1
    except
    select user_id
    from user_skills
    where skill_id in @2
The biggest hurdle would probably be the lack of support in your database driver for passing collection-like parameters.


YMMV but with the introduction of JSON/JSONB columns in Postgres, this becomes very easy to achieve without introducing multiple databases. In my experience, at some point relational data will be required in most projects, even if it was not a requirement in the first iterations. Obviously, constraints may apply that make NoSQL a sensible choice in some scenarios.


> In my experience, at some point relational data will be required in most projects

I've noticed this too.

My leading hypothesis is that projects start with simple data storage requirements which don't require much more than "flat file database + JSON", i.e. NoSQL, a document database like Mongo, etc. The fast and loose nature of these systems is nice to have at the start of a project too, compared to the work needed to set up SQL.

As the project evolves, requirements start to appear which require combining data from one or more of these "flat files" or document collections. Then you discover that your data is relational after all, and these new "simple" queries you want are hard to do with your chosen (NoSQL) database.

In short: As an application evolves and its data grows, it tends to become more relational.


Am I missing something, or could you have achieved the same thing with a relational database using joins? Or even an array-typed column?


That's what I wrote, yes. I could also have achieved the same using the file system and plain files, or using a graph database. My point is/was: the document model seemed to us the model that fit our problem best: the `skills` attribute is an unbounded list (well, in practice the list was bounded to at most 50 items) that can be filtered on. So in MongoDB, that's plain simple to implement. In MySQL, though, as you said, you have to come up with two tables (one for `user`, one for `skills`), make sure FKs are in place, and then use joins for filtering... it doesn't seem to me that this model fits better.


> In MySQL, though, as you said, you have to come up with two tables (one for `user`, one for `skills`), make sure FKs are in place, and then use joins for filtering... it doesn't seem to me that this model fits better

Their other option was an array-typed column. As per the article, Postgres' JSONB column type will give you that, as one example.

But the skills table - and the many-to-many linking table - shines as soon as you want to treat skills as first-class concepts in some way, e.g. "only let people pick from this list of skills the skills administrator has approved" or "count how many people have the skill 'Java'". Then tables seem much more natural, as the skills table can model them as concepts in their own right.


The data model fits better. The data is relational and fits neatly into a simple relational table of (user_id, skill_id). I think perhaps you just don't like SQL syntax, which is why a join feels weird to you, but what you've described is a four or five line SQL query which anyone can understand. Additionally, you can now easily answer questions like "How many skills does the average user have?", "Who has the most skills?", "What are the most popular skills?", etc., all in trivially simple SQL expressions.
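
A sketch of the last one, assuming a user_skills(user_id, skill_id) table as described:

    -- "What are the most popular skills?", most popular first.
    -- Join to a skills table if you want names instead of IDs.
    SELECT skill_id, count(*) AS user_count
    FROM user_skills
    GROUP BY skill_id
    ORDER BY user_count DESC;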


I know (except I didn't know I didn't like SQL syntax). That's why I'm saying tables are a good idea.


Additionally, FK constraints are a good thing, they mean that one of your users doesn't have the "Python" skill while another one has the "Pytthon" skill. You need this - and with SQL it's extremely easy to implement.


Are you replying to the comment you mean to be replying to? :)


Yes, the one in which you considered FK constraints to be a waste of time


Can you quote where I said that?


In a document database you model this as a document with a field which is an array. In a relational database, you model this as two tables.

In both cases, you simply model according to the philosophy of the database. It is incorrect to think either is more "natural" in some intrinsic sense.


I interpreted your comment as you claiming a view would have been needed.

The list of things required in a relational database does not seem long to me. I could probably come up with an even longer list for what's required when using Mongo.

Also, my point about an array-typed column still stands.


Did you consider using the MySQL JSON_CONTAINS() function? https://dev.mysql.com/doc/refman/8.0/en/json-search-function...


> we cannot filter anymore by querying only one table

I'd not want to work in a team where that is actually considered an issue.


Serious question: why is "having an extra table" considered such a challenge? I've seen this in a few places and it always raises alarm bells in my head.


Relational databases are one of those technologies where enough people can skate by enough of the time with just some very basic skills. And the skill curve in relational databases isn't offset by experience in other domains.

There's a number of technologies like this. They grow an air of mystery and impenetrability. Often unjustifiably so; many people would succeed just fine if they would go in unassuming and expecting to learn. It's why my interns can master things in a summer that some of my full-time people have been half-assing for years.


Off topic, but I think skills are not a binary thing. They have levels. Also, skills might be transferable: e.g. with programming languages, if you know language X, language Y is just a small step away (not talking about the ecosystem here). And skills might be dependent: how can you have the skill "Java" and not have some level of "OOP"? I have yet to see an ontology that fits reality here :(


Oh!... Having "Java" but no clue of "OOP"? Seen it any number of times. Devs who've migrated from a C or Cobol background with little/no/hopelessly inadequate OO training in particular. The code is predictably horrible.


Multiple tables are the way to go with SQL; you'll need multiple tables all over the place. So if multiple tables give you a "wrong" feeling, I argue that the feeling is wrong for the technology.

It would be very normal to have tables as follows:

* Table "users" with an "id" column and some additional data about the user.

* Table "skills" with an "id" column and some additional data about the skill.

* Table "user_skill" with columns "user_id" and "skill_id" (and the corresponding foreign key constraints).


I have also done this many times, but I think GP has a point. This model needs more code, and it's a hassle when you're just trying to get something working.


It's a wetware thing. Use relational databases enough and this isn't a hassle at all. Just as static types are not a hassle if you are used to them. Or Docker, once you're used to it. Etc.


> This model needs more code

That's not my conclusion at all. This model often requires less code. And less hard-coded data your application has to share around.


YAGNI is about avoiding premature optimisation.

A lot of these make sense. But they always do. And the road to shipping hell is paved with good architectural practices.

To ship a first version of a product, we always need to cut corners. Deep, horrible, painful, cuts. Because if we spent the time to make the perfect product, we'd launch too late.

A lot of these YAGNI things are not "you're never going to need it", but just "can we ship the first version without it?".


The hard lesson I learned during the dotcom era and immediately after is that this cult-like worship of version 1 ends up sinking many companies, but since it's news nobody wants to hear, it doesn't get reported well.

Any asshole can ship version 1. Shipping version 2 takes some talent.

In particular, to your scenario, there's a reason why companies operate in 'stealth' mode. The moment they are on the public radar, now all of the time frames are based on customer interest and customer complaints. We have to keep some tempo of releases to build customer confidence. You can't launch the MVP and then immediately stop all work to address the tech debt you acquired. So we're not talking about 'after MVP' versus 'before MVP', we are talking 'this year' versus 'next year or the year after'. That's a long time to creak along with very bad initial architecture decisions.

The value of a coach in sports or other activities is that this is a person who is not bogged down by the minutiae of the performance. They can see when you're stuck and tell you to stop. Very, very few developers have the ability to self-coach. It's one of the purposes of the morning standup: the hope that you hear yourself saying you've been wrapped around the axle on something and need a change of perspective. But the "stuckness" I'm talking about here has to do with Sharpening Your Axe. You're in a bad way because your 'tools' are not 'sharp', but you've already panicked about deadlines so you keep muscling through instead of realizing that the fastest way to complete tasks is to make the tasks easier and then do the task. That's the sort of tech debt that gets devs angry and resentful of people who tell them 'no'.


Totally agree that version 2 is really painful, and more painful the more we cut corners on version 1.

But I've seen sooo many startups ship version 1 and sink without trace. I'd say the vast majority of startups don't ever need to worry about version 2 because version 1 isn't going to get any traction.

The thing about YAGNI isn't technical - it's commercial. Ship version 1 as cheaply and quickly as possible so you understand whether there's any point in continuing.


If adding these things really slows your ability to ship the first version of your product, you need to fire your team and start with a new one. These PAGNIs are:

1. log.info() calls

2. apt-get install postgresql

3. Using timestamp and datetime.now() instead of boolean fields

4. Adding a /v1/ into your API endpoints

I'm being serious. If your team says "this will add a week onto the release date" then your team is either very junior or bullshitting you.


Sure, but it escalates quickly. Logging goes from `log.Println("the thing happened")` to structured logging via an API very easily.

Adding Postgres is a no-brainer. Totally agree. But should you architect the database properly? It's going to be tricky to normalise data later; maybe we should start with properly normalised tables now, instead of jamming everything into a jsonb field?

Versioning, sure, but does every API need a version? If so, shouldn't we write some standard code for versioning all the things? Should the CI/CD process version the system automatically? Speaking of which, we should set up the CI/CD. There's a rabbit hole.

These things are all good, definitely worthwhile, but where do you draw the line? If we spend a couple of days doing X, then it becomes worthwhile spending a couple of days doing X+1, and so on. The author drew the line here. But that's not the only place you can draw the line. You can draw it a few paces back too.

There's always a temptation as a techie to add a couple of days now to save a couple of weeks or months later. That may not be the right commercial decision.


The details you're going into are the YAGNI part. Yes, you'll get to those eventually, but you deal with _those_ problems later. Log to a file right now and switch to DataDog, Papertrail, etc. later on. Add /v1/ now, and worry about integrating CI/CD into it later, when breaking changes mean you actually need to worry about using versioning.

I'd think it's implied that you just normalise the tables. It's really not hard to build and use a normalised database. I was doing this in my first months as a junior developer, so I'd expect an engineer with any seniority to be able to do it.

If you're asking: "should we model the domain perfectly?" then my answer is a simple "model the domain as you understand it _today_." YAGNI is not "write shitty code" but "don't add things you don't need right now."

Again, if adding /v1/ into your first URL string or configuring your logging takes "a couple days" each, then I'm going to have serious concerns.


That's fine if it is just API endpoints, but I can see some junior dev spending time putting version numbers on literally every data structure, taking the exception as a rule.

And when logs go from "huh, if this happens I want to know about it" type thoughts as you are coding to "let's make a full pass and see where I could have added some logs to this PR", then effort is being spent too far in advance.

I definitely agree that timestamps, two vs many, and not reinventing the database (or not assuming your db is flat) are effort-free and always worth it.

With the caveat that you maybe can't afford the overhead of a full relational db, and have to use something like an ECS for performance reasons. This is pretty common for embedded or game domains.


> I can see some junior dev spending time putting version numbers on literally every data structure

That's a strawman argument.

> And when logs go from "huh, if this happens I want to know about it" type thoughts as you are coding to "let's make a full pass and see where I could have added some logs to this PR", then effort is being spent too far in advance.

That's a strawman argument.

> have to use something like an ECS for performance reasons. This is pretty common for embedded or game domains.

If your domain requires something else, you probably already know it.


YAGNI is about much more than avoiding premature optimization. It's about avoiding premature work, of any kind. Optimization, sure, but also implementation and even design.

I know that design sounds like the kind of work you don't want to avoid. But designing, say, a spiffy database query cache so that you don't hammer the database with repeated queries, before you know that you need such a thing? Don't bother designing it. YAGNI. (I mean, you might go so far as to think "If we need it, we'll put it here". But an actual, detailed, implementable design? No. It's likely to be never needed, and is therefore a waste of your time.)


> It's about avoiding premature work, of any kind.

It's even worse than that. Features have complexity, and they introduce constraints. Without active effort to fight entropy, each new feature costs a bit more than the last.

Putting what-if delegation in, before the Rule of 3 tells you how to best accomplish it, slows down everyone working in that part of the code, especially new people and folks trying to debug. You can organize code so it's hostile to delegation, or amenable to delegation. It's not free but it's relatively cheap, if your team is on-board. If not you're going to have a lot of teachable moments.

I don't build what-if features into my code much anymore. What I build in instead is potential. If I wanted to change this code, here is where I would do it. But not yet.


I agree that we have to be very vigilant against the what-ifs, but with the exception of logging, which is the one I most disagree with (unless it is helping you debug a failed test), these all seem like cases where the future-thinking thing takes precisely as much time as the ship-it-now thing.

The versioning thing could probably get really out of hand too, now that I think about it.

But timestamps, many vs two, and "uh, this is starting to sound like a database, let's use one" are all effort-neutral.


I irritate myself constantly by realizing that this bug fix I'm working on would have been easier if I'd written some better tests first. Or noticed the order of a couple of log messages sooner.


Should maybe add security and privacy to that list, in this day and age. Not that it all needs to be implemented right away (depending on jurisdiction you're operating under) but having a plan for how to solve security and privacy considerations and working with that in mind from the start can make it a much less painful experience in the long run.


You are correct. How would you distill this into a handful of elements akin to this submission?


I probably wouldn't, since which concerns are relevant is very use-case specific. It's more a suggestion to get an overview of the security requirements and privacy requirements one needs to deal with at some point, and to sketch some possible ways to make those requirements easy to solve when the time comes.

Examples of things to consider: zero trust, multi tenancy, permission structures, user data classification (for GDPR removal/extraction requests).

As a European, GDPR has far-reaching consequences that may even dictate what other services you rely on, i.e. can you use that SaaS service for your product when it's located outside of the EU/EEA?


YAGNI as a whole, I've found, is an oversimplified hammer used to justify bad behavior and cutting corners. The engineers who quickly and happily throw around "YAGNI" have never had to deal with a Sev1 outage where you're hemorrhaging money, often because you don't know exactly where your system is failing, since you don't have good logging/tracing/observability in place.

EDIT: this is not to say it doesn't have utility. As a guiding thought it most certainly does. But it's also easily abused.


I experienced a case of "zero, one, many" where it would have been better if YAGNI had been applied.

We were building a messaging system for a product that was being rewritten, where the only requirement was direct messaging between two users. It was suggested that we generalize it to support multi-user conversations, since that had been a product desire some time ago. This complicated the backend implementation significantly, hurting performance and maintainability, while the APIs and the frontend still only supported two-user messaging.

And the system never needed multiuser chat.


Isn't this, technically, a case of "one", since it's one user messaging one user?


Yes, you are right. I was just mentioning the name of the principle as used in the article.


> More generally, instead of a boolean flag, e.g. completed, a nullable timestamp of when the state was entered, completed_at, can be much more useful.

In my experience, the timestamps should almost always be entered in addition to the flag.

    select * from data_journal
    where status = 'Loading'
is much easier to write and understand than

    select * from data_journal
    where started_loading_at is not null
    and finished_loading_at is null


This sounds like you're opening yourself up to situations where you have rows in data_journal that have status='Loading' but which have started_loading_at=NULL, in violation of your data model.

Your premise is slightly flawed in that your second example isn't actually difficult to write or understand, as you claim. However, it could be argued that if the logic for selecting "loading" rows is repeated in multiple places then a layer of abstraction over it would be useful. This could be achieved in SQL by e.g.

    CREATE VIEW loading_entries AS
    SELECT * FROM data_journal 
    WHERE started_loading_at IS NOT NULL AND finished_loading_at IS NULL
or, if you have several such statuses you need to define,

    CREATE VIEW data_journal_view AS
    SELECT
      data_journal.*,
      CASE
        WHEN started_loading_at IS NOT NULL AND finished_loading_at IS NULL THEN 'Loading'
        WHEN ... THEN ...
        ELSE 'Some other status'
      END AS status
    FROM data_journal


> that have status='Loading' but which have started_loading_at=NULL, in violation of your data model

Why not CHECK() it then? Or make 'status' GENERATED ALWAYS ... STORED, or do a similar thing with triggers. The way that you suggested is also good, but it creates two names, one for updates and another for selects, which may confuse ORMs or developers.
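
E.g. in Postgres 12+ you could derive the status from the timestamps so the two can never disagree (a sketch; the status names are made up):

    -- One source of truth: status is computed from the timestamps.
    ALTER TABLE data_journal
      ADD COLUMN status text GENERATED ALWAYS AS (
        CASE
          WHEN started_loading_at IS NOT NULL
           AND finished_loading_at IS NULL THEN 'Loading'
          WHEN finished_loading_at IS NOT NULL THEN 'Loaded'
          ELSE 'Pending'
        END
      ) STORED;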


The iron law of data is that there should be one source of truth.

It's always better to enforce constraints statically (in this case by the schema) than dynamically at runtime, because otherwise avoiding inconsistencies becomes the human's job, and the computer sure as hell won't know what to do about them.


Not sure I understand the context for this, since checks and triggers are as static as views in the schema sense, and all of them are dynamic in the computational sense (though they get updated at different times).

If you mean that columns/fields should not share parts of the same “fact” even if one of them is computed or both are constrained accordingly, then I disagree.


Another benefit of having a separate status field is that you can index it easily. You could technically index the timestamps, but it's a lot of busywork for the RDBMS, as the values are all distinct.


This is orthogonal, because expressions can be indexed too. Of course if you see the same expression three+ times in your code, it’s begging for a name.
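
For instance, a partial index (a minimal sketch in Postgres syntax; the index name is invented) indexes exactly the "loading" rows, which also keeps the index small:

    CREATE INDEX data_journal_loading_idx
      ON data_journal (started_loading_at)
      WHERE started_loading_at IS NOT NULL
        AND finished_loading_at IS NULL;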


Feels like maybe a view on top of your table that adds the boolean based on the timestamp might be safer, in that by definition there's no risk of the two columns getting out of sync?


If you have to answer "does it usually take this long, or is something broken?", the timestamps would be handy. Oh, and you can estimate an average execution time, and investigate performance degradations or improvements.
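
A minimal sketch, reusing the timestamp columns from the example above:

    SELECT avg(finished_loading_at - started_loading_at) AS avg_duration
    FROM data_journal
    WHERE finished_loading_at IS NOT NULL;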


Yes, that's why I've written it's better to have both.


Not just easier to write but also easier to index and easier for the query engine to optimize the query plan.


> Versioning APIs

I hear this recommended quite frequently, but I don't see it practiced much, nor do I really understand what is being recommended. No matter how you wrote your first API you can always introduce a /v2/ later, and no matter how good you get at versioning it's still much worse than maintaining a well-designed backwards-compatible API. If anyone has recommended reading on this I'd love to check it out.


YAGNI is not an excuse to take lots of shortcuts against known good implementation patterns. Second-guessing years of people doing things right, by naively assuming you won't need something, is more likely to be a mistake than not. Do it properly the first time. Actively creating technical debt against your better judgment is silly.


Abstraction is critical, but it is also very hard.

Unneeded and wrong abstraction creates more technical debt than no abstraction. A simpler design is easier to extend, and in the worst case rewrite, than a more complex, abstract design.

If you know by experience, personal or otherwise, that a design or abstraction is sound, and you know you are very likely to need it in the future, go for it. Otherwise, YAGNI.


> YAGNI is not an excuse to take lots of shortcuts against known good implementation patterns.

I’d say it is, because “best practices” and “patterns” are usually context-dependent and have a negative cost when applied where they’re not needed.


I'm glad to see exceptions being discussed. I had a coworker who applied an almost religious adherence to YAGNI. We were building the next iteration of our build system, applying the lessons from problems we'd had in the previous iteration, when a coworker maneuvered management to let him take over the project and threw everything out because YAGNI. He wanted to start from scratch and demand direct proof of problems for everything we designed into the system, throwing out all of our years of experience supporting our build system and its limitations.


Cool list!

For Django developers I've compiled a list of Django guidelines that also contains a lot of yagni exceptions especially for Django: https://www.spapas.net/2022/09/28/django-guidelines/


Clicked on the link expecting to see the usual beginner advice, but I'm pleasantly surprised: I would give most of the same recommendations to my team.


Thank you for your kind words!


I will play devil's advocate.

Relational databases add too much time overhead: building the schema, or configuring an ORM to generate the schema.

Managing migrations.

Building a DB-model-to-domain-model mapper and maintaining it.

You waste time thinking about the DB model and focusing on that technical layer, which probably ends up affecting the way you model your system.

NoSQL gives you modeling freedom, which is handy for architects.


Using a relational DB doesn't necessitate the use of an ORM. This is another case of YAGNI. I rarely reach for an ORM until there's significant enough complexity to warrant it, and even then it's not a decision made lightly.

Designing and maintaining a schema is not too arduous. There are tools which can produce a diff between the current and desired schema for your database, for instance Migra for Postgres (https://github.com/djrobstep/migra).


Yet more counterpoints: I find that, more often than not, as a project matures it ends up implementing its own half-baked ORM internally, and much time would often be saved by just using an ORM from the start. It's harder to restructure all your data to fit the ORM you'll eventually need to manage X developers working on the application than it is to just pick and use an ORM from the start.


I didn't say it is.

I said building the schema, or configuring an ORM which will create the schema.


The effort to make a key-value store on top of a relational database is negligible.

The effort to make a relational database on top of a key-value store which is not already a relational database is great.
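
For instance, the first direction is roughly one statement. A minimal sketch in Postgres syntax; the table name is invented:

    -- a key-value store on top of a relational database
    CREATE TABLE kv (
      key   text  PRIMARY KEY,
      value jsonb NOT NULL
    );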


> Relational databases add too much time overhead: building the schema, or configuring an ORM to generate the schema.

That sounds like the same sort of argument as "I'm not going to write tests because they slow me down!"


But that statement is true for some development and business models. No automated testing is like masturbation: no one wants to admit it, but many do it.

https://insights.stackoverflow.com/survey/2019 (search for Unit Tests; #-linking doesn’t work correctly on this site, perhaps because they don’t test it)

Also tap on the Job Satisfaction button to see how it correlates.


Even if it sounds like that, then what?

Are you trying to say that NoSQL-based systems are less reliable?


Nope. I'm saying that if your argument is "I'm choosing this tech because I'm slow at this other tech" then it's very likely you're making a poor choice. There's nothing inherently slow about designing a relational database. The work required takes time, but the same is true if you're designing something for a schemaless database. You still have a schema, it's just defined in your application code instead of the database layer. That needs thought, and therefore time.

Working with a schemaless 'nosql' database is only faster if you're skipping the part where you think about how you store and access your data. That makes your system less reliable.


>I'm saying that if your argument is "I'm choosing this tech because I'm slow at this other tech" then it's very likely you're making a poor choice.

We do it all the time as an industry.

We do web dev in languages like C# and Java instead of C and C++, even though C++ is capable of returning HTML text just fine.

>Working with a schemaless 'nosql' database is only faster if you're skipping the part where you think about how you store and access your data. That makes your system less reliable.

How so?

At worst, performance may not be optimal.

Whether you'll ever get to that point is hard to answer.

How about using NoSQL by default to enable rapid development, and SQL when there's a need for it?


> How about using NoSQL by default to enable rapid development, and SQL when there's a need for it?

My point is that the development speed for NoSQL should be roughly the same as for relational databases, because you should be spending the same amount of time thinking about your data. The database you use is almost incidental. Whether you choose to create a users table in Postgres and make sure the data is valid there, or create a JSON object in MongoDB and make sure the data is valid in the application logic, doesn't matter. The hard part, and the part you should be spending time on, is thinking about what 'correct' means for this set of data.

If you're creating a MongoDB collection with a user JSON object, and then you just let different services manipulate that JSON however they want, then your data will get screwed up in the future. There needs to be a robust contract between the app code and the data store that defines what the data is allowed to look like. That's where the time gets spent, whether it's in the database layer or in the application layer.

I would argue it's a bit easier to do that in a relational database but that's mostly down to experience.


> If you're creating a MongoDB collection with a user JSON object, and then you just let different services manipulate that JSON however they want, then your data will get screwed up in the future. There needs to be a robust contract between the app code and the data store that defines what the data is allowed to look like. That's where the time gets spent, whether it's in the database layer or in the application layer.

If another app touches my database in write mode and modifies the structure by mistake, that's a relatively good scenario, because the problem is easy to notice compared to messing with the data, e.g. account balance += 200.

And a relational DB without some hardcore constraints/triggers doesn't check that.

I'd still rather have that code in the app.

> My point is that the development speed for NoSQL should be roughly the same as for relational databases, because you should be spending the same amount of time thinking about your data. The database you use is almost incidental. Whether you choose to create a users table in Postgres and make sure the data is valid there, or create a JSON object in MongoDB and make sure the data is valid in the application logic, doesn't matter. The hard part, and the part you should be spending time on, is thinking about what 'correct' means for this set of data.

That's the theory.

In practice it has always taken me more time to set up Postgres or MSSQL and the initial system model than when using Mongo.

I've always used relational DBs, but I've started wondering whether I should start using NoSQL in real-world scenarios.


Nobody says they are not reliable. We just say that by not using an RDBMS, your application is now responsible for your data's coherence.

Good luck when you discover that your production database is full of incoherent data because a bug in your application generated unusable data objects (missing fields, wrong structure …) that your NoSQL store was happy to accept without a problem.

In a relational database, you are 100% guaranteed that your data fits the schema. It's not just table and column names. It's uniqueness checks, it's constraints, it's what to do with your data when related data is modified or deleted.
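
A minimal sketch of the kind of guarantees the schema gives you for free (table and column names invented for illustration):

    CREATE TABLE users (
      id    bigint PRIMARY KEY,
      email text   NOT NULL UNIQUE   -- uniqueness enforced by the database
    );

    CREATE TABLE orders (
      id      bigint  PRIMARY KEY,
      -- what to do when related data is deleted
      user_id bigint  NOT NULL REFERENCES users (id) ON DELETE CASCADE,
      total   numeric NOT NULL CHECK (total >= 0)
    );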


Database integrity checks are only a small part of integrity checking.

You still must have those checks in code to operate on valid business objects.

I'd say the world has partly moved away from programming in the database and does it in code, which can easily be covered with tests.


Django's migration system is so good that migrations are basically a non-issue. Having worked with it for a while, I find it really weird that so many other frameworks are still so radically behind.


You might want to take a look at EdgeDB. You still have to define a schema, but both its data model and its query language have a much smaller mismatch with popular programming languages and APIs, so you don't need a complex ORM. For example, it has first-class support for following links (instead of awkwardly joining on foreign keys), outputting nested data, and polymorphism.


I have to disagree with the first point in the article. The "zero, one, many" rule, or the related "rule of three", isn't about how many items you have in your list; it's about code reuse. The conclusion is correct, though: you should default to one:many relationships unless you know they are strictly one:one.


I don't know that I completely follow this comment, in particular how code reuse comes into it.

I think the biggest thing is about deciding whether you have one or many.

There are a lot of cases where having many doesn't just mean there is a list to show.

In a lot of cases, having many means you now need to choose "the right one" to show/edit/action whatever.

This brings up all sorts of complications beyond "just return a list instead of a single item".

Even if you don't end up building it out, I think it is always a good thought experiment during the design phase to think about how things would behave if there were multiples of key resources, rather than just one.

Would it have implications elsewhere? Are there low-cost alterations to the design that can be made now that make space for / allow this key resource to be a list later on? How hard would these be to do later?

In many cases the conclusion might be that it isn't worth it, and that's the right choice (YAGNI applied judiciously), but every now and again this might highlight a valuable early-stage change that would have cost a lot to make later on.


I took this as "if you need more than one, plan for storing ANYTHING 1 or greater".

Meaning that if I need to store specifically 2 addresses per user, don't force it at the data layer… just make it an easy-to-swap validation, with no literal limits elsewhere.
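
For example (a minimal sketch; table and column names invented, assuming the usual users table), the schema stays one-to-many and the two-address rule lives in application code:

    CREATE TABLE addresses (
      id      bigint PRIMARY KEY,
      user_id bigint NOT NULL REFERENCES users (id),
      line1   text   NOT NULL
      -- no address_1/address_2 columns and no hard limit here:
      -- "max two addresses per user" is an easily changed app-level validation
    );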


For display/view purposes, yes, you need to decide what the likely option is. For model/type purposes, the decision to have a 1:1 relationship should be made only with great care, as the OP shows with the example of addresses.


And I thought it was going to be about how you shouldn't use exceptions unless you really need them...


Kudos to the comments on databases. It is very hard to justify ripping out Mongo after the fact even if relational is better. Part of the problem is opportunity cost (“We need the money for innovation and new features”) and part is having to fess up to the wrong call to begin with.


"By this I mean, if you need a database at all, you should jump to having a relational one straight away, and default to a relational schema, even if your earliest set of requirements could be served by a “document database” or some basic flat-file system. Most data is relational by nature, and a non-relational database is a very bad default for almost all applications."

That is terrible advice.

Relational databases are heavyweight solutions, expensive and slow.

Make a data interface from the start, yes. But start by backing it with a flat file and exhaustive search. That is simple, cheap, and more efficient than relational databases, indexing, sorting, and/or binary searching, until you have a lot of (for some definition of "a lot of") data.


SQLite is lightweight, cheap, easy, and faster than fopen :)

https://www.sqlite.org/fasterthanfs.html


Adding dependencies to solve simple problems is foolish.

I have seen many projects get bogged down in complications around interfaces to third-party software, where the third-party software solved some simple problem that could easily have been solved in a few dozen lines of code.

Databases are one of the main offenders. But command line processing is probably the worst offender I see.

Simple problems should use simple solutions. Most data storage problems are simple. How much code is in SQLite? I do not know. But opening a flat file, gulping in the contents, and linearly searching for the blob of data you want is simple, often fast enough, and can be implemented in the time it takes to find, download, install,... some package from a third party.

As I said: make an interface (probably three lines of code) and put the simplest thing possible behind it. (That simplest thing will not be sqlite). If requirements expand, put something more capable behind the interface.

Another thing:

Most data is not relational. I have seen systems that put apache log files into SQL databases. I imagine there might be sensible uses for that, but I cannot think of any.

Golly, KISS is not some random acronym written by fools. It is a fundamental principle of software design.

"Go straight to a relational database" is not a sensible principle of anything. It is bad advice.


I agree with all of this, but it's very much because my tooling makes it easy to do so.

In some other stack, setting up good logging can be annoying, and I understand wanting to take the shortcut.


Bad title. I read it as "you ain't gonna need exceptions", but the author intended "exceptions to the YAGNI rule of thumb".


It's only bad if you read it as "YAGN Exceptions".


But this would be very good advice! Exceptions (a.k.a. modern-flavored COMEFROM statements[0]) are extremely confusing and have no place in a clean codebase.

[0] https://en.wikipedia.org/wiki/COMEFROM


Exceptions do not specify exactly where they should "come from", and they are a two-way protocol akin to setjmp/longjmp. Situations where an exit through a few levels of stack is required do happen regardless of code cleanliness, and the only alternative is to pair every call with a flow-control statement, turn the primary return value into a status, and add two out-arguments for error and result. Some people love this, some not really.


Exceptions are the worst, except for any other way of handling errors.


I agree, but your sentence is empty: errors do not exist, they are just conditions that you dislike. Do not let your emotions affect the syntax for control flow!


Great, I will tell that to my customers!


Fair enough.


And I read it a third way: Exceptions you ain't gonna need.

Any more offers?


I had "YAGNI as exceptions" - you start building out the extra functionality, but drop a NotImplementedException at each entrypoint for the extra functionality in the code.


FWIW I read it as the latter.


Yagni is for justifying throwing crap at a wall until it sticks, not purposefully creating usable or maintainable software.


“Good logging” is mentioned here. Can somebody recommend a good write-up of what good logging consists of?


I've wanted to do that for a while. Look into structured logging. I want/need machines to analyze my logs, and that requires key-value pairs. I put any dynamic text in its own field. "error for user 42, remote api timed out" becomes: level: error, userid: 42, message: api timeout, http_request: <curl call to reproduce>, timestamp: <curr_time>, time_duration_ms: 30000, ... (but as JSON).

Now I can see which users are affected by which errors, and how often. Alerts are now trivial to implement. The key is being able to debug an issue after the fact. As an example, I like to include a copy-pastable curl call when I can, so I can easily reproduce an issue manually.
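
Rendered as an actual JSON record, that example might look something like this (field names are just the ones from the sketch above; the placeholders are kept as placeholders):

    {
      "level": "error",
      "userid": 42,
      "message": "api timeout",
      "http_request": "<curl call to reproduce>",
      "timestamp": "<curr_time>",
      "time_duration_ms": 30000
    }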


This is great. We use Sentry for our projects, which serves the same purpose.

I was wondering if using info statements in certain places is seen as good practice, for example.


The curl request log is a very interesting idea I have never seen before. Do you have a library recommendation or did you implement something yourself?


Pretty bad advice.

The good thing about YAGNI is that you can treat it as dogma and you'll be fine. Sometimes you'll be wrong, but it's always going to be easy to recover from mistakes. Doesn't matter how much experience you have, YAGNI is always a net positive.

Author's recommendations on the other hand require a lot of caution. Take logging for example. It's quite a slippery slope: how much logging is enough?

Just a few INFO statements here and there? Or multiple levels? Centralized logging? Structured logging? Alerts? What about retention? GDPR?

Good logging requires thinking.

You can always add more logging when necessary. But you cannot un-add the logging noise you accidentally introduced (well, technically you can, but no one does).

Zero-one-many: if 1% of your users need two addresses, then 0.01% will need more than two. You can just say no to those. You avoided one level of abstraction by slightly upsetting 0.01% of your users. In most cases that's a good thing: unlike MAU, complexity piles up and grows non-linearly.

Versioning: in most cases, if you don't control both ends of the API, versioning is a requirement, so YAGNI doesn't apply here. And when you do control both ends, versioning is a huge process change that rarely pays off.



