*> but it's the twin meanings of NULL that really kills me* NULL only has one me...

crazygringo · on Aug 24, 2022

> NULL only has one meaning: NULL. This is roughly analogous to unknown.

Which is what I'm arguing against, because it's used in two unrelated ways in SQL -- as a data value, and to express no matching row found in a JOIN. So no matter how it's defined formally, in practice it has two meanings that have nothing whatsoever to do with each other. One is a value and can be stored, the other says 'not found' and results only from expressions and isn't for storage.

> but you can infer the difference

Yes, as I said there are lots of workarounds. But they're still workarounds.

> This is one of the reasons some purists argue against NULL existing in SQL at all

Sure, but the reality is that NULL is used in a practical sense to mean "no data entered". It's so ridiculously common that only some columns have no data for a particular row, and so e.g. you need the 'time_finished' column to either have a valid date, or it's NULL if the activity hasn't finished yet. The alternative is to have an additional boolean column 'is_finished' and for 'time_finished' to be arbitrarily '1970-01-01T00:00:00Z' whenever is_finished is false, which is clunky and redundant.

Purists can argue what they want, but NULL is so ridiculously useful as a stored data value it doesn't really matter.

jon_richards · on Aug 24, 2022

> you need the 'time_finished' column to either have a valid date, or it's NULL if the activity hasn't finished yet

Rust’s Diesel has an interesting way of dealing with this. “None” means missing while “Some(None)” means the value null. So when updating a record, “None” makes no change while “Some(None)” sets the value to null.

This distinction is obviously lost when retrieving from the database, but it’s an interesting concept.

yen223 · on Aug 24, 2022

Javascript having both `null` and `undefined` doesn't seem so crazy now

beefjerkins · on Aug 24, 2022

But in classic Javascript style, it's half-assed. Being able to assign anything to undefined makes the distinction between null and undefined pointless.

SoftTalker · on Aug 24, 2022

If you have a business need to represent "empty" or "n/a" or "declined to answer" or something like that, use a specific value for that. NULL does not mean anything. Or, it means nothing. It's just NULL. Once I got that into my head, SQL became less frustrating.

crazygringo · on Aug 24, 2022

You can't without defining yet another field, usually. It's really annoying to double the number of columns so that [time_started, time_finished] becomes [time_started_exists, time_started, time_finished_exists, time_finished].

NULL values makes business logic far more compact and intuitive. For enumerated values in fields it's easy enough to define another value in the same field to mean 'unknown' or 'not entered', but you can't do that for strings, numbers, datetimes, etc. -- you have to throw in a bunch of unwieldy additional boolean fields instead.

Quekid5 · on Aug 24, 2022

The underlying problem here is that SQL lacks Sum Types (aka Tagged Unions). Such types solve all these problems effortlessly.

In contrast to what SQL has, Sum Types combined with Product Types (which is basically what a row is) are actually a universal way to model all possible data[0]. (Of course you may want syntax sugar, etc. on top of that, but Sums and Products at the bottom is sufficient.)

[0] I'm actually not sure if I should qualify that -- I believe Sums+Products can actually model anything, assuming you allow recursive type definitions -- which might be hard to make perform well. Storing a linked list in a database field, e.g. might not be the best idea.

Koshkin · on Aug 24, 2022

Indeed, the issue of non-existence is a somewhat tricky philosophical question (e.g. there are many ways something may not exist).

andy81 · on Aug 24, 2022

A better option in many cases is to check the primary key.

e.g.

select questions.Id , questions.Text , answers.Id , answers.Text

from questions left join answers on answers.QuestionId = Questions.Id

answers.Id is non-nullable as a primary key, so if answers.Text is null but answers.Id is not null they've declined to answer.

dspillett · on Aug 24, 2022

That may be implementation or circumstance specific. For instance in MS SQL Server with a heap table (one without a clustered index) or a table where the primary key is not the clustering key, it will result in extra page reads to check the other field's value (the query planner / engine could infer from it being the PK that it can never be null, so the lookup to check is unnecessary, but IIRC it does not do this). As the columns used in the join predicate have to be read to perform the join, no extra reads will result from using them for other filtering.

In your example it is very likely that the primary key is the clustering key, so will be present in the non-clustered index that I assume will be on answers.questionId, making my point moot, but if for some unusual reason neither Id nor questionId were the clustering key checking Id may result in extra reads being needed.

In DBMSs without clustering keys implemented similarly to SQL Server, there may be such concerns in all cases.

onlyrealcuzzo · on Aug 24, 2022

> This is one of the reasons some purists argue against NULL existing in SQL at all, rather than needing a family of NULL-a-likes.

So then are there no optional values? What happens when OUTER JOINs don't match?

dspillett · on Aug 24, 2022

I assume NULL would still exist there, but there would be no explicit NULL values permitted in tuples (rows) that actually exist.

Or perhaps the purists would remove outer joins too, it isn't since University that I've read around the discussion, but given alternate syntax to do the same thing can sometimes be convoluted that might be a bad idea itself.

ErwinSmout · on Aug 27, 2022

In the relational model, no there is no such thing as "optional values".

Even Codd often hinted (quite a bit of material in the 1990 book gives such hints) that "if a row has a null somewhere, it SHOULDN'T BE CONSIDERED AS BELONGING TO THE RELATION". E.g. if an attribute of a FK is null, no FK checking should be done at all on account of that null. The FK checking should be done only when the FK value is "complete".

As for outer join, one problem with it is that its result is (intended to be) the extension of a disjuctive predicate, the disjuncts in question being exactly the predicate for the "matching" case and the other one the predicate for the "no match" case. Now if you see a row in the result, how can you tell from that row which of the two disjuncts made that row appear in the result ?

Outer join is problematic by definition precisely because of the possibly ambiguous interpretation that derives from its disjunctive predicate.

(Would you do that with base tables ??? I mean design a base table that can hold a row for strictly more than one possible reason ??? I mean where the user then sees a row in that table and subsequently cannot tell whether that row is there for reason/meaning 1 or for reason/meaning 2 ??? I'll tell what you -and anyone else- would do in such a case : you'd also add the indicator telling whether it's meaning 1 or 2 (and then the problem of 'what if it's both' is also solved). Now think back about what could be done in outer join to resolve that very same kind of disjunctive ambiguity.)

vbezhenar · on Aug 24, 2022

You don’t do outer joins.

Koshkin · on Aug 24, 2022

Don’t these exist for a reason?

ErwinSmout · on Aug 27, 2022

Yes. The reason is lazy people expect to be able to ask five questions in one go and expect one single answer to answer all five of them.

The key is to realize that submitting a query is to ask a question. So how many queries to you think you have to issue if the number of questions you have is five ?

It is technically perfectly possible to "ask five questions at the same time" to any given truly relational DBMS. Said DBMS will give five answers at the same time (and not leave you to guess which answer corresponds to which question). That SQL doesn't do it, does not mean it's impossible.

andy81 · on Aug 24, 2022

Of course.

You've got a list of countries and are pulling each country's national flower, national bird, largest port city etc.

Without outer joins, Liechtenstein with no ports doesn't show in the list at all. Sad news for people who want to know all countries, or Liechtenstein's national bird (eagle).

vbezhenar · on Aug 24, 2022

You’re not obliged to pull everything in one request. You can issue several requests.

tremon · on Aug 24, 2022

Sure. You're also not obligated to include a WHERE-clause in the query you send to your database. You can do the filtering in the application.

vbezhenar · on Aug 24, 2022

What's your point? You can live without nulls, they're not required and you don't need to avoid WHERE-clause for that. It's a theoretical concept, nobody in sane mind would do that, but nulls are not required for relational algebra.

tremon · on Aug 25, 2022

You started this subthread by asserting that one shouldn't use outer joins, and when other people pointed out valid use cases for outer joins, you advocated to use application logic to work around your refusal to use them.

My response was just to point out the absurdity of your premise.

ErwinSmout · on Aug 27, 2022

"NULL only has one meaning: NULL. This is roughly analogous to unknown."

From the re-published version of "Much ado about nothing, part 2" (re-published in "Database Dreaming, Vol II) :

"An outer join produces nulls because we ask the DBMS to show us data from one table even when no matching data can be found in another table. Such nulls have no meaning, no semantic content."

Personally I'd add to that : and even if it's the case that they do, that meaning is patently not "simply unknown" but rather a very clear indication of non-existence.

But I have no doubt the apologists will obviously always apologize no matter what and handwave arguments such as these away with "So what ? What's the difference." Well, it's the difference between knowing of non-existence and not knowing at all.

ErwinSmout · on Aug 27, 2022

"The one that his a lot of people is WHERE <value> NOT IN (<set>) where <set> contains a NULL. Because NOT IN unrolls to “<value> <> <s1> AND <value> <> <s2> AND … AND <value> <> <sx>” any NULL values in the set makes one predicate NULL which makes the whole expression NULL even if one or more of the other values match."

Sorry, but this is FALSE.

"Even if one of the other values match" will make that particular INequality test FALSE and one FALSE conjunct will make the entire conjunction FALSE, even in 3VL.

chasil · on Aug 24, 2022

NULL itself is open to interpretation.

If I CREATE TABLE foo (bar CHAR(1) PRIMARY KEY, baz char(1) UNIQUE), then different things happen on different databases.

In Microsoft SQL server, only one insert of a null into the baz column is allowed, and the null value is indexed.

In Oracle, null is never indexed in this context, so any number of null insertions into baz are allowed.

On a composite index, I believe that nulls are always indexed, on any database (they must be).

As far as I understand it, this is implementation-defined:

  SQL> select * from dual where null=null;

  no rows selected

dspillett · on Aug 25, 2022

WHERE NULL=NULL should never return true, so no rows returned is correct. It is only implementation defined in that some implementations get this wrong!¹³

SQL Server's handling of NULLs with in unique indexes/constraints is not standard. You can get better using a filtered unique index (“CREATE UNIQUE INDEX more_compliant_unique_index ON dbo.some_table(this_column) WHERE this_column IS NOT NULL;”), but be aware that filtered indexes have their own collection of gotchas⁴.

----

[1] MS Access IIRC, MS SQL Server if the ANSI_NULLS option set to off, MS² SSIS's default (consider NULLS equal) behaviour for merge steps, and so on.

[2] You may be noticing a pattern here… MS are not the only culprit though.

[3] I'm making a distinction here between places where behaviour is not defined in the standards, so implementations are free to go their own way, and where implementations go against what is expected by the standards.

[4] particularly with respect to the collection of unfixed/wontfix gnarly edge cases that is the MERGE statement.