Hacker News

fine, here is the full query since extrapolating from incomplete data is hard for you:

SELECT * FROM my_table WHERE my_column NOT IN (5, 6) AND my_column IS NOT NULL

happy now?




These two queries will always return the same results:

  SELECT * FROM my_table WHERE my_column NOT IN (5, 6)

  SELECT * FROM my_table WHERE my_column NOT IN (5, 6) AND my_column IS NOT NULL
Because when my_column is NULL, `my_column NOT IN (5, 6)` evaluates to NULL rather than true, so NULL rows are excluded either way.
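This is easy to check. A minimal sketch using an in-memory sqlite3 database (table and column names taken from the query above):

```python
import sqlite3

# NULL NOT IN (5, 6) evaluates to NULL, which is not true, so the
# NULL row is filtered out even without the IS NOT NULL predicate.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (my_column INTEGER)")
conn.executemany("INSERT INTO my_table VALUES (?)",
                 [(4,), (5,), (6,), (None,)])

plain = conn.execute(
    "SELECT my_column FROM my_table WHERE my_column NOT IN (5, 6)"
).fetchall()
with_not_null = conn.execute(
    "SELECT my_column FROM my_table "
    "WHERE my_column NOT IN (5, 6) AND my_column IS NOT NULL"
).fetchall()

print(plain)          # [(4,)] -- the NULL row is already gone
print(with_not_null)  # [(4,)] -- identical result
```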


Right, my bad. The upper parent was about using IN and returning everything "but" 5 and 6. So here is the select that will do that:

select mumu from kaka where mumu not in (5, 6) or mumu is null

That will return everything, null included, except for 5 and 6.
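A hedged sketch of that behavior with sqlite3, using the made-up names from the query above:

```python
import sqlite3

# Adding OR mumu IS NULL brings the NULL rows back while
# still excluding 5 and 6.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kaka (mumu INTEGER)")
conn.executemany("INSERT INTO kaka VALUES (?)",
                 [(4,), (5,), (6,), (None,)])

rows = conn.execute(
    "SELECT mumu FROM kaka WHERE mumu NOT IN (5, 6) OR mumu IS NULL"
).fetchall()
print(rows)  # [(4,), (None,)] -- everything except 5 and 6, NULL included
```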

But wait, there is more. If, for example, performance is the main issue, the above query is quite slow. Even with an index on column mumu, it will still do a full scan of the table before returning. How to improve performance in this case? Well, you use a LEFT join on itself. Implementation is left as an exercise for the reader :D.
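The comment alludes to an anti-join. One common variant (sketched here with a separate exclusion table rather than a literal self-join; the table names are invented, and the performance claim is the commenter's, not verified here) keeps only the rows that fail to match the excluded values:

```python
import sqlite3

# Anti-join: LEFT JOIN against the values to exclude, then keep
# only rows with no match (the join column comes back NULL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kaka (mumu INTEGER)")
conn.execute("CREATE TABLE excluded (v INTEGER)")
conn.executemany("INSERT INTO kaka VALUES (?)",
                 [(4,), (5,), (6,), (None,)])
conn.executemany("INSERT INTO excluded VALUES (?)", [(5,), (6,)])

rows = conn.execute(
    "SELECT k.mumu FROM kaka k "
    "LEFT JOIN excluded e ON k.mumu = e.v "
    "WHERE e.v IS NULL"
).fetchall()
print(rows)  # same rows as NOT IN (5, 6) OR mumu IS NULL
```

Note that the NULL row survives automatically: `NULL = 5` is not true, so it never matches the excluded values.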


does postgres really not index nulls in a useful way? mysql does, though it may only work efficiently on a single val-or-null comparison at a time.


nobody does. MySQL, Oracle, MSSQL, you name it. All sux. That's why I prefer to always declare NOT NULL and have a DEFAULT value when I create tables. Treat the default value as NULL and you'll increase performance a lot.


PostgreSQL will happily use an index when looking up nulls. Given an index over an integer column, a query for a constant non-null value appears like:

                               QUERY PLAN                              
  ---------------------------------------------------------------------
   Index Scan using foo_b_idx on foo  (cost=0.29..8.30 rows=1 width=8)
     Index Cond: (b = 333)
The exact same query plan is generated for a query looking for nulls:

                               QUERY PLAN                              
  ---------------------------------------------------------------------
   Index Scan using foo_b_idx on foo  (cost=0.29..8.30 rows=1 width=8)
     Index Cond: (b IS NULL)
(That is, the condition on the scan is the only thing that differs.)

(I would heavily suspect that both MSSQL and MySQL have similar behavior here; this is an easy optimization for a query planner.)


MySQL does too, yes, the comment is clearly incorrect: https://dev.mysql.com/doc/refman/8.0/en/is-null-optimization...

And it has done so for many versions, 8 is just the default redirect I got. The very first sentence on that page is:

>MySQL can perform the same optimization on col_name IS NULL that it can use for col_name = constant_value.

It works effectively everywhere, with the restriction that it can only do one null-optimized comparison per index (e.g. `(a = 1 OR a IS NULL) AND (b = 1 OR b IS NULL)` will only optimize one piece). Which is a potentially-significant restriction that makes me curious about the internals... but it does exist, and, as is typical for MySQL, it is clearly documented.


So I guess "OR column IS NULL" would have been the correct answer?


Pretty sure most DBs do not treat that any differently than

`select * from table where column not in (5,6)`

Can you name one that does?



