If your code is littered with DISTINCT clauses either you are doing something wr...

swyman · on Feb 19, 2015

I'm genuinely curious. Would you mind elaborating or pointing me to another resource that goes into detail on why this is the case?

buckbova · on Feb 20, 2015

The poor design part or the performance part?

The use of DISTINCT everywhere smells of denormalized tables. Now if you are in data warehouse or reporting position, then this likely makes sense. I tend to work in transactional applications and keep the redundancy down to an absolute minimum. I abstract away some of the complicated queries with views, procs, and functions where allowable.

As for performance, different rdbms implement this differently, but the general query plan category for GROUP BY and DISTINCT is SORT. If you are querying something where you have multiple subqueries with DISTINCT and you and ordering the final results, you are adding extra sort operations to the query plan, hence hurting performance.

I tend to design everything to optimize read operations because there tends to be much more read than write in systems I work in. For me this means denormalized and heavily indexed where table/index scans are extremely rare. On smaller data sets, some rdbms always scan tables because it saves operations based on table statistics.

I hope this helps.