One of the reasons you are seeing that the subquery runs twice as fast on most of the databases ( like SQL server, oracle & MySQL), is that it is able to execute the query with one pass through of the data. When it is taking twice as long (without the subquery), it is doing a second pass for the count distinct.
Crazy to see those numbers on old MSSQL...there's still a whole bunch of stuff they're behind the curve on (json support / array support), but its tough to argue with those benchmarks...
Did you use default postgres configuration? By default postgres uses very small amount of RAM and is forced to store hash maps on disk -> very significant slowdown.