It's not so much that SparkSQL doesn't support gz as that gz is slow: gzip isn't splittable, so the reads can't be parallelized. Regardless, store the data as Parquet in HDFS so YARN can allocate containers local to the chunk being processed. Scales nicely.
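A quick stdlib-only sketch of the "can't parallelize the reads" point (not Spark itself, just an illustration of why gzip isn't splittable): a gzip file is one compressed stream, so a worker handed an arbitrary byte range has no valid header or dictionary state to start from, which is why Spark ends up reading a whole .gz file in a single task.

```python
import gzip
import zlib

data = b"some log line\n" * 10000
blob = gzip.compress(data)

# Reading from the start of the stream works fine:
assert gzip.decompress(blob) == data

# But a worker assigned only the second half of the file can't
# decompress its slice -- there's no gzip header or stream state
# at that offset, so decompression fails:
second_half = blob[len(blob) // 2:]
try:
    zlib.decompressobj(wbits=31).decompress(second_half)
    splittable = True
except zlib.error:
    splittable = False

print("splittable:", splittable)
```

Formats like Parquet (or block-compressed codecs like bzip2 and splittable LZO) carry enough structure that readers can start at block boundaries, which is what lets HDFS-local tasks each chew on their own chunk.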
Not really: you said SparkSQL doesn't support gz, which is incorrect and was the thrust of my comment. The anecdote about Parquet is orthogonal to gz support.
pedantic sidebar: HDFS isn't a file format, it's a distributed file system layered over a traditional on-disk filesystem. For example you might have: JSON logs, in a gz-compressed file, tracked in HDFS, stored on disk in an ext4-formatted filesystem.