I do not see how either VARCHAR or CHAR would provide more or less information s...

zzzeek · on Nov 21, 2013

the fact that one sees "CHAR" in the schema definition is self-documenting that this is a fixed length field (and CHAR is "fixed length", the size of the string you get back will always be N), that's what I mean by "semantics". E.g., "what does it mean that the developer used CHAR here and not a VARCHAR?"

The CHECK constraint you illustrated earlier can just as well be placed on a CHAR (using trim() as well to adjust for padding). Then there's no chance of any blank padding issues either.

jhaywood · on Nov 21, 2013

But the semantics of CHAR are not what most people expect and almost never what you actually want. If you want an actual fixed length of non-blank data you need an additional check constraint to enforce the min-length.

CHAR semantically represents fixed length text fields from old data file formats not this data always has n (non-blank) characters.

If you do have different length then a VARCHAR is more appropriate. Also a lot of application frameworks that interact with the database only deal with VARCHAR so then as soon as you use a CHAR you have to start worrying about trimming your text data because one of the biggest database text type gotchas is accidentally trying to compare a VARCHAR and a CHAR improperly.

While I can see good reasons to include length checks there is never a good reason to use a CHAR unless you're trying to interoperate with COBOL programs written in the 80's

zzzeek · on Nov 21, 2013

> But the semantics of CHAR are not what most people expect

how do you know that? did you take a survey? I've been working with DBAs for years, in my world everyone knows that's how CHAR works. The padding behavior is nothing new, and is intuitive - the value must be N characters in length, so if you stored less, you get back a right-padded string. This is exactly what I'd expect.

> CHAR semantically represents fixed length text fields from old data file formats

and two or three letter character codes like country codes, state codes, etc. are what we use CHAR for, these are fixed length text fields. They are still in modern use today. Plus lots of us still have to write apps that actually read old files too - CHAR is appropriate for these as well, assuming you are storing fields that aren't right-padded in the source datafile (such as social security numbers, etc.).

Your app will of course work with a VARCHAR instead, but the point of CHAR is that it's self-documenting as to the type of data to be stored in the field - fixed length, as opposed to variable length.

> If you do have different length then a VARCHAR is more appropriate.

if you are storing variable length, then you should absolutely use VARCHAR. That's why it's called "VAR", it means, "variable".

> Also a lot of application frameworks that interact with the database only deal with VARCHAR so then as soon as you use a CHAR you have to start worrying about trimming your text data

If you are using CHAR correctly, you don't have to trim anything, because you are storing a string that is exactly the length of the CHAR type. I'm not familiar with how an application framework would only know how to deal with VARCHAR and not CHAR, database adapters return strings for both types. And if there were such a framework, I'd not be using it.

> one of the biggest database text type gotchas is accidentally trying to compare a VARCHAR and a CHAR improperly.

all of which stems from the same, singular mistake - don't store variable length data in a CHAR - plus if you are comparing VARCHAR to CHAR, that is also usually doing it wrong, as an adequately normalized database wouldn't be repurposing some kind of fixed length datatype out into a VARCHAR of some kind elsewhere.

The aforementioned CHECK constraint is a good way to enforce that if the developers/frameworks in question tend to be error-prone about this kind of thing (it's not an error I've had much issue with, since I know how CHAR behaves).

> While I can see good reasons to include length checks there is never a good reason to use a CHAR unless you're trying to interoperate with COBOL programs written in the 80's

as it turns out a vast portion of the world economy is supported by mainframes and old software that often spits out fixed length datafiles, there is even Python code in my current work project just written in the past six months which is tasked with parsing such files (they are actually quite simple to parse, since you just pull out each field based on an exact position). Not to mention that boring things like state codes, country codes, and the like are often fixed length fields.

jhaywood · on Nov 21, 2013

I say it's not what people expect because everyone emphasizes the "fixed length" rather than "blank padded" nature of CHAR. CHAR is only actually a fixed length if you actually ensure that it is so yourself. That's possible but then you're just using a CHAR as a placeholder for those semantics not as something that naturally enforces them.

If you actually really really have fixed length fields then yes CHARs could be appropriate. But for many things even though you intend for them to be static length codes things change when you start having to interoperate with systems designed with different constraints. (For example after mergers or aquiring a competitor.) And I know that mainframes still exist but they aren't the use case in mind when many say "USE CHAR".

Also the database adapter that handles CHAR poorly is none other than JDBC on oracle http://stackoverflow.com/questions/5332845/oracle-jdbc-and-o... (Yes that is hilariously bad.) But the mysql way of always ignoring trailing whitespace is not standard in all databases.