Hacker News new | past | comments | ask | show | jobs | submit login
Time for a WTF MySQL Moment (gbl08ma.com)
327 points by gbl08ma on Oct 6, 2020 | hide | past | favorite | 114 comments



I'm a big fan of Postgres too for a number of reasons, but this issue is pretty clearly documented so I'd like to counter with an issue I hit in Postgres recently that is terribly documented. UNNEST works a bit funky, and in particular it works super funky if you have multiple calls in the same select statement (or any set expanded function calls it turns out). There's a bit of a dive into here[1] (though that is out of date - PG10 no longer follows the different array sized result, it uses null filling) which I managed to find after struggling with an issue where an experimental query was resulting in nulls in the output while unnesting arrays without nulls.

All DBs have their warts and while MySQL has an over abundance of warts they tend to be quite well documented. The warts that postgres has tend to be quite buried and their documentation is very good for syntax comprehension but rather light when it comes to deeper learning.

1. https://stackoverflow.com/questions/50364475/how-to-force-po...


I am also a big fan of Postgres, and tend to have a bit of fun picking on MySQL having been scarred by it in a past life. But since we're picking at Postgres warts...

One which bit me recently, and is still utterly baffling to me, is that a column defined as an array type will accept values of that array's type in any number of dimensions greater than that specified for the column. In other words, `{{{{{{text}}}}}}` can be inserted into columns of the following types:

- `TEXT[]`

- `TEXT[][]`

- `TEXT[][][]`

- `TEXT[][][][]`

- `TEXT[][][][][]`

- `TEXT[][][][][][]`

The inverse is true as well! A column specified `TEXT[][]` (and so on) will accept `{text}`. Of course, none of this (as far as I've been able to find) is documented.

But wait, there's more!

`UNNEST` does not allow you to specify depth, it always unnests to the deepest dimension. This, too, is undocumented. In fact, it's anti-documented. The documents provide an example function to unnest a two-dimensional array that is wholly unnecessary (and likely performs worse than the built-in `UNNEST`, but I'm just guessing). Said documentation would seem to imply that the depth of `UNNEST` is 1, but of course that's not the case.

But wait, there's more still!

What if you want to get at a nested array? Idk, I'm sure it's possible, but if you thought `SELECT that_array[1]` is the way to do it, look under your seat because you're getting a `NULL`!

- - -

Postscript: I discovered the first part of this in a production system where a migration had incorrectly nested some data, and where that data was in turn causing certain requests to unexpectedly fail. Of course, given that this was in production, I didn't have a lot of time to research the issue. Found the problem, fixed it, moved on with my day. In the course of fixing it, I discovered the `UNNEST` issue, which... okay fun, fix it a slightly different way than I expected.

So in the course of verifying the particulars to write this comment, I played around with some things, and discovered the `NULL` issue.

At least when Postgres has wildly unexpected behavior, it's exceptionally unexpected behavior.


All of the array types are basically the same. The docs actually do mention this, but only in passing as a current limitation.

https://www.postgresql.org/docs/13/arrays.html

> The current implementation does not enforce the declared number of dimensions either. Arrays of a particular element type are all considered to be of the same type, regardless of size or number of dimensions. So, declaring the array size or number of dimensions in CREATE TABLE is simply documentation; it does not affect run-time behavior.

Similar text extends at least all the way back to the documentation for 7.1.


The other key to understanding this is that multidimensional arrays in Postgres are not simply arrays of arrays like typical programming languages. That’s why you can’t do things like array_agg on an array column.


Thanks! I don't know why I wasn't able to find it. Still, I would never have expected any of the behavior I described.


I always internalized postgres's array behavior based on the examples, which include multidimensional arrays on text[] columns. Also, common functions like array_length require an index argument--"which array within the possibly multiple arrays do you want the length of" etc.


Also while we're picking on other DBs, another fun WTF I've encountered (this time in an external system): SQL Server stores timestamps to ~1/300th of a second resolution. This[1] StackOverflow question describes different behavior than I saw (it rounded differently), so apparently it's not even consistent. I'd assume across versions? IDK, never had time to look too deeply into this one either.

[1]: https://stackoverflow.com/questions/715432/why-is-sql-server...


Note that only happens using the DATETIME data type.

As long as I remember, the documentation for DATETIME [0] has had a disclaimer at the top to not use it, but use DATETIME2 instead.

Of course, that doesn't excuse older systems, but this is an issue that can be avoided for new work.

[0] https://docs.microsoft.com/en-us/sql/t-sql/data-types/dateti...


And datetime2 was introduced in mssql 2008… So 12 years ago.

Of course when maintaining the rust crate for mssql, implementing support for these old types was necessary and I had some wtf moments. Another interesting one is the smalldatetime, and the biggest annoyance the division to varchar and nvarchar -- the n-variants of cource using ucs2 to store the data.

This only changed in the 2019 version, where one can use varchar to store utf8 data with a special collation.

Still prefer mssql over mysql, any day.


Given the time of that question, my guess would be they are using floating point underneath, and have worked out the math so you get approximately that much precision. There's some weirdness in this between 32 bit and 64 bit, based on how the calculation is done. See [1].

I had a mysql instance once where I migrated from a 32 bit system to a 64 bit system, and one of the main tables' primary key was a float that was a high resolution timestamp. Unfortunately, on inserting the old data into the new 64 bit system with a MySQL compiled for 64 bit, the values started having lots of collisions on the primary key, where two previously distinct high-resolution float values now rounded to the same float in 64 bit. Counterintuitive, and a nightmare.

1: https://stackoverflow.com/questions/31415712/floating-point-...


I got bitten by this recently maintaining a legacy system. The same .NET DateTime object is stored in a DATETIME field and also embedded in an XML document in the same database record.

A comparison was done between the value stored in the XML and in the DATETIME field, but even using the same DateTime object there is sometimes a 1 to 3 millisecond difference between the two.


> I'm a big fan of Postgres too for a number of reasons, but this issue is pretty clearly documented

Of course it is, the documentation is where TFAA got the information in the 4th paragraph of the story, out of 15 or so.

The range itself is what nerd-sniped the author and led them to try and find out why mysql had such an odd yet specific range.


> I hit in Postgres recently that is terribly documented.

I'm going to have to disagree with you there. This issue is quite well documented in the "SQL Functions Returning Sets" section [0]. The relevant bit starts thusly:

> ...Set-returning functions can be nested in a select list, although that is not allowed in FROM-clause items. In such cases, each level of nesting is treated separately, as though it were a separate LATERAL ROWS FROM( ... ) item...

And there's even a note about the crazy behavior pre-PostgreSQL 10:

> Before PostgreSQL 10, putting more than one set-returning function in the same select list did not behave very sensibly unless they always produced equal numbers of rows. Otherwise, what you got was a number of output rows equal to the least common multiple of the numbers of rows produced by the set-returning functions. Also, nested set-returning functions did not work as described above; instead, a set-returning function could have at most one set-returning argument, and each nest of set-returning functions was run independently. Also, conditional execution (set-returning functions inside CASE etc) was previously allowed, complicating things even more. Use of the LATERAL syntax is recommended when writing queries that need to work in older PostgreSQL versions, because that will give consistent results across different versions.

I agree that allowing SRFs in the SELECT clause is a wart that should never have been permitted, but I think the PostgreSQL docs do a pretty great job describing both the old behavior and the new behavior that has to balance backwards compatibility with sensibility.

(And, indeed, the 9.6 docs have this to say on the behavior of SRFs in the SELECT list: "The key problem with using set-returning functions in the select list, rather than the FROM clause, is that putting more than one set-returning function in the same select list does not behave very sensibly.")

I do think one notable defect with the PostgreSQL docs is that they were designed in a time before modern search engines. They are better understood as a written manual in electronic form. Almost always the information you need is there, but possibly not in the chapter that Google will surface. But there are all sorts of tricks you can use if you update your mental model of how to read the PostgreSQL docs. For example, there's an old-style index! [1]

[0]: https://www.postgresql.org/docs/current/xfunc-sql.html#XFUNC... [1]: https://www.postgresql.org/docs/current/bookindex.html


> I do think one notable defect with the PostgreSQL docs is that they were designed in a time before modern search engines. They are better understood as a written manual in electronic form. Almost always the information you need is there, but possibly not in the chapter that Google will surface. But there are all sorts of tricks you can use if you update your mental model of how to read the PostgreSQL docs. For example, there's an old-style index!

I think of the Postgres docs as significantly better than most other documentation for this reason. The information is there and it's organized in a way that makes sense.


I hadn't seen the index - that's neat.

In terms of the documentation quality I do agree that it's documented but not obviously, you mentioned that it's in the "SQL Functions Returning Sets" section however this section isn't indicated to by the Array functions and operators page - additionally while, if you know that you're dealing with an issue related to SRFs it's pretty easy to find - attempting to get at that information via google isn't going to get you anything unless you specifically hone in on set returning functions. I ended up finding that doc page after finding a SO answer that mentioned SRFs while searching for "postgres unnest cartesian product".

The information is in the documentation, but the documentation isn't always super good at linking to other relevant portions of the documentation and, honestly, reading the documentation about the feature you need is reasonable, but I don't expect most people are reading the full postgres docs before starting to play around with it. So I don't disagree that the information is there, but I do think it is mostly inaccessible due to the structure of the documentation.


> This format is even less wieldy than the current one, requiring multiplication and division to do basically anything with it, except string formatting and parsing – once again showing that MySQL places too much value on string IO and not so much on having types that are convenient for internal operations and non-string-based protocols.

Not necessarily an odd choice in the Olden Days, after all BCD representation used to be pretty popular. By modern standards it's insane, but at a time where binary to decimal conversions could be a serious performance concern it might have made sense. For instance if you had a date in "hours, minutes, seconds" and wanted to add or subtract one of these TIME values, you could do it without a single multiply or divide.

Now I was 8 when MySQL first released in 1995, so I can't really comment on whether that choice really made sense back then. 1995 does seem a bit late for BCD shenanigans, but maybe they based their design on existing applications and de-facto standards that could easily go back to the 80's.


I'll comment: it absolutely didn't make sense back then, either. We didn't use BCD for pretty much anything in '95. If anything, all timestamps were 32 bit signed ints.

Edit: plenty of things still stored dates as strings where the emphasis of the app was on displaying information. Int and float types carried the day whenever any kind of math was going to be used, or when you wanted to output the data in multiple formats.


BCD was outdated when I got my first Amstrad CPC (1984). The assembly language textbooks all said “this is a weird holdover from the 8080, don’t bother with it”.


BCD is still in use in many financial applications, where it's the usual way to handle decimal fractions which can't be exactly represented in binary (floating or fixed-point fractions).


Is there any advantage at all to BCD over the int number of cents? Either approach gives exact precision but requires a marker to say how many digits to the right of the decimal. Fixed-point decimal is vastly faster for calculations. BCD requires weird carry calculations, is less memory efficient per digit stored, and requires a choice between truly awful memory density (one digit per byte) or using bitwise operations to do nibble-level addressing.

Actually, the more I think about it the more awful it gets. I'll go ahead and assert that the only reason ever to consider BCD is for compatibility with legacy systems that use it, and even then you'd only want to use it on the edges of the codebase where the system interfaces live.


Not really, other than compatibility. When the previous poster said financial applications, what was probably meant was "mainframe applications". BCD is still popular on z/OS because it's a first-class format there, and even editors knows how to handle files that contain BCD values.


The usual approach is to use fixed point (often, integers with implicit point) or decimal floating point. Neither of which is BCD.


https://en.wikipedia.org/wiki/Binary-coded_decimal#Packed_BC...

Earlier than the 60s. Had to work on a system in the 90s to “transpile” IBM minicomputer code to C.

Useful for accounting since what you see is what you get as far as cents and other decimals go. 1/5, and thus 1/10, can only be approximated in floating point.

Not that I’m defending MySQL


This is going to sound insulting, maybe it is, sorry. It's definitely subjective.

The reason I reach for Postgres over MySQL isn't features or technical superiority. Although those result from the reason. Which is, PG devs consistently have "taste", they have "good" style. They make good choices. MySQL devs are not consistently strong in these areas. I'm guessing that MySQL is now so full of tech and design debt (like OP issue) that they're just stuck, without choice.


In the MySQL vs PgSQL comparison, it basically feels like MySQL tried to obtain fast performance first, then worked towards correct and useful behavior later, where PgSQL went for correct and useful behavior first, then fast performance later. While the end result after decades of development is comparable, the echos of those very different beginnings remain in the current products.


MySQL and PHP are a dynamic duo that never fail to surprise.

And not in a desirable way.


I mean, with the exception of the name, of course.

"postgre" feels and sounds like a plate full of very wet, oily rice and the subsequent addition of clear vomit to the mix.



Nice

These three points has made me raving mad from working with mysql:

- The default 'latin1' character set is in fact cp1252, not ISO-8859-1, meaning it contains the extra characters in the Windows codepage. 'latin2', however, is ISO-8859-2. - The 'utf8' character set is limited to unicode characters that encode to 1-3 bytes in UTF-8. 'utf8mb4' was added in MySQL 5.5.3 and supports up to 4-byte encoded characters. UTF-8 has been defined to encode characters to up to 4 bytes since 2003. - Neither the 'utf8' nor 'utf8mb4' character sets have any case sensitive collation other than 'utf8_bin' and 'utf8mb4_bin', which sort characters by their numeric codepoint.

utf8 being effectively alias of utf8mb3 has cost us so much work its not even funny.


> utf8 being effectively alias of utf8mb3 has cost us so much work its not even funny.

An extra warning about that mess: mysqldump in many configurations will silently convert utf8mb4 down to utf8mb3. So when you're testing your backups or migrations, do an extra check to make sure that emoji and rarer characters didn't get eaten!


I am currently trying to fix a program that was made by a person that didn't knew those details of MySQL...

Most weirdly, the fact that the default collation is SWEDISH.

It is a complete freak show, the users kinda got used to it, butchering our language (portuguese) to use only characters valid in english, hoping MySQL won't barf spetacularly on them.


Most weirdly, the fact that the default collation is SWEDISH. It is a complete freak show,

Unless you're Swedish, I imagine. Then it's quite handy.

I believe the author of MySQL was Swedish, so to me it all makes sense. It also provides a learning opportunity for people who believe the entire planet operates on ASCII.


MySQL being a Swedish company, the default collation for MySQL was (is?) latin1_swedish_ci.


But... why would you not just set the correct collation?


> - The default 'latin1' character set is in fact cp1252, not ISO-8859-1, meaning it contains the extra characters in the Windows codepage.

Oh you sweet summer child. No it isn't. It's somewhat like Windows CP1252, but it also defines 8 other extra characters that are not in cp1252.


> - The default 'latin1' character set is in fact cp1252, not ISO-8859-1, meaning it contains the extra characters in the Windows codepage.

Actually, it's generally saner to assume that people mean Windows-1252 when they say ISO-8859-1. Charset labeling is frequently incorrect, and C1 characters are so infrequently used that seeing one pop up probably means you actually wanted Windows-1252 instead.


Well, the thing is that Mysql latin1/cp1252 isn't actually code page 1252, it's a tad different. There are extra characters that CP1252 doesn't know about, so it's not entirely compatible with CP1252 or Windows-1252.


Another one: if you happen to keep big numerical values in a VARCHAR field, be careful, the engine will convert them to DOUBLE when comparing to a big integer, and interesting things will happen. For example, "SELECT id, clmn FROM tbl WHERE clmn = 999999999999999999" may return records where `clmn` is '999999999999999999', also where `clmn` is, for example, '999999999999999998' (because those big integers are too big for DOUBLE and, when converted, they have the same representation).

So the correct query is "SELECT id, clmn FROM tbl WHERE clmn = '999999999999999999'"


Those are certainly valid WTFs. I've been bitten by the GROUP_CONCAT issue before.

This list is missing the WTF that cascaded deletes/updates don't cause triggers to fire on child tables:

https://bugs.mysql.com/bug.php?id=11472


Alright, seriously, thank you for this list! You just solved two mysteries that had my team going in circles.


Is there something similar for Postgres? I realize it might be a lot shorter, but any system big enough has its share of gotchas.



This gives me flashbacks to the DOS filesystem timestamps. It's exactly the same mistake. By splitting the date into multiple fields, bits are wasted. If they hadn't tried to be smart and just made it one number, it would've been more precise with a wider range.


I don't really see it as the same thing. Using multiple fields of bits works fine here. 60 possibilities for seconds and minutes fit very well into 6 bits each. A raw 24-bit integer gives you plus or minus 2330 hours, and if you don't reserve a bit the format they actually went with gives you plus or minus 2047 hours. Since it just needs to represent 999 hours and change, that's fine.

The problem is the weird non-bit packing they did before.

DOS is different, partly because there are fields that fit worse, and partly because 32 bits is just barely enough to store seconds in the first place. If you did a DOS-style packing with 34 or 38 bits it would work fine. And it would be able to represent leap seconds, making it arguably better than unix timestamps!


FWIW, even keeping an interval in a single number still imposes limits. As proven by the epoch's 2038 problem.


That's not really a problem though, at least from what i understand. even on 32bit linux systems, time_t is still 64bit. on windows it also seems to be 64bit


The linux kernel has the option CONFIG_COMPAT_32BIT_TIME, which adds syscalls for 32 bit time_t. If this option is not enabled, much badness happens if you run 32 bit binaries, for instance, Steam and many 32 bit native linux games fail to run. (I ran into this problem a few weeks ago)

32 bit linux applications can use 64 bit time_t if they were configured to do so and glibc and the kernel are new enough. But enough closed source binaries don't support 64 bit time_t that we'll probably have problems in 2038.

https://cateee.net/lkddb/web-lkddb/COMPAT_32BIT_TIME.html


> even on 32bit linux systems, time_t is still 64bit

No, on 32-bit Linux systems, time_t is 32 bits (except perhaps on newer ports like 32-bit RISC-V).


https://stackoverflow.com/a/60709400/571787 states otherwise with recent Linux kernels + glibcs.


Note that the glibc release mentioned there (2.32) has been released only two months ago (https://lwn.net/Articles/828210/), and its release notes don't mention time_t now being 64 bits. The glibc manual page linked to by that answer (https://www.gnu.org/software/libc/manual/html_node/64_002dbi...) says "at this point, 64-bit time support in dual-time configurations is work-in-progress, so for these configurations, the public API only makes the 32-bit time support available", which probably means in practice that 64-bit time_t is still not available for common 32-bit configurations (unless you want to lose binary compatibility with all existing software). I haven't been following this Y2038 work, but it's probably been postponed for a later glibc release (perhaps even the next one).


Which MySQL still isn't safe for :P (2038 that is, for where they use it)



MySQL has come a long long way indeed but not enough to make me turn away from Postgres.


I used to see literally this same comment as far back as 10 years ago, maybe more.

Just saying - I have no horse in this race.


Well I’m not wed to anything - maybe sometime we’ll see an article about reasons TO switch.


For me MySQL isn't even worth touching. Just this week their 'stable' binaries from their website threw seg faults while executing large SQL batches on my dev box. I replaced it with mariaDB and had no issues.


> 1 bit sign (1= non-negative, 0= negative)

First time I ever saw a number where the leading sign bit has to be set to 1 to indicate non-negative.



Sorts better that way.


It would binary sort weird with negatives < 0 < positives but would put the largest magnitude negative closest to 0.

Also not Offset binary which uses 0111 for -1 rather than MySQL's 0001.


That's yet another incompatibility with the ISO/ANSI SQL specification. In the specification, the TIME type is defined as containing HOUR, MINUTE and SECOND fields, representing the "hour within day", "minute within hour" and "second within minute" values, respectively, so, the valid range for that type supposed to be "00:00:00:00.00000..." to "23:59:59.99999...". It's not intended to represent an interval, although that seems to be the intended semantics for MySQL's TIME type:

(from https://dev.mysql.com/doc/refman/8.0/en/time.html):

> but also elapsed time or a time interval between two events (which may be much greater than 24 hours, or even negative).

For representing temporal intervals, the specification defines two kinds of INTERVAL types (year-month and day-time). Year-month intervals can represent intervals in terms of years, months, or a combination of years and months. Similarly, day-time interval, can represent intervals in terms of days, hours, minutes or seconds, or combinations of them (e.g, days+hours, days+hours+minutes, hours+minutes, etc.)

As a sidenote, the TIME and DATE types are related to the TIMESTAMP type in that TIMESTAMP can be thought of as combination of a DATE part (year, month, day) and a TIME (hour, minute, second) part.


Well MySQL has a history of ignoring the SQL standard even for the most simple things (even if those were defined before MySQL even existed), so it's not really surprising they got the TIME data type wrong as well.


Here’s a Redshift oddity that I don’t think is documented:

select sum(y.a), count(y.a) from(select distinct x.a from ( select 1 as a union all select 2 as a union all select 1 as a)x)y

sum | count -----+------- 4 | 3

Sqlite3 returns the correct results of sum of 3 count of 2.

To fix this don’t use subqueries.


Interesting, Postgres does this correctly:

https://dbfiddle.uk/?rdbms=postgres_12&fiddle=dc4ce40bd52695...


I gave up on MySQL a long time ago, when I realized that I had to activate special types of settings just to make Unicode characters work in tables.

In Postgres, it just works out of the box.


Must have been more than 10 years ago when utf8mb4 was added where you don't have to activate any kind of special settings.

I was using MSSQL prior to 2010 so I have no idea of MySQL unicode handling before that


> Must have been more than 10 years ago when utf8mb4 was added where you don't have to activate any kind of special settings.

Less than 10 years ago; MySQL 5.5 went GA in December 2010.


“I am struggling to imagine the circumstances where ..... can break ....”

If only I had a penny for everyone I heard this argument and we ended up breaking regression tests or something really obscure in the qa or customer setup


I always catch myself when I'm saying it and try to make sense of the cognitive dissonance of saying nothing will break and being somewhat convinced that something, somewhere will.


Fear the unknown unknowns!


everytime* not everyone

And yes, I have done it so many times myself too


I have yet to find a situation where using a native datetime format made more sense than using a unix timestamp in integer fields.



Ultimately this is a bit of a rant about why MySQL didn't bother changing the TIME type to support an elegant maximum value of 1,024 hours instead of 838.

But, seriously? Who cares? It's not even close to an extra order of magnitude of range. The type is obviously meant to be used for time values that have a context of hours within a day, supporting a few days as headroom... so supporting 1,024 instead of 838 is pointless -- if you're getting anywhere even close to the max value, you probably shouldn't be using this type in the first place.

And yes, it's probably best not to change it for backwards compatibility. Can I imagine a case where it could break something? No, not off the top of my head. But it probably would break some application somewhere. And for such a widely deployed piece of critical foundational infrastructure, being conservative is the way to go.

Nothing about this seems WTF at all, except for the author's seeming opinion that elegant, power-of-two ranges ought to trump backwards compatibility with things that probably made sense at the time.


IIRC there was also another problem with MySQL and .NET: MySQL used the year 0 as an uninitialized DateTime but .NET starts at year 1.

I think the workaround was to pass some parameter in the connection string.


TL;DR

So the answer is just it's for backwards compatibility with MySQL 3?

I was kind of hoping for more.


[flagged]


The MySQL documentation suggests using it to store a time interval:

"The hours part may be so large because the TIME type can be used not only to represent a time of day (which must be less than 24 hours), but also elapsed time or a time interval between two events (which may be much greater than 24 hours, or even negative)." [1]

How is the author's usage incorrect? Sure there's a documented allowed range, but it seems to me that the complaint is about the poor size of the range relative to other SQL technologies.

[1] https://dev.mysql.com/doc/refman/5.7/en/time.html


The documentation is valuable in the following way: based on almost every aspect of the documentation, the datatype is not something I would use.

Those who use it and get burned didn't get the main message.


"If we want to complain that MySQL has an utterly useless TIME datatype, and separately they really should have an INTERVAL datatype, then let's do that."

Uhh, done?

"The issue" is that MySQL has a TIME type that is goofy as all get-out, and this guy's friend had a bug that prompted him to look into the TIME type and find out how goofy it is. I don't recall him complaining that it's not in the docs (since that's where he looked it up). The bug was only a story, the thing that prompted him to look into it. Take a breath, this isn't StackOverflow (not that being rude and offended/butthurt is welcome there either).


Please don't call names or cross into personal attack on HN, regardless of what other commenters do.

https://news.ycombinator.com/newsguidelines.html


[flagged]


> Maybe you should take a moment to review the etiquette you're trying to hypocritically enforce.

Although they did breach it, they did not invoke the etiquette, they only made a more general recommendation of calming down (however snarkily).

You, however, are invoking the etiquette while breaching multiple items from it in your original comment.


Guilty as charged; if I could still edit it, I might remove the word "butthurt."


The issue seems to be that the .NET core provider for MySQL that OP uses maps the .NET Timespan type to this MySQL TIME datatype. They probably didn't think this through.


[flagged]


MySQL not having a proper type to express time spans seems like a fault to me, and "poor design". Of course you can just use an integer for it, but that is a slippery slope, in the end you'll find that you can use strings or byte arrays for everything and you end up with no type system at all.

The surprise here is not that the type has limits but that they are so awkward and that there is no better strongly-typed alternative.


A missing feature is not the same thing as poor design. If you need time spans in your project it could be a fault, but every product has missing features and choosing a product that is missing features you want might be a poor decision - depending upon other tradeoffs you're making.


If MySQL did not have a TIME type, then I would agree with you.

The problem is that MYSQL does have such a type, and it seems to not be very useful for modern applications.

Maybe you could argue semantically that this is still a “feature” that’s missing. Perhaps. But it seems more like a defect, even if it is documented.

If arrays in Golang could only be 5 elements long (say), I think we would characterize that as a defect, even if it is well documented.


It seems like poor design to me, but more so on the .NET side than MySQL side. MySQL retaining backwards compatibility is sensible, though seems a bit awkward that they don't introduce a TIME_LONG type or similar that can hold more useful ranges. .NET just mapping a time span with a wider range down to MySQL's fairly limited range seems destined to cause problems.


>If we want to complain that MySQL has an utterly useless TIME datatype, and separately they really should have an INTERVAL datatype, then let's do that.

Well isn't this complaint basically the last paragraph of TFA verbatim?


A vast majority of developer issues can be boiled down to RTFM, as usual.


Documenting a bad choice doesn't make it the right choice.


> Not being too experienced with MySQL, as I prefer PostgreSQL

Because that's why. He likes PostgreSQL and you need to know about it.


The issue is that the authors expectations didn't match with reality, because he hadn't yet read the docs.

The moment the author wrote "I normally use PostgreSQL" I knew that this would devolve into a "MySQL Bad" even if it's well defined and documented behavior.


The author had no specific expectations. The author received a report of an error, investigated, found that MySQL's TIME did not match EF Core's expectations for a timespan (hence causing the error):

> for a brief moment I assumed the incorrection in this value was the hundreds of hours, as one could reasonably assume that maybe TIME values were capped at 24 hours, or that a different syntax was needed for values spanning multiple days

And then it turned out that wasn't the case and they basically got nerd-sniped by the issue into finding out why the type had such weird bounds.


> I prefer PostgreSQL for reasons that will soon become self-evident

This outlines his expectations, and his biases, quite clearly. Also see: the title of the article.


I don't think there were any biases here. The research into the docs came first, and the title seems like the conclusion rather than the hypothesis.

Besides, I don't use databases all that much and I don't write blog posts either, but if I did, and I came across this issue, this might be the exact article I would write (except the liking Postgres thing).


> This outlines his expectations, and his biases

No, it does not. What it does is explain why they are admittedly "not too experienced with MySQL" and didn't straight remember that obviously the range of MySQL's TIME type is '-838:59:59' to '838:59:59' so 960 would be out of range.

Also please don't staple dicks onto people, the author's gender is irrelevant to the article and it's very disturbing.

> Also see: the title of the article.

The title of the article, and the introduction you're quoting, follow from the article itself. Unless you're calling the author a liar and think they made up the entire thing just to bash mysql.

That the author ultimately considers the behaviour of MySQL's TIME type and its bounds "a WTF"… is entirely defensible.


So, mysql is preserving backwards compatibility? Good. A flag to break this backwards compatibility and offer a larger range is probably warranted, but until that's implemented this behavior is fine.

"I think this is stupid" is a really poor reason to break backwards compatibility, despite how many other software projects use this reasoning.

But of course, MySQL bad, PostgreSQL good.


> "I think this is stupid" is a really poor reason to break backwards compatibility, despite how many other software projects use this reasoning.

True. But his argument seems to be in the direction of "this is highly unexpected behaviour" and I tend to agree. The number of applications broken by extending the date range probably dwarves to the number of bugs avoided by not having the time span break at such a strange length.


> "this is highly unexpected behaviour"

The behavior in question is having an upper and lower bound on a time interval. This strikes me as highly expected behavior.

The maximum on the interval is lower than the author expected; and reading the docs quickly cleared up what the interval maximum is.

The entire rant boils down to "MySQL's choice to keep backwards compatibility is stupid, because I think this interval limit should be larger."


The unforgivable thing here isn't the anachronism -- it's the lack of a suitable replacement. If you want a full i64 worth of duration at a reasonable resolution, you're on your own.


Reading the article it seems like the core issue is:

>To make matters worse, it appears that the most popular EF Core MySQL provider maps .NET’s TimeSpan to TIME by default, despite the fact that TimeSpan can contain intervals in the dozens of millennia (it uses a 64 bit integer and has 10-8 s precision)

It's this mapping is bogus. If the ranges don't match TimeSpans should be mapped to something else.

MySQL's BIGINTs are 64bits, so that might be a more reasonable target. Of course you lose some "typing" information in the schema, but it's probably worth it.

But I suppose it would make sense for MySQL to provide a "BIGTIME" alias for this purpose, and deprecate the old one maybe.


On the one hand no reaches for MySQL because it has robust types you can use for the columns. On the other hand not having a robust useful Interval type does not speak well of MySQL in this day and age.


It's a type with a minimum and maximum. How in the world is this an "unforgivable" sin? Shall I rail on C for the `size_t` type only having 16 bits?

> lack of a suitable replacement

MySQL's various string "alternatives" have proven how stupid having multiple types with different limits really can be.


I agree it's not a necessary change. But, on the other hand, how much compatibility would it really break? I cannot imagine much applications are dependent on MySQL throwing an error at 800-odd hours (and it would arguably be a very strange design decision).


MySQL is notorious for quietly guessing what you must have meant. Writers need to validate up front, and then try to read back what was actually written.


Having a type mean different things on different versions of the database is not good. If you want a new type that does new things, it should have a new name. If the old type is so bad, you could make an sql mode to blacklist it.

Yes, that leaves a legacy of why should I use TIME2 or whatever instead of TIME, etc. That's the downfall of having a successful project over a long time frame and not having designed it perfectly at the beginning.

See also UTF8 shouldn't be used in MySQL because it's dumb. But you can't change UTF8 to do the right thing, you have to have a new name that's better.


It would be a huge change in behavior to fix utf8.

Changing the limit from 838 to 999 wouldn't.


> how much compatibility would it really break

Probably only a few applications; but that's still too many.

And backwards compatibility does matter. A change here means that some few dozen developers have to now troubleshoot a previously-stable application which now fails silently in odd corner cases.


then create new type. time2 or datetime13 with improved behaviour and say in docs to prefer newer unless your specicaly know that you need old behaviour.


MySQL wouldn't be in that situation if they had paid attention to the definition of the TIME (and INTERVAL) data types specified in the SQL standard long before MySQL was created


I learned the semi hard way on PoC system where the Mysql index corrupted.

I was lucky and could simply redeploy my application, but I have never used Mysql since.


How does a corrupted index and a TIME type come into play with each other?


Because the way MySQL does other things seems has been troublesome. IIRC, MySQL essentially generates incremented integer table and column ids, so rebuilding tables relationships meant that you had to manually regenerate the table schema, then update all the old table files with the new table and column ids which were incremented on from what they were previously.

With Postgres, you can just drop the binary table data files in, restart the database and pg rebuilds the indexes and relationships.

From a quick google, it now looks like MySQL is now more robust in this regard.


How'd this get voted so many points? RTFM and use a more appropriate datatype.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: