Hacker News new | past | comments | ask | show | jobs | submit login
MySQL Text vs. Varchar Performance (nicj.net)
112 points by mrdraper on May 10, 2011 | hide | past | favorite | 20 comments



First, he mentions that VARCHARs are stored in-row, and then he wonders why his server's performance goes down when he joins on a VARCHAR(30000)? And uses EXPLAIN as the very last step? I don't quite get it, honestly.


He doesn't seem to be aware that there's a difference between storage engines when he says that "mysql" stores varchars in-row, when that's really MyISAM. MyISAM is now considered obsolete and you shouldn't be using it anyway. Innodb treats large varchars exactly like text/blob columns.


Could somebody please point me to a longer yet still intelligent comparison of MyISAM and InnoDB? Thx.


Short version:

MyISAM doesn't support transactions and only supports table-level locking, so a heavy concurrent read/write workload will bring it to its knees. Generally you shouldn't use MyISAM anymore unless you know what you're doing and have a specific reason for doing so.

InnoDB is much more crash-safe, supports transactions, and can support concurrent read-writes using row-level locking. It's generally a much more advanced storage engine than MyISAM and it's what Oracle is investing its development efforts in going forward.


In the article he mentions that MySQL still uses MyISAM for temporary tables, which were being created by the query he was running (which, granted, did SELECT * which is usually a bad idea). If the size of the temporary table was the cause of slowness, wouldn't it affect InnoDB and MyISAM tables equally?


It would seem so.

It would probably always have been creating on-disk temporary tables because text columns aren't permitted in memory tables. It was probably just that the fixed-length row format forced them to become extremely large.

The performance advantage that he expected from the schema change would only arguably make sense with respect to whatever storage engine he was using, and it's actually not really true for InnoDB. [Actually come to think of it I can't think of any advantage do doing it for MyISAM either, I don't think it has a concept of overflow pages. InnoDB 1.1 has a lot of performance enhancements for text/blob columns, though...]


This is why I like VARCHAR(MAX) in SQL Server if the row is < 8192 chars it's a VARCHAR if it's > 8192 it's TEXT. You only pay the penalty on rows that exceed that threshold. Best of all you can store 2GB of text in a VARCHAR(MAX)

http://msdn.microsoft.com/en-us/library/ms176089.aspx


Using VARCHAR(MAX) still seems to have a significant performance impact, however - http://richardlees.blogspot.com/2010/07/varcharmax-performan... - perhaps the fact it has to check whether the varchar exceeds 8192?


It doesn't just have to check the length of that one field - it needs to check that the whole row wouldn't pass the per-row length limit even if the field itself doesn't.

[N]VARCHAR(MAX) columns can lead to greater page tearing and other performance issues over time if their value is changed between a size that would fit in-row and a size that won't too - so as well as the hit on INSERT/UPDATE performance there may be a (less significant) hit on read performance over time too, on top of the slight hit from simply needing to check where the data is and go find another page if it isn't in-row.

For the most part [N]VARCHAR(MAX) should be used as a more efficient (and far more convenient) replacement for [N]TEXT.

It should only be used as a replacement for [N]VARCHAR(<something>) when the data in the column may need to break the ~4000 characters mark (for NVARCHAR, the limit is ~8000 characters in the case of VARCHAR) or when the total bytes for a row will.


This is where I must have picked up the habit of always limiting my varchars to 8,000 chars. I'm in the habit of always using text for anything larger and I couldn't remember why.


Can anyone offer a TL;DR?


MySQL lifts restriction on VARCHAR'length from 255 to 64K. So you can change TEXT columns to VARCHAR(30000). It might not be a good idea to do it though as when MySQL sorts things it uses a temp table with no variable length columns allowed - so these VARCHARs turn to CHARs which can explode the row size and thus the size of the whole temporary table above the maximum size of an in-memory temporary table. This commits it to disk and as it is huge it takes a long time.

Conclusion: long varchars are there but they suck. Limit their length and test test test before deploying changes to the live.


if things are slow, check EXPLAIN first.


Which strangely seemed to be one of the last things this guy did, mysql explain and profile are usually enough to see what is going on.


Good advice. "Explain" in this case wouldn't be enough to work out what's going on, however. In both the TEXT and VARCHAR cases he expected the queries to be using temporary tables so the output from mysql explain (both showing the use of temporary tables but not the storage medium such as disk or memory) would not have helped.


Plus I kind of enjoyed the round about way he explained it.


EXPLAIN provides a bit of the clue but the meat of the problem is that the VARCHAR is being changed to a CHAR. EXPLAIN is just verifies that something is going on to force a temp table to be created.


I'm confused; in the example, couldn't he avoid the filesort entirely by just adding an key on t2.t1id? Is it really optimal to be sorting the whole table -- no matter how optimally -- on any common serving path?


To be clear, it's not just "when a TEXT/BLOB is included in a sort" that causes on-disk temp tables. One would expect to take a hit when sorting on a TEXT field.

But actually it's much worse than that: you can get on-disk temp tables if you're selecting a TEXT field while doing an ORDER BY or GROUP BY that uses any columns from the second table in a join.


Learned a lot from that, very fascinating ... would have liked to see benchmarks after he made his adjustsments and a comparison with an innodb version of the database.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: