Monday, October 6th, 2008

Optimizer Edge cases

I love teaching EXPLAIN in training classes with the world.sql sample database. One of my favorite edge cases to try and explain to students is:

SELECT Name FROM Country WHERE continent = 'Asia' AND population > 1 000 000 000;


If you add an index on Continent,Population and Population, MyISAM will choose to use the composite index (Continent,Population), whereas InnoDB will choose just the Population index.

It's a simple geography question... all of the countries in the world with > 1B people *are* in Asia. Since both indexes are equally effective, InnoDB chooses to use the one with the shorter key_len, despite the fact it will have to do a second stage check on the data rows to verify this.

I think that this decision (shorter index) is the right one - since unless the database has index pinning, it should always factor what the cost would be to load the indexes from disk.

The storage engines maintain their own statistics. Most of the time MyISAM seems to be more accurate (See: ANALYZE TABLE), but not today.
(Leave a comment)