Morgan Tocker (mtocker) wrote,

IO scheduling in the 2.6 kernel

I was surprised by how big a gap Vadim's post showed from switching to the Noop IO scheduler. I've been rethinking what to set the scheduler to lately, and I'm leaning towards Noop as the default.

An explanation first:
IO schedulers (aka elevators) try to get the best possible performance out of your disk subsystem. Since your disk is essentially a mechanical device, it performs very differently depending on whether you access it sequentially or randomly. And this difference can be huge! Last time I tested, a typical 7200RPM consumer hard drive could write 60MB/s sequentially, but performance dropped to only a few MB/s when I started writing small pieces of random data.

So how do the IO schedulers work?
They achieve this (mostly) by reordering and merging requests, so that the disk head sweeps across the platters in one continuous direction instead of seeking back and forth. They may even detect that you are writing sequential blocks, and slightly delay an operation in order to 'save cost'.
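To make the reordering and merging idea concrete, here is a minimal sketch in Python. It is not how any kernel scheduler is actually implemented. It assumes each request is a hypothetical `(start_sector, length)` pair and uses total head travel in sectors as a crude stand-in for seek cost.

```python
def merge_and_sort(requests):
    """Sort (start_sector, length) requests by position and merge
    any that touch or overlap into a single larger request."""
    merged = []
    for start, length in sorted(requests):
        if merged and merged[-1][0] + merged[-1][1] >= start:
            # This request begins where the previous one ends (or inside it),
            # so extend the previous request instead of issuing a new one.
            prev_start, prev_len = merged[-1]
            end = max(prev_start + prev_len, start + length)
            merged[-1] = (prev_start, end - prev_start)
        else:
            merged.append((start, length))
    return merged


def head_travel(requests, start_pos=0):
    """Total distance (in sectors) the head moves between requests.
    A toy cost model: real drives also pay rotational latency."""
    pos = start_pos
    total = 0
    for start, length in requests:
        total += abs(start - pos)
        pos = start + length
    return total


if __name__ == "__main__":
    # Requests in arrival order, jumping back and forth across the disk.
    arrivals = [(100, 8), (0, 8), (108, 8), (8, 8), (500, 8)]
    ordered = merge_and_sort(arrivals)
    print("arrival order:", len(arrivals), "requests, travel", head_travel(arrivals))
    print("after elevator:", len(ordered), "requests, travel", head_travel(ordered))
```

In this toy model the sorted and merged queue issues fewer requests and moves the head a shorter distance, which is the whole point of an elevator on a single spinning disk.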

Each IO scheduler uses a different algorithm for this reordering, because different workloads care about different things. For example, on a desktop operating system you are probably more concerned about your MP3s not skipping than about maximum sustained throughput.

Death to schedulers
The problem with techniques like IO scheduling is that the Linux kernel is pretty much blind to all the layers below it. Hard drives have their own scheduling mechanisms, and if you are running a RAID controller *it* will have its own scheduling mechanisms too.

The last point is important. If you are doing scheduling when you have a RAID controller, from Linux's perspective it's probably all one big block device. The scheduler is making all sorts of assumptions about how blocks are laid out on disk and it's WRONG WRONG WRONG - you probably have some sort of striping. So all the IO scheduler is doing is adding latency (bad) and probably applying some partial serialization to writes (double bad).
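A small sketch shows why striping breaks the scheduler's assumptions. It assumes a plain RAID-0 layout with a hypothetical 4 disks and 16-block chunks. Real controllers differ (parity, different chunk sizes, caching), so treat the numbers as illustration only.

```python
def locate(logical_block, num_disks=4, chunk_blocks=16):
    """Map a logical block on the 'one big device' Linux sees to
    (disk index, block offset on that disk) under simple RAID-0 striping.
    num_disks and chunk_blocks are made-up example values."""
    chunk = logical_block // chunk_blocks           # which stripe chunk
    disk = chunk % num_disks                        # chunks rotate across disks
    offset = (chunk // num_disks) * chunk_blocks + logical_block % chunk_blocks
    return disk, offset


if __name__ == "__main__":
    # Logical blocks 15 and 16 look adjacent to the kernel,
    # but they sit on different physical disks.
    for block in (0, 15, 16, 63, 64):
        print(block, "->", locate(block))
```

Blocks that look like neighbours to Linux can sit on different spindles, and blocks that look far apart can sit next to each other on the same disk, so sorting by logical block number tells the kernel little about real head movement.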

So when there is a RAID controller underneath, it's better to tell Linux to mind its own business. In which case you want the Noop scheduler.
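For reference, 2.6 kernels that support runtime switching expose the scheduler per block device in sysfs at `/sys/block/<device>/queue/scheduler`, with the active one shown in square brackets. The sketch below reads and changes it from Python. The device name `sda` is an assumption, writing requires root, and the helper names are my own, not part of any library.

```python
from pathlib import Path


def scheduler_path(device):
    """sysfs file that lists and selects the IO scheduler for a device."""
    return Path("/sys/block") / device / "queue" / "scheduler"


def parse_schedulers(text):
    """Parse a line such as 'noop anticipatory deadline [cfq]' into
    (list of available schedulers, name of the active one)."""
    names = text.split()
    active = next((n.strip("[]") for n in names if n.startswith("[")), None)
    available = [n.strip("[]") for n in names]
    return available, active


def set_scheduler(device, name):
    """Switch the device to the named scheduler (needs root)."""
    path = scheduler_path(device)
    available, _ = parse_schedulers(path.read_text())
    if name not in available:
        raise ValueError(f"{name!r} not offered by this kernel: {available}")
    path.write_text(name)


if __name__ == "__main__":
    device = "sda"  # assumption: replace with your own block device
    print(parse_schedulers(scheduler_path(device).read_text()))
    # set_scheduler(device, "noop")  # uncomment to actually switch
```

A change made this way does not survive a reboot. On 2.6 kernels the default can also be set with the `elevator=noop` boot parameter.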

If you are curious to learn more, I think the best references on scheduling have been some of the talks by the YouTube guys, and an earlier post by Domas Mituzas.
Tags: mysql