Morgan Tocker ([info]mtocker) wrote,
@ 2007-08-18 22:35:00
Previous Entry  Add to memories!  Tell a Friend  Next Entry
Entry tags:mysql

Big transactions suck. Small transactions suck. What to do?
One of my favourite topics in MySQL performance talks is the ambiguous description of what size of what your transactions should be. The basic advice is:

Running InnoDB in autocommit, or with short transactions will cause many more fsync()'s which will reduce your write performance.

It seems that if I run entirely transaction-less the import speed of a test I wrote is:

real    0m31.222s
user    0m2.111s
sys     0m1.070s

real    0m30.318s
user    0m2.111s
sys     0m1.070s

real    0m31.744s
user    0m2.108s
sys     0m1.078s


If I run in transactional, committing after approx 10 queries, the time is awesomely better:
real    0m12.154s
user    0m1.771s
sys     0m0.869s

real    0m11.976s
user    0m1.773s
sys     0m0.874s

real    0m12.827s
user    0m1.768s
sys     0m0.872s


I tried hacking my code to commit even less frequently, and I can get the time down to just under 10 seconds. I'm not sure if this is entirely wise though, since:

* In the event of failure I'll loose more information
* The data wasn't necessarily related. If I have to rollback, I'll most likely end up rolling back more data (this is expensive in InnoDB!).

I think the biggest gotcha is that if one transaction stays open long enough that the purge thread falls well behind, the database starts running like mollases; since the size of your table space can really bloat. Unless you are running in innodb_file_per_table, it's also not possible to compact the innodb table space files, and a larger table space file can mean more seeking, less performance.

So where is the tradeoff point? It's probably one of those application dependent things. One of my favourite annoying bugs in MySQL 5.0 was the "optimization" that load data infile committed automatically every 10,000 rows (Bug #11151). I'm glad that behaviour was changed in 5.0.26.



(7 comments) - (Post a new comment)


(Anonymous)
2007-08-18 08:12 pm UTC (link)
Well and you forgot to mention that long running transactions can lead to either big gotcha's (remember that innodb is MVCC) or when you disable MVCC (using FOR UPDATE) you get locking issues. So your benchmark is again a great example how a synthetic benchmark can make something look really "obvious", but if there is any sort of concurrent work going on during your import, then its a different story ..

(Reply to this) (Thread)

Concurrency
[info]mtocker
2007-08-18 08:27 pm UTC (link)
Yeah, single threaded tests are always misleading. Thank you for noting this. If I can get a good example of running the test under load with long running transactions I'll repost results ;)

It can be difficult to build such a test.

(Reply to this) (Parent)

design.
[info]burtonator
2007-08-23 09:47 am UTC (link)
A couple of notes about what we do.

We've designed our app so that if we lose data it's just re-aggregated. We have robots continually dump core (we have a lot of them) and other robots just pick up the work. This might not work well for you.

We also designed oru commit code to commit every N INSERTs or M seconds (whichever comes first).

Also, sort the IDs on commit so that you don't get deadlocks.

Kevin

(Reply to this)

Concurrency try mysqlslap
(Anonymous)
2007-09-04 12:42 pm UTC (link)
If you really want to test concurrency use a tool like mysqlslap though it is only available in 5.1.
You can pass your own SQL for mysqlslap to run against your own schema and data.

Brian Aker did a test using it see this link
http://krow.livejournal.com/543598.html

I have run a bunch of tests on MySQL 5.1 on Amazon EC2 as well.

On the tradeoff, if each transaction is independent not committing after every transaction going to blown up in your face at some point, only batch related stuff should commit every N transactions.
This is the same the database world over.

Have Fun

Paul

(Reply to this) (Thread)

Re: Concurrency try mysqlslap
[info]mtocker
2007-09-04 12:53 pm UTC (link)
I thought about mysqlslap, but I won't be able to record & replay. I guess what I am trying to do is one step better than replaying Apache logs, but not as good as writing your own tests.

(Reply to this) (Parent)(Thread)

Re: Concurrency try mysqlslap
[info]http://getopenid.com/roobaron
2007-09-05 12:16 pm UTC (link)
When you mentioned record and replay, you are talking about your app right?
There are a couple of ways to get the sql running on your db if you need a sample.
1) Enable the general query log (requires a restart from memory)
2) Use tcpdump to capture the queries off the wire. See this snippet
http://forge.mysql.com/snippets/view.php?id=15 (http://forge.mysql.com/snippets/view.php?id=15)

You are going to have do some editing anyway at the end of the day either to add the commits are the frequency or get rid of them all and turn off auto-commit and use the --commit option in mysqlslap.

Reality is though you are wanting speed, rows insert per sec right?
If this part of your app is all about writing, separate it from the rest of the database and design the tables for that specific job. If need be, do you reads from either a slave which has indexes to speed retrieval or build tables specifically based on your retrieval needs.

Have Fun

Paul
http://blog.dbadojo.com (http://blog.dbadojo.com)

(Reply to this) (Parent)

mafia
(Anonymous)
2007-10-11 11:34 pm UTC (link)
Have you seen this? NaStudio , Internet & Mafia: http://s216606257.websitehome.co.uk/1.html
(НаСтудио)

(Reply to this)


(7 comments) - (Post a new comment)

Create an Account
Forgot your login or password?
Login w/ OpenID
English • Español • Deutsch • Русский…