MySQL database backup file compression: gzip vs bzip2


In one of my previous posts, MySQL backups, I talked about using a script to automate backups. There I showed how to use gzip to compress the backup file and save space. Since then, our backup file has been growing by a meg or two a day, so our backup files keep getting bigger and bigger. We keep hourly backups for a week, so you can imagine space usage is quite high.

I ran a few tests to see if it would be beneficial for us to use bzip2 instead of gzip. I could have tried 7z as well, but it isn't installed on most Linux machines I work on, and I didn't want a solution that would require me to add more software. Here are the results:

Original SQL file size: 584M
Using gzip: 166M (1 minute 25 secs to compress)
Using bzip2: 125M (1 minute 50 secs to compress)

As we can see, bzip2 takes longer to compress, but the file is much smaller (which adds up fast across multiple backups). As our database grows bigger and bigger, the size difference will matter quite a bit, especially since we also FTP the file off the server to an onsite/offsite location hourly.
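For reference, a comparison like the one above can be reproduced along these lines. The `backup.sql` generated below is only a small stand-in for a real dump, and `-9` (maximum compression for both tools) is shown as one option; the figures above may have used default levels:

```shell
# backup.sql stands in for a real mysqldump output file; a small
# stand-in is generated here so the commands run as-is.
seq 1 20000 > backup.sql

# Compress the same dump with both tools, keeping the original
# (-c writes to stdout); -9 is the maximum level for gzip and bzip2 alike.
gzip  -9 -c backup.sql > backup.sql.gz
bzip2 -9 -c backup.sql > backup.sql.bz2

# Compare the resulting sizes.
ls -l backup.sql backup.sql.gz backup.sql.bz2
```

Prefixing each compression command with `time` gives timing figures like the ones quoted above.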

Does anybody know of any free backup techniques that support incremental backups? Comments are always welcome. To learn more about gzip or bzip2, see man gzip or man bzip2 respectively.


13 Responses to “MySQL database backup file compression: gzip vs bzip2”

  1. dudus
    March 19th, 2007 | 7:45 pm

    You could keep a backup copy both remotely and locally. Then, every time you do a new backup, you binary-diff the two and just transfer the patch. Onsite you keep only the latest copy, and offsite you keep an older one with several patches.
    Then you can apply some of these patches just to clean up once in a while.

    Just an idea
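For plain-text .sql dumps, the idea above can be sketched with standard diff and patch (a binary-safe tool such as xdelta would be needed for compressed dumps; file names here are placeholders, and tiny stand-in dumps are generated so the commands run as-is):

```shell
# backup-old.sql / backup-new.sql stand in for two consecutive dumps.
printf 'INSERT INTO t VALUES (1);\n' > backup-old.sql
cat backup-old.sql > backup-new.sql
printf 'INSERT INTO t VALUES (2);\n' >> backup-new.sql

# Store only the difference between the two dumps (diff exits with
# status 1 when the files differ, which is expected here).
diff -u backup-old.sql backup-new.sql > backup.patch || true

# Later, rebuild the newer dump from the old copy plus the patch.
patch -o backup-rebuilt.sql backup-old.sql backup.patch
```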

  2. March 19th, 2007 | 9:31 pm

    There’s always rsync :-)

    I’ve used rdiff-backup in the past; it’s pretty good, albeit not very robust (it fails with a fatal error in a lot of situations where it should have just retried or ignored the error). I’m now using rsync combined with a simple homebrewed script.

    Unix’s “cp -la” is also useful :-)

  3. March 19th, 2007 | 9:42 pm

    Steven: what am I rsync’ing? Are you talking about rsync’ing the db files themselves? If so, that makes it hard for us to go back to a certain date and pull data out if needed.

    dudus: interesting idea. I’ve never heard of anybody doing this… I will look into it. Thanks!

  4. Martin
    March 19th, 2007 | 11:00 pm

    For incremental backup I’d suggest using binary logs. Set max_binlog_size to something small and every time a new binlog file is started, copy the last one (I’m not sure if it is safe to copy the currently used binlog file as well).

    Even better would be replication. Then you get a real-time copy of all transactions. That would also use fewer resources on the DB server, as all further copying and compression (hourly snapshots etc.) can take place on the backup server.
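The binlog suggestion above would translate to something like this in my.cnf; the path and size are placeholders, not values from the post:

```ini
[mysqld]
log-bin         = /var/log/mysql/mysql-bin
max_binlog_size = 50M
```

Running FLUSH LOGS (e.g. from cron) rotates to a new binlog file, so only completed files need to be copied, sidestepping the question of whether copying the currently active file is safe.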

  5. March 19th, 2007 | 11:19 pm

    Have you tried bzip2 -9 or -s?

  6. March 20th, 2007 | 12:08 am

    Shin: -9 creates a compressed file of the same size. -s creates a 135 meg file. -s exists so bzip2 doesn’t use too much RAM, so it’s not worth using here since RAM is not an issue for us.

    Martin: we will be looking at replication in the near future. Thanks for the binlog comment; another thing for me to try.

  7. Casey
    March 20th, 2007 | 6:55 am

    Crazytoon: I believe Steven is saying you can use rsync on the actual dumps. So, the basic process would be:

    1. Create a new SQL file using MySQLdump
    2. Rsync the SQL file to your backup server.

    Martin does have good advice for incremental backups. The nice thing about that approach is that when you do a full backup (once a week?), you can delete the old binary logs to save space.
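The rsync step above can be sketched as follows; the file name is a placeholder, a stand-in dump is generated so the commands run as-is, and a local directory stands in for the backup server (over SSH the destination would be host:/path):

```shell
# mydb.sql stands in for a real mysqldump output file.
printf 'INSERT INTO t VALUES (1);\n' > mydb.sql
mkdir -p /tmp/backup-dest

# With an older copy already at the destination, rsync transfers only
# the changed portions of the file (-a preserves times and permissions).
rsync -a mydb.sql /tmp/backup-dest/
```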

  8. Samer El Sahn
    March 22nd, 2007 | 6:28 am

    Dar is a very good backup tool that can deliver diff backups.

  9. November 26th, 2007 | 4:22 pm

    [...] MySQL database backup file compress gzip vs bzip2 [...]

  10. kindDBA
    April 2nd, 2008 | 2:54 am

    Which is the fastest way to compress a database backup? I have a backup of 30G and it is taking more than an hour to finish compressing.

  11. April 3rd, 2008 | 2:00 pm

    gzip is definitely faster.
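When speed matters more than size, gzip’s lower compression levels trade file size for time; a quick sketch (the stand-in file below replaces a real dump, and on multi-core machines a parallel tool such as pigz can help further):

```shell
# backup.sql stands in for a real dump.
seq 1 20000 > backup.sql

# -1 is gzip's fastest (least thorough) level; the default is -6.
gzip -1 -c backup.sql > backup.sql.fast.gz
gzip -9 -c backup.sql > backup.sql.best.gz
ls -l backup.sql.fast.gz backup.sql.best.gz
```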

  12. January 21st, 2009 | 9:43 am

    Hey. Using MySQL’s archive storage engine can be good for things like logs; I believe it uses gzip compression as well.

  13. July 9th, 2009 | 10:50 am

    Hey everyone! Yeah, using MySQL’s archive storage engine keeps things small without the need to dump the .sql all the time. The tradeoff is that queries run slowly (I’m not sure if it supports indexing).
