I am reaching out to mysql experts out there! We just inherited a site which uses MySQL master/slave replication. We set up the master and slaves per MySQL documents and did some of our own tweaking in my.cnf to get things working. Things were great when we were testing the site but than after we put it up, we started to notice missing data on slaves. At first, we thought maybe we missed something with the import of data on slaves, so we re-imported. All tables in our db, at that time, were myisam. After a day or so, we started to see data corruption on some of the slaves. We went through and repaired all the tables. Few hours after the repair, we checked row counts and the repaired tables didn’t have same count as master did. Eventually most of our slaves were at the same state so we decided to re-import data from master again.
Before we went ahead and re-imported, we made a decision to move to innodb for *most* of our crucial tables. After altering on master, we did a dump/import. Converting tables to innodb got rid of our data corruption problem but our missing rows issue is still there. Hardware on the master is 4×1.86Ghz, 8 gigs of ram; slaves hardware is: 4×1.86, 4 gigs of ram.
Here is the config running on our master:
< some of the part of the config has been cut off to keep it short/simple and to the point >
skip-locking
ft_min_word_len=2
long_query_time=30
max_connect_errors=10000
max_connections=1000
#skip-grant-tables
skip-bdb
delay-key-write=ALL
key_buffer = 1G
read_buffer_size = 2M
read_rnd_buffer_size = 8M
myisam_sort_buffer_size = 64M
thread_concurrency = 8
skip-name-resolve
max_allowed_packet = 32M
#low-priority-updates
table_cache = 1024
thread_cache = 8
query_cache_size = 64M
#
# added by sunny >
wait_timeout = 120
tmp_table_size = 128M
sort_buffer_size = 128M # Sort buffer is used to perform sorts for some ORDER BY and GROUP BY queries.
read_rnd_buffer_size = 8M # Read the rows in sorted order by using the row
# pointers in the result file. To optimize this,
# we read in a big block of row pointers, sort them,
# and use them to read the rows in sorted order into a row buffer.
thread_cache_size = 8
#
# Try number of CPU's*2 for thread_concurrency
thread_concurrency = 16
log_slow_queries
log_long_format
#
# INNODB Settings
innodb_additional_mem_pool_size = 32M # was 16
innodb_buffer_pool_size = 4G
innodb_file_per_table
innodb_log_file_size = 50M
innodb_log_buffer_size = 8M
innodb_flush_log_at_trx_commit = 0
innodb_lock_wait_timeout = 50
innodb_flush_method=O_DIRECT
Here is the config running on our slave:
# The MySQL server
[mysqld]
port = 3306
socket = /tmp/mysql1.sock
pid-file = /usr/local/mysql/var/mysql.pid
relay-log = /usr/local/mysql/var/mysql-relay-bin
relay-log-index = /usr/local/mysql/var/mysql-relay-bin.index
log-error = /usr/local/mysql/var/mysql.err
master-info-file = /usr/local/mysql/var/mysql-master.info
relay-log-info-file = /usr/local/mysql/var/mysql-relay-log.info
datadir = /usr/local/mysql/var
server-id = 31
skip-slave-start
skip-locking
skip-name-resolve
ft_min_word_len=2
long_query_time=30
max_connect_errors=10000
max_connections=1000
#skip-grant-tables
skip-bdb
delay-key-write=ALL
key_buffer = 512M
max_allowed_packet = 1M
table_cache = 512M
myisam_sort_buffer_size = 64M
thread_cache_size = 4
read-only
#slave-skip-errors=all
#
# added by sunny >
wait_timeout = 120
tmp_table_size = 128M
read_buffer_size = 2M
sort_buffer_size = 128M # Sort buffer is used to perform sorts for some ORDER BY and GROUP BY queries.
read_rnd_buffer_size = 8M # Read the rows in sorted order by using the row
# pointers in the result file. To optimize this,
# we read in a big block of row pointers, sort them,
# and use them to read the rows in sorted order into a row buffer.
thread_cache_size = 8
query_cache_size = 16M # changed to 16 3/1/07
#
# Try number of CPU's*2 for thread_concurrency
thread_concurrency = 8
log_slow_queries
log_long_format
#
# INNODB Settings
default_table_type = INNODB
innodb_additional_mem_pool_size = 32M # was 16
innodb_buffer_pool_size = 2G
innodb_file_per_table
innodb_log_file_size = 50M
innodb_log_buffer_size = 8M
innodb_flush_log_at_trx_commit = 0
innodb_lock_wait_timeout = 50
innodb_flush_method=O_DIRECT
Please feel free to comment/scrutinize any of these settings. If you think they should be different, please give quick summary about why they should be different. Some of these settings were inherited from old setup of mysql (where they had it hosted before).
Right now we are just trying to find out why we have missing rows and why our slaves lag behind constantly. We have refreshed data almost every day for last two weeks. We barely have load average above 1 or 2. What can we do to track down the problem? What are the things we should be looking out for?