MySQL resiliency after a crash and relay log files

One of my clients had a major power failure at his (new!) data center the other week, and one of the side effects was quite a bit of data corruption (on an ext3 filesystem). We lost the LDAP database (which we recovered from the LDAP slave), and we also lost a relay log file belonging to a MySQL "slave".

MySQL uses these "relay log" files as a buffer on the slave: events read from the master server are written to the relay log first, and the slave then applies them from there. It's supposed to make things more resilient.

Of course, if you end up with a missing file, you end up with the following in your mysqld.log file (on Linux):

070304 22:24:43 [ERROR] Failed to open the relay log './<host>-relay-bin.000057' (relay_log_pos 494316200)
070304 22:24:43 [ERROR] Could not find target log during relay log initialization

This prevents the slave from starting, and the client reports:

ERROR 1201 (HY000): Could not initialize master info structure; more error messages can be found in the MySQL error log
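
Before deciding how to recover, it can help to confirm which relay logs are actually gone. What follows is only a minimal sketch, and it assumes the slave keeps a relay log index file in the data directory (by default named after the host, ending in -relay-bin.index) that lists one relay log per line. The function name missing_relay_logs is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of MySQL): print every relay log named in
# an index file that no longer exists on disk.
# Assumption: relative entries such as ./<host>-relay-bin.000057 are
# resolved against the directory that holds the index file.
missing_relay_logs() {
    index_file=$1
    data_dir=$(dirname "$index_file")
    while IFS= read -r entry || [ -n "$entry" ]; do
        # Skip blank lines in the index.
        [ -z "$entry" ] && continue
        case $entry in
            /*) path=$entry ;;                    # absolute path, use as-is
            *)  path=$data_dir/${entry#./} ;;     # relative to the data directory
        esac
        # Report only the files that are missing.
        [ -e "$path" ] || printf '%s\n' "$path"
    done < "$index_file"
}
```

Call it with the path of your own index file as the only argument. No output means every relay log listed in the index is still present.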

The solution that finally worked for me was to:

  1. Create a new dump of the master database: FLUSH TABLES WITH READ LOCK, record the master log file and position (SHOW MASTER STATUS), take the dump, etc.
  2. Load the dump into the slave. (Thus far we're following the regular directions for creating a slave.)
  3. Stop MySQL on the slave.
  4. cd /var/lib/mysql; rm *relay-bin* master.info (This gets rid of all the old slave state.) (DANGER: Like all other instructions to delete files, use this at your own risk. Make backups of everything if you're unsure.)
  5. Restart MySQL.
  6. Execute the CHANGE MASTER TO statement that you use to set up the slave, with the log file and position recorded in step 1.
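
Steps 4 and 6 above can be sketched as two small shell helpers. This is a sketch under assumptions, not a definitive procedure. The function names stash_slave_state and change_master_sql are made up for illustration, and every host name, user, password, and log coordinate is a placeholder. Note one variation on step 4: the first helper moves the old files into a backup directory rather than deleting them.

```shell
#!/bin/sh
# Hypothetical helper for step 4: move the old relay logs and master.info
# into a backup directory instead of deleting them outright.
# Run it only while MySQL is stopped.
stash_slave_state() {
    data_dir=$1
    backup_dir=$2
    mkdir -p "$backup_dir" || return 1
    for f in "$data_dir"/*relay-bin* "$data_dir"/master.info; do
        [ -e "$f" ] || continue      # skip patterns that matched nothing
        mv "$f" "$backup_dir"/ || return 1
    done
}

# Hypothetical helper for step 6: print a CHANGE MASTER TO statement built
# from the coordinates recorded in step 1.
# Assumption: none of the values contains a single quote (no escaping is done).
change_master_sql() {
    master_host=$1
    repl_user=$2
    repl_password=$3
    log_file=$4
    log_pos=$5
    # The log position must be a plain non-negative integer.
    case $log_pos in
        ''|*[!0-9]*) echo "log position must be a number" >&2; return 1 ;;
    esac
    printf "CHANGE MASTER TO MASTER_HOST='%s', MASTER_USER='%s', MASTER_PASSWORD='%s', MASTER_LOG_FILE='%s', MASTER_LOG_POS=%s;\n" \
        "$master_host" "$repl_user" "$repl_password" "$log_file" "$log_pos"
}
```

For example, `change_master_sql master.example.com repl secret mysql-bin.000042 98` prints the statement so that you can review it before feeding it to the mysql client. Printing first keeps the destructive and non-destructive parts separate, and nothing touches the server until you decide to run the output.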

Comments

Hello,
Find the /var/lib/mysql/relay-log.info file. In this file, try changing the mysqld-relay-bin.x name to the real name found in the /var/run/mysqld directory. It worked for me without re-dumping the data from the master.

I tried that. This is the case where a binlog might be missing altogether, in which case you've got to rebuild the slave from scratch, and you have to wipe out the old binlogs and slave information.