Oh, the unsuspected woes of server migration

Last night, I migrated The Internet Company’s servers from a Linux-VServer host to an OpenVZ host. It all went well, except for one crazy detail.

The OpenVZ host runs on CentOS, and apparently its way of answering gettimeofday(2) doesn’t agree with what PLD’s glibc expects. Specifically, the 9th bit of the resulting time is wrong … some of the time.

`with kernel.vsyscall64=0: 2011-05-09 17:14:48 -0600 1304982888 1001101110010000111010101101000`

`with kernel.vsyscall64=1: 2011-05-09 17:19:56 -0600 1304983196 1001101110010000111011010011100`

`with kernel.vsyscall64=0: 2011-05-09 17:15:04 -0600 1304982904 1001101110010000111010101111000`

Fixed!
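To put a number on the jump, here is a small Python sketch that works only from the three readings above. It assumes the samples were taken in the order listed, so the two vsyscall64=0 readings bracket the vsyscall64=1 reading in real time; the variable names are mine.

```python
# Readings copied from the samples above:
# (vsyscall64 setting, epoch seconds, binary string as printed)
samples = [
    (0, 1304982888, "1001101110010000111010101101000"),
    (1, 1304983196, "1001101110010000111011010011100"),
    (0, 1304982904, "1001101110010000111010101111000"),
]

# Sanity check: each printed binary string really is its epoch value.
for _, epoch, bits in samples:
    assert format(epoch, "b") == bits

first, skewed, last = (epoch for _, epoch, _ in samples)

# Assumption: the samples were taken in the order listed, so the real time
# of the middle reading lies somewhere between `first` and `last`.
elapsed = last - first   # real seconds between the two vsyscall64=0 readings
upper = skewed - first   # skew if the middle reading came right after the first
lower = skewed - last    # skew if the middle reading came right before the last

print(f"real time elapsed: {elapsed}s")
print(f"forward skew with vsyscall64=1: between {lower}s and {upper}s")
```

Under that assumption the clock was running somewhere between 292 and 308 seconds ahead, which is the five-minute jump in question.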

The upshot is that you get Dovecot complaining like this:

May 9 15:36:31 host dovecot: pop3: Fatal: Time just moved backwards by 290 seconds. This might cause a lot of problems, so I'll just kill myself now. http://wiki2.dovecot.org/TimeMovedBackwards

since the mail server needs to know what time things arrived, and the date going backward is a sign that the clock is Not Reliable.
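For illustration, this is roughly what a "did time move backwards?" check looks like. It is a minimal Python sketch of the idea, not Dovecot's actual implementation; the function name and the `tolerance` parameter are made up for the example.

```python
def find_backward_jumps(readings, tolerance=0):
    """Return (index, seconds_backwards) for every reading that is
    earlier than the one before it by more than `tolerance` seconds."""
    jumps = []
    for i in range(1, len(readings)):
        delta = readings[i] - readings[i - 1]
        if delta < -tolerance:
            jumps.append((i, -delta))
    return jumps


# The three epoch values from the samples earlier in this post,
# treated as consecutive wall-clock readings.
readings = [1304982888, 1304983196, 1304982904]
print(find_backward_jumps(readings))  # [(2, 292)]
```

A daemon that stamps incoming mail can't do much with a clock like that, which is why bailing out is a defensible reaction.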

It also gives errors like this when things move forward again:

May 9 15:33:54 host dovecot: imap-login: Error: master(imap): Auth request timed out (received 0/12 bytes)

May 9 15:33:54 host dovecot: pop3-login: Error: master(pop3): Auth request timed out (received 0/12 bytes)

May 9 15:33:54 host dovecot: imap-login: Error: master(imap): Auth request timed out (received 0/12 bytes)

So the solution? Set kernel.vsyscall64=0. It fixes the mismatch between guest and host on OpenVZ, making the five-minute jump disappear. Just add that line to /etc/sysctl.conf and then apply it with sysctl -p.
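If you want to confirm the setting took, a quick check is to read the value back. The sketch below assumes the usual sysctl-to-/proc mapping, where kernel.vsyscall64 shows up as /proc/sys/kernel/vsyscall64; on kernels that don't have this knob the file simply won't exist. The helper name is hypothetical.

```python
from pathlib import Path


def read_vsyscall64(path="/proc/sys/kernel/vsyscall64"):
    """Return the current kernel.vsyscall64 value as an int,
    or None if this kernel does not expose the setting."""
    proc_file = Path(path)
    if not proc_file.exists():
        return None
    return int(proc_file.read_text().strip())


value = read_vsyscall64()
if value is None:
    print("kernel.vsyscall64 is not exposed on this kernel")
elif value == 0:
    print("kernel.vsyscall64=0, the workaround is active")
else:
    print(f"kernel.vsyscall64={value}, the workaround is NOT active")
```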