INND and crashing machines; what you don’t get at first gloss
After the past few days of multiple outages at one of my customer's LA datacenters, I got to learn a few things about the resiliency of popular packaged unix software. Read this all the way through, since the very last step is the one that came back to bite me.
innd, the popular NNTP package, does not like it when it is shut down uncleanly, e.g. by a power failure. One of the popular messages you may receive is:
Server throttled File exists writing SMstore file — throttling
What this essentially means is that the history file—the database that tells innd what articles it has seen already—has been corrupted, and the canonical thing to do (as detailed in web pages) is to rebuild your history files:
# cd /var/lib/news
# rm history*
# sudo -u news /usr/lib/news/bin/makehistory -b -O -F
...and wait for a long while. makehistory scans all the articles in your news spool to rebuild its database, and the sudo -u news makes sure that the history file has the correct ownership. (The -b flag means “delete ‘bad’ articles”, the -O means “regenerate the overview file”, and the -F means “fork off a separate process to flush the overview data to disk”.) After this, you must convert the result into binary database files in dbz format:
# sudo -u news /usr/lib/news/bin/makedbz -s \
`wc -l history | awk '{print $1}'` -o
(The -s means “this is the approximate initial number of entries to use” and the -o means “skip the temporary files and links that you would otherwise have to move into place by hand, and overwrite the old databases directly”.) All of that is meant to appear on one line; the \ means “continue onto the next line”.
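The backtick expression above does nothing more than count the lines in the history text file. Here is a minimal sketch of the same count as a small shell function, in case you want to look at the number before handing it to makedbz. The name history_entries is made up for illustration and is not part of INN.

```shell
# history_entries is a hypothetical helper name, not an INN tool.
# It prints the number of lines in the given history text file,
# which is the same value the wc/awk pipeline above produces.
history_entries() {
    # Reading from stdin keeps wc from printing the file name;
    # tr strips the padding some wc implementations add.
    wc -l < "$1" | tr -d ' '
}

# Assumed usage, run from /var/lib/news as in the steps above:
#   sudo -u news /usr/lib/news/bin/makedbz -s "$(history_entries history)" -o
```

If the number is zero or implausibly small, makehistory probably did not finish, and it is worth finding out why before building the dbz files.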
Then you can restart innd, using your favorite method. For the Linux variety we use, it is:
# /etc/init.d/innd restart
Up to this point is the canon; it’s what you see when you google the phrase Server throttled File exists writing SMstore file — throttling.
If you go to the one Spanish-language blog describing this, you will find an additional step: renumbering the articles that may be there.
You see, once you’ve gone through the process of regenerating your history databases, you might see some inconsistencies in the article numbers (I assume; I can’t verify this), or a difference of opinion as to what the next article number should be. Therefore, a crucial step after you’ve started innd, but before you start receiving new articles, is:
# /usr/lib/news/bin/ctlinnd renumber ''
This tells the running innd that it needs to perform a scan of the overview database to make sure that it knows what number (filename) to give the next article. Otherwise you can end up with errors of the sort:
Jul 24 17:34:36 HOSTNAME innd: SERVER↵
cant store article: File exists
Jul 24 17:34:36 HOSTNAME innd: tradspool: ↵
could not open /var/spool/news/articles/news/group/name/267106↵
File exists
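To see whether the server is still hitting this after a rebuild, you can count the matching lines in the log. This is a rough sketch with a made-up helper name. It assumes the message sits on a single line in the real log (the ↵ above only marks where the lines were wrapped for display), and it assumes you pass in whichever file your syslog sends innd's messages to, since that location varies by system.

```shell
# count_store_collisions is a made-up helper name for illustration.
# It prints how many lines of the given log file contain innd's
# "cant store article: File exists" message.
count_store_collisions() {
    # grep -c prints the count; it exits non-zero when the count is 0,
    # so "|| true" keeps the function from reporting that as a failure.
    grep -c 'cant store article: File exists' "$1" || true
}
```

A count that keeps growing after the restart suggests the renumber step has not taken effect yet.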
In other words: the history database, the overview database, and the file system all need to be in agreement as to what articles have been seen, and what to name the next file. This assumes you’re using the tradspool method of storage (the newsgroup hierarchy name gets translated into a unix directory hierarchy, and the article number is the file name). If you’re running some other storage method, you’ll need to read someone else’s blog...
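For tradspool, one piece of that agreement can be checked by hand. What follows is a sketch under assumptions, not a tool that ships with INN: the helper names are invented, and the caller supplies the active file and spool paths, which differ between installations. It relies on the active file listing each group as "name high low flags" with zero-padded numbers, and on my understanding that innd numbers the next article one above the high mark. It compares that mark with the highest-numbered file actually on disk.

```shell
# Hypothetical helpers; names and paths are assumptions for illustration.

# Print the highest all-numeric file name in a tradspool group directory,
# or 0 if there are none.
highest_article() {
    n=$(ls "$1" 2>/dev/null | grep -E '^[0-9]+$' | sort -n | tail -n 1)
    echo "${n:-0}"
}

# Print the high-water mark recorded for a group in the active file.
# Adding 0 in awk drops the zero padding (0000267105 becomes 267105).
active_high() {
    awk -v g="$1" '$1 == g { print $2 + 0 }' "$2"
}

# Succeed if the group's high mark is at least the highest file on disk,
# i.e. the next number innd picks should not already exist as a file.
marks_agree() {
    group=$1 active=$2 spool=$3
    # news.group.name -> news/group/name under the spool directory
    dir="$spool/$(echo "$group" | tr . /)"
    high=$(active_high "$group" "$active")
    [ -n "$high" ] && [ "$high" -ge "$(highest_article "$dir")" ]
}

# Assumed usage (paths are guesses based on the ones in this post):
#   marks_agree news.group.name /var/lib/news/active /var/spool/news/articles \
#       || echo "high mark is behind the spool; renumber before accepting articles"
```

This only looks at the spool and the active file. It says nothing about the history or overview databases, so treat a passing check as a hint rather than proof.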