August 15th, 2010

Mail filter extensibility

The biggest internal requirement that I have for a new mail filter setup is extensibility. The actual decision as to what is and is not spam needs to be left up to modules.

I hesitate to write a system built around a full suite of ACLs, like Exim’s or Postfix’s access controls. Postfix’s are barely flexible enough to work at all, and Exim’s are so overwhelming and yet so limited that you have to be a programmer, and a clever one at that, to write a configuration that’s not going to break or lose mail.

Every filtering technique has a natural place in the flow of an SMTP session: RBLs come early, at HELO or RCPT TO time; learning filters must come after DATA has been received, and could either stream the message or receive it as a single dump. Filtering at HELO time should be rare: you can’t check a per-destination whitelist that early, so you have to wait for RCPT TO. In fact, many senders may retry over and over if you reject at HELO instead of RCPT TO.

So each plugin receives some part of the SMTP-time data: early ones get IPs and connection-related information, and later ones get the full message data.
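To make that concrete, here is a rough sketch of how plugins might be attached to SMTP stages, each one looking only at what the session knows by then. Everything in it is an assumption for illustration: the Stage, Session and Registry names, the two example plugins, and the hard-coded set standing in for a real RBL lookup. It is not a finished design.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Callable, Dict, List, Optional


class Stage(IntEnum):
    """SMTP stages in the order they occur (names are illustrative)."""
    CONNECT = 1
    HELO = 2
    RCPT_TO = 3
    DATA = 4


@dataclass
class Session:
    """Everything learned so far; later fields stay empty until their stage."""
    client_ip: str
    helo_name: Optional[str] = None
    recipients: List[str] = field(default_factory=list)
    message: Optional[bytes] = None


# A plugin takes the session as it stands and returns a status string.
Plugin = Callable[[Session], str]


class Registry:
    """Maps each stage to the plugins that asked to run there."""

    def __init__(self) -> None:
        self._plugins: Dict[Stage, List[Plugin]] = {s: [] for s in Stage}

    def register(self, stage: Stage, plugin: Plugin) -> None:
        self._plugins[stage].append(plugin)

    def run(self, stage: Stage, session: Session) -> List[str]:
        return [plugin(session) for plugin in self._plugins[stage]]


def blocklist_check(session: Session) -> str:
    # Hypothetical early plugin: only needs the connecting IP.
    listed = {"192.0.2.1"}  # stand-in for a real RBL lookup
    return "bad" if session.client_ip in listed else "not sure"


def content_check(session: Session) -> str:
    # Hypothetical late plugin: needs the full message body.
    body = session.message or b""
    return "bad" if b"cheap pills" in body.lower() else "not sure"


registry = Registry()
registry.register(Stage.RCPT_TO, blocklist_check)
registry.register(Stage.DATA, content_check)

# At RCPT TO time only the IP and recipients are known.
session = Session(client_ip="192.0.2.1")
session.recipients.append("user@example.org")
early = registry.run(Stage.RCPT_TO, session)

# After DATA the message is available to the later plugins.
session.message = b"Subject: hi\r\n\r\nCheap pills here"
late = registry.run(Stage.DATA, session)
```

The point of keying plugins by stage is that an RBL check never has to wait for the message body, and a content filter is never called before there is content to look at.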

Plugins essentially distill their input into a status: “good”, “bad”, or “not sure”.
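A minimal sketch of that three-way status might look like the following. The Verdict name and the combine function are hypothetical, and the policy for merging several plugins' answers (any "bad" wins, then any "good", otherwise "not sure") is purely an assumption; a real setup might well let a whitelist's "good" override everything else.

```python
from enum import Enum
from typing import Iterable


class Verdict(Enum):
    """The three statuses a plugin can report."""
    GOOD = "good"
    BAD = "bad"
    NOT_SURE = "not sure"


def combine(verdicts: Iterable[Verdict]) -> Verdict:
    """Assumed policy: any BAD wins, then any GOOD, otherwise NOT_SURE."""
    seen = set(verdicts)
    if Verdict.BAD in seen:
        return Verdict.BAD
    if Verdict.GOOD in seen:
        return Verdict.GOOD
    return Verdict.NOT_SURE
```

Keeping the status this small means a plugin author only has to answer one question, and the decision about what to do with a pile of answers lives in one place.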