Training spam with doveadm



A while ago, I posted about training the SpamAssassin Bayes filter with Proxmox Mail Gateway. That's really easy when you're using Maildir, as each email message is its own file.

With Maildir, we could simply cat each file, treating the email in a folder as plain files and ignoring the fact that they were part of an IMAP mailbox. However, what happens if you use something other than Maildir, like one of the newer mailbox formats? We can't use the same approach, as each email is likely no longer its own file.

For example, dbox is Dovecot’s own high-performance mailbox format.

If we use mdbox (the dbox variant that stores multiple messages per file), there is no longer one file per message to open, nor can we tell which folder is which from the on-disk layout. So we have to get smarter.

Using doveadm, we can search for messages in a mailbox, fetch them, and pass them to our previously configured script, which feeds them into PMG as before. The main advantage is that this works with any mail storage backend.
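Before letting anything touch real mailboxes, it can help to see how the search output maps onto fetch commands. The following is a minimal dry-run sketch, assuming each line of `doveadm search -A` output has the form "user mailbox-guid uid". The function name `plan_fetches` and the sample user, GUID, and UIDs are made up for illustration.

```shell
#!/bin/bash
# Dry run: turn `doveadm search -A` style output into the fetch commands
# that would be run, without touching any mailbox.
# Assumption: each input line is "<user> <mailbox-guid> <uid>".
plan_fetches() {
    local user guid uid
    while read -r user guid uid; do
        # Skip blank or incomplete lines rather than printing a broken command.
        [ -n "$user" ] && [ -n "$guid" ] && [ -n "$uid" ] || continue
        printf 'doveadm fetch -u %s text mailbox-guid %s uid %s\n' \
            "$user" "$guid" "$uid"
    done
}

# Example with hypothetical sample data standing in for real search output.
printf '%s\n' \
    'alice@example.com 3a94c928d66ebe4bda04000015811c6a 8' \
    'alice@example.com 3a94c928d66ebe4bda04000015811c6a 9' | plan_fetches
```

On a live system, the sample data would be replaced by piping `doveadm search -A mailbox Spam OR mailbox INBOX/Spam` into the function, which only prints what would be fetched.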

This simple bash script goes through every user's Spam or INBOX/Spam folder, fetches each message, feeds it into the learning system, and then removes it from the user's mailbox.

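#!/bin/bash
# Hostname of the Proxmox Mail Gateway that does the learning.
MAILFILTER=my.pmg.install.example.com
shopt -s nullglob

# With -A, each line of search output is: username, mailbox GUID, message UID.
doveadm search -A mailbox Spam OR mailbox INBOX/Spam | while read -r user guid uid; do
    tmpfile="/tmp/spam.$guid.$uid"
    # "fetch ... text" prints a "text:" header line first; tail -n+2 drops it.
    doveadm fetch -u "$user" text mailbox-guid "$guid" uid "$uid" | tail -n+2 > "$tmpfile"
    # Hand the message to the "report" command configured in the previous article.
    if ! ssh "root@$MAILFILTER" report < "$tmpfile"; then
        echo "Error running sa-learn. Aborting." >&2
        rm -f "$tmpfile"
        exit 1
    fi
    rm -f "$tmpfile"
    # Only expunge the message once it has been reported successfully.
    doveadm expunge -u "$user" mailbox-guid "$guid" uid "$uid"
done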

Use it with the scripts and general configuration from the previous article, and it should work across all mail storage methods supported by Dovecot.

Cron it to run every 5 minutes or so, and you're done! Nice and easy.
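As a rough sketch of what that cron entry might look like, assuming the script is saved as /usr/local/bin/train-spam.sh (a hypothetical path) and that `flock` from util-linux is available to stop a slow run from overlapping with the next one:

```shell
# /etc/cron.d/train-spam (hypothetical file name, script path, and lock path)
# m   h dom mon dow user command
*/5 * * * *         root flock -n /run/lock/train-spam.lock /usr/local/bin/train-spam.sh
```

With `flock -n`, a run that finds the lock already held exits straight away instead of waiting, so the same message shouldn't get fetched by two runs at once. If `flock` isn't available, dropping it from the command leaves a plain five-minute schedule.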
