Here's a list of tips and tricks to write efficient mail filters
using maildrop. Maildrop - just like any programming language -
can do certain things better than others.
When maildrop is started, it reads
$HOME/.mailfilter, or some other file containing
mail filtering instructions. Before doing anything else, this
entire file is read and semi-compiled. Any grammatical or syntax
errors will result in maildrop terminating with a temporary error
code, without doing anything with the message.
The entire file must be 'semi-compiled". The
'semi-compilation' process consists of building an logical
representation of the filtering statements in memory.
If you have a long and complicated set of instructions, it
will be semi-compiled whether or not it will actually be
executed. For example, consider the following code:
if ( /^Subject: rosebud/ )
{
...
some
really
long
set
of
instructions
...
}
If the contents of the "if" construct are large,
maildrop may spend considerable amount of time and memory
semi-compiling filtering instructions that may never be used.
Consider putting all these instructions into a separate file, and
using the include statement to execute them.
The include statement is processed only when it
is actually executed by maildrop. That carries certain side
effects. One feature of maildrop is the semi-compilation which
weeds out errors in the filtering recipe file, holding all mail
and not doing anything until the errors are fixed. The
include statement is not processed until it is
actually executed. If there are errors in the file referenced by
the include statement, maildrop will not know about
them until it executes the include statement. When
maildrop detects the error, it will terminate with a temporary
error code, holding all mail. However, any filtering instructions
before the include statement would've already been
executed.
These variables are initialized to the number of lines in the
message, and the size of the message in bytes. They may not be
exact, due to the presence of the From_ line, or -A arguments on
the command line, but they will always be a reasonable
approximation.
Scanning headers for patterns will take the same amount of
time whether the body of the message has 100 or 1,000 lines.
However, the "b" option causes the message body to be searched.
If someone sends you a 5 megabyte binary file, each pattern
matched against the message body will take a significant time to
complete.
Whenever possible, the first thing you should do is check both
variables to see if the message is excessively large. If
so, divert the message to a separate mailbox, or reject it.
Weighted scoring is especially sensitive to the size of the
message being delivered. Normally, as soon as a pattern match is
found, maildrop terminates the pattern scan. If weighted scoring
is used then the pattern scan is immediately restarted, until the
end of the message is reached.
A fixed overhead is required to start a pattern scan going, so
that weighted scoring that matches a very short pattern, like
/[:lower:]/:Dbw,1,1 will take a significant amount
of CPU time to complete. Maildrop is simply not optimized for
these kinds of situations.
Nevertheless, on a Pentium 200, /[:lower:]/:Dbw,1,1 perform
reasonably well for messages less than 60K long.
Here's an example of using $LINES and
$SIZE to divert large messages to a separate
mailbox:
if ($LINES > 1000 || $SIZE > 60000)
{
to "mail/IN.large"
}
.
.
.
Maildrop implements the ! operator in patterns by breaking up
the pattern into two separate patterns, which are compiled and
executed separately.
Whenever the first pattern is succesfully matched, maildrop
runs the second pattern. If the second pattern does not match,
maildrop resumes matching the first pattern until it matches
again.
Since starting a pattern match involves some fixed overhead,
minimizing the number of times the first pattern will match will
reduce the amount of time it takes to match the entire
pattern.
For example: you're trying to extract an IP address from the
first Received header. Here's your first attempt at doing so:
/^Received:.*![0-9\.]+/
IP=$MATCH2
This will not work at all. From the manual page for
maildropfilter:
When there is more than one way to match a string,
maildrop favors matching as much as possible initially.
So this fragment will set IP to the last digit, or period found
in the first Received: header, because the first half of the
pattern will match as much as possible.
Here's your second attempt, then:
/^Received:.*[^0-9\.]![0-9\.]+/
IP=$MATCH2
This will work, except the first half of the pattern will end
up matching every character in the Received: header that's not a
digit or a period. Maildrop will then stop and try to run the
second half of the pattern.
Your mail server will probably put a left bracket before the
IP address of the relay. Noting that, the following code picks up
the IP address as efficiently as possible:
The logical operators || and
&& work in maildrop just like they work in
C. Consider the following expression:
if (/ ... some pattern ... / || / ... some other pattern ... /)
{
.
.
.
}
If the first pattern is found, maildrop will not execute the
second pattern, since the logical expression will be true no matter
what. You can use that to your advantage. If you have two patterns,
and one of them takes more time to process, put the other first.
Even if both patterns are relatively quick, if you expect one of
them to be found almost every time, put that one first so that it
will not be necessary to run the second pattern most of the time.
The same concept works for the &&
operator:
if (/ ... some pattern ... / && / ... some other pattern ... /)
{
.
.
.
}
If the first pattern is not found, maildrop will not execute
the second pattern, since the logical expression will be false no
matter what.
The foreach statement is implemented as follows.
Maildrop will execute the regular expression, and compile a list
of all the patterns in the message matched by the regular
expression. Afterwards, maildrop will execute the
foreach statement, once for each matched
pattern.
It is important to note that maildrop will create a list in
memory containing every pattern matched by the regular
expression. if the regular expression is found many times in the
message, this will require a lot of memory. For example, the
following is a Very Bad Thing[tm]:
foreach /[:alpha:]/
{
.
.
.
}
Maildrop does not include any built-in limits on resource
usage, except for the watchdog timer. Therefore, as a system
administrator, it is your responsibility to limit resources used by
the maildrop process.
This implementation of the foreach statement is
the one that yields the least number of surprises. Observe that
the contents of the foreach construct can modify the
message using the xfilter statement. Furthermore,
the regular expression may include variable substitution, and
those variables could be modified by the foreach
statement. If the foreach statement were to be executed "on the
fly", these factors would yield unpredictable - and rather messy
- results.
Under certain conditions, maildrop will need to use a
temporary file. Temporary files are created in the
$HOME/.tmp directory. Temporary files are used to
store large messages (instead of storing them in memory).
Temporary files may also be used by the xfilter command, or in
other situations.
Maildrop will automatically delete all temporary files before
terminating. However, if the maildrop process is killed, or
terminates abnormally, a temporary file may remain and take up
disk space. To have those leftover files automatically cleaned
up, put the following code in /etc/profile, or some
other file that all users run after logging in: