Amateur radio contest log checking has always been a mysterious process, so I thought I would publish my steps as I go from email to published results. All code in this series is licensed under the GPL, and I will post it when I'm done. I hope that this will make log checking for other contests easier, more accurate and faster (although I've never been exactly fast with the OQP logs).
Point and click is not fun for me
Now that all the logs are in, and I've downloaded them from the mail server into my mail program, Mozilla Thunderbird, I could just read through each email, click on the attachment and save it. That sounds like a lot of work. We aren't the FLQP, but I still don't like doing repetitive tasks. Why not automate this one?
No GUI needed
GUIs are very hard to automate, so there won't be any GUI for what we are about to do, just plain old text files. Thunderbird stores emails in a single file called Inbox, a plain text file in a specific format (mbox), with each email starting with a line that begins with the word "From":
From - Sun Oct 7 08:28:58 2007
Hmmm, date information. That looks like it will change from message to message. Can I write a program to match a string that changes? I happen to know that there is a programming language that makes parsing files (especially non-binary files like our Inbox) easy (well, easier). It is called Perl.
Perl is the Swiss Army knife of programming. I use it because it is powerful and does what I need (if I can figure out what to tell it to do).
Reading an Inbox
Perl has a lot of built-in functions that make our life easier and let us treat our Inbox file as a series of emails with very little code. The first thing we need to do is figure out how to split the file, given that the line separating one email from the next is not a fixed string: the date in it changes with every message. Perl has a way of describing such patterns as a string, called a regular expression. The regular expression, or pattern, I use to represent the email separator is:
^From\s\-\s\w{3}\s\w{3}\s{1,2}\d{1,2}\s\d\d:\d\d:\d\d\s\d{4}$
This says: look for lines that start (^) with From, followed by a space (\s), a dash, another space, three word characters (\w{3}) for the day name, a space, three more word characters for the month, 1 to 2 spaces (\s{1,2}), 1 to 2 digits (\d{1,2}) for the day of the month, a space, 2 digits and a colon, 2 more digits and a colon, 2 more digits, a space, and finally 4 digits in a row and nothing else (the dollar sign). Easy? Well, not really, but better than clicking the mouse button over and over.
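To see the pattern at work before using it on a real Inbox, it can be tried against a handful of lines. This is only a sketch: the helper name is_separator and the sample lines are made up for illustration and are not part of my log checking code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The same separator pattern as above, compiled once with qr//.
my $tbregex = qr/^From\s\-\s\w{3}\s\w{3}\s{1,2}\d{1,2}\s\d\d:\d\d:\d\d\s\d{4}$/;

# Hypothetical helper: returns 1 if the line looks like a separator, else 0.
sub is_separator {
    my ($line) = @_;
    return $line =~ $tbregex ? 1 : 0;
}

# Made-up sample lines, for illustration only.
my @samples = (
    "From - Sun Oct 7 08:28:58 2007",        # one space before the day
    "From - Sun Oct  7 08:28:58 2007",       # two spaces before the day
    "From - Mon Oct 15 21:03:11 2007",       # two-digit day
    "From: someone\@example.com",            # ordinary header, not a separator
    "From - Sun Oct 7 08:28:58 2007 extra",  # trailing text after the year
);

for my $line (@samples) {
    printf "%-40s %s\n", $line,
        is_separator($line) ? "separator" : "not a separator";
}
```

The first three lines should be reported as separators and the last two should not. Note that \w also matches digits and underscores, so the pattern is a little looser than "three letters"; that has been good enough for an Inbox so far.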
Now we will start writing our Perl program to treat our single Inbox file like individual email messages.
Here is the code:
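1 #!/usr/bin/perl -w
2
3 open(INBOX,"<Inbox") or die "Cannot open Inbox: $!";
4
5 my $tbregex = '^From\s\-\s\w{3}\s\w{3}\s{1,2}\d{1,2}\s\d\d:\d\d:\d\d\s\d{4}$';
6 my $inmsg = 0;
7 my $msg = '';
8
9 while (<INBOX>) {
10
11     if (/$tbregex/ && $inmsg) {
12         # A new separator line: the previous message is complete.
13         print STDOUT "$msg \n";
14         $msg = '';
15
16
17     }
18
19     $inmsg = 1;
20     $msg .= $_;
21
22
23 }
24 # The last message has no separator after it, so print it here.
25 print STDOUT "$msg \n" if $inmsg;
26 close INBOX;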
Yeah, Perl has some weird syntax, but it is pretty easy once you get started. The first thing we do, on line 3, is open our Inbox file for reading and call it INBOX. Then we loop through it one line at a time. Since the very first line of our Inbox file will match our regular expression ($tbregex) before we have collected any message, I use a flag ($inmsg) to keep from doing anything with that first match. Notice how easy it is to test whether our regular expression matches a line in the file. This is where Perl does some work behind the scenes, using variables it pre-defines for us so that we can skip syntax other programming languages impose. In this case, "if (/$tbregex/..." says: match the regular expression held in the variable $tbregex against the default pattern matching space, which happens to be the current line we read from the file. This variable has a name, $_, and we use it explicitly on line 20, where we append each line from the file to the temporary variable that holds our message. Sorry about the long-winded explanation, but I wanted to show the power and simplicity of Perl and why I chose it for this project.
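To make the role of $_ concrete, here is a small sketch that counts separator lines twice, once with the implicit form and once with the variable spelled out. The helper names count_implicit and count_explicit and the sample lines are invented for this illustration.

```perl
use strict;
use warnings;

my $tbregex = '^From\s\-\s\w{3}\s\w{3}\s{1,2}\d{1,2}\s\d\d:\d\d:\d\d\s\d{4}$';

# Hypothetical helper: implicit form, the match is applied to $_.
sub count_implicit {
    my $count = 0;
    for (@_) {                       # each line is aliased to $_ in turn
        $count++ if /$tbregex/;      # same as: $_ =~ /$tbregex/
    }
    return $count;
}

# Hypothetical helper: explicit form, with a named variable instead of $_.
sub count_explicit {
    my $count = 0;
    for my $line (@_) {
        $count++ if $line =~ /$tbregex/;
    }
    return $count;
}

# Made-up lines standing in for a piece of an Inbox file.
my @lines = (
    "From - Sun Oct 7 08:28:58 2007\n",
    "Subject: OQP log\n",
    "\n",
    "From - Sun Oct 7 09:15:02 2007\n",
);

print count_implicit(@lines), " ", count_explicit(@lines), "\n";
```

Both counts come out the same; the implicit form just saves some typing, which is why the loop in my program never has to name the current line when it tests it.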
We keep reading from the file and building up our temporary copy of the email message in $msg until the if statement containing our regular expression becomes true at the start of the next message. At that point we have a complete email message in the variable $msg, ready for processing. In the example above, all I do is print the message to standard output, which is usually the terminal screen if you are in a Linux environment. You can also do this in Microsoft Windows, thanks to the fine people at ActiveState who provide a port of Perl to that environment.
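Printing is only a placeholder for real processing. One way to hand complete messages to later steps is to collect them in a list, as in the sketch below. The helper name split_messages and the two-message sample are assumptions made for illustration; the sample is read through an in-memory filehandle so the sketch runs without a real Inbox file. Note that the last message in a file has no separator after it, so it has to be saved once the loop ends.

```perl
use strict;
use warnings;

my $tbregex = qr/^From\s\-\s\w{3}\s\w{3}\s{1,2}\d{1,2}\s\d\d:\d\d:\d\d\s\d{4}$/;

# Hypothetical helper: read an Inbox-style filehandle and return the
# messages it contains, one string per message.
sub split_messages {
    my ($fh) = @_;
    my @messages;
    my $msg = '';
    while (my $line = <$fh>) {
        # A separator ends the previous message, if we have collected one.
        if ($line =~ $tbregex && $msg ne '') {
            push @messages, $msg;
            $msg = '';
        }
        $msg .= $line;
    }
    # The final message is not followed by a separator, so save it here.
    push @messages, $msg if $msg ne '';
    return @messages;
}

# Made-up two-message Inbox, for illustration only.
my $sample = join '',
    "From - Sun Oct 7 08:28:58 2007\n",
    "Subject: first log\n",
    "\n",
    "body one\n",
    "From - Sun Oct 7 09:15:02 2007\n",
    "Subject: second log\n",
    "\n",
    "body two\n";

open(my $fh, '<', \$sample) or die "Cannot open in-memory sample: $!";
my @messages = split_messages($fh);
close $fh;

print scalar(@messages), " messages found\n";
```

With a real file, the same helper would be given a handle opened on the Inbox instead of on the sample string.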
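Ah, you say, what happens if I send you an email with a line like "From - Sun Oct 7 08:28:58 2007" in the body of my message? Trust me, the Mozilla Thunderbird programmers have thought about this too. The usual mbox convention is to escape a body line that starts with "From " by putting a ">" in front of it, so it no longer matches a pattern anchored at the start of the line.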
That was easy. Now on to removing the log files from the emails, and saving ourselves a bunch of clicking, in part 2.