Querying MIME mails

A discussion about how to better handle mail content got me thinking. The situation is that "a mail" is actually a tree where there are one or more "root" mail bodies (one for text, one for HTML, and maybe one with rich text). Then there are some attachments and these attachments are (or are not) referenced from the roots. The attachments can be forwarded mails, inline images or attached files.

Detecting and handling such attachments is inconvenient, as I always feel that I'm doing manual tree iteration using MIME::Parser and MIME::Entity. Doing this manually is tedious and also hides the logic that I actually want. Having a different way to specify the things I'm interested in would make my code clearer.

I wonder what makes more sense. One option is an SQL/DBI interface, so I can say something like

select parent_obj
     , entity_obj
  from parsed_mail
 where mime_type='image/jpeg'
   and content_disposition='attachment'
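
To get a feeling for the SQL variant, here is a rough mock-up. It is written in Python with the standard `email` and `sqlite3` modules rather than MIME::Parser and DBI, purely to keep the sketch self-contained. The helper names (`build_sample_mail`, `load_parsed_mail`), the extra `filename` column and the use of integer ids in place of `parent_obj`/`entity_obj` are assumptions made for illustration; objects cannot live in a table, so the ids point back into a lookup dictionary.

```python
import sqlite3
from email.message import EmailMessage


def build_sample_mail():
    """Hypothetical test mail: plain + HTML body and one JPEG attachment."""
    msg = EmailMessage()
    msg["Subject"] = "Holiday pictures"
    msg.set_content("See the attached picture.")
    msg.add_alternative("<p>See the attached picture.</p>", subtype="html")
    msg.add_attachment(b"\xff\xd8\xff\xe0 not a real image",
                       maintype="image", subtype="jpeg",
                       filename="beach.jpg")
    return msg


def load_parsed_mail(msg):
    """Flatten the MIME tree into one row per entity.

    Returns the database handle and a dict mapping row id -> entity,
    so query results can be turned back into objects.
    """
    db = sqlite3.connect(":memory:")
    db.execute("""
        create table parsed_mail (
            id integer primary key,
            parent_id integer,
            mime_type text,
            content_disposition text,
            filename text
        )""")
    entities = {}

    def visit(part, parent_id):
        part_id = len(entities) + 1          # ids are assigned in pre-order
        entities[part_id] = part
        db.execute("insert into parsed_mail values (?, ?, ?, ?, ?)",
                   (part_id, parent_id, part.get_content_type(),
                    part.get_content_disposition(), part.get_filename()))
        if part.is_multipart():
            for child in part.get_payload():
                visit(child, part_id)

    visit(msg, None)
    return db, entities


db, entities = load_parsed_mail(build_sample_mail())
rows = db.execute("""
    select parent_id, id
      from parsed_mail
     where mime_type = 'image/jpeg'
       and content_disposition = 'attachment'
""").fetchall()
for parent_id, entity_id in rows:
    print(entities[parent_id].get_content_type(),
          "->", entities[entity_id].get_filename())
```

The tree structure survives only as the `parent_id` column, so anything that needs "all descendants of this node" would require a recursive query or a precomputed path column.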

Alternatively, an XPath-style query like

 //mail/item[@mime_type="image/jpeg"
   and @content-disposition="attachment"]

but XPath gets unwieldy due to the quoting, while SQL is more verbose. I don't like inventing my own language, since I usually fail at that, and using a known query language gives me existing tools that I can leverage.
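
The XPath variant can be mocked up in a similar way by mirroring the MIME tree as an XML tree and querying that. The sketch below again uses Python's standard library as a stand-in; `mail_to_xml`, the `item` element and the attribute names are assumptions for illustration. Note that `xml.etree.ElementTree` only implements a subset of XPath and has no `and` operator, so the two conditions are written as chained predicates instead.

```python
import xml.etree.ElementTree as ET
from email.message import EmailMessage


def build_sample_mail():
    """Hypothetical test mail: plain + HTML body and one JPEG attachment."""
    msg = EmailMessage()
    msg["Subject"] = "Holiday pictures"
    msg.set_content("See the attached picture.")
    msg.add_alternative("<p>See the attached picture.</p>", subtype="html")
    msg.add_attachment(b"\xff\xd8\xff\xe0 not a real image",
                       maintype="image", subtype="jpeg",
                       filename="beach.jpg")
    return msg


def mail_to_xml(msg):
    """Mirror the MIME tree as <mail><item .../></mail>.

    Returns the XML root and a dict mapping the item id -> entity,
    so matched nodes can be turned back into mail parts.
    """
    root = ET.Element("mail")
    entities = {}

    def visit(part, xml_parent):
        part_id = str(len(entities) + 1)     # ids are assigned in pre-order
        entities[part_id] = part
        node = ET.SubElement(xml_parent, "item", {
            "id": part_id,
            "mime_type": part.get_content_type(),
            # XML attributes must be strings, so "no header" becomes ""
            "content_disposition": part.get_content_disposition() or "",
        })
        if part.is_multipart():
            for child in part.get_payload():
                visit(child, node)

    visit(msg, root)
    return root, entities


root, entities = mail_to_xml(build_sample_mail())
query = ".//item[@mime_type='image/jpeg'][@content_disposition='attachment']"
for node in root.findall(query):
    print(node.get("id"), entities[node.get("id")].get_filename())
```

Unlike the flat table, the nesting is kept here, so "anywhere below this node" is simply `.//item`.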

I guess I'll have to implement or at least mock up both query styles, and maybe also CSS as a query style, to see how the different queries would/could be formulated. Example queries would be

  • give me all attachments of this (nested/forwarded) mail
  • give me all inlined images of this mail
  • give me the mail text of this mail, preferring HTML over RTF over plain text
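
As a baseline to compare any query language against, the three example queries can be written as plain tree walks. This is only a sketch in Python's standard `email` module, not MIME::Entity code, and it bakes in a few assumptions: "attachment" means `Content-Disposition: attachment`, "inlined image" means an `image/*` part with `Content-Disposition: inline`, RTF is taken to be `text/rtf`, and the body search does not descend into forwarded mails. All function names are made up for the example.

```python
from email.message import EmailMessage

# Assumed preference order: HTML over RTF over plain text.
PREFERENCE = ["text/html", "text/rtf", "text/plain"]


def all_attachments(msg):
    """Attachments of this mail, including those inside forwarded mails."""
    return [p for p in msg.walk()
            if p.get_content_disposition() == "attachment"]


def inline_images(msg):
    """Image parts that are marked as inline."""
    return [p for p in msg.walk()
            if p.get_content_maintype() == "image"
            and p.get_content_disposition() == "inline"]


def body_parts(part):
    """Yield text parts of this mail, skipping attachments and forwards."""
    if (part.get_content_type() == "message/rfc822"
            or part.get_content_disposition() == "attachment"):
        return
    if part.is_multipart():
        for child in part.get_payload():
            yield from body_parts(child)
    elif part.get_content_maintype() == "text":
        yield part


def preferred_text(msg):
    """The mail text in the most preferred format, or None."""
    found = {}
    for part in body_parts(msg):
        found.setdefault(part.get_content_type(), part)  # first one wins
    for mime_type in PREFERENCE:
        if mime_type in found:
            return found[mime_type]
    return None


def build_forwarded_mail():
    """Hypothetical mail with an inline image, a file and a forwarded mail."""
    inner = EmailMessage()
    inner["Subject"] = "Original"
    inner.set_content("Original text")
    inner.add_attachment(b"%PDF- not a real file", maintype="application",
                         subtype="pdf", filename="report.pdf")

    outer = EmailMessage()
    outer["Subject"] = "Fwd: Original"
    outer.set_content("See below.")
    outer.add_alternative("<p>See <img src='cid:logo'></p>", subtype="html")
    html_part = outer.get_payload()[1]
    html_part.add_related(b"\x89PNG not a real image", maintype="image",
                          subtype="png", cid="<logo>")
    outer.add_attachment(b"\xff\xd8 not a real image", maintype="image",
                         subtype="jpeg", filename="beach.jpg")
    outer.add_attachment(inner)              # becomes a message/rfc822 part
    return outer


mail = build_forwarded_mail()
for part in all_attachments(mail):
    print("attachment:", part.get_content_type(), part.get_filename())
for part in inline_images(mail):
    print("inline image:", part.get_content_type())
print("body:", preferred_text(mail).get_content_type())
```

Each function is short, but the selection criteria are buried in loops and conditions, which is exactly the logic a declarative query would make visible.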