An Open Letter to Most Email Notifications
Abstract: You suck, and do not know how to manage many of the security aspects of email.
Hi, for those of you that do not know, I’m Teknikal_Domain. I manage my own servers, run my own websites off those servers, and have put myself through the almost-literal hell that is self-hosting one’s own email. Because of this, I am very familiar with the workings of email, SMTP, various other industry policies and practices, and the like. And from this experience, I’ve learned a few things, chief among them: most modern email notifications from companies, or from any mass-email distribution service like Mailgun or Amazon SES, seem to come from people who fundamentally have no clue how email is supposed to work.
Let’s start with a basic one: Sender Policy Framework, or SPF. SPF is a special DNS TXT record that designates certain IP addresses as (dis)allowed to send mail for a particular domain. In general, you’ll have one that allows your IPs to send mail on behalf of your domain and disallows everyone else, which causes messages from anywhere else to undergo extra scrutiny. I see occasional failures here because sites add the servers their MX records point to, but those MXes aren’t always the senders. I’ll disregard how stupid that is in its own right, because I can see legitimate reasons for it, but that’s just the beginning.
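To make the mechanism concrete, here is a minimal Python sketch of how a receiver might evaluate an SPF record against a connecting IP. It assumes the record text has already been fetched from DNS, and it only understands the ip4, ip6, and all mechanisms; a, mx, include, and modifiers like redirect= need further DNS lookups and are simply skipped, so this is not a complete RFC 7208 implementation. The record and addresses are hypothetical documentation ranges.

```python
import ipaddress

# SPF qualifiers and the result each one produces when its mechanism matches.
QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def check_spf(record: str, client_ip: str) -> str:
    """Evaluate an already-fetched SPF record for one client IP (ip4/ip6/all only)."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"  # not an SPF record at all
    ip = ipaddress.ip_address(client_ip)
    for term in terms[1:]:
        qualifier = "+"  # a mechanism without a prefix means "pass"
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        name, _, value = term.partition(":")
        name = name.lower()
        if name == "all":
            matched = True
        elif name in ("ip4", "ip6"):
            network = ipaddress.ip_network(value, strict=False)
            matched = ip.version == network.version and ip in network
        else:
            # a, mx, include, exists, redirect=... require DNS lookups; skipped here.
            continue
        if matched:
            return QUALIFIERS[qualifier]
    return "neutral"  # nothing matched and there was no "all" mechanism


record = "v=spf1 ip4:192.0.2.0/24 ip6:2001:db8::/32 -all"
print(check_spf(record, "192.0.2.25"))    # pass
print(check_spf(record, "198.51.100.7"))  # fail
```

The mx mechanism complained about above is one of the DNS-dependent terms: if a domain’s MX hosts aren’t the machines that actually send its mail, listing them authorizes the wrong IPs.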
DomainKeys Identified Mail (DKIM) signatures are the other way of validating email. A DKIM signature is a cryptographic signature over the message body and certain named headers; the private key is accessible only to the sending MTA, and the public key is published as, yes, a DNS TXT record, so any receiving MTA can fetch the key and verify the signature. With this, I have multiple issues. The first is that most places are running 1024-bit unprotected RSA keys. By “unprotected” I mean the key record isn’t covered by DNSSEC, which, given the low adoption of DNSSEC overall, doesn’t surprise me, but I believe the point still stands. And 1024-bit keys? They’re valid, but 2048 bits is required to be supported and is standard everywhere else (very few websites have 1024-bit RSA certificate keys, while the vast, vast majority have 2048), and depending on the DKIM library in use, even 4096 is supported, though nobody is required to use a key that long. There are also elliptic-curve (Ed25519) keys available now, but I’ve basically never seen them in use, so I’ll ignore those for now.
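As a rough sketch of how you could check key length yourself: DKIM keys live at <selector>._domainkey.<domain>, and the p= tag holds a base64, DER-encoded RSA public key. The Python below is an assumption-laden sketch, not a validated DKIM library: it parses the tag list, walks just enough of the DER structure to find the modulus, and flags keys under 2048 bits. It assumes the TXT record has already been fetched and that p= holds either a SubjectPublicKeyInfo or a bare RSAPublicKey; the threshold and all function names are my own choices.

```python
import base64


def key_record_name(selector: str, domain: str) -> str:
    """DNS name where a DKIM public key is published."""
    return f"{selector}._domainkey.{domain}"


def parse_tag_list(record: str) -> dict[str, str]:
    """Split a DKIM tag=value list such as 'v=DKIM1; k=rsa; p=...' into a dict."""
    tags = {}
    for part in record.split(";"):
        name, _, value = part.strip().partition("=")
        if name:
            # Whitespace inside a value (e.g. a wrapped p= key) is not significant.
            tags[name.strip()] = "".join(value.split())
    return tags


def _read_tlv(buf: bytes, i: int) -> tuple[int, bytes, int]:
    """Read one DER tag-length-value at offset i; return (tag, value, next offset)."""
    tag, length = buf[i], buf[i + 1]
    i += 2
    if length & 0x80:  # long form: the low 7 bits give the number of length bytes
        n = length & 0x7F
        length = int.from_bytes(buf[i:i + n], "big")
        i += n
    return tag, buf[i:i + length], i + length


def rsa_key_bits(p_value: str) -> int:
    """Return the RSA modulus size, in bits, of a base64 DER public key from p=."""
    der = base64.b64decode(p_value)
    _, outer, _ = _read_tlv(der, 0)  # outermost SEQUENCE
    tag, first, after_first = _read_tlv(outer, 0)
    if tag == 0x30:
        # SubjectPublicKeyInfo: skip the AlgorithmIdentifier, open the BIT STRING
        # (its first byte counts unused bits), then take the modulus INTEGER.
        _, bit_string, _ = _read_tlv(outer, after_first)
        _, rsa_key, _ = _read_tlv(bit_string[1:], 0)
        _, modulus, _ = _read_tlv(rsa_key, 0)
    else:
        # Bare RSAPublicKey: the first INTEGER already is the modulus.
        modulus = first
    return int.from_bytes(modulus, "big").bit_length()


def key_is_weak(record: str, minimum_bits: int = 2048) -> bool:
    """Flag an RSA DKIM key record whose modulus is below minimum_bits."""
    tags = parse_tag_list(record)
    if tags.get("k", "rsa") != "rsa" or not tags.get("p"):
        return False  # non-RSA key (e.g. k=ed25519), or an empty p= (revoked key)
    return rsa_key_bits(tags["p"]) < minimum_bits
```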
In reality, you want a DKIM signature from the author’s domain, meaning if I get an email from an address at xyz.com, I want to see a DKIM signature from xyz.com and not from some third party’s domain.
Third-party signatures are valid, but aren’t as well-received as author-domain signatures (or at least, unless you set up ATPS).
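In header terms, the distinction is whether the d= tag in the DKIM-Signature header equals the domain of the From: address. A small Python sketch, assuming you already have both header values as strings (the sample headers and the mailer.example domain are made up). It compares domains exactly and ignores the relaxed, organizational-domain matching that DMARC allows.

```python
from email.utils import parseaddr


def signing_domain(dkim_signature: str) -> str:
    """Pull the d= (signing domain) tag out of a DKIM-Signature header value."""
    for part in dkim_signature.split(";"):
        name, _, value = part.strip().partition("=")
        if name.strip() == "d":
            return value.strip().lower()
    return ""


def is_author_domain_signature(from_header: str, dkim_signature: str) -> bool:
    """True when the signature's d= matches the From: address's domain exactly."""
    _, address = parseaddr(from_header)
    author_domain = address.rpartition("@")[2].lower()
    return bool(author_domain) and signing_domain(dkim_signature) == author_domain


sig = "v=1; a=rsa-sha256; d=mailer.example; s=s1"
print(is_author_domain_signature("News <news@xyz.com>", sig))  # False: third-party signature
```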
By the way: please use a valid key and a good signature. Almost every DKIM failure I see is because of a bad key or a malformed signature.
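One concrete part of a signature that has to come out right is the bh= body hash: the signer canonicalizes the body and hashes it, and the verifier must arrive at identical bytes. The Python sketch below implements the two body canonicalizations as I read RFC 6376 section 3.4, assuming a text body whose lines end in CRLF or LF and which encodes as UTF-8. The function names are my own, and header canonicalization and the RSA signing itself are left out.

```python
import base64
import hashlib
import re


def canonicalize_body(body: str, mode: str = "relaxed") -> bytes:
    """DKIM body canonicalization, "simple" or "relaxed"."""
    lines = body.replace("\r\n", "\n").split("\n")
    if mode == "relaxed":
        # Collapse runs of spaces/tabs to one space and drop trailing whitespace.
        lines = [re.sub(r"[ \t]+", " ", line).rstrip(" ") for line in lines]
    elif mode != "simple":
        raise ValueError(f"unknown canonicalization: {mode}")
    while lines and lines[-1] == "":  # ignore empty lines at the end of the body
        lines.pop()
    if not lines:
        # An empty body is a single CRLF under "simple" and empty under "relaxed".
        return b"\r\n" if mode == "simple" else b""
    return ("\r\n".join(lines) + "\r\n").encode()


def body_hash(body: str, mode: str = "relaxed") -> str:
    """bh= value for a=rsa-sha256: base64 of SHA-256 over the canonical body."""
    digest = hashlib.sha256(canonicalize_body(body, mode)).digest()
    return base64.b64encode(digest).decode()
```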
And as the final part of the pretty standard trio, there’s DMARC, which stands for Domain-based Message Authentication, Reporting, and Conformance. It is, again, a DNS TXT record (they really love these, don’t they?) that lists some extra details about SPF and DKIM: whether you need a valid result from the exact domain or subdomains are also acceptable, whether failed messages should be rejected, held for inspection (quarantined), or left alone, and addresses that receiving MTAs can send incident reports to.
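Here is a small Python sketch of reading those settings out of a DMARC record, assuming the TXT record at _dmarc.<domain> has already been fetched. The defaults for adkim and aspf (relaxed) and for sp (inherit p) follow my reading of RFC 7489; treating a missing p= as none is a simplification, and the report address in the example is hypothetical.

```python
def parse_dmarc(record: str) -> dict:
    """Parse a DMARC TXT record into its main policy settings."""
    tags = {}
    for part in record.split(";"):
        name, _, value = part.strip().partition("=")
        if name:
            tags[name.strip().lower()] = value.strip()
    if tags.get("v") != "DMARC1":
        raise ValueError("not a DMARC record")
    policy = tags.get("p", "none")  # none / quarantine / reject
    return {
        "policy": policy,
        "subdomain_policy": tags.get("sp", policy),
        "adkim": tags.get("adkim", "r"),  # r = relaxed (subdomains OK), s = strict (exact)
        "aspf": tags.get("aspf", "r"),
        # rua= lists where aggregate reports should be sent.
        "aggregate_reports_to": [u.strip() for u in tags.get("rua", "").split(",") if u.strip()],
    }


print(parse_dmarc("v=DMARC1; p=reject; adkim=s; rua=mailto:dmarc-reports@xyz.com"))
```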
Because of how DMARC works, a message only needs SPF or DKIM to pass for an aligned domain, so even though more messages than not get an SPF FAIL result, DMARC still passes as long as the DKIM signature is good.
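The “either one is enough” rule is easy to see in code. A Python sketch, assuming the SPF and DKIM results are already known; the organizational-domain helper is a loud simplification (last two labels) where real DMARC implementations consult the Public Suffix List, and the domains are illustrative.

```python
def org_domain(domain: str) -> str:
    # ASSUMPTION: approximates the organizational domain as the last two labels.
    # Real DMARC code uses the Public Suffix List (this gets e.g. co.uk wrong).
    return ".".join(domain.lower().rstrip(".").split(".")[-2:])


def aligned(author_domain: str, checked_domain: str, mode: str = "r") -> bool:
    if mode == "s":  # strict: exact match required
        return author_domain.lower() == checked_domain.lower()
    return org_domain(author_domain) == org_domain(checked_domain)  # relaxed


def dmarc_result(author_domain, spf_result, spf_domain, dkim_result, dkim_domain,
                 aspf="r", adkim="r") -> str:
    """DMARC passes if SPF *or* DKIM passes for a domain aligned with the From: domain.

    spf_domain is the envelope (MAIL FROM) domain; dkim_domain is the signature's d=.
    """
    spf_ok = spf_result == "pass" and aligned(author_domain, spf_domain, aspf)
    dkim_ok = dkim_result == "pass" and aligned(author_domain, dkim_domain, adkim)
    return "pass" if spf_ok or dkim_ok else "fail"


# SPF fails, but an aligned, valid DKIM signature still carries the message.
print(dmarc_result("xyz.com", "fail", "bounces.mailer.example", "pass", "xyz.com"))  # pass
```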
However, a few do produce DMARC failures, including emails sent by the creator of UnrealIRCd to the announcements mailing list.
And on the topic of those reports I mentioned: please try not to send reports for messages sent to your DMARC reporting address. That will just cause an infinite loop of reports being sent back and forth, unless someone has the sense to tell their filter to disregard mail sent to its own reporting address.
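The guard is cheap to implement on the reporting side. A hypothetical Python sketch, assuming each received message is represented as a dict with "to" and "from_domain" keys (my own stand-in data model, not any real reporting tool’s format):

```python
from collections import Counter


def reportable_counts(messages, own_report_addresses):
    """Count received messages per From: domain for an aggregate report,
    skipping anything addressed to our own DMARC reporting mailboxes."""
    own = {address.lower() for address in own_report_addresses}
    counts = Counter()
    for msg in messages:
        if msg["to"].lower() in own:
            # This is someone else's report *to* us; reporting on it starts a loop.
            continue
        counts[msg["from_domain"].lower()] += 1
    return counts
```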
As an additional note, I find it funny that Gmail gave me a reject the other day, because the DMARC aggregate report I tried to send hit an inbox that was suspended for receiving too many emails, and the address was literally
Now, onto some more technical details.
There are two classes of error response codes in SMTP, the protocol that MTAs talk in:
4xx codes, like 431, and
5xx codes, like 554.
A 4xx code is a “transient” failure, which means “please try again later”.
A 5xx code is a permanent failure: don’t try again, and send a non-delivery report to the sender if you can.
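The sending side only needs the first digit to know what to do. A minimal Python sketch of that decision (the function name and action labels are my own; 2xx and 3xx are included only for completeness):

```python
def classify_reply(code: int) -> str:
    """Map an SMTP reply code to what the sending MTA should do next."""
    if not 200 <= code <= 599:
        raise ValueError(f"not an SMTP reply code: {code}")
    return {
        2: "success",      # 2xx: accepted, carry on
        3: "continue",     # 3xx: intermediate, e.g. 354 after DATA
        4: "retry-later",  # 4xx: transient; keep the message queued and wait
        5: "bounce",       # 5xx: permanent; stop and send a non-delivery report
    }[code // 100]


print(classify_reply(431))  # retry-later
print(classify_reply(554))  # bounce
```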
Please don’t mix these up.
If I give you a 5xx code, that means do not try again. Period.
Additionally, if I give you a 4xx code, please, wait.
Traditionally, a technique called greylisting blocks a new sender (with a 4xx code) for 5 minutes before allowing the message through.
I had to shorten that to one minute, because multiple services would retry every 45 seconds or so, but only three times, and then give up.
Some retry immediately, leading to three rapid-fire retries, and then an IP ban because you disobeyed my directions: wait.
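Here is roughly what the receiving side of greylisting looks like, as a hypothetical Python sketch: the first attempt for a given (client IP, sender, recipient) triplet is deferred with a 4xx, and retries are accepted once the delay (60 seconds here, matching the shortened timer) has passed. Keying on the exact client IP is an assumption mirroring my setup; other implementations may group addresses by subnet.

```python
import time


class Greylist:
    """Minimal greylisting sketch keyed on (client IP, sender, recipient)."""

    def __init__(self, delay_seconds: float = 60.0, clock=time.monotonic):
        self.delay = delay_seconds
        self.clock = clock  # injectable so the behaviour can be tested
        self.first_seen: dict[tuple[str, str, str], float] = {}

    def check(self, client_ip: str, sender: str, recipient: str) -> str:
        key = (client_ip, sender.lower(), recipient.lower())
        now = self.clock()
        first = self.first_seen.setdefault(key, now)  # remember the first attempt
        if now - first < self.delay:
            return "451 4.7.1 Greylisted, try again later"
        return "250 2.1.5 OK"
```

Because the key includes the client IP, the same message retried from a second IP starts a fresh timer, which is exactly the load-balancing problem described next.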
Even better, some mailers I’ve seen use multiple servers in a load-balancing setup, like Twitter’s.
This is cool, if it’s done right.
If your system sends each retry from a different server, that resets the timer (mine, at least, checks against the client IP), and you have an issue.
Even better, do not mix multiple servers with no-delay retries!
That, to me, shows that whoever configured your system isn’t as experienced as you likely thought they were.
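On the sending side, both problems are avoidable. A hypothetical Python sketch of the two fixes: pin each queued message to one outbound IP so every retry comes from the same address (and hits the same greylisting timer), and wait with exponential backoff between attempts instead of retrying immediately. The delays, the IP pool, and the function names are illustrative assumptions, not any particular MTA’s defaults.

```python
import zlib


def outbound_ip_for(message_id: str, pool: list[str]) -> str:
    """Pin a message to one outbound IP so every retry comes from the same address."""
    # crc32 rather than hash(): Python's str hash changes between processes.
    return pool[zlib.crc32(message_id.encode()) % len(pool)]


def retry_schedule(first_delay=300.0, factor=2.0, max_delay=3600.0, lifetime=5 * 86400):
    """Yield retry times (seconds after the first attempt) with exponential backoff."""
    elapsed, delay = 0.0, float(first_delay)
    while elapsed + delay <= lifetime:
        elapsed += delay
        yield elapsed
        delay = min(delay * factor, max_delay)


print(list(retry_schedule(60, 2, 600, 2000)))  # [60.0, 180.0, 420.0, 900.0, 1500.0]
```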
Also, please, when I enter my email address with, say, mail.tdstoragebay.com as the domain, do NOT send it to tdstoragebay.com instead. tdstoragebay.com DOES NOT HAVE AN MX RECORD.
I am looking at you, certain unnamed bank.
For an added bonus, have your support team say “well the email was delivered, what difference does it make?”
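For reference, a sending MTA should use the recipient domain exactly as written: look up its MX records and, only if there are none, fall back to the domain’s own address records (the implicit MX rule in RFC 5321 section 5.1). Nothing in that process strips a subdomain. A Python sketch with the DNS lookups replaced by plain dictionaries (a hypothetical stand-in) and example domains:

```python
def delivery_hosts(address, mx_records, address_records):
    """Hosts to try, in order, for a recipient address.

    mx_records maps a domain to [(preference, host), ...] and address_records
    maps a name to its A/AAAA addresses; both stand in for real DNS lookups.
    """
    domain = address.rpartition("@")[2].lower().rstrip(".")  # the domain exactly as given
    mxes = mx_records.get(domain)
    if mxes:
        # Lowest preference first; ties are simply sorted by name here, although
        # RFC 5321 asks clients to pick randomly among equal preferences.
        return [host for _, host in sorted(mxes)]
    if address_records.get(domain):
        return [domain]  # no MX at all: fall back to the domain itself ("implicit MX")
    return []  # nowhere to deliver: bounce, don't reroute to a parent domain
```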
Also, for the unaware, an SMTP transaction goes through a number of distinct phases: a client connects, introduces itself with a HELO, gives the address the email was sent from (MAIL FROM), then the addresses it’s going to (RCPT TO), and then sends a DATA command, which instructs the receiver to process the actual email contents.
Theoretically, your email may be rejected at any step of this process: a client may be rejected after its HELO, or a server may block email sent from a certain address. Since every command gets a response, every command can be given an error code.
Postfix even has a specific configuration setting, smtpd_delay_reject, to delay rejections until after a client has started listing recipients, because apparently most MTAs just can’t deal with a rejection any earlier.
Even better, some don’t care.
I’ve seen multiple transaction logs where you see this sort of flow:
- Mail from X
- Recipient is y: 554 Blocked by spam filter
- Data (sent anyway): 554 Error: no valid recipients
Or in other words: the sending side just didn’t care about the error and tried sending the email anyways. That is… stupid. I have no clue how you configure a proper MTA to ignore an error and continue on.
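Handling this properly isn’t exotic: check every reply, and if no recipient was accepted, don’t send DATA at all. A hypothetical Python sketch of the client side, where send_command stands in for whatever writes one SMTP line and reads back the (code, text) reply; dot-stuffing and multi-line replies are left out, so it is not a real SMTP client.

```python
def send_transaction(send_command, sender, recipients, body):
    """Run one SMTP mail transaction, honouring every error reply along the way.

    send_command(line) is a hypothetical callable that sends one command and
    returns the server's (code, text) reply.
    """
    code, text = send_command(f"MAIL FROM:<{sender}>")
    if code // 100 != 2:
        return ("rejected at MAIL FROM", code, text)

    accepted = []
    for rcpt in recipients:
        code, text = send_command(f"RCPT TO:<{rcpt}>")
        if code // 100 == 2:
            accepted.append(rcpt)
        # A 4xx/5xx here applies to this recipient only: a real MTA would requeue
        # (4xx) or bounce (5xx) it, and it is never counted as accepted.
    if not accepted:
        send_command("RSET")  # abandon the transaction instead of sending DATA anyway
        return ("no recipients accepted", code, text)

    code, text = send_command("DATA")
    if code != 354:
        return ("rejected at DATA", code, text)
    code, text = send_command(body + "\r\n.")  # message content, ended by a lone "."
    return ("sent" if code // 100 == 2 else "rejected after DATA", code, text)
```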
And for one final point: use actually common, standard TLS ciphers. Amazon SES is particularly guilty of this: no email I’ve received from SES has come through an encrypted channel, because the handshake fails with a “no shared cipher” error. In layman’s terms, my mail server and Amazon’s mail server each have a list of encryption mechanisms they can use and understand, and not one item is common to both lists. I know I run a somewhat restricted cipher list: I block some of the low-security ones like RC4, and I try to keep up with industry standards. Amazon, and others as well, just can’t seem to manage that. Luckily, when they can’t send over TLS, they fall back to plain-text delivery, but that fallback alone is what’s preventing me from deploying a newer standard called MTA-STS (it’s like HSTS, but for email). Since SES can’t deliver to me over TLS in the first place, any company that uses SES as its email service would become a company I’m unable to receive email from.
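To show why the fallback is the sticking point, here is a simplified Python sketch. It models cipher negotiation as the server picking the first entry in its own preference list that the client also offers (real TLS stacks can be configured either way), and it reads the mode from an MTA-STS policy file, the plain key: value text a domain serves at https://mta-sts.<domain>/.well-known/mta-sts.txt. The cipher names and policy contents are example values, and the function names are my own.

```python
def negotiate_cipher(client_offers, server_preferences):
    """Return the first of the server's preferred ciphers the client also offers, or None."""
    offered = set(client_offers)
    for name in server_preferences:
        if name in offered:
            return name
    return None  # the handshake fails with "no shared cipher"


def parse_sts_policy(text):
    """Parse an MTA-STS policy file (key: value lines; 'mx' may repeat)."""
    policy = {"mx": []}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "mx":
            policy["mx"].append(value)
        elif key:
            policy[key] = value
    return policy


def delivery_decision(cipher, policy):
    """What a sender honouring MTA-STS should do after the TLS negotiation."""
    if cipher is not None:
        return "deliver over TLS"
    if policy.get("mode") == "enforce":
        return "do not deliver"  # no plain-text fallback under an enforced policy
    return "deliver in plain text"  # the fallback that happens today
```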
Major companies and email services just do not know how email works, how to set it up, and how to secure it properly. And half the time, I really just cannot understand how you get it that wrong.