State of art

Brian Candler

2005-03-24 18:06:49 UTC

OK, a bit more googling around and I found www.im2000.org (it wasn't linked
from DJB's original paper) and from there
http://homepages.tesco.net./~J.deBoynePollard/Proposals/IM2000/

Is this the current 'state of the art' with regard to these proposals, or is
there anything more up-to-date / more detailed / competing?

I would like to make a few observations and raise a few questions re. what
I've seen so far.

(1) There is an existing protocol which allows people to authenticate to a
message store and put messages there; which allows people to retrieve a
message given its message-id; and which lets the user query whether new
messages have arrived using a sequence number. That protocol is NNTP.
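To make the comparison concrete, here is a minimal in-memory sketch of the
three operations listed in (1): authenticated posting, retrieval by
message-id, and a "what is new since sequence number N" query. It is not
NNTP and not IM2000; the class and method names are made up for
illustration only.

```python
import itertools


class ToyMessageStore:
    """Hypothetical in-memory stand-in for the three operations in (1)."""

    def __init__(self, accounts):
        self._accounts = dict(accounts)   # username -> password
        self._by_id = {}                  # message-id -> (sequence number, body)
        self._seq = itertools.count(1)

    def post(self, user, password, message_id, body):
        # Only authenticated users may put a message in the store.
        if user not in self._accounts or self._accounts[user] != password:
            raise PermissionError("authentication failed")
        seq = next(self._seq)
        self._by_id[message_id] = (seq, body)
        return seq

    def fetch(self, message_id):
        # Retrieval by message-id; raises KeyError if the id is unknown.
        return self._by_id[message_id][1]

    def new_since(self, last_seen):
        # Message-ids with a sequence number above the caller's last-seen
        # mark, oldest first.
        fresh = [(seq, mid) for mid, (seq, _) in self._by_id.items()
                 if seq > last_seen]
        return [mid for _, mid in sorted(fresh)]
```

A reader that remembers the highest sequence number it has seen can call
new_since() with that number and then fetch() each returned message-id.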

The example given at
http://homepages.tesco.net./~J.deBoynePollard/Proposals/IM2000/CaseStudies/public-mailing-list.html
sounds *very* much like running your own NNTP server to host your mailing
lists.

NNTP server software might form the basis of a prototype implementation, if
not the real thing; NNTP servers have the additional benefit of being able
to replicate to other NNTP servers (e.g. if lists.ietf.org wanted to have
some mirrors around the world), and of having been heavily load tested by
carrying gigabytes of pr0n and war3z every day :-)

I see on "djb's questions" that this has been ruled out. But it doesn't hurt
to play.

(2) Continuing with mailing lists, there is no mention of RNASP on that
page, so I guess that in the mailing list scenario, the message store is not
expected to send out notifications to all the list members?

Does this mean that there are essentially two different types of message
store account, those intended for personal use and those intended for lists
(perhaps set by some sort of flag?) Or is posting a message to a mailing
list effectively the same as posting a mail with zero recipients?

Is there never any case where a user can post a message via their own local
message store? (e.g. by sending a mail via their own message store to
"upgrade-to-***@lists.ietf.org"; the mailing list server receives a
notification, collects the message, and puts it in its own message store for
public consumption)

If that's not true, then either:
- mailing list message stores are fully open to the public to post.
In that case, what (if anything) stops these message stores becoming
bogged down with spam a la USENET? Especially if public cancels are
disabled?
- you must get an account (username/password) on the mailing list message
store before you're allowed to post, which forces you to go through some
sort of registration procedure, and store the username/password on your
client. That just makes life harder for people who want to join lists,
although an easy hurdle for mailing list spammers to overcome.

(3) Now, to the meat. Looking at a typical transaction at
http://homepages.tesco.net./~J.deBoynePollard/Proposals/IM2000/CaseStudies/lucy-reading.html

I observe that:
- Lucy is manually filtering her mail
- Lucy's client still has to retrieve the headers of the mail from the
spammer's server, in order to decide if it's spam or not
- By doing so, Lucy has immediately confirmed to the spammer that her
account works and she is reading from it
- Lucy can choose to block notifications from a particular message store;
she could also subscribe to a blacklist of known spammer message stores.
This is exactly the same situation as today where we have IP blacklists
of SMTP sources. She can also blacklist a tuple of <message store,
user ID>

So: how, exactly, is life made harder for spammers, and easier for
legitimate users?

Let me put myself in a spammer's shoes for a minute (holding my nose to
avoid the smell). What would I do?

(a) Set up my own message store, and send notifications to every E-mail
address I could find. This has the nice advantage that I only need to keep
*one* copy of the spam on my server! However, people might start building
bulk-detectors like DCC, in which case I might prefer to keep a separate
copy for each recipient, each of which is subtly different (disk space is
cheap, after all; a 250GB hard drive holds a lot of spam). Users may block
notifications based on the user ID I use on the message store, but then I
just create new user IDs. Ultimately I may find my message store is
blacklisted by its IP address.

But for as long as it works, people are forced to find notifications in
their inbox of the spam from me, and at least look at the headers, as they
do now. Or they'll build content filters which behind-the-scenes download
the mail and decide to chuck it or keep it. Same as now.

(b) I use my ISP's message store; they provide me with this service as part
of my Internet access. That would be equivalent in the current world of
relaying SMTP via my ISP's smarthost. Now, the world cannot block
notifications from that mail store, because in doing so they would block
notifications from legitimate users too.

They *could* however block notifications based on the user ID, not just the
IP address of the message. The assumption is that a well-run ISP will not
give out new user IDs repeatedly to the same physical person. (That may or
may not be true, especially if 'hotmail' services continue to exist).

This is the first real advantage that the new architecture would have. It's
more or less the same as if all mail submissions to an ISP's smarthost were
required to use SMTP AUTH, and this information were carried along with the
message. IM2000's advantage here is that messages never go through multiple
hops, and therefore you don't need to worry about authentication information
being forged. But see below.

(c) OK, so my own message store is blacklisted, and I can't use my ISP's
mailstore because my user ID is blacklisted. So, now I go to my network of
0wned zombie machines. I install message stores on all of those, and I send
notifications from them. This is the same in the current world of sending
SMTP from those boxes. Those boxes may send notifications directly, or they
may relay via the ISP's smarthost. Again, people can blacklist by IP if they
receive them directly, or can blacklist the message store user ID.

This is almost the same as the current world, except that it's hard to
blacklist mail which spews via an ISP's smarthost. To fix this, users would
have to use SMTP AUTH to talk to their smarthost, and SMTP would have to
have some way of carrying the SMTP AUTH information forwards through the
smarthost.

But actually it does: RFC 2554 defines an extension
MAIL FROM:<***@bibble> AUTH=***@flowerpot1.org

The AUTH information is only as trustworthy as the server you receive it
from, of course, and ISTM this is actually the fundamental point where
IM2000 wins.

But if SMTP AUTH were widely used, it would be sufficient for our purposes:
- blacklist IP X where it's clear that IP X is a spammer or bot machine
- blacklist IP Y + AUTH ID Z where it's clear that IP Y is an ISP
smarthost which has not been compromised, and AUTH ID Z is the account
being used by the spammer or bot.
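The two rules above amount to a two-level lookup. A minimal sketch,
assuming the receiver already has the connecting IP address and the AUTH
identity from the MAIL FROM line in hand (the function name and both
blacklist sets are hypothetical):

```python
def should_reject(source_ip, auth_id, bad_ips, bad_ip_auth_pairs):
    """Return True if a message should be refused under the two rules.

    bad_ips           -- IPs known to be spammer or bot machines (rule 1)
    bad_ip_auth_pairs -- (smarthost IP, AUTH id) pairs for abusive accounts
                         on otherwise trustworthy smarthosts (rule 2)
    """
    if source_ip in bad_ips:
        return True
    return (source_ip, auth_id) in bad_ip_auth_pairs
```

With ("198.51.100.25", "spammer42") in the pair list, mail from that
account is refused while other customers of the same smarthost are
unaffected.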

Bingo. So the only problem left is to convince the whole world to stop
running smarthost services on port 25 which relay based on source IP only,
and make everyone use SMTP AUTH. And that, at the end of the day, seems to
be the biggest hurdle.

The IM2000 approach can then be considered to be like starting the Internet
from scratch where everyone uses SMTP AUTH. At that point, an incoming
message could be rejected if the MAIL FROM line didn't carry an AUTH string;
and anyone who was running a relay which faked up AUTH strings would be
blacklisted by IP address.

A similar effect could be achieved without the AUTH extension, if submission
MTAs forced the Sender: header to the SMTP AUTH id. But then you'd have to wait
until the end of the DATA phase to check it.

So in summary:
- When spammers install IM2000 servers, there are very few reasons why it
would make their lives any harder. They will *save* storage costs, and they
will *save* bandwidth for undeliverable spams (just sending notifications
instead)
- Users will still be forced to download headers or bodies, and to filter
them; and in doing so they will give out useful information about the
actual IP address they are on at that instant, and the fact that they have
read the message
- The one possible advantage I can see is that it requires users to
authenticate to message stores. Untrusted message stores can be blacklisted
by IP (just as now), but trusted message stores can be blacklisted by
<IP,userID> tuple, thus not affecting non-spammers on that store. But this
will only work if users cannot easily signup for free message store
accounts. And if the world continues to provide "hotmail"-type services
which allow sending as well as receiving, then clearly spammers can get as
many userIDs as they wish (until the whole "hotmail" service itself is
blacklisted for not controlling its userbase properly, which is again back
to where we are today)
- We will still rely on the good people who run public blacklist services if
we want our mail to remain relatively spam-free. And those blacklist
services tend to rely on donations, and are always at risk of legal threats,
violence and extortion, and lack of funds.

So to me, the 'brave new world' of IM2000 doesn't look all that much rosier
than we have today. Have I missed something?

(4)
http://homepages.tesco.net./~J.deBoynePollard/Proposals/IM2000/djb-answers.html

"Recipients are identified by mailbox names, just as they are with the
SMTP-based Internet mail system. Message stores use the DNS to determine,
from a mailbox name, where the RNASP service of each recipient's recipient
notification agent is to be found. In particular, they perform a SRV
resource record lookup on the domain name portion of the recipient's mailbox
                              ^^^^^^^^^^^^^^^^^^^
name."

There is a huge opportunity here not to be missed - the ability to make the
whole E-mail address portable, not just the domain part. If you're going to
rebuild the world, why not arrange that mail to ***@pobox.com does a
DNS lookup for b.candler._at.pobox.com ?

This means:
- different users at the same domain could have different receipt
notification agents
- senders could check recipient validity before even sending an RNASP
notification

Wildcard DNS could be used if you want *._at.pobox.com to all hit the same
receipt notification server.
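As a sketch of the name mapping suggested here (this is the proposal in
this post, not anything in the IM2000 documents; the function names are
made up, and local parts that are not valid DNS labels are not handled):

```python
def notification_lookup_name(address, label="_at"):
    """Map local@domain to local.<label>.domain for the DNS lookup."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError("not a mailbox name: %r" % (address,))
    return "%s.%s.%s" % (local, label, domain)


def wildcard_name(domain, label="_at"):
    """Owner name of the wildcard record covering every user at a domain."""
    return "*.%s.%s" % (label, domain)
```

An address with local part b.candler at pobox.com maps to
b.candler._at.pobox.com, and the wildcard record for that domain would be
published at *._at.pobox.com.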

If you wanted this to work, you'd have to register your new address with the
recipient notification service (otherwise incoming notifications for
***@pobox.com would be refused with "I don't know that address"). But
essentially, forwarding services like pobox.com would become nothing more
than DNS hosting services.

(5) Has any thought been given as to how often notifications would be sent,
and for how long? If sending to someone who only collects their mail once a
week, how much bandwidth will be eaten up by useless notifications?
(Presumably, recipient notification agents are not allowed to 'silence' the
sending of these notifications on behalf of the receiver, because they are
unreliable devices that don't guarantee to preserve state).

(6)
http://homepages.tesco.net./~J.deBoynePollard/contacting-the-author.html#IM2000
I'd like to send a mail to the author using IM2000, but in order to do so I
need to have a copy of the technical specs. Do any exist yet?

Well, that's enough for now...

Cheers,

Brian.

Brian Candler

2005-03-24 18:33:54 UTC

Post by Brian Candler
I'd like to send a mail to the author using IM2000, but in order to do so I
need to have a copy of the technical specs. Do any exist yet?

Oops, I missed the links from
http://homepages.tesco.net./~J.deBoynePollard/Proposals/IM2000/Architecture/

I'll chomp through this in my own time.

Now, if someone would please kill the qsecretary...

Brian.

Erich Rickheit KSC

2005-03-24 21:36:30 UTC

Post by Brian Candler
OK, a bit more googling around and I found www.im2000.org (it wasn't linked
from DJB's original paper) and from there
http://homepages.tesco.net./~J.deBoynePollard/Proposals/IM2000/
Does this mean that there are essentially two different types of message
store account, those intended for personal use and those intended for lists
(perhaps set by some sort of flag?)

Even so. And yes, these do look a lot like USENET groups, but I
will observe one major difference: there is no central list of im2k
groups, as there is in USENET. A porn spammer can't get a list of
groups matching '*sex*' and start spamming to alt.lawns.grassexperts;
he has to discover the group. This is (a bit) harder than discovering
mail addresses. Also, there is no store-and-forward; a spammer is not
despoiling a commons; he is attacking a private resource, so different
responses become available.

I see three broad classes of im2k groups:

Perfectly open. Anyone can post or read. Yep, those will get
filled with spam, no way out of it. A list owner can implement
cancelling any way they please; that's outside the scope of the
protocol. This is a public USENET group. (public cancels were
never part of the NNTP protocol)

Perfectly private. Login required to post, login required to read.
This is a closed mailing list; the difference being that one polls
it when one is interested, rather than having messages delivered
to you.

Publicly readable. Anyone can read, login required to post.
This looks like an announce list; or a blog; or a moderated list
(where messages for moderation enter out of band); or an audience
watching a round-table discussion.

(the fourth sort, anyone can post, login required to read, is a
plain old mailbox)

Post by Brian Candler
Is there never any case where a user can post a message via their own local
message store? (e.g. by sending a mail via their own message store to
"upgrade-to-***@lists.ietf.org"; the mailing list server receives a
notification, collects the message, and puts it in its own message store for
public consumption)

Of course there is; but that happens outside of this protocol.

Post by Brian Candler
- you must get an account (username/password) on the mailing list message
store before you're allowed to post, which forces you to go through some
sort of registration procedure, and store the username/password on your
client. That just makes life harder for people who want to join lists,
although an easy hurdle for mailing list spammers to overcome.

That's called subscription; you have to do that for mailing lists
now. And I don't see why telling your mail client

Post by Brian Candler
So: how, exactly, is life made harder for spammers, and easier for
legitimate users?

Essentially, in that a spammer has to have a message store and bandwidth.

If he keeps it himself, he will quickly find it blacklisted
If he gets it through a responsible ISP, they will can his account when
they discover him to be a spammer
If he gets it through an irresponsible ISP, the ISP will find itself
blacklisted, and feel pressure to can his account.

Is this a hardship? It's based on the assumption that the cost of
people retrieving spam is significant. If I send out ten million
notifications, do I get ten million hits? That would make spam less
cost-efficient. Will I get a million hits? A thousand? Ten? This
statistic is what would make the difference.

Post by Brian Candler
(c) OK, so my own message store is blacklisted, and I can't use my ISP's
mailstore because my user ID is blacklisted. So, now I go to my network of
0wned zombie machines. I install message stores on all of those, and I send
notifications from them.

OK, you can send notifications from your zombies to your heart's
content. But your message only gets to people if that particular
zombie is up and able to serve it if, as, and when they request it.
(again, I need some statistics here)

The real mechanism here is that notifications distinguish between
mail stores identified with IP addresses, and stores distinguished
with domain names. My software has an easy policy:

whitelist for IP addresses. Only go to mail stores identified by
IP addresses if I have a previous arrangement with them (id est,
they're a mailing list or something I care about)

blacklist for domain names. Only someone who can publish a SRV
record for a domain can set up a message store for that domain;
now I can distinguish good domains from bad domains, as we've
discussed.
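A minimal sketch of that two-sided policy, assuming the notification
names its message store either as an IP address literal or as a domain
name (the function name and both lists are hypothetical):

```python
import ipaddress


def should_fetch(store_host, ip_whitelist, domain_blacklist):
    """Decide whether to contact the message store named in a notification.

    IP-addressed stores are default-deny (whitelist); domain-named stores
    are default-allow (blacklist).
    """
    try:
        ip = ipaddress.ip_address(store_host)
    except ValueError:
        # Not an IP literal, so treat it as a domain name.
        return store_host.lower() not in domain_blacklist
    allowed = {ipaddress.ip_address(a) for a in ip_whitelist}
    return ip in allowed
```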

Post by Brian Candler
So to me, the 'brave new world' of IM2000 doesn't look all that much rosier
than we have today. Have I missed something?

Aside from those things based on statistical notions:

Forging mail becomes harder. I know where the notification comes
from, and I know where the mail store holding the mail is.

The spammer bears more of the cost of sending mail. An ISP can
charge a spammer for bandwidth to their message store. This, it is
hoped, moves us towards a postal-mail model, where senders of junk
mail actually pay their costs and help support the email systems,
rather than just leeching off it.

All this is the theory; I haven't yet decided whether I believe in it.

Post by Brian Candler
http://homepages.tesco.net./~J.deBoynePollard/contacting-the-author.html#IM2000
I'd like to send a mail to the author using IM2000, but in order to do so I
need to have a copy of the technical specs. Do any exist yet?

They're there, but there are gaps:

<http://homepages.tesco.net./~J.deBoynePollard/Proposals/IM2000/Architecture/>

Short form: use BER to deliver objects, defined in ASN.1 in those
articles. No one is answering questions about them, so either the
issue is dead, or this list is a red herring.

Erich

Brian Candler

2005-03-26 11:25:34 UTC

Post by Erich Rickheit KSC

That's called subscription; you have to do that for mailing lists
now. And I don't see why telling your mail client

What I meant was, presumably you have to go to some sort of registration
page where you choose a username and password for the list (or are assigned
one), and then paste this into your MUA.

It's certainly not hard; it could even be automated by going to a webpage
which sends you something back that configures your MUA automatically - the
Windows world has this in the form of .INS files.

But the easier you make it, the easier it is for spammers to join these
groups, and therefore the list may as well have been publicly open in the
first place.

You're right that there's no official central directory of mailing lists as
there is for USENET; but that doesn't stop spammers collating their own
lists of them, as they do with E-mail addresses.

Post by Erich Rickheit KSC

Post by Brian Candler
So: how, exactly, is life made harder for spammers, and easier for
legitimate users?

But in today's world it's the same:
- If a spammer sends using SMTP from his own box, he will quickly find it
blacklisted
- If a spammer sends through a responsible ISP's smarthost, they will
process the abuse complaints, tie his IP address to a sending account
(maybe via RADIUS logs), and can his account
- If a spammer sends through an irresponsible ISP, the ISP will find itself
blacklisted, and feel pressure to can his account

And you forgot to add (in both cases):
- the spammer will then sign up for a new account, unless the ISP prevents
this somehow

However, there is one improvement in the IM2000 world: the
<ISP message store, account ID> tuple can be blacklisted. That reduces the
need for the ISP to behave responsibly or promptly to abuse complaints; the
blacklist manager does that for him. And that's a good thing: with
blacklists you have a choice of which blacklist to subscribe to. You can't
change the ISP that the spammer sent through, unfortunately.

(Also, ISPs are not inclined to cancel accounts, as it reduces revenue. They
are more likely to give a rap on the knuckles and warn them not to do it
again)

Post by Erich Rickheit KSC
Is this a hardship? It's based on the assumption that the cost of
people retrieving spam is significant. If I send out ten million
notifications, do I get ten million hits? That would make spam less
cost-efficient. Will I get a million hits? A thousand? Ten? This
statistic is what would make the difference.

I'm not sure I see the bandwidth difference compared to sending out ten
million spams via SMTP.

Remember, the spammer can write their own IM2000-compliant software. If it
were me, I would write software which keeps one copy of the spam, but sends
out notifications with ten million different message store account IDs (so
it looks like they're from different people). Only those recipients which
were active would come and download the mail from my server - saving me
potentially tons of bandwidth compared with sending out ten million copies
via SMTP which may fail to be ultimately delivered. I could install the same
software on 0wned machines, and use less bandwidth out of those too (not
that I care).
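The arithmetic behind that claim can be sketched as follows. The sizes
and the number of recipients who actually fetch are invented inputs, not
measurements; only the shape of the comparison matters.

```python
def smtp_push_bytes(recipients, message_size):
    # SMTP push: one full copy is sent per recipient, read or not.
    return recipients * message_size


def im2000_pull_bytes(recipients, notification_size, message_size, fetches):
    # IM2000 pull: a small notification per recipient, plus one full copy
    # for each recipient who actually comes back to fetch the message.
    return recipients * notification_size + fetches * message_size
```

For ten million recipients, a 5,000-byte message, a 200-byte notification
and 100,000 fetches (all assumed figures), the push model sends 50 GB and
the pull model 2.5 GB.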

There are real advantages I can see though. Firstly, those "pull message"
hits are spread over time, as people check their mailboxes periodically.
That gives time for the message store holding the spam to be either
blacklisted or cleaned.

People who are on-line 24 hours per day may suffer the same with receipt
notifications appearing in their inbox, but those who connect only
intermittently will benefit.

DCC-type services, or automated spam detectors using spamtrap mailboxes,
could work more effectively.

There is the opportunity for "graylisting" to work more effectively too.
That is, if I get a receipt notification from a <message store, userID> that
I've never seen before, I hide it from the MUA for a few hours. In that
time, hopefully, if it's spam the server or account will have been
blacklisted. And unlike MAIL FROM, the <message store, userID> combination
cannot usefully be forged.
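A rough sketch of that graylisting rule, with time passed in explicitly
as seconds so the logic is easy to follow (the class name, the hold
period and the shape of the blacklist are all assumptions):

```python
class Greylist:
    """Hide notifications from unfamiliar <message store, userID> pairs."""

    def __init__(self, hold_seconds):
        self.hold_seconds = hold_seconds
        self._first_seen = {}   # (store, user_id) -> time first seen

    def visible(self, store, user_id, now, blacklist=frozenset()):
        key = (store, user_id)
        if key in blacklist:
            # Blacklisted while we were waiting: never show it.
            return False
        first = self._first_seen.setdefault(key, now)
        # Show the notification only once the sender has been known for
        # at least the hold period.
        return now - first >= self.hold_seconds
```

The MUA would call visible() each time it refreshes the inbox; a sender
first seen long ago passes immediately, a new one only after the hold.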

More interestingly, there is the opportunity for blacklists to get *real
proof* of spamming, directly from the horse's mouth so to speak. As long as
I've received what appears to be a spam, but have not yet downloaded the
body, I can pass on the notification message to a blacklist agency. They can
pull down the message body themselves; it can be examined for spamminess (by
a human being probably); and a decision made to blacklist either the whole
mail store or just the account. They can store the whole session as direct
proof of where the spam originated, not something which has been via a third
party and therefore could be forged.

However, note that you *cannot* download the body first before reporting it
as spam. If I were a spammer, then I would write my sending software so that
if the body were downloaded, the message would vanish immediately from my
message store (*without* an un-pin request being received first). Then the
spam checker would go to the message store but the evidence would be gone.

It does mean you are only guaranteed to be able to submit your spam proof to
one blacklist, but that's probably sufficient; the whole point of spam is
that there's lots of it to go round :-)

But it could work the other way round. That is, for *every* mail I receive,
I can pass on the notification to a blacklist agency (that I trust!) which
allows them to download a copy, *without* unpinning it. An automated system
can then decide if it may be spam; a human can check; and if it is, they
perform the blacklisting. Non-spam remains pinned so I can read it.

It only needs a small percentage of netizens to cooperate in this way -
perhaps it only has to be the people who work in the blacklist organisations.
The benefit to me as an individual is I get my mailbox swept, but the
benefit to the net as a whole is that the blacklists get updated much more
quickly.

The advantage here over things like SpamAssassin is that each blacklist
agency can make their own rulesets for detecting spam, *and keep them
private* (whereas spammers can look at the source code for SpamAssassin and
work out how to tailor their spams to pass all the checks).

Post by Erich Rickheit KSC
OK, you can send notifications from your zombies to your heart's
content. But your message only gets to people if that particular
zombie is up and able to serve it if, as, and when they request it.
(again, I need some statistics here)

Yep, so I just came to the same conclusion as you. The zombies may remain
on-line for a long time, but they might get blacklisted quickly.

Post by Erich Rickheit KSC
The real mechanism here is that notifications distinguish between
mail stores identified with IP addresses, and stores distinguished

You have some IM2000 software? Where can I find it to play with? :-)

Post by Erich Rickheit KSC
whitelist for IP addresses. Only go to mail stores identified by
IP addresses if I have a previous arrangement with them (id est,
they're a mailing list or something I care about)
blacklist for domain names. Only someone who can publish a SRV
record for a domain can set up a message store for that domain;
now I can distinguish good domains from bad domains, as we've
discussed.

...although spammers could still get throwaway domains.

Post by Erich Rickheit KSC
The spammer bears more of the cost of sending mail. An ISP can
charge a spammer for bandwidth to their message store.

Can't an ISP charge for IP bandwidth used now, and bandwidth through their
SMTP server now? (If you were billing by SMTP usage then you would need to
tie the sender identity to a billing entity and sending time; probably SMTP
AUTH to be most reliable). But ISPs dislike charge-by-usage in general, (a)
because it's hard to bill reliably, and (b) because customers don't like it
and will go elsewhere.

Post by Erich Rickheit KSC
This, it is
hoped, moves us towards a postal-mail model, where senders of junk
mail actually pay their costs and help support the email systems,
rather than just leeching off it.
All this is the theory; I haven't yet decided whether I believe in it.

:-)

As I say, I can see how the new model would *reduce* the bandwidth that
spammers use (and pay for, if they're not stealing machines); I don't see
how it increases it. What I do see is ways for controlling spam that have
already been invented, to work *much* more effectively.

The main costs incurred in handling the spam problem actually end up in the
hands of the blacklists, and how they recover their costs (and deal with the
bullying and thuggery of spammers) is perhaps the biggest non-technical
problem.

Cheers,

Brian.