Discussion:
Comments on IM2000
Brian Candler
2005-03-29 15:11:07 UTC
Permalink
I've assembled my comments on IM2000 at
http://pobox.com/~b.candler/doc/misc/im2000.html

Regards,

Brian.
Raul Miller
2005-04-12 00:04:11 UTC
Permalink
Post by Brian Candler
http://pobox.com/~b.candler/doc/misc/im2000.html
I don't know that I agree with all your points there. Though I'd have
to think a bit more to decide for sure.

The interesting thing about this commentary is that it suggests a path
forward for the im2000 protocol.

The problem with im2000 has always been that SMTP is the standard
mail protocol, and no one really wants to deal with implementing an
alternative.

But, as a replacement for IMAP, all of a sudden the idea starts taking
life.

In principle, at least, you could have im2000 with smtp as a message
source, and you could build up a fairly nifty set of mail client
applications around it. You don't get any spam relief this way, but
you do get some of the other benefits if you use some variant as a mail
folder management system.

There's the obvious temptation to go overboard (I think using SOAP would
be going overboard, for example), but if it's kept simple and clean it
has potential.

Thanks,
--
Raul
Brian Candler
2005-04-12 09:32:50 UTC
Permalink
Post by Raul Miller
The problem with im2000 has always been that SMTP is the standard
mail protocol, and no one really wants to deal with implementing an
alternative.
But, as a replacement for IMAP, all of a sudden the idea starts taking
life.
In principle, at least, you could have im2000 with smtp as a message
source, and you could build up a fairly nifty set of mail client
applications around it. You don't get any spam relief this way, but
you do get some of the other benefits if you use some variant as a mail
folder management system.
And conversely, there will be others who want to use it as a cleaner mail
transport replacing SMTP, but keep their legacy IMAP applications with a
proxy to interface to the new architecture. Both paths are workable.

I don't think any new transport will gain acceptance unless it has some
tangible benefits, and probably the key one is anti-phishing - identifying
the sender unambiguously, preferably with their E-mail address.

Another problem I have with IM2000 is the repeated retransmissions of
receipt notifications, which may go on for a month or more if the
recipient is not checking their mail. But nobody is *really* going to make
the RNA not have persistent storage; rebooting an RNA and losing a day's
worth of notifications would be far too serious an event.

Hmm. Updated my web page with a few more musings, in particular the
separation of message stores from message notification agents (which might
appeal if you are considering IM2000 primarily as a replacement for IMAP
rather than SMTP).

Regards,

Brian.
Raul Miller
2005-04-12 11:47:02 UTC
Permalink
Post by Brian Candler
Post by Raul Miller
The problem with im2000 has always been that SMTP is the standard
mail protocol, and no one really wants to deal with implementing an
alternative.
And conversely, there will be others who want to use it as a cleaner mail
transport replacing SMTP, but keep their legacy IMAP applications with a
proxy to interface to the new architecture. Both paths are workable.
You are right, as a "within-enterprise" system, im2000' could probably
offer something early in the adoption cycle.
Post by Brian Candler
I don't think any new transport will gain acceptance unless it has some
tangible benefits, and probably the key one is anti-phishing - identifying
the sender unambiguously, preferably with their E-mail address.
The best it can do is unambiguously identify the administrator of the
sending system. Whether it unambiguously identifies the people using
that system is up to the administrator, and any system can host an
extremely large number of addresses.
Post by Brian Candler
Another problem I have with IM2000 is the repeated retransmissions of
receipt notifications, which may go on for a month or more if the
recipient is not checking their mail. But nobody is *really* going to make
the RNA not have persistent storage; rebooting an RNA and losing a day's
worth of notifications would be far too serious an event.
As long as repeated notifications are identical, redundant copies can be
deleted on the receiving side.
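
For instance, a minimal sketch in Python of that receiving-side
deduplication; the (store, message-id) key is only my assumption about
what a notification would carry, not something taken from the IM2000
drafts:

    seen = set()

    def accept_notification(store_host, message_id):
        """Return True only the first time a given notification is seen;
        identical retransmissions from the same store are discarded."""
        key = (store_host, message_id)
        if key in seen:
            return False      # redundant retransmission; drop it
        seen.add(key)
        return True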

Ill behaved receipt notification sources can be ignored.

Thanks,
--
Raul
James Craig Burley
2005-04-13 06:40:51 UTC
Permalink
Post by Brian Candler
http://pobox.com/~b.candler/doc/misc/im2000.html
Excellent document! Here's my take on it.

Item 2i, on content filtering possibly becoming unnecessary:

One person's UBE is another's WBE (Wanted Bulk Email). The
*pertinent* definition of "unwanted" email ultimately comes down to
a recipient's personal choices, and the basis of *most* (not all)
such choices revolve around *content*, not identity of originator.

As an exercise, imagine we all had unlimited bandwidth, storage, and
infinitesimal latencies in our communications with each other.
(It's not *that* difficult to imagine this; it's the direction in
which technology has been heading for centuries, even millennia.)

And, imagine the population was constantly changing -- not just new
people coming "on line", but existing people changing their ways
(going from being spammers to "solid citizens", etc.).

Here, technology is *clearly* not the limiting factor; our
individual capacity to cope with the influx of information, and sift
the chaff from the wheat, becomes more evidently the nub of the
problem. Any "trust system" that depends on a single worldwide
"root" is fatally flawed; we don't depend on one in real life
anyway.

In that scenario, the problem of "too much spam" is really one of
"too much information, not usefully prioritized for *me*", since, as
everyone who has received emails from friends about the threat to
the Brazilian rainforest or similar knows, what's important (and not
spam) to a sender is not always *that* interesting to a recipient,
and vice versa (a sender might easily offer advice that is
tremendously valuable to a recipient).

(This ideal world can't avoid the problem of viruses and the like,
since their equivalents exist in *our* real world. Even if we
assume there will be perfect security across all computer systems,
this ideal world will still be tempting for someone to *try* to
target by sending tons of "useless" emails trying to get people to
do something stupid. In practice, therefore, even ideal
technologies will require filtering based on *content*, not just
identity or history of the sender.)

Accordingly, especially as technology advances, but even now, the
ultimate solution includes giving end users an easy way to
prioritize their incoming email so UBE *tends* to be easy for users
to recognize, skim through, and filter out. That will probably mean
using AI-like technologies to serve as a per-user agent to do the
content filtering and prioritization automatically, though info on
the sender/originator can be very helpful as well. (E.g. one's
email-reading agent can say "this email appears to solicit a unique
partnership, but the sender is known to have solicited a similarly
unique partnership with about 200 million other people in the past
three days".)

The problem with *that* viewpoint is that it doesn't really justify
switching to IM2000, since spam is ultimately defined based on
*content*, not on the (label worn by or given to the) originator.
But analyzing *content* requires retrieving the message anyway, in
order to analyze it, apparently defeating one of the big wins of
IM2000.

Given that we'll never see this ideal world, and that latencies (as
well as outages) *are* a huge problem, we then move on to the One Big
Problem with IM2000 -- that of the end user seeing substantial delays
when trying to pull up any given message after clicking "read" on a
notification panel. (You address this in "Points for discussion and
further thought".)

To work around that, and to work around the fact that blacklists will
never be perfect, whitelists will never be easily maintainable, laws
will never be vigorously and uniformly enforced, and so on -- else we
could probably just stick with SMTP, thankyouverymuch -- the reality
is that the vast majority of *real* end users' mail-reading systems
*will* retrieve message contents immediately, or nearly so, after a
notification is retrieved -- to cache it (a form of prefetching), to
scan it for spam/viruses, or, most likely, both.

This leaves, for me, the "big win" for IM2000 -- that of being able to
retrieve a message without unpinning it. But that is not, in SMTP,
terribly unlike returning a temporary failure response after receiving
the message contents, yet still "delivering" the entire message, after
a fashion, for subsequent processing, analysis, etc., in order that a
subsequent delivery attempt by the upstream SMTP client might be
"rewarded" with a success response, a permanent failure response, or
just another temporary failure response (either because scanning has
been inconclusive or because the SMTP server has decided to simply
irritate/tarpit a "known" spammer).

This form of receipt, where the sender is told "I'm not accepting
*responsibility* for delivery, but I might look over the entire
payload and accept or permanently reject it later on if, or when, you
retry", is somewhat like today's greylisting. But since it accepts
transmission of the message and (potentially) does something with that
message, I might call it "beigelisting" or "ecrulisting". ;-]
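
To make "ecrulisting" concrete, here is a rough sketch of what the
end-of-DATA handling might look like; Python, and the spool path, helper
names, and reply texts are all invented for illustration, not taken from
any existing greylisting implementation:

    import hashlib, os

    SPOOL = "/var/spool/ecrulist"   # hypothetical holding area for deferred mail

    def lookup_verdict(digest):
        """Stub: a separate scanner process would record a per-digest verdict."""
        return "undecided"

    def handle_end_of_data(message_bytes):
        """Keep the full message for scanning/caching, but defer responsibility.

        A well-behaved client will retry later; by then the stored copy may
        have been analysed and a definite verdict reached."""
        os.makedirs(SPOOL, exist_ok=True)
        digest = hashlib.sha256(message_bytes).hexdigest()
        path = os.path.join(SPOOL, digest)
        if not os.path.exists(path):
            with open(path, "wb") as f:    # contents retained despite the 4xx reply
                f.write(message_bytes)
        verdict = lookup_verdict(digest)
        if verdict == "accept":
            return "250 OK: message accepted"
        if verdict == "reject":
            return "550 5.7.1 message rejected after content analysis"
        return "451 4.7.1 please try again later"   # undecided: 'ecrulist' it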

Given this perspective, I wonder why we need to move to IM2000 at all,
when the delivery of the message *contents* is already handled fairly
efficiently and inline (without a reverse-DNS lookup) by SMTP. Adding
some protocol goo to enable tracking of once-delivered messages could
avoid the necessity of always transmitting an ecrulisted message a
subsequent time.

(We tend to think and talk as if message transmission involves the
sender giving *two* things to the receiver -- the envelope and the
message contents. I believe a *third* thing is involved: that of
*responsibility* for the message, or at least its contents. Spammers
get their big bang for the buck not just by transferring message
contents, but by transferring *responsibility* for those contents.
Any system that can accept contents while deferring accepting
responsibility includes a substantial potential built-in cost for
senders of UBE that most ordinary users will not find difficult to
bear.)

Item 8d, parenthetical comment on privacy problems, etc.:

IM2000 definitely makes for a "cleaner" separation between notifying
a recipient of the availability of a message and sending the message
along, since, with SMTP, you pretty much *have* to send the message
in order to make a sufficiently useful notification (taking into
account relaying, the recipient's desire to filter/prioritize based
on info in the header of the email that should really be in, or
"on", the envelope, and so on).

I don't worry too much about the implications of this, however.

First, just because an IM2000 message has been retrieved, or even
unpinned, does *not* mean any end user has viewed it. That could
have been done solely to apply a filter/prioritization to the
message, to assure that the sender isn't just sending out arbitrary
notifications, to "cache" a message that isn't ever actually read by
a real person (such as for an abandoned account), etc.

Second, the reality of any two-party exchange of information is that
the sending party can never safely *assume* the receiving party saw
an entire message *unless* the receiving party specifies it did so
via a confirmation.

Therefore, a receiving party can always choose to not confirm it
received the message, even though it did. (The sender can insist on
breaking the message into a series of distinct message chunks,
sending each chunk only after receiving confirmation that a previous
chunk has been read; this doesn't really solve the problem for the
final chunk, and makes the transmission protocol much slower, as
latencies become amplified in their effect.)

Item 9, "Retransmission of notifications"

What I think keeps getting overlooked in discussions about UBE,
SMTP, IM2000, and so on, is that it's not just the *recipient's* job
to somehow decide whether some message is desired.

It's the *sender's* job to decide how important successful delivery
of a given message is to that sender. As the end-to-end principle
implies, since email delivery can *always* ultimately fail (and,
even with SMTP, this can include the possibility of failure to
deliver a bounce back to the sender), it is up to the *sender* to
expend resources to assure successful delivery via the email system
or, if that fails (or might be deemed by the sender to fail, e.g.
no confirmation within the sender's desired timeframe), by some
other means.

These sorts of discussions take this *implicitly* into account,
especially when people talk about Challenge/Response -- "if I send
an email to you giving you advice on something you ask about in a
public forum, why should *I* waste time responding to *your* email
system's challenge?" -- but I think it should be made both more
explicit and fine-grained, on a per-delivery basis.

Therefore, a *sender* should be able to control just how
"persistent" his outgoing MUA is in terms of notifying the recipient
about the message being available, checking on the status of
delivery, and so on.

IM2000, as presently formulated, really doesn't offer much more
flexibility in this regard than does SMTP. (One could argue that
SMTP's flexibility in this regard hasn't been exploited, suggesting
that the capability isn't desired. My counterargument
includes the fact that *spammers* already exploit this flexibility
by running "less persistent" SMTP clients; hence the apparent
"success" of anti-UBE schemes like greylisting. I've certainly made
some use of my primitive abilities to determine the relative
importance of outgoing messages, since, as I run my own MTA, I can
see whether, when, and sometimes how, certain important messages are
successfully delivered. Ideally, MUAs should make this easy by
reporting even temporary SMTP delivery failures back to the user in
some fashion, logging all delivery attempts, and so on, on a
per-message basis.)

An advantage to giving senders a wider range of options when it
comes to notifying and inquiring about outgoing emails is that it
allows *recipient* systems to actually "bias" their prioritization
of incoming emails based on the apparent persistence of the sender.

Rolling this sort of system out from the beginning would offer great
assurance that the retransmit features of IM2000 *would*, in fact,
be properly and thoroughly tested.

Item 10, "Dynamic equilibrium?"

Great point. Since one person's UBE is another's WBE (Wanted Bulk
Email), spammers will always have some market out there.

Therefore, I don't think "eliminating" spam is possible. I think
the solution lies in giving end users an easy way to prioritize
their incoming email so spam *tends* to be easy for users to
recognize, skim through, and filter out.

My question is (and has been, for some time), will the expense of
moving to IM2000 justify the *relative* reduction in spam that we'd
actually see, in practice?

I believe any new email protocol -- IM2000 or an SMTP upgrade -- gives
us the biggest bang for the buck by replacing the bounce concept with
the tracking concept (widely used by delivery services such as FedEx
and UPS).

Accordingly, the response to a tracking request (issued, presumably,
by a sender) might, if the recipient so chooses, include more than
just "message contents have been retrieved" -- it might include
"message has been placed in recipient's in-box at [low/medium/high]
priority", "message has actually been read by recipient", "message has
been [archived/printed/trashed/shredded] by recipient", and so on.
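
To make the idea concrete, here is the sort of status vocabulary I have
in mind; Python used purely as pseudo-spec, and none of these names come
from any actual proposal:

    from enum import Enum

    class TrackingStatus(Enum):
        """Illustrative only: statuses a recipient might *choose* to reveal
        in reply to a sender's tracking request."""
        NOTIFIED        = "notification delivered"
        CONTENT_FETCHED = "message contents have been retrieved"
        QUEUED_LOW      = "placed in in-box at low priority"
        QUEUED_HIGH     = "placed in in-box at high priority"
        READ            = "actually read by recipient"
        ARCHIVED        = "archived by recipient"
        TRASHED         = "trashed/shredded by recipient"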

Without this facility, many users on "friendly" terms will try to
present and/or obtain the information via some other means. So it
might as well be included as an option in the protocol.

Since, in reality, IM2000 would *have* to be a store-and-forward
protocol to gain widespread acceptance (other IM2000 advocacy appears
to deny this, but your own description of how it would be useful
essentially *assumes* some capability equivalent to the
store-and-forward capability of SMTP, especially with POP3 and/or IMAP
added), issues revolving around transferring, redirecting, and/or
forwarding responsibility for message notification and content delivery
would presumably exist for message *tracking* as well.


All in all, I've come to believe the "email problem" is becoming just
an increasingly large subset of the general problem of sharing,
transmitting, authenticating, storing, mirroring, revising, and
end-of-lifeing (and so on), *information*.

That is, it seems to me that, between our increasing awareness of the
need for a great deal more flexibility in our email systems and of
that for other related systems (version control; website handling,
including RSS; blogging; software distribution; and so on), it might
be useful to begin thinking in terms of a "grand unified principle" of
information exchange, reduced to a coherent architecture that can be
designed and implemented as a general-purpose information broker on
top of an unreliable, heterogeneous network like the Internet.

(E.g. sending an email is not terribly unlike uploading a file,
except, for one obvious difference, the sender/uploader doesn't care
what name is given to the file by the recipient -- unless the sender
wants to refer to it later, such as in subsequent requests for
tracking, deletion, revision, etc. Along this line of thought,
imagine how many security holes and other expenses would have been
avoided from the outset had Unix offered, instead of the /tmp
hierarchy and similar customs, a variant of fopen() that allowed the
caller to say "Lemme start writing to a private file, but *you*, the
OS, tell *me* how I can refer to it later via some handle".)
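
(For what it's worth, something close to that exists now; a tiny
illustration using Python's tempfile module, where the handle is the open
file object and, on platforms that support it, no name ever appears in
the filesystem:

    import tempfile

    # Write to a private, nameless file; refer to it later only via the handle.
    with tempfile.TemporaryFile(mode="w+b") as handle:
        handle.write(b"scratch data nobody else can race us for")
        handle.seek(0)
        data = handle.read()

The point being that the caller never has to invent, or clean up, a
/tmp-style name at all.)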

Your document really makes this point in many ways, when it talks
about IM2000 as a potential replacement for IMAP (and hence POP3),
which is, IMO, a very pertinent insight (since those are *already*
"pull" protocols), among other things you mention.

If we could pull *that* off -- design and build a "grand unified
information exchange protocol" (GUIXP? ;-) -- then layering an
email-like system on top of it should be a piece of cake, regardless
of whether it's IM2000-like (pull-based), SMTP-like (push-based), or
something in between.

After all, UBE is really just a subset of UBI (Unsolicited Bulk
Information), in the sense that UBI includes spam posted to blog and
other web sites allowing posting of arbitrary anonymous content, as
well as to USENET. So the fight is really not *just* in the email
arena, and many of the techniques used in that arena have plenty of
applicability elsewhere (and, presumably, vice versa).

And, personally, despite my server being targeted heavily by spammers
(I'm a joe-job victim as well), my "spam problem" doesn't really
strike me as being as difficult to handle as many of my *other*
information-management problems, many of which strike me as being
elegantly solvable by a more-general GUIXP-type system.

If such a system is conceivable, and doable within, say, 20 years, it
might not be worth rolling out IM2000 in its presently-proposed form.
--
James Craig Burley
Software Craftsperson
<http://www.jcb-sc.com>
Brian Candler
2005-04-13 09:17:55 UTC
Permalink
James, thanks for those carefully thought-out comments.

I should say that the document I wrote was intended originally to consider
IM2000 and how it would interact with spam; its scope has already crept to
include other random thoughts on IM2000. It certainly wasn't intended to be
a design for a FUSSP :-)

In reply to some of the points you raise:

* The trouble with Artificial Intelligence is, it isn't. Today, AI basically
means tree-searching and pattern-matching algorithms. Maybe in 50 or 100
years time, there will be AI which actually *understands* documents (and
people) at the highest semantic level; until then, I wouldn't trust it to
make any value judgements on the content of my mail. And if it really *did*
understand my mail, I probably wouldn't want it to be reading it anyway :-)

* It's an interesting point that the real criterion as to whether spam is
wanted or not is its content. However, I think you're imagining a
future world where a significant portion of bulk E-mail is sent from
legitimate businesses that you might actually want to deal with.
From my own experience, the vast majority of spam is downright fraudulent.
Even if I saw something in a spam that interested me - taking your example,
it was talking about saving the Brazilian rainforest - if it asked for a
donation of $20, there's no way I'd send it. That's because (a) the spam is
almost certainly fraudulent, difficult to trace, and the $20 would just line
somebody's pocket; and (b) conversely, I know that any legitimate
organisation which was active in this area, would not dare to send spam.

Therefore, the fact that it's received as UBE *automatically* classifies it
as uninteresting to me, regardless of content. And so it should for everyone
else; unfortunately, even with a hit rate of 1 in 1000 or less, there are
still enough born every minute to make it worth the fraudulent spammer's
time. The other 999 spams cause huge annoyance, but the law enforcement
agencies at present seem completely uninterested in tracking down these
fraudsters - even when the spams include postal or telephone contact points
which would be easy to trace or sting.

* There is, I think, a large proportion of people who would rather see no
advertising at all in their inbox, even if some of it might be of interest
(e.g. special offers or discounts from legitimate companies that they might
want to deal with).

There's a parallel with the real world. In the UK we have this thing called
the "Mailing Preference Service". All marketers who follow the Direct
Marketing Association code of practice are required to pre-filter their
mailshots and remove all MPS-registered people from it.

It's remarkably effective; the amount of junk paper mail I get is now very
low (and the credit card offers and the like that I still receive go
straight into the bin, as I know the company does not follow the DMA code of
practice). And when I tell someone about the MPS, almost without exception
they are completely delighted and register immediately.

I think you're right that we suffer from information overload; and
advertising is for most people the lowest form of information. Many people
are happy to accept the risk of losing out on an advert that might actually
interest them, for a reduction in the total amount of advertising that they
are bombarded with.

There are other services (the "Postal Preference Service" having been set
up by the direct marketers with an intentionally confusing name) which allow
you to register your interests and opt-*in* to receiving direct mail in
categories of your choice. The Postal Preference Service is sponsored by the
Royal Mail, who have a pretty obvious interest in expanding the use of
direct mail advertising :-)

I do occasionally receive chain letters and lottery frauds through the post.
However it's rare because the cost involved is that much higher than
spamming.

I think a similar kind of classification could work in the on-line world:

- Bulk mail, untrusted sender
=> probably fraudulent (= chain letters)
- Bulk mail, classified as legitimate advertising by trusted third party,
classified by subject area
=> can choose to reject based on the fact it's advertising (= MPS)
=> can choose to accept or reject based on subject area (= PPS)

Such systems would work because you would not have to assess a level of
trust in potentially thousands of different organisations or individuals
trying to contact you each day, but only in a handful of agencies who sign
and classify 'legitimate' advertising.
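
To make the classification concrete, a toy decision rule in Python;
purely illustrative, and the 'signer' field just stands in for whatever
trusted-third-party signature scheme were used:

    from typing import Optional, Set

    def classify(is_bulk, signer, subject_area, accepted_areas):
        # type: (bool, Optional[str], Optional[str], Set[str]) -> str
        """Illustrative decision rule for the scheme above; 'signer' is a
        hypothetical trusted third party that has classified the mailing as
        legitimate advertising, and None means no such signature."""
        if not is_bulk:
            return "deliver"        # ordinary one-to-one mail
        if signer is None:
            return "reject"         # bulk + untrusted sender: probably fraudulent
        if subject_area in accepted_areas:
            return "deliver"        # opted-in subject area (cf. PPS)
        return "reject"             # advertising the user opted out of (cf. MPS)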

There's an SMTP service extension proposed for this (RFC 3865). I don't
think it will work unless it's combined with a whitelist - i.e. do a
third-party lookup to see whether mail from IP address x.x.x.x is
trustworthy with regard to its solicitations. And I think that would be a
lot less effective than the third party signing the message itself, rather
than asserting that all messages from a particular source are correctly
labelled. In particular, it would not work for messages sent via a shared
SMTP relay (smtp.example.net).

Spammers, of course, are not going to have any incentive to label their
messages according to RFC 3865 - they might as well label their IP packets
with the Evil bit (RFC 3514). So the only effect that RFC 3865 will have is
increased advertising presented to our mailboxes, albeit from more
'legitimate' sources. If spam is not correctly labelled, then you still have
all the existing difficulties with SMTP of tracing it back to source.

* While we are in a situation where on-line identities can be created at
will, and assuming you would like to be able to receive mail from people who
have not dealt with you before, whitelisting is not very useful for
controlling spam. It would require a network of trust which would prevent
individuals creating large numbers of identities at will; the identities
would have to be bound to a personal or corporate identity (and even
corporate identities would be weak, since companies can be created without
too much difficulty or expense).

Hence all E-mail senders would have to get their new on-line identity signed
by (say) their upstream ISP, in turn signed by some central authority; and
then unless you want to evaluate the trustworthiness of all those ISPs and
central authorities, you'd probably need a third party to do that for you
too.

I don't think many people will accept that they can't send E-mail unless
they've first had their identity checked.

However, it might be reasonable to accept mail from domain X only if
the owner of domain X has got a certificate, and the existing X509
infrastructure would be fine for that.

That is, if I receive a mail from ***@mybank.com, I could expect it to
have a certificate for mybank.com; and if I receive a mail from
***@example.net, then at least the example.net ISP would have a
certificate.

IM2000 could do this at the transport layer, by requiring TLS and a valid
signed certificate before you download sent mail from a remote message
store. The big financial winners here would be the CAs of course :-) I
think there would need to be a free 'test CA' from which you could get
certificates signed for initial testing, but in general MUAs would not
accept test CA certificates.
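
For illustration, roughly how a receiving agent might enforce that at the
transport layer; Python, and the port number is a placeholder, since
IM2000 doesn't define one:

    import socket, ssl

    def fetch_from_message_store(host, port=4321):
        """Connect to a remote message store only if it presents a certificate
        that chains to a trusted CA and matches its hostname."""
        context = ssl.create_default_context()       # verifies chain and hostname
        # context.load_verify_locations("test-ca.pem")  # e.g. the free 'test CA'
        raw = socket.create_connection((host, port))
        return context.wrap_socket(raw, server_hostname=host)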

There's ESMTP STARTTLS, but typically it works the wrong way round: when
sender A connects to recipient B, B presents a certificate saying "I am B".

B *can* be configured to require A to present a valid client certificate.
However I don't think you'd ever get to a stage where you could safely
reject incoming mail because the sender doesn't have a certificate; partly
because a critical mass of MTAs with client certificates will not build up
unless there is a driver to do it, and partly because a lot of MTA software
out there is not sufficiently flexible.

In any case, the trust model is weakened somewhat by relaying. If mybank.com
chooses to relay via smtp.example.net, then at the receiver all I'll get is a
certificate for smtp.example.net. And then the return address you see, e.g.
MAIL FROM:<***@mybank.com> is not validated at all by the certificate.

Using domain certificates is not going to help with spam much anyway, if:
- spammers send out mail from 0wned machines with upstream ISP accounts; or
- spammers get lots of domains and certificates (although then new CAs
may spring into life, whose remit is to sign only non-spammers domains)

However, it might stop spammers setting up mailstores on 0wned machines,
which would definitely be a good thing. If you want to run your own
mailstore, then not only do you need your own domain, but you need a
certificate. That's quite a high hurdle to jump; it might mean we have
fewer, larger mailstores in practice.
Given that we'll never see this ideal world, and that latencies (as
well as outages) *are* a huge problem, we then move on to the One Big
Problem with IM2000 -- that of the end user seeing substantial delays
when trying to pull up any given message after clicking "read" on a
notification panel.
Yep, there's a real-world tradeoff here.

At the moment, people expect their POP3 server to be reliable and available
99.9% of the time (or perhaps to be offline at 3am when few people are using
it). If it isn't, then they complain to their ISP. If the ISP doesn't sort
their act out, then they change ISP.

With IM2000, if you have problems downloading your incoming mail, the fault
is at the remote ISP, which you have no control over; your own ISP can
happily point the finger. Worse, even though it may be 3am in the sender's
timezone, it may be the middle of the day in your own timezone. In many
parts of the world, Internet links are still highly congested and mail may
only successfully get out at certain times of the day.

So IM2000 assumes the Internet to be 'reliable' - perhaps more so than it
really is.
Therefore, a receiving party can always choose to not confirm it
received the message, even though it did. (The sender can insist on
breaking the message into a series of distinct message chunks,
sending each chunk only after receiving confirmation that a previous
chunk has been read; this doesn't really solve the problem for the
final chunk, and makes the transmission protocol much slower, as
latencies become amplified in their effect.)
And in any case, the receiver could send back automated
confirmation-of-reading messages to spoof this protocol.
a *sender* should be able to control just how
"persistent" his outgoing MUA is in terms of notifying the recipient
about the message being available, checking on the status of
delivery, and so on.
IM2000, as presently formulated, really doesn't offer much more
flexibility in this regard than does SMTP.
I think in principle it does; a message in a message store is labelled with
how long you want to keep it there, and it could also be labelled with an
indication of how aggressively you wish to retry. There's a little of this
sort of stuff in ESMTP DSN, but I don't think it's that widely implemented
in MTAs, and less so in MUAs.
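
As a purely illustrative sketch of what such per-message labels might look
like (the field names and defaults are mine, not from any IM2000 document):

    from dataclasses import dataclass

    @dataclass
    class OutgoingMessageLabel:
        """Illustrative per-message sender controls: how long the store keeps
        the message, and how hard to push notifications to the recipient."""
        retain_for_days: int = 14          # drop from the message store after this
        notify_retries: int = 5            # how many notification retransmissions
        notify_interval_hours: int = 12    # spacing between retransmissions
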
An advantage to giving senders a wider range of options when it
comes to notifying and inquiring about outgoing emails is that it
allows *recipient* systems to actually "bias" their prioritization
of incoming emails based on the apparent persistence of the sender.
But I don't think that any precedence or priority flag assigned by the
sender has any value in absolute terms.

I think it may have value as a [sender,priority] tuple. That is, if I
receive a mail from [my boss,Important] then I may treat it as important.
But mail from an unknown third party labelled as Important almost certainly
isn't. In fact, I would probably apply a negative weighting; the more
important the message screams that it is, the less important that it is
likely to be. Consider counting the exclamation marks in spam subject lines
:-)
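
For illustration, a toy weighting function along those lines (entirely
made up, of course):

    def weight_priority(sender_known, claimed_important, subject):
        """Toy scoring of the [sender, priority] idea: a claimed priority only
        helps when the sender is already trusted, and shouting counts against you."""
        score = 0.0
        if claimed_important:
            score += 2.0 if sender_known else -1.0   # negative weighting for strangers
        score -= 0.5 * subject.count("!")            # exclamation marks in the subject
        return score
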
After all, UBE is really just a subset of UBI (Unsolicited Bulk
Information), in the sense that UBI includes spam posted to blog and
other web sites allowing posting of arbitrary anonymous content, as
well as to USENET. So the fight is really not *just* in the email
arena, and many of the techniques used in that arena have plenty of
applicability elsewhere (and, presumably, vice versa).
True. I think E-mail does have a special place, primarily because it's an
on-line analogue of a physical service we're all familiar with, and because
private one-to-one store-and-forward communication is an extremely useful
tool.

If you read someone's website or blog, then you've made an active decision
to go there and participate. If there's advertising there, then so be it. It
may be because the blogger put it there, or because spammers put it there
and the blogger does not control them. If it annoys you enough, you may
choose not to participate there. However, one's inbox is considered
'personal space' and not to be infiltrated by advertisers, and especially
fraudsters.

It would be interesting to see a list of desirable features or
characteristics of the GUIXP system you describe.

Cheers,

Brian.
James Craig Burley
2005-04-13 15:01:32 UTC
Permalink
Post by Brian Candler
James, thanks for those carefully thought-out comments.
You're welcome!
Post by Brian Candler
I should say that the document I wrote was intended originally to consider
IM2000 and how it would interact with spam; its scope has already crept to
include other random thoughts on IM2000. It certainly wasn't intended to be
a design for a FUSSP :-)
Heh..."FUSSP"?
Post by Brian Candler
* The trouble with Artificial Intelligence is, it isn't. Today, AI basically
means tree-searching and pattern-matching algorithms. Maybe in 50 or 100
years time, there will be AI which actually *understands* documents (and
people) at the highest semantic level; until then, I wouldn't trust it to
make any value judgements on the content of my mail. And if it really *did*
understand my mail, I probably wouldn't want it to be reading it anyway :-)
Oh, I agree, AI is (probably) a promise that might never be fulfilled.

But there are things we're doing with today's technology *now* that
would have been thought of as "requiring" AI 20 years ago. Google
comes to mind, and when I think of how one of the ways it "works" is
by drawing conclusions partly based on connections among trusted
relationships, I think similar approaches can be brought to bear on
the UBE problem.

(We're already using such approaches on an ad-hoc basis, in the form
of RBLs, including arbitrarily blocking dynamic-IP-hosted MTAs like
mine.)
Post by Brian Candler
* It's an interesting point that the real criterion as to whether spam is
wanted or not is its content. However, I think you're imagining a
future world where a significant portion of bulk E-mail is sent from
legitimate businesses that you might actually want to deal with.
Not really, because my "future world" includes a vast number of
smaller, lighter-weight businesses (and other organizations, as well
as people), from which I *might* indeed want to receive information on
deals, offers, alerts (e.g. from churches, charities), and so on.

The infrastructure allows that kind of thing, except *I* don't have
time to sort through it all myself, without some kind of technological
assistance.

(The way I've handled this in the past is to stop working on various
projects. I don't think that approach scales well to the rest of the
Internet; besides, I've pretty much run out of projects to stop
working on. ;-)
Post by Brian Candler
From my own experience, the vast majority of spam is downright fraudulent.
Even if I saw something in a spam that interested me - taking your example,
it was talking about saving the Brazilian rainforest - if it asked for a
donation of $20, there's no way I'd send it. That's because (a) the spam is
almost certainly fraudulent, difficult to trace, and the $20 would just line
somebody's pocket; and (b) conversely, I know that any legitimate
organisation which was active in this area, would not dare to send spam.
Right. Remember, the most "obvious" way to know whether something is
UBE, or at least BE, is the fact that it's sent in *bulk* to many
recipients.

Therefore, recipients sharing information that "profiles" sources
and/or topics of emails can discover, among themselves, that certain
sources and/or topics are so widely involved in email, within a given
period, that they indicate bulk. Then it's up to each recipient to
decide, based on the source, the degree to which it's unsolicited.

My concern, here, is that we keep in mind that making email systems
work *better* means, generally, we'll get even more email, and that
the overall population of email users is rising and will likely
continue to rise. (Given the low environmental costs and expense of
email versus other forms of communication, I think that's a *good*
thing.)

IM2000 may shift some of the burden of costs onto the sender, which
may reduce the overall % of UBE, and maybe it'll do this enough to
basically eliminate spam.

But it won't make the general problem of too much incoming email and
sorting through it efficiently and easily go away -- it might even make
that problem *worse*.

So, I find myself forced to imagine a more intelligent MUA that helps
users sift through their incoming email based on more personal
criteria. Such an MUA would obviously appreciate having more
information upon which to draw -- including message content -- in
order to prepare its presentation to its end user. Accordingly, it
would want itself, and other MUAs, to be able to share information by
propagating it upstream and to fellow MUAs, helping all of them detect
bulk email, etc.

As MUAs necessarily become more intelligent, I think the advantages of
IM2000 as a "pull" protocol become *smaller*, not greater, over SMTP,
and the efficiency advantages of IM2000 merely as a *new* protocol
(more precisely, of *any* new protocol that could replace SMTP but
still be an SMTP-like "push" protocol, perhaps with bounces) are not
all that great, even in today's environment, IMO.
Post by Brian Candler
Therefore, the fact that it's received as UBE *automatically* classifies it
as uninteresting to me, regardless of content. And so it should for everyone
else; unfortunately, even with a hit rate of 1 in 1000 or less, there are
still enough born every minute to make it worth the fraudulent spammer's
time. The other 999 spams cause huge annoyance, but the law enforcement
agencies at present seem completely uninterested in tracking down these
fraudsters - even when the spams include postal or telephone contact points
which would be easy to trace or sting.
When you say "received as UBE", I assume you mean from a known
*source* of UBE, that is, not based on content analysis.

A "razor" I apply to any system (like IM2000) or improvement (like
greylisting, Challenge/Response, etc.) designed for email is to
basically "cancel" components that are equal on each side of the
equation vis-a-vis today's SMTP.

So, since SMTP already provides means to block incoming content from
known UBE sources, IM2000's ability to do the same thing really
doesn't help matters much, in terms of justifying rolling out IM2000.

And since both allow a receiving system to "tarpit" a known sender of
UBE in some fashion, driving up her costs even just a bit, I cancel
*that* out.

What SMTP offers that IM2000 typically does not, and, based on your
and other proposals, cannot, is the ability for a receiving system to
acquire message content without doing a DNS lookup and without
initiating a connection to any remote site.

That reduces the number of points of failure by at least two (DNS
doesn't need to be working to receive a message, which actually
mattered to *me* recently when Comcast screwed up its DHCP info
advertising nameservers but I still received incoming email; and the
remote sender doesn't need to be able to accept incoming connections
on demand).

That, in turn, not only makes ordinary exchange of emails between
mutually trusting sites theoretically faster (SMTP's ancient protocol
introduces some unfavorable noise here -- HELO! ;-), but it makes it
easier and less expensive for an SMTP receiver to *tarpit* a known
sender of UBE.
Post by Brian Candler
* There is, I think, a large proportion of people who would rather see no
advertising at all in their inbox, even if some of it might be of interest
(e.g. special offers or discounts from legitimate companies that they might
want to deal with).
I don't see any way to achieve this without false positives, either in
an SMTP world or an IM2000 world. (In any case, in my view that
merely pushes the responsibility of filtering upstream, turning the
final MTA prior to the recipient's MUA into a sort of "secretary" for
that MUA, if the MUA isn't itself able to function in that fashion.)

For example, some people send legitimate email to mailing lists, and
they do so from free email providers that add smallish banners to
outgoing email advertising those providers' services.

That *guarantees* that, for people receiving such emails, one of three
things must happen:

1. The recipient will see advertising in their inbox. ("This email
sent by AcmeCo's Free Email Provider -- sign up today at
freeacmemail.com!")

2. The recipient will not see otherwise-legitimate email that
contains such advertising.

3. The recipient's "agent" will somehow "know" (here we get into AI
again, as it's basically an arms race -- see my jcb-sc.com site
for details, and, yes, I just "plugged" it purely for purposes
of illustration ;-) how to remove the advertising content from
the legitimate email, "solving" problems 1 and 2 at the same
time.
Post by Brian Candler
There's a parallel with the real world. In the UK we have this thing called
the "Mailing Preference Service". All marketers who follow the Direct
Marketing Association code of practice are required to pre-filter their
mailshots and remove all MPS-registered people from it.
It's remarkably effective; the amount of junk paper mail I get is now very
low (and the credit card offers and the like that I still receive go
straight into the bin, as I know the company does not follow the DMA code of
practice). And when I tell someone about the MPS, almost without exception
they are completely delighted and register immediately.
But email (and all electronically-based communication generally) is
getting further and further away from the "comfort zone" we
historically have had with tangible goods.

Consider bounces. They made sense in a tangible-delivery context,
because if you send Aunt Emma a package containing pictures of your
newborn, and it can't be delivered for some reason, you *really* need
those pictures to be shipped back to you by the delivery agent. That
agent simply dropping them into the trash isn't really a legitimate
option.

So, it also made sense, to the designers of SMTP and some of its
predecessors, that "electronic" mail was simply just mail done
electronically. As there was no tracking of tangible deliveries then,
they provided none in the electronic version. As the responsibility
for delivering tangible mail naturally has to tag along with the
content of that mail (and, pretty much or closely at least, with the
notification that the mail is available), SMTP was endowed with the
notion that responsibility for message delivery accompanied message
content, and included responsibility for subsequent notification of
delivery failure.

Now, with IM, TXT messaging, and so on, people are used to, and I
think even prefer, not being sent *back* a whole new message saying
"message delivery failed, here's what you sent", but rather just being
told what the status of each outgoing message is.

Along these lines, the electronic world is *much* more heterogeneous,
and changes much more quickly, than the world of tangible delivery,
especially the narrow example of the UK.

(As an exercise, ask yourself how useful MPS would be if international
entities could send arbitrary deliveries of any size to even entirely
made-up addresses within the UK, for zero, or nearly zero, cost.)
Post by Brian Candler
I think you're right that we suffer from information overload; and
advertising is for most people the lowest form of information. Many people
are happy to accept the risk of losing out on an advert that might actually
interest them, for a reduction in the total amount of advertising that they
are bombarded with.
Agreed. The tricks include separating the advertising from the
useful, informational email, as well as distinguishing useful targeted
advertising from mass-marketed BS.

Source-based distinctions (RBLs and the like) do a lot of this for us
now.

I don't think IM2000 offers enough, in this regard, to improve on
today, and what I *do* think it offers is the ability for us to more
flexibly respond to a more complicated, more interconnected world that
will, therefore, include each person likely being sent much more
*legitimate* email anyway.
Post by Brian Candler
I do occasionally receive chain letters and lottery frauds through the post.
However it's rare because the cost involved is that much higher than
spamming.
That is a key insight. If true, it gets us back almost to square one
when discussing SMTP, IM2000, the UBE problem, etc.

Your document explains why IM2000 does not, "out of the box", really
make sending BE any more expensive, and in fact might make it
substantially *less* expensive, if it is not, in deployment, wrapped
up in all sorts of extra "baggage" a la today's RBLs.

So chain letters and lottery frauds (I get lots of the latter, by the
way) become, potentially, *less* expensive to send in an IM2000 world.

And your document explains how messages aren't unpinned by
recipients until, based on *post-notification analyses of sources*,
they can decide whether reading and unpinning a particular message is
worthwhile.

That implies an MUA infrastructure that includes cooperation among MUAs
(which probably includes cooperation with MTAs), or their IM2000
equivalents, in that, *somehow*, a person who reads a piece of spam
needs to be able to tell their MUA that it's spam *and* that MUA needs
to somehow communicate that claim (and it's only a *claim*) to other
MUAs, presumably via intermediaries.

Yes, the underlying MTAs can "infer" that a given source is a spammer
based on discovering the overall message-sending pattern. But they
can, and probably do, do that *today* in the SMTP world. (This gets
back to the MTA-as-upstream-secretary approach.)

What IM2000 allows -- but only by substantially changing the MUA
interface as seen by a sufficiently large % of its user base -- is
deferring accepting *responsibility* for a message until its recipient
is (nearly) ready to actually read it.

SMTP actually *can* provide that, in a less-elegant way, via
ecrulisting (a la greylisting, except "deferred", or greylisted,
messages are generally made available to an MUA via notification a la
IM2000 and the MUA has the IM2000-like ability to tell the notifying
agent, or MTA, how to deal with future attempts to deliver the
message).

What I'm thinking of doing is implementing ecrulisting for myself and
improving my MUA (in GNU Emacs ;-) to see how such a thing would work
in practice, and perhaps offering a free SMTP server that does
ecrulisting on my web site (in source form) for others to try as well.

It's not as clean as IM2000, but it has a generally-similar profile in
terms of potential points of failure, and it would give users an
opportunity to see how well an IM2000-like MUA would work in practice,
except there would (generally) be no inherent latency-based lags in
retrieving message contents (which I think would be considered
unacceptable and thus worked around, via prefetching, in most, if not
all, IM2000 implementations anyway).
Post by Brian Candler
- Bulk mail, untrusted sender
=> probably fraudulent (= chain letters)
- Bulk mail, classified as legitimate advertising by trusted third party,
classified by subject area
=> can choose to reject based on the fact it's advertising (= MPS)
=> can choose to accept or reject based on subject area (= PPS)
Such systems would work because you would not have to assess a level of
trust in potentially thousands of different organisations or individuals
trying to contact you each day, but only in a handful of agencies who sign
and classify 'legitimate' advertising.
I'm still having trouble understanding how IM2000 makes this work
*that* much better than SMTP.
Post by Brian Candler
If spam is not correctly labelled, then you still have
all the existing difficulties with SMTP of tracing it back to source.
But IM2000 doesn't really make those difficulties go away, except by
*either* dispensing with store-and-forward (which I interpret as
"IM2000 won't be widely adopted, *ever*"), or providing an out-of-band
means to represent transmission/relay information.

The latter is practical and is how SMTP probably *should* have been
designed -- that is, make message contents basically opaque as far as
the MTA is concerned, and put all the "Received:" and "Delivered-to:"
stuff into the envelope in some fashion. (In addition to MAIL FROM
and RCPT TO, it'd include RCVD BY, DLVRD TO, whatever, corresponding
to today's mucking with the message headers.)
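
To illustrate: the RCVD BY / DLVRD TO verbs above are just my coinage, and
this is only a sketch of the envelope data they would carry, not any real
protocol:

    from dataclasses import dataclass, field
    from typing import List, Set

    @dataclass
    class Envelope:
        """Sketch of an envelope that carries the relay trace itself, so a
        receiver can judge the path without downloading the opaque contents."""
        mail_from: str
        rcpt_to: List[str]
        received_by: List[str] = field(default_factory=list)   # ~ RCVD BY
        delivered_to: List[str] = field(default_factory=list)  # ~ DLVRD TO

    def accept_before_content(env: Envelope, blocked_relays: Set[str]) -> bool:
        """Reject based on the trace alone, before fetching the message body."""
        return not any(hop in blocked_relays for hop in env.received_by)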

That way, a receiver doesn't need to snarf down an entire 10MB email
to conclude that, while the *immediate* sender might be a generally
trusted relay, it got *this* particular message from a known, or
likely, source of spam.

Even here, this becomes a tradeoff in which IM2000, or a "proper"
SMTP, helps *spammers* reduce their costs, in that *they* wouldn't
have to transmit entire messages before a receiver decides to not
accept them based on originating or intermediate IP address.

The salient "feature" of IM2000, from the point of view of blocking
based on message stores that are no longer trusted, fades once the
store-and-forward capabilities of SMTP are implemented in an IM2000
world, which they *will* be, for a variety of practical reasons.
Post by Brian Candler
* While we are in a situation where on-line identities can be created at
will, and assuming you would like to be able to receive mail from people who
have not dealt with you before, whitelisting is not very useful for
controlling spam. It would require a network of trust which would prevent
individuals creating large numbers of identities at will; the identities
would have to be bound to a personal or corporate identity (and even
corporate identities would be weak, since companies can be created without
too much difficulty or expense).
Yup. When the originator is new to the receiver, the receiver
therefore *must* rely substantially on content analysis to determine
utility of email.

Therefore, as incoming email load goes up, content analysis becomes
more important to do via software assistance.

Today, we accomplish much of it by having an email infrastructure
that's so overloaded by UBE that we arbitrarily block substantial
portions of the Internet from accessing our MTAs, so our MUAs never
even see the messages. And we do other things, including limiting
email-transmission resources overall, that make sending email more
expensive than it *needs* to be. (All of this probably conspires to
*reduce* the amount of UBE sent, since much of it is sent based on its
attractive bang-for-buck ratio.)

That definitely has collateral damage. I don't use any traditional
RBLs, though I do block email with envelope senders of known spammers
based on a central repository run by someone who uses whois-style
lookups to confirm that domain names and/or email addresses are indeed
"owned" by known spammers (so joe jobs do not result in innocents
being listed). I haven't set up my SMTP server to *report* such
blocks on my logs yet, so I don't know how efficacious they are (silly
of me, but I've been busy with other things).

But I really don't end up *seeing* lots of spam, because my primitive
MTA (funnily patched qmail) dispenses with most of it via a
combination of very trivial tactics -- giving a multi-line greeting in
most cases, tarpitting multiple RCPTs, and a few other very simple
content-based things done on the fly -- which somehow conspire to
rebuke most older spam software (which can't tolerate multiline
greetings and HELO responses) and cause newer spam software to give up
sending to my site (as a result of the RCPT TO tarpitting).

(It "helps" that my qmail-smtpd is "vanilla" in that it doesn't
reject, out of hand, envelope recipients with unrecognized usernames.
Between that and the longtime "attractiveness" of my jcb-sc.com domain
to spammers, almost all incoming UBE has four or more RCPT TOs, and,
since only my wife and I are legitimate "human" recipients of email
here, almost all incoming email with more than two RCPT TOs is UBE.)
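
For illustration only -- this is not my actual qmail patch, just a sketch
of the RCPT TO tarpitting idea, with arbitrary numbers:

    import time

    MAX_FREE_RCPTS = 2          # only my wife and I receive mail here
    TARPIT_DELAY_SECONDS = 60   # each extra RCPT TO costs the client a minute

    def handle_rcpt(rcpt_count):
        """Legitimate mail (one or two recipients) is unaffected; bulk
        deliveries pay a per-recipient delay before each acknowledgement."""
        if rcpt_count > MAX_FREE_RCPTS:
            time.sleep(TARPIT_DELAY_SECONDS)
        return "250 OK"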

I'm considering switching to an ecrulisting-style system in order to
get a better grip on how many false positives and negatives I'm
*really* getting, since there's no way to be *sure* that, say, clients
giving up due to a multiline SMTP greeting or a minute-long delay
before each RCPT TO after the second are in fact trying to send UBE.

If I get around to this, my system might include using a catchall
address that runs a "no such user" MUA -- a concept that would likely
be useful in an IM2000 world -- that analyzes incoming (misdirected or
"evil") messages, shares information on likely spam with other MUAs
(like the one *I* use to read email), and so on, in order to elegantly
help me and my wife (or, more properly, the mail forwarder we use for
her, since she's mostly Blackberry based through her work ISP) sift
through all our incoming email.

My experiment would be designed to answer these questions: if I set up
my MTA to allow pretty much *all* email into my system, rather than
blocking it, and I relied more on my MUA to sort things out, would I
be just as happy with the results? Have fewer false negatives and
false positives? Feel better about forcing spammers to keep sending
me the same emails over and over again, knowing all they were doing
was wasting *their* resources?
Post by Brian Candler
I don't think many people will accept that they can't send E-mail unless
they've first had their identity checked.
Yup. It's ultimately up to a *recipient* to decide on what basis and
priority they want to be shown any given email. Everything else in
between involves conserving resources, avoiding resource starvation,
thus propagating recipients' selection and prioritization criteria
upstream, and so on.
Post by Brian Candler
However, it might be reasonable to accept mail from domain X only if
the owner of domain X has got a certificate, and the existing X509
infrastructure would be fine for that.
[...really out of my domain of expertise at that point, though I think
I get the basic picture...]

Again, I'm having difficulty seeing how a practically-deployable
IM2000 really improves upon SMTP in these areas.
Post by Brian Candler
- spammers send out mail from 0wned machines with upstream ISP accounts; or
This is a *big* issue. As end-user (& computer) mobility increases,
it becomes increasingly obvious that we wouldn't count on having
IM2000 message stores sitting on originating machines any more than
today's email infrastructure assumes home PCs have IMAP mailboxes for
all potential recipients of email they send (and, after all, many of
those home PCs are blocked from sending SMTP email directly to sites
like aol.com, so they need to relay email anyway).

So, there *has* to be a widely supported means for any software
running on a home PC to send an arbitrary email "upstream" to an MTA
(SMTP server, IM2000 message store, whatever) that is more trusted and
more assuredly connected (that is, a real server, not a home PC that
can and might well be turned off as soon as a message is "sent") than
the home PC or a Blackberry.

That means 0wned machines will send such emails via that same method
anyway. IM2000 really can't improve on this at all, that I can see,
in terms of the volume of such messages reaching a recipient's inbox.
Post by Brian Candler
- spammers get lots of domains and certificates (although then new CAs
may spring into life, whose remit is to sign only non-spammers domains)
However, it might stop spammers setting up mailstores on 0wned machines,
which would definitely be a good thing. If you want to run your own
mailstore, then not only do you need your own domain, but you need a
certificate. That's quite a high hurdle to jump; it might mean we have
fewer, larger mailstores in practice.
Again, we can already do this with SMTP, and in some cases in practice we
do. I don't think assuming a trend towards fewer, larger
mailstores is wise, nor do I think *going* in that direction is wise,
because centralization of resources, or even trust, breaks down in all
sorts of ways. Also, as we centralize trust, SMTP with all today's
add-ons (such as AUTH) becomes sufficient for the task, AFAICT.
Post by Brian Candler
Post by James Craig Burley
a *sender* should be able to control just how
"persistent" his outgoing MUA is in terms of notifying the recipient
about the message being available, checking on the status of
delivery, and so on.
IM2000, as presently formulated, really doesn't offer much more
flexibility in this regard than does SMTP.
I think in principle it does; a message in a message store is labelled with
how long you want to keep it there, and it could also be labelled with an
indication of how aggressively you wish to retry. There's a little of this
sort of stuff in ESMTP DSN, but I don't think it's that widely implemented
in MTAs, and less so in MUAs.
What I'm saying is that it isn't an *inherent* advantage of IM2000.
To take advantage of its potential requires doing a bunch of things,
maybe 95% or more of which would accomplish it for SMTP as well.

Sysadmins are always asking questions (on the qmail list anyway) about
how to get finer-grained control over lifetime of emails in their
outgoing queues, as well as over incoming message sizes, and the like.

There's little in SMTP that flat-out prevents such things, and little
in IM2000 that *magically* enables them.

IM2000, being a "new" protocol, can enable such things out of the box
with less muss and fuss. The question is, does that offset the
*penalties* that go along with IM2000 being a "new" protocol (in terms
of adoption expense, etc.)?
Post by Brian Candler
Post by James Craig Burley
An advantage to giving senders a wider range of options when it
comes to notifying and inquiring about outgoing emails is that it
allows *recipient* systems to actually "bias" their prioritization
of incoming emails based on the apparent persistence of the sender.
But I don't think that any precedence or priority flag assigned by the
sender has any value in absolute terms.
I think it may have value as a [sender,priority] tuple. That is, if I
receive a mail from [my boss,Important] then I may treat it as important.
But mail from an unknown third party labelled as Important almost certainly
isn't. In fact, I would probably apply a negative weighting; the more
important the message screams that it is, the less important that it is
likely to be. Consider counting the exclamation marks in spam subject lines
:-)
You're absolutely correct about that.

What I mean is that, in a greylisting-type system, it is assumed (and,
often, correctly so) that a second or third attempt to send an email
lowers the probability that it's UBE. So the first and maybe second
attempts are deferred by temporary rejection.

In an ecrulisting-type system, the first delivery attempt is deferred,
the message is delivered to the user's inbox anyway (with an
indication that it was deferred -- in fact, with a dynamic "feed"
concerning delivery attempts and other "discoveries" about the
message), and the user's MUA decides, potentially based on how many
delivery attempts were in fact made, as well as source IP of client,
other clients in the "Received:" headers, etc., how to treat the
message.

A typical spammer doesn't retry delivery of deferred messages, so
single-delivery messages could be ignored, or given very low priority
for display, by a given user's MUA.
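
(A minimal sketch of the MUA-side rule I have in mind, in Python; the
thresholds and names are invented purely for illustration:)

    # Sketch of an ecrulisting-style display rule -- thresholds invented.
    # delivery_attempts: how many times the sending MTA retried after the
    # initial deferral; known_sender: already in the user's address book.
    def display_priority(delivery_attempts, known_sender):
        if known_sender:
            return "high"         # trusted correspondents bypass the heuristic
        if delivery_attempts <= 1:
            return "very low"     # typical spamware never retries a deferral
        if delivery_attempts <= 3:
            return "normal"
        return "high"             # a persistent sender behaves like a real MTA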

IM2000 offers a similar potential, in that message notifications can
be used in determining how "interested" a sender really is in making
sure the message arrived, was delivered, was read, etc.

Message notifications could in fact trigger a series of increases in
priority for things like prefetching a message for an MUA -- as long
as the daemon seeing the notifications can also tell when such
notifications are being sent in bulk to a large population of
recipients (including spamtraps), in which case the daemon presumably
would decide to just chuckle and let the spammer expend resources
sending notifications, etc.
Post by Brian Candler
Post by James Craig Burley
After all, UBE is really just a subset of UBI (Unsolicited Bulk
Information), in the sense that UBI includes spam posted to blog and
other web sites allowing posting of arbitrary anonymous content, as
well as to USENET. So the fight is really not *just* in the email
arena, and many of the techniques used in that arena have plenty of
applicability elsewhere (and, presumably, vice versa).
True. I think E-mail does have a special place, primarily because it's an
on-line analogy to a physical service we're all familiar with, and because
private one-to-one store-and-forward communication is an extremely useful
tool.
If you read someone's website or blog, then you've made an active decision
to go there and participate. If there's advertising there, then so be it. It
may be because the blogger put it there, or because spammers put it there
and the blogger does not control them. If it annoys you enough, you may
choose not to participate there. However, one's inbox is considered
'personal space' and not to be infiltrated by advertisers, and especially
fraudsters.
I see email as representing a subset of a continuum here, though. If
I'm maintaining a software package, incoming patches and bug reports
are really a lot like email messages (historically, as well as today,
that's how they were transmitted). Similarly, if I'm maintaining a
web site (not really a blog, more like my actual sites), containing
technical information that might be found to be buggy or out of date
by arbitrary readers, incoming reports are also a lot like email
messages in the sense you mean, even though they don't *have* to be
sent in the form of email messages per se.

So, I want my incoming patch/bug server to have similar capabilities,
in terms of resisting UBI, validating content, and so on, to any email
system I might be using.

That is, I don't want to give up the many add-ons a widely used system
like email offers when I get into areas involving more-specific
content such as patches, bug reports, and so on.

Though I have yet to delve into the world of wikis, I *think* they are
kinda like what I mean here. It's the maintainer of the wiki'd site
who, theoretically, wants to receive arbitrary updates regarding the
content of the site from arbitrary people, and that's an email-like
model -- she's not going out to other peoples' sites looking for bug
reports on her own, necessarily -- but presumably the wiki system
provides communications that are tailored to allow fully automated
updates, which email, by itself, does not (unless we get into MIME
encodings for such things).
Post by Brian Candler
It would be interesting to see a list of desirable features or
characteristics of the GUIXP system you describe.
Your web page's proposal concerning folders and such really begins to
hit on such things. But I've written so much already, and am mainly
trying to figure it all out, so my tentative "design" for such a
system is really hard to express in any useful way at this point.

The only way I can really describe the system I'm imagining is to
suggest a Unixy/shell-like system designed, from the ground up, around
the assumption that *all* resources are dynamically located in
arbitrary places on an unreliable heterogenous network.

The basic building blocks for such a system would be, accordingly,
determined by studying how existing systems and proposals work and
figuring out what the underlying requirements really are, in an
approach not too dissimilar from how, nearly 40 years ago, a few
bright people looked at huge monolithic systems and proposed designs
like IBM MVS, JCL, Multics, and PL/I, and came up with Unix and C,
which somehow managed to survive and ultimately prosper despite their
comparative "feebleness".

So, just as I once explained to my manager at an IBM shop, back in
1982, how absurd it was that we had both spent a full *week* studying
and experimenting with JCL just to come up with the equivalent of
PRIMOS's "pl1 *.pl1" (that is, compile all PL/1 source files in this
directory -- in Unix, think "gcc *.c"), I'm now trying to imagine what
kind of low-level networking OS would make implementing IM2000 or SMTP
*faithfully* a matter of little more than some shell-fu.

An essential component (perhaps the main component) of the system I'm
imagining would be what I call a "bidirectional shell", or "bidish",
which is kinda like a combination of low-level features of SMTP, HTTP,
FTP, and similar protocols.

It would allow client/server communications regarding data spaces on
*each* end of the communication channel. A client might say "I have a
data stream to send you" (SMTP DATA, FTP PUT, etc.); but the server,
now knowing the stream was available on the client side, might be able
to ask questions about it or specify client-side processing ("give me
the first 100 lines" -- aka "pipe it through head -n 100"; "send no
more than 1MB, feel free to compress via gzip or bzip2 if you like to
make that happen"; etc.).
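
(For what it's worth, a rough sketch of the sort of exchange I mean --
the verbs and field names below are entirely invented, just to
illustrate negotiating over a data space that lives on the *other* end
of the connection:)

    # Hypothetical "bidish" exchange, expressed as plain data structures.
    client_offer = {
        "verb": "OFFER",              # roughly SMTP DATA / FTP PUT territory
        "stream": "message-0421",
        "length": 5000000,            # bytes available on the client side
    }

    server_reply = {
        "verb": "FETCH",
        "stream": "message-0421",
        "filter": "head -n 100",      # ask the client to pre-process it
        "max_bytes": 1000000,         # "send no more than 1MB"
        "encodings": ["gzip", "bzip2"],  # compression at the client's option
    }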

So I always try to think, when I study protocols like SMTP, about just
what *semantics* underlie each interaction, in order to try to discover
a "core set" of semantics that, if implemented, would enable just
about "everything", with a lot less work.

Some of this I've gotten ironed out pretty firmly, since I've been
thinking about those aspects for over 20 years. Real-world
equivalents of other aspects are still fairly new to me, and evolving
besides, so I am not ready to design the thing yet, just to start
experimenting, even assuming I had the time!
--
James Craig Burley
Software Craftsperson
<http://www.jcb-sc.com>
Brian Candler
2005-04-14 16:21:24 UTC
Permalink
Post by James Craig Burley
Heh..."FUSSP"?
The Final, Universal Solution to the Spam Problem:
http://www.fussp.org/
Post by James Craig Burley
But there are things we're doing with today's technology *now* that
would have been thought of as "requiring" AI 20 years ago. Google
comes to mind, and when I think of how one of the ways it "works" is
by drawing conclusions partly based on connections among trusted
relationships, I think similar approaches can be brought to bear on
the UBE problem.
(We're already using such approaches on an ad-hoc basis, in the form
of RBLs, including arbitrarily blocking dynamic-IP-hosted MTAs like
mine.)
That's not artificial intelligence, that's just policy (or human
intelligence turned into policy). "I expect all legitimate mail to come via
an ISP smarthost and not directly from a client machine. Therefore, any
client machine which opens an SMTP connection to me must be a spam sender".

It's got a significant false-positive rate of course (as you've discovered),
and all it does is force spammers with 0wned machines to locate and use
their upstream ISP's smtp server instead, which they're now starting to do.

That's the problem with the current ad-hoc mix-n-match approach. You try to
combine a bunch of techniques which, today, may indicate the possibility of
spamminess. The more viciously you apply these, the more you break
legitimate mail, and the more the spammers adapt.

Take SPF: research apparently shows that more spam is SPF-compliant than
non-spam. So if you want to use SPF as part of your spam-filtering policy,
you should bias SPF-compliant sources as more likely to be spam :-)
Post by James Craig Burley
Post by Brian Candler
* It's an interesting point that the real criteria as to whether spam is
wanted or not is based on the content. However, I think you're imagining a
future world where a significant portion of bulk E-mail is sent from
legitimate businesses that you might actually want to deal with.
Not really, because, my "future world" includes a vast number of
smaller, lighter-weight businesses (and other organizations, as well
as people), from which I *might* indeed want to receive information on
deals, offers, alerts (e.g. from churches, charities), and so on.
The infrastructure allows that kind of thing, except *I* don't have
time to sort through it all myself, without some kind of technological
assistance.
But is it reasonable to *require* anyone with an E-mail address to accept
arbitrary amounts of junk and to have to put in place the infrastructure to
filter it themselves? Doesn't it make more sense for the person who is
interested in particular deals, offers, alerts etc. to go to a broker who
collates these things, and pull them down on demand?

Despite filtering, I occasionally get spams from Taiwanese companies who
sell earth-moving equipment or build ships or whatever, and are looking for
distributors in the West. Assume for the moment that these are legitimate,
and not fronts for scams. Optimistically, one in ten thousand recipients
might actually work for a wholesaler/distributor who would be interested in
importing these things.

At best, the other 99.99% is wasted bandwidth, storage and/or filtering
effort. Why should these people be able to *push* this content to me, and
therefore force me to filter it?

Let's say effective filtering is available, at a price. Let's say 50% of
Internet users have this effective filtering (they buy it and install it on
their own PCs, or their ISP buys it and installs it for them). What are the
other 50% of users supposed to do?
Post by James Craig Burley
Right. Remember, the most "obvious" way to know whether something is
UBE, or at least BE, is the fact that it's sent in *bulk* to many
recipients.
Therefore, recipients sharing information that "profiles" sources
and/or topics of emails can discover, among themselves, that certain
sources and/or topics are so widely involved in email, within a given
period, that they indicate bulk. Then it's up to each recipient to
decide, based on the source, the degree to which it's unsolicited.
The trouble with distributed-trust systems is that you have to apply a level
of trust to the value judgements made by other people, but don't have the
resources to evaluate each of those other people. In other words, a spammer
can easily infiltrate such a system by sending in false positive reports -
"I just received this mail, and it was just what I wanted".
Post by James Craig Burley
My concern, here, is that we keep in mind that making email systems
work *better* means, generally, we'll get even more email
Well, perhaps that's just the same problem restated differently.

If the postal mail system were to work better - if letters were delivered in
1 hour instead of 1 or 2 days - would that increase the amount of postal
mail I receive? Perhaps, because some things would be feasible to do by
postal mail which are not feasible now (e.g. some things which are done by
phone or fax now might be done by post instead).

But would that automatically increase the total amount of information which
I receive and have to process, post and non-post? I don't think so.

I think this only applies because spam is so cheap (and because spam is hard
to trace, although this applies to postal mail too)
Post by James Craig Burley
So, I find myself forced to imagine a more intelligent MUA that helps
users sift through their incoming email based on more personal
criteria. Such an MUA would obviously appreciate having more
information upon which to draw -- including message content -- in
order to prepare its presentation to its end user. Accordingly, it
would want itself, and other MUAs, to be able to share information by
propagating it upstream and to fellow MUAs, helping all of them detect
bulk email, etc.
As I say, I don't think this can work except in small communities of users
who know and trust each other. It could work if there is a trusted third
party which all the users in turn trust to make value judgements on their
mail. This is essentially what DCC and spamcop do now.
Post by James Craig Burley
Post by Brian Candler
Therefore, the fact that it's received as UBE *automatically* classifies it
as uninteresting to me, regardless of content. And so it should for everyone
else; unfortunately, even with a hit rate of 1 in 1000 or less, there are
still enough born every minute to make it worth the fraudulent spammer's
time. The other 999 spams cause huge annoyance, but the law enforcement
agencies at present seem completely disinterested in tracking down these
fraudsters - even when the spams include postal or telephone contact points
which would be easy to trace or sting.
When you say "received as UBE", I assume you mean from a known
*source* of UBE, that is, not based on content analysis.
My incoming mail currently comes via some RBL blacklists (spam sources),
clamav (known bad content) and SpamAssassin on a friend's system.

You're right, if I am manually filtering messages in my inbox, then I
recognise spam by its content, its "spamminess". But I don't think that
automated systems are too good at this. Automated systems can analyse the
source and establish that it's bulk sending; humans are needed to judge
whether the bulk sending is justified (e.g. opt-in mailing lists, business
transactions) or attempts at fraud. This would be based partly on content,
and partly on exterior factors such as which IP address it comes from, which
in turn can be cross-referenced to forward and reverse DNS, registry IP
allocations, and such like.
Post by James Craig Burley
So, since SMTP already provides means to block incoming content from
known UBE sources
Partly. IP-based RBLs are pretty effective, but not 100%.

The main issue is that IP addresses are shared. If you receive a mail from
smtp.example.net, it may be a spammer blatting content through it, or it
could be a legitimate user sending a legitimate mail.

SMTP smarthosts almost always require no authentication other than the
source IP address being in a known range. Combined with dynamic IP
allocation, that means it's very hard for an SMTP server to know which
account is being used to send messages, and to limit messages sent from that
account.

To make these systems properly effective you'd need to force senders to use
SMTP AUTH when connecting.
Post by James Craig Burley
And since both allow a receiving system to "tarpit" a known sender of
UBE in some fashion, driving up her costs even just a bit, I cancel
*that* out.
It *can* be done with SMTP, but in general it isn't.

Why not? Because (a) it's reasonably difficult, and (b) almost all the
benefit doesn't accrue to the person doing the work, but to the rest of the
Internet.

Even if you could persuade 50% of all ISPs to do this work, the spam problem
would be reduced by (at most) 50%.

Solutions won't be applied unless the benefit accrues immediately to the
implementor. This is why people are implementing things like SES. (If you
implement SES, then you are immune to 'joe job' bounces, and it's *your*
users who benefit)
Post by James Craig Burley
Consider bounces. They made sense in a tangible-delivery context,
because if you send Aunt Emma a package containing pictures of your
newborn, and it can't be delivered for some reason, you *really* need
those pictures to be shipped back to you by the delivery agent. That
agent simply dropping them into the trash isn't really a legitimate
option.
So, it also made sense, to the designers of SMTP and some of its
predecessors, that "electronic" mail was simply just mail done
electronically. As there was no tracking of tangible deliveries then,
they provided none in the electronic version. As the responsibility
for delivering tangible mail naturally has to tag along with the
content of that mail (and, pretty much or closely at least, with the
notification that the mail is available), SMTP was endowed with the
notion that responsibility for message delivery accompanied message
content, and included responsibility for subsequent notification of
delivery failure.
Sure, and it worked just fine too. However it gave too much trust to
untrusted parties: when accepting the mail, you trust that the return
address they give is correct. Giving the wrong return address just causes a
nuisance in the event that spam is undeliverable; it also puts (some) people
on the wrong track when trying to report a spam incident, because they
(wrongly) assume that the return path has some validity, which it does not
for spam.

Note that SMTP does not *require* the entire message to be returned in the
event of failure; just a notification that it failed.
Post by James Craig Burley
Post by Brian Candler
- Bulk mail, untrusted sender
=> probably fraudulent (= chain letters)
- Bulk mail, classified as legitimate advertising by trusted third party,
classified by subject area
=> can choose to reject based on the fact it's advertising (= MPS)
=> can choose to accept or reject based on subject area (= PPS)
Such systems would work because you are no longer having to assess a level of
trust in potentially thousands of different organisations or individuals
trying to contact you each day: trust is reduced to a handful of agencies who
sign and classify 'legitimate' advertising.
I'm still having trouble understanding how IM2000 makes this work
*that* much better than SMTP.
You get an undeniable sender identity. And because the message is not
mangled by SMTP hops, you have a better chance of taking a reliable hash of
the message content, and (if you wish) of cryptographically signing it.
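
(A trivial sketch of the hashing half of that, assuming the recipient
fetches exactly the bytes the sender deposited in the store:)

    # Sketch: hash the message exactly as stored, so sender and recipient
    # can compare digests, or the sender can sign the digest.
    import hashlib

    def message_digest(raw_message_bytes):
        # No intermediate MTA has rewritten headers or re-encoded the body,
        # so this digest is stable end to end.
        return hashlib.sha256(raw_message_bytes).hexdigest()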
Post by James Craig Burley
That definitely has collateral damage. I don't use any traditional
RBLs, though I do block email with envelope senders of known spammers
But that's stupid, because the envelope sender cannot be trusted!

One of my friends works at a UK government agency. Their system is set up
to reject all mail sent from addresses @pobox.com (and hence mine), saying
that this is a "known source of spam".

In fact, pobox.com has *very* tight policies for its customer base. But that
doesn't stop someone else sending out spams which appear to be from
***@pobox.com. And therefore, the person who implemented this "anti-spam"
system decided (stupidly) to block mail from pobox.com.

Such a blacklist *could* work with IM2000 though.

It *could* also work with SMTP AUTH and a whitelist of trusted SMTP sources
(that is, those who are known to implement SMTP AUTH and truthfully pass on
the authenticated sender information when relaying messages)

But there's too much inertia to implement this.
Post by James Craig Burley
based on a central repository run by someone who uses whois-style
lookups to confirm that domain names and/or email addresses are indeed
"owned" by known spammers (so joe jobs do not result in innocents
being listed).
That's roughly what SPF tries to do.
Post by James Craig Burley
But I really don't end up *seeing* lots of spam, because my primitive
MTA (funnily patched qmail) dispenses with most of it via a
combination of very trivial tactics -- giving a multi-line greeting in
most cases, tarpitting multiple RCPTs, and a few other very simple
content-based things done on the fly -- which somehow conspire to
rebuke most older spam software
But your definition of spam is "sent by a program which does not correctly
implement Internet RFCs". However, this means you'll refuse to accept mail
from "good" mail sources which also happen not to implement the RFCs in the
expected way. And besides, if your methods really are effective and lots of
people start to implement them, then the spammers will just "correct" their
code.

You are just helping to force spammers to write better software!

The same happened to checking domains on MAIL FROM. Originally, spammers
used to send mail with MAIL FROM:<***@xyz123> or other made-up addresses.
People started to validate the domain portion, since this was quite a good
indication of spam. As a result, all spam is being sent with valid return
addresses of innocent people (spammers have nice big lists of E-mail
addresses that they can pick these from, of course). So as a result, the
problem is worse:
* we now have "joe jobs" to contend with
* plus, users who have accidentally misconfigured MUAs, which used to be
able to send us mail, are no longer able to do so
Post by James Craig Burley
So, there *has* to be a widely supported means for any software
running on a home PC to send an arbitrary email "upstream" to an MTA
(SMTP server, IM2000 message store, whatever) that is more trusted and
more assuredly connected (that is, a real server, not a home PC that
can and might well be turned off as soon as a message is "sent") than
the home PC or a Blackberry.
That means 0wned machines will send such emails via that same method
anyway. IM2000 really can't improve on this at all, that I can see,
in terms of the volume of such messages reaching a recipient's inbox.
Well, they can easily count the number of messages sent via a particular
message store in a day, and apply a limit. This is harder to do with SMTP
relays.
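
Something as simple as this would do, as a sketch (the limit and names
are made up):

    # Sketch: a message store counting submissions per account per day
    # and refusing further ones once a limit is reached.
    from collections import defaultdict
    from datetime import date

    DAILY_LIMIT = 500
    counts = defaultdict(int)          # (account, day) -> messages accepted

    def accept_submission(account):
        key = (account, date.today())
        if counts[key] >= DAILY_LIMIT:
            return False               # tell the submitter to try again tomorrow
        counts[key] += 1
        return True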
Post by James Craig Burley
Sysadmins are always asking questions (on the qmail list anyway) about
how to get finer-grained control over lifetime of emails in their
outgoing queues, as well as over incoming message sizes, and the like.
Sysadmins can do whatever they like with messages on the systems they manage
- if their software is flexible enough. qmail is pretty poor in that regard.
If you haven't seen Exim, then you need to :-)
Post by James Craig Burley
IM2000, being a "new" protocol, can enable such things out of the box
with less muss and fuss. The question is, does that offset the
*penalties* that go along with IM2000 being a "new" protocol (in terms
of adoption expense, etc.)?
I think a replacement protocol will only be implemented if it gives
immediate benefits to the implementors themselves. I can see those as being:

- company and departmental mailservers
- people who communicate with small trusted user groups (e.g. family)

Those users would benefit from having more trust in the mail they receive
via the new protocol, and fewer filtering false-positives, whilst still
having gateways for the old protocols when talking to people who have not
migrated.
Post by James Craig Burley
What I mean is that, in a greylisting-type system, it is assumed (and,
often, correctly so) that a second or third attempt to send an email
lowers the probability that it's UBE.
Only because spam senders today don't bother to perform retries. If it
becomes worth their while to do so, they most definitely will. It is, after
all, not difficult.
Post by James Craig Burley
I see email as representing a subset of a continuum here, though. If
I'm maintaining a software package, incoming patches and bug reports
are really a lot like email messages
Oh indeed. For a lot of people, being able to receive mail from the whole
world is precisely the point.
Post by James Craig Burley
Though I have yet to delve into the world of wikis, I *think* they are
kinda like what I mean here. It's the maintainer of the wiki'd site
who, theoretically, wants to receive arbitrary updates regarding the
content of the site from arbitrary people, and that's an email-like
model
It's more of a bulletin board model, based on trust. Anybody can update the
wiki, without reference to the admin. If it is defaced, the admin (or anyone
else) can put it back to rights.

Wikis are getting defaced a lot at the moment, because of link-counting
search engines. Wiki-spammers are putting links to their own sites on every
Wiki they can find, just to move their sites up in the search engine
rankings. Vandalism... but despite it, some wikis still seem to survive.

Regards,

Brian.
Brian Candler
2005-04-14 19:01:53 UTC
Permalink
Post by Brian Candler
Post by James Craig Burley
Heh..."FUSSP"?
http://www.fussp.org/
Erm, not a good link. Try:
http://www.rhyolite.com/anti-spam/you-might-be.html
http://www.ahbl.org/funny/response1.php

:-)
James Craig Burley
2005-04-14 21:02:59 UTC
Permalink
Post by Brian Candler
Post by Brian Candler
Post by James Craig Burley
Heh..."FUSSP"?
http://www.fussp.org/
http://www.rhyolite.com/anti-spam/you-might-be.html
Definitely better -- seen it before, but it looks different than I
remember. (Wondered whether the other one was *really* what it seemed
to be anyway, since I didn't actually see the acronym spelled out on
that page.)

I worry that IM2000 is a case of spammers-are-stupid-3.
Post by Brian Candler
http://www.ahbl.org/funny/response1.php
Also a longtime favorite.
--
James Craig Burley
Software Craftsperson
<http://www.jcb-sc.com>
James Craig Burley
2005-04-14 21:34:26 UTC
Permalink
(Some quick clarifications, as I really need to give a full response
more thought.)
Post by Brian Candler
Post by James Craig Burley
That definitely has collateral damage. I don't use any traditional
RBLs, though I do block email with envelope senders of known spammers
But that's stupid, because the envelope sender cannot be trusted!
I didn't explain that very concisely, I guess. I block email with
envelope senders *known* to *belong* to spammers.

In that sense, pobox.com might be "0wned" in the sense of being
joe-jobbed, but so is my jcb-sc.com domain name. I don't block those.

The list I subscribe to is a list of domain names (and email
addresses, apparently) literally *owned* by spammers.

Strictly speaking, a domain name like ownedbyspammer.com can be
"forged" by someone else, but blocking that email is hardly a problem,
any more than recognizing that "HELO [*anything*.]ownedbyspammer.com"
in an SMTP session is sufficient to tell the SMTP server that *all*
email being submitted during that session is highly likely to be spam.
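
(The check itself is about as cheap as checks get; a sketch, with the
list file name made up and "ownedbyspammer.com" standing in for a real
entry:)

    # Sketch: test whether a HELO name or envelope-sender domain falls
    # under a domain literally owned by a known spammer.
    def load_blocklist(path="spammer-owned-domains.txt"):
        with open(path) as f:
            return {line.strip().lower() for line in f if line.strip()}

    def owned_by_spammer(name, blocklist):
        # Match the domain itself and any parent domain short of the TLD,
        # so "mail.ownedbyspammer.com" matches an entry "ownedbyspammer.com".
        parts = name.lower().rstrip(".").split("@")[-1].split(".")
        return any(".".join(parts[i:]) in blocklist
                   for i in range(len(parts) - 1))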
Post by Brian Candler
Post by James Craig Burley
But I really don't end up *seeing* lots of spam, because my primitive
MTA (funnily patched qmail) dispenses with most of it via a
combination of very trivial tactics -- giving a multi-line greeting in
most cases, tarpitting multiple RCPTs, and a few other very simple
content-based things done on the fly -- which somehow conspire to
rebuke most older spam software
But your definition of spam is "sent by a program which does not correctly
implement Internet RFCs".
No no no no no! My definition of "spam" is everyone else's -- UBE.
It so happens that a substantial % of UBE-sending software can't
tolerate trivial variations in SMTP server responses, such as
multiline greetings.
Post by Brian Candler
However, this means you'll refuse to accept mail
from "good" mail sources which also happen not to implement the RFCs in the
expected way.
No I won't. *Their* clients will just have trouble transmitting email
to me. That's *their* problem -- I haven't "refused" it at all.
Post by Brian Candler
And besides, if your methods really are effective and lots of
people start to implement them, then the spammers will just "correct" their
code.
Well, of course. That raises the bar for them.
Post by Brian Candler
You are just helping to force spammers to write better software!
Please, ratchet down the rhetoric. I am not "helping" them in any
way, shape, or form.

I am just "lucky" in that maybe 99% of the spam heading into my server
is either a) misdirected, and thus bounced, b) obviously "owned" by a
known spammer, c) delivered by an SMTP client that exits upon seeing a
multiline greeting or HELO response, or d) delivered by an SMTP client
that gives up after more than a minute or so waiting for a RCPT TO
response.

All of the "fixes" spammers need to apply to jump those hurdles
represent additional cost, especially writing and deploying new
software.
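
(To be concrete, the kind of thing I mean amounts to little more than
this toy sketch -- not my actual qmail patches, and the host name is
made up:)

    # Toy sketch of the two cheapest tricks: a multi-line 220 greeting and
    # a deliberate pause before answering a RCPT TO.
    import time

    def greet_and_tarpit(conn):        # conn: an accepted TCP socket
        # Multi-line greeting: much older spamware gives up at the "220-" line.
        conn.sendall(b"220-mail.example.com ESMTP\r\n")
        conn.sendall(b"220 see our terms of use\r\n")
        # ... normal SMTP dialogue elided ...
        # On a suspicious RCPT TO, stall before answering; patient,
        # conforming MTAs wait it out, while most spamware gives up.
        time.sleep(70)
        conn.sendall(b"250 ok\r\n")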
Post by Brian Candler
Post by James Craig Burley
What I mean is that, in a greylisting-type system, it is assumed (and,
often, correctly so) that a second or third attempt to send an email
lowers the probability that it's UBE.
Only because spam senders today don't bother to perform retries. If it
becomes worth their while to do so, they most definitely will. It is, after
all, not difficult.
No, as is widely recognized. It's a temporary tactic, one which
raises the bar in terms of increasing the expense of sending UBE (and
the visibility that one is doing so), which is kinda the whole point
of IM2000, right?
--
James Craig Burley
Software Craftsperson
<http://www.jcb-sc.com>
Brian Candler
2005-04-15 13:49:24 UTC
Permalink
Post by James Craig Burley
I didn't explain that very concisely, I guess. I block email with
envelope senders *known* to *belong* to spammers.
Oh, so you mean the tiny proportion of spam which does not carry a forged
return-path. Clearly, the more people who implement this check, the higher
the incentive for spammers to forge their return addresses.

Maybe we need to think of adding some sub-classification to "spam". The vast
majority of the spam I see is from scumbags. They want to get their
fraudulent message into my inbox, and either get eyeballs on their website,
or get me to phone or fax them for the con-trick to begin. They don't care
if I can E-mail them back or not, and they don't care if any of their
millions of messages bounces. It's a positive boon if the E-mail is
difficult to track back to source (since they are almost certainly involved
in fraud anyway)

Hence the random return addresses, sending spam through 0wned bots, and so
forth.

There is a lesser category of spam: people who are engaging in legitimate
business, and who would quite happily be contacted by return of E-mail.
These people are probably also not too concerned about bounces, since when
you send to 10 million people you probably don't care that 1 million bounce
(and indeed, don't want to deal with the fallout), but they'll probably give
a real E-mail address in the message body.

These people don't use 0wned machines (which would also be criminal). Their
"crime" is simply a distasteful use of a broad barrage of E-mail to an
extremely large number of mailboxes, knowing full well that the vast
majority are not at all interested. They are happy to cause them annoyance
and waste their time, bandwidth and mailbox space.
Post by James Craig Burley
Post by Brian Candler
But your definition of spam is "sent by a program which does not correctly
implement Internet RFCs".
No no no no no! My definition of "spam" is everyone else's -- UBE.
It so happens that a substantial % of UBE-sending software can't
tolerate trivial variations in SMTP server responses, such as
multiline greetings.
And neither can a certain percentage of legitimate E-mail sources. Maybe your trap
catches more spam than non-spam - today. But it will have false positives
today, and as the spammers change, it will get worse.

I think what you were saying before is that it's the *content* which
distinguishes a spam from a non-spam; I won't argue too strongly against
that. Certainly, a human being paid to delete spam from your inbox would do
a very good job, just by examining each message.

But for any sort of automated spam control to work, the only other useful
definition of spam I can see is "stuff that is sent by spammers".

If you were able to block all communication from spammers, you'd block all
spam (by definition). You'd also block all non-spam sent by spammers, but
since they are scum, most people don't care. You might care if your job is
to run an abuse-tracking service, and you need to be able to communicate
with spammers.

You'd block all non-spam sent by machines which have been hacked into by
spammers. That's an unfortunate consequence, but hacked machines really
shouldn't be on the network in the first place.

For me this means: blacklists of spammers' accounts (and machines controlled
by spammers).

IP-based RBLs work surprisingly well. If you combine a blacklist of IP
blocks owned by spammers (e.g. spamhaus), one of open relays and proxies,
and a dynamic one which reacts in real time to new spam sources (e.g.
spamcop), they work pretty well even in the current world. The problems are
to do with identifying new sources quickly enough, and more importantly the
collateral damage when you list an IP address which belongs to a mailserver
shared between spammers and legitimate users.
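
The mechanics are simple enough to sketch: an address that is listed
in a zone resolves, an unlisted one doesn't (the zone names below are
just the well-known ones mentioned above):

    # Sketch: check a connecting IP against a couple of DNS blocklists.
    import socket

    ZONES = ["zen.spamhaus.org", "bl.spamcop.net"]

    def listed_in(ip):
        reversed_ip = ".".join(reversed(ip.split(".")))
        for zone in ZONES:
            try:
                socket.gethostbyname("%s.%s" % (reversed_ip, zone))
                return zone            # listed in this zone
            except socket.gaierror:
                continue               # not listed here; try the next one
        return None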

To avoid that damage, you need an indication with each message of the
account identity used to submit it. This could be done with SMTP AUTH; but
inertia means it won't happen. It is an inherent capability of IM2000.
Post by James Craig Burley
Post by Brian Candler
However, this means you'll refuse to accept mail
from "good" mail sources which also happen not to implement the RFCs in the
expected way.
No I won't. *Their* clients will just have trouble transmitting email
to me. That's *their* problem -- I haven't "refused" it at all.
I fail to see the difference between causing a problem when someone tries to
send you a mail (such that it cannot be delivered), and refusing to accept
the mail.

Most people are *users* of E-mail software. They do not have the technical
ability to *fix* the E-mail software that they use.
Post by James Craig Burley
All of the "fixes" spammers need to apply to jump those hurdles
represent additional cost, especially writing and deploying new
software.
They *will* happen, and soon (i.e. within months at most), as surely as the
widespread implementation of MAIL FROM domain checks made spammers change to
sending out mail with real (but forged) E-mail addresses as the envelope
sender. The cost is minimal.
Post by James Craig Burley
No, as is widely recognized. It's a temporary tactic, one which
raises the bar in terms of increasing the expense of sending UBE (and
the visibility that one is doing so), which is kinda the whole point
of IM2000, right?
No.

I don't support any change in the E-mail system that we use today, which can
be bypassed just by the spammer changing tactic. Any solution has to be
*strong*. Otherwise, it's a total waste of time.

And incidentally, being on this list doesn't mean I think we *should* all
switch to IM2000. I'm still thinking about it :-)

It's clear that IM2000 still allows spam to be sent; ISTM however that *any*
concept of a 'mailbox' which arbitrary people are allowed to drop mail
into, will also support the delivery of spam.

At the moment, I think the best we can achieve is the near-instant
"cancellation" of accounts being used to send spam; and the blacklisting of
IP sources which allow unrestricted creation of new accounts, or unlimited
sending of messages from a single account, or are entirely controlled by
spammers.

It has to be reactive like this, since spammers can create new on-line
identities at will.

IM2000 suits this sort of cancellation pretty well, and the level of changes
required across the Internet to achieve the same thing with SMTP would be of
similar order of magnitude.

Regards,

Brian.
James Craig Burley
2005-04-15 22:49:39 UTC
Permalink
Post by Brian Candler
Post by James Craig Burley
I didn't explain that very concisely, I guess. I block email with
envelope senders *known* to *belong* to spammers.
Oh, so you mean the tiny proportion of spam which does not carry a forged
return-path. Clearly, the more people who implement this check, the higher
the incentive for spammers to forge their return addresses.
Well, of course. As I said, I don't have any data on how much of the
UBE I *don't* get is blocked by this mechanism. I've used it for only
a few months, and doubt it is having all *that* much effect. But it
was fairly easy to set up, and I use it mainly out of curiosity: it
followed naturally from my stumbling on the site that provides the
data, after receiving lots of spam from one domain which, when I
googled it, turned out to be owned by a known spammer and listed in
this data base (which was how I found the site)!

I do think SPF and/or DomainKeys -- basically, any scheme designed to
reduce forgery across the board -- might make this kind of data base
more useful, and perhaps even necessary.

So this data base, which is trivially easy and cheap for me to
download every 24 hours and crunch into a qmail badmailfrom file
(largish, but no apparent need to cdb-ize it yet), makes it fairly
cheap and easy to sorta "leech" off those who are making SPF and
DomainKeys happen -- the sites that are incentivizing spammers to
*not* forge envelope-sender addresses, so they can control their own
SPF and DK records.

Once a data base like this is available, it helps not only with
envelope senders, but with HELO hostnames, and with any URLs found in
untrusted email content.

Of course, it "works" only insofar as one assumes *any* content served
via a domain owned by a "known spammer" is undesirable. The usual
caveats therefore apply ("I'm not a known spammer!", "How do I get off
this list??", "slashdot.org is a known spammer, it keeps sending me
summaries of articles I never requested!", etc.).
Post by Brian Candler
Maybe we need to think of adding some sub-classification to "spam". The vast
majority of the spam I see is from scumbags. They want to get their
fraudulent message into my inbox, and either get eyeballs on their website,
What website? The one using the domain name they own, the one they
point to via a static IP, or the one that belongs to somebody else but
whose HTTP server they've somehow 0wned?

This part of our discussion is *tightly* related to the key "feature"
of IM2000 over SMTP.

IM2000 assumes a message (notification) recipient *will* have to
contact some entity chosen by, and acting on behalf of, the sender.
That could be a bare IP address, though I notice some IM2000
proponents suggest "outlawing" that in the design (I don't see why one
should bother, since any arbitrary domain name can be easily pointed
to any IP address, but nevermind that).

The main utility of this distinction is that a sender must therefore
have some kind of "server presence" acting on his behalf, probably in
the form of a domain name. (To me, that's just another one or two
points of failure in the email system, but I'll continue....)

Problem is, there are an infinite number of domain names out there
(versus only about 4 billion IPv4 addresses), and they are not all
under control of one genial central authority that can quickly and
reliably determine who is and isn't a spammer.

So, there'll need to be data bases of "trusted" and, certainly,
"spammer-owned" domain names available for IM2000 MUAs to cross-check
against in order to know to not bother pulling down (or unpinning) any
message content tied to message notifications from problematic domain
names.

As it happens, the data base I'm accessing is an example of exactly
that. I don't know how long it'll last, or how useful it really is,
but as someone is reliably providing it for (basically) free and it
seems to be well-maintained -- my cron job emails me diffs every 24
hours, so I can spot obvious problems -- I don't see the problem with
using it.

Put another way: if you *aren't* using such a data base to block email
sent by known spammers, in an IM2000 world, you *will* be using it to
do exactly that.
Post by Brian Candler
or get me to phone or fax them for the con-trick to begin. They don't care
if I can E-mail them back or not, and they don't care if any of their
millions of messages bounces. It's a positive boon if the E-mail is
difficult to track back to source (since they are almost certainly involved
in fraud anyway)
The biggest problem with this line of thought, as illustrated by my
expanding on your mention of web sites, is that spammers for
*products* available online (versus, say, Nigerian spam) are highly
incentivized to provide easily reachable contact points in their
spams.

So, spammers obviously trying to dodge content filters, throwing tons
of arbitrary spaces and weird characters into what looks, on my screen
running my primitive MUA, almost like a pointillistic painting,
nevertheless insist on including a URL like http://www.whatever.ru, a
URL that is easily detected as the message comes in via SMTP and its
domain name looked up in the same badmailfrom file (should I bother to
patch my SMTP server accordingly).
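
(That lookup is nearly free; a sketch, with the regex and the
blocklist -- the same sort of spammer-owned-domain set as the
badmailfrom file -- purely illustrative:)

    # Sketch: pull http URLs out of incoming message text and flag any
    # whose domain sits on the spammer-owned list.
    import re

    URL_RE = re.compile(r'https?://([A-Za-z0-9.-]+)', re.IGNORECASE)

    def spamvertised_hosts(body_text, blocklist):
        flagged = []
        for host in URL_RE.findall(body_text):
            parts = host.lower().rstrip(".").split(".")
            if any(".".join(parts[i:]) in blocklist
                   for i in range(len(parts) - 1)):
                flagged.append(host)
        return flagged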

Now, to the extent there has to be enough people to be duped into
contacting the spammers, the spam will make such contact easy.

Two obvious ways to do that: include a URL that is
machine-recognizable and -readable; and use a "real" domain name for
the envelope sender.

Both approaches yield a contact address that any 'bot can easily
compare to a data base of IP addresses and domain names known to be
*owned* -- not just forged -- by spammers.

So this takes a fairly easy shot at part of the problem. What doesn't
it handle?

- It doesn't scale up well, because of the sheer number of domain
names that can easily be obtained worldwide.

But improvements to DNS and domain-registration data base technology
can mitigate this somewhat, perhaps entirely. I.e. it *should* be
easy, given any domain name, to do non-human-assisted recursive
lookups on the owner of that domain name and/or its various
sorts of hosts (registrant, ISP, and so on) until a sufficiently
certain threshold of clarity ("trusted", meaning if spam is
received it can be reported and will be acted upon, or
"spammer-owned", or "untrusted") is reached.

- It doesn't detect spam sent with forged or useless envelope senders
*and* URLs in the email, but which might contain alternate
representations of contact info that machine readers can't pick up
(the classic "visit this URL [which is actually represented by a
JPEG or GIF]" case).

- It doesn't detect spam sent with forged envelope senders and/or URLs
that point to hijacked sites -- those not, strictly speaking,
"owned" by spammers, but sufficiently 0wned such that their own
*servers* are serving spam, exporting SPF/DK records, serving as
IM2000 mail stores, and so on.

- It doesn't detect spam sent with entirely legitimate envelope
senders and/or URLs, whose owners simply decide, on occasion, to
send UBE to which some of their recipients object as being spam,
despite normally being a major source of legit email. (Imagine if
AOL occasionally sent adverts for its services to all known non-AOL
users on the Internet, from ***@aol.com. Nothing
*technical* can stop this, if the message is designed to get past
the content filters of the day, because nobody can really block all
email from aol.com and still be said to be "using" email. Only
AI-like content analysis can help users avoid seeing email like
this, at least as high-priority email.)
Post by Brian Candler
Post by James Craig Burley
Post by Brian Candler
But your definition of spam is "sent by a program which does not correctly
implement Internet RFCs".
No no no no no! My definition of "spam" is everyone else's -- UBE.
It so happens that a substantial % of UBE-sending software can't
tolerate trivial variations in SMTP server responses, such as
multiline greetings.
And so does another percentage of legitimate E-mail sources. Maybe your trap
catches more spam than non-spam - today. But it will have false positives
today, and as the spammers change, it will get worse.
I don't believe I've had any false positives of which I've ever become
aware, as a result of my running an SMTP server that requires a bit
more conformance on the part of a client, but I don't have a broad
range of sources of legitimate email these days.

But, you're right, as the spammers adapt, more spam will get past
these sorts of measures.

Given the trivial cost of implementing them, and the fact that they're
obviously still blocking a ton of spam for my domain, they've been
*totally* worth it.

I agree this sort of tactic is hardly a reason to *not* roll out
IM2000.

It does raise the issue of, how serious is the spam problem *now*,
*really*, such that we contemplate moving to a whole new system that
will still apparently require many of the same "augmentations" (RBLs,
whitelists, blacklists, etc.) that we are already using with SMTP in
order for IM2000 to actually stop spam?
Post by Brian Candler
I think what you were saying before is that it's the *content* which
distinguishes a spam from a non-spam; I won't argue too strongly against
that. Certainly, a human being paid to delete spam from your inbox would do
a very good job, just by examining each message.
But for any sort of automated spam control to work, the only other useful
definition of spam I can see is "stuff that is sent by spammers".
I think those two paragraphs really make the point. Until and unless
we have adequate "help" sifting through email such that *only* the
content (and its purported sender) is required as input, we *will*
need to continue coordinating and communicating information about
spammers, such as identification, tactics, etc.

So, let's say that content-only analysis is capable of solving N% of
the spam problem. I don't care what you think N is -- anywhere from 0
to 99.999 is okay.

That N% being handled by content analysis, and content analysis not
giving a rat's behind about whether the content arrived via SMTP,
IM2000, HTTP, FTP, or carrier pigeon, it ceases to be an issue in any
debate over *transport* technology.

So, we're now focusing on the remaining portion of spam -- 100% minus
N%, which we'll now treat as if it's 100% of the existing spam that is
*potentially* stopped by measures other than pure content analysis.

Now let's say that the sort of coordination and communication needed
to help everyone defined "spammers" in the sentence you use, "stuff
that is sent by spammers" -- constitutes X% of the remaining spam.

That is, we can theoretically eliminate X% of spam if we know exactly
*who* spams and *reliably* identify any incoming email (via SMTP or
IM2000) as coming from a known spammer.
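
(To put throwaway numbers on it -- purely illustrative, substitute
your own beliefs for N and X:)

    N = 90.0                          # % of all spam stopped by content analysis alone
    X = 50.0                          # % of the *remainder* stopped by knowing the spammers
    remainder = 100.0 - N             # 10% of all spam left over
    identity_share = remainder * X / 100.0   # another 5% of all spam
    total_stopped = N + identity_share        # 95% overall, in this example

In that example, knowing exactly who the spammers are buys you another
5% overall -- worthwhile, but only if it comes cheap.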

Now, the pertinent question becomes, does requiring a sender to
provide a message store make *that* much difference, in terms of our
ability to reach the theoretical maximum for X, compared to what SMTP
is evolving to, in terms of putting practical requirements on senders
to inject a message from a source that is not immediately identifiable
as a known source of spam?

As long as spammers can 0wn machines that serve as message stores for
their spam, and own domain names that point to those stores, or simply
provide IP addresses pointing to them in their message notifications,
it seems obvious, to me, that the answer is "no".
Post by Brian Candler
If you were able to block all communication from spammers, you'd block all
spam (by definition). You'd also block all non-spam sent by spammers, but
since they are scum, most people don't care. You might care if your job is
to run an abuse-tracking service, and you need to be able to communicate
with spammers.
Well, you are already assuming that it is necessary, or nearly so, to
block *all* email sent by known spammers (that is, from machines they
own, machines they 0wn, and domain names they own or 0wn), else
blocking based solely on content analysis would be sufficient, and
everything else would be purely a question of efficient utilization of
resources between sender and recipient running the content analysis.

I'm saying that, from your point of view, you're right, but IM2000
really doesn't seem to offer much of a bang for the huge buck$ it'll
take to roll out as a *replacement* for SMTP. (Would IM2000 have been
a better "starting point" for us, 20-whatever years ago, than SMTP? I
think not. How about SMTP with tracking instead of bounces? Perhaps.
SMTP with no assumed handing off of responsibility for a message along
with its content? Almost certainly.)

Whereas, from *my* point of view, maybe let's take another look at the
content-analysis side of things and take inspiration from the
fundamental IM2000 theory of using the bulk nature of UBE against the
spammer. Combine information on the apparent persistence of a sender
in notifying/resending a given email (which might tend to decrease a
"likely spam score") with information on the apparent breadth of
recipients of similar emails (which might tend to increase that
score), and with whatever content analysis can be efficiently done at
that point in time; then provide the end user with a display
summarizing *all* available email -- including likely spam -- with the
latter simply identified, as a collection, as such.
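
(A back-of-the-envelope version of that combination, with the weights
invented purely for illustration:)

    # Sketch: combine content analysis with persistence and breadth signals.
    def likely_spam_score(content_score, delivery_attempts, similar_copies_seen):
        # content_score: 0.0 (clean) .. 1.0 (spammy), from whatever analysis exists
        score = content_score
        score -= 0.05 * min(delivery_attempts, 5)    # persistence argues "not bulk junk"
        score += 0.10 * min(similar_copies_seen, 5)  # breadth argues "bulk"
        return max(0.0, min(1.0, score))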

How does this help "stop" spam? It doesn't. But it makes spam much
more expensive in order to penetrate the "market" to the same extent.
(Spammers have to keep issuing IM2000-style notifications or SMTP
deliveries to overcome ecrulisting. That's a lot of work for
bulk-email senders, and it'll attract lots more attention from
upstream providers, etc.)

And it makes recipients' behaviors much more opaque to the spammer,
because they rarely actually see their spam get *rejected* -- it
merely sorta "hangs around" and then maybe disappears, gets "bounced",
gets flagged as "finally read" by a user who knew to simply skim
through it quickly as part of a "lemme read my big chunk 'o' waiting
spam now" exercise, or whatever.

And, especially with naive end users, the best thing you can do for
them, to convince them that "offers" are really scams and/or spams, is
to present those emails on their screens not intermingled with
otherwise-legitimate ones, but grouped together in a way that tells
them "this appears to be junk, and I [the content-analysis bot] have
put all the 419-ish stuff in one chunk, all the
enlarge-your-weiner-ish stuff in another, and all the notifications
that you've just won a lottery in yet another".

After all, even the stereotypical grandma on the Internet for the
first time is much more likely to yawn after reading the 5th 419-type
scam in a 60-second period, starting when she saw her first-ever such
scam. Then it'll be easier for her to just click the "yeah, I get it,
this sort of thing is junk" button, which might automatically tell her
friends' MUAs, propagate the information upstream to conserve
resources (but still try to hide as much info as possible from
spammers), and so on.

How can spammers respond to this? Well, they'll have to work harder
to *constantly* distinguish their spam, as well as their sources, from
other spam, because they'll be in even more direct competition with
each other for priority on as many end users' MUA screens as possible
and for *unique* positions on those screens. 1000 different people
spamming the entire Internet with ads for Cialis? They'll have to
*each* get pretty creative in their prose-writing capabilities! (The
best will presumably move on to writing political speeches and other
more "legit" jobs.)

Generally, my point of view is, anything we can do cheaply that
requires spammers to spend, by comparison, more time and money, is
worth looking at.

I don't think IM2000 is necessarily cheap enough compared to what
it'll require of spammers.

I do think IM2000 has the kernel of one of several Big Ideas that can
make ubiquitous spam slowly dissipate over time, as it becomes less
profitable.

(And I think it's better for society, overall, to let the market teach
potential spammers not to get into the biz in the first place rather
than to find and imprison them. It's cheaper; the spammers are not
otherwise hardened criminals; and it's better if potential spammers
are nudged into more-productive activities than doing prison work. I
don't think spam is as corrosive a thing, in our society, as illegal
drugs, so I don't believe a War on Spam, in the usual sense, is
necessary or likely to be an efficient use of society's resources,
even though I'll admit I've possibly harbored personal fantasies about
punishments for spammers who are caught and captured. ;-)
Post by Brian Candler
You'd block all non-spam sent by machines which have been hacked into by
spammers. That's an unfortunate consequence, but hacked machines really
shouldn't be on the network in the first place.
Right; hacked machines are not unlike people with split personalities
(or whatever it's called), where one personality is reasonable, even
useful, another a used-car-salesman type who won't stop trying to get
you to buy something.

Content analysis, if sufficiently sophisticated, can make a
distinction in such a case; if not, probably the best approach is to
ignore almost *anything* the source says or does.

But more severe measures, well outside the scope of any SMTP/IM2000
discussion, are needed when a machine (or person) becomes outright
"violent" in the milieu (as in participating in a DOS or dDOS attack).
Post by Brian Candler
IP-based RBLs work surprisingly well. If you combine a blacklist of IP
blocks owned by spammers (e.g. spamhaus), one of open relays and proxies,
and a dynamic one which reacts in real time to new spam sources (e.g.
spamcop), they work pretty well even in the current world. The problems are
to do with identifying new sources quickly enough, and more importantly the
collateral damage when you list an IP address which belongs to a mailserver
shared between spammers and legitimate users.
To avoid that damage, you need an indication with each message of the
account identity used to submit it. This could be done with SMTP AUTH; but
inertia means it won't happen. It is an inherent capability of IM2000.
I am still trying to understand how inertia won't mean *IM2000* won't
happen. There just doesn't seem to be that much of a difference, in
terms of what *really* has to happen in the real world, between an
SMTP AUTH world and an IM2000 world.

Whether it's a trusted message store or a trusted SMTP server, those
messages gotta get in there somehow and therefore have to come from
what that store or server *itself* considers a "trusted" (therefore
authenticated) source.

Sure, I understand that there will be some who will enthusiastically
jump on the IM2000 adoption bandwagon because it's a cool new tech and
spammers won't be exploiting it yet.

But, you know what? It's a whole lot easier to just run another SMTP
server, basically whitelisting anyone who contacts it, on another
port, using slightly different SMTP command verbs, and provide free
clients to your friends for them to use when they contact you.

That'd give adopters of that approach the same sweet sense of joy that
early adoption of IM2000 would give, without having to set up any mail
stores or even engage in any advocacy.

And I don't think "early adopters" describes the mind-set of the heavy
hitters in the email industry overall.

(But my background includes decades of catering to Fortran
programmers, so I might be excessively biased against rosy assumptions
concerning uptake on new tech, having been burnt by them myself on
many occasions. ;-)
Post by Brian Candler
Post by James Craig Burley
Post by Brian Candler
However, this means you'll refuse to accept mail
from "good" mail sources which also happen not to implement the RFCs in the
expected way.
No I won't. *Their* clients will just have trouble transmitting email
to me. That's *their* problem -- I haven't "refused" it at all.
I fail to see the difference between causing a problem when someone tries to
send you a mail (such that it cannot be delivered), and refusing to accept
the mail.
I'm not "causing a problem". I'm providing a multiline greeting,
which happens to state terms for usage of the server in legal lingo.

The user's SMTP *client* is having a problem with that; that is
*their* problem, though they are free to contact me and be whitelisted
in such a way that they get a one-line greeting, if it's that
important to them.

*Many* SMTP servers on the Internet use these sorts of measures,
because they've found that maybe 99% or more of all SMTP client
connections that "fail" because of them are known sources of spam, and
it makes more sense to just let those connections fail than to keep
doing various expensive data base lookups to figure out who they are,
where they've moved to, etc.

Similar measures include rejecting: "presumptuous" clients, also known
as "early talkers"; clients that simply refuse to allow for more than
a one-minute delay in response to a command; and clients that treat
temporary rejections as permanent or as cause for retries so immediate
that the server can infer that the source is spam.

All these measures "break" known spam software. That they might
"break" a handful of narrowly distributed, but legitimate, SMTP
clients has always been understood. Some purveyors of such clients
have responded by fixing the bugs, which is a good thing, since most
of these measures merely stress logic that needed to be in better
working order anyway. ("Early talkers" are, simply put, dangerously
broken.)
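
To make the "early talker" check concrete, here's a minimal sketch in
Python (my own illustration, not anything from my production setup; the
hostname is a placeholder).  The whole trick is that a well-behaved
client must stay silent until the (possibly multiline) greeting is
complete:

import select

def greet_and_check(conn, pause=5.0):
    # Send the first line of a multiline 220 greeting, then pause.  A
    # legitimate SMTP client must wait for the final "220 " line before
    # sending anything; spamware often blasts HELO/MAIL immediately.
    conn.sendall(b"220-mail.example.invalid ESMTP\r\n")
    readable, _, _ = select.select([conn], [], [], pause)
    if readable:
        # Client spoke (or hung up) mid-greeting: drop the connection
        # without classifying any message at all.
        conn.close()
        return False
    conn.sendall(b"220 mail.example.invalid ESMTP ready\r\n")
    return True   # well-behaved so far; carry on with the SMTP session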
Post by Brian Candler
Most people are *users* of E-mail software. They do not have the technical
ability to *fix* the E-mail software that they use.
Such users do not typically run their own SMTP clients that connect
directly to MXes of destination hosts. Their smarthosts run them, or
their IT departments run them.

However, users of *spam-blasting* software often run their own
(broken) SMTP clients. Hey presto, foil them and you block spam,
without any heavy lifting or false positives *at all* (since "false
positive" actually means "incorrectly categorizing an incoming
connection/email as spam", and no incoming connection or email is
categorized at all -- the client merely finds it can't succeed at a
delivery, which it's supposed to handle properly, as that sort of
thing is going to happen sometimes anyway).
Post by Brian Candler
Post by James Craig Burley
All of the "fixes" spammers need to apply to jump those hurdles
represent additional cost, especially writing and deploying new
software.
They *will* happen, and soon (i.e. within months at most), as surely as the
widespread implementation of MAIL FROM domain checks made spammers change to
sending out mail with real (but forged) E-mail addresses as the envelope
sender. The cost is minimal.
Right, it's an arms race, we all know that already. None of what you
have said on that topic is news to me; I doubt it's news to anyone
else here.
Post by Brian Candler
Post by James Craig Burley
No, as is widely recognized. It's a temporary tactic, one which
raises the bar in terms of increasing the expense of sending UBE (and
the visibility that one is doing so), which is kinda the whole point
of IM2000, right?
No.
Then what *is* the point of IM2000, when the various web pages
promoting it proclaim that it puts more of an economic burden on
*senders* than does SMTP?
Post by Brian Candler
I don't support any change in the E-mail system that we use today, which can
be bypassed just by the spammer changing tactic. Any solution has to be
*strong*. Otherwise, it's a total waste of time.
And incidentally, being on this list doesn't mean I think we *should* all
switch to IM2000. I'm still thinking about it :-)
It's clear that IM2000 still allows spam to be sent; ISTM however that *any*
concept of a 'mailbox' which arbitrary people are allowed to drop mail
into, will also support the delivery of spam.
At the moment, I think the best we can achieve is the near-instant
"cancellation" of accounts being used to send spam; and the blacklisting of
IP sources which allow unrestricted creation of new accounts, or unlimited
sending of messages from a single account, or are entirely controlled by
spammers.
It has to be reactive like this, since spammers can create new on-line
identities at will.
IM2000 suits this sort of cancellation pretty well, and the level of changes
required across the Internet to achieve the same thing with SMTP would be of
similar order of magnitude.
Are you including the cost of rolling out IM2000 as a *wholesale*
replacement for SMTP in your calculations? By "wholesale", I mean the
costs to get to the point where pretty much *nobody* is bothering to
accept SMTP email coming from arbitrary sources anymore, because
IM2000 is working so well, versus the alternative of just continuing
to use SMTP and focus those resources on doing other things.

I find it hard to believe you are, because those costs are *huge*. It
seems, to me, that it'll be much less expensive to incrementally
augment SMTP, as much of a bodge as it has become, in the same ways
we'd have to augment an otherwise-pure, and entirely theoretical,
IM2000 world, in order to achieve basically the same results.

Again, I think the best thing for any IM2000 enthusiast to do *now* is
*not* to whip up a prototype of IM2000, rather, to whip up a quick
SMTP implementation (MTA, MUA, plus other goo) that mimics an IM2000
"experience" fairly closely, so people can get a feel for how well
it'd work in the context of a system that *is* already experiencing
lots of spam.

Maybe we should try designing that prototype -- call it something like
IM1999, and if subsequent generations are needed, decrement to name
them IM1998, IM1997, etc. -- on another list?
--
James Craig Burley
Software Craftsperson
<http://www.jcb-sc.com>
James Craig Burley
2005-04-20 18:01:19 UTC
Permalink
(Sorry for the long delay...been very busy!)
Post by James Craig Burley
Well, of course. As I said, I don't have any data on how much of the
UBE I *don't* get is blocked by this mechanism.
You might have some indication in your mail logs, if you have a good MTA.
As I said, I haven't patched it accordingly. Silly me. I have a new
server on order, and have long put off doing software upgrades as the
current one is a Pentium II running RedHat 7.3, etc.
I would propose outlawing [IP addresses in mail-store pointers] too,
for reasons described in my document (section 3). Essentially if you
see an incoming mail notification saying to retrieve mail for
prone to phishing.
I'd prefer the *protocol* not disallow it, rather, that the protocol
allow a message-notification recipient to reject or ignore a
notification due to implementing a policy of not allowing IP addresses
in notifications.

Overall, I believe any *new* email system must be useful in intranet
setups as well as on the Internet itself, and therefore should be as
immune as feasible from, e.g., DNS outages, since email is itself used
to help admins diagnose such things. (It has sometimes been useful
for me to send emails from postmaster@[192.168.0.1] to
craig@[192.168.0.72] or similar.)

Accordingly, I believe it's unwise to require any running DNS (or
/etc/hosts-style domain->IP-addr mapping) for IM2000 to operate at a
basic level. (I realize this puts IM2000 at a disadvantage from the
very start, since a third party of some sort -- a message store -- is
required for it to work.)

Since anti-phishing measures necessarily must combat things IM2000
can't inherently combat, they will be implemented in the end-user's
MUA (if not the end-user's brain ;-) and, accordingly, can easily
include checking for suspiciously-used IP addresses.

I say, put complexity where it belongs, and nowhere else.
It's a weakness of SPF that even if it were widely deployed, at very best it
would just validate domain names - and you can sign up for as many of those
as you wish.
Exactly.
So the designers propose some bizarre distributed reputation
system which means that new domains have to "earn" credit somehow. It seems
pointless to me when you might as well just look at the IP address being
used, which (unlike domain names) is a finite resource and can't be created
at will.
Bingo. I'm heartened to see I'm not the only one who sees things that
way!
Post by James Craig Burley
Put another way: if you *aren't* using such a data base to block email
sent by known spammers, in an IM2000 world, you *will* be using it to
do exactly that.
I think you'll be using a blacklist which looks up [IP address,msg store
account]. Or, if IM2000 is modified so that the sender message store account
*is* the sender's E-mail address, then you could look up the E-mail address
(for accounts belonging to spammers), and separately the IP address (for
message stores belonging to spammers).
This is item 2a on your web page, correct?

I think this presents a *huge* problem for blacklists, because
spammers can easily dDOS them by sending out bazillions of
notifications containing arbitrary sender-account IDs "tupled" with
otherwise-legit mail stores.

In my view, *any* system that depends heavily on looking up data in a
shared DB (such as DNS or a blacklist), where the "keys" for the data
are provided by an untrusted party (such as a potential spammer), is
highly prone to abuse.

There are all sorts of ways to *mitigate* the potential of abuse, of
course. But all such ways seem, to me, to be more directly and
usefully applicable to a "vanilla" SMTP or IM2000 setup.

For example, my patched qmail-smtpd server, though it doesn't log
badmailfrom hits, does happen to log any HELO/EHLO commands that
specify a host different from the reverse-DNS lookup (and since I
don't enable rDNS lookups, that means all connections incoming from
the Internet that say HELO or EHLO!).

Interestingly, I see runs like this fairly often:

2005-04-20 11:15:13.607144500 qmail-smtpd: 30603 @unknown[222.47.69.69] HELO slidemail.com
2005-04-20 11:15:13.689209500 qmail-smtpd: 30601 @unknown[222.47.69.69] HELO goatlantaga.com
2005-04-20 11:15:14.819248500 qmail-smtpd: 30605 @unknown[222.47.69.69] HELO aberystwyth.com
2005-04-20 11:15:16.853260500 qmail-smtpd: 30599 @unknown[222.47.69.69] HELO mystarship.com

Sometimes I'll get 20 or so incoming, nearly-simultaneous, connections
from a given IP address, each claiming to be acting on behalf of a
different, perhaps entirely legitimate, domain name. They're all
sending spam, of course (as verified by my logs, among other things).
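
If you want to fish for the same pattern in your own logs, a rough
Python sketch (assuming lines shaped like the ones quoted above) is just
to group by client IP and count the distinct HELO names each one claims:

import re
from collections import defaultdict

# Matches the qmail-smtpd log format shown above; add EHLO if you log it.
LINE = re.compile(r"@unknown\[(?P<ip>[\d.]+)\] HELO (?P<helo>\S+)")

def suspicious_ips(log_lines, threshold=5):
    names = defaultdict(set)
    for line in log_lines:
        m = LINE.search(line)
        if m:
            names[m.group("ip")].add(m.group("helo"))
    # IPs claiming many unrelated HELO names in one short run are the
    # pattern described here.
    return {ip: sorted(helos) for ip, helos in names.items()
            if len(helos) >= threshold}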

Why would a spammer do this? Well, I can only speculate, but I assume
there are plenty of SMTP servers out there that treat HELO as
providing a FQDN and do lookups to be sure there's some legitimacy to
the domain name.

Assuming those lookups are done on external shared DBs, that means
those SMTP servers are giving spammers an easy way to launch a
coordinated distributed DOS attack on those DBs via those servers.

Once those servers fail to respond to their clients within a certain
timeframe, the clients must, since they are acting on behalf of
connections involving timeouts, make decisions regarding how to
proceed.

So, such an attack can potentially convince some servers to falsely
classify incoming connections -- accept spam or reject legit email,
depending on how they react to external DBs being unreachable at
crucial moments.

My SMTP server does nothing but log those, and does only *internal*
lookups on domain names in envelope senders. (I use no RBLs,
amazingly enough, out of a sorta-principled desire to experiment with
anti-UBE measures that don't let untrusted SMTP clients use my server
as a proxy to beat up external shared DBs. Pretty much the only
useful "key" that a spammer can't forge is the incoming IP address of
the SMTP client -- but I don't do any external lookups on that either,
at the moment.)

Now, scale that up. Imagine an IM2000 world, where message
notifications are *much* less expensive for spammers to send, where
they can point to legit message stores with impunity, and where they
can make up sender addresses regardless of their legitimacy, as long
as they aren't expecting to actually *deliver* spam that way.

What prevents spammers from gumming up the works? Even aside from the
fact that spammers will acquire and distribute data bases of "legit"
[mailstore, senderID] tuples -- just as they widely distribute my
legit ***@jcb-sc.com address as a "useful" envelope sender for their
spam, thus drowning me in joe jobs and/or getting me blocked by
(theoretically clueless but practically effective) admins -- they can
make up any number of tuples that would need to be blacklisted or
otherwise identified as illegitimate.

As they succeed in gumming up the works -- by discombobulating shared
(community) DBs that blacklist all the arbitrary [mailstore, senderID]
tuples spammers can come up with -- those "works" will either be
disabled so people can receive email (opening the floodgates to UBE)
or will keep mail from being successfully and quickly delivered.

It's also not clear exactly how IM2000 will prevent spamming by
sending notifications such as:

[im2000.comcast.net, ***@http.221.148.254.170]

Seems to me such spam might be fairly successful in an IM2000 world,
unless sender addresses are normally hidden from users, which is not
what you appear to be advocating.

With SMTP, it's easy for a spammer to have his SMTP client do:

MAIL FROM:<***@example.com>
RCPT TO:<***@http.221.148.254.170>
DATA
[...]
.

But, in addition to having to go through a DATA phase and hope the
message is accepted, at least with SMTP there's no assurance, to the
spammer, that example.com will be beaten up as an innocent third party
by recipients trying to pull down message contents, inquire about
senderID legitimacy, etc.
Post by James Craig Burley
The biggest problem with this line of thought, as illustrated by my
expanding on your mention of web sites, is that spammers for
*products* available online (versus, say, Nigerian spam) are highly
incentivized to provide easily reachable contact points in their
spams.
(1) hardly any law-enforcement agencies seem to be interested in chasing
them up at the moment, even given easy contact addresses;
(2) it's obvious, but seems to have escaped the SPF brigade, that the
envelope sender (MAIL FROM) plays *no* role at all in E-mail delivery,
*except* when a delivery failure occurs and therefore a bounce needs to be
returned to sender.
That'll be true in an IM2000 world, I assume. After all, the more
"obvious" a role that a senderID, especially a [mailstore, senderID]
tuple plays in email delivery -- in terms of user visibility -- the
greater the necessity for that information to be useful by being
easily memorized and, thus, prone to forgery and abuse.

So, say spammers learn that my (legit) mailstore tuple is
[im2000.comcast.net, ***@jcb-sc.com].

Now, they can't *send* IM2000 spam via that tuple, because they
(presumably) can't convince the im2000.comcast.net mailstore to serve
their spam under my senderID.

But that doesn't prevent them from sending out a bazillion spurious
notifications to everyone, claiming that a bunch of emails are at that
tuple, thus wasting everyone's time as they try to read those emails.

(That'll swamp im2000.comcast.net, but only with requests coming from
*real* people -- not necessarily as a result of notifications being
sent to nonexistent addresses, spamtraps, etc.)

Some people will respond by "blacklisting" my tuple in their personal
lists. Fine, that means they won't read any *legit* email I send them
either -- false-positive potential, just as with SMTP today. At least
these people won't beat up im2000.comcast.net so much once they
blacklist my tuple.

Or, they will respond by telling the notification *reception* agent
that acts on their behalf to first try verifying that such a message
exists, which requires contacting im2000.comcast.net. More likely,
such an agent will automatically do this for *any* incoming untrusted
email, regardless of how their users have configured their MUAs, and
maybe even for notifications sent to nonexistent users.

In any case, that'll swamp im2000.comcast.net such that it can't serve
*any* email, and there'll be nothing Comcast, nor recipients, can do
about it.

So, the "win" for IM2000 here is that spam can't be forged as coming
from someone else and still be delivered as spam.

The "loss" is that IM2000 pretty much requires the same sort of
"doublecheck" that people used to do with SMTP, when a server would
try to contact the envelope sender for untrusted email and see if
*his* SMTP server would VRFY that the address existed or accept a
bounce sent to that address (going as far as MAIL FROM:<> and RCPT
TO:<address> and checking the response, before discontinuing the
conversation).

Though IM2000 gives us the opportunity to improve the performance of
that sort of doublecheck, it does so only on a scalar basis (let's say
it makes it 10x more efficient), whereas spammers will have plenty of
incentive to ramp up attacks on certain popular, trusted mailstores at
a much higher rate than the performance is improved by IM2000.

Just as admins of SMTP servers "beaten" on by such doublechecks got
pretty angry at the admins who had their servers engage in such
"beatings", IM2000 mailstore admins will find themselves frustrated at
the increasing load of "doublecheck", or "prefetch" or "verification"
requests, on behalf of huge numbers of message notifications made by
spammers.

It's a vector of attack on both SMTP and IM2000, but at least with
SMTP it was fairly easy to disable the doublechecks and still receive
email.

With IM2000, the need for doublechecks is pretty much impossible to
turn off. One can't even receive message *contents* without doing
these sorts of "reverse" lookups, or "callbacks".

This returns us to filtering *notifications* based on IP address of
the *notifying* parties, to avoid flood-style attacks by spammers and
zombies.

IMO, as long as we're going to have to do that anyway, SMTP remains an
adequate protocol, as it requires no callback to any message store at
all, even for legit messages.
Yes, that probably will always be with us, until humanity is replaced,
nearly wholesale, with robots that can't be fooled by such tactics.
(But that'll probably eliminate the entire pr0n industry, at which
point further development of technology will presumably cease, leaving
our new robot overlords with nothing to look forward to in their
miserable lives. ;-)
But I think this is probably off the track somewhat.
I'm not sure about that. It's important for us to look at the "big
picture" of UBE, remembering that spam and vermin target two entirely
different audiences, and avoid falling into the trap of expending vast
resources to combat what might turn out to be only a narrow, if
"obvious", means by which UBE meets its objectives, only to see
spammers expend comparatively small resources to work around the
problem. (In general, one wants to choose tactics and strategies that
encourage or compel the enemy to expend greater resources than one
needs to implement those tactics or strategies.)

Amusingly, I've recently started receiving spam that advertises its
products via ASCII art, including the phrase "BEST PRICES"! ;-)
Post by James Craig Burley
(Imagine if
AOL occasionally sent adverts for its services to all known non-AOL
*technical* can stop this, if the message is designed to get past
the content filters of the day, because nobody can really block all
email from aol.com and still be said to be "using" email.
much easier under IM2000 than SMTP, unless you are able to assert that all
mail coming from aol.com's servers has a genuine MAIL FROM address. If the
addresses, then you wouldn't be able to do that).
That was my point; I used salesguy237 as a "userID of the day" kind of
userID.

Just as AOL can gin up arbitrary userIDs that, necessarily, start out
as "legit", spammers can do the same for any legit mailstores they
know about, and of course they can make even "better" use of any
illegitimate mailstores they create, for as long as those stores are
viewed, by enough of the population, as legit.
Post by James Craig Burley
It does raise the issue of, how serious is the spam problem *now*,
*really*, such that we contemplate moving to a whole new system that
will still apparently require many of the same "augmentations" (RBLs,
whitelists, blacklists, etc.) that we are already using with SMTP in
order for IM2000 to actually stop spam?
I think the problem is serious, because every remote site I talk to
implements spam filtering differently. You may implement a reasonable set,
but many sites implement stupid sets. Sometimes my mail is bounced;
sometimes it is blackholed. The spam problem is causing a crisis of
confidence in E-mail; it is simply becoming less and less reliable as a
medium of communication.
That may be true. I'm not sure why I'm not seeing that on my own
personal end of things, but I'm not a typical "remote site".

I tend to think the biggest problem with SMTP is really the fact that
it (normally) prevents a server from accepting a message without also
accepting *responsibility* for that message -- which translates into
the need for generating a bounce.

I believe we can reduce or even eliminate the need for bounces, over
time, by introducing IM2000-like concepts into SMTP, but *without*
introducing (except as a per-message-negotiation option, I suppose)
the "pull" and "separate mailstore" concepts.
Post by James Craig Burley
That is, we can theoretically eliminate X% of spam if we know exactly
*who* spams and *reliably* identify any incoming email (via SMTP or
IM2000) as coming from a known spammer.
Now, the pertinent question becomes, does requiring a sender to
provide a message store make *that* much difference, in terms of our
ability to reach the theoretical maximum for X, compared to what SMTP
is evolving to, in terms of putting practical requirements on senders
to inject a message from a source that is not immediately identifiable
as a known source of spam?
I think it does.
The industry as a whole would need a better answer than that before it
spends $$$ to deploy IM2000 as an SMTP replacement.
Right now if I receive a mail via SMTP, the only thing I *know* for sure
about the mail is the source IP address of the SMTP sender. And of course,
the RCPT TO address must be valid, otherwise I wouldn't get it.
But *everything* else in the conversation is not trustworthy; that is, the
spammer can choose to put whatever they like; and if I implement a filtering
policy which is widely used, they can put whatever is necessary to bypass my
filter. This includes the envelope sender, all the message headers and body.
But if I can only reliably blacklist on IP address, I am limited when it
comes to shared mail relays at ISPs. If 1 million users are all relaying
through one ISP's mail relay (a common case), plus 10 spammers, I either
have to blacklist the entire mail relay and suffer collateral damage, or I
have to accept the spam.
Right, I understand all that. But that is simply a Very Hard Problem
anyway; as I pointed out earlier, even the "hotpop"-like ads in the
headers of otherwise-legit email could be defined as "spam".

And, as I point out above, if you resort to blacklisting [mailstore,
senderID] tuples of "known spammers" at otherwise-legit mailstores,
you either:

- Overload the blacklist(s) you're using with all the spurious
senderID's that spammers can make up, or

- Overload the mailstore(s) with requests to validate whether any
given senderID a spammer might invent actually has a legit account
there (equivalent to SMTP VRFY or RCPT TO)
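
Just to make the mechanics concrete, a tuple lookup against a
hypothetical DNSBL-style zone might look something like the sketch
below (the zone name and hashing scheme are invented for illustration;
no such list exists).  The thing to notice is that every key is supplied
by an untrusted notifier, so spammers can mint unlimited novel tuples
and force exactly these cache-missing lookups.

import hashlib, socket

ZONE = "tuple.bl.example.invalid"   # hypothetical blacklist zone

def tuple_is_listed(mailstore, sender_id):
    # Hash the tuple into a fixed-length DNS label and look for an A
    # record under the zone, DNSBL-style.
    key = hashlib.sha1(f"{mailstore}\0{sender_id}".encode()).hexdigest()
    try:
        socket.gethostbyname(f"{key}.{ZONE}")
        return True          # an A record exists: listed
    except socket.gaierror:
        return False         # NXDOMAIN (or lookup failure): not listed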

This problem could be mitigated by insisting that outgoing message
notifications go *through* (and thus be vetted by) mail stores, so
only "legit" email addresses are exported.

But that can be done for SMTP anyway, and in fact is done, sorta
kinda, in various ways. E.g. some ISPs deny ordinary users outgoing
TCP connections over port 25; filter incoming connections to their
internal SMTP servers to ensure MAIL FROM:<whatever> identifies an
addy the customer (identified by incoming internal IP address)
actually owns; provide AUTH access for external use by legit
customers; etc.

In essence, an IM2000 mail store is a third party required for all
email transactions. Either the sending party is somehow required to
first obtain permission from the third party prior to notifying a
recipient of message availability, or the sending party can cause a
receiving party to flood the third party with spurious requests.

In the former scenario -- sending party obtains permission from third
party, which is more than just whatever handshaking is needed to
actually store a message on that message store -- either the recipient
needs to be able to confirm that such permission was granted without
necessarily having to make a request from the third party (which is
technically possible, by the third party mailstore signing something
that the receiving party can check against a distributed DB like DNS),
or the recipient might as well be designed (by IM2000) to accept
notifications from *only* that third party anyway.

In that latter case -- accept notifications from only the third party,
not from the sending party -- we're mostly back to an SMTP relaying
world anyway. Certainly the sending party can just use SMTP to inject
the message into the third party. From there, it doesn't make all
that much difference, that I can see, whether the third party and the
recipient use SMTP or IM2000 to handle the message.
Note: it should be clear that SPF doesn't help at all here.
Agreed. SPF is, to me, like DomainKeys, of interest only after a
message has already been "vetted" by a human, or nearly-human, user
such that the "next step" is to try to figure out whether it was
forged.

Stopping incoming forged communications "at the door" is an extremely
expensive proposition, and gives the "enemy" an easy weapon to use
against you -- it's an overreaction to worry about whether each and
every communication might be "forged", in reality as well as on the
Internet.

E.g. I don't really care whether you actually are Brian Candler,
because I'm not investing anything in the proposition that you are who
you say you are. So resources I might otherwise expend to validate
your identity are spent on things I think are more worthwhile, such as
considering your viewpoints, responding to them, etc. You could be a
very intelligent German Shepherd, for all I know (but, on the
Internet, nobody knows etc. ;-).
IM2000 also wins because it's easy to build message stores which rate-limit
the number of messages sent per day from a particular account; it could be
enforced as good practice (i.e. if you offer free signups for new accounts
over the web, then you get your message store blacklisted if you don't
follow this principle)
I'm not sure why that's hard for SMTP -- aren't some ISPs doing that
already?
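
(The counting itself is trivial once you have an unambiguous account
identity -- a toy sketch, assuming a single store or relay rather than a
cluster; sharing the counter across a cluster, and getting the identity
in the first place, are the hard parts:)

import time
from collections import defaultdict

class DailyQuota:
    def __init__(self, limit=500):
        self.limit = limit                    # max messages per account per day
        self.counts = defaultdict(int)
        self.day = time.strftime("%Y-%m-%d")

    def allow(self, account):
        today = time.strftime("%Y-%m-%d")
        if today != self.day:                 # new day: reset all counters
            self.day, self.counts = today, defaultdict(int)
        self.counts[account] += 1
        return self.counts[account] <= self.limit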
- All ISPs could disable outbound relaying by IP address. All customers
would be required to use SMTP AUTH to use the mail relay. This would require
all customers to change their MUA configuration, and a flag day at each ISP
where relaying by source IP address alone is disabled.
I don't see how this is different in an IM2000 world. Only when one
customer "forges" a message as coming from another customer is
authentication presumably required as a countermeasure, and that
problem exists for both SMTP and IM2000, assuming IM2000 works by
users putting messages in a central message store managed by their
ISP.
- All ISPs could carry forward the AUTHenticated user information in their
- When you receive an incoming SMTP session, you can send the IP address and
AUTH ID to a trusted third party (i.e. blacklist) to verify
- If the IP address belongs to or is controlled by a spammer, then the
message is marked as spam
- If the IP address is a genuine mail relay at an ISP (which the TTP has to
validate for themselves), and it provides AUTH information about the account
which injected it, then only mail from certain AUTH accounts will be
rejected.
However, this ain't going to happen. The pain is huge, and the short-term
gain is negligible.
How will the equivalent *not* have to happen with IM2000? Seems to me
it'll have all the same problems, unless IM2000 means we'll move to a
sorta-centralized, few-but-highly-trusted-mailstore world, which is
not really different from everyone using a few highly-trusted SMTP
relays.
However, an alternative mail architecture (e.g. IM2000, or something else),
can be sold on an immediate benefit:
"Reconfigure your own MUA to use the IM2000 service, and immediately all
mail you send to other IM2000 users will be delivered more reliably and in a
way that the sender cannot be forged".
That means, a business can get all its employees to move to the new
architecture, or a group of friends can all change to the new architecture,
and all get an immediate gain. Then lawyers and banks may start to use it.
Okay, then this is not a case of *technical* advocacy, more a case of
saying "it's a whole new system, we think it'll be fairly spam-free
compared to SMTP" -- and in fact it probably will for some time -- and
hoping to convince enough early adopters to jump on board, quickly
enough, that everybody else, except spammers, switches soon after.

I don't think that's anywhere near convincing enough. I've had a
similar sort of mindset all my life, in that I used to discount the
"installed base" issue and other forms of institutional inertia.

Experience has *almost*, but not quite, beaten that out of me.

Hence, despite my serious distaste for SMTP, my greater distaste for
UBE impels me to ultimately frame all my "grand ideas" in terms of
"how can this idea be implemented in SMTP in a gradual fashion?", even
though I do also think them through in terms of "what if we were
starting from scratch?".

(You'd probably be amazed at the language and OS designs that are
floating around in my head these days. In many ways, completely
unlike what we have today. But hardly farfetched -- mind-numbingly
simple, in fact. ;-)
Post by James Craig Burley
And, especially with naive end users, the best thing you can do for
them, to convince them that "offers" are really scams and/or spams, is
to present those emails on their screens not intermingled with
otherwise-legitimate ones, but grouped together in a way that tells
them "this appears to be junk, and I [the content-analysis bot] have
put all the 419-ish stuff in one chunk, all the
enlarge-your-weiner-ish stuff in another, and all the notifications
that you've just won a lottery in yet another".
That's essentially what SpamAssassin does. The problem is that the spammers
have equal access to the SpamAssassin source code, and so customise their
mails to bypass the rules.
It is done more effectively by third-parties which maintain their own
private rulesets. There are companies which offer this service: point your
MX servers at us, we'll filter your mail for spam and viruses, and send it
on to your mail server after cleaning. They work extremely well, but they
are expensive. This is an example of the *direct* economic cost of spamming.
If you want effective filtering, you either have to maintain your own
private rulesets (cost in time and expertise), or you have to use a third
party and pay them in cash.
In the long run, we'll need personal information agents acting in ways
like this on our behalf anyway. It's inevitable, as technology
increasingly floods us with *legit* info, never mind the illegit
stuff.

In any case, IM2000 does not, in any way, shape, or form I can see,
dispense with the need for content analysis. As an "illegit"
mailstore first comes online, or a [legit-mailstore,
evil-spammer-using-it] tuple springs into existence, *somebody* will
have to take the first steps towards discovering that email springing
from that source should have the Evil bit set.

And that requires retrieving the message contents and then analyzing
them somehow.

Further, in order to avoid everyone else having to do the same exact
thing, it requires notifying other white hats about the problematic
mailstore or tuple, so that they may be blacklisted.

IM2000 ultimately puts us in the position of hoping not just that this
will be required *less* often, as a % of overall email sent, but that
the economics will ultimately result in spammers giving up and going away.

I'm not seeing a clear case being made for that scenario coming to
pass, certainly not clearly enough such that it justifies the
deployment of IM2000 as it now stands.
Post by James Craig Burley
Post by Brian Candler
They *will* happen, and soon (i.e. within months at most), as surely as the
widespread implementation of MAIL FROM domain checks made spammers change to
sending out mail with real (but forged) E-mail addresses as the envelope
sender. The cost is minimal.
Right, it's an arms race, we all know that already. None of what you
have said on that topic is news to me; I doubt it's news to anyone
else here.
All I'm saying is, multiplied across the Internet, it's (a) a huge waste of
time and effort, and (b) is contributing to the decreasing reliability of
E-mail, which I consider to be a very serious problem.
People think they are doing good, when they are doing harm.
But IM2000 *requires* it, in essence: envelope senders specify domain
names that *must* be looked up, and thus must have obtainable,
verifiable DNS information, must not be in various blacklists, etc.

So IM2000 gives spammers a *guaranteed* weapon to use against DNS and
blacklists that SMTP doesn't assure them, because not all SMTP servers
do doublechecks (various forms of reverse lookups on sender-supplied
data, which can be arbitrary) against external DBs. Mine doesn't, for
example.
Validating the EHLO domain name is another case in point. It's futile.
Pretty much. Though, I really should get my server to double-check
EHLO domain names against my badmailfrom file and log matches, as
they'd indicate cases where the SMTP client actually *announces* that
it is acting on behalf of a known spammer (that is, the owner of the
domain name -- not just a domain name abused by spammers)!

I mean, as long as I've got the data base and use it for envelope
senders, might as well use it for EHLO, eh? Wonder if it'd catch
anything....
Admittedly the RFCs are badly worded in this area, but the whole point of
EHLO is to give a sender-supplied cookie which can be included in the
recipients log files for tracing. It need only have local significance.
If I get a message saying "EHLO winsrv01" that gives me useful information,
should I have to contact the site which is sending me the mail to try and
iron out a problem. If you enforce that the EHLO name *must* be a DNS name
which matches the IP address of the connection, then it has lost all
usefulness; you can simply lookup the IP address yourself in the DNS.
This sort of stupidity really bugs me, sorry :-)
IMO HELO/EHLO are kinda stupid anyway, and another win for IM2000 is
that they go away. ;-)

I think IDENT is in the same boat. It's simpler and more direct --
installed-base and badly-designed-OS issues aside -- for the upstream
system to do its own validation and logging of outgoing TCP
connections its users/clients initiate (ultimately) via its outgoing
network interface.
Post by James Craig Burley
Post by Brian Candler
Post by James Craig Burley
No, as is widely recognized. It's a temporary tactic, one which
raises the bar in terms of increasing the expense of sending UBE (and
the visibility that one is doing so), which is kinda the whole point
of IM2000, right?
No.
Then what *is* the point of IM2000, when the various web pages
promoting it proclaim that it puts more of an economic burden on
*senders* than does SMTP?
I think they are making a wrong assertion. However there may be *other*
advantages that a new mail architecture can have, which make an environment
_permanently_ harder for spammers to work in.
It might also make an environment in which it is easier for spammers
to destroy the ability for the rest of us to work in, as I've
suggested above.
Perhaps [adoption] could also be achieved by starting a new, parallel SMTP
network, using port 26 say. The new network would *require* certain
operational practices to be in place (like using SMTP AUTH to submit mail).
At least you wouldn't have to rewrite any software, although you'd need some
sort of MX record replacement to identify people who can use port 26 to
receive mail.
But it's still not going to be as strong. For example, counting the number
of mails sent by a particular user is still quite hard. Consider an SMTP
relay cluster of 10 machines; they would need to communicate with a central
database to count the number of mails sent by one user over 24 hours.
How is that different from an IM2000 mailstore cluster of 10 machines?
And you'll still need a new blacklist infrastructure for looking up
[IP,auth-sender] instead of just IP address.
Sounds like item 2a on your web page! Again, how is it different?

I think IM2000 is great as a proposal. I don't think it wins as a
design.

What it suggests is that SMTP could evolve to incorporate IM2000
concepts in ways that *allow*, but don't *require*, clients and
servers to cooperatively use them.

A huge win of IM2000 is the (presumed) ability for a sender to
(repeatedly) notify a recipient of the availability of a message
without having to send that message with any particular notification.
This saves bandwidth in almost all cases of exchanges of legit email
messages, and of many involving UBE as well.

That doesn't necessarily *require* the recipient to be able to "pull"
the message from the sender (sender's message store).

Instead, the recipient could notify the sender that it was finally
ready for the message to be *pushed*, a la SMTP's DATA phase.

Similarly, an IM2000 sender is able to "remind" a recipient that a
message is still available, without having to send the message.

In SMTP, one can't do that, because all the recipient sees, before the
DATA phase, is the envelope sender/receiver pair. There's no
transmission or message ID they can use to uniquely identify a
particular [envelope sender, envelope recipient, message] combination,
other than the message content.

That facility *could* be added to SMTP in an incremental fashion --
not requiring any particular ordering of upgrades, across the board or
vertically.
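
To give a flavour of what such an increment might look like -- the
extension keyword, verb, and reply text below are invented purely for
illustration; no such extension exists -- a client could re-offer a
previously deferred message by ID instead of resending its contents:

C: EHLO sender.example.com
S: 250-receiver.example.net
S: 250 OFFER                         (hypothetical extension keyword)
C: MAIL FROM:<alice@sender.example.com>
S: 250 ok
C: RCPT TO:<bob@receiver.example.net>
S: 250 ok
C: OFFER ID=20050420-0001            (hypothetical verb: name the message, no DATA)
S: 450 not yet; offer ID=20050420-0001 again later
C: QUIT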

And, in SMTP, a recipient can't (normally) accept the contents of the
message without also accepting responsibility for the contents,
whereas IM2000 allows retrieving a message without "unpinning" it.

This is useful for subsequent anti-UBE analysis, for prioritization,
for a user to read and then decide how to dispose of the message, etc.
And IM2000's key advantage here is that, while such analysis goes on,
the *sender* (well, his mailstore) remains responsible for it and for
storing it, increasing the costs of senders of UBE.

But, in fact, an SMTP server can do this by convincing the SMTP client
to consider the delivery attempt to have failed due to temporary
error.

One way is for it to issue a 4xy response code after the end of the
DATA phase. That arguably tells the client the message will *not* be
"delivered", but it doesn't actually mean the server can't do whatever
it wants with the contents of the message.

Another way is for the server to simply never issue a response to the
DATA phase, leaving the wave function uncollapsed, so to speak. The
client can't assume the message was delivered, nor can it assume it
wasn't. It'll timeout the connection, unless the server simply closes
it first.

Either way, the server is free to do whatever it likes with the
content, even though it has not accepted responsibility for it.
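
As a concrete illustration of "keep the content, decline
responsibility", here's a small sketch assuming Python's aiosmtpd
library (my choice for the example, nothing more): the handler files the
message away for later analysis, yet answers the end of DATA with a 4xx,
so responsibility stays with the injecting client and no bounce is ever
needed from this server.

import os, time
from aiosmtpd.controller import Controller

class QuarantineHandler:
    def __init__(self, spool="/var/spool/quarantine"):
        self.spool = spool
        os.makedirs(spool, exist_ok=True)

    async def handle_DATA(self, server, session, envelope):
        # Keep a copy of the message for downstream analysis/disposal...
        path = os.path.join(self.spool, "%f.eml" % time.time())
        with open(path, "wb") as f:
            f.write(envelope.content)
        # ...but report a temporary failure, so the client retains the
        # message, and responsibility for it.
        return "451 4.7.1 try again later"

# e.g. Controller(QuarantineHandler(), hostname="0.0.0.0", port=2525).start()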

So, it can, as a result of downstream analysis/disposal, respond
similarly or differently to future delivery attempts -- or treat the
lack of a second, third, or subsequent delivery attempt as, itself,
signaling something "interesting" about the email.

This obviates the need for a server sending bounces to innocent third
parties in cases where the server simply doesn't ever accept
responsibility for an email until it is absolutely sure that it will
either deliver the message to a grateful recipient or drop it
entirely. 4xy and 5xy response codes leave responsibility for the
message with the injecting client, as does simply never responding to
the end of the DATA phase.

(This is not unlike IM2000's advantages that you identify as 2c and
2d, perhaps among others, on your web page. E.g. by the time the
subsequent delivery attempts might have resulted in the server
accepting responsibility for the email, the server has potentially
learned that the source, or an upstream source, is a known spammer.)

A big problem with "abusing" SMTP this way is that downstream entities
(relays, IMAP/POP3 boxen, MUAs, end users) don't understand the
concept of a message being available but not the responsibility for
it, so they don't know how or why they should notify upstream entities
that they do or don't wish to accept responsibility.

IM2000 poses the same challenge to downstream entities, of course, so
we're going to have to upgrade those entities anyway if we intend to
deploy IM2000.

Another technical advantage of IM2000 is that a recipient can notify a
sender immediately upon the recipient accepting responsibility for the
message. This is a nice feature that allows rapid reuse of storage
among cooperative agents, as well as rapid notification of sender that
the message has indeed been received.

SMTP doesn't allow this at all -- the server must wait for the client
to decide to connect again, and, even if it does, there's no elegant,
consistent means for a recipient to notify the sender that the message
has been accepted, read, whatever.

Generally, SMTP doesn't offer any benefits that require out-of-band
recipient->sender communications, though pertinent upgrades are of
course possible with SMTP (at substantial expense). SMTP does have
DSNs; these don't reliably notify *senders*, given the forgery (or joe
job) problem, and they're not elegant, but they're vaguely like what
IM2000 would provide out of the box.

On the other hand, IM2000 is less reliable to the extent it *requires*
such communications, because the communications path a recipient must
use to contact the sender might not be available at the time the
communication is required.

The most glaring example of this is exposed by the "vanilla" IM2000
message-pull model, in which the end user decides to read an IM2000
message and the system can't pull it up because it is sitting on an
unreachable message store -- one that might have been reachable when
the notification was first sent.

To the extent SMTP is extended to *allow* this sort of communication,
it exposes itself to similar sorts of failures.

But at least SMTP users will have more flexibility in deciding just
how reliably they want certain message contents to be made available
to them. Whitelisted senders can be allowed to push messages via the
classic SMTP method, including the server accepting responsibility for
them, thus eliminating at least one, almost certainly two, potential
points of failure for each such delivery.

IM2000 users won't have the choice of making sure senders "push"
messages to recipients to ensure that subsequent reverse lookups
aren't needed and thus can't fail -- *unless* IM2000 is redesigned to
be a cleaner, more flexible SMTP.

And I am entirely in favor of a cleaner, more flexible SMTP, even if
it's called IM2000, and even if "uptake" remains a problem.

But I prefer to *conceptualize* in terms of a brand-new, clean system,
and then *actualize* in terms of incrementally improving SMTP in that
direction, in order to speed uptake of whatever I might think is
useful.
--
James Craig Burley
Software Craftsperson
<http://www.jcb-sc.com>
Brian Candler
2005-05-04 11:07:27 UTC
Permalink
Post by James Craig Burley
I think this presents a *huge* problem for blacklists, because
spammers can easily dDOS them by sending out bazillions of
notifications containing arbitrary sender-account IDs "tupled" with
otherwise-legit mail stores.
Sure, any blacklist is subject to that kind of attack. Blacklists which run
on top of the DNS benefit from distributed caching.

Dealing with junk notifications is one of the problems I have with the
IM2000 approach. Even without blacklists, you will need to validate them
somehow (e.g. by doing a callback to the originating message store, or by
some cryptographic method). You don't want a blast of fake notifications to
drown out legitimate ones.

With SMTP you can of course open large volumes of SMTP connections to hosts.
This may trigger some sort of validation process (RBL lookup, sender
verification callback etc). I think the trick is to keep to a minimum the
amount of work which a random attacker can cause the system he/she is
talking to (and any referenced third party) to perform.
Post by James Craig Burley
My SMTP server does nothing but log those, and does only *internal*
lookups on domain names in envelope senders. (I use no RBLs,
amazingly enough, out of a sorta-principled desire to experiment with
anti-UBE measures that don't let untrusted SMTP clients use my server
as a proxy to beat up external shared DBs. Pretty much the only
useful "key" that a spammer can't forge is the incoming IP address of
the SMTP client -- but I don't do any external lookups on that either,
at the moment.)
That's a shame. You ought to try RBLs, because they are surprisingly
effective even in the current SMTP world, and to evaluate how blacklisting
through DNS works in practice.

I suggest you set up lookups against three RBLs:

(1) Known spammers and 0wned machines (e.g. sbl-xbl.spamhaus.org)
(2) Open relays (e.g. relays.ordb.org)
(3) Dynamic spam sources (e.g. bl.spamcop.net)

You can of course just configure your MTA to perform the lookups and add a
warning header, rather than rejecting the mail. You can then examine your
received spams and measure how effective they might have been.
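
For reference, the lookup mechanics are tiny: reverse the octets of the
client IP, append the zone, and check whether an A record exists.  A
sketch using the zones suggested above (adding a warning header on a
hit, rather than rejecting, would be done in the MTA configuration):

import socket

ZONES = ["sbl-xbl.spamhaus.org", "relays.ordb.org", "bl.spamcop.net"]

def rbl_hits(client_ip):
    reversed_ip = ".".join(reversed(client_ip.split(".")))
    hits = []
    for zone in ZONES:
        try:
            socket.gethostbyname(f"{reversed_ip}.{zone}")
            hits.append(zone)        # an A record exists: listed in this zone
        except socket.gaierror:
            pass                     # NXDOMAIN: not listed here
    return hits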
Post by James Craig Burley
As they succeed in gumming up the works -- by discombobulating shared
(community) DBs that blacklist all the arbitrary [mailstore, senderID]
tuples spammers can come up with -- those "works" will either be
disabled so people can receive email (opening the floodgates to UBE)
or will keep mail from being successfully and quickly delivered.
I think notifications need to be validated, for example by
- a callback to the purported originating host; or
- cryptographic check using a key in the DNS

One notification generates one callback or lookup, so it's not a DoS
amplification service. The spammer won't actually achieve anything, since
the notification will be discarded having failed validation, so there isn't
anything to be gained.
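
A minimal sketch of the cryptographic option (purely illustrative --
IM2000 defines nothing like this, and the record layout and signature
scheme are invented): the message store publishes a public key in a DNS
TXT record and signs every notification it emits, so a recipient can
discard forgeries at the cost of one cached DNS lookup plus one
signature check.  This assumes the Python "cryptography" package:

import base64
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

def notification_is_genuine(notification, signature_b64, store_pubkey_b64):
    # store_pubkey_b64 would be fetched (and cached) from a TXT record
    # published under the claimed message store's domain.
    pub = Ed25519PublicKey.from_public_bytes(base64.b64decode(store_pubkey_b64))
    try:
        pub.verify(base64.b64decode(signature_b64), notification.encode())
        return True      # signature matches: the claimed store really sent it
    except InvalidSignature:
        return False     # forged or corrupted: discard the notification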

Sure, they will have wasted some resources on some random machines on the
Internet - but there are plenty of ways to do that at the moment anyway.
Post by James Craig Burley
It's also not clear exactly how IM2000 will prevent spamming by
sending notifications such as:
[im2000.comcast.net, ***@http.221.148.254.170]
Seems to me such spam might be fairly successful in an IM2000 world,
unless sender addresses are normally hidden from users, which is not
what you appear to be advocating.
I did suggest that (sender fingerprints), but I think it's better that the
sender address *is* a genuine E-mail address, and that the notification is
discarded unless it is proved that this notification did originate from that
E-mail address. That gives phishing protection too.
Post by James Craig Burley
DATA
[...]
.
Errm, but the recipient address won't exist, so there will be a 550 error
after the RCPT command, and the DATA command will be rejected.
Post by James Craig Burley
So, say spammers learn that my (legit) mailstore tuple is
Now, they can't *send* IM2000 spam via that tuple, because they
(presumably) can't convince the im2000.comcast.net mailstore to serve
their spam under my senderID.
But that doesn't prevent them from sending out a bazillion spurious
notifications to everyone, claiming that a bunch of emails are at that
tuple, thus wasting everyone's time as they try to read those emails.
(That'll swamp im2000.comcast.net, but only with requests coming from
*real* people -- not necessarily as a result of notifications being
sent to nonexistent addresses, spamtraps, etc.)
Indeed.
Post by James Craig Burley
Some people will respond by "blacklisting" my tuple in their personal
lists.
No, clearly you cannot blacklist a notification based only on the
(potentially forged) information it contains.

You could however blacklist notification sources based on their IP address.
There may be people sending notifications from dynamic IP addresses, but
that's unlikely; legitimate IM2000 message stores will be on fixed IPs, and
the notifications will *normally* come from the message store itself.
However, the IM2000 protocol explicitly allows notifications to be
forwarded, which is a useful capability.

That's why I think it's probably better to validate notifications to prove
they came from the claimed message store (and discard them if not), rather
than maintain a separate IP-based blacklist of spurious notification
senders.

Spurious notifications are unlikely to be sent often, since they don't
achieve anything apart from a waste of resources, so I think it's unwise to
build a separate infrastructure to filter them out, one which won't be
tested very often.
Post by James Craig Burley
In any case, that'll swamp im2000.comcast.net such that it can't serve
*any* email, and there'll be nothing Comcast, nor recipients, can do
about it.
For each one notification message sent (TCP connection), the recipient will
open one TCP connection to im2000.comcast.net.

The attacker could just open those connections herself, and apply a similar
kind of DoS anyway. However, this approach *does* hide the source of the
attacker.

Perhaps, when contacting im2000.comcast.net, the validation request should
include the purported source IP address of the notification? Clearly this
could be forged too, but if you get a lot of validation requests from people
all over the Internet and they all claim to have received notifications from
the same IP, you have good evidence of the source of the attack.

As I say, this attack won't achieve anything (such as the delivery of a
spam), only wasted resources - so eventually the attacker is likely to get
tired. But I do agree with you that this needs to be considered carefully,
and the best solution arrived at such that this approach is at least no more
efficient than DoS'ing the message store directly.
Post by James Craig Burley
So, the "win" for IM2000 here is that spam can't be forged as coming
from someone else and still be delivered as spam.
The "loss" is that IM2000 pretty much requires the same sort of
"doublecheck" that people used to do with SMTP, when a server would
try to contact the envelope sender for untrusted email and see if
*his* SMTP server would VRFY that the address existed or accept a
bounce sent to that address (going as far as MAIL FROM:<> and RCPT
TO:<address> and checking the response, before discontinuing the
conversation).
Except that at the moment, an SMTP callback only verifies that the address
exists, not that the mail you're trying to receive was sent by that person.

Putting signed cookies in the envelope sender, such as SES/SRS/BATV are
proposing, does actually make that connection - although somewhat weakly,
because if a spammer captures one of your cookies, they could re-use it with
a different mail.
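
For anyone who hasn't looked at those schemes, the signed-cookie idea is
roughly as follows (a simplified, BATV-flavoured sketch; a real
deployment would follow the actual draft, which also covers key
rollover and expiry):

import hashlib, hmac, time

SECRET = b"site-local secret"        # known only to the signing site

def sign_sender(local, domain, key_id=0):
    # Tag the local part with a key id, an expiry day and a truncated HMAC.
    day = int(time.time() // 86400) % 1000
    mac = hmac.new(SECRET, f"{key_id}{day:03d}{local}@{domain}".encode(),
                   hashlib.sha1).hexdigest()[:6]
    return f"prvs={key_id}{day:03d}{mac}={local}@{domain}"

def bounce_address_is_ours(address):
    # A bounce arriving at an untagged or badly-tagged address was never
    # sent by us, so it can be refused without generating backscatter.
    if not address.startswith("prvs="):
        return False
    tag, _, original = address[5:].partition("=")
    if len(tag) != 10 or "@" not in original:
        return False
    local, domain = original.rsplit("@", 1)
    key_id, day, mac = tag[0], tag[1:4], tag[4:]
    expected = hmac.new(SECRET, f"{key_id}{day}{local}@{domain}".encode(),
                        hashlib.sha1).hexdigest()[:6]
    return hmac.compare_digest(mac, expected)    # (expiry check omitted)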
Post by James Craig Burley
Though IM2000 gives us the opportunity to improve the performance of
that sort of doublecheck, it does so only on a scalar basis (let's say
it makes it 10x more efficient), whereas spammers will have plenty of
incentive to ramp up attacks on certain popular, trusted mailstores at
a much higher rate than the performance is improved by IM2000.
Maybe. People do DDoS attacks on all kind of Internet infrastructure, for
all kinds of reasons. It's a pain, but you're not going to fix that without
fixing the Internet itself.
Post by James Craig Burley
Amusingly, I've recently started receiving spam that advertises its
products via ASCII art, including the phrase "BEST PRICES"! ;-)
Cool. You can't accuse them of lack of imagination. It's another reason why
(IMHO) content analysis will always be doomed, until we have genuinely
intelligent robots. And if they are that intelligent, they will wonder why
they are doing this job in the first place. "Filter my spam, Marvin" -
"Brain the size of a planet and they ask me to filter out spam..."
Post by James Craig Burley
I tend to think the biggest problem with SMTP is really the fact that
it (normally) prevents a server from accepting a message without also
accepting *responsibility* for that message -- which translates into
the need for generating a bounce.
That's pretty much essential in a store-and-forward model though. In the
current world, all MUAs are set up to send outbound mail via a
store-and-forward server (and indeed, blacklists are set up to *refuse* mail
sent directly from end-user IP addresses, so replacing MUAs with something
cleverer which followed MX records in the DNS would not work)

Once you've handed the message over to the S&F server for delivery, if it
fails, it needs some way to tell you so. That need not be in another E-mail
message, but if not then there's an out-of-band notification required; and
to prevent this channel from being used for spamming, it needs to be
unforgeable.
Post by James Craig Burley
But if I can only reliably blacklist on IP address, I am limited when it
comes to shared mail relays at ISPs. If 1 million users are all relaying
through one ISP's mail relay (a common case), plus 10 spammers, I either
have to blacklist the entire mail relay and suffer collateral damage, or I
have to accept the spam.
Right, I understand all that. But that is simply a Very Hard Problem
anyway; as I pointed out earlier, even the "hotpop"-like ads in the
headers of otherwise-legit email could be defined as "spam".
Ah well, then you have a single E-mail with a split personality: part
legitimate, part spam. If you dislike it sufficiently, then you can refuse
to communicate with that person, or ask them to use a less obnoxious E-mail
service.
Post by James Craig Burley
And, as I point out above, if you resort to blacklisting [mailstore,
senderID] tuples of "known spammers" at otherwise-legit mailstores,
- Overload the blacklist(s) you're using with all the spurious
senderID's that spammers can make up, or
- Overload the mailstore(s) with requests to validate whether any
given senderID a spammer might invent actually has a legit account
there (equivalent to SMTP VRFY or RCPT TO)
I don't see that *necessarily* follows, nor that the overloading you're
proposing is any worse than a direct DoS attack from the spammer to the
message store or to the blacklist.
Post by James Craig Burley
This problem could be mitigated by insisting that outgoing message
notifications go *through* (and thus be vetted by) mail stores, so
only "legit" email addresses are exported.
But that can be done for SMTP anyway, and in fact is done, sorta
kinda, in various ways. E.g. some ISPs deny ordinary users outgoing
TCP connections over port 25; filter incoming connections to their
internal SMTP servers to ensure MAIL FROM:<whatever> identifies an
addy the customer (identified by incoming internal IP address)
actually owns; provide AUTH access for external use by legit
customers; etc.
It can be done, but in general it isn't. Outgoing relays would have to
become more complex to perform AUTH database lookups, and huge customer
bases would have to be migrated over to using SMTP AUTH. Only when the last
customer has switched to using SMTP AUTH, and relaying by source IP address
only is disabled, can the source be considered 'trustworthy'.

And then there's the problem of which E-mail addresses a particular user is
allowed to use; either many local databases will have to be built, or one
distributed one.

As an example, I login to smtp.example.net as user "foo123", and then I want
to send outgoing mail with MAIL FROM:<***@pobox.com>.

Either:
(1) The ISP maintains a local list which ties foo123 <=> ***@pobox.com
Every time a new user comes along, they will have to state which E-mail
addresses their account is allowed to use, *and* prove to the ISP's
satisfaction that they do in fact own those accounts.

(2) There is a global database. As owner of ***@pobox.com, I ask the
pobox.com domain owner to publish information in the DNS which ties
[foo123, smtp.example.net] to my address, SPF-style.
There are problems with this if the ISP decides to change their
infrastructure, such that outgoing mail starts appearing from a different
machine or IP address. This would invalidate the policy.

(3) The ISP rejects all mail from me unless it has
MAIL FROM:<***@example.net>. This is unacceptable for lots of reasons:
- This may be a temporary account, e.g. at a cybercafe; I will need to
receive bounces after I have left that location.
- I may be sending a bounce (MAIL FROM:<>)
- There are many legitimate reasons for using variable envelope senders;
e.g. I may be sending using VERP, SES, SRS or BATV envelopes.

Anyone who did this would break all their users, and yet there would be
little benefit unless the *whole* Internet did it (which will never happen
even if this were decided to be 'best practice')

Seriously, changing to something different (maybe IM2000, maybe not) would
be easier. You can have a phased migration. Over time, you would give more
credence to your new mail and less to SMTP mail.
Post by James Craig Burley
E.g. I don't really care whether you actually are Brian Candler,
because I'm not investing anything in the proposition that you are who
you say you are. So resources I might otherwise expend to validate
your identity are spent on things I think are more worthwhile, such as
considering your viewpoints, responding to them, etc. You could be a
very intelligent German Shepherd, for all I know (but, on the
Internet, nobody knows etc. ;-).
Woof :-)
Post by James Craig Burley
IM2000 also wins because it's easy to build message stores which rate-limit
the number of messages sent per day from a particular account; it could be
enforced as good practice (i.e. if you offer free signups for new accounts
over the web, then you get your message store blacklisted if you don't
follow this principle)
I'm not sure why that's hard for SMTP -- aren't some ISPs doing that
already?
Because
- mail relays generally don't have an unambiguous indication of *who* the
person is sending through them (without SMTP AUTH, or some sort of
callback into a RADIUS accounting system)
- mail relays are typically built in clusters, so in order to count the
number of messages sent by a particular user in a particular period of
time, you would need to use a central database

Certainly you can rate-limit the number of messages sent within a single
SMTP session, but that's not very useful, because
- the sender can open multiple SMTP sessions
- a typical usage pattern for legitimate users is to compose a number of
mails locally on their machine, dial up, and then send them out
(which means the thresholds have to be set very high)

...[forcing the world to adopt SMTP AUTH]...
Post by James Craig Burley
However, this ain't going to happen. The pain is huge, and the short-term
gain is negligible.
How will the equivalent *not* have to happen with IM2000?
You can start setting people up on IM2000, and include gateways to SMTP
(inbound and outbound).

People can immediately see which mail has come in through IM2000, and which
through SMTP. They can use this to make value judgements on their mail. It
would particularly benefit closed user groups (e.g. companies, groups of
friends) where they know the other party is on IM2000, as it means any mail
claiming those identities that arrives via SMTP must be a forgery.

You can apply more stringent filters on SMTP mail. As more and more of your
correspondents are on IM2000, the less worried you are about false positives
on SMTP. For granny who only wants to communicate with grandchildren, SMTP
could be disabled entirely for that account.

Over time, though, IM2000 becomes less and less of a "closed user group" of
course. Actually, I think a mail replacement architecture should explicitly
support the idea of closed user groups. And possibly instant messaging too.
Post by James Craig Burley
Okay, then this is not a case of *technical* advocacy, more a case of
saying "it's a whole new system, we think it'll be fairly spam-free
compared to SMTP" -- and in fact it probably will for some time -- and
hoping to convince enough early adopters to jump on board, quickly
enough, that everybody else, except spammers, switches soon after.
No, if I believed that then I wouldn't suggest it.

Any new solution has to be a *permanently* harder environment for spammers
and fraudsters to work in. I think there are some strong arguments why
IM2000 could have an anti-spam infrastructure which works much better and at
much lower cost (of complexity and false positives) than SMTP, which I
documented in my comments on the web.
Post by James Craig Burley
In any case, IM2000 does not, in any way, shape, or form I can see,
dispense with the need for content analysis. As an "illegit"
mailstore first comes online, or a [legit-mailstore,
evil-spammer-using-it] tuple springs into existence, *somebody* will
have to take the first steps towards discovering that email springing
from that source should have the Evil bit set.
Certainly, and that's why we need the third parties (blacklists) to help us
with that.

But IM2000 allows alarm bells to trip quickly based on message sources,
potentially to perform some first-line content filtering, then to highlight
the source to a human, who makes the final decision as to which button to
press (e.g. blacklist sender account, blacklist mailstore). Borderline
sources can be greylisted temporarily until this decision has been made.
Once the source has been blacklisted, anyone who logs in later will not even
see the notification.

If a spam broadside only gets in front of a few dozen eyeballs then the
spamming attempt won't have been worthwhile, and they will give up.
Furthermore, if there is better evidence of where the spam came from, then
anti-spamming laws might be more effective.
Post by James Craig Burley
IM2000 ultimately puts us in the position of hoping not just that this
will be required *less* often, as a % of overall email sent, but that
the economics will ultimately result in spammers giving up and going away.
I'm not seeing a clear case being made for that scenario coming to
pass, certainly not clearly enough such that it justifies the
deployment of IM2000 as it now stands.
Well, maybe we have to build it and see. As well as writing IM2000, we write
an IM2000 spam-sending toolset and publish it too, to check that things work
as we expect.
Post by James Craig Burley
Post by James Craig Burley
Post by Brian Candler
They *will* happen, and soon (i.e. within months at most), as surely as the
widespread implementation of MAIL FROM domain checks made spammers change to
sending out mail with real (but forged) E-mail addresses as the envelope
sender. The cost is minimal.
Right, it's an arms race, we all know that already. None of what you
have said on that topic is news to me; I doubt it's news to anyone
else here.
All I'm saying is, multiplied across the Internet, it's (a) a huge waste of
time and effort, and (b) is contributing to the decreasing reliability of
E-mail, which I consider to be a very serious problem.
People think they are doing good, when they are doing harm.
But IM2000 *requires* it, in essence: envelope senders specify domain
names that *must* be looked up, and thus must have obtainable,
verifiable DNS information, must not be in various blacklists, etc.
So IM2000 gives spammers a *guaranteed* weapon to use against DNS and
blacklists that SMTP doesn't assure them
It's the same point - DoS is DoS. It doesn't really gain them anything more
than a direct DoS assault on whatever it is they're trying to attack.
Post by James Craig Burley
But it's still not going to be as strong. For example, counting the number
of mails sent by a particular user is still quite hard. Consider an SMTP
relay cluster of 10 machines; they would need to communicate with a central
database to count the number of mails sent by one user over 24 hours.
How is that different from an IM2000 mailstore cluster of 10 machines?
Most likely each user would have an account on a single machine in the
cluster, and you'd spread your users across them. Then it's easy.

But even if you decided to have a cluster with a shared NFS backend, and
users could connect to any of those 10 machines to submit a new mail: all
you need to do is maintain a state file within the users' mailstore area
giving a history of sent mail. Each new submission adds to that history, and
may cause a threshold to be reached.
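A rough sketch of that state-file check (the path, limit and window are
invented for illustration, and a shared NFS backend would also need proper
locking, which I've left out):

import os, time

RATE_LIMIT = 500            # example: max messages per user per day
WINDOW = 24 * 60 * 60       # sliding window, in seconds

def record_and_check(mailstore_dir, now=None):
    """Append one submission to the user's sent-mail history and report
    whether the per-day threshold has been reached."""
    now = now or time.time()
    path = os.path.join(mailstore_dir, "sent-history")
    history = []
    if os.path.exists(path):
        with open(path) as f:
            history = [float(line) for line in f if line.strip()]
    history = [t for t in history if now - t < WINDOW]   # drop old entries
    history.append(now)
    with open(path, "w") as f:
        f.writelines("%f\n" % t for t in history)
    return len(history) > RATE_LIMIT    # True => over threshold: defer/reject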

This is unlike SMTP, where there is no permanent record of mails passing
through the system, and all outbound mails are lumped into a single queue,
not a separate queue for each user.
Post by James Craig Burley
And you'll still need a new blacklist infrastructure for looking up
[IP,auth-sender] instead of just IP address.
Sounds like item 2a on your web page! Again, how is it different?
That's the point. Moving to a parallel SMTP world would be pretty much as
difficult as switching to IM2000.
Post by James Craig Burley
What it suggests is that SMTP could evolve to incorporate IM2000
concepts in ways that *allow*, but don't *require*, clients and
servers to cooperatively use them.
I'm happy to see specific proposals here, and as you're no doubt aware, a
lot of work is going on already.

The trouble with greylisting by giving a 4xx response after the DATA phase
(or unceremoniously dropping the connection) is that there are plenty of
broken MTAs out there which are likely to treat this as a permanent
rejection.

Like SPF, you may end up breaking more than you fix.
Post by James Craig Burley
But I prefer to *conceptualize* in terms of a brand-new, clean system,
and then *actualize* in terms of incrementally improving SMTP in that
direction, in order to speed uptake of whatever I might think is
useful.
Well, that's reasonable enough. But the extended SMTP may look not very much
like SMTP by the time the job's done :-)

One example you didn't mention was the problem of making a canonical hash of
a message (for detecting tampering, for applying signatures and so on). Too
many SMTP agents mangle messages as they go through, and indeed the SMTP
specification *requires* them to do so: to add Received: headers, to add
other missing headers, to correct linebreaks into the canonical \r\n form,
to break lines if they are longer than a certain limit, to reformat 8-BIT
encodings as 7-BIT if the receiving SMTP server doesn't implement 8BITMIME,
and so on.
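As a sketch of one possible canonicalisation -- deliberately simplified, and
certainly not the only reasonable choice of what to strip -- the idea is to
hash the message after removing transit-added headers and normalising line
endings, so a relay that adds a Received: line or rewrites bare \n as \r\n
doesn't change the digest:

import hashlib

TRANSIT_HEADERS = ("received:", "return-path:", "delivered-to:")

def canonical_hash(raw_message):
    # Normalise all line endings to \n for processing.
    text = raw_message.replace("\r\n", "\n").replace("\r", "\n")
    headers, _, body = text.partition("\n\n")
    kept, skipping = [], False
    for line in headers.split("\n"):
        if line[:1] in (" ", "\t"):      # continuation of the previous header
            if not skipping:
                kept.append(line)
            continue
        skipping = line.lower().startswith(TRANSIT_HEADERS)
        if not skipping:
            kept.append(line)
    canonical = "\n".join(kept) + "\n\n" + body
    # Re-emit in canonical \r\n form before hashing.
    return hashlib.sha256(canonical.replace("\n", "\r\n").encode("utf-8")).hexdigest()

(Header reordering, MIME re-encoding and line-wrapping are all still problems
this doesn't touch; which is rather the point about how much SMTP agents
mangle in transit.)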

If we switched everyone to using SMTP CHUNKING (BDAT), that would be a
start. But we would then need an extension not to interfere with headers,
and perhaps a way to keep things in the envelope such as Received:
timestamps and signatures.
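With CHUNKING, the DATA phase is replaced by BDAT commands, each announcing an
exact octet count, so the body passes through without dot-stuffing or a
terminating "."; roughly (server replies paraphrased):

C: BDAT 8192
C: (exactly 8192 octets of raw message content)
S: 250 8192 octets received
C: BDAT 2035 LAST
C: (the final 2035 octets)
S: 250 Message accepted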

Ultimately, I believe a new message store with an SMTP gateway would
probably end up being easier to build *and* easier to interwork with the
rest of the Internet... and it could deliver tangible benefits immediately.

Regards,

Brian.
James Craig Burley
2005-05-05 06:46:50 UTC
Permalink
Post by Brian Candler
Post by James Craig Burley
I think this presents a *huge* problem for blacklists, because
spammers can easily dDOS them by sending out bazillions of
notifications containing arbitrary sender-account IDs "tupled" with
otherwise-legit mail stores.
Sure, any blacklist is subject to that kind of attack. Blacklists which run
on top of the DNS benefit from distributed caching.
I've previously posted, at some length, about the dangers of relying
on distributed (DNS-style) caching as an effective solution to the
problem of giving attackers direct ability to trigger database lookups
(and especially inserts) based on arbitrary keys.

(IMO, distributed DNS-style caching is to this sort of attack as more
CPU and RAM are to a brute-force implementation of an NP-complete
problem such as TSP as the problem sizes increase.)
Post by Brian Candler
Dealing with junk notifications is one of the problems I have with the
IM2000 approach. Even without blacklists, you will need to validate them
somehow (e.g. by doing a callback to the originating message store, or by
some cryptographic method). You don't want a blast of fake notifications to
drown out legitimate ones.
Yup.
Post by Brian Candler
With SMTP you can of course open large volumes of SMTP connections to hosts.
This may trigger some sort of validation process (RBL lookup, sender
verification callback etc). I think the trick is to keep to a minimum the
amount of work which a random attacker can cause the system he/she is
talking to (and any referenced third party) to perform.
Yup. A key insight, I think, is that *most* attacks are designed to
cause a recipient to read spammer's mail. That is, they are rarely
designed to cripple the ability of a recipient's system to *receive*
email. Instead, attacks are directed at components of the system upon
which a recipient system might rely to distinguish welcome from
unwelcome email (that is, to designate UBE).

Therefore, it seems best to try to design any email system such that
the receiving system relies as little as possible on external
entities, especially third parties, to decide whether to accept, or
how to prioritize, incoming email.

IM2000 clearly violates this design goal. That doesn't mean it's
fatally flawed; it just means a whole lot of thought and effort must
go into protecting those third parties (mail stores, blacklists,
etc.), since they'll be easy to attack without necessarily destroying
the ability of recipients to read email under normal circumstances.
Post by Brian Candler
Post by James Craig Burley
My SMTP server does nothing but log those, and does only *internal*
lookups on domain names in envelope senders. (I use no RBLs,
amazingly enough, out of a sorta-principled desire to experiment with
anti-UBE measures that don't let untrusted SMTP clients use my server
as a proxy to beat up external shared DBs. Pretty much the only
useful "key" that a spammer can't forge is the incoming IP address of
the SMTP client -- but I don't do any external lookups on that either,
at the moment.)
That's a shame. You ought to try RBLs, because they are surprisingly
effective even in the current SMTP world, and to evaluate how blacklisting
through DNS works in practice.
I don't question that. "Everyone else" uses RBLs. So I don't think I
have a whole lot more to learn, and therefore to add to the pool of
expertise, by using them. (My priority isn't so much to *stop* UBE
coming into my system as to *study* it. Secondarily, I want to slow
it spreading to other systems down by actually letting more of it
enter my system without bothering me, personally, or anyone else.)
Post by Brian Candler
One notification generates one callback or lookup, so it's not a DoS
amplification service. The spammer won't actually achieve anything, since
the notification will be discarded having failed validation, so there isn't
anything to be gained.
Yup. If there's a protocol by which a 3rd party can say "you've sent
me too many misdirected callback/lookup requests, please throttle it
down", that might be helpful.
Post by Brian Candler
Post by James Craig Burley
It's also not clear exactly how IM2000 will prevent spamming by
Seems to me such spam might be fairly successful in an IM2000 world,
unless sender addresses are normally hidden from users, which is not
what you appear to be advocating.
I did suggest that (sender fingerprints), but I think it's better that the
sender address *is* a genuine E-mail address, and that the notification is
discarded unless it is proved that this notification did originate from that
E-mail address. That gives phishing protection too.
I tend to agree. In fact, *personally*, I really don't mind seeing
short spammy messages in notifications, as long as I can quickly skim
and skip them, and as long as I'm aware they didn't consume much in
the way of my system's resources to receive them. I think most users
would also be willing to live with that.

(The spammier the notification, the more certainly I know to not
bother reading the message that accompanies it.)

This leaves the problem of phishing, but it might actually be easier
to teach people that "messages" themselves do *not* actually, or
legitimately, arrive as *notifications* themselves -- that
notifications should *only* tell them about the existence of messages.

(In short, there are to be no email equivalents to postcards, where
the message is written on the outside of the "envelope". Or, perhaps
the equivalent to a postcard could be expressly designed into the
system, as it does have utility as a concept, as long as users are
easily able to understand the implications of sending and receiving
messages in that form.)
Post by Brian Candler
Post by James Craig Burley
DATA
[...]
.
Errm, but the recipient address won't exist, so there will be a 550 error
after the RCPT command, and the DATA command will be rejected.
Whoops, good catch! Lemme try that again:

MAIL FROM:<***@http.221.148.254.170>
RCPT TO:<***@jcb-sc.com>
DATA
[...]
.

Does that make more sense?

I'm not sure how any useful email system blocks such messages, and I'm
not really sure *why* it should. Especially as standalone
IM2000-style notifications, it seems like a case of diminishing
returns to try to erect a defense against ordinary users seeing ones
like this such that they actually try to visit the web site in
question.

("Overhead" and "bandwidth" cease to be valid concerns when such
notifications are so cheap and easy to send to legit mailboxes. If
too many spammers send them at once, end users will quickly tire of
them, never do the hand-copy-paste-edit thing into their browser, and
the market value of such spam will plummet. If spammers send such
spam to too many spamtraps, their source IPs will be easily blocked
and pending notifications from them safely and easily dropped. All of
this applies to SMTP email as well, except that unless something like
greylisting is used, the recipient's system has to receive the message
contents even if the end user would choose to skip reading it.)
Post by Brian Candler
Post by James Craig Burley
Some people will respond by "blacklisting" my tuple in their personal
lists.
No, clearly you cannot blacklist a notification based only on the
(potentially forged) information it contains.
Whew, okay, but I hope *everyone* (who might implement or admin
IM2000) gets that point!
Post by Brian Candler
You could however blacklist notification sources based on their IP address.
Which takes us back to where we are right now, since we can (and do)
do that now...?
Post by Brian Candler
There may be people sending notifications from dynamic IP addresses, but
that's unlikely; legitimate IM2000 message stores will be on fixed IPs, and
the notifications will *normally* come from the message store itself.
Right. I'm still wrapping my head around IM2000's potential
architectures and designs.
Post by Brian Candler
However, the IM2000 protocol explicitly allows notifications to be
forwarded, which is a useful capability.
More than useful: critical. It's not clear IM2000 will fly if it
doesn't do full-blown relaying a la SMTP, but certainly the
notifications themselves need to be relayable.
Post by Brian Candler
That's why I think it's probably better to validate notifications to prove
they came from the claimed message store (and discard them if not), rather
than maintain a separate IP-based blacklist of spurious notification
senders.
Ah, okay.
Post by Brian Candler
Spurious notifications are unlikely to be sent often, since they don't
achieve anything apart from a waste of resources, so I think it's unwise to
build a separate infrastructure to filter them out, one which won't be
tested very often.
In essence, a notification becomes a handshake between a message store
and a recipient (agent). That could be handled solely via a TCP
connection a la SMTP, in which case the client's IP address can't be
(easily) forged.
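(Purely as a sketch of that handshake -- the port, verb and reply code are
invented here, not taken from any IM2000 draft -- the recipient agent might
call back the message store named in the notification before showing anything
to the user:)

import socket

VALIDATION_PORT = 2025     # invented for the example

def notification_is_valid(store_host, message_id, claimed_sender):
    """Ask the claimed message store to confirm it really issued this
    notification; discard the notification if it won't."""
    with socket.create_connection((store_host, VALIDATION_PORT), timeout=10) as s:
        s.sendall(("CONFIRM %s %s\r\n" % (message_id, claimed_sender)).encode("ascii"))
        reply = s.recv(512).decode("ascii", "replace").strip()
    return reply.startswith("250")     # invented code meaning "yes, that's ours"

That keeps the work per spurious notification to a single, non-amplified
lookup -- though, as discussed below, the callbacks themselves then become
something an attacker can aim at.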
Post by Brian Candler
Post by James Craig Burley
In any case, that'll swamp im2000.comcast.net such that it can't serve
*any* email, and there'll be nothing Comcast, nor recipients, can do
about it.
For each one notification message sent (TCP connection), the recipient will
open one TCP connection to im2000.comcast.net
Might as well have im2000.comcast.net itself notify via TCP connection
to the recipient? What is gained by waiting for the recipient to
decide to try to validate the sender, if we're assuming recipients
will always want to do so?

Oh, well, one thing: when recipients trust the sending IP, for
example, then there's no need for a validation, so no need for
anything other than an incoming UDP packet containing a notification.
(And maybe a UDP packet in return saying "got it", so no further
notifications need be sent?)

This is getting down below the level of detail we need to understand
the ramifications, I tend to think. You seem to understand the need
to minimize amplification, so I'm less worried about it being a
problem in any system you design (or vet).
Post by Brian Candler
The attacker could just open those connections herself, and apply a similar
kind of DoS anyway. However, this approach *does* hide the source of the
attacker.
Perhaps, when contacting im2000.comcast.net, the validation request should
include the purported source IP address of the notification? Clearly this
could be forged too, but if you get a lot of notifications from people all
over the Internet and they all claim to have received notifications from the
same IP, you have good evidence of the source of the attack.
Yup. I've long (and I mean for *decades*) assumed this kind of
automated determination of trustability based on sufficient numbers of
*reasonably* trusted neighbors would be necessary for goodness to
thrive and evil to be vanquished on a hostile global network.
Post by Brian Candler
As I say, this attack won't achieve anything (such as the delivery of a
spam), only wasted resources - so eventually the attacker is likely to get
tired. But I do agree with you that this needs to be considered carefully,
and the best solution arrived at such that this approach is at least no more
efficient than DoS'ing the message store directly.
We're definitely on the same page here.
Post by Brian Candler
Post by James Craig Burley
So, the "win" for IM2000 here is that spam can't be forged as coming
from someone else and still be delivered as spam.
The "loss" is that IM2000 pretty much requires the same sort of
"doublecheck" that people used to do with SMTP, when a server would
try to contact the envelope sender for untrusted email and see if
*his* SMTP server would VRFY that the address existed or accept a
bounce sent to that address (going as far as MAIL FROM:<> and RCPT
TO:<address> and checking the response, before discontinuing the
conversation).
Except that at the moment, an SMTP callback only verifies that the address
exists, not that the mail you're trying to receive was sent by that person.
Well, yes, but what I was referring to was that such callbacks were
considered highly annoying by (sysadmins for) victims of joe jobs,
weren't they?

IM2000 can (and IMO will) make them *cheaper*, but will they be cheap
*enough* to be sure *all* sysadmins will find such callbacks, which
appear to be intrinsic to a deployable IM2000 (*replacing* SMTP),
acceptable?

(I have to phrase these as *questions*, since that episode apparently
predates my awareness of how SMTP works, my experience as an SMTP
server admin, etc. I've only read some short descriptions of it.)
Post by Brian Candler
Post by James Craig Burley
Though IM2000 gives us the opportunity to improve the performance of
that sort of doublecheck, it does so only on a scalar basis (let's say
it makes it 10x more efficient), whereas spammers will have plenty of
incentive to ramp up attacks on certain popular, trusted mailstores at
a much higher rate than the performance is improved by IM2000.
Maybe. People do DDoS attacks on all kind of Internet infrastructure, for
all kinds of reasons. It's a pain, but you're not going to fix that without
fixing the Internet itself.
The main problem, besides widespread lack of upstream propagation of
filtering rules in our infrastructure, is when packets coming in from
a given IP address represent an arbitrary *mix* of "good" and "bad".

If IM2000 implies callbacks on all message notifications, then it
implies spammers can attack the callback mechanism *directly* by
sending spurious notifications to a large-bandwidth site, with such
notifications purporting to be from a small-bandwidth site.

Once the small site is drowned by the large site's overwhelming stream
of callbacks, such that it either ignores (or responds negatively or
positively to) *all* callback notifications or tells the large site to
stop sending notifications for awhile, the spammers have either
succeeded at DDoS'ing all IM2000 email sent from the small to the
large site, or at getting the large site to accept *any* IM2000 email
sent to the large site and *purporting* to be from the small site.

This is why I actually *want* an infrastructure that encourages email
to be sent directly from even dynamic-IP hosts (like laptops) to their
recipients, with as few intermediaries as possible, and without
relying on callbacks to validate email.

(When I think "callback" I think "potential blowback". I want to be
*sure* that the former can't be turned into the latter.)
Post by Brian Candler
Post by James Craig Burley
Amusingly, I've recently started receiving spam that advertises its
products via ASCII art, including the phrase "BEST PRICES"! ;-)
Cool. You can't accuse them of lack of imagination. It's another reason why
(IMHO) content analysis will always be doomed, until we have genuinely
intelligent robots. And if they are that intelligent, they will wonder why
they are doing this job in the first place. "Filter my spam, Marvin" -
"Brain the size of a planet and they ask me to filter out spam..."
Heh. Or, content analysis can decide it can't understand the content
sufficiently and return it to sender (somehow) with a message "please
express this succinctly, in plaintext [English|French|Klingon]".
Post by Brian Candler
Post by James Craig Burley
I tend to think the biggest problem with SMTP is really the fact that
it (normally) prevents a server from accepting a message without also
accepting *responsibility* for that message -- which translates into
the need for generating a bounce.
That's pretty much essential in a store-and-forward model though. In the
current world, all MUAs are set up to send outbound mail via a
store-and-forward server (and indeed, blacklists are set up to *refuse* mail
sent directly from end-user IP addresses, so replacing MUAs with something
cleverer which followed MX records in the DNS would not work)
Once you've handed the message over to the S&F server for delivery, if it
fails, it needs some way to tell you so. That need not be in another E-mail
message, but if not then there's an out-of-band notification required; and
to prevent this channel from being used for spamming, it needs to be
unforgeable.
Right. Though, my dynamic-IP host actually tries sending directly
first, and automatically falls back to my upstream ISP's SMTP server
only if that fails (in certain ways). (I wrote the patch that makes
qmail-remote work that way. It became pretty popular among some qmail
users in boats a lot like mine. It's comforting to see most of my
outgoing messages go directly to the MX for a recipient, in my mail
logs, instead of always going upstream to a relay, at which point my
ability to track delivery status evaporates.)

Anyway, what if *responsibility* does not accompany relaying as you
describe?

In that case, the MUA can't expect a bounce or DSN to ever come back
in case of failure (or success). Which is pretty much just as well,
from a design point of view, because DSNs on *successful* delivery
aren't really working well enough in SMTP to be useful; waiting as
much as two weeks for a bounce that might never arrive is not a
reliable way to be sure an important message actually reached the
destination; the infrastructure to allow bounces to be sent back is
expensive, as we've been discussing; and when that infrastructure
isn't working, even a successfully-delivered message can't result in a
DSN announcing that success.

Instead, what if the MUA periodically contacted the recipient (or the
same store-and-forward, or upstream, relay) and asked it "how is
delivery of that message progressing?".

It would have to allow not only for answers equivalent to the
2xy/4xy/5xy responses we presently have, but for "it appears to have
been lost, please resend", since the MUA would have responsibility for
that message.
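(Here's roughly what that could look like from the MUA's side; query_status()
and the status strings are invented, since no such extension exists today:)

import time

def ensure_delivered(relay, message_id, query_status, resend, poll_interval=600):
    """Responsibility stays with the sender: poll the relay for progress
    and resend if the message appears to have been lost."""
    while True:
        status = query_status(relay, message_id)  # e.g. "delivered", "queued",
                                                  # "failed", "unknown"
        if status == "delivered":
            return True
        if status == "failed":
            return False
        if status == "unknown":                   # relay lost it: resend
            resend(relay, message_id)
        time.sleep(poll_interval)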

That would *eliminate*, by design, the *need* for any sort of callback
system (although one could still be provided, and used on an
as-needed, optional basis).

Add (to a mythical new system a la IM2000) a more reliable way to
detect and eliminate duplicate message deliveries, and an MUA can
handle "important" deliveries by simultaneously delivering an email to
its upstream relay *and* the ultimate destination, in case the
connection to one or the other times out, or the delivery fails or is
rejected on one side or the other, before the laptop running the MUA
is disconnected or powered off.

On the other end, as the duplicate is detected and eliminated, it
could be noted that it appeared to have been *legitimately* duplicated
in order to indicate importance to the *sender* and, perhaps,
prioritized accordingly for the recipient. (If you receive the same
message via two routes, one of which is direct from a dynamic IP host
and the other is via a trusted relay, you can be pretty sure it's a
legit message even if you've never heard from the sender before.)
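(A sketch of that de-duplication, keyed on a content hash, with the
"two independent routes" rule from the previous paragraph; the route names
are just labels for the example:)

import hashlib

seen = {}   # content hash -> set of routes a copy arrived by

def accept_copy(raw_message, route):
    """route is e.g. "direct-dynamic-ip" or "trusted-relay".
    Returns (is_new, looks_deliberately_duplicated)."""
    digest = hashlib.sha256(raw_message.encode("utf-8")).hexdigest()
    routes = seen.setdefault(digest, set())
    is_new = not routes
    routes.add(route)
    # Same message via both a direct connection and a trusted relay:
    # the sender went to extra trouble, so it's probably legit and
    # could be prioritized accordingly.
    both_routes = {"direct-dynamic-ip", "trusted-relay"} <= routes
    return is_new, both_routes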

On the other hand, a site-wide receiving agent might notice a ton of
*similar* messages sent to all sorts of addresses for which it is
responsible -- including spamtraps -- all containing URLs like
enl4rg3y3r0rg4n.example.com.

In that case, it could assign all those messages very *low* priority
for all its recipients. Or it could just delete the messages and
remember that the originator sent what appears to be UBE. (Remember,
it wouldn't have accepted *responsibility* for those messages; just
the notifications and the contents. So it can just "drop" messages.)

Meanwhile, without accepting responsibility, all incoming-email
servers are able to accept messages without ever sync()'ing them to
disk, among other way cool stuff.

There would need to be substantial changes in email infrastructure
throughout to properly support such a system.

As it happens, many similar improvements have to be made to properly
support IM2000.
Post by Brian Candler
Post by James Craig Burley
But if I can only reliably blacklist on IP address, I am limited when it
comes to shared mail relays at ISPs. If 1 million users are all relaying
through one ISP's mail relay (a common case), plus 10 spammers, I either
have to blacklist the entire mail relay and suffer collateral damage, or I
have to accept the spam.
Right, I understand all that. But that is simply a Very Hard Problem
anyway; as I pointed out earlier, even the "hotpop"-like ads in the
headers of otherwise-legit email could be defined as "spam".
Ah well, then you have a single E-mail with a split personality: part
legitimate, part spam. If you dislike it sufficiently, then you can refuse
to communicate with that person, or ask them to use a less obnoxious E-mail
service.
It's an extreme case of the mixed-source problem I mentioned earlier,
so it is illustrative, yet insoluble in the general case. (Whether a
reader considers the non-spammy part of the message sufficiently
worthwhile to have to see the spammy part is *purely* an end-user
decision. Nothing short of pure AI can reliably and correctly make
that decision; even human beings can't make it, in some cases, until
long after they've read the message.)

That is, to the extent we (say, IM2000 enthusiasts) assume we're going
to move to a few, "trusted", intermediaries for exchanging email --
whether those intermediaries are IM2000 mailstores or SMTP relays --
we have to accept that there *will* be a mix, tolerable to some but
not all, of advertising accompanying messages coming from some or all
of those intermediaries.

It helps me remember to doubt any "promise" that anyone makes that
some new system, filter, whatever, will "eliminate spam", because I'll
always remember to say "what, exactly, *is* spam, and how is that
system or filter sure it'll eliminate only what *I* consider to be
spam?".

IM2000 and, I believe, ecrulisting or something along those lines,
both really promise only to make the "B" part of the "UBE"
sufficiently more costly, relative to the "E" part, that genuine spam
becomes much less prevalent. (No proposals, of which I'm aware, truly
deal with the "U" aspect of "UBE", even though techniques like
whitelisting and greylisting touch on it.)
Post by Brian Candler
Post by James Craig Burley
And, as I point out above, if you resort to blacklisting [mailstore,
senderID] tuples of "known spammers" at otherwise-legit mailstores,
- Overload the blacklist(s) you're using with all the spurious
senderID's that spammers can make up, or
- Overload the mailstore(s) with requests to validate whether any
given senderID a spammer might invent actually has a legit account
there (equivalent to SMTP VRFY or RCPT TO)
I don't see that *necessarily* follows, nor that the overloading you're
proposing is any worse than a direct DoS attack from the spammer to the
message store or to the blacklist.
A direct DoS attack is *also* a way to overload any central blacklist.
That's why it's unwise for an SMTP server to *rely* on any external
blacklist to accept incoming email, unless it can afford to not
receive incoming email.

But with a traditional DoS, a blacklist server can presumably block
all packets coming from a particular IP or IP range for the duration
of the DoS, and still serve requests from other portions of the 'net.

To really do damage, a spammer needs a DDoS -- an attack coming from
all sorts of 0wned machines all over the 'net.

But even *these* machines can be detected and blocked by the blacklist
owner as, over time, the pattern of the DDoS becomes distinguishable
from the pattern of requests from legitimate sources.

Requests regarding blacklisting from *legitimate* sources, however,
can be used to DDoS the blacklist provider in ways that *it* cannot
possibly resist, without becoming useless *as* a blacklist provider.

For IP-based BLs, this is somewhat mitigated by the fact that the
space of potential IPs is limited, and once a blacklist knows an IP
address is a source of bad requests, it can tell its clients to
blacklist that source *entirely*, meaning it won't even see incoming
*notifications* from such sources.

IM2000-based blacklisting would have to get pretty involved to be able
to do stuff like that. E.g.:

- 0wned machine stuffs spam into message store

- Store notifies recipient

- Recipient asks blacklist about [store, sender] combo

- Blacklist detects attempt to DDoS it via such requests, so
gets IP address of 0wned machine from recipient somehow

- Blacklist suggests store not accept further notifications
from recipient for a period of time

- In the meantime, blacklist stops adding anything to its
database regarding [store, *] notifications originating from
0wned machine

I'm not saying it ain't possible.

I *am* seriously suggesting it's *so* intricate that we might as well
do *half* this much work to make SMTP work just as well, if not
better, than IM2000 might work even in a "perfect world" of 100%
deployment without worrying about coexisting with SMTP.

Keep in mind that once you design callbacks into a system, you
essentially turn *every* system that sends email into a sort of
blacklist server, since it must accept and deal with callbacks, which
in turn are used to deny acceptance of forged email from spammers.

Therefore, *any* such system is potentially a target for attack just
as would be any of *today's* blacklist servers.

I am really, really worried about rolling out any design that makes
ordinary servers (like mine) targets of DDoS-like attacks by spammers
via *legitimate* hosts.
Post by Brian Candler
Post by James Craig Burley
This problem could be mitigated by insisting that outgoing message
notifications go *through* (and thus be vetted by) mail stores, so
only "legit" email addresses are exported.
But that can be done for SMTP anyway, and in fact is done, sorta
kinda, in various ways. E.g. some ISPs deny ordinary users outgoing
TCP connections over port 25; filter incoming connections to their
internal SMTP servers to ensure MAIL FROM:<whatever> identifies an
addy the customer (identified by incoming internal IP address)
actually owns; provide AUTH access for external use by legit
customers; etc.
It can be done, but in general it isn't.
This is the line of reasoning you use to which I most object.

"Can be done but isn't" => "*IM2000* can be done but isn't".

Sysadmins and the suits who make the decisions aren't looking for a
whole new system to roll out. They're looking for *incremental*
improvements to the current one.

What is it about IM2000 that makes any of us think it'll be the
"killer app" that replaces SMTP? That's what I think this discussion
really boils down to, beyond the (IMO highly useful) overview of
various email issues.

(This discussion almost belongs on a metalist discussing various
potential and existing messaging systems, but I don't know of any
offhand.)
Post by Brian Candler
Outgoing relays would have to
become more complex to perform AUTH database lookups, and huge customer
bases would have to be migrated over to using SMTP AUTH. Only when the last
customer has switched to using SMTP AUTH, and relaying by source IP address
only is disabled, can the source be considered 'trustworthy'.
And then there's the problem of which E-mail addresses a particular user is
allowed to use; either many local databases will have to be built, or one
distributed one.
As an example, I login to smtp.example.net as user "foo123", and then I want
Every time a new user comes along, they will have to state which E-mail
addresses their account is allowed to use, *and* prove to the ISP's
satisfaction that they do in fact own those accounts.
This isn't a problem if the envelope sender is required to be
***@example.net.

If it can be anything arbitrary, then IM2000 will have similar
problems, correct?
Post by Brian Candler
pobox.com domain owner to publish information in the DNS which ties
[foo123, smtp.example.net] to my address, SPF-style.
There are problems with this if the ISP decides to change their
infrastructure, such that outgoing mail starts appearing from a different
machine or IP address. This would invalidate the policy.
I don't quite see how IM2000 makes this any easier.
Post by Brian Candler
(3) The ISP rejects all mail from me unless it has
- This may be a temporary account, e.g. at a cybercafe; I will need to
receive bounces after I have left that location.
IM2000 eliminates bounces, but only *after* SMTP is basically
eliminated from the Internet. Until then, IM2000 will have the same
basic problem: a message store will have to validate whatever
*arbitrary* sender address is assigned to a user's outgoing message,
or worry that the user might send joe jobs.

Even once bounces are eliminated, how does an IM2000 *sender* actually
know whether an outgoing message has been received, if she's moved on
from her cybercafe? Same problem: either use a consistent address, or
"log on" to the same mail store via something akin to AUTH and require
the message store to validate against a list of legitimate sending
addresses.
Post by Brian Candler
- I may be sending a bounce (MAIL FROM:<>)
This gets into the whole issue of whether it's acceptable for IM2000
to not provide a way for a recipient to send back a corresponding
automatic notification of some sort to a sender.

But I believe that bounces are the biggest problem with SMTP anyway,
which is why I suggest we move away from them, one way or another (by
moving to IM2000, by slowly moving towards something like ecrulisting
if it's feasible, or by moving to a new push-based, bounce-free
system).
Post by Brian Candler
- There are many legitimate reasons for using variable envelope senders;
e.g. I may be sending using VERP, SES, SRS or BATV envelopes.
I don't know BATV offhand, but the rest strike me as kludges to work
around SMTP's built-in limitations. IM2000 doesn't appear to solve
the problems requiring VERP, though it certainly changes the landscape
(possibly so much that we can't recognize the new problems it'd
introduce). SES and SRS are, I believe, obviated by IM2000, at least
in an SMTP-free world. As for BATV -- "Bounce And Tell Vinnie"? ;-)
Post by Brian Candler
Anyone who did this would break all their users, and yet there would be
little benefit unless the *whole* Internet did it (which will never happen
even if this were decided to be 'best practice')
I don't quite buy that argument. The first part of it is true for
adopting IM2000 anyway. With regard to the second, it seems to me
that it can be up to each "trusted" relay to decide whether and how
best to validate users that submit messages to/through it.

For example, there's not really *that* much difficulty, as far as I
can see, in having a relay handling 1M users not only validate their
outgoing envelope senders against some list, but allow those senders
to be VERP-encoded to boot.
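(A sketch of that check, assuming -- purely for illustration -- a "+"-tagged
local part for the VERP/BATV-style variants and an invented allowed-senders
table; real VERP encodings differ, but the relay only needs to recognise its
own convention:)

def base_address(envelope_sender):
    """Strip a VERP/BATV-style tag: 'user+rcpt=dom@site' -> 'user@site'."""
    local, _, domain = envelope_sender.partition("@")
    return "%s@%s" % (local.split("+", 1)[0], domain)

ALLOWED = {                      # example data only
    "foo123": {"foo123@example.net", "user@pobox.example"},
}

def sender_permitted(auth_user, envelope_sender):
    if envelope_sender == "":    # null sender, i.e. MAIL FROM:<>, for bounces
        return True
    return base_address(envelope_sender) in ALLOWED.get(auth_user, set())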
Post by Brian Candler
Seriously, changing to something different (maybe IM2000, maybe not) would
be easier. You can have a phased migration. Over time, you would give more
credence to your new mail and less to SMTP mail.
I'm not convinced it'd be easier to change to IM2000, though perhaps
to something different.

As to giving "more credence" to messages sent via the new system, that
general capability (of prioritizing messages for recipients based on
various criteria) is necessary for my ecrulisting proposal. It's
going to be necessary for IM2000 anyway, so we might as well take
advantage of it and try it out within the current SMTP infrastructure
(which ecrulisting exploits, without really committing any particular
portion of the SMTP world to use or accommodate it, as it coexists
with existing standards and *most* practices).
Post by Brian Candler
Post by James Craig Burley
IM2000 also wins because it's easy to build message stores which rate-limit
the number of messages sent per day from a particular account; it could be
enforced as good practice (i.e. if you offer free signups for new accounts
over the web, then you get your message store blacklisted if you don't
follow this principle)
I'm not sure why that's hard for SMTP -- aren't some ISPs doing that
already?
Because
- mail relays generally don't have an unambiguous indication of *who* the
person is sending through them (without SMTP AUTH, or some sort of
callback into a RADIUS accounting system)
Huh? Source IP isn't enough to disambiguate a paying customer? Or do
you want to include roaming users?

What does IM2000, or any new system, offer that makes this a
non-issue?
Post by Brian Candler
- mail relays are typically built in clusters, so in order to count the
number of messages sent by a particular user in a particular period of
time, you would need to use a central database
See below, where this is revisited.
Post by Brian Candler
...[forcing the world to adopt SMTP AUTH]...
Post by James Craig Burley
However, this ain't going to happen. The pain is huge, and the short-term
gain is negligible.
How will the equivalent *not* have to happen with IM2000?
You can start setting people up on IM2000, and include gateways to SMTP
(inbound and outbound).
People can immediately see which mail has come in through IM2000, and which
through SMTP. They can use this to make value judgements on their mail. It
would particularly benefit closed user groups (e.g. companies, groups of
friends) where they know the other party is on IM2000, as it means any mail
claiming those identities that arrives via SMTP must be a forgery.
You can apply more stringent filters on SMTP mail. As more and more of your
correspondents are on IM2000, the less worried you are about false positives
on SMTP. For granny who only wants to communicate with grandchildren, SMTP
could be disabled entirely for that account.
Over time, though, IM2000 becomes less and less of a "closed user group" of
course. Actually, I think a mail replacement architecture should explicitly
support the idea of closed user groups. And possibly instant messaging too.
Wait, wasn't SMTP AUTH described here as a way for a legit client to
deposit an email on a widely trusted relay?

What I'm asking is, how does a legit IM2000 client deposit an email on
a widely trusted mailstore with something a lot like AUTH?
Post by Brian Candler
Post by James Craig Burley
Okay, then this is not a case of *technical* advocacy, more a case of
saying "it's a whole new system, we think it'll be fairly spam-free
compared to SMTP" -- and in fact it probably will for some time -- and
hoping to convince enough early adopters to jump on board, quickly
enough, that everybody else, except spammers, switches soon after.
No, if I believed that then I wouldn't suggest it.
Any new solution has to be a *permanently* harder environment for spammers
and fraudsters to work in. I think there are some strong arguments why
IM2000 could have an anti-spam infrastructure which works much better and at
much lower cost (of complexity and false positives) than SMTP, which I
documented in my comments on the web.
Indeed. That's what got this discussion started.

My concern is that IM2000 isn't *enough* permanently harder, and has
an excessively high built-in failure rate for *legit* email due to its
additional points of failure, to justify the expense of rollout.

I've already identified a few ways in which SMTP can (and in some ways
already does) offer what you identify as IM2000's advantages.

One clear example where SMTP does *not* do so is your item 2f (IIRC).

In particular, there's no way in SMTP for a server to respond to a
notification of a message with something meaning "please relay that
through one of the following servers: ...".

That would be quite useful. The fact that an IM2000 recipient can
simply toss a notification to a trusted 3rd party so *it* can pull up
the message is, IMO, a *big* advantage to IM2000.

I haven't yet gone over all your items in such detail. So I'm not
convinced either way; IM2000 might indeed have just enough in its
advantage to justify rollout.
Post by Brian Candler
Furthermore, if there is better evidence of where the spam came from, then
anti-spamming laws might be more effective.
Yup, this is why item 2f is such a big win for me. Instead of
recipients of spam forwarding (possibly forged) spam to a government
agency, imagine how much cleaner the system would be if (IM2000)
recipients simply forwarded *notifications*, allowing the agency to
retrieve the *contents*, which, of course, couldn't be claimed to be
forged without implicating the agency *itself*.

(Of course, said government agency would employ its own army of 0wned
machines, or their equivalent, so a spammer's mail store would have a
much harder time responding differently to a knock on its door by the
email equivalent of a 'narc. ;-)
Post by Brian Candler
Post by James Craig Burley
IM2000 ultimately puts us in the position of hoping not just that this
will be required *less* often, as a % of overall email sent, but that
the economics will ultimately result in spammers giving up and going away.
I'm not seeing a clear case being made for that scenario coming to
pass, certainly not clearly enough such that it justifies the
deployment of IM2000 as it now stands.
Well, maybe we have to build it and see. As well as writing IM2000, we write
an IM2000 spam-sending toolset and publish it too, to check that things work
as we expect.
Hey, I was hesitating about bringing that up anywhere, but it might
actually be worth deploying *free* spam-sending software widely and
publically, in order to help knock the profitability out from under
the UBE industry.

Of course, it would be a risk taken under the assumption that the
world will be able to deal with the results. Kinda like governments
giving away various drugs in order to knock the profitability out from
under the illegal drug trade.
Post by Brian Candler
Post by James Craig Burley
So IM2000 gives spammers a *guaranteed* weapon to use against DNS and
blacklists that SMTP doesn't assure them
It's the same point - DoS is DoS. It doesn't really gain them anything more
than a direct DoS assault on whatever it is they're trying to attack.
Again, I'm not so sure. I *think* it's easier to block a direct DoS
than a DDoS, and, in turn, easier to block a DDoS than a coordinated
attack where all the requests are coming from legit sources.

And, again, I hesitate to fundamentally denote each and every host
that sends legitimate email as a key component in a global blacklist
and, therefore, a target for spammers.
Post by Brian Candler
Post by James Craig Burley
But it's still not going to be as strong. For example, counting the number
of mails sent by a particular user is still quite hard. Consider an SMTP
relay cluster of 10 machines; they would need to communicate with a central
database to count the number of mails sent by one user over 24 hours.
How is that different from an IM2000 mailstore cluster of 10 machines?
Most likely each user would have an account on a single machine in the
cluster, and you'd spread your users across them. Then it's easy.
But even if you decided to have a cluster with a shared NFS backend, and
users could connect to any of those 10 machines to submit a new mail: all
you need to do is maintain a state file within the users' mailstore area
giving a history of sent mail. Each new submission adds to that history, and
may cause a threshold to be reached.
This is unlike SMTP, where there is no permanent record of mails passing
through the system, and all outbound mails are lumped into a single queue,
not a separate queue for each user.
I'm not sure how this differs *fundamentally* from IM2000, except the
ISP has to expect to hold lots more outgoing messages in its queue
than otherwise.

After all, an outgoing SMTP queue *can* be structured so it is
per-user.

(I'm not sure an IM2000 mail store has to be so structured, offhand,
by the way.)

I still think this is a minor implementation issue on the SMTP side,
but maybe someone who *knows* big-iron SMTP-relay implementations can
chime in.
Post by Brian Candler
Post by James Craig Burley
And you'll still need a new blacklist infrastructure for looking up
[IP,auth-sender] instead of just IP address.
Sounds like item 2a on your web page! Again, how is it different?
That's the point. Moving to a parallel SMTP world would be pretty much as
difficult as switching to IM2000.
Um...except we already *have* SMTP AUTH client and server software
deployed, etc....?
Post by Brian Candler
Post by James Craig Burley
What it suggests is that SMTP could evolve to incorporate IM2000
concepts in ways that *allow*, but don't *require*, clients and
servers to cooperatively use them.
I'm happy to see specific proposals here, and as you're no doubt aware, a
lot of work is going on already.
The trouble with greylisting by giving a 4xx response after the DATA phase
(or unceremoniously dropping the connection) is that there are plenty of
broken MTAs out there which are likely to treat this as a permanent
rejection.
Like SPF, you may end up breaking more than you fix.
Indeed. IM2000 (or any similarly "new" system) has the advantage of
unceremoniously dumping all *previously* broken software.

Hopefully, it won't come with lots of *new* broken implementations.
;-/

I also wonder if maybe it's better to just continue to incrementally
improve SMTP and, just as you suggest recipients might "favor" IM2000
email over SMTP email, they would favor newer, conforming SMTP clients
over broken ones, while still allowing (most) email to get through
from broken ones via whitelisting, specialized bridges, etc.
Post by Brian Candler
Post by James Craig Burley
But I prefer to *conceptualize* in terms of a brand-new, clean system,
and then *actualize* in terms of incrementally improving SMTP in that
direction, in order to speed uptake of whatever I might think is
useful.
Well, that's reasonable enough. But the extended SMTP may look not very much
like SMTP by the time the job's done :-)
Right.
Post by Brian Candler
Ultimately, I believe a new message store with an SMTP gateway would
probably end up being easier to build *and* easier to interwork with the
rest of the Internet... and it could deliver tangible benefits immediately.
I tend to think so as well, especially if we are indeed heading
towards a world with fewer, more-trusted, relays/stores.

On the other side of the equation, I personally hold out hope for a
more distributed, jungle-like world of clients injecting email
directly into recipient's systems (via their servers), since I believe
that, in the long run, that's a more robust model for a mode of
communication that *has* to be robust to be useful.

There's no question SMTP's long history of bodges makes my hoped-for
outcome much more problematic under the current SMTP model, so the
best course of action to try it out might be to offer free,
incremental improvements to the SMTP world in the form of new, better
implementations that recognize each others' capabilities in the form
of protocol improvements (the usual EHLO and new-verb combo, mainly).

Or, the mess of SMTP bodges might justify designing and rolling out a
system that is completely new, like IM2000, but is push-based, like
SMTP, yet bounce-free, like IM2000, because it employs tracking, like
FedEx, yet leaves responsibility for delivery with the sender a la the
end-to-end principle, like TCP.

Whew. Maybe I ought to just stop talking and design either or both of
the systems *I* envision, and offer it as a sort of low-level,
nuts-and-bolts way for sysadmins to communicate with each other about
problems with higher-level layers of the 'net (including, potentially,
IM2000 ;-).

Then, leave it up to fate to decide whether "sysadmins" come to
include ordinary people using home PCs, blackberries, etc., and
whether it's worth bridging it with classic SMTP.

Whaddya think?
--
James Craig Burley
Software Craftsperson
<http://www.jcb-sc.com>
James Craig Burley
2005-05-05 07:19:27 UTC
Permalink
Post by James Craig Burley
I tend to agree. In fact, *personally*, I really don't mind seeing
short spammy messages in notifications, as long as I can quickly skim
and skip them, and as long as I'm aware they didn't consume much in
the way of my system's resources to receive them. I think most users
would also be willing to live with that.
Try that on a mail account that gets a dozen or less legitimate mails
and thousands of spams a day. That's quickly a lot more than you can
"skim" without wasting tons of time and missing the legitimate mails[1].
Yes, at that scale, I can see that would be a problem.

IM2000 seems poised to make it *easier* for spammers to spam on that
scale, in its "natural state" anyway. (Notifications are much cheaper
to send; message contents rarely need to be sent along with
notifications.)

I can't see how to effectively deal with that problem without
employing techniques that make some of IM2000's key advantages over
SMTP relatively trivial after all.
--
James Craig Burley
Software Craftsperson
<http://www.jcb-sc.com>
Brian Candler
2005-05-05 09:43:03 UTC
Permalink
Post by James Craig Burley
Post by Brian Candler
Sure, any blacklist is subject to that kind of attack. Blacklists which run
on top of the DNS benefit from distributed caching.
I've previously posted, at some length, about the dangers of relying
on distributed (DNS-style) caching as an effective solution to the
problem of giving attackers direct ability to trigger database lookups
(and especially inserts) based on arbitrary keys.
(IMO, distributed DNS-style caching is to this sort of attack as more
CPU and RAM are to a brute-force implementation of an NP-complete
problem such as TSP as the problem sizes increase.)
My theoretical counter-argument to this is that if you can cripple the DNS
using this approach, then you're saying that the DNS itself is open to a DoS
attack. That may be true, but then that is an entirely separate problem and
exists whether or not it's used for blacklist lookups.

My pragmatic counter-argument is that DNS-based blacklists exist already,
and work. Spammers *do* attempt DoS operations periodically against those
blacklists - against their DNS infrastructure but also against their web and
mail infrastructure and anything else they can attack. These attacks are no
different to normal DoS attacks and are handled in the way such attacks are
always handled.

Anyway, I'm not wedded to the idea of using DNS as a blacklist lookup
mechanism, but I do think that it's a good idea to use a well tried and
tested mechanism.
Post by James Craig Burley
DATA
[...]
.
Does that make more sense?
Yep, although this spurious info will be hidden in a Return-Path: header and
unlikely to trouble the recipient.
Post by James Craig Burley
Post by Brian Candler
Spurious notifications are unlikely to be sent often, since they don't
achieve anything apart from a waste of resources, so I think it's unwise to
build a separate infrastructure to filter them out, one which won't be
tested very often.
In essence, a notification becomes a handshake between a message store
and a recipient (agent). That could be handled solely via a TCP
connection a la SMTP, in which case the client's IP address can't be
(easily) forged.
Yes. But in IM2000 the notification is "push", and it does not necessarily
originate from the message store. The sender of the notification can't forge
their source IP address, but that doesn't stop them sending a spurious
notification, and the notification doesn't have to come from the message
store itself. These two combined mean that you need to validate it.

But that does make me think of another option. Suppose all notifications
were forced to come from the originating message store. If you want to
forward a notification to someone else, you do it by asking the originating
message store to do it for you. Hmm... I'd have to think if there would be
any benefits in that.

The SMTP equivalent would be a kind of 'redirect' response to an incoming
message. I vaguely remember the protocol actually has a response code for
that, which nobody actually implements. Ah yes, RFC 821:

S: RCPT TO:<***@USC-ISIB.ARPA>
R: 551 User not local; please try <***@USC-ISIF.ARPA>

RFC 2821 (3.4) advises caution against this, and SMTP clients are allowed to
treat this as a bounce, which everyone does.
Post by James Craig Burley
Post by Brian Candler
Except that at the moment, an SMTP callback only verifies that the address
exists, not that the mail you're trying to receive was sent by that person.
Well, yes, but what I was referring to was that such callbacks were
considered highly annoying by (sysadmins for) victims of joe jobs,
weren't they?
A callback to verify isn't annoying, since it doesn't deliver a mail - but
an actual delivered bounce is, of course.
Post by James Craig Burley
IM2000 can (and IMO will) make them *cheaper*, but will they be cheap
*enough* to be sure *all* sysadmins will find such callbacks, which
appear to be intrinsic to a deployable IM2000 (*replacing* SMTP),
acceptable?
Some people argue that callbacks are "too expensive". I don't think there
are any hard economics to back that up; they just want to minimise their CPU
and network overhead, which is fine, but then the cost of handling all that
spam must be pretty high too. I don't think the overhead of a TCP socket
setup is particularly high.
Post by James Craig Burley
Instead, what if the MUA periodically contacted the recipient (or the
same store-and-forward, or upstream, relay) and asked it "how is
delivery of that message progressing?".
I think that's almost the same, except you've turned an active notification
into passive polling.

If the client polls, then it will need some sort of message key to query the
state of a particular message. In that case, the server could have made an
active notification using the same message key.
Post by James Craig Burley
It would have to allow not only for answers equivalent to the
2xy/4xy/5xy responses we presently have, but for "it appears to have
been lost, please resend", since the MUA would have responsibility for
that message.
So, whilst the message itself may have been lost, the *state* of the message
must be retained for the client to poll at some later stage - and
potentially must be kept for a very long time.
Post by James Craig Burley
Post by Brian Candler
It can be done, but in general it isn't.
This is the line of reasoning you use to which I most object.
I think it isn't done because the economic cost is extremely high, the
immediate gains are virtually zero, and the gains are only realised when
everybody *else* does it. In other words, there's no business case. Who
wants to spend money on a project which will only deliver benefits if
everybody else joins in as well - when we know perfectly well that there's a
large subset of people on the Internet who won't?

OTOH, implementing something like SES/BATV on your own server does have a
business case: you do some work, and you immediately stop joe-jobs coming
into your mailboxes. That's much easier to put into effect. You can then
leverage other features on top of that (such as callbacks to validate
envelope senders).
Post by James Craig Burley
Post by Brian Candler
Every time a new user comes along, they will have to state which E-mail
addresses their account is allowed to use, *and* prove to the ISP's
satisfaction that they do in fact own those accounts.
This isn't a problem if the envelope sender is required to be
If it can be anything arbitrary, then IM2000 will have similar
problems, correct?
IM2000 (as in current document) requires that the 'envelope sender' be the
actual mailstore hostname and account ID on that mailstore - otherwise it
simply doesn't work.

My own suggestion is that the sender ID be an E-mail address tied by design
to the same mailstore account.

I was trying to get at the issues which would occur if you go down the route
of stay-with-SMTP-and-try-to-patch-it-up.
Post by James Craig Burley
IM2000 eliminates bounces, but only *after* SMTP is basically
eliminated from the Internet. Until then, IM2000 will have the same
basic problem: a message store will have to validate whatever
*arbitrary* sender address is assigned to a user's outgoing message,
or worry that the user might send joe jobs.
Even once bounces are eliminated, how does an IM2000 *sender* actually
know whether an outgoing message has been received, if she's moved on
from her cybercafe?
Well, the cybercafe probably wouldn't give you a mailstore account (unless
you're a regular customer). You would use a mailstore at a third party (a la
"hotmail") or your home ISP.

Now, the same could apply today: the cybercafe could simply not install a
mail relay, and block port 25. That would force you to connect over port 587
to your home ISP, and use SMTP AUTH to submit a new mail.
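
For concreteness, the "port 587 + SMTP AUTH" submission path looks roughly
like this as a minimal Python sketch (host, addresses and credentials are
all placeholders, obviously):

    import smtplib
    from email.message import EmailMessage

    # Build a trivial message; in real life your MUA does this for you.
    msg = EmailMessage()
    msg["From"] = "me@example.org"
    msg["To"] = "you@example.net"
    msg["Subject"] = "hello"
    msg.set_content("Submitted on port 587 with SMTP AUTH, not relayed on 25.")

    s = smtplib.SMTP("submission.example.org", 587)  # the submission port
    s.starttls()                                     # protect the credentials
    s.login("me@example.org", "secret")              # the AUTH step
    s.send_message(msg)
    s.quit()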

There's a chicken-and-egg problem, though: most ISPs don't support this
service; so cybercafes can't block port 25; and so ISPs have little incentive
to add it.

If we wanted to move to a new SMTP world order, it's hard to see how to
force it to happen. You could take a radical approach: blacklist everyone
who doesn't implement the new world order policies. In that case, you will
find yourself unable to talk to almost anyone (in which case you might as
well set up in the IM2000 world instead).

But there's a more fundamental problem: how can you *tell* whether a
particular SMTP relay server complies with the "new world order" rules or
not?

The new world order says things like:
- you cannot submit mail on port 25 based purely on your IP address. You
must always use SMTP AUTH when submitting new mail.

However, you can't actually *test* this without going to the site, setting
up a machine on one of *their* IP addresses, and testing it for yourself.

That's one reason why a completely new protocol actually makes sense, in my
opinion. Mail received via the new protocol must be complying with the
new rules, if the protocol itself requires those rules as part of its
fundamental mode of operation.
Post by James Craig Burley
Post by Brian Candler
- There are many legitimate reasons for using variable envelope senders;
e.g. I may be sending using VERP, SES, SRS or BATV envelopes.
I don't know BATV offhand, but the rest strike me as kludges to work
around SMTP's built-in limitations.
Absolutely. And even if we design an SMTP new world order, then either those
kludges will still be needed, or a whole raft of extensions to SMTP will
need to be added such that they are no longer needed.

That would put up another barrier to joining the NWO: you'd first have to
upgrade your mailserver to SMTP++.
Post by James Craig Burley
I don't know what BATV is offhand -- "Bounce
And Tell Vinnie"?. ;-)
"Batavia Access Television", according to Google. But the ninth hit is the
one I meant: "Bounce Address Tag Validation (BATV)"
Post by James Craig Burley
Post by Brian Candler
Post by James Craig Burley
I'm not sure why that's hard for SMTP -- aren't some ISPs doing that
already?
Because
- mail relays generally don't have an unambiguous indication of *who* is
sending through them (without SMTP AUTH, or some sort of
callback into a RADIUS accounting system)
Huh? Source IP isn't enough to disambiguate a paying customer? Or do
you want to include roaming users?
With dynamic IP pools, source IP doesn't tell you *which* paying customer,
only that they are *a* paying customer.
Post by James Craig Burley
What does IM2000, or any new system, offer that makes this a
non-issue?
That you have to authenticate to the mailstore to submit a new message. It
doesn't work without it.
Post by James Craig Burley
Wait, wasn't SMTP AUTH described here as a way for a legit client to
deposit an email on a widely trusted relay?
What I'm asking is, how does a legit IM2000 client deposit an email on
a widely trusted mailstore with something a lot like AUTH?
Same. You deposit your message in your own mailstore account (e.g. one which
your ISP has given you on their mailstore), and the protocol requires you to
authenticate to it to achieve that.
Post by James Craig Burley
My concern is that IM2000 isn't *enough* permanently harder, and has
an excessively high built-in failure rate for *legit* email due to its
additional points of failure, to justify the expense of rollout.
You may well be right; I've not made my own mind up on that either, and
certainly there may be ways to improve it.
Post by James Craig Burley
Post by Brian Candler
Furthermore, if there is better evidence of where the spam came from, then
anti-spamming laws might be more effective.
Yup, this is why item 2f is such a big win for me. Instead of
recipients of spam forwarding (possibly forged) spam to a government
agency, imagine how much cleaner the system would be if (IM2000)
recipients simply forwarded *notifications*, allowing the agency to
retrieve the *contents*, which, of course, couldn't be claimed to be
forged without implicating the agency *itself*.
(Of course, said government agency would employ its own army of 0wned
machines, or their equivalent, so a spammer's mail store would have a
much harder time responding differently to a knock on its door by the
email equivalent of a 'narc. ;-)
Absolutely. And that's also a good reason for sticking an MD5 hash of the
message in the notification - so they can't just give out a different one
instead :-)
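
To make that concrete: the notification format isn't pinned down anywhere,
but the digest idea is just something like this (all field names invented):

    import hashlib

    def make_notification(sender_id, message_id, body_bytes):
        # Hypothetical notification record; the MD5 commits the message
        # store to this exact body, so it can't serve up a different one
        # to the authorities later.
        return {
            "sender": sender_id,
            "message-id": message_id,
            "body-md5": hashlib.md5(body_bytes).hexdigest(),
        }

    def body_matches(notification, retrieved_body):
        # Whoever retrieves the message re-checks the digest.
        return hashlib.md5(retrieved_body).hexdigest() == notification["body-md5"]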

SMTP could be improved to allow something similar, but it would probably
require signatures on messages. Problems there include message body
mangling, and the privacy concern that people might not *want* a long-term
proof of having sent a message. With IM2000, the notification proves the mail
was sent by X if you can use it to retrieve the message from X's message
store, but once the message has been unpinned, the proof is gone.
Post by James Craig Burley
Post by Brian Candler
This is unlike SMTP, where there is no permanent record of mails passing
through the system, and all outbound mails are lumped into a single queue,
not a separate queue for each user.
I'm not sure how this differs *fundamentally* from IM2000, except the
ISP has to expect to hold lots more outgoing messages in its queue
than otherwise.
After all, an outgoing SMTP queue *can* be structured so it is
per-user.
(I'm not sure an IM2000 mail store has to be so structured, offhand,
by the way.)
Well, for example you don't want one user to be able to query the status of
messages sent by another user, or to delete messages sent by another user.
That means you authenticate to the mailstore, and have control only over
your own messages.

SMTP *could* have per-user queues, but what I really meant was per-user
*state* giving the history of messages sent. That's what I meant by a shared
database, in the case of a cluster.

With a cluster of SMTP relays, you *could* use a shared NFS server to
maintain history for each user, but it's a clunky form of database, and you
might as well use SQL or LDAP or something like that. That's because there's
no other reason for an SMTP relay to have access to shared storage; it can
keep its own queue of outbound SMTP messages on its own local disk.
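
Just to illustrate the sort of per-user state I mean, a throwaway sketch (a
real cluster would point this at a shared SQL or LDAP service rather than a
local file):

    import sqlite3, time

    db = sqlite3.connect("sent-state.db")
    db.execute("""CREATE TABLE IF NOT EXISTS sent_messages (
                      account    TEXT NOT NULL,   -- authenticated submitter
                      message_id TEXT NOT NULL,
                      recipient  TEXT NOT NULL,
                      status     TEXT NOT NULL,   -- pending/delivered/refused
                      updated_at INTEGER NOT NULL -- unix timestamp
                  )""")

    def record_status(account, message_id, recipient, status):
        # One row per delivery attempt outcome, per authenticated user.
        db.execute("INSERT INTO sent_messages VALUES (?, ?, ?, ?, ?)",
                   (account, message_id, recipient, status, int(time.time())))
        db.commit()
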
Post by James Craig Burley
Post by Brian Candler
Post by James Craig Burley
And you'll still need a new blacklist infrastructure for looking up
[IP,auth-sender] instead of just IP address.
Sounds like item 2a on your web page! Again, how is it different?
That's the point. Moving to a parallel SMTP world would be pretty much as
difficult as switching to IM2000.
Um...except we already *have* SMTP AUTH client and server software
deployed, etc....?
We have *bits* of what is required for the SMTP NWO. As outlined above, we
need some way to force people into the NWO, which I believe means some way
of telling whether a particular server complies with the NWO or not, which
is very difficult (to the point of impossible).

A new blacklist infrastructure (based on sender+IP, not just IP) is just an
example of another extra thing which has to be done to implement the SMTP
NWO.
Post by James Craig Burley
Post by Brian Candler
Like SPF, you may end up breaking more than you fix.
Indeed. IM2000 (or any similarly "new" system) has the advantage of
unceremoniously dumping all *previously* broken software.
Maybe. The problem is that it's hard to detect broken versus non-broken
implementations of existing software.

This means you can't piggyback on existing mechanisms (like returning 4xx
to a DATA section); instead you have to advertise a new SMTP capability in
EHLO such as "250 PROPER4XX" which means that the recipient agrees to handle
it properly. In which case, you might as well define a new extension which
does what you actually want.
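
For example, checking for such a capability is trivial on the client side;
PROPER4XX is made up, of course, and smtplib is just to show the shape of it:

    import smtplib

    server = smtplib.SMTP("mail.example.com")
    code, banner = server.ehlo("client.example.org")
    if server.has_extn("proper4xx"):
        # Peer has promised to handle a 4xx response to DATA properly,
        # so the new behaviour can be used.
        use_new_behaviour = True
    else:
        # Fall back to assuming classic (possibly broken) semantics.
        use_new_behaviour = False
    server.quit()
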
Post by James Craig Burley
Post by Brian Candler
Ultimately, I believe a new message store with an SMTP gateway would
probably end up being easier to build *and* easier to interwork with the
rest of the Internet... and it could deliver tangible benefits immediately.
I tend to think so as well, especially if we are indeed heading
towards a world with fewer, more-trusted, relays/stores.
On the other side of the equation, I personally hold out hope for a
more distributed, jungle-like world of clients injecting email
directly into recipients' systems (via their servers), since I believe
that, in the long run, that's a more robust model for a mode of
communication that *has* to be robust to be useful.
I see where you're coming from - more peer-to-peer and less centralised. The
instant-messaging model fits well here.

The problem comes with spam and identities. Whatever identity I choose for
myself (whether it be a domain name or a cryptographic key), if I can
generate new ones at will, blacklisting becomes completely ineffective. You
then have to consider options like limiting each person to one domain or
having cryptographic keys signed by the government, neither of which I like.

Having a third party involved - a message store on a fixed IP address - does
give *some* control over the rate at which new identities are created,
and the rate at which new identities can send mail. Other ideas for
achieving the same goal are of course very interesting!

Any sort of closed-user-group model doesn't have this problem. But if you
wish to be able to accept mail from strangers on the Internet, then it is a
major problem. Some FUSSP suggestions are basically ways of making it
difficult for strangers to introduce themselves to you - such as "hashcash".
Challenge-response systems are in that class too (but they won't work once
spammers build systems to respond automatically to the challenges).
Post by James Craig Burley
Or, the mess of SMTP bodges might justify designing and rolling out a
system that is completely new, like IM2000, but is push-based, like
SMTP, yet bounce-free, like IM2000, because it employs tracking, like
FedEx, yet leaves responsibility for delivery with the sender a la the
end-to-end principle, like TCP.
The familiar push model has advantages too. It might be worth trying to
design a new protocol under this model, and see if you can end up with the
same advantages as IM2000 (or other ones).

Regards,

Brian.
James Craig Burley
2005-05-05 17:07:15 UTC
Permalink
Post by Brian Candler
Post by James Craig Burley
Post by Brian Candler
Sure, any blacklist is subject to that kind of attack. Blacklists which run
on top of the DNS benefit from distributed caching.
I've previously posted, at some length, about the dangers of relying
on distributed (DNS-style) caching as an effective solution to the
problem of giving attackers direct ability to trigger database lookups
(and especially inserts) based on arbitrary keys.
(IMO, distributed DNS-style caching is to this sort of attack as more
CPU and RAM are to a brute-force implementation of an NP-complete
problem such as TSP as the problem sizes increase.)
My theoretical counter-argument to this is that if you can cripple the DNS
using this approach, then you're saying that the DNS itself is open to a DoS
attack. That may be true, but then that is an entirely separate problem and
exists whether or not it's used for blacklist lookups.
That's not a counter-argument, IMO, since DNS *is* open to a DoS (or
DDoS) attack. At least, that's my understanding.

Therefore, anyone who wants to bring down DNS can do so, but to what
end? To keep people from reaching web sites (to which their local DNS
caches might successfully refer them anyway)? To keep people from
receiving email sent via SMTP after MX lookups?

Once you "overload" the DNS system as a point of failure for a
callback mechanism (a la SPF, SES, IM2000, etc.), however, it becomes
a much more tempting target, and also easier for an attacker to focus
on a *specific* point on that target.

Why would an attacker do that? To defeat the utility of callbacks
that validate incoming message notifications (and related
information), so email recipients can no longer rely on such
techniques to distinguish senders of desirable vs. undesirable
email.
Post by Brian Candler
My pragmatic counter-argument is that DNS-based blacklists exist already,
and work. Spammers *do* attempt DoS operations periodically against those
blacklists - against their DNS infrastructure but also against their web and
mail infrastructure and anything else they can attack. These attacks are no
different to normal DoS attacks and are handled in the way such attacks are
always handled.
And they sometimes succeed. I haven't kept close watch, but my
impression is that several blacklists of various sorts have, in fact,
been shut down (after being partially or fully incapacitated) as a
result of such attacks.

Those who *relied* on such blacklists to decide whether to accept a
given incoming email were SOL as these attacks became more successful;
their false positive and/or false negative rates increased
substantially, or so I would assume. (Else they didn't really *rely*
on those blacklists.)

If IM2000 relies, as a *technology*, on blacklists or their equivalent
(and third-party message stores are darn close to equivalent, to the
extent they're dragged into the mess needed to defend blacklists
against attack), then IM2000 can be shut down nearly across the board
by spammers attacking blacklists.
Post by Brian Candler
Anyway, I'm not wedded to the idea of using DNS as a blacklist lookup
mechanism, but I do think that it's a good idea to use a well tried and
tested mechanism.
In *general* you're correct. However, it's *not* a good idea, from an
engineering point of view, to *overload* a mechanism that was not
originally designed to bear such a load.

The best counter-argument to this I can think of is "well, the
Internet as a whole was not really engineered either, so let's just
keep adding things onto it and learn from what breaks and what
doesnt'".

IM2000 advocates, as well as advocates of other callback and/or
blacklisting schemes that rely on external third parties, should be up
front about the potential downsides along these lines.

Remember, whenever IM2000 is broken by spammers, people will just fall
back to SMTP, and IM2000's reputation will keep taking hits. It'll be
very difficult to counter that by saying "well, the problems aren't
really with IM2000 itself, but with third-party blacklists, with DNS,
etc".

In the meantime, ordinary users will be increasingly upset that "the
Internet keeps slowing down" because the upstream DNS caches upon
which they rely (unawares) for much of their browsing are thrashing
about just dealing with callback and/or blacklist requests.
Post by Brian Candler
Post by James Craig Burley
DATA
[...]
.
Does that make more sense?
Yep, although this spurious info will be hidden in a Return-Path: header and
unlikely to trouble the recipient.
I think my original point was that, with IM2000, it *has* to be shown
in order for the end user to make a reasonable decision, and with any
similar system implemented via SMTP (a la greylisting, or
ecrulisting), it has to be shown as well. (Except with SMTP, there's
nothing *but* the envelope sender and recipient, plus the injecting
SMTP client's TCP/IP info, to show, unless the server accepts the
entire message, which isn't necessarily how greylisting is
implemented.)
Post by Brian Candler
Post by James Craig Burley
Post by Brian Candler
Spurious notifications are unlikely to be sent often, since they don't
achieve anything apart from a waste of resources, so I think it's unwise to
build a separate infrastructure to filter them out, one which won't be
tested very often.
In essence, a notification becomes a handshake between a message store
and a recipient (agent). That could be handled solely via a TCP
connection a la SMTP, in which case the client's IP address can't be
(easily) forged.
Yes. But in IM2000 the notification is "push", and it does not necessarily
originate from the message store. The sender of the notification can't forge
their source IP address, but that doesn't stop them sending a spurious
notification, and the notification doesn't have to come from the message
store itself. These two combined mean that you need to validate it.
But that does make me think of another option. Suppose all notifications
were forced to come from the originating message store. If you want to
forward a notification to someone else, you do it by asking the originating
message store to do it for you. Hmm... I'd have to think if there would be
any benefits in that.
I think JdBP's IM2000 proposal, which is much more detailed than djb's
and which commits to certain design decisions, commits to this one as
well: *all* notifications come from the originating message store.

Whether that permits a recipient to simply forward a notification to
another party without notifying the store depends mainly on whether the
request for the contents of a message includes the recipient's IP
address (or similarly unique identification) in the key or
authorization for such a request. If not, forwarding of notifications
is trivial; if so, then, yes, the *store* would have to handle
forwarding at the request of a recipient.
Post by Brian Candler
The SMTP equivalent would be a kind of 'redirect' response to an incoming
message. I vaguely remember the protocol actually has a response code for
RFC 2821 (3.4) advises caution against this, and SMTP clients are allowed to
treat this as a bounce, which everyone does.
Heh. I didn't recall seeing that. But you're right, it's there! I
gather the grammar of the message accompanying the response code is
inadequately defined for any clients to generally rely on such
messages, however.
Post by Brian Candler
Post by James Craig Burley
Post by Brian Candler
Except that at the moment, an SMTP callback only verifies that the address
exists, not that the mail you're trying to receive was sent by that person.
Well, yes, but what I was referring to was that such callbacks were
considered highly annoying by (sysadmins for) victims of joe jobs,
weren't they?
A callback to verify isn't annoying, since it doesn't deliver a mail - but
an actual delivered bounce is, of course.
I suppose it can be annoying if spammers use it as a channel to do
dictionary attacks. If callbacks necessarily validate *messages* as
well as *addresses*, that solves one set of problems, but creates
another, as we've gone 'round and 'round on that already.
Post by Brian Candler
Post by James Craig Burley
IM2000 can (and IMO will) make them *cheaper*, but will they be cheap
*enough* to be sure *all* sysadmins will find such callbacks, which
appear to be intrinsic to a deployable IM2000 (*replacing* SMTP),
acceptable?
Some people argue that callbacks are "too expensive". I don't think there
are any hard economics to back that up; they just want to minimise their CPU
and network overhead, which is fine, but then the cost of handling all that
spam must be pretty high too. I don't think the overhead of a TCP socket
setup is particularly high.
Again, it's not *scalar* overhead that's the real problem. It's
blowback that's substantially out of proportion with regard to a given
host's *actual* use of email.

A given host *should* be "big" enough (CPU, RAM, storage, bandwidth)
to handle its *legitimate* incoming and outgoing email needs.

"Big" enough includes costs inherent to each legitimate incoming email
and each legitimate outgoing email.

Since, these days, each legitimate incoming email includes a cost for
dealing with incoming spam, hosts typically have to be "bigger" than
would otherwise be necessary. (Obviously this is true for hosts that
send *outgoing* spam -- hence the phenomenon of 0wned machines. ;-)

The problem with callbacks is potentially much larger than
that with bounces.

With bounces of joe jobs, a host must also cope with incoming email
that represents such bounces. In a sense, this is part of the cost of
having *outgoing* email in the first place; if the host doesn't have
responsibility for any domain names from which email is ever claimed
to emanate, it presumably won't receive any joe-job bounces.

And since bounces are expensive for senders to process and send,
joe-job bounces have an *inherent* property such that sites that might
otherwise mindlessly send them are incentivized not to do so, because
each joe-job bounce they send presumably represents some UBE *they*
accepted.

Now, with callbacks, you've not only de-incentivized those third
parties, you've actually *incentivized* them to rely *solely* on such
callbacks instead of other, more localized and/or dedicated, measures
to block incoming UBE.

So, now your typical host must be built out so that, if it has
responsibility for *any* outgoing email (i.e. hosts a domain name), it
can cope with a *deluge* of callback requests that cannot feasibly be
dealt with any other way. (The senders of those
requests have no incentive to stop sending them. And they have no
*other* way to reliably disambiguate email legitimately
vs. illegitimately coming from that typical host.)
Post by Brian Candler
Post by James Craig Burley
Instead, what if the MUA periodically contacted the recipient (or the
same store-and-forward, or upstream, relay) and asked it "how is
delivery of that message progressing?".
I think that's almost the same, except you've turned an active notification
into passive polling.
If the client polls, then it will need some sort of message key to query the
state of a particular message. In that case, the server could have made an
active notification using the same message key.
Right. It's *almost* the same. But it doesn't require a reverse
lookup a la a callback at all. The *sender* remains responsible for
asking about messages.

That's kind of like how IM2000 makes the sender more responsible for
message *contents*, except the contents move (quickly) downstream
along with the notification, while the sender remains responsible for
tracking the package, as it were.
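
A toy model of that, just to pin down the idea (the names and states are all
invented; the point is that the MUA keeps its own copy and keeps asking):

    # The MUA remains responsible for the message until it hears "done".
    PENDING, DELIVERED, LOST, REFUSED = "pending", "delivered", "lost", "refused"

    def poll_status(relay_state, message_key):
        # relay_state stands in for whatever the upstream relay remembers
        # about messages it has taken responsibility for.
        state = relay_state.get(message_key, LOST)
        if state == LOST:
            return "resend"        # MUA still has the message; send it again
        if state == PENDING:
            return "poll-later"    # the equivalent of a 4xy: not done yet
        if state == REFUSED:
            return "give-up"       # the equivalent of a 5xy
        return "done"              # the equivalent of a 2xy
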
Post by Brian Candler
Post by James Craig Burley
It would have to allow not only for answers equivalent to the
2xy/4xy/5xy responses we presently have, but for "it appears to have
been lost, please resend", since the MUA would have responsibility for
that message.
So, whilst the message itself may have been lost, the *state* of the message
must be retained for the client to poll at some later stage - and
potentially must be kept for a very long time.
Indeed. In fact, this is exactly what the end-to-end principle
implies anyway.
Post by Brian Candler
Post by James Craig Burley
Post by Brian Candler
It can be done, but in general it isn't.
This is the line of reasoning you use to which I most object.
I think it isn't done because the economic cost is extremely high, the
immediate gains are virtually zero, and the gains are only realised when
everybody *else* does it. In other words, there's no business case. Who
wants to spend money on a project which will only deliver benefits if
everybody else joins in as well - when we know perfectly well that there's a
large subset of people on the Internet who won't?
Right. That's how I view IM2000.
Post by Brian Candler
OTOH, implementing something like SES/BATV on your own server does have a
business case: you do some work, and you immediately stop joe-jobs coming
into your mailboxes. That's much easier to put into effect. You can then
leverage other features on top of that (such as callbacks to validate
envelope senders).
Exactly.
Post by Brian Candler
Post by James Craig Burley
Post by Brian Candler
Every time a new user comes along, they will have to state which E-mail
addresses their account is allowed to use, *and* prove to the ISP's
satisfaction that they do in fact own those accounts.
This isn't a problem if the envelope sender is required to be
If it can be anything arbitrary, then IM2000 will have similar
problems, correct?
IM2000 (as in current document) requires that the 'envelope sender' be the
actual mailstore hostname and account ID on that mailstore - otherwise it
simply doesn't work.
My own suggestion is that the sender ID be an E-mail address tied by design
to the same mailstore account.
I was trying to get at the issues which would occur if you go down the route
of stay-with-SMTP-and-try-to-patch-it-up.
Okay. I think I might be losing track of the conversation;
essentially, what I'm doing (or trying to do) is comparing the effort
needed to roll out IM2000 with that needed to incrementally improve,
site by site, SMTP to do *mostly* similar things.
Post by Brian Candler
Post by James Craig Burley
IM2000 eliminates bounces, but only *after* SMTP is basically
eliminated from the Internet. Until then, IM2000 will have the same
basic problem: a message store will have to validate whatever
*arbitrary* sender address is assigned to a user's outgoing message,
or worry that the user might send joe jobs.
Even once bounces are eliminated, how does an IM2000 *sender* actually
know whether an outgoing message has been received, if she's moved on
from her cybercafe?
Well, the cybercafe probably wouldn't give you a mailstore account (unless
you're a regular customer). You would use a mailstore at a third party (a la
"hotmail") or your home ISP.
Now, the same could apply today: the cybercafe could simply not install a
mail relay, and block port 25. That would force you to connect over port 587
to your home ISP, and use SMTP AUTH to submit a new mail.
There's a chicken-and-egg problem, though: most ISPs don't support this
service; so cybercafes can't block port 25; and so ISPs have little incentive
to add it.
If we wanted to move to a new SMTP world order, it's hard to see how to
force it to happen. You could take a radical approach: blacklist everyone
who doesn't implement the new world order policies. In that case, you will
find yourself unable to talk to almost anyone (in which case you might as
well set up in the IM2000 world instead).
But there's a more fundamental problem: how can you *tell* whether a
particular SMTP relay server complies with the "new world order" rules or
not?
- you cannot submit mail on port 25 based purely on your IP address. You
must always use SMTP AUTH when submitting new mail.
However, you can't actually *test* this without going to the site, setting
up a machine on one of *their* IP addresses, and testing it for yourself.
That's one reason why a completely new protocol actually makes sense, in my
opinion. Mail received via the new protocol must be complying with the
new rules, if the protocol itself requires those rules as part of its
fundamental mode of operation.
I think you already answered your questions above, when you suggested,
in a previous email, that, in a mixed world (IM2000 and SMTP),
recipients would tend to give higher priority to messages arriving
(entirely?) via IM2000.

On the SMTP side, it's not all *that* hard to prioritize incoming
messages based on a recipient's perception of the trustability of
upstream relays, including whether SMTP AUTH was used. (But in cases
where use of AUTH isn't reliably reported in "Received:" headers,
either the relay in question always requires AUTH and thus acquires,
over time, lots of trust, or it will have to find other ways to be sure
it avoids the problem of being a mixed source of UBE and desirable
email and, thus, untrusted.)
Post by Brian Candler
Post by James Craig Burley
Post by Brian Candler
- There are many legitimate reasons for using variable envelope senders;
e.g. I may be sending using VERP, SES, SRS or BATV envelopes.
I don't know BATV offhand, but the rest strike me as kludges to work
around SMTP's built-in limitations.
Absolutely. And even if we design an SMTP new world order, then either those
kludges will still be needed, or a whole raft of extensions to SMTP will
need to be added such that they are no longer needed.
That would put up another barrier to joining the NWO: you'd first have to
upgrade your mailserver to SMTP++.
Yup. I'm still not sure how best to proceed.
Post by Brian Candler
Post by James Craig Burley
I don't know what BATV is offhand -- "Bounce
And Tell Vinnie"?. ;-)
"Batavia Access Television", according to Google. But the ninth hit is the
one I meant: "Bounce Address Tag Validation (BATV)"
Ah, right, and, indeed, it shares with VERP, SRS, and SES the
property of encoding the envelope sender to work around the lack of
another crucial field in an SMTP message notification (mainly, the
lack of a machine-encoded "in reference to" field).
Post by Brian Candler
Post by James Craig Burley
Post by Brian Candler
Post by James Craig Burley
I'm not sure why that's hard for SMTP -- aren't some ISPs doing that
already?
Because
- mail relays generally don't have an unambiguous indication of *who* is
sending through them (without SMTP AUTH, or some sort of
callback into a RADIUS accounting system)
Huh? Source IP isn't enough to disambiguate a paying customer? Or do
you want to include roaming users?
With dynamic IP pools, source IP doesn't tell you *which* paying customer,
only that they are *a* paying customer.
Really? Why not?
Post by Brian Candler
Post by James Craig Burley
What does IM2000, or any new system, offer that makes this a
non-issue?
That you have to authenticate to the mailstore to submit a new message. It
doesn't work without it.
Oh, okay -- so it is pretty much the same overall effort as moving to
SMTP AUTH, modulo the fact that SMTP AUTH got there first. ;-)
Post by Brian Candler
Post by James Craig Burley
My concern is that IM2000 isn't *enough* permanently harder, and has
an excessively high built-in failure rate for *legit* email due to its
additional points of failure, to justify the expense of rollout.
You may well be right; I've not made my own mind up on that either, and
certainly there may be ways to improve it.
I'd rather we get it right and deploy it before some proprietary
vendor does something similar *enough* to take over much of the useful
email-address space on the 'net.
Post by Brian Candler
Post by James Craig Burley
After all, an outgoing SMTP queue *can* be structured so it is
per-user.
(I'm not sure an IM2000 mail store has to be so structured, offhand,
by the way.)
Well, for example you don't want one user to be able to query the status of
messages sent by another user, or to delete messages sent by another user.
That means you authenticate to the mailstore, and have control only over
your own messages.
This is an issue with my proposals (including ecrulisting) as well,
and in fact it might be an issue with greylisting too, though I
haven't studied it in detail.

(By "issue" I mean everything might seem to work as long as the
proposal is not widely-enough used to come under focused attack, but
could be a show-stopper otherwise.)
Post by Brian Candler
SMTP *could* have per-user queues, but what I really meant was per-user
*state* giving the history of messages sent. That's what I meant by a shared
database, in the case of a cluster.
Yes, SMTP definitely *needs* that once ecrulisting and/or tracking are
added, and probably *wants* that even today as greylisting and other
anti-UBE measures increasingly tend to stop, or at least slow down,
deliveries of legitimate email.
Post by Brian Candler
Post by James Craig Burley
On the other side of the equation, I personally hold out hope for a
more distributed, jungle-like world of clients injecting email
directly into recipients' systems (via their servers), since I believe
that, in the long run, that's a more robust model for a mode of
communication that *has* to be robust to be useful.
I see where you're coming from - more peer-to-peer and less centralised. The
instant-messaging model fits well here.
Indeed.
Post by Brian Candler
The problem comes with spam and identities. Whatever identity I choose for
myself (whether it be a domain name or a cryptographic key), if I can
generate new ones at will, blacklisting becomes completely ineffective. You
then have to consider options like limiting each person to one domain or
having cryptographic keys signed by the government, neither of which I like.
Agreed with the latter. I'm not yet convinced that the at-will
generation of new identities is *solely* a problem; IMO it can be
leveraged as part of the solution.
Post by Brian Candler
Any sort of closed-user-group model doesn't have this problem. But if you
wish to be able to accept mail from strangers on the Internet, then it is a
major problem. Some FUSSP suggestions are basically ways of making it
difficult for strangers to introduce themselves to you - such as "hashcash".
Challenge-response systems are in that class too (but they won't work once
spammers build systems to respond automatically to the challenges).
Right. My ideal is that incoming email from total strangers would
*tend* to be immediately accepted, and my *proposals* revolve around
the notion that, prior to the human recipient actually reading a given
message, the sending stranger's MUA repeatedly requesting status (or,
in ecrulisting, resending the message) would, in the absence of other
information on the tendency of the sender to send Bulk email, tend to
*increase* the priority of that message as seen in the recipient's
list of pending messages.

That doesn't seem to *punish* anybody for sending legit email. In
many cases, such emails are about as instantaneous as can be, since
there's no reverse lookup (not even rDNS), no callback, etc. If the
recipient doesn't care for the message, they can either signal its
unacceptability to the sender, or instruct an upstream entity to
consider the message Unsolicited.

Since it would be mostly a matter of automation to infer whether it is
Bulk Email as well, that means that a sender of UBE becomes recognized
as such on a per-site basis.

And the sender of legit email naturally *wants* to know the progress
of the transmission, and, since her local DNS cache probably already
has the destination MX in it, repeated lookups to request status
updates are little different, in terms of the hit on the DNS system as
a whole, than users hitting "Refresh" or even just clicking on a link
that hasn't yet gone stale in their browsers.

That covers the *legit* cases of email, plus the general means by
which UBE is detected as such.

(Do I want a whole new protocol that makes all this even *better*? Of
course! ;-)

Now, *illegitimate* email can still be blocked by pretty much all the
"usual suspects" -- techniques like RBLs and SPF can be used -- but
the system as a whole is more flexible, so these techniques can
actually be used by the end user's *MUA* without the *sender*
necessarily being aware that the message progressed that much further,
beyond the SMTP server just upstream from the MUA, or the relay just
upstream from that, etc.

So, here we come to the nub of the "solution" I'm proposing, since it
is crucial to know whether it might actually work.

It depends on white hats having substantial resources and a
willingness to use them.

In essence, just as black hats can exploit the "fractal" nature of the
"timespace" of email as a whole -- as I pointed out earlier, there's
really no way to effectively snapshot the universe of legitimate email
addresses, legitimate domain names, etc., though at least the IPv4
space is theoretically manageable (but not the IPv6 space) -- white
hats can exploit that as well.

So, by sites individually choosing to deploy some spamtraps, and
perhaps using RBL and other technologies not to necessarily *block*
incoming messages but, rather, to simply *ignore* them and let the
senders repeatedly inquire as to their status...

...*receivers* of UBE turn their systems into "black holes" for mail
that actually *is* Unsolicited Bulk Email.

Again: if it's not Unsolicited, senders who realize their messages
aren't getting through will quickly find another communications
channel (which is something SMTP doesn't reliably provide information
on at all anymore), assuming they aren't told within, say, 24 hours
that their "To:" email address had a typo and was thus unrecognized.

If it's not Bulk, a receiving system can easily recognize that it
hasn't seen 1M messages coming in from that particular sender's IP
address in the past 48 hours, most or all of which have gone unread or
marked as Unsolicited by their readers.

Accordingly, the receiving system can increase the priority of pending
messages, take responsibility for them, or at least give senders more
useful information as to why delivery hasn't yet occurred.
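
Roughly speaking, the receiving side only needs a sliding window of per-IP
counts plus a note of how those messages fared -- something like this
back-of-the-envelope sketch (the thresholds are invented):

    import time
    from collections import defaultdict, deque

    WINDOW = 48 * 3600        # the 48-hour window mentioned above
    BULK_THRESHOLD = 100000   # invented cut-off; "1M in 48 hours" is clearly Bulk

    recent = defaultdict(deque)   # sender IP -> timestamps of messages seen

    def looks_bulk(sender_ip, now=None):
        now = time.time() if now is None else now
        q = recent[sender_ip]
        q.append(now)
        while q and q[0] < now - WINDOW:
            q.popleft()
        return len(q) > BULK_THRESHOLD

    # Each polite status poll from a sender that is *not* behaving like a
    # bulk mailer nudges that pending message up the recipient's list.
    def reprioritize(pending_message, sender_ip):
        if not looks_bulk(sender_ip):
            pending_message["priority"] = pending_message.get("priority", 0) + 1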

This approach is a bit like IM2000 in that it gives the sender more
responsibility for "caring" for outgoing messages.

But it avoids the necessity of making the messages *themselves* sit on
sites remote from recipients.

So it "punishes" senders of UBE in that they either become (well,
remain) careless senders -- which recipients can more easily
distinguish from caring ones -- or they expend a great deal more
resources to send and care for each outgoing piece of their Bulk
email, while taking the very significant risk that Bulk email sent to
spamtraps and/or sites that otherwise autodetect Bulk sending will
result in such care accomplishing literally *nothing* in terms of
getting a message into a given end user's eyeballs.

If that isn't enough to stem the tide of UBE, I sincerely doubt IM2000
can do any better, because it can only promise increased costs for
sending UBE at the expense of increased costs to recipients to receive
*all* email (because of the pull model, at the very least).
Post by Brian Candler
Post by James Craig Burley
Or, the mess of SMTP bodges might justify designing and rolling out a
system that is completely new, like IM2000, but is push-based, like
SMTP, yet bounce-free, like IM2000, because it employs tracking, like
FedEx, yet leaves responsibility for delivery with the sender a la the
end-to-end principle, like TCP.
The familiar push model has advantages too. It might be worth trying to
design a new protocol under this model, and see if you can end up with the
same advantages as IM2000 (or other ones).
Actually, I think that's what I've been doing, on this list mainly.
Just a question of whether and when I (or someone else) actually go
forward with it.
--
James Craig Burley
Software Craftsperson
<http://www.jcb-sc.com>
Marc W. Mengel
2005-05-05 18:43:29 UTC
Permalink
Post by James Craig Burley
Post by Brian Candler
My pragmatic counter-argument is that DNS-based blacklists exist already,
and work. Spammers *do* attempt DoS operations periodically against those
blacklists - against their DNS infrastructure but also against their web and
mail infrastructure and anything else they can attack. These attacks are no
different to normal DoS attacks and are handled in the way such attacks are
always handled.
And they sometimes succeed. I haven't kept close watch, but my
impression is that several blacklists of various sorts have, in fact,
been shut down (after being partially or fully incapacitated) as a
result of such attacks.
Not if you do it right...

As long as you don't let people read mail (IM2000) or deliver
mail (SMTP) if they can't reach the blacklist service (via *whatever*
protocol), DoS-ing the blacklist service doesn't let people read your
spam; rather it prevents them reading anything, including the spam. And
once the DoS ends, people can once again get the blacklist info and drop
the spam.

That is, the significant difference is that in the SMTP
implementations to which you refer, DoS-ing the blacklist works because
mail gets delivered if the blacklist is unreachable. If you refuse mail
("try again later") when you can't reach your blacklist(s), DoSing
the blacklist just backs up *all* the mail, and doesn't let your spam
through, either.
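
In other words, the lookup has to fail closed -- sketching it against a
DNS-style list (the zone name here is a placeholder, not a real list):

    import socket

    def blacklist_verdict(client_ip):
        # Conventional DNSBL-style key: reversed octets under the list zone.
        query = ".".join(reversed(client_ip.split("."))) + ".dnsbl.example.org"
        try:
            socket.gethostbyname(query)
            return "reject"            # an A record means the IP is listed
        except socket.gaierror as err:
            if err.errno == socket.EAI_NONAME:
                return "accept"        # authoritative "not listed"
            return "tempfail"          # list unreachable: 4xx, try again later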

However, that does bring in your other argument -- if people DoS the
blacklists for a service that works this way, and stop *all* mail,
people may decide that this design is untenable, and dump it. Classic
prisoner's dilemma -- if everyone would stick to a
no-blacklist==no-delivery rule, the spammers would have to give in
'cause they can't live without email delivery, either. But as long as
some number of folks just drop the rule to let their legitimate mail
through, then the spammers win. And the spammers are willing to
go a few days or a week without their mail getting through...

Marc
James Craig Burley
2005-05-05 21:40:15 UTC
Permalink
Post by Marc W. Mengel
As long as you don't let people read mail (IM2000) or deliver
mail (SMTP) if they can't reach the blacklist service (via *whatever*
protocol), DoS-ing the blacklist service doesn't let people read your
spam; rather it prevents them reading anything, including the spam. And
once the DoS ends, people can once again get the blacklist info and drop
the spam.
That is, the significant difference is that in the SMTP
implementations to which you refer, DoS-ing the blacklist works because
mail gets delivered if the blacklist is unreachable. If you refuse mail
("try again later") when you can't reach your blacklist(s), DoSing
the blacklist just backs up *all* the mail, and doesn't let your spam
through, either.
However, that does bring in your other argument -- if people DoS the
blacklists for a service that works this way, and stop *all* mail,
people may decide that this design is untenable, and dump it. Classic
prisoner's dilemma -- if everyone would stick to a
no-blacklist==no-delivery rule, the spammers would have to give in
'cause they can't live without email delivery, either. But as long as
some number of folks just drop the rule to let their legitimate mail
through, then the spammers win. And the spammers are willing to
go a few days or a week without their mail getting through...
Yes, this is all correct.

It's important to realize that there exists a *continuum*, between
unimpeded blacklists and fully-DoS'ed blacklists, such that email
delivery *slows* down, and such that *some* hosts experience more of a
slowdown than others.

That is, spammers can focus their attacks on the sites that they find
to be most of concern. (Just as some apparently attacked and
destroyed a site that was providing free C/R for hotmail, or some
other free email service, by forcing it to thrash around delivering
all sorts of spurious Challenges.)

Practically speaking, this has significant implications for any design
that assumes all white hats would agree to stop accepting email in the
face of such an attack.

How would they know whether the attack was across the board? They'd
have to *communicate* regarding whether to keep relying on the DoS'ed
blacklist, or move to a new one. And how would they do that, since
their main method of communication -- email -- is under attack? How
would they all agree to resume accepting email, since the effects of
the DoS would probably never be *completely* uniform?

If the attack wasn't across the board, that means only a few sites are
affected. That is, either the blacklist has chosen to specifically
stop responding to them (they're the source of too many problematic
callbacks, through no fault of their own), or *that* blacklist is out
while others are working.

Both possibilities have worrisome implications. If spammers can
convince a blacklist operator to selectively deny some of his own
customers, the spammer has no incentive to let up on the attack as
long as she is able to send email to users on systems that aren't
denied as customers of the blacklist.

But if there are multiple blacklists to choose from, the customers of
the targeted one might simply choose to switch to another blacklist in
order to receive email, at which point the attacker "wins", as the
blacklist no longer serves any purpose.

Alternatively, the customers of the targeted blacklist can communicate
with all legitimate (IM2000) customers of *all* blacklists, or perhaps
the blacklist operators communicate among themselves (recursing to the
communication problem described above either way)...

...and *everyone* decides to stop receiving email as long as *any* of
them are under attack. This is a sort of "solidarity" response to an
external attack.

Problem is, this last approach is an overreaction to an attack; it
greatly amplifies the effect of *any* attack, as long as it has any
promise of success, improving the success rate of such attacks to the
point that the system as a whole is useless.


In summary, I don't believe blacklists represent a viable long-term
"solution" to the UBE problem. They have short-term convenience, but
they don't scale up well.

It might actually be better to have, instead of a few central
blacklists, a few central relays (for SMTP) or mail stores (for
IM2000) that provide only authorized access to email, upon which
everyone (or most everyone) chooses to rely to exchange email.

Since everyone is ultimately relying on a handful of root servers for
DNS, it's not all that difficult to believe that centralized email
*exchange* (rather than exchanging only some tangentially interesting
metadata on email) could be the simplest and most direct approach to
getting rid of (most) UBE.

I don't like that solution either. I just like the fact that it makes
the centralized third parties act, simultaneously, both as *enablers*
and *disablers* of email exchange, making it much harder for spammers
to DoS them while still getting their UBE through them.
--
James Craig Burley
Software Craftsperson
<http://www.jcb-sc.com>
James Craig Burley
2005-05-05 23:27:33 UTC
Permalink
Post by James Craig Burley
Post by Brian Candler
But that does make me think of another option. Suppose all notifications
were forced to come from the originating message store. If you want to
forward a notification to someone else, you do it by asking the originating
message store to do it for you. Hmm... I'd have to think if there would be
any benefits in that.
I think JdBP's IM2000 proposal, which is much more detailed than djb's
and which commits to certain design decisions, commits to this one as
well: *all* notifications come from the originating message store.
Whether that permits a recipient to simply forward a notification to
another party without notifying the store depends mainly on whether the
request for the contents of a message includes the recipient's IP
address (or similarly unique identification) in the key or
authorization for such a request. If not, forwarding of notifications
is trivial; if so, then, yes, the *store* would have to handle
forwarding at the request of a recipient.
JdBP appears to allow the forwarding of notifications without the original
message store being involved: [...]
Okay. That seems reasonable.
The blacklisting is done using the domain name of the message store
(illegal.com), so I imagine it must be against notifications which come in
where the originating message store is illegal.com, not that the source IP
address of the notification is *.illegal.com
I believe that's the case as well.
Post by James Craig Burley
On the SMTP side, it's not all *that* hard to prioritize incoming
messages based on a recipient's perception of the trustability of
upstream relays, including whether SMTP AUTH was used. (But in cases
where use of AUTH isn't reliably reported in "Received:" headers,
either the relay in question always requires AUTH and thus acquires,
over time, lots of trust, or it will have to find other ways to be sure
it avoids the problem of being a mixed source of UBE and desirable
email and, thus, untrusted.)
The trouble is you can't rely on the sender *telling* you that the message
was submitted via SMTP AUTH, and that therefore you should trust it. If so,
spammers would just make their mail systems make the same declaration.
With SMTP, you need to verify independently whether the mail server in
question is actually trustworthy - for example by having trusted agents on
that ISP's network performing the testing for you. Ick.
I'm saying, why care whether a third-party relay uses AUTH or any
other *particular* mechanism?

All you *really* care about is whether that relay is well-run enough
to avoid accepting more UBE, likely destined for your system, than you
feel comfortable with.

Leave it up to the operator of that relay to decide how best to gain
that trust. It can advertise that it uses AUTH generally, or, via
"Received:" headers, *when* it uses AUTH. Whether that impresses you
is up to you, since one relay might be AUTHless but UBE-free and
another might always AUTH but be a huge source of UBE.

AUTH is just another of those things that is an extra hoop to jump
through but, otherwise, not really needed most of the time.

Accordingly, until it's *required* to be used most or all of the time,
people won't bother with it.

And if they *do* require it, there'll be considerable pressure to
provide it in a "seamless" way that would ultimately make it nearly
useless in terms of determining whether a *particular* human being
tends, or tends not, to deliberately send UBE to the rest of us.

(Even if you do test a network for AUTH, what does that really tell
you? That it won't let just *any* random spammer access it -- just
the paying ones? Etc.)

Make AUTH sufficiently easy for everyone to use, and everyone --
including those with 0wned machines -- will use it.
Post by James Craig Burley
Post by Brian Candler
With dynamic IP pools, source IP doesn't tell you *which* paying customer,
only that they are *a* paying customer.
Really? Why not?
I dial into a modem, and I get an IP address out of a pool. The SMTP server
knows that I'm connecting from IP address x.x.x.x, but not my customer ID.
It could only find that out from some out-of-band database which associates
IP addresses with customers in real time - for example, you could build some
infrastructure which takes RADIUS accounting packets and builds a real-time
mapping, and query that using some suitable protocol (LDAP perhaps). Even
then, you'd have to live with the fact that RADIUS accounting is UDP-based
and unreliable, and so occasionally you will associate x.x.x.x with the
wrong customer.
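A minimal sketch (in Python) of the kind of real-time mapping just described,
assuming a hypothetical feed of accounting Start/Stop events; none of these
names come from a real RADIUS library:

    # In-memory mapping of dynamic IP address -> customer ID, rebuilt from
    # accounting events fed in by some out-of-band collector.
    ip_to_customer = {}

    def on_accounting_event(event):
        # event is e.g. {'type': 'Start', 'ip': '192.0.2.7', 'customer': 'cust-1234'}
        if event['type'] == 'Start':
            ip_to_customer[event['ip']] = event['customer']
        elif event['type'] == 'Stop':
            # Only drop the binding if it still belongs to this customer;
            # lost or reordered accounting packets can otherwise leave the
            # address associated with the wrong user.
            if ip_to_customer.get(event['ip']) == event['customer']:
                del ip_to_customer[event['ip']]

    def customer_for(ip):
        return ip_to_customer.get(ip)   # None if unknown

    on_accounting_event({'type': 'Start', 'ip': '192.0.2.7', 'customer': 'cust-1234'})
    print(customer_for('192.0.2.7'))    # -> 'cust-1234'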
Amazing. I just assumed ISPs had better internal coherency in their
infrastructure -- that any (internal) server could do the equivalent
of an rDNS lookup on x.x.x.x and get a unique ID for a customer, even
though the outside world might see something much more opaque (like
the same IP address encoded and decorated).

(I'm still confused as to how IM2000 significantly changes any of
this, other than to perhaps *require* ISPs to do what you're saying
they would find annoyingly necessary to do for SMTP anyway.)
Post by James Craig Burley
Post by Brian Candler
Any sort of closed-user-group model doesn't have this problem. But if you
wish to be able to accept mail from strangers on the Internet, then it is a
major problem. Some FUSSP suggestions are basically ways of making it
difficult for strangers to introduce themselves to you - such as "hashcash".
Challenge-response systems are in that class too (but they won't work once
spammers build systems to respond automatically to the challenges)
Right. My ideal is that incoming email from total strangers would
*tend* to be immediately accepted, and my *proposals* revolve around
the notion that, prior to the human recipient actually reading a given
message, the sending stranger's MUA repeatedly requesting status (or,
in ecrulisting, resending the message) would, in the absence of other
information on the tendency of the sender to send Bulk email, tend to
*increase* the priority of that message as seen in the recipient's
list of pending messages.
OK, but a spammer can emulate that behaviour too.
Of course. A spammer can do so for any *particular* message she
sends.

Can a spammer do it reliably, consistently, for all the 100M or so
messages she sends out each day, without triggering auto-detect
mechanisms?

I'm not sure that'd be possible. And I think this approach even
handles the problem of massive numbers of 0wned machines on the net,
until their 0wnership is turned into making the machines *accept*
incoming UBE and prioritize it highly (at which point their users will
take their machines' 0wnership more seriously ;-).
Post by James Craig Burley
Since it would be mostly a matter of automation to infer whether it is
Bulk Email as well, that means that a sender of UBE becomes recognized
as such on a per-site basis.
In the limiting case, the spammer generates a new on-line identity for every
individual message sent out. It looks like lots of new individuals popping
up on the Internet, sending their first E-mail. How do you deal with that?
They have only so many IP addresses from which to send their *Bulk*
email, and send it to so many spamtraps, that the recipient's
MTAs and MUAs can fairly easily detect such activity and flag such
messages as "likely UBE", even before any content analysis is
performed (though it can be performed anyway).
Post by James Craig Burley
So, by sites individually choosing to deploy some spamtraps, and
perhaps using RBL and other technologies not to necessarily *block*
incoming messages but, rather, to simply *ignore* them and let the
senders repeatedly inquire as to their status...
...*receivers* of UBE turn their systems into "black holes" for mail
that actually *is* Unsolicited Bulk Email.
...
Post by James Craig Burley
If it's not Bulk, a receiving system can easily recognize that it
hasn't seen 1M messages coming in from that particular sender's IP
address in the past 48 hours, most or all of which have gone unread or
marked as Unsolicited by their readers.
Ah, but that's it then. You *do* need to rely on a sender's IP address as
part of their identity in order to detect "bulk", not just their domain name
or public key.
More precisely, one relies *primarily* on a sender's IP address as
part of their identity, and can often not bother looking up their
domain name, public key, reputation, etc., in order to determine that
the IP address in question is a source of UBE.

A receiving agent can accomplish this without relying on any third
party, *including* DNS!

I think that's pretty cool, and darn near bullet-proof, assuming the
concept as a whole makes sense (which it might not).
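A rough sketch of the purely local bulk detection described above -- counting
recent arrivals per source IP, with no DNS or third-party lookups; the window
matches the 48-hour example and the threshold is illustrative only:

    import time
    from collections import defaultdict, deque

    WINDOW = 48 * 3600        # 48-hour look-back, as in the example above
    BULK_THRESHOLD = 10000    # per-window count deemed "bulk" locally (illustrative)

    arrivals = defaultdict(deque)   # source IP -> timestamps of recent messages

    def note_message(src_ip, now=None):
        now = time.time() if now is None else now
        q = arrivals[src_ip]
        q.append(now)
        while q and q[0] < now - WINDOW:   # drop entries outside the window
            q.popleft()

    def looks_like_bulk(src_ip):
        return len(arrivals[src_ip]) >= BULK_THRESHOLD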
Which brings us back to shared SMTP relays, and the fact that legitimate and
bulk E-mail may spew forth from the same machine, which is really what makes
life difficult right now. So then it's just down to whether spammers 'track'
their outbound mail in a pattern which is similar to how a legitimate sender
would do so.
My approach makes it much more expensive, over time, to run SMTP
relays that accept UBE, since they find it much harder to foist
responsibility for that UBE on to other hosts, and in the meantime
even *legit* email tends to sit around on the server much longer while
downstream entities (including recipients) use finer-grained
techniques to assess the likelihood that any *given* email is UBE,
given that they can't easily make that decision based on the IP
address of the relay (as it is a "mixed blessing").

Indeed, I assume (some) spammers *will* simply use the same sort of
tracking algorithm that legit MUAs employ. That's part of the beauty
of this approach: they'll be investing much more, in the way of
resources, into each email they send. It's what IM2000 promises,
without the warts (delays reading messages) or expense (designing and
implementing a whole new email infrastructure) -- namely, that
*senders* bear more of the cost of sending UBE.
Post by James Craig Burley
So it "punishes" senders of UBE in that they either become (well,
remain) careless senders -- which recipients can more easily
distinguish from caring ones -- or they expend a great deal more
resources to send and care for each outgoing piece of their Bulk
email, while taking the very significant risk that Bulk email sent to
spamtraps and/or sites that otherwise autodetect Bulk sending will
result in such care accomplishing literally *nothing* in terms of
getting a message into a given end user's eyeballs.
The 'care' taken is, presumably, comparable to that needed to deal today
with getting a 4xx response and to resend later. I don't think it's hard;
it's what standard MTAs do all the time.
That's correct, modulo a few well-known cases (old versions of Lotus
Notes; Yahoo Groups; etc.), I gather.
If I were writing a bulk SMTP sender, I wouldn't use a standard MTA, which
is not optimised for keeping track of 10M recipients for one message.
Rather, I'd write my own which keeps one copy of the spam and tracks the
recipients in some sort of database (10M recipients can be kept in a hash
table in RAM very easily, given say 512MB of RAM). I'm almost inclined to do
so as a proof-of-concept.
Sure. As with IM2000, it isn't the huge number of message *contents*
that a UBE sender has to deal with, because, as you noted, a typical
sender of UBE has just a small number of *unique* messages to send to
a huge audience. (Accordingly, IM2000 actually penalizes senders of
large numbers of *unique* messages more so than it does most senders of
UBE.)

So, let the sender track *all* those messages, as you suggest. Now
they're expending much more, in the way of their resources, per
*important* outgoing message than the typical sender of legitimate
email, *both* of whom get to shovel the *content* across the wire the
first time they try.

(Note that this implies the legit sender will tend to spend much more
of her content-shoveling bandwidth on *desired* email than will the
sender of UBE. With IM2000, the sender of UBE isn't wasting their
outgoing bandwidth on email that's never read, while the sender of
legit bulk email has not only a huge outgoing bandwidth budget but
also the need to run a mail store. *Always* try to tilt the economic
wins in favor of the white hats, and the losses towards the black
hats.)

As counterintuitive as that sounds, it gives legitimate senders and
recipients of email a "leg up" on spammers, because they *already*
care about *their* messages, don't mind having to demonstrate that
care, and in fact, on today's Internet, would likely welcome a system
that would tend to *reduce* the costs of exchanging their email
(mainly because it basically eliminates bounces) while making it
easier to know just how a given message is progressing.

(In particular, a sufficiently smart MTA that handles all incoming and
outgoing messages for a site can decide to reject or simply ignore all
incoming *bounces* as long as it has no outstanding deliveries that it
is unable to track in the new way. Bounces don't vanish overnight;
they, along with joe jobs, become *gently* less expensive as they're
needed less often. Of course, "outstanding deliveries" would include
messages sent to traditional SMTP servers anytime in, say, the
preceding two weeks, and for which no conclusive bounces/DSNs had been
received.)
I don't think spammers are stupid; E-mail is their business, and they know
how it works very well. Maybe the few days' programming needed isn't worth
their while yet, which is why the greylisters are seeing some benefit, but
it only needs the next version of SpamSenderPro or whatever to have this
feature, and greylisting will die overnight.
Yes, that's long been predicted, and will probably come to pass. But
greylisting provides no benefits to those exchanging legit emails,
because it can only *delay* such exchanges. So the UBE senders are
still in a better position, since they don't really care whether their
UBE arrives in a mailbox immediately, or five, ten, or twenty minutes
later, as long as it is accepted before they are forced to move on,
whereas people exchanging legit emails *often* want instant delivery.

(Ecrulisting, as well as my other proposal, doesn't make things any
worse for ordinary users exchanging emails. It's the underlying
infrastructure that might have to work harder, or not -- maybe a lot
harder for the sender, a little harder and a bit more cleverly for the
receiver, etc.)

By "move on" I mean senders no longer (re)send email from that same IP
address, or at all. Equivalently, they no longer request tracking of
messages previously sent from that same IP address; they no longer
send message notifications from that address; they no longer provide a
message store from that address; etc. And maybe "from that address"
is beside the point; it's not clear the system needs to be designed
with a requirement that tracking or resending must come from the same
IP address.

Continuing here, I'm assuming that many users don't read their inbox
frequently enough to be sure the sender is still online when they
finally see the message. So the general question here is what happens
when the sender has moved on before the reader finally gets around to
invoking their MUA and seeing the in-box? I don't mean in specific
cases, I mean what happens in terms of the ability of senders of large
amounts of legit email versus senders of UBE?

Looking at these issues, with vanilla SMTP, once a sender moves on,
the fact that her outgoing email has been fully accepted makes it that
much more complicated for recipients to handle the fact that, despite
senders having "moved on" and/or been determined to send UBE, all the
*accepted* email must still be delivered, dropped, or bounced. After
all, responsibility has already been accepted. And bouncing email
after accepting responsibility for it during an SMTP conversation is a
big problem, as we all know.

With IM2000, if that email has not actually been read by a real person, it
cannot be read at all, as the sender has moved on. Once a few MUAs
detect that, they can notify the "collective" that *all* pending
notifications from that source are suspect. Here, responsibility has
not been accepted, but it's not clear whether false positives are
involved (the "moving on" might have been a legit laptop sending legit
email being disconnected or given a new IP address), so there's a
problem with the fact that the messages aren't actually available to
the reader (or a content-analysis engine).

With my system, until that email is *fully* delivered, the sender
retains responsibility for it. "Moving on" can therefore be
legitimately interpreted as no longer demonstrating interest in it.

But the message itself is (usually) available to the reader. If the
reader wants to accept responsibility for the message, or signal some
other error, that sort of "bounce" is communicated upstream to the MTA
(SMTP server) -- tagged to the unique ID for the message, whatever
*that* is, kept by the MTA -- so that it can respond accordingly to
the next tracking request (or, with ecrulisting, next delivery
attempt).

I believe this makes it much more feasible for a host to keep track of
"origins of interest" for incoming email, and thus notice when a given
origin, representing a sufficient number of emails whose contents
have been accepted but for which responsibility has not been accepted,
suddenly stops requesting tracking information (for ecrulisting, this
means it stops retrying delivery).

Such an event doesn't result in *loss* of email, since, in my system,
exchange errs on the side of duplicate delivery and duplication of
effort, but it can result in determining that the source was not
sufficiently interested in delivery -- or that it was, but lost
connectivity. And it can do this based on a sampling of the pending
messages.

In the extreme (but likely frequent, in today's world) case, if an MTA
simply decides to drop *all* pending email from a given source, it can
do so in a fashion that either lets the sender discover the fact by
issuing new tracking requests (which either fail or yield "discarded
by server") or resending (which the MTA might /dev/null but not
disclose accordingly). Such mail is never "lost", however, since it
was never truly *sent*, in that responsibility was never
*transferred*, in the first place.

My proposal also can perform content analysis of the sort that is
performed today -- without, as IM2000 tends to do, disclosing the
existence of "listening" recipients -- because the "default" is to
accept the full message without accepting responsibility for it.

This is definitely *not* the sort of "hard system" that most everyone
else proposes to deal with UBE. It's more of a "soft" response that
allows for failure, in that it has fewer points of failure.

But "soft" responses, or defenses, can be much more effective against
an opponent that throws a flurry of misguided punches and then gives
up, only to move on to attack someone else, or return later. Each
such punch goes further, because it isn't repelled, and sucks the
attacker into spending more time and energy throwing punches that, in
the end, accomplish little more than never having engaged in the
attack in the first place.

There are difficult aspects to my proposal. With a new, clean design,
the challenges are mostly coming up with a design that won't have
aggravating aspects down the road; with SMTP, it'd be annoying to have
to take into account problematic aspects of that infrastructure,
including a biggie, the difficulty of uniquely identifying any message
*transmission* regardless of the *path* that message takes to reach a
given MTA. (E.g. "Received:" headers largely don't make a difference
in message uniqueness, but the "older", or more-upstream, ones, if
they disagree about the origination point, might be enough to denote
two delivery attempts as pertaining to two distinct messages even if
the contents are otherwise identical.)

And, as with IM2000, my proposal requires lots more close
(fine-grained) interaction between MTAs and MUAs. Probably something
new in place of mbox's and maildirs, for example. I don't know for
sure.
--
James Craig Burley
Software Craftsperson
<http://www.jcb-sc.com>
Brian Candler
2005-05-06 08:02:23 UTC
Permalink
Post by James Craig Burley
Post by James Craig Burley
On the SMTP side, it's not all *that* hard to prioritize incoming
messages based on a recipient's perception of the trustability of
upstream relays, including whether SMTP AUTH was used. (But in cases
where use of AUTH isn't reliably reported in "Received:" headers,
either the relay in question always requires AUTH and thus acquires,
over time, lots of trust, or will have to find other ways to be sure
it avoids the problem of being a mixed source of UBE and desirable
email and, thus, untrusted.)
The trouble is you can't rely on the sender *telling* you that the message
was submitted via SMTP AUTH, and that therefore you should trust it. If so,
spammers would just make their mail systems make the same declaration.
With SMTP, you need to verify independently whether the mail server in
question is actually trustworthy - for example by having trusted agents on
that ISP's network performing the testing for you. Ick.
I'm saying, why care whether a third-party relay uses AUTH or any
other *particular* mechanism?
All you *really* care about is whether that relay is well-run enough
to avoid accepting more UBE, likely destined for your system, than you
feel comfortable with.
OK, sure. But then the question is - how do you *evaluate* that?

It's kind-of done now, but it works best for objective tests which can be
made from outside - e.g. "is this mail server an open relay?" If the test
fails you blacklist them, and if the IP owners wants to be removed, you can
easily re-test.

Things like "does this mail server owner manage their users properly?" is a
lot more woolly. Some blacklists will just blacklist any IP even if only one
or two pieces of spam are seen from it (regardless of whether millions of
non-spams are sent). But establishing "well run" involves either talking to
the users of that ISP, or going to that ISP and performing tests, or talking
to the ISP themselves, or otherwise gathering indirect evidence and weighing
up the probabilities.

The sort of things we're talking about are:

1. The mailserver operator knows the customer identity of every piece of
mail submitted into the system.

[I'm fine that it's either SMTP AUTH or some other mechanism, although the
most reliable one with dynamic IP will be SMTP AUTH]

2. This customer identity is carried forward *in the SMTP session* when
relaying to another host (e.g. as AUTH= parameter)

[That's to allow blacklists to operate on (relay,customerID) rather than
just relay address, in the case where the relay is trustworthy. Clearly it
can be tested whether this information is provided or not; whether or not
it's trustworthy information depends on whether the relay itself is
trustworthy. The same judgement would have to be made about IM2000 message
stores]

3. The mailserver limits customers to sending a small number of messages per
day, unless the customer has established themselves as a bona fide sender of
large amounts of mail. [Same here; both SMTP and IM2000 could be extended to
indicate the number of messages sent recently by the same customer]

4. The ISP takes measures to limit the number of free signups [same]

Looking at this list, I think you're right that most of these weigh
similarly for IM2000 as for SMTP.
Post by James Craig Burley
So, let the sender track *all* those messages, as you suggest. Now
they're expending much more, in the way of their resources, per
*important* outgoing message than the typical sender of legitimate
email, *both* of whom get to shovel the *content* across the wire the
first time they try.
I still don't much buy "expending _much_ more"

I thought a bit more about this last night. In order to bypass 4xx
greylisting systems, I only need to store one *bit* of information for each
recipient!

Let's say a spam sending program accepts as its input a gzipped stream of
E-mail addresses. As it unzips this stream, it tries to send mail to each
recipient (handing them out to a pool of parallel processes of course).

Now, all I need to do is allocate one bit of memory for each recipient, and
set it to zero, as I unpack. When a successful delivery occurs, or a
definite failure, I set it to one. There's no indexing or hashing required;
the N'th E-mail address in the gzip file is associated with the N'th bit in
my bitmap.

When I'm finished, I rewind my gzip file and repeat the operation, but only
send mail to each recipient which still has a zero bit. Rinse and repeat.

I could handle resends for 100M recipients with just under 12MB of RAM, and
a trivial modification to my existing spam-sending program.

I don't even have to send identical copies of mail, with identical envelope
senders. All I need is a 'callback function' which synthesises, for a
particular recipient, an envelope sender and message body for that
recipient. The only requirement is that the callback function gives the same
results for the same recipient, such that when I rewind and retry, each
recipient sees an identical replay. (Greylisting systems tend to greylist
[sender,IP] tuples rather than just [IP])

With a little more care I could arrange for my resends to occur at set
intervals which more accurately mimic a real mailserver (e.g. spend 15
minutes delivering messages, then rewind and attempt to re-deliver the
failed ones from the first block, before moving on). It would then become
very difficult to 'fingerprint' a spam sending program based on its retry
intervals.
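A sketch of the bookkeeping described above, with the actual SMTP delivery
stubbed out; the file name and delivery callback are hypothetical:

    import gzip

    def run_pass(addr_file, done_bits, try_delivery):
        # One pass over the gzipped address list; retry only unset bits.
        with gzip.open(addr_file, 'rt') as f:
            for n, addr in enumerate(f):
                byte, bit = divmod(n, 8)
                if done_bits[byte] & (1 << bit):
                    continue                    # already delivered or hard-failed
                if try_delivery(addr.strip()):  # True = success or permanent failure
                    done_bits[byte] |= (1 << bit)

    # done = bytearray((100 * 1000 * 1000 + 7) // 8)   # ~12 MB for 100M recipients
    # run_pass('targets.gz', done, try_delivery)       # rinse and repeat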
Post by James Craig Burley
With IM2000, if that email has not actually been read by a real person, it
cannot be read at all, as the sender has moved on. Once a few MUAs
detect that, they can notify the "collective" that *all* pending
notifications from that source are suspect. Here, responsibility has
not been accepted, but it's not clear whether false positives are
involved (the "moving on" might have been a legit laptop sending legit
email being disconnected or given a new IP address), so there's a
problem with the fact that the messages aren't actually available to
the reader (or a content-analysis engine).
Yes, this is a problem with IM2000, and it quite strongly discourages the
use of mailstores on dynamic IP addresses. It could be achieved with dynamic
DNS, but that's unlikely to be a reliable solution. IM2000 mailstores *need*
to be online whenever someone tries to fetch mail from them, in the same way
that currently a POP3 server *needs* to be online when someone tries to read
their mail from it.

But it would be hard to infer from the non-availability of a mailstore that
it's not legitimate. There are plenty of legitimate but poorly-run
mailservers and networks out there already.
Post by James Craig Burley
With my system, until that email is *fully* delivered, the sender
retains responsibility for it. "Moving on" can therefore be
legitimately interpreted as no longer demonstrating interest in it.
I think that's the same. If you care about your message, you'll deposit it
in a well-connected mail relay on a fixed IP address.
Post by James Craig Burley
In the extreme (but likely frequent, in today's world) case, if an MTA
simply decides to drop *all* pending email from a given source, it can
do so in a fashion that either lets the sender discover the fact by
issuing new tracking requests (which either fail or yield "discarded
by server") or resending (which the MTA might /dev/null but not
disclose accordingly). Such mail is never "lost", however, since it
was never truly *sent*, in that responsibility was never
*transferred*, in the first place.
That means the recipient can see status as:
( ) Not transferred
(*) Transferred but not accepted (yet)
( ) Rejected
( ) Accepted

Effectively in SMTP, you have 1 (your SMTP relay may send you a warning if
the message is still on its queue), 3 (a bounce), and 4 (successful
delivery). State 2 is new, and a bit more woolly. The message contents have
been transferred - but if I'm genuinely interested in delivery, should I
attempt to transfer them again? Or is it sufficient just to keep polling for
status given a tracking ID?
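As a sketch, the four states above as a sender might track them; the names
are illustrative, not taken from any existing protocol:

    from enum import Enum

    class DeliveryState(Enum):
        NOT_TRANSFERRED = 1   # contents never reached the recipient's MTA
        TRANSFERRED = 2       # contents held, responsibility not yet accepted
        REJECTED = 3          # recipient (or its MTA) refused responsibility
        ACCEPTED = 4          # responsibility transferred; sender can forget it

    def sender_has_work_left(state):
        # Only state 2 leaves the sender polling or resending.
        return state is DeliveryState.TRANSFERRED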

If the majority of mail ends up in this state, then under your proposal the
sender of the mail will need to keep polling, just to prove that they are
'still interested'. That will be a big workload increase for all legitimate
mail, but spammers can easily match the raised bar too. I might need to
allocate *two* bits in my memory bitmap to properly record the state of each
message! However, if I have to remember a tracking-ID which was allocated by
the recipient, then that's some more work. I might end up keeping a spool
file on disk containing the state of each message. At 100 bytes each, my
100M messages might require 10GB of disk space to track (but only if they
all end up in this state; not-yet-transferred, rejected and accepted
messages don't require it)

And if I send my 100M messages using a network of 1000 0wned machines, then
the resources required per machine are cut down by a factor of 1000.

That does however suggest a sort of super-greylisting: instead of sending
back a tracking ID, if the recipient is suspect I send them back a huge
cookie (say 4KB of random data) and ask them to retry at least 2 hours
later. When they retry, they must return the same cookie. I don't waste
storage, because I just keep a cryptographic hash of the data, but the
sender is required to hold on to 4KB of crud for that period.
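A sketch of that cookie exchange from the recipient's side, assuming SHA-256
as the hash and the two-hour delay mentioned above (both illustrative):

    import os, time, hashlib

    pending = {}   # sha256(cookie) -> earliest acceptable retry time

    def issue_cookie(delay=2 * 3600):
        cookie = os.urandom(4096)              # 4KB the *sender* must keep
        pending[hashlib.sha256(cookie).hexdigest()] = time.time() + delay
        return cookie

    def check_cookie(cookie):
        key = hashlib.sha256(cookie).hexdigest()
        ready_at = pending.get(key)
        if ready_at is None or time.time() < ready_at:
            return False                       # unknown cookie, or retried too soon
        del pending[key]
        return True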

It's still not expensive enough though - 4KB times 1M recipients is only
4GB - and yet the extra sending and receiving of 4KB of data is costly to
the recipient as well as the sender.

"Hash cash" type challenges would be more effective; I send you a
cryptographic puzzle of known complexity, and won't deliver the message
until you send back the answer. This has the advantage of not requiring the
message to be delayed for a fixed amount of time; it is only delayed
depending on how quickly you solve the puzzle.
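A sketch of such a puzzle, assuming a simple "find a nonce so that
SHA-256(challenge + nonce) has at least N leading zero bits" scheme: the
sender's cost grows roughly as 2^N, while verification is a single hash.

    import hashlib

    def leading_zero_bits(digest):
        bits = 0
        for byte in digest:
            if byte == 0:
                bits += 8
                continue
            while not (byte & 0x80):
                bits += 1
                byte <<= 1
            break
        return bits

    def solve(challenge, difficulty):
        # The sender grinds through nonces until the difficulty target is met.
        nonce = 0
        while True:
            d = hashlib.sha256(challenge + str(nonce).encode()).digest()
            if leading_zero_bits(d) >= difficulty:
                return nonce
            nonce += 1

    def verify(challenge, nonce, difficulty):
        # The recipient checks the answer with one hash.
        d = hashlib.sha256(challenge + str(nonce).encode()).digest()
        return leading_zero_bits(d) >= difficulty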

Hash-cash is currently difficult to deploy because either (a) all *clients*
need to implement it, or (b) all *mail relays* implement it, which is very
difficult because of the [intentionally] very high computation overhead when
sending large amounts of mail.

Also, to avoid expending hash-cash work except where necessary, you need a
reliable way to whitelist people you've communicated with before. In my
opinion, envelope-sender is not really good enough in the long term for
this, since it's trivially forgeable. Spammers *can* infer information about
who you communicate with, especially for people who use mailing lists.

But something like DomainKeys plus Hash Cash might work.
Post by James Craig Burley
There are difficult aspects to my proposal. With a new, clean design,
the challenges are mostly coming up with a design that won't have
aggravating aspects down the road; with SMTP, it'd be annoying to have
to take into account problematic aspects of that infrastructure,
including a biggie, the difficulty of uniquely identifying any message
*transmission* regardless of the *path* that message takes to reach a
given MTA.
You can get the recipient to allocate a unique ID, which is unique locally
to them, such that [recipient,ID] is globally unique. When the message is
relayed you'll get another pair; the sender then has to associate the
recipient's ID with their own ID, potentially forming a chain. This would be
what happened if you received a packet by FedEx and resent it using DHL :-)

Or you can just get senders to allocate IDs, so that [sender,ID] is a fixed
unique key. That's what Message-ID: is supposed to do, although guaranteeing
uniqueness is hard. <Message-ID:, submitter-IP-address, submitter-identity>
should be enough, and if kept in this form, would give useful information
for evaluating the source too.
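A sketch of that second approach -- deriving a fixed, globally unique
transmission key from the triple, so duplicates can be spotted cheaply
wherever the message turns up; the function names are just illustrative:

    import hashlib

    def transmission_key(message_id, submitter_ip, submitter_identity):
        raw = '\0'.join([message_id, submitter_ip, submitter_identity])
        return hashlib.sha256(raw.encode('utf-8')).hexdigest()

    seen = set()

    def is_duplicate(message_id, submitter_ip, submitter_identity):
        key = transmission_key(message_id, submitter_ip, submitter_identity)
        if key in seen:
            return True
        seen.add(key)
        return False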

Regards,

Brian.
James Craig Burley
2005-05-06 19:20:54 UTC
Permalink
Post by Brian Candler
Post by James Craig Burley
Post by James Craig Burley
On the SMTP side, it's not all *that* hard to prioritize incoming
messages based on a recipient's perception of the trustability of
upstream relays, including whether SMTP AUTH was used. (But in cases
where use of AUTH isn't reliably reported in "Received:" headers,
either the relay in question always requires AUTH and thus acquires,
over time, lots of trust, or will have to find other ways to be sure
it avoids the problem of being a mixed source of UBE and desirable
email and, thus, untrusted.)
The trouble is you can't rely on the sender *telling* you that the message
was submitted via SMTP AUTH, and that therefore you should trust it. If so,
spammers would just make their mail systems make the same declaration.
With SMTP, you need to verify independently whether the mail server in
question is actually trustworthy - for example by having trusted agents on
that ISP's network performing the testing for you. Ick.
I'm saying, why care whether a third-party relay uses AUTH or any
other *particular* mechanism?
All you *really* care about is whether that relay is well-run enough
to avoid accepting more UBE, likely destined for your system, than you
feel comfortable with.
OK, sure. But then the question is - how do you *evaluate* that?
That's the central question: how do you evaluate whether a third party
(a relay or mailstore) is *really* doing a good job of preventing UBE
from being intermixed with legit email?

I don't believe there's much point in doing it by analyzing the
specific tactics used or choices made by that third party, because
that amounts to micromanaging its operation.
Post by Brian Candler
It's kind-of done now, but it works best for objective tests which can be
made from outside - e.g. "is this mail server an open relay?" If the test
fails you blacklist them, and if the IP owners wants to be removed, you can
easily re-test.
Open relay tests had their short-term utility in the past, but are,
conceptually, of little use. It doesn't *really* matter whether a
third party is an "open relay" in the technical sense; what matters is
whether it openly relays a substantial % (versus all email it relays)
of UBE.

Pro-actively searching for, and then blacklisting, open relays wasn't,
IMO, fruitful because they were open relays -- it was fruitful because
the *fact* that they were open relays strongly suggested that their
admins were lax about securing their systems against abuse by internal
or external entities, so blocking an open relay served to get the
admin's attention to a growing problem.
From a reliability point of view, the fact that an important email can
be freely and quickly routed through one or more immediately available
(and thus perhaps "open") relays is a *plus*.

And with a better-designed email transport protocol, a relay could be
truly open and yet still have plenty of incentive to be sure it isn't
exploited to relay UBE. (For one thing, more and more anti-UBE tactics,
such as blocking based on source IP, will be able to look "through" the
IP of the connecting client, which might pertain to a reasonably trusted
open relay, to the IP of its upstream injector in the topmost pertinent
"Received:" header, and block the message based on whether *that* IP is
in the blacklist.)
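A sketch of what "looking through" a trusted relay might amount to -- walking
the "Received:" headers from newest to oldest, skipping relays we trust, and
checking the first untrusted source IP against a local blacklist; real
"Received:" parsing is far messier than this regex suggests:

    import re
    from email import message_from_string

    IP_RE = re.compile(r'\[(\d{1,3}(?:\.\d{1,3}){3})\]')

    def upstream_injector_ip(raw_message, trusted_ips):
        msg = message_from_string(raw_message)
        for received in msg.get_all('Received', []):   # newest first
            m = IP_RE.search(received)
            if m and m.group(1) not in trusted_ips:
                return m.group(1)
        return None

    def blocked(raw_message, trusted_ips, blacklist):
        ip = upstream_injector_ip(raw_message, trusted_ips)
        return ip is not None and ip in blacklist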
Post by Brian Candler
Things like "does this mail server owner manage their users properly?" is a
lot more woolly. Some blacklists will just blacklist any IP even if only one
or two pieces of spam are seen from it (regardless of whether millions of
non-spams are sent). But establishing "well run" involves either talking to
the users of that ISP, or going to that ISP and performing tests, or talking
to the ISP themselves, or otherwise gathering indirect evidence and weighing
up the probabilities.
Right. So I don't see SMTP AUTH as an especially persuasive solution,
nor its slow uptake as a particular problem.
Post by Brian Candler
1. The mailserver operator knows the customer identity of every piece of
mail submitted into the system.
[...]
Post by Brian Candler
2. This customer identity is carried forward *in the SMTP session* when
relaying to another host (e.g. as AUTH= parameter)
[...]
Post by Brian Candler
3. The mailserver limits customers to sending a small number of messages per
day, unless the customer has established themselves as a bona fide sender of
large amounts of mail. [Same here; both SMTP and IM2000 could be extended to
indicate the number of messages sent recently by the same customer]
4. The ISP takes measures to limit the number of free signups [same]
Looking at this list, I think you're right that most of these weigh
similarly for IM2000 as for SMTP.
Right. More generically, the issue boils down to, do you trust an
immediately upstream (but external) relay/mailstore to give you
reliable information (even if rendered opaque via a hash or unique ID)
identifying the entity immediately further upstream?

If you do, you can be more generous about accepting incoming email
from that entity even if you might believe some of it could be UBE, as
long as *you* can learn to distinguish among *its* upstream sources
with regard to whether *they* send UBE and/or you can, via some
reliable means, notify the immediate upstream source that its own
upstream sources are sending UBE (or even ham, since positive feedback
is helpful too).
Post by Brian Candler
Post by James Craig Burley
So, let the sender track *all* those messages, as you suggest. Now
they're expending much more, in the way of their resources, per
*important* outgoing message than the typical sender of legitimate
email, *both* of whom get to shovel the *content* across the wire the
first time they try.
I still don't much buy "expending _much_ more"
I thought a bit more about this last night. In order to bypass 4xx
greylisting systems, I only need to store one *bit* of information for each
recipient!
Yup. But storage isn't the issue; it's cheap as beans.
Post by Brian Candler
With a little more care I could arrange for my resends to occur at set
intervals which more accurately mimic a real mailserver (e.g. spend 15
minutes delivering messages, then rewind and attempt to re-deliver the
failed ones from the first block, before moving on). It would then become
very difficult to 'fingerprint' a spam sending program based on its retry
intervals.
*If* you assume that the typical spammer will have sufficient
incentive to write and/or buy specialized software to inject huge
numbers of emails into the system, then you're effectively assuming
that sending spam *can* be cheaper than sending ham:

- It's somewhat cheaper to send a message and not track delivery at
all; but recipients can easily notice that.

- It's a bit cheaper to trivially track delivery than to do so in
the usual fashion; but recipients might notice that pattern as
well.

I don't disagree that *some* spammers would use such tactics.

I'm saying, I'm assuming spammers can, ultimately, easily afford to
use whatever off-the-shelf software implements my proposal. In fact,
it's inherent in my proposal that an implementation *will* offer a
"blast-it-once-and-forget-it" mode to cater to senders of SBE
(Solicited Bulk Email) who are sending messages that, if they aren't
accepted and read after one delivery, no biggie (maybe they are simply
notifications of temporary conditions). So spammers will have a range
of tools to use, from the beginning, without having to pay $$ for them
up front. (A controversial idea, to be sure, but, again, it's
inherent, and it helps me avoid the dangers associated with assuming
anything akin to Security Through Obscurity in developing my
proposal.)

That is, they can easily afford, simply because of economics, to buy
big-enough iron to send, say, 1G emails a day, without needing special
spam software, because the vanilla email software works just fine and,
naturally, from the point of view of the recipient, does not appear to
exhibit any suspicious behavior on a per-piece basis.

What I think spammers *can't* afford is to keep this software running
for days or even weeks as it continues to track the status of
deliveries, while the authorities are hunting them down as a result of
so many of those 1G emails making it into spamtraps, and while so many
admins have already updated their *local* IP blacklists so that
messages coming from them are *never* confirmed as having reached the
recipients' eyes.

So, spammers will still be in a hit-and-run business, and *if* they
use specialized software to do el-cheapo delivery, that'll be easy
enough for ordinary receiving software to notice.

But after they've hit and run, the fact that they "ran" will be almost
as evident as it would be with IM2000, except they'll have already
made copies of outgoing messages for recipients to review, to submit
to content analysis, to submit to authorities, etc.

If spammers can flourish despite all that, the *only* way I can see
that IM2000 would stop them is if it *required* the sender provide a
domain name for callback *and* if domain names were locked down even
more tightly than the IPv4 address space.

The former makes IM2000 less reliable as a mail protocol than either
SMTP or my proposal. The latter ain't gonna happen; in fact, it is
more likely that the IPv6 address space, or something similarly large
and intractable, will ultimately replace the IPv4 space (or,
equivalently, that the IPv4 space will become more complicated,
insofar as more and more addresses will be NATs and thus "mixed
blessings").
Post by Brian Candler
Post by James Craig Burley
With IM2000, if that email has not actually been read by a real person, it
cannot be read at all, as the sender has moved on. Once a few MUAs
detect that, they can notify the "collective" that *all* pending
notifications from that source are suspect. Here, responsibility has
not been accepted, but it's not clear whether false positives are
involved (the "moving on" might have been a legit laptop sending legit
email being disconnected or given a new IP address), so there's a
problem with the fact that the messages aren't actually available to
the reader (or a content-analysis engine).
Yes, this is a problem with IM2000, and it quite strongly discourages the
use of mailstores on dynamic IP addresses. It could be achieved with dynamic
DNS, but that's unlikely to be a reliable solution. IM2000 mailstores *need*
to be online whenever someone tries to fetch mail from them, in the same way
that currently a POP3 server *needs* to be online when someone tries to read
their mail from it.
My IM2000 mailstore would be on a dynIP address, if it existed today,
since that's how my system presently is hosted. I've had three, maybe
four, distinct IP addresses over the past two or more years. So it
isn't a *big* additional point of failure, but you're right that it
*is* one.

In "my" world, my dynIP host is a fine candidate for sending outgoing
email to other hosts. Even AOL might someday accept it directly,
under my proposal, since it'll be easier to distinguish my hosts'
"mail-care behavior" from those of 0wned machines, which necessarily
run specialized software that tries to stay under the radar of the
machines' true owners.
Post by Brian Candler
But it would be hard to infer from the non-availability of a mailstore that
it's not legitimate. There are plenty of legitimate but poorly-run
mailservers and networks out there already.
True, and that statement is, IMO, the kiss of death for IM2000, since
non-availability of a mailstore means non-readability of mail!
Post by Brian Candler
Post by James Craig Burley
With my system, until that email is *fully* delivered, the sender
retains responsibility for it. "Moving on" can therefore be
legitimately interpreted as no longer demonstrating interest in it.
I think that's the same. If you care about your message, you'll deposit it
in a well-connected mail relay on a fixed IP address.
Not necessarily. *I* will try to inject the messages I care most
about directly to the SMTP server listed in the MX record for the
receiving domain.

Why? Because I trust *my* system setup more than I trust it *plus* my
provider's SMTP relay, because I can more easily track the progress of
outgoing SMTP sessions, and because I can therefore more quickly know
when to try another method to get my message to its destination.

Remember, a third party necessarily knows *less* about the "profile"
of a message's importance. So the frequency of its tracking requests,
the choices it makes regarding how quickly to try alternate routes,
and so on, is unlikely to agree with the choices *I* might make for a
given outgoing message.

Yes, a protocol could allow such a profile to be relayed along with
the message itself, in the hopes that the third party (your
"well-connected mail relay") will honor it. But it might not be
willing or able to do that, even if all the protocol goo is correctly
designed and implemented, which is not a trivial task in the first
place. (Essentially, such a protocol ends up being little different
from "here's a bash script to have crond run every N minutes
until...".)

Whereas, as long as the outgoing message is being delivered and
tracked *locally*, there's little difficulty in an MUA providing
buttons such as "Track", "Redeliver", "Deliver Via Alternate Route",
and so on, so its user can *personally* care for an outgoing message,
if the default tracking policy for that MUA isn't suitable for that
particular message.

(That is, ultimately, what *I* want as an end user sending email. I
want buttons and displays so I can directly track and muck with my
*personal* outgoing messages -- I'm not speaking as a sysadmin here --
so I want my *MUA* injecting messages directly to the SMTP server
listed as an MX, even if it simultaneously injects them to a local or
remote, e.g. upstream, MTA, should direct delivery fail before I
disconnect my laptop. And I want a similar degree of control over
incoming messages as a reader -- I want to be able to decide whether
to accept or reject messages and on what basis, and "bounce" them or
whatever my upstream MTA can do on its own.)
Post by Brian Candler
Post by James Craig Burley
In the extreme (but likely frequent, in today's world) case, if an MTA
simply decides to drop *all* pending email from a given source, it can
do so in a fashion that either lets the sender discover the fact by
issuing new tracking requests (which either fail or yield "discarded
by server") or resending (which the MTA might /dev/null but not
disclose accordingly). Such mail is never "lost", however, since it
was never truly *sent*, in that responsibility was never
*transferred*, in the first place.
( ) Not transferred
(*) Transferred but not accepted (yet)
( ) Rejected
( ) Accepted
Yes.
Post by Brian Candler
Effectively in SMTP, you have 1 (your SMTP relay may send you a warning if
the message is still on its queue), 3 (a bounce), and 4 (successful
delivery). State 2 is new, and a bit more woolly. The message contents have
been transferred - but if I'm genuinely interested in delivery, should I
attempt to transfer them again? Or is it sufficient just to keep polling for
status given a tracking ID?
The former in current SMTP, the latter in a new protocol (SMTP++ or
something entirely new), for as long as the sender is satisfied with
the progress of the delivery overall.

Since the system relies more on dup detection and elimination (which
becomes nearly trivial under an entirely new protocol), the sender
*could* decide to attempt to transfer the message again, but via a
different path (say, to a backup MX, even to a different recipient,
such as to a user's home address instead of a work address, in case
they are working at home).

While it's tempting to frown on multiple deliveries using different
paths, anyone interested in *reliable* and *immediate* delivery of
emails should welcome a system that *allows* that sort of thing from
the get-go, and leaves it to market forces to decide under what
circumstances it's used.
Post by Brian Candler
If the majority of mail ends up in this state, then under your proposal the
sender of the mail will need to keep polling, just to prove that they are
'still interested'.
No. That might *indicate* such interest. But if the sender isn't
genuinely interested in the email to *that* degree, there's no need to
track the delivery, is there? A quick answer to someone's question on
an email list is really more of interest to the *recipient* than the
sender, so the sender doesn't really care if it reaches the
destination.

And, to be clear: with my ecrulisting proposal, only ecrulisted email
ends up in this state, but with my *full* proposal, email will
*usually* end up in this state, unless it's coming from whitelisted
senders, which I'm trying to not rely on to assume a workable system
(though it's trivial to add).
Post by Brian Candler
That will be a big workload increase for all legitimate
mail
I've already addressed that in detail. In short, it is *hardly* a big
workload increase, if compared to coping with the uncertainty of not
knowing whether delivered email reached its destination, with bounces
coming back, and especially with joe-job bounces.

Also, my new proposal can be implemented less expensively by making it
behave a lot more like vanilla SMTP, if desired, on a per-site basis.
(So, a message is deemed "accepted" once it reaches the equivalent of
a POP3 or IMAP box, even though the MUA or user who fetches it might
later decide it's spam.)

But I'm trying to think in terms of the messaging facility we
*ideally* want to use, as end users, for the next several decades, as
millions more new users come online, as existing users improve their
behavior, and so on.
Post by Brian Candler
but spammers can easily match the raised bar too.
Addressed previously. True for all designs; *inherently* least true
for my proposal, since spammers have to send *all* message contents at
least once (despite the vast majority of recipients being unlikely to
ever read them), as with SMTP but not IM2000, and would have to then
track *all* deliveries in a typical fashion to avoid recipients
detecting a lack of interest in incoming email from a previously unknown
source, which IM2000 provides in its own way, but not SMTP.

In short, I believe my proposal actually *lowers* the bar for senders
and recipients of legitimate email compared to today's SMTP (vanilla
SMTP + all the $#@!% that's going on to try to stop UBE), and spammers
have to work harder to match that *lowered* bar -- and they won't be
able to do so as economically as they've been with SMTP and as they
probably would with IM2000 (which raises the bar for *everyone*, due
mainly to requiring third-party mailstores, besides requiring all-new
software).
Post by Brian Candler
I might need to
allocate *two* bits in my memory bitmap to properly record the state of each
message! However, if I have to remember a tracking-ID which was allocated by
the recipient, then that's some more work. I might end up keeping a spool
file on disk containing the state of each message. At 100 bytes each, my
100M messages might require 10GB of disk space to track (but only if they
all end up in this state; not-yet-transferred, rejected and accepted
messages don't require it)
Again, this might all be true, but I'm not basing my proposal on the
assumption that spammers can't afford the iron and connectivity to
send *their* payloads via my proposal.
Post by Brian Candler
That does however suggest a sort of super-greylisting: instead of sending
back a tracking ID, if the recipient is suspect I send them back a huge
cookie (say 4KB of random data) and ask them to retry at least 2 hours
later. When they retry, they must return the same cookie. I don't waste
storage, because I just keep a cryptographic hash of the data, but the
sender is required to hold on to 4KB of crud for that period.
Yes, recipients can play all sorts of fun games like that, especially
with a new protocol, which I have partially designed in my head with
an eye towards just that sort of fun. ;-)

For one thing, legitimate responses to a tracking request might
include:

- No response (always an option for recipient)

- Not yet read (an obvious one)

- Read (another obvious one)

- Responsibility accepted by recipient (obvious)

- Lost contents, please resend (of course, but will sender resend?)

- Never heard of it (maybe never sent?)

- Send future tracking requests to [set of hosts...]

- Please resend message through one or more of [set of hosts...]
(like a per-delivery, or even per-tracking, MX)

- Send SHA1 of message (to double-check local contents are correct)
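
Sketched as a simple response table (the names and values are illustrative,
not a proposed wire format; "no response" is simply the recipient staying
silent, so it needs no code here):

    from enum import Enum

    class TrackingResponse(Enum):
        NOT_YET_READ = 'not-yet-read'
        READ = 'read'
        RESPONSIBILITY_ACCEPTED = 'accepted'
        LOST_PLEASE_RESEND = 'lost-resend'
        NEVER_HEARD_OF_IT = 'unknown'
        REDIRECT_TRACKING = 'track-elsewhere'   # carries a set of hosts
        RESEND_VIA = 'resend-via'               # carries a set of hosts
        SEND_SHA1 = 'send-sha1'                 # double-check local contents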

With responses such as these, which senders can legitimately ignore or
honor with less worry over complications of the sort we got into
discussing IM2000, recipients -- especially multiple independent, but
cooperating, recipients -- can play all sorts of games with senders to
make sure they *really* want to get their messages through.

Now, off-the-shelf implementations would probably implement senders so
they automatically honored such responses to all such tracking
requests, and *possibly* in a way to allow some degree of local
(sender) control over the extent to which such responses are indeed
honored (e.g. "don't bother resending this message, I don't want to
chew up my outgoing bandwidth with these pics").

So spammers would have to either tune these implementations so they
kept their resource utilization to a minimum as recipients began
figuring out they could try making the spammers' systems dance for
them, in which case they'd risk more recipients quickly figuring out
that they *are* sending spam (for one thing, they don't dance!)...

...or they'd run them full-blown, in which case they'd give recipient
systems much more control over their systems' resource utilization
than SMTP presently allows. (It'd be in IM2000 territory.)

And this is all without letting *senders* control *recipients* any
more than they do with SMTP (and even a bit less) or as much as they
might be able to do with IM2000 (since mailstores, including those run
by spammers, can twiddle their thumbs and the like in response to
requests to the stores).

That is, *none* of the responses listed above require the recipient to
keep much of anything in the way of state around for the message. No
open TCP connection is required, for example, until a sender responds
to, say, a "send SHA1 of message" response with a new connection
saying "here's the SHA1 of message #dshjk453dsk, which you asked for".

However, the design could certainly allow for *all* such exchanges to
take place within a single TCP session, and I'd probably design it
that way, as well as to allow a hit-and-run-style delivery where the
entire message, including a *sender-provided* ID, is thrown down a
newly opened connection and then the connection is closed by either
party. (That would be super-fast delivery indeed, AFAIK. Could even
be done via a UDP packet in many cases. Interested senders would
presumably resend the same message once or twice more, and/or send a
tracking request, though not necessarily.)
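A sketch of that hit-and-run delivery -- the whole message plus a
sender-chosen ID thrown at the recipient in one UDP datagram; the port
number and the framing are invented for illustration, and a real protocol
would have to deal with messages larger than a single datagram:

    import socket

    def fire_and_forget(host, sender_id, body, port=11625):
        datagram = sender_id.encode() + b'\0' + body
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            s.sendto(datagram, (host, port))
        finally:
            s.close()

    # fire_and_forget('mx.example.org', 'dshjk453dsk', b'Subject: hi\r\n\r\nhello')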
Post by Brian Candler
It's still not expensive enough though - 4KB times 1M recipients is only
4GB - and yet the extra sending and receiving of 4KB of data is costly to
the recipient as well as the sender.
Ah, but the recipient can decide they have excess bandwidth and CPU,
perhaps even to pretend they have many more users than they do. *I*
certainly do that, basically without trying -- you should see my email
logs -- yet my old Pentium II 233MHz is keeping up with it without
breaking a sweat.

We do this with SMTP and would presumably do it with IM2000 as well.

But with my proposal, receiving email is cheapest of all, so it's more
"rewarding" for sites to have spam traps, to have those traps behave a
lot like real users, etc., in terms of the low cost of doing this
*versus* the higher cost to spammers.
Post by Brian Candler
Post by James Craig Burley
There are difficult aspects to my proposal. With a new, clean design,
the challenges are mostly coming up with a design that won't have
aggravating aspects down the road; with SMTP, it'd be annoying to have
to take into account problematic aspects of that infrastructure,
including a biggie, the difficulty of uniquely identifying any message
*transmission* regardless of the *path* that message takes to reach a
given MTA.
You can get the recipient to allocate a unique ID, which is unique locally
to them, such that [recipient,ID] is globally unique. When the message is
relayed you'll get another pair; the sender then has to associate the
recipient's ID with their own ID, potentially forming a chain. This would be
what happened if you received a packet by FedEx and resent it using DHL :-)
Or you can just get senders to allocate IDs, so that [sender,ID] is a fixed
unique key. That's what Message-ID: is supposed to do, although guaranteeing
uniqueness is hard. <Message-ID:, submitter-IP-address, submitter-identity>
should be enough, and if kept in this form, would give useful information
for evaluating the source too.
Right. Again, that's in the details.

I think I should restate my essential *economic* point here, which was
buried in my previous email.

In my view, the right approach is *not* just making it inherently more
expensive to send UBE. If that was the right approach, then the
solutions would be obvious, and in fact are often put forward: require
AUTH, require $$ to send email, require using one of a few
well-trusted relays/stores, etc.

In fact, I don't really care whether it becomes *super-cheap* to send
UBE. As far as I can tell, it's *already* so cheap that it'd be
nearly impossible to *practically* raise the cost by any factor to
make a useful difference.

The economic reality is that senders of UBE depend on having lots of
recipients willing, eager, and *able to afford* to buy products
advertised by UBE. So, while senders want sending UBE to be as cheap
as possible for *them*, they also have an interest in making sure that
*receiving* email is cheap enough for their target market, and in not
themselves being drowned out by others offering apparently identical
services. (This almost perversely argues *for* IM2000, in that
spammers won't find it a particularly rewarding environment -- but
only because few *legitimate* users will find it so. ;-)

Accordingly, they really don't care all that much about the
"cheapness" of sending UBE. What they do, as a *group*, is simply
select the least expensive way they believe will reach the widest
possible target audience of *buyers*. If that's running some
specialized software via 0wned machines, they'll do that; if it's
running ads in newspapers, they'll do that.

And, naturally, if sending UBE is free, then sending non-UBE is likely
to be free as well. It's important, for spammers, to be sure their
messages are *intermixed* with "desirable" messages, just as
advertisers prefer putting ads in newspapers over distributing
standalone flyers, putting commercials in popular TV shows over
running infomercials on obscure cable-TV channels, and so on, all else
being equal.

(The email equivalent of this might be those advertisements I
mentioned earlier, included in headers for emails sent via "free"
providers. If we move towards a model of having only a few
well-trusted relays or mail stores, there'll be economic pressure for
those stores to offer free accounts that result in such intermixed
advertising, as in "Received: from brian by ... brought-to-you-by
Pepsi <http://www.pepsi.com>". I don't want this future. ;-)

So, suppose sending UBE is essentially "free", in terms of iron and
connectivity costs. The real expense comes with running the business,
owning *useful* domain names or other points of presence to sell the
goods being advertised, and avoiding legal trouble. (For legitimate
businesses, the latter is not necessarily less of an expense, sadly;
but that's another issue. The main thing is, senders of legit email
don't have to worry about having to constantly relocate in IP space as
well as meatspace.)

In a free-to-send-UBE world, spammers can send, say, 1T (1K*1G) pieces
of UBE each and every hour if they like.

What's the *real* problem with that? Two biggies:

1. The infrastructure handling *receiving* emails might be unable to
cope with the load, so legit email wouldn't get through.

2. Those reading email have trouble finding the few emails they
really care about among the huge amounts of UBE they (mostly)
don't, so legit email isn't always read.

Focusing on #1, I say let's think in terms of a system that makes
*receiving* email as cheap as possible, at the (possible) expense of
the sender.

That changes the balance of economics back to being in favor of
recipients, but it also potentially changes it in favor of senders of
legitimate email.

Why? Because senders of legitimate email already have a demonstrable
interest in a recipient reading their email (in most cases), otherwise
they wouldn't send it or consider it "legitimate". So their
expenditure-to-desire ratio ends up being smaller, in relation to
spammers, with my proposal than with SMTP or IM2000.

Now, the easiest and cheapest way I can think of to receive an email
is to accept an *incoming* connection delivering it, after which I can
take my own sweet time deciding what to do with it.

I don't want to have to look up any return address to provide status
information (as in a bounce or DSN). I don't want to have only a
short window of time in which to respond definitively or risk
duplicates or redelivery (as with the response to the SMTP DATA
phase). I don't want to have to respond *at all* to the delivery or
to the sender's subsequent tracking requests.

How can it get any cheaper than that? Yet, even if I don't do any of
the things above, I can still receive a complete message, display it
for an end user via an MUA, and the user can act on it as she sees
fit.
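
To make that concrete, here is a rough sketch of such a bare-bones
receiver (purely illustrative, not a specification; the port, spool
path, and framing are invented):

    import os, socketserver, time

    SPOOL_DIR = "/tmp/incoming"        # hypothetical spool directory

    class CheapReceiver(socketserver.StreamRequestHandler):
        def handle(self):
            data = self.rfile.read()   # read the whole payload until EOF
            name = "%s.%d.msg" % (time.time_ns(), os.getpid())
            with open(os.path.join(SPOOL_DIR, name), "wb") as f:
                f.write(data)          # spool now; classify and deliver later
            # No bounce, no DSN, no lookup of any return address: the
            # recipient *may* answer, but nothing here obliges it to.

    if __name__ == "__main__":
        os.makedirs(SPOOL_DIR, exist_ok=True)
        socketserver.TCPServer(("", 2525), CheapReceiver).serve_forever()

Everything interesting (filtering, prioritization, responding to
tracking requests) happens offline, at the recipient's leisure.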

Now, SMTP doesn't quite fit the bill, but comes reasonably close.
IM2000 doesn't fit it all that well either, but at least I can choose
to not even receive a particular message if I don't want (though
there's some question whether this is practically different from
SMTP), the downside being that, if I *do* want it, *I* have to go to
the trouble of looking up the "mail store" in my phone book,
connecting to it, etc.

My proposal, especially as a brand-new protocol, would make
*receiving* email fundamentally as cheap and easy as possible,
assuming a deterministic universe (no mind-reading allowed ;-).

On top of that *foundation*, all the stuff we presently add, or might
add with IM2000 -- possibly even the option of a recipient of a
message notification saying "don't send me the message, tell me where
I can retrieve it [a la an IM2000 message store]" -- can be added, as
desired.

But, the foundation is now tilted as much as possible in favor of the
"recipient class", without in any way deliberately hurting the
"sending class" (although recipients might choose to make senders jump
through various sorts of hoops, of course, as they have more control
under this system than under SMTP).

Now, here's the *crucial* advantage of the economics of my proposal.

If it's cheap as possible to be a member of a recipient class, and if
it remains reasonably cheap to be a member of a *sending* class, then
market forces will continue to favor making email an inexpensive, yet
reliable, form of nearly-immediate communication...

...but the most expensive "membership" in such an environment would be
that of the *bulk* sending class.

This is the central insight that led djb to propose IM2000 (and others
to propose nearly-identical systems) -- that bulk sending needs to be
inherently more expensive, *relative* to sending and receiving
non-bulk email -- but my proposal makes the relationship between
sending UBE and sending or receiving *all* other kinds of email even
*more* tilted against the sender of UBE.

As the expense of ordinary exchange of email drops across the board,
that frees up capital for other things, such as improved anti-UBE
measures, improved vetting of new users by ISPs to be sure they
haven't previously engaged in spamming, etc.

Meanwhile, the expense of sending (or relaying) UBE goes *up* with my
proposal, because either more and more outgoing messages must be
redelivered more times (SMTP with ecrulisting, greylisting, etc.) or
must be tracked to try and convince recipients they really are
important (with my proposal), either of which implies a *longer-term*
commitment to sending email (a commitment which a sender of legit
email inherently makes nearly 100% of the time).

Under that design, senders of UBE find themselves expending more
resources to be sure they're able to send to only "live" people, not
spamtraps; to demonstrate long-term interest in their outgoing email,
so daemons monitoring mailboxes on behalf of users who only
occasionally check their mail won't tell MUAs for other, more active,
users that there appears to be an overall lack of interest
demonstrated for email sent from IP address x.x.x.x; and to avoid
being arrested for spamming by virtue of the fact that they're having
to stay online longer in order to demonstrate all that interest for
all that outgoing email.

This won't eliminate UBE. Nothing will -- or, if anything can, it can
do so with my proposal in place at least as well as with SMTP, and
probably as well as with IM2000 in place.

Instead, UBE will be forced, over time, to go more "upscale", in that
we might still see Unsolicited Email that is less "Bulky" (it's more
targeted) advertising higher-margin products (real Rolex watches
instead of fake ones ;-), etc.

Such economic pressures should make UBE, as a whole, less prevalent in
legitimate users' mailboxes -- which, besides lowering the costs to
run legitimate email operations, is the whole point.

So, when I think in terms of the three fundamental technologies under
discussion -- SMTP, IM2000, and my design (which evolved purely from
these lines of thought; I didn't just "invent" it and then try to
figure out how to justify it, believe me!) -- I try to picture them
both "naked" and "clothed" with similar anti-UBE measures, in order to
compare apples to apples.

By "naked" I mean no blacklists, no spamtraps, etc., a world not too
different from Internet email circa, say, 1987.

But I add, to the MUA, the ability for a recipient to indicate "I read
this message, [save/discard/print] it, it's legit" versus "I haven't
really read this message yet, even though it's staring at me on the
screen", versus "I've decided this message was Unsolicited [and is
maybe the 100th message I've seen just like it, so it's probably
Bulk]". This ability is becoming more important anyway; reasonable
defaults would be provided, and might even change based on "regions"
of an in-box (lower-priority regions, being mostly UBE, would perhaps
default to "I haven't really read this message").

And I assume it's generally important for a *sender* to know whether a
recipient has actually had an opportunity to see the message, so the
sender can decide whether to take some other approach to being sure it
gets to the recipient (phone call, avian carrier, etc.).

SMTP doesn't fare well here due to opaqueness to the sender and lack
of control by the recipient. Only some traditional guarantees
regarding bounces mitigate that somewhat; but *recipients* aren't
required to "bounce" messages they don't like or don't get around to
actually reading, even though they're sitting in their in-box.

IM2000 fares pretty well. It also pretty much demands this extra
level of sophistication for MUAs and end users (or at least enough of
them to make IM2000's assumptions concerning anti-UBE measures work).

But, in terms of transport *quality*, my proposal gives the sender and
the recipient end-to-end communication regarding the progress of the
message, and the recipient *full* control over such communication.

Neither SMTP nor IM2000 come close to that, though in certain corner
cases, each offers a superior benefit (which my proposal can obviously
offer as extensions to its foundation; they're just not
*foundational*, or *inherent*, to it). For example, SMTP offers the
sender the prospect of a rapid bounce in certain useful cases; IM2000
offers the sender the prospect of being pro-actively notified, by the
recipient, that it will accept responsibility for a message
("unpinning" the message). In my proposal, neither notification occurs
until the *sender* sends its next tracking request, at least in terms
of the foundation of the system.

And, in terms of transport *costs*, my proposal is cheapest for the
recipient, by a fairly substantial margin; for messages that go unread
for a time, potentially much less expensive for *senders* as well
(since they need only track, not re-send, messages).

(I'm relying on an important distinction between SMTP and my proposal
here. With SMTP, you usually don't get a bounce if a message makes it
into a POP3 or IMAP mailbox, even if the user's MUA decides it's spam
or the user never gets around to reading it. Such bounces would
improve the transport *quality* of SMTP but, if widely-enough
implemented to do that, would greatly increase the *costs* of SMTP.
So, comparing apples to apples, SMTP would have to provide such
bounces across the board to equal my proposal, even though it doesn't
do so today and is thus "cheaper".)

Since my proposal obviates the need for bounces as does IM2000, joe
jobs are eliminated, and each host's capacity for handling incoming and
outgoing email can be speced based on *actual* expectations for
*legitimate* outgoing email plus legitimate incoming email plus a
certain amount of incoming UBE.

There's no need to spec for receiving joe-job bounces; no need to spec
for hosting a mail store; no need to spec for sending lots of bounces
after accepting too much email when blacklists are unreachable; etc.

Just spec for (on top of merely sending and receiving legit message
contents) tracking legit outgoing email and for accepting and
processing tracking messages for incoming email. And tracking
requests are small, cheap, and not usefully forged in either direction
(or so I hope!).

Of course, since there are no bounces, the overall expense of the
average message remains more in line with its *existence*, not its
*size*, as there's never a need to *return* the entire message, or
even a portion of it, to a sender, as bounces are designed to do.

Most of all, though my proposal still *allows* "hit-and-run"
deliveries (the norm for SMTP, but unworkable for IM2000), they
*aren't* the norm here: senders are in fact given, and encouraged to
use, tools to demonstrate ongoing interest in their outgoing email,
which will tend to separate the spammers from the rest of the senders
unless the spammers decide to "normalize" in terms of technology and
presence on the Internet.

Remember, "demonstrating interest" is not intended by me in the sense
of hashcash or similar ideas -- it is not a way to *artificially*
increase sender costs. It is part and parcel of a sender's legitimate
interest in tracking the progress of an outgoing message, assuming the
sender really has such an interest.

If the sender has no such interest, no such cost is paid, and
legitimate messages from popular senders (offering advice, or
responding to previously sent emails) will tend to fare well in
delivering their messages, while unknown or unpopular senders (such as
spammers) will not.

(False positives aren't a problem with my proposal compared to SMTP,
since a message is, presumably, never fully accepted and *then*
discarded as spam, unless that's what a recipient *wants* to do. SMTP
servers accept and silently drop messages all the time these days.)

But if the sender has a legitimate interest, demonstrating that
interest by tracking the message is a reasonable, low-overhead
alternative to being prepared to receive a bounce.
--
James Craig Burley
Software Craftsperson
<http://www.jcb-sc.com>
Brian Candler
2005-05-08 09:49:23 UTC
Permalink
Post by James Craig Burley
Post by Brian Candler
If the majority of mail ends up in this state, then under your proposal the
sender of the mail will need to keep polling, just to prove that they are
'still interested'.
No. That might *indicate* such interest. But if the sender isn't
genuinely interested in the email to *that* degree, there's no need to
track the delivery, is there? A quick answer to someone's question on
an email list is really more of interest to the *recipient* than the
sender, so the sender doesn't really care if it reaches the
destination.
That I disagree with strongly. If I didn't care whether the recipient got
the response, I wouldn't have spent the time to compose it in the first
place.

(It actually annoys me when I send a mail in response to a mailing list, and
get the direct copy to the author bounced because they used a fake From:
address, or they use TMDA, or some other filter which rejects direct mail.
Posting a message in a public forum with a reply address IMO solicits a
response, and it is very impolite to reject that response.)

Anyway, I think the point is this: if I send a mail, I *do* want it
delivered. It's up to the mail system to do whatever is necessary to get it
delivered, or let me know if it cannot.

The SMTP world specifies various rules, or hoops the mail system must jump
through if you like, to perform a successful delivery. For example, if it
cannot connect to the remote SMTP server, or it can but it gets a 4xx
response code, then it must back off and try again later. Given that most
mail deliveries (perhaps over 99% ?) are immediately successful, then it's
true that some spam sending programs don't bother to implement jumping
through some of the less frequently-met hoops, and so greylisting was born.

But I think the fundamental problem with your proposal - if I understand it
properly, which I may not as it seems to be quite complex - is that you
would like to see a new set of hoops to jump through to ensure successful
delivery of mail. These hoops may involve being given a tracking number, and
having to poll status using that tracking number - potentially for a long
period of time after sending, if the recipient chooses not to confirm that
the message has been successfully delivered.

Now, the trouble is, everyone will have to write their mail sending system
to be able to jump through those hoops, because when I press "send" on my
mail client, I expect the mail system to do the *utmost* to get it
delivered. There's no point having another button which says "send but don't
try very hard". Why would I ever push that button when I could push "send"
to have a better chance of my message being delivered?

Now, because this new mail architecture provides more hoops to jump through
which are met more often, then people who write mail sending software
(including spammers) will be forced to make their code jump through those
hoops. That is, greylisting will become *less* effective, because the new
mail system puts a stronger requirement on senders to retry when the mail is
not completely delivered, and so spammers will do the same.
Post by James Craig Burley
Post by Brian Candler
but spammers can easily match the raised bar too.
Addressed previously. True for all designs; *inherently* least true
for my proposal, since spammers have to send *all* message contents at
least once (despite the vast majority of recipients being unlikely to
ever read them), as with SMTP but not IM2000, and would have to then
track *all* deliveries in a typical fashion to avoid recipient's
detecting lack of interest in incoming email from a previously unknown
source, which IM2000 provides in its own way, but not SMTP.
But that's not a very high bar, is it? I mean, it's the same as what we have
now (i.e. spammers have to send a copy of every mail to every recipient),
plus some subsequent probing to indicate "interest" in the message delivery,
which is easily added.

If this modified SMTP doesn't provide any better anti-forgery, or any better
tracking of spam to original source, then it's hard to see what benefits it
offers. And if it's SMTP just with added complexity, then the complexity
itself will become a problem as there are more corner-cases which may cause
failures; that is, if there are 10 different responses to receiving a
message which the recipient may give, and one of those is very rarely used,
then you may well find it doesn't interoperate very well. Even SMTP which
just has 2xx (OK), 5xx (fail) and 4xx (tempfail) has this type of problem;
you will find mail systems which do stupid things if you give a 4xx response
to an EHLO or a MAIL FROM, so in the end to maximise interoperability you
have to accept EHLO and MAIL FROM, and give 4xx/5xx responses to each RCPT
TO recipient (even if what you were trying to do was to reject the MAIL FROM
sender).
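
In code, that workaround looks something like this (illustrative only;
the callback names and the blocked-sender check are invented, not any
particular MTA's API):

    BLOCKED_SENDERS = {"spammer@example.com"}      # stand-in policy data

    def sender_is_blocked(sender):
        return sender in BLOCKED_SENDERS

    def on_ehlo(session, helo_name):
        return "250 ok"                 # never reject here; some clients choke

    def on_mail_from(session, sender):
        session["sender"] = sender      # remember it, but accept regardless
        return "250 ok"

    def on_rcpt_to(session, recipient):
        if sender_is_blocked(session.get("sender", "")):
            return "550 policy rejection"   # the sender-based decision, applied here
        return "250 ok"
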
Post by James Craig Burley
(The email equivalent of this might be those advertisements I
mentioned earlier, included in headers for emails sent via "free"
providers. If we move towards a model of having only a few
well-trusted relays or mail stores, there'll be economic pressure for
those stores to offer free accounts that result in such intermixed
advertising, as in "Received: from brian by ... brought-to-you-by
Pepsi <http://www.pepsi.com>". I don't want this future. ;-)
Absolutely. There's no such thing as a free lunch - and you only use
free hotmail-type services if you don't mind your correspondence being
polluted in this way.
Post by James Craig Burley
So, suppose sending UBE is essentially "free", in terms of iron and
connectivity costs.
Agreed
Post by James Craig Burley
1. The infrastructure handling *receiving* emails might be unable to
cope with the load, so legit email wouldn't get through.
2. Those reading email have trouble finding the few emails they
really care about among the huge amounts of UBE they (mostly)
don't, so legit email isn't always read.
Focusing on #1, I say let's think in terms of a system that makes
*receiving* email as cheap as possible, at the (possible) expense of
the sender.
Hmm, except (1) is OK at the present - that is, E-mail infrastructure copes,
even with >50% of all mail being spam. (1) is a problem for ISPs, but can be
solved by suitable spending of (their customers') cash. (2) is where the
real problem lies, for end-users at least.
Post by James Craig Burley
Now, the easiest and cheapest way I can think of to receive an email
is to accept an *incoming* connection delivering it, after which I can
take my own sweet time deciding what to do with it.
I don't want to have to look up any return address to provide status
information (as in a bounce or DSN). I don't want to have only a
short window of time in which to respond definitively or risk
duplicates or redelivery (as with the response to the SMTP DATA
phase). I don't want to have to respond *at all* to the delivery or
to the sender's subsequent tracking requests.
How can it get any cheaper than that? Yet, even if I don't do any of
the things above, I can still receive a complete message, display it
for an end user via an MUA, and the user can act on it as she sees
fit.
OK. That's what we have now, minus the definite acceptance or rejection of a
mail at transfer time.

SMTP is already cheap to deliver to, and already people have been adding
various hoops to jump through (e.g. validating EHLO domain, validating MAIL
FROM) - some of which are arbitrary, since the EHLO domain should not need
"validating" in the first place, and indeed the RFCs tell you that you must
not. Your proposal needs either to be able to convince people to drop their
arbitrary hoops, or give them a new set of standardised hoops to jump through
- hoops which minimise the risk of rejecting legitimate mail, which
unfortunately current ones often do.

I see where you're coming from about leaving policy up to individual sites,
but unfortunately that leaves those sites free to choose very bad policies.
For example, as I said before, there are sites which reject all mail which
has an envelope-sender of ***@pobox.com (and so my mail is blocked). That's
based on the stupid assertion that because they've seen some mail which had
MAIL FROM:<***@pobox.com>, then pobox.com is a spammer's domain.

In fact the opposite is true - pobox.com controls its users very tightly
indeed - but spammers have been forging mails with that domain.

The trouble is that the mail system purports to give some information about
a message (the sender/origin), when in fact that is NOT information, it's
mere hearsay.

So what matters more to me is that any "information" which a receiving
site might use to validate or reject a message is actually correct.

Now, even if your proposal removes the idea of a sender address and bounces
altogether, what you have instead is that the whole message headers and body
may be received before you analyse whether to accept or reject the message.
That means, for example, that people will do the same sort of stupid
filtering on the From: header that they used to do on the MAIL FROM
envelope. And of course, forged From: headers have another particular
problem with "phishing"-type attacks.

So I think it's essential that any proposal include ways to inherently
validate this information. IM2000 does because you *have* to use the sender
identity to collect the mail; if it's forged, mail collection doesn't work.
Something like DomainKeys on top of SMTP tries to do the same, although it's
heavyweight and take-up of anything relying on public-key cryptography has
historically been poor.
Post by James Craig Burley
As the expense of ordinary exchange of email drops across the board,
would it drop very much, compared to SMTP? The bandwidth and disk space for
receiving mail are the same, so you just lose the cost of being required to
send bounces after a message has been accepted.

Of course, most *recipient* E-mail systems avoid this already, by validating
the recipient at RCPT TO time and sending a 5xx response if the mailbox does
not exist or is otherwise unavailable. So the main cost here is at the
sending mail relays, who may be forced to try to return bounces to
(possibly) non-existent senders.

Eliminating that may reduce the cost of building outbound mail relays, a
bit.
Post by James Craig Burley
that frees up capital for other things, such as improved anti-UBE
measures, improved vetting of new users by ISPs to be sure they
haven't previously engaged in spamming, etc.
Ah, but if *I* save some money at my ISP, that doesn't free up money for
*other people's* ISPs to invest in controlling their users, unfortunately.
Post by James Craig Burley
Since my proposal obviates the need for bounces as does IM2000, joe
jobs are eliminated, and each host's capacity for handling incoming and
outgoing email can be speced based on *actual* expectations for
*legitimate* outgoing email plus legitimate incoming email plus a
certain amount of incoming UBE.
Joe-jobs are a small subset of the problem, IMO. If that's your main
concern, then you can fix that using SES/SRS/BATV today. It would help if
there were a common agreed standardised format though, so that things like
mailing list software could recognise an encoded sender.
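
For what it's worth, the general shape of such an encoded sender is
simple enough; here's a simplified, BATV-flavoured sketch (not the
exact format of any draft; the key, tag layout, and lifetime are all
made up for illustration):

    import hashlib, hmac, time

    SECRET = b"site-local secret"        # assumption: known only to this site

    def _mac(day, local, domain):
        msg = ("%03d%s@%s" % (day, local, domain)).encode()
        return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()[:6]

    def sign_sender(local, domain):
        day = int(time.time() // 86400) % 1000          # coarse expiry stamp
        return "prvs=%03d%s=%s@%s" % (day, _mac(day, local, domain), local, domain)

    def verify_bounce_recipient(addr, max_age_days=7):
        try:
            tag, stamp, orig = addr.split("=", 2)
            day, mac = int(stamp[:3]), stamp[3:]
            local, domain = orig.rsplit("@", 1)
        except ValueError:
            return False                                 # not one of our addresses
        today = int(time.time() // 86400) % 1000
        return (tag == "prvs" and (today - day) % 1000 <= max_age_days
                and hmac.compare_digest(mac, _mac(day, local, domain)))

Bounces addressed to anything that fails the check were never sent by
us and can be dropped; the sticking point, as noted above, is getting
everyone to agree on the encoding.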
Post by James Craig Burley
(False positives aren't a problem with my proposal compared to SMTP,
since a message is, presumably, never fully accepted and *then*
discarded as spam, unless that's what a recipient *wants* to do. SMTP
servers accept and silently drop messages all the time these days.)
But with tons of spam hitting an inbox, how are you going to stop people
becoming overwhelmed with it - your problem (2) above? Surely people will
*demand* automatic filtering, and that in turn will involve accepting and
silently discarding mail as people do now? By my definition, automatically
receiving a mail into a 'spam' folder counts as 'silently discarding', since
people rarely look at such folders, and valid messages which end up there
are almost certainly lost anyway.

Regards,

Brian.
James Craig Burley
2005-05-09 08:56:04 UTC
Permalink
Post by Brian Candler
Post by James Craig Burley
Post by Brian Candler
If the majority of mail ends up in this state, then under your proposal the
sender of the mail will need to keep polling, just to prove that they are
'still interested'.
No. That might *indicate* such interest. But if the sender isn't
genuinely interested in the email to *that* degree, there's no need to
track the delivery, is there? A quick answer to someone's question on
an email list is really more of interest to the *recipient* than the
sender, so the sender doesn't really care if it reaches the
destination.
That I disagree with strongly. If I didn't care whether the recipient got
the response, I wouldn't have spent the time to compose it in the first
place.
Even a message like "thanks" or "ok" or "the dog needs a walk"?

Anyway, to the extent you care, *you* have to deal with it
accordingly. That is true regardless of whether you're sending via
SMTP, IM2000, or some other system. (I care enough about getting my
email to reach AOL users that I patched qmail-remote to handle that
sort of thing gracefully.)
Post by Brian Candler
Anyway, I think the point is this: if I send a mail, I *do* want it
delivered. It's up to the mail system to do whatever is necessary to get it
delivered, or let me know if it cannot.
No, it's up to *you* to do whatever is necessary to get it delivered,
because *you* have a much wider range of methods by which to transmit
your message to the *person* on the other side.

Email is just *one* way. So "the mail system" should make whatever
reasonable best effort it can, *and* give you reasonably honest
feedback as to its progress (including "I don't know, the recipient's
system doesn't seem to want to tell me", if that's appropriate), so
*you* can decide how to deal with it.

Of course, you can tell *your* end of the mail system that an outgoing
message is crucial, and, under my proposal, it could make that clear
to all the downstream entities. But they'd have more freedom and
flexibility to respond saying "um, no, it doesn't seem that
interesting to *us*" than under SMTP (which has little more than a
black-or-white ability to respond that way).
Post by Brian Candler
But I think the fundamental problem with your proposal - if I understand it
properly, which I may not as it seems to be quite complex - is that you
would like to see a new set of hoops to jump through to ensure successful
delivery of mail.
Absolutely the opposite: I propose to eliminate *all* built-in hoops,
and, where there's a necessity of choosing whom to burden, to place
the burden on the *sender*, giving the *recipient* the most freedoms
and the most control over the terms of any transaction (including
*willingly* making things easier for a particular sender, or all
senders, compared to other senders or other recipients).

What you call "hoops" are merely various measures recipients and
senders could employ on top of the base system in order to assure that
message transmission and delivery were successful, that the sender is
truly interested in sending *that* email (not just one out of a
bazillion), etc.

And, in fact, my system is actually quite *simple*: a sender sends a
message by transmitting, to a recipient's MX, a notification
consisting of an envelope recipient and the message contents.

If the sender wants to subsequently track the message, the sender
either provides its own tracking ID (probably a unique message ID) or
waits for a response (that it somehow requests in its transmission)
from the recipient containing such an ID or, potentially, a final
delivery status.

If the sender doesn't want to be any more complicated than that, it
doesn't have to. It can simply notice that its delivery might not
have been immediately accepted and report that to its user.

Or, it can use the agreed-upon tracking ID (sender's message ID or the
recipient's replacement ID) when it sends subsequent tracking requests
to the recipient's MX.

A recipient's life can be extremely simple as well: accept incoming
message and report success to the sender as soon as the message is
written to disk in a user's mailbox. That can all happen within a
single TCP session, involving just one handshake: accept message
transmission, respond with indication of success (responsibility for
message taken by recipient) or lack of success (responsibility not
taken).

Increasing levels of complexity would be allowed by the
specification, but would be optional for both participants.
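
To pin down my own reading of that exchange, here is a sketch in
Python (every field and function name is invented; no wire format is
implied):

    import uuid
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Submission:                       # what the sender transmits to the MX
        envelope_recipient: str
        contents: bytes
        sender_tracking_id: Optional[str] = None   # e.g. a unique message ID

    @dataclass
    class Disposition:                      # what the recipient may answer with
        status: str               # "responsibility-taken", "content-held", "refused"
        tracking_id: Optional[str] = None   # recipient may substitute its own ID

    def willing_to_hold(recipient):         # stub policy: do we host this mailbox?
        return recipient.endswith("@example.org")

    def trusted_sender(sub):                # stub whitelist check
        return False

    def handle_submission(sub, spool):
        if not willing_to_hold(sub.envelope_recipient):
            return Disposition("refused")
        spool.append(sub)                   # content accepted and written down...
        if trusted_sender(sub):             # ...and, for known senders, owned outright
            return Disposition("responsibility-taken", sub.sender_tracking_id)
        # Otherwise responsibility is deferred; the sender can poll later
        # using the agreed tracking ID.
        return Disposition("content-held",
                           sub.sender_tracking_id or uuid.uuid4().hex)
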
Post by Brian Candler
These hoops may involve being given a tracking number, and
having to poll status using that tracking number - potentially for a long
period of time after sending, if the recipient chooses not to confirm that
the message has been successfully delivered.
That's one of the hoops a *recipient* might choose to require the
sender to jump through. If the sender offers its own unique ID, and
the recipient doesn't insist on replacing that with its own, the
recipient can simply "promise" (a la an SMTP 2xy response to the DATA
phase) to take responsibility for the delivery.

I imagine most exchanges would work that way, because they'd involve
sender/recipient combinations that had been previously used, so each
would "whitelist" the other. (The recipient would either accept
responsibility for an incoming message from that sender right away, or
the sender would just assume the recipient would make every effort to
deliver the message and not bother sending lots of tracking requests,
or both.)

And since these conversations can naturally occur between MUAs (though
the recipient's MUA might have to be "reached" via an MX that knows
where to find it, assuming it's online -- a sort of smarthost), these
whitelists and blacklists are much smaller and exist on a
per-recipient and per-sender basis -- directly managed by each, though
potentially "backed up" by the usual sorts of upstream lists that
exist today (ISP-wide lists, RBLs, DULs, and the like), for when the
local list provides insufficiently definitive or up-to-date
information.
Post by Brian Candler
Now, the trouble is, everyone will have to write their mail sending system
to be able to jump through those hoops, because when I press "send" on my
mail client, I expect the mail system to do the *utmost* to get it
delivered.
That's your problem right there: "The mail system" seems to include
both the *sender* and the *recipient* subsystems in your view. But
you can legitimately expect only *your* side of things to "do the
utmost" to get it delivered. So you can't burden "the mail system"
without burdening *others*, unless you magically exclude all senders
of UBE from "the mail system" in order to convince all potential
recipients of your email to voluntarily submit to a system that
restricts their use of it so it always delivers your email.

But, if by "do the utmost" you mean you're willing for your MUA to
send (and potentially resend) your message to *all* listed MXes, and
track them fairly constantly, until it's sure the recipient has seen
the message, without having to worry about duplication of the message
in the recipient's in-box, you can do that with my system...but not
with SMTP or IM2000.

Yes, your MUA might "annoy" the admin of those MXes, but if you do
this only for messages you *really* care about, and if you're not
sending UBE, you won't likely show up on the admin's radar at all, and
if you do, you'll just incentivize her, once she confirms you're not
sending UE, to improve your "rating" so you get a more positive
response from her system more quickly, maybe the first time.
Post by Brian Candler
There's no point having another button which says "send but don't
try very hard". Why would I ever push that button when I could push "send"
to have a better chance of my message being delivered?
If the default is like SMTP, the default "Send" button would probably
do little more than transmit the message and then send one tracking
request about a minute later, another maybe 10 minutes later, a third
maybe 24 hours later, and a fourth about a week or two later, as
necessary.

That reflects the reasonable correspondence between the likelihood of
successful delivery and that of getting a bounce back. (Of course, there's
no reason this proposed system *can't* do something akin to a
"bounce", if the recipient system chooses to cooperate.)

So, that is, at *most*, four tracking requests per typical outgoing
email. Hardly a big load for a message *your* MUA already knows
about.
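
That default schedule is trivial to express; a sketch (the
request/response vocabulary here is invented):

    import time

    TRACK_DELAYS = [60, 10 * 60, 24 * 3600, 7 * 24 * 3600]   # 1m, 10m, 1d, ~1w

    def track_until_settled(message_id, send_tracking_request):
        """send_tracking_request(message_id) -> status string (placeholder)."""
        sent_at = time.time()
        for offset in TRACK_DELAYS:                  # offsets from the send time
            time.sleep(max(0, sent_at + offset - time.time()))
            status = send_tracking_request(message_id)
            if status in ("responsibility-taken", "read", "refused"):
                return status            # settled one way or the other; stop
        return "unknown"                 # never answered definitively; give up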

And assuming your IP address isn't blacklisted by the recipient, that
first or second tracking request should be enough to ensure that your
message, which probably hit the recipient's incoming mailbox when it
first arrived, doesn't get prioritized *downward* as likely UBE.

But if you're sending and tracking *many* outgoing emails to a
recipient's entire system (their MX), and that system notices that
you're not whitelisted on many of them (based on MUA activities, such
as content analysis, users hitting "this is spam" buttons, etc.),
they'll be able to deprioritize or even drop *all* your pending
messages and just ignore, or respond without committing to deliveries
to, further tracking requests from you (which you might have to keep
sending to not risk being treated as an uncaring sender in the future,
though doing so for a few messages out of many that you send would
probably not have that effect).

A la IM2000, until your sending MUA gets a confirmation of
responsibility being accepted (and assuming you don't otherwise
archive all your outgoing messages), *your* MUA bears the cost of
storing the message locally until the recipient agrees to relieve it
of that burden. (Remember, though it may have accepted the content
previously, it can always subsequently claim "dog ate my homework", in
essence, asking you to resend the entire message.)

*Unlike* IM2000, your *local* MUA does that, not some third-party
message store that, in reality, does nothing more than to conceptually
shift the anti-UBE filtering problem to a third party. (Submission to
message stores could be done via SMTP, and probably would be for
nearly 10 years after widespread adoption of IM2000 by ISPs, as far as
I can tell from the various discussions of how IM2000 would actually
work in practice.)

So the burden is where it belongs: on you, as the sender, to get the
message through, via email or some other means.

Now, if your MUA doesn't get a sufficiently positive response to its
initial submission or to a subsequent tracking request -- say, the
recipient's system has strong policies regarding opacity of its users'
activities -- it *might* see something even more helpful: a *reply* to
your outgoing email, in the form of an incoming email.

That is, suppose your MUA sees an incoming email with the equivalent
of this in the headers (though my design would probably include the
ability to encode this sort of thing in the envelope, since it does
pertain to message transmission, not just content):

References: <***@fdsafdsa.fdsfdsa>

If it's smart enough to notice that your *outgoing* email just got
referenced via that ID, then it can decide to not bother tracking it
anymore! Sure, maybe no entity accepted "responsibility" in a strict
sense; but clearly it was seen and acted upon (by a user, by
list-distribution software, whatever), so what's the difference? Your
sending MUA can stop worrying about tracking it, and maybe just delete
your local copy of the outgoing message.
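
The MUA-side check is nearly free; a naive sketch:

    import email, re

    def referenced_ids(raw_reply):
        """Message-IDs cited by an incoming message's References/In-Reply-To."""
        msg = email.message_from_bytes(raw_reply)
        refs = msg.get("References", "") + " " + msg.get("In-Reply-To", "")
        return set(re.findall(r"<[^>]+>", refs))

    def prune_tracked(pending, raw_reply):
        """pending: dict mapping our outgoing Message-IDs to stored copies."""
        for mid in referenced_ids(raw_reply) & set(pending):
            del pending[mid]        # it was seen and acted on; stop tracking it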

Now, suppose that 1m/10m/1d/7d tracking schedule doesn't work for you
for a *given* piece of email. Well then, adjust it for that
particular email, or just hit the "Track" button when you want to
throw in another tracking request. The recipient's system might not
respond, of course, but that's life.

Do you think you, as a sending user, would prefer being able to track
progress of delivery like this, via your MUA and a well-designed
tracking facility, or would you prefer today's bounce/DSN mechanism,
with all its warts?

(Speaking as an end user, I'd much prefer tracking. Even if bounces
and DSNs were simply notifications based on message IDs and thus fully
and cleanly implemented and cheap, I'd rather know my message was
*accepted* and even *read*, say, 10 minutes after I sent it, than to
wait for not getting a bounce. The closest thing to tracking would be
an across-the-board implementation of DSNs on successful delivery or
on end-user reading of email; in many cases, that's preferable to
tracking, since it's pro-active and thus potentially immediate, but
it's not *quite* as reliable, since it assumes DNS lookups work in the
reverse direction, and the Internet isn't The Phone Company.)
Post by Brian Candler
Now, because this new mail architecture provides more hoops to jump through
which are met more often, then people who write mail sending software
(including spammers) will be forced to make their code jump through those
hoops.
They won't be *forced* to do anything. A sender *can* do nothing more
than throw a submission transmission at a recipient. It needn't wait
for any response whatsoever, or send any tracking messages.

Why might it do this? Well, if it knows, from experience, that the
recipient's system has already whitelisted it and is reliable, why
should it bother doing anything else?

(In particular, embedded controllers would *love* this new protocol.
They'd be designed to assume they're always "whitelisted", and
wouldn't have to implement anything as complicated as SMTP. Any new
email protocol that wants to be adopted widely will, IMO, have to be
easily and naturally implementable in embedded controllers and
processors.)

So, the first three or four times a sender's MUA transmits a message
to a recipient's MTA/MUA ("MA"), it might get less-definitive
responses than desired for a while, until both sender and recipient
MA's learn, via their owners' actions, that the email relationship is
"confirmed".

After that, both the sender and recipient MA's trust more in each
other. The recipient is more likely to quickly accept responsibility
(a la a 2xy SMTP response) to any email sent by that sender; the
sender is less likely to send frequent tracking requests to the
recipient MA until it reaches the desired level of assurance.

(Since exchanges are MA<->MA without needing large, trusted, third
parties such as smarthosts, relays, and mail stores, the problem of
"mixed blessings" -- opaque sources of what appears to be a mix of
legit email and UBE -- is potentially greatly reduced. That is, a
given MUA acting on behalf of a particular person is unlikely to be
considered a "mixed blessing" by a given recipient to which that
person sends email! The problem of another person behind that same IP
address sending forged email remains, but that becomes more of a
*local* problem that an admin, if not the two mutually trusting users
themselves, can address, while the problem of another person at
another IP address is often detectable as potential forgery by the
recipient MUA, which can easily notice this, if it isn't more
definitively handled by something like SPF or SES.)

Spammers are left much more in limbo in this situation. They aren't
immediately shut out, except by recipient MA's who choose to shut them
out (and thus expose that they are doing so), and they might never see
any definitive indication any of their messages have been *rejected*.

But their messages also won't tend to be widely *accepted* enough for
them to ever acquire "whitelist" status on more than a small % of
their recipients' MAs.

So, spammers will have to find other ways to convince those MAs that
each *new* message they send is even *more* important than the
previous one.

But, each of those MAs, having already heard from spammers with
consistent identification (incoming IP address, for example), will be
even *more* suspicious of such payloads.

(And each recipient's MA will be more "tailored" to that user's needs,
meaning the upstream ISP that acts on behalf of many users will be
less likely to employ some draconian anti-UBE filter that a spammer
can reverse-engineer and then target his spam to "get around".)

Spammers will naturally keep trying to gin up new identities, but the
new system is more easily able to "punish" that, without necessarily
triggering a lot of false positives, by expecting any *legitimate*
messages that come from unknown sources to be reasonably and
frequently tracked by their sender.

Reasonably tracking outgoing messages will end up being an effort
senders of *legitimate* email can easily do, when it comes to their
sending messages to "newish" recipients. (The key insight here is
that *true* creation is a rare event, compared to distributing
advertisements, aka UBE, or to self-replication, aka viruses and
worms. Senders of legit email are more in the camp of "true
creators", so they'll naturally have plenty of resources and desire to
publish their rare creations. If www.timecube.com doesn't prove that,
*nothing* will. Uniqueness and rarity do not, by themselves, imply
desirability; they do discourage involvement in the UBE business,
however. ;-)

Spammers won't be able to afford to track their millions or billions
of outgoing emails so easily, and they'll have less and less success at
it, over time, as recipient MA's become more sophisticated (based on
learning where, on the Internet, spammers tend to congregate, what
messages they tend to send, and so on).

On the other hand, there are those recipients who are *required* to
accept all incoming email *content*, for various reasons (legal and
otherwise).

My proposal makes that more efficient and easier to deal with, because
accepting *content* needn't imply accepting *responsibility*, and it's
unlikely there'd be legal requirements that recipients accept
*responsibility* immediately, as long as *content* was accepted and
reasonable best efforts made to deliver it downstream.

IM2000 makes life harder for recipients under such constraints, and in
fact they might legally be *unable* to accept *any* incoming IM2000
message notifications, because, upon acceptance, their potential
inability to retrieve actual message *contents* (due to DNS or
mailstore outage) could be viewed as a serious failure on their end.

(I don't agree with those requirements, but certainly it's clear that
IM2000 is a less reliable mechanism for receiving, as well as sending,
email, since it assumes a reliable and trusted third party -- a
mailstore -- will be available 24x7.)
Post by Brian Candler
That is, greylisting will become *less* effective, because the new
mail system puts a stronger requirement on senders to retry when the mail is
not completely delivered, and so spammers will do the same.
No, "the new mail system" puts no such requirement on senders. It
merely allows *recipients* to raise, or lower, the bar as they see
fit, on more of an ad-hoc, per-message, *per-recipient* basis.
("Per-recipient" probably only with a new protocol or SMTP++, given
the RCPT TO problem.)

"The new mail system" is simply more *flexible* than the current, yet
subsumes all of the current systems' pertinent capabilities. Of
course, senders will tend to *want* to send tracking requests on some
sort of reasonable schedule; recipients will tend to *want* to take
advantage of that tendency to help them decide whether incoming email
is legit or UBE when it is otherwise difficult to characterize, unless
senders of UBE always send tracking requests in just the same way (in
which case they're being "punished" for sending in Bulk, even if the
*content* is all the same, something IM2000 doesn't punish them for
nearly so well).

As to making greylisting "less effective": greylisting is a technique
that is highly specific not only to SMTP as a protocol, but to its
particular universe of implementations (including spamware). It'd be
irrelevant, as a specific technique, under the new protocol.

The main thing people don't like about greylisting is that it delays
legitimate (but not whitelisted) email delivery. Ecrulisting "solves"
that problem by delivering "rejected" email right away, though
downstream entities should be able to feed back instructions to the
MTA regarding how to handle subsequent delivery, assuming spammers
increasingly retry deliveries in such cases; SMTP++ and/or my proposal
would address that problem as well.

(Immediate delivery of such email might not seem to make sense, but,
at the level greylisting is deployed, it often can. In many cases, it
is merely a tactic used by an SMTP server to try to avoid depositing
so much spam in a POP3/IMAP/mbox/maildir mailbox for a user. If the
user isn't soon going to read her mailbox, the delay isn't a problem,
so whether greylisting delays it, or ecrulisting sorta-delivers it and
then yanks it back out again, doesn't really matter. If the user is
"live", then ecrulisting lets the user read the email right away,
unless the user is, for the moment, choosing not to read such email.)
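
For reference, the greylisting rule in question amounts to roughly
this (the delay is a typical value, not a requirement):

    import time

    GREY_DELAY = 300                 # seconds a new triplet must wait
    seen = {}                        # (client IP, MAIL FROM, RCPT TO) -> first attempt

    def greylist(client_ip, mail_from, rcpt_to):
        key = (client_ip, mail_from, rcpt_to)
        first = seen.setdefault(key, time.time())
        if time.time() - first < GREY_DELAY:
            return "450 try again later"    # well-behaved senders will retry
        return "250 ok"                     # retried after the delay: let it in
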
Post by Brian Candler
Post by James Craig Burley
Post by Brian Candler
but spammers can easily match the raised bar too.
Addressed previously. True for all designs; *inherently* least true
for my proposal, since spammers have to send *all* message contents at
least once (despite the vast majority of recipients being unlikely to
ever read them), as with SMTP but not IM2000, and would have to then
track *all* deliveries in a typical fashion to avoid recipient's
detecting lack of interest in incoming email from a previously unknown
source, which IM2000 provides in its own way, but not SMTP.
But that's not a very high bar, is it? I mean, it's the same as what we have
now (i.e. spammers have to send a copy of every mail to every recipient),
plus some subsequent probing to indicate "interest" in the message delivery,
which is easily added.
Right, just as running their own message stores is easily added. As
long as you're going to play *that* game, why bother changing SMTP at
all? Spammers *already* get around SPF, RBLs, DULs, and so on, just
by upping the ante. Any tactic to make things *more* expensive for
spammers is, apparently, going to make it more expensive for everyone
else as well.

The question would be, can spammers generally afford to track the
millions of messages they send every day, just to make them appear as
important as typical legit email? Even if so, their joe jobs aren't
causing bounces to innocent third parties; recipients have other ways
to notice and filter their spam; and they have to be "present" on the
Internet for longer periods of time than the usual SMTP hit-and-run
approach requires of them.

So *my* approach is to make *ordinary* exchange of emails as
*inexpensive* as possible, favoring recipients over senders where
there's any conflict in the design, so resources are freed up by
*legitimate* exchanges in order to better expend them defending
against *illegitimate* ones.
Post by Brian Candler
If this modified SMTP doesn't provide any better anti-forgery, or any better
tracking of spam to original source, then it's hard to see what benefits it
offers.
Well, as a new protocol, it'd be trivial to provide "better tracking
of spam to original source", since problems like parsing "Received:"
headers would be solved "out of the box".

Meanwhile, "anti-forgery" is orthagonal to stopping UBE. It's also
orthagonal to exchanging messages. Your own posts have done a great
job of explaining how and why; you seem to be forgetting (or ignoring)
the implications!

Accordingly, under my proposal, pretty much all of the present
anti-forgery systems being discussed would have similar applicability,
and similar pitfalls. (Some pitfalls, like having to encode envelope
senders or recipients a la SES or BATV, would be eliminated, of course
-- that's the same sort of advantage *any* new protocol, including
IM2000, would offer.)

One big upside is that any anti-forgery checking could be delayed
until well after message *contents* are received, so such checks could
affect the decision as to whether to subsequently accept
*responsibility*, or perhaps just quietly drop the email before the
end user ever sees it. Ideally, that's done after (cheap) content
analysis decides it can't assume the message is UBE, so the burden
placed on third parties to authenticate the user is kept to a minimum;
after that, more-expensive content analysis could then be done (this
includes end-user reading of the message, of course).

(And, with a properly designed protocol, which I have increasingly in
mind these days, anti-forgery checking could be done *without*
receiving message contents -- just the envelope, as defined under the
new system. Some anti-forgery checking could even be requested of the
*sender* by the *recipient*, though that makes certain tradeoffs.
Generally, a recipient is going to be choosing between incoming
bandwidth use and increased latencies in almost any transaction
anyway. Since recipients have a wide range of needs and potential
bottlenecks, and these are variable even for a given recipient in any
particular situation, my protocol would try to offer maximum
flexibility here.)
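
The ordering I have in mind is roughly this (every check below is a
placeholder; the point is only where each one sits in the pipeline):

    def cheap_content_checks_say_spam(message):      # stub: local, no network cost
        return False

    def forgery_checks_pass(message):                # stub: e.g. SPF/SES-style lookups
        return True

    def expensive_content_analysis_says_spam(message):   # stub: slow, maybe human
        return False

    def classify(message):
        if cheap_content_checks_say_spam(message):
            return "drop-or-deprioritize"            # never bother third parties
        if not forgery_checks_pass(message):
            return "hold-responsibility"             # keep content, commit to nothing
        if expensive_content_analysis_says_spam(message):
            return "low-priority-folder"
        return "deliver"                             # last of all, the user reads it
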
Post by Brian Candler
And if it's SMTP just with added complexity, then the complexity
itself will become a problem as there are more corner-cases which may cause
failures; that is, if there are 10 different responses to receiving a
message which the recipient may give, and one of those is very rarely used,
then you may well find it doesn't interoperate very well.
Yes, that's a risk. It's a much bigger risk with IM2000, of course,
where not only does there exist *coding* risk, there exists *dynamic*
risk (an email needs at least *three* distinct parties to be up and
running to be successfully exchanged).
Post by Brian Candler
Even SMTP which
just has 2xx (OK), 5xx (fail) and 4xx (tempfail) has this type of problem;
you will find mail systems which do stupid things if you give a 4xx response
to a EHLO or a MAIL FROM, so in the end to maximise interoperability you
have to accept EHLO and MAIL FROM, and give 4xx/5xx responses to each RCPT
TO recipient (even if what you were trying to do was to reject the MAIL FROM
sender)
Right. That (partially) speaks to the problems of having inadequate
specs, ancient code, etc.

One way in which my system improves on this is that it places *no*
requirements on recipients in most cases. (Maybe not at all, though
one might assume that if a recipient responds to a tracking message by
saying "recipient has actually read and understands message", that
would indeed be the case.)

And since anti-UBE measures would be uppermost in implementors'
thoughts, they'd likely take full advantage of making *recipient*
software resistant to UBE and abuse in various forms.

Accordingly, it'd be much harder for buggy *sender* software to be
successfully deployed in the wild. Its inability to cope with a
certain kind of response would be more quickly found during testing,
since recipient MAs, compared to yesteryear's SMTP servers, would
present a fairly wide range of responses.

And I've already got some tricks up my sleeve, in terms of making the
protocol and its implementations inherently resistant to certain
common sorts of bugs, though those aren't pertinent in this discussion
(which is really more about whether IM2000 is a worthwhile direction
in which to head, versus something like the direction I'm *proposing*,
but not yet 100% convinced is preferable or that, if it is, obviates
the need for IM2000).

My approach definitely turns the "be liberal in what you accept,
conservative in what you produce" koan on its head, insofar as "what
you accept" becomes, in light of the end-to-end principle, a matter of
accepting that you, as *sender*, might never know for sure whether
your outgoing email has been truly received and read, and "what you
produce" pertains to the degree the sender produces only legitimate,
i.e. desired, emails and a reasonable pattern of tracking requests.

(In other words, the burden of the koan is entirely on the *sender*.
The recipient is under few, if any, requirements in terms of what to
accept or what to produce.)
Post by Brian Candler
Post by James Craig Burley
1. The infrastructure handling *receiving* emails might be unable to
cope with the load, so legit email wouldn't get through.
2. Those reading email have trouble finding the few emails they
really care about among the huge amounts of UBE they (mostly)
don't, so legit email isn't always read.
Focusing on #1, I say let's think in terms of a system that makes
*receiving* email as cheap as possible, at the (possible) expense of
the sender.
Hmm, except (1) is OK at the present - that is, E-mail infrastructure copes,
even with >50% of all mail being spam. (1) is a problem for ISPs, but can be
solved by suitable spending of (their customers') cash. (2) is where the
real problem lies, for end-users at least.
But (1) is often used as a major argument against UBE, and especially
in favor of deploying anti-UBE measures that result in lots of false
positives (or at least more-risky paths for email to take).

In particular, if (1) isn't a problem, then IM2000 isn't a solution,
because IM2000 doesn't inherently help with (2) any more than does SMTP
or anything else I can think of.

(Well, I suppose it does if you assume you can reliably detect
forgeries, and always assume forgeries are UBE, *and* assume anything
*else* is *not* UBE. That makes whether a message is forged pretty
much the sole determining factor for whether it's spam! The
phenomenon of 0wned machines argues otherwise.)
Post by Brian Candler
SMTP is already cheap to deliver to, and already people have been adding
various hoops to jump through (e.g. validating EHLO domain, validating MAIL
FROM) - some of which are arbitary, since the EHLO domain should not need
"validating" in the first place, and indeed the RFCs tell you that you must
not. Your proposal needs either to be able to convince people to drop their
arbitary hoops, or give them a new set of standardised hoops to jump through
- hoops which minimise the risk of rejecting legitimate mail, which
unfortunately current ones often do.
Right. Though, whatever arbitrary hoops people employ now would work
just fine, where pertinent, under my all-new proposal, and more of
them would be pertinent under an SMTP++ variant of it (and especially
using ecrulisting).

(By "pertinent" I mean that validating EHLO domain would cease to make
any sense in a protocol that was HELO/EHLO-free. Rejecting
presumptuous, or "early talker", clients would cease to make sense in
a protocol that required no handshaking. Using multi-line greetings
to foil stupid spam software would cease to make sense in a protocol
that had no such thing as a greeting. Etc.)
Post by Brian Candler
I see where you're coming from about leaving policy up to individual sites,
but unfortunately that leaves those sites free to choose very bad policies.
Which leaves us no worse off than today. ;-/

I get the impression you don't yet have a *coherent* view of what an
email *system* should do.

That is, when you talk about your priorities as a sender, they seem to
be in serious conflict with what you'd previously been saying (or
implying) about the priorities a recipient (such as yourself) might
have.

This is not unusual. Most everyone tends to want *any* system to
favor *their* immediate needs, and, by dumping their "requirements"
onto that system, they are often able to be blissfully unaware or
unconcerned that they are imposing conflicting requirements on the
system.

(E.g. "I want to feel safe, so if someone seems scary to me, I want to
be able to call the police and have them investigated. I want to feel
safe and free, so the police must never investigate *me* just because
someone else says I seem scary to *them*." The only way "the police"
can certainly meet both requirements is to cater to only a single
individual in their entire jurisdiction. Different societies resolve
the conflict in different ways, but there is *always* going to be an
external conflict like this for people who hold such internally
conflicting views.)

The essence of my proposal derives from recognizing that what you or I
might call "very bad policies" is, nevertheless, the *right* of sites
to implement.

However, it also derives from recognizing that the *worst* policies
involve burdening third parties (especially those not explicitly
willing to so participate) with "helping" a recipient decide whether
incoming email is legitimate.

That is, whether a recipient's policy is "very bad" may be subjective,
but I think we can all agree that policies that inflict collateral
damage (transmissions or blacklisting that might negatively affect
otherwise-uninvolved parties) are *objectively* "very bad", insofar
as, in their presence, the entire Internet becomes less useful.

That metric rules out Challenge/Response, callback implementations of
all sorts, and, generally, bounces/DSNs (in the presence of potential
joe jobs, aka forgeries), at least in their present forms in the SMTP
world.

It also appears to rule out IM2000, or at least a whole class of
potential designs for it.

(To avoid this problem in IM2000, only mail stores would be allowed to
send message notifications, and a notification would be required to
point directly back to the same mail store -- not at some other store
-- for retrieval to work at all. That not only implies that mail
stores *can* be identified in notifications by IP address; it
basically *requires* such identification, rather than use of domain
names, along with the notion that stores are long-lived at a given IP
address, or long-enough lived so that a response to a notification can
*assume* the destination of the response is the IP address from which
the notification originally emanated. In essence, an IM2000 message
notification might as well be in the form of a TCP connection to a
recipient's agent via a protocol that allows that recipient to say "go
ahead, send the message" or "I'll fetch it later by initiating a TCP
connection to this same IP address". The former is basically SMTP;
the latter would be new in IM2000; neither includes the sort of
inherent anti-forgery capability so many, such as yourself, expect
from IM2000.)

Once you accept that recipients should be, by *design*, prevented from
implementing policies that inflict collateral damage on innocents,
you've pretty much exhausted your ability to create a design that also
practically prevents a recipient from making *other* choices,
regarding filtering and prioritization, to which you object.

And since recipients ultimately are the ruling class, in that they'll
drive the decision as to whether to move away from SMTP to some new
system *and* what that system will be, you'll find that, to convince
recipients to choose a system that *inherently* rules out choosing
policies that inflict collateral damage on the Internet, it's best to
give them *maximum* freedom to do whatever else they want, regardless
of whether that includes choosing what you describe as "very bad
policies".

(In general, it's inconceivable to me that anyone will be able to
deploy a widely-used mail system that keeps recipients from filtering
or prioritizing based on RBLs, DULs, and the like. But I know plenty
of intelligent people who would call these "very bad policies"; I'm
not crazy about blocking based on DULs, since I'm one myself!)
Post by Brian Candler
The trouble is that the mail system purports to give some information about
a message (the sender/origin), when in fact that is NOT information, it's
mere heresay.
Ah, here's a wonderful attribute of my new proposal: there is *no*
requirement that a sender provide any equivalent to "MAIL FROM:".

So, right out of the box, recipients don't have any envelope sender
address to assume they should, or have to, look up. Anonymous emails
are *inherently* supported by it, and recipients can easily choose to
accept or reject them.

Obviously, if a sender wants the equivalent of bounces/DSNs sent to an
email address, it could provide such an address under the new
protocol. It'd be up to the recipient to decide whether to pay any
attention to that, and the recipient could certainly choose to send
*all* tracking responses to that address, including one that means
"could you verify whether you sent msg #dsahj43fds89?". Instead of an
email address, a URL for some other notification mechanism could be
provided by the sender.
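
(Here's a minimal Python sketch of what such an optional envelope could
look like; the field names and the build_envelope helper are purely
illustrative, not part of any actual specification:)

    # Hypothetical envelope for the proposed protocol: the sender MAY
    # include a notification address or URL; the recipient MAY ignore it.
    def build_envelope(recipients, notify_addr=None, notify_url=None):
        envelope = {"recipients": recipients}      # only recipients are required
        if notify_addr:
            envelope["notify-addr"] = notify_addr  # e.g. for DSN-like responses
        if notify_url:
            envelope["notify-url"] = notify_url    # some other notification channel
        return envelope

    # Anonymous mail: no sender identification at all.
    anon = build_envelope(["bob@example.org"])

    # Identified mail: the sender offers an address the recipient MAY
    # choose to validate, or to send tracking responses to.
    ident = build_envelope(["bob@example.org"], notify_addr="alice@example.net")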

The key thing is to *allow* a sender to identify herself via an
envelope sender address without necessarily requesting bounces/DSNs
(or any other change to *exchange* protocols for the message), so a
recipient MA can *choose* to validate that address on behalf of its
user. (As you noted, this address might have no relationship to
whatever is represented, in the content of the message itself, as the
"sender" or "source" of the message. The MA and user have to be smart
enough to deal with that.)

(Conversely, a sender should be able to request the equivalent of
bounces/DSNs via some channel *other* than email. That's not
particularly pertinent to this discussion, except it's consistent with
my view that any *new* email system must be *extremely* flexible and
efficient from the get-go, or it won't be adopted to any useful
degree. Most people contemplating designing such things seem doomed
to adopt a "straitjacket" mentality, as in "recipients must always
..." and "senders cannot ...". I don't think that works or is
necessary anymore, even assuming it did/was during the days when RFCs
821 and 822 were written. My proposal is mainly about specifying a
*language* for communicating about exchanging emails, so any
straitjacketing and other restrictions are typically *communicated*
about *within* that language and acted upon by the actors who are
doing the communicating.)
Post by Brian Candler
So more important to me is that any "information" which a receiving site
might use to validate or reject a message, is correct.
That problem cannot be solved in general, as you've pointed out,
without a human being who receives the message being involved (and
smart enough to know to, and how to, validate and authenticate a
message).

Not all messages require such authentication; therefore, the email
exchange system shouldn't require it. Any email *system* that *does* require
such authentication on all incoming messages is, IMO, doomed from the
start as a replacement for SMTP; it'll either be horribly inefficient
soon after it sufficiently encompasses the Internet to actually work,
or its anti-forgery mechanisms will quickly be defeated by spammers,
leaving us with a less-efficient email exchange mechanism than SMTP
and, yet, receiving tons of spam anyway.
Post by Brian Candler
Now, even if your proposal removes the idea of a sender address and bounces
altogether, what you have instead is that the whole message headers and body
may be received before you analyse whether to accept or reject the message.
Right. Or, at least, *some* of the message. A recipient's server
could simply decide to close the incoming connection after seeing the
first 500 bytes of a message, for example.

Such a decision can be practically made within the scope of IM2000,
and made even more practical by allowing a recipient to request
message contents starting at a given byte (one beyond the byte
previously read and presumably saved away somewhere), to save
bandwidth.
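
(A rough Python sketch of that resume-from-offset idea; the wire format
and the fetch_from helper are invented here purely for illustration:)

    import socket

    # Hypothetical: ask the sender's agent for message contents starting
    # at a given byte offset, so bytes already seen (and saved) aren't resent.
    def fetch_from(host, port, msg_id, offset, limit=500):
        with socket.create_connection((host, port)) as sock:
            request = "FETCH %s FROM %d\r\n" % (msg_id, offset)
            sock.sendall(request.encode("ascii"))
            data = sock.recv(limit)   # the recipient may stop reading at any point
        return data                   # ...and simply close the connection

    # First look at the opening 500 bytes; later, resume where we left off.
    head = fetch_from("sender.example.net", 10025, "dsahj43fds89", 0)
    rest = fetch_from("sender.example.net", 10025, "dsahj43fds89", len(head))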

SMTP servers can do the same thing, but that prevents them from
providing any definitive response to the upstream client, which will
normally retry the delivery later on. If the server is going to
ecrulist or greylist the message anyway, then that's okay, but it'll
still have to allow for the entire message contents to be redelivered
if that's what it wants to allow later on.

(If it doesn't, it probably has to wait to see enough of the message
to be sure it's the same one before it disconnects again...and go
through this again and again...until the sender gives up or it decides
to wait for the entire contents and return a 5xy code.)

But the RFCs discourage the idea that SMTP servers have complete
freedom with regard to how to treat the incoming connection, so
there'd be all sorts of heated arguments back and forth about whether
servers that "prematurely" closed an incoming connection after
receiving enough of a message to make a decision to ecrulist/greylist
it (such as the "Message-ID:" header or "Received:" headers) were
"conformant".

Under my design, recipients would be explicitly allowed to close an
incoming connection at any time, without having to provide any
explanation or response. It allows tracking requests to be sent and
replied to *after*, and out of band with respect to, the transmission
of message contents. (IM2000 is similarly helpful in giving recipients
more flexibility. A request for the contents of a message is distinct
from, and thus out of band with respect to, anything else a recipient
agent wants to do with or request concerning the message. So it can
simply close a TCP connection it opens to read a message once it sees
enough of a message to make that decision.)
Post by Brian Candler
That means, for example, that people will do the same sort of stupid
filtering on the From: header that they used to do on the MAIL FROM
envelope. And of course, forged From: headers have another particular
problem with "phishing"-type attacks.
Under my *new* proposal, message content is completely unspecified by
the transport protocol. That makes it clear to those authoring
recipient-side software that they can't assume "From:" in the content
means *anything*, if it's even present.

Of course, content can be inferred anyway, based on
upstream/downstream knowledge, and inspected by the recipient's MA
during the transmission to make transmission-time decisions, if
desired.
Post by Brian Candler
So I think it's essential that any proposal include ways to inherently
validate this information.
I think exactly the *opposite*: that any proposal for exchanging
*email* must *not* assume that validating or authenticating sender
identity is inherent for all email exchange.

However, we might be in agreement, if by "include ways to inherently"
you mean "offer standardized methods for recipients to optionally
attempt to".
Post by Brian Candler
IM2000 does because you *have* to use the sender
identity to collect the mail; if it's forged, mail collection doesn't work.
We've covered that; that's some combination of false and impractical.
(If the implementation doesn't inherently prevent delivery of forged
email, then it imposes a potentially-unacceptable degree of collateral
damage on third, or joe-jobbed, parties, without senders of UBE
themselves being necessarily prevented from getting their email read
during such an attack.)
Post by Brian Candler
Something like DomainKeys on top of SMTP tries to do the same, although it's
heavyweight and take-up of anything relying on public-key cryptography has
historically been poor.
So, same problem. New protocols should at least avoid making
something like DK any *harder* than it is on top of SMTP, however.
Post by Brian Candler
Post by James Craig Burley
As the expense of ordinary exchange of email drops across the board,
would it drop very much, compared to SMTP? The bandwidth and disk space for
receiving mail are the same, so you just lose the cost of being required to
send bounces after a message has been accepted.
The main thing is that email becomes more feasible on a point-to-point
basis, so use of third-party relays (including smarthosts,
i.e. senders' ISP's relays) is less necessary.

Further, such relays needn't actually *store* email they relay, if
they are reasonably certain, for a given message, that either the
recipient, having seen it once, will reliably not lose it, or the
sender, not being a mobile laptop, will be online and thus able to
resend it if it *is* actually lost.

A "smarthost" in such a system could in fact be, for most messages,
quite "dumb", simply forwarding all incoming transmissions on the fly,
and doing so for all tracking requests, when recipients are able to
directly respond to the upstream senders.

Most small businesses, even those with mobile users, would probably
use this sort of setup, where the main job of a "smarthost", or
Internet-facing server (the MX), would be little more than knowing where
a given user currently is, IP-wise, and forwarding incoming
transmissions pertaining to that user directly to that user's MUA
(which is listening on a particular port). (It would presumably need
a local DB to associate IDs with envelope recipients, unless the
protocol allowed it to tell senders to always "remind" it who the
envelope recipient for each message was -- which it probably should.)
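
(To make the "dumb smarthost" idea concrete, here is a minimal Python
sketch of that lookup-and-forward step; the table and function names are
hypothetical:)

    # The Internet-facing MX does little more than map a local user to the
    # host/port where that user's MUA is currently listening, and relay
    # incoming transmissions there on the fly.
    current_location = {
        "alice": ("10.1.2.3", 11025),    # laptop on the office LAN right now
        "bob":   ("10.1.2.77", 11025),
    }

    def forward(envelope_recipient, raw_transmission, send):
        user = envelope_recipient.split("@")[0]
        try:
            host, port = current_location[user]
        except KeyError:
            return False                     # unknown user: drop, no bounce needed
        send(host, port, raw_transmission)   # relay the bytes as-is, storing nothing
        return True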

Providing NAT-style opacity is a step up from this simplicity; here,
recipients respond not directly to the sender, but to the smarthost,
which forwards such responses back to the sender.

But I believe that, in general, moving *away* from requiring
smarthosts for most message exchange is best. It leaves whitelisting,
blacklisting, content filtering, and most related decisions directly
in the "hands" of the recipient's MUA (and the recipient herself),
plus, to a greater degree than at present, leaves some control in the
hands of the sender's MUA (and the sender himself).

Still, "large" ISPs, like AOL and Comcast, will undoubtedly stick to
the "huge smarthost" model for quite some time, and my proposal is
designed to work well in that sort of environment as well.
Post by Brian Candler
Of course, most *recipient* E-mail systems avoid this already, by validating
the recipient at RCPT TO time and sending a 5xx response if the mailbox does
not exist or is otherwise unavailable. So the main cost here is at the
sending mail relays, who may be forced to try to return bounces to
(possibly) non-existent senders.
Yes, but the overall cost to the entire infrastructure of supporting
bounces is still way too high, because plenty of sites are *not* able
to avoid sending joe-job bounces all over the place.
Post by Brian Candler
Eliminating that may reduce the cost of building outbound mail relays, a
bit.
Some sites have been *drowned* in bounce handling, especially in ways
that a system like mine would not have burdened nearly so much.

I do agree that, on the whole, there appears to be an across-the-board
effort to reduce generation of bounces.

But that effort will run up against some pretty hard boundaries down
the road, because of the inflexibility of SMTP.

(Think of my proposal as SMTP with more flexibility and efficiency.
That's basically what it is: pretty much anything SMTP can do, my
system can do at least as reliably and efficiently, if not more so.
That's a design goal. And while recipients generally control the
degree to which SMTP-like behavior is implemented by them, senders
have some say in the desirability of that, e.g. whether to provide
the equivalent of an envelope sender address and/or to request
bounces/DSNs.)
Post by Brian Candler
Post by James Craig Burley
that frees up capital for other things, such as improved anti-UBE
measures, improved vetting of new users by ISPs to be sure they
haven't previously engaged in spamming, etc.
Ah, but if *I* save some money at my ISP, that doesn't free up money for
*other people's* ISPs to invest in controlling their users, unfortunately.
No, but it still frees up money to more precisely determine *which*
users at those ISPs you might decide to trust and thus accept email
from more quickly.

Whether you think that's significant, I don't know, but, previously,
you were saying you had concerns about the costs to recipient ISPs
regarding various approaches to handling email.

With IM2000, I have *major* concerns about the costs to *sender* ISPs
(well, mailstores generally), which apparently have to hold onto
outgoing messages for, potentially, weeks, months, or even *years*,
even if recipients have already read and acted on them.

(They might not tell their MUA to "unpin" them. Probably 50% or more
of the IM2000 mail-reading audience would simply never click "unpin".
It'd either be a default -- meaning UBE would be quickly unpinned as
well -- or mailstores would have to have huge storage capacities.)

And with IM2000, it isn't just the cost of storage that worries me;
it's the cost of 24x7 availability of, and near-zero-latency access
to, that storage.

I've tried to find a better way, and I believe leaving *original*
sender MUAs -- not third-party mailstores -- responsible for holding
onto message contents until recipients explicitly accept them is best,
since *most* senders probably want to archive their outgoing email
anyway.

And by assuming that contents are usually sent along with
notifications -- so it is typically only *responsibility* that is not
immediately transferred -- sender MUAs aren't expected to be available
24x7 or to offer near-zero-latency access.

But sender MUAs are more directly burdened by this model than under
SMTP (though senders, via mailstores, don't have to have DNS pointing
back at them to send email). Recipient MAs are *less* directly
burdened, in terms of storage and response-time requirements.

I think that strikes a reasonable and simple middle ground between
SMTP and IM2000.

To convince me that IM2000 is still the best bet, you won't get very
far by complaining about how much *more* expensive my system will be
for senders *or* recipients to support, since IM2000 is even *more*
expensive.

However, you might be right that, when taking into account the
deployment costs, the economics for sending legit email, and the
economics for sending UBE, my proposal isn't any better than SMTP, and
yet IM2000 somehow is (because IM2000 will inherently resist UBE much
better than my proposal).

I'm less and less convinced that's the case, the more I think about
it, discuss it, and see others (even IM2000 proponents) discuss it.

But I'm always on the lookout for that one "gotcha!" that might
afflict my proposal, because I sure don't want to waste any more time
designing or implementing it, or even thinking about it, if there's a
"gotcha!" down that path somewhere!
Post by Brian Candler
Post by James Craig Burley
Since my proposal obviates the need for bounces, as does IM2000, joe
jobs are eliminated, and each host's capacity for handling incoming and
outgoing email can be specced based on *actual* expectations for
*legitimate* outgoing email plus legitimate incoming email plus a
certain amount of incoming UBE.
Joe-jobs are a small subset of the problem, IMO. If that's your main
concern, then you can fix that using SES/SRS/BATV today.
Of those, only BATV promises an improvement without adding a whole new
external infrastructure (in DNS, mainly), and it only helps partially.
All it helps with is avoiding the extra energy to *store* an incoming
bounce; it doesn't help avoid it being *sent* in the first place, at
least as a message notification.

The others require a whole new infrastructure, and they don't solve
the problem of errant bounces being received from sites that don't
implement them.

And *none* of the proposals (except possibly IM2000) solve the problem
of *genuine* downstream failures resulting in:

- expensive (but inconsistently implemented) bounces being delivered

- bounces being lost because of problems sending them back

- senders not being sure messages were received until they can
reasonably conclude they'll no longer receive a bounce (maybe as
much as two or more *weeks*)

So, not only are joe-jobs not my main concern, SES/SRS/BATV don't
really promise to "fix" them until there's an across-the-board
implementation (well, of SES/SRS anyway, at which point BATV is
probably not strictly needed), and the general problem of bounces
being not what people want these days *anyway* remains.

I do think that if we could be sure we would someday get back to the
situation where over 99% of all *sent* email was actually *desired* by
each recipient, bounces would return to their status of being a
reasonable substitute for tracking outgoing email.

I'm not sure there's any way to get from here to there, other than to
assume email will cease to be widely used, and return to being used by
a smallish group of cooperative, like-minded users, as it was in the
1970s and 1980s. (Hey, maybe that'll describe our planet's entire
population someday...? ;-)
Post by Brian Candler
It would help if
there were a common agreed standardised format though, so that things like
mailing list software could recognise an encoded sender.
Well, there's an inherent set of improvements we would all
theoretically realize by designing a new exchange protocol from
scratch, one that avoids only the agreed-upon bodges in SMTP (RFCs
2821, 2822, etc.), but otherwise introduces, or at least requires, no
"new" twists, a la IM2000 or my proposal.

But that costs so much $$ to do, the question is, would the
improvements be worthwhile.

So, my (or any) proposal must be that much *better* than a
theoretically "clean" replacement for SMTP, in order to make it worth
rolling out. That theoretical replacement is something I keep in mind
as my "baseline" for analyzing IM2000 and for architecting and
designing my own protocol(s).

If that baseline can't be substantially improved upon, we might as
well just re-do what we already understand, but do it *right* this
time -- that is, implement the baseline! (Okay, technically, today's
SMTP is the baseline, but since deployment costs are now so huge
thanks to the popularity of SMTP implementations worldwide, it is
easier for me to think in terms of a *fresh* implementation that
replaces SMTP as a baseline for any other protocol that also would
require a worldwide deployment.)

I *think* my proposal is sufficiently close to the baseline to be
reasonably sure it won't be a failure, and, to the extent it differs
from SMTP in conception, it's a big-enough improvement to *possibly*
justify rollout.

Even if not -- and it probably *is* not enough of an improvement -- it
might be worth implementing as a sort of under-the-radar, low-overhead
way for sysadmins and other "gurus" to converse.

In that case, I might as well just design and implement the full-blown
network OS, or GUIXP as I called it earlier, since mere email is not
*that* useful for such communication, as it's too limited, and SMTP
works pretty well for sysadmin<->sysadmin conversation,
notwithstanding the nonparticipation of people like Knuth and JdBP
(who, IMO, *could* just run SMTP servers on unique port numbers and
publish those suitably, so the rest of us can email them without their
being flooded with spam, but hey ;-).
Post by Brian Candler
Post by James Craig Burley
(False positives aren't a problem with my proposal compared to SMTP,
since a message is, presumably, never fully accepted and *then*
discarded as spam, unless that's what a recipient *wants* to do. SMTP
servers accept and silently drop messages all the time these days.)
But with tons of spam hitting an inbox, how are you going to stop people
becoming overwhelmed with it - your problem (2) above? Surely people will
*demand* automatic filtering, and that in turn will involve accepting and
silently discarding mail as people do now? In my definition, automatically
receiving a mail into a 'spam' folder counts as 'silently discarding', since
people rarely look at such folders, and valid messages which end up there
are almost certainly lost anyway.
Problem (2) isn't, ultimately, a mail *transport* problem. It's a
problem for the recipient to filter and prioritize her incoming email,
which requires having all sorts of (pertinent) criteria upon which to
do so.

Realize that I'm not advocating tracking as, first and foremost, a
*prioritization* criterion. Its main purpose is to provide what's
functionally *missing* in SMTP -- tracking -- and also *flawed* about
SMTP (bounces, and inflexibilities and inefficiencies in the protocol
itself).

Generally, tracking of some sort is a reasonable way to realize the
end-to-end principle in email exchange, and once one starts designing
a protocol assuming that principle "rules", all sorts of wonderful
things (mainly, simplifications) happen, as I'm discovering as I
design this new protocol.

So, once you *assume* tracking as a basic property of a mail-exchange
protocol, then it so happens you have yet another criterion for
recipients to use to filter and prioritize their incoming email, since
senders of UBE will *tend* to not want to track their bazillions of
pieces of outgoing email the same way senders of legit email will.

Of course, with tracking instead of bounces, the costs of people
"accepting and silently discarding mail", as you acknowledge they do
now (and I assume they'll *always* do), are much lower: no *actual*
false positives ever occur, so there's no need to generate bounces
to avoid them. This is one of those "wonderful things" that results
from obeying the end-to-end principle in this protocol.

(By "actual false positives", I mean cases where an intermediate
agent, such as an MTA or MUA, has decided to silently drop an email
that it, or some other MTA/MUA speaking on its behalf, promised the
sender it would deliver. SMTP puts a lot of pressure on an SMTP
server to make such a promise -- the costs of *not* doing so are high
for many reasons -- so they often make that promise. Then, if a
downstream filter decides to not deliver the message, or just dumps it
into a rarely-if-ever-read "spam folder", the sender has no clue the
message has been effectively but falsely discarded, unless a bounce is
sent. At *that* point, anti-forgery mechanisms could be used to
determine whether to actually send such a bounce, if they consistently
work that much later than the original delivery to the first SMTP
server -- but many proposed mechanisms don't. Besides, failure to
authenticate a source is *not* a reliable indicator of a forged email
-- it might be due to DNS problems, for example -- which means a false
positive designating *content* as Unsolicited could be followed by a
false positive designating *sender identity* as potentially forged,
leading to an actual false positive overall: legit email dropped with
no bounce saying so. A "nonactual false positive" is where a legit
email is dropped, but without the sender ever being promised it
wouldn't be dropped.)

You see, in a world where over 99% of outgoing emails are reasonably
expected to be desired and read by recipients, yet where
interconnection is neither 100% available nor nearly zero-latency, a
system like SMTP makes a lot of sense, because messages might need a
series of "hops" to get from point A to point B, but once it reaches
point B and gets into a mailbox, the end user will likely want to read
it.

So, there's no real need to track outgoing messages, since the vast
majority of them will be desired and will reach their destination,
thanks to the best efforts of cooperating SMTP relays. All the sender
needs to know is that the message was successfully injected into the
first component (SMTP relay, or local queue that reliably talks to
downstream relays).

What a sender wants, in that world, is a notification when there's an
actual *failure*, ideally including the entire original message, in
case the sender didn't have enough storage to keep it around. (And
why should the sender keep it around, since the recipient will likely
want to read it, so each relay will do its best to assure it is
successfully transported even in the presence of a system crash?)
Hence the bounce/DSN concept and the assumption that responsibility
for message delivery immediately precedes and usually follows
transmission of message content.

That was email circa the 1970s and 1980s.

We now live in another kind of world, where a very small % of outgoing
emails are reasonably expected to be desired and read by recipients,
thanks to the huge amount of UBE being sent.

IM2000 recognizes that reality and tilts the economics against the
senders (of lots of *distinct* messages, anyway), but it assumes
interconnection is nearly 100% available and zero-latency in order to
make recipients happy with the responsiveness of their MUAs when
reading email.

I believe IM2000's assumptions about interconnection are false, and
will be for decades, so I suggest we stick with SMTP's push-oriented,
relaying concept.

But, in light of the fact that most messages aren't *wanted*, we shift
the burden of "discovering" whether a message is wanted -- from the
*recipient* (who presently bears that burden in the form of sending
bounces or, if quickly able to do so, sending 5xy responses during
SMTP conversations) to the *sender* (who would, under my proposal,
discover whether messages are really being accepted downstream by
occasionally sending tracking requests).

To the extent recipient systems are burdened with sending bounces
*today*, they *should* be less burdened receiving tracking requests
under my proposal. Add to that the improvements of quality of service
-- no actual false positives and no misdirected bounces causing
collateral damage -- and it seems like a potential winner.

(You might be right to say that bounces are less of a problem now.
Well, how is that possible? Either SMTP servers are more quickly
rejecting unwanted email at the door or they are accepting
responsibility for messages that are subsequently dropped. Under my
proposal, a recipient MA can also reject unwanted email at the door,
either during the initial TCP session or in response to the first
tracking request that arrives, usually within a minute or so, and with
less worry about duplicate deliveries a la SMTP's DATA phase
response-time issue. And if it does accept a message *without*
accepting responsibility, something SMTP doesn't really do right now,
a recipient MA can, under my proposal, subsequently drop that message
without resulting in an actual false positive, since the sender won't
have been promised that the message was, or would be, delivered.)

Some of what I'm advocating could be viewed as little more than
allowing an SMTP server more time to respond (to, especially, the DATA
phase) with a 5xy code, plus the ability for the server to respond
with a whole new code class, a 6xy code, meaning "connect to me again
later, remind me what message you're talking about, and maybe I'll
have an answer then". The "remind me" function would take a message
or transmission ID, not the entire message contents.
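
(Here's a rough Python sketch of how a sending agent might treat such a
hypothetical 6xy class; the codes, helper names, and retry schedule are
illustrative only:)

    import time

    # Hypothetical reply handling: 2xy = accepted, 5xy = rejected,
    # 6xy = "ask me again later, quoting the transmission ID".
    def deliver(send_message, query_status, message, retry_delays=(60, 600, 86400)):
        code, xmit_id = send_message(message)
        if code.startswith("2"):
            return "accepted"
        if code.startswith("5"):
            return "rejected"
        for delay in retry_delays:        # 6xy: poll with the ID, not the contents
            time.sleep(delay)             # real code would schedule, not block
            code = query_status(xmit_id)
            if code.startswith("2"):
                return "accepted"
            if code.startswith("5"):
                return "rejected"
        return "unknown"                  # the sender decides what to do next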

Since a lot of the difficulty with making SMTP resistant to UBE,
including doing content analysis, anti-forgery checking, and so on,
has to do with having to come up with a definite decision within a
short window of opportunity -- a window made *artificially* short by
the SMTP protocol, which places unnecessary burdens on the recipient,
insofar as they could be placed on the sender -- my proposal is really
just the outgrowth of my considering all the ways SMTP could, or
should, be improved or redesigned to give the *recipient* lots more
flexibility in deciding whether, how, and when to respond to an
incoming message.

The rest of my proposal assumes it'd be nice to give recipient *users*
more direct control over the kinds of things that can normally be done
only by upstream SMTP *servers* (such as reject with 5xy, which
recipients cannot, at present, do), and that it'd be nice to give
sending *users* more visibility regarding what's going on and more
control in determining just when and how message status is obtained.

In particular, what the legitimate message-sending community would, I
think, passionately *love* about my system is the fact that, much of
the time, they'd have a fairly quick positive response that any given
message they cared about had indeed been received and read, without
their having to wait for the recipient to finish composing a reply.

(I certainly love that about TXT messaging via my cell. That "message
delivered" notification is wonderful, because, when I don't get it, I
can decide for myself whether to try a direct phone call to the same
cell #, to another number where I know the person might be, etc. But
I don't know whether that's implemented by an outgoing poll, a la
tracking, or an incoming notification, a la DSN, though it "feels"
more like the latter. The latter is much more reasonable in a
monolithic communications system like The Telephone Company than in a
heterogeneous environment like the Internet, though, under a totally
new protocol, it can be made much more reliable and inexpensive than
SMTP's DSNs.)

Is it worth trying such a system out? Well, the "easy" way to do so
and gain some real-world experience with it is to try ecrulisting,
which implies some modest (at least) changes to an SMTP server and
some more substantial changes to downstream components (including
MUAs) that want to directly take advantage of ecrulisting.

Is IM2000 worth trying out? Well, the "easy" way to do *that* is to
implement something like it via SMTP, by having all your (site's?)
outgoing email be munged such that the contents are put up on a web
site you control (complete with an "unpin" button on each page) and
the actual outgoing email has little more than a URL to the page (plus
some kind of "X-IM2000:" header to make it easier for experimental
MUAs to handle things automatically), so recipients gain experience
with whatever delays and outages would be inherent in IM2000, along
with not having to find their mailboxes stuffed with tons of UBE
*content* (just, theoretically, with notifications pointing to UBE
*websites*, assuming spammers decide to jump on that experimental
bandwagon).
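
(A quick Python sketch of that munging step, using the standard
library's email package; the "X-IM2000:" header is the experimental one
described above, everything else -- names, URL -- is invented:)

    from email.message import EmailMessage

    # Sketch: the original contents have already been published at page_url
    # (a page we control, complete with an "unpin" button); the outgoing
    # email carries little more than that URL.
    def im2000_style_notice(page_url):
        msg = EmailMessage()
        msg["Subject"] = "[IM2000 experiment] message waiting"
        msg["X-IM2000"] = page_url       # hint for experimental MUAs
        msg.set_content("The message body is stored at:\n\n    " + page_url + "\n")
        return msg

    notice = im2000_style_notice("https://mail.example.net/m/dsahj43fds89")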

Both approaches can be tried, and can be restricted so they "engage"
only when talking to certain sites on the other end, though
ecrulisting is less likely to be disruptive (it doesn't require any
real changes to *external* end-user behavior, except to accommodate
broken, but legitimate, SMTP clients).
--
James Craig Burley
Software Craftsperson
<http://www.jcb-sc.com>
Brian Candler
2005-05-09 13:25:29 UTC
Permalink
Post by James Craig Burley
Post by Brian Candler
There's no point having another button which says "send but don't
try very hard". Why would I ever push that button when I could push "send"
to have a better chance of my message being delivered?
If the default is like SMTP, the default "Send" button would probably
do little more than transmit the message and then send one tracking
request about a minute later, another maybe 10 minutes later, a third
maybe 24 hours later, and a fourth about a week or two later, as
necessary.
And if, in practice, that is not good enough to ensure successful delivery
in the vast majority of cases, then senders will choose more aggressive
parameters.
Post by James Craig Burley
Spammers will naturally keep trying to gin up new identities, but the
new system is more easily able to "punish" that, without necessarily
triggering a lot of false positives, by expecting any *legitimate*
messages that come from unknown sources to be reasonably and
frequently tracked by their sender.
And assuming that they do, since they desire to get their spam through, what
have we gained? You assert that this tracking is likely to be too expensive
for them. I assert that it is trivial, and demonstrated how to write a
program to do it for many millions of recipients.
Post by James Craig Burley
I get the impression you don't yet have a *coherent* view of what an
email *system* should do.
My view has always been:
- spam is anything which originates from spammers
- if we can quickly and unambiguously recognise new spam sources, before much
of it has been delivered, then we can block it before it causes much of a
nuisance, and also make it so ineffective as to be not worth sending in
the first place
- if we can identify the sender accurately and quickly, this would help
enforce anti-spamming laws

What should the E-mail system do? It should have an *unforgeable* identity
in each incoming message; and we should be able to tell where that identity
originated from, so that a place where large numbers of identities are
created for spammers can be treated as a single identity.

I'm afraid I don't see many other approaches. Trying to identify patterns of
behaviour which characterise spammers from non-spammers is doomed, because
spammers can *easily* mimic the behaviour of non-spammers. This is what they
did when people started filtering on non-existent MAIL FROM:<...> domains,
for example.

The one thing which spammers can't mimic is the fact that they send spam -
vast quantities of untargeted mail. If you can detect this reliably,
without harming those people who have legitimate reasons for sending large
quantities of mail, then you have a good solution against spam. Sooner or
later a human element and a trusted third party need to be involved, simply
to share the work of classifying new sources. (You see such TTPs as
potential points of attack - as indeed they are - but I believe such systems
are capable of being built in such a way as to defend themselves).

Current solutions such as RBLs and DCC attempt to implement exactly this,
with some degree of success, but with an unfortunate amount of collateral
damage simply because SMTP doesn't give them a sufficiently fine-grained
view of the sender identity, particularly when mail goes via shared relays.
And because they are not sufficiently good at their job, people implement
other types of filtering too, which often block legitimate mail.

Now, that's assuming you want to be able to receive mails from people you
don't know. The only other solution I can see is to take the view that
everyone you don't know is suspect. This also requires an unforgeable sender
identity - so you can create whitelists of your friends. Anyone who is not
known, needs to be given a sufficiently difficult thing to do to get through
to you - whether that be pay a 5c e-stamp, or spend 5 x 1GHz CPU-seconds
calculating a cryptographic problem. This could be anything which a single
person sending you mail could do at low expense, but for which sending a
million mails would become prohibitively expensive. (It may not be
prohibitively expensive if spammers *targeted* their lists, but then it
wouldn't be spam any more)

But the extra work involved by the spammer has to be quantifiable, and
controllable (so that over time, the "cost" can be increased proportionately
as required by improvements in CPU speed, devaluation of the cent, or
whatever). An arbitrary constant constraint like "must poll for message a few
times in a certain period" doesn't really cut it.

Regards,

Brian.
James Craig Burley
2005-05-09 18:12:57 UTC
Permalink
Post by Brian Candler
Post by James Craig Burley
Post by Brian Candler
There's no point having another button which says "send but don't
try very hard". Why would I ever push that button when I could push "send"
to have a better chance of my message being delivered?
If the default is like SMTP, the default "Send" button would probably
do little more than transmit the message and then send one tracking
request about a minute later, another maybe 10 minutes later, a third
maybe 24 hours later, and a fourth about a week or two later, as
necessary.
And if, in practice, that is not good enough to ensure successful delivery
in the vast majority of cases, then senders will choose more aggressive
parameters.
Yes, assuming their messages are sufficiently important for them. The
more outgoing messages they have, the more expensive those aggressive
parameters will be for them.
Post by Brian Candler
Post by James Craig Burley
Spammers will naturally keep trying to gin up new identities, but the
new system is more easily able to "punish" that, without necessarily
triggering a lot of false positives, by expecting any *legitimate*
messages that come from unknown sources to be reasonably and
frequently tracked by their sender.
And assuming that they do, since they desire to get their spam through, what
have we gained? You assert that this tracking is likely to be too expensive
for them. I assert that it is trivial, and demonstrated how to write a
program to do it for many millions of recipients.
Why write a program, when simply using whatever vanilla software
implements the new protocol will work just fine?

Again, if you think that sending an average four tracking requests per
outgoing message for, say, over one million messages sent by a single
0wned box in a given day, and doing so over a period of as much as one
week, is something spammers can afford to do, great, that's your
opinion.
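
(For concreteness, here's what those assumed numbers work out to, as a
small Python calculation -- the figures are just the ones stated above:)

    # 1,000,000 messages sent from one 0wned box in a day, roughly four
    # tracking requests each, spread over up to a week of follow-up.
    messages_per_day = 1_000_000
    requests_per_msg = 4
    followup_seconds = 7 * 24 * 3600

    total_requests = messages_per_day * requests_per_msg
    print(total_requests)                     # 4,000,000 tracking requests
    print(total_requests / followup_seconds)  # roughly 6.6 requests/second, sustained
    # ...plus the state needed to remember a million outstanding messages
    # per day for the duration of that week.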

But that's *more* than they have to do *now* to send SMTP email (all
else being equal) and probably more than they'd have to do under
IM2000 (as generally described), since the vast majority of their
message notifications would be ignored by recipients anyway.

Under my proposal, there's no question the "average" user would have
little trouble sending hundreds, even thousands, of pieces of outgoing
UBE without running into resource limits. But spammers won't find
email to be such a worthwhile protocol to advertise their goods at
those rates of usage, especially given an increasingly sophisticated
end-user audience (and more-intelligent MUAs).

But as the number goes up, they run into all sorts of trouble with
their outgoing bandwidth, potential upstream ISP limits, downstream
(recipient) limits, and so on.

Since any of those limits can result in dropped email, the best system
is one that *gracefully* allows email to be dropped. My proposal does
that, as does IM2000 -- but, unlike IM2000, without reducing the
reliability of ordinary email exchange.
Post by Brian Candler
Post by James Craig Burley
I get the impression you don't yet have a *coherent* view of what an
email *system* should do.
- spam is anything which originates from spammers
And who are spammers? People who send spam? That ends up being a
recursive, therefore useless, definition. It leads to problems
defining spam -- who is responsible for doing that, who has the right
to claim another person is a spammer, etc. (These are *real* problems
we have right *now*.)
Post by Brian Candler
- if we can quickly and unambiguously recognise new spam sources, before much
of it has been delivered, then we can block it before it causes much of a
nuisance, and also make it so ineffective as to be not worth sending in
the first place
If true, and if sufficient, that puts SMTP, IM2000, and my protocol on
an equal playing field, except that SMTP puts too much pressure on
relays and recipients to take responsibility for incoming email
*before* the source is recognized as sending too much spam -- though
ecrulisting addresses that (probably somewhat clumsily).
Post by Brian Candler
- if we can identify the sender accurately and quickly, this would help
enforce anti-spamming laws
We can do that *now*: it's the "owner" of the immediately upstream IP
address or, if that's an innocent relay that is reasonably trusted to
insert proper "Received:" headers, the owner found by recursing to the
next upstream IP address.

If that final owner says "well, I can't be sure which of my users or
customers actually sent that email", that's *her* problem, for which
*she* should be held responsible, since *she* is responsible for
filtering, authenticating, and/or reliably logging all outgoing
communications through her IP address for just this case.
Post by Brian Candler
What should the E-mail system do? It should have an *unforgeable* identity
in each incoming message; and we should be able to tell where that identity
originated from, so that a place where large numbers of identities are
created for spammers can be treated as a single identity.
Again, identity is orthogonal to email exchange. How do I know this?
Because sending anonymous emails makes perfect sense in all sorts of
situations, and because identity is important to *other* forms of
exchange.

If you don't want to accept anonymous email, your recipient MUA can
simply reject such email. Ditto for email with an identity it is
unable to confirm.

The email *system* is *not* the right (and obviously not the only)
place to "enforce identity". We've gone over this ground many ways
already.
Post by Brian Candler
I'm afraid I don't see many other approaches. Trying to identify patterns of
behaviour which characterise spammers from non-spammers is doomed, because
spammers can *easily* mimic the behaviour of non-spammers. This is what they
did when people started filtering on non-existent MAIL FROM:<...> domains,
for example.
Trying to identify spammers by "unforgeable identities" is doomed as
well, because there will always be new/changed/reformed people coming
online with no absolute guarantee that they'll be linkable to any
former online identities.
Post by Brian Candler
The one thing which spammers can't mimic is the fact that they send spam -
vast quantities of untargetted mail. If you can detect this reliably,
without harming those people who have legitimate reasons for sending large
quantities of mail, then you have a good solution against spam. Sooner or
later a human element and a trusted third party need to be involved, simply
to share the work of classifying new sources. (You see such TTPs as
potential points of attack - as indeed they are - but I believe such systems
are capable of being built in such a way as to defend themselves).
They *are* built that way *now* -- they're the "smarthosts" that more
and more sites require incoming email (from some portions of the
Internet) to arrive through.

If that's enough, then we have no need for any *new* email technology,
since we already have SMTP AUTH and other means for those TTPs to
ensure that they remain Trusted even as they cater to more and more
users.
Post by Brian Candler
Current solutions such as RBLs and DCC attempt to implement exactly this,
with some degree of success, but with an unfortunate amount of collateral
damage simply because SMTP doesn't give them a sufficiently fine-grained
view of the sender identity, particularly when mail goes via shared relays.
And because they are not sufficiently good at their job, people implement
other types of filtering too, which often block legitimate mail.
Yup.
Post by Brian Candler
Now, that's assuming you want to be able to receive mails from people you
don't know. The only other solution I can see is to take the view that
everyone you don't know is suspect.
Yes -- that's a solution that pretty much anyone can implement now,
via SMTP.
Post by Brian Candler
Anyone who is not
known, needs to be given a sufficiently difficult thing to do to get through
to you - whether that be pay a 5c e-stamp, or spend 5 x 1GHz CPU-seconds
calculating a cryptographic problem. This could be anything which a single
person sending you mail could do at low expenses, but for which sending a
million mails would become prohibitively expensive. (It may not be
prohibitively expensive if spammers *targetted* their lists, but then it
wouldn't be spam any more)
Right. If I designed a new protocol along the lines I'm thinking, I'd
try to accommodate this sort of response by a recipient. Given the
protocol I have in mind, it doesn't *seem* hard, but I don't know
enough about how hashcash or epostage actually would work to be sure.

I assume IM2000 would be extended, in the wild if not in its deployed
form, to support such options.
Post by Brian Candler
But the extra work involved by the spammer has to be quantifiable, and
controllable (so that over time, the "cost" can be increased proportionately
as required by improvements in CPU speed, devaluation of the cent, or
whatever). An arbitrary constant constraint like "must poll for message a few
times in a certain period" doesn't really cut it.
Yup. The system -- whatever it is -- *has* to be flexible, adaptable,
and yet at least as reliable as we have today.

The problem is, the reliability of SMTP email appears to be rapidly
dropping, and I believe I've addressed several key reasons why
already.
--
James Craig Burley
Software Craftsperson
<http://www.jcb-sc.com>
Sean Conner
2005-05-09 18:34:07 UTC
Permalink
Post by James Craig Burley
Why write a program, when simply using whatever vanilla software
implements the new protocol will work just fine?
Again, if you think that sending an average four tracking requests per
outgoing message for, say, over one million messages sent by a single
0wned box in a given day, and doing so over a period of as much as one
week, is something spammers can afford to do, great, that's your
opinion.
But that's *more* than they have to do *now* to send SMTP email (all
else being equal) and probably more than they'd have to do under
IM2000 (as generally described), since the vast majority of their
message notifications would be ignored by recipients anyway.
Under my proposal, there's no question the "average" user would have
little trouble sending hundreds, even thousands, of pieces of outgoing
UBE without running into resource limits. But spammers won't find
email to be such a worthwhile protocol to advertise their goods at
those rates of usage, especially given an increasingly sophisticated
end-user audience (and more-intelligent MUAs).
What about mailing lists? Say, a real popular one, like the Linux Kernel
Mailing list?

-spc (And what's with the tracking anyway?)
James Craig Burley
2005-05-09 21:28:51 UTC
Permalink
Post by Sean Conner
Post by James Craig Burley
Under my proposal, there's no question the "average" user would have
little trouble sending hundreds, even thousands, of pieces of outgoing
UBE without running into resource limits. But spammers won't find
email to be such a worthwhile protocol to advertise their goods at
those rates of usage, especially given an increasingly sophisticated
end-user audience (and more-intelligent MUAs).
What about mailing lists? Say, a real popular one, like the Linux Kernel
Mailing list?
It's hard to predict whether the server that runs LKML would see more
or less pain under my proposal.

Let's assume a straightforward/naive implementation on both ends.
That is, LKML server accepts incoming email to list (however it does
that today -- same filtering, etc.) and sends it right out to all the
members on the list. Recipients accept incoming email from LKML
server just as they do today -- with whatever filtering they might
have in place, though, presumably, many would whitelist that server
(as they probably do today) -- except they have the additional ability
to respond indefinitely (or not at all) to the incoming email.

So far, that's within a small scalar factor of today's system in terms of
efficiency. (That is, it might be more efficient than today's system,
but probably not enough to justify deployment costs alone. A system
that is inherently more resistant to UBE is the Holy Grail, especially
on this list, since it makes *everything* better.)

In cases where recipients whitelist the server, they respond to LKML's
transmission with "I've accepted responsibility", so there's no need
for LKML to track or worry further. That's only a little better than
SMTP can do today -- again, because of simple protocol speedups.

LKML then sends a tracking request out to each of the remaining
recipients about a minute later. Assuming they work pretty much as
they do today, they'll have committed the messages to disk, done any
preliminary O(N)-type scanning of content, and accepted the messages,
so they'll respond "I've accepted responsibility".

Only two more tracking requests per remaining recipient would be sent,
assuming the recipients didn't respond. The LKML server would,
presumably, treat inadequate response by any recipient after the final
tracking request as it would if it received a bounce in today's
system.
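
(A minimal Python sketch of that list-server bookkeeping; the schedule
and helper names are made up to match the description above:)

    TRACKING_SCHEDULE = (60, 600, 86400)   # ~1 min, ~10 min, ~1 day between rounds

    def track_list_message(recipients, send_tracking_request, wait=lambda s: None):
        unconfirmed = set(recipients)      # whitelisting recipients answered at once
        for delay in TRACKING_SCHEDULE:
            if not unconfirmed:
                break
            wait(delay)                    # real code would schedule, not block
            for rcpt in list(unconfirmed):
                if send_tracking_request(rcpt) == "accepted-responsibility":
                    unconfirmed.discard(rcpt)
        # Anyone still unconfirmed is treated as a bounce would be today.
        return unconfirmed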

How does this compare, in terms of bandwidth and efficiency, to
today's system? Besides recipients having an easier time of it, as
they obviously would?

Well, the LKML server must deal with incoming legitimate bounces,
which are a grab-bag of formats and sizes. For a sufficiently large
mailing list, it *will* get such bounces. It must distinguish
"I'm on vacation this week" messages (which are not really bounces)
from the legitimate "mailbox disappeared" ones.

And bounces, besides being inherently oversized, often come with
payloads -- the original message -- which, in this case, LKML hardly
needs to be reminded of, so that's a waste of bandwidth on both ends
right there.

Plus, LKML must deal with joe-job bounces all the time -- undoubtedly,
spammers would forge email to come from it in order to get past naive
filters.

Since handling an incoming bounce is inherently much more expensive
than sending a tracking request and receiving a response (or waiting
for one that doesn't arrive), the question is how does the ratio of
all those tracking requests and responses compare to the (presumably
smaller) number of actual bounces?

The third-party costs are lowered as well, because sending bounces or
DSNs requires DNS lookups in the *reverse* direction. That's
conceptually a waste of resources (DNS caches), since bounces/DSNs
aren't really *original* communications, they are *responses* to
communications.

Tracking requests and responses would not be DNS-addressed -- they'd
be more ephemeral than bounces. (After all, sending a bounce requires
far more handshaking between point A and point B than point B sending
a single UDP packet containing a tracking request and point A
responding by sending a single UDP packet back.)
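
(A rough Python sketch of that single-UDP-packet exchange; the packet
format is invented purely to show how light such a request could be:)

    import socket

    # One datagram out ("did you accept msg X?"), one datagram back -- no
    # DNS lookup, no TCP handshake, no payload-carrying bounce.
    def track(ip, port, msg_id, timeout=5.0):
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            sock.sendto(("TRACK %s" % msg_id).encode("ascii"), (ip, port))
            try:
                reply, _addr = sock.recvfrom(512)
            except socket.timeout:
                return None              # no answer; try again later, or give up
        return reply.decode("ascii", "replace")

    status = track("192.0.2.10", 10026, "dsahj43fds89")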

Still, if tracking ends up being much more expensive overall than
today's bounce handling in cases such as LKML, *relying* on it (by
providing no alternative in the protocol) would be a mistake.

But another possibility presents itself, as I've alluded to before. A
new protocol could essentially eliminate *inefficient* bounces
(mainly, payload-carrying bounces), replacing them with
highly-efficient versions of SMTP's DSNs. They'd still require DNS
lookups in the reverse direction, and perhaps even a TCP connection
(not just flinging a single UDP packet back at the original sender) to
ensure the sender received the DSN.

List-managing software could dynamically decide that, based on its
profile of recipient availability and other factors, it is best to
switch from tracking to requesting such bounces -- and back again, as
circumstances warrant.

Some recipients might choose to never return such bounces under the
new system (and in fact they might choose, or be unable, to do so
under today's SMTP, despite language in the RFCs that appears to
disallow that).

However, if the *protocol* allows it, and if LKML sends its messages
to the list such that they request these low-overhead DSNs *only* in
cases of failure, it can wait, say, two weeks for such a bounce before
assuming each recipient got the message, just as it does *today*,
using a new, and modestly more efficient, protocol.

If recipients are insufficiently willing to send such bounces and the
LKML server still wants to attempt to verify receipt, it can send a
single tracking message per unconfirmed recipient after that two-week
period has expired.

That makes the incremental cost of running my new system, over today's
SMTP, pretty much unmeasurable, even for the LKML. (This ignores the
one-time adoption cost.)

Further, a list manager like LKML might even choose to request DSNs as
of a particular level of success *or* failure, so it would tend to be
notified earlier, by recipients whose systems cooperate, that a given
message has in fact been received, in case it wants to be more
aggressive about trying alternate routes, deliveries, etc. (For some
list members, e.g. Linus Torvalds, that might actually make a lot of
sense.)

So, comparing apples to apples and oranges to oranges, I don't believe
my new protocol would be any *less* efficient than today's SMTP,
though its "foundation", or "core competency", would certainly
involve tradeoffs.

Hence it seems best to extend that foundation by allowing something
akin to today's bounces/DSNs. It doesn't seem worthwhile deploying
such a system without giving senders and recipients the option to
agree to not bother with tracking and, instead, use a DSN mechanism
similar (but superior overall) to today's bounces and DSNs.
Post by Sean Conner
-spc (And what's with the tracking anyway?)
Not sure what you mean by that. Don't you track important packages
you send via FedEx &c. using their online web sites?

It does seem as though IM2000 wouldn't provide any tracking at all,
beyond an indication to the sender of whether the message was
unpinned, and possibly of who (as in IP address) has retrieved the
message.

Such information seems to require some kind of handshaking between the
sender and the message store to take place, however -- it seems too
"remote" to me to be sufficiently reliable.

That's why I keep coming back to wanting direct MUA<->MUA
communications, which takes me away from *presuming* there'll be large
monolithic third-party systems, stores, data bases, etc. that enable
email exchange. (Although, in practice, even under my proposal, such
entities would likely exist, though more to add value in various ways
than to merely *enable* email exchange, as IM2000 and, increasingly,
SMTP seem to do.)
--
James Craig Burley
Software Craftsperson
<http://www.jcb-sc.com>
Sean Conner
2005-05-09 22:14:50 UTC
Permalink
Post by James Craig Burley
LKML then sends a tracking request out to each of the remaining
recipients about a minute later. Assuming they work pretty much as
they do today, they'll have committed the messages to disk, done any
preliminary O(N)-type scanning of content, and accepted the messages,
so they'll respond "I've accepted responsibility".
See, this is where I don't understand the tracking mechanism. Today, you
(more or less) have

MUAs -> MTAs -> MTAr -> MUAr

I don't see any real difference with SMTP now and what you are proposing,
except for the overhead of polling ("Did Bob get this yet? Did Bob get
this now? What about now? Now? Did he just read his email?") and a
possible annoyance factor [1]. I would also think this tracking mechanism
would be quite expensive for a large mailing list. Now, for each outgoing
message you have *two* connections, one for the sending of the email, and
then another one (or two, or three) constantly nagging the other end "Did
Bob get this?" (okay, maybe not that often, but still ...).
Post by James Craig Burley
Only two more tracking requests per remaining recipient would be sent,
assuming the recipients didn't respond. The LKML server would,
presumably, treat inadequate response by any recipient after the final
tracking request as it would if it received a bounce in today's
system.
I understood the IM2k method as working like this: LKML would send out a
notification of email to each recipient:

MTAs -notice-> MTAr1
MTAr2
MTAr3
MTAr4
...

As each person then logs in to check their email, they receive
notification that there's an email waiting for them from LKML:

MTAr1 -notice-> MUAr1

User selects "Yes, I want to read that":

MTAr1 <-getbody- MUAr1
MTAs <-getbody- MTAr1
MTAs -body-> MTAr1
MTAr1 -body-> MUAr1

LKML now knows that Bob just read the email. LKML can keep track of the
number of recipients that have actually *read* (read: requested) the body,
and delete the message from the store once everybody has read it, or just
expire it after a period of time (much like NNTP). Or keep it---hey, it's a
mailing list.

Going back to individuals sending email, it works similarly. Alice sends
Bob an email [2]:

Alice -send-> Iago
Iago -notice-> Ibby
Alice <-stored- Iago

Ibby -notice-> Bob
Ibby <-getbody- Bob
Iago <-getbody- Ibby
Iago -body-> Ibby
Ibby -body-> Bob
Alice <-recv- Iago

(that last step assumes Alice is online at the time---otherwise, the
notification can happen the next time she checks her email).
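If it helps, here is a toy model of that sequencing in Python -- nothing
here is actual IM2000 code, and all the class and method names are
invented:

  class SenderStore:                 # Iago, holding Alice's outgoing mail
      def __init__(self):
          self.bodies = {}           # msgid -> body
          self.fetch_log = []        # (msgid, who pulled it)

      def store(self, msgid, body):
          self.bodies[msgid] = body

      def get_body(self, msgid, requester):
          self.fetch_log.append((msgid, requester))   # Alice can see this
          return self.bodies[msgid]

  class RecipientAgent:              # Ibby, holding Bob's notifications
      def __init__(self):
          self.notices = {}          # msgid -> the store that has the body

      def notify(self, msgid, sender_store):
          self.notices[msgid] = sender_store

      def fetch(self, msgid, who="Bob"):
          # The body is pulled only when Bob asks for it.
          return self.notices[msgid].get_body(msgid, who)

  # Usage: Iago stores, Ibby is notified, Bob pulls later.
  iago, ibby = SenderStore(), RecipientAgent()
  iago.store("msg1", "Hello Bob")
  ibby.notify("msg1", iago)
  body = ibby.fetch("msg1")          # Iago's fetch_log now shows Bob read it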
Post by James Craig Burley
Since handling an incoming bounce is inherently much more expensive
than sending a tracking request and receiving a response (or waiting
for one that doesn't arrive), the question is how does the ratio of
all those tracking requests and responses compare to the (presumably
smaller) number of actual bounces?
The third-party costs are lowered as well, because sending bounces or
DSNs requires DNS lookups in the *reverse* direction. That's
conceptually a waste of resources (DNS caches), since bounces/DSNs
aren't really *original* communications, they are *responses* to
communications.
Sending the initial email requires DNS lookups to begin with---actually, a
minimum of two (one for the MX record, then for the A record of the MX
host) and possibly more. I don't see this as being a major issue.
Post by James Craig Burley
Tracking requests and responses would not be DNS-addressed -- they'd
be more ephemeral than bounces.
Um ... how else do you know who to send the tracking requests to?
Granted, IP addresses may not change that often, but they do change
(renumbering, moving to a new provider, etc).
Post by James Craig Burley
Post by Sean Conner
-spc (And what's with the tracking anyway?)
Not sure what you mean by that. Don't you track important packages
you send via FedEx &c. using their online web sites?
Um ... nope. Nor do I track regular snail mail (is that even possible?)
Post by James Craig Burley
It does seem as though IM2000 wouldn't provide any tracking at all,
beyond an indication to the sender of whether the message was
unpinned, and possibly of who (as in IP address) has retrieved the
message.
Such information seems to require some kind of handshaking between the
sender and the message store to take place, however -- it seems too
"remote" to me to be sufficiently reliable.
That's why I keep coming back to wanting direct MUA<->MUA
communications, which takes me away from *presuming* there'll be large
monolithic third-party systems, stores, data bases, etc. that enable
email exchange. (Although, in practice, even under my proposal, such
entities would likely exist, though more to add value in various ways
than to merely *enable* email exchange, as IM2000 and, increasingly,
SMTP seem to do.)
Well, until IPv6 becomes common, don't expect full peer-to-peer
communications on the Internet [3].

-spc (Who sadly remembers when he had 32 static IP addresses routed
to his home ... )

[1] Years ago I worked at IBM and used its internal email system (not
SMTP based---some proprietary mainframe format that required users
to log *into* the mainframe before checking) and it had tracking
capabilities much like you mention and believe me, I *hated it*.
Managers (and team leaders) would routinely mark their email with
return receipts indicating not only when I received it, but when I
*read* it.

A friend of mine found a way around the tracking mechanism though (I
think by reading the spool file directly but it's been way too many
years). Just because *you* want it doesn't mean *everybody* wants
it.

[2] Cast of characters:

Alice
Bob
Carol
Dave - regular users of email.

Matilda - sends out mass numbers of *wanted* email (a
mailing list for example)
Sam - Spammer
Trent - trusted authentication source

Iago - runs Alice's ISP
Ibby - runs Bob's ISP
Ichabod - runs Carol's ISP
Idelle - runs Dave's ISP
Immanuel - runs Matilda's ISP
Isabel - runs Sam's ISP (not knowing Sam is a spammer)
Isam - aka Sam, running his own ISP
Itani - runs Trent's ISP

Names can reference the people themselves, or the machines they run.

[3] Yes, the Internet was designed to be peer-to-peer, but since 1994
it's become less and less peer-to-peer, what with firewalls, NAT and
the assumed scarcity of IP addresses.
James Craig Burley
2005-05-10 05:14:20 UTC
Permalink
Post by Sean Conner
Post by James Craig Burley
LKML then sends a tracking request out to each of the remaining
recipients about a minute later. Assuming they work pretty much as
they do today, they'll have committed the messages to disk, done any
preliminary O(N)-type scanning of content, and accepted the messages,
so they'll respond "I've accepted responsibility".
See, this is where I don't understand the tracking mechanism. Today, you
(more or less) have
MUAs -> MTAs -> MTAr -> MUAr
I don't see any real difference with SMTP now and what you are proposing,
except for the overhead of polling ("Did Bob get this yet? Did Bob get
this now? What about now? Now? Did he just read his email?") and a
possible annoyance factor [1].
It's beneath the radar for the recipient, and the *sender* isn't
likely to be annoyed by her *own* use of tracking! (We probably have
two different ideas regarding what "tracking" means -- see below.)

But if you find today's bounces and joe jobs less annoying to deal
with than lightweight sender-initiated tracking requests and
responses, great. I'm pretty sure I'd prefer a tracking-based system,
as either an end user *or* as a sysadmin for an MTA that sees a lot of
incoming UBE and joe-job bounces.

Just keep this in mind: a typical tracking request/response pair costs
little or no more than a typical DNS lookup, *assuming* the entry is
already cached in the upstream DNS cache.

In SMTP-speak, think of a tracking request this way. An SMTP client,
MTAs, initiates a TCP connection to an SMTP server, MTAr, to use your
terminology. It goes something like this:

<-220 MTAr.example.net, pleased to meet you ESMTP
->EHLO MTAs.example.com
<-250-MTAr.example.net, pleased to meet you ESMTP
<-250 TRACKING
->MAIL FROM:<***@example.com>
<-250 ok
->RCPT TO:<***@example.net>
<-250 TRKID=0123456789 ok
->DATA
<-354 go ahead
->[...message contents...]
->.
[MTAr disconnects]

MTAs can't be sure whether the message was accepted or not, since it
never saw a response to the DATA phase. It should therefore treat
that as a temporary failure (as if a 4xy response code was sent).

So, sometime later -- maybe a minute or two -- it throws a UDP packet
at MTAr (probably preferring the same IP address, if the latest MX
lookup returns that IP in its result set) saying little more than
this:

Status of 0123456789?

MTAr might choose to respond with the equivalent of little more than
this:

250 TRKID=0123456789 ok

At that point, MTAs has achieved the equivalent of the SMTP
conversation ending with the following, prior to disconnection:

<-250 ok
->QUIT
<-221 quitting

Now, how many more packets would that final exchange, during the SMTP
session, have cost anyway? Well, given the nature of SMTP, at least
as many as a single tracking request/response pair, is my
understanding.

Of course, MTAr might not respond to the tracking request at all,
which would cause MTAs to probably try again in another few minutes.

Or, MTAr could respond with a 4xy or 5xy code, or a new code, 6xy,
meaning, literally, "ask me again later" (though perhaps the "xy" in
"6xy", or the extended response code, would give MTAs enough of a
sense of resolution to stop tracking). (Of course, 6xy codes wouldn't
be returned except to clients that somehow advertised they understood
them -- probably via new SMTP verbs or something similar.)

At some point, if MTAs isn't satisfied the message was received, it
could simply attempt to transfer the message again. That conversation
might look like this:

<-220 MTAr.example.net, pleased to meet you ESMTP
->EHLO MTAs.example.com
<-250-MTAr.example.net, pleased to meet you ESMTP
<-250 TRACKING
->MAIL FROM:<***@example.com>
<-250 ok
->RCPT TO:<***@example.net> TRKID=0123456789

At that point, if MTAr sees it already has the message contents
available locally, it can tell MTAs to not bother resending that. Or
it can explicitly reject the message without accepting the contents.

This is my rough outline of an SMTP++ that would implement my proposal
in the context of existing SMTP implementations. It wouldn't be
*particularly* clean, but it could be quite effective anyway.
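As a concrete illustration, the UDP tracking exchange above could be as
small as this (a Python sketch; the port number, wire format, and status
table are my own assumptions, not part of any spec):

  import socket

  TRACK_PORT = 5025   # invented port number

  def query_status(server_ip, trkid, timeout=5.0):
      # Sender side: ask the recipient MTA about a previously issued TRKID.
      with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
          s.settimeout(timeout)
          s.sendto(b"STATUS " + trkid.encode(), (server_ip, TRACK_PORT))
          try:
              reply, _ = s.recvfrom(512)
              return reply.decode()   # e.g. "250 TRKID=0123456789 ok"
          except socket.timeout:
              return None             # no answer: try again in a few minutes

  def serve_status(status_table):
      # Recipient side: answer queries out of a local table,
      # e.g. {"0123456789": "250 TRKID=0123456789 ok"}.
      with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
          s.bind(("0.0.0.0", TRACK_PORT))
          while True:
              data, addr = s.recvfrom(512)
              if data.startswith(b"STATUS "):
                  trkid = data[7:].decode()
                  reply = status_table.get(trkid, "550 TRKID=%s unknown" % trkid)
                  s.sendto(reply.encode(), addr)

One request and, at most, one reply per attempt -- roughly the same
footprint as a DNS query over UDP.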

Under my proposal for a *replacement* for SMTP, the conversations
above would be much simpler, with hardly any handshaking at all.

And I think it'd be easy to make the new protocol naturally support
both push-style *and* pull-style delivery, so IMAP/POP3/IM2000-type
access could simply "fall out" of the protocol design. (But that's
not terribly interesting if the basic premise of my system --
submission followed by tracking -- is *itself* uninteresting, since
that would imply that SMTP with bounces will always be adequate, or
else IM2000's pull-style delivery will be necessary after all.)
Post by Sean Conner
[1] Years ago I worked at IBM and used its internal email system (not
SMTP based---some proprietary mainframe format that required users
to log *into* the mainframe before checking) and it had tracking
capabilities much like you mention and believe me, I *hated it*.
Managers (and team leaders) would routinely mark their email with
return receipts indicating not only when I received it, but when I
*read* it.
A friend of mine found a way around the tracking mechanism though (I
think by reading the spool file directly but it's been way too many
years). Just because *you* want it doesn't mean *everybody* wants
it.
In other words, you dealt with a stereotypical big-iron imposition on
how you read email. That's not the sort of tracking system I'm
talking about, though it can be *cooperatively* used in that fashion.

Just because that system used the word "tracking" and I'm using the
word "tracking" does *not* mean I'm proposing the same thing.

I do worry about the possible downsides of allowing a recipient to
specify that a message has actually been *read*, not just accepted
into a mailbox a la IMAP/POP3 (with responsibility being taken over
from the sender), because there will always be suits who will require
their employees to enable and dutifully obey that "option".

That's not really the fault of the protocol, though, and I think
enough people would *prefer* to have the choice of whether to request
and/or provide that level of detail regarding disposition of emails to
justify providing it (and providing it would be trivial in the context
of the protocol I'm designing).

By the way, one thing I did not like about IBM's old mainframe designs
was the fact that they were built around polling, rather than proactive
interrupts, when it came to the CPU<->peripheral interface, and that
mentality seemed to creep into other areas in ways that, from my
Digital and Pr1me background, simply bothered me.

Similarly, I've been very hesitant to propose this tracking concept
for ordinary email, because I just don't like polling, generally
speaking.

I've come to believe it's best, however, in an email environment.
Ultimately, it really *is* the sender that cares most about the status
of an outgoing message. That's different from a CPU having to poll a
3270 terminal to find out whether its user has hit "Enter".

Still, I'd like to provide the option for highly efficient
bounce/DSN-style notifications in this protocol, maybe without even
requiring backwards DNS lookups most of the time. (A sender could
provide an IP:port combo to the recipient for such notifications, and
even give a DNS-style expiration for that combo in case the sender
switches to another IP address.)
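Roughly what I have in mind, as an illustrative Python sketch (the field
names and the 24-hour default TTL are placeholders, nothing more):

  import time
  from dataclasses import dataclass

  @dataclass
  class NotificationEndpoint:
      ip: str            # where the recipient may send a DSN-style notice
      port: int
      expires_at: float  # absolute deadline, analogous to a DNS TTL

      def usable(self):
          return time.time() < self.expires_at

  def advertise_endpoint(ip, port, ttl=24 * 3600):
      # Built by the sender and handed over with the message, so the
      # recipient never needs a backwards DNS lookup to notify the sender.
      return NotificationEndpoint(ip, port, time.time() + ttl)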
Post by Sean Conner
I would also think this tracking mechanism
would be quite expensive for a large mailing list.
"Profile, don't speculate." As I said, it's on the scale of the
"expense" of DNS lookups that are performed when distributing an email
to each member of the mailing list.

And that sure beats the list manager having to overcome anti-UBE
mechanisms like greylisting, which require subsequent TCP connections
and re-transmission of entire message contents, or Challenge/Response,
which requires *multiple* TCP connections, as just two examples.
Post by Sean Conner
Now, for each outgoing
message you have *two* connections, one for the sending of the email, and
then another one (or two, or three) constantly nagging the other end "Did
Bob get this?" (okay, maybe not that often, but still ...).
What second *connection*?? It's one outgoing UDP packet, and
potentially one incoming UDP packet in response. Like a DNS lookup in
terms of bandwidth and overhead, except there's rarely a need for the
sender to recurse or redirect to another entity, unless the recipient
responds in a way that suggests the sender do exactly that.

Compare that to the deluge of joe-job bounces, "weird" (or useless)
bounces, and the *lack* of bounces in cases where messages were
dropped, all of which involve TCP connections and multiple handshakes
*within* those connections (never mind the handshakes needed to set up
and tear down TCP connections *themselves*), thanks to SMTP's design,
not to mention the resources needed to simply *compose* a bounce in
the first place, for non-English-speaking people to *decode* its
meaning, etc. (Some of this doesn't really argue against the
*concept* of a recipient-side notification -- just SMTP's
*implementation* of that concept.)
Post by Sean Conner
Post by James Craig Burley
Only two more tracking requests per remaining recipient would be sent,
assuming the recipients didn't respond. The LKML server would,
presumably, treat inadequate response by any recipient after the final
tracking request as it would if it received a bounce in today's
system.
I understood the IM2k method as working like this: LKML would send out a
That's a different issue -- it's not a *traditional* mailing list, in
terms of operating a la SMTP. Since my proposal is much more
SMTP-like, I responded to the previous question about list management
in terms of an SMTP-ish mailing list engine. (I haven't given list
management a lot of thought, frankly; partly because it seems to me
that it'd be a lot like it is under SMTP, since the basic push model
is preserved.)

But, if you study your own example, you'll see that each "notice" is
just like my tracking request in terms of resource utilization, except
it's sent *first*. Then, your example shows an *incoming* TCP
connection being made to MTAs in order to fetch the body -- which,
aside from the well-known problem IM2000 will have with higher-latency
networks, means DNS must be working in the backwards[A] direction, and
MTAs *must* be available when the *reader* (via MUAr1) wants to read
the message.

Since my proposal involves MTAs submitting the entire message
contents, along with the notification, directly to MTAr1 (which in
turn can cheaply forward it -- that is, with no fsync()'ing to disk --
to MUAr1, if it happens to be online), and only *afterward* following
up that transmission with a tracking request (if necessary), the
*essential* bandwidth requirements are roughly similar, except your
IM2000-based example requires MTAr1 to do a backwards DNS lookup to
find MTAs, which is not necessary (under my proposal) under normal
circumstances.

[A] I realize I've used "reverse" in connection with DNS lookups, but
"backwards" is a better term in the context of this discussion. A
"forwards DNS lookup" is what happens when you type http://google.com
into your browser -- it has to look up google.com to find to which
host to connect. A "backwards DNS lookup" is what happens after
you've submitted a URL to, say, your own web site, to a mailing list
or blog or similar -- anyone clicking on that link must look up that
URL via their own DNS mechanism. Most of the time, that's fine,
except it is more prone to abuse (by you and anyone else who is freely
able to submit URLs through such channels). So, *generally*, it's
best to make sure there's a *need* for such a backwards lookup, rather
than simply expecting the "submitter" to directly inject the desired
content, instead of a backwards pointer to it, into the forum in
question. The "slashdot effect" is, conceptually, an example of the
problem of backward (not just DNS) lookups being substituted for
direct injection of content -- it's pretty much a combination of
convenience and copyright law that prevents the direct use of content
in the first place.
Post by Sean Conner
Going back to individuals sending email, it works simularly. Alice sends
Alice -send-> Iago
Iago -notice-> Ibby
Alice <-stored- Iago
Ibby -notice-> Bob
Ibby <-getbody- Bob
Iago <-getbody- Ibby
Iago -body-> Ibby
Ibby -body-> Bob
Alice <-recv- Iago
(that last step assumes Alice is online at the time---otherwise, the
notification can happen the next time she checks her email).
That seems a tad complicated and wasteful. Compare it to this:

Alice -send-> Iago
Iago -send-> Ibby
Ibby -send-> Bob
Ibby <-thanks Bob
Iago <-thanks Ibby
Alice <-thanks Iago

Now, each send/thanks pair *can* happen within a *single*
rightward-moving TCP connection, in which case neither Iago nor Ibby
needs to commit to storing the message contents on disk at all.

If the TCP connections don't stay up long enough to accommodate that,
or if Ibby simply doesn't want to confirm receipt to the degree Alice
wants to see, then it might look like this:

Alice -send-> Iago
Iago -send-> Ibby
Ibby -send-> Bob
Ibby <-thanks Bob
[...later that day...]
Alice -track-> Iago
Iago -track-> Ibby
Iago <-thanks Ibby
Alice <-thanks Iago

("Thanks" could mean just "responsibility for message accepted" a la
an SMTP response code of 250, but Alice might want to be sure it goes
further -- until it means "message stored in Bob's mailbox", "message
given priority higher than spam by Bob's MUA", or "message actually
read by Bob", depending on how important the message is to Alice. If
it doesn't reach the desired state quickly enough for her, she can
pick up the phone and call Bob. Alice would therefore love tracking,
since it'd potentially keep her from having to pick up the phone and
pester Bob regarding each "important" email she sent. Coincidentally,
I used to work for someone named Alice!)
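To make those escalating levels of "thanks" concrete, here is a small
sketch; the level names and ordering are mine, purely illustrative:

  from enum import IntEnum

  class Confirmation(IntEnum):
      NONE = 0
      RESPONSIBILITY_ACCEPTED = 1   # roughly SMTP's "250 ok"
      STORED_IN_MAILBOX = 2
      PRIORITIZED_ABOVE_SPAM = 3
      READ_BY_RECIPIENT = 4

  def tracking_satisfied(latest, wanted):
      # Alice keeps sending occasional tracking requests until the
      # replies reach the level she cares about for this message.
      return latest >= wanted

  # For a routine note, RESPONSIBILITY_ACCEPTED may be enough; for
  # something urgent, Alice may track until READ_BY_RECIPIENT -- or
  # give up and pick up the phone.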

Now, in this sample use of tracking, though Alice has to look up Bob's
MX (yielding Ibby's IP address) again via DNS, it's quite possible the
answer will be in a nearby DNS cache, since Alice *already* looked up
Bob's MX to send the message in the first place. But neither Ibby nor
Bob have to look up Iago nor Alice, since they're not trying to send a
bounce or DSN, so *their* DNS cache isn't ever polluted as a result of
*receiving* incoming email or tracking requests. In both cases, the
cost to outside third parties is lower -- about as low as possible, in
fact, when it comes to exchanging an email message.

Further, those track/thanks pairs are about as expensive, in terms of
bandwidth, as simple DNS lookups.
Post by Sean Conner
Post by James Craig Burley
Since handling an incoming bounce is inherently much more expensive
than sending a tracking request and receiving a response (or waiting
for one that doesn't arrive), the question is how does the ratio of
all those tracking requests and responses compare to the (presumably
smaller) number of actual bounces?
The third-party costs are lowered as well, because sending bounces or
DSNs requires DNS lookups in the *reverse* direction. That's
conceptually a waste of resources (DNS caches), since bounces/DSNs
aren't really *original* communications, they are *responses* to
communications.
Sending the initial email requires DNS lookups to begin with---actually, a
minimum of two (one for the MX record, then for the A record of the MX
host) and possibly more. I don't see this as being a major issue.
That misses the point entirely. It's the lookup in the *backwards*
(left-moving) direction that is, strictly speaking, unnecessary. With
vanilla SMTP, it occurs only when a bounce/DSN must be sent. With
IM2000, it occurs every time an email is exchanged. (Besides, A
records might accompany MX records in the "additional" section of a
response to a DNS lookup, though that's not particularly pertinent.)

As I've said before, when you design a protocol (such as SMTP or
IM2000) to rely on a backwards-initiated communication that requires a
DNS lookup in order to fully exchange an email, you give *anyone* the
ability to pollute a recipient's DNS cache with arbitrary (and
generally useless) information.

Seems like everyone who thinks they can design, improve on, or
criticize an email-exchange protocol trivializes or ignores the
burdens their ideas place on third parties, including distributed data
bases like DNS.

But those are *not* infinite resources; if they were, we could burden
them even *more* by simply sending *all* of our email through them,
and whitelisting them for incoming email -- problem solved.

Since they're not infinite resources, and since they are not able to
*directly* stem the flow of invalid requests for their services, they
should not be designated as critical paths for something as crucial as
email exchange.

For an information-exchange system in a hostile environment, then, the
key is to keep the exposure of a recipient's mechanisms to outside,
untrusted entities as limited as possible in the general case. To put
it bluntly, that means a protocol should lean towards requiring the
*sender* to "expose" itself more so than the recipient.
Post by Sean Conner
Post by James Craig Burley
Tracking requests and responses would not be DNS-addressed -- they'd
be more ephemeral than bounces.
Um ... how else do you know who to send the tracking requests to?
Granted, IP addresses may not change that often, but they do change
(renumbering, moving to a new provider, etc).
The same way TCP on the receiving end of an incoming connection knows
to whom to send responses to that connection, data from the receiver
to the sender, etc. (Tracking requests are ephemeral, like DNS
lookups or TCP connections; they are not queued.)

But if you mean how does a *sender* know to whom to send the tracking
request -- well, to whatever MX is currently advertised for the
destination domain name, assuming the recipient didn't already provide
a list of IP addresses to use in response to the original message
submission or a previous tracking request.

(Yes, that MX lookup is another DNS lookup. But it's not a
*backwards* lookup, and it's quite possible the information will be in
a local DNS cache, since the same lookup was recently done to submit
or track the message.)
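Concretely, the sender-side target selection could look like this (a
Python sketch using the third-party dnspython package; purely
illustrative):

  import dns.resolver   # third-party: dnspython

  def tracking_targets(domain, previously_advertised_ips=None):
      # Prefer any IPs the recipient handed back earlier; otherwise fall
      # back to the currently advertised MX hosts, which are likely
      # already in a nearby DNS cache from the original submission.
      if previously_advertised_ips:
          return previously_advertised_ips
      answers = dns.resolver.resolve(domain, "MX")
      return [r.exchange.to_text().rstrip(".")
              for r in sorted(answers, key=lambda r: r.preference)]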
Post by Sean Conner
Post by James Craig Burley
Post by Sean Conner
-spc (And what's with the tracking anyway?)
Not sure what you mean by that. Don't you track important packages
you send via FedEx &c. using their online web sites?
Um ... nope. Nor do I track regular snail mail (is that even possible?)
No. So you don't know snail mail got lost until, sometimes, long
after you sent it, if ever, because you're relying on a *return*
mechanism that is basically no more reliable than the *sending*
mechanism. Hence the need for certified mail and the like, since
reliability of regular snail mail is widely known to be inadequate for
many situations (and, yet, adequate for a huge % of outgoing mail).

At least with the US Post Office, you have a monolithic organization
that is "committed" to making the system work...at a substantial loss,
last I heard.
Post by Sean Conner
Post by James Craig Burley
That's why I keep coming back to wanting direct MUA<->MUA
communications, which takes me away from *presuming* there'll be large
monolithic third-party systems, stores, data bases, etc. that enable
email exchange. (Although, in practice, even under my proposal, such
entities would likely exist, though more to add value in various ways
than to merely *enable* email exchange, as IM2000 and, increasingly,
SMTP seem to do.)
Well, until IPv6 becomes common, don't expect full peer-to-peer
communications on the Internet [3].
[3] Yes, the Internet was designed to be peer-to-peer, but since 1994
it's become less and less peer-to-peer, what with firewalls, NAT and
the assumed scarcity of IP addresses.
Good point. In the meantime, having lighter-weight "smarthosts" as I
described would, I think, greatly improve the picture and allow people
to exchange emails with much lower overhead and without having to rely
any more heavily on (disinterested, buggy, and/or hostile) third
parties than they do when they pull up each others' web sites in their
browsers.

In particular, MTAr's, to use your terminology, do not need to ever
commit (fsync()) incoming message contents to disk. They *can* store
them locally as long as they have the space, and if they are
sufficiently convinced of their own prowess (reliability,
availability, etc.), they *can* certainly tell the sender that they're
taking full responsibility for further delivery.

But they can also just keep track of "logged-in" users (where their
MUAs are, in the Intranet for which they relay) and forward message
submissions and tracking requests to them, then forward the responses
back to the MTAs side of things.
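A sketch of that kind of lightweight relay (everything here -- the
names, the online table, the policy flag -- is hypothetical):

  class LightRelay:
      def __init__(self, accept_responsibility=False):
          self.online_muas = {}   # user -> callable that delivers to the MUA
          self.accept_responsibility = accept_responsibility
          self.spool = []

      def handle(self, user, message):
          mua = self.online_muas.get(user)
          if mua is not None:
              mua(message)        # pass through; no fsync(), sender keeps tracking
              return "forwarded"
          if self.accept_responsibility:
              self.spool.append((user, message))   # now we must store durably
              return "accepted responsibility"
          return "try again later"   # 4xy-style answer; sender retains the message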

Since MTAs, in this system, don't need to invest significant resources
into handling potential UBE, they can let individual MUAs and users
make finer-grained decisions regarding what constitutes Unsolicited
and/or Bulk Email -- decisions that spammers will have a much harder
time reverse-engineering, since they'll be more individualized,
heterogeneous, and, often, more opaque to senders.
Post by Sean Conner
-spc (Who sadly remembers when he had 32 static IP addresses routed
to his home ... )
Heh. I'd love to have just *one*.
--
James Craig Burley
Software Craftsperson
<http://www.jcb-sc.com>