Feedback on Hypertext Mail Protocol (a.k.a. Stub Email)

Discussion:

Nathan Cheng

2006-02-22 00:11:44 UTC

Hi, I just joined this list after being directed to it by readers of
my recent article on CircleID:

http://www.circleid.com/posts/hypertext_mail_protocol_aka_stub_emaill/

Before publishing this article, I was unaware not only of this list,
but of IM2000 in general, having been out of the academic loop for
about 8 years now, and professionally busying myself with writing
better web applications rather than thinking about bettering the
Internet in general.

There are over 1600 messages on this list over a period of about 5
years, so I was wondering if someone could write about 3 sentences to
fill me in on the current state of IM2000 affairs, as well as what
primary objections to my idea--or ideas such as mine--may have already
been aired on this list.

Thanks,

Nathan Cheng

Brian Candler

2006-02-22 09:10:54 UTC

Post by Nathan Cheng
There are over 1600 messages on this list over a period of about 5
years, so I was wondering if someone could write about 3 sentences to
fill me in on the current state of IM2000 affairs, as well as what
primary objections to my idea--or ideas such as mine--may have already
been aired on this list.

The current state is that it's dead :-) Yours is one of a handful of
messages here over the last several months.

The name of your project is quite interesting, because arguing the need for
a new E-mail system tends to end up as "squaring the circle". That is,
people seem to think that having a verified electronic identity for the
sender of a mail will result in a reduction in spam; but since new
electronic identities can be created at whim [1], this can never solve the
spam problem.

Spammers will just use whatever new protocol or mechanism is put in place.
At best you can solve the 'joe job' problem of receiving spurious bounces to
messages you never sent, but there are far simpler mechanisms for doing this
which work with SMTP mail today [2].

That is, unless you want every E-mail sender identity to be certified by an
agency like Verisign or cacert.org or your national government; or that you
will always have a prior arrangement with every person with whom you want to
exchange E-mail, learning their electronic identity out-of-band [3]. When
pushed down this route, it turns out that people generally don't want to
lose the ability to receive E-mails from people they've not been formally
"introduced" to by some other mechanism, nor to have to jump through
identity verification hoops just to get an E-mail address.

I wrote an assessment of IM2000 a while back:
http://pobox.com/~b.candler/doc/misc/im2000.html

The essence is that while I think "pull" E-mail systems do have benefits
(which I expect are shared by your own proposal), by themselves they are
unlikely to reduce spam in the way that has been claimed for IM2000.

Regards,

Brian.

[1] Taking electronic identity as E-mail address (***@domain), then you can
easily create new ones by:

- registering a new domain (cheap, easy, and with thousands of registrars
to choose from there will always be ones with lax security checks)
- using a dynamic DNS provider
- using a freemail service like hotmail.com, netscape.com
- using a free/pay-as-you-go ISP dialup service, which gives you one or
more E-mail accounts
- taking over someone else's account, e.g. by hacking into the machine where
their credentials are stored. By some estimates, around 50% of all
Windows machines currently connected to the Internet are infested with
viruses or malware.

Unless you can plug *all* those holes, spammers will have an infinite supply
of valid E-mail addresses to draw upon.

If the electronic identity is a public/private key pair, then new key pairs
can be generated on demand within seconds.

[2] Examples include BATV and SES. These can be deployed incrementally and
give immediate benefit to the person performing the deployment; there is no
need to wait for the rest of the Internet to upgrade themselves as well.

[3] Such as printing your E-mail address or RSA key fingerprint on a
business card, and handing it out to everyone you need to E-mail. The other
party will need to import this into their E-mail client before they can
receive mail from you. This is essentially "whitelisting" in today's world;
"pull" E-mail systems can give a stronger check on the claimed identity of a
received mail, but it's already quite hard for a spammer to guess which
people are in your whitelist when sending you spam.
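
To make footnote [2] more concrete, here is a minimal Python sketch in the spirit of BATV: the envelope sender of outgoing mail is tagged with a keyed hash and an expiry day, and a bounce is accepted only if it comes back to a validly tagged address. This is a simplified illustration, not the exact BATV or SES wire format; the secret, the lifetime, and the function names are assumptions made for this example.

```python
import hashlib
import hmac

SECRET = b"change-me"   # hypothetical per-site secret
LIFETIME_DAYS = 7       # hypothetical validity window for a tag


def _mac(expiry, address, secret):
    """Short keyed hash over the expiry day and the plain address."""
    digest = hmac.new(secret, f"{expiry}{address}".encode(), hashlib.sha1)
    return digest.hexdigest()[:6]


def sign_sender(local, domain, today, secret=SECRET):
    """Tag an envelope sender so that later bounces can be checked."""
    address = f"{local}@{domain}"
    expiry = f"{(today + LIFETIME_DAYS) % 1000:03d}"
    return f"prvs={expiry}{_mac(expiry, address, secret)}={address}"


def verify_bounce(rcpt, today, secret=SECRET):
    """Return True only if a bounce recipient carries a valid, unexpired tag."""
    if not rcpt.startswith("prvs="):
        return False  # untagged: we never sent mail with this sender
    tag, sep, address = rcpt[len("prvs="):].partition("=")
    if not sep or len(tag) != 9 or not tag[:3].isdecimal():
        return False
    expiry, mac = tag[:3], tag[3:]
    expected = _mac(expiry, address, secret)
    if not hmac.compare_digest(mac.encode(), expected.encode()):
        return False
    # Day numbers wrap modulo 1000, so compare the remaining lifetime.
    return (int(expiry) - today) % 1000 <= LIFETIME_DAYS
```

A site deploying something like this can reject bounces to untagged addresses on its own, without waiting for any other site to change, which is the incremental-deployment property the footnote describes.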

Nathan Cheng

2006-02-22 16:14:47 UTC

So then perhaps the only benefit of IM2000 schemes over today's
situation is that you get to see the Subject/Sender/To of the email
before retrieving the entire email? If adoption is painless and user
experience remains pretty much the same, then that's still a
worthwhile benefit, isn't it? I know that's how I delete from my inbox
spam that the filter missed--I just look at the To/Subject, check the
boxes and hit "Delete". With IM2000 at least I won't have retrieved
the entire contents of those emails.

Also, my proposal for "HTMP with obfuscation" (outlined in comments
below my article) would dramatically increase bandwidth costs for the
sender, which would really only affect spammers. Is that a worthwhile
benefit?

Aren't these two benefits alone enough to justify the effort?

Nathan

James Craig Burley

2006-02-22 17:04:37 UTC

Post by Nathan Cheng
So then perhaps the only benefit of IM2000 schemes over today's
situation is that you get to see the Subject/Sender/To of the email
before retrieving the entire email?

Not necessarily. Some variants/proposals under the name "IM2000"
disallow arbitrary sender-provided information, such as "Subject",
since that *itself* can be spam/advocacy/junk. Limits on sizes of
such things would be arbitrary (more likely to penalize legit users of
the fields, such as "Subject", than spammers); yet, having no such
limits would defeat the point of IM2000 in the first place (spammers
would just put the entire message in a "Subject" header, so it's part
of the notification -- I've already seen some SMTP spam that does
stuff like that!).

Post by Nathan Cheng
If adoption is painless

It wouldn't be, so never mind that.

Post by Nathan Cheng
and user
experience remains pretty much the same

It wouldn't be, unless and until we have a nearly 100% uptime,
low-latency, fat-pipe, universal Internet. Even so, all users would
almost certainly experience at least slight increases in delays
between when they select a message for display and when it actually
gets displayed.

(That's even without imagining all sorts of decodings and validations,
never mind SpamAssassin-type analysis, that would presumably be
desirable to apply to the content prior to displaying it to the user.)

Prefetching/caching can mitigate those problems, but they'd also
eliminate the bandwidth-savings and diskspace-savings advantages of
IM2000; we might as well make more sophisticated use of SMTP, in order
to avoid generating bounces (which is, in fact, the approach I'm
working on -- a "Push"-style, bounceless approach, which SMTP can
accommodate somewhat more readily than "Pull"-style delivery).

Post by Nathan Cheng
then that's still a
worthwhile benefit, isn't it?

A marginal one. You can try it out by experimenting with an MUA
designed to show you just the Subject/Sender/To information of an
incoming email, for which the entire contents are already placed on
some other server, that your MUA would then fetch when you actually
went to read the email.

(That is, the bandwidth savings issue for the upstream SMTP servers is
orthogonal to how you, or a typical IM2000 user, would handle your
inbox.)

That might be what you're already doing now, except the message
contents aren't already on your local disk -- they are, at *best*, on,
say, an HTTP server elsewhere on your LAN. (With IM2000, they're not
only "an HTTP server away", they're probably at least one DNS lookup
away, and DNS lookups can have unavoidable delays.)

Post by Nathan Cheng
I know that's how I delete from my inbox
spam that the filter missed--I just look at the To/Subject, check the
boxes and hit "Delete". With IM2000 at least I won't have retrieved
the entire contents of those emails.

Again, if you can do that now, what would IM2000 actually save *you*?

Post by Nathan Cheng
Also, my proposal for "HTMP with obfuscation" (outlined in comments
below my article) would dramatically increase bandwidth costs for the
sender, which would really only affect spammers. Is that a worthwhile
benefit?
Aren't these two benefits alone enough to justify the effort?

Probably not.

Reviewing materials available via a Google search for "IM2000
problems", it seems the biggest problems go largely unaddressed or at
least unspecified: that of being unable to read an important email in
your in-box because some *other* server, or the network connection to
it, is down; and, along similar lines, that of experiencing
noticeable delays in reading such emails, because of high latencies
between the MUA and the message server and/or a slow or broken DNS
that is called upon to find that server in the first place.

(Correspondingly, as an IM2000 message *sender*, you might "send" 40
messages while "online", then shut down your laptop, or go offline.
If you haven't shoveled the corresponding message *contents* to some
intermediate/upstream message store, then recipients of those message
notifications won't be able to read the contents until you come back
online -- a non-starter. If you *have* so shoveled your content
upstream, you *and your message recipients* are using IM2000 pretty
much exactly as you're using SMTP *today*, in terms of mitigating
spam, using trusted intermediaries, etc., except the message
recipients would have the problems specified above vis-a-vis the
upstream message store that you're using.)

Those problems do not, generally speaking, happen with SMTP or other
"Push" systems, in which messages generally accompany notifications.
So as long as the connection is up at notification time, the message
is available locally when the *user* is actually ready to read it --
even if, at *that* point in time, the network connection is down
(offline reading, such as via a wireless device on an aircraft, in a
tunnel, etc.).

IMO, IM2000's "Pull"-style, bounceless delivery *must* be a choice
provided by any new, viable email system (whether it's a Not-So-Simple
Mail Transfer Protocol, aka NSSMTP, or something else ;-). It has too
many advantages *in specific situations* to not provide it as an
option for exchanging messages. (That is, if the sender of a message
is willing to "host" it somewhere, and a recipient is willing to
retrieve it later from the specified message store, a "new" email
system should allow them to cleanly negotiate that sort of
transaction.)

But, standing on its own, it isn't worth deploying as a *replacement*
for SMTP. (If it was, we'd have done it by now. It isn't
particularly hard to implement, and it's 2006, not 2000, after all.)

--
James Craig Burley
Software Craftsperson
<http://www.jcb-sc.com>

Charles Cazabon

2006-02-22 17:30:31 UTC

Post by Nathan Cheng
So then perhaps the only benefit of IM2000 schemes over today's
situation is that you get to see the Subject/Sender/To of the email
before retrieving the entire email?

No, the prime benefit is that mail is stored by the sender.

That means the spammer bears the cost of hosting the mail, so spam becomes
less economically attractive.

It also means that if a spammer sets up an IM2000 server and sends out fifty
million copies of his latest penis enhancement spam, the first person to
complain to the ISP and get their connectivity yanked prevents the remainder
of those fifty million recipients from ever seeing that spam.

Charles

--
-----------------------------------------------------------------------
Charles Cazabon <***@discworld.dyndns.org>
GPL'ed software available at: http://pyropus.ca/software/
-----------------------------------------------------------------------

Greg Hewgill

2006-02-22 17:49:55 UTC

Post by Charles Cazabon
No, the prime benefit is that mail is stored by the sender.
That means the spammer bears the cost of hosting the mail, so spam becomes
less economically attractive.

Spammers will find a way around this. For example, they could install
an arbitrary number of IM2000 message stores on zombie Windows
machines, and point junk domain names at them. Or, they could set up an
IM2000 server that stores a single copy of their message for all 50
million recipients. IM2000 shifts the burden of message storage to the
sender, but the sender can quite easily delegate this further to other
unsuspecting victims.

Greg Hewgill
http://hewgill.com

Shae Matijs Erisson

2006-02-22 18:07:17 UTC

Post by Greg Hewgill
Spammers will find a way around this. For example, they could install
an arbitrary number of IM2000 message stores on zombie Windows
machines, and point junk domain names at them. Or, they could set up an
IM2000 server that stores a single copy of their message for all 50
million recipients. IM2000 shifts the burden of message storage to the
sender, but the sender can quite easily delegate this further to other
unsuspecting victims.

But it means that I as the 'internet geek' can kill off a thousand zombie
servers before 'gullible grandma spam-target' can retrieve the spam messages
for which she's been sent a zillion notifications.
(Plus many ISPs block incoming and outgoing port 25 for anti-zombie purposes.)

Right now, the system itself bears the cost of sending.
Any unsecured inlet means an ISP stores spam on their server, even when the
zombies are shut down.
If the zombies had to be online to send spam, then forcing them to change
address or whatever would have the same effect in the end.
Gullible Grandma wouldn't receive the spam message, and would not send money to
the spammer.

Q. So what if the spam is in the subject field only?
A. Clients should not display unretrievable messages unless asked.
It'll probably be a spam anyway.

The point is that IM2000 and other pull systems provide accountability.

--
I've tried to teach people autodidactism, | ScannedInAvian.com
but it seems they always have to learn it for themselves.| Shae Matijs Erisson

James Craig Burley

2006-02-22 18:49:21 UTC

Post by Shae Matijs Erisson

So what happens when she clicks on a blocked message? How long before
she realizes the message sender has been designated a spammer? *How*
is she told this?

Post by Shae Matijs Erisson
Right now, the system itself bears the cost of sending.

You mean the *recipient* system, right? (That is, some agent, like an
SMTP server running on the recipient's ISP's system.)

Post by Shae Matijs Erisson
Any unsecured inlet means an ISP stores spam on their server, even when the
zombies are shut down.

Under traditional SMTP, yes.

Post by Shae Matijs Erisson
If the zombies had to be online to send spam, then forcing them to change
address or whatever would have the same effect in the end.
Gullible Grandma wouldn't receive the spam message, and would not send money to
the spammer.

Yes. The best defense against that is for the recipient to never see
the spammer's message at all.

But it's not clear that can be assured by the recipient's agent *not*
looking at message content, relying *solely* on whether some entity or
entities have designated a particular message store as a source of
spam.

Post by Shae Matijs Erisson
Q. So what if the spam is in the subject field only?
A. Clients should not display unretrievable messages unless asked.
It'll probably be a spam anyway.

Ummm...what? How could a client display an unretrievable message
anyway?

Or, do you mean that a client shouldn't display a *notification*
without making sure the message *content* was retrievable?

Interesting idea...but wouldn't that make merely opening up your inbox
excruciatingly slow, as your client first tried contacting the message
store for each and every pending notification before showing you your
in-box full of notifications?

Post by Shae Matijs Erisson
The point is that IM2000 and other pull systems provide accountability.

What do you mean by "accountability"? Certainly they give *senders*
more *control* over outgoing message content (they can change the
content in between content fetches, for example), and they give
senders more long-term responsibility for delivery. But
accountability?

--
James Craig Burley
Software Craftsperson
<http://www.jcb-sc.com>

James Craig Burley

2006-02-22 18:42:08 UTC

Post by Greg Hewgill
IM2000 shifts the burden of message storage to the
sender, but the sender can quite easily delegate this further to other
unsuspecting victims.

Indeed.

Note that IM2000 (potentially) also shifts the *responsibility* of
message *delivery* to the sender. I believe that's even more
important than the message-content issue, because a recipient can't
really decide whether to accept responsibility for a message without
having access to the message *content* anyway.

(I say "(potentially)" because I'm not sure whether IM2000 actually
allows a recipient agent to acknowledge receipt of a message
notification without also accepting responsibility for the message
itself -- that is, without "guaranteeing", a la SMTP, that it will
pass that notification along to the recipient.)

Assuming responsibility for message delivery remains with an IM2000
sender until the recipient accepts delivery ("unpinning" the message,
I believe is the term coined for this), regardless of whether the
recipient has retrieved the message contents, IM2000 gives ISPs (not
just end-user recipients) more flexibility with regard to how to
handle message notifications from "suspect" (non-whitelisted,
non-approved) sources, compared to traditional SMTP.

Spammers really don't care about shifting responsibility for message
delivery to anybody else, because they don't have a strong desire (nor
the resources) to assure that each recipient of their message actually
sees it, or to use other means of communication if email fails to
confirm delivery. That distinguishes them from most senders of
legitimate messages.

(But spammers *do* care about transmitting message *content* to as
many recipients as possible. Hence, you're right that, in IM2000 as
in SMTP, they'll exploit zombies, if they have to, to serve as
short-term message stores.)

Therefore, any approach that shifts the burden of *responsibility*
from recipients to senders, making it "cheaper" (resource-wise) to be
a typical recipient vs. a typical sender when compared to the present
SMTP-based system, will:

- Be more likely to be acceptable to legitimate senders, since they
also tend to be legitimate recipients, so the shift in burden
won't be a problem for them *overall*

- Be less likely to be bothered with by spammers, since they don't
really have much interest in message-delivery responsibility in
the first place, don't really care to receive legitimate email,
and depend on sending bazillions of emails frequently

It seems likely, therefore, that shifting the *responsibility* of
message delivery from recipient to sender will tend to expose many
spammers simply because they *won't* demonstrate, to recipients, that
they "care" about messages they send:

- In SMTP, they won't respond to temporary delivery failures by
retrying delivery later as maybe > 99% of legitimate email senders
do. (Greylisting, which crudely "plays" with responsibility,
depends on this fact, by temporarily rejecting delivery at least
once.)

- In IM2000, they won't repeatedly send message notifications, or
provide message contents (especially repeatedly for a given
recipient), over more than a fairly short period of time. (E.g.
if they're exploiting zombies as message stores, when shut down, a
zombie won't be sending any more notifications or serving any more
content.)

In both cases, spammers simply have too many potential recipients, and
too little interest in delivering their payloads to *all* of them, to
care enough to repeatedly notify and deliver content. They favor
hit-and-run tactics, which traditional SMTP was *designed* to favor
from the outset (because of the nature of the Internet back in those
days).

And, in both cases, recipient agents can, when their (human)
recipients aren't constantly online and accepting messages immediately
upon receipt of notifications, detect these responsibility-avoiding
behaviors and prioritize/ignore messages accordingly.

Further, with IM2000 and other systems that shift responsibility to
the sender, spamtraps are particularly effective, because:

- They need never accept responsibility for any message (they are
therefore just like legitimate users who never log in to read mail
anymore, and there are probably hundreds of thousands, if not
millions, of those).

- They need never reject any message (so they do not advertise their
existence or nonexistence as spamtraps or real users).

- Their "recipient agents" can detect responsibility-avoidance
behavior and "blacklist" corresponding senders in ways that can be
referenced by the agents for other "real" recipients. (So they
can "feed into" blacklists.)

In particular, by using spamtraps, the issue becomes not so much of
whether a *particular* message's sender doesn't seem to demonstrate
much interest in it, but whether that sender (e.g. the IP address)
doesn't demonstrate much interest in any of the *hundreds* or
*thousands* of messages it sends (via IM2000 notifications or SMTP
deliveries) to a variety of addresses, including spamtraps, at a given
domain, over a relatively short time.
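
The spamtrap idea above could be sketched along the following lines in Python. A local tracker records every notification (or delivery attempt) per sending IP and flags an IP that has hit at least one spamtrap address and has never followed up on any of its messages after a reasonable gap. The class name, thresholds, and the exact heuristic are assumptions made for this example; they are one possible reading of "responsibility-avoiding behavior", not a specified algorithm.

```python
from collections import defaultdict


class ResponsibilityTracker:
    """Track per-IP notifications and flag senders that never follow up."""

    def __init__(self, spamtraps, retry_gap=3600, min_messages=3):
        self.spamtraps = set(spamtraps)   # addresses that no human ever reads
        self.retry_gap = retry_gap        # seconds that count as a genuine follow-up
        self.min_messages = min_messages  # do not judge an IP on too little data
        # ip -> {(rcpt, msg_id): [timestamps of notifications]}
        self.seen = defaultdict(dict)

    def record(self, ip, rcpt, msg_id, now):
        self.seen[ip].setdefault((rcpt, msg_id), []).append(now)

    def is_suspect(self, ip):
        msgs = self.seen.get(ip, {})
        if len(msgs) < self.min_messages:
            return False
        hit_trap = any(rcpt in self.spamtraps for rcpt, _ in msgs)
        followed_up = any(max(times) - min(times) >= self.retry_gap
                          for times in msgs.values())
        return hit_trap and not followed_up
```

Because the tracker only needs local observations, it fits the point made below that no central blacklist is required; the result can still be shared with other recipient agents on the same LAN.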

Note that none of the above depends on *any* central system, such as
IP blacklists, RBLs, SPF, beyond traditional, simple DNS (for message
*delivery* lookup, and for message retrieval if IM2000 is used).

Instead, local-LAN-based versions of such systems, which recipient
agents on the same LAN can use and/or notify regarding spammers, would
likely be employed, thus distributing the burden of identifying
spammers more evenly throughout the Internet (giving spammers fewer
"fat targets" to DDoS), which has positive implications for
reliability (fewer points of failure), security, etc.

--
James Craig Burley
Software Craftsperson
<http://www.jcb-sc.com>

Brian Candler

2006-02-22 20:43:25 UTC

Post by Charles Cazabon

Post by Nathan Cheng
So then perhaps the only benefit of IM2000 schemes over today's
situation is that you get to see the Subject/Sender/To of the email
before retrieving the entire email?

No, the prime benefit is that mail is stored by the sender.
That means the spammer bears the cost of hosting the mail, so spam becomes
less economically attractive.

I argued on my critique page that IM2000 would make spamming far cheaper for
the spammer. They only need to store *one* copy of the spam body on their
disk, and then send out as many notifications as they like. Furthermore,
inactive recipients will not retrieve the mail, so their bandwidth costs
will be minimised too.

Given that they will write their own IM2000 message store optimised for
spamming, they can choose either to send out the same opaque key to all
recipients, or to send out different ones. The latter case lets them track
who has read their spam and when, for a cost of a few bytes of storage per
recipient. A cheap 250GB drive holds a *lot* of recipient information.
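
To put rough numbers on "a few bytes of storage per recipient", here is a small worked example in Python. The 32-byte record size (say, a 16-byte opaque key, an 8-byte timestamp, and 8 bytes of flags or padding) is purely an assumption for illustration; only the 250GB drive size comes from the text above.

```python
DRIVE_BYTES = 250 * 10**9   # the "cheap 250GB drive" mentioned above
RECORD_BYTES = 32           # assumed size of one per-recipient tracking record


def records_per_drive(drive_bytes=DRIVE_BYTES, record_bytes=RECORD_BYTES):
    """How many per-recipient records fit on the drive."""
    return drive_bytes // record_bytes


def bytes_for_campaign(recipients, record_bytes=RECORD_BYTES):
    """Tracking storage needed for one mailing to the given recipient count."""
    return recipients * record_bytes


print(records_per_drive())              # 7812500000
print(bytes_for_campaign(50_000_000))   # 1600000000
```

Under these assumptions a single drive holds tracking records for several billion notifications, and a 50-million-recipient mailing needs about 1.6 GB of tracking data on top of the one shared copy of the message body.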

Regards,

Brian.

Nathan Cheng

2006-02-22 23:23:47 UTC

Post by Brian Candler
I argued on my critique page that IM2000 would make spamming far cheaper for
the spammer. They only need to store *one* copy of the spam body on their
disk, and then send out as many notifications as they like. Furthermore,
inactive recipients will not retrieve the mail, so their bandwidth costs
will be minimised too.
Given that they will write their own IM2000 message store optimised for
spamming, they can choose either to send out the same opaque key to all
recipients, or to send out different ones. The latter case lets them track
who has read their spam and when, for a cost of a few bytes of storage per
recipient. A cheap 250GB drive holds a *lot* of recipient information.

My obfuscation technique would eliminate both of those benefits to
spammers (by multiplying bandwidth usage over current levels & filling
log files with useless information), but would require modifying MTAs
to retrieve every email they handle while masquerading as an MUA. So
definitely not an ideal solution, but a solution nonetheless. If you
can poke holes in it, please do.

Nathan

James Craig Burley

2006-02-23 16:54:55 UTC

Post by Nathan Cheng

It does seem to offer the advantages you cite. By automatically
retrieving ("pulling") the stored message regardless of "need":

- Each intermediate MTA helps confuse a reader of the message-store
log as to whether a given retrieval represents an automated or a
human retrieval.

- The message-store system bears a greater cost for sending a
message that goes through one or more intermediate MTAs when
compared to the current system, because it has to transmit the
entire message contents to each (automatically-retrieving)
intermediate MTA.
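
The second point can be illustrated with a small Python calculation of how many bytes the message store has to serve per recipient. The function name and parameters are hypothetical, and the model ignores stub size, headers, and retries; it only counts full-content retrievals.

```python
def store_bytes_served(message_size, intermediate_mtas, obfuscated,
                       read_by_user=True):
    """Bytes of message content the sender's store serves for one recipient.

    obfuscated=True models every intermediate MTA pulling the full contents;
    obfuscated=False models a plain pull scheme where only the end user does.
    """
    retrievals = 1 if read_by_user else 0
    if obfuscated:
        retrievals += intermediate_mtas
    return message_size * retrievals


print(store_bytes_served(10_000, 3, obfuscated=False))  # 10000
print(store_bytes_served(10_000, 3, obfuscated=True))   # 40000
```

With obfuscation the store also pays for recipients who never read the message (`read_by_user=False` still costs one retrieval per intermediate MTA), which removes the saving from inactive recipients discussed earlier in the thread.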

It also offers disadvantages:

- Participating intermediate MTAs don't save bandwidth over the
current system, when it comes to such a message. In fact, their
per-message burden increases, because they not only have to deal
with the full message contents as with vanilla SMTP, they have to
deal with a stub version of a message *and* at least one DNS
lookup to obtain the full contents -- a lookup based on arbitrary,
attacker-provided data (which has problems I've posted about
before).

- Participating intermediate MTAs sharing a given provider (upstream
bandwidth) would, in fact, cause that provider to also bear a
greater cost for *receiving* a message, since that set of MTAs
would retrieve a message's contents multiple times via the same
pipe.

- Readers of message-store logs would have more visibility into how
MTAs in internal LANs operated, in return for losing clarity with
regard to exactly which retrieval is for human viewing. So it
would potentially be easier to see that recipient "bobsmith" is in
a different department (different second or third intermediate
MTA) than recipient "janejones", unless NAT or other (further)
obfuscation techniques were used (meaning a shared provider bears
a greater burden for multiple retrievals).

- Either network connectivity and message-store availability would
have to be higher, to accommodate a *series* of (arbitrary)
requests from various MTAs, or intermediate MTAs would have to
consider such retrievals to be secondary to their mission of
forwarding the corresponding stub emails in order to survive
failure to retrieve:

o If forwarding stubs depends on obtaining message contents (for
obfuscation purposes), forwarding is delayed or halted at each
hop, depending on how long it takes to obtain the contents, if
they are even available to that intermediate MTA at that time.
An unavailable (even legitimate) outgoing message store thus
causes mail queues to "back up" for *intermediate* MTAs,
rather than at an end-user's Mail Agent (MTA or MUA), delaying
delivery of legitimate mail further, potentially causing
dropped or bounced mail if queues get too large.

o Otherwise -- if forwarding doesn't depend on successful
retrieval of contents -- stub forwarding (to the end user)
might *complete* before even the first participating MTA
finishes obtaining the message contents, which (on the plus
side) "confuses" anyone reading the log file further, but (on
the minus side) means the end-user trying to read a message is
potentially competing with a number of intermediate MTAs for
both upstream (if shared) and downstream (from the
message-content provider) bandwidth to get its own individual
copy of the message contents.

Those last points really get to the crux of the issue: What Problem
Are You Trying To Solve, that you're modifying a Store-And-Forward
protocol (SMTP), which was employed specifically to allow a message to
propagate from A to B to C ... to Z, without there ever having to be
simultaneous connectivity between any *three* points or any two
*non-neighboring* points, such that those very desirable properties of
storing and forwarding are rendered nearly useless (or worse)?

Vanilla IM2000 (and its less-efficient variants, like your *vanilla*
HTMP proposal), on the other hand, offers the rather blatant advantage
that only the *notification* is stored/forwarded, until it reaches the
endpoint. Only when the end user retrieves a message need there be
any connectivity between that user's system (including intermediate
MTAs) and the originating message store. (That still is a heavier
burden on the network than imposed by vanilla SMTP, but less than
imposed by HTMP with the obfuscation method you propose. That is,
given the A->Z forwarding illustrated in the paragraph above, IM2000
adds the requirements that Z be able to contact A when the end user
wants to read a message. Your proposal further adds the requirements
that B, C, ..., and Y all be able to contact A as they propagate a
stub email, aka notification in IM2000-land.)
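
The extra connectivity each scheme demands can be sketched roughly as
follows. This is only an illustration: the function name
`required_links` and the scheme labels are made up here and are not part
of any IM2000 or HTMP specification. Each pair is (host that connects,
host being contacted).

```python
def required_links(hops, scheme):
    """Return the set of (client, server) pairs that must be able to connect.

    hops   -- ordered list of hosts, e.g. ["A", "B", "C", "Z"]
    scheme -- "smtp", "im2000", or "htmp-obfuscated" (illustrative labels)
    """
    origin, final = hops[0], hops[-1]
    # Store-and-forward: each host only ever talks to its next neighbor.
    links = {(hops[i], hops[i + 1]) for i in range(len(hops) - 1)}
    if scheme in ("im2000", "htmp-obfuscated"):
        # The end user's system must reach the originating message store.
        links.add((final, origin))
    if scheme == "htmp-obfuscated":
        # Every intermediate MTA also fetches the contents from the origin.
        for mta in hops[1:-1]:
            links.add((mta, origin))
    return links


hops = ["A", "B", "C", "D", "Z"]
for scheme in ("smtp", "im2000", "htmp-obfuscated"):
    print(scheme, sorted(required_links(hops, scheme)))
```

With plain SMTP the set contains only neighbor-to-neighbor pairs; the
other two schemes add pairs that point back at A.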

When it comes to "privacy", I cannot see how to reconcile the idea
that a recipient must be able to accept delivery of a thing (a
message) without the sender being informed of that fact with the
concepts of reliability and ductility, which require that the sender
assume that the recipient has not taken delivery until a sufficiently
trusted confirmation of such receipt has been received by the sender.

Either the recipient somehow arranges to not make any untrusted
intermediate transfer agent aware of his taking delivery, or the
recipient must accept that the sender might indeed learn not just
*that* delivery was taken, but, to some approximations, *when* and
*where* it was taken.

In the former situation, the recipient must accept that a sufficiently
motivated sender will repeatedly attempt delivery until either receipt
is confirmed (via ordinary confirmation, or some other event,
observable by the sender, occurs that confirms receipt, such as
receiving a response to an email or seeing a web site get updated) or
will try to use some other means to transmit the message.

Since spammers don't generally care to use other means or go to any
trouble to gather data on receipt of individual messages, probably the
best approach is for recipients to be given the option of whether to
confirm receipt on a per-message basis, letting their MUA prioritize
or categorize messages to make such confirmation automatic or manual,
and perhaps letting the MUA use simpler obfuscation techniques (like
time-shifting confirmation of receipts).

--
James Craig Burley
Software Craftsperson
<http://www.jcb-sc.com>

Bryan Campbell

2006-02-23 17:57:02 UTC

This is all very insightful and entertaining, but, could we just stop.

It is obvious that there are a great many ways to exploit e-mail
systems. The current systems are abused daily. And, we can all think
of ways in which IM2000-ish stuff could be abused. So, why don't we
think outside the box a bit.

For example.

Can we augment, or work around, the current e-mail systems' MTAs and MUAs
to give end-users anti-spam features without having to re-write the
whole guts of the internet?

For example . . . use a formatted message notification with checksums,
signatures and other information. Make a Thunderbird/OE/O/whatever MUA
plugin which will recognize a properly formatted notice and mark it as a
properly formatted notice. The notice must have truly identifying
information in it which cannot be forged. Within the notice could be an
embedded link to allow for the person who receives the message to
acknowledge that they really do want the message. Once the link is
selected the sending MTA could release the message from queue to be sent
with credentials which match the notification. The message should
arrive in moments and be found authentic by the plugin.
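
One way such a notice check might look is sketched below. It is a guess
at the mechanism, not a specification: it assumes the sender and
recipient already share a secret key (a real design would more likely
use public-key signatures), and the field names and the helpers
`make_notice`, `notice_is_valid` and `message_matches_notice` are
hypothetical.

```python
import hashlib
import hmac
import json


def make_notice(sender, body, key):
    """Build a notice carrying a checksum of the body and a MAC over the fields."""
    fields = {"sender": sender, "sha256": hashlib.sha256(body).hexdigest()}
    payload = json.dumps(fields, sort_keys=True).encode()
    fields["mac"] = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return fields


def notice_is_valid(notice, key):
    """Recompute the MAC over every field except 'mac' and compare."""
    fields = {k: v for k, v in notice.items() if k != "mac"}
    payload = json.dumps(fields, sort_keys=True).encode()
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, notice.get("mac", ""))


def message_matches_notice(notice, body):
    """Check that the message released later is the one the notice described."""
    actual = hashlib.sha256(body).hexdigest()
    return hmac.compare_digest(notice["sha256"], actual)


key = b"shared-secret-for-illustration-only"
notice = make_notice("alice@example.org", b"Lunch on Friday?", key)
print(notice_is_valid(notice, key))
print(message_matches_notice(notice, b"Lunch on Friday?"))
```

A check like this shows only that the notice was not altered and that
the later message matches it; it says nothing about whether the sender
is someone the recipient wants to hear from.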

I am sure that everyone can find ten-dozen things wrong with this idea.
But, it is better than trying to figure out how all these MTAs in the
world will figure out what to do with e-mail without user intervention.
I for one would be happy to see a system which required user
intervention. Send me a notice . . . I won't care. If everyone I know
used notices to send e-mail, I would never accept open e-mail, without
notices, again.

Here is an idea, make software capable of running as a lightweight MTA
(with smarthost) on the client machine that the plugin manages. Then
the MTAs on the internet don't have to change at all.

Then when we are all done, we don't have to force our ideas upon the
entire internet mail system. We just work through it. People will have
less spam. Spammers can't send you anything you don't want to see. You
could even put other limitations on the e-mail. For example, if you
controlled the sending MTA/MUA the notice could provide for the
acceptance of attachments.

Just don't make this so complicated that grandma can't use it. In fact,
if you could make it grandma proof, the entire world might just adopt it.

And, someday, I can just stop accepting mail which doesn't have a
notice. Wouldn't that be nice.

Bryan -

***@misn.com

Nathan Cheng

2006-02-23 18:41:17 UTC

It seems to me you're still thinking inside the IM2000 box, which is
definitely forgivable, since we're having this discussion on the
IM2000 mailing list.

I think your idea is similar to DMTP. I don't know if that has been
discussed previously on this list; here's the link:

http://www.ietf.org/internet-drafts/draft-duan-smtp-receiver-driven-02.txt

Your MTA-as-client-plugin idea is interesting. I'll think about that
some more...

Nathan

Brian Candler

2006-02-23 20:16:32 UTC

Post by Bryan Campbell
This is all very insightful and entertaining, but, could we just stop.
It is obvious that there are a great many ways to exploit e-mail
systems. The current systems are abused daily. And, we can all think
of ways in which IM2000-ish stuff could be abused. So, why don't we
think outside the box a bit.
For example.
Can we augment, or work around, the current e-mail systems' MTAs and MUAs
to give end-users anti-spam features without having to re-write the
whole guts of the internet?
For example . . .

. . . see http://www.rhyolite.com/anti-spam/you-might-be.html :-)

Seriously: consider the question, "what do people want from E-mail?" At the
basic level for this discussion, I'd say the key ones are:

1. People want to receive E-mail from their real-life buddies and
colleagues, plus people they've only met via the Internet (e.g. on mailing
lists, E-bay, dating sites, chat rooms etc), without any of it being lost,
and preferably with rapid delivery.

2. People don't want to receive E-mail from scammers, fraudsters, phishers,
worms, viruses, etc.

3. People may or may not want to receive advertising or special offers from
legitimate businesses [this is a personal choice]

So setting aside point (3) for the moment, they want an E-mail system which
can tell the difference between mail from someone who is friendly or somehow
beneficial to hear from, and someone who is malicious or irrelevant. I
assert that there is not a technical solution to this problem.

Post by Bryan Campbell
use a formatted message notification with checksums,
signatures and other information. Make a Thunderbird/OE/O/whatever MUA
plugin which will recognize a properly formatted notice and mark it as a
properly formatted notice. The notice must have truly identifying
information in it which cannot be forged.

What sort of identity? A government-certified "this person's birth
certificate contains the name Thelonius Q. Wildebeest?" Or an ISP-certified
"this person has E-mail address ***@hotmail.com"?

In the first case, it means that once a spammer has been identified, you can
permanently block all mail from that person ever again. However, I believe
that people won't use E-mail if it involves visiting their local government
agency in person and producing their identity credentials just to get a
certified E-mail address.

In the second case, such E-mail addresses are trivially obtainable in bulk
by spammers. Therefore, the test "does this message come from a valid E-mail
account?" in no way corresponds to "does this message come from a friend or
a foe?"

Consider the situation say 10 years ago. Spammers used to send spam with
harmless but invalid return addresses, such as

MAIL FROM:<***@hijklmnop>

The logic then went:

1. Almost all of my spam has an invalid domain in the return address.
2. All my non-spam has a valid domain in the return address. [*]
3. Therefore, if I block all mails with invalid domains in the return
address, then I will block almost all of my spam.

Lovely, so that's what everyone did. Of course, the spammers instantly
retaliated by changing their return addresses to use valid domains, or even
valid E-mail addresses (after all, big lists of valid E-mail addresses are
core business for a spammer). And so the "Joe Job" was born.

This is why sender validation techniques are bound to fail as spam control
measures. As long as spammers can create accounts at will, and as long as
there are thousands of ISPs in the world, most of whom don't bother to
process their abuse mail or to limit account creation, new E-mail addresses
will appear all the time. When you receive a mail from a new identity, how
can you know whether it's a friend or foe?

But it seems people are determined to implement this sort of thing (witness
DomainKeys and the like). So what happens when that's implemented? Well, the
next domino topples. Take Challenge-Response systems, for example. The logic
goes:

1. Almost all of my spam has a forged return address.
2. All my non-spam has a valid return address. [*]
3. Therefore, if I send a challenge to the return address, and the person
replies to me again, I can prove they sent the mail in the first place and
are not a spammer.

But by now, we've forced spammers to send from their own real accounts
(obtained by various means listed in a previous mail), which they can
generate on demand. So now of course they can collect and process the
challenges, and set up automated systems to send the responses, and so
spammers become indistinguishable from non-spammers to these systems.
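
The weakness can be shown with a toy model. This is a minimal sketch
under made-up assumptions: the addresses are invented, and the
`answers_challenge` table stands in for "does anything at this return
address reply to a challenge?"

```python
def passes_challenge(return_address, answers_challenge):
    """Accept a message if its return address replies to the challenge.

    answers_challenge maps address -> bool; unknown addresses never reply.
    """
    return answers_challenge.get(return_address, False)


# Hypothetical senders (all addresses are made up for illustration).
answers_challenge = {
    "friend@example.org": True,        # a person replies by hand
    "forged@nowhere.invalid": False,   # old-style spam with a forged address
    "bulk-0001@example.com": True,     # spammer's real account, auto-responder
}

accepted = [addr for addr in answers_challenge
            if passes_challenge(addr, answers_challenge)]
print(accepted)
```

The filter stops only the forged address. The friend and the spammer's
automated account pass the same test, because the test measures whether
the address answers, not who is behind it.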

[*] Actually, in my experience this is not true. I used to work for a UK ISP
which had a lot of old customers under a particular domain, I'll call it
example.net. Looking at mail relay logs, I could see mails from

***@subdomain.example.net
***@examplee.net

which were basically misconfigurations of the client's MUA. Mails could go
out normally, but they would never see bounces. Nowadays, many E-mail
systems will reject their mail - but because the return address is bad, the
sender will never know. The message is simply blackholed.

The point here is: the test "is this sending machine properly configured?"
is no good as a test for "is this a friend or foe?" either. The same applies
to trying to cross match domains in HELO/EHLO to IP addresses, for example
(which incidentally is *not* required by RFC to happen in any case).

Post by Bryan Campbell
Within the notice could be an
embedded link to allow for the person who receives the message to
acknowledge that they really do want the message.

And how do you know whether you want each message or not? What information
is presented to the user to enable them to make that choice?

The information I currently get in my MUA maildrop listing comprises
(claimed) sender name, and subject header. I certainly *can* make a decision
based on that as to whether I want the message or not, but it's a PITA to
have to keep doing that, and occasionally I do skip over mails which I
really wanted to see.

Regards,

Brian.

Nathan Cheng

2006-02-23 18:34:31 UTC

James,

Post by James Craig Burley
- Participating intermediate MTAs don't save bandwidth over the
current system, when it comes to such a message. In fact, their
per-message burden increases, because they not only have to deal
with the full message contents as with vanilla SMTP, they have to
deal with a stub version of a message *and* at least one DNS
lookup to obtain the full contents -- a lookup based on arbitrary,
attacker-provided data (which has problems I've posted about
before).

It seems to me that HTMP even *with* obfuscation will still, on
average, decrease the per-message bandwidth burden for intermediate
MTAs, when taking both upstream and downstream bandwidth into account.
Or is there some reason of which I am unaware that it is proper to
exclude upstream bandwidth from your calculation?

Thanks,

Nathan

James Craig Burley

2006-02-23 20:12:49 UTC

Post by Nathan Cheng

I think you're right. Let me walk it through to see where I went
wrong.

In SMTP, the entire payload -- message notification, message contents,
and responsibility for delivery -- is (typically) transmitted from
hop to hop:

A -> B -> C -> ... -> Z

So, for a 10MB message, with negligible additional notification goo
(envelope sender and recipient, SMTP handshaking), an intermediate MTA
such as C must:

- Accept 10MB worth of data coming down the pipe from upstream (B)

- Save 10MB worth of data on a reliable persistent storage device
(usually a hard drive)

- Transmit 10MB worth of data further downstream (to D)

With vanilla IM2000 or HTMP, replace 10MB above with, say, 1KB or so
-- nice savings. Also, perhaps delete the "Save..." item, depending
on responsibility issues. Nice win there as well, for each
intermediate MTA.

But add obfuscation to HTMP (or IM2000 -- basically it'd be the same
deal to achieve the same stated goal), and the amount in the
"Accept..." item above goes back to 10MB, plus change, plus at least
one DNS lookup.

What does *not* happen is that the amount in the "Transmit..." item
changes back to 10MB. D doesn't need to get the 10MB from C; rather,
it gets it directly from A. All C transmits downstream to D is the
message notification (and maybe delivery responsibility). Only if D
is downstream of the same Internet-connectivity pipe as C does the
bandwidth savings disappear -- but I had already covered that
possibility in my previous post.

So, indeed, you're right, and I'm wrong -- a typical intermediate MTA,
employing retrieval solely (or largely) for obfuscation purposes, cuts
its *combined* upstream and downstream bandwidth utilization roughly
in half, when compared to SMTP. However, it still increases it
substantially (in direct proportion to message-content size) over
vanilla IM2000/HTMP.
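
The walk-through above can be put into numbers with a small sketch. It
assumes a 10MB message and a 1KB stub, as in the example, and ignores
the DNS lookup, the retrieval request itself and SMTP handshaking; the
function name `mta_traffic` and the scheme labels are illustrative
only.

```python
def mta_traffic(message_bytes, stub_bytes, scheme):
    """Bytes one intermediate MTA receives and sends for a single message.

    Returns (received, sent, total).
    """
    if scheme == "smtp":
        # Full contents come in from upstream and go out downstream.
        received, sent = message_bytes, message_bytes
    elif scheme == "im2000":
        # Only the notification is stored and forwarded.
        received, sent = stub_bytes, stub_bytes
    elif scheme == "htmp-obfuscated":
        # Stub arrives from upstream, contents are pulled from the origin,
        # but only the stub is passed downstream.
        received, sent = stub_bytes + message_bytes, stub_bytes
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return received, sent, received + sent


MB, KB = 1_000_000, 1_000
for scheme in ("smtp", "im2000", "htmp-obfuscated"):
    received, sent, total = mta_traffic(10 * MB, 1 * KB, scheme)
    print(f"{scheme:16} received={received:>10} sent={sent:>10} total={total:>10}")
```

Under these assumptions the obfuscating MTA's total comes to about half
of the SMTP total, and several thousand times the vanilla IM2000/HTMP
total.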

Having thought about the underlying (privacy) issues more since my
previous post, it seems the more pertinent incompatibility to which I
alluded near the bottom of the post is between privacy and *retrieval*
(in pull-based systems like IM2000, HTMP, and HTTP).

In push-based systems (like SMTP), the message contents having been
delivered to an intermediary (such as a mailbox) that the *recipient*
has already trusted to preserve privacy and the like, the message may
be retrieved at any time, by that recipient, without the sender
necessarily knowing it has happened. That's because the sender has
little or no control over the message store.

But, as we can see in the above walk-through, push-based systems
require the entire message contents be "pushed" from hop to hop,
though that naturally helps obscure details about the recipient's
activities (and about intermediate MTAs as well).

--
James Craig Burley
Software Craftsperson
<http://www.jcb-sc.com>

Nathan Cheng

2006-02-23 21:18:16 UTC

...a typical intermediate MTA,
employing retrieval solely (or largely) for obfuscation purposes...

Note that even if the benefits of obfuscation prove to be little or
none, the obfuscation technique would still be useful simply as a way
to spite the sender, which 99% of the time is a spammer.

For most legitimate email senders, even if the amount of bandwidth they
use for sending emails increases tenfold, it would still be much much
less than the amount of bandwidth they use for other purposes, e.g. web
browsing, audio, video, games, graphics, etc. Perhaps this is not as
much the case in 3rd world countries yet, but in 2nd world countries I
think it is already the case (not to mention 1st world), and I don't
think the 3rd world is too far behind.

Nathan

Nathan Cheng

2006-02-23 21:38:25 UTC

...more thoughts on "obfuscation as spite":

MTA-level whitelists would then be known as "non-spitelists"--messages
stored on whitelisted domains would not be "verified" thereby
affording them all the bandwidth benefits IM2000 has to offer. Of
course, the blacklist equivalent is the dreaded "spitelist", where the
message is retrieved X >> 1 times and then trashed.

So maybe eventually all spammers will find ways to be on whitelists.

Then final destination MTAs like Yahoo could offer a service to paying
email customers to request every email message 10000 times before
inboxing it (and if the sender thinks he's smart and returns 1 byte
the first 999 times and the entire message the 1000th time, guess
what? his message isn't going into the Inbox). Yahoo would have a
massive array of computers basically doing a limited form of DoS on
senders.

Nathan

Brian Candler

2006-02-23 22:14:20 UTC

Post by Nathan Cheng
Then final destination MTAs like Yahoo could offer a service to paying
email customers to request every email message 10000 times before
inboxing it

... and find themselves blackholed by the rest of the Internet for wasting
stupid amounts of legitimate senders' bandwidth?

You could already do the same with URLs embedded in spams - i.e. fetch the
page contents 10,000 times before displaying them to the user.

If the spammer's webpage is hosted on some 3rd party server (typical 'free
homepages' service) then the losers are:

(a) the free homepages service. "Serves them right for hosting a spammer's
page" you say. I'm not sure that's true, but if it is, there are less
wasteful ways of blackholing the site.

(b) the machine(s) fetching the page 10,000 times, who between them will
have to bear the bandwidth cost too.

In fact, the only person who doesn't lose out is the spammer.

Regards,

Brian.

Nathan Cheng

2006-02-23 22:22:48 UTC

You're right, the 1000 request thing isn't such a good idea, or else
it would probably already be popular with forms of spam that are
currently susceptible to that. In any case, I think spammers would at
least lose out indirectly because people would be more incentivized to
fix compromised servers and not provide free service to spammers.

James Craig Burley

2006-02-24 01:24:40 UTC

Post by Nathan Cheng

...a typical intermediate MTA,
employing retrieval solely (or largely) for obfuscation purposes...

Note that even if the benefits of obfuscation prove to be little or
none, the obfuscation technique would still be useful simply as a way
to spite the sender, which 99% of the time is a spammer.

Yes. There are lots of techniques like that, which seem like they
would "fight" spam or "punish" spammers, by making email exchange
artificially more expensive.

I've come to believe making email exchange artificially more expensive
or less reliable -- especially structurally -- is not the right
direction for the industry, or, at least, not a direction in which I
personally have much interest.

(Note that you can "spite" a sender in SMTP by temporarily rejecting
each and every delivery after the DATA phase, and by also having lots
of spamtraps.)

Post by Nathan Cheng
For most legitimate email senders, even if the amount of bandwidth they
use for sending emails increases tenfold, it would still be much much
less than the amount of bandwidth they use for other purposes, e.g. web
browsing, audio, video, games, graphics, etc. Perhaps this is not as
much the case in 3rd world countries yet, but in 2nd world countries I
think it is already the case (not to mention 1st world), and I don't
think the 3rd world is too far behind.

There's an insight there that I think is very important, and I believe
I've already touched on it before: of those sending emails, *only*
spammers fit a particular profile, that of trying to send a very large
number of (fundamentally) identical emails to as large a potential
audience as inexpensively as possible, with little or no concern for
whether any given recipient actually sees that email.

(Group deliveries of legitimate notifications such as "you requested
monitoring of www.example.com; it was just updated" come close to that
kind of behavior, but not all *that* close. For one thing, recipients
have presumably already signed up to receive such emails, so spamtraps
are unlikely to be among the list of recipients.)

Artificially increasing the cost of exchanging email via techniques
that are designed with that insight in mind is one way to "solve" the
spam problem. Especially if it takes advantage, and hence does not
require substantial replacement, of SMTP, that approach has
attractions.

Another approach is to fundamentally change how email is (typically)
exchanged, such that it better favors legitimate exchanges of email
over those who fit the spammer profile shown above than today's
typical SMTP exchanges.

I believe both SMTP and IM2000 can be "adapted" in that fashion,
though neither in as ideal a fashion as an entirely new protocol,
and not by adding all sorts of artificial barriers to exchanging email
but, rather, by *lowering* some existing ones.

Post by Nathan Cheng
So maybe eventually all spammers will find ways to be on whitelists.

Which brings us to the zombie problem, which I believe can be "solved"
(without simply defining it away) only via an appropriately adapted
existing protocol or a properly-designed new protocol, neither of
which is particularly on-topic here. (The idea I have is to increase
requirements for zombie, or untrusted-bulk-sender, disk space
substantially, in conjunction with the increases in bandwidth
utilization we've been discussing.)

--
James Craig Burley
Software Craftsperson
<http://www.jcb-sc.com>