Jamming the spammers

15 Jul 2006

Ingenious spam techniques require anti-spam firms to stay one step ahead, says Ambarish Deshpande, regional director India & SAARC, IronPort Systems.

The volume of spam has increased steadily every year since 2002. Alongside sheer volume, the sophistication of spammer tactics has also grown. This flood of illegitimate email is propelled by a powerful motive – profit. Spammers make money from a wide array of offerings, ranging from marginal products such as herbal supplements, low-interest mortgages and ergonomic computer products to criminal activities such as credit card fraud, pornography and illegal pharmaceutical sales. The profits behind these endeavours are being ploughed back into new technology and infrastructure for delivering spam.

When spam initially became a pandemic, corporations and networks began to deploy first-generation spam filters. These filters primarily relied upon heuristic analysis – looking at the words in a message and using a weighting system to create a probability that the message was spam.
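As a rough illustration of the weighting idea, the sketch below sums per-word weights and squashes the total into a probability. The word list, the weights, the bias and the function name are all invented for this example and are not taken from any real filter.

```python
import math
import re

# Hypothetical word weights, invented for illustration only.
# Positive weights push a message towards "spam", negative towards "not spam".
WEIGHTS = {
    "cheap": 1.5,
    "viagra": 3.0,
    "mortgage": 1.2,
    "meeting": -1.0,
    "invoice": -0.5,
}
BIAS = -2.0  # assumed starting score, so unknown mail leans towards "not spam"


def spam_probability(message: str) -> float:
    """Sum the weights of the words in the message and map the total to 0..1."""
    words = re.findall(r"[a-z]+", message.lower())
    score = BIAS + sum(WEIGHTS.get(word, 0.0) for word in words)
    return 1.0 / (1.0 + math.exp(-score))  # logistic function


print(spam_probability("Cheap Viagra"))              # high
print(spam_probability("Meeting invoice attached"))  # low
```

Because the score depends only on the words the filter can read, anything that changes or hides those words changes the verdict.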

As these anti-spam solutions became more widespread, spammers began to develop new, more sophisticated tactics to circumvent the filters. This spawned a cat-and-mouse game in which spammers would develop a new tactic to get past filters, anti-spam vendors would add a new technique to their "cocktail" to stop the spammers, and spammers would then come out with a new tactic to get past even these filters, and so on.

Recently, spam has been using increasingly sophisticated obfuscation techniques and mutating faster than ever. Most spam now includes blocks of text containing words known to score as "not spam", often technical terms or a passage from a textbook. Other tricks include hiding words as white-on-white text or replacing letters with numbers. Spammers have also become increasingly clever in their use of URLs. Some spam contains minimal content but includes a URL with a call to action, while other spam attacks host their URLs on the same servers used by legitimate websites, using free web hosting services such as Geocities.
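To make the letter-for-number trick concrete, here is a minimal sketch of how a filter might try to undo it before scoring a message. The substitution map, the pattern and the function name are assumptions made for this example. A real filter would need a far larger map and more care, since this version also rewrites digits in legitimate numbers and could merge genuine runs of single letters.

```python
import re

# Hypothetical look-alike map; real spam uses many more variants.
LOOKALIKES = str.maketrans(
    {"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t", "@": "a", "$": "s"}
)

# Runs of single letters split by one space or punctuation mark, e.g. "v.i.a.g.r.a".
SPACED_OUT = re.compile(r"\b(?:[a-z][\s.\-_*]){2,}[a-z]\b")


def normalize(text: str) -> str:
    """Lower-case the text, map look-alike characters back to letters, and rejoin spaced-out words."""
    text = text.lower().translate(LOOKALIKES)
    return SPACED_OUT.sub(lambda m: re.sub(r"[\s.\-_*]", "", m.group(0)), text)


print(normalize("Ch3ap V.i.a.g.r.a"))
```

Each such rule handles only the tricks its author has already seen, which is why spammers can keep mutating faster than rules like this are written.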

These obfuscation techniques have effectively defeated most content-based filters. While most vendors still claim spam capture rates in the high 90s, in reality their capture rates may be in the 80s (or worse). At the same time, content-based filters occasionally delete legitimate mail that happens to contain words associated with spam, creating a "false positive". The table below highlights the evolution of spam filtering, along with the limitations of each approach.

| Generation | Limitations | Example |
| --- | --- | --- |
| 1. Heuristics | Spoofable: spammers change words so filters don't recognize spam but humans do. False positives: legitimate email often contains "spammy" words. | "C H EAP V.i.a.g.r.a" |
| 2. Signatures | Spoofable: hashbusters fool bulk detection systems by making spam look dissimilar. Reactive: writing signatures first requires collecting spam samples. | "Cheap Viagra dgjk#" |
| 3. Adaptive | Spoofable: defeated by inserting good words that only machines see. High overhead: learning systems, like Bayesian, are hard to train/maintain. | "Cheap Viagra here: http://abc.com Cancer, office, Shakespeare." |
| 4. Context Adaptive | Emerging: requires extensive vendor investment in tracking email and Web reputation. | |
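The hashbuster limitation in the table can be shown with a short sketch. It assumes, purely for illustration, that a signature is an exact SHA-256 hash of the message text. Real signature systems may use fuzzier matching, and the function names here are hypothetical.

```python
import hashlib


def signature(message: str) -> str:
    """Exact-match signature: a SHA-256 digest of the message text (an assumption for this sketch)."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()


# Signatures can only be written after a sample has been collected.
known_spam = {signature("Cheap Viagra")}


def is_known_spam(message: str) -> bool:
    return signature(message) in known_spam


print(is_known_spam("Cheap Viagra"))        # the collected sample matches
print(is_known_spam("Cheap Viagra dgjk#"))  # a few random characters produce a different hash
```

A handful of random characters appended to each copy is enough to make every message look new to an exact-match system, even though a human reads the same offer.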

IronPort Systems' latest industry research shows an increased prevalence of "image-based spam" – an advanced technique that spammers have adopted to evade detection. Image-based spam contains little or no text to analyse, instead including a .gif or .jpeg file with an image, and so bypasses both traditional content scanning and signature scanning.

The image contains the spam message in the form of text and graphics, similar to an HTML email, making it difficult for a machine to recognize the text. Image-based spam has exploded, growing from less than 1 per cent of all spam in June 2005 to more than 12 per cent of all spam in June 2006.

This represents more than five billion image-based spam messages sent per day – 78 per cent of which pass right through first- and second-generation spam filters. The study was conducted using SenderBase data, which represents 25 per cent of the world's email traffic and includes data from more than 100,000 ISPs, universities, and corporations around the world.
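For a sense of scale, the figures above can be combined in a back-of-the-envelope calculation. The inputs are the article's "more than" figures treated as exact values, so the outputs are rough indications only, not reported statistics.

```python
# Inputs quoted in the article, treated as exact for this rough estimate.
image_spam_per_day = 5e9   # "more than five billion" image-based messages per day
image_share = 0.12         # "more than 12 per cent" of all spam in June 2006
pass_rate = 0.78           # share passing first- and second-generation filters

# Image-based spam that gets past the older filters each day.
passing_per_day = image_spam_per_day * pass_rate

# Total daily spam volume implied by the two "more than" figures.
implied_total_per_day = image_spam_per_day / image_share

print(f"Image spam passing older filters: about {passing_per_day / 1e9:.1f} billion per day")
print(f"Implied total spam: about {implied_total_per_day / 1e9:.1f} billion per day")
```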

In late 2005, spam volumes were still increasing, but the growth rate began to decline from the 100 per cent-plus that spam volumes had sustained for the two previous years. This respite was brief. Over the last six months, spam volumes have resumed their hypergrowth.

In just two months, from April 2006 to June 2006, spam volumes surged 40 per cent worldwide. At the same time, spammers are focusing the intensity of their attacks. When sophisticated spammers launch a new wave of randomised image spam, they typically target a specific geographical area, an ISP or even a single enterprise.

When this happens, as much as 50 per cent of the incoming spam at a corporation is image-based. If the filter protecting that corporation is not equipped to detect and block these highly sophisticated attacks, end-users are deluged with spam for the duration of the attack, causing severe communication disruptions and major productivity losses.

Today's spam attacks have become too sophisticated for earlier-generation spam systems. These systems share a common weakness – they rely heavily on analysing content that can easily be manipulated by a spammer. State-of-the-art anti-spam systems must go beyond content analysis and analyse messages in the full context in which they are sent.
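One way to picture analysis "in full context" is to blend the content score with reputation signals about the sender and any linked website. The sketch below is a hypothetical illustration only. The field names, the weights and the simple weighted sum are assumptions, not a description of IronPort's or any other vendor's system.

```python
from dataclasses import dataclass


@dataclass
class MessageContext:
    content_score: float      # 0 (clean) .. 1 (spammy), from a content filter
    sender_reputation: float  # 0 (bad) .. 1 (good), hypothetical email reputation
    url_reputation: float     # 0 (bad) .. 1 (good), hypothetical Web reputation


def spam_score(ctx: MessageContext, weights=(0.3, 0.4, 0.3)) -> float:
    """Weighted blend of the signals; the weights are invented for illustration."""
    w_content, w_sender, w_url = weights
    return (
        w_content * ctx.content_score
        + w_sender * (1.0 - ctx.sender_reputation)
        + w_url * (1.0 - ctx.url_reputation)
    )


# Image spam: almost no text to score, but a poor sender and a poor linked site.
image_spam = MessageContext(content_score=0.1, sender_reputation=0.05, url_reputation=0.1)
# Legitimate mail that happens to contain "spammy" words.
legit_mail = MessageContext(content_score=0.6, sender_reputation=0.95, url_reputation=0.9)

print(spam_score(image_spam) > 0.5)  # flagged despite the clean content
print(spam_score(legit_mail) > 0.5)  # passed despite the spammy words
```

The point of the design is that a spammer controls the content of a message completely but has much less control over the track record of the machines that send it.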

Maintaining leading efficacy also requires publishing high-quality rules in near real time. Rule quality is driven by the size, breadth, and quality of the data that feeds the rule generation system. Finally, the most effective rule development systems have humans in the loop – analysing and responding to the last few per cent of spam messages that escape automated defences.