Hunting the Red Fox Online: Understanding and Detection of Mass Redirect-Script Injections

The “Red Fox”

For years, the Internet community has been haunted by increasingly sophisticated and organized cybercrime, ranging from exploits on vulnerable systems (e.g., drive-by downloads) to all kinds of fraud and social engineering. Such criminal activities have developed into mass underground businesses, costing the world hundreds of billions of dollars every year and victimizing hundreds of millions of Internet users. Crucial to their operations is the existence of a large number of vulnerable websites, which can be easily compromised on a large scale and converted into web redirectors. These redirectors serve as doorways to a complicated web infrastructure that delivers malicious payloads to victims, and they play a critical role in keeping more expensive criminal assets (e.g., exploit servers) hidden in the shadows.

Timely detection and recovery of those compromised websites deprives cybercriminals of their major resource for luring visitors, and can potentially disrupt this portion of the underground business. Developing effective techniques for this purpose, however, is a daunting task. Unlike the web servers hosting malicious content such as exploit kits, those redirectors are just ordinary websites with a few injected redirect scripts, and redirect scripts can also appear on legitimate sites. Further complicating this effort, criminals increasingly place their redirect scripts within JavaScript (JS) files on a compromised site. Unlike other web documents such as HTML, JS files are not indexed by Google and other search engines, so infections in them are even more difficult to find. Most importantly, those redirectors are easy to collect and often expendable to the attackers, so any heavyweight detection technique struggles to keep up with the pace at which new sites are recruited. As a result, even though progress continues to be made in analyzing and detecting malicious content hosts, compromised web redirectors remain an elusive “red fox” that is difficult to hunt down.

Catching the red fox – JsRED

We introduced a new perspective on this problem: the strategies those criminals adopt to inject redirect scripts, which reflect the constraints they face. By inspecting 436,869 recently collected infected files, we found that the “red fox” indeed has several unique, previously unknown features. In particular, to deploy his redirect scripts quickly, the attacker tends to inject them blindly into various files (JS files as well as HTML files) on a compromised site. This needs to be done carefully, avoiding any interference with the website’s normal operations, so that the presence of the malicious code stays hidden. Also, a significant portion of infected JS files are public JS libraries (JS-libs), owing to their dominant presence on legitimate websites (at least 60%). Web developers either do not change these libraries at all or modify them in a way completely different from what the attacker does.

Leveraging those unique features, we developed a new, lightweight technique for catching this red fox. Our solution, called JsRED, is designed to automatically detect unknown redirect scripts on a large scale. Our idea is based upon the observation that similar scripts are blindly inserted into JS-libs, other JS files, HTML files, etc. on compromised web servers. Among them, the clean versions of the JS-libs are publicly available, and the copies deployed on websites are often left unchanged by the developer or customized by adjusting just a few parameters. Therefore, we can compare a JS-lib file crawled from a website with its clean references to extract the difference, and further analyze it statically and dynamically to determine whether the difference is actually a redirect script. Because a legitimate customization of a JS-lib extremely rarely adds nothing but redirect code, a script identified this way is almost certain to be malicious. Given the blind injection strategy the attacker takes to make his campaign scalable, the detected code can then be generalized into a template for scanning non-lib JS files, HTML files and other content to catch their infections.
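The comparison step described above can be illustrated with a minimal sketch. It is not JsRED's implementation: it uses Python's standard difflib at line granularity as a stand-in for the diff tool, and the `REDIRECT_HINTS` pattern, the function names and the sample strings are all hypothetical, chosen only to show the idea of extracting what was added to a clean library and checking whether it looks like a redirect.

```python
import difflib
import re

# Hypothetical, deliberately tiny list of redirect primitives (an assumption
# for this sketch; a real verifier would analyze the code far more thoroughly).
REDIRECT_HINTS = re.compile(
    r"location\.(href|replace)|document\.write|<iframe", re.IGNORECASE
)

def extract_added_code(reference: str, crawled: str) -> list[str]:
    """Return the chunks present in `crawled` but absent from `reference`."""
    ref_lines = reference.splitlines(keepends=True)
    new_lines = crawled.splitlines(keepends=True)
    matcher = difflib.SequenceMatcher(None, ref_lines, new_lines, autojunk=False)
    added = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert" and "replace" both mark content that the reference lacks.
        if tag in ("insert", "replace"):
            added.append("".join(new_lines[j1:j2]))
    return added

def looks_like_redirect(snippet: str) -> bool:
    """Crude static check: does the snippet use a common redirect primitive?"""
    return bool(REDIRECT_HINTS.search(snippet))

# Made-up example: a clean "library" and a crawled copy with an appended script.
reference = "function add(a, b) { return a + b; }\n"
injected = 'document.write("<iframe src=\'http://example.invalid/\'></iframe>");\n'
crawled = reference + injected

suspicious = [s for s in extract_added_code(reference, crawled) if looks_like_redirect(s)]
```

Here `suspicious` ends up holding only the appended line, since the library body matches its clean reference. A line-level diff is a simplification: injected code in minified libraries may sit on the same line as legitimate code, which is one reason a character-level diff is a better fit in practice.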

[Figure: hunt_the_red_fox (the design of JsRED)]

The design of JsRED is illustrated in the above figure. JsRED has a mechanism that gathers a set of clean JS-libs as references. Each such reference is then compared with every JS file crawled from the web, using a scalable Bloom-filter inclusion analysis that measures the proportion of the reference present in the JS file (Section III-B of the paper). When most of the reference is found, we further run google-diff-match-patch, a code-diff tool, to extract the differing part of the code (diff for short) from the JS file. The diff obtained this way is sent to a verifier module that analyzes the code both statically and dynamically: once it is found to be a redirector, we consider a malicious script detected. All such scripts are then grouped using hierarchical clustering, and a signature is generated for each cluster. Those signatures are used to scan other crawled web content, HTML as well as JS, to identify other infected files. In this way, we can identify infected websites on a large scale, even when their infections have never been seen before.
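To make the inclusion analysis more concrete, here is a rough sketch of how a Bloom filter can estimate the proportion of a reference that appears in a crawled file. The details are assumptions made for illustration, not taken from the paper: the 8-character shingles, the filter size, the SHA-256-based hashing, and the names `BloomFilter`, `shingles` and `inclusion_ratio` are all hypothetical.

```python
import hashlib

class BloomFilter:
    """Small Bloom filter; the size and hash count are illustrative choices."""

    def __init__(self, size_bits: int = 1 << 16, num_hashes: int = 4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

def shingles(text: str, k: int = 8) -> set[str]:
    """Overlapping k-character windows (an assumed way to fingerprint code)."""
    return {text[i:i + k] for i in range(max(len(text) - k + 1, 1))}

def inclusion_ratio(reference: str, crawled: str, k: int = 8) -> float:
    """Estimate the fraction of the reference's shingles found in the crawled file."""
    bloom = BloomFilter()
    for s in shingles(crawled, k):
        bloom.add(s)
    ref = shingles(reference, k)
    return sum(1 for s in ref if s in bloom) / len(ref)

# Made-up example: the crawled file contains the whole reference plus extra code.
reference = "function add(a, b) { return a + b; }\n"
crawled = reference + 'document.write("<iframe src=\'http://example.invalid/\'></iframe>");\n'
unrelated = "var totallyDifferent = 42; console.log(totallyDifferent);\n"

ratio_infected = inclusion_ratio(reference, crawled)
ratio_unrelated = inclusion_ratio(reference, unrelated)
```

A Bloom filter never produces false negatives, so a file that contains the entire reference scores exactly 1.0; false positives can only push the ratio up slightly. Files scoring above some threshold (the threshold itself would need tuning) would then be passed on to the more expensive diff step.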

Detection and findings

We implemented JsRED and evaluated it over 1,129,988 JS and HTML files we collected. The new approach was found to be highly effective: it outperformed Microsoft Security Essentials, a commercial malware detection service, by nearly 100% in terms of detected JS-file infections, and did not produce any false positives on data collected recently over three months. The approach is also fast: it can analyze 255,082 JS files to generate signatures within one day and scan all 1,129,988 files in only two hours using those signatures, on a single desktop machine. We further conducted a measurement study on the infected JS and HTML files JsRED detected, which reveals the attackers’ strategies, such as the effort they make to conceal their redirect scripts. Of particular interest is the discovery of a structured peer-to-peer (P2P) redirection network built entirely on compromised sites: a visitor to any such compromised doorway will be redirected to other sites, which are also compromised legitimate websites, before being forwarded to attack hosts. This network provides further cover for the web redirectors. We studied it and report its unique features, such as dynamic selection of redirect targets, cloaking strategies and a long lifetime (over 285 days).

For more detail, please refer to our paper.

Published by 袁 侃
