UW CSE and ICSI Web Integrity Checker

Charles Reis, Steven D. Gribble, Tadayoshi Kohno, University of Washington (UW).
Nicholas C. Weaver, International Computer Science Institute (ICSI).
July 24, 2007.

Overview:

Last month we learned on Slashdot that:

"Some ISPs are resorting to a new tactic to increase revenue: inserting advertisements into web pages requested by their end users. They use a transparent web proxy (such as this one) to insert javascript and/or HTML with the ads into pages returned to users."
Have you wondered how often this is happening? And whether it's happened to you?

The University of Washington security and privacy research group and ICSI have created a measurement infrastructure to help answer these questions. By visiting our web page, you are helping out with our experiment. (Thank you!) In the process, we'll help you figure out if some "party in the middle" (like your ISP) might be modifying your web content in flight. We also plan to share our overall results with the public.


Note that our experiment relies on having many people behind different ISPs visit this page, and we would especially like to reach people in as many geographic locations as possible. If you share our interest in the experiment, please encourage others to visit this page or post the URL to appropriate blogs or web sites.

Experimental Results for Your Browser:

Just by visiting this page, your web browser is participating in our experiment. We are detecting whether some "party in the middle" is modifying a set of test web pages, and the results of the tests are shown below. If you do not see a "change found" message below, then we did not detect any modifications to the test pages. For more information on how the tests work, see below.

How it Works (Summary):

Our experiment assumes that you are using a modern web browser with JavaScript enabled. We have tested our experiment with Firefox 2, Internet Explorer 6 and 7, Safari 2, Opera 9, and Konqueror 3.5.

Our experiment first loads custom "integrity checking" JavaScript programs into your web browser. We collectively refer to these "integrity checking" scripts as the Experiment Harness. The Experiment Harness requests pages from the following domains, plus an IP address:

  1. washington.edu,
  2. uwsecurity.com,
  3. uwprivacy.org,
  4. uwsystems.net,
  5. happyblimp.com,
  6. 128.208.6.75.
The Experiment Harness requests pages from these different locations because ISPs may treat different types of websites differently, and we'd like to understand these differences. For example, some ISPs might only inject ads into .com sites, as observed by some users [BenAnderson].

The Experiment Harness then determines the integrity of each web page -- i.e., the Experiment Harness determines if your ISP (or some other party in the middle) modified the web page between when our server sends it and when it arrives at your web browser. Our Experiment Harness is not affected by changes caused by browser plugins or extensions. If a page is modified in-flight, the Experiment Harness will show you exactly what changed.

How it Works (Details):

The Experiment Harness loads one page (launcher.fcgi) from each of the domains listed above into separate frames on this page. In each frame, launcher.fcgi runs an independent test to determine the integrity of a second page from the same domain (testpage.html).

Each distinct fetch of testpage.html uses a different top-level domain (.edu, .com, .net, etc.), although each of our servers hosts identical content. Because each HTTP request includes the server name itself, a "party in the middle" that targets only a particular top-level domain or group of domains will still treat the matching requests as worth modifying.
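As a rough illustration of the per-domain setup described above, the sketch below builds one launcher URL per test host. The host list is taken from this page; the helper name buildLauncherUrls, the plain http:// scheme, and the root path layout are assumptions made for illustration, not the harness's actual code.

```javascript
// Test hosts listed on this page: five domain names plus one raw IP address.
const TEST_HOSTS = [
  "washington.edu",
  "uwsecurity.com",
  "uwprivacy.org",
  "uwsystems.net",
  "happyblimp.com",
  "128.208.6.75",
];

// Hypothetical helper: returns one launcher URL per host. Each launcher would
// be loaded into its own frame and would test testpage.html on the same host.
// The scheme and path layout are assumptions.
function buildLauncherUrls(hosts) {
  return hosts.map((host) => `http://${host}/launcher.fcgi`);
}

const launcherUrls = buildLauncherUrls(TEST_HOSTS);
console.log(launcherUrls.join("\n"));
```

Keeping the hosts in a single list means adding another domain to the test, as we did during the experiment, would only require one more entry.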

To perform the test, the "integrity checking" code in launcher.fcgi requests a copy of testpage.html from the server, using a JavaScript XMLHttpRequest call. To your ISP, this request looks no different from a visit to testpage.html in your web browser. To your browser, the response looks like pure data, so it will not be altered by page-modifying browser extensions. As a result, even if you have an ad-blocker browser extension that hides ads, our integrity checking code will still detect any ads inserted by your ISP.
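The fetch step described above might look roughly like the following. This is a minimal sketch, not the code in launcher.fcgi: the function name fetchRawPage, the callback shape, and the injectable XhrCtor parameter (added so the sketch can be exercised outside a browser) are all assumptions.

```javascript
// Fetches a page as raw text via XMLHttpRequest and passes it to a callback.
// XhrCtor is injectable for testing (an assumption of this sketch); in a
// browser it defaults to the global XMLHttpRequest constructor.
function fetchRawPage(url, onDone, XhrCtor = globalThis.XMLHttpRequest) {
  const xhr = new XhrCtor();
  xhr.open("GET", url, true); // asynchronous GET
  xhr.onreadystatechange = function () {
    if (xhr.readyState !== 4) return; // 4 means the request is done
    if (xhr.status === 200) {
      // responseText is the page source as received over the network. It is
      // handed to the script as a string rather than rendered into the page.
      onDone(null, xhr.responseText);
    } else {
      onDone(new Error("HTTP status " + xhr.status), null);
    }
  };
  xhr.send(null);
}
```

In a browser, a call such as fetchRawPage("testpage.html", function (err, text) { ... }) would hand the received source to the comparison step.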

The integrity checking code then compares the actual contents of the test page to a string containing the expected contents of the test page. If it detects a difference, it reports the modified web page to our server. It also displays a message to you saying that the web page has been changed, as shown in the screenshot below:
[Screenshot: Modified Page]
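The comparison itself can be sketched as a plain string check. The function name checkIntegrity, the shape of its return value, and the sample page contents below are hypothetical; they only illustrate the idea of comparing received source against an expected string.

```javascript
// Compares the fetched page source against the expected source.
// Returns whether they differ and the index of the first differing character
// (-1 when the two strings are identical).
function checkIntegrity(expected, actual) {
  if (expected === actual) {
    return { modified: false, firstDiff: -1 };
  }
  const limit = Math.min(expected.length, actual.length);
  let i = 0;
  while (i < limit && expected[i] === actual[i]) i++;
  return { modified: true, firstDiff: i };
}

// Hypothetical test page contents and an injected script, for illustration.
const expectedPage = "<html><body><p>Test page</p></body></html>";
const tamperedPage = expectedPage.replace(
  "</body>",
  '<script src="http://ads.example/inject.js"></script></body>'
);

console.log(checkIntegrity(expectedPage, expectedPage).modified); // false
console.log(checkIntegrity(expectedPage, tamperedPage).modified); // true
```

An exact string comparison is deliberately strict: any inserted, removed, or rewritten byte counts as a modification.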

Whether the page was modified or not, the integrity checking code draws the test page in your browser, using document.write calls in JavaScript. The code sends a message back to our server, letting us know that the experiment has completed for you and whether modifications were detected. By combining these results with your IP address, we can determine which ISPs are actively modifying web pages.

If the test page was modified, this last message also includes a text "snapshot" of the layout of the page. That is, we send a string representation of the Document Object Model (DOM) tree (using document.documentElement.innerHTML) to our server. This snapshot is useful in the event that a "party in the middle" injects a script into the test page. We may not be able to see the source code of the script itself, but we will be able to see the impact it has on the layout of the test page.
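The completion message and optional DOM snapshot described above could be assembled along these lines. The field names (host, modified, domSnapshot), the form-style encoding, and the stand-in document object are assumptions for illustration; the actual message format is not specified on this page.

```javascript
// Builds the completion report sent back to the server. The field names are
// hypothetical. The DOM snapshot is attached only when a change was detected.
function buildReport(host, modified, doc) {
  const report = { host: host, modified: modified };
  if (modified) {
    // String form of the rendered DOM tree, as described above.
    report.domSnapshot = doc.documentElement.innerHTML;
  }
  return report;
}

// Encodes the report as a form-style body suitable for an HTTP POST.
function encodeReport(report) {
  return Object.keys(report)
    .map((k) => encodeURIComponent(k) + "=" + encodeURIComponent(String(report[k])))
    .join("&");
}

// Stand-in for the browser's document object so the sketch runs anywhere.
const fakeDocument = {
  documentElement: { innerHTML: "<head></head><body><p>Test page</p></body>" },
};

console.log(encodeReport(buildReport("uwsecurity.com", false, fakeDocument)));
console.log(encodeReport(buildReport("uwsecurity.com", true, fakeDocument)));
```

Taking the snapshot after the page has been drawn is what lets it capture the effects of an injected script, even when the script's own source is not visible.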

Finally, if the test page was modified, we provide a link for you to view the exact changes to the page. This link also provides a feedback form, if you would like to send us any information about your Internet connection that might be useful for our experiment.

The effectiveness of our experiment relies on having many different people with many different ISPs visit this page. If you share our interest in whether ISPs are modifying web content, please encourage others to visit this page!

Caveat 1: Our Experiment Harness may produce false negatives: some ISPs may inject ads only under circumstances that we do not explicitly test. If we don't detect any changes to the test pages, that does not prove that your ISP is not modifying other pages that you visit. However, if we do detect changes, then someone is really modifying the page.

Caveat 2: Our integrity checking mechanism is not cryptographically secure. If a "party in the middle" were modifying web pages that you visit, it could modify our scripts as well. Instead, our mechanism acts as a "tripwire" that is likely to catch any party that is currently unaware of our experiment. In the future, we could create a huge number of variants on the JavaScript tripwire. This would make it more difficult for a "party in the middle" to reliably determine that a JavaScript tripwire is running.

Our Findings So Far

You can find a detailed analysis of our results, along with our NSDI 2008 paper, on our main research page: Detecting In-Flight Changes with Web Tripwires. This page also hosts a toolkit that publishers can use to deploy a similar integrity check on their own web pages.

Update (10-31-2007, 19:35 PDT): We have analyzed all of the test results from July 24 to August 12, 2007. We found a diverse set of in-flight modifications being made to our page, many of which were not in the interest of the user or the publisher. During this time period, we found that:

Update (10-30-2007, 9:10 PDT): We have posted a brief summary of our analysis results here. This page also includes a downloadable "web tripwire" toolkit that allows publishers to detect changes to their own web pages. More information will be online soon.

Update (7-25-2007, 10:56 PDT): A few aggregate statistics so far:

Update (7-25-2007, 10:04 PDT): We are continuing to see some ISP-injected advertisements, from parties such as NebuAd and FairEagle. Note that you may see modifications detected if you are using certain firewalls (e.g., ZoneAlarm) or ad-blocking proxies (e.g., Privoxy). We have also added another domain name to the test.

Update (7-25-2007, 8:00 PDT): We have added another domain name to the test, just in case ISPs have chosen to ignore our current domain names.

Update (7-24-2007, 13:42 PDT): Stay tuned... We will post our findings as our experiment runs. We have already had thousands of visitors and we have detected some modifications that appear to be caused by ISPs.

Download Toolkit

We have developed a toolkit to help publishers detect in-flight changes to their own web pages. By deploying "web tripwires" similar to those on this page, you can easily determine if your web page is being changed. More information is available here.

Privacy Policy

For the purposes of this study, we record certain information when you visit this page. Specifically, if we detect that the HTML source code of any of the frames on this page is modified before reaching your browser, we will record the difference. We also record information that is typically stored by most web servers, including your IP address, browser type, date and time of your request, and a cookie that helps to distinguish visits from different users.

In our public reports and communications with other researchers, we may reveal information about which ISPs are used by our visitors, but we will not share the IP addresses of any visitors to our page.
