Internals of noXSS
This page describes how noXSS actually detects XSS attacks, what components noXSS consists of and how they work together.
In essence, noXSS checks every executed script against the relevant request data. If a sufficient amount of request data is found within a script, noXSS assumes that an XSS attempt has occurred and prevents the execution of the whole script.
There is one other case noXSS has to handle: the injection of an entire script tag with a source to load (e.g. <script src="http://example.com/evil.js"/>) instead of inline code. Ideally this would be detected in the same way as code injection, but on markup instead of JavaScript code. In its current stage, noXSS limits itself to checking the host part of the URL provided in the src attribute. Integrity checks of the whole markup will follow in a later release (#33).
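The host check can be illustrated with a short Python sketch. It assumes the check amounts to asking whether the host of the src URL also appears in the request data; this page does not spell out the exact rule, and the function name src_host_in_request is made up for this example.

```python
from urllib.parse import urlsplit


def src_host_in_request(src_url, request_values):
    """Return True if the host of src_url shows up in any request value.

    Assumption: a case-insensitive containment test is enough for illustration.
    """
    host = urlsplit(src_url).hostname  # None for relative URLs
    if not host:
        return False  # same-origin script, nothing to compare
    host = host.lower()
    return any(host in value.lower() for value in request_values)


# The injected tag arrived as part of a request parameter.
injected = ['"><script src="http://example.com/evil.js"/>']
print(src_host_in_request("http://example.com/evil.js", injected))
print(src_host_in_request("/static/app.js", injected))
```

Only the host is compared here, which mirrors the limitation described above: a script loaded from a host that never occurs in the request data is not examined further.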
Components
noXSS mainly consists of XPCOM components. XPCOM is similar to Microsoft's COM and is heavily used within Mozilla's code base.
The strict separation of duties lets us replace a component easily when necessary. One example is the string matcher currently in use: it does a sub-string match and is going to be replaced by a sub-sequence matcher in the next major release.
Interceptor
The interceptor is responsible for intercepting any JavaScript code right before execution. This lazy evaluation approach has the advantage that we can limit our checks to a minimum (i.e. the code actually executed vs. the whole code of the current page).
Another advantage is that we can intercept code generated at runtime (e.g. through eval() or window.setTimeout()). This lets us detect some DOM-based XSS (Type 0) attacks where the payload is contained within the request data we are checking.
Our initial approach replaced all instances implementing nsIScriptContext with a proxy object, but since Firefox 3.0 this is no longer possible for an extension due to the unavailability of internal linkage.
We are now using the debugging API of SpiderMonkey, which isn't a perfect solution but works for the moment.
String Matcher
The string matcher does the actual matching. It takes the request data and the code and finds any matches longer than a required minimum length (15 characters at the moment). We are using sub-string matching in 0.1, which is fast but can be tricked by filter evasion techniques (especially if a dysfunctional removal filter is in place).
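As a rough illustration of sub-string matching, the following Python sketch finds the longest run of characters shared by the request data and a script and compares its length with the threshold. The function names are hypothetical, the dynamic-programming approach is a textbook choice rather than the algorithm noXSS necessarily uses, and treating exactly 15 characters as a match is an assumption.

```python
MIN_MATCH_LENGTH = 15  # threshold mentioned in the text


def longest_common_substring(request_data, script):
    """Return the longest contiguous string found in both arguments."""
    best_len = 0
    best_end = 0  # end offset (exclusive) of the best match inside script
    prev = [0] * (len(script) + 1)
    for i in range(1, len(request_data) + 1):
        cur = [0] * (len(script) + 1)
        for j in range(1, len(script) + 1):
            if request_data[i - 1] == script[j - 1]:
                # Extend the common run that ended at the previous characters.
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len = cur[j]
                    best_end = j
        prev = cur
    return script[best_end - best_len:best_end]


def is_suspicious(request_data, script, min_length=MIN_MATCH_LENGTH):
    """Assumption: a shared run of at least min_length characters counts."""
    return len(longest_common_substring(request_data, script)) >= min_length


request_data = "';alert(document.cookie);//"
script = "var data='" + request_data + "';"
print(is_suspicious(request_data, script))
print(is_suspicious("page=2", "var page = 2;"))
```

This version runs in time proportional to the product of both lengths, which is acceptable for a sketch but not necessarily for a matcher that runs on every executed script.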
Consider the following vulnerable PHP snippet:
<?php
...
// somebody told me about this 'XSS'
// and that this will protect us against it
$data = preg_replace("/<\/?script[^>]*>/is", "", $_REQUEST["data"]);
// we need our data as variable within our script in order to access it
echo "<script>var data='". $data. "';</script>";
...
?>
An attacker can exploit this vulnerability with the following attack vector:
';aler<script>t(<script>"XS<script>S"<script>);<script>//
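This would result in ';alert("XSS");// due to the removal of the script tags. Because the removed tags split the payload into short fragments, the request data and the executed code no longer share a contiguous run of 15 characters. A sub-sequence matcher can detect this while a sub-string matcher will fail.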
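The evasion can be reproduced in a few lines of Python. The sketch below mimics the PHP filter with an equivalent regular expression and then compares a contiguous (sub-string) check with a sub-sequence check. The helper names are invented for this example, and the sub-sequence test is only one plausible reading of what the planned matcher will do.

```python
import re

MIN_MATCH_LENGTH = 15  # threshold mentioned in the text


def strip_script_tags(data):
    """Python counterpart of the preg_replace() call in the PHP snippet."""
    return re.sub(r"</?script[^>]*>", "", data, flags=re.IGNORECASE | re.DOTALL)


def shares_substring(request_data, script, length=MIN_MATCH_LENGTH):
    """True if any window of `length` characters of script occurs in request_data."""
    return any(script[i:i + length] in request_data
               for i in range(len(script) - length + 1))


def is_subsequence(fragment, request_data):
    """True if fragment's characters appear in request_data in the same order."""
    remaining = iter(request_data)
    return all(ch in remaining for ch in fragment)


payload = "';aler<script>t(<script>\"XS<script>S\"<script>);<script>//"
injected = strip_script_tags(payload)      # what the filter lets through
script = "var data='" + injected + "';"   # what the browser executes

print(injected)
print(shares_substring(payload, script))   # contiguous check misses the attack
print(is_subsequence(injected, payload))   # ordered-characters check finds it
```

A pure sub-sequence test is far more permissive than a sub-string test, so a real matcher would presumably need extra constraints to keep false positives down.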
Script Tokenizer
Matching request data against script code will yield an astonishing number of matches, even with a minimum match length required. But most of the matches will be within quoted strings and are totally harmless. The script tokenizer scans the whole script from the beginning up to the offset of the match and counts the number of tokens the match consists of. A match that consists of three or more tokens is considered harmful: an assignment in JavaScript consists of exactly three tokens, and an attacker could use it to taint otherwise harmless variables. This might be used to alter control flow or to pass arbitrary strings to an already existing eval().
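The token-counting idea can be sketched in Python with a deliberately simplified tokenizer. noXSS uses SpiderMonkey's tokenizer; the regular expression below is only a stand-in that ignores regex literals, template strings and multi-character operators, and the function names are hypothetical.

```python
import re

# Simplified JavaScript token pattern (an assumption, not SpiderMonkey's grammar).
TOKEN_RE = re.compile(
    r"""
    \s+                              # whitespace, skipped
  | //[^\n]*                         # line comment, skipped
  | /\*.*?\*/                        # block comment, skipped
  | (?P<string>"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')
  | (?P<number>\d+(?:\.\d+)?)
  | (?P<name>[A-Za-z_$][A-Za-z0-9_$]*)
  | (?P<punct>.)                     # any other single character
    """,
    re.VERBOSE | re.DOTALL,
)


def count_tokens_in_match(script, start, end):
    """Scan from the beginning and count tokens overlapping [start, end)."""
    count = 0
    pos = 0
    while pos < len(script) and pos < end:
        m = TOKEN_RE.match(script, pos)
        # lastgroup is None for whitespace and comments, which are not tokens.
        if m.lastgroup is not None and m.end() > start:
            count += 1
        pos = m.end()
    return count


def is_harmful(script, start, end, min_tokens=3):
    return count_tokens_in_match(script, start, end) >= min_tokens


# A match that lies entirely inside a quoted string is a single token.
harmless = "var greeting = 'hello world, how are you';"
s = harmless.index("hello world, how")
print(is_harmful(harmless, s, s + len("hello world, how")))

# An injected assignment spans three tokens: name, operator, value.
tainted = "var isAdmin=false; isAdmin=true;"
s = tainted.index("isAdmin=true")
print(count_tokens_in_match(tainted, s, s + len("isAdmin=true")))
```

Scanning from the start of the script matters because the tokenizer has to know whether the match begins inside a string or a comment; tokenizing the matched text in isolation would lose that context.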
We are using SpiderMonkey's tokenizer for this as Mozilla's Firefox uses SpiderMonkey too. Unfortunately we cannot use the engine already embedded into Firefox. The corresponding API is considered internal and the required symbols are not exported. We solved the issue by statically linking our own copy of SpiderMonkey into our library.
Defender
The defender does the actual work by processing events from the interceptor and triggering the scanning of JavaScript code found within a page. The defender will also use the tokenizer when necessary to get the number of tokens a match consists of.
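How the defender ties the other components together might look roughly like the Python sketch below. The class, its method and the two stand-in helpers are all hypothetical; in particular, the naive token counter is not string-aware and only serves to make the example runnable.

```python
import re
from typing import Callable, Iterable, Optional, Tuple

Match = Tuple[int, int]  # (start, end) offsets of a match within the script


class Defender:
    """Hypothetical glue object: asks a matcher, then a token counter."""

    def __init__(self,
                 find_match: Callable[[str, str], Optional[Match]],
                 count_tokens: Callable[[str, int, int], int],
                 min_tokens: int = 3):
        self.find_match = find_match
        self.count_tokens = count_tokens
        self.min_tokens = min_tokens

    def allows(self, script: str, request_values: Iterable[str]) -> bool:
        """Return False if the script should be blocked."""
        for value in request_values:
            match = self.find_match(value, script)
            if match is None:
                continue  # no request data found in this script
            start, end = match
            if self.count_tokens(script, start, end) >= self.min_tokens:
                return False
        return True


def naive_find_match(value, script, min_length=15):
    """Stand-in matcher: the whole value must occur verbatim in the script."""
    if len(value) < min_length:
        return None
    start = script.find(value)
    return None if start < 0 else (start, start + len(value))


def naive_count_tokens(script, start, end):
    """Stand-in counter: words and punctuation, with no notion of strings."""
    return len(re.findall(r"\w+|[^\w\s]", script[start:end]))


defender = Defender(naive_find_match, naive_count_tokens)
script = "var data='';alert(document.cookie);//';"
print(defender.allows(script, ["';alert(document.cookie);//"]))
print(defender.allows(script, ["page=2"]))
```

Passing the matcher and the token counter in as callables reflects the separation of duties described under Components: either one can be swapped without touching the defender.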
Tracker
The tracker keeps track of page transitions and will taint request data as required. The defender will check all data marked as tainted for the current domain against the code. As the tracker is not finished yet, the defender currently checks the request data on every request. We will add more here as soon as the tracker is done.