Update on PAPAS and HTTP Parameter Pollution [Part 1]

My first post on HTTP Parameter Pollution has been read by more than 1,500 people, and several other security portals have blogged about it (e.g., Security-Shell, PenTestIT, Dark Reading, ToolsWatch, Packet Storm and Security Focus). So far, PAPAS, our online HPP scanning service, has received 78 submissions and 1,388 unique visits.

I am happy to see that the security community seems to have some interest in our work. I thought it would be a good idea to write more on how our detection tool works. The paper is actually quite complete and comprehensive, but people are generally busy (or lazy ;)) and do not have too much time to go through all the details of a scientific paper. In the next couple of blog posts, I will explain the architecture and the algorithm we designed to detect HPP flaws in web applications.

PAPAS consists of four main components: a browser, a crawler, and two scanners.

The first component is an instrumented browser that is responsible for fetching the webpages, rendering the content, and extracting all the links and form URLs contained in the page. It is implemented as a browser extension using the standard technology offered by the Mozilla development environment: a mix of JavaScript and XML User Interface Language (XUL). We use XPConnect to access Firefox’s XPCOM components for issuing GET and POST requests.

As other scanners do, we could have retrieved web pages directly without rendering them in a real browser. However, that approach cannot efficiently deal with the dynamic content that is often found on web pages (e.g., JavaScript). By using a real browser to render the pages we visit, we are able to analyze the page as it is supposed to appear to the user after the dynamic content has been generated. Also, note that, unlike detecting cross-site scripting or SQL injection, testing for HPP vulnerabilities with a black-box approach requires the ability to deal with dynamic content.

The second component is a crawler that communicates with the browser through a bidirectional channel. The crawler uses this channel to tell the browser which URLs to visit and which forms to submit, and to retrieve the information the browser has collected.
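To make the idea of a bidirectional channel more concrete, here is a minimal sketch in Python. It assumes a socket transport with newline-delimited JSON messages; the message fields (`cmd`, `url`, `links`, `forms`) and the helper names are made up for illustration and are not the actual PAPAS protocol.

```python
import json
import socket


def send_msg(sock, msg):
    """Send one message as a single line of JSON (hypothetical wire format)."""
    sock.sendall(json.dumps(msg).encode("utf-8") + b"\n")


def recv_msg(reader):
    """Read one JSON line from a file-like reader; return None on EOF."""
    line = reader.readline()
    return json.loads(line) if line else None


def demo():
    """Simulate one crawler/browser exchange over a connected socket pair."""
    crawler_end, browser_end = socket.socketpair()
    with crawler_end, browser_end:
        crawler_in = crawler_end.makefile("rb")
        browser_in = browser_end.makefile("rb")

        # Crawler -> browser: ask for a page to be visited.
        send_msg(crawler_end, {"cmd": "visit", "url": "http://example.com/"})
        request = recv_msg(browser_in)

        # Browser -> crawler: report what was extracted (values faked here).
        send_msg(browser_end, {
            "url": request["url"],
            "links": ["http://example.com/a"],
            "forms": [],
        })
        reply = recv_msg(crawler_in)

        crawler_in.close()
        browser_in.close()
        return request, reply
```

Calling `demo()` returns the request as seen by the browser side and the reply as seen by the crawler side, which shows the same connection carrying traffic in both directions.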

To scan a website more deeply, the instrumented browser in PAPAS uses a number of simple heuristics to fill in forms automatically. For example, it inserts random 8-character alphanumeric values into password fields, and a default e-mail address into fields named email, e-mail, or mail.
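The two heuristics mentioned above could look roughly like this in Python. This is only a sketch: the default address, the fallback behaviour for other fields, and all function names are assumptions, and the real logic lives in the browser extension rather than in Python.

```python
import random
import string

# Assumed placeholder; the default address PAPAS actually uses is not stated.
DEFAULT_EMAIL = "test@example.com"
EMAIL_FIELD_NAMES = {"email", "e-mail", "mail"}


def random_alnum(length=8):
    """Return a random alphanumeric string of the given length."""
    alphabet = string.ascii_letters + string.digits
    return "".join(random.choice(alphabet) for _ in range(length))


def fill_field(name, field_type, current_value=""):
    """Choose a value for a single form field."""
    if field_type == "password":
        return random_alnum(8)
    if name.lower() in EMAIL_FIELD_NAMES:
        return DEFAULT_EMAIL
    # Hypothetical fallback: keep the value the page already supplied.
    return current_value


def fill_form(fields):
    """Map a list of (name, type, current_value) tuples to {name: value}."""
    return {name: fill_field(name, ftype, value) for name, ftype, value in fields}
```

For instance, `fill_form([("pw", "password", ""), ("email", "text", "")])` yields a random 8-character password and the default e-mail address.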

For sites where the authenticated section is to be scanned, you can assist the crawler by specifying a regular expression that prevents it from visiting the log-out page (e.g., by excluding links that include the cmd=logout parameter). You can find this feature in the online service under the name “exclude regexp”.
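As a rough illustration of how such an exclusion works, the following Python sketch drops every link that matches a user-supplied regular expression. The function name `filter_links` is hypothetical; only the cmd=logout example comes from the text above.

```python
import re


def filter_links(links, exclude_regexp=None):
    """Return the links that do NOT match the exclusion pattern."""
    if not exclude_regexp:
        return list(links)
    pattern = re.compile(exclude_regexp)
    # search() matches anywhere in the URL, not only at the beginning.
    return [url for url in links if not pattern.search(url)]


links = [
    "http://example.com/index.php?cmd=profile",
    "http://example.com/index.php?cmd=logout",
    "http://example.com/inbox",
]
to_visit = filter_links(links, r"cmd=logout")
```

Here `to_visit` keeps the profile and inbox links and skips the log-out link, so the crawler's session stays alive.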

Every time the crawler visits a page, it passes the extracted information to the two scanners for analysis. The Parameter Precedence Scanner (P-Scan) is responsible for determining how the page behaves when it receives two parameters with the same name. The Vulnerability Scanner (V-Scan), in contrast, is responsible for testing the page to determine whether it is vulnerable to HPP attacks. V-Scan does this by attempting to inject a new parameter inside one of the existing ones and analyzing the output. The two scanners are written in Python and communicate with the instrumented browser over TCP/IP sockets.
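To give a feeling for the kind of requests involved, here is a small Python sketch that builds two test URLs: one that repeats an existing parameter (the situation P-Scan studies) and one that appends a URL-encoded `&name=value` pair to an existing value (the kind of injection V-Scan attempts). The function names and the concrete payloads are assumptions for illustration; this is not the scanners' actual code, and the analysis of the responses is not shown.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def precedence_test_url(url, param, new_value):
    """Append a second occurrence of `param` carrying a different value."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    pairs.append((param, new_value))
    return urlunsplit(parts._replace(query=urlencode(pairs)))


def injection_test_url(url, param, inj_name, inj_value):
    """Append '&inj_name=inj_value' to the value of `param`.

    urlencode() percent-encodes the '&' and '=' so that the payload
    travels inside the existing parameter instead of splitting it.
    """
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    payload = "&%s=%s" % (inj_name, inj_value)
    pairs = [(k, v + payload if k == param else v) for k, v in pairs]
    return urlunsplit(parts._replace(query=urlencode(pairs)))


base = "http://example.com/page?id=1&lang=en"
p_url = precedence_test_url(base, "id", "2")
v_url = injection_test_url(base, "id", "foo", "bar")
```

With these inputs, `p_url` ends in `?id=1&lang=en&id=2` and `v_url` ends in `?id=1%26foo%3Dbar&lang=en`.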

In the next post, I will go into the details of how these two scanners work. Thanks for your interest and see you next week!

embyte

About embyte

http://www.iseclab.org/people/embyte/