Why is it so difficult to select a Web Application scanner?
By Eric P. Maurice on Oct 29, 2010
Hello, this is Denis Pilipchuk again.
In a previous blog entry, we discussed how blackbox fuzz testing of Web Applications is a popular method for product security testing. Though blackbox fuzzing tools are commonly used, the process for selecting one is not as straightforward as one would imagine.
Arguably, the greatest challenge in selecting the proper tool results from the continual race between blackbox fuzzing tools and Web technologies. For example, the transition of web applications from Web 1.0 to Web 2.0 has caused great difficulties for all blackbox fuzzing tool vendors because of the rich interactive content and dynamically generated links typically introduced by Web 2.0 applications. This race, along with the facts that the standards in this area are very loose and that each implementation of Web 2.0 technology is slightly (or not so slightly) different, prevents even the most advanced tools from successfully navigating and testing websites based on newer technology.
Some of the tools currently on the market do not even have the capability to deal with Web 2.0 applications. Others, which claim to be "Web 2.0 compliant", turn out to be compatible only with certain implementations. In pretty much all instances I have seen while dealing with blackbox Web application scanners, even automated navigation of such applications (discovery) presents huge problems and requires manual intervention from the tester.
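To illustrate why dynamically generated links are hard to discover, here is a minimal sketch in Python. It is not how any particular scanner works; it only assumes a naive discovery step that reads anchor tags from the raw HTML. The two sample pages, the class name and the function name are all hypothetical.

```python
from html.parser import HTMLParser


class StaticLinkExtractor(HTMLParser):
    """Collects href values from <a> tags present in the raw HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def discover_links(html):
    """Return the links a purely static crawler would find (hypothetical helper)."""
    parser = StaticLinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical "Web 1.0" page: every link is present in the markup.
WEB1_PAGE = (
    '<html><body><a href="/products">Products</a>'
    '<a href="/contact">Contact</a></body></html>'
)

# Hypothetical "Web 2.0" page: the only link is built by script at run time.
WEB2_PAGE = """<html><body><div id="nav"></div>
<script>
  var a = document.createElement("a");
  a.href = "/account/" + userId + "/orders";
  document.getElementById("nav").appendChild(a);
</script></body></html>"""

print(discover_links(WEB1_PAGE))  # both links are found
print(discover_links(WEB2_PAGE))  # nothing is found without executing the script
```

A tool that wants to reach the second page's link has to execute the script in something close to a real browser, and that is where the differences between implementations start to matter.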
One of the logical impacts of this permanent technology race is that it often turns yesterday's leaders into tomorrow's outsiders. More importantly, this constant state of catch-up between testing tools and Web technology greatly challenges potential customers when conducting tool evaluations in the absence of any standards or even mutually agreed practices in the industry. If one wants to compare several blackbox fuzzing tools, where does one start?
The National Institute of Standards and Technology (NIST) is trying to answer this question for static analysis tools with the SATE project, which aims at evaluating the effectiveness of these tools. But to date, nothing comparable has been done in the area of dynamic analysis tools. The closest I can think of in terms of comparison was Larry Suto's analysis of several blackbox fuzzing tools and services, which was published in February 2010. This analysis was definitely an admirable effort and an eye-opener for many people, but to be truly useful, it would need to become a more formal exercise, published on a regular schedule.
At the beginning of the evaluation of a fuzzing tool, testers must ask themselves: "What is the desired configuration: Point-and-Shoot (PaS) or trained?" PaS mode implies that the tool is simply aimed at the root URL of a Web application and is capable of discovering and testing the links itself (with the exception of login configuration), whereas "trained" mode requires varying degrees of involvement on the part of the tester, ranging from creating specialized login macros to manually following links to be tested and specifying the fields and parameters for fuzzing.
Many vendors claim that the PaS mode is not fair and should not be used in practice, but this is exactly how many IT security people deploy these tools because they simply do not have time to train themselves to deal with each of the specific applications they need to test. IT security folks will generally rely on the default tool configuration instead.
So, what makes comparing these tools so difficult? There are no standard benchmarks to evaluate against, no commonly used metrics to measure against, and no compatible reporting formats. Many tool vendors maintain test sites to showcase their respective technologies (see for example http://securitythoughts.wordpress.com/2010/03/22/vulnerable-web-applications-for-learning/), but those sites are naturally biased toward an individual vendor's own product and can be updated at any given time, which makes them unsuitable as a reliable benchmark.
All tools report their findings in different formats, often categorizing the same issues differently and assigning them different priorities. It takes a great deal of effort to go through assessment reports from two different vendors and compare their findings. On top of that, tools often detect the same vulnerability through multiple attack vectors and report it multiple times in different sections of the report, further complicating the comparison. Additional effort is also required to identify false positives and false negatives in these reports, as careful analysis of all reports against the site in question is necessary.
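The comparison work described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the category aliases, the sample findings for two unnamed tools, and the list of known vulnerabilities are all made up, and a real report would first need to be parsed from each vendor's own format. The sketch maps vendor-specific labels to a shared category, collapses duplicate reports of the same issue, and then counts false positives and false negatives against the known list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    url: str
    parameter: str
    category: str


# Hypothetical mapping from vendor-specific labels to one shared category name.
CATEGORY_ALIASES = {
    "cross-site scripting": "xss",
    "reflected xss": "xss",
    "sql injection": "sqli",
    "blind sql injection": "sqli",
}


def normalize(raw_findings):
    """Turn raw (url, parameter, vendor_label) tuples into a deduplicated set."""
    findings = set()
    for url, parameter, label in raw_findings:
        key = label.strip().lower()
        category = CATEGORY_ALIASES.get(key, key)
        findings.add(Finding(url.rstrip("/"), parameter.lower(), category))
    return findings


def score(reported, known):
    """Compare normalized findings with the known vulnerabilities of a test site."""
    return {
        "true_positives": len(reported & known),
        "false_positives": len(reported - known),
        "false_negatives": len(known - reported),
    }


# Made-up ground truth for a made-up test site.
KNOWN = {
    Finding("/search", "q", "xss"),
    Finding("/login", "user", "sqli"),
    Finding("/profile", "id", "sqli"),
}

# Tool A reports the same XSS twice under two labels and misses /profile.
TOOL_A = [
    ("/search", "q", "Cross-Site Scripting"),
    ("/search/", "Q", "Reflected XSS"),
    ("/login", "user", "SQL Injection"),
]

# Tool B misses the XSS and raises one finding that is not in the known list.
TOOL_B = [
    ("/login", "user", "Blind SQL Injection"),
    ("/profile", "id", "SQL Injection"),
    ("/about", "lang", "SQL Injection"),
]

print(score(normalize(TOOL_A), KNOWN))
print(score(normalize(TOOL_B), KNOWN))
```

Even in this toy case, the result depends entirely on the alias table and on what counts as "the same" finding, which is exactly the kind of decision that has no agreed answer across vendors today.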
The lack of metrics presents another problem for these comparisons. The only significant effort in this area, WASSEC (Web Application Security Scanner Evaluation Criteria), concentrates on defining tool comparison metrics and provides a pretty comprehensive and useful analysis framework, but it has not yet been widely adopted in the industry.
Further refinement of the WASSEC metrics, and the incorporation of this methodology into a regularly published report along the lines of Larry Suto's report, would be a huge step toward achieving consistency of analysis across blackbox fuzzing tools. Conceivably, this could be an ongoing project, supported by the most prominent industry players and maintained by an independent organization like NIST or an academic institution.