Thursday Feb 12, 2009

The Insecure, Great, British Pædosieve: The use of web-cache software

In my previous post on the Great-British Pædowall (or, more accurately, sieve, given its usefulness) I mentioned potential insecurities in general terms. Some web-URL-blocking systems make use of the Squid Cache to implement the URL-matching-and-blocking functionality. This, I would argue, is a qualitative security risk, in at least two dimensions.

It is, I think, widely accepted that software defects are unavoidable and proportional to the complexity of a body of software. The rate of defects in software can be minimised, e.g. through certain, more costly, development practices and by restricting the complexity of the software (e.g. the amount of code, features, etc).

Squid is a long-standing software project to develop web-caching (not blocking) software. As it was designed to cache web content, it has to have a fairly sophisticated understanding of HTTP - not entirely trivial and certainly significantly more complex than applying string filters. Further, Squid can act in a distributed manner, co-operating with other caches by exchanging information with them - to this end it supports various inter-cache/proxy protocols (ICP, WCCP, etc.). Each of which, obviously, requires its own body of network exposed code. Additionally, it contains support for management protocols, both in-band (HTTP cache_object requests) and out-of-band (e.g. SNMP) which can be used to retrieve information - again requiring specific, network-exposed lumps of code.

Squid was developed via normal, open source development practices. The project originates from the 1990s, an era when the Internet was still a more trusting, less hostile place. Its coding practices reflect this to some extent: raw handling of network-supplied input abounds - this is (sadly) still very common in C/C++ network software generally (Squid is C++ now, but much of the code is imported C from the original Squid; there is little, if any, use of input-sanitising abstractions). Further, deployment in "friendly" environments is reasonably typical for Squid (e.g. corporate, where nearly all users are reasonably responsible adults who are contractually bound to be "friendly", on pain of substantial punishment).

So a) Squid is a reasonably featureful and complex piece of software, with several relatively independent, parallel bodies of code that can be invoked by a remote, untrusted user; b) There is no reason to think any special defect-minimisation processes were used to develop Squid. In combination, this means we should expect Squid to have a fairly large number of defects that can be triggered by an external attacker - some proportion of which will have disruptive consequences, possibly even leading to the execution of network-supplied code. A glance at the various vulnerability databases on the internet would seem to confirm this suspicion: Squid is subject to fairly regular reports, against pretty much all the protocol bits mentioned above. It is reasonable to think that many defects remain and that some would be easily found with fuzz-testing.

Simply put, this means a Squid cache will expose information and/or behave in ways which its operator did not anticipate. These ways may be by design - through features the operator was not fully aware of - or by defect. These are behaviours caused solely by having put this large piece of software in the path, and they could be fixed simply by removing it, without any loss of functionality (according to my argument in the earlier post).

The risks here fall in two dimensions: a) risks to the operator; b) risks to the users. This may seem obvious, but it's an important distinction because the operator is likely to care more about the former risks than the latter. I.e. an ISP might take pre-emptive action to mitigate a risk to itself, but is less likely to expend similar effort on mitigating risks to its users. This is somewhat speculative, but reasonably safe in the light of the various lost-customer-data incidents seen in other industries.

Risks to the operator:

  • Here's a concrete one: Squid supports HTTP CONNECT, potentially allowing customers access to systems intended to be private to the ISP, if they happen to be accessible to the proxy. This is the case for the ISP I'm with: I received an email from their own proxy system, which was sent by CONNECT proxying to the SMTP port on the proxy itself. No defect is involved, though it's possible this can be blocked with appropriate configuration of Squid (Squid is promiscuous in its operation by default).
  • By default, Squid makes information publicly available via an in-band management protocol (GET cache_object://menu HTTP/...). This includes information such as DNS lookup information that may reveal otherwise private, internal systems to all the proxy users. My ISP had their Squid proxy doing this for a while.
  • More generally, as (I believe) Squid retains root privileges, the risks to the operator from defects are bounded only by the risks of such on the hosts concerned.
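
An operator could check their own proxy for the first two exposures from a customer-side host. What follows is only a minimal sketch in Python: the function names are hypothetical (they are not part of Squid), the proxy address is whatever the operator supplies, and the cache manager URL is assumed to take the usual cache_object://host/page form. It builds the two requests described above and treats any 2xx status in the reply as "the proxy accepted it".

```python
import socket


def build_connect_request(host: str, port: int) -> bytes:
    """Raw HTTP CONNECT request asking the proxy to open a TCP tunnel."""
    target = f"{host}:{port}"
    return f"CONNECT {target} HTTP/1.0\r\nHost: {target}\r\n\r\n".encode("ascii")


def build_cachemgr_request(host: str, page: str = "menu") -> bytes:
    """Raw in-band management request (assumed cache_object://host/page form)."""
    return f"GET cache_object://{host}/{page} HTTP/1.0\r\n\r\n".encode("ascii")


def request_was_accepted(reply: bytes) -> bool:
    """True if the first line of the reply carries a 2xx status code."""
    status_line = reply.split(b"\r\n", 1)[0].split()
    return len(status_line) >= 2 and status_line[1].startswith(b"2")


def probe(proxy_addr: tuple, request: bytes, timeout: float = 5.0) -> bool:
    """Send one request to the proxy and report whether it was accepted.

    proxy_addr is a (host, port) pair supplied by the caller.
    """
    with socket.create_connection(proxy_addr, timeout=timeout) as sock:
        sock.sendall(request)
        return request_was_accepted(sock.recv(4096))
```

For example, `probe((proxy_host, proxy_port), build_connect_request("127.0.0.1", 25))` returning True would suggest the proxy is willing to tunnel to its own SMTP port, which is the situation described in the first bullet. This should of course only be pointed at a proxy one is entitled to test.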

Risks to the user:

  • Leak of browsing history: Again, the in-band management (cache_object://) by default makes information on requests made publicly available. Even if Squid is configured not to cache, a substantial amount of information is available on ongoing requests and also on which IPs have recently made any requests. So any user can get a good view of what other users are doing, for traffic to hosts that host URLs that are on the IWF block-list (which tends to include any public, image-sharing site. and were on the IWF list too recently). This was the case at the ISP I use, till recently. This information could be used in various nefarious ways (blackmail, etc).
  • Loss of service (defect): As above, Squid regularly has (at least) Denial-of-Service class vulnerabilities reported against it. The most recent one was reported in February 2009 (though it only affects versions of Squid with assert() enabled - which is not the case for several vendors). My ISP were, at the start of this year, running Squid 2.6-STABLE15 and hence vulnerable to at least one other DoS.
  • Loss of service (distributed): A small number of users could trivially DoS the Squid service, with a minimum of bandwidth usage (compared to bandwidth based DoS) - done properly it could be hard to distinguish normal users from the miscreants.
  • The uber-hack: A cracker subverts Squid and gains control of the proxy system. Using this they observe administrative passwords (sniffing telnet access to routers, tracing system login daemons). With those passwords they gain access to the routers redirecting traffic to the proxy - they now have arbitrary power to subvert traffic. Alternatively, if the routing fabric allows (e.g. IGP without HMACs) they can inject routes directly.

Finally, a typical Squid transparent proxy has no way to ensure that the original IP destination address matches the requested host. Squid will only see the HTTP-layer request, and can only route that. Therefore such an ISP is, obviously, operating an open-proxy as far as its customers are concerned. Whether there are any qualitative security problems with this, I don't know, but you can gain some amount of plausible deniability by setting a known filtered host as your HTTP proxy (e.g. for browsing, or for BitTorrent) and sending your own X-Forwarded-For header with various other customer IPs. Squid will add your IP to the end of the X-Forwarded-For, but the recipient may not be able to tell which was the real one.
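
The header manipulation can be illustrated with a small sketch. This is not Squid's code: the function name is hypothetical, and the addresses come from the 192.0.2.0/24 documentation range. It only models the behaviour described above, where the proxy appends the connecting client's IP to whatever X-Forwarded-For value the client supplied.

```python
from typing import Optional


def append_forwarded_for(existing: Optional[str], client_ip: str) -> str:
    """Model a proxy that appends the client's IP to a client-supplied header."""
    if existing:
        return f"{existing}, {client_ip}"
    return client_ip


# Header value forged by the customer, naming other customers' addresses.
forged = "192.0.2.10, 192.0.2.57"
# The customer's real address, as seen by the proxy on the TCP connection.
real_ip = "192.0.2.99"

header = append_forwarded_for(forged, real_ip)
candidates = [ip.strip() for ip in header.split(",")]

print(header)
print(candidates)
```

Only the final entry was written by the proxy; the earlier ones are whatever the client chose to send. A recipient that logs the whole list, or just its first entry, has no means of telling the forged addresses from a genuine chain of proxies.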

Recommendations:

  • General-purpose, complex caching software, such as Squid, ought not be used for blocking access to URLs. The UK ISP industry should develop, from scratch and using practices appropriate for security sensitive software, a specific, limited system to meet the needs of the GB-pædosieve.
  • If such software must be used, then the ISP should treat the proxy system as hostile to its own network and use appropriate security precautions. E.g. it should be firewalled off; shouldn't have access to links running any unprotected access or routing protocols like telnet, STP or HMAC-less IGPs; administrative passwords must not be shared between the proxy system and the rest of the ISP's systems.
  • Such systems should probably be configured to clear any existing X-Forwarded-For headers, and create a clean header. (This may require Squid to be modified).
  • The system should not act as a proxy for hosts that are not on the filter-list (this does not eliminate the open-proxy problem entirely, but would substantially mitigate it).
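
To give a feel for how small the decision logic of a purpose-built filter could be, here is a rough sketch of the last two points. Everything in it is an assumption made for illustration: the block-list entry, the host names, the function name, and the use of exact string matching on URLs (a real system would need careful URL normalisation).

```python
from urllib.parse import urlsplit

# Hypothetical block-list; the real list and its format are not public.
BLOCKED_URLS = {"http://filtered.example/blocked/page"}
# Only hosts that carry at least one blocked URL are ever proxied.
FILTERED_HOSTS = {urlsplit(url).hostname for url in BLOCKED_URLS}


def decide(url: str, client_ip: str, headers: dict) -> tuple:
    """Return (action, headers_to_send) for one redirected request.

    action is "refuse" for hosts not on the filter-list, "block" for a
    listed URL, and "forward" otherwise.
    """
    host = urlsplit(url).hostname
    if host not in FILTERED_HOSTS:
        # Never act as a general-purpose proxy for unlisted hosts.
        return "refuse", {}
    if url in BLOCKED_URLS:
        return "block", {}
    # Drop any client-supplied X-Forwarded-For and write a clean one.
    clean = {k: v for k, v in headers.items() if k.lower() != "x-forwarded-for"}
    clean["X-Forwarded-For"] = client_ip
    return "forward", clean
```

For instance, `decide("http://filtered.example/ok", "192.0.2.99", {"x-forwarded-for": "192.0.2.10"})` forwards the request with a header naming only 192.0.2.99, while a request for any host outside the filter-list is refused outright.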

Much of the investigation into this was done by an unnamed collaborator. Also, Richard Clayton's paper on BT Cleanfeed contains very useful information (e.g. techniques to find filtered hosts to test with). tcptraceroute and socat are very useful tools.

Updates: murb informs me that there are transparent-proxy patches for Squid and Linux. It's not clear to me whether these currently would allow Squid to match on the original destination IP though.

It appears the X-Forwarded-For should consistently point to the source behind the ISP pædo-sieve. That said, this header might not be logged by default by many systems; it's also conceivable that the code responsible for updating this header may contain exploitable defects.

