HTTP Proxy caching
My next project involves delivering files via HTTP, and as part of the optimization we are going to implement a proxy cache. The aim is to reduce the computation needed to serve resources.
HTTP response caching
The HTTP protocol provides a number of cache control mechanisms.
[RFC2616] The basic cache mechanisms in HTTP/1.1 (server-specified expiration times and validators) are implicit directives to caches. In some cases, a server or client might need to provide explicit directives to the HTTP caches. We use the Cache-Control header for this purpose.
The Cache-Control header allows a client or server to transmit a variety of directives in either requests or responses. These directives typically override the default caching algorithms. As a general rule, if there is any apparent conflict between header values, the most restrictive interpretation is applied (that is, the one that is most likely to preserve semantic transparency). However, in some cases, cache-control directives are explicitly specified as weakening the approximation of semantic transparency (for example, “max-stale” or “public”).
Cache-control headers are a mechanism for supplying hints to servers and end-user applications concerning how resources should be validated, revalidated and cached.
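To make the directive syntax concrete, here is a minimal Python sketch that splits a Cache-Control value into its directives and works out how much freshness is left from max-age and Age. The function names are my own, and the comma split is a simplification: it would mis-handle quoted arguments that themselves contain commas.

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into a {directive: argument} dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        # Directive names are case-insensitive; valueless ones map to None.
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def remaining_freshness(cache_control, age=0):
    """Seconds of freshness left according to max-age, or None if absent."""
    max_age = parse_cache_control(cache_control).get("max-age")
    if max_age is None:
        return None
    return int(max_age) - int(age)


print(parse_cache_control("public, max-age=315360000"))
print(remaining_freshness("max-age=315360000", age=7))
```

A negative result would mean the response has outlived its max-age and needs revalidating.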
Here’s the response without a proxy:
HTTP/1.0 200 OK
Date: Sat, 11 Jul 2009 12:05:07 GMT
Server: Apache/2.0.52 (Red Hat)
Cache-Control: max-age=315360000
Expires: Mon, 28 Jul 2014 23:30:00 GMT
Last-Modified: Sun, 07 Jun 2009 19:53:21 GMT
Accept-Ranges: bytes
Content-Length: 134318
Content-Type: image/jpeg
Age: 7
X-Cache: HIT from photocache413.flickr.ac4.yahoo.com
X-Cache-Lookup: HIT from photocache413.flickr.ac4.yahoo.com:81
X-Cache: MISS from photocache427.flickr.ac4.yahoo.com
X-Cache-Lookup: MISS from photocache427.flickr.ac4.yahoo.com:80
Via: 1.1 photocache413.flickr.ac4.yahoo.com:81 (squid/2.7.STABLE6), 1.0 photocache427.flickr.ac4.yahoo.com:80 (squid/2.7.STABLE6)
Connection: close
And here it is via a local proxy:
HTTP/1.0 200 OK
Date: Sat, 11 Jul 2009 16:14:33 GMT
Server: Apache/2.0.52 (Red Hat)
Cache-Control: max-age=315360000
Expires: Mon, 28 Jul 2014 23:30:00 GMT
Last-Modified: Sun, 07 Jun 2009 19:53:21 GMT
Accept-Ranges: bytes
Content-Length: 134318
Content-Type: image/jpeg
X-Cache: HIT from photocache413.flickr.ac4.yahoo.com
X-Cache-Lookup: HIT from photocache413.flickr.ac4.yahoo.com:81
X-Cache: MISS from photocache427.flickr.ac4.yahoo.com
X-Cache-Lookup: MISS from photocache427.flickr.ac4.yahoo.com:80
X-Cache: MISS from 68e99101007e4d9
X-Cache-Lookup: HIT from 68e99101007e4d9:3128
Via: 1.1 photocache413.flickr.ac4.yahoo.com:81 (squid/2.7.STABLE6), 1.0 photocache427.flickr.ac4.yahoo.com:80 (squid/2.7.STABLE6), 1.0 68e99101007e4d9:3128 (squid/2.7.STABLE6)
Connection: keep-alive
Proxy-Connection: keep-alive
- The first one has an Age header: this represents a cache’s estimate of the time in seconds since the response was generated by the origin server, i.e. how long it’s been cached for. [TBD: Why is this missing when I add my local cache? Should it be forwarded, or not?]
- The second one has an additional cache lookup representing the local proxy we’ve added.
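To compare the two responses without eyeballing them, something like the following Python sketch pulls out the chain of X-Cache results. The helper names are made up for illustration, and the sample is an abbreviated copy of the second response above.

```python
def parse_headers(raw):
    """Return (name, value) pairs, keeping repeated headers such as X-Cache."""
    pairs = []
    for line in raw.strip().splitlines()[1:]:  # skip the status line
        name, sep, value = line.partition(":")
        if sep:
            pairs.append((name.strip().lower(), value.strip()))
    return pairs


def cache_hops(raw):
    """List (result, host) for each X-Cache header, in the order they appear."""
    hops = []
    for name, value in parse_headers(raw):
        if name == "x-cache":
            result, _, host = value.partition(" from ")
            hops.append((result, host))
    return hops


# Abbreviated copy of the response seen through the local proxy.
SAMPLE = """HTTP/1.0 200 OK
Cache-Control: max-age=315360000
X-Cache: HIT from photocache413.flickr.ac4.yahoo.com
X-Cache-Lookup: HIT from photocache413.flickr.ac4.yahoo.com:81
X-Cache: MISS from photocache427.flickr.ac4.yahoo.com
X-Cache: MISS from 68e99101007e4d9
"""

for result, host in cache_hops(SAMPLE):
    print(result, host)
```

The last entry is the local proxy, which reports a MISS because it did not serve this response from its own cache.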
Cache control headers
The Cache-Control general-header field is used to specify directives that MUST be obeyed by all caching mechanisms along the request/response chain. The directives specify behavior intended to prevent caches from adversely interfering with the request or response.
There are both request and reply directives.
To strip out all the comments:
grep -P '^(\w)|^(#[\s]+TAG:)' squid.conf.default > squid.conf
This makes squid.conf easier to work with. By default there is already a backup of this file called squid.conf.default.
To allow the local machine to access the local proxy, add the following:
acl localnet src 127.0.0.1/32
If this one is not added, then all requests from localhost will be denied.
There are other recommendations also in the QUICKSTART file.
Debugging access control lists
You can add a debug setting to the conf file:
debug_options 28,9
This switches on debug level 9 (most verbose) for section 28, Access Control. For the full set of available sections, see /docs/debug-sections.txt.
You can see what’s going on using the logs generated. For example, to look at cache hits, switch debugging on for section 12.
The access log records all cache activity.
TIP: Ensure your client is not sending Pragma:no-cache (curl does this by default), otherwise you’ll see lots of TCP_CLIENT_REFRESH_MISS in your access log.
TCP_MEM_HIT: A valid copy of the requested object was in the cache and it was in memory, thus avoiding disk accesses. This is like TCP_HIT, but the object was found in memory — TCP_HIT means disk access was required.
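A quick way to see how the cache is doing is to tally the result codes in the access log. This Python sketch assumes Squid's native access log format, where the fourth whitespace-separated field is the result code followed by a slash and the HTTP status. The sample lines are invented for illustration, not taken from a real log.

```python
from collections import Counter


def count_result_codes(lines):
    """Tally Squid result codes (TCP_HIT, TCP_MISS, ...) from access log lines."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # blank or truncated line
        code = fields[3].split("/")[0]  # "TCP_MEM_HIT/200" -> "TCP_MEM_HIT"
        counts[code] += 1
    return counts


# Made-up sample lines in the assumed native format.
SAMPLE_LOG = [
    "1247328873.101 120 127.0.0.1 TCP_MISS/200 134700 GET http://www.example-domain/a.jpg - DIRECT/192.0.2.10 image/jpeg",
    "1247328874.220 1 127.0.0.1 TCP_MEM_HIT/200 134708 GET http://www.example-domain/a.jpg - NONE/- image/jpeg",
    "1247328875.340 1 127.0.0.1 TCP_MEM_HIT/200 134708 GET http://www.example-domain/a.jpg - NONE/- image/jpeg",
    "1247328876.460 98 127.0.0.1 TCP_CLIENT_REFRESH_MISS/200 134700 GET http://www.example-domain/a.jpg - DIRECT/192.0.2.10 image/jpeg",
]

for code, count in count_result_codes(SAMPLE_LOG).most_common():
    print(code, count)
```

If you have customised the log format, the field index will need adjusting.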
The store log is also interesting: it gives the status of stored objects. It looks like entries are only recorded here when items are added or removed, i.e., cache hits will not show up.
Entries are tagged with one of:
- SWAPIN (swapped into memory from disk).
- SWAPOUT (saved to disk).
- RELEASE (removed from cache).
Check your store log after making a fresh range get — you may see:
RELEASE -1 FFFFFFFF
This means the object was not cachable. [TBD: Why is this the case here?]
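To spot these entries without reading the whole file, a small Python sketch along these lines can summarise the store log. It assumes only that the action tag is one of the first two fields on a line and that an uncachable object shows up as RELEASE followed by -1 and FFFFFFFF, as in the entry above. The sample lines are abbreviated and made up.

```python
from collections import Counter

ACTIONS = ("SWAPIN", "SWAPOUT", "RELEASE")


def summarize_store_log(lines):
    """Return (action counts, number of RELEASE -1 FFFFFFFF entries)."""
    counts = Counter()
    uncached = 0
    for line in lines:
        fields = line.split()
        # The tag may be preceded by a timestamp, so check the first two fields.
        action = next((f for f in fields[:2] if f in ACTIONS), None)
        if action is None:
            continue
        counts[action] += 1
        rest = fields[fields.index(action) + 1:]
        if action == "RELEASE" and rest[:2] == ["-1", "FFFFFFFF"]:
            uncached += 1  # object was never written to the cache
    return counts, uncached


# Abbreviated, made-up sample entries.
SAMPLE_STORE_LOG = [
    "1247328873.101 RELEASE -1 FFFFFFFF",
    "1247328874.300 SWAPOUT 00 00000001",
    "RELEASE -1 FFFFFFFF",
]

counts, uncached = summarize_store_log(SAMPLE_STORE_LOG)
print(dict(counts), "uncachable:", uncached)
```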
Problems testing range requests using curl
I was finding that my range header was being passed on to the origin server even though I had set:
range_offset_limit -1
This forces Squid to request the entire file and do the range itself. By turning on debugging for section 64, I could see it being forwarded.
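If you would rather script the test than use curl, a Python sketch like this builds a range request and sends it through the local proxy. The proxy address 127.0.0.1:3128 and the example URL are assumptions for illustration; substitute your own.

```python
import urllib.request


def build_range_request(url, first_byte, last_byte):
    """Build a GET request asking for an inclusive byte range."""
    req = urllib.request.Request(url)
    req.add_header("Range", f"bytes={first_byte}-{last_byte}")
    return req


def open_via_proxy(req, proxy="http://127.0.0.1:3128"):
    """Send the request through an HTTP proxy (assumed address) and return the response."""
    handler = urllib.request.ProxyHandler({"http": proxy})
    opener = urllib.request.build_opener(handler)
    return opener.open(req)


req = build_range_request("http://www.example-domain/resource.html", 0, 99)
print(req.get_header("Range"))
# To actually send it (needs a running proxy):
# response = open_via_proxy(req)
# print(response.status, response.headers.get("Content-Range"))
```

A request built this way carries no Pragma header unless you add one.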
If you get messages about excess data in your cache log:
Excess data from "GET http://www.example-domain/resource.html"
it’s likely that your origin server is sending more bytes than specified by the content length header. If you have control over the origin server, then ensure the write loop is not writing empty buffers — an easy mistake to make.
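The mistake usually looks like writing the whole buffer on every pass instead of only the bytes that were just read. My origin server is not written in Python, so treat this as a sketch of the idea rather than the actual fix; copy_exact is a made-up name.

```python
import io


def copy_exact(src, dst, content_length, chunk_size=8192):
    """Copy at most content_length bytes from src to dst; return bytes written."""
    buf = bytearray(chunk_size)
    view = memoryview(buf)
    written = 0
    while written < content_length:
        want = min(chunk_size, content_length - written)
        n = src.readinto(view[:want])
        if not n:  # end of stream (0) or nothing available (None)
            break
        dst.write(view[:n])  # only the bytes just read, never the whole buffer
        written += n
    return written


source = io.BytesIO(b"x" * 20000)
sink = io.BytesIO()
print(copy_exact(source, sink, 20000), len(sink.getvalue()))
```

Comparing the return value with the Content-Length you sent is a cheap sanity check on the server side.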
Squid v2.7 for Windows and range requests
I can’t seem to get cold range requests to cache. Is this a bug with the Windows version? Here is an excellent article describing a possible workaround.
Debugging range requests
Output info to the cache log by enabling the relevant debug sections:
- 11 — Hypertext Transfer Protocol (HTTP)
- 12 — Internet Cache Protocol
- 17 — Request Forwarding
- 64 — HTTP Range Header
- 66 — HTTP Header Tools
- 74 — HTTP Message
debug_options 11,9 12,9 17,9 64,9 66,9 74,9
This will show you if range headers are being forwarded or not. For example, this line shows that we_do_ranges is being set to false:
httpBuildRequestHeader: range specs: 01597430, cachable: 1; we_do_ranges: 0
Even though this is from a range request, the range header is still being forwarded. Had we_do_ranges evaluated to 1, the range header would not have been forwarded. Squid is not supposed to forward the range header if range_offset_limit is set to -1.
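With this much debugging switched on, the cache log gets noisy, so it helps to filter for the relevant lines. This Python sketch matches the httpBuildRequestHeader line shown above; the pattern is based on that single example, so other Squid versions may word it differently.

```python
import re

# Pattern derived from the one log line quoted above (an assumption).
RANGE_DECISION = re.compile(
    r"httpBuildRequestHeader: range specs: (?P<specs>\S+), "
    r"cachable: (?P<cachable>\d+); we_do_ranges: (?P<we_do_ranges>\d+)"
)


def range_decisions(lines):
    """Extract the cachable / we_do_ranges flags from cache log lines."""
    decisions = []
    for line in lines:
        match = RANGE_DECISION.search(line)
        if match:
            decisions.append({
                "cachable": match.group("cachable") == "1",
                "we_do_ranges": match.group("we_do_ranges") == "1",
            })
    return decisions


LOG_LINE = "httpBuildRequestHeader: range specs: 01597430, cachable: 1; we_do_ranges: 0"
for decision in range_decisions([LOG_LINE]):
    print("handled by Squid" if decision["we_do_ranges"] else "forwarded to origin")
```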
Squid will not start — abnormal program termination
If you encounter this error on start, it may be because you have used a port that is in use. Run netstat to check.
For example, if squid is configured for port 3128:
$ netstat -an | grep 3128
Change the port using the http_port config setting.
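If netstat is not to hand, a rough equivalent is to try connecting to the port. This Python sketch only detects something listening for TCP on the given host (127.0.0.1 by default, an assumption), so it will not catch every kind of bind conflict.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


print("port 3128 in use:", port_in_use(3128))
```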
- HTTP caching specification
- Cache control directives
- Known HTTP proxying issues
- Squid — access control lists