Ben Biddington

Whatever it is, it's not about "coding"

Posts Tagged ‘squid’

Network diagnostics


If you have a problem like:

  • One of a set of load balanced servers is behaving unexpectedly and
  • Targeting that machine directly by IP address does not fail the same way

then you may have a missing host header: on that machine, your host name may be being routed to a different application.

This kind of thing has prompted us to think about writing smoke tests for this type of multi-machine configuration.
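A smoke test along these lines might look like the sketch below. It is only an illustration, not something we have built: the function names, the IP addresses and the host name are made up, and it assumes plain HTTP on port 80. It asks each machine for the same path by IP address while sending the site's host name in the Host header, then reports any machine whose status code differs from the majority.

```python
from collections import Counter
import http.client


def fetch_status(ip, host_name, path="/", port=80, timeout=5):
    """Request `path` from one machine by IP while presenting `host_name`.

    Passing an explicit Host header stops http.client from deriving one
    from the IP address, so the server sees the same host name it would
    see behind the load balancer.
    """
    conn = http.client.HTTPConnection(ip, port, timeout=timeout)
    try:
        conn.request("GET", path, headers={"Host": host_name})
        return conn.getresponse().status
    finally:
        conn.close()


def find_odd_servers(ips, host_name, path="/", fetch=fetch_status):
    """Return {ip: status} for machines whose status differs from the majority."""
    statuses = {ip: fetch(ip, host_name, path) for ip in ips}
    if not statuses:
        return {}
    expected = Counter(statuses.values()).most_common(1)[0][0]
    return {ip: status for ip, status in statuses.items() if status != expected}


# Hypothetical usage (addresses and host name are placeholders):
# print(find_odd_servers(["10.0.0.1", "10.0.0.2", "10.0.0.3"], "www.example.com"))
```

Comparing against the majority keeps the test free of hard-coded expectations, at the cost of missing a fault that affects most of the machines at once.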


Written by benbiddington

2 November, 2009 at 19:04

HTTP Proxy caching


My next project involves delivering files via HTTP, and as part of the optimization we are going to implement a proxy cache. The aim is to reduce how often the origin server has to compute each resource.

HTTP response caching

The HTTP protocol provides a number of cache control mechanisms.

[RFC2616] The basic cache mechanisms in HTTP/1.1 (server-specified expiration times and validators) are implicit directives to caches. In some cases, a server or client might need to provide explicit directives to the HTTP caches. We use the Cache-Control header for this purpose.

The Cache-Control header allows a client or server to transmit a variety of directives in either requests or responses. These directives typically override the default caching algorithms. As a general rule, if there is any apparent conflict between header values, the most restrictive interpretation is applied (that is, the one that is most likely to preserve semantic transparency). However, in some cases, cache-control directives are explicitly specified as weakening the approximation of semantic transparency (for example, “max-stale” or “public”).

Cache-Control headers are a mechanism for telling caches, whether proxy servers or end-user applications, how resources should be cached, validated and revalidated.
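To make the directive format concrete, here is a minimal Python sketch that splits a Cache-Control value into its directives. The function name is my own, and it deliberately ignores one corner of the grammar: quoted arguments that contain commas.

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into a {directive: argument} dict.

    Directives without an argument (for example "no-cache") map to None.
    Simplification: quoted arguments containing commas are not handled.
    """
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


directives = parse_cache_control("max-age=315360000")
# max-age is in seconds; 315360000 seconds is 3650 days, roughly ten years.
max_age_days = int(directives["max-age"]) // 86400
```

Directive names are lower-cased because they are case-insensitive; the arguments are left as strings so the caller decides how to interpret them.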

Example

Here’s the response without a proxy:

HTTP/1.0 200 OK
Date: Sat, 11 Jul 2009 12:05:07 GMT
Server: Apache/2.0.52 (Red Hat)
Cache-Control: max-age=315360000
Expires: Mon, 28 Jul 2014 23:30:00 GMT
Last-Modified: Sun, 07 Jun 2009 19:53:21 GMT
Accept-Ranges: bytes
Content-Length: 134318
Content-Type: image/jpeg
Age: 7
X-Cache: HIT from photocache413.flickr.ac4.yahoo.com
X-Cache-Lookup: HIT from photocache413.flickr.ac4.yahoo.com:81
X-Cache: MISS from photocache427.flickr.ac4.yahoo.com
X-Cache-Lookup: MISS from photocache427.flickr.ac4.yahoo.com:80
Via: 1.1 photocache413.flickr.ac4.yahoo.com:81 (squid/2.7.STABLE6),
     1.0 photocache427.flickr.ac4.yahoo.com:80 (squid/2.7.STABLE6)
Connection: close

And here it is through a local proxy:

HTTP/1.0 200 OK
Date: Sat, 11 Jul 2009 16:14:33 GMT
Server: Apache/2.0.52 (Red Hat)
Cache-Control: max-age=315360000
Expires: Mon, 28 Jul 2014 23:30:00 GMT
Last-Modified: Sun, 07 Jun 2009 19:53:21 GMT
Accept-Ranges: bytes
Content-Length: 134318
Content-Type: image/jpeg
X-Cache: HIT from photocache413.flickr.ac4.yahoo.com
X-Cache-Lookup: HIT from photocache413.flickr.ac4.yahoo.com:81
X-Cache: MISS from photocache427.flickr.ac4.yahoo.com
X-Cache-Lookup: MISS from photocache427.flickr.ac4.yahoo.com:80
X-Cache: MISS from 68e99101007e4d9
X-Cache-Lookup: HIT from 68e99101007e4d9:3128
Via: 1.1 photocache413.flickr.ac4.yahoo.com:81 (squid/2.7.STABLE6),
     1.0 photocache427.flickr.ac4.yahoo.com:80 (squid/2.7.STABLE6),
     1.0 68e99101007e4d9:3128 (squid/2.7.STABLE6)
Connection: keep-alive
Proxy-Connection: keep-alive

Differences:

  • The first one has an Age header. This represents a cache’s estimate of the time in seconds since the response was generated by the origin server; in other words, how long it has been cached. [TBD: Why is this missing when I add my local cache? Should it be forwarded, or not?]
  • The second one has an additional cache lookup representing the local proxy we’ve added.
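Spotting these differences by eye gets tedious, so a small script can do the comparison. The sketch below is an illustration with made-up function names; it takes two raw response heads as text, joins folded lines such as the multi-line Via header, and reports which header lines were removed and which were added. Headers that legitimately change on every request, such as Date, will show up too.

```python
from collections import Counter


def parse_headers(raw):
    """Parse a raw response head into a list of (name, value) pairs.

    The status line is skipped, and continuation lines (those starting
    with whitespace) are joined onto the previous header.
    """
    headers = []
    for line in raw.strip().splitlines()[1:]:
        if line[:1] in (" ", "\t") and headers:
            name, value = headers[-1]
            headers[-1] = (name, value + " " + line.strip())
        elif ":" in line:
            name, _, value = line.partition(":")
            headers.append((name.strip(), value.strip()))
    return headers


def diff_headers(before, after):
    """Return (removed, added) lists of header pairs between two responses."""
    a = Counter(parse_headers(before))
    b = Counter(parse_headers(after))
    return sorted((a - b).elements()), sorted((b - a).elements())
```

A Counter is used rather than a dict because headers such as X-Cache appear more than once in the same response.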

Cache control headers

Cache-Control

The Cache-Control general-header field is used to specify directives that MUST be obeyed by all caching mechanisms along the request/response chain. The directives specify behavior intended to prevent caches from adversely interfering with the request or response.

There are both request and reply directives.

Squid config

To strip out all the comments (keeping only the TAG: marker lines):

grep -P '^(\w)|^(#[\s]+TAG:)' squid.conf.default > squid.conf

This makes squid.conf easier to work with. By default there is already a backup of this file called squid.conf.default.

Minimum config

To allow the local machine to access the local proxy, add the following:

acl localnet src 127.0.0.1/32

If this one is not added, then all requests from localhost will be denied.

There are other recommendations in the QUICKSTART file.

Debugging access control lists

You can add a debug setting to the conf file:

debug_options 28,9

This switches on debug level 9 (most verbose) for section 28, Access Control. For the full set of available sections, see /docs/debug-sections.txt.

Cache inspection

You can see what’s going on using the logs generated. For example, to look at cache hits, switch debugging on for section 12.

The access log records all cache activity.

TIP: Ensure your client is not sending Pragma: no-cache (curl does this by default), otherwise you’ll see lots of TCP_CLIENT_REFRESH_MISS in your access log. With curl, passing -H "Pragma:" suppresses the header.

TCP_MEM_HIT: A valid copy of the requested object was in the cache and it was in memory, thus avoiding disk accesses. This is like TCP_HIT, but the object was found in memory — TCP_HIT means disk access was required.
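To get a quick picture of how many requests are hits, the result codes in the access log can be tallied. The sketch below is a rough illustration with a made-up function name. It assumes Squid's native access log format, in which the fourth whitespace-separated field looks like TCP_MEM_HIT/200; a custom log format would need a different field index.

```python
from collections import Counter


def count_result_codes(lines):
    """Tally Squid result codes (TCP_HIT, TCP_MISS, ...) from access log lines.

    Assumes the native log format, where the fourth whitespace-separated
    field is "<result code>/<HTTP status>".
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) > 3 and "/" in fields[3]:
            counts[fields[3].split("/", 1)[0]] += 1
    return counts


# Hypothetical usage (the log path depends on your installation):
# with open("access.log") as log:
#     for code, n in count_result_codes(log).most_common():
#         print(code, n)
```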

The store log is also interesting: it gives the status of stored objects. It looks like entries are only recorded here when items are added or removed, i.e., cache hits will not show up.

Entries are tagged with one of:

  • SWAPIN (swapped into memory from disk).
  • SWAPOUT (saved to disk).
  • RELEASE (removed from cache).

Caching ranges

Check your store log after making a fresh range GET request; you may see:

RELEASE -1 FFFFFFFF

This means the object was not cacheable. [TBD: Why is this the case here?]

Problems testing range requests using curl

I was finding that my range header was being passed on to the origin server even though I had set:

range_offset_limit -1

This setting should force Squid to request the entire file and serve the range itself. By turning on debugging for section 64, I could see the range header being forwarded.

Troubleshooting Squid

Excess data

If you get messages about excess data in your cache log:

Excess data from "GET http://www.example-domain/resource.html"

it’s likely that your origin server is sending more bytes than specified by the Content-Length header. If you have control over the origin server, then ensure the write loop is not writing empty buffers, which is an easy mistake to make.
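The mistake usually comes from reading into a fixed-size buffer and then writing the whole buffer rather than just the bytes that were read. Here is a minimal Python sketch of a correct loop. It is only an illustration of the pattern, with a made-up function name, and not the origin server's actual code.

```python
import io


def copy_stream(src, dst, buffer_size=8192):
    """Copy src to dst, writing only the bytes each read actually returned.

    Returns the number of bytes written, which should match Content-Length.
    """
    buf = bytearray(buffer_size)
    view = memoryview(buf)
    total = 0
    while True:
        n = src.readinto(buf)
        if not n:  # 0 (or None) means there is nothing more to read
            break
        # Write view[:n], not buf: the tail of buf still holds stale bytes
        # from an earlier, larger read.
        dst.write(view[:n])
        total += n
    return total


source = io.BytesIO(b"0123456789")
target = io.BytesIO()
written = copy_stream(source, target, buffer_size=4)
```

With a 4-byte buffer and 10 bytes of input, the last read fills only half the buffer; writing the full buffer at that point would send 12 bytes against a Content-Length of 10.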

Squid v2.7 for Windows and range requests

I can’t seem to get cold range requests to cache. Is this a bug in the Windows version? Here is an excellent article describing a possible workaround.

Debugging range requests

Send output to the cache log by filtering on these debug sections:

  • 11 — Hypertext Transfer Protocol (HTTP)
  • 12 — Internet Cache Protocol
  • 17 — Request Forwarding
  • 64 — HTTP Range Header
  • 66 — HTTP Header Tools
  • 74 — HTTP Message

Enable them all at level 9 with:

debug_options 11,9 12,9 17,9 64,9 66,9 74,9

This will show you if range headers are being forwarded or not. For example, this line shows that we_do_ranges is being set to false:

httpBuildRequestHeader: range specs: 01597430, cachable: 1; we_do_ranges: 0

Even though this is from a range request, the range header is still being forwarded. Had we_do_ranges evaluated to 1, the range header would not have been forwarded. Squid is not supposed to forward the range header if range_offset_limit is set to -1.
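At level 9 the cache log is very noisy, so it helps to filter for these lines. The sketch below is a rough illustration with a made-up function name. The pattern is based only on the single log line quoted above, so other Squid builds may format it differently.

```python
import re

# Pattern modelled on the one log line quoted in this post.
RANGE_LINE = re.compile(
    r"httpBuildRequestHeader: range specs: (?P<specs>[^,]+), "
    r"cachable: (?P<cachable>\d+); we_do_ranges: (?P<we_do_ranges>\d+)"
)


def forwarded_range_requests(log_lines):
    """Yield log lines where we_do_ranges is 0, i.e. the range was forwarded."""
    for line in log_lines:
        match = RANGE_LINE.search(line)
        if match and match.group("we_do_ranges") == "0":
            yield line.rstrip("\n")


# Hypothetical usage (the log path depends on your installation):
# with open("cache.log") as log:
#     for line in forwarded_range_requests(log):
#         print(line)
```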

Squid will not start — abnormal program termination

If you encounter this error on start, it may be because the configured port is already in use. Run netstat to check.

For example, if squid is configured for port 3128:

$ netstat -an | grep 3128

Change the port using the http_port config setting.
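If netstat and grep are not to hand, the same check can be sketched in Python. This is an approximation with a made-up function name: it only tests whether something accepts TCP connections on the port, which is a reasonable proxy for the port being taken by another listener.

```python
import socket


def port_is_open(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


# Hypothetical usage, with 3128 as the configured http_port:
# print(port_is_open(3128))
```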


Written by benbiddington

11 September, 2009 at 08:00

Posted in development
