Ben Biddington

Whatever it is, it's not about "coding"

Archive for September 2009

Scala introduction — writing an OAuth library

leave a comment »

I started out intending to write some scala examples against the twitter API; however, I soon discovered I needed OAuth first. Given that I use OAuth all the time at work, I figured I could probably do with learning about it first-hand, while learning scala.

org.junit.rules._

I chose to test drive it with JUnit 4.7 and NetBeans.

NetBeans works almost immediately with scala, and has support for project templates etc — even scala JUnit fixtures.

UPDATE (2010-04-27) I have since discovered IntelliJ to be much better, and there is now a free community edition. IntelliJ supports scala without any fiddling around.

JUnit mostly works, though rules don’t, and neither do some matchers. Even though rules don’t work, I have included the import anyway because I have the t-shirt.

You can find the project on github.

Important abstractions

  1. SignatureBaseString.
    1. Characterized by three ampersand-separated segments: verb, uri, parameters.
    2. URL encoding must conform to RFC 3986; the following characters are considered unreserved and so must not be encoded:
      ALPHA, DIGIT, '-', '.', '_', '~'
  2. Signature.
    1. Signature is a keyed-Hash Message Authentication Code (HMAC).
    2. Consumer secret is a required part of the HMAC secret key.
    3. Token secret is optionally included in HMAC secret key:
      (consumer_secret, token_secret) => uri_encoded_consumer_secret&[uri_encoded_token_secret]
  3. OAuthCredential. Represents the secret key(s) used to create the HMAC signature. OAuth requires a consumer credential, and optionally a token credential, representing the end user.
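A sketch of how these abstractions might fit together. The method names and shapes here are my own for illustration, not the library’s actual API; only the rules themselves (RFC 3986 encoding, the three-segment base string, and the `secret&[token_secret]` key) come from the text above.

```scala
import java.net.URLEncoder
import javax.crypto.Mac
import javax.crypto.spec.SecretKeySpec
import java.util.Base64

// Percent-encode per RFC 3986: only ALPHA, DIGIT, '-', '.', '_', '~' are
// unreserved. URLEncoder is close, but differs for a few characters.
def percentEncode(s: String): String =
  URLEncoder.encode(s, "UTF-8")
    .replace("+", "%20")
    .replace("*", "%2A")
    .replace("%7E", "~")

// Three ampersand-separated segments: verb, uri, parameters (each encoded).
def signatureBaseString(verb: String, uri: String, params: Map[String, String]): String = {
  val normalized = params.toSeq.sortBy(_._1)
    .map { case (k, v) => s"${percentEncode(k)}=${percentEncode(v)}" }
    .mkString("&")
  Seq(verb.toUpperCase, uri, normalized).map(percentEncode).mkString("&")
}

// HMAC-SHA1 key: encoded consumer secret, '&', then the optional encoded
// token secret (empty when there is no token credential yet).
def sign(baseString: String, consumerSecret: String, tokenSecret: Option[String]): String = {
  val key = percentEncode(consumerSecret) + "&" + tokenSecret.map(percentEncode).getOrElse("")
  val mac = Mac.getInstance("HmacSHA1")
  mac.init(new SecretKeySpec(key.getBytes("UTF-8"), "HmacSHA1"))
  Base64.getEncoder.encodeToString(mac.doFinal(baseString.getBytes("UTF-8")))
}
```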

Now that these core concepts are complete, I am working on high-level policy, like classes for generating signed URLs and authorization headers.

Notes

JUnit — expecting exceptions in scala

Assuming JUnit 4.x, a test can expect an exception using the test annotation:

Java:

@Test(expected=IllegalArgumentException.class)
public void ExampleThrowsException() {
    throw new IllegalArgumentException();
}

This needs to be modified for scala:

Scala:

@Test { val expected = classOf[IllegalArgumentException] }
def ExampleThrowsException {
    throw new IllegalArgumentException
}

The reason for this difference is outlined in the Java annotations section on named parameters.

Here is the documentation for scala annotations. See also: the documentation for scala 2.7.3 (includes dbc).

Closures and return

The return statement immediately returns from the current method, even if you’re within a closure. Omit return in this case — return is optional anyway.
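A small illustration of omitting `return` (the names here are mine):

```scala
// 'return' inside a closure would exit the whole enclosing method (it is
// implemented as an exception under the hood), so it is best omitted; the
// last expression of a function is its value anyway.
def firstEven(xs: List[Int]): Option[Int] =
  xs.find(x => x % 2 == 0) // no 'return' needed; the expression is the result

def describe(xs: List[Int]): String =
  firstEven(xs) match {
    case Some(n) => "first even: " + n
    case None    => "no evens"
  }
```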

When to use semicolon line terminator

Never — apart from:

  • When a method returns Unit (equivalent to void) and you aren’t using the return keyword. [TBD: Add example].

How to use blocks

def times(count : Int)(block : => Unit) = {
    1.to(count).foreach(_ => block)
}

var count = 1
times(2) { println("Printed " + count + " times") }

See also: some executable examples on github



Written by benbiddington

18 September, 2009 at 13:37

Posted in development


Particle physics, mocks and stubs

leave a comment »

Steve Freeman had an interesting analogy in TDD 10 years later (17m30s, slide 26: The origins of mock objects). He describes a mocked unit test as being “rather like particle physics”.

You fire something at a particle, things splinter off and you can detect what happens…

[diagram: mocks and stubs]

A mock is used to both detect the emissions from the system under test (SUT), and verify expectations. Additionally, a mock object may perform stub duties. This doesn’t quite fit, since fission is one-way.

Testing “by detection” like this is considered behaviour verification: verifying collaborations between the SUT and other objects.

To be testable in such a manner:

  • Requires the ability to isolate the SUT sufficiently, i.e., detach it completely from its context and collaborators. A test fixture should be able to create the SUT easily by itself.
  • To that end, the SUT should minimize concrete dependencies.
  • Collaborators must be designed in such a way to allow a mock to be generated that can intercept interactions. This means identifying the abstraction(s) for collaborators.
  • A mock is a stub in the sense that it needs to stand in for a real (if inert) object. But a mock is also a “detector” and is used as the means of assertion.
  • Stub queries and mock actions: “we mock when the service changes the external world; we stub when it doesn’t change the external world – stub queries and mock actions”.
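A minimal hand-rolled sketch of the “detector” idea, with invented names (no mocking framework):

```scala
// The mock records the SUT's "emissions" so the test can verify the
// collaboration afterwards: fire at the SUT, detect what splinters off.
trait Mailer { def send(to: String): Unit }

class MockMailer extends Mailer {
  var sent: List[String] = Nil                 // the "detector"
  def send(to: String): Unit = sent = sent :+ to
}

class Notifier(mailer: Mailer) {               // the system under test
  def welcome(user: String): Unit = mailer.send(user)
}
```

A test then creates the SUT with the mock as collaborator, exercises it, and asserts on `mock.sent`: behaviour verification rather than state verification.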

References

Written by benbiddington

12 September, 2009 at 15:15

Posted in development


HTTP Proxy caching

leave a comment »

My next project involves delivering files via HTTP, and as part of the optimization we are going to implement a proxy cache. The aim is to reduce the computation required to serve resources.

HTTP response caching

The HTTP protocol provides a number of cache control mechanisms.

[RFC2616] The basic cache mechanisms in HTTP/1.1 (server-specified expiration times and validators) are implicit directives to caches. In some cases, a server or client might need to provide explicit directives to the HTTP caches. We use the Cache-Control header for this purpose.

The Cache-Control header allows a client or server to transmit a variety of directives in either requests or responses. These directives typically override the default caching algorithms. As a general rule, if there is any apparent conflict between header values, the most restrictive interpretation is applied (that is, the one that is most likely to preserve semantic transparency). However, in some cases, cache-control directives are explicitly specified as weakening the approximation of semantic transparency (for example, “max-stale” or “public”).

Cache-control headers are a mechanism for supplying hints to servers and end-user applications concerning how resources should be validated, revalidated and cached.
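As a sketch of how a cache might act on one of these hints, here is the basic freshness rule (a response is fresh while its current age is below the supplied max-age). The helper names are my own:

```scala
// Extract the max-age directive, if present, from a Cache-Control value.
def maxAge(cacheControl: String): Option[Long] =
  "max-age=(\\d+)".r.findFirstMatchIn(cacheControl).map(_.group(1).toLong)

// A cached response is fresh while its age is below max-age; with no
// max-age present, this simplified rule treats it as not fresh.
def isFresh(cacheControl: String, ageSeconds: Long): Boolean =
  maxAge(cacheControl).exists(ageSeconds < _)
```

Applied to the Flickr response below, `max-age=315360000` with `Age: 7` is comfortably fresh.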

Example

Here’s the response without a proxy:

HTTP/1.0 200 OK
Date: Sat, 11 Jul 2009 12:05:07 GMT
Server: Apache/2.0.52 (Red Hat)
Cache-Control: max-age=315360000
Expires: Mon, 28 Jul 2014 23:30:00 GMT
Last-Modified: Sun, 07 Jun 2009 19:53:21 GMT
Accept-Ranges: bytes
Content-Length: 134318
Content-Type: image/jpeg
Age: 7
X-Cache: HIT from photocache413.flickr.ac4.yahoo.com
X-Cache-Lookup: HIT from photocache413.flickr.ac4.yahoo.com:81
X-Cache: MISS from photocache427.flickr.ac4.yahoo.com
X-Cache-Lookup: MISS from photocache427.flickr.ac4.yahoo.com:80
Via: 1.1 photocache413.flickr.ac4.yahoo.com:81 (squid/2.7.STABLE6),
     1.0 photocache427.flickr.ac4.yahoo.com:80 (squid/2.7.STABLE6)
Connection: close

And here it is via a local proxy:

HTTP/1.0 200 OK
Date: Sat, 11 Jul 2009 16:14:33 GMT
Server: Apache/2.0.52 (Red Hat)
Cache-Control: max-age=315360000
Expires: Mon, 28 Jul 2014 23:30:00 GMT
Last-Modified: Sun, 07 Jun 2009 19:53:21 GMT
Accept-Ranges: bytes
Content-Length: 134318
Content-Type: image/jpeg
X-Cache: HIT from photocache413.flickr.ac4.yahoo.com
X-Cache-Lookup: HIT from photocache413.flickr.ac4.yahoo.com:81
X-Cache: MISS from photocache427.flickr.ac4.yahoo.com
X-Cache-Lookup: MISS from photocache427.flickr.ac4.yahoo.com:80
X-Cache: MISS from 68e99101007e4d9
X-Cache-Lookup: HIT from 68e99101007e4d9:3128
Via: 1.1 photocache413.flickr.ac4.yahoo.com:81 (squid/2.7.STABLE6),
     1.0 photocache427.flickr.ac4.yahoo.com:80 (squid/2.7.STABLE6),
     1.0 68e99101007e4d9:3128 (squid/2.7.STABLE6)
Connection: keep-alive
Proxy-Connection: keep-alive

Differences:

  • The first one has an Age header: this represents a cache’s estimate of the time in seconds since the response was generated by the origin server, i.e., how long it has been cached for. [TBD: Why is this missing when I add my local cache? Should it be forwarded, or not?]
  • The second one has an additional cache lookup representing the local proxy we’ve added.

Cache control headers

Cache-control

The Cache-Control general-header field is used to specify directives that MUST be obeyed by all caching mechanisms along the request/response chain. The directives specify behavior intended to prevent caches from adversely interfering with the request or response.

There are both request and reply directives.

Squid config

To strip out all the comments:

grep -P '^(\w)|^(#[\s]+TAG:)' squid.conf.default > squid.conf

This makes squid.conf easier to work with. By default there is already a backup of this file called squid.conf.default.

Minimum config

To allow the local machine to access the local proxy, add the following:

acl localnet src 127.0.0.1/32

If this line is not added, all requests from localhost will be denied.

There are other recommendations also in the QUICKSTART file.

Debugging access control lists

You can add a debug setting to conf file:

debug_options 28,9

This switches on debug level 9 (most verbose) for section 28, Access Control. For the full set of available sections, see /docs/debug-sections.txt.

Cache inspection

You can see what’s going on using the logs generated. For example, to look at cache hits, switch debugging on for section 12.

The access log records all cache activity.

TIP: Ensure your client is not sending Pragma: no-cache (curl does this by default), otherwise you’ll see lots of TCP_CLIENT_REFRESH_MISS entries in your access log.

TCP_MEM_HIT: A valid copy of the requested object was in the cache and it was in memory, thus avoiding disk accesses. This is like TCP_HIT, but the object was found in memory — TCP_HIT means disk access was required.

The store log is also interesting: it records the status of stored objects. It looks like entries are only recorded here when items are added or removed, i.e., cache hits will not show up.

Entries are tagged with one of:

  • SWAPIN (swapped into memory from disk).
  • SWAPOUT (saved to disk).
  • RELEASE (removed from cache).
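As a quick sketch, one could tally these tags across store log lines to see how the cache is behaving over time. The line shape below is simplified; a real store.log carries more fields:

```scala
// Tally SWAPIN/SWAPOUT/RELEASE tags from (simplified) store log lines.
def tallyTags(lines: Seq[String]): Map[String, Int] = {
  val tags = Set("SWAPIN", "SWAPOUT", "RELEASE")
  lines
    .flatMap(_.split("\\s+").find(tags.contains)) // pick out the tag field
    .groupBy(identity)
    .map { case (tag, hits) => (tag, hits.size) }
}
```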

Caching ranges

Check your store log after making a fresh range get — you may see:

RELEASE -1 FFFFFFFF

This means the object was not cachable. [TBD: Why is this the case here?]

Problems testing range requests using curl

I was finding that my range header was being passed on to the origin server even though I had set

range_offset_limit -1

which forces Squid to request the entire file and handle the range itself. By turning on debugging for section 64, I could see the header being forwarded.

Troubleshooting Squid

Excess data

If you get messages about excess data in your cache log:

Excess data from "GET http://www.example-domain/resource.html"

it’s likely that your origin server is sending more bytes than specified by its Content-Length header. If you have control over the origin server, ensure the write loop is not writing empty buffers — an easy mistake to make.

Squid v2.7 for Windows and range requests

I can’t seem to get cold range requests to cache. Is this a bug in the Windows version? Here is an excellent article describing a possible workaround.

Debugging range requests

Output info to cache log using filter debug sections:

  • 11 — Hypertext Transfer Protocol (HTTP)
  • 12 — Internet Cache Protocol
  • 17 — Request Forwarding
  • 64 — HTTP Range Header
  • 66 — HTTP Header Tools
  • 74 — HTTP Message
debug_options 11,9 12,9 17,9 64,9 66,9 74,9

This will show you if range headers are being forwarded or not. For example, this line shows that we_do_ranges is being set to false:

httpBuildRequestHeader: range specs: 01597430, cachable: 1; we_do_ranges: 0

Even though this is from a range request, the range header is still being forwarded. Had we_do_ranges evaluated to 1, the range header would not have been forwarded. Squid is not supposed to forward the range header if range_offset_limit is set to -1.

Squid will not start — abnormal program termination

If you encounter this error on start, it may be because you have configured a port that is already in use. Run netstat to check.

For example, if squid is configured for port 3128:

$ netstat -an | grep 3128

Change port using the http_port config setting.


Written by benbiddington

11 September, 2009 at 08:00

Posted in development


.NET Process — avoid deadlock with async reads

leave a comment »

If you are working with a child process that writes large amounts of data to its redirected stdout (or stderr), it is advisable to read from it asynchronously.

Why read stdout asynchronously?

A pipe is a connection between two processes in which one process writes data to the pipe and the other reads from the pipe. System.Diagnostics.Process.StandardOutput is an example of a pipe.

A child process may block while it waits for the client end to read from its stdout (or stderr).

When redirected, a process’s stdout buffer may fill; the process will then wait for its parent to read some data before it continues. If the parent process is waiting for all the bytes to be written before it reads anything (a synchronous read), it will wait indefinitely.

The point is: redirected streams have a limited buffer; keep them clear to allow the process to complete.

So you may encounter deadlock:

[Deadlock] Pipes have a fixed size (often 4096 bytes) and if a process tries to write to a pipe which is full, the write will block until a process reads some data from the pipe.

If your child process is going to write more data than its buffer can contain, you’ll need to read it asynchronously. This stops the process blocking by ensuring there is always space for it to emit data.
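A JVM-flavoured sketch of the same fix, since this blog’s examples are in scala: drain the redirected stream on a separate thread so the child never blocks on a full pipe buffer. The `drain` helper is my own, not a framework API:

```scala
import java.io.InputStream

// Read a redirected stream continuously on its own thread, handing each
// chunk to a callback as it arrives, so the writing process never blocks.
def drain(in: InputStream)(onBytes: Array[Byte] => Unit): Thread = {
  val t = new Thread(() => {
    val buffer = new Array[Byte](4096)
    var read = in.read(buffer)
    while (read != -1) {
      onBytes(buffer.take(read)) // hand the chunk off as it arrives
      read = in.read(buffer)
    }
  })
  t.start()
  t
}
```

With a real child process you would start draining `process.getInputStream` before waiting for exit, then join the reader thread afterwards.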

Tips

Example: piping a file to lame stdin (Windows)

Use the type command:

$ type file.mp3 | lame --mp3output 64 - "path/to/output.mp3"

Type reads the source file and emits it to its stdout; we’re then piping that directly to lame. In the preceding example, lame has been instructed to read from stdin and write directly to a file.

To pipe stdout to another process, use something like:

$ type file.mp3 | lame --mp3output 64 - - | another_process

Or redirect to a file:

$ type file.mp3 | lame --mp3output 64 - - > "path/to/output.mp3"

Get a list of running processes (Windows)

Use the query process command.


Written by benbiddington

8 September, 2009 at 09:56

.NET Process — working with binary output

with one comment

Lately we discovered an issue while encoding MP3 files with Lame. Our client reported encoded files were garbled: playable but watery, and full of pops and clicks.

We found this was due to interpreting the binary output from Lame as text — we had mistakenly employed Process.BeginOutputReadLine and its companion event OutputDataReceived.

Process.OutputDataReceived

By observing a Process using its OutputDataReceived event, clients can make asynchronous reads on a process’s StandardOutput.

Process.StandardOutput is a TextReader: it represents a reader that can read a sequential series of characters, i.e., it interprets its underlying stream as text.

When StandardOutput is being read asynchronously, the Process class monitors it, collecting characters into a string. Once it encounters a line ending, it notifies observers (handlers of its OutputDataReceived event), with the line of text it’s been collecting.

In short, the Process’s underlying byte stream is converted to lines of text, and clients are notified one line at a time.

In doing so, some bytes are discarded: any bytes that (in the current encoding) represent line endings.

As a result of these missing bytes, our output Mp3s were playable, but sounded terrible.

Solution

Bypass StandardOutput. Use its underlying Stream instead.
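The same pitfall exists on the JVM, where `scala.io.Source` plays the role of .NET’s TextReader. A sketch contrasting the two reads (helper names are mine): the line-based read silently drops the line-ending bytes, while the raw read preserves every byte.

```scala
import java.io.InputStream
import scala.io.Source

// Lossy: interpret the stream as text and split on line endings. The
// line-ending bytes themselves are discarded, corrupting binary data.
def readAsLines(in: InputStream): Array[Byte] =
  Source.fromInputStream(in, "ISO-8859-1").getLines().mkString.getBytes("ISO-8859-1")

// Lossless: read the raw bytes directly from the underlying stream.
def readRaw(in: InputStream): Array[Byte] =
  Iterator.continually(in.read()).takeWhile(_ != -1).map(_.toByte).toArray
```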

Written by benbiddington

7 September, 2009 at 08:00