Ben Biddington

Whatever it is, it's not about "coding"

Archive for June 2009

RhinoMocks — repeat times

with one comment

My pair and I discovered some unexpected behaviour with Repeat.Times yesterday. It appears Repeat.Times behaves differently when used with Expect/VerifyAllExpectations than it does with the AAA-style AssertWasCalled.

Our expectation fails to catch the case where the actual number of invocations exceeds the expected number; that is, the following test passes:

[Test]
public void expect_repeat_n_times_does_not_work_when_actual_greater_than_expected() {
    const Int32 ActualTimesToCall = 6;
    const Int32 ExpectedTimesToCall = 4;

    var mock = MockRepository.GenerateMock<IExample>();
    mock.Expect(example => example.ExampleMethod()).Repeat.Times(ExpectedTimesToCall);

    for (var i = 0; i < ActualTimesToCall; i++) {
        mock.ExampleMethod();
    }

    // [?] This one passes
    mock.VerifyAllExpectations();
}

And yet when using AAA, we get the desired outcome (the following test fails):

[Test]
public void aaa_repeat_n_times_does_work_when_actual_greater_than_expected() {
    const Int32 ActualTimesToCall = 6;
    const Int32 ExpectedTimesToCall = 4;

    var mock = MockRepository.GenerateMock<IExample>();

    for (var i = 0; i < ActualTimesToCall; i++) {
        mock.ExampleMethod();
    }

    // This one fails (as expected)
    mock.AssertWasCalled(
        example => example.ExampleMethod(),
        options => options.Repeat.Times(ExpectedTimesToCall)
    );
}

Produces the message:

Rhino.Mocks.Exceptions.ExpectationViolationException: IExample.ExampleMethod();
Expected #4, Actual #6.

As expected.

This happens because AbstractExpectation.CanAcceptCalls stops recording once the range maximum has been reached, so the actual number of calls recorded never exceeds the expected number:

// AbstractExpectation
public bool CanAcceptCalls
{
    get
    {
        //I don't bother to check for RepeatableOption.Never because
        //this is handled the method recorder
        if (repeatableOption == RepeatableOption.Any)
            return true;

        if (Expected.Max == null)
            return true;

        // BJB: The following is always true
        return actualCallsCount <= expected.Max.Value;
    }
}

Solution?

I have been advised by someone on the Google group to use AAA syntax instead. And it is nicer to be sure — if a little less discoverable.

AAA and methods that return values

Expect is overloaded to handle return values — it supports both Action&lt;T&gt; and Func&lt;T, TReturnValue&gt;. AssertWasCalled, on the other hand, does not support Func&lt;T, TReturnValue&gt;, so we have to discard the return value manually inside the lambda:

Stream mockStream = MockRepository.GenerateMock<Stream>();
mockStream.Expect(stream => stream.CanRead).
    Return(true).
    Repeat.Once();
...
mockStream.VerifyAllExpectations();

Becomes something like:

Stream mockStream = MockRepository.GenerateMock<Stream>();
...
mockStream.AssertWasCalled(
    stream => { var temp = stream.CanRead; },
    options => options.Return(true).Repeat.Once()
);

Written by benbiddington

23 June, 2009 at 08:50

Closure

leave a comment »

Closures have been floating around lately, cropping up in Ruby as blocks and procs, as well as in pure functional languages.

[A closure] is a first-class function with free variables. Such a function is said to be “closed over” its free variables. A closure is defined within the scope of its free variables, and the extent of those variables is at least as long as the lifetime of the closure itself. The explicit use of closures is associated with functional programming and with languages such as ML, Lisp and Perl. Closures are used to implement continuation passing style, and in this manner, hide state. Constructs such as objects and monads can thus be implemented with closures.

Free variables

A free variable is a placeholder in an expression. Whether a variable is bound or free depends on where it is declared with respect to the expression.

It is approximately correct to say:

  • A variable is free if you can substitute a value for it and the resulting expression is meaningful.
  • A variable is bound if the expression is a statement about all the possible values of the variable all at once.  A bound variable is bound by an operator such as the integral sign, a quantifier, or a summation sign.

Or:

  • [A free variable is] An occurrence of a variable in a logic formula which is not inside the scope of a quantifier.
  • [A bound variable] In logic, [is] a variable that occurs within the scope of a quantifier, and cannot be replaced by a constant.

Example: consider an integral such as

    ∫₀¹ (x + y) dx

This expression is bound in x, and free in y. It holds for all values of x between the limits, but y can take only one value. The variable y stands for a fixed value, not specified inside the expression, while x is bound by the expression definition.

In mathematical examples, it is often mentioned that bound variables cannot be replaced with a constant, otherwise a meaningless expression would result — try replacing x with 1 in the integral above. Clearly d1 doesn’t make any sense.

Interestingly, as shown in the example, this can be applied to language:

Satomi found her book.

In this expression, the pronoun her is ambiguous. It may refer to Satomi, or any female declared outside the current context, or scope. The variable her is free.


Free variables in computer programming

In computer programming, a free variable is a variable referred to in a function that is not local (not declared within the scope of the function), or an argument of that function. An upvalue is a free variable that has been bound (closed over) with a closure.

So, in terms of a closure, a free variable is any variable in scope that is declared outside the closure itself, and is not supplied as an argument. By contrast, arguments and locals are always bound.
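A minimal Ruby sketch of the distinction (the names are mine, purely illustrative):

```ruby
# 'multiplier' is free inside the lambda: it is declared outside the
# lambda and captured from the enclosing scope, not passed in.
multiplier = 3

# 'n' is bound: it is the lambda's own argument.
times = lambda { |n| n * multiplier }

times.call(5)    # => 15

# The lambda is closed over the variable itself, not a snapshot of it:
multiplier = 4
times.call(5)    # => 20
```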

Closed over free variables

A closure is said to be closed over its free variables. What does that mean? Closed here means completed: a closure expression is completed by specifying values for its free variables.

[TBD: hoisting — what a compiler emits for free variables.]

Closures in ruby

Though blocks are like closures in that they’re closed over their free variables, they’re not closures because they’re not really first-class functions — a block cannot be passed around like an object.

A block can be converted to a proc, though. Capture a block as a proc using ampersand:

class Simple
    attr_reader :saved_block

    def initialize()
        yield self if block_given?
    end

    def save_block_for_later(&proc)
        @saved_block = proc
    end
end

And the proc can be assigned like:

var_x = 'x'
simple = Simple.new
simple.save_block_for_later { puts "The current value for var_x = '#{var_x}'."}

This closure can then be invoked at a later time — still bound to its free variables — using call:

simple.saved_block.call

Which prints the text:

The current value for var_x = 'x'.

And the same as usual to supply arguments:

simple.save_block_for_later do |an_argument|
    puts "The current value for var_x = '#{var_x}', " +
        "and an_argument has been supplied as '#{an_argument}'."
end

simple.saved_block.call 'xxx'

Funargs

The funarg problem — how to manage variable scoping when dealing with first-class functions.

Stack frames and locals

Traditionally, local variable scope is managed using stack frames.

The idea behind a stack frame is that each subroutine can act independently of its location on the stack, and each subroutine can act as if it is the top of the stack.

When a function is called, a new stack frame is created at the current esp location. A stack frame acts like a partition on the stack. All items from previous functions are higher up on the stack, and should not be modified. Each current function has access to the remainder of the stack, from the stack frame until the end of the stack page. The current function always has access to the “top” of the stack, and so functions do not need to take account of the memory usage of other functions or programs.

In short, functions are allocated temporary storage in a stack frame. This frame stores arguments and local variables. The frame is allocated before the function call, and cleaned up at function exit. The problem arises when a function returns another function.

Normally all local variables are removed with the stack frame; however, if the function returns another function that references those locals (i.e., a closure), then these variables have to be kept alive.
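A small Ruby sketch of the situation (illustrative only): the local below would normally die with its frame when the function returns, but the returned lambda references it, so it must be kept alive.

```ruby
def make_counter
  count = 0               # a local of make_counter
  lambda { count += 1 }   # the returned closure references 'count'
end

counter = make_counter    # make_counter has returned...
counter.call              # => 1 ...but 'count' is still alive
counter.call              # => 2
```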

CPU registers

[In computer architecture], a processor register is a small amount of storage available on the CPU whose contents can be accessed more quickly than storage available elsewhere.

  • ESP: stack pointer for top address of the stack
  • EBP: stack base pointer for holding the address of the current stack frame

In terms of functions, this article describes the roles of the EBP and ESP registers. The ESP register marks the top of the stack.


Written by benbiddington

22 June, 2009 at 21:24

Posted in development


Book review — Clean Code

with one comment

An excellent book by Bob Martin, with tips on often overlooked fundamentals.

3 — Functions

Functions should:

  • Be small.
  • Do one thing, with no side effects.
  • Do something or answer something, not both (command query separation). A function should either change the state of an object (but not its arguments), or return information about an object. Doing both is confusing.
  • Operate at one level of abstraction.
  • Have as few arguments as possible.

Arguments

Arguments are what a function requires to do its job: parameters describing how the function should operate. Zero-argument functions (niladic) are ideal, from both understandability and testability perspectives.

Arguments should:

  • Be at the same level of abstraction as the function
  • Describe input, not output. We expect information to go into a function through its arguments, not out (consider mathematical functions — they have no concept of output arguments). Functions should not, therefore, modify their arguments. Passing a list to a function and expecting it to be filled when the function returns is incorrect usage. Plus it violates the “do something or answer something” rule. [TBD: What about functions that accept Streams and write to them? Is this considered modifying an argument?]
  • Not contain flag arguments. Flag arguments imply the method does more than one thing, anyway. Consider splitting the method in two in this case.
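As a sketch of the flag-argument point (hypothetical Ruby, not from the book): the flag below means render does two things, so it splits naturally in two.

```ruby
# Before: the boolean flag selects between two behaviours.
def render(page, as_summary)
  as_summary ? "summary: #{page}" : "full: #{page}"
end

# After: one method per behaviour; each does one thing, and call
# sites no longer pass a mysterious true/false.
def render_summary(page)
  "summary: #{page}"
end

def render_full_page(page)
  "full: #{page}"
end
```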

Monadic functions

Two common reasons for passing single argument:

  1. To ask a question about it (e.g., File.Exists("path")).
  2. To operate on the argument, transform it and return it (e.g., Stream inStream = File.Open("path")).

[TBD: TW anthology describes trying to limit classes to two instance fields, is this similar?]

Argument objects

If a function expects more than two or three arguments, it’s likely that at least some of those should be wrapped in their own class. For example:

Circle createCircle(Int32 x, Int32 y, Int32 radius);

Could be refactored to:

Circle createCircle(Point point, Int32 radius);

This is not cheating, provided the resultant object actually makes sense. In the first version, x and y are ordered components of a single value (or concept). You wouldn’t do the same thing with:

void WriteField(Stream outStream, String name);

Here, Stream and String are not components of the same concept.
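The same refactoring sketched in Ruby (hypothetical names; a plain hash stands in for a Circle):

```ruby
Point = Struct.new(:x, :y)

# Before: x and y arrive as separate, order-dependent arguments.
def create_circle_from_coords(x, y, radius)
  { centre: Point.new(x, y), radius: radius }
end

# After: x and y travel together as the single concept they form.
def create_circle(centre, radius)
  { centre: centre, radius: radius }
end

create_circle(Point.new(1, 2), 3)
```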

Error handling is “one thing”

Consider extracting error handling to its own function — so the one thing it does is handle errors. A function written in this style will start with try and do nothing after its catch/finally. [TBD: Give this one a try]

Arguments or instance variables?

[TBD: How do I tell whether to pass a variable as an argument or add it as an instance member of the object?]

Currying is a way to simplify a function signature, but where should the line be drawn?

Perhaps it's worth focusing on the arguments that clients would like to be able to supply.

Should instance members only be used for real object state? If an object uses a variable to perform its functions, surely that qualifies as eligible for instance membership?

6 — Objects and data structures

This was perhaps my favourite section (even though it has that cretinous modern Star Trek character on its title page).

Hiding implementation is about more than defining getters and setters on instance fields — it’s about abstractions.

Consider these interfaces:

// 1
public interface Vehicle {
    double getFuelTankCapacity();
    double getGallonsInTank();
}
// 2
public interface Vehicle {
    double getPercentFuelRemaining();
}

(2) is considered preferable, because it is defining an abstraction, rather than exposing data. [TBD: I am not sure about this, though. Shouldn’t I be able to query for internal state? Shouldn’t I be able to see how much gas my vehicle has?].

The reason (2) is preferred is outlined in the next section, data/object anti-symmetry.

Data/Object anti-symmetry

Objects and data structures are virtual opposites, as described by these anti-symmetry rules:

  • Objects hide their data behind abstractions and expose functions that operate on those abstractions.
  • Data structures expose their data and have no meaningful functions.

This section goes on to describe the differences between OO and procedural code, using calculating the area of geometric shapes as an example.

The difference in the two alternatives amounts to where you put your behaviour (functions).

If we followed the antisymmetry rules, we’d add a Geometry class that defined an area function. We would have successfully kept our data structures pure, but we’d have to modify the area function whenever we add a new data structure (which violates the open-closed principle).

Procedural code makes it hard to add data structures

The OO approach forces our shapes to implement a polymorphic area function. This is the approach I am most used to; however, it has a downside: if we want to add new functions, we have to change all of our data structures.

OO code makes it hard to add functions

Also, we have polluted our data structure with functions — our shapes no longer satisfy the anti-symmetry rules. Our shapes are now hybrids.

This, too, shows that objects and data structures are opposites.

Interesting. The final point in the section is that the idea that everything is an object is a myth — sometimes the procedural approach is applicable.
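The shapes example can be sketched in Ruby (my own reconstruction, not the book's code):

```ruby
# Procedural: shapes are plain data structures; behaviour lives in
# Geometry. Adding a shape means editing Geometry.area, but adding a
# new function (perimeter, say) touches nothing except Geometry.
Square = Struct.new(:side)
Circle = Struct.new(:radius)

module Geometry
  def self.area(shape)
    case shape
    when Square then shape.side * shape.side
    when Circle then Math::PI * shape.radius * shape.radius
    end
  end
end

# OO: behaviour moves into the shape, which hides its data. Adding a
# shape is now cheap; adding a function means touching every shape.
class PolymorphicSquare
  def initialize(side)
    @side = side
  end

  def area
    @side * @side
  end
end
```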

Bob Martin has written more about this in his post about ActiveRecord. Here he makes the case that an object designed as an active record contains both data and behaviour. By definition, a class like this exposes both its innards, and a persistence abstraction.

The Law of Demeter

So, if objects hide data and expose operations, then an object must not expose its internal structure through accessors [TBD: ?].

A module should not know about the innards of the objects it manipulates.

Note: The term object is important, because the law does not apply to data structures. Data structures are supposed to expose their innards — so we’re free to dig as deep into them as we like.

The Law of Demeter:

A function f of class C should only call the methods of:

  • C
  • An object created by f
  • An object supplied as an argument to f
  • An object held as an instance variable of C

Note: f should not invoke methods on the objects returned from these allowed functions either.

Talk to friends not strangers.
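A hypothetical Ruby sketch: a "train wreck" like customer.address.city digs into the innards of an object the caller was handed, whereas the law says to ask the nearest friend.

```ruby
class Address
  def initialize(city)
    @city = city
  end

  attr_reader :city
end

class Customer
  def initialize(address)
    @address = address
  end

  # Demeter-friendly: expose an operation, not the internal Address.
  # Callers write customer.city, never customer.address.city.
  def city
    @address.city
  end
end
```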

11 — Systems

[TBD: Returned the book already]

Written by benbiddington

22 June, 2009 at 17:49

Posted in development, oop


The Response.OutputStream floater

leave a comment »

Today we had to recall a release due to a message like:

Not enough memory for operation.

The error occurred while trying to emit a large (~800MB) zip file via ASP.NET. Even though I knew buffering was on, I figured that because we were writing directly to HttpResponse.OutputStream we would somehow bypass the buffering enforced by HttpResponse.

Because we were flushing HttpResponse.OutputStream regularly during processing I thought we were okay. I didn’t realise that HttpResponse.OutputStream was bound by the same buffering policy.

My theory is as follows.

Response.OutputStream.Flush does nothing

Here’s what happens when we call HttpResponse.OutputStream.Flush:

[Pseudo Stacktrace]
HttpWriter.Flush
HttpResponseStream._writer.Flush
HttpResponseStream.Flush
HttpWriter._stream.Flush
HttpWriter.OutputStream.Flush
HttpResponse._httpWriter.OutputStream.Flush
HttpResponse.OutputStream.Flush

And HttpWriter.Flush looks like this:

// HttpWriter
public override void Flush() { }

That’s right: it’s empty. Flushing is completely ignored. I guess this makes sense: HttpResponse maintains full control of its underlying stream.

Flushing depends entirely on HttpResponse.BufferOutput

Here’s what happens when we call HttpResponse.OutputStream.Write:

[Pseudo Stacktrace]
HttpWriter.WriteFromStream
HttpResponseStream._writer.WriteFromStream
HttpResponseStream.Write
HttpWriter._stream.Write
HttpWriter.OutputStream.Write
HttpResponse._httpWriter.OutputStream.Write
HttpResponse.OutputStream.Write

And HttpWriter.WriteFromStream looks like:

// HttpWriter
internal void WriteFromStream(byte[] data, int offset, int size)
{
    if (this._charBufferLength != this._charBufferFree)
    {
        this.FlushCharBuffer(true);
    }

    this.BufferData(data, offset, size, true);

    if (!this._responseBufferingOn)
    {
        this._response.Flush();
    }
}

Therefore, flushing depends on HttpWriter._responseBufferingOn:

[Pseudo stacktrace]
HttpResponse.BufferOutput
HttpWriter._response.BufferOutput
HttpWriter._responseBufferingOn

Provided we have HttpResponse.Buffer set to false, we should get flushing every time we write to Response.OutputStream.

That was the theory, anyway. In practice we still experienced high memory consumption, which seems very weird to me. With buffering off, we should be flushing the response at every write, as shown by the last line of HttpWriter.WriteFromStream above. [TBD: Investigate this properly].

Solution

In the end, though, we decided to call HttpResponse.Flush explicitly whenever Response.OutputStream.Flush is invoked. But how? Passing an HttpResponse into our writer API is just pollution.

Our zip file API is already doing everything correctly — it just happens that one Stream implementation is not working as expected.

We needed to modify what happens when we call Stream.Flush. To do this without breaking our abstraction, we created a Stream decorator type.

This type takes a Stream, and an HttpResponse to decorate Stream operations. Mostly this decorator delegates all of its Stream operations, but some are decorated — like Flush. It is here we are invoking HttpResponse.Flush.

The nice thing about this design is that no changes at all were required within the zip API. It is written in terms of an abstraction — Stream. Our decorator is completely transparent, it is just another Stream.
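The idea translates to any language; here is a hypothetical Ruby sketch of such a decorator, with a callback standing in for HttpResponse.Flush:

```ruby
require 'stringio'

# Wraps any stream-like object, delegates its operations, and
# decorates #flush with an extra action.
class FlushingStream
  def initialize(inner, on_flush)
    @inner = inner
    @on_flush = on_flush
  end

  def write(data)
    @inner.write(data)
  end

  def flush
    @inner.flush
    @on_flush.call   # e.g. flush the HTTP response in the ASP.NET case
  end
end

flushes = 0
stream = FlushingStream.new(StringIO.new, lambda { flushes += 1 })
stream.write("zip bytes")
stream.flush     # inner stream flushed, callback fired once
```

To the zip-writing code, the decorator is just another stream — it needs no knowledge of what happens on flush.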

Another option may be to implement an ObservableStream using decoration. This allows a client (our view) to observe a Stream object and respond to changes in state. In this case, we can raise a Flush event, which our view can respond to by flushing the HttpResponse for the request. This way feels cleaner somehow — though we sacrifice some testability.

Related patterns

Are we decorating, or proxying? Well, according to the GoF, we are decorating because we’re adding responsibilities rather than controlling access to an object.

We are decorating a Stream by adding the responsibility of flushing an HTTP response. Or, in the second case, adding the responsibility of notifying observers of state changes on a Stream.

Incidentally

Download dialog is misleading

You may think your download doesn’t start until you confirm — actually it’s already started by this point. You can see this for yourself if you inspect your HTTP connections. Interestingly, Chrome doesn’t seem to offer a dialog at all.

This is because this is just an HTTP response like any other. There is no special callback in place. Once the response has started, your browser is reading it. Cancelling will disconnect, though, hence it’s important to check the client is still connected while emitting large files.

[TBD: With that in mind then, what happens to the data if we continue writing but the client has disconnected? Do the bytes just fall out the end? Or does an error result?]

Response.Buffer == Response.BufferOutput

You can optionally use HttpResponse.Buffer, or HttpResponse.BufferOutput: they do the same thing:

// HttpResponse
public bool Buffer
{
    get
    {
        return this.BufferOutput;
    }
    set
    {
        this.BufferOutput = value;
    }
}

[UPDATE, 2009-10-30] We have encountered an issue which has required us to set buffering on. It seems that we get poorer throughput with buffering off for some reason. There has got to be some reason for this — something in HttpResponse.Flush(Boolean finalFlush) is optimizing.

References

  • Examine .NET assemblies with Reflector
  • KB 812406 — Response.WriteFile cannot download a large file
  • KB 823409 — The hotfix for KB 812406

Written by benbiddington

19 June, 2009 at 08:51

IDisposable and unmanaged memory

leave a comment »

My pair and I had to implement IDisposable the other day, and I had almost forgotten how and why it is done the way it is, so I thought I’d make some notes. An exceptionally clear summary can be found in section 9.3 of Framework Design Guidelines: Conventions, Idioms, and Patterns for Reusable .NET Libraries, which I have used as the basis.

Objects that:

  1. Contain references to unmanaged resources, i.e., resources that don’t have finalizers of their own. These types should also define a finalizer.
  2. — or — contain references to disposable objects.

should always implement IDisposable. Disposable objects offer clients a way to free resources deterministically, rather than whenever the CLR deems it necessary.

Here is a class that contains a simple implementation. It includes a finalizer because it contains a reference to an unmanaged object that doesn’t have its own.

public class UnmanagedResourceHolder : IDisposable {
    IntPtr buffer; // An unmanaged resource
    SafeHandle managedResource;

    public UnmanagedResourceHolder () {
        this.buffer = ... // init buffer
        this.managedResource = ...
    }

    public void Dispose() {
        Dispose(true);

        // Only suppress if Dispose(true) has completed successfully
        // to ensure finalizer gets a chance
        GC.SuppressFinalize(this);
    }

    ~UnmanagedResourceHolder() {
        Dispose(false);
    }

    protected virtual void Dispose(Boolean disposing) {
        // Can't find reference for the following, assume it's self-explanatory...
        ReleaseBuffer(buffer);

        if (disposing) {
            // Run deterministic cleanup
            if (managedResource != null) {
                managedResource.Dispose();
            }
        }
    }
}

Points to note:

  • Unmanaged resources released on both paths. This ensures deterministic cleanup is available as well as finalizer cleanup.
  • Managed resources are not released during finalizer. This is because managedResource is managed — it will handle its own finalization, plus the next reason.
  • During finalization, (normally valid) assumptions about the internal state of an object are no longer reliable. Finalization occurs in an unpredictable order — for example, the managedResource field may have already been finalized.
  • Provided Dispose() is called, finalization is skipped (though there is still overhead, see below).
  • It is a good idea to provide a protected virtual Dispose to allow derived types to perform their own cleanup.
  • Always invoke super type’s Dispose (if there is one) — for obvious reasons — when overriding in derived type.

A connection pool example

Why is it important to close database connections? Here’s what happens when connection is not explicitly closed:

[Trace]
Audit Login		-- network protocol: TCP/IP
SQL:BatchStarting	SELECT count(1) from User
SQL:BatchCompleted	SELECT count(1) from User
Audit Logout

Here’s what happens when a connection is closed (or finalized):

[Trace]
Audit Login		-- network protocol: TCP/IP...
SQL:BatchStarting	SELECT count(1) from User
SQL:BatchCompleted	SELECT count(1) from User
Audit Logout
RPC:Completed		exec sp_reset_connection

Identical, except that sp_reset_connection is invoked at the end.

In both cases, the connection remains sleeping (process is waiting for a lock or user input):

login_time last_batch hostname cmd status
2009-06-15 09:17:29.590 BENB AWAITING COMMAND sleeping

This behaviour is part of ADO.NET connection pooling. Connections remain ready like this until they are considered surplus (and removed from the pool), or the application exits. You can prove this easily enough yourself, quit your test fixture and then requery your connection state.

It is, therefore, important to close connections from an ADO.NET pooling standpoint, in order to make the pooled connection available again.

If Open is invoked on a database connection, and there are no free connections available, an InvalidOperationException results with an error message like:

Timeout expired.  The timeout period elapsed prior to obtaining
a connection from the pool. This may have occurred because all pooled
connections were in use and max pool size was reached.

Querying connection states

Examine connections in SQL Server using master.dbo.sysprocesses:

select login_time, last_batch, hostname, cmd, status
from master.dbo.sysprocesses with(nolock)
where dbid = DB_ID('PersonalWind')

Finalizers

Finalizers are only for unmanaged resources. A finalizer provides a mechanism for releasing unmanaged resources when clients omit explicit disposal. Finalization occurs before the garbage collector reclaims managed memory, and is the last chance for objects to release unmanaged resources.

[MSDN, Object Lifetime: How Objects Are Created and Destroyed] The garbage collector in the CLR does not (and cannot) dispose of unmanaged objects, objects that the operating system executes directly, outside the CLR environment. This is because different unmanaged objects must be disposed of in different ways. That information is not directly associated with the unmanaged object; it must be found in the documentation for the object. A class that uses unmanaged objects must dispose of them in its Finalize method.

Though useful in certain circumstances, finalizers are notoriously difficult to implement, and incur real overhead:

  • [MSDN] When allocated, finalizable objects are added to a finalization list. When these instances are no longer reachable and the GC runs, they’re moved to the “FReachable” queue, which is processed by the finalizer thread. Suppressing finalization with GC.SuppressFinalize sets a “do not run my finalizer” flag in the object’s header, such that the object will not get moved to the FReachable queue by the GC. As a result, while minimal, there is still overhead to giving an object a finalizer even if the finalizer does nothing or is suppressed.
  • When the CLR needs to call a finalizer, it postpones reclamation of managed memory until the next round. This means finalizable objects are longer-lived — they use memory for longer.

Non-determinism

There is no way to predict when a finalizer will be called, because the CLR decides dynamically at runtime when to reclaim memory. Garbage collection is an expensive exercise, and is minimized by design, so memory can persist long after the variables that reference it have dropped out of scope. This may be unacceptable for some systems. Database connection pooling is a prime example: failure to release connections by closing them when they’re no longer required quickly cripples a system.


Written by benbiddington

15 June, 2009 at 21:01