Ben Biddington

Whatever it is, it's not about "coding"

Posts Tagged ‘java

Adobe Content Server — packaging large files is painful

with 5 comments

Ordinarily, working with Adobe Content Server (ACS) is more or less tolerable, but recently we have encountered what may be another indication of the quality of this product.

Packaging

Packaging an ebook amounts to posting a signed request to the server, describing the file you’d like to ingest.

You can use UploadTestJar (which may have its own issues), or (if you’re lucky) you can rewrite some sample codes in your chosen language and use that.

You’d think then, that once you’ve got it working you can summarily be on your way, forget about ACS and finish your application.

And you can.

Until the OutOfMemoryExceptions start

While test-driving our application, we naturally wanted to describe what happens with different sizes of files, so we tried some large ones. These would fail with errors about being out of heap space, errors like:

21-Apr-2010 13:37:26 org.apache.catalina.core.StandardWrapperValve invoke
at com.adobe.adept.servlet.AdeptServlet.doPost(AdeptServlet.java:180)
SEVERE: Servlet.service() for servlet Package threw exception
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Unknown Source)
at java.lang.AbstractStringBuilder.expandCapacity(Unknown Source)
at java.lang.AbstractStringBuilder.append(Unknown Source)
at java.lang.StringBuffer.append(Unknown Source)
at com.adobe.adept.xml.XMLAbstractDigestSink.characters(XMLAbstractDigestSink.java:133)
at com.adobe.adept.xml.XMLSink.characters(XMLSink.java:261)
at com.adobe.adept.xml.XMLFieldReader.characters(XMLFieldReader.java:447)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.characters(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
at com.adobe.adept.servlet.AdeptServlet.doPost(AdeptServlet.java:180)

We tried adjusting the available memory for “sags” (as Dan Rough would put it), and this did work to a certain degree, but is not a satisfactory solution.

To me it looks a bit like an attempt to load entire file into memory at once. Surely this can’t be right, can it?

Examining AdeptServlet

In an effort to understand the nature of the problem before solving it, we decided to have a look at that servlet. We decompiled it and set about finding that class, and it is:

com.adobe.adept.packaging.servlet.Package

In the doPost method, there are these lines:

if (paramParsedRequest.data != null) {
    localObject1 = new PDFPackager(paramParsedRequest.data);
} else {
    localObject1 = new PDFPackager(new File(paramParsedRequest.dataPath));
}

This shows the two methods of loading a PDFPackager.

Examining PDfPackager, we can see the ctor has four overloads including these two:

public PDFPackager(byte[] paramArrayOfByte) throws Exception {
    this(new ByteBufferByteReader(paramArrayOfByte));
}

public PDFPackager(File paramFile) throws Exception {
    this(new FileInputStream(paramFile));
}

So, it appears the problem may result from usage of the first version.

That  paramParsedRequest argument to doPost is of type ParsedRequest, and its data property is a Byte array.

This could be a problem: when submitting a package request with a data node instead of a dataPath node, we’re using the byte array overload.

Where is the error actually coming from?

From the stacktrace it looks as though it is coming from whatever is creating the arguments to supply to Package.doPost.

This is the responsibility of Package‘s supertype: AdeptServlet<RequestParser>. It is this class that is responsible for parsing the http request into one of those ParsedRequest objects, and then supplying that to Package.doPost.

The problems starts here at the top level request handling:

// AdeptServlet<RequestParser>
doPost(HttpServletRequest paramHttpServletRequest, HttpServletResponse paramHttpServletResponse)

This is where the request parsing happens, and then — as the stack trace shows — an error ends up resulting from XMLAbstractDigestSink.characters.

XMLAbstractDigestSink.characters attempts to append data to an internal StringBuffer.

Summary

This mechanism has not been designed in any kind of scalable manner — buffering files in memory is utterly nuts.

Why not just write the posted data to a temp file and use the other PDFPackager ctor?

The solution

Well, one suggestion is to not post the files at all, but make a slightly different packaging request that supplies a path to a file on disk rather than the file itself.

To do so requires — as described in ContentServer_Technical_Reference.pdf — supplying a dataPath node in your request instead of a data node.

The downside for us is that now we need to manage this shared file location — a non-trivial task when working with Windows services.

Another (unlikely) solution

Modify the application, i.e., AdeptServlet<RequestParser> so it first copies the posted file to disk, and then proceeds as though it received a dataPath request.

Pretty hard without the source — it’s probably actually against the law, is it?

References

Written by benbiddington

27 April, 2010 at 14:00

Scala — Futures

leave a comment »

A future is a placeholder for the return value of an asynchronous operation, it’s left to clients to decide when to block and wait for reply value.

It is an alternative to blocking on receive.

For example, the double-bang on Actor causes operation to return a future:

   /**
   * Sends msg to this actor and immediately
   * returns a future representing the reply value.
   */
  def !!(msg: Any): Future[Any] = {
    val ftch = new Channel[Any](Actor.self)
    send(msg, ftch)
    new Future[Any](ftch) {
      def apply() =
        if (isSet) value.get
        else ch.receive {
          case any => value = Some(any); any
        }
      def isSet = value match {
        case None => ch.receiveWithin(0) {
          case TIMEOUT => false
          case any => value = Some(any); true
        }
        case Some(_) => true
      }
    }
  }

Which can then be used to obtain the reply.

The Future class takes an InputChannel as its ctor argument. This channel is monitored to determine the future’s completion status.

In this instance, the future by !! is configured with the reply channel as supplied to send. In short, the actor has sent itself a message and specified that the future’s channel should be notified when complete. The future then just monitors that channel for the reply.

Note: Actor.send invokes the act method using a Reaction, which spawns threads and runs actors.

Only the actor creating an instance of a Channel may receive from it. This means that the future here must be running on the same thread as the actor that created it.

The send call is instructing the reply to be returned to the channel being monitored by the future.

Why does future block until actor returns value?

This is because it blocks on the channel, waiting for reply:

...
def apply() =
    if (isSet) value.get
    else ch.receive {
        case any => value = Some(any); any
    }
...

and Channel.receive is a ultimately a blocking operation, since it invokes receive on the actor it belongs to:

...
def receive[R](f: PartialFunction[Msg, R]): R = {
    val C = this.asInstanceOf[Channel[Any]]
    recv.receive {
        case C ! msg if (f.isDefinedAt(msg.asInstanceOf[Msg])) => f(msg.asInstanceOf[Msg])
    }
}
...

Note that recv here is the Actor supplied in Channel ctor.

Consider this example:

val aFuture = future[String] {
    currentThreadId
};

Internally, a new actor is created and has its double bang invoked:

def future[T](body: => T): Future[T] = {
    case object Eval
    val a = Actor.actor {
        Actor.react {
            case Eval => Actor.reply(body)
        }
    }
    a !! (Eval, { case any => any.asInstanceOf[T] })
}

And by examining the apply method above, we know this blocks until a message is received from channel.

Written by benbiddington

24 April, 2010 at 13:37

Posted in development

Tagged with , , ,

Serialization rules for Adobe Content Server

with 31 comments

Working with Adobe Content Server can be a truly depressing experience. The recommendation is to use a jar file — UploadTestJar — written by Adobe to perform HTTP RPC operations against the Content Server.

Problem is that UploadTestJar only does uploads, but we need full control, like deletes for example. Porting the java is possible, but it’s some of the most poorly written crap I have ever seen, and finding a specification is resisting web search.

Finally we managed to get a description from the support staff which’ll be helpful if you’re intending to port that awful UploadTestJar mess.

  1. All adjacent text nodes are collapsed and their leading and trailing whitespace is removed.
  2. Zero-length text nodes are removed.
  3. Signature elements in Adept namespace are removed.
  4. Attributes are sorted first by their namespaces and then by their names; sorting is done byte wise on UTF-8 representations.
    1. If attributes have no namespace insert a 0 length string (i.e. 2 bytes of 0) for the namespace
  5. Strings are serialized by writing two-byte length (in big endian order) of the UTF-8 representation and then UTF-8 representation itself
  6. Long strings (longer than 0x7FFF) are broken into chunks: first as many strings of the maximum length 0x7FFF as needed, then the remaining string. This is done on the byte level, irrespective of the UTF-8 boundary.
  7. Text nodes (text and CDATA) are serialized by writing TEXT_NODE byte and then text node value.
  8. Attributes are serialized by writing ATTRIBUTE byte, then attribute namespace (empty string if no namespace), attribute name, and attribute value.
  9. Elements are serialized by writing BEGIN_ELEMENT byte, then element namespace, element name, all attributes END_ATTRIBUTES byte, all children, END_ELEMENT byte.

This list is in actually the javadocs for the XmlUtil class. Why it’s all lumped in there is anybody’s guess. The serialization as described above is mostly implemented by one very long method in (1000+ line) XmlUtil.java: Eater.eatNode.

Note: The values of the constants BEGIN_ELEMENT etc are listed in the XMLUtil class.

Why I consider UploadTestJar poorly written

Here are some things I’ve noticed:

  • Nothing reads like a narrative, i.e. , methods call other methods that occur before it in the file — makes files very hard to follow.
  • Too many comments. I know this is a java idiom, but it make reading the stuff that matter more difficult
  • Idiotic comments: inline comments that state the obvious and are just noise. e.g.:// retrieve HMAC key and run a raw SHA1 HASH on it.
    byte[] hmacKeyBytesSHA1 = XMLUtil.SHA1(getHmacKey());
  • XMLUtil.java contains several classes
  • XMLUtil class does more than one thing:
    • Parses XML
    • Normalizes XML
    • Creates XML documents
    • Serializes XML, dates, bytes and strings
    • Checks signatures
    • Signs XML documents
    • Hashes things
  • Class UploadTest does everything in ctor: reads a file from disk, validates it, makes some xml, signs it and then posts it to the server.
  • UploadTest the main entry point for executable, and it contains all the behaviour — it’s 1600 lines long
  • Cannot use UploadTest without a real epub file
  • UploadTest does too many things:
    • Ctor does too many things
      • Handles command line input
      • Displays help/usage
      • Asserts a file on disk has been supplied
      • “Makes” content
        • makeContent requires a file an epub on disk
        • makeContent loads xml
        • makeContent assembles xml files
        • makeContent hashes things
        • makeContent swallows errors and writes to stdout
    • “Sends” content via HTTP
    • Methods that do too many things, e.g., if/else branches based on the verboseDisplay flag

Written by benbiddington

16 February, 2010 at 10:39

Creating a wave robot

leave a comment »

Download Eclipse Java EE IDE, and follow the instructions for getting all of the AppEngine plugins. It’s important to get these because it simplifies the deployment process to single click.

  1. Create a new web application project.
  2. Download all of the Wave Robot Java Client Library jars from here.
    1. Copy them to your /war/WEB-INF/lib directory.
    2. Reference them (Alt + Enter > Java Build Path > Libraries > Add JARS…).
      If successful, they’ll appear in a Referenced Libraries node in Package Explorer.

Edit web.xml

Add an endpoint for Wave to post to:

<servlet>
    <servlet-name>SearchServlet</servlet-name>
    <servlet-class>org.coriander.wave.sevendigital.servlets.search.SearchServlet</servlet-class>
</servlet>
<servlet-mapping>
    <servlet-name>SearchServlet</servlet-name>
    <url-pattern>/_wave/robot/jsonrpc</url-pattern>
</servlet-mapping>

We now have a handler for incoming wave requests. All wave requests will be routed to SearchServlet.

Add capabilities.xml

Registers servlets for wave events, add one to your /war/_wave directory:

<?xml version="1.0" encoding="utf-8"?>
<w:robot xmlns:w="http://wave.google.com/extensions/robots/1.0">
  <w:capabilities>
    <w:capability name="WAVELET_PARTICIPANTS_CHANGED" content="true" />
    <w:capability name="BLIP_SUBMITTED" content="true" />
  </w:capabilities>
  <w:version>1</w:version>
</w:robot>

Implementing a robot servlet

Very easy, derive from com.google.wave.api.AbstractRobotServlet, which has a single method:

import com.google.wave.api.AbstractRobotServlet;
import com.google.wave.api.RobotMessageBundle;

public class SearchServlet extends AbstractRobotServlet {
    @Override
    public void processEvents(RobotMessageBundle bundle) { }
}

There are numerous tutorials around describing what to do next.

Setting robot profile image

This amounts to adding the root /_wave/robot/profile endpoint. There easiest way to do this is to derive a servlet from com.google.wave.api.ProfileServlet, and override any of the methods you choose.

To set the avatar image, override getRobotAvatarUrl:

import com.google.wave.api.ProfileServlet;
...
public class Profile extends ProfileServlet {
    @Override
    public String getRobotAvatarUrl() {
        return "http://coriander-7digital.appspot.com/_wave/coriander-7digital.png;
    }
}

And then place your image your /war/_wave directory. There are other useful settings like display name which can also be provided by override. The default content type of ProfileServlet-derive types is application/json.

Don’t forget to register the new servlet by updating web.xml.

Deployment

Very simple, it’s all done by the IDE.

References

Written by benbiddington

18 October, 2009 at 10:53

Scala introduction — writing an OAuth library

leave a comment »

I started out intending to write some scala examples against the twitter API, however I soon discovered I needed OAuth first. Given that I use OAuth all the time at work I figured I could probably do with learning about it first-hand, while learning scala.

org.junit.rules._

I chose to test drive it with JUnit 4.7 and NetBeans.

NetBeans works almost immediately with scala, and has support for project templates etc — even scala JUnit fixtures.

UPDATE (2010-04-27) I have since discovered IntelliJ to be much better, and there is now a free community edition. IntelliJ supports scala without any fiddling around.

JUnit mostly works, though rules don’t and neither do some matchers. Even though rules don’t work, I have included it anyway because I have the t-shirt.

You can find the project on github.

Important abstractions

  1. SignatureBaseString.
    1. Characterized by three ampersand-separated segments: verb, uri, parameters.
    2. URL Encoding must conform to RFC 3986, and the following characters should are consider unreserved so should not be encoded:
      ALPHA, DIGIT, ‘-‘, ‘.’, ‘_’, ‘~’
  2. Signature.
    1. Signature is a keyed-Hash Message Authentication Code (HMAC).
    2. Consumer secret required part of HMAC secret key.
    3. Token secret is optionally included in HMAC secret key:
      (consumer_secret, token_secret) => uri_encoded_consumer_secret&[uri_encoded_token_secret]
  3. OAuthCredential. Represents the secret key(s) used to create the HMAC signature. OAuth requires a consumer credential, and optionally a token credential, representing the end user.

Now that these core concepts are complete, I am working on high-level policy, like classes for generating signed URLs and authorization headers.

Notes

JUnit — expecting exceptions in scala

Assuming JUnit 4.x, a test can expect an exception using the test annotation:

Java:

@Test(expected=IllegalArgumentException.class)
    public void ExampleThrowsException(){
        throw new IllegalArgumentException();
    }

This needs to be modified for scala:

Scala:

@Test { val expected=classOf[IllegalArgumentException] }
    def ExampleThrowsException {
        throw new IllegalArgumentException
    }

The reason for it is outlined here in the Java annotations section on named parameters.

Here is the documentation for scala annotations. Seealso: the documentation for scala 2.7.3 (includes dbc).

Closures and return

The return statement immediately returns from the current method, even if you’re within a closure. Omit return in this case — return is optional anyway.

When to use semicolon line terminator

Never — apart from:

  • When a method returns Unit (equivalent to void) and you aren’t using return keyword. [TBD: Add example].

How to use blocks

var count = 1
times(2) { println("Printed " + count + " times")}
protected def times(count : Int)(block : => Unit) = {
    1.to(count).foreach((_) => block)
}

Seealso: some executable examples on github

References

ALPHA, DIGIT, '-', '.', '_', '~'

Written by benbiddington

18 September, 2009 at 13:37

Posted in development

Tagged with , , , , , , , ,