java | Ben Biddington

Posts Tagged ‘java’

Adobe Content Server — packaging large files is painful

Ordinarily, working with Adobe Content Server (ACS) is more or less tolerable, but recently we have encountered what may be another indication of the quality of this product.

Packaging

Packaging an ebook amounts to posting a signed request to the server, describing the file you’d like to ingest.

You can use UploadTestJar (which may have its own issues), or (if you’re lucky) you can rewrite some of the sample code in your chosen language and use that.

You’d think then, that once you’ve got it working you can summarily be on your way, forget about ACS and finish your application.

And you can.

Until the OutOfMemoryErrors start

While test-driving our application, we naturally wanted to describe what happens with different sizes of files, so we tried some large ones. These would fail with errors about being out of heap space, errors like:

21-Apr-2010 13:37:26 org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet Package threw exception
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Unknown Source)
at java.lang.AbstractStringBuilder.expandCapacity(Unknown Source)
at java.lang.AbstractStringBuilder.append(Unknown Source)
at java.lang.StringBuffer.append(Unknown Source)
at com.adobe.adept.xml.XMLAbstractDigestSink.characters(XMLAbstractDigestSink.java:133)
at com.adobe.adept.xml.XMLSink.characters(XMLSink.java:261)
at com.adobe.adept.xml.XMLFieldReader.characters(XMLFieldReader.java:447)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.characters(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
at com.adobe.adept.servlet.AdeptServlet.doPost(AdeptServlet.java:180)

We tried adjusting the available memory for “sags” (as Dan Rough would put it), and this did work to a certain degree, but is not a satisfactory solution.

To me it looks a bit like an attempt to load the entire file into memory at once. Surely this can’t be right, can it?

Examining AdeptServlet

In an effort to understand the nature of the problem before solving it, we decided to have a look at that servlet. We decompiled the application and set about finding the class that handles packaging requests, which is:

com.adobe.adept.packaging.servlet.Package

In the doPost method, there are these lines:

if (paramParsedRequest.data != null) {
    localObject1 = new PDFPackager(paramParsedRequest.data);
} else {
    localObject1 = new PDFPackager(new File(paramParsedRequest.dataPath));
}

This shows the two methods of loading a PDFPackager.

Examining PDFPackager, we can see the ctor has four overloads including these two:

public PDFPackager(byte[] paramArrayOfByte) throws Exception {
    this(new ByteBufferByteReader(paramArrayOfByte));
}

public PDFPackager(File paramFile) throws Exception {
    this(new FileInputStream(paramFile));
}

So, it appears the problem may result from usage of the first version.

That paramParsedRequest argument to doPost is of type ParsedRequest, and its data property is a byte array.

This could be a problem: when submitting a package request with a data node instead of a dataPath node, we’re using the byte array overload.

Where is the error actually coming from?

From the stacktrace it looks as though it is coming from whatever is creating the arguments to supply to Package.doPost.

This is the responsibility of Package‘s supertype: AdeptServlet<RequestParser>. It is this class that is responsible for parsing the http request into one of those ParsedRequest objects, and then supplying that to Package.doPost.

The problem starts here at the top level request handling:

// AdeptServlet<RequestParser>
doPost(HttpServletRequest paramHttpServletRequest, HttpServletResponse paramHttpServletResponse)

This is where the request parsing happens, and then — as the stack trace shows — an error ends up resulting from XMLAbstractDigestSink.characters.

XMLAbstractDigestSink.characters attempts to append data to an internal StringBuffer.
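
To illustrate the pattern the stack trace points at, here is a minimal sketch of a SAX handler that appends every characters callback to a StringBuffer. It is not Adobe's code: BufferingSinkSketch, BufferingHandler and bufferedLength are names made up for this example, and the package/data document is only a stand-in for a real request. The point is that the buffer ends up holding the whole text node, so heap use grows with the size of the posted file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

class BufferingSinkSketch {

    // Hypothetical stand-in for a sink that keeps all character data in memory.
    static class BufferingHandler extends DefaultHandler {
        final StringBuffer text = new StringBuffer();

        @Override
        public void characters(char[] ch, int start, int length) {
            // Every callback grows the buffer; nothing is ever flushed to disk.
            text.append(ch, start, length);
        }
    }

    // Parses the document and reports how many characters ended up buffered.
    static int bufferedLength(String xml) throws Exception {
        BufferingHandler handler = new BufferingHandler();
        SAXParserFactory.newInstance().newSAXParser().parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.text.length();
    }

    public static void main(String[] args) throws Exception {
        // Build a document whose single text node is 400,000 characters long.
        StringBuilder payload = new StringBuilder();
        for (int i = 0; i < 100_000; i++) {
            payload.append("QUJD");
        }
        String xml = "<package><data>" + payload + "</data></package>";
        System.out.println(bufferedLength(xml)); // prints 400000
    }
}
```

Scale the payload up to the size of a large ebook and the append is where the heap runs out, which matches the top frames of the trace.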

Summary

This mechanism has not been designed in any kind of scalable manner — buffering files in memory is utterly nuts.

Why not just write the posted data to a temp file and use the other PDFPackager ctor?
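
As a rough sketch of that idea, the following copies an incoming stream to a temp file without holding it all in memory. SpoolToDisk and spool are hypothetical names, and this is plain JDK code rather than anything from ACS; in the real servlet the payload arrives inside the data node of the XML request, so the parser itself would have to stream (and presumably decode) it, which this example does not attempt.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class SpoolToDisk {

    // Copies the stream to a fresh temp file and returns its path.
    // Files.copy reads and writes in chunks, so memory use stays flat.
    static Path spool(InputStream posted) throws IOException {
        Path tmp = Files.createTempFile("package-", ".tmp");
        // createTempFile already created the file, hence REPLACE_EXISTING.
        Files.copy(posted, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a request body; a real one would come from the servlet.
        byte[] fake = new byte[5 * 1024 * 1024];
        Path onDisk = spool(new ByteArrayInputStream(fake));
        System.out.println(Files.size(onDisk)); // prints 5242880
        Files.delete(onDisk);
    }
}
```

The resulting path could then be handed to the File overload of the PDFPackager ctor via onDisk.toFile().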

The solution

Well, one suggestion is to not post the files at all, but make a slightly different packaging request that supplies a path to a file on disk rather than the file itself.

To do so requires — as described in ContentServer_Technical_Reference.pdf — supplying a dataPath node in your request instead of a data node.

The downside for us is that now we need to manage this shared file location — a non-trivial task when working with Windows services.

Another (unlikely) solution

Modify the application, i.e., AdeptServlet<RequestParser> so it first copies the posted file to disk, and then proceeds as though it received a dataPath request.

Pretty hard without the source, and it’s probably against the law, isn’t it?

References

Adobe forum discussion — OutOfMemoryError while Packaging big e-books
ContentServer_Technical_Reference.pdf (this comes as part of the installation, can’t find it on the internet)

Written by benbiddington

27 April, 2010 at 14:00

Posted in development

Tagged with adobe, content_server, java, outofmemoryexception

Scala — Futures

Why does future block until actor returns value?

This is because it blocks on the channel, waiting for reply:

...
def apply() =
    if (isSet) value.get
    else ch.receive {
        case any => value = Some(any); any
    }
...

and Channel.receive is ultimately a blocking operation, since it invokes receive on the actor it belongs to:

...
def receive[R](f: PartialFunction[Msg, R]): R = {
    val C = this.asInstanceOf[Channel[Any]]
    recv.receive {
        case C ! msg if (f.isDefinedAt(msg.asInstanceOf[Msg])) => f(msg.asInstanceOf[Msg])
    }
}
...

Note that recv here is the Actor supplied in the Channel ctor.

Consider this example:

val aFuture = future[String] {
    currentThreadId
};

Internally, a new actor is created and has its double bang (!!) invoked:

def future[T](body: => T): Future[T] = {
    case object Eval
    val a = Actor.actor {
        Actor.react {
            case Eval => Actor.reply(body)
        }
    }
    a !! (Eval, { case any => any.asInstanceOf[T] })
}

And by examining the apply method above, we know this blocks until a message is received from the channel.
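
For readers more at home in Java, the same shape can be sketched with a BlockingQueue standing in for the channel. This is only an analogy, not how the Scala library is implemented: BlockingFutureSketch and its members are invented for this example, and a plain thread plays the part of the actor. One difference to note is that LinkedBlockingQueue rejects null, so the body must not return null here.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Supplier;

class BlockingFutureSketch<T> {
    // Plays the role of the reply channel.
    private final BlockingQueue<T> channel = new LinkedBlockingQueue<>();
    private T value;
    private boolean isSet;

    // Starts a worker that evaluates the body and replies on the channel.
    static <T> BlockingFutureSketch<T> future(Supplier<T> body) {
        BlockingFutureSketch<T> f = new BlockingFutureSketch<>();
        Thread worker = new Thread(() -> f.channel.add(body.get()));
        worker.setDaemon(true);
        worker.start();
        return f;
    }

    // Mirrors apply(): return the cached value, else block on the channel.
    synchronized T apply() throws InterruptedException {
        if (!isSet) {
            value = channel.take(); // blocks until the worker has replied
            isSet = true;
        }
        return value;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingFutureSketch<String> f = future(() -> {
            try {
                Thread.sleep(200); // simulate slow work on the worker thread
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "computed on " + Thread.currentThread().getName();
        });
        // The caller waits here until the worker has put its reply.
        System.out.println(f.apply());
    }
}
```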

Written by benbiddington

24 April, 2010 at 13:37

Posted in development

Tagged with actor, async, java, scala

Serialization rules for Adobe Content Server


Working with Adobe Content Server can be a truly depressing experience. The recommendation is to use a jar file — UploadTestJar — written by Adobe to perform HTTP RPC operations against the Content Server.

Problem is that UploadTestJar only does uploads, but we need full control, like deletes for example. Porting the Java is possible, but it’s some of the most poorly written crap I have ever seen, and a specification is proving resistant to web search.

Finally we managed to get a description from the support staff which’ll be helpful if you’re intending to port that awful UploadTestJar mess.

All adjacent text nodes are collapsed and their leading and trailing whitespace is removed.
Zero-length text nodes are removed.
Signature elements in Adept namespace are removed.
Attributes are sorted first by their namespaces and then by their names; sorting is done bytewise on UTF-8 representations.
- If an attribute has no namespace, insert a zero-length string (i.e. 2 bytes of 0) for the namespace.
Strings are serialized by writing the two-byte length (in big-endian order) of the UTF-8 representation and then the UTF-8 representation itself.
Long strings (longer than 0x7FFF) are broken into chunks: first as many strings of the maximum length 0x7FFF as needed, then the remaining string. This is done on the byte level, irrespective of the UTF-8 boundary.
Text nodes (text and CDATA) are serialized by writing TEXT_NODE byte and then text node value.
Attributes are serialized by writing ATTRIBUTE byte, then attribute namespace (empty string if no namespace), attribute name, and attribute value.
Elements are serialized by writing BEGIN_ELEMENT byte, then element namespace, element name, all attributes, END_ATTRIBUTES byte, all children, END_ELEMENT byte.
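
The string rules above can be sketched as follows. This is our reading of the description, not code lifted from UploadTestJar: AdeptStringSketch, serializeString and writeChunk are invented names. One assumption to flag: when the byte length is an exact multiple of 0x7FFF, the rules do not say whether an empty trailing chunk is written, and this sketch does not write one.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class AdeptStringSketch {
    static final int MAX_CHUNK = 0x7FFF;

    // Two-byte big-endian length, then the UTF-8 bytes, chunked at 0x7FFF.
    static byte[] serializeString(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int offset = 0;
        // Full-size chunks while more than MAX_CHUNK bytes remain. The split
        // is on raw bytes, so it may land inside a multi-byte character.
        while (utf8.length - offset > MAX_CHUNK) {
            writeChunk(out, utf8, offset, MAX_CHUNK);
            offset += MAX_CHUNK;
        }
        // The remainder; for the empty string this is just two zero bytes.
        writeChunk(out, utf8, offset, utf8.length - offset);
        return out.toByteArray();
    }

    private static void writeChunk(ByteArrayOutputStream out, byte[] bytes, int off, int len) {
        out.write((len >>> 8) & 0xFF); // high byte of the length
        out.write(len & 0xFF);         // low byte of the length
        out.write(bytes, off, len);
    }

    public static void main(String[] args) {
        System.out.println(serializeString("abc").length); // prints 5
        System.out.println(serializeString("").length);    // prints 2
    }
}
```

The empty-string case produces exactly the "2 bytes of 0" the rules call for when an attribute has no namespace.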

This list is actually in the javadocs for the XMLUtil class. Why it’s all lumped in there is anybody’s guess. The serialization as described above is mostly implemented by one very long method in the (1000+ line) XMLUtil.java: Eater.eatNode.

Note: The values of the constants BEGIN_ELEMENT etc are listed in the XMLUtil class.

Why I consider UploadTestJar poorly written

Here are some things I’ve noticed:

Nothing reads like a narrative, i.e., methods call other methods that occur before them in the file, which makes files very hard to follow.
Too many comments. I know this is a Java idiom, but it makes reading the stuff that matters more difficult.
Idiotic comments: inline comments that state the obvious and are just noise, e.g.:
// retrieve HMAC key and run a raw SHA1 HASH on it.
byte[] hmacKeyBytesSHA1 = XMLUtil.SHA1(getHmacKey());
XMLUtil.java contains several classes
XMLUtil class does more than one thing:
- Parses XML
- Normalizes XML
- Creates XML documents
- Serializes XML, dates, bytes and strings
- Checks signatures
- Signs XML documents
- Hashes things
Class UploadTest does everything in ctor: reads a file from disk, validates it, makes some xml, signs it and then posts it to the server.
UploadTest is the main entry point for the executable, and it contains all the behaviour; it’s 1600 lines long
Cannot use UploadTest without a real epub file
UploadTest does too many things:
- Ctor does too many things
  - Handles command line input
  - Displays help/usage
  - Asserts a file on disk has been supplied
  - “Makes” content
    - makeContent requires an epub file on disk
    - makeContent loads xml
    - makeContent assembles xml files
    - makeContent hashes things
    - makeContent swallows errors and writes to stdout
- “Sends” content via HTTP
- Methods that do too many things, e.g., if/else branches based on the verboseDisplay flag

Written by benbiddington

16 February, 2010 at 10:39

Posted in development

Tagged with adobe, content_server, howto, java, rant, shame, very_very_poor
