Ben Biddington

Whatever it is, it's not about "coding"

Adobe Content Server — packaging large files is painful

Ordinarily, working with Adobe Content Server (ACS) is more or less tolerable, but recently we have encountered what may be another indication of the quality of this product.

Packaging

Packaging an ebook amounts to posting a signed request to the server, describing the file you’d like to ingest.

You can use UploadTestJar (which may have its own issues), or (if you’re lucky) you can rewrite some of the sample code in your chosen language and use that.

You’d think, then, that once you’ve got it working you can be on your way, forget about ACS and finish your application.

And you can.

Until the OutOfMemoryErrors start

While test-driving our application, we naturally wanted to describe what happens with different sizes of files, so we tried some large ones. These would fail with errors about being out of heap space, errors like:

21-Apr-2010 13:37:26 org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet Package threw exception
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Unknown Source)
at java.lang.AbstractStringBuilder.expandCapacity(Unknown Source)
at java.lang.AbstractStringBuilder.append(Unknown Source)
at java.lang.StringBuffer.append(Unknown Source)
at com.adobe.adept.xml.XMLAbstractDigestSink.characters(XMLAbstractDigestSink.java:133)
at com.adobe.adept.xml.XMLSink.characters(XMLSink.java:261)
at com.adobe.adept.xml.XMLFieldReader.characters(XMLFieldReader.java:447)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.characters(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
at com.adobe.adept.servlet.AdeptServlet.doPost(AdeptServlet.java:180)

We tried adjusting the available memory for “sags” (as Dan Rough would put it), and this did work to a certain degree, but it is not a satisfactory solution: a bigger heap only raises the file size at which the failure occurs.

To me it looks a bit like an attempt to load the entire file into memory at once. Surely this can’t be right, can it?

Examining AdeptServlet

In an effort to understand the nature of the problem before solving it, we decided to have a look at that servlet. We decompiled it and went looking for the class behind the Package servlet named in the log, which is:

com.adobe.adept.packaging.servlet.Package

In the doPost method, there are these lines:

if (paramParsedRequest.data != null) {
    localObject1 = new PDFPackager(paramParsedRequest.data);
} else {
    localObject1 = new PDFPackager(new File(paramParsedRequest.dataPath));
}

This shows the two ways of constructing a PDFPackager.

Examining PDFPackager, we can see the ctor has four overloads including these two:

public PDFPackager(byte[] paramArrayOfByte) throws Exception {
    this(new ByteBufferByteReader(paramArrayOfByte));
}

public PDFPackager(File paramFile) throws Exception {
    this(new FileInputStream(paramFile));
}

So it appears the problem may result from use of the first overload, the one that takes a byte array.

That paramParsedRequest argument to doPost is of type ParsedRequest, and its data property is a byte array.

This could be a problem: when submitting a package request with a data node instead of a dataPath node, we’re using the byte array overload.

Where is the error actually coming from?

From the stack trace it looks as though it is coming from whatever is creating the arguments to supply to Package.doPost.

This is the responsibility of Package’s supertype: AdeptServlet<RequestParser>. It is this class that is responsible for parsing the HTTP request into one of those ParsedRequest objects, and then supplying that to Package.doPost.

The problem starts here at the top level request handling:

// AdeptServlet<RequestParser>
doPost(HttpServletRequest paramHttpServletRequest, HttpServletResponse paramHttpServletResponse)

This is where the request parsing happens, and then — as the stack trace shows — an error ends up resulting from XMLAbstractDigestSink.characters.

XMLAbstractDigestSink.characters attempts to append data to an internal StringBuffer.
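
To get a feel for why this falls over so quickly, here is a back-of-the-envelope sketch. It assumes the data node carries the file base64-encoded (XML cannot carry raw bytes) and that each buffered character costs two bytes, as it does in a Java 6 era StringBuffer. The class and method names are made up for illustration; none of this is ACS code.

```java
class BufferedDataEstimate {
    /** Characters needed to base64-encode fileBytes bytes (padded, no line breaks). */
    static long base64Chars(long fileBytes) {
        return 4 * ((fileBytes + 2) / 3);
    }

    /** Bytes held by the buffer's char[] once everything is appended (2 bytes per char). */
    static long bufferedBytes(long fileBytes) {
        return 2 * base64Chars(fileBytes);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        for (long size : new long[] {10 * mb, 100 * mb, 500 * mb}) {
            System.out.printf("%d MB file -> at least %d MB of buffered char data%n",
                    size / mb, bufferedBytes(size) / mb);
        }
    }
}
```

So under these assumptions the buffer alone needs roughly 8/3 of the file size. The real peak is higher still, because each time the buffer grows, Arrays.copyOf (visible in the stack trace) briefly holds the old and the new array at once.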

Summary

This mechanism has not been designed in any kind of scalable manner — buffering files in memory is utterly nuts.

Why not just write the posted data to a temp file and use the other PDFPackager ctor?
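
Something along these lines would do it. This is a minimal sketch and not ACS code: SpoolingDataHandler is a hypothetical SAX handler, it assumes the data node holds base64 text, and it uses java.util.Base64 (Java 8 and later) for brevity. The point is only that memory use stays bounded by the size of one SAX chunk instead of the size of the file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler: decodes a base64 <data> element to a temp file while parsing. */
class SpoolingDataHandler extends DefaultHandler {
    // Holds at most one SAX chunk plus up to three leftover base64 characters.
    private final StringBuilder carry = new StringBuilder();
    private OutputStream out; // non-null only while inside <data>
    private Path spooled;

    Path spooledFile() {
        return spooled;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if ("data".equals(qName)) {
            try {
                spooled = Files.createTempFile("package-", ".bin");
                out = Files.newOutputStream(spooled);
            } catch (IOException e) {
                throw new SAXException(e);
            }
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (out == null) {
            return; // text outside <data> is ignored in this sketch
        }
        for (int i = start; i < start + length; i++) {
            if (!Character.isWhitespace(ch[i])) {
                carry.append(ch[i]);
            }
        }
        // Decode only whole groups of four characters; keep the remainder for the next chunk.
        flush(carry.length() - carry.length() % 4);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if ("data".equals(qName) && out != null) {
            flush(carry.length());
            try {
                out.close();
            } catch (IOException e) {
                throw new SAXException(e);
            }
            out = null;
        }
    }

    private void flush(int chars) throws SAXException {
        if (chars == 0) {
            return;
        }
        try {
            out.write(Base64.getDecoder().decode(carry.substring(0, chars)));
        } catch (IOException e) {
            throw new SAXException(e);
        }
        carry.delete(0, chars);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<package><data>aGVsbG8g\nd29ybGQ=</data></package>";
        SpoolingDataHandler handler = new SpoolingDataHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        System.out.println(
                new String(Files.readAllBytes(handler.spooledFile()), StandardCharsets.UTF_8));
    }
}
```

The spooled file could then, presumably, be handed to the File-based constructor shown earlier, new PDFPackager(new File(...)), rather than the byte array one.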

The solution

Well, one suggestion is to not post the files at all, but make a slightly different packaging request that supplies a path to a file on disk rather than the file itself.

To do so requires — as described in ContentServer_Technical_Reference.pdf — supplying a dataPath node in your request instead of a data node.
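
As a rough illustration of the difference, here is a sketch that builds the body of such a request. Only the dataPath node name comes from the reference above; the package root element, the missing namespace and the omission of all signing fields are assumptions made to keep the example short, so check the technical reference for the real shape. DataPathRequestSketch is a made-up name.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DataPathRequestSketch {
    /** Builds an unsigned request body that points at a file instead of embedding it. */
    static String packageRequest(String pathOnServer) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("package"); // assumed root element name
        doc.appendChild(root);

        Element dataPath = doc.createElement("dataPath");
        dataPath.setTextContent(pathOnServer); // a path the server itself can read
        root.appendChild(dataPath);
        // Signing fields and any metadata are deliberately left out of this sketch.

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter xml = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(xml));
        return xml.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(packageRequest("D:\\books\\big.pdf"));
    }
}
```

Whatever the exact envelope looks like, the request stays a few hundred bytes no matter how large the ebook is, which is what keeps the server-side parser out of trouble.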

The downside for us is that now we need to manage this shared file location — a non-trivial task when working with Windows services.

Another (unlikely) solution

Modify the application, i.e., AdeptServlet<RequestParser> so it first copies the posted file to disk, and then proceeds as though it received a dataPath request.

Pretty hard without the source, and it’s probably against the law anyway, isn’t it?

Written by benbiddington

27 April, 2010 at 14:00