XML Stream Performance
Here are the results obtained using a test program
included in the XML Stream (XMLS) download.
The test technique is to first build a dom4j or JDOM
representation of the XML document, then output that
representation repeatedly using either Java
serialization, text, or XMLS, and finally use a copy of
the output to repeatedly reconstruct the representation.
The test times are for the last 10 of 11 total passes
for each operation.
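The pass structure described above could be sketched as a small
timing harness. This is only a minimal sketch, not the test program
shipped with XMLS: the class name PassTimer, the methods timePasses
and total, and the stand-in workload are all hypothetical. It also
uses System.nanoTime and lambdas, which are not available on the
Java 1.3.1 VM used for the published timings. It treats the
discarded first pass as a warm-up, which is an assumption about why
only the last 10 of 11 passes are timed.

```java
public class PassTimer {

    /**
     * Runs the operation totalPasses times and returns the elapsed
     * nanoseconds of every pass except the first, which is discarded
     * (assumed here to be a warm-up pass).
     */
    public static long[] timePasses(Runnable operation, int totalPasses) {
        if (totalPasses < 2) {
            throw new IllegalArgumentException("need at least 2 passes");
        }
        long[] measured = new long[totalPasses - 1];
        for (int pass = 0; pass < totalPasses; pass++) {
            long start = System.nanoTime();
            operation.run();
            long elapsed = System.nanoTime() - start;
            if (pass > 0) {
                measured[pass - 1] = elapsed;
            }
        }
        return measured;
    }

    /** Sums the per-pass times into a single figure. */
    public static long total(long[] times) {
        long sum = 0;
        for (long t : times) {
            sum += t;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in workload; the real tests would output or
        // rebuild a dom4j or JDOM document representation here.
        StringBuilder sink = new StringBuilder();
        Runnable operation = () -> {
            sink.setLength(0);
            for (int i = 0; i < 100_000; i++) {
                sink.append("<item/>");
            }
        };
        long[] times = timePasses(operation, 11);
        System.out.println("measured passes: " + times.length);
        System.out.println("total ms: " + total(times) / 1_000_000);
    }
}
```

Keeping the per-pass times in an array, rather than only a running
total, makes it easy to check afterwards whether later passes are
stable.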
The documents tested are intended to be
representative of a wide range of applications:
- much_ado.xml, the Shakespeare play marked
up as XML. No attributes and a fairly flat structure,
heavy text content (202K bytes).
- periodic.xml, periodic table of the
elements in XML. Some attributes, also fairly flat,
relatively low text (117K bytes).
- soap2.xml, generated list of values in SOAP
document form. Heavy on namespaces and attributes
(134K bytes).
- xml.xml, the XML specification, with the
DTD reference removed and all entities defined inline.
Text-style markup with heavy mixed content, some
attributes (160K bytes).
- build.xml, the Ant build file for this
project. Lots of attributes, low text (5K
bytes).
Several of these documents contained non-significant
whitespace. With the default options XMLS is likely to
encode this type of whitespace using handles, so it may
have received more of an advantage than would otherwise
be the case. The next set of tests will include
documents with non-significant whitespace removed to see
how much this affects the results.
In the XMLS test runs the same adapter instances were
used for each test pass but were reset between passes,
so that each pass started from scratch without any
retained information. Changing the code to create a new
instance of the adapters for each test pass did not
significantly change the results.
The timings shown are from tests using Sun
Microsystems Java version 1.3.1, Java HotSpot(TM) Client
VM 1.3.1-b24, on an Athlon 1GHz system with 256MB of
RAM, running Red Hat Linux 7.1, using the default memory
settings.
Figure 1. Output time
Figure 2. Input time
Figure 3. Roundtrip time
Figure 4. Output size
As you can see from these results, XMLS gives
dramatic performance improvements over the standard text
XML document format, which itself is much faster than
Java serialization of the document representations.
The size reduction for XMLS is not as great overall
as the time reduction, but still very good considering
that the emphasis is on speed. The different document
types make more of a difference in this area. Heavy text
documents (much_ado.xml and xml.xml in this test) get
much less benefit from the XMLS encoding than more
structure-intensive documents such as soap2.xml and
periodic.xml.
The build.xml file results are difficult to see on
the scale of these graphs, but worth mentioning. This
small file was included in the tests out of concern that
the handle approach used by XMLS might not perform as
well for small files as for larger ones. In terms of
output size, the text output was about 50 percent larger
than the XMLS output and the Java serialized output was
more than 3 times the size, which is not that much
different from the other test results.
The time differences were much more pronounced,
though. With dom4j, the roundtrip time for text was more
than 5 times that of XMLS, and for Java serialization
more than 12 times that of XMLS. With JDOM, the XMLS
time was about 50 percent longer than with dom4j, as with
the other documents, but the text time was about 17 times
that of XMLS and the Java serialization time about 8
times that of XMLS.
It appears from these results that text and Java
serialization may have high startup overhead for
relatively small documents. Contrary to initial
expectations, it looks like XMLS may be even better as
an alternative to these formats for small documents than
for larger ones. Beyond this, using XMLS for streams of
documents of the same type is likely to provide even
greater performance
improvements.