Index XML Documents with VTD-XML

How to turn the indexing capability on in your application

Application Scenarios
There are at least two ways to view VTD+XML as a practical solution to real problems. The first is the traditional view of native XML indexing. The second treats VTD+XML as a binary data format that is backward-compatible with XML.

Native XML Indexing
In this view, you use VTD+XML as the basis for native XML data stores that serve the back-end data needs of XML/SOA applications. By saving a VTD+XML file as a BLOB (Binary Large OBject) in a more traditional database table, you also gain capabilities such as concurrency, data integrity, and replication. Because it avoids the awkward shredding-based mapping of XML to relational data, VTD+XML fits exceptionally well in a pure XML/SOA environment. Have a lot of XBRL (Extensible Business Reporting Language) documents, or those big GML (Geography Markup Language) files? VTD+XML should equip you with horsepower never before available.

Binary Enhanced XML
VTD+XML also extends the core capabilities of XML by raising its processing efficiency to a new level. In other words, as a wire format, XML now has it all: it is easy to learn, human-readable, interoperable, and loosely encoded by design, and performance-wise it also leads CORBA, DCOM, and RMI by a mile. When applied to XML pipelining, VTD+XML can potentially eliminate the repetitive parsing at each stage of the pipeline, an issue that none of the existing XML pipeline specifications (e.g., XProc and the XML Pipeline Definition Language) addresses.

If it takes too long to push large documents over your DOM-based ESB (Enterprise Service Bus), how does 100MB in about a second sound?
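The cost of repetitive parsing in a pipeline can be made concrete by counting parses. The following is a minimal sketch under stated assumptions, not VTD+XML code: the class `PipelineParseCount` and its method names are hypothetical, and a JDK DOM tree stands in for any parsed representation that a stage can reuse. When stages hand off XML text, every stage parses again; when they hand off the parsed form, the document is parsed once.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

/** Hypothetical illustration; DOM is a stand-in, not the VTD+XML format. */
final class PipelineParseCount {
    int parses = 0; // number of full parses performed by this pipeline run

    Document parse(byte[] xml) {
        try {
            parses++;
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml));
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    static byte[] serialize(Document doc) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toByteArray();
        } catch (Exception e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }

    /** Stages exchange XML text, so every stage has to parse its input again. */
    byte[] runWithTextHandoff(byte[] xml, List<Consumer<Document>> stages) {
        byte[] current = xml;
        for (Consumer<Document> stage : stages) {
            Document doc = parse(current);
            stage.accept(doc);
            current = serialize(doc);
        }
        return current;
    }

    /** Stages share one parsed representation, so the input is parsed once. */
    byte[] runWithSharedTree(byte[] xml, List<Consumer<Document>> stages) {
        Document doc = parse(xml);
        for (Consumer<Document> stage : stages) {
            stage.accept(doc);
        }
        return serialize(doc);
    }

    public static void main(String[] args) {
        byte[] xml = "<purchaseOrder/>".getBytes(StandardCharsets.UTF_8);
        Consumer<Document> stamp = d -> d.getDocumentElement().setAttribute("seen", "yes");
        List<Consumer<Document>> stages = Arrays.asList(stamp, stamp, stamp);

        PipelineParseCount text = new PipelineParseCount();
        text.runWithTextHandoff(xml, stages);
        PipelineParseCount shared = new PipelineParseCount();
        shared.runWithSharedTree(xml, stages);

        System.out.println("text handoff parses: " + text.parses);
        System.out.println("shared tree parses: " + shared.parses);
    }
}
```

A DOM tree can only be shared inside one process. The appeal of shipping an index alongside the XML text is that the same saving could carry across process and machine boundaries.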

Benchmark
This section quantifies the performance gain achievable with VTD+XML. The benchmark code measures the combined latency of loading a VTD+XML index (as in VTD-XML 2.0) and evaluating an XPath expression, retrieving a fixed number of nodes (the first five) from the resulting node set. The same test is also implemented with the Xerces DOM parser and with Xalan and Jaxen, two popular XPath engines. The benchmark code used for the test is available for download.

Setup
The environment for the benchmark has the following setup:

  • Hardware: A Sony VAIO notebook featuring a 1.7GHz Pentium M processor with 2MB of integrated cache memory, 512MB of DDR2 RAM, and a 400MHz front-side bus.
  • OS/JVM setting: The notebook runs Windows XP, and the test applications use version 1.6.0.6-b105 of the JDK/JVM.
  • XML parsers and XPath engines: The DOM code uses both Xalan (bundled in the JDK) and Jaxen over Xerces DOM (full node expansion). VTD-XML, on the other hand, uses its built-in XPath engine.

To reduce timing variations due to I/O, the benchmark programs first read the XML files into a memory buffer before the test runs and write their XML output to an in-memory byte array output stream. The server JVM is used to get peak performance. All input/output streams are reused whenever possible.
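The DOM side of such a measurement can be sketched with only JDK classes. This is a minimal sketch, not the article's downloadable benchmark: the class `DomXPathTimer`, its method names, and the warm-up and run counts are hypothetical, and the JDK's bundled parser and XPath engine stand in for the Xerces, Xalan, and Jaxen setup described above. It parses a document held in a memory buffer, evaluates an XPath expression, and touches the first five result nodes, so the timed work is parsing plus evaluation.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

/** Hypothetical harness; not the article's benchmark code. */
final class DomXPathTimer {

    /** Parses the in-memory document, evaluates the XPath, and visits up to maxNodes results. */
    static int parseAndVisit(byte[] xml, String xpath, int maxNodes) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new ByteArrayInputStream(xml));
            XPathExpression expr = XPathFactory.newInstance().newXPath().compile(xpath);
            NodeList hits = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
            int visited = Math.min(maxNodes, hits.getLength());
            for (int i = 0; i < visited; i++) {
                hits.item(i).getNodeName(); // touch the node, as a consumer would
            }
            return visited;
        } catch (Exception e) {
            throw new IllegalStateException("parse or XPath evaluation failed", e);
        }
    }

    /** Average latency in nanoseconds over the timed runs, after untimed warm-up runs. */
    static long averageLatencyNanos(byte[] xml, String xpath, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            parseAndVisit(xml, xpath, 5);
        }
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            parseAndVisit(xml, xpath, 5);
        }
        return (System.nanoTime() - start) / runs;
    }

    public static void main(String[] args) {
        // The document is built in memory so that file I/O stays out of the timed region.
        byte[] xml = ("<purchaseOrder><items>"
                + "<item><USPrice>148.95</USPrice></item>"
                + "<item><USPrice>39.98</USPrice></item>"
                + "</items></purchaseOrder>").getBytes(StandardCharsets.UTF_8);
        long nanos = averageLatencyNanos(xml, "/purchaseOrder/items/item[USPrice<100]", 50, 200);
        System.out.println("average latency (ns): " + nanos);
    }
}
```

The warm-up loop gives the JIT compiler a chance to optimize the hot path before timing starts, which matters when the goal is peak rather than cold-start performance.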

Three XML files of similar structure, but different sizes, are used for the test.

<?xml version="1.0"?>
<purchaseOrder orderDate="1999-10-20">
   <shipTo country="US">
     <name>Alice Smith</name>
     <street>123 Maple Street</street>
     <city>Mill Valley</city>
     <state>CA</state>
     <zip>90952</zip>
   </shipTo>
   <billTo country="US">
     <name> Robert Smith </name>
     <street>8 Oak Avenue</street>
     <city>Old Town</city>
     <state>PA</state>
     <zip>95819</zip>
   </billTo>
   <comment>Hurry, my lawn is going wild!</comment>
   <items>
     <item partNum="872-AA">
       <productName>Lawnmower</productName>
       <quantity></quantity>
       <USPrice>148.95</USPrice>
       <comment>Confirm this is electric</comment>
     </item>
     <item partNum="926-AA">
       <productName>Baby Monitor</productName>
       <quantity>1</quantity>
       <USPrice>39.98</USPrice>
       <shipDate>1999-05-21</shipDate>
     </item>
     ...
   </items>
</purchaseOrder>

The respective file sizes are:

  • "po_small.xml": 6,780 bytes
  • "po_medium.xml": 112,238 bytes
  • "po_big.xml": 1,219,388 bytes

The following XPath expressions are used for the test:

  • /*/*/*[position() mod 2 = 0]
  • /purchaseOrder/items/item[USPrice<100]
  • /*/*/*/quantity/text()
  • //item/comment
  • //item/comment/../quantity
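To see what each of these expressions selects, they can be run against a trimmed copy of the sample purchase order. The sketch below uses the JDK's `javax.xml.xpath` API rather than VTD-XML's own engine; the class `XPathSampleCounts`, its method names, and the two-item sample string are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

/** Hypothetical helper that counts the nodes each benchmark expression selects. */
final class XPathSampleCounts {

    // Trimmed copy of the sample document shown earlier, with only the two listed items.
    static final String SAMPLE =
          "<purchaseOrder orderDate=\"1999-10-20\">"
        + "<shipTo country=\"US\"><name>Alice Smith</name><street>123 Maple Street</street>"
        + "<city>Mill Valley</city><state>CA</state><zip>90952</zip></shipTo>"
        + "<billTo country=\"US\"><name>Robert Smith</name><street>8 Oak Avenue</street>"
        + "<city>Old Town</city><state>PA</state><zip>95819</zip></billTo>"
        + "<comment>Hurry, my lawn is going wild!</comment>"
        + "<items>"
        + "<item partNum=\"872-AA\"><productName>Lawnmower</productName><quantity></quantity>"
        + "<USPrice>148.95</USPrice><comment>Confirm this is electric</comment></item>"
        + "<item partNum=\"926-AA\"><productName>Baby Monitor</productName><quantity>1</quantity>"
        + "<USPrice>39.98</USPrice><shipDate>1999-05-21</shipDate></item>"
        + "</items></purchaseOrder>";

    static final String[] EXPRESSIONS = {
        "/*/*/*[position() mod 2 = 0]",
        "/purchaseOrder/items/item[USPrice<100]",
        "/*/*/*/quantity/text()",
        "//item/comment",
        "//item/comment/../quantity"
    };

    /** Returns how many nodes the XPath expression selects in the given XML text. */
    static int countMatches(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + xpath, e);
        }
    }

    public static void main(String[] args) {
        for (String expression : EXPRESSIONS) {
            System.out.println(expression + " -> " + countMatches(SAMPLE, expression));
        }
    }
}
```

On this trimmed sample, the first expression selects five elements (the even-positioned children of shipTo, billTo, and items), and each of the other four selects a single node. The counts for the actual benchmark files will differ, since those files contain many more items.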

More Stories By Jimmy Zhang

Jimmy Zhang is a cofounder of XimpleWare, a provider of high performance XML processing solutions. He has working experience in the fields of electronic design automation and Voice over IP for a number of Silicon Valley high-tech companies. He holds both a BS and MS from the department of EECS from U.C. Berkeley.
