Index XML Documents with VTD-XML

How to turn the indexing capability on in your application

Traditionally, DOM- or SAX-based enterprise applications have to repeat CPU-intensive XML parsing when they access the same documents multiple times. VTD-XML 2.0 introduces a simple, general-purpose XML index called VTD+XML (http://vtd-xml.sourceforge.net/persistence.html) that eliminates the need for repetitive parsing in those applications.

This article combines various examples and the latest benchmark reports to show you how to get started with this indexing. This article also discusses various scenarios and use cases where you may find VTD+XML useful.

Avoid Repetitive XML Parsing with VTD-XML
As discussed in "Simplify XML processing with VTD-XML," to date one of the underlying assumptions in XML application development is that an XML document must be parsed before anything else can be done with it. In other words, the processing logic of XML applications can't start without parsing. Frequently considered a threat to database performance, XML parsing is usually many times slower than other XML operations such as XPath evaluation. When those applications perform multiple read-only accesses to XML data that doesn't change very often, wouldn't it be nice to be able to eliminate the overhead of the associated repetitive parsing?

With the native XML indexing feature introduced in version 2.0 of VTD-XML, you can do precisely that. VTDGen, the class encapsulating various parsing routines, now adds "loadIndex(...)" and "writeIndex(...)." VTD-XML 2.0 also introduces two new exceptions: IndexWriteException and IndexReadException.

Let me put those new methods into action and show you how to turn on the indexing capability in your application. Consider the following XML document:

   <purchaseOrder orderDate="1999-10-21">
     <item partNum="872-AA">
       <productName>Lawnmower</productName>
       <quantity>1</quantity>
       <USPrice>148.95</USPrice>
     </item>
   </purchaseOrder>

Below is a simple pre-2.0 VTD-XML program named "printPrice.java" that prints out the content of the element "USPrice." Notice that it parses the XML file and then uses XPath to select the target nodes.

import com.ximpleware.*;
import com.ximpleware.xpath.*;
public class printPrice{
   public static void main(String args[]){
     VTDGen vg = new VTDGen();
     try{
       // Parse po.xml with namespace awareness turned on
       if (vg.parseFile("po.xml",true)){
         VTDNav vn = vg.getNav();
         AutoPilot ap = new AutoPilot(vn);
         ap.selectXPath("/purchaseOrder/item/USPrice/text()");
         int i=-1;
         // evalXPath() returns the index of the next matching node, or -1 when done
         while((i=ap.evalXPath())!=-1){
           System.out.println(" USPrice ==> "+vn.toString(i));
         }
       }
     }catch(Exception e){
       // Report the failure instead of silently ignoring it
       e.printStackTrace();
     }
   }
}

A few changes are needed to add VTD-XML's new indexing capability to the Java code above. First, you need to read in the XML document, parse it, and then write out the indexed version of the same XML document. From that point onward, your application can run XPath queries or processing logic directly on top of the index, saving the CPU cycles of parsing the XML document again. The following code snippets (named "genIndex.java" and "accessIndex.java" respectively) show you how to generate and access the index. Notice that, when executed sequentially, the two applications produce the same output as "printPrice.java."

The first application (genIndex.java) reads in "po.xml" and produces "po.vxl."

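import com.ximpleware.*;
import com.ximpleware.xpath.*;
public class genIndex{
   public static void main(String args[]){
     VTDGen vg = new VTDGen();
     try{
       // Parse once, then save the result as a VTD+XML index
       if (vg.parseFile("po.xml",true)){
         vg.writeIndex("po.vxl");
       }
     }catch(Exception e){
       // Report the failure instead of silently ignoring it
       e.printStackTrace();
     }
   }
}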

The second application (accessIndex.java) loads "po.vxl" and filters the document using the XPath expression "/purchaseOrder/item/USPrice/text()."

import com.ximpleware.*;
import com.ximpleware.xpath.*;
public class accessIndex{
   public static void main(String args[]){
     VTDGen vg = new VTDGen();
     try{
       // Load the saved index; po.xml is not parsed again
       VTDNav vn = vg.loadIndex("po.vxl");
       AutoPilot ap = new AutoPilot(vn);
       ap.selectXPath("/purchaseOrder/item/USPrice/text()");
       int i=-1;
       while((i=ap.evalXPath())!=-1){
         System.out.println(" USPrice ==> "+vn.toString(i));
       }
     }catch(Exception e){
       // Report the failure instead of silently ignoring it
       e.printStackTrace();
     }
   }
}
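
Because the index is written once and read many times, an application also needs some rule for deciding when "po.vxl" should be regenerated. The sketch below shows one possible rule, which is an assumption for illustration and not part of VTD-XML: regenerate when the index file is missing or older than the XML file. The class name IndexFreshness and the methods isStale and needsReindex are hypothetical; only the file names "po.xml" and "po.vxl" come from the examples above. It uses the JDK alone and does not call VTD-XML.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of VTD-XML): decides whether a saved
// index should be regenerated, based only on file modification times.
public class IndexFreshness {

    // Assumed rule: the index is stale when it does not exist or when
    // it was last modified before the XML file was.
    public static boolean isStale(boolean indexExists,
                                  long xmlModifiedMillis,
                                  long indexModifiedMillis) {
        return !indexExists || indexModifiedMillis < xmlModifiedMillis;
    }

    // File-based wrapper around the rule above.
    public static boolean needsReindex(Path xml, Path index) throws IOException {
        boolean exists = Files.exists(index);
        long xmlTime = Files.getLastModifiedTime(xml).toMillis();
        long indexTime = exists ? Files.getLastModifiedTime(index).toMillis() : 0L;
        return isStale(exists, xmlTime, indexTime);
    }

    public static void main(String[] args) throws IOException {
        Path xml = Paths.get("po.xml");
        Path index = Paths.get("po.vxl");
        if (!Files.exists(xml)) {
            System.out.println("po.xml not found");
            return;
        }
        if (needsReindex(xml, index)) {
            System.out.println("Index missing or stale: run genIndex first");
        } else {
            System.out.println("Index is current: run accessIndex");
        }
    }
}
```

Modification times are a coarse signal. If the XML file can be rewritten without its content changing, or if clocks differ between machines, a content hash stored alongside the index may be a better fit.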


More Stories By Jimmy Zhang

Jimmy Zhang is a cofounder of XimpleWare, a provider of high performance XML processing solutions. He has working experience in the fields of electronic design automation and Voice over IP for a number of Silicon Valley high-tech companies. He holds both a BS and MS from the department of EECS from U.C. Berkeley.
