Index XML Documents with VTD-XML

How to turn the indexing capability on in your application

Results
Absolute Latency
/*/*/*[position() mod 2 = 0]

File name       Jaxen (ms)   Xalan (ms)   VTD-XML (ms)
po_small.xml    0.401        1.521        0.028
po_medium.xml   16.255       25.131       0.449
po_big.xml      159.329      270.188      4.44

/purchaseOrder/items/item[USPrice<100]

File name       Jaxen (ms)   Xalan (ms)   VTD-XML (ms)
po_small.xml    0.441        1.612        0.0338
po_medium.xml   16.954       28.21        0.431
po_big.xml      174.201      288.18       4.499

/*/*/*/quantity/text()

File name       Jaxen (ms)   Xalan (ms)   VTD-XML (ms)
po_small.xml    0.47         1.534        0.0315
po_medium.xml   17.57        25.278       0.431
po_big.xml      190          272.958      4.412

//item/comment

File name       Jaxen (ms)   Xalan (ms)   VTD-XML (ms)
po_small.xml    0.805        1.689        0.0364
po_medium.xml   27.27        27.687       0.434
po_big.xml      398.57       304.103      4.43

//item/comment/../quantity

File name       Jaxen (ms)   Xalan (ms)   VTD-XML (ms)
po_small.xml    0.816        1.706        0.0372
po_medium.xml   28.367       28.338       0.435
po_big.xml      384.05       306.056      4.431
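
To make the relative gaps in these tables easier to read, the short Java sketch below divides the Jaxen and Xalan latencies by the VTD-XML latency for the first query, /*/*/*[position() mod 2 = 0]. The latencies are copied from that table. The class and method names are illustrative only and are not part of VTD-XML.

```java
import java.util.Locale;

// Sketch: speedup of VTD-XML over Jaxen and Xalan for one benchmark table.
// Latencies (ms) are copied from the /*/*/*[position() mod 2 = 0] table.
class SpeedupTable {
    // Speedup = baseline latency divided by VTD-XML latency.
    static double speedup(double baselineMs, double vtdMs) {
        return baselineMs / vtdMs;
    }

    public static void main(String[] args) {
        String[] files = {"po_small.xml", "po_medium.xml", "po_big.xml"};
        double[] jaxen = {0.401, 16.255, 159.329};
        double[] xalan = {1.521, 25.131, 270.188};
        double[] vtd   = {0.028, 0.449, 4.44};
        for (int i = 0; i < files.length; i++) {
            System.out.printf(Locale.ROOT,
                    "%-14s Jaxen/VTD = %.1fx  Xalan/VTD = %.1fx%n",
                    files[i], speedup(jaxen[i], vtd[i]), speedup(xalan[i], vtd[i]));
        }
    }
}
```

For this query the ratios run from about 14x (Jaxen on po_small.xml) to about 61x (Xalan on po_big.xml). The same calculation can be repeated with the rows of the other four tables.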

Observation
The benchmark results show that, once the parsing cost is removed (by loading the index instead of re-parsing), VTD-XML consistently outperforms DOM (with either Jaxen or Xalan as the XPath engine) by one to two orders of magnitude, roughly 13x to 90x in the tables above, regardless of message size. If we interpret these results as the upper limit on how fast an XML content switch can make routing decisions based on the XPath output, VTD-XML's processing throughput, calculated by dividing the XML message size (not including VTD) by the latency, is around 250 MB/sec, roughly double the 125 MB/sec maximum throughput of a gigabit Ethernet connection. This means that switching or routing VTD+XML payloads based on simple XPath expressions is I/O-bound.
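
The throughput arithmetic can be sketched in a few lines of Java. The benchmark file sizes are not listed in this section, so the message size and latency below are hypothetical values chosen only to land on the 250 MB/sec figure; they are not measurements. The class name is illustrative, and the gigabit figure is the raw line rate with framing overhead ignored.

```java
import java.util.Locale;

// Sketch: throughput = message size / latency, compared with gigabit Ethernet.
class ThroughputSketch {
    // Gigabit Ethernet line rate: 10^9 bits/s = 125 MB/s (1 MB = 10^6 bytes),
    // ignoring framing and protocol overhead.
    static final double GIGABIT_MB_PER_SEC = 1_000_000_000.0 / 8 / 1_000_000;

    // Throughput in MB/s for a message of sizeBytes processed in latencyMs.
    static double throughputMBps(long sizeBytes, double latencyMs) {
        return (sizeBytes / 1_000_000.0) / (latencyMs / 1000.0);
    }

    public static void main(String[] args) {
        long hypotheticalSizeBytes = 1_000_000; // assumption, NOT a measured file size
        double hypotheticalLatencyMs = 4.0;     // assumption, NOT a measured latency
        double mbps = throughputMBps(hypotheticalSizeBytes, hypotheticalLatencyMs);
        System.out.printf(Locale.ROOT,
                "throughput = %.0f MB/s, %.1fx the gigabit line rate%n",
                mbps, mbps / GIGABIT_MB_PER_SEC);
    }
}
```

When the XPath evaluation rate exceeds what the network link can deliver, the link becomes the bottleneck, which is what "I/O-bound" means here.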

Conclusion
This article has introduced the latest indexing feature of VTD-XML along with the latest benchmark numbers showcasing the efficiency level it achieves. Prior to VTD-XML, an XML/SOA application written with DOM or SAX incurred the overhead of XML parsing, XPath evaluation and, optionally, content update. It's not uncommon for those overheads to account for 80%-90% or more of the total CPU cycles of running the application. VTD-XML removes most of those overheads, leaving little left to optimize. Using VTD-XML as a parser reduces XML parsing overhead by 5x-10x. Next, VTD-XML's incremental update uniquely eliminates the roundtrip overhead of updating XML. Moreover, this article shows VTD-XML's innovative non-blocking, stateless XPath engine significantly outperforming Jaxen and Xalan. With the addition of the indexing capability, XML parsing has now become "optional."
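
As a rough illustration of what those percentages can mean for a whole application, the sketch below applies Amdahl's law, which the article itself does not invoke. It assumes, as a simplification, that the entire 80%-90% XML overhead fraction is accelerated by the quoted 5x-10x factor and that the rest of the application is unchanged. The class and method names are illustrative.

```java
import java.util.Locale;

// Sketch: Amdahl's law applied to the figures quoted in the text.
class AmdahlSketch {
    // Overall speedup when a fraction p of the run time is made s times
    // faster and the remaining (1 - p) is left unchanged.
    static double overallSpeedup(double p, double s) {
        return 1.0 / ((1.0 - p) + p / s);
    }

    public static void main(String[] args) {
        double[] fractions = {0.80, 0.90}; // XML overhead share quoted in the text
        double[] factors = {5.0, 10.0};    // reduction factor quoted in the text
        for (double p : fractions) {
            for (double s : factors) {
                System.out.printf(Locale.ROOT,
                        "p = %.2f, s = %4.1fx -> overall %.2fx%n",
                        p, s, overallSpeedup(p, s));
            }
        }
    }
}
```

Under these assumptions the whole-application speedup falls between roughly 2.8x and 5.3x, because the untouched 10%-20% of the work limits the total gain.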

In other words, obstacles standing on the path to successful SOA have quietly disappeared. But this is just another starting point. It's not difficult to see that none of these benefits would exist if VTD-XML had stuck with excessive object allocation like DOM. In the context of XML processing, pure OO modeling of an XML infoset (e.g., string and node objects) just doesn't appear to be the right thing to do in the first place. Like anything else, OO has its weaknesses. The problems (e.g., those of DOM and SAX) arise when one chooses OO for the sake of choosing it and stops questioning its sensibility. To me, knowing when not to use objects is equally, if not more, important. Born of those weaknesses, constraints, and limitations, VTD-XML strives to be the simple, sensible answer to the problems.

And, in the context of SOA, there are more questions on OO programming worth reflecting on. Among them, is OOP's API-based public contract suitable for building loosely coupled, document-centric Web Services applications? The answers, again, are likely to be surprisingly simple.

More Stories By Jimmy Zhang

Jimmy Zhang is a cofounder of XimpleWare, a provider of high-performance XML processing solutions. He has worked in electronic design automation and Voice over IP at a number of Silicon Valley high-tech companies. He holds both a BS and an MS from the Department of EECS at U.C. Berkeley.
