Index XML Documents with VTD-XML

How to turn the indexing capability on in your application

Results
Absolute Latency
/*/*/*[position() mod 2 = 0]

File name        Jaxen (ms)    Xalan (ms)    VTD-XML (ms)
po_small.xml     0.401         1.521         0.028
po_medium.xml    16.255        25.131        0.449
po_big.xml       159.329       270.188       4.44

/purchaseOrder/items/item[USPrice<100]

File name        Jaxen (ms)    Xalan (ms)    VTD-XML (ms)
po_small.xml     0.441         1.612         0.0338
po_medium.xml    16.954        28.21         0.431
po_big.xml       174.201       288.18        4.499

/*/*/*/quantity/text()

File name        Jaxen (ms)    Xalan (ms)    VTD-XML (ms)
po_small.xml     0.47          1.534         0.0315
po_medium.xml    17.57         25.278        0.431
po_big.xml       190           272.958       4.412

//item/comment

File name        Jaxen (ms)    Xalan (ms)    VTD-XML (ms)
po_small.xml     0.805         1.689         0.0364
po_medium.xml    27.27         27.687        0.434
po_big.xml       398.57        304.103       4.43

//item/comment/../quantity

File name        Jaxen (ms)    Xalan (ms)    VTD-XML (ms)
po_small.xml     0.816         1.706         0.0372
po_medium.xml    28.367        28.338        0.435
po_big.xml       384.05        306.056       4.431
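
To make the five XPath expressions above concrete, here is a minimal sketch that evaluates them against a tiny purchase order. It uses the JDK's built-in javax.xml.xpath API over DOM, not VTD-XML, Jaxen, or the benchmark harness, and the sample document, class name, and helper method are hypothetical, invented only for illustration. It shows what each query selects, not how fast it runs.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathQueryDemo {
    // Hypothetical two-item purchase order (assumption: NOT one of the
    // po_*.xml benchmark files). No whitespace between tags, so text()
    // only matches the element values.
    static final String SAMPLE_PO =
        "<purchaseOrder><items>"
      + "<item><quantity>1</quantity><USPrice>148.95</USPrice>"
      + "<comment>Confirm this is electric</comment></item>"
      + "<item><quantity>2</quantity><USPrice>39.98</USPrice></item>"
      + "</items></purchaseOrder>";

    // Parses the XML into a DOM tree and returns how many nodes the
    // XPath expression selects.
    static int countMatches(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes =
                (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String[] queries = {
            "/*/*/*[position() mod 2 = 0]",
            "/purchaseOrder/items/item[USPrice<100]",
            "/*/*/*/quantity/text()",
            "//item/comment",
            "//item/comment/../quantity"
        };
        for (String query : queries) {
            System.out.println(query + " -> "
                + countMatches(SAMPLE_PO, query) + " node(s)");
        }
    }
}
```

Note that this sketch parses the document on every call, which is exactly the cost the index is meant to avoid.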

Observation
The benchmark results show that, after removing the parsing cost (by resorting to the index), VTD-XML consistently outperforms DOM (as measured with Jaxen and Xalan) by one to two orders of magnitude (roughly 13x to 90x in the tables above), regardless of message size. If we interpret these results as the upper limit on how fast an XML content switch can make routing decisions based on XPath output, VTD-XML's processing throughput, calculated by dividing the XML message size (not including VTD) by the latency, is around 250 MB/sec. That is roughly double the maximum throughput of a gigabit Ethernet connection (125 MB/sec), which means that switching/routing VTD+XML payloads based on simple XPath expressions is I/O-bound.
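
The throughput arithmetic can be sketched as follows. The class and method names are hypothetical, MB is taken to mean 10^6 bytes, and the message size in main is an assumed round number, since the benchmark file sizes are not listed in this section. Only the 4.44 ms latency and the 250 MB/sec figure come from the text.

```java
public class ThroughputEstimate {
    // 1 Gbit/s = 10^9 bits/s = 125 * 10^6 bytes/s, ignoring protocol overhead.
    static final double GIGABIT_ETHERNET_MB_PER_SEC = 1_000_000_000.0 / 8 / 1_000_000;

    // Throughput = message size / latency, in MB/sec (MB = 10^6 bytes).
    static double throughputMBPerSec(long messageBytes, double latencyMs) {
        double megabytes = messageBytes / 1_000_000.0;
        double seconds = latencyMs / 1000.0;
        return megabytes / seconds;
    }

    // How many times faster than a saturated gigabit link a given rate is.
    static double ratioToGigabit(double mbPerSec) {
        return mbPerSec / GIGABIT_ETHERNET_MB_PER_SEC;
    }

    public static void main(String[] args) {
        // ASSUMPTION: illustrative size only, not the real size of po_big.xml.
        long assumedMessageBytes = 1_000_000;
        double latencyMs = 4.44; // po_big.xml, first table, VTD-XML column

        double rate = throughputMBPerSec(assumedMessageBytes, latencyMs);
        System.out.printf("Assumed %d bytes in %.2f ms: %.1f MB/sec%n",
            assumedMessageBytes, latencyMs, rate);
        System.out.printf("250 MB/sec is %.1fx a gigabit link%n",
            ratioToGigabit(250.0));
    }
}
```

Any processing rate above 125 MB/sec means the network link, not the XPath evaluation, is the limiting factor.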

Conclusion
This article has introduced the latest indexing feature of VTD-XML along with the latest benchmark numbers showcasing the efficiency level it achieves. Prior to VTD-XML, an XML/SOA application written in DOM or SAX incurred the overhead of XML parsing, XPath evaluation and, optionally, content update. It's not uncommon for those overheads to account for 80%-90% or more of the application's total CPU cycles. VTD-XML removes most of those overheads, leaving little left to optimize. Using VTD-XML as a parser reduces XML parsing overhead by 5x-10x. Next, VTD-XML's incremental update uniquely eliminates the roundtrip overhead of updating XML. Moreover, this article shows VTD-XML's innovative non-blocking, stateless XPath engine significantly outperforming Jaxen and Xalan. With the addition of the indexing capability, XML parsing has now become "optional."

In other words, obstacles standing in the path to successful SOA have quietly disappeared. But this is just another starting point. It probably won't be difficult to see that none of these benefits would exist if VTD-XML had stuck with excessive object allocation like DOM. In the context of XML processing, pure OO modeling of an XML infoset (e.g., string and node objects) just doesn't appear to be the right thing to do in the first place. Like anything else, OO has its weaknesses. The problems (e.g., DOM and SAX's problems) arise when one chooses OO for the sake of choosing it, and stops questioning its sensibility. To me, knowing when not to use objects is equally, if not more, important. Derived from those weaknesses, constraints, and limitations, VTD-XML strives to be the simple, sensible answer to the problems.

And, in the context of SOA, there are more questions on OO programming worth reflecting on. Among them, is OOP's API-based public contract suitable for building loosely coupled, document-centric Web Services applications? The answers, again, are likely to be surprisingly simple.

More Stories By Jimmy Zhang

Jimmy Zhang is a cofounder of XimpleWare, a provider of high performance XML processing solutions. He has working experience in the fields of electronic design automation and Voice over IP for a number of Silicon Valley high-tech companies. He holds both a BS and an MS from the department of EECS at U.C. Berkeley.
