Hadoop + esProc Help You Replace IOE


What is IOE? I = IBM, O = Oracle, and E = EMC. Together they stand for the typical high-end database and data warehouse architecture. High-end servers come from vendors such as HP, IBM, and Fujitsu; high-end database software includes Teradata, Oracle, and Greenplum; high-end storage includes EMC, Violin, and Fusion-io.

In the past, this high-performance database architecture was the preferred choice of large and medium-sized organizations. It runs stably with superior performance, and it became popular when the degree of informatization was lower and enterprise applications were simple. With explosive data growth and today's diversified, complex enterprise applications, most enterprises have gradually realized that they should replace IOE, and quite a few have successfully carried out a road map to eliminate the high-end database entirely, including Intel, Alibaba, Amazon, eBay, Yahoo, and Facebook.

The data explosion has sharply increased the demand for storage capacity, and diversified, complex applications make it hard to keep up with the fast-growing computational load and the number of concurrent access requests. Within the IOE architecture, the only answer is to upgrade ever more frequently. More and more enterprise managers feel the pressure of the high cost of upgrading IOE. More often than not, enterprises still suffer from slow response and heavy workloads even after investing heavily. That is why these enterprises are determined to replace IOE.

Hadoop is one of the IOE replacement solutions on which enterprise management has pinned great hopes.

It supports cheap desktop hard disks as a replacement for the high-end storage media of IOE.

Its HDFS file system can replace the disk cabinet of IOE while keeping data safe through redundancy.

It allows cheap PCs to replace the high-end database server.

It is open source software, so adding CPUs, storage capacity, or users incurs no extra license cost.

With its support for parallel computing, Hadoop can scale out inexpensively: the storage pressure is spread across multiple low-cost PCs at a lower acquisition and management cost, which yields greater storage capacity, higher computing performance, and far more parallel processes than IOE can offer. That is why Hadoop is so highly anticipated.
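
The scale-out idea described above can be illustrated with a minimal sketch in plain Python. It is not Hadoop code: the function names (`split_into_partitions`, `partial_sum`, `parallel_total`) are hypothetical, and a local thread pool merely stands in for a cluster of inexpensive machines, so it shows the structure of the computation rather than any real speed-up. The data is divided into partitions, each worker computes a partial result on its own partition, and the partial results are combined at the end.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(rows, n):
    """Deal the rows round-robin into n partitions (hypothetical helper)."""
    return [rows[i::n] for i in range(n)]


def partial_sum(partition):
    """Work done independently by one worker on its own partition."""
    return sum(partition)


def parallel_total(rows, workers):
    """Sum all rows by combining per-partition partial sums.

    Assumption: threads stand in for cluster nodes; a real deployment
    would ship each partition to a separate machine.
    """
    partitions = split_into_partitions(rows, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, partitions))
    return sum(partials)


if __name__ == "__main__":
    amounts = list(range(1, 101))
    print(parallel_total(amounts, workers=4))
```

In this model, adding capacity means adding partitions and workers rather than replacing one large machine, which is the property the paragraph above attributes to Hadoop.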

However, IOE still has an advantage over Hadoop in its strong data computing capability. Data computing is the most important software function of the modern enterprise data center. Nowadays, it is common for data computing to involve complex business logic, particularly in applications for enterprise decision-making, process optimization, performance benchmarking, time control, and cost management. As a result, Hadoop alone cannot replace IOE. As a matter of fact, even the high-profile champions of replacing IOE have had to keep part of it. Because of its insufficient computing capability, Hadoop is mostly limited to simple ETL, data storage, and data lookup, and it is awkward for truly massive business data computation.

To replace IOE, we need computational capability no weaker than that of an enterprise-level database, and we need to incorporate it seamlessly into Hadoop so that Hadoop's advantages as middleware come into full play. esProc is built to meet exactly this demand.

esProc is a parallel computing framework written in pure Java and focused on powering Hadoop. It can access Hive via JDBC or read and write HDFS directly. With its complete data computing system, it offers an alternative to IOE for data computing of any complexity. It is especially good at computations that require complex business logic and stored procedures.

esProc provides a professional data scripting language with a true set data type, which makes it easy to design algorithms from the business user's perspective and to implement complex business logic. In addition, esProc supports ordered sets, so members can be accessed by position and computations related to sequence numbers can be performed. A set of sets can represent complex grouping styles easily, for example equal grouping, align grouping, and enum grouping. Users can operate on a single record in the same way as on an object. esProc scripts are written and presented in a grid, so intermediate results can be referenced without being explicitly defined. For convenience, complete code editing and debugging functions are provided. esProc can be regarded as a dynamic set-oriented language that has something in common with the R language, and it offers native support for distributed parallel computation at its core. Programmers can benefit from the efficient parallel computation of esProc while keeping a syntax as simple as that of R. It is built for data computing and optimized for data processing. For complex analytical work, both its development efficiency and its computing performance exceed those of existing Hadoop solutions.
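
To make the grouping styles and the ordered-set idea more concrete, here is a small sketch in plain Python rather than in esProc's own scripting language. The function names and the sample `orders` data are invented for illustration, and the semantics reflect one common reading of the three terms: equal grouping partitions records by identical key values, align grouping arranges records against a predefined list of key values (so a group may be empty and unmatched records are left out), and enum grouping assigns each record to the first condition it satisfies. In every case the result is a set of sets.

```python
# Hypothetical sample data; not taken from the article.
orders = [
    {"id": 1, "region": "East", "amount": 120},
    {"id": 2, "region": "West", "amount": 80},
    {"id": 3, "region": "East", "amount": 300},
    {"id": 4, "region": "North", "amount": 45},
]


def equal_group(rows, key):
    """Equal grouping: one group per distinct key value, nothing left out."""
    groups = {}
    for row in rows:
        groups.setdefault(key(row), []).append(row)
    return groups


def align_group(rows, key, base):
    """Align grouping: groups follow the order of `base`.

    A base value with no matching rows yields an empty group, and rows
    whose key is not in `base` are dropped.
    """
    groups = {value: [] for value in base}
    for row in rows:
        k = key(row)
        if k in groups:
            groups[k].append(row)
    return [groups[value] for value in base]


def enum_group(rows, conditions):
    """Enum grouping: each row joins the first condition it satisfies."""
    groups = [[] for _ in conditions]
    for row in rows:
        for i, condition in enumerate(conditions):
            if condition(row):
                groups[i].append(row)
                break
    return groups


def change_from_previous(values):
    """Ordered-set computation: compare each member with the one before it."""
    return [
        None if i == 0 else values[i] - values[i - 1]
        for i in range(len(values))
    ]


if __name__ == "__main__":
    print(equal_group(orders, lambda r: r["region"]))
    print(align_group(orders, lambda r: r["region"], ["West", "South", "East"]))
    print(enum_group(orders, [lambda r: r["amount"] >= 100,
                              lambda r: r["amount"] >= 50]))
    print(change_from_previous([r["amount"] for r in orders]))
```

The last function depends on the data being ordered: referring to "the previous member" only makes sense when the set has positions, which is the point of the ordered-set support described above.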

Used together, Hadoop and esProc can remedy Hadoop's computing drawback, enabling Hadoop to replace the great majority of IOE's features and improving its computing capability dramatically.

More Stories By Jessica Qiu

Jessica Qiu is the editor of Raqsoft. She provides press releases for data computation and data analytics.
