Top Five Hosted Hadoop-Based Applications Reviewed

Amazon Elastic MapReduce; Cloudera CDH; IBM InfoSphere BigInsights; MapR M3 and M5; and Hortonworks Data Platform

It is our goal at Monitis to make the lives of web developers and system administrators easier. We have reviewed five leading hosted Hadoop-based applications and give a short analysis of each in this post to help you find the solution that best suits your needs.

The article covers Amazon Elastic MapReduce, Cloudera CDH, IBM InfoSphere BigInsights, MapR M3 and M5, and Hortonworks Data Platform.

Amazon Elastic MapReduce (http://aws.amazon.com/elasticmapreduce/)

Introduced by Amazon in 2009, Elastic MapReduce automates various Hadoop cluster processes and the transfer of data between Amazon’s EC2 and S3 products. For a small fee, Amazon lets its clients launch a preconfigured Hadoop cluster to run their MapReduce programs.
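For readers new to the model, here is a minimal sketch of what a MapReduce program does, using the classic word count. It runs locally in plain Python and does not call Elastic MapReduce or Hadoop. The function names (`mapper`, `reducer`, `word_count`) and the sample input are our own illustrative assumptions, not part of any Amazon API. On a real cluster, Hadoop would distribute the map and reduce steps across machines and handle the grouping between them.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def mapper(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def reducer(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Reduce step: sum the counts emitted for each distinct word."""
    totals: Dict[str, int] = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the two steps locally (hypothetical helper for illustration)."""
    return reducer(mapper(lines))


if __name__ == "__main__":
    # Made-up sample input, used only to exercise the sketch.
    sample = ["Hadoop runs MapReduce jobs", "MapReduce jobs scale out"]
    print(word_count(sample))
```

The point of splitting the work this way is that each step only needs to see a slice of the data, which is what allows a cluster to spread a job over many nodes.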

Pros

  • Very easy to set up a job flow
  • There’s an enormous amount of documentation available to help new users
  • Example applications are provided, giving an option to test drive the service before putting it to use.
  • The entire system can be driven from a command line interface as an alternative to the web-based management console.
  • Ability to run several jobs in parallel.
  • No hardware is needed and costs can be kept low, which is great for small businesses seeking to be more cost efficient.

Cons

  • Need an account with Amazon Web Services (AWS)
  • Service is only available in the United States
  • Requires the use of Amazon’s S3 service, which adds extra costs to an overall project (data transfer, security etc.)

 

Cloudera CDH (www.cloudera.com)

Founded in March 2009, Cloudera was previously considered to be the Red Hat of the Hadoop world. With a customer base of over 400 (including paid and free downloads), the company’s offerings include the Cloudera Enterprise products and Training & Support Services. Formed by a number of key executives from various technology giants (Oracle, Yahoo, Google and Facebook), Cloudera is considered the pioneer in the Hadoop community, having had a head start in the industry over its competitors.

Pros

  • Free application that can be easily downloaded
  • Installed internally within an organization which allows the company to have full control of all processes, jobs etc.
  • Technical support is superior and the knowledge base is an essential resource for anyone starting out with Hadoop
  • Used by a large number of companies worldwide, and has proven to be a leading choice among Hadoop applications.
  • Application includes additional resources and components (e.g. Pig, Hive, Flume, HBase, ZooKeeper, Mahout, Whirr, Hue, Sqoop and Oozie)
  • Cloudera releases quarterly updates, eliminating the need for a large-scale annual upgrade.

Cons

  • Requires companies to obtain the necessary hardware in order to install the application, which adds to the cost.
  • Supporting and maintaining the application increases the company’s operating costs.

 

IBM InfoSphere BigInsights (www.ibm.com/software/data/infosphere/biginsights)

Introduced in May 2011, IBM InfoSphere BigInsights is geared towards handling extremely large volumes of streaming data using a Hadoop-based analytics framework. IBM states that BigInsights will be able to handle “tens-of-petabytes” of data while retaining a sub-millisecond response time. The company also plans to launch 20 new service offerings, including numerous analytical tools for business and IT.

Pros

  • Superior product support and a long-standing company reputation established over many years of servicing the IT community.
  • Comes standard with a number of essential components, including Pig, IBM DB2 and IBM BigSheets.
  • Offers two independent log-based replication models (queue-based and SQL-based).
  • Lots of documentation and step-by-step training is available from the IBM website.
  • Superior product for big data in motion that needs to be analyzed continuously in real time.

Cons

  • New to the marketplace and has not been around long enough to establish a solid reputation.
  • An expensive solution for small and medium-sized organizations seeking a more cost-effective application.

 

MapR M3 and M5 (www.mapr.com)

Headquartered in San Jose, CA, MapR markets its proprietary applications with a focus on providing a number of key features and capabilities for use with MapReduce and Hadoop.

Pros

  • Offers superior monitoring that can provide a better understanding of data distribution and processing – essential for achieving increased performance.
  • A free version is offered, which includes everything except the management tools, which are only available in the M5 series products.
  • Excellent technical support and vast quantities of documentation are available

Cons

  • New to the marketplace, so it has a limited reputation
  • An expensive solution for small and medium-sized organizations
  • 24×7 support is only available on the paid version of the application
  • Requires far more disk space to install (25 GB) than similar products.

 

Hortonworks Data Platform (http://hortonworks.com/)

Hortonworks was formed in June 2011 by a number of key architects and Hadoop committers formerly employed within the Yahoo Hadoop Software department. The company’s offerings include HDP (Hortonworks Data Platform) and Training and Support Services. The company currently serves two customers: Yahoo and Microsoft.

Pros

  • A Yahoo spin-off, so the product has been tested in the marketplace.
  • Lots of documentation and support available from the knowledge base and community.
  • The company is continuously working with Yahoo to develop its future products.
  • Scalable to meet the demands of specific projects.
  • Offers variations and expanded product offerings from partnerships with a number of specialized companies.

Cons

  • Product is similar in nature to Cloudera, and provides similar features.

 

1 YEAR WEBSITE TRAFFIC COMPARISON (from Compete.com)

[Figure: Hadoop-based application website performance and stats]

Hopefully our post has been of interest to web developers and system administrators.

More information on Monitis can be found on our website: www.monitis.com

More Stories By Hovhannes Avoyan

Hovhannes Avoyan is the CEO of PicsArt, Inc.
