@CloudExpo: Blog Feed Post

Top Five Hosted Hadoop-Based Applications Reviewed

Amazon Elastic MapReduce, Cloudera CDH, IBM InfoSphere BigInsights, MapR M3 and M5, and Hortonworks Data Platform

It is our goal at Monitis to make the lives of web developers and system administrators easier. In this post we review five leading hosted Hadoop-based applications and give a short analysis of each to help you find the solution that best suits your needs.

The article covers Amazon Elastic MapReduce, Cloudera CDH, IBM InfoSphere BigInsights, MapR M3 and M5, and Hortonworks Data Platform.

Amazon Elastic MapReduce (http://aws.amazon.com/elasticmapreduce/)

Introduced by Amazon in 2009, Elastic MapReduce automates various Hadoop cluster processes and the data transfers between Amazon’s EC2 and S3 products. For a minimal fee, Amazon lets clients launch a preconfigured Hadoop cluster to run their MapReduce programs.
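For readers new to the model, here is a minimal sketch of what a MapReduce program does, using the classic word-count example. It runs in plain Python on one machine and does not call any Hadoop or Elastic MapReduce API; the function names (map_words, shuffle, reduce_counts, word_count) are hypothetical and chosen only for illustration. On a real cluster, the framework would distribute the map and reduce steps across many nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by key.

    In Hadoop the framework does this between map and reduce;
    here it is simulated with an in-memory dictionary.
    """
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse all values for one key into a single total."""
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence on a single machine."""
    pairs = (pair for line in lines for pair in map_words(line))
    grouped = shuffle(pairs)
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "big cluster"]))
```

Because each line is mapped independently and each key is reduced independently, the same logic can be spread across a cluster without changing the map or reduce functions, which is the property hosted Hadoop services build on.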

Pros

  • Very easy to set up a job flow
  • There’s an enormous amount of documentation available to help new users
  • Example applications are provided, giving an option to test drive the application before putting it to use.
  • The entire service can be driven from a command-line interface as an alternative to the web-based management console.
  • Ability to run several jobs in parallel.
  • No hardware is needed and costs can be kept low, which is great for small businesses seeking to be more cost efficient.

Cons

  • Need an account with Amazon Web Services (AWS)
  • Service is only available in the United States
  • Requires the use of Amazon’s S3 service, which adds extra costs to an overall project (data transfer, security etc.)

 

Cloudera CDH (www.cloudera.com)

Founded in March 2009, Cloudera was previously considered to be the Red Hat of the Hadoop world. With a large customer base of over 400 (including paid and free downloads), the company’s offerings include the Cloudera Enterprise products and Training & Support Services. Formed by a number of key executives from various technology giants (Oracle, Yahoo, Google and Facebook), Cloudera is considered the pioneer in the Hadoop community, having had a head start in the industry over its competitors.

Pros

  • Free application that can be easily downloaded
  • Installed internally within an organization which allows the company to have full control of all processes, jobs etc.
  • Technical support is superior and the knowledgebase is an essential resource to anyone starting out with Hadoop
  • Used by a large number of companies worldwide, and has been proven as a leading choice in Hadoop applications.
  • Application includes additional resources and components (e.g. Pig, Hive, Flume, HBase, ZooKeeper, Mahout, Whirr, Hue, Sqoop and Oozie)
  • Cloudera releases quarterly updates, eliminating the need for a large-scale annual upgrade.

Cons

  • Requires companies to obtain the necessary hardware in order to install the application, which adds costs.
  • Supporting and maintaining the application increases the company’s operating costs.

 

IBM InfoSphere BigInsights (www.ibm.com/software/data/infosphere/biginsights)

Introduced in May 2011, IBM InfoSphere BigInsights is geared towards handling extremely large volumes of streaming data using a Hadoop-based analytics framework. IBM states that BigInsights will be able to handle “tens-of-petabytes” of data while retaining a sub-millisecond response time. The company also plans to launch 20 new service offerings, including numerous analytical tools for business and IT.

Pros

  • Superior product support and long standing company reputation established from many years of servicing the IT community.
  • Comes standard with a number of essential components, including Pig, IBM DB2 and IBM BigSheets.
  • Offers two independent log-based replication models: queue-based and SQL-based.
  • Lots of documentation and step-by-step training is available from the IBM website.
  • Superior product for big data in motion that needs to be analyzed continuously in real time.

Cons

  • New to the marketplace and has not been around long enough to ensure a solid reputation.
  • An expensive solution for small/medium size organizations seeking to utilize a more cost effective application.

 

MapR M3 and M5 (www.mapr.com)

Headquartered in San Jose, CA, MapR markets its proprietary applications with a focus on providing a number of key features and capabilities for use with MapReduce and Hadoop.

Pros

  • Offers superior monitoring that can provide a better understanding of data distribution and processing – essential for achieving increased performance.
  • A free version is offered that includes everything except the management tools, which are only offered in the M5 series products.
  • Excellent technical support and vast quantities of documentation available

Cons

  • New to the marketplace so has a limited reputation
  • An expensive solution for small/medium size organizations
  • 24×7 support is only available on the paid version of the application
  • Requires a large amount of disk space to install (25 GB) compared with similar products.

 

Hortonworks Data Platform (http://hortonworks.com/)

Hortonworks was formed in June 2011 by a number of key architects and Hadoop committers formerly employed within the Yahoo Hadoop Software department. The company’s offerings include HDP (Hortonworks Data Platform) and Training Support Services. The company currently serves two customers: Yahoo and Microsoft.

Pros

  • A Yahoo spin-off, so the product has been tested in the marketplace.
  • Lots of documentation and support available from the knowledgebase community.
  • The company is continuously working with Yahoo to develop its future products.
  • Scalable to meet the demands of specific projects.
  • Offers variations and expanded product offerings from partnerships with a number of specialized companies.

Cons

  • Product is similar in nature to Cloudera, and provides similar features.

 

1 Year Website Traffic Comparison (from Compete.com)

[Figure: Hadoop-based application website performance]

[Figure: Hadoop-based application website performance stats]

Hopefully our post has been of interest to web developers and system administrators.

More information on Monitis can be found on our website: www.monitis.com

By Hovhannes Avoyan

Hovhannes Avoyan is the CEO of PicsArt, Inc.,
