It’s Time to Kill the Elephant

Google started using MapReduce about 10 years ago. Somewhere between then and now, Doug Cutting decided that he could copy it while at Yahoo, and Hadoop was born. Doug now works at a company named Cloudera, which bills itself as providing the “only solution that manages Apache Hadoop across the enterprise.” Hadoop has been around for so long that even leading analyst firms are covering it, claiming that if your organization is an early adopter, you need to be looking at Hadoop. Hear that, Luddites? Time to get moving.

MAYBE THERE’S A REASON FOR THAT

Recently, Google announced its move away from batch-based MapReduce to something a little more real-time. It seems it was taking days to update search results with something you might be interested in. Google never open-sourced its implementation of MapReduce, which is said to be at least one or two orders of magnitude faster than Hadoop. But it still wasn’t fast enough.

EVEN YAHOO IS GETTING INTO THE ACT

Yahoo used to have a substantial relationship with Cloudera, at least according to Cloudera. But now even Yahoo has started a company to distribute and support Hadoop. Yahoo calls the company Hortonworks.

WHAT THIS MEANS TO YOU

Without getting into how much data, and how much corresponding analysis, you need before Hadoop makes any sense at all (most companies are not going to see any benefit), let’s recognize something. These recent shifts show that companies like Google, Yahoo, and others no longer see a competitive advantage in batch-based MapReduce. The future has arrived; let’s look at some evidence.

REAL-TIME HADOOP

There have been more than a handful of releases in this space, including S4 from Yahoo, HStreaming, Storm, and several NoSQL databases that now support it. The message is that for competitive advantage, you’d best be getting some real-time. And getting it soon.

WHAT IS REAL-TIME?

Database vendors like DataStax, which supports Cassandra, claim to be real-time. They’re not. They say they’re real-time because as soon as you commit data to the database, it’s available for query. That’s supported by just about every database and is hardly a new and exciting feature of NoSQL. Even one of DataStax’s big shots left to start a real-time company named Platfora.

CONTINUOUS QUERY OR EVENT-DRIVEN

Rather than debating what real-time is or is not, let’s focus on being event-driven. Here’s an example:

I’m a manager, and I want to know when the average time spent on my website dips below 2 minutes. Under the ‘my database is real-time because the data I send to it can be queried after I write it’ definition, I would have to run this query repeatedly, at regular intervals, to catch this mounting exodus from my web properties.
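
To make the cost of that approach concrete, here is a minimal sketch of the polling pattern. It assumes a hypothetical `fetch_durations` callable that stands in for querying current session lengths (in seconds) from the database; the function name `poll_for_dip` and the 120-second threshold are illustrative, not taken from any product.

```python
import time
from statistics import mean
from typing import Callable, List, Optional

THRESHOLD_SECONDS = 120  # the manager's 2-minute threshold

def poll_for_dip(fetch_durations: Callable[[], List[float]],
                 interval_seconds: float, max_polls: int) -> Optional[int]:
    """Re-run the same 'average session length' question on a fixed schedule.
    Returns the index of the first poll that sees the dip, or None."""
    for poll in range(max_polls):
        # Hypothetical stand-in for querying session durations from the database.
        durations = fetch_durations()
        if durations and mean(durations) < THRESHOLD_SECONDS:
            return poll
        # Anything that happens while we sleep waits until the next poll to be seen.
        time.sleep(interval_seconds)
    return None
```

The worst-case detection delay is roughly one polling interval plus query time, and every poll re-reads data that mostly hasn’t changed.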

THERE’S GOT TO BE A BETTER WAY

And there is: it’s called continuous query. I ask the same question as above, and some process somewhere is sessionizing data from my web logs and injecting it into the server, the same server that I sent the query to. When that process finds a web session that lasted less than 2 minutes, the server sends another ‘row’ to the program that submitted the query.
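
A minimal in-process sketch of that flow follows. All names (`ContinuousQueryServer`, `register`, `inject`, `sessionize`) are hypothetical, and the 30-minute inactivity gap used to end a session is an assumption; this is not the API of any particular event-processing product. The standing query is registered once, and matching rows are pushed to the subscriber as the sessionizer produces them.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple

@dataclass
class Session:
    visitor: str
    start: float  # timestamp of first hit, in seconds
    end: float    # timestamp of latest hit, in seconds

    @property
    def duration(self) -> float:
        return self.end - self.start

Predicate = Callable[[Session], bool]
Callback = Callable[[Session], None]

class ContinuousQueryServer:
    """Hypothetical server: a standing query is registered once and then
    evaluated against every new row as it arrives, instead of being re-run."""

    def __init__(self) -> None:
        self._queries: List[Tuple[Predicate, Callback]] = []

    def register(self, predicate: Predicate, on_row: Callback) -> None:
        self._queries.append((predicate, on_row))

    def inject(self, session: Session) -> None:
        # Push the new row to every subscriber whose predicate matches.
        for predicate, on_row in self._queries:
            if predicate(session):
                on_row(session)

def sessionize(hits: Iterable[Tuple[str, float]], gap_seconds: float,
               server: ContinuousQueryServer) -> None:
    """Turn time-ordered (visitor, timestamp) hits into sessions and inject each
    finished one. A session ends when the same visitor's next hit arrives more
    than gap_seconds later; a production system would also close idle sessions
    on a timer instead of waiting for the next hit or the end of input."""
    open_sessions: Dict[str, Session] = {}
    for visitor, ts in hits:
        current = open_sessions.get(visitor)
        if current is not None and ts - current.end > gap_seconds:
            server.inject(current)  # the previous session is over
            current = None
        if current is None:
            open_sessions[visitor] = Session(visitor, ts, ts)
        else:
            current.end = ts
    for session in open_sessions.values():  # flush sessions still open at end of input
        server.inject(session)
```

Registering `lambda s: s.duration < 120` with `alerts.append` as the callback means every session shorter than 2 minutes lands in `alerts` the moment the sessionizer closes it, with no repeated query.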

ABRACADABRA

And then I’ve got it on my dashboard, and I can swap out the really badly designed page the marketing department A/B tested this morning. That’s continuous query, or event-driven processing. The term real-time didn’t even need to be mentioned. If I were running batch-based Hadoop, that notification could have taken hours, or days. How much money would your company lose if that happened to you?

BACK TO MAP/REDUCE

So if I can do the above, why do I need MapReduce? MapReduce is a technique for splitting work up, distributing it to the nodes where the data to be analyzed lives, and then gathering the results. If your problem is big enough, MapReduce might help you get it done faster than you could on a single machine.
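
A toy, single-machine sketch of the split, map, shuffle, and reduce steps just described. Word counting is an assumed workload, and threads stand in for cluster nodes; unlike a real framework such as Hadoop, nothing here moves the computation to where the data lives.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple

def map_phase(chunk: List[str]) -> List[Tuple[str, int]]:
    # Emit (word, 1) for every word in this worker's slice of the input.
    return [(word.lower(), 1) for line in chunk for word in line.split()]

def shuffle(mapped: Iterable[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    # Group intermediate values by key so each key can be reduced independently.
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Combine all values for one key into a single result.
    return key, sum(values)

def map_reduce(lines: List[str], workers: int = 4) -> Dict[str, int]:
    # Split the input into one chunk per worker (threads stand in for nodes).
    chunks = [lines[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
        grouped = shuffle(mapped)
        reduced = pool.map(lambda kv: reduce_phase(*kv), grouped.items())
        return dict(reduced)
```

For example, `map_reduce(["the cat", "The dog", "a cat"], workers=2)` counts `the` and `cat` twice and `dog` and `a` once. The speedup only matters when the input is too large for one machine to scan quickly, which is the author’s point about problem size.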

BUT EITHER WAY

If you’re running batch processes, as some well-known web properties are, and think that Hadoop holds the answer to your ever-dwindling ad revenue, you’re mistaken. And if you’re that CIO, the other thing you need to be working on is most likely your resume.

GET YOURSELF SOME CONTINUOUS QUERY, AND GET COMPETITIVE!

And thanks for reading!

Colin Clark is the CTO for Cloud Event Processing, Inc. and is widely regarded as a thought leader and pioneer in both Complex Event Processing and its application within Capital Markets.

Follow Colin on Twitter at http://twitter.com/EventCloudPro to learn more about cloud-based event processing using map/reduce, complex event processing, and event-driven pattern matching agents. You can also send topic suggestions or questions to [email protected]