
It’s Time to Kill the Elephant

Google started using MapReduce about 10 years ago. Somewhere between then and now, Doug Cutting, while at Yahoo, decided he could copy it, and Hadoop was born. Doug now works at a company named Cloudera, which bills itself as providing the “only solution that manages Apache Hadoop across the enterprise.” Hadoop has been around for so long that even leading analyst firms are covering it, claiming that if your organization is an early adopter, you need to be looking at Hadoop. Hear that, Luddites? Time to get moving.

MAYBE THERE’S A REASON FOR THAT

Recently, Google announced its move away from batch-based MapReduce to something closer to real time. It seems it was taking days to update search results with something you might be interested in. Google never open-sourced its implementation of MapReduce, which is said to be one to two orders of magnitude faster than Hadoop. But even that wasn’t fast enough.

EVEN YAHOO IS GETTING INTO THE ACT

Yahoo used to have a substantial relationship with Cloudera, at least according to Cloudera. But now even Yahoo has started a company to distribute and support Hadoop. Yahoo calls the company Hortonworks.

WHAT THIS MEANS TO YOU

Without getting into how much data, and how much corresponding analysis, you need before Hadoop makes any sense at all (most companies are not going to see any benefit), let’s recognize something. These recent shifts show that companies like Google, Yahoo, and others no longer see a competitive advantage in batch-based MapReduce. The future has arrived; let’s look at some evidence.

REAL-TIME HADOOP

There have been more than a handful of releases in this space: S4 from Yahoo, HStreaming, Storm, and several NoSQL databases that now support real-time processing. What that means is that, for competitive advantage, you’d best be getting some real-time. And getting it soon.

WHAT IS REAL-TIME?

Database vendors like DataStax, which supports Cassandra, claim to be real-time. They’re not. They say they’re real-time because as soon as you commit data to the database, it’s available for query. That’s true of just about every database and is hardly a new and exciting feature of NoSQL. Even one of their big shots left to start a real-time company named Platfora.
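
To see how ordinary read-after-commit visibility is, here is a minimal sketch using SQLite through Python’s standard `sqlite3` module. SQLite is chosen only because it ships with Python, and the `weblog.db` file and `sessions` table are hypothetical. A second, independent connection sees the row as soon as the writer commits it, with no batch job in between.

```python
import os
import sqlite3
import tempfile

# Hypothetical database file and table, used only for illustration.
path = os.path.join(tempfile.mkdtemp(), "weblog.db")
writer = sqlite3.connect(path)
reader = sqlite3.connect(path)  # an independent client of the same database

writer.execute("CREATE TABLE sessions (user TEXT, seconds INTEGER)")
writer.commit()

writer.execute("INSERT INTO sessions VALUES (?, ?)", ("alice", 60))
writer.commit()  # once committed, the row is visible to other clients

# The reader sees the committed row on its very next query.
(count,) = reader.execute("SELECT COUNT(*) FROM sessions").fetchone()
print(count)  # prints 1

writer.close()
reader.close()
```

Nothing here is event-driven: the reader still has to ask.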

CONTINUOUS QUERY OR EVENT-DRIVEN

Rather than debating what real-time is or is not, let’s focus on being event-driven. Here’s an example:

I’m a manager, and I want to know when the average time on my website dips below 2 minutes. Under the ‘my database is real-time because the data I send to it can be queried right after I write it’ model, I would have to run this query repeatedly, at regular intervals, to catch this mounting exodus from my web properties.
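
The polling approach can be sketched as a small simulation. This is not any vendor’s API. The session rows and the helpers `average_duration` and `first_alert_by_polling` are hypothetical, time is simulated rather than slept, and the manager’s question is modeled as an average over a sliding window. The point is that the alert can arrive no sooner than the next scheduled poll.

```python
from statistics import mean
from typing import List, Optional, Tuple

# Hypothetical rows already written to a "real-time" database:
# (arrival_time_seconds, session_duration_seconds). Values are made up.
ROWS: List[Tuple[float, float]] = [
    (10, 300), (20, 240),          # healthy sessions
    (65, 30), (70, 20), (75, 25),  # the exodus begins
]


def average_duration(rows, now: float, window: float) -> Optional[float]:
    """The manager's query: mean duration of sessions that arrived in the
    last `window` seconds, or None if there were none."""
    recent = [d for t, d in rows if now - window < t <= now]
    return mean(recent) if recent else None


def first_alert_by_polling(rows, threshold: float, window: float,
                           poll_interval: float, horizon: float) -> Optional[float]:
    """Re-run the query every `poll_interval` seconds of simulated time and
    return the time of the first poll that sees the average below `threshold`."""
    now = poll_interval
    while now <= horizon:
        avg = average_duration(rows, now, window)
        if avg is not None and avg < threshold:
            return now
        now += poll_interval
    return None  # the drop was never observed within the horizon


print(first_alert_by_polling(ROWS, threshold=120, window=60, poll_interval=60, horizon=600))  # prints 120
print(first_alert_by_polling(ROWS, threshold=120, window=60, poll_interval=30, horizon=600))  # prints 90
```

Halving the poll interval moves the alert from t=120 to t=90. Polling more often only shrinks the lag, and the price is more queries against the database.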

THERE’S GOT TO BE A BETTER WAY

And there is: it’s called continuous query. I ask the same question as above, once, and some process somewhere is sessionizing data from my web logs and injecting it into the server, the same server I sent the query to. Whenever that process finds a web session that lasted less than 2 minutes, another ‘row’ is sent to the program that submitted the query.
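
Here is a minimal sketch of that arrangement. It assumes a hypothetical in-memory server (`ContinuousQueryServer`) and a hypothetical `Sessionizer` that closes a user’s session after a fixed idle timeout. The 300-second timeout and the sample hits are made up. The query is registered once. After that, every session shorter than 2 minutes is pushed to the subscriber as soon as the sessionizer injects it, with no polling loop.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Session:
    user: str
    start: float
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start


Predicate = Callable[[Session], bool]
Callback = Callable[[Session], None]


class ContinuousQueryServer:
    """Hypothetical server: queries are registered once, and each matching
    row is pushed to the subscriber as soon as it arrives."""

    def __init__(self) -> None:
        self._queries: List[Tuple[Predicate, Callback]] = []

    def register(self, predicate: Predicate, on_row: Callback) -> None:
        self._queries.append((predicate, on_row))

    def inject(self, session: Session) -> None:
        # Evaluate every standing query against the new row, once, on arrival.
        for predicate, on_row in self._queries:
            if predicate(session):
                on_row(session)


class Sessionizer:
    """Hypothetical sessionizer: groups each user's page hits into a session
    and injects it into the server once the user has been idle > timeout."""

    def __init__(self, server: ContinuousQueryServer, timeout: float) -> None:
        self.server = server
        self.timeout = timeout
        self.open: Dict[str, Session] = {}

    def hit(self, user: str, t: float) -> None:
        self.expire(t)  # close idle sessions before recording this hit
        session = self.open.get(user)
        if session is None:
            self.open[user] = Session(user, t, t)
        else:
            session.end = t

    def expire(self, now: float) -> None:
        for user, session in list(self.open.items()):
            if now - session.end > self.timeout:
                del self.open[user]
                self.server.inject(session)


# Register the standing query once: sessions shorter than 2 minutes.
alerts: List[Session] = []
server = ContinuousQueryServer()
server.register(lambda s: s.duration < 120, alerts.append)

sessionizer = Sessionizer(server, timeout=300)
for user, t in [("alice", 0), ("bob", 10), ("alice", 60),
                ("bob", 200), ("bob", 400), ("carol", 1000)]:
    sessionizer.hit(user, t)

print([(s.user, s.duration) for s in alerts])  # prints [('alice', 60)]
```

Alice’s 60-second visit is pushed when her session times out, and Bob’s 390-second session never matches. This sketch shows only the per-session rule described above. Tracking the running average would be another standing query over the same stream.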

ABRACADABRA

And then I’ve got it on my dashboard, and I can swap out the really badly designed page the marketing department A/B tested this morning. That’s continuous query, or event-driven processing. The term real-time didn’t even need to be mentioned. If I were running batch-based Hadoop, that notification could have taken hours, or days. How much money would your company lose if that happened to you?

BACK TO MAP/REDUCE

So if I can do the above, why do I need MapReduce? MapReduce is an algorithm for splitting work up, distributing it to the nodes where the data to be analyzed lives, and then gathering the results. If your problem is big enough, MapReduce might help you get it done faster than you could on just one machine.
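
The map, shuffle, and reduce steps can be sketched in a few lines of Python. This is a toy, single-process illustration and not Hadoop’s API. Threads from `concurrent.futures` stand in for cluster nodes, and each `PARTITIONS` entry stands in for the log data stored on one node. The page names and durations are invented. The job computes the average session length per page.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Row = Tuple[str, int]      # (page, session_seconds)
Partial = Tuple[int, int]  # (sum_of_seconds, count)

# Each inner list stands in for the slice of web-log data stored on one node.
PARTITIONS: List[List[Row]] = [
    [("/home", 300), ("/promo", 40), ("/home", 200)],
    [("/promo", 20), ("/pricing", 150)],
    [("/promo", 30), ("/home", 100)],
]


def map_partition(rows: List[Row]) -> List[Tuple[str, Partial]]:
    """Map: in a real cluster this runs on the node holding the partition."""
    return [(page, (secs, 1)) for page, secs in rows]


def shuffle(mapped: List[List[Tuple[str, Partial]]]) -> Dict[str, List[Partial]]:
    """Shuffle: gather every intermediate value under its key."""
    groups: Dict[str, List[Partial]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_average(values: List[Partial]) -> float:
    """Reduce: combine the partial (sum, count) pairs for one key."""
    total = sum(s for s, _ in values)
    count = sum(c for _, c in values)
    return total / count


def map_reduce(partitions: List[List[Row]]) -> Dict[str, float]:
    # Threads stand in for cluster nodes; each one maps its own partition.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_partition, partitions))
    return {key: reduce_average(vals) for key, vals in shuffle(mapped).items()}


print(map_reduce(PARTITIONS))
# prints {'/home': 200.0, '/promo': 30.0, '/pricing': 150.0}
```

Mappers emit partial `(sum, count)` pairs rather than per-partition averages. An average of averages is wrong when partitions hold different numbers of rows, whereas sums and counts can be combined in any order.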

BUT EITHER WAY

If you’re running batch processes, as some well-known web properties are, and think that Hadoop holds an answer to your ever-dwindling ad revenue, you’re mistaken. And if you’re that CIO, the other thing you need to be working on is most likely your resume.

GET YOURSELF SOME CONTINUOUS QUERY, AND GET COMPETITIVE!

And thanks for reading!

ABOUT THE AUTHOR

Colin Clark is the CTO for Cloud Event Processing, Inc. and is widely regarded as a thought leader and pioneer in both Complex Event Processing and its application within Capital Markets.

Follow Colin on Twitter at http://twitter.com/EventCloudPro to learn more about cloud-based event processing using map/reduce, complex event processing, and event-driven pattern matching agents. You can also send topic suggestions or questions to [email protected]
