Welcome!

Blog Feed Post

Hadoop Will Not Mow Your Lawn


"The best minds of my generation are thinking about how to make people click ads." Jeff Hammerbacher ex- Facebook Architect

It turns out that when you have a lot of "best minds" working on the same problem, you come up with some pretty interesting technology - no matter how inane that problem may be.

The technology that those "best minds" at Yahoo came up with to target ads to users is called Hadoop. 

Hadoop is a powerful technology and like most new IT solutions is being touted at being able to solve a vast number of technical ills. When companies discover that Hadoop will not in fact cure male pattern balding, they will fall into the inevitable trough of disillusionment

Here are some thoughts about what Hadoop can and cannot do:

1. RDBS are for business data, Hadoop is for web data

Almost all traditional business data fits well into the relational model, including data about customers (CRM), products (ERP) and employees (HR). This data should continue to live in relational databases, where it is much easier to manage and access than in Hadoop.

Almost all web data fits well into the Hadoop model, including log files, email and social media. This data would be almost impossible to store in a relational database, not just because of the volume, but because of the inherently nested quality of the data (threaded email conversations, web site directory structures, social media graphs).

2. Hadoop is really good at analyzing web data

Hadoop is incredibly good at looking at huge amounts of web data and figuring out why people clicked on the blue button instead of the red one. This can be generated to a few other computer log formats, but the list is relatively small, including:
How many other data types look like click streams? Not very many. How many other real world problems lend themselves to analysis using web data analytic techniques? Also not as many as you might think.

This is not to take anything from the Hadoop market opportunity - as more of the world interacts with each other via web applications and devices, more of the world's data will be reducible to click-stream-like formats. 

The big data craze has taken over the tech media world much like the cloud craze. Most people know it is important but they don't know why. Many vendors get caught up in the hype cycle and start to believe that their technology has some sort of manifest destiny that will allow it to do much more than it can reasonably be expected to do.

3. Hadoop is a Pay Me Later Technology

Traditional data warehouses work on a "pay me now" basis. To get data into the data warehouse - even data that may not end up being useful in any way - you have to massage the data into a formal relational model. This is expensive and the data normalization process itself may make it impossible to get at the data in exactly the way you want to.

In contrast, Hadoop works on a "pay me later" basis. Data can be shoved into the Hadoop file system any old way. It is not until someone wants to analyze the data that you have to worry about how to connect all the pieces. The gotcha is that the price you pay in this "pay me later" model is much higher, requiring extensive programming in order to ask each question. 

In addition, because the normalization process wasn't done up front, it won't be until later that you may discover that you were missing crucial pieces of information all along. Thus it does bear some thinking up front on what sort of data to store in your Hadoop database and what kinds of questions you might want to be able to answer about that data in the future.  

Realistically, it will take most businesses who implement several years to figure out whether all the data they are dumping into Hadoop produces real value out the back end, just as it was several years before companies started to get a payout from their investments in relational data warehouses.

4. Use the right tool for the right job

Back in my - very brief - high school shop days, we learned that the trick to making a really nice looking ash tray is picking the right tool for the right job.
  • Hadoop is web data query engine that requires a high level of effort for each new query. 
  • Relational is a business data query engine that requires a high level of effort to format and load data into the datastore.
The fastest way for companies to get into trouble with Hadoop is to try to use it as a one-size-fits-all data warehouse. Much of the news in the Hadoop world today has to do with SQL parsers that run on top of Hadoop data. This is a powerful and valuable technology, but does not mean that you can throw out your data warehouse and replace it with Hadoop just yet.



Read the original blog entry...

More Stories By Christopher Keene

Christopher Keene is Chairman and CEO of WaveMaker (formerly ActiveGrid). He was the founder, in 1991, of Persistence Software, a San Mateo, CA-based company that created a new approach for managing data in high-transaction banking and communications systems. Persistence Software investors included Cisco, Intel, Reuters and Sun Microsystems. The company went public in 1999 on the NASDAQ exchange and was sold in 2004 to Progress software.

After leaving Persistence Software in 2005, Chris spent a year in France as chairman of Reportive Software, a Paris-based maker of business-intelligence tools, and as an adjunct professor and entrepreneur-in-residence at INSEAD, a leading graduate business school.

Latest Stories
In his session at @ThingsExpo, Eric Lachapelle, CEO of the Professional Evaluation and Certification Board (PECB), provided an overview of various initiatives to certify the security of connected devices and future trends in ensuring public trust of IoT. Eric Lachapelle is the Chief Executive Officer of the Professional Evaluation and Certification Board (PECB), an international certification body. His role is to help companies and individuals to achieve professional, accredited and worldwide re...
The Internet giants are fully embracing AI. All the services they offer to their customers are aimed at drawing a map of the world with the data they get. The AIs from these companies are used to build disruptive approaches that cannot be used by established enterprises, which are threatened by these disruptions. However, most leaders underestimate the effect this will have on their businesses. In his session at 21st Cloud Expo, Rene Buest, Director Market Research & Technology Evangelism at Ara...
It is ironic, but perhaps not unexpected, that many organizations who want the benefits of using an Agile approach to deliver software use a waterfall approach to adopting Agile practices: they form plans, they set milestones, and they measure progress by how many teams they have engaged. Old habits die hard, but like most waterfall software projects, most waterfall-style Agile adoption efforts fail to produce the results desired. The problem is that to get the results they want, they have to ch...
Cloud Expo, Inc. has announced today that Andi Mann and Aruna Ravichandran have been named Co-Chairs of @DevOpsSummit at Cloud Expo Silicon Valley which will take place Oct. 31-Nov. 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. "DevOps is at the intersection of technology and business-optimizing tools, organizations and processes to bring measurable improvements in productivity and profitability," said Aruna Ravichandran, vice president, DevOps product and solutions marketing...
"Loom is applying artificial intelligence and machine learning into the entire log analysis process, from start to finish and at the end you will get a human touch,” explained Sabo Taylor Diab, Vice President, Marketing at Loom Systems, in this SYS-CON.tv interview at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
Wooed by the promise of faster innovation, lower TCO, and greater agility, businesses of every shape and size have embraced the cloud at every layer of the IT stack – from apps to file sharing to infrastructure. The typical organization currently uses more than a dozen sanctioned cloud apps and will shift more than half of all workloads to the cloud by 2018. Such cloud investments have delivered measurable benefits. But they’ve also resulted in some unintended side-effects: complexity and risk. ...
"We are a monitoring company. We work with Salesforce, BBC, and quite a few other big logos. We basically provide monitoring for them, structure for their cloud services and we fit into the DevOps world" explained David Gildeh, Co-founder and CEO of Outlyer, in this SYS-CON.tv interview at DevOps Summit at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
21st International Cloud Expo, taking place October 31 - November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA, will feature technical sessions from a rock star conference faculty and the leading industry players in the world. Cloud computing is now being embraced by a majority of enterprises of all sizes. Yesterday's debate about public vs. private has transformed into the reality of hybrid cloud: a recent survey shows that 74% of enterprises have a hybrid cloud strategy. Me...
With major technology companies and startups seriously embracing Cloud strategies, now is the perfect time to attend 21st Cloud Expo October 31 - November 2, 2017, at the Santa Clara Convention Center, CA, and June 12-14, 2018, at the Javits Center in New York City, NY, and learn what is going on, contribute to the discussions, and ensure that your enterprise is on the right path to Digital Transformation.
SYS-CON Events announced today that Enzu will exhibit at SYS-CON's 21st Int\ernational Cloud Expo®, which will take place October 31-November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Enzu’s mission is to be the leading provider of enterprise cloud solutions worldwide. Enzu enables online businesses to use its IT infrastructure to their competitive advantage. By offering a suite of proven hosting and management services, Enzu wants companies to focus on the core of their ...
DevOps at Cloud Expo, taking place October 31 - November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA, is co-located with 21st Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading industry players in the world. The widespread success of cloud computing is driving the DevOps revolution in enterprise IT. Now as never before, development teams must communicate and collaborate in a dynamic, 24/7/365 environment. There is no time to w...
In 2014, Amazon announced a new form of compute called Lambda. We didn't know it at the time, but this represented a fundamental shift in what we expect from cloud computing. Now, all of the major cloud computing vendors want to take part in this disruptive technology. In his session at 20th Cloud Expo, Doug Vanderweide, an instructor at Linux Academy, discussed why major players like AWS, Microsoft Azure, IBM Bluemix, and Google Cloud Platform are all trying to sidestep VMs and containers wit...
With major technology companies and startups seriously embracing Cloud strategies, now is the perfect time to attend 21st Cloud Expo October 31 - November 2, 2017, at the Santa Clara Convention Center, CA, and June 12-14, 2018, at the Javits Center in New York City, NY, and learn what is going on, contribute to the discussions, and ensure that your enterprise is on the right path to Digital Transformation.
Internet of @ThingsExpo, taking place October 31 - November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA, is co-located with 21st Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading industry players in the world. The Internet of Things (IoT) is the most profound change in personal and enterprise IT since the creation of the Worldwide Web more than 20 years ago. All major researchers estimate there will be tens of billions devic...
Amazon started as an online bookseller 20 years ago. Since then, it has evolved into a technology juggernaut that has disrupted multiple markets and industries and touches many aspects of our lives. It is a relentless technology and business model innovator driving disruption throughout numerous ecosystems. Amazon’s AWS revenues alone are approaching $16B a year making it one of the largest IT companies in the world. With dominant offerings in Cloud, IoT, eCommerce, Big Data, AI, Digital Assista...