|By Thomas Krafft||
|August 28, 2012 08:35 PM EDT||
Excellent paper released by researchers at University of California, Berkeley . They have analyzed data from Hadoop installation at Facebook (one of the largest as such in the world) looking at various metrics for Hadoop jobs running at Facebook datacenter that has over 3,000 computers dedicated to Hadoop-based processing.
They have come up with very interesting insights. I advise everyone read it firsthand but I will list some of the interesting bits.
Traditional quest for disk locality (a.k.a. affinity between the Hadoop task and the disk that contains the input data for that task) was based on two key assumptions:
- Local disk access is significantly faster than network access to remote disk
- Hadoop tasks spend significant amount of their processing time in disk IO reading input data
Through careful analysis of Hadoop system at Facebook (as their prime testbed) authors claim that both of these assumptions are rapidly loosing hold:
- With new full-bisection topologies in the modern data centers the local disk access is almost identical in performance to a network access even across the racks (with performance difference today between two is less than 10%).
- Greater parallelization and data compressions leads to lower disk IO demand on the individual tasks; in fact, Hadoop job at Facebook deal mostly with text-baed data that can be compressed dramatically.
Authors then argue that memory locality (i.e. keeping input data in memory and maintaining affinity between Hadoop task and its in-memory input data) produces much greater performance advantages because:
- RAM access is up to three orders of magnitude faster than a local disk access
- Even though memory size is significantly less than disk capacity it is large enough for most cases (see below)
Consider this fact: despite the fact that 75% of all HDFS blocks are accessed only once the 64% of Hadoop jobs at Facebook achieve the full memory locality for all their tasks (!). In case of Hadoop – full locality means that there is no outlier task that will have to access disk and delay the entire job. And this is all achieved utilizing rather primitive LFU caching policy and basic pre-fetching for input data.
With these facts authors conclude that disk locality is no longer worth while to vie for – and in-memory co-location is the way forward for high performance big data processing as it yields far greater returns.
Facebook’s case is a solid proof of this technology, and GridGain’s In-Memory Data Platform is a solid platform for the rest of us.
WebRTC services have already permeated corporate communications in the form of videoconferencing solutions. However, WebRTC has the potential of going beyond and catalyzing a new class of services providing more than calls with capabilities such as mass-scale real-time media broadcasting, enriched and augmented video, person-to-machine and machine-to-machine communications. In his session at @ThingsExpo, Luis Lopez, CEO of Kurento, will introduce the technologies required for implementing thes...
Sep. 2, 2015 02:15 PM EDT
Mobile, social, Big Data, and cloud have fundamentally changed the way we live. “Anytime, anywhere” access to data and information is no longer a luxury; it’s a requirement, in both our personal and professional lives. For IT organizations, this means pressure has never been greater to deliver meaningful services to the business and customers.
Sep. 2, 2015 02:00 PM EDT Reads: 815
17th Cloud Expo, taking place Nov 3-5, 2015, at the Santa Clara Convention Center in Santa Clara, CA, will feature technical sessions from a rock star conference faculty and the leading industry players in the world. Cloud computing is now being embraced by a majority of enterprises of all sizes. Yesterday's debate about public vs. private has transformed into the reality of hybrid cloud: a recent survey shows that 74% of enterprises have a hybrid cloud strategy. Meanwhile, 94% of enterprises ar...
Sep. 2, 2015 02:00 PM EDT Reads: 1,569
Consumer IoT applications provide data about the user that just doesn’t exist in traditional PC or mobile web applications. This rich data, or “context,” enables the highly personalized consumer experiences that characterize many consumer IoT apps. This same data is also providing brands with unprecedented insight into how their connected products are being used, while, at the same time, powering highly targeted engagement and marketing opportunities. In his session at @ThingsExpo, Nathan Trel...
Sep. 2, 2015 02:00 PM EDT Reads: 261
Whether you like it or not, DevOps is on track for a remarkable alliance with security. The SEC didn’t approve the merger. And your boss hasn’t heard anything about it. Yet, this unruly triumvirate will soon dominate and deliver DevSecOps faster, cheaper, better, and on an unprecedented scale. In his session at DevOps Summit, Frank Bunger, VP of Customer Success at ScriptRock, will discuss how this cathartic moment will propel the DevOps movement from such stuff as dreams are made on to a prac...
Sep. 2, 2015 02:00 PM EDT Reads: 248
The 3rd International WebRTC Summit, to be held Nov. 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA, announces that its Call for Papers is now open. Topics include all aspects of improving IT delivery by eliminating waste through automated business models leveraging cloud technologies. WebRTC Summit is co-located with 15th International Cloud Expo, 6th International Big Data Expo, 3rd International DevOps Summit and 2nd Internet of @ThingsExpo. WebRTC (Web-based Real-Time Com...
Sep. 2, 2015 01:45 PM EDT Reads: 1,551
SYS-CON Events announced today that HPM Networks will exhibit at the 17th International Cloud Expo®, which will take place on November 3–5, 2015, at the Santa Clara Convention Center in Santa Clara, CA. For 20 years, HPM Networks has been integrating technology solutions that solve complex business challenges. HPM Networks has designed solutions for both SMB and enterprise customers throughout the San Francisco Bay Area.
Sep. 2, 2015 01:30 PM EDT Reads: 937
The 17th International Cloud Expo has announced that its Call for Papers is open. 17th International Cloud Expo, to be held November 3-5, 2015, at the Santa Clara Convention Center in Santa Clara, CA, brings together Cloud Computing, APM, APIs, Microservices, Security, Big Data, Internet of Things, DevOps and WebRTC to one location. With cloud computing driving a higher percentage of enterprise IT budgets every year, it becomes increasingly important to plant your flag in this fast-expanding bu...
Sep. 2, 2015 01:30 PM EDT Reads: 1,634
SYS-CON Events announced today that the "Second Containers & Microservices Expo" will take place November 3-5, 2015, at the Santa Clara Convention Center in Santa Clara, CA. Containers and microservices have become topics of intense interest throughout the cloud developer and enterprise IT communities.
Sep. 2, 2015 01:30 PM EDT Reads: 619
The 5th International DevOps Summit, co-located with 17th International Cloud Expo – being held November 3-5, 2015, at the Santa Clara Convention Center in Santa Clara, CA – announces that its Call for Papers is open. Born out of proven success in agile development, cloud computing, and process automation, DevOps is a macro trend you cannot afford to miss. From showcase success stories from early adopters and web-scale businesses, DevOps is expanding to organizations of all sizes, including the ...
Sep. 2, 2015 01:15 PM EDT Reads: 1,617
In his session at @ThingsExpo, Lee Williams, a producer of the first smartphones and tablets, will talk about how he is now applying his experience in mobile technology to the design and development of the next generation of Environmental and Sustainability Services at ETwater. He will explain how M2M controllers work through wirelessly connected remote controls; and specifically delve into a retrofit option that reverse-engineers control codes of existing conventional controller systems so the...
Sep. 2, 2015 12:45 PM EDT Reads: 201
DevOps Summit, taking place Nov 3-5, 2015, at the Santa Clara Convention Center in Santa Clara, CA, is co-located with 17th Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading industry players in the world. The widespread success of cloud computing is driving the DevOps revolution in enterprise IT. Now as never before, development teams must communicate and collaborate in a dynamic, 24/7/365 environment. There is no time to wait for long development...
Sep. 2, 2015 12:45 PM EDT Reads: 1,556
While many app developers are comfortable building apps for the smartphone, there is a whole new world out there. In his session at @ThingsExpo, Narayan Sainaney, Co-founder and CTO of Mojio, will discuss how the business case for connected car apps is growing and, with open platform companies having already done the heavy lifting, there really is no barrier to entry.
Sep. 2, 2015 12:45 PM EDT Reads: 191
Moving an existing on-premise infrastructure into the cloud can be a complex and daunting proposition. It is critical to understand the benefits as well as the challenges associated with either a full or hybrid approach. In his session at 17th Cloud Expo, Richard Weiss, Principal Consultant at Pythian, will present a roadmap that can be leveraged by any organization to plan, analyze, evaluate and execute on a cloud migration solution. He will review the five major cloud transformation phases a...
Sep. 2, 2015 12:21 PM EDT
Too often with compelling new technologies market participants become overly enamored with that attractiveness of the technology and neglect underlying business drivers. This tendency, what some call the “newest shiny object syndrome,” is understandable given that virtually all of us are heavily engaged in technology. But it is also mistaken. Without concrete business cases driving its deployment, IoT, like many other technologies before it, will fade into obscurity.
Sep. 2, 2015 12:15 PM EDT Reads: 410