
Introducing Hadoop monitoring (beta)

We’re excited to announce the beta release of Dynatrace Hadoop monitoring! Hadoop server monitoring provides a high-level overview of the main Hadoop components within your cluster. Enhanced insights are available for HDFS and MapReduce. Hadoop-specific metrics are presented alongside all infrastructure measurements, providing you with in-depth Hadoop performance analysis of both current and historical data.

To analyze your Hadoop components 

  1. Click Technologies in the menu.
  2. Click the Hadoop tile.
  3. Click an individual Hadoop component in the Process group list to view metrics and a timeline chart specific to that component. 

Enhanced insights for HDFS

To view NameNode metrics

  1. Follow the steps outlined above. Be sure to select a NameNode process group.
  2. Click the Process group details button. 
  3. On the Process group details page, select the Technology-specific metrics tab to view relevant cluster charts and metrics. Hadoop NameNode pages provide details about your HDFS capacity, usage, blocks, cache, files, and data-node health.
  4. Further down the page, you’ll find a number of cluster-specific charts.

NameNode metrics

  • Total: Raw capacity of DataNodes in bytes
  • Used: Used capacity across all DataNodes in bytes
  • Remaining: Remaining capacity in bytes
  • Total load: The number of connections
  • Total: The number of allocated blocks in the system
  • Pending deletion: The number of blocks pending deletion
  • Files total: Total number of files
  • Pending replication: The number of blocks pending replication
  • Under replicated: The number of under-replicated blocks
  • Scheduled replication: The number of blocks scheduled for replication
  • Live: The number of live DataNodes
  • Dead: The number of dead DataNodes
  • Decommission Live: The number of decommissioning live DataNodes
  • Decommission Dead: The number of decommissioning dead DataNodes
  • Usage – Volume failures total: Total volume failures
  • Estimated capacity lost total: Estimated capacity lost in bytes
  • Decommissioning: The number of decommissioning DataNodes
  • Stale: The number of stale DataNodes
  • Blocks missing and corrupt – Missing: The number of missing blocks
  • Capacity: Cache capacity in bytes
  • Used: Cache used in bytes
  • Blocks missing and corrupt – Corrupt: The number of corrupt blocks
  • Capacity in bytes – Used, non-DFS: Capacity used, non-DFS in bytes
  • Appended: The number of files appended
  • Created: The number of files and directories created by create or mkdir operations
  • Deleted: The number of files and directories deleted by delete or rename operations
  • Renamed: The number of rename operations
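Dynatrace collects these values automatically, but if you ever want to cross-check a number by hand, the NameNode exposes the same counters through Hadoop's JMX servlet (`/jmx` on the NameNode web port, 50070 by default in Hadoop 2.x). Here is a minimal Python sketch that parses an abbreviated `/jmx` response for the `FSNamesystem` bean; the payload and its values are made up for illustration.

```python
import json

# Abbreviated, illustrative /jmx payload from a NameNode
# (real responses contain many more beans and attributes).
sample = """
{
  "beans": [
    {
      "name": "Hadoop:service=NameNode,name=FSNamesystem",
      "CapacityTotal": 1000000000,
      "CapacityUsed": 250000000,
      "CapacityRemaining": 700000000,
      "BlocksTotal": 1200,
      "MissingBlocks": 0,
      "CorruptBlocks": 0,
      "UnderReplicatedBlocks": 3,
      "FilesTotal": 540
    }
  ]
}
"""

def fsnamesystem_metrics(payload: str) -> dict:
    """Return the FSNamesystem bean's attributes as a flat dict."""
    for bean in json.loads(payload)["beans"]:
        if bean["name"].endswith("name=FSNamesystem"):
            return bean
    raise KeyError("FSNamesystem bean not found")

m = fsnamesystem_metrics(sample)
used_pct = 100.0 * m["CapacityUsed"] / m["CapacityTotal"]
print(f"HDFS used: {used_pct:.1f}% | blocks: {m['BlocksTotal']} "
      f"| under-replicated: {m['UnderReplicatedBlocks']}")
```

Against a live cluster you would fetch the same JSON from `http://<namenode>:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem` and feed it to the same function.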

To view DataNode metrics

  1. Expand the Details section of a DataNode process group.
  2. Click the Process group details button.
  3. On the Process group details page, click the Technology-specific metrics tab and select the DataNode process.
  4. Select the Hadoop HDFS metrics tab.

DataNode metrics

  • Live: The number of live DataNodes
  • Dead: The number of dead DataNodes
  • Decommission Live: The number of decommissioning live DataNodes
  • Decommission Dead: The number of decommissioning dead DataNodes
  • Decommissioning: The number of decommissioning DataNodes
  • Stale: The number of stale DataNodes
  • Capacity: Cache capacity in bytes
  • Used: Cache used in bytes
  • Capacity: Disk capacity in bytes
  • DfsUsed: Disk usage in bytes
  • Cached: The number of blocks cached
  • Failed to cache: The number of blocks that failed to cache
  • Failed to uncache: The number of blocks that could not be removed from the cache
  • Number of failed volumes: The number of volume failures that have occurred
  • Capacity in bytes – Remaining: The remaining disk space in bytes
  • Blocks: The number of blocks read from the DataNode
  • Removed: The number of blocks removed
  • Replicated: The number of blocks replicated
  • Verified: The number of blocks verified
  • Blocks: The number of blocks written to the DataNode
  • Bytes: The number of bytes read from the DataNode
  • Bytes: The number of bytes written to the DataNode
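DataNodes expose their disk-level counters through the same `/jmx` servlet (port 50075 by default in Hadoop 2.x). The sketch below parses an abbreviated, made-up response for the `FSDatasetState` bean; bean and attribute names can vary slightly between Hadoop versions, so treat the names here as illustrative.

```python
import json

# Illustrative, abbreviated /jmx payload from a DataNode.
sample = """
{
  "beans": [
    {
      "name": "Hadoop:service=DataNode,name=FSDatasetState",
      "Capacity": 500000000,
      "DfsUsed": 200000000,
      "Remaining": 280000000,
      "NumFailedVolumes": 1
    }
  ]
}
"""

def datanode_disk(payload: str) -> dict:
    """Return the FSDatasetState bean's attributes as a flat dict."""
    for bean in json.loads(payload)["beans"]:
        if "FSDatasetState" in bean["name"]:
            return bean
    raise KeyError("FSDatasetState bean not found")

d = datanode_disk(sample)
if d["NumFailedVolumes"] > 0:
    print(f"WARNING: {d['NumFailedVolumes']} failed volume(s)")
print(f"DfsUsed: {100.0 * d['DfsUsed'] / d['Capacity']:.1f}% of capacity")
```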

Enhanced insights for MapReduce

To view ResourceManager metrics

  1. Expand the Details section of the ResourceManager process group.
  2. Click the Process group details button. 
  3. On the Process group details page, select the Technology-specific metrics tab to view relevant cluster charts and metrics. Hadoop ResourceManager metrics pages provide information about your nodes, applications, memory, cores, and containers.
  4. Further down the page, you’ll find a number of ResourceManager-specific charts.

ResourceManager metrics

  • Active: Number of active NodeManagers
  • Decommissioned: Number of decommissioned NodeManagers
  • Lost: Number of lost NodeManagers (no heartbeats)
  • Rebooted: Number of rebooted NodeManagers
  • Unhealthy: Number of unhealthy NodeManagers
  • Allocated: Number of allocated containers
  • Allocated: Allocated memory in bytes
  • Allocated: Number of allocated CPUs in virtual cores
  • Completed: Number of successfully completed applications
  • Failed: Number of failed applications
  • Killed: Number of killed applications
  • Pending: Number of pending applications
  • Running: Number of running applications
  • Submitted: Number of submitted applications
  • Available: Amount of available memory in bytes
  • Available: Number of available CPUs in virtual cores
  • Pending: Amount of pending memory-resource requests in bytes that are not yet fulfilled by the scheduler
  • Pending: Pending CPU allocation requests in virtual cores that are not yet fulfilled by the scheduler
  • Reserved: Amount of reserved memory in bytes
  • Reserved: Number of reserved CPUs in virtual cores
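The cluster-level counterparts of these counters are also available from the YARN ResourceManager REST API (`GET http://<rm>:8088/ws/v1/cluster/metrics` on the default port), which can be handy for spot-checking a chart. The sketch below parses an abbreviated response with made-up values.

```python
import json

# Illustrative response from the ResourceManager's
# /ws/v1/cluster/metrics endpoint; values are made up.
sample = """
{
  "clusterMetrics": {
    "appsSubmitted": 42, "appsRunning": 3, "appsPending": 1,
    "appsCompleted": 35, "appsFailed": 2, "appsKilled": 1,
    "availableMB": 65536, "allocatedMB": 32768,
    "availableVirtualCores": 24, "allocatedVirtualCores": 12,
    "containersAllocated": 12,
    "activeNodes": 5, "lostNodes": 0, "unhealthyNodes": 1
  }
}
"""

cm = json.loads(sample)["clusterMetrics"]
mem_used_pct = 100.0 * cm["allocatedMB"] / (cm["allocatedMB"] + cm["availableMB"])
print(f"apps running: {cm['appsRunning']} (pending: {cm['appsPending']})")
print(f"memory allocated: {mem_used_pct:.1f}%")
print(f"nodes: {cm['activeNodes']} active, {cm['unhealthyNodes']} unhealthy")
```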

To view MRAppMaster metrics

  1. Expand the Details section of an MRAppMaster process group.
  2. Click the Process group details button. 
  3. On the Process group details page, click the Technology-specific metrics tab and select the MRAppMaster process.
  4. Click the Hadoop MapReduce tab.

MRAppMaster metrics

  • Jobs finished – Completed: The number of successfully completed jobs
  • Jobs finished – Failed: The number of failed jobs
  • Jobs finished – Killed: The number of killed jobs
  • Jobs – Preparing: The number of preparing jobs
  • Jobs – Running: The number of running jobs
  • Maps finished – Completed: The number of successfully completed maps
  • Maps finished – Failed: The number of failed maps
  • Maps finished – Killed: The number of killed maps
  • Maps – Running: The number of running maps
  • Maps – Waiting: The number of waiting maps
  • Reduces finished – Completed: The number of successfully completed reduces
  • Reduces finished – Failed: The number of failed reduces
  • Reduces finished – Killed: The number of killed reduces
  • Reduces – Running: The number of running reduces
  • Reduces – Waiting: The number of waiting reduces
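While a job is running, its MRAppMaster serves similar progress counters through the MapReduce Application Master REST API, reachable via the ResourceManager web proxy (`http://<rm>:8088/proxy/<app-id>/ws/v1/mapreduce/jobs`). A rough sketch against an abbreviated, made-up response:

```python
import json

# Illustrative response from the MapReduce AM's
# /ws/v1/mapreduce/jobs endpoint; values are made up.
sample = """
{
  "jobs": {
    "job": [
      {"id": "job_1_0001", "state": "RUNNING",
       "mapsTotal": 100, "mapsCompleted": 60,
       "reducesTotal": 10, "reducesCompleted": 0}
    ]
  }
}
"""

for job in json.loads(sample)["jobs"]["job"]:
    map_pct = 100.0 * job["mapsCompleted"] / job["mapsTotal"]
    print(f"{job['id']}: {job['state']}, maps {map_pct:.0f}% done, "
          f"reduces {job['reducesCompleted']}/{job['reducesTotal']}")
```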

To view NodeManager metrics

  1. Expand the Details section of the NodeManager process group.
  2. Click the Process group details button. 
  3. On the Process group details page, click the Technology-specific metrics tab and select a NodeManager process.
  4. Click the Hadoop MapReduce tab.

NodeManager metrics

  • GB Available: Currently available memory in GB
  • GB Allocated: Currently allocated memory in GB
  • Completed: Total number of successfully completed containers
  • Running: Current number of running containers
  • Launched: Total number of launched containers
  • Initing: Current number of initializing containers
  • Allocated: Current number of allocated containers
  • Failed: Total number of failed containers
  • Killed: Total number of killed containers
  • Connections: Number of current connections
  • Output: Number of bytes output
  • Outputs Failed: Number of failed outputs
  • Outputs OK: Number of succeeded outputs
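NodeManagers likewise publish their container counters via `/jmx` (port 8042 by default). The sketch below pulls the `NodeManagerMetrics` bean out of an abbreviated, made-up response; the attribute names shown are those used in Hadoop 2.x and may differ in other versions.

```python
import json

# Illustrative /jmx payload from a NodeManager; values are made up.
sample = """
{
  "beans": [
    {
      "name": "Hadoop:service=NodeManager,name=NodeManagerMetrics",
      "ContainersRunning": 4, "ContainersLaunched": 120,
      "ContainersCompleted": 110, "ContainersFailed": 3,
      "ContainersKilled": 3, "AllocatedGB": 12, "AvailableGB": 4
    }
  ]
}
"""

# Pick the NodeManagerMetrics bean out of the bean list.
nm = next(b for b in json.loads(sample)["beans"]
          if b["name"].endswith("NodeManagerMetrics"))
print(f"running containers: {nm['ContainersRunning']}, "
      f"memory: {nm['AllocatedGB']} GB allocated / {nm['AvailableGB']} GB free")
```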

Prerequisites

  • For full Hadoop visibility, OneAgent must be installed on all machines running the following Hadoop processes: NameNode, ResourceManager, NodeManager, DataNode, and MRAppMaster
  • Linux OS
  • OneAgent 1.103+
  • Hadoop version 2.4.1+
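As a quick sanity check of the version prerequisite, you can compare the first line of `hadoop version` output against the 2.4.1 minimum; the string below stands in for real command output.

```python
# Sample first line of `hadoop version` output (illustrative).
sample_output = "Hadoop 2.7.3"

# Compare the dotted version as a tuple of integers.
version = tuple(int(p) for p in sample_output.split()[1].split("."))
print("Hadoop version OK" if version >= (2, 4, 1) else "Hadoop too old")
```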

Enable Hadoop monitoring globally

With Hadoop monitoring enabled globally, Dynatrace automatically collects Hadoop metrics whenever a new host running Hadoop is detected in your environment.

  1. Go to Settings > Monitoring > Monitored technologies.
  2. Set the Hadoop switch to On.

Have feedback?

Your feedback about Dynatrace Hadoop monitoring is most welcome! Let us know what you think of the new Hadoop plugin by adding a comment below. Or post your questions and feedback to Dynatrace Answers.

The post Introducing Hadoop monitoring (beta) appeared first on Dynatrace blog – monitoring redefined.
