Welcome!

Blog Feed Post

Introducing Spark monitoring (beta)

We’re happy to announce the beta availability of Spark monitoring! Apache Spark monitoring provides insight into the resource usage, job status, and performance of Spark Standalone clusters.

Monitoring is available for the three main Spark components:

  • Cluster manager
  • Driver program
  • Worker nodes

Apache Spark metrics are presented alongside other infrastructure measurements, enabling in-depth cluster performance analysis of both current and historical data.

To view Spark monitoring insights

  1. Click Technologies in the menu.
  2. Click the Spark tile.
  3. To view cluster metrics, expand the Details section of the Spark process group.
  4. Click the Process group details button. 

  5. On the Process group details page, click the Technology-specific metrics tab.  Here you’ll find metrics for all Spark components. 

  6. Further down the page, you’ll find a number of other cluster-specific charts.
    The Cluster charts section provides all the information you need regarding Jobs, StagesMessages, Workers, and Message processing. When Jobs fail, the cause is typically a lack of cores or RAM. Note that for both cores and RAM, the maximum value is not your system’s maximum, it’s the maximum value as defined by your Spark configuration. Using the Workers chart you can immediately see when one of your nodes goes down.

Spark cluster metrics

All jobs
Number of jobs
Active jobs
Number of active jobs
Waiting jobs
Number of waiting jobs
Failed jobs
Number of failed jobs
Running jobs
Number of running jobs
Count
Number of messages in the scheduler’s event-processing loop
Calls per sec
Calls per second
Number of workers
Number of workers
Alive workers
Number of alive workers
Number of apps
Number of running applications
Waiting apps
Number of waiting applications

Spark node/worker monitoring

To access valuable Spark node metrics:

  1. Select a worker from the Process list on the Process group details page (see example below).
  2. Click the Apache Spark worker tab.

Spark Worker metrics

Number of free cores
Number of worker-free cores
Number of cores used
Number of worker cores used
Number of executors
Number of worker executors
Used MB
Amount of worker memory used (megabytes)
Free MB
Amount of worker-free memory in (megabytes)

Prerequisites

  • Dynatrace OneAgent version 1.105+
  • Linux OS or Windows
  • Spark Standalone cluster manager
  • Spark version 1.6
  • Enabled JMX monitoring metrics. Turn on JmxSink for all instances by class names: org.apache.spark.metrics.sink.JmxSink conf/metrics.properties For more details, see Spark documentation.
  • To recognize your cluster, SparkSubmit must be executed with the –master parameter and master host address. See example below:

spark-submit \
--class de.codecentric.SparkPi \
--master spark://192.168.33.100:7077  \
--conf spark.eventLog.enabled=true \
/vagrant/jars/spark-pi-example-1.0.jar 100

Enable Spark monitoring globally

With Spark monitoring enabled globally, Dynatrace automatically collects Spark metrics whenever a new host running Spark is detected in your environment.

  1. Go to Settings > Monitoring > Monitored technologies.
  2. Set the Spark switch to On.

Have feedback?

Your feedback about Dynatrace Spark monitoring is most welcome! Let us know what you think of the new Spark plugin by adding a comment below. Or post your questions and feedback to Dynatrace Answers.

The post Introducing Spark monitoring (beta) appeared first on Dynatrace blog – monitoring redefined.

Read the original blog entry...

More Stories By APM Blog

APM: It’s all about application performance, scalability, and architecture: best practices, lifecycle and DevOps, mobile and web, enterprise, user experience

Latest Stories
Your homes and cars can be automated and self-serviced. Why can't your storage? From simply asking questions to analyze and troubleshoot your infrastructure, to provisioning storage with snapshots, recovery and replication, your wildest sci-fi dream has come true. In his session at @DevOpsSummit at 20th Cloud Expo, Dan Florea, Director of Product Management at Tintri, provided a ChatOps demo where you can talk to your storage and manage it from anywhere, through Slack and similar services with...
"We were founded in 2003 and the way we were founded was about good backup and good disaster recovery for our clients, and for the last 20 years we've been pretty consistent with that," noted Marc Malafronte, Territory Manager at StorageCraft, in this SYS-CON.tv interview at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
"At the keynote this morning we spoke about the value proposition of Nutanix, of having a DevOps culture and a mindset, and the business outcomes of achieving agility and scale, which everybody here is trying to accomplish," noted Mark Lavi, DevOps Solution Architect at Nutanix, in this SYS-CON.tv interview at @DevOpsSummit at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
"We are an IT services solution provider and we sell software to support those solutions. Our focus and key areas are around security, enterprise monitoring, and continuous delivery optimization," noted John Balsavage, President of A&I Solutions, in this SYS-CON.tv interview at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
"DivvyCloud as a company set out to help customers automate solutions to the most common cloud problems," noted Jeremy Snyder, VP of Business Development at DivvyCloud, in this SYS-CON.tv interview at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
"We provide IoT solutions. We provide the most compatible solutions for many applications. Our solutions are industry agnostic and also protocol agnostic," explained Richard Han, Head of Sales and Marketing and Engineering at Systena America, in this SYS-CON.tv interview at @ThingsExpo, held June 6-8, 2017, at the Javits Center in New York City, NY.
"We want to show that our solution is far less expensive with a much better total cost of ownership so we announced several key features. One is called geo-distributed erasure coding, another is support for KVM and we introduced a new capability called Multi-Part," explained Tim Desai, Senior Product Marketing Manager at Hitachi Data Systems, in this SYS-CON.tv interview at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
There is a huge demand for responsive, real-time mobile and web experiences, but current architectural patterns do not easily accommodate applications that respond to events in real time. Common solutions using message queues or HTTP long-polling quickly lead to resiliency, scalability and development velocity challenges. In his session at 21st Cloud Expo, Ryland Degnan, a Senior Software Engineer on the Netflix Edge Platform team, will discuss how by leveraging a reactive stream-based protocol,...
SYS-CON Events announced today that Calligo, an innovative cloud service provider offering mid-sized companies the highest levels of data privacy and security, has been named "Bronze Sponsor" of SYS-CON's 21st International Cloud Expo ®, which will take place on Oct 31 - Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Calligo offers unparalleled application performance guarantees, commercial flexibility and a personalised support service from its globally located cloud plat...
DevOps at Cloud Expo, taking place October 31 - November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA, is co-located with 21st Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading industry players in the world. The widespread success of cloud computing is driving the DevOps revolution in enterprise IT. Now as never before, development teams must communicate and collaborate in a dynamic, 24/7/365 environment. There is no time to w...
Internet of @ThingsExpo, taking place October 31 - November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA, is co-located with 21st Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading industry players in the world. The Internet of Things (IoT) is the most profound change in personal and enterprise IT since the creation of the Worldwide Web more than 20 years ago. All major researchers estimate there will be tens of billions devic...
"The Striim platform is a full end-to-end streaming integration and analytics platform that is middleware that covers a lot of different use cases," explained Steve Wilkes, Founder and CTO at Striim, in this SYS-CON.tv interview at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
"With Digital Experience Monitoring what used to be a simple visit to a web page has exploded into app on phones, data from social media feeds, competitive benchmarking - these are all components that are only available because of some type of digital asset," explained Leo Vasiliou, Director of Web Performance Engineering at Catchpoint Systems, in this SYS-CON.tv interview at DevOps Summit at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
21st International Cloud Expo, taking place October 31 - November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA, will feature technical sessions from a rock star conference faculty and the leading industry players in the world. Cloud computing is now being embraced by a majority of enterprises of all sizes. Yesterday's debate about public vs. private has transformed into the reality of hybrid cloud: a recent survey shows that 74% of enterprises have a hybrid cloud strategy. Me...
SYS-CON Events announced today that DXWorldExpo has been named “Global Sponsor” of SYS-CON's 21st International Cloud Expo, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Digital Transformation is the key issue driving the global enterprise IT business. Digital Transformation is most prominent among Global 2000 enterprises and government institutions.