Improved failure detection for functional errors based on request attributes

Dynatrace failure detection automatically detects the vast majority of error conditions in your environment, including the underlying root causes of such error conditions. With this approach, Dynatrace is able to provide you with answers when problems occur or when your application performance drops. While automated failure detection works well for technical errors that require the attention of your developers, there are some types of functional errors that Dynatrace can’t detect automatically because the conditions technically aren’t errors from the perspective of the application (for example, a business transaction fails because a credit card has expired or is over limit).

Because we’ve received a lot of feedback from our customers about such functional-error situations, we’ve released a major upgrade to Dynatrace service-failure detection. These enhancements improve the accuracy of Dynatrace failure detection while simultaneously enabling you to detect functional errors in your environment.

Custom error handling and exceptions

Dynatrace automatically detects programming exceptions (Java, .NET, Node.js, and PHP) as the reason for failed requests when the exceptions result in the abort of service calls. Many web containers provide error pages for handled exceptions; Dynatrace detects most of these situations as well. Beyond this, however, there are situations where application code handles exceptions gracefully in a manner that isn’t automatically detected by Dynatrace. When this happens, Dynatrace doesn’t detect failed requests or alert you to errors. Such situations can now easily be remedied.

To tell Dynatrace which gracefully handled exceptions should cause requests to be marked as failed

  1. Select Transactions & services from the navigation menu.
  2. Select the service for which you need to adapt failure detection.
  3. Click the browse button and select Edit.
  4. Select the Error detection tab.
  5. Within the Custom handled exceptions section of the page, click the Add exception button.
  6. Type in the Exception class that should cause a request to be marked as failed, even if the exception is handled gracefully.
  7. (Optional) Type in an Exception message to serve as a filter—only exceptions that include the specified message will lead to failed requests.

Following these steps, if Dynatrace finds the defined exception (and optional defined exception message) on any request, Dynatrace will mark that request as failed.
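To make the scenario concrete, here is a minimal Java sketch of a gracefully handled exception. The class names `CheckoutService` and `PaymentDeclinedException` are hypothetical and exist only for illustration. The point is that the request completes normally from a technical perspective, so a rule naming the exception class is what marks it as failed.

```java
class CheckoutService {
    // Hypothetical business call that rejects expired cards.
    void charge(boolean cardExpired) throws PaymentDeclinedException {
        if (cardExpired) {
            throw new PaymentDeclinedException("Credit card expired");
        }
    }

    // The exception is caught and turned into an ordinary response, so the
    // request finishes normally even though the business transaction failed.
    String checkout(boolean cardExpired) {
        try {
            charge(cardExpired);
            return "Order placed";
        } catch (PaymentDeclinedException e) {
            return "Payment declined: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        CheckoutService service = new CheckoutService();
        System.out.println(service.checkout(false)); // Order placed
        System.out.println(service.checkout(true));  // Payment declined: Credit card expired
    }
}

// Hypothetical exception class. Its name is what you would enter as the
// Exception class in a custom handled exception rule.
class PaymentDeclinedException extends Exception {
    PaymentDeclinedException(String message) {
        super(message);
    }
}
```

In this sketch, the optional Exception message filter could be set to a fragment such as "expired" so that only that variant of the exception leads to a failed request.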

Custom errors, business errors, & request attributes

There are many cases where requests fail for reasons that are related to business logic. While such situations often aren’t detectable via exceptions or HTTP response codes, they are nevertheless indicative of problems. In fact, these situations may be even more important than situations that are detected via exceptions and response codes. To handle these situations, Dynatrace now allows you to use request attributes as indicators for error situations. For example, you might have a business function in your Java code that indicates an error via a return value. In other situations, you might have your own error handling functionality that, when called, indicates a functional business error.

In such cases, the actual error message can often be retrieved via some other method call. All of these situations can already be captured via request attributes.

You can now use the existence and values of such request attributes as indicators that your requests have failed.

To create a custom error rule

  1. Select Transactions & services from the navigation menu.
  2. Select the service for which you need to adapt failure detection.
  3. Click the browse button and select Edit.
  4. Select the Error detection tab.
  5. Within the Custom errors section of the page, click the Add custom error rule button.
  6. Select a request attribute from the displayed list (note that not all listed attributes may be available for the respective service).
  7. Define a condition for the rule. You can define a simple exists rule, a greater than rule, or a contains rule.

For example, a value of -1 in the Amount of recommendations attribute can indicate an error. If Dynatrace detects such an error, it marks the respective service request as failed and reports the rule match as the reason for the failure.
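The kind of application code behind such a rule might look like the following Java sketch. Everything in it is an assumption made for illustration: the class `RecommendationService`, the method `getAmountOfRecommendations`, and the handler `reportBusinessError` are hypothetical names, not part of any Dynatrace API. It shows the two patterns described earlier: a return value that signals an error, and a dedicated error-handling method whose invocation signals one.

```java
import java.util.ArrayList;
import java.util.List;

class RecommendationService {
    // Messages passed to the error handler, kept so the sketch is observable.
    final List<String> reportedErrors = new ArrayList<>();

    // Hypothetical business function: returns the number of recommendations,
    // or -1 when they could not be determined. No exception is thrown.
    int getAmountOfRecommendations(String customerId) {
        if (customerId == null || customerId.isEmpty()) {
            reportBusinessError("No customer given");
            return -1;
        }
        return 3; // placeholder result for the sketch
    }

    // Hypothetical error handler; being called at all signals a business error.
    void reportBusinessError(String message) {
        reportedErrors.add(message);
    }

    public static void main(String[] args) {
        RecommendationService service = new RecommendationService();
        System.out.println(service.getAmountOfRecommendations("customer-42")); // 3
        System.out.println(service.getAmountOfRecommendations(""));            // -1
        System.out.println(service.reportedErrors);                            // [No customer given]
    }
}
```

A request attribute that captures the return value of the first method, or the argument of the second, would give a custom error rule something to match against.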

Other improvements and changes

HTTP client and server-side errors

We’ve also made a variety of other improvements. While we’ve long been able to track the failure of requests from both the calling side and the server side, failure detection previously didn’t make such a distinction. HTTP 4XX response codes usually indicate client-side errors, not server-side errors. Normally, you wouldn’t want to be alerted when errors happen due to client problems that you can’t fix on your end. The new error detection functionality reflects this distinction.

Consequently, you can now track the failure rate of a service along with the calls to the service, assuming the service is monitored by Dynatrace (see the Failure rate of requests sent by monitored services chart).
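The distinction can be pictured with a small Java sketch. The status-code ranges used here are assumptions chosen to illustrate the idea (the answering service counts only 5XX codes as failures, while the calling side counts 4XX and 5XX), and `HttpFailureRules` is a hypothetical name rather than anything Dynatrace exposes.

```java
class HttpFailureRules {
    enum Side { SERVER, CLIENT }

    // Assumed ranges for this sketch: the service answering a request only
    // counts 5XX as failures, while the calling side counts 4XX and 5XX.
    static boolean isFailure(Side side, int statusCode) {
        if (side == Side.SERVER) {
            return statusCode >= 500 && statusCode <= 599;
        }
        return statusCode >= 400 && statusCode <= 599;
    }

    public static void main(String[] args) {
        System.out.println(isFailure(Side.SERVER, 404)); // false
        System.out.println(isFailure(Side.CLIENT, 404)); // true
        System.out.println(isFailure(Side.SERVER, 503)); // true
        System.out.println(isFailure(Side.CLIENT, 200)); // false
    }
}
```

With ranges like these, a 404 caused by a bad client request does not raise the failure rate of the service that answered it, but still shows up as a failed call on the side that sent it.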

Improved HTTP response code treatment

In most cases, a missing HTTP response code isn’t indicative of an error, but rather of a client abort situation. However, this isn’t always the case. You can now inform Dynatrace when you consider the absence of an HTTP status code to be indicative of a failed request. Be careful with this on the client side, however: “fire and forget” HTTP POST requests that don’t wait for responses also don’t have HTTP response codes.

Broken link detection now an opt-in setting

When a web server can’t find a certain page, it returns an HTTP 404 response code. Usually, this indicates a problem on the calling side. In cases where the calling side belongs to the same website, this would be considered a broken link. Dynatrace used to automatically treat such situations as server-side errors. It turned out, however, that most of our customers don’t consider such situations to be server-side errors. While the functionality still exists, it’s no longer the default for new services and is instead available as an opt-in setting.

Ignore errors that aren’t errors

In a perfect world, every request that triggers an exception would be considered a failed request. There are, however, cases where your code (or third-party code you have no control over) throws exceptions that indicate a certain response and not an error. Take the Thrift client for Cassandra, for example. It throws a NotFoundException when a row isn’t found. This isn’t an error, but simply a response code. As such, Dynatrace shouldn’t consider such exceptions as failed request indicators. This could be configured previously, but now the setting is more explicit.

Additionally, you can define a string that must be found within an exception message for the exception to be ignored.
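As a rough model of how an ignore rule with an optional message filter behaves, consider the Java sketch below. It is not Dynatrace’s implementation. `IgnoredExceptionRule` and `RowNotFoundException` are hypothetical names, and matching on the simple class name with a substring check on the message is a simplifying assumption.

```java
class IgnoredExceptionRule {
    private final String exceptionClass;
    private final String messageFilter; // null means "no message filter"

    IgnoredExceptionRule(String exceptionClass, String messageFilter) {
        this.exceptionClass = exceptionClass;
        this.messageFilter = messageFilter;
    }

    // True when the exception should be ignored as a failure indicator.
    boolean matches(Throwable t) {
        if (!t.getClass().getSimpleName().equals(exceptionClass)) {
            return false;
        }
        if (messageFilter == null) {
            return true;
        }
        String message = t.getMessage();
        return message != null && message.contains(messageFilter);
    }

    public static void main(String[] args) {
        IgnoredExceptionRule rule = new IgnoredExceptionRule("RowNotFoundException", "not found");
        System.out.println(rule.matches(new RowNotFoundException("row not found")));   // true
        System.out.println(rule.matches(new RowNotFoundException("connection lost"))); // false
        System.out.println(rule.matches(new IllegalStateException("row not found")));  // false
    }
}

// Hypothetical stand-in for an exception that signals a response, not an error.
class RowNotFoundException extends Exception {
    RowNotFoundException(String message) {
        super(message);
    }
}
```

The message filter narrows the rule: the same exception class with a different message is still treated as a potential failure indicator.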

Client abort situations

In contrast to exceptions that must be ignored, there are other exceptions that indicate that a call was aborted and as such shouldn’t be considered as failed under any circumstances—even when other information is available. This can now be explicitly configured as well.

If a service request exits with such an exception, Dynatrace won’t consider the request failed, regardless of the HTTP error code or any other information. Error detection will simply ignore such exceptions.
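The precedence described here can be summarized in a short Java sketch. It is only an illustration of the ordering, under the assumption that all other failure indicators have already been reduced to a single boolean. `ClientAbortCheck` and the exception name `com.example.ClientAbortedException` are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class ClientAbortCheck {
    private final Set<String> abortExceptionClasses;

    ClientAbortCheck(String... abortExceptionClasses) {
        this.abortExceptionClasses = new HashSet<>(Arrays.asList(abortExceptionClasses));
    }

    // exitExceptionClass: class name of the exception the request exited with, or null.
    // failedByOtherIndicators: outcome of all other checks (HTTP code, custom rules, ...).
    boolean isFailed(String exitExceptionClass, boolean failedByOtherIndicators) {
        if (exitExceptionClass != null && abortExceptionClasses.contains(exitExceptionClass)) {
            return false; // an abort is never a failure, whatever else was observed
        }
        return failedByOtherIndicators;
    }

    public static void main(String[] args) {
        ClientAbortCheck check = new ClientAbortCheck("com.example.ClientAbortedException");
        System.out.println(check.isFailed("com.example.ClientAbortedException", true)); // false
        System.out.println(check.isFailed("java.io.IOException", true));                // true
        System.out.println(check.isFailed(null, false));                                // false
    }
}
```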

The post Improved failure detection for functional errors based on request attributes appeared first on Dynatrace blog – monitoring redefined.
