Improved failure detection for functional errors based on request attributes

Dynatrace failure detection automatically detects the vast majority of error conditions in your environment, including the underlying root causes of such error conditions. With this approach, Dynatrace is able to provide you with answers when problems occur or when your application performance drops. While automated failure detection works well for technical errors that require the attention of your developers, there are some types of functional errors that Dynatrace can’t detect automatically because the conditions technically aren’t errors from the perspective of the application (for example, a business transaction fails because a credit card has expired or is over limit).

Because we’ve received a lot of feedback from our customers about such functional-error situations, we’ve released a major upgrade to Dynatrace service-failure detection. These enhancements improve the accuracy of Dynatrace failure detection while simultaneously enabling you to detect functional errors in your environment.

Custom error handling and exceptions

Dynatrace automatically detects programming exceptions (Java, .NET, Node.js, and PHP) as the reason for failed requests when the exceptions result in the abort of service calls. Many web containers provide error pages for handled exceptions; Dynatrace detects most of these situations as well. Beyond this, however, there are situations where application code handles exceptions gracefully in a manner that isn’t automatically detected by Dynatrace. When this happens, Dynatrace doesn’t detect failed requests or alert you to errors. Such situations can now easily be remedied.

To inform Dynatrace which gracefully-handled exceptions should be marked as failed requests:

  1. Select Transactions & services from the navigation menu.
  2. Select the service for which you need to adapt failure detection.
  3. Click the browse button and select Edit.
  4. Select the Error detection tab.
  5. Within the Custom handled exceptions section of the page, click the Add exception button.
  6. Type in the Exception class that should cause a request to be marked as failed, even if the exception is handled gracefully.
  7. (Optional) Type in an Exception message to serve as a filter—only exceptions that include the specified message will lead to failed requests.

Following these steps, if Dynatrace finds the defined exception (and optional defined exception message) on any request, Dynatrace will mark that request as failed.
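The matching described above can be sketched in a few lines of Python. This is only an illustration of the rule semantics, not Dynatrace's implementation: the class names `HandledExceptionRule` and `ObservedException`, the function `request_failed`, and the choice of exact class-name matching are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class HandledExceptionRule:
    """Hypothetical model of one 'custom handled exception' entry."""
    exception_class: str                  # e.g. "com.example.PaymentDeclinedException"
    message_filter: Optional[str] = None  # optional text the message must include


@dataclass(frozen=True)
class ObservedException:
    """An exception seen on a request, even if application code handled it."""
    class_name: str
    message: str = ""


def request_failed(rules: Iterable[HandledExceptionRule],
                   observed: Iterable[ObservedException]) -> bool:
    """Return True if any observed exception matches any configured rule."""
    rules = list(rules)
    for exc in observed:
        for rule in rules:
            # Assumption: the class name must match exactly.
            if exc.class_name != rule.exception_class:
                continue
            # Without a message filter, the class match alone is enough.
            if rule.message_filter is None or rule.message_filter in exc.message:
                return True
    return False
```

With a rule for `com.example.PaymentDeclinedException` and the message filter `card expired`, a request carrying that exception with the message "card expired for user 42" would be marked as failed, while the same exception with the message "over limit" would not.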

Custom errors, business errors, & request attributes

There are many cases where requests fail for reasons that are related to business logic. While such situations often aren’t detectable via exceptions or HTTP response codes, they are nevertheless indicative of problems. In fact, these situations may be even more important than situations that are detected via exceptions and response codes. To handle these situations, Dynatrace now allows you to use request attributes as indicators for error situations. For example, you might have a business function in your Java code that indicates an error via a return value. In other situations, you might have your own error handling functionality that, when called, indicates a functional business error.

You might imagine that in such cases the actual error messages can be retrieved via some other method call. All of these situations can already be captured via request attributes.

You can now use the existence and values of such request attributes as indicators that your requests have failed.

To create a custom error rule:

    1. Select Transactions & services from the navigation menu.
    2. Select the service for which you need to adapt failure detection.
    3. Click the browse button and select Edit.
    4. Select the Error detection tab.
    5. Within the Custom errors section of the page, click the Add custom error rule button.
    6. Select a request attribute from the displayed list (note that not all listed attributes may be available for the respective service).
    7. Define a condition for the rule. You can define a simple exists rule, a greater than rule, or a contains rule.

For example, suppose a value of -1 in the Amount of recommendations attribute indicates an error. If Dynatrace detects such an error, it will mark the respective service request as failed and explain that the rule match is the reason for the failure.
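To make the three condition types concrete, here is a minimal Python sketch of how an exists, greater than, or contains rule could be evaluated against the request attributes captured on a request. The `CustomErrorRule` class, the `failure_reason` function, and the attribute names used with them are hypothetical and serve only to illustrate the idea; they are not part of any Dynatrace API.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Mapping, Optional


@dataclass(frozen=True)
class CustomErrorRule:
    """Hypothetical model of one custom error rule."""
    attribute: str       # name of the request attribute to inspect
    condition: str       # "exists", "greater_than", or "contains"
    operand: Any = None  # comparison value; unused for "exists"

    def matches(self, attributes: Mapping[str, Any]) -> bool:
        # A rule can only match if the attribute was captured on this request.
        if self.attribute not in attributes:
            return False
        value = attributes[self.attribute]
        if self.condition == "exists":
            return True
        if self.condition == "greater_than":
            return value > self.operand
        if self.condition == "contains":
            return str(self.operand) in str(value)
        raise ValueError(f"unknown condition: {self.condition}")


def failure_reason(rules: Iterable[CustomErrorRule],
                   attributes: Mapping[str, Any]) -> Optional[str]:
    """Return a description of the first matching rule, or None if none match."""
    for rule in rules:
        if rule.matches(attributes):
            return f"custom error rule matched: {rule.attribute} {rule.condition}"
    return None
```

Returning the matched rule as the reason mirrors the behavior described above, where the rule match is reported as the cause of the failure. The sketch assumes that "greater than" rules are only applied to numeric attribute values.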

Other improvements and changes

HTTP client and server-side errors

We’ve also made a variety of other improvements. While we’ve long been able to track the failure of requests from both the calling side and the server side, failure detection previously didn’t make such a distinction. HTTP 4XX response codes usually indicate client-side errors, not server-side errors. Normally, you wouldn’t want to be alerted when errors happen due to client problems that you can’t fix on your end. The new error detection functionality reflects this.

Consequently, you can now track the failure rate of a service along with the calls to the service, assuming the service is monitored by Dynatrace (this appears in the Failure rate of requests sent by monitored services chart).

Improved HTTP response code treatment

In most cases, an empty HTTP response code isn’t indicative of an error, but rather of a client abort situation. However, this isn’t always the case. You can now inform Dynatrace when you consider the absence of an HTTP status code to be indicative of a failed request. Be careful with this on the client side, however: “fire and forget” HTTP POSTs that don’t wait for responses also don’t have HTTP response codes.
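The client-side versus server-side distinction and the handling of a missing status code can be sketched together. The function below is an illustrative assumption, not Dynatrace's actual logic: the name `is_failed_by_status`, the `side` parameter, the `missing_status_is_failure` flag, and the treatment of 5XX codes as failures on both sides are all choices made for this example.

```python
from typing import Optional


def is_failed_by_status(status: Optional[int], side: str,
                        missing_status_is_failure: bool = False) -> bool:
    """Decide whether a request counts as failed based on its HTTP status.

    side is "server" (the service that handled the request) or
    "client" (the service that sent the request).
    """
    if side not in ("server", "client"):
        raise ValueError(f"unknown side: {side}")
    if status is None:
        # No response code at all: only a failure if explicitly configured.
        return missing_status_is_failure
    if 500 <= status <= 599:
        # Assumption: server errors count as failures on both sides.
        return True
    if 400 <= status <= 499:
        # 4XX is the caller's problem, so it only counts on the calling side.
        return side == "client"
    return False
```

Under these assumptions, a 404 is a failure for the calling service but not for the service that returned it, and a request with no status code is only a failure once `missing_status_is_failure` is enabled. Enabling that flag on the client side would also flag "fire and forget" POSTs, which is the pitfall noted above.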

Broken link detection now an opt-in setting

When a web server can’t find a certain page, it returns an HTTP 404 response code. Usually, this indicates a problem on the calling side. In cases where the calling side belongs to the same website, this would be considered a broken link. Dynatrace used to automatically treat such situations as server-side errors. It turned out, however, that most of our customers don’t consider such situations to be server-side errors. While the functionality still exists, it’s no longer the default for new services and is instead available as an opt-in setting.
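As a rough illustration of the broken-link idea, the sketch below treats a 404 as a broken link when the referring page is on the same host as the requested page. How Dynatrace actually decides that the calling side "belongs to the same website" isn't stated here, so the use of the Referer header, the host comparison, and the names `is_broken_link`, `counts_as_server_failure`, and `broken_link_detection` are assumptions.

```python
from typing import Optional
from urllib.parse import urlsplit


def is_broken_link(status: int, request_url: str, referer: Optional[str]) -> bool:
    """Return True for a 404 whose referring page is on the same host."""
    if status != 404 or not referer:
        return False
    request_host = urlsplit(request_url).hostname
    referer_host = urlsplit(referer).hostname
    return request_host is not None and request_host == referer_host


def counts_as_server_failure(status: int, request_url: str,
                             referer: Optional[str],
                             broken_link_detection: bool = False) -> bool:
    """Broken links only count as server-side failures when opted in."""
    return broken_link_detection and is_broken_link(status, request_url, referer)
```

The default of `broken_link_detection=False` reflects the opt-in behavior for new services: the same 404 is ignored unless the setting is enabled.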

Ignore errors that aren’t errors

In a perfect world, every request that triggers an exception would be considered a failed request. There are, however, cases where your code (or third-party code you have no control over) throws exceptions that indicate a certain response and not an error. Take the Thrift client for Cassandra, for example. It throws a NotFoundException when a row isn’t found. This isn’t an error, but simply a response code. As such, Dynatrace shouldn’t treat such exceptions as failed-request indicators. This could be configured previously, but the setting is now more explicit.

Additionally, you can define a string that must be found within an exception message for the exception to be ignored.

Client abort situations

In contrast to exceptions that must be ignored, there are other exceptions that indicate that a call was aborted and as such shouldn’t be considered as failed under any circumstances—even when other information is available. This can now be explicitly configured as well.

If a service request is left with such an exception, Dynatrace won’t consider the request failed, regardless of the HTTP error code or any other information. Error detection will simply ignore such exceptions.
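The difference between ignored exceptions and client-abort exceptions is easiest to see side by side. In the hypothetical Python sketch below, an ignored exception is simply dropped as a failure indicator, so other evidence can still fail the request, whereas an abort exception overrides everything else. The names `IgnoreRule` and `evaluate`, the exact class-name matching, and the use of HTTP 5XX as the "other information" are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class IgnoreRule:
    """Hypothetical 'ignored exception' entry with an optional message filter."""
    exception_class: str
    message_contains: Optional[str] = None

    def matches(self, class_name: str, message: str) -> bool:
        if class_name != self.exception_class:
            return False
        return self.message_contains is None or self.message_contains in message


def evaluate(exceptions: List[Tuple[str, str]], status: Optional[int],
             ignored: Iterable[IgnoreRule], abort_classes: Set[str]) -> bool:
    """Return True if the request should be considered failed.

    exceptions holds (class_name, message) pairs the request was left with.
    """
    # 1. An abort exception wins over every other indicator.
    if any(cls in abort_classes for cls, _ in exceptions):
        return False
    # 2. Ignored exceptions are removed from consideration.
    ignored = list(ignored)
    remaining = [(cls, msg) for cls, msg in exceptions
                 if not any(rule.matches(cls, msg) for rule in ignored)]
    if remaining:
        return True
    # 3. Assumption: other indicators (here, HTTP 5XX) still apply.
    return status is not None and 500 <= status <= 599
```

With `NotFoundException` ignored, a request that ends with that exception and a 200 status is not failed, but the same request with a 500 status is. A request left with an exception listed in `abort_classes` is never failed, even with a 500 status.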

The post Improved failure detection for functional errors based on request attributes appeared first on Dynatrace blog – monitoring redefined.
