Azure Service Profiler review – How does it fit in your toolbox?

About a year ago Microsoft released the Azure Service Profiler, which is designed to be a lightweight profiler for ASP.NET applications. They recently enabled it to work with Application Insights, and it is easy to enable for Azure App Services. Since we use App Services and love anything to do with app performance, I thought I would give it a try and see how it compares to other tools.

Note: The Service Profiler is still advertised as a “preview” offering and is not GA.

What is the Azure Service Profiler?

It is a transaction profiler for ASP.NET apps. It is designed to work with ASP.NET apps deployed anywhere, even outside of Azure. However, it uploads the collected data to Azure table storage, where the data is then processed by Microsoft. So the name “Azure Service Profiler” is perhaps a little confusing because it can profile more than Azure. It also isn’t a true “.NET CLR profiler” because it uses ETW (Event Tracing for Windows) for data collection, not the usual CLR profiling APIs.
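
For a rough idea of what ETW-based collection looks like, here is a minimal sketch using the Microsoft.Diagnostics.Tracing.TraceEvent NuGet library. This is my illustration of the general technique, not the Service Profiler’s actual code:

    using Microsoft.Diagnostics.Tracing;
    using Microsoft.Diagnostics.Tracing.Parsers;
    using Microsoft.Diagnostics.Tracing.Session;

    class EtwCollectorSketch
    {
        static void Main()
        {
            // ETW sessions are machine-wide and require admin rights.
            using (var session = new TraceEventSession("DemoEtwSession"))
            {
                // Subscribe to CLR exception events from any .NET process,
                // out-of-process, with no profiler attached to the app itself.
                session.EnableProvider(
                    ClrTraceEventParser.ProviderGuid,
                    TraceEventLevel.Informational,
                    (ulong)ClrTraceEventParser.Keywords.Exception);

                session.Source.Clr.ExceptionStart += data =>
                    System.Console.WriteLine($"{data.ProcessName}: {data.ExceptionType}");

                session.Source.Process(); // blocks and pumps events until the session closes
            }
        }
    }

The point of the technique is that the profiled app runs unmodified; the collector listens to events the CLR and OS already emit, which is why the overhead stays low.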

It is designed to collect data in relation to individual web requests, essentially individual transaction traces. I have written before about how there are 3 types of .NET profilers, and Service Profiler is, oddly, a mix of all 3: a standard profiler, transaction tracing, and APM.

Service Profiler is a performance analysis tool used by teams at Microsoft running large-scale services in the cloud, and is optimized for troubleshooting issues in production. – Microsoft

Playing with the online demo

You can play with their online demo to get an idea of what type of data it collects.

Here is a screenshot showing how it plots out the performance of a single action in your app, which is a cool visual for understanding percentiles.

Service Profiler diagram of request performance


If you select a trace for a specific request, you can dive into lots of gory details.


Service Profiler individual trace

Traces are full of details, lots of details

BLOCKED_TIME, EventData, OTHER, dynamicClass_lamda_method, C3PO, R2D2, etc.

My immediate reaction to this is… WTF does all this mean?

My screenshot above is just a fraction of the entire trace it collected. It provides an overwhelming amount of detail. I feel like I should have a computer science degree to figure it out (which I don’t have). Out of all the details it provides, all I can really tell is that my request looks like it is doing some database queries. However, I can’t tell what the SQL query was. So… ?

It looks like Microsoft was aiming to provide every possible detail to help developers solve really hard problems. If I were doing hardcore performance tuning, I could see how this could be useful. But if all I want to know is why my request took 3 seconds, it gives me an avalanche of data.

I just want to know which SQL query was slow. I want actionable data I can quickly understand so I can fix the problem and go on about my day.

Trying the Service Profiler on my dev box

Being able to use it on my dev box is awesome! I can totally see using this for performance tuning during development, just as you would use the Visual Studio Profiler or ANTS. Installing it is simple: I logged in to http://azureserviceprofiler.com, created a data cube for my dev box, downloaded the agent, and started it up. It runs as a simple console app, and you can see how it subscribes to various ETW events. It is also easy to install on Azure App Services via Application Insights.

Service Profiler running on my dev box

By default it only profiles 5% of your requests, but you can adjust the sampling rate as you see fit. For a dev box you probably want to increase it to 100% sampling so you can quickly find any request to inspect. BTW, it will be interesting to see how it compares to Prefix over time. The combination of the two would be amazing.
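
Conceptually, per-request sampling boils down to something like this sketch (my own illustration, not the profiler’s actual implementation):

    using System;

    // Decides whether to capture a detailed trace for a given request.
    public class RequestSampler
    {
        private readonly double _rate;               // e.g. 0.05 for 5%, 1.0 for 100%
        private readonly Random _rng = new Random();
        private readonly object _gate = new object();

        public RequestSampler(double rate)
        {
            _rate = rate;
        }

        public bool ShouldProfile()
        {
            lock (_gate)                              // Random is not thread-safe
            {
                return _rng.NextDouble() < _rate;
            }
        }
    }

On a dev box you would construct it with a rate of 1.0 so every request gets captured.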

After changing it to sample 100% and letting my browser auto refresh a page for a while, I went back in and played with the data it collected.

Viewing exceptions

It noticed that my app throws an exception on every request, even though the exception gets swallowed and thrown away. That is really nice.

Service Profiler Exceptions

When I selected a specific trace, I was able to find my exception in it.

Exception in Trace
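
The pattern it caught looks roughly like this (a hypothetical sketch; the real exception came from somewhere in my test app):

    // An exception that is thrown and immediately swallowed on every request.
    // It never surfaces in the response, but the CLR still raises an ETW event
    // for the throw, which is how the profiler can spot it.
    public void LoadOptionalSetting()
    {
        try
        {
            // Hypothetical failure that happens on every request.
            throw new System.InvalidOperationException("Setting not found");
        }
        catch (System.InvalidOperationException)
        {
            // Swallowed: the app keeps working, but each request pays the cost.
        }
    }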

Viewing SQL queries… that they were called, not the query itself

Like the online demo, I can tell that my code is running 8 SQL queries, but I can’t see what the SQL statements are or any real details about them. To be really useful, it needs to show the raw SQL statements.

Trace view showing 8 SQL Queries
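
For reference, the code behind those 8 entries looks something like this sketch (the query, table, and connection string are hypothetical). The trace records that each ExecuteReader happened and how long it took, but not the CommandText:

    using System.Data.SqlClient;

    public static class QuerySketch
    {
        public static void RunQueries(string connectionString)
        {
            using (var conn = new SqlConnection(connectionString))
            {
                conn.Open();
                for (int i = 0; i < 8; i++)
                {
                    // The profiler shows a SQL command executing here, but not
                    // this CommandText, which is the detail I actually want.
                    using (var cmd = new SqlCommand("SELECT * FROM Products", conn))
                    using (var reader = cmd.ExecuteReader())
                    {
                        while (reader.Read()) { /* consume rows */ }
                    }
                }
            }
        }
    }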

HTTP call example – Code to trace comparison

OK, this time let’s compare my code to what the trace looks like.

Here is my code: a really simple Web API action that downloads a web page with HttpClient.

        // Downloads a web page with HttpClient and returns its contents.
        public async Task<HttpResponseMessage> HttpClientAsync()
        {
            log.Debug("Starting HttpClient.GetStringAsync()");
            string data;
            using (HttpClient hc = new HttpClient())
            {
                // Nearly all of the elapsed time is spent in this await.
                data = await hc.GetStringAsync("http://stackify-nop-prod.azurewebsites.net/blog");
            }

            log.Debug("Completed HttpClient.GetStringAsync()");

            return Request.CreateResponse(HttpStatusCode.OK, data);
        }

But here is how it looks in the trace. The code does nothing but an HTTP call, so that call should account for the whole 324-330ms. In the trace, the method itself shows only 1.15ms, and then you can see an AWAIT_TIME of 324.77ms. The other weird thing is that the “HTTP Activities” part is separate, and it shows the URL being downloaded in only 0.04ms (not 324ms).

Service Profiler view of HTTP call
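
The split makes more sense once you remember how async methods run. Here is a minimal sketch (the timings in the comments are illustrative, taken from my trace above, and the URL is a placeholder) of where the elapsed time actually goes:

    using System;
    using System.Diagnostics;
    using System.Net.Http;
    using System.Threading.Tasks;

    public static class AwaitTimingSketch
    {
        public static async Task RunAsync()
        {
            var total = Stopwatch.StartNew();

            using (var hc = new HttpClient())
            {
                // Everything before the await is the method's "own" CPU time,
                // which is the ~1.15ms the trace attributes to my code.
                Task<string> download = hc.GetStringAsync("http://example.com/");

                // The thread is released here. The wall-clock gap until the
                // continuation resumes is what shows up as AWAIT_TIME (~325ms).
                await download;
            }

            total.Stop();
            Console.WriteLine($"Total elapsed: {total.ElapsedMilliseconds}ms");
        }
    }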

As a comparison, here is how Retrace/Prefix displays the same type of information (including the log statements).

Retrace view of HTTP Client

Finding slow methods

The best thing I have seen about the profiler is that it identified, all by itself, some methods in my code that took a lot of time. In this example I can see that JSON deserialization is taking a lot of time. Awesome!

Find slow methods
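
For context, here is a minimal sketch of the kind of call that would show up as a hot method. The type and payload are hypothetical, and I am using Json.NET since it is the common choice in ASP.NET apps:

    using System.Collections.Generic;
    using Newtonsoft.Json;

    public class Product   // hypothetical type
    {
        public int Id { get; set; }
        public string Name { get; set; }
    }

    public static class JsonHotPath
    {
        // Deserializing a large payload is pure CPU work inside one method,
        // which is exactly the kind of self-contained slow method the
        // profiler was able to surface on its own.
        public static List<Product> ParseProducts(string largeJsonPayload)
        {
            return JsonConvert.DeserializeObject<List<Product>>(largeJsonPayload);
        }
    }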

Is the Azure Service Profiler really safe for production?

Microsoft claims that the profiler is built for running against production applications. From my testing, it collects a lot of detailed data. The real question is whether you can run it at all times like an APM solution, or whether it is designed to run for short periods to capture detailed data about a problem in production. Even being able to use it occasionally could be very useful for chasing down hard problems.

Service Profiler makes it easy to collect performance data while your service is handling real production load, collecting detailed request duration metrics, deep callstacks, and memory snapshots – but it also makes sure to do this in a low-impact way to minimize overhead to your system. – Microsoft

Any type of profiling or tracing of web requests adds overhead of some form. The real question is how much overhead, and whether that is acceptable for production servers.

Performance test setup & results

I tested the Service Profiler running via App Services in tandem with Application Insights, as well as standalone on an Azure VM. I used loader.io to give it some constant load. I tested the Service Profiler with all default settings, including the 5% default sampling rate.

My test apps were a demo nopCommerce app as well as a custom app that has a bunch of common test scenarios that I use for testing Retrace. I tested sync, async, and various other scenarios.

Response times went up slightly, sometimes up to 50 milliseconds per request, most likely when sampling kicked in for that request.

Here is a screenshot showing the CPU and memory usage difference on an Azure App Service. The chart starts with the Service Profiler enabled. After it is disabled, you can see that memory goes down a lot and the CPU (measured in seconds here) went down about 10%. That 10% (relative) CPU change was consistent in my testing on an Azure VM as well.

So is it safe for production?

All types of profiling, tracing, or logging add some amount of overhead. From my testing, I would say it is safe to use in production. Overall, CPU and response times increased 5-20% (relative), which is relatively low and similar to other APM solutions. The overhead will never be zero. So yes, it is safe!

Would I recommend running it in production non-stop?

Probably not, since the data it collects isn’t very valuable unless you are trying to troubleshoot a really complicated problem. If all you want is stats around how long web requests are taking, Application Insights or Retrace is a better option and probably has less overhead. Since it can’t do things like show you a SQL query, that also greatly limits its usefulness for me. But I still believe it is an awesome tool for solving hard problems; it is just too complicated to use for simple problems. I can definitely see using it in QA for performance tuning!

The other unknown is what Microsoft will charge for the Azure Service Profiler once it comes out of preview. Perhaps it will just be bundled into the pricing of Application Insights, or it could be a premium feature.

Overall, Microsoft has done a good job optimizing its overhead, and my testing backs their stance that it is designed to be used in production.

How the Azure Service Profiler fits in your toolbox

Developers love tools and already have access to a wide variety of them, including Microsoft-provided tools like the Visual Studio Profiler, IntelliTrace, and Application Insights, plus popular third-party tools like LINQPad, Prefix, Retrace, ANTS, and others.

It is an amazing tool for collecting deep performance statistics, and I would say it is perhaps in a category of its own: deep code-level details like you would expect from a standard .NET profiler, but scoped to a single web request.

It is sort of like Visual Studio Profiler or ANTS but capable of running on a busy server to collect individual transaction traces for review.

This functionality is similar to what most APM solutions aim to provide. Currently the Service Profiler provides a lot more details, but it also isn’t easy to use.

How does it compare to the data Retrace collects?

Our #1 goal with Retrace is to build a service that is very easy to use and is also safe for production. Our presentation of the profiling output is much, much simpler to view and understand (see the HTTP call example above).

Retrace collects key details like log statements, exceptions, SQL queries, cache keys being used, and lots of other little details and packages them up in a really easy to understand format. After the Service Profiler goes GA, we will write up more of a comparison.

Have you tried the Azure Service Profiler? Have any other thoughts or tips about it? Let us know in the comments!
