Welcome!

Blog Feed Post

Azure Service Profiler review – How does it fit in your toolbox?

About a year ago Microsoft released the Azure Service Profiler which is designed to be a lightweight profiler for ASP.NET applications. They recently enabled it to work with Application Insights and it is easy to enable for Azure App Services. Since we use App Services and love anything to do with app performance, I thought I would give it a try and see how it compares to other tools.

Note: The Service Profiler is still advertised as a “preview” offering and is not GA.

What is the Azure Service Profiler?

It is a transaction profiler for ASP.NET apps. It is designed to work with ASP.NET apps deployed anywhere, even outside of Azure. However, it uploads the collected data to Azure table storage where the data is then processed by Microsoft. So the name “Azure Service Profiler” is perhaps a little confusing because it can profile more than Azure. It also isn’t a true “.NET CLR profiler” because it uses ETW for data collection, not normal code profiling techniques.

It is designed to collect data in relation to individual web requests, or essentially individual transaction traces. I have written before about how there are 3 types of .NET profilers. Service Profiler is weirdly a mix of all 3 types. A standard profiler, transaction tracing and APM.

Service Profiler is a performance analysis tool used by teams at Microsoft running large-scale services in the cloud, and is optimized for troubleshooting issues in production. – Microsoft

Playing with the online demo

You can play with their online demo to get an idea of what type of data it collects.

Here is a screenshot showing how it plots out the performance of a single action in your app, which is a cool visual to understand percentiles.

Service Profiler diagram of request performance

Service Profiler diagram of request performance

If you select a trace for a specific request, you can dive in to lots of gory details.

Azure Service Profiler trace view

Service Profiler individual trace

Traces are full of details, lots of details

BLOCKED_TIME, EventData, OTHER, dynamicClass_lamda_method, C3PO, R2D2, etc

My immediate reaction to this is… WTF does all this mean?

My screenshot above is just a fraction of the entire trace of what it collected. It provides an overwhelming amount of detail. I feel like I should have a computer science degree to figure it out (which I don’t have). Out of all details it provides, all I can really tell is it looks like my request is doing some database queries. However, I can’t tell what the SQL query was. So… ?

It looks like Microsoft was aiming to help provide every possible detail to help developers solve really hard problems. If I was doing hard core performance tuning, I could see how this could be useful. But if all I want to know is why did my request take 3 seconds… it provides an avalanche of data.

I just want to know what the SQL query is that was slow. I want actionable data I can quickly understand, fix the problem, and go on about my day.

Trying the Service Profiler on my dev box

Being able to use it on my dev box is awesome! I can totally see using this for performance tuning during development, just as you would use the Visual Studio Profiler or ANTS. Installing it is simple. I logged in to http://azureserviceprofiler.com and created a data cube for my dev box. Downloaded the agent and started it up. It runs as a simple console app. You can see how it subscribes to various ETW events. It also easy to install on Azure App Services via Application Insights.

Service Profiler running on my dev box

By default it only profiles 5% of your requests and you can modify the sampling rate to adjust it as you see fit. For a dev box you probably want to increase it to 100% sampling so you can quickly find any request to inspect. BTW, it will be interesting to see how it compares to Prefix over time. The combination of the two would be amazing.

After changing it to sample 100% and letting my browser auto refresh a page for a while, I went back in and played with the data it collected.

Viewing exceptions

It noticed that my request has an exception on every request that gets thrown away. That is really nice.

Service Profiler Exceptions

When I selected a specific trace I was able to find my exception in the trace.

Exception in Trace

Viewing SQL queries… were called, not the query

Like the online demo, I can tell that my code is running 8 SQL queries, but I can’t see what the SQL statements are or any real details about it. To be really useful, you need the raw SQL statements.

Trace view showing 8 SQL Queries

HTTP call example – Code to trace comparison

OK this time, let’s compare my code to what the trace looks like.

Here is my code. A really simple MVC action that downloads a web page with the HttpClient.

        public async Task HttpClientAsync()
        {
            log.Debug("Starting HttpClient.GetStringAsync()");
            string data;
            using (HttpClient hc = new HttpClient())
            {
                data = await hc.GetStringAsync("http://stackify-nop-prod.azurewebsites.net/blog");
            }

            log.Debug("Completed HttpClient.GetStringAsync()");

            return Request.CreateResponse(HttpStatusCode.OK, data);
        }

But here is how it looks in the trace. So obviously the code only does an HTTP call and that should have taken the whole 324-330ms. In the trace it shows it took 1.15ms and then you can see a AWAIT_TIME of 324.77. The other thing that is weird is the “HTTP Activities” part is separate and that part actually shows the URL that was downloaded in only 0.04ms (not 324ms).

Service Profiler view of HTTP call

As a comparison, here is how Retrace/Prefix displays the same type of information (including the log statements).

Retrace view of HTTP Client

Finding slow methods

The best thing I have seen about the profiler is that it tracked some methods that took a lot of time in my code all by itself. In this example I can see that JSON deserialization is taking a lot of time. Awesome!

Find slow methods

Is the Azure Service Profiler really safe for production?

Microsoft claims that the profiler is build for running against production applications. From my testing, it collects a lot of detailed data. The real question is can you run it at all times like an APM solution, or is it designed to run for a short period of time to try and capture detailed data about a problem in production. Even being able to use it occasionally could be very useful for chasing down hard problems.

Service Profiler makes it easy to collect performance data while your service is handling real production load, collecting detailed request duration metrics, deep callstacks, and memory snapshots – but it also makes sure to do this in a low-impact way to minimize overhead to your system. – Microsoft

Any type of profiling or tracing of web requests adds overhead of some form. The question is really how much overhead and is it acceptable for production servers.

Performance test setup & results

I tested the Service Profiler running via App Services in tandem with Application Insights as well as standalone on a Azure VM. I used loader.io to give it some constant load. I tested the Service Profiler with all default settings, including the 5% default sampling rate.

My test apps were a demo nopCommerce app as well as a custom app that has a bunch of common test scenarios that I use for testing Retrace. I tested sync, async, and various scenarios.

Response times went up slightly. Sometimes up to 50 milliseconds higher per request, most likely when sampling kicked in for the request.

Here is a screenshot showing my CPU and memory usage difference on an Azure App Service. The chart actually starts with the Service Profiler enabled. After it is disabled you can see that memory goes down a lot and the CPU (as measured in seconds here) went down about 10%. That 10% (relative) or so CPU change was consistent in my testing on an Azure VM as well.

So is it safe for production?

All types of profiling, tracing, or logging add some amount of overhead. From my testing, I would say it is safe to use in production. Overall the CPU and response times increased 5-20% (relative) which is relatively low and similar to other APM solutions. It would never be zero. So yes it is safe!

Would I recommend running it on production non stop?

Probably not since the data it collects isn’t very valuable unless you are trying to troubleshoot a really complicated problem. If all you want is stats around how long web requests are taking, Application Insights or Retrace is a better option and probably have less overhead. Since it can’t do things like you show you a SQL query, that also greatly limits the functionality for me. But I still believe it is an awesome tool for solving hard problems, it is just too complicated to use for simple problems. I can see using it in QA for performance tuning for sure!

The other unknown is what Microsoft will charge for the Azure Service Profiler once it comes out of preview. Perhaps it is just bundled in to the pricing of Application Insights or it could be a premium feature.

Overall, Microsoft has done a good job optimizing the overhead of it and my testing backs their stance that is it designed to be used in production.

How the Azure Service Profiler fits in your toolbox

Developers love tools and already have access to a wide variety of tools. Including Microsoft provided tools like Visual Studio Profiler, Intellitrace, and Application Insights. Plus popular third party tools like LINQPad, Prefix, Retrace, ANTS, and others.

It is an amazing tool for collect deep performance level statistics. I would say it is perhaps a unique tool in its own category. Deep code level details like you would expect from a standard .NET profiler, but only in the scope of a single web request.

It is sort of like Visual Studio Profiler or ANTS but capable of running on a busy server to collect individual transaction traces for review.

This functionality is similar to what most APM solutions aim to provide. Currently the Service Profiler provides a lot more details, but it also isn’t easy to use.

How does it compare to the data Retrace collects?

Our #1 goal with Retrace is to build a service that is very easy to use and is also safe for production. Our presentation of the profiling output is much, much simpler to view and understand (example above about the HTTP call).

Retrace collects key details like log statements, exceptions, SQL queries, cache keys being used, and lots of other little details and packages them up in a really easy to understand format. After the Service Profiler goes GA, we will write up more of a comparison.

Have you tried the Azure Service Profiler? Have any other thoughts or tips about it? Let us know in the comments!

The post Azure Service Profiler review – How does it fit in your toolbox? appeared first on Stackify.

Read the original blog entry...

More Stories By Stackify Blog

Stackify offers the only developers-friendly solution that fully integrates error and log management with application performance monitoring and management. Allowing you to easily isolate issues, identify what needs to be fixed quicker and focus your efforts – Support less, Code more. Stackify provides software developers, operations and support managers with an innovative cloud based solution that gives them DevOps insight and allows them to monitor, detect and resolve application issues before they affect the business to ensure a better end user experience. Start your free trial now stackify.com

Latest Stories
Today we can collect lots and lots of performance data. We build beautiful dashboards and even have fancy query languages to access and transform the data. Still performance data is a secret language only a couple of people understand. The more business becomes digital the more stakeholders are interested in this data including how it relates to business. Some of these people have never used a monitoring tool before. They have a question on their mind like “How is my application doing” but no id...
Join us at Cloud Expo June 6-8 to find out how to securely connect your cloud app to any cloud or on-premises data source – without complex firewall changes. More users are demanding access to on-premises data from their cloud applications. It’s no longer a “nice-to-have” but an important differentiator that drives competitive advantages. It’s the new “must have” in the hybrid era. Users want capabilities that give them a unified view of the data to get closer to customers and grow business. The...
The current age of digital transformation means that IT organizations must adapt their toolset to cover all digital experiences, beyond just the end users’. Today’s businesses can no longer focus solely on the digital interactions they manage with employees or customers; they must now contend with non-traditional factors. Whether it's the power of brand to make or break a company, the need to monitor across all locations 24/7, or the ability to proactively resolve issues, companies must adapt to...
It is ironic, but perhaps not unexpected, that many organizations who want the benefits of using an Agile approach to deliver software use a waterfall approach to adopting Agile practices: they form plans, they set milestones, and they measure progress by how many teams they have engaged. Old habits die hard, but like most waterfall software projects, most waterfall-style Agile adoption efforts fail to produce the results desired. The problem is that to get the results they want, they have to ch...
IoT solutions exploit operational data generated by Internet-connected smart “things” for the purpose of gaining operational insight and producing “better outcomes” (for example, create new business models, eliminate unscheduled maintenance, etc.). The explosive proliferation of IoT solutions will result in an exponential growth in the volume of IoT data, precipitating significant Information Governance issues: who owns the IoT data, what are the rights/duties of IoT solutions adopters towards t...
Wooed by the promise of faster innovation, lower TCO, and greater agility, businesses of every shape and size have embraced the cloud at every layer of the IT stack – from apps to file sharing to infrastructure. The typical organization currently uses more than a dozen sanctioned cloud apps and will shift more than half of all workloads to the cloud by 2018. Such cloud investments have delivered measurable benefits. But they’ve also resulted in some unintended side-effects: complexity and risk. ...
With the introduction of IoT and Smart Living in every aspect of our lives, one question has become relevant: What are the security implications? To answer this, first we have to look and explore the security models of the technologies that IoT is founded upon. In his session at @ThingsExpo, Nevi Kaja, a Research Engineer at Ford Motor Company, discussed some of the security challenges of the IoT infrastructure and related how these aspects impact Smart Living. The material was delivered interac...
The taxi industry never saw Uber coming. Startups are a threat to incumbents like never before, and a major enabler for startups is that they are instantly “cloud ready.” If innovation moves at the pace of IT, then your company is in trouble. Why? Because your data center will not keep up with frenetic pace AWS, Microsoft and Google are rolling out new capabilities. In his session at 20th Cloud Expo, Don Browning, VP of Cloud Architecture at Turner, posited that disruption is inevitable for comp...
In 2014, Amazon announced a new form of compute called Lambda. We didn't know it at the time, but this represented a fundamental shift in what we expect from cloud computing. Now, all of the major cloud computing vendors want to take part in this disruptive technology. In his session at 20th Cloud Expo, Doug Vanderweide, an instructor at Linux Academy, discussed why major players like AWS, Microsoft Azure, IBM Bluemix, and Google Cloud Platform are all trying to sidestep VMs and containers wit...
While DevOps most critically and famously fosters collaboration, communication, and integration through cultural change, culture is more of an output than an input. In order to actively drive cultural evolution, organizations must make substantial organizational and process changes, and adopt new technologies, to encourage a DevOps culture. Moderated by Andi Mann, panelists discussed how to balance these three pillars of DevOps, where to focus attention (and resources), where organizations might...
With major technology companies and startups seriously embracing Cloud strategies, now is the perfect time to attend 21st Cloud Expo October 31 - November 2, 2017, at the Santa Clara Convention Center, CA, and June 12-14, 2018, at the Javits Center in New York City, NY, and learn what is going on, contribute to the discussions, and ensure that your enterprise is on the right path to Digital Transformation.
No hype cycles or predictions of zillions of things here. IoT is big. You get it. You know your business and have great ideas for a business transformation strategy. What comes next? Time to make it happen. In his session at @ThingsExpo, Jay Mason, Associate Partner at M&S Consulting, presented a step-by-step plan to develop your technology implementation strategy. He discussed the evaluation of communication standards and IoT messaging protocols, data analytics considerations, edge-to-cloud tec...
When growing capacity and power in the data center, the architectural trade-offs between server scale-up vs. scale-out continue to be debated. Both approaches are valid: scale-out adds multiple, smaller servers running in a distributed computing model, while scale-up adds fewer, more powerful servers that are capable of running larger workloads. It’s worth noting that there are additional, unique advantages that scale-up architectures offer. One big advantage is large memory and compute capacity...
New competitors, disruptive technologies, and growing expectations are pushing every business to both adopt and deliver new digital services. This ‘Digital Transformation’ demands rapid delivery and continuous iteration of new competitive services via multiple channels, which in turn demands new service delivery techniques – including DevOps. In this power panel at @DevOpsSummit 20th Cloud Expo, moderated by DevOps Conference Co-Chair Andi Mann, panelists examined how DevOps helps to meet the de...
Cloud applications are seeing a deluge of requests to support the exploding advanced analytics market. “Open analytics” is the emerging strategy to deliver that data through an open data access layer, in the cloud, to be directly consumed by external analytics tools and popular programming languages. An increasing number of data engineers and data scientists use a variety of platforms and advanced analytics languages such as SAS, R, Python and Java, as well as frameworks such as Hadoop and Spark...