Azure Service Profiler review – How does it fit in your toolbox?

About a year ago Microsoft released the Azure Service Profiler, which is designed to be a lightweight profiler for ASP.NET applications. They recently enabled it to work with Application Insights, and it is easy to enable for Azure App Services. Since we use App Services and love anything to do with app performance, I thought I would give it a try and see how it compares to other tools.

Note: The Service Profiler is still advertised as a “preview” offering and is not GA.

What is the Azure Service Profiler?

It is a transaction profiler for ASP.NET apps. It is designed to work with ASP.NET apps deployed anywhere, even outside of Azure. However, it uploads the collected data to Azure table storage where the data is then processed by Microsoft. So the name “Azure Service Profiler” is perhaps a little confusing because it can profile more than Azure. It also isn’t a true “.NET CLR profiler” because it uses ETW for data collection, not normal code profiling techniques.

It is designed to collect data in relation to individual web requests, or essentially individual transaction traces. I have written before about how there are 3 types of .NET profilers. Service Profiler is weirdly a mix of all 3 types: a standard profiler, transaction tracing, and APM.

Service Profiler is a performance analysis tool used by teams at Microsoft running large-scale services in the cloud, and is optimized for troubleshooting issues in production. – Microsoft

Playing with the online demo

You can play with their online demo to get an idea of what type of data it collects.

Here is a screenshot showing how it plots out the performance of a single action in your app, which is a cool visual to understand percentiles.

Service Profiler diagram of request performance
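
To make the percentile idea concrete, here is a minimal C# sketch that computes nearest-rank percentiles from a list of request durations. The `LatencyPercentiles` class and the sample durations are made up for illustration. This is not how the Service Profiler builds its chart, just one common way to arrive at p50/p90/p99 numbers.

```csharp
using System;
using System.Linq;

public static class LatencyPercentiles
{
    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    public static double Percentile(double[] durationsMs, double p)
    {
        if (durationsMs == null || durationsMs.Length == 0)
            throw new ArgumentException("At least one sample is required.", nameof(durationsMs));
        if (p <= 0 || p > 100)
            throw new ArgumentOutOfRangeException(nameof(p), "Use a value in (0, 100].");

        double[] sorted = durationsMs.OrderBy(d => d).ToArray();
        int rank = (int)Math.Ceiling(p * sorted.Length / 100.0); // 1-based rank
        return sorted[rank - 1];
    }

    public static void Main()
    {
        // Hypothetical request durations in milliseconds, not real profiler output.
        double[] samples = { 120, 95, 110, 3000, 105, 130, 98, 101, 115, 450 };

        Console.WriteLine($"p50 = {Percentile(samples, 50)} ms"); // p50 = 110 ms
        Console.WriteLine($"p90 = {Percentile(samples, 90)} ms"); // p90 = 450 ms
        Console.WriteLine($"p99 = {Percentile(samples, 99)} ms"); // p99 = 3000 ms
    }
}
```

The point of the example data is that a single 3-second outlier barely moves the median but dominates the high percentiles, which is why a percentile view is more useful than an average.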

If you select a trace for a specific request, you can dive in to lots of gory details.

Azure Service Profiler trace view

Service Profiler individual trace

Traces are full of details, lots of details

BLOCKED_TIME, EventData, OTHER, dynamicClass_lamda_method, C3PO, R2D2, etc

My immediate reaction to this is… WTF does all this mean?

My screenshot above is just a fraction of the entire trace it collected. It provides an overwhelming amount of detail. I feel like I should have a computer science degree to figure it out (which I don’t have). Out of all the details it provides, all I can really tell is that my request appears to be doing some database queries. However, I can’t tell what the SQL query was. So… ?

It looks like Microsoft was aiming to provide every possible detail to help developers solve really hard problems. If I were doing hardcore performance tuning, I could see how this could be useful. But if all I want to know is why my request took 3 seconds… it provides an avalanche of data.

I just want to know what the SQL query is that was slow. I want actionable data I can quickly understand, fix the problem, and go on about my day.

Trying the Service Profiler on my dev box

Being able to use it on my dev box is awesome! I can totally see using this for performance tuning during development, just as you would use the Visual Studio Profiler or ANTS. Installing it is simple. I logged in to http://azureserviceprofiler.com, created a data cube for my dev box, downloaded the agent, and started it up. It runs as a simple console app. You can see how it subscribes to various ETW events. It is also easy to install on Azure App Services via Application Insights.

Service Profiler running on my dev box

By default it only profiles 5% of your requests and you can modify the sampling rate to adjust it as you see fit. For a dev box you probably want to increase it to 100% sampling so you can quickly find any request to inspect. BTW, it will be interesting to see how it compares to Prefix over time. The combination of the two would be amazing.
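
As a rough illustration of what a sampling rate means in practice, here is a minimal C# sketch of a random per-request sampler. `RequestSampler` is a hypothetical class written for this post. It is not the Service Profiler's actual mechanism or configuration API, only a way to picture how a 5% rate leaves most requests untouched while a 100% rate captures all of them.

```csharp
using System;

public sealed class RequestSampler
{
    private readonly double _rate;
    private readonly Random _random;

    // rate is a fraction between 0.0 (profile nothing) and 1.0 (profile everything).
    // The optional seed only exists to make runs repeatable.
    public RequestSampler(double rate, int? seed = null)
    {
        if (rate < 0.0 || rate > 1.0)
            throw new ArgumentOutOfRangeException(nameof(rate), "Use a value between 0.0 and 1.0.");

        _rate = rate;
        _random = seed.HasValue ? new Random(seed.Value) : new Random();
    }

    // NextDouble() returns a value in [0.0, 1.0), so a rate of 1.0 always samples
    // and a rate of 0.0 never does. Random is not thread-safe, so a real server
    // would need locking or one instance per thread.
    public bool ShouldProfile() => _random.NextDouble() < _rate;

    public static void Main()
    {
        var sampler = new RequestSampler(0.05, seed: 42);
        int total = 100000;
        int profiled = 0;

        for (int i = 0; i < total; i++)
        {
            if (sampler.ShouldProfile()) profiled++;
        }

        Console.WriteLine($"Profiled {profiled} of {total} simulated requests");
    }
}
```

With a low rate on a quiet dev box, you could refresh a page many times before a request happens to be captured, which is why raising the rate to 100% makes sense there.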

After changing it to sample 100% and letting my browser auto refresh a page for a while, I went back in and played with the data it collected.

Viewing exceptions

It noticed that an exception is thrown and then swallowed on every request. That is really nice.

Service Profiler Exceptions

When I selected a specific trace I was able to find my exception in the trace.

Exception in Trace

Viewing SQL queries… only that they were called, not the query

As with the online demo, I can tell that my code is running 8 SQL queries, but I can’t see what the SQL statements are or any real details about them. To be really useful, you need the raw SQL statements.

Trace view showing 8 SQL Queries

HTTP call example – Code to trace comparison

OK this time, let’s compare my code to what the trace looks like.

Here is my code. A really simple MVC action that downloads a web page with the HttpClient.

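        // Assumes usings for System.Net, System.Net.Http, and System.Threading.Tasks,
        // plus a logger field named log on the controller.
        public async Task<HttpResponseMessage> HttpClientAsync()
        {
            log.Debug("Starting HttpClient.GetStringAsync()");
            string data;
            // A short-lived HttpClient keeps the sample simple.
            using (HttpClient hc = new HttpClient())
            {
                // The await yields the thread while the page downloads.
                data = await hc.GetStringAsync("http://stackify-nop-prod.azurewebsites.net/blog");
            }

            log.Debug("Completed HttpClient.GetStringAsync()");

            return Request.CreateResponse(HttpStatusCode.OK, data);
        }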

But here is how it looks in the trace. The code does nothing but make an HTTP call, so that call should account for the whole 324-330ms. Instead, the trace shows the method taking 1.15ms, followed by an AWAIT_TIME of 324.77ms. The other weird thing is that the “HTTP Activities” part is separate, and that part shows the URL being downloaded in only 0.04ms (not 324ms).

Service Profiler view of HTTP call
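
One plausible reading of those numbers, which the trace itself does not confirm, is that the small figure covers the code that runs before the method yields, while the waiting is booked separately as await time. The C# sketch below shows that split with a `Stopwatch`. `AwaitTimingDemo` is hypothetical, and `Task.Delay` stands in for the network wait of `HttpClient.GetStringAsync`, so no real HTTP call is made.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class AwaitTimingDemo
{
    // Returns the time spent running code before the await,
    // and the total elapsed time once the awaited work completes.
    public static async Task<(TimeSpan BeforeAwait, TimeSpan Total)> MeasureAsync(int simulatedIoMs)
    {
        var stopwatch = Stopwatch.StartNew();

        // Starting the operation returns almost immediately with a pending task.
        Task pendingIo = Task.Delay(simulatedIoMs);
        TimeSpan beforeAwait = stopwatch.Elapsed;

        // Nearly all of the wall-clock time passes here, while the method is suspended.
        await pendingIo;

        return (beforeAwait, stopwatch.Elapsed);
    }

    public static void Main()
    {
        var (beforeAwait, total) = MeasureAsync(300).GetAwaiter().GetResult();

        Console.WriteLine($"Code before the await: {beforeAwait.TotalMilliseconds:F2} ms");
        Console.WriteLine($"Total including the await: {total.TotalMilliseconds:F2} ms");
    }
}
```

Seen this way, a tiny method time next to a large await time is what you would expect from an async action that mostly waits on I/O.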

As a comparison, here is how Retrace/Prefix displays the same type of information (including the log statements).

Retrace view of HTTP Client

Finding slow methods

The best thing I have seen about the profiler is that it tracked some methods that took a lot of time in my code all by itself. In this example I can see that JSON deserialization is taking a lot of time. Awesome!

Find slow methods

Is the Azure Service Profiler really safe for production?

Microsoft claims that the profiler is built for running against production applications. From my testing, it collects a lot of detailed data. The real question is: can you run it at all times like an APM solution, or is it designed to run for a short period of time to try to capture detailed data about a problem in production? Even being able to use it occasionally could be very useful for chasing down hard problems.

Service Profiler makes it easy to collect performance data while your service is handling real production load, collecting detailed request duration metrics, deep callstacks, and memory snapshots – but it also makes sure to do this in a low-impact way to minimize overhead to your system. – Microsoft

Any type of profiling or tracing of web requests adds overhead of some form. The question is really how much overhead and is it acceptable for production servers.

Performance test setup & results

I tested the Service Profiler running via App Services in tandem with Application Insights as well as standalone on an Azure VM. I used loader.io to give it some constant load. I tested the Service Profiler with all default settings, including the 5% default sampling rate.

My test apps were a demo nopCommerce app as well as a custom app that has a bunch of common test scenarios that I use for testing Retrace. I tested sync, async, and various scenarios.

Response times went up slightly. Sometimes up to 50 milliseconds higher per request, most likely when sampling kicked in for the request.

Here is a screenshot showing my CPU and memory usage difference on an Azure App Service. The chart actually starts with the Service Profiler enabled. After it is disabled you can see that memory goes down a lot and the CPU (as measured in seconds here) went down about 10%. That 10% (relative) or so CPU change was consistent in my testing on an Azure VM as well.
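
Since “relative” is easy to misread, here is a small C# sketch of the arithmetic. The `OverheadMath` class and the CPU figures are hypothetical and are not my measured numbers. They only show the difference between a relative change and a change in percentage points.

```csharp
using System;

public static class OverheadMath
{
    // Relative increase: how much larger the profiled value is,
    // expressed as a percentage of the baseline value.
    public static double RelativeIncreasePercent(double baseline, double withProfiler)
    {
        if (baseline <= 0)
            throw new ArgumentOutOfRangeException(nameof(baseline), "Baseline must be positive.");

        return (withProfiler - baseline) / baseline * 100.0;
    }

    public static void Main()
    {
        // Hypothetical CPU utilization figures, chosen only to illustrate the math.
        double baselineCpu = 40.0;  // percent CPU with the profiler off
        double profiledCpu = 44.0;  // percent CPU with the profiler on

        double relative = RelativeIncreasePercent(baselineCpu, profiledCpu);
        double points = profiledCpu - baselineCpu;

        Console.WriteLine($"Relative increase: {relative:F1}%");
        Console.WriteLine($"Absolute increase: {points:F1} percentage points");
    }
}
```

In this made-up case a 10% relative increase is only 4 percentage points of CPU, which is the sense in which the overhead figures in this post should be read.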

So is it safe for production?

All types of profiling, tracing, or logging add some amount of overhead. From my testing, I would say it is safe to use in production. Overall the CPU and response times increased 5-20% (relative) which is relatively low and similar to other APM solutions. It would never be zero. So yes it is safe!

Would I recommend running it on production nonstop?

Probably not, since the data it collects isn’t very valuable unless you are trying to troubleshoot a really complicated problem. If all you want is stats around how long web requests are taking, Application Insights or Retrace is a better option and probably has less overhead. Since it can’t do things like show you a SQL query, that also greatly limits the functionality for me. But I still believe it is an awesome tool for solving hard problems; it is just too complicated to use for simple problems. I can see using it in QA for performance tuning for sure!

The other unknown is what Microsoft will charge for the Azure Service Profiler once it comes out of preview. Perhaps it is just bundled in to the pricing of Application Insights or it could be a premium feature.

Overall, Microsoft has done a good job optimizing its overhead, and my testing backs their stance that it is designed to be used in production.

How the Azure Service Profiler fits in your toolbox

Developers love tools and already have access to a wide variety of them, including Microsoft-provided tools like Visual Studio Profiler, IntelliTrace, and Application Insights, plus popular third-party tools like LINQPad, Prefix, Retrace, ANTS, and others.

The Service Profiler is an amazing tool for collecting deep performance-level statistics. I would say it is perhaps a unique tool in its own category: deep code-level details like you would expect from a standard .NET profiler, but only in the scope of a single web request.

It is sort of like Visual Studio Profiler or ANTS but capable of running on a busy server to collect individual transaction traces for review.

This functionality is similar to what most APM solutions aim to provide. Currently the Service Profiler provides a lot more details, but it also isn’t easy to use.

How does it compare to the data Retrace collects?

Our #1 goal with Retrace is to build a service that is very easy to use and is also safe for production. Our presentation of the profiling output is much, much simpler to view and understand (see the HTTP call example above).

Retrace collects key details like log statements, exceptions, SQL queries, cache keys being used, and lots of other little details and packages them up in a really easy to understand format. After the Service Profiler goes GA, we will write up more of a comparison.

Have you tried the Azure Service Profiler? Have any other thoughts or tips about it? Let us know in the comments!

The post Azure Service Profiler review – How does it fit in your toolbox? appeared first on Stackify.
