By Andreas Grabner
April 27, 2014 03:00 PM EDT
No matter how often you deploy your application or how sophisticated your delivery pipeline is, you always need to know the quality status of the software you are building. That can only be done if you measure it; but measure what exactly? In Part 1 we introduced the Concept of Quality Metrics in CD (Continuous Delivery) by looking at the metric # of Requests per End User Action. In Part 2 we will focus on metrics around database access.
You need to be aware of bad database access patterns as soon as they are introduced in your code, whether the cause is an incorrectly configured O/R (Object Relational) Mapper such as Hibernate, TopLink or JDO, or simply bad coding. Finding these problems immediately by looking at the right metrics makes it easier for developers to fix them, which reduces test cycles and gives operations more confidence that a new deployment will not blow up their current database server.
Examples of Bad Database Access Patterns
The following example is taken from a web application that displays an account report with 25 items per page, showing the names and high-level statuses of these accounts. The developers decided to use Hibernate to access the account objects stored in the database. A closer look at the SQL queries generated for this report reveals that Hibernate loads not only the account objects that match the search query, but all account objects, including all referenced objects in a related history table. These history objects are not needed at all to generate the report. The way the application uses Hibernate (premature loading of all objects and referenced objects) results in more than 4,000 SQL executions that contribute 6 seconds to the total page load time:
The way Hibernate is used by the application results in 4k+ individual SQL Statements, returning much more data than is actually needed for the report
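The pattern described above can be reproduced in miniature. The following is a minimal sketch, not the application's actual code: it uses Python's built-in sqlite3 module instead of Hibernate, and the table layout, the function names and the demo data are all assumptions made for illustration. It contrasts loading every account plus its history with fetching only the one page the report needs, and counts the statements each approach executes.

```python
import sqlite3

def build_demo_db(accounts=100, history_per_account=3):
    # Hypothetical schema: an account table and a related history table.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, name TEXT, status TEXT)")
    conn.execute("CREATE TABLE history (id INTEGER PRIMARY KEY, account_id INTEGER, note TEXT)")
    for i in range(accounts):
        conn.execute(
            "INSERT INTO account (id, name, status) VALUES (?, ?, ?)",
            (i, f"acct-{i}", "active"),
        )
        for j in range(history_per_account):
            conn.execute(
                "INSERT INTO history (account_id, note) VALUES (?, ?)",
                (i, f"event-{j}"),
            )
    return conn

def count_statements(conn, work):
    # Record every statement SQLite executes while work(conn) runs.
    executed = []
    conn.set_trace_callback(executed.append)
    try:
        result = work(conn)
    finally:
        conn.set_trace_callback(None)
    return result, len(executed)

def eager_report(conn, page_size=25):
    # Anti-pattern: load all accounts, then the history of each one,
    # although the report shows one page and never uses the history.
    accounts = conn.execute(
        "SELECT id, name, status FROM account ORDER BY id"
    ).fetchall()
    for account_id, _name, _status in accounts:
        conn.execute(
            "SELECT note FROM history WHERE account_id = ?", (account_id,)
        ).fetchall()
    return [(name, status) for _id, name, status in accounts[:page_size]]

def paged_report(conn, page_size=25):
    # Fetch only the columns and rows that the report page displays.
    return conn.execute(
        "SELECT name, status FROM account ORDER BY id LIMIT ?", (page_size,)
    ).fetchall()

if __name__ == "__main__":
    db = build_demo_db()
    for report in (eager_report, paged_report):
        rows, statements = count_statements(db, report)
        print(report.__name__, "rows:", len(rows), "statements:", statements)
```

Both functions return the same 25 rows, but the first issues one statement per account on top of the initial query, so its statement count grows with the size of the table rather than with the size of the page.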
If you want to learn more about database access problems, check out "load balancers cause database locks", "when it is really the database to blame", or the "Understanding Hibernate" series: Part I - Session Cache, Part II - Query Cache and Part III - Second Level Cache.
Metric: Total Number of SQL Statements per Transaction
The first metric you want to take a closer look at is the total number of SQL executions per transaction. If you want to go a step further you can even monitor SELECT, INSERT, DELETE and UPDATE statements separately.
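As a rough illustration of this metric, the sketch below takes the SQL statements captured for one transaction and reports the total as well as separate counts for SELECT, INSERT, UPDATE and DELETE. How the statements are captured (framework logging, a driver hook, a tracing tool) is left open here; the function name `sql_metrics` and the `OTHER` bucket are assumptions made for this example.

```python
from collections import Counter

TRACKED_VERBS = ("SELECT", "INSERT", "UPDATE", "DELETE")

def sql_metrics(statements):
    """Count the SQL statements of one transaction, in total and per verb.

    `statements` is a list of SQL strings captured for that transaction.
    Anything that is not one of the tracked verbs (e.g. COMMIT) is counted
    under the hypothetical "OTHER" bucket.
    """
    counts = Counter({verb: 0 for verb in TRACKED_VERBS})
    counts["OTHER"] = 0
    for sql in statements:
        stripped = sql.lstrip()
        # The first word of the statement decides the bucket.
        verb = stripped.split(None, 1)[0].upper() if stripped else ""
        counts[verb if verb in TRACKED_VERBS else "OTHER"] += 1
    result = dict(counts)
    result["TOTAL"] = len(statements)
    return result

if __name__ == "__main__":
    captured = [
        "SELECT id FROM account WHERE name = ?",
        "UPDATE account SET status = ? WHERE id = ?",
        "COMMIT",
    ]
    print(sql_metrics(captured))
```

Classifying by the first keyword is deliberately simple; statements that start with a comment or a WITH clause would land in the `OTHER` bucket and need extra handling.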
If you're always aware of how many database statements are executed for your individual transactions (Login, Search, Checkout) and you monitor this along the delivery pipeline for every build, you will immediately see how newly added functionality impacts the load on your database. The following screenshot shows a way to track this number across builds and across your different deployment stages. In this scenario, developers extended the search feature in Build #3 by making an additional call to a third-party recommendation service. Build #3 suddenly shows a huge spike in SQL queries in the Load Stage and Production. Why is that?
A new call to an external third-party service introduced with Build 3 has major impacts on the load (capacity stage) and production environment when this new feature has to deal with real production data
What can we learn from the metrics above?
- Commit stage: The unit tests executed in this stage didn't catch the problem because the call to the third-party service was mocked, so the service executed no actual DB calls.
- Acceptance stage: A change in behavior was detected because the additional call to the recommendation service executed 2 additional SQLs. This could already be a warning sign but will probably still go unnoticed.
- Capacity stage: A 200x increase in DB calls must stop this deployment so that it never reaches Live. The fact that the increase is 200x rather than just two additional calls should also prompt the team to run the integration tests against more than sample database content.
- Production: 4k SQLs instead of just two has a huge impact on production. This can again be explained by the production database holding "real life" data and by the database access pattern of this third-party service, which queries every piece of data. If this change really does make it into production and we monitor this data down to the transaction level, it is easy to roll the change back, and engineering can immediately start working on the problematic area. Most important, however, is that this build should never make it into production, because the problem was already found in testing!
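The idea of stopping a deployment on such an increase can be sketched as a small pipeline check. This is a minimal illustration under stated assumptions: the function name, the thresholds (`max_ratio`, `max_extra`) and the sample numbers are hypothetical and not taken from the scenario above. The check compares the SQL count per transaction of the current build against the previous build in the same stage and reports every transaction that grew beyond the allowed margin.

```python
def check_sql_regression(baseline, current, max_ratio=2.0, max_extra=5):
    """Compare SQL statements per transaction between two builds.

    `baseline` and `current` map a transaction name to its SQL count.
    A transaction is flagged when its count exceeds both the relative
    limit (baseline * max_ratio) and the absolute limit
    (baseline + max_extra). Both thresholds are assumptions to be tuned
    per project. Returns a list of messages; an empty list means the
    build may proceed.
    """
    violations = []
    for transaction, count in sorted(current.items()):
        previous = baseline.get(transaction)
        if previous is None:
            continue  # new transaction, nothing to compare against yet
        allowed = max(previous * max_ratio, previous + max_extra)
        if count > allowed:
            violations.append(
                f"{transaction}: {previous} -> {count} SQL statements "
                f"(allowed up to {allowed:g})"
            )
    return violations

if __name__ == "__main__":
    # Hypothetical numbers for one stage of the pipeline.
    previous_build = {"search": 3, "login": 2}
    new_build = {"search": 600, "login": 2}
    problems = check_sql_regression(previous_build, new_build)
    for problem in problems:
        print(problem)
    raise SystemExit(1 if problems else 0)
```

Combining a relative and an absolute limit keeps the check from failing on small changes such as two additional statements, while still failing the build on increases of several orders of magnitude.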
How to Measure on Dev Workstations
Developers can look at this data by turning on the SQL logging options of the frameworks they use, e.g., Hibernate (see stackoverflow discussion). They can also profile their code using the profiler that comes with their IDE, or use tools such as the Development Edition of dynaTrace (or the 15 Days Free Trial) to see all database calls made by their own code or by the code they call. The following screenshot, taken from Database Access Patterns Gone Wild, shows which data can be analyzed on a local machine:
Developers can analyze which SQL statements are executed by their own code or third-party frameworks they use. In this case it was code executed by Telerik to populate .NET control data.