Thursday, 18 May, 2017 UTC


Summary

JavaScript errors can happen for many reasons: untested behavior of certain browsers, a genuine coding mistake that slipped through the delivery pipeline, poorly handled timeouts, and I am sure the list goes on.
In this blog post we discuss yet another reason, brought to my attention by Gerald Madlsperger, one of our long-term Dynatrace AppMon super users: a CDN server issue that left static resource files (CSS) undelivered, which in turn led to spikes in JavaScript errors.
Let’s review the steps Gerald took to identify and analyze the issue, and learn which metrics he looked at on both the end-user and server sides. And, because not everyone is fortunate enough to have an expert like Gerald on their team, we show you how Dynatrace automates these steps through Problem Pattern Detection and Artificial Intelligence.

The Impact: JavaScript Exception Spikes

The problem Gerald dealt with showed up as a daily spike in JavaScript errors for a particular web application. The spike always occurred at the same time, between 10:30 and 10:35 a.m.:
Chart of the number of JavaScript errors captured through Dynatrace AppMon real user monitoring
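To confirm that a spike recurs at the same time of day, you can bucket error timestamps into fixed time-of-day slots and check which slot is the busiest on each day. The following minimal Python sketch is not an AppMon feature. It assumes you have exported the error timestamps as `datetime` values. The function name `daily_peak_slots` and the 5-minute slot width are our own illustrative choices.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from datetime import date, datetime


def daily_peak_slots(timestamps: list[datetime], slot_minutes: int = 5) -> Counter[str]:
    """Count how often each time-of-day slot (e.g. "10:30") was the busiest slot of its day.

    Hypothetical helper: `timestamps` is an exported list of error times.
    """
    per_day: dict[date, Counter[str]] = defaultdict(Counter)
    for ts in timestamps:
        # Round the minute down to the start of its slot; the date is tracked separately.
        slot_start = ts.minute - ts.minute % slot_minutes
        per_day[ts.date()][f"{ts.hour:02d}:{slot_start:02d}"] += 1
    # For every day, keep the slot with the most errors (ties go to the slot seen first).
    return Counter(slots.most_common(1)[0][0] for slots in per_day.values())
```

If a single slot is the peak on (nearly) every day, the spike is a recurring event rather than random noise.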
The impact was visible across every browser and every geographic location. So it was not a browser-specific issue that had simply been missed in testing, nor was it a connection or timeout problem confined to a particular region.
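One way to check that a spike is not confined to one browser or region is to compute, per group, the share of that group's errors that fall inside the spike window. Similar shares across all groups point to a cause common to everyone. This is a minimal sketch. It assumes each exported error is a mapping with hypothetical keys such as "browser", "geo" and "time" (zero-padded "HH:MM").

```python
from __future__ import annotations

from collections import Counter
from typing import Callable, Iterable, Mapping


def spike_share_by(
    errors: Iterable[Mapping[str, str]],
    dimension: str,
    in_spike: Callable[[Mapping[str, str]], bool],
) -> dict[str, float]:
    """For each value of `dimension` (e.g. "browser" or "geo"), return the
    fraction of that group's errors that fall inside the spike window."""
    total: Counter[str] = Counter()
    spiking: Counter[str] = Counter()
    for err in errors:
        group = err[dimension]
        total[group] += 1
        if in_spike(err):
            spiking[group] += 1
    return {group: spiking[group] / n for group, n in total.items()}
```

For example, `spike_share_by(errors, "browser", lambda e: "10:30" <= e["time"] < "10:35")` compares browsers. Running it again with `"geo"` does the same for regions.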

The Problem: Object Not Found Errors

To learn more about these errors, Gerald compared the types of errors occurring before the spike with those occurring during it. He wanted to see whether a certain pattern, or type of JavaScript error, occurred more often within that time frame, hoping this would take him one step closer to the root cause.
It turned out that the JavaScript errors that became more frequent during those five minutes all involved HTML objects that some of the JavaScript code could not find on the page:
Comparing the JavaScript errors that happen in two different time frames makes it easy to see which errors are causing the spike.
So the problem was not necessarily bad JavaScript code; it was more likely related to page components that were missing or could not be loaded.
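The before/during comparison Gerald made in AppMon can be approximated offline with a multiset difference of error messages. This minimal sketch assumes two windows of equal length, for example the five minutes before the spike and the five spike minutes. It also assumes each window is a plain list of raw error messages. The function name is hypothetical.

```python
from collections import Counter


def errors_rising_in_spike(before: list[str], during: list[str], top: int = 5) -> list[tuple[str, int]]:
    """Return the error messages whose count grew the most from `before` to `during`.

    Both lists should cover windows of equal length so that raw counts are comparable.
    """
    # Counter subtraction keeps only messages whose count increased.
    increase = Counter(during) - Counter(before)
    return increase.most_common(top)
```

If the result is dominated by "object not found" style messages, that points toward elements missing from the page rather than broken logic, as in Gerald's case.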

Root Cause: Slow CDNs Caused by a Bad CRON Job

Gerald’s next step was to drill down into some of these real end-user browser sessions to see whether anything else about them was abnormal. It turned out that most of these users had one thing in common: a very slow-responding CDN server:
User Action PurePaths show that content from their CDN servers was extremely slow to download.
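The drill-down step can be expressed as a simple comparison. Among sessions with JavaScript errors, how many loaded at least one slow resource from the CDN? How does that compare with error-free sessions? This is a minimal sketch under stated assumptions. The `Session` shape, the host name and the 2000 ms threshold are hypothetical and not an AppMon export format.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical export shape for one browser session."""
    had_js_error: bool
    # (host, download time in ms) for each resource the page loaded.
    resources: list[tuple[str, float]] = field(default_factory=list)


def slow_cdn_share(sessions: list[Session], cdn_host: str, threshold_ms: float = 2000.0) -> tuple[float, float]:
    """Return the fraction of sessions with at least one slow resource from `cdn_host`:
    first among sessions with JavaScript errors, then among error-free sessions."""

    def has_slow_cdn(session: Session) -> bool:
        return any(host == cdn_host and ms > threshold_ms for host, ms in session.resources)

    def share(group: list[Session]) -> float:
        return sum(has_slow_cdn(s) for s in group) / len(group) if group else 0.0

    with_errors = [s for s in sessions if s.had_js_error]
    without_errors = [s for s in sessions if not s.had_js_error]
    return share(with_errors), share(without_errors)
```

A large gap between the two fractions is the programmatic equivalent of noticing that "most of these users had one thing in common".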
As a final step, Gerald created the following chart, which correlates the number of JavaScript errors with the download time from that CDN server. Now it was clear that every day between 10:30 and 10:35 a.m. there was a download time spike on the CDN server that coincided with the spike in JavaScript errors:
Clear correlation between slow CDN download times and spikes in the number of JavaScript errors.
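The chart makes the correlation visible. If you also want a single number for it, the Pearson correlation coefficient of the two per-minute series is a common choice. Here is a plain-Python sketch that assumes both series are sampled at the same timestamps. On Python 3.10+, `statistics.correlation` computes the same coefficient.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equally long series,
    e.g. JavaScript errors per minute and CDN download time per minute."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)
```

A value close to 1 says the two series rise and fall together. It does not by itself prove causation, which is why talking to the systems engineers was still necessary.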

CRON Jobs to Blame

When Gerald discussed this data with the systems engineers, it turned out that two of their CDN servers ran the same CRON job for log file rotation at exactly the same time every day. This caused a brief CDN outage, which delayed or prevented the loading of static CSS files, which in turn resulted in the JavaScript code raising “object not found” errors.
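Collisions like this can be spotted before they cause an outage by comparing the crontabs of servers that share a role. The following Python sketch is deliberately simplified and hypothetical. It only understands fixed numeric minute and hour fields and treats every such entry as a daily job. It ignores ranges, lists, step values such as `*/5`, and `@daily`-style shortcuts.

```python
from __future__ import annotations

from collections import defaultdict


def coinciding_jobs(crontabs: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map "HH:MM" to the servers that start a job at that time, keeping only
    times shared by more than one server. `crontabs` maps server name -> crontab lines."""
    servers_at: dict[str, list[str]] = defaultdict(list)
    for server, lines in crontabs.items():
        for line in lines:
            fields = line.split()
            # Skip comments, blank lines and environment settings such as MAILTO=...
            if line.lstrip().startswith("#") or len(fields) < 6:
                continue
            minute, hour = fields[0], fields[1]
            # Simplification: only fixed numeric times are considered.
            if minute.isdigit() and hour.isdigit():
                servers_at[f"{int(hour):02d}:{int(minute):02d}"].append(server)
    return {time: servers for time, servers in servers_at.items() if len(set(servers)) > 1}
```

Any time slot this returns is a candidate for staggering, so that servers behind the same CDN never rotate their logs simultaneously.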

Better Rely on Dynatrace When Gerald Is Not There for You!

First, hats off to Gerald for doing a great job digging through the Dynatrace AppMon data. Also, thanks for sharing the dashboards, which are useful when dealing with CDNs or third parties.
While Dynatrace AppMon collects all this data to make troubleshooting these problems easier, you still need to know how to navigate it. Because of scenarios shared by Gerald and others over the years, we have invested significantly in automating error, problem, and root-cause detection.
In the latest versions of Dynatrace AppMon (sign up for your lifetime AppMon Personal License) we automate problem pattern detection, and highlight the “Top Findings” for both End-User and Server-Side Performance Hotspots in the Dynatrace AppMon Web Interface:
Dynatrace AppMon automatically shows you the top findings on why end-user or server-side performance is impacted.
In the Dynatrace SaaS/Managed platform (sign up for our Dynatrace SaaS trial) we went a step further by running all this data through our Artificial Intelligence Engine. A problem like the one Gerald detected would show up in a Problem Ticket that includes information on the impact and root cause. This allows you to analyze and fix these problems even if you don’t have an expert like Gerald on your team, which means you can spend more time innovating rather than bug hunting.
Dynatrace Artificial Intelligence automatically shows you the impact and root cause of any type of end-user, server-side, or infrastructure issue.
If you have stories like this one that you want to share with your peers, please let us know: send me an email to share your PurePath or your best artificially detected problem pattern.
The post Correlating JavaScript Errors with Slow CDN Performance appeared first on Dynatrace blog – monitoring redefined.