Lucidchart is a highly performant web application. When editing a Lucidchart document, users expect a high frame rate and a responsive UI. To ensure a great experience, we track frame rate and UI stalls. This article describes how we leverage an existing library, Zone.js, to accomplish this.
Why not just measure frame rate?
We could simply measure and report the frame rate directly. This would be easy to do and would offer some insight. However, frame rate as a single number is not very actionable. What if we released new code and noticed that the average frame rate dropped by 5%? We wouldn’t know what was actually causing the decrease in performance.
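For reference, measuring frame rate directly could look something like the sketch below. The helper name `averageFps` is made up for illustration; in a browser, the timestamps would typically come from `requestAnimationFrame` callbacks.

```typescript
// Average frames per second over a run of frame timestamps (in milliseconds).
// N timestamps describe N - 1 frame intervals.
function averageFps(frameTimestampsMs: number[]): number {
  if (frameTimestampsMs.length < 2) return 0;
  const first = frameTimestampsMs[0];
  const last = frameTimestampsMs[frameTimestampsMs.length - 1];
  const elapsedMs = last - first;
  if (elapsedMs <= 0) return 0;
  return ((frameTimestampsMs.length - 1) * 1000) / elapsedMs;
}

// In a browser, timestamps could be collected like this:
//   const stamps: number[] = [];
//   const onFrame = (t: number) => { stamps.push(t); requestAnimationFrame(onFrame); };
//   requestAnimationFrame(onFrame);

// Four frames rendered in 64 ms.
const steadyFps = averageFps([0, 16, 32, 48, 64]);
```

The result is a single number, which is exactly the limitation: it says that frames were slow, not which code made them slow.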
The next logical step would be to start “tagging” the code. Generally, we have a pretty good idea of which parts of the code take up the most time (saving, rendering, etc.). We could time each of these blocks of code and report them individually. Then we would know whether a regression was in one of these blocks. However, we’d never be able to instrument every chunk of code using this approach. We would miss code, and the approach is more reactive than we’d like.
Instead, we track each and every task that runs. Of course, that doesn’t sound easy. Luckily, we can leverage an existing library to accomplish it.
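A minimal sketch of this manual tagging approach is shown below. `BlockTimer` and its methods are hypothetical names, not code from our application, and the clock is injectable only to keep the sketch easy to test.

```typescript
// Accumulates wall-clock time per manually chosen tag.
class BlockTimer {
  private readonly totalsMs = new Map<string, number>();
  private readonly now: () => number;

  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  // Runs fn and adds its duration to the running total for the tag,
  // even if fn throws.
  time<T>(tag: string, fn: () => T): T {
    const start = this.now();
    try {
      return fn();
    } finally {
      const elapsed = this.now() - start;
      this.totalsMs.set(tag, (this.totalsMs.get(tag) ?? 0) + elapsed);
    }
  }

  totalFor(tag: string): number {
    return this.totalsMs.get(tag) ?? 0;
  }
}

// Usage: wrap each block of code you already suspect is expensive.
//   const timer = new BlockTimer();
//   timer.time("save", () => saveDocument());
```

Every call site has to be wrapped by hand, so any code nobody thought to wrap stays invisible.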
What is Zone.js?
Zone.js can be thought of as a wrapper around tasks. It inserts itself into every task creation point. By doing so, Zone can know when a task begins and ends, what the task is, what kind of task it is, etc.
Zone is the “magic” behind Angular’s change detection, allowing Angular to ensure the view is updated anytime the application’s state could possibly have changed. For most people, this is the only use case for Zone that they’re aware of. In fact, Zone is entirely decoupled from Angular. You do not need to have an Angular application to track performance like this with Zone!
There is a great write-up of how Zone works internally, which I highly recommend reading.
In essence, there can be many nested layers of zones. Each zone can be configured to intercept tasks and do whatever it wishes with them. Once done, the zone may pass the task to its parent zone, eventually reaching the root zone. The root zone, of course, will just execute the tasks as expected.
A zone is configured with a ZoneSpec, and the specific logic for executing tasks in a zone is contained in the ZoneDelegate. We could create an entirely new zone (like NgZone for Angular). However, we would then have to ensure the entire application runs in this zone. Instead, we opted to hook into the root zone itself.
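To make the ZoneSpec idea concrete, here is a rough sketch of a spec whose `onInvokeTask` hook times each task and then hands it to the parent delegate. It illustrates the forked-zone alternative described above, not the root-zone hook we actually use. `makeTimingSpec`, `TaskLike`, `DelegateLike`, and the zone name are assumptions made so the sketch compiles on its own; only `onInvokeTask`, `invokeTask`, `Zone.current.fork`, and the task's `type` and `source` fields come from Zone.js.

```typescript
// Structural stand-ins for the parts of Zone.js's Task and ZoneDelegate used here.
interface TaskLike {
  type: string;   // "microTask", "macroTask", or "eventTask"
  source: string; // where the task was scheduled, e.g. "setTimeout"
}
interface DelegateLike {
  invokeTask(target: unknown, task: TaskLike, applyThis?: unknown, applyArgs?: unknown[]): unknown;
}

type Report = (task: TaskLike, elapsedMs: number) => void;

// Builds a ZoneSpec-shaped object that reports how long each task took.
function makeTimingSpec(report: Report, now: () => number = () => Date.now()) {
  return {
    name: "task-timer", // hypothetical zone name
    onInvokeTask(
      parent: DelegateLike,
      _current: unknown,
      target: unknown,
      task: TaskLike,
      applyThis?: unknown,
      applyArgs?: unknown[],
    ): unknown {
      const start = now();
      try {
        // Pass the task up the chain; the root zone eventually runs it.
        return parent.invokeTask(target, task, applyThis, applyArgs);
      } finally {
        report(task, now() - start);
      }
    },
  };
}

// With zone.js loaded, the spec could be applied through a forked zone
// (startApp is a placeholder for your own entry point):
//   Zone.current.fork(makeTimingSpec((t, ms) => console.log(t.source, ms)))
//     .run(() => startApp());
```

The drawback of forking is the one noted above: only code started inside `run` is observed, which is why hooking the root zone is attractive.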
Below is an example implementation of hooking Zone like this:
Drag the blocks around the screen. You will notice that different blocks stall for different amounts of time while being dragged. We log output every “minute” (for the purposes of this demo, I am redefining a minute to be 10 seconds) that looks like this:
blocks_X is the number of blocks (we group microtasks with the task they’re attached to and call it a block) executed this minute that took X or more milliseconds. time_X is the cumulative execution time for these blocks in microseconds. There are a number of conclusions we could draw from this data, such as:
- The user spent about 90% of this interactive time getting less than 60 FPS (time_17/time_0), since any block of 17 ms or more overruns the roughly 16.7 ms budget of a 60 FPS frame.
- The user experienced 10 stalls that lasted a quarter second or longer (blocks_250).
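The aggregation behind these numbers can be sketched as follows. It assumes, as the two conclusions above imply, that blocks_X counts blocks lasting at least X milliseconds. The thresholds 17 and 250 come from the text; 50 is an illustrative extra, and the names `summarize` and `Histogram` are made up. For simplicity the sketch keeps cumulative time in milliseconds.

```typescript
// Thresholds in milliseconds; 0 captures every block.
const THRESHOLDS_MS = [0, 17, 50, 250];

interface Histogram {
  blocks: Record<number, number>; // blocks[X]: count of blocks lasting >= X ms
  timeMs: Record<number, number>; // timeMs[X]: total time of those blocks
}

function summarize(durationsMs: number[], thresholds: number[] = THRESHOLDS_MS): Histogram {
  const hist: Histogram = { blocks: {}, timeMs: {} };
  for (const t of thresholds) {
    hist.blocks[t] = 0;
    hist.timeMs[t] = 0;
  }
  for (const d of durationsMs) {
    for (const t of thresholds) {
      if (d >= t) {
        hist.blocks[t] += 1;
        hist.timeMs[t] += d;
      }
    }
  }
  return hist;
}

// Five blocks: two fast ones, one that misses a frame, two long stalls.
const sampleHist = summarize([5, 10, 20, 300, 400]);

// Share of execution time spent in blocks too slow for 60 FPS.
const slowShare = sampleHist.timeMs[17] / sampleHist.timeMs[0];
```

Because each bucket is cumulative, ratios such as time_17/time_0 can be read straight off the report without any further arithmetic on the raw durations.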
For every task that runs, we attempt to automatically tag it based on the function name (in our application, we have a way of manually tagging tasks as well). The ZoneTracker notes how long each task takes to execute and buckets the tasks by tag (e.g., the tagged task `_drag` took 50ms, so we add it to the 50ms bucket for its tag). We also report each tag’s contribution to the aggregated numbers shown above.
Note that we aren’t necessarily logging every function that runs. We care about tasks, because tasks are what block rendering and input. This example is rather basic, and the ZoneTracker can be extended in many ways. We have caught many performance problems with the data collected this way!
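As a rough illustration of tagging and bucketing (not the ZoneTracker’s actual code), the sketch below derives a tag from a callback’s `name` property and counts durations per tag and bucket. The bucket floors, the `"untagged"` fallback, and the names `tagFor`, `bucketFor`, and `TagBuckets` are all assumptions.

```typescript
// Hypothetical bucket floors in milliseconds, sorted ascending.
const BUCKET_FLOORS_MS = [0, 17, 50, 250];

// Use the function's own name as the tag; anonymous callbacks get a fallback.
function tagFor(callback: { name: string }): string {
  return callback.name !== "" ? callback.name : "untagged";
}

// Largest floor that does not exceed the duration.
function bucketFor(durationMs: number, floors: number[] = BUCKET_FLOORS_MS): number {
  let chosen = floors[0];
  for (const floor of floors) {
    if (durationMs >= floor) chosen = floor;
  }
  return chosen;
}

class TagBuckets {
  private readonly counts = new Map<string, Map<number, number>>();

  record(callback: { name: string }, durationMs: number): void {
    const tag = tagFor(callback);
    const bucket = bucketFor(durationMs);
    const perTag = this.counts.get(tag) ?? new Map<number, number>();
    perTag.set(bucket, (perTag.get(bucket) ?? 0) + 1);
    this.counts.set(tag, perTag);
  }

  count(tag: string, bucket: number): number {
    return this.counts.get(tag)?.get(bucket) ?? 0;
  }
}

// The example from the text: a task named _drag that took 50 ms.
function _drag(): void {}
const tagBuckets = new TagBuckets();
tagBuckets.record(_drag, 50);
```

Relying on function names only works while the names survive the build, so minified code would likely need the manual tags mentioned above.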
This sample code is plug and play. Drop it into your application and see what it tells you!