
I have a personal vendetta against “dashboards.” Not because they’re not useful — I actually think they’re extremely useful — but rather because they’re generally built with the wrong user in mind, then used by a completely different user and for a different use case.
Let’s look at the origins of dashboards, how our usage has evolved, and most importantly, how to create single-purpose dashboards — what I call launchpads — that are built for their intended purpose.
Wallboards Aren’t for Debugging
Dashboards, as a lot of people think of them today, are more like “wallboards” — built as if they’re going to be put on a 75-inch wall-mounted TV in an office, thinking folks might spot an issue by looking at them. However, these dashboards end up being leveraged by engineers who use them as launchpads to investigate their systems.
The dashboards that are truly useful are curated around system problems, built to serve engineers on their machines. They’re defined by the teams who build and support their code to bring relevant information together so that, in the event of an incident, they can use that as the first — but crucially, not the only — place to look. They’re where on-call engineers go when alerts are fired.
Horses, Telemetry and Real-Time Decisions
The origin of the term “dashboard” (or simply “dash board”) is not modern; it’s actually really old. From what I can tell, the term originated from horse-drawn carriages where a wooden or leather board/apron was added in front of the driver to stop them being hit with debris from the road when the horses “dashed” (galloped faster), hence the name “dash board.” Over time, as we moved away from horse-drawn carriages into combustion engines, the panel in front of the driver became the dashboard.
The dashboard was then used to house readouts from the various instruments monitoring the vehicle’s vital information — for example, the fuel gauge, tire pressure or engine temperature. This information is important as the driver uses it to make real-time decisions. The term “dashboard” became the name for the place where we put our instrument panels.
This is something that translates well into the way we use dashboards today — or at least the principles we use to create them. We think about how we can use the details of the board to make real-time decisions as we watch them, which is why we place so much emphasis on auto-refreshing.
My questions: Is that really how people use their dashboards? And if so, is that the most effective use of their time?
Metrics, Metrics Everywhere!
I think we settled on these “wallboard”-style graphs because Network Operations Centers (NOCs) were the pinnacle of monitoring. NOCs are amazing — the staff are some of the most diligent and intelligent people I’ve ever met. However, the issues they’re looking for and debugging are very different from those of a software development company.
Infrastructure analysis is a great use case for metrics (pre-aggregated time-series data with minimal dimensions) since we don’t need to be able to look at individual packet data. Watching CPUs for persistent spikes and correlating that with network traffic is great. At the time, that’s all we had — and because the software itself was fairly simple and noncritical, we didn’t have to worry too much about the internal details of our applications.
This idea that all companies need a NOC — and that a NOC is built in a particular way — has made engineers believe that they should have tons of wallboards, and that they should include graphs, which require metrics. The reality, however, is that NOCs are a different kind of monitoring — and they are about monitoring, not debugging or observability. What engineers who write applications need is not the same as what an operator needs in a NOC.
The other key part is that wallboards were built and curated by the people who built the machines, networks and overall infrastructure they were monitoring. To be clear, the people who built the networks built them in such standardized and uniform ways that building the dashboards was roughly the same from organization to organization or data center to data center.
In the midst of all this, Grafana became the standard visualization tool for metric data, as it still is for a lot of companies today. We started to see the proliferation of metric data from our off-the-shelf devices, even for commercial software products, meaning that these devices and products could provide standardized approaches to monitoring them.
Grafana added features like importing pre-built dashboards, combining data from different metrics databases in a single view and offering various visualizations of that data. It was a glorious time for home enthusiasts who had dashboards for their home network, because graphs were cool, right? Right?! And having lots of graphs on a single monitor in your office was very much the “in” thing for geeks like me.
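As an illustration of combining data from different metrics databases, Grafana can be provisioned with several backends that a single dashboard then mixes; this is a minimal sketch, and the names and URLs below are hypothetical placeholders, not from the article:

```yaml
# Hypothetical Grafana datasource provisioning file,
# e.g. provisioning/datasources/datasources.yaml.
# Names and URLs are illustrative placeholders.
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
  - name: Influx-Home
    type: influxdb
    access: proxy
    url: http://influxdb:8086
```

A panel can then use Grafana’s built-in “Mixed” datasource to plot queries from both backends in one view.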
The question is, did these dashboards add value to my daily life? Nope. They did, however, make me feel cool, like I was doing something properly. They may have helped if I were getting a slow download, as I could glance over to the monitor and see if there was other traffic. They also taught me a lot about building — and most importantly, maintaining — monitoring systems. Namely, that I never want to do that myself!
Debugging in a Distributed World
While this revolution in monitoring was going on, we saw the rise of distributed systems, then event-driven architectures and microservices, and later still, nanoservices and serverless. These different types of complex systems changed the way we thought about reliability, uptime and, ultimately, how we reasoned about the system’s behavior.
We found that our systems were essentially large Rube Goldberg machines of our own making, and that we needed a lot more information than percentile graphs to understand why something went wrong. From the complexity of the code itself, to the architectural design of the large distributed systems they live in, we just don’t have enough information from graphs.
We got to the stage where we could no longer diagnose the cause of issues just by looking at a dashboard. That didn’t diminish the usefulness of the dashboard for notification and an overview of the situation — on the contrary, it meant that dashboards were a good starting point for the investigation.
Debugging Needs Direction
What we found was that these complex systems fail in interesting ways. The failures aren’t always obvious, but generally there’s some kind of graph somewhere that can indicate where to look for the underlying failure; it just won’t tell you why.
Enter dashboards again! But this time, they aren’t destined for a TV in an office. Now, we’re using those dashboards as a “one-stop shop” of places to click, with contextual information that helps us uncover why something is failing. We use them for signposting, guiding the direction of debugging.
They’re a place for engineers who support applications to go as a first click from a runbook or alert on their journey to debugging. This is why I suggest that these are debugging launchpads. They’re not a destination; they’re the first stop on the journey.
The important characteristics of a launchpad over a wallboard are that we use a graphical representation of data to show curated, correlated insights about a problem or a service’s performance. We create these representations to show people where to look for problems, but not to notify them or immediately fix the problems. We use this data to help them find the next question to ask.
The best way to do this is to make each representation of data a link (or signpost) to launch into more questions or further investigation. I’ve equated this approach to the “Enhance! Enhance!” you’ve seen in TV shows like “CSI” where a pixelated image gives you an idea of where you want to look, but only by zooming and enhancing the data in a specific area are you able to see its true nature.
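To make that signposting concrete, here is a minimal sketch of a Grafana panel whose data points link out to further investigation, assuming a hypothetical tracing UI; the panel title, service name and URL are illustrative, not prescriptive:

```json
{
  "type": "timeseries",
  "title": "Checkout error rate",
  "fieldConfig": {
    "defaults": {
      "links": [
        {
          "title": "Traces around this point",
          "url": "https://tracing.example.com/search?service=checkout&time=${__value.time}"
        }
      ]
    }
  }
}
```

Clicking a point on the graph launches the viewer into trace data scoped to that moment; the panel is the first stop, not the destination.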
Ask More Questions, Get More Answers
We need to stop thinking about dashboards as a static representation of data and start thinking about them as a tool to aid in debugging. We need to understand that from each graphical representation, there needs to be a next step, another question to ask or an answer to be gained.
Make each panel on a board the start of an investigation. Launch the viewer into a path of questions that will give them the information they need to understand what’s going on.
The post Dashboards, or Launchpads? appeared first on The New Stack.