A customer writes in and says the dreaded words: “My app is slow”. Performance problems can be a real struggle to track down, especially if they aren’t easily reproducible. Looking at the customer’s logs, you see that it takes over 1.5 seconds to switch between channels on their Android client! That must be what they’re talking about, but what’s causing the slowdown? Nothing else pops out to you in the logs and the trail’s gone cold.

Slack client engineers were running into situations like this too frequently, where they knew a performance issue existed but couldn’t find the underlying cause. We were tired of letting our users down by not being able to diagnose their issues. Our mobile and desktop client infrastructure team did a survey of available technologies and, you guessed it, tracing was the perfect tool for the job!

Distributed tracing is a widely used technology for understanding latency in a multi-service web application, but we found it opens up a new world of possibilities for understanding performance on the client as well. In this post we’re going to explain how we shipped our tracing stack to our iOS, Android, and JavaScript clients, and what we learned along the way.

Part I: Infrastructure

A quick tracing primer

Distributed tracing is a common technology for managing distributed services, but it’s still quite new for client instrumentation. Tracing allows you to instrument how long an action takes (called a span) and break it down into smaller sub-actions (called child spans). This allows you to not only tell when something gets slower, but also pinpoint where and why it got slower.

The “distributed” in distributed tracing comes from its ability to combine spans from separate services, on separate machines, into a single trace. They all have the same trace_id, which allows our tooling to group them together. This gives you a single picture describing latency in a multi-service architecture.

Every span in the trace can be thought of as a structured log. (This will come in handy later.)

For more information on what tracing is and the infrastructure that supports it, see our previous blog post, Tracing at Slack.

Understanding client / server latency

Tracing at Slack outlines how we implemented tracing across our API layer. With that in place, we could look at any endpoint to see how long our backend spent servicing requests and which services were slowing us down.

There was only one problem: while our backend might think a request took 80ms to service, the network delay meant that our clients were seeing something totally different! At the end of the day, user perception is what we really care about, which meant it was time to take request tracing to the client.

A reasonable start was to trace how long it takes to run the request from the client’s perspective. That gave us something that looked like this:

Perfect, now we understand how long it takes to make an API call on the client. We’ve been fixing performance issues for a long time, and we know that there are various things that can go wrong.
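To make the tracing primer above a little more concrete, here is a minimal Python sketch of spans, child spans, and trace_id grouping. It treats each span as a structured log record, groups records from different services by trace_id, and rebuilds the parent/child tree. The `Span` fields, the helper names `group_by_trace` and `render_tree`, and the sample span names are hypothetical illustrations, not Slack’s actual schema or tooling.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """One timed action, stored like a structured log (field names are hypothetical)."""
    trace_id: str              # shared by every span in the same trace
    span_id: str               # unique within the trace
    parent_id: Optional[str]   # None for the root span
    name: str
    service: str               # which service/machine emitted the span
    start_ms: float
    duration_ms: float
    tags: dict = field(default_factory=dict)  # extra structured-log fields


def group_by_trace(spans: list[Span]) -> dict[str, list[Span]]:
    """Combine spans emitted by separate services into traces via trace_id."""
    traces: dict[str, list[Span]] = defaultdict(list)
    for span in spans:
        traces[span.trace_id].append(span)
    return dict(traces)


def render_tree(spans: list[Span]) -> list[str]:
    """Return an indented outline of one trace: root first, children nested."""
    children: dict[Optional[str], list[Span]] = defaultdict(list)
    for span in spans:
        children[span.parent_id].append(span)

    lines: list[str] = []

    def walk(parent: Optional[str], depth: int) -> None:
        # Visit sibling spans in start order, then recurse into their children.
        for span in sorted(children[parent], key=lambda s: s.start_ms):
            lines.append(f"{'  ' * depth}{span.name} [{span.service}] {span.duration_ms:g}ms")
            walk(span.span_id, depth + 1)

    walk(None, 0)
    return lines


if __name__ == "__main__":
    spans = [
        Span("t1", "a", None, "api_call", "web", 0, 80),
        Span("t1", "b", "a", "db_query", "db", 5, 50),
        Span("t1", "c", "a", "cache_lookup", "cache", 60, 10),
        Span("t2", "x", None, "other_call", "web", 0, 20),
    ]
    for line in render_tree(group_by_trace(spans)["t1"]):
        print(line)
```

Because every span carries the same trace_id regardless of which machine emitted it, the grouping step needs no coordination between services; the parent_id links alone are enough to recover where the 80ms went.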
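The client / server gap described above (the backend reports 80ms while the user waits much longer) can be sketched by timing the request on the client and comparing it with the backend’s own span. In this minimal Python sketch, `send` is a hypothetical transport that returns the server’s self-reported service time, and the `X-Trace-Id` header is a hypothetical way of propagating the trace_id so both sides land in the same trace; neither reflects Slack’s real client code.

```python
import time
import uuid
from dataclasses import dataclass
from typing import Callable


@dataclass
class ClientSpan:
    trace_id: str
    name: str
    duration_ms: float


def traced_request(name: str, send: Callable[[dict], float]) -> tuple[ClientSpan, float]:
    """Time a request from the client's point of view.

    `send` is a hypothetical transport: it receives the request headers and
    returns the backend's self-reported service time in milliseconds.
    """
    trace_id = uuid.uuid4().hex
    headers = {"X-Trace-Id": trace_id}  # hypothetical propagation header
    start = time.perf_counter()
    server_ms = send(headers)
    client_ms = (time.perf_counter() - start) * 1000
    return ClientSpan(trace_id, name, client_ms), server_ms


def outside_backend_ms(client: ClientSpan, server_ms: float) -> float:
    """Latency the user saw that the backend's own span cannot explain."""
    return max(client.duration_ms - server_ms, 0.0)


if __name__ == "__main__":
    def fake_send(headers: dict) -> float:
        time.sleep(0.2)  # stand-in for network + queueing time
        return 80.0      # what the backend thinks the request took

    span, server_ms = traced_request("api_call", fake_send)
    print(f"client saw {span.duration_ms:.0f}ms, backend reported {server_ms:.0f}ms")
    print(f"unexplained by backend: {outside_backend_ms(span, server_ms):.0f}ms")
```

For example, a client span of 300ms against a server span of 80ms leaves 220ms that no backend trace can account for, which is exactly the blind spot that motivates tracing on the client.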