
Bitdrift Team

Why crash reporting tools don’t give you the full picture

Crash reporting tools are industry standard for a reason, but there’s a problem with this category of tools: they don’t give you the full picture. Read on to find out more.

The vast majority of user sessions in a mature mobile app don’t result in a crash. In fact, crashes make up just 0.01% to 0.1% of total sessions, according to providers. That means that for almost all of your users, the day-to-day experience in your application is not affected by a crash at all. So why do we spend so much time and money on crash reporting tools?

Crash reporting tools are often seen as “good enough” to keep the user experience in check. But “good enough” hides a lot under the surface. When you rely on crash data alone, serious stability, reliability, and usability issues can easily slip through the cracks.

Of course, crashes do matter. They hurt your users and your bottom line, and they need to be fixed fast. But crashes aren’t the only thing that matters, and crash reports alone can’t tell you the whole story. In this post, we’ll dive into what crash reporting tools don’t show you and the impact that non-fatal issues can have on your users.

Why crash reporting tools often fail to uncover real user issues

Most crash reporting tools provide an out-of-the-box dashboard for monitoring app crashes across your install base. You add the tool’s SDK to your app, and when your app crashes, you receive a detailed data dump that (hopefully) captures everything that was happening on the device at the time of the crash. Crash reporting tools typically keep historical data, so you can examine the cause of a crash and the conditions under which it occurred, which helps with resolution and with tracking regressions. They will usually perform statistical analyses to help prioritize fixes as well.

Crash tools help with crashes. But most user problems aren’t crashes, and that makes them effectively invisible to your tools. Across bitdrift’s user base, and others, the average crash rate sits between 0.01% and 0.1%. That means at least 99.9% of sessions are crash-free. But crash-free doesn’t mean issue-free. Many user sessions could still contain unexpected behavior that leads to a poor user experience: for instance, a financial app reporting incorrect totals.

To effectively resolve real user issues, continue to evolve your product, and grow your audience, you need tools that do not focus only on the worst-case scenario. Flexible, adaptable, and comprehensive coverage of all app and user behavior is key.

Unseen and unfixed: failure modes that are non-fatal

The Engineering Reliable Mobile Applications report, released by Google Cloud and O’Reilly, suggests a concept of “unavailability” of an application. This report shares some examples of negative user experiences that crash reporting tools may not catch:
  • You tap the app's icon on your device's home screen, the splash screen displays, and then the app immediately vanishes.
  • While using the app, a message states “application has stopped” or “application not responding.” According to an Instabug report, such issues may be as frequent as an additional 0.1% of sessions.
  • You tap a button (probably repeatedly), and the app, while nominally running, makes no visible sign of responding.
  • The app displays a blank screen or stale data, requiring you to refresh or relaunch it.
  • Slow or unavailable backend services and poor connectivity lead to the app being delayed or stuck loading data indefinitely.
  • You experience poor performance or unexpected battery drain, especially after an update.
“Unavailability” is not limited to these kinds of failures. It’s anything that blocks a user from doing what they came to your app to do. Take the example of a ride-sharing app. The success metric at the business level is likely rides delivered. Anything that prevents the ride from being delivered can be considered “unavailability,” even though the actual app and the web service behind it may be up and running. Failure situations for this example may include:
  • The user requested a ride and received a confirmation message, but the request never made it to the app’s backend.
  • The app got stuck making the request, perhaps retrying with a continually increasing back-off due to a connectivity issue but never outright failing.
  • The user missed their ride due to outdated or inaccurate information.
  • An external service delayed notifications, causing the driver to miss the passenger.
None of these are “crashes.” But they’re critical failures that ultimately prevent the user from getting a ride. Anyone looking to understand user experience (rather than just crash rates) will need to implement additional analytics and monitoring tools to effectively capture the diverse causes of unavailability.

Some organizations attempt to address app availability through backend observability, but that falls short: if your backend dashboards show everything as operational while app users are experiencing an issue, you still have an availability problem, regardless of what the backend tools report.
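One of the failure modes listed above, a request that keeps backing off without ever outright failing, can be made visible by giving the retry loop an overall deadline and recording an event when it gives up. The following is a minimal sketch, not taken from any particular SDK: `send_with_deadline`, `RequestGaveUp`, the event names, and the default timings are all hypothetical.

```python
import time
from typing import Callable, Optional


class RequestGaveUp(Exception):
    """Raised when a request exhausts its retry budget (hypothetical name)."""


def send_with_deadline(
    send: Callable[[], bool],
    deadline_s: float = 30.0,
    base_delay_s: float = 0.5,
    max_delay_s: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
    on_event: Optional[Callable[[str, dict], None]] = None,
) -> int:
    """Retry `send` with capped exponential back-off until it succeeds
    or the next wait would pass the overall deadline.

    Returns the number of attempts on success; raises RequestGaveUp otherwise.
    """
    emit = on_event or (lambda name, fields: None)
    start = clock()
    attempt = 0
    delay = base_delay_s
    while True:
        attempt += 1
        if send():
            emit("request_succeeded", {"attempts": attempt})
            return attempt
        elapsed = clock() - start
        emit("request_attempt_failed", {"attempt": attempt, "elapsed_s": elapsed})
        # Fail loudly instead of backing off forever.
        if elapsed + delay > deadline_s:
            emit("request_gave_up", {"attempts": attempt, "elapsed_s": elapsed})
            raise RequestGaveUp(f"gave up after {attempt} attempts")
        sleep(delay)
        delay = min(delay * 2, max_delay_s)  # exponential growth, capped
```

The `request_gave_up` event is the signal that neither a crash reporter nor a backend dashboard would otherwise receive: nothing crashed, and no request ever reached the server. Injecting `sleep` and `clock` is only there to keep the function testable without real waiting.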

You can’t solve mobile issues through backend observability alone

Many teams assume backend observability is enough. If the app doesn’t crash and the backend is green, it’s all good, right? Wrong. If a user action never makes it to the backend, your tools won’t see it. No crash, no request, no log. Just a broken experience.

Organizations that rely on this method miss the many failure modes that sit between the user and the backend and fall short of an app crash. Going back to our rideshare company: if a ride request never makes it to the backend, the backend observability tooling will never know that an issue occurred. The app didn’t crash and the backend request never arrived, so the tools collected no information at all.

Everyone needs a service-level objective

To assess unavailability beyond simple crash reporting, you need to determine what availability and success look like for your app. You probably don’t need to know every time an app stutters, but being able to pick up on important non-fatal issues is critical.

To monitor what really matters, you need a clear service-level objective (SLO). In a ride-sharing app, for example, success isn’t “no crashes”; it’s “user requests ride → ride gets delivered.” Anything that breaks that flow, even if nothing crashes, is a problem, and crash tools won’t catch it. To track this SLO, the ride-sharing app may monitor:
  • Which sessions have ride intent
  • What steps are needed to facilitate a ride and what portion of rides “fall through” at any of the steps
  • The total number of rides happening as the result of mobile app sessions
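As a rough illustration of the three measurements above, the sketch below computes ride intent, per-step fall-through, and an end-to-end success rate from per-session event lists. The step names in `FUNNEL` and the functions `furthest_step` and `funnel_report` are hypothetical; a real app would define its own steps and event schema.

```python
from collections import Counter
from typing import Iterable, Mapping, Sequence

# Hypothetical funnel steps for a ride-sharing app, in the order they must occur.
FUNNEL = ("ride_requested", "request_acknowledged", "driver_matched", "ride_completed")


def furthest_step(events: Iterable[str], funnel: Sequence[str] = FUNNEL) -> int:
    """Return how many funnel steps a session completed in order (0 = no ride intent)."""
    reached = 0
    for event in events:
        if reached < len(funnel) and event == funnel[reached]:
            reached += 1
    return reached


def funnel_report(
    sessions: Mapping[str, Sequence[str]], funnel: Sequence[str] = FUNNEL
) -> dict:
    """Summarise ride intent, per-step fall-through, and end-to-end success rate."""
    depth = Counter(furthest_step(events, funnel) for events in sessions.values())
    with_intent = sum(count for d, count in depth.items() if d >= 1)
    completed = depth[len(funnel)]
    # Sessions whose last completed step was funnel[d - 1], i.e. they fell through after it.
    fell_through_after = {funnel[d - 1]: depth[d] for d in range(1, len(funnel))}
    return {
        "sessions_with_ride_intent": with_intent,
        "rides_completed": completed,
        "fell_through_after": fell_through_after,
        "success_rate": completed / with_intent if with_intent else None,
    }
```

The success rate here is measured against sessions with ride intent rather than all sessions, so a session that only opens the app does not count as a failure. Whether that is the right denominator depends on how the SLO is defined.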
As we already mentioned, crash reporting tools are not designed to collect, parse, and display all of these data points in a useful way, with the context needed to identify unavailability (including crashes) and what led to it.

On top of the architectural reasons why crash tools miss important context, many of them also rely on sampling to deal with large volumes of data. When it comes time to debug individual customer issues, understand device state and hard-to-reproduce bugs, or spot trends across specific device types or app versions, the level of sampling employed by the majority of crash reporting tools very quickly becomes untenable. On the other hand, because of the way most crash reporting tools are designed and priced, sending complete crash reports with full data every time may become expensive, both in data costs and in performance impact for your app’s end users.

Making unavailability visible: two approaches (and why they fall short)

Observability vendors are slowly beginning to align themselves with the idea that crash reporting by itself isn’t sufficient. Unfortunately, the architecture of most crash reporting tools today is still optimized for crash reporting, and all other mobile observability concerns merely get tacked on. Because most crash reporting tools are capable of snapshotting a lot of information at once to capture a crash, vendors often approach all other issues in one of two ways:
  • Option 1: Log and send everything, then throw away what you don’t need once it’s in the backend. While logging everything may sound helpful, constantly sending large amounts of telemetry creates a significant cost both to the developer and, more importantly, to the end user in terms of mobile data as well as compute power and battery.
  • Option 2: Sample the data so that only a small percentage of users or sessions send the full amount of information, and hope that the datapoints that make it through are representative of the overall user base. This approach is less costly (for most users), but it has the downside of potentially missing serious issues or misjudging their commonality due to heavy sampling.
Both approaches miss important edge cases and can reduce developers’ ability to understand user problems in their applications.
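The risk of the sampling approach can be made concrete with a little arithmetic. The sketch below assumes a simplified model in which each session is independently affected by an issue and independently selected for sampling; the function names and the example numbers are illustrative and do not describe any specific vendor.

```python
def expected_sampled_hits(total_sessions: int, issue_rate: float, sample_rate: float) -> float:
    """Expected number of affected sessions that end up in the sample."""
    return total_sessions * issue_rate * sample_rate


def prob_issue_unseen(total_sessions: int, issue_rate: float, sample_rate: float) -> float:
    """Probability that no affected session lands in the sample.

    Assumes each session is independently affected with probability
    `issue_rate` and independently sampled with probability `sample_rate`.
    """
    return (1.0 - issue_rate * sample_rate) ** total_sessions


if __name__ == "__main__":
    # Illustrative numbers: 100,000 sessions, an issue in 0.1% of them, 1% sampling.
    print(expected_sampled_hits(100_000, 0.001, 0.01))
    print(prob_issue_unseen(100_000, 0.001, 0.01))
```

Under these assumptions, an issue hitting 0.1% of 100,000 sessions is expected to appear in the 1% sample about once, and there is roughly a 37% chance it does not appear at all, even though about 100 users were affected.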

A better solution: crash reporting, real user monitoring, and on-device filtering

You don’t need to choose between logging everything or seeing nothing. While traditional observability and crash reporting tools either heavily sample or let you log everything at an outrageously high cost, bitdrift Capture approaches things differently. By relying on on-device storage, we eliminate the costs associated with sending data to the backend, and instead let you choose what to store based on criteria you define.

This approach offers the best of both worlds: full logging and detailed data, but only sending what you need. All of the information you need to discover and debug unavailability is logged locally, but you only transmit, store, and analyze what you consider relevant. What makes this approach so powerful is the ability to configure your filters on the fly: rather than needing to re-release your app every time you want to collect more telemetry, you can simply update a workflow in bitdrift Capture and immediately start receiving new data from the devices you target.

At bitdrift, we’ve developed a new class of tools that does just this, with out-of-the-box mobile SDKs that integrate just like your existing crash reporting tools but provide much more insight into how your users actually behave in your app, how your app performs, and what causes things to go wrong.

Want to see what your crash reports are missing? Try bitdrift for free to get unparalleled visibility into your mobile application.
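To make the general idea of on-device filtering concrete, here is a minimal sketch of a local ring buffer that records every event but uploads only when a runtime-configurable rule matches. This is not bitdrift Capture's API or implementation: `OnDeviceBuffer`, `set_rule`, and the event shape are hypothetical and exist only to illustrate the pattern.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

LogEvent = Dict[str, object]


class OnDeviceBuffer:
    """Keep the most recent events locally; upload them only when a rule matches."""

    def __init__(self, capacity: int, upload: Callable[[List[LogEvent]], None]) -> None:
        # A bounded deque evicts the oldest event once capacity is reached.
        self._events: Deque[LogEvent] = deque(maxlen=capacity)
        self._upload = upload
        self._rules: Dict[str, Callable[[LogEvent], bool]] = {}

    def set_rule(self, name: str, predicate: Callable[[LogEvent], bool]) -> None:
        """Add or replace a rule at runtime, e.g. after fetching new configuration."""
        self._rules[name] = predicate

    def log(self, event: LogEvent) -> None:
        """Record an event locally; flush the buffered context if any rule matches it."""
        self._events.append(event)
        if any(rule(event) for rule in self._rules.values()):
            self._upload(list(self._events))  # send the trigger plus preceding context
            self._events.clear()


if __name__ == "__main__":
    sent: List[List[LogEvent]] = []
    buffer = OnDeviceBuffer(capacity=100, upload=sent.append)
    buffer.set_rule("ride_timeout", lambda e: e.get("name") == "ride_request_timeout")
    buffer.log({"name": "screen_view", "screen": "home"})
    buffer.log({"name": "ride_request_timeout", "waited_s": 45})
    print(len(sent))  # one batch, containing both events
```

Because rules live in a dictionary that can be updated while the app runs, changing what gets uploaded does not require shipping a new build. Events that never match a rule simply age out of the buffer and never leave the device.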
