Should you trust your wearable? What your recovery score isn’t telling you.
We love our health tech, but it might be misleading us about what our bodies really need.
I’m a scientist. I love technology. I love data. And I love that more people than ever are paying attention to their health. But I also think we’ve reached a point where our obsession with quantifying everything may be doing more harm than good. Every morning, millions of us wake up, open an app, and decide how to feel about our bodies based on whether our recovery score is green, yellow, or red. We’re not asking ourselves how we slept or how we feel. We’re asking what our devices think.
The promise and the problem of wearables.
Wearables are incredible, in theory. I use them in my research, and I think biometric data is game-changing for clinical trials and interventional studies. They offer real-time insights into heart rate variability, sleep duration, resting heart rate, and activity levels that can be built into real-world study designs. For the general consumer, they can help track patterns, spot overtraining, or even reinforce healthy habits.
Research-grade versus consumer-grade tech.
It’s also important to note that the data I see in my research is very different from what consumers see. In my lab, we have access to “research-grade” data, which captures every heartbeat, every second, and every subtle change over a 24-hour period. That level of detail allows for precise analysis, but it’s not what appears on your app. The summaries, trends, and scores that the general public sees are highly processed, and much of the nuance is lost. This distinction matters when considering how wearables should be used outside of a research or clinical setting.
Proprietary algorithms, questionable scores.
A recent paper in Translational Exercise Biomedicine evaluated composite health scores from major wearable companies like Fitbit, Oura, WHOOP, and Garmin. These scores have become the centerpiece of most wellness apps, designed to quantify readiness, recovery, or stress. But the researchers found that not a single manufacturer discloses how their scores are calculated, and very few provide peer-reviewed evidence that they actually reflect meaningful physiology. Even when the same person wears multiple devices, the numbers can vary dramatically. That’s because each company uses different data timeframes, weights HRV or sleep differently, and feeds it all through proprietary formulas.
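To make the weighting point concrete, here is a minimal Python sketch. It is not any manufacturer's formula (those are undisclosed); the function name, the normalization bounds, the two weight sets, and the input values are all invented for illustration. It only shows how the same night of data can produce very different composite scores once the weights change.

```python
def composite_score(hrv_ms, sleep_hours, resting_hr, weights):
    """Hypothetical 0-100 composite score; not any vendor's real formula."""

    def clamp(x):
        # Keep each normalized input inside the 0-1 range.
        return min(max(x, 0.0), 1.0)

    # Normalize each input with made-up reference bounds (assumptions).
    hrv_n = clamp((hrv_ms - 20) / (100 - 20))       # 20 ms -> 0, 100 ms -> 1
    sleep_n = clamp(sleep_hours / 8.0)              # 8 hours or more -> 1
    rhr_n = clamp((80 - resting_hr) / (80 - 40))    # lower resting HR scores higher

    w_hrv, w_sleep, w_rhr = weights
    return round(100 * (w_hrv * hrv_n + w_sleep * sleep_n + w_rhr * rhr_n))


# One person, one night, identical inputs (illustrative values).
night = {"hrv_ms": 40, "sleep_hours": 7.5, "resting_hr": 55}

# Two invented weightings: one leans on HRV, the other on sleep.
score_a = composite_score(**night, weights=(0.6, 0.2, 0.2))
score_b = composite_score(**night, weights=(0.2, 0.6, 0.2))

print(score_a, score_b)  # 46 74
```

Under these assumptions the HRV-heavy weighting lands at 46 and the sleep-heavy weighting at 74, from exactly the same inputs. Real products also differ in data timeframes and preprocessing, which this sketch ignores.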
This isn’t just an academic concern. I saw it play out recently when a client upgraded from a WHOOP 4.0 to a 5.0 and her calorie burn suddenly dropped by about 250 calories per day. Same workouts, same body, same heart rate. The only difference was a new algorithm. Check out her calorie burn stats below. Can you tell where she switched to the new device?!
If your daily routine, your food intake, or your recovery plan depends on these numbers, that matters. You’re not necessarily responding to your body anymore. You’re responding to software.
Can backward-looking metrics predict future performance?
Another critical point is that almost all wearable metrics are backward-looking. They summarize what has already happened rather than predicting future performance. And that leads to an even more interesting question: What do these numbers actually mean for our future behavior? If your recovery score is red, does that mean you should lie on the couch all day? Should a “bad” sleep score make you skip your morning workout? The reality is, there is no data to show that these scores actually predict future performance or that they should guide day-to-day decisions. In fact, sometimes the opposite may be true.
HRV isn’t always what it seems.
Research on vagal tone, which is what HRV reflects, has found that lower HRV isn’t always bad. In one fascinating review of three studies on active-duty military personnel undergoing high-stress training, lower vagal tone actually predicted better performance (?!?!). The researchers concluded that vagal suppression, or reduced HRV, may reflect an adaptive physiological response that supports focus, emotion regulation, and cognitive control under pressure.
The warfighters were responding to the focusing power of “butterflies flying in formation,” knowing that the work they were entering into was high stakes and inherently stressful. That response to stress appeared to help their performance, not hurt it. So the idea that a “low” HRV automatically means “can’t perform” is an oversimplification.
Perception vs. reality and how data shapes health.
Wearables also affect mindset. A 2023 study published in JMIR explored what happens when wearables provide inaccurate feedback. Participants wore Apple Watches with step counts that were either inflated, deflated, or accurate over four weeks. Those who received deflated feedback believed they were less active than they really were and experienced more negative mood, lower self-esteem, poorer diet, and higher blood pressure and heart rate. Those who received inflated step counts felt better about themselves, even though their actual behavior hadn’t changed.
In other words, it wasn’t the activity that changed health outcomes. It was the perception of activity. When people believed they were doing well, they felt better and even showed improved physiological measures. When they believed they were failing, their health metrics worsened.
This has profound implications for how we interpret our wearables. If your device tells you that you slept poorly or that your recovery is low, that information might not just reflect your body’s state. It could actually shape it. The expectation that you are “unrecovered” may create fatigue, anxiety, and decreased performance. Not because you’re truly run down, but because your mindset shifts in response to the data.
Wearables aren’t clinical tools.
The truth is, these metrics are not clinical tools. At least not yet. They were never designed to diagnose or dictate. At best, they’re trend indicators, useful over weeks or months, but currently too noisy and variable to guide daily decisions. Heart rate variability fluctuates naturally throughout the day and is influenced by hydration, caffeine, alcohol, menstrual phase, and even the weather. Sleep staging is also notoriously unreliable. Most wearables are decent at estimating total sleep time but poor at identifying specific sleep stages. If your app says you got seven hours of sleep, that’s probably close. But if it says you got exactly 23% REM and 17% deep sleep, I wouldn’t bet on it.
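For readers who export their own data, here is a minimal Python sketch of what "trend, not daily reading" can look like in practice. The HRV values are made up, and the 7-day window and the `trailing_mean` helper are my own illustrative choices, not a validated method.

```python
from statistics import mean


def trailing_mean(values, window=7):
    """Mean of each full trailing window (hypothetical helper)."""
    return [
        mean(values[i - window + 1 : i + 1])
        for i in range(window - 1, len(values))
    ]


# Fourteen days of made-up morning HRV readings, in milliseconds.
daily_hrv = [52, 61, 48, 57, 66, 50, 58, 55, 47, 63, 51, 60, 49, 59]

trend = trailing_mean(daily_hrv)

# Spread between the best and worst single day.
daily_swing = max(daily_hrv) - min(daily_hrv)

# Change in the 7-day average from the first week to the second.
weekly_shift = trend[-1] - trend[0]

print(f"day-to-day swing: {daily_swing} ms")
print(f"shift in 7-day average: {weekly_shift:.1f} ms")
```

In this invented series, single days swing by 19 ms while the weekly average moves by about 1 ms. A "red" morning in the middle of it would say very little about where the underlying trend is heading.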
When tracking triggers anxiety.
Ironically, the more we monitor our sleep, the worse it often gets. Researchers have even coined the term “orthosomnia” to describe insomnia triggered by anxiety about sleep quality. It’s the same story across metrics: the more we fixate on the numbers, the more disconnected we become from how we actually feel. And remember that these algorithms change constantly. If your data suddenly shifts, it might not be your body at all. It might be a firmware update. Which raises the question: which number was more accurate in the first place? The old one, or the new one?
That said, I’m not anti-wearable. These devices can absolutely be useful when used wisely. I use them in my research, after all. They can also raise individual awareness, reinforce positive habits, and help you recognize long-term patterns. For example, if your HRV consistently drops after late nights, heavy drinking, or stressful travel, that’s good to know. If your sleep duration increases when you get consistent exercise, that’s helpful too. The key is to treat these numbers as signals, not actionable scores.
Self-awareness is the best feedback loop.
Wearable tech isn’t going away, and it has enormous potential when used responsibly. The next frontier should be transparency, research validation, and collaboration between industry, clinicians, and scientists. Until then, the smartest approach is to use your wearable as a guide, not a decision-maker. Your health is not a dashboard, and the best feedback loop you have is still your own awareness.
References
Doherty, C., Baldwin, M., Lambe, R., Burke, D. & Altini, M. (2025). Readiness, recovery, and strain: an evaluation of composite health scores in consumer wearables. Translational Exercise Biomedicine, 2(2), 128-144. https://doi.org/10.1515/teb-2025-0001
Morgan, C.A. III, Aikins, D.E., Steffian, G., Coric, V., & Southwick, S. (2006). Relation between cardiac vagal tone and performance in male military personnel exposed to high stress: Three prospective studies. Psychophysiology, 43(5), 622–631.
Zahrt, O. H., Evans, K., Murnane, E., Santoro, E., Baiocchi, M., Landay, J., Delp, S., & Crum, A. (2023). Effects of Wearable Fitness Trackers and Activity Adequacy Mindsets on Affect, Behavior, and Health: Longitudinal Randomized Controlled Trial. Journal of Medical Internet Research, 25, e40529. https://doi.org/10.2196/40529

