Analog Precision Medicine | South Bay, Los Angeles, CA

The data these devices generate needs to be interpreted with the same rigor we would apply to any other clinical measurement — and the gap between what consumer wearables can reliably detect and what they are marketed as detecting is significant enough to matter clinically. This article works through the current evidence on four digital biomarker categories: heart rate variability, SpO2, sleep staging, and cardiac rhythm detection.

Heart Rate Variability (HRV)

HRV is the variation in the time intervals between consecutive heartbeats, driven primarily by autonomic nervous system modulation. Parasympathetic (vagal) tone increases HRV; sympathetic activation decreases it. The most common consumer metric is RMSSD (root mean square of successive differences), which predominantly reflects vagal tone.
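To make the RMSSD definition concrete, here is a minimal sketch that computes it from a list of beat-to-beat (RR) intervals in milliseconds. The function name and the sample intervals are hypothetical, chosen for illustration only. Commercial devices also filter artifacts and ectopic beats before this step, which the sketch does not attempt.

```python
import math


def rmssd(rr_intervals_ms):
    """Root mean square of successive differences between RR intervals (ms)."""
    if len(rr_intervals_ms) < 2:
        raise ValueError("need at least two RR intervals")
    # Difference between each interval and the one that follows it
    diffs = [b - a for a, b in zip(rr_intervals_ms, rr_intervals_ms[1:])]
    # Square the differences, average them, then take the square root
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical RR intervals in milliseconds (not real patient data)
rr = [800, 810, 790, 805, 795]
print(round(rmssd(rr), 1))  # 14.4
```

Because the calculation depends only on successive differences, slow drifts in heart rate contribute little to it, which is one reason the metric is read as a marker of short-term, vagally mediated variation.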

What the evidence supports:

—Cardiovascular risk — lower HRV is independently associated with increased all-cause mortality and cardiovascular disease in the Framingham Heart Study, ARIC study, and others
—Athletic recovery monitoring — HRV-guided training has evidence from controlled trials suggesting superior training adaptations and reduced overtraining
—Physiological stress monitoring — acute reductions in HRV reliably track with sleep deprivation, alcohol consumption, acute illness, and psychological stress

Key limitation: HRV is highly individual. The clinically meaningful metric is trend within an individual over time — not comparison to population averages. A single HRV reading means very little without a personal baseline.
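One way to express "trend within an individual" is to compare today's reading with that person's own recent baseline instead of a population norm. The sketch below does this with a simple z-score. The 14-night window, the function name, and the sample values are assumptions made for illustration, not a validated clinical rule or any vendor's algorithm.

```python
from statistics import mean, stdev


def baseline_z(history, today, window=14):
    """Z-score of today's RMSSD against the person's own trailing baseline."""
    recent = history[-window:]  # most recent `window` readings
    if len(recent) < 2:
        raise ValueError("need at least two baseline readings")
    mu = mean(recent)
    sd = stdev(recent)  # sample standard deviation
    if sd == 0:
        raise ValueError("baseline has no variability")
    return (today - mu) / sd


# Hypothetical nightly RMSSD values in milliseconds
history = [50, 52, 48, 50, 50]
print(round(baseline_z(history, today=47), 2))  # -2.12
```

The same reading of 47 ms could be unremarkable for one person and a marked drop for another. The comparison only becomes interpretable once a personal baseline exists.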

SpO2: What the Evidence Supports

Consumer wearable SpO2 sensors have the most clinical utility in:

—Sleep-disordered breathing detection — nocturnal oxygen desaturation events are a hallmark of obstructive sleep apnea; several studies show acceptable sensitivity for moderate-to-severe OSA
—High altitude exposure — clinically useful for detecting risk of acute mountain sickness and high-altitude pulmonary edema
—Illness monitoring — demonstrated clinical value during COVID-19 for detecting silent hypoxemia

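Important caveat: pulse oximetry has well-documented systematic inaccuracy in individuals with darker skin pigmentation, overestimating SpO2 by a clinically significant margin in some cases. In the hospital cohorts reported by Sjoding et al., occult hypoxemia (arterial oxygen saturation below 88% despite a pulse oximeter reading of 92 to 96%) was nearly three times as frequent in Black patients as in White patients. This problem is present in clinical-grade devices and amplified in consumer wearables.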

Sleep Staging: Managing Expectations

Multiple validation studies comparing consumer wearable sleep staging to polysomnography (PSG) have found overall sleep/wake classification at 85–90% agreement, but poor-to-moderate accuracy for specific stage identification (particularly N3 deep sleep and REM). A systematic review in Sleep Medicine Reviews found that all consumer wearables tested overestimated total sleep time and sleep efficiency relative to PSG.
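A small worked example shows how high sleep/wake agreement can coexist with weak stage-level accuracy. The sketch compares epoch-by-epoch labels from PSG and a wearable. The ten epochs, the stage codes ("W", "N2", "N3", "R"), and the function names are hypothetical and chosen only to illustrate the arithmetic; real validation studies score hundreds of 30-second epochs per night.

```python
def sleep_wake_agreement(psg, wearable):
    """Fraction of epochs where both sources agree on sleep versus wake."""
    if len(psg) != len(wearable) or not psg:
        raise ValueError("label lists must be non-empty and equal in length")
    same = sum((p == "W") == (w == "W") for p, w in zip(psg, wearable))
    return same / len(psg)


def stage_sensitivity(psg, wearable, stage):
    """Fraction of PSG-scored epochs of `stage` that the wearable also labeled `stage`."""
    idx = [i for i, p in enumerate(psg) if p == stage]
    if not idx:
        return None  # stage never occurred in the reference recording
    return sum(wearable[i] == stage for i in idx) / len(idx)


# Hypothetical epoch labels: W = wake, N2/N3 = non-REM stages, R = REM
psg      = ["W", "N2", "N2", "N3", "N3", "N3", "R", "R",  "N2", "W"]
wearable = ["W", "N2", "N2", "N2", "N3", "N2", "R", "N2", "N2", "N2"]

print(sleep_wake_agreement(psg, wearable))     # 0.9
print(stage_sensitivity(psg, wearable, "N3"))  # one of three N3 epochs matched
print(stage_sensitivity(psg, wearable, "R"))   # 0.5
```

In this toy recording the wearable agrees with PSG on sleep versus wake in 9 of 10 epochs, yet identifies only one of three deep-sleep epochs. Its single sleep/wake error labels a wake epoch as sleep, which is the direction of error that inflates total sleep time.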

Consumer sleep data is most useful for:

—Total sleep duration estimation (reasonably accurate)
—Sleep continuity and fragmentation trends (movement-based, reasonably sensitive)
—Identifying patterns across nights — correlating sleep quality trends with alcohol, meal timing, stress, and exercise timing

A precise "12% deep sleep" number should be treated as an estimate with substantial uncertainty — not a diagnostic-grade measurement.

Cardiac Rhythm Detection: The Strongest Evidence

This is the consumer wearable application with the most direct implications for outcomes. The Apple Heart Study enrolled 419,297 Apple Watch users and used photoplethysmography-based irregular pulse detection. Of the participants who received an irregular pulse notification (0.52% of the cohort), 450 returned an analyzable ECG patch; AFib was confirmed in 34% of them, and in 35% of those 65 and older.
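It helps to translate the notification rate into absolute numbers. The sketch below uses only the enrollment figure and the rounded 0.52% rate quoted above, so the result is an approximation, and the range shown reflects nothing more than rounding of the percentage. The function name is hypothetical.

```python
def notified_estimate(enrolled, pct, rounding_half_width=0.005):
    """Approximate notified count, with the range implied by a rounded percentage."""
    point = enrolled * pct / 100
    low = enrolled * (pct - rounding_half_width) / 100
    high = enrolled * (pct + rounding_half_width) / 100
    return round(low), round(point), round(high)


# Figures quoted in the text: 419,297 enrolled, 0.52% notified (rounded)
low, point, high = notified_estimate(419_297, 0.52)
print(low, point, high)  # 2159 2180 2201
```

In other words, roughly 2,200 of more than 400,000 participants were ever notified. The confirmation percentages describe that small notified group, not the cohort as a whole.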

AFib affects approximately 33 million people worldwide and is a leading cause of stroke. Approximately 30% of AFib is paroxysmal and frequently asymptomatic, which makes it notoriously difficult to capture on routine ECGs. Consumer wearable-based rhythm monitoring therefore contributes meaningfully to detecting a condition with direct stroke prevention implications.

Bottom Line

Consumer wearables generate meaningful signal in cardiac rhythm detection (particularly AFib), HRV-based recovery monitoring, and sleep pattern trending. In other domains (precise sleep staging, SpO2 in darker skin tones, and activity measurements), accuracy and clinical utility are more limited than marketing suggests. Used with a calibrated understanding of what each metric reliably measures, wearable data adds genuine value. The device is a data source. What the data means requires a clinician.

References

1. Tsuji H, et al. Reduced heart rate variability and mortality risk: the Framingham Heart Study. Circulation. 1994;90(2):878–883.
2. Plews DJ, et al. Training adaptation and heart rate variability in elite endurance athletes. Sports Med. 2013;43(9):773–781.
3. Mendonça F, et al. A review of obstructive sleep apnea detection approaches. IEEE J Biomed Health Inform. 2019;23(2):825–837.
4. Sjoding MW, et al. Racial bias in pulse oximetry measurement. N Engl J Med. 2020;383(25):2477–2478.
5. Kaplan KA, et al. Validation of the Oura Ring with polysomnography. Front Psychiatry. 2020;11:560517.
6. de Zambotti M, et al. Wearable sleep technology in clinical and research settings. Med Sci Sports Exerc. 2019;51(7):1538–1557.
7. Perez MV, et al. Large-scale assessment of a smartwatch to identify atrial fibrillation. N Engl J Med. 2019;381(20):1909–1917.

Dr. RP, MD is dual board-certified in Emergency Medicine and Critical Care Medicine and is the founder of Analog Precision Medicine, a precision medicine practice in Southern California. This article is for educational purposes only and does not constitute medical advice or establish a physician-patient relationship.