
BigIdeasLab_STEP: Heart rate measurements captured by smartwatches for differing skin tones

Brinnae Bent, Jessilyn Dunn

Published: Feb. 10, 2021. Version: 1.0

When using this resource, please cite:
Bent, B., & Dunn, J. (2021). BigIdeasLab_STEP: Heart rate measurements captured by smartwatches for differing skin tones (version 1.0). PhysioNet.

Additionally, please cite the original publication:

Bent, B., Goldstein, B.A., Kibbe, W.A. et al. Investigating sources of inaccuracy in wearable optical heart rate sensors. npj Digit. Med. 3, 18 (2020).

Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.


Abstract

This dataset was collected as part of a study investigating the sources of inaccuracy in wearable optical heart rate sensors across a variety of activities (sitting, deep breathing, walking, and typing). The original study examined covariates including skin tone, signal lag, and device type. The dataset includes the heart rate given by the ECG and the corresponding heart rate given by each smartwatch used in the study. Activity, skin tone, and subject ID are also included. The dataset was recorded in July-August 2019 and includes 53 participants. The data are completely de-identified, and observations are time-synced.


Background

The accuracy of wearable technologies has been hotly debated in both the research and popular science literature. Currently, wearable technology companies are responsible for assessing and reporting the accuracy of their products, but little information about their evaluation methods is made publicly available.

Heart rate measurements from wearables are derived from photoplethysmography (PPG), an optical method for measuring changes in blood volume under the skin. Potential inaccuracies in PPG stem from three major areas: diverse skin types, motion artifacts, and signal crossover. This dataset supports exploring the accuracy of wearables across the full range of skin tones and across a range of activities chosen to assess motion artifacts and potential signal crossover.


Methods

Participants were recruited for this study via campus flyer and email distribution. The study took place in a private research room at Duke University. The study was approved by the Institutional Review Board at Duke University and informed consent was obtained from all participants.

A group of 53 individuals (32 females, 21 males; ages 18–54; equal distribution across the Fitzpatrick (FP) skin tone scale) successfully completed the entire study protocol in July-August 2019. This protocol was designed to assess error and reliability in a total of six wearable devices (four consumer-grade and two research-grade models) over the course of approximately 1 hour. Each round of the study protocol included:

  1. Seated rest to measure baseline (4 min);
  2. Paced deep breathing (1 min);
  3. Physical activity (walking to increase HR up to 50% of the recommended maximum; 5 min);
  4. Seated rest (washout from physical activity) (~2 min); and
  5. A typing task (1 min).

This protocol was performed three times per study participant in order to test all devices. In each round, the participant wore one or more devices, as follows: Round 1: Empatica E4 and Apple Watch 4; Round 2: Fitbit Charge 2; Round 3: Garmin Vivosmart 3, Xiaomi Miband 3, and Biovotion Everion. The electrocardiogram (ECG) patch (Bittium Faros 180) was worn during all three rounds and was used as the reference standard for this study. Skin tone was assessed subjectively and recorded using the FP skin tone scale (1–6).
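The round structure described above can be written down as a small lookup, which may help when deciding which device comparisons could be affected by devices worn at the same time. This is only a minimal sketch: the names `PROTOCOL_STEPS`, `ROUND_DEVICES`, `round_for_device`, and `worn_together` are hypothetical helpers and are not part of the dataset, and the step durations are the nominal values from the protocol list (the washout rest is given as roughly 2 minutes).

```python
# Protocol steps and nominal durations in minutes, as listed in the protocol.
# The washout rest is given as "~2 min", so totals are approximate.
PROTOCOL_STEPS = [
    ("Seated rest (baseline)", 4),
    ("Paced deep breathing", 1),
    ("Physical activity (walking)", 5),
    ("Seated rest (washout)", 2),
    ("Typing task", 1),
]

# Wearables worn in each round; the ECG patch was worn in all three rounds.
ROUND_DEVICES = {
    1: ["Empatica E4", "Apple Watch 4"],
    2: ["Fitbit Charge 2"],
    3: ["Garmin Vivosmart 3", "Xiaomi Miband 3", "Biovotion Everion"],
}
REFERENCE_DEVICE = "Bittium Faros 180"


def round_for_device(device):
    """Return the round number in which a wearable was worn."""
    for round_number, devices in ROUND_DEVICES.items():
        if device in devices:
            return round_number
    raise KeyError(f"Unknown device: {device}")


def worn_together(device_a, device_b):
    """True when two wearables were worn in the same round."""
    return round_for_device(device_a) == round_for_device(device_b)


# Nominal timed-task duration per round and across all rounds.
minutes_per_round = sum(minutes for _, minutes in PROTOCOL_STEPS)
total_task_minutes = minutes_per_round * len(ROUND_DEVICES)

print(minutes_per_round, total_task_minutes)
print(worn_together("Empatica E4", "Apple Watch 4"))
print(worn_together("Fitbit Charge 2", "Garmin Vivosmart 3"))
```

Under these nominal durations, the timed tasks add up to about 13 minutes per round, or about 39 minutes across the three rounds; this figure does not include any setup time or device changes between rounds.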

We tested six wearable devices used frequently in research studies, including:

  • Apple Watch 4 (Apple Inc., Cupertino, CA; Release Date: Fall 2018; Software Version: 5.1.3);
  • Fitbit Charge 2 (Fitbit, Inc., San Francisco, CA; Release Date: Fall 2016; Software Version: 22.55.2);
  • Garmin Vivosmart 3 (Garmin Ltd., Olathe, KS; Release Date: Spring 2017; Software Version: 5.10);
  • Xiaomi Miband 3 (Xiaomi Corp., Beijing, China; Release Date: Spring 2018; Software Version: NA);
  • Empatica E4 (Empatica Inc., Milano, Italy; Release Date: NA; Software Version: Summer 2019); and
  • Biovotion Everion (Biovotion AG, Zurich, Switzerland; Release Date: NA; Software Version: Summer 2019).

All devices were sampled using the highest sampling rate possible (this was done by placing devices in “activity mode” for the duration of the study, when applicable).

Data Description

This dataset consists of a single .csv file. Observations are synced so that each row represents a single point in time. Timestamps were removed during the de-identification process. The dataset has 10 columns:

  • ECG: heart rate (HR) reported by Bittium Faros 180 ECG Patch (beats per minute, bpm) (~1000 Hz)
  • Apple Watch: HR reported by Apple Watch (bpm) (variable sampling frequency)
  • Empatica: HR reported by Empatica E4 (bpm) (~1 Hz)
  • Garmin: HR reported by Garmin (bpm) (variable sampling frequency)
  • Fitbit: HR reported by Fitbit (bpm) (variable sampling frequency)
  • Miband: HR reported by Xiaomi Miband (bpm) (variable sampling frequency)
  • Biovotion: HR reported by Biovotion Everion (bpm) (~1 Hz)
  • ID: internal study ID
  • Skin Tone: skin tone of participant (standard Fitzpatrick skin tone scale, 1-6)
  • Activity: the activity the participant was performing (Rest, Activity, Breathe, Type). For a complete description of these conditions, please see the associated publication.
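The column layout above can be read with Python's standard `csv` module. The following is a minimal sketch under stated assumptions: the header spellings are assumed to match the column names as listed here, missing readings are assumed to appear as empty cells or `NA`/`NaN`, and the in-memory sample (including the ID `S01` and all heart rate values) is made up for illustration and does not come from the dataset. The helper names `parse_hr` and `load_rows` are hypothetical.

```python
import csv
import io

# Column names as listed in the description; the exact header spelling in the
# released file is an assumption and should be checked against the file.
HR_COLUMNS = ["ECG", "Apple Watch", "Empatica", "Garmin", "Fitbit", "Miband", "Biovotion"]


def parse_hr(value):
    """Convert a heart rate cell to float, or None when the cell is missing."""
    value = (value or "").strip()
    if value == "" or value.lower() in ("na", "nan"):
        return None
    return float(value)


def load_rows(handle):
    """Read rows from an open CSV handle into a list of dicts with typed values."""
    rows = []
    for raw in csv.DictReader(handle):
        row = {col: parse_hr(raw.get(col)) for col in HR_COLUMNS}
        row["ID"] = raw["ID"]
        # Fitzpatrick scale value (1-6); tolerate "3" as well as "3.0".
        row["Skin Tone"] = int(float(raw["Skin Tone"]))
        row["Activity"] = raw["Activity"]
        rows.append(row)
    return rows


# Tiny made-up sample standing in for the real file (illustrative values only).
SAMPLE = """ECG,Apple Watch,Empatica,Garmin,Fitbit,Miband,Biovotion,ID,Skin Tone,Activity
72.0,70.0,75.0,,,,,S01,3,Rest
110.0,104.0,98.0,,,,,S01,3,Activity
"""

rows = load_rows(io.StringIO(SAMPLE))
print(len(rows), rows[0]["ECG"], rows[0]["Garmin"])
```

To read the real file, the `io.StringIO(SAMPLE)` handle would be replaced with `open(path, newline="")`, where `path` points to the downloaded .csv file.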

Usage Notes

This dataset can be used for many purposes, including examining device accuracy, evaluating methods for handling signal lag, and testing data compression methods. Previous studies using this dataset include the original study examining inaccuracies in optical heart rate measurements from wearable sensors and a study on data compression for wearable sensors [1, 2]. Other studies are ongoing.
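As one illustration of examining device accuracy, the sketch below computes the mean absolute error between a device's heart rate and the ECG reference, grouped by a column such as activity or skin tone. Mean absolute error is used here only as a common, simple choice; it is not necessarily the metric used in the original study. The function name `mean_absolute_error_by_group` is hypothetical, and the rows are made-up values, not data from this dataset.

```python
from collections import defaultdict


def mean_absolute_error_by_group(rows, device, group_key):
    """Mean |device - ECG| per group, skipping rows where either value is missing."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        reference, estimate = row.get("ECG"), row.get(device)
        if reference is None or estimate is None:
            continue
        totals[row[group_key]] += abs(estimate - reference)
        counts[row[group_key]] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Made-up rows for illustration only; these are not values from the dataset.
example_rows = [
    {"ECG": 70.0, "Apple Watch": 72.0, "Skin Tone": 1, "Activity": "Rest"},
    {"ECG": 74.0, "Apple Watch": 70.0, "Skin Tone": 1, "Activity": "Rest"},
    {"ECG": 110.0, "Apple Watch": 100.0, "Skin Tone": 5, "Activity": "Activity"},
    {"ECG": 112.0, "Apple Watch": None, "Skin Tone": 5, "Activity": "Activity"},
]

by_activity = mean_absolute_error_by_group(example_rows, "Apple Watch", "Activity")
by_skin_tone = mean_absolute_error_by_group(example_rows, "Apple Watch", "Skin Tone")
print(by_activity)
print(by_skin_tone)
```

Rows with a missing device or ECG value are skipped rather than imputed, which seems a reasonable default given that several devices report at variable sampling frequencies.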

A limitation of this dataset is that it was collected in 2019, and the wearable manufacturers have likely released software updates since then. Researchers examining inaccuracies in wearables should be aware of this limitation.


Acknowledgements

The authors would like to thank Dr. Warren Kibbe and Dr. Benjamin Goldstein for their collaboration and input on the original study publication. The authors would like to thank Dr. Hwanhee Hong for input on the statistical design and Weihsien (Willy) Lee for assistance with study participant recruitment. BB is funded by the Duke FORGE Fellowship and JD is a Whitehead Scholar.

Conflicts of Interest

The authors have no conflicts of interest to declare.


References

  1. Bent, B., Goldstein, B.A., Kibbe, W.A. et al. Investigating sources of inaccuracy in wearable optical heart rate sensors. npj Digit. Med. 3, 18 (2020).
  2. Bent, B., Lu, B., Kim, J., Dunn, J.P. Biosignal Compression Toolbox for Digital Biomarker Discovery. Sensors 21(2), 516 (2021).


Access Policy:
Only logged in users who sign the specified data use agreement can access the files.

License (for files):
PhysioNet Restricted Health Data License 1.5.0
