Name: Will Two Do? Varying Dimensions in Electrocardiography: The PhysioNet/Computing in Cardiology Challenge 2021
Published: July 29, 2022
License: https://creativecommons.org/licenses/by/4.0/legalcode

Challenge Open Access

Matthew Reyna , Nadi Sadr , Annie Gu , Erick Andres Perez Alday , Chengyu Liu , Salman Seyedi , Amit Shah , Gari Clifford

Published: July 29, 2022. Version: 1.0.3

2020 and 2021 Challenges are complete (Jan. 26, 2022, midnight)

January 26, 2022: Both the 2020 Challenge and the 2021 Challenge, which extended the 2020 Challenge, are now complete. The CinC articles for both Challenges are available on the CinC website here and here. The final scores can be found here. Please cite Perez Alday EA, Gu A, Shah AJ, Robichaux C, Ian Wong AK, Liu C, Liu F, Bahrami Rad A, Elola A, Seyedi S, Li Q, Sharma A, Clifford GD*, Reyna MA*. Classification of 12-lead ECGs: The PhysioNet/Computing in Cardiology Challenge 2020. Physiol. Meas. 2021 Jan 1;41(12):124003. doi: 10.1088/1361-6579/abc960 to refer to the 2020 Challenge. Please also cite Reyna MA, Sadr N, Perez Alday EA, Gu A, Shah AJ, Robichaux C, Rad AB, Elola A, Seyedi S, Ansari S, Ghanbari H, Li Q, Sharma A, Clifford GD. Will Two Do? Varying Dimensions in Electrocardiography: The PhysioNet/Computing in Cardiology Challenge 2021. Computing in Cardiology 2021; 48: 1-4 and Reyna MA, Sadr N, Perez Alday EA, Gu A, Shah AJ, Robichaux C, Rad AB, Elola A, Seyedi S, Ansari S, Ghanbari H, Li Q, Sharma A, Clifford GD. Issues in the automated classification of multilead ECGs using heterogeneous labels and populations. Preprint. 2022 to refer to the 2021 Challenge. Finally, please also cite the standard PhysioNet citation. You can find follow-up articles to the 2020 Challenge and the 2021 Challenge in the Journal of Physiological Measurement Focus Issue on Classification of Multilead ECGs.

The PhysioNet/Computing in Cardiology Challenge entries are being evaluated (Oct. 25, 2021, 1 a.m.)

October 25, 2021: We are currently evaluating entries on the 2021 Challenge test data in support of the Physiological Measurement focus issue on multilead ECG classification. The deadline to submit your code and a preprint is 1 December 2021, and the deadline to submit your article is 11 January 2022. See this forum announcement for details.


Winners of the PhysioNet/Computing in Cardiology Challenge 2021 announced (Sept. 20, 2021, 1 a.m.)

September 20, 2021: The winners of the 2021 Challenge were announced on 15 September 2021 at CinC in Brno, Czech Republic. Congratulations, teams! See this page for the results and the full announcement for the final steps in this year’s Challenge, including details about the focus issue (deadline: 11 January 2022).

The challenges have been renamed to the George B. Moody PhysioNet Challenge in honor of George Moody (Sept. 15, 2021, 1 a.m.)

September 15, 2021: In honor of the contributions of George Moody to PhysioNet and Computing in Cardiology, the Board of CinC voted to rename the Challenges to the George B. Moody PhysioNet Challenge.

Preparing CinC papers for the PhysioNet/Computing in Cardiology Challenge 2021 (July 21, 2021, 1 a.m.)

July 21, 2021: As you prepare your CinC papers, please follow the CinC preparation and submission instructions and use either our LaTeX (Overleaf or download) or Word templates, which include important instructions, advice, and references. Please see here for more information, including our draft paper and important citation information.

PhysioNet/Computing in Cardiology Challenge 2021 accepted abstracts (June 23, 2021, 1 a.m.)

June 23, 2021: CinC has released its abstract decisions for the Challenge track of the conference. Congratulations to those with accepted abstracts. Those without an accepted abstract can still compete for a wildcard entry as outlined here.

The official phase of the PhysioNet/Computing in Cardiology Challenge 2021 has reopened (May 1, 2021, 1 a.m.)

May 1, 2021: The official phase of the Challenge reopens today. Due to your engagement, we have enormously expanded the training data, modified the lead combination, and modified the example code and scoring function. Please see our announcement on the Challenge forum for more details. We will update and clarify these changes in response to your questions in the coming days.

CinC deadline for the PhysioNet/Computing in Cardiology Challenge 2021 extended (April 19, 2021, 1 a.m.)

April 19, 2021: CinC has extended its abstract submission deadline to April 24, 2021. Please submit your abstract if you have not done so already. Like last year, CinC will host a hybrid conference with both in-person and remote attendance. Please see our announcement on the Challenge forum for more details.

PhysioNet/Computing in Cardiology Challenge 2021 submissions are due soon (April 13, 2021, 1 a.m.)

April 13, 2021: Only two days left to submit an abstract to CinC! Please find the abstract submission announcement and the instructions announcement on the Challenge forum. Please see the leaderboard for the final scores of the unofficial phase, and please submit your abstract today!

Leaderboard for the PhysioNet/Computing in Cardiology Challenge 2021 is available (Feb. 24, 2021, midnight)

February 24, 2021: The leaderboard is now live! Please see the announcement on the Challenge forum. Please also see the timing and priority of entries section here regarding the number of submissions allowed per day, and please submit early!

Accepting submissions for the PhysioNet/Computing in Cardiology Challenge 2021 (Jan. 30, 2021, midnight)

January 30, 2021: We are now accepting submissions for the 2021 Challenge! See below for details. Please register your team (even if you registered last year), check the submission instructions, and submit your code when ready. As always, please join the Challenge forum to discuss this year’s Challenge.

The PhysioNet/Computing in Cardiology Challenge 2021 is now open (Dec. 24, 2020, midnight)

December 24, 2020: The NIH-funded 2021 Challenge is now open! See below for details, and please share questions and comments on the Challenge forum. This year’s Challenge is generously co-sponsored by Google, MathWorks, and the Gordon and Betty Moore Foundation.

When using this resource, please cite the original publication:

Reyna MA, Sadr N, Perez Alday EA, Gu A, Shah AJ, Robichaux C, Rad AB, Elola A, Seyedi S, Ansari S, Ghanbari H, Li Q, Sharma A, Clifford GD. Will Two Do? Varying Dimensions in Electrocardiography: The PhysioNet/Computing in Cardiology Challenge 2021. 2021 Computing in Cardiology (CinC), Brno, Czech Republic, 2021 (pp. 1-4). doi: 10.23919/CinC53138.2021.9662687

Additionally, when using this resource, please cite:
Reyna, M., Sadr, N., Gu, A., Perez Alday, E. A., Liu, C., Seyedi, S., Shah, A., & Clifford, G. (2022). Will Two Do? Varying Dimensions in Electrocardiography: The PhysioNet/Computing in Cardiology Challenge 2021 (version 1.0.3). PhysioNet. RRID:SCR_007345. https://doi.org/10.13026/34va-7q14

MLA	Reyna, Matthew, et al. "Will Two Do? Varying Dimensions in Electrocardiography: The PhysioNet/Computing in Cardiology Challenge 2021" (version 1.0.3). PhysioNet (2022). RRID:SCR_007345. https://doi.org/10.13026/34va-7q14
Chicago	Reyna, Matthew, Sadr, Nadi, Gu, Annie, Perez Alday, Erick Andres, Liu, Chengyu, Seyedi, Salman, Shah, Amit, and Gari Clifford. "Will Two Do? Varying Dimensions in Electrocardiography: The PhysioNet/Computing in Cardiology Challenge 2021" (version 1.0.3). PhysioNet (2022). RRID:SCR_007345. https://doi.org/10.13026/34va-7q14
Harvard	Reyna, M., Sadr, N., Gu, A., Perez Alday, E. A., Liu, C., Seyedi, S., Shah, A., and Clifford, G. (2022) 'Will Two Do? Varying Dimensions in Electrocardiography: The PhysioNet/Computing in Cardiology Challenge 2021' (version 1.0.3), PhysioNet. RRID:SCR_007345. Available at: https://doi.org/10.13026/34va-7q14
Vancouver	Reyna M, Sadr N, Gu A, Perez Alday E A, Liu C, Seyedi S, Shah A, Clifford G. Will Two Do? Varying Dimensions in Electrocardiography: The PhysioNet/Computing in Cardiology Challenge 2021 (version 1.0.3). PhysioNet. 2022. RRID:SCR_007345. Available from: https://doi.org/10.13026/34va-7q14

BibTeX

@article{PhysioNet-challenge-2021-1.0.3,
  author = {Reyna, Matthew and Sadr, Nadi and Gu, Annie and {Perez Alday}, Erick Andres and Liu, Chengyu and Seyedi, Salman and Shah, Amit and Clifford, Gari},
  title = {{Will Two Do? Varying Dimensions in Electrocardiography: The PhysioNet/Computing in Cardiology Challenge 2021}},
  journal = {{PhysioNet}},
  year = {2022},
  month = jul,
  note = {Version 1.0.3},
  doi = {10.13026/34va-7q14},
  url = {https://doi.org/10.13026/34va-7q14}
}

Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220. RRID:SCR_007345.

MLA	Goldberger, A., et al. "PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220." (2000). RRID:SCR_007345.
CHICAGO	Goldberger, A., L. Amaral, L. Glass, J. Hausdorff, P. C. Ivanov, R. Mark, J. E. Mietus, G. B. Moody, C. K. Peng, and H. E. Stanley. "PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220." (2000). RRID:SCR_007345.
HARVARD	Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P.C., Mark, R., Mietus, J.E., Moody, G.B., Peng, C.K. and Stanley, H.E., 2000. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220. RRID:SCR_007345.
VANCOUVER	Goldberger A, Amaral L, Glass L, Hausdorff J, Ivanov PC, Mark R, Mietus JE, Moody GB, Peng CK, Stanley HE. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220. RRID:SCR_007345.

Abstract

The electrocardiogram (ECG) is a non-invasive representation of the electrical activity of the heart. Although the twelve-lead ECG is the standard diagnostic screening system for many cardiological issues, the limited accessibility of twelve-lead ECG devices provides a rationale for smaller, lower-cost, and easier-to-use devices. While single-lead ECGs are limiting [1], reduced-lead ECG systems hold promise, with evidence that subsets of the standard twelve leads can capture useful information [2], [3], [4] and even be comparable to twelve-lead ECGs in some limited contexts. In 2017 we challenged the public to classify atrial fibrillation (AF) from a single-lead ECG, and in 2020 we challenged the public to diagnose a much larger number of cardiac problems using twelve-lead recordings. However, there is limited evidence to demonstrate the utility of reduced-lead ECGs for capturing a wide range of diagnostic information.

In this year’s Challenge, we ask the following question: ‘Will two do?’ This year’s Challenge builds on last year’s Challenge [5], which asked participants to classify cardiac abnormalities from twelve-lead ECGs. We are asking you to build an algorithm that can classify cardiac abnormalities from twelve-lead, six-lead, four-lead, three-lead, and two-lead ECGs. We will test each algorithm on databases of these reduced-lead ECGs, and the differences in the performance of the algorithms on these databases will reveal the utility of reduced-lead ECGs in comparison to standard twelve-lead ECGs.

Background

The goal of the 2021 Challenge is to identify clinical diagnoses from twelve-lead, six-lead (I, II, III, aVR, aVL, aVF), four-lead (I, II, III, V2), three-lead (I, II, V2), and two-lead (I and II) ECG recordings.

We ask participants to design and implement a working, open-source algorithm that, based only on the provided twelve-lead ECG recordings and routine demographic data, can automatically identify any cardiac abnormalities present in the recording. We will award prizes for the top performing twelve-lead, six-lead, four-lead, three-lead, and two-lead algorithms.

Participation

Registering for the Challenge and Conditions of Participation

To participate in the Challenge, register your team by providing the full names, affiliations, and official email addresses of your entire team before you submit your algorithm. The details of all authors must be exactly the same as the details in your abstract submission to Computing in Cardiology. You may update your author list by completing this form again (read the form for details), but changes to your authors must not contravene the rules of the Challenge.

Algorithms

For each ECG recording, your algorithm must identify a set of one or more classes as well as a probability or confidence score for each class. As an example, suppose that your classifier identifies atrial fibrillation (164889003) and a first-degree atrioventricular block (270492004) with probabilities of 90% and 60%, respectively, for a particular recording, but it does not identify any other rhythm types. Your code might produce the following output for the recording:

#Record ID
164889003, 270492004, 164909002, 426783006, 59118001, 284470004,  164884008,  429622005, 164931005
  1,       1,         0,         0,         0,        0,          0,          0,         0
0.9,       0.6,       0.2,       0.05,      0.2,      0.35,       0.35,       0.1,       0.1
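The output layout above can be sketched in Python as follows. This is a minimal illustration, not the official example code: `format_output` is a hypothetical helper name, the first line is assumed to be `#` followed by the recording ID, and the 0.5 decision threshold is an arbitrary choice made for the example.

```python
def format_output(record_id, classes, labels, probabilities):
    """Return the four-line output text for one recording.

    format_output is a hypothetical helper, not part of the Challenge code.
    Line 1: '#' + recording ID (assumed), line 2: class codes,
    line 3: binary labels, line 4: probabilities or confidence scores.
    """
    if not (len(classes) == len(labels) == len(probabilities)):
        raise ValueError("classes, labels, and probabilities must have equal length")
    lines = [
        "#" + record_id,
        ",".join(str(c) for c in classes),
        ",".join(str(int(label)) for label in labels),
        ",".join(f"{p:g}" for p in probabilities),
    ]
    return "\n".join(lines) + "\n"


classes = ["164889003", "270492004", "164909002"]
probabilities = [0.9, 0.6, 0.2]
# Thresholding at 0.5 is an illustrative assumption, not a Challenge rule.
labels = [p >= 0.5 for p in probabilities]
text = format_output("A0001", classes, labels, probabilities)
print(text)
```

Keeping the binary labels and the probabilities as separate lines lets a classifier choose its own decision threshold per class instead of relying on a fixed cutoff.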

We have implemented two example algorithms in MATLAB and Python as templates for successful submissions.

The Python classifier implements a random forest classifier that uses age, sex, and the root mean square of the ECG lead signals as features by extracting the available ECG leads and demographic data from the WFDB header file (the .hea file).
The MATLAB classifier implements a linear regression model that uses age, sex, and the root mean square of the ECG lead signals as features from the available ECG leads and demographic data from the WFDB header file (the .hea file).
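As a rough illustration of the features named above (age, sex, and the root mean square of each lead), the following Python sketch builds one feature vector per recording. It is not the example code itself: the function names, the numeric encoding of sex, and the list-of-lists signal layout are all assumptions made for this sketch.

```python
import math


def rms(samples):
    """Root mean square of one lead's samples (NaN for an empty lead)."""
    if not samples:
        return float("nan")
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def extract_features(age, sex, leads):
    """Build a feature vector: [age, sex code, rms(lead 1), rms(lead 2), ...].

    Encoding sex as 0 (female), 1 (male), or NaN (unknown) is an
    illustrative choice; `leads` is assumed to be a list of sample lists.
    """
    sex_codes = {"female": 0.0, "f": 0.0, "male": 1.0, "m": 1.0}
    sex_code = sex_codes.get(str(sex).strip().lower(), float("nan"))
    return [float(age), sex_code] + [rms(lead) for lead in leads]


# Two short synthetic leads, used only to show the shape of the result.
features = extract_features(74, "Male", [[1, -1, 1, -1], [0, 2, 0, 2]])
print(features)
```

Because the vector has one RMS entry per available lead, the same function can be applied to twelve-lead, six-lead, four-lead, three-lead, and two-lead recordings, with a separate model trained for each lead count.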

Submitting your Algorithm

Please use the above example code as templates for your submissions.

Please see the submission instructions for detailed information about how to submit a successful Challenge entry. We will open scoring in January. We will provide feedback on your entry as soon as possible, so please wait at least 72 hours before contacting us about the status of your entry.

Like last year’s Challenge, we will continue to require code both for training your model and for running your trained model. If we cannot reproduce your model from the training code, then you will not be eligible for ranking or a prize.

We will run your training code on Google Cloud using 10 vCPUs, 65 GB RAM, 100 GB disk space, and an optional NVIDIA T4 Tensor Core GPU with 16 GB VRAM. Your training code has a 72-hour time limit without a GPU and a 48-hour time limit with a GPU.

We will run your trained model on Google Cloud using 6 vCPUs, 39 GB RAM, 100 GB disk space, and an optional NVIDIA T4 Tensor Core GPU with 16 GB VRAM. Your trained model has a 24-hour time limit on each of the validation and test sets.

We are using an N1 custom machine type to run submissions on GCP. If you would like to use a predefined machine type, then the n1-highmem-8 is the closest predefined machine type, but with 2 fewer vCPUs and 13 GB less RAM. For GPU submissions, we use the 418.40.04 driver version.

Data Description

The training data contains twelve-lead ECGs. The validation and test data contains twelve-lead, six-lead, four-lead, three-lead, and two-lead ECGs:

Twelve leads: I, II, III, aVR, aVL, aVF, V1, V2, V3, V4, V5, V6
Six leads: I, II, III, aVR, aVL, aVF
Four leads: I, II, III, V2
Three leads: I, II, V2
Two leads: I, II

Each ECG recording has one or more labels that describe cardiac abnormalities (and/or a normal sinus rhythm). We mapped the labels for each recording to SNOMED-CT codes. The lists of scored labels and unscored labels are given with the evaluation code; see the scoring section for details.

Data Sources

The Challenge data include recordings from last year’s Challenge and many new recordings for this year’s Challenge:

CPSC Database and CPSC-Extra Database
INCART Database
PTB and PTB-XL Database
The Georgia 12-lead ECG Challenge (G12EC) Database
Augmented Undisclosed Database
Chapman-Shaoxing and Ningbo Database
The University of Michigan (UMich) Database

The Challenge data include annotated twelve-lead ECG recordings from seven sources in four countries across three continents. These databases include over 130,000 twelve-lead ECG recordings with over 88,000 ECGs shared publicly as training data, 6,630 ECGs retained privately as validation data, and 36,266 ECGs retained privately as test data.

The first source is the China Physiological Signal Challenge in 2018 (CPSC 2018), which was held during the 7th International Conference on Biomedical Engineering and Biotechnology in Nanjing, China. This source contains two databases: the data from CPSC 2018 (the CPSC Database) and unused data from CPSC 2018 (the CPSC-Extra Database). Together, these databases contain 13,256 ECGs (10,330 ECGs shared as training data, 1,463 retained as validation data, and 1,463 retained as test data). We shared the training set and an unused dataset from CPSC 2018 as training data, and we split the test set from CPSC 2018 into validation and test sets. Each recording is between 6 and 144 seconds long with a sampling frequency of 500 Hz. Per HIPAA guidelines, ages over 89 are not provided for these datasets. Instead, all ages over 89 are given as 92, which is the average of the ages over 89 in this dataset. The original data with ages over 89 may be available at the CPSC 2018 site.
The second source is the St Petersburg INCART 12-lead Arrhythmia Database. This source contains 74 annotated ECGs (all shared as training data) extracted from 32 Holter monitor recordings. Each recording is 30 minutes long with a sampling frequency of 257 Hz.
The third source is the Physikalisch-Technische Bundesanstalt (PTB) and includes two public datasets: the PTB and the PTB-XL databases. The source contains 22,353 ECGs (all shared as training data). Each recording is between 10 and 120 seconds long with a sampling frequency of either 500 or 1,000 Hz.
The fourth source is a Georgia database which represents a unique demographic of the Southeastern United States. This source contains 20,672 ECGs (10,344 ECGs shared as training data, 5,167 retained as validation data, and 5,161 retained as test data). Each recording is between 5 and 10 seconds long with a sampling frequency of 500 Hz.
The fifth source is an undisclosed American database that is geographically distinct from the Georgia database. This source contains 10,000 ECGs (all retained as test data).
The sixth source is the Chapman University, Shaoxing People’s Hospital (Chapman-Shaoxing) and Ningbo First Hospital (Ningbo) database [6], [7]. This source contains 45,152 ECGs (all shared as training data). Each recording is 10 seconds long with a sampling frequency of 500 Hz.
The seventh source is the UMich Database from the University of Michigan. This source contains 19,642 ECGs (all retained as test data). Each recording is 10 seconds long with a sampling frequency of either 250 Hz or 500 Hz.
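The per-source counts listed above can be tallied to check the overall split sizes. The short Python sketch below copies the training, validation, and test counts from the source descriptions; the dictionary keys are informal labels chosen for this example.

```python
# (training, validation, test) counts per source, copied from the descriptions above.
splits = {
    "CPSC and CPSC-Extra": (10330, 1463, 1463),
    "INCART": (74, 0, 0),
    "PTB and PTB-XL": (22353, 0, 0),
    "Georgia": (10344, 5167, 5161),
    "Undisclosed": (0, 0, 10000),
    "Chapman-Shaoxing and Ningbo": (45152, 0, 0),
    "UMich": (0, 0, 19642),
}

training = sum(counts[0] for counts in splits.values())
validation = sum(counts[1] for counts in splits.values())
test = sum(counts[2] for counts in splits.values())
total = training + validation + test

print(training, validation, test, total)  # 88253 6630 36266 131149
```

Note that two of the seven sources (the undisclosed database and the UMich Database) appear only in the test data, so no training examples from those populations are available.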

As with other real-world datasets, different databases may have different proportions of cardiac abnormalities, but all of the labels in the validation or test data are represented in the training data. Moreover, while this is a curated dataset, some of the data and labels are likely to have errors, and an important part of the Challenge is to work out these issues. In particular, some of the databases have human-overread machine labels with single or multiple human readers, so the quality of the labels varies between databases. You can find more information about the label mappings of the Challenge training data in this table.

The six-lead, four-lead, three-lead, and two-lead validation data are reduced-lead versions of the twelve-lead validation data: the same recordings with the same header data but only with signal data for the relevant leads.
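Since the training data contain only twelve-lead recordings, reduced-lead versions for training can be derived in the same way. The Python sketch below selects the leads for each lead set by name; `reduce_leads` and `LEAD_SETS` are hypothetical names, and the signal is assumed to be a list with one row of samples per lead.

```python
TWELVE_LEADS = ("I", "II", "III", "aVR", "aVL", "aVF", "V1", "V2", "V3", "V4", "V5", "V6")

# Lead sets used in the Challenge, keyed by the number of leads.
LEAD_SETS = {
    12: TWELVE_LEADS,
    6: ("I", "II", "III", "aVR", "aVL", "aVF"),
    4: ("I", "II", "III", "V2"),
    3: ("I", "II", "V2"),
    2: ("I", "II"),
}


def reduce_leads(signals, available_leads, num_leads):
    """Return the rows of `signals` for the requested lead set.

    `signals` holds one row of samples per lead, in the order given by
    `available_leads` (for example, the lead names from the header file).
    """
    wanted = LEAD_SETS[num_leads]
    index = {name: i for i, name in enumerate(available_leads)}
    missing = [name for name in wanted if name not in index]
    if missing:
        raise ValueError(f"missing leads: {missing}")
    return [signals[index[name]] for name in wanted]


# Synthetic twelve-lead signal: row i is filled with the value i.
signals = [[i] * 3 for i in range(12)]
three_lead = reduce_leads(signals, TWELVE_LEADS, 3)
print(three_lead)
```

Looking leads up by name rather than by fixed position keeps the selection correct even if a header lists the leads in a different order.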

We are not planning to release the test data at any point, including after the end of the Challenge. Requests for the test data will not receive a response. We do not release test data to prevent overfitting on the test data and claims or publications of inflated performances. We will entertain requests to run code on the test data after the Challenge on a limited basis based on publication necessity and capacity. (The Challenge is largely staged by volunteers.)

Data Format

All data was formatted in WFDB format. Each ECG recording uses a binary MATLAB v4 file (see page 27) for the ECG signal data and a plain text file in WFDB header format for the recording and patient attributes, including the diagnosis, i.e., the labels for the recording. The binary files can be read using the load function in MATLAB and the scipy.io.loadmat function in Python; see our MATLAB and Python example code for working examples. The first line of the header provides information about the total number of leads and the total number of samples or time points per lead, the following lines describe how each lead was encoded, and the last lines provide information on the demographics and diagnosis of the patient.

For example, a header file A0001.hea may have the following contents:

A0001 12 500 7500
A0001.mat 16+24 1000/mV 16 0 28 -1716 0 I
A0001.mat 16+24 1000/mV 16 0 7 2029 0 II
A0001.mat 16+24 1000/mV 16 0 -21 3745 0 III
A0001.mat 16+24 1000/mV 16 0 -17 3680 0 aVR
A0001.mat 16+24 1000/mV 16 0 24 -2664 0 aVL
A0001.mat 16+24 1000/mV 16 0 -7 -1499 0 aVF
A0001.mat 16+24 1000/mV 16 0 -290 390 0 V1
A0001.mat 16+24 1000/mV 16 0 -204 157 0 V2
A0001.mat 16+24 1000/mV 16 0 -96 -2555 0 V3
A0001.mat 16+24 1000/mV 16 0 -112 49 0 V4
A0001.mat 16+24 1000/mV 16 0 -596 -321 0 V5
A0001.mat 16+24 1000/mV 16 0 -16 -3112 0 V6
#Age: 74
#Sex: Male
#Dx: 426783006
#Rx: Unknown
#Hx: Unknown
#Sx: Unknown

From the first line of the file, we see that the recording name is A0001. The recording has 12 leads, each recorded at a 500 Hz sampling frequency, and contains 7500 samples per lead. From the next 12 lines of the file (one for each lead), we see that the signal file is A0001.mat, each signal was written in 16-bit format with an offset of 24 bytes from the start of the file, the gain (analog-to-digital converter (ADC) units per physical unit, given as a floating-point number) is 1000/mV, the resolution of the ADC used to digitize the signal is 16 bits, and the ADC zero, which also serves as the baseline value corresponding to 0 physical units because no separate baseline is given, is 0. In the WFDB header format, the remaining entries of each of these lines are the first value of the signal (28, etc.), the checksum (-1716, etc.), the block size (0), and the lead name (I, etc.). From the final 6 lines, we see that the patient is a 74-year-old male with a diagnosis (Dx) of 426783006, which is the SNOMED-CT code for sinus rhythm. The medical prescription (Rx), history (Hx), and symptom or surgery (Sx) are unknown. Please visit WFDB header format for more information on the header file and variables.
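The header layout described above can be read with a few lines of Python. The sketch below is a simplified illustration rather than the official example code: `parse_header` is a hypothetical name, the gain field is assumed to have the plain `gain/units` form shown in the example (no baseline in parentheses), and multiple diagnoses are assumed to be comma-separated.

```python
def parse_header(text):
    """Parse the text of a Challenge-style WFDB header into a dictionary."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    record_name, num_leads, frequency, num_samples = lines[0].split()[:4]
    num_leads = int(num_leads)

    leads = []
    for line in lines[1:1 + num_leads]:
        fields = line.split()
        gain, _, units = fields[2].partition("/")  # e.g. "1000/mV"
        leads.append({"file": fields[0], "name": fields[-1],
                      "gain": float(gain), "units": units})

    # Comment lines such as "#Age: 74" hold the demographics and labels.
    comments = {}
    for line in lines[1 + num_leads:]:
        if line.startswith("#") and ":" in line:
            key, value = line[1:].split(":", 1)
            comments[key.strip()] = value.strip()

    return {
        "record": record_name,
        "frequency": float(frequency),
        "num_samples": int(num_samples),
        "leads": leads,
        "age": comments.get("Age"),
        "sex": comments.get("Sex"),
        # Assumption: multiple diagnoses are separated by commas.
        "dx": [code.strip() for code in comments.get("Dx", "").split(",") if code.strip()],
    }


example = """A0001 2 500 7500
A0001.mat 16+24 1000/mV 16 0 28 -1716 0 I
A0001.mat 16+24 1000/mV 16 0 7 2029 0 II
#Age: 74
#Sex: Male
#Dx: 426783006
"""
header = parse_header(example)
print(header["record"], header["frequency"], [lead["name"] for lead in header["leads"]], header["dx"])
```

Dividing the stored sample values by the gain (here 1000 per mV) converts them to physical units, so the gain is worth keeping alongside the lead names.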

Data Access

The training data from the 2021 Challenge can be downloaded from the training folder in the Files section below or accessed via the WFDB Matlab toolbox. A summary of the training datasets by folder is provided here:

cpsc_2018, 6,877 recordings
cpsc_2018_extra (China 12-Lead ECG Challenge Database – unused CPSC 2018 data), 3,453 recordings
st_petersburg_incart (12-lead Arrhythmia Database), 74 recordings
ptb (Diagnostic ECG Database), 516 recordings
ptb-xl (Electrocardiography Database), 21,837 recordings
georgia (12-Lead ECG Challenge Database), 10,344 recordings
chapman-shaoxing (Chapman University, Shaoxing People’s Hospital - 12-lead ECG Database), 10,247 recordings
ningbo (Ningbo First Hospital - 12-lead ECG Database), 34,905 recordings

Under each dataset folder, the files are grouped into subfolders with up to 1,000 records per subfolder. These subfolders are named g#, where # starts at 1: once 1,000 records have been allocated to a subfolder, a new subfolder is started with # incremented by one.
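The subfolder naming rule can be expressed as a one-line calculation. The following Python sketch assumes that records are allocated to subfolders in order and are identified by a 0-based position within their dataset; `subfolder_for` is a hypothetical helper, and the actual ordering of records within each dataset is not specified here.

```python
def subfolder_for(record_index, records_per_folder=1000):
    """Return the subfolder name (g1, g2, ...) for a 0-based record position."""
    if record_index < 0:
        raise ValueError("record_index must be non-negative")
    return f"g{record_index // records_per_folder + 1}"


# The first 1,000 records go to g1, the next 1,000 to g2, and so on.
print(subfolder_for(0), subfolder_for(999), subfolder_for(1000))  # g1 g1 g2
```

Under this rule, a dataset with 6,877 recordings such as cpsc_2018 would span subfolders g1 through g7.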

Evaluation

Scoring

For last year’s Challenge, we developed a new scoring metric that awards partial credit to misdiagnoses that result in similar treatments or outcomes as the true diagnosis as judged by our cardiologists. This scoring metric reflects the clinical reality that some misdiagnoses are more harmful than others and should be scored accordingly. Moreover, it reflects the fact that confusing some classes is less harmful than confusing others.

We are starting this year’s Challenge with this scoring metric, but we welcome feedback. It is defined as follows:

Let C = [c_i] be a collection of diagnoses. We compute a multi-class confusion matrix A = [a_ij], where a_ij is the number of recordings in a database that were classified as belonging to class c_i but actually belong to class c_j. We assign different weights W = [w_ij] to different entries in this matrix based on the similarity of treatments or differences in risks. The score s is given by s = Σ_ij w_ij a_ij, which is a generalized version of the traditional accuracy metric. The score s is then normalized so that a classifier that always outputs the true class(es) receives a score of 1 and an inactive classifier that always outputs the normal class receives a score of 0.
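The definition above can be sketched in Python for the simplified case where each recording has exactly one true class and one predicted class. This is an illustration under that assumption, not the official evaluation code (which handles recordings with multiple labels); the function names and the example weight matrix are made up for this sketch.

```python
def confusion_matrix(true_classes, predicted_classes, num_classes):
    """a[i][j] counts recordings classified as class i that belong to class j."""
    a = [[0] * num_classes for _ in range(num_classes)]
    for actual, predicted in zip(true_classes, predicted_classes):
        a[predicted][actual] += 1
    return a


def weighted_score(a, w):
    """Unnormalized score s = sum over i, j of w[i][j] * a[i][j]."""
    n = len(w)
    return sum(w[i][j] * a[i][j] for i in range(n) for j in range(n))


def normalized_score(true_classes, predicted_classes, w, normal_class):
    """Scale s so that perfect outputs score 1 and always-normal outputs score 0."""
    n = len(w)
    observed = weighted_score(confusion_matrix(true_classes, predicted_classes, n), w)
    correct = weighted_score(confusion_matrix(true_classes, true_classes, n), w)
    inactive_outputs = [normal_class] * len(true_classes)
    inactive = weighted_score(confusion_matrix(true_classes, inactive_outputs, n), w)
    if correct == inactive:
        return 0.0
    return (observed - inactive) / (correct - inactive)


# Illustrative weights: classes 0 and 1 are treated as similar (partial credit 0.5);
# class 2 plays the role of the normal class.
weights = [[1.0, 0.5, 0.0],
           [0.5, 1.0, 0.0],
           [0.0, 0.0, 1.0]]
truth = [0, 1, 2, 0]
score = normalized_score(truth, [1, 1, 2, 0], weights, normal_class=2)
print(round(score, 4))  # 0.8333
```

In this example, one recording of class 0 is mistaken for the similar class 1 and earns half credit, so the score falls between 0 and 1 instead of dropping by a full misclassification.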

The scoring metric is designed to award full credit to correct diagnoses and partial credit to misdiagnoses with similar risks or outcomes as the true diagnosis. A classifier that returns only positive outputs typically receives a negative score, i.e., a lower score than a classifier that returns only negative outputs.

The leaderboard provides the scores of successful submissions on the hidden data.

Rules and Deadlines

Overview of rules

There are two phases for the Challenge: an unofficial phase and an official phase. The unofficial phase of the Challenge allows us to introduce and ‘beta test’ the data, scores, and submission system before the official phase of the Challenge. Participation in the unofficial phase is mandatory for participating in the official phase of the Challenge because it helps us to improve the official phase.

Entrants may have an overall total of up to 15 scored entries over both the unofficial and official phases of the competition (see the below table). All deadlines occur at 11:59pm GMT on the dates mentioned below, and all dates are during 2021 unless indicated otherwise. If you do not know the difference between GMT and your local time, then find it out before the deadline!

Please submit your entries early to ensure that you have the most chances for success. If you wait until the last few days to submit your entries, then you may not receive feedback before the submission deadline, and you may be unable to resubmit your entries if there are unexpected errors or issues with your submissions. Every year, several teams wait until the last few days to submit their first entry and are unable to debug their work before the deadline.

Timing and priority of entries

Although we score on a first-come, first-served basis, please note that if you submit more than one entry in a 24-hour period, your second entry may be deprioritized compared to other teams’ first entries. If you submit more than one entry in the final 24 hours before the Challenge deadline, then we may be unable to provide feedback or a score for more than one of your entries. It is unlikely that we will be able to debug any code in the final days of the Challenge.

For these reasons, we strongly suggest that you start submitting entries at least 5 days before the unofficial deadline and 10 days before the official deadline. We have found that the earlier teams enter the Challenge, the better they do because they have time to digest feedback and performance. We therefore suggest entering your submissions many weeks before the deadline to give yourself the best chance for success.

Key dates/deadlines

	Start	End	Submissions
Unofficial phase	24 December 2020	8 April 2021	1-5 scored entries (*)
Hiatus	9 April 2021	30 April 2021	N/A
Abstract deadline	24 April 2021	24 April 2021	1 abstract
Official phase	1 May 2021	15 August 2021	1-10 scored entries (*)
Abstract decisions released	21 June 2021	21 June 2021	N/A
Wild card entry date	31 July 2021	31 July 2021	N/A
Hiatus	16 August 2021	11 September 2021	N/A
Preprint deadline	1 September 2021	1 September 2021	One 4-page paper (**)
Conference	12 September 2021	15 September 2021	1 presentation (***)
Final scores released	16 September 2021	16 September 2021	N/A
Final paper deadline	23 September 2021	30 September 2021	One 4-page paper (***)

(* Entries that fail to score do not count against limits.)

(** Must include preliminary scores.)

(*** Must include final scores, your ranking in the Challenge, and any updates to your work as a result of feedback after presenting at CinC. This final paper deadline is earlier than the deadline given by CinC so that we can check these details.)

To be eligible for the open-source award, you must do all the following:

Register for the Challenge here.
Submit at least one open-source entry that can be scored during the unofficial phase.
Submit an abstract to CinC by the abstract submission deadline. Include your team name and score from the unofficial phase in your abstract. Please select ‘PhysioNet/CinC Challenge’ as the topic of your abstract so that it can be identified easily by the abstract review committee. Please read “Advice on Writing an Abstract” for important information on writing a successful abstract.
Submit at least one open-source entry that can be scored during the official phase.
Submit a full 4-page paper on your work to CinC by the above preprint deadline.
One of your team members must attend CinC 2021 to present your work either orally or as a poster (depending on your abstract acceptance). If you have a poster, then you must stand by it to defend your work. No shows (oral or poster) will be disqualified. One of your team members must also attend the closing ceremony to collect your prize. No substitutes will be allowed.
Submit a full 4-page paper on your work to CinC by the above final paper deadline. Please note that we expect the abstract to change significantly, both in terms of results and methods. You may also update your title with the caveat that it must not be substantially similar to the title of the competition or contain the words ‘physionet’, ‘challenge’, or ‘competition’.

You must not submit an analysis of this year’s Challenge data to other conferences or journals until after CinC 2021 so that we can discuss the Challenge in a single forum. If we discover evidence that you have submitted elsewhere before the end of CinC 2021, then you will be disqualified and de-ranked on the website, banned from future Challenges, and the journal/conference will be contacted to request your article be withdrawn for contravention of the terms of use.

There are many reasons for this policy: 1) we do not release results on the test data before the end of CinC, and only reporting results on the training data increases the likelihood of overfitting and is not comparable to the official results on the test data, and 2) attempting to publish on the Challenge data before the Challengers present their results is unprofessional and comes across as a territorial grab. This requirement stands even if your abstract is rejected, but you may continue to enter the competition and receive scores. (However, unless you are accepted into the conference at a later date as a ‘wild card’ entry, you will not be eligible to win a prize.) Of course, any publicly available data that was available before the Challenge is exempted from this condition, but any of the novelty of the Challenge (the Challenge design, the Challenge data that you downloaded from this page because it was processed for the Challenge, the scoring function, etc.) is not exempted.

After the Challenge is over and the final scores have been posted (in late September), everyone may then submit their work to a journal or another conference. In particular, we encourage all entrants (including those who missed the opportunity to compete or attend CinC 2021) to submit extended analysis and articles to the special issue, taking into account the publications and discussions at CinC 2021.

Wild Card Entries

If your abstract is rejected or if you otherwise failed to qualify during the unofficial period, then there is still a chance to present at CinC and win the Challenge. A ‘wild card’ entry has been reserved for a high-scoring entry from a team that was unable to submit an accepted abstract to CinC by the original abstract submission deadline. A successful entry must be submitted by the wild card entry deadline. We will contact eligible teams and ask them to submit an abstract. The abstract will still be reviewed as thoroughly as any other abstract accepted for the conference. See Advice on Writing an Abstract.

Advice on Writing an Abstract

To improve your chances of having your abstract accepted, we offer the following advice:

Ensure that all of your authors agree on your abstract, and be sure that all of the author details match your registration information, including email addresses.
Stick to the word limit and deadline on the conference website. Include time for errors, internet outages, etc.
Select ‘PhysioNet/CinC Challenge’ as the submission topic so it can be identified easily by the abstract review committee. However, do not include the words ‘PhysioNet’ or ‘PhysioNet/CinC’ or ‘Challenge’ in the title because this creates confusion with the hundreds of other articles and the main descriptor of the Challenge.
Your title, abstract and author list (collaborators) can be modified in September when you submit the final paper.
While your work is bound to change, the quality of your abstract is a good indicator of the final quality of your work. We suggest you spell check, write in full sentences, and be specific about your approaches. Include your method’s cross-validated training performance (using the Challenge metrics) and your score provided by the Challenge submission system. If you omit or inflate this latter score, then your abstract will be rejected.
Do not be embarrassed by any low scores. We do not expect high scores at this stage. We are focused on the thoughtfulness of the approach and quality of the abstract.
If you are unable to receive a score during the unofficial phase, then you can still submit, but the work should be very high quality and you should include the cross-validation results of your algorithm on the training set.

You will be notified by email from CinC in June if your abstract has been accepted. You may not enter more than one abstract describing your work in the Challenge. We know you may have multiple ideas, and the actual abstract will evolve over the course of the Challenge. More information, particularly on discounts and scholarships, can be found here. We are sorry, but the Challenge Organizers do not have extra funds to enable discounts or funding to attend the conference.

Again, we cannot guarantee that your code will be run in time for the CinC abstract deadline, especially if you submit your code immediately before the deadline. It is much more important to focus on writing a high-quality abstract describing your work and to submit it to the conference by the abstract deadline. Please follow these instructions carefully.

Please make sure that all of your team members are authors on your abstract. If you need to add or subtract authors, do this at least a week before the abstract deadline. Asking us to alter your team membership near or after the deadline is going to lead to confusion that could affect your score during review. It is better to be more inclusive on the abstract in terms of authorship, though, and if we find authors have moved between abstracts/teams without permission, then this is likely to lead to disqualification. As noted above, you may change the authors/team members later in the Challenge.

Please make sure that you include your team name, your official score as it appears on the leaderboard, and cross-validation results in your abstract using the scoring metrics for this year’s Challenge (especially if you are unable to receive a score or are scoring poorly). The novelty of your approach and the rigor of your research are much more important during the unofficial phase. Please make sure you describe your technique and any novelty very specifically. General statements such as ‘a 1D CNN was used’ are uninformative and will score poorly in review.
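As a rough illustration of how cross-validation results on the training data might be produced, the following Python sketch splits record indices into k folds and averages a per-fold score. It is only a sketch under stated assumptions: the function names `k_fold_indices` and `cross_validate` and the `train_and_score` callback are hypothetical, and the callback is a placeholder for your own training code and the official Challenge scoring metric, neither of which is reproduced here.

```python
import random


def k_fold_indices(num_records, k=5, seed=0):
    """Yield (train_indices, validation_indices) pairs for k-fold cross-validation.

    Hypothetical helper: records are identified only by their position
    0..num_records-1 in whatever list of training recordings you maintain.
    """
    if not 2 <= k <= num_records:
        raise ValueError("k must be between 2 and num_records")
    indices = list(range(num_records))
    random.Random(seed).shuffle(indices)  # fixed seed so the splits are reproducible
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for held_out in range(k):
        validation = sorted(folds[held_out])
        training = sorted(
            index
            for fold_number, fold in enumerate(folds)
            if fold_number != held_out
            for index in fold
        )
        yield training, validation


def cross_validate(num_records, train_and_score, k=5, seed=0):
    """Return (mean_score, per_fold_scores).

    train_and_score(train_indices, validation_indices) is a placeholder:
    it should train on the first set and return a score on the second,
    ideally computed with the Challenge metric.
    """
    scores = [
        train_and_score(training, validation)
        for training, validation in k_fold_indices(num_records, k, seed)
    ]
    return sum(scores) / len(scores), scores
```

Reporting the per-fold scores alongside the mean gives reviewers a sense of the variability of the estimate. If several recordings come from the same patient or source database, splitting by group rather than by individual record may give a more honest estimate; that refinement is not shown here.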

The Organizers of the Challenge have no ability to help with any problems with the abstract submission system. We do not operate it. Please do not email us with issues related to the abstract submission system.

Open-Source Licenses

We encourage the use of open-source licenses for your entries.

Entries with non-open-source licenses will be scored but not ranked in the official competition. All scores will be made public. At the end of the competition, all entries will be posted publicly, and therefore automatically mirrored on several sites around the world. We have no control over these sites, so we cannot remove your code even on request. Code which the organizers deem to be functional will be made publicly available after the end of the Challenge. You can request to withdraw from the Challenge, so that your entry’s performance is no longer listed in the official leaderboard, up until a week before the end of the official phase. However, the Organizers reserve the right to publish any submitted open-source code after the official phase is over. The Organizers also retain the right to use a copy of submitted code for non-commercial use. This allows us to re-score if definitions change and validate any claims made by competitors.

If no license is specified in your submission, then the license given in the example code will be added to your entry, i.e., we will assume that you have released your code under the BSD 3-Clause license.

Rules on Competing in Teams / Collaboration

To maintain the scientific impact of the Challenges, it is important that all Challengers contribute truly independent ideas. For this reason, we impose the following rules on team composition/collaboration:

Multiple teams from a single entity (such as a company, university, or university department) are allowed as long as the teams are truly independent and do not share team members (at any point), code, or any ideas. Multiple teams from the same research group or company unit are not allowed because of the difficulty of maintaining independence in those situations. If there is any question on independence, the teams will be required to supply an official letter from the company that indicates that the teams do not interact at any point (socially or professionally) and work in separate facilities, as well as the location of those facilities.
You can join an existing team before the abstract deadline as long as you have not belonged to another team or communicated with another team about the current Challenge. You may update your author list by completing this form again (check the ‘Update team members’ box on the form), but changes to your authors must not contravene the rules of the Challenge.
You may use public code from another team if they posted it before the competition.
You may not make your Challenge code publicly available during the Challenge or use any code from another Challenger that was shared, intentionally or not, during the course of the Challenge.
You may not publicly post information describing your methods (blog, vlog, code, preprint, presentation, talk, etc.) or give a talk outside your own research group at any point during the Challenge that reveals the methods you have employed or will employ in the Challenge. Obviously, you can talk about and publish the same methods on other data as long as you don’t indicate that you used or planned to use them for the Challenge.
You must use the same team name and email address for your team throughout the course of the Challenge. The email address should be the same as the one used to register for the Challenge, and to submit your abstract to CinC. Note that the submitter of the conference article/code does not need to present at the conference or be in any particular location in the author order on the abstract/poster/paper, but they must be a contributing member of the team. If your team uses multiple team names and/or email addresses to enter the Challenge, please contact the Organizers immediately to avoid disqualification of all team members concerned. Ambiguity will result in disqualification.
If you participate in the Challenge as part of a class project, then please treat your class as a single team — please use the same team name as other groups in your class, limit the number of submissions from your class to the number allowed for each team, and feel free to present your work within your class. If your class needs more submissions than the Challenge submission limits allow, then please perform cross-validation on the training data to evaluate your work.

If we discover evidence of the contravention of these rules, then you will be ineligible for a prize and your entry will be publicly marked as possibly associated with another entry. Although we will contact the team(s) in question, time and resources are limited and the Organizers must use their best judgement on the matter in a short period of time. The Organizers’ decision on rule violations will be final.

Conference Attendance

CinC 2021 will take place from 12 to 15 September 2021 in Brno, Czech Republic. You must attend the whole conference to be eligible for prizes. If you send someone in your place who is not a team member or co-author, then you will be disqualified and your abstract will be removed from the proceedings. In particular, it is vital that the presenter (oral or poster) can defend your work and has in-depth knowledge of all decisions made during the development of your algorithm. Due to this year’s challenges, both in-person and remote attendance are allowed. If you require a visa to attend the conference, we strongly suggest that you apply as soon as possible. Please contact the local conference organizing committee (not the Challenge Organizers) for any visa sponsorship letters and for answers to any questions concerning the conference.

Hackathon

Due to the uncertainties around travel, we have unfortunately decided not to run the Hackathon again this year.

Ethics

The authors declare no ethics concerns.

Acknowledgements

This year’s Challenge is generously co-sponsored by Google, MathWorks, and the Gordon and Betty Moore Foundation.

Obtaining Complimentary MATLAB Licenses

MathWorks has generously decided to sponsor this Challenge by providing complimentary licenses to all teams that wish to use MATLAB. Users can apply for a license and learn more about MATLAB support by visiting the PhysioNet Challenge page from MathWorks. If you have questions or need technical support, then please contact MathWorks at studentcompetitions@mathworks.com.

Obtaining Complimentary Google Cloud Platform Credits

Google has generously agreed to provide Google Cloud Platform (GCP) credits for a limited number of teams for this Challenge.

At the time of launching this Challenge, Google Cloud offers multiple services for free on a one-year trial basis and $300 in cloud credits. Additionally, if teams are based at an educational institution in selected countries, then they can access free GCP training online.

Google Cloud credits will be made available to teams that requested credits when registering for the Challenge. Only one credit will be provided to one email address associated with each team, and teams must have a successful entry to the official phase of the Challenge and an accepted abstract to CinC.

The Challenge Organizers, their employers, PhysioNet and Computing in Cardiology accept no responsibility for the loss of credits, or failure to issue credits for any reason. Please note, by requesting credits, you are granting us permission to forward your details to Google for the distribution of credits. You can register for these credits during the Challenge registration process.

Conflicts of Interest

The authors have no conflicts of interest to declare.

References

Drew BJ, Pelter MM, Adams MG, Wung SF. 12-lead ST-segment monitoring vs single-lead maximum ST-segment monitoring for detecting ongoing ischemia in patients with unstable coronary syndromes. American Journal of Critical Care. 1998 Sep 1;7(5):355.
Drew BJ, Pelter MM, Brodnick DE, Yadav AV, Dempel D, Adams MG. Comparison of a new reduced lead set ECG with the standard ECG for diagnosing cardiac arrhythmias and myocardial ischemia. Journal of electrocardiology. 2002 Oct 1;35(4):13-21.
Green M, Ohlsson M, Forberg JL, Björk J, Edenbrandt L, Ekelund U. Best leads in the standard electrocardiogram for the emergency detection of acute coronary syndrome. Journal of electrocardiology. 2007 May 1;40(3):251-6.
Aldrich HR, Hindman NB, Hinohara T, Jones MG, Boswick J, Lee KL, Bride W, Califf RM, Wagner GS. Identification of the optimal electrocardiographic leads for detecting acute epicardial injury in acute myocardial infarction. The American journal of cardiology. 1987 Jan 1;59(1):20-3.
Alday EA, Gu A, Shah AJ, Robichaux C, Wong AK, Liu C, Liu F, Rad AB, Elola A, Seyedi S, Li Q. Classification of 12-lead ECGs: the PhysioNet/Computing in Cardiology Challenge 2020. Physiological measurement. 2020 Dec 29;41(12):124003.
Zheng J, Zhang J, Danioko S, Yao H, Guo H, Rakovski C. A 12-lead electrocardiogram database for arrhythmia research covering more than 10,000 patients. Scientific Data. 2020 Feb 12;7(1):1-8.
Zheng J, Chu H, Struppa D, Zhang J, Yacoub SM, El-Askary H, Chang A, Ehwerhemuepha L, Abudayyeh I, Barrett A, Fu G. Optimal multi-stage arrhythmia classification approach. Scientific reports. 2020 Feb 19;10(1):1-7.

Parent Projects

Will Two Do? Varying Dimensions in Electrocardiography: The PhysioNet/Computing in Cardiology Challenge 2021 was derived from:

Please cite them when using this project.

Access

Access Policy:
Anyone can access the files, as long as they conform to the terms of the specified license.

License (for files):
Creative Commons Attribution 4.0 International Public License

Discovery

DOI (version 1.0.3):
https://doi.org/10.13026/34va-7q14

DOI (latest version):
https://doi.org/10.13026/gt86-a263

Topics:
challenge cardiac abnormalities multilead ecgs classification competition

Project Website:
https://physionetchallenges.org/2021/

Files

Total uncompressed size: 12.6 GB.

Access the files

Download the files using your terminal:

wget -r -N -c -np https://physionet.org/files/challenge-2021/1.0.3/

To download the files using AWS command line tools, first configure your AWS credentials.

Folder Navigation:

chapman_shaoxing
cpsc_2018
cpsc_2018_extra
georgia
ningbo
ptb
ptb-xl
st_petersburg_incart
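
Because the full archive is 12.6 GB, it may be convenient to fetch one source database at a time. The Python sketch below only builds the corresponding `wget` command lines, reusing the flags and base URL from the command above and the folder names from this listing; it does not download anything itself. The function name `wget_command` and its `subdir` parameter are hypothetical. In particular, whether the folders sit directly under the version directory or inside a subdirectory is an assumption you should check in the file browser before running the commands.

```python
import shlex

# Base URL taken from the wget command given on this page.
BASE_URL = "https://physionet.org/files/challenge-2021/1.0.3/"

# Folder names as they appear in the folder listing on this page.
FOLDERS = [
    "chapman_shaoxing",
    "cpsc_2018",
    "cpsc_2018_extra",
    "georgia",
    "ningbo",
    "ptb",
    "ptb-xl",
    "st_petersburg_incart",
]


def wget_command(folder, subdir=""):
    """Return the wget argument list for downloading a single folder.

    subdir is a placeholder for any intermediate directory between the
    version directory and the folder (assumption: empty by default).
    """
    if folder not in FOLDERS:
        raise ValueError(f"unknown folder: {folder!r}")
    parts = [part.strip("/") for part in (subdir, folder) if part.strip("/")]
    url = BASE_URL + "/".join(parts) + "/"
    # Same flags as the full-archive command: recursive, timestamped,
    # resumable, and never ascending to the parent directory.
    return ["wget", "-r", "-N", "-c", "-np", url]


if __name__ == "__main__":
    # Print one shell-ready command per folder for copy and paste.
    for name in FOLDERS:
        print(shlex.join(wget_command(name)))
```

Returning an argument list rather than a single string keeps the sketch usable with `subprocess.run` as well as for printing.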
