Database Open Access

Icentia11k Single Lead Continuous Raw Electrocardiogram Dataset

Shawn Tan, Satya Ortiz-Gagné, Nicolas Beaudoin-Gagnon, Pierre Fecteau, Aaron Courville, Yoshua Bengio, Joseph Paul Cohen

Published: April 12, 2022. Version: 1.0


When using this resource, please cite:
Tan, S., Ortiz-Gagné, S., Beaudoin-Gagnon, N., Fecteau, P., Courville, A., Bengio, Y., & Cohen, J. P. (2022). Icentia11k Single Lead Continuous Raw Electrocardiogram Dataset (version 1.0). PhysioNet. https://doi.org/10.13026/kk0v-r952.

Additionally, please cite the original publication:

Tan, S., Androz, G., Ortiz-Gagné, S., Chamseddine, A., Fecteau, P., Courville, A., Bengio, Y., & Cohen, J. P. (2021, October 21). Icentia11K: An Unsupervised Representation Learning Dataset for Arrhythmia Subtype Discovery. Computing in Cardiology Conference (CinC).

Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.

Abstract

This is a dataset of continuous raw electrocardiogram (ECG) signals from 11,000 patients, with more than 2 billion labelled beats. The signals were recorded at 250 Hz with 16-bit resolution by a fixed, chest-mounted single-lead probe for up to 2 weeks. The average patient age is 62.2 ± 17.4 years. Twenty technologists annotated each beat's type (Normal, Premature Atrial Contraction, Premature Ventricular Contraction) and rhythm (Normal Sinus Rhythm, Atrial Fibrillation, Atrial Flutter).


Background

Arrhythmia detection is currently performed by cardiologists or technologists familiar with reading ECGs. Recently, supervised machine learning has been successfully applied to the automated detection of many arrhythmias [1,2,3,4]. However, some ECG anomalies may warrant further investigation because they do not fit the morphology of currently known arrhythmias. We seek to use a data-driven approach to find these differences, which cardiologists have observed anecdotally. Existing public ECG datasets include the MIMIC-III Waveform Database and the ECG-ViEW II dataset [5,6]. Here we present Icentia11k, a dataset of continuous raw ECG signals from 11,000 patients with more than 2 billion labelled beats.


Methods

Our data were collected with the CardioSTAT, a single-lead heart monitor from Icentia [7]. The raw signals were recorded with 16-bit resolution and sampled at 250 Hz, with the CardioSTAT placed in a modified lead I position. The volume of data this provides can help improve the techniques currently used by the medical industry to process days' worth of ECG data, and may make it possible to catch anomalous events earlier than is currently possible.

The dataset was derived from recordings of 11,000 patients who wore the CardioSTAT device at various medical centers, predominantly in Ontario, Canada. Although the device can record ECG data for up to two weeks, most patients were prescribed one week of wear.

The data were analyzed by Icentia's team of 20 technologists, who performed the annotation with proprietary analysis tools. Beats are first detected automatically; a technologist then labels the beat and rhythm types through a full-disclosure analysis (i.e., reviewing the whole recording). Finally, a senior technologist approves the analysis before the record is added to the dataset.

The ethics institutional review boards at the Université de Montréal approved the study and release of data (CERSES-19-065-D).


Data Description

We split each patient record into segments of $2^{20}+1$ signal samples (≈70 minutes). This long time context was chosen after discussions with technologists, who find it useful for rhythm detection. We used a power of two plus one sample so that each segment has a well-defined middle sample, which makes it easier to parameterize stacks of convolutions. We then randomly selected 50 of each patient's segments, together with their labels. The goal is to reduce the size of the dataset while keeping a fair representation of each patient.
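The segment arithmetic and the per-patient sampling step can be sketched as follows. This is a minimal illustration, assuming each record is cut into contiguous, non-overlapping segments and that any trailing partial segment is discarded; the source does not state either detail, and the helper names (`segment_minutes`, `split_into_segments`, `sample_segments`) and the fixed seed are hypothetical.

```python
import random

SAMPLE_RATE_HZ = 250
SEGMENT_LEN = 2**20 + 1          # 1,048,577 samples per segment
CENTER_INDEX = SEGMENT_LEN // 2  # 524,288: exactly 2**19 samples on either side


def segment_minutes(n_samples=SEGMENT_LEN, fs=SAMPLE_RATE_HZ):
    """Duration of a segment in minutes."""
    return n_samples / fs / 60


def split_into_segments(n_total, seg_len=SEGMENT_LEN):
    """(start, stop) indices of full, non-overlapping segments (assumed scheme)."""
    return [(i * seg_len, (i + 1) * seg_len) for i in range(n_total // seg_len)]


def sample_segments(segments, k=50, seed=0):
    """Randomly keep at most k segments; shorter records keep all of theirs."""
    if len(segments) <= k:
        return list(segments)
    return sorted(random.Random(seed).sample(segments, k))


# One week of continuous recording at 250 Hz (hypothetical record length)
week = 7 * 24 * 3600 * SAMPLE_RATE_HZ
segs = split_into_segments(week)
print(round(segment_minutes(), 1), len(segs), len(sample_segments(segs)))
```

Under these assumptions a one-week record yields 144 full segments, of which 50 are kept. A record shorter than roughly 58 hours (50 × 69.9 minutes) cannot supply 50 segments, which is consistent with the segment total in the statistics table being below 11,000 × 50.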

Data structure

The data is structured into patients and segments.

Patient level (3-14 days)

At this level, the data capture systematic variation, such as probe placement or patient-specific noise, rather than isolated events.

Segment level (1,048,577 int16 samples, approximately 1 hour)

A cardiologist can look at a specific segment and identify patterns that indicate disease while ignoring uninformative signal properties such as an unusual amplitude. Looking at trends within a segment helps to identify arrhythmias correctly, since half an hour provides the context needed to observe the stress of a specific activity.

Aggregate statistics

Aggregate statistics are shown below:

Statistic | Value
Number of patients | 11,000
Number of labeled beats | 2,774,054,987
Sample rate | 250 Hz
Segment size | $2^{20}+1$ = 1,048,577 samples
Total number of segments | 541,794 (not all patients have enough data for 50 segments)
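A quick consistency check on these figures: if every segment stores a single int16 channel (2 bytes per sample) without compression, which is an assumption about the on-disk format rather than something stated here, the signal data alone come to roughly the 1.1 TB total listed under Files.

```python
BYTES_PER_SAMPLE = 2      # int16, single lead (assumed uncompressed storage)
SEGMENT_LEN = 2**20 + 1   # samples per segment
TOTAL_SEGMENTS = 541_794  # from the statistics table

bytes_per_segment = SEGMENT_LEN * BYTES_PER_SAMPLE
total_bytes = bytes_per_segment * TOTAL_SEGMENTS
print(f"{bytes_per_segment:,} bytes per segment, {total_bytes / 1e12:.2f} TB in total")
```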

Beats are annotated in ann.symbol at the R-peak time point of the QRS complex, and the sample index in rec.p_signal for each annotation is stored in ann.sample. The table below shows the beat counts over the entire dataset. Annotations with a '+' symbol are not beats; they only indicate that a rhythm annotation is present (see the rhythm table below).

Symbol | Beat | Description | Count
N | Normal | | 2,061,141,216
S | ESSV (PAC) | Premature or ectopic supraventricular beat, premature atrial contraction | 19,346,728
V | ESV (PVC) | Premature ventricular contraction | 17,203,041
Q | Undefined | Unclassifiable beat | 676,364,002
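Given an annotation object from `wfdb.rdann`, beat labels and their positions can be tallied directly from `ann.symbol` and `ann.sample`. The sketch below is a hypothetical helper (`beat_summary` is not part of wfdb) that works on plain sequences so it can be tried without the data; it skips '+' entries because they mark rhythm changes rather than beats, and it converts sample indices to seconds using the 250 Hz sampling rate.

```python
from collections import Counter

FS = 250                             # sampling rate in Hz
BEAT_SYMBOLS = {"N", "S", "V", "Q"}  # beat labels from the table above


def beat_summary(symbols, samples, fs=FS):
    """Count beat labels and list beat times in seconds, ignoring '+' (rhythm) entries."""
    counts = Counter()
    times = []
    for sym, samp in zip(symbols, samples):
        if sym in BEAT_SYMBOLS:
            counts[sym] += 1
            times.append(samp / fs)
    return counts, times


# With real data: counts, times = beat_summary(ann.symbol, ann.sample)
counts, times = beat_summary(["+", "N", "N", "V", "N"], [0, 50, 250, 400, 500])
print(dict(counts), times)
```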

Rhythms are annotated in ann.aux_note at the time points where a rhythm region starts and ends. For example, a normal sinus rhythm region starts with a '(N' annotation and ends with a ')' annotation, and everything in between is annotated as normal sinus rhythm. The table below gives the number of annotated regions for each rhythm; a region can span a single beat or thousands of beats.

Symbol | Rhythm label | Count
(N ... ) | NSR (Normal sinus rhythm) | 16,083,158
(AFIB ... ) | AFib (Atrial fibrillation) | 848,564
(AFL ... ) | AFlutter (Atrial flutter) | 313,251
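The '(' / ')' convention described above can be turned into labelled intervals with a small parser. This is a sketch under the assumption that rhythm regions do not nest; because a loaded window can cut a region, a ')' without an opening label is skipped, a new '(' closes any region left open, and a region still open at the end is closed at an optional `end_sample`. The function name and this boundary handling are hypothetical choices, not part of wfdb.

```python
def rhythm_intervals(samples, aux_notes, end_sample=None):
    """Return (label, start, stop) tuples from '(LABEL' ... ')' aux_note pairs."""
    intervals = []
    label, start = None, None
    for samp, note in zip(samples, aux_notes):
        note = note.replace("\x00", "").strip()  # defensive: drop padding/whitespace
        if note.startswith("("):
            if label is not None:  # previous region was never closed: end it here
                intervals.append((label, start, samp))
            label, start = note[1:], samp
        elif note == ")" and label is not None:
            intervals.append((label, start, samp))
            label = None
    if label is not None and end_sample is not None:
        intervals.append((label, start, end_sample))
    return intervals


# With real data: rhythm_intervals(ann.sample, ann.aux_note, end_sample=rec.sig_len)
print(rhythm_intervals([10, 100, 150, 400], ["(N", ")", "(AFIB", ")"]))
```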

Details on how the dataset is encoded into wfdb format are available on GitHub [8].


Usage Notes

By releasing this dataset, we seek to enable the research community to develop better models for detecting arrhythmia and related heart disease. The dataset is described in more detail in our accompanying paper [9], which also describes our efforts to evaluate existing models for arrhythmia classification. Code for working with the data, including executable notebooks, is available on GitHub [8].

Example code

To look at patient 9000 and segment 0, the record name is p09/p09000/p09000_s00, and it can be loaded with wfdb as follows:

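import wfdb

data_path = "/path/to/icentia11k"  # placeholder: local copy of the dataset
patient_id = 9000
segment_id = 0
start = 2000   # first sample to read
length = 1024  # number of samples (about 4 s at 250 Hz)

# 1,000 patients per top-level folder, e.g. p06 holds p06000-p06999
filename = f"{data_path}/p{patient_id // 1000:02d}/p{patient_id:05d}/p{patient_id:05d}_s{segment_id:02d}"
rec = wfdb.rdrecord(filename, sampfrom=start, sampto=start + length)
# shift_samps=True makes ann.sample relative to `start`
ann = wfdb.rdann(filename, "atr", sampfrom=start, sampto=start + length, shift_samps=True)
wfdb.plot_wfdb(rec, ann, plot_sym=True, figsize=(15, 4))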
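Because segment files follow a fixed naming pattern, it can help to wrap the path construction in a function and to discover which segments a patient actually has, since not every patient has 50. The sketch below assumes a local copy of the files in which each segment has a WFDB header ending in `.hea`; `record_path` and `available_segments` are hypothetical helper names.

```python
from pathlib import Path


def record_path(data_path, patient_id, segment_id):
    """Build the WFDB record name (no extension), e.g. <data_path>/p09/p09000/p09000_s00."""
    group = f"p{patient_id // 1000:02d}"  # 1,000 patients per top-level folder
    patient = f"p{patient_id:05d}"
    return str(Path(data_path) / group / patient / f"{patient}_s{segment_id:02d}")


def available_segments(data_path, patient_id):
    """Sorted segment ids for which a .hea header file exists locally."""
    folder = Path(record_path(data_path, patient_id, 0)).parent
    ids = []
    for hea in folder.glob(f"p{patient_id:05d}_s*.hea"):
        suffix = hea.stem.rsplit("_s", 1)[-1]
        if suffix.isdigit():
            ids.append(int(suffix))
    return sorted(ids)


# Usage with wfdb (requires the data on disk):
# for seg in available_segments(data_path, 9000):
#     rec = wfdb.rdrecord(record_path(data_path, 9000, seg))
```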

Limitations

Because the people who wear the device are patients, the dataset is not a random sample of the general population. For one, the average patient age is 62.2 ± 17.4 years. Furthermore, although the CardioSTAT can be worn by any patient, it is mostly used as a third-line examination, so the majority of records in the dataset exhibit arrhythmias. No particular patient selection was performed, other than that the data were collected during 2017 and 2018.


Release Notes

Version 1.0: First release on PhysioNet. Prior to this release data was made available on AcademicTorrents [10].


Ethics

The authors declare no ethics concerns. The ethics institutional review boards at the University of Montreal approved the study and release of data (#CERSES-19-065-D).


Acknowledgements

We thank Leon Glass, Yannick Le Devehat, Germain Ethier, Margaux Luck, Kris Sankaran, and Gabriele Prato for useful discussions. This work is partially funded by a grant from Icentia, the Fonds de Recherche en Santé du Québec, and the Institut de valorisation des données (IVADO). This work used the supercomputing facilities managed by Compute Canada and Calcul Québec. We thank AcademicTorrents.com for making the data available for our research.


Conflicts of Interest

None


References

  1. Hannun AY, Rajpurkar P, Haghpanahi M, Tison GH, Bourn C, Turakhia MP, Ng AY. Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network. Nature Medicine 2019.
  2. Yıldırım O, Pławiak P, Tan RS, Acharya UR. Arrhythmia detection using deep convolutional neural network with long duration ECG signals. Computers in Biology and Medicine 2018.
  3. Minchole A, Rodriguez B. Artificial intelligence for the electrocardiogram. Nature Medicine 2019.
  4. Porumb M, Iadanza E, Massaro S, Pecchia L. A convolutional neural network approach to detect congestive heart failure. Biomedical Signal Processing and Control 2020.
  5. Johnson A, Pollard T, Mark R. MIMIC-III Clinical Database (version 1.4). PhysioNet 2016. https://doi.org/10.13026/C2XW26.
  6. Kim YG, Shin D, Park MY, Lee S, Jeon MS, Yoon D, Park RW. ECG-ViEW II, a freely accessible electrocardiogram database. PLoS ONE 2017.
  7. Icentia website. https://www.icentia.com/
  8. Icentia11k project on GitHub. https://github.com/shawntan/icentia-ecg/tree/master/physionet
  9. Tan S, Androz G, Ortiz-Gagné S, Chamseddine A, Fecteau P, Courville A, Bengio Y, Cohen JP. Icentia11K: An Unsupervised Representation Learning Dataset for Arrhythmia Subtype Discovery. Computing in Cardiology Conference (CinC), October 21, 2021. https://www.cinc.org/2021/Program/accepted/229_Preprint.pdf
  10. Icentia11k Dataset on Academic Torrents. https://academictorrents.com/details/af04abfe9a3c96b30e5dd029eb185e19a7055272

Access

Access Policy:
Anyone can access the files, as long as they conform to the terms of the specified license.

License (for files):
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License


Files

Total uncompressed size: 1.1 TB.


Folder layout example: <base>/p06 contains one folder per patient, p06000 through p06999.