Database Open Access

Icentia11k Single Lead Continuous Raw Electrocardiogram Dataset

Shawn Tan, Satya Ortiz-Gagné, Nicolas Beaudoin-Gagnon, Pierre Fecteau, Aaron Courville, Yoshua Bengio, Joseph Paul Cohen

Published: April 12, 2022. Version: 1.0


When using this resource, please cite:
Tan, S., Ortiz-Gagné, S., Beaudoin-Gagnon, N., Fecteau, P., Courville, A., Bengio, Y., & Cohen, J. P. (2022). Icentia11k Single Lead Continuous Raw Electrocardiogram Dataset (version 1.0). PhysioNet. https://doi.org/10.13026/kk0v-r952.

Additionally, please cite the original publication:

Tan, S., Androz, G., Ortiz-Gagné, S., Chamseddine, A., Fecteau, P., Courville, A., Bengio, Y., & Cohen, J. P. (2021, October 21). Icentia11K: An Unsupervised Representation Learning Dataset for Arrhythmia Subtype Discovery. Computing in Cardiology Conference (CinC).

Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.

Abstract

This is a dataset of continuous raw electrocardiogram (ECG) signals from 11 thousand patients, containing 2 billion labelled beats. The signals were recorded at 16-bit resolution and 250 Hz with a fixed, chest-mounted, single-lead probe for up to 2 weeks. The average patient age is 62.2±17.4 years. 20 technologists annotated each beat's type (Normal, Premature Atrial Contraction, Premature Ventricular Contraction) and rhythm (Normal Sinus Rhythm, Atrial Fibrillation, Atrial Flutter).


Background

Arrhythmia detection is presently performed by cardiologists or technologists familiar with ECG readings. Recently, supervised machine learning has been successfully applied to automated detection of many arrhythmias [1,2,3,4]. However, there may be ECG anomalies that warrant further investigation because they do not fit the morphology of presently known arrhythmias. We seek to use a data-driven approach to find these differences, which cardiologists have observed anecdotally. Existing public ECG datasets include the MIMIC-III Waveform Database and the ECG-ViEW II dataset [5,6]. Here we present Icentia11k, a dataset of continuous raw electrocardiogram (ECG) signals from 11 thousand patients, containing 2 billion labelled beats.


Methods

Our data is collected by the CardioSTAT, a single-lead heart monitor device from Icentia [7]. The raw signals were recorded at 16-bit resolution and sampled at 250 Hz with the CardioSTAT in a modified lead 1 position. The wealth of data this provides can help improve on the techniques currently used by the medical industry to process days' worth of ECG data, and perhaps catch anomalous events earlier than is currently possible.

The dataset is derived from the recordings of 11,000 patients who used the CardioSTAT device at various medical centers, predominantly in Ontario, Canada. While the device captures ECG data for up to two weeks, most patients were prescribed one week of wear.

The data is analyzed by Icentia's team of 20 technologists, who perform annotation using proprietary analysis tools. Initial beat detection is performed automatically. A technologist then analyses the record, labelling beat and rhythm types in a full disclosure analysis (i.e. they see the whole recording). Finally, the analysis is approved by a senior technologist before it enters the dataset.

The ethics institutional review boards at the Université de Montréal approved the study and release of data (CERSES-19-065-D).


Data Description

We segment each patient record into segments of $2^{20}+1$ signal samples (≈70 minutes). This longer time context was informed by discussions with technologists: the context is useful for rhythm detection. We made it a power of two plus a middle sample to allow for easier convolution stack parameterization. From the resulting list of segments, we randomly select 50 segments and their respective labels. The goal here is to reduce the size of the dataset while maintaining a fair representation of each patient.
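The segmentation and subsampling described above can be sketched as follows. This is only an illustration, not the authors' preprocessing code: it assumes segments are contiguous and non-overlapping, that a trailing partial segment is dropped, and that the selected segments are kept in chronological order. The function names and the fixed seed are hypothetical.

```python
import random

SEGMENT_LEN = 2**20 + 1   # 1,048,577 samples per segment
SAMPLE_RATE_HZ = 250
MAX_SEGMENTS = 50         # segments kept per patient


def segment_bounds(n_samples, segment_len=SEGMENT_LEN):
    """Return (start, end) index pairs for every full segment in a recording.

    Assumption: a trailing partial segment is discarded.
    """
    n_full = n_samples // segment_len
    return [(i * segment_len, (i + 1) * segment_len) for i in range(n_full)]


def sample_segments(bounds, k=MAX_SEGMENTS, seed=0):
    """Randomly keep at most k segments, returned in chronological order."""
    if len(bounds) <= k:
        return list(bounds)
    rng = random.Random(seed)
    return sorted(rng.sample(bounds, k))


# Hypothetical patient: one week of continuous recording at 250 Hz.
week = 7 * 24 * 3600 * SAMPLE_RATE_HZ
bounds = segment_bounds(week)
kept = sample_segments(bounds)
print(len(bounds), len(kept))                        # 144 50
print(round(SEGMENT_LEN / SAMPLE_RATE_HZ / 60, 1))   # 69.9 (minutes per segment)
```

A recording with fewer than 50 full segments simply keeps all of them, which is consistent with the total segment count being below 50 per patient on average.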

Data structure

The data is structured into patients and segments.

Patient level (3-14 days)

At this level, the data can capture features that vary systematically rather than as isolated events, such as the placement of the probes or patient-specific noise.

Segment level (1,048,577 int16 samples, approximately 70 minutes)

A cardiologist can look at a specific segment and identify patterns that indicate a disease while ignoring signal characteristics unrelated to it, such as a patient's particular signal amplitude. Looking at trends within the segment helps to correctly identify arrhythmia, as half an hour provides the necessary context to observe the stress of a specific activity.

Aggregate statistics

Aggregate statistics are shown below:

| Statistic | Value (units) |
|---|---|
| Number of patients | 11,000 |
| Number of labeled beats | 2,774,054,987 |
| Sample rate | 250 Hz |
| Segment size | $2^{20}+1$ = 1,048,577 samples |
| Total number of segments | 541,794 (not all patients have enough data for 50 segments) |

Beats are annotated in ann.symbol at the R timepoint in the QRS complex. The timepoint in rec.p_signal for each annotation is found in ann.sample. The table below shows the beat counts over the entire dataset. There are also annotations with a '+' symbol, which only indicate that a rhythm annotation is present (see the next table).

| Symbol | Beat Description | Count |
|---|---|---|
| N | Normal | 2,061,141,216 |
| S | ESSV (PAC): Premature or ectopic supraventricular beat, premature atrial contraction | 19,346,728 |
| V | ESV (PVC): Premature ventricular contraction | 17,203,041 |
| Q | Undefined: Unclassifiable beat | 676,364,002 |
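As a rough illustration of how the beat symbols can be tallied, the sketch below counts the four beat labels in a list of annotation symbols and skips the '+' rhythm markers. The symbol sequence is made up for the example; in practice the list would come from the `symbol` attribute of a wfdb `Annotation` object. The helper name `count_beats` is hypothetical.

```python
from collections import Counter

# Beat labels used in this dataset; '+' marks a rhythm annotation, not a beat.
BEAT_SYMBOLS = {"N", "S", "V", "Q"}


def count_beats(symbols):
    """Count beat labels, skipping '+' rhythm markers and any other symbol."""
    return Counter(s for s in symbols if s in BEAT_SYMBOLS)


# Hypothetical symbol sequence, shaped like the symbol list of one annotation file.
symbols = ["+", "N", "N", "S", "N", "V", "Q", "+", "N"]
counts = count_beats(symbols)
print(counts["N"], counts["S"], counts["V"], counts["Q"])  # 4 1 1 1
```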

Rhythms are annotated in ann.aux_note at each timepoint. For example, a normal sinus rhythm starts with a '(N' annotation and ends with a ')' annotation. The entire sequence in between is annotated as normal sinus rhythm. Below are the counts of each annotated region, which may span one beat or thousands.

| Symbol | Rhythm Label | Count |
|---|---|---|
| (N ... ) | NSR (Normal sinus rhythm) | 16,083,158 |
| (AFIB ... ) | AFib (Atrial fibrillation) | 848,564 |
| (AFL ... ) | AFlutter (Atrial flutter) | 313,251 |
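The open/close convention for rhythm labels can be turned into explicit intervals with a small parser. The following is a minimal sketch under stated assumptions: rhythm regions do not nest, a region whose opener or closer falls outside the loaded window is dropped, and notes may carry stray whitespace or null padding. The function name `rhythm_intervals` and the example arrays are hypothetical; real inputs would be the `sample` and `aux_note` attributes of a wfdb annotation.

```python
def rhythm_intervals(samples, aux_notes):
    """Pair each '(LABEL' opener with the next ')' closer.

    Returns a list of (label, start_sample, end_sample) tuples.
    Assumes regions do not nest; an unclosed opener is discarded.
    """
    intervals = []
    current = None  # (label, start_sample) of the region currently open
    for sample, note in zip(samples, aux_notes):
        note = note.strip(" \x00")
        if note.startswith("("):
            current = (note[1:], sample)
        elif note == ")" and current is not None:
            label, start = current
            intervals.append((label, start, sample))
            current = None
    return intervals


# Hypothetical annotation stream: sample indices and their aux notes.
samples = [10, 60, 300, 310, 420, 900, 905]
aux_notes = ["(N", "", ")", "(AFIB", "", ")", ""]
print(rhythm_intervals(samples, aux_notes))
# [('N', 10, 300), ('AFIB', 310, 900)]
```

Dividing an interval's length in samples by 250 gives its duration in seconds.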

Details on how the dataset is encoded into wfdb format are available on GitHub [8].


Usage Notes

By releasing this dataset, we seek to enable the research community to develop better models for detection of arrhythmia and related heart disease. The dataset is described in more detail in our accompanying paper [9], which also describes our efforts to evaluate existing models for classification of arrhythmia. Code for working with the data, including executable notebooks, is available on GitHub [8].

Example code

To look at patient 9000 and segment 0, the filename would be p09/p09000/p09000_s00, and it can be loaded using wfdb as follows:

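import wfdb

data_path = "."  # assumed: local directory that contains the p00 ... p10 folders
patient_id = 9000
segment_id = 0
start = 2000   # first sample to read
length = 1024  # number of samples to read

# Patients are grouped in folders of 1,000, e.g. p09/p09000/p09000_s00
filename = f'{data_path}/p{patient_id // 1000:02d}/p{patient_id:05d}/p{patient_id:05d}_s{segment_id:02d}'
rec = wfdb.rdrecord(filename, sampfrom=start, sampto=start + length)
# shift_samps=True makes annotation sample indices relative to sampfrom
ann = wfdb.rdann(filename, "atr", sampfrom=start, sampto=start + length, shift_samps=True)
wfdb.plot_wfdb(rec, ann, plot_sym=True, figsize=(15, 4))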
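
When iterating over many patients, it can help to isolate the path construction in a helper. The sketch below is an assumption-laden illustration: the name `segment_path` is hypothetical, and the valid ranges (patient IDs 0 to 10,999, segment IDs 0 to 49) and the grouping of patients into folders of 1,000 are inferred from the dataset description and directory layout rather than taken from an official specification.

```python
def segment_path(data_path, patient_id, segment_id):
    """Build the WFDB record path (without extension) for one segment.

    Assumed layout: <data_path>/pGG/pPPPPP/pPPPPP_sSS, where GG is the
    patient ID divided by 1,000.
    """
    if not 0 <= patient_id < 11000:
        raise ValueError(f"patient_id out of assumed range: {patient_id}")
    if not 0 <= segment_id < 50:
        raise ValueError(f"segment_id out of assumed range: {segment_id}")
    group = patient_id // 1000
    return (
        f"{data_path}/p{group:02d}/p{patient_id:05d}/"
        f"p{patient_id:05d}_s{segment_id:02d}"
    )


print(segment_path(".", 9000, 0))       # ./p09/p09000/p09000_s00
print(segment_path("/data", 42, 7))     # /data/p00/p00042/p00042_s07
```

Integer division avoids the mistake of taking the first character of the ID, which would send patient 42 to folder p04 and patient 10,500 to folder p01.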

Limitations

Since the people who wear the device are patients, the dataset does not represent a true random sample of the global population. For one, the average patient age is 62.2±17.4 years. Furthermore, although the CardioSTAT can be worn by any patient, it is mostly used as a third-line exam, so the majority of records in the dataset exhibit arrhythmias. No particular patient selection was applied, other than that data collection was conducted over the years 2017 and 2018.


Release Notes

Version 1.0: First release on PhysioNet. Prior to this release data was made available on AcademicTorrents [10].


Ethics

The authors declare no ethics concerns. The ethics institutional review boards at the University of Montreal approved the study and release of data (#CERSES-19-065-D).


Acknowledgements

We thank Leon Glass, Yannick Le Devehat, Germain Ethier, Margaux Luck, Kris Sankaran, and Gabriele Prato for useful discussions. This work is partially funded by a grant from Icentia, Fonds de Recherche en Santé du Québec, and the Institut de valorisation des donnees (IVADO). This work utilized the supercomputing facilities managed by Compute Canada and Calcul Quebec. We thank AcademicTorrents.com for making data available for our research.


Conflicts of Interest

None


References

  1. Hannun AY, Rajpurkar P, Haghpanahi M, Tison GH, Bourn C, Turakhia MP, Ng AY. Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network. Nature Medicine 2019.
  2. Yıldırım O, Pławiak P, Tan RS, Acharya UR. Arrhythmia detection using deep convolutional neural network with long duration ECG signals. Computers in Biology and Medicine 2018.
  3. Minchole A, Rodriguez B. Artificial intelligence for the electrocardiogram. Nature Medicine 1 2019.
  4. Porumb M, Iadanza E, Massaro S, Pecchia L. A convolutional neural network approach to detect congestive heart failure. Biomedical Signal Processing and Control 2020.
  5. Johnson, A., Pollard, T., & Mark, R. (2016). MIMIC-III Clinical Database (version 1.4). PhysioNet. https://doi.org/10.13026/C2XW26.
  6. Kim YG, Shin D, Park MY, Lee S, Jeon MS, Yoon D, Park RW. ECG-ViEW II, a freely accessible electrocardiogram database. PloS one 2017.
  7. Icentia website. https://www.icentia.com/
  8. Icentia11k project on GitHub. https://github.com/shawntan/icentia-ecg/tree/master/physionet
  9. Tan, S., Androz, G., Ortiz-Gagné, S., Chamseddine, A., Fecteau, P., Courville, A., Bengio, Y., & Cohen, J. P. (2021, October 21). Icentia11K: An Unsupervised Representation Learning Dataset for Arrhythmia Subtype Discovery. Computing in Cardiology Conference (CinC). https://www.cinc.org/2021/Program/accepted/229_Preprint.pdf
  10. Icentia11k Dataset on Academic Torrents. https://academictorrents.com/details/af04abfe9a3c96b30e5dd029eb185e19a7055272

Access

Access Policy:
Anyone can access the files, as long as they conform to the terms of the specified license.

License (for files):
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License

Files

Total uncompressed size: 1.1 TB.

Folder Navigation: <base>/p04
Patient directories p04000 through p04999.