Among the most frequent questions asked by visitors to PhysioNet are requests for data with specific combinations of characteristics. For example, "Which records include three or more ECG signals and a respiration signal, are at least two hours long, and are from male patients between the ages of 60 and 70?" This and many similar questions are readily answered using the PhysioBank Index. (See how to do this below.)
All records in PhysioBank that can be viewed by the PhysioBank ATM are indexed in the PhysioBank Index, a 73M text file (last updated Wednesday, 3 April 2013).
You can search the PhysioBank Index using a variety of methods:
If you wish to index your own PhysioBank-compatible records, the portable sources for the PhysioBank Index generator are available in the pbindex software package.
You do not need to download the Index in order to use the recommended PhysioBank Record Search. If you wish to use another method for searching the Index, you may need to do so.
The Index can be downloaded using any of these commands:
wfdbcat physiobank-index >temp; mv temp physiobank-index
or
curl http://physionet.org/physiobank/database/physiobank-index >physiobank-index
or
rsync -avz physionet.org::physiobank-core/database/physiobank-index .
Don't attempt to load the Index directly in your web browser, since its large size will cause problems for most browsers! (wfdbcat is part of the WFDB Software Package, curl is available here, and rsync is here. All three are open-source and run on all popular platforms.)
Each line of the PhysioBank Index describes one signal, annotation file, or other feature of a single PhysioBank record; there are about 860,000 lines in the Index. All lines pertaining to any given record are consecutive, and the records appear in dictionary order. Here is a sample from the Index:
edb/e0103 Info1 Mixed angina
edb/e0103 Info2 1-vessel disease (RCA)
edb/e0103 Meds1 nitrates, diltiazem
edb/e0103 Info3 Recorder type: ICR 7200
edb/e0103 AgeSex 62 M
edb/e0103 ECG1 V4 250 200 adu/mV 7200
edb/e0103 ECG2 MLIII 250 200 adu/mV 7200
edb/e0103 AnnR1 atr 250 7336 7200 0-7200
edb/e0103 AnnR1 atr/(N 250 9 5243
edb/e0103 AnnR1 atr/N 250 7212 7199 0-7200
edb/e0103 AnnR1 atr/s 250 15 6098 859-6956
edb/e0103 AnnR1 atr/(VT 250 2 6
edb/e0103 AnnR1 atr/V 250 82 5069 1892-6961
edb/e0103 AnnR1 atr/(B 250 6 58
edb/e0103 AnnR1 atr/F 250 2 574 2132-2706
edb/e0103 AnnR1 atr/~ 250 8 3613 2682-6295
Each entry (line) of the Index contains up to seven tab-separated fields (columns) that describe a signal, annotation set, or feature associated with the record. For entries describing features (such as the first five lines in the example above), these columns are (from left to right):
In entries describing signal and annotation sets, the columns are (from left to right):
Some records have both annotated and unannotated segments; in these cases, the length of the annotation set is shorter than the length of the signals. In annotation subset entries, the duration and time interval reflect the times of the first and last annotations of the associated type, and are generally shorter than the entire annotation set.
In most cases, signals are present throughout, and the last column is omitted
in entries that describe such signals. The MIMIC II Waveform Database is an
exception to this rule, since many of its signals have been recorded in only a
subset of segments; in these cases, the lengths of the signals are less than
that of the entire record, and there may be more than one time interval shown.
As for the signal and annotation set lines, the first two columns are the
record name and class (data type). The first four feature lines shown above
illustrate diagnoses, medications, and two lines of free-text information;
the data appear in the third column. The final feature line contains the
age (in years) in the third column, and the sex (M, F, or ?) in the fourth
column. If the subject's age is over 89, it is shown as 90 (since ages over
89 are protected health information); if the age was not recorded, it is
shown as -1.
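The feature layout described above lends itself to simple field-based filtering. As a minimal sketch, assuming that the fields are separated by single tab characters and that the class of the age/sex entry is exactly AgeSex (as in the sample), the following lists records from male subjects aged 60 to 70. The file sample-index and the records xdb/rec1 and xdb/rec2 are made up for illustration; substitute physiobank-index to search the real Index.

```shell
# Build a tiny stand-in for the Index (hypothetical data, tab-separated).
printf 'edb/e0103\tAgeSex\t62\tM\n'  >sample-index
printf 'xdb/rec1\tAgeSex\t45\tM\n'  >>sample-index
printf 'xdb/rec2\tAgeSex\t-1\t?\n'  >>sample-index

# Print the names of records from male subjects aged 60 to 70.
# An age of -1 (not recorded) fails the range test and is skipped.
males_60_70=$(awk -F'\t' '$2 == "AgeSex" && $4 == "M" && $3 >= 60 && $3 <= 70 { print $1 }' sample-index)
echo "$males_60_70"
```

Because ages over 89 are shown as 90, a query such as "age >= 90" can only be read as "over 89".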
Using PhysioBank Record Search to search the PhysioBank Index
In this section, we'll answer the question at the top of this page using PhysioBank Record Search. Please follow along in your web browser. Note that your results may vary from those shown below if additional records have been added and indexed since this tutorial was written.
PhysioBank Record Search is controlled from your browser. You can open it from the PhysioNet menu button at the top left corner of most pages on PhysioNet (choose PhysioBank → PhysioBank Search). To follow this exercise, click here to open it in another browser tab or window. You should then be able to go back and forth between this page and the PhysioBank Record Search page.
The upper section of the page, below the PhysioBank Record Search heading, contains a form for composing simple queries, searches that are defined by a Subject, a Relationship, and a Value. Once you have performed any simple queries, the Results section opens beneath the upper section; it contains the results of your queries and additional controls for combining and manipulating them. Near the bottom of the page, on-line help for PhysioBank Record Search appears below the heading How to search for records in PhysioBank. Read the on-line help to familiarize yourself with the controls.
If you have used PhysioBank Record Search within the past week or so on this computer, you will see two buttons in the upper section of the page, labeled "Restore previous session" and "Discard previous results". Unless you wish to keep the results of your earlier searches, discard them before continuing.
To answer the example question ("Which records include three or more ECG signals and a respiration signal, are at least two hours long, and are from male patients between the ages of 60 and 70?"), our strategy will be to decompose it into simple queries, collect the results of those queries, and combine them. At each step, we'll be able to see how many PhysioBank records fit the simple queries.
Let's begin. The first simple query will find all records that contain three or more ECG signals. From the Subject menu, select '(#) ECG'. The notation '(#)' indicates that we can specify a minimum number of signals of this type (ECG) in the Name/# box below the Subject menu; type 3 into that box. From the Relationship menu, choose '?' ("defined"), since for this simple query, we only want to know if the 3 ECGs exist. It is unnecessary to enter anything in the Value box (if you do, it will be ignored since the Relationship is '?'). Once the query has been set up, click on Get List.
Immediately below the Results heading, you should now see a line that looks like this:
☐ A [25372] ECG-3 ?
"A" is the tag for the list of results of this query; the bracketed number (which, as noted above, may vary) is the number of PhysioBank records that match the criterion that you have just defined, and the criterion itself appears as a link ("ECG-3 ?"). Click the link (on the search page; the facsimile shown above is not a working link) to view the results if you wish (a list of 25372 record names, probably beginning with challenge/2009/test-set-a/101a/101a).
For the second simple query, let's find all records that include a respiration signal. We might construct a query such as 'Resp ?', but since we also want records that are at least two hours long, we can include that constraint as an element of this query. Select '(#) Respiration' as the Subject, '>=' as the Relationship, and type '2:0:0' (i.e., two hours) as the Value. Don't forget to erase the '3' that may be left in the Name/# box from the previous query; it's unnecessary (but harmless) to type a '1' in its place.
After clicking Get List, a second checkbox and result appear. New results appear above old ones, so you should now see this:
☐ B [38507] Resp >= 2:0:0
☐ A [25372] ECG-3 ?
There are many records that satisfy this criterion, too.
If you forgot to erase the '3' before clicking Get List, list B will be much shorter, since it will contain only records that include 3 or more respiration signals. Such records do exist (for example, some have thoracic and abdominal impedance plethysmograms and a simultaneously recorded nasal thermistor signal), but they are relatively uncommon.
Next, let's find records from male subjects ('sex = M'). By now, you may be becoming familiar with the steps of creating a simple query: select the subject, then the relationship, then type a value, and click Get List.
☐ C [9962] sex = M
☐ B [38507] Resp >= 2:0:0
☐ A [25372] ECG-3 ?
You may be surprised that list C has only 9962 records from male subjects, given that list B has about 4 times as many records. For many records, however, information about the subject's gender (and, as we shall see, age) is not available.
To restrict the search to a range of ages (60-70), we'll use two simple queries ('age >= 60' and 'age <= 70'). When you have run these, your results should look like this:
☐ E [10938] age <= 70
☐ D [10022] age >= 60
☐ C [9962] sex = M
☐ B [38507] Resp >= 2:0:0
☐ A [25372] ECG-3 ?
The final step is to combine all of these results. The 'And' button combines two or more selected lists, producing a new list containing those records that belong to all of its input lists. Select each of the five lists now by clicking on its checkbox, then click 'And' to generate list F:
☐ F [131] E ∩ D ∩ C ∩ B ∩ A
☐ E [10938] age <= 70
☐ D [10022] age >= 60
☐ C [9962] sex = M
☐ B [38507] Resp >= 2:0:0
☐ A [25372] ECG-3 ?
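Conceptually, the 'And' button computes a set intersection. If you have downloaded two result lists as text files with one record name per line, a rough command-line equivalent is sketched here. The file names listA and listB and the record names in them are hypothetical, and this is not a description of how PhysioBank Record Search itself is implemented.

```shell
# Two hypothetical lists of record names, one per line.
printf 'db/r1\ndb/r2\ndb/r3\n' >listA
printf 'db/r2\ndb/r3\ndb/r4\n' >listB

# comm requires sorted input; -12 suppresses the lines unique to either
# file, leaving only the names common to both.
sort listA >listA.sorted
sort listB >listB.sorted
both=$(comm -12 listA.sorted listB.sorted)
echo "$both"
```

To intersect more than two lists, save the result and run comm again against the next sorted list.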
The final list contains 131 records, all from the 'mimic2wdb/matched/' data collection (the MIMIC II Waveform Database Matched Subset). You can view or download the list of records if you wish by clicking on the link next to F, or you can examine the records using the PhysioBank ATM. If you wish to do this, select list F by clicking its checkbox, then click on 'Choose'. The PhysioBank ATM will appear in place of the PhysioBank Record Search page, and the first record belonging to list F will be preselected as input.
If you are unfamiliar with the ATM, read its on-line help (visible below the How to use the PhysioBank ATM heading), then click '*' (in the ATM control panel under Navigation) to dismiss the help and display the first 10 seconds of the first record. You can use any of the ATM's controls to examine the record as you wish; when you are ready, click on 'Next record' to view the next record in list F.
The '+' and '-' buttons in the ATM control panel are active only while you are examining search results. Use them to mark individual records of interest. When you click on either of them, a '+' or '-' appears after the record's name in the ATM's Record menu (control panel, upper left). Mark at least one record now.
After reviewing as many records as you wish, return to the search page (for example, by clicking the PhysioBank Record Search button in the ATM's page header). When you do, you will see that one or two new lists have been created below list F. These lists, tagged as F+ and F-, contain only those records that you have marked using the ATM's '+' and '-' buttons. You can select list F again and click 'Choose' to return to the ATM if you wish to mark additional records, and these lists will be updated as you do so. Although lists F+ and F- are described as "accepted from F" and "rejected from F", you can use these lists in any way you wish. Note, however, that a list made by combining them with other lists will not update itself automatically if you make changes in them later on; to get updated results, repeat the actions you used to combine the lists initially.
If you don't need to refer back to a list, select it and click on Erase to discard it. This helps to avoid confusion if you use PhysioBank Record Search frequently.
Your results are retained for about a week; after that, they will be removed.
Download them if you wish to keep them (especially if you have invested effort
in choosing individual records using the ATM). When you return, your previous
results are identified using a browser cookie (pbs_id); if you use more than
one browser, or more than one computer, you will have different cookies (hence
different sets of results) for each one unless you synchronize your cookies.
Using standard command-line utilities to search the PhysioBank Index
Begin by downloading the Index using any of the methods above. Open a terminal emulator window and navigate to the directory in which you saved physiobank-index.
There are five records in PhysioBank that include a left ventricular stroke volume signal, which is labelled SV. Finding them is simple: type
grep SV physiobank-index
and the results appear in your terminal window quickly:
edb/e0112 AnnR1 atr/(SVTA 250 16 49
edb/e0114 AnnR1 atr/(SVTA 250 3 9
edb/e0607 AnnR1 atr/(SVTA 250 1 2
edb/e1304 AnnR1 atr/(SVTA 250 2 3
incartdb/I33 Info2 PVCs, SVEBs, supraventricular couplets
incartdb/I34 Info2 SVEBs, couplets, paroxysmal supraventricular tachycardia with aberrated QRS
incartdb/I70 Info2 SVEBs with wide QRS, WPW
incartdb/I73 Info2 PVCs on bradycardia, SVEBs, couplets
mghdb/mgh050 Info50 34 min VT vs. SVT with ABERRANCY
mghdb/mgh208 Info10 Runs of VT/SVT
mitdb/114 AnnR1 atr/(SVTA 360 1 5
mitdb/201 AnnR1 atr/(SVTA 360 1 2
mitdb/207 Info7 episode of SVTA.
mitdb/207 AnnR1 atr/(SVTA 360 1 51
mitdb/209 AnnR1 atr/(SVTA 360 10 102
mitdb/220 AnnR1 atr/(SVTA 360 8 14
mitdb/222 AnnR1 atr/(SVTA 360 4 8
mitdb/234 AnnR1 atr/(SVTA 360 1 26
qtdb/sel114 AnnR6 atr/(SVTA 250 1 5
slpdb/slp59 SV1 SV 250 7.93846 adu/ml 14400
slpdb/slp60 SV1 SV 250 7.90293 adu/ml 21300
slpdb/slp61 SV1 SV 250 958.995 adu/cc 22200
slpdb/slp66 SV1 SV 250 9.957 adu/ml 13200
slpdb/slp67x SV1 SV 250 5.25615 adu/ml 4620
vfdb/426 AnnR1 atr/(SVTA 250 3 81
vfdb/607 AnnR1 atr/(SVTA 250 2 26
vfdb/611 AnnM1 qrs/(SVTA 250 16 49
vfdb/611 AnnR2 atr/(SVTA 250 1 0
vfdb/611 AnnR3 qrsc/(SVTA 250 16 49
Most of these results are records containing supraventricular tachyarrhythmias (annotated as '(SVTA'), and others contain SV in comments. These are easily ignored, but it's also possible to improve the search using either
grep $'\tSV\t' physiobank-index
(if you are using the bash shell), or
grep -P '\tSV\t' physiobank-index
(if you are using GNU grep). Either of these commands interprets the sequence \t as a tab character in the search pattern, so that the results contain only lines from the index in which SV appears surrounded by tabs (i.e., in a column by itself):
slpdb/slp59 SV1 SV 250 7.93846 adu/ml 14400
slpdb/slp60 SV1 SV 250 7.90293 adu/ml 21300
slpdb/slp61 SV1 SV 250 958.995 adu/cc 22200
slpdb/slp66 SV1 SV 250 9.957 adu/ml 13200
slpdb/slp67x SV1 SV 250 5.25615 adu/ml 4620
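As a further alternative, awk can test a single column exactly, which avoids matching SV anywhere else in a line. This is a minimal sketch that assumes tab-separated fields with the signal name in the third column, as the sample suggests. The file sample-index holds three lines copied from the output above, with the tab positions being our assumption.

```shell
# Three lines from the grep output above, rebuilt with assumed tab positions.
printf 'slpdb/slp59\tSV1\tSV\t250\t7.93846 adu/ml\t14400\n' >sample-index
printf 'mitdb/114\tAnnR1\tatr/(SVTA\t360\t1\t5\n'          >>sample-index
printf 'mghdb/mgh208\tInfo10\tRuns of VT/SVT\n'            >>sample-index

# Print the record name only when the third field is exactly "SV".
sv_records=$(awk -F'\t' '$3 == "SV" { print $1 }' sample-index)
echo "$sv_records"
```

Unlike the grep patterns, this form works with any POSIX awk and does not depend on bash quoting or on GNU grep's -P option.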
If we want to find records that have at least 3 ECG signals, we can look for ECG3:
grep ECG3 physiobank-index
This results in a very long list of records that quickly scrolls off the screen. If we want to know how long the list is, we can use wc to count the lines:
grep ECG3 physiobank-index | wc -l
(The pipe symbol, '|', connects a pair of commands; it means "take the standard output of the command on the left and feed it to the standard input of the command on the right".) When this page was written, there were 25,372 recordings with at least 3 ECG signals in PhysioBank. We can save the entire list by redirecting the standard output into a file, like this:
grep ECG3 physiobank-index >ECG3-records
The '>' collects the standard output of the command, which would otherwise be shown in the terminal window, and saves it in a file (ECG3-records).
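Pipes also make it easy to summarize a long list instead of scrolling through it. The sketch below counts how many matching records come from each data collection, taking the collection to be the first path component of the record name. The record names in sample-index appear in results elsewhere on this page, but the signal names, sampling frequencies, and gains are invented, and the tab-separated layout is an assumption.

```shell
# Stand-in for the Index (tab-separated). Signal names, sampling
# frequencies, and gains are invented for illustration.
printf 'ltstdb/s30691\tECG3\tV5\t250\t200 adu/mV\t85860\n'           >sample-index
printf 'ltstdb/s30731\tECG3\tV5\t250\t200 adu/mV\t85845\n'          >>sample-index
printf 'mimic2wdb/31/3101148/\tECG3\tV\t125\t200 adu/mV\t2299827\n' >>sample-index

# Keep the record name, reduce it to its first path component (the
# collection), then count how many times each collection occurs.
per_db=$(grep ECG3 sample-index | cut -f 1 | cut -d / -f 1 |
         sort | uniq -c | awk '{ print $2, $1 }')
echo "$per_db"
```

The final awk step only swaps the two columns that uniq -c prints, so that the collection name comes first.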
Getting (re)acquainted with the command line
If you've ever used any version of Unix, or even MS-DOS, the examples on this page may look familiar. If not, consult any introductory book or on-line tutorial about Unix or GNU/Linux. Here are a few places to start:
Suppose what we really want are the longest such recordings. Here's how to find the 3 longest cases:
grep ECG3 physiobank-index | cut -f 1,6 | sort -nr -k2 | head -3
(This command uses pipes to chain four commands together, each one reading the output of the previous one; cut selects the first and sixth fields, the record name and the duration in seconds, from each line output by grep; sort rearranges the lines in reverse numerical order of the second field output by cut; and head discards all but the first three lines output by sort.) The output lists 3 recordings, each containing over 400 hours of ECG3:
mimic2wdb/31/3101148/ 2299827
mimic2wdb/37/3752730/ 1845755
mimic2wdb/32/3212213/ 1842540
There is a caveat, however: these recordings are all from the MIMIC II database, and the signals are not necessarily continuous; in fact, they may not even be simultaneously available. To find a set of long records with at least 3 continuous, simultaneous ECG signals, we can exclude the MIMIC databases and the similar Challenge 2009 database from the search:
grep ECG3 physiobank-index | \
    egrep -i -v "mimic|challenge/2009" | \
    cut -f 1,6 | sort -nr -k2 | head -3
(Here the \ characters indicate that the command continues on the following line.) The results are:
ltstdb/s30691 85860
ltstdb/s30731 85845
ltstdb/s30801 85821
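The durations in the sixth column appear to be given in seconds (the two-hour edb records in the sample show 7200), which is awkward to read for long recordings. As a small sketch under that assumption, awk can convert them to hours, minutes, and seconds. The file name durations is hypothetical; in practice the awk command could simply be appended to the pipeline above.

```shell
# The three results shown above, as produced by cut (name, tab, seconds).
printf 'ltstdb/s30691\t85860\nltstdb/s30731\t85845\nltstdb/s30801\t85821\n' >durations

# Split each duration into whole hours, minutes, and seconds.
hms=$(awk '{ printf "%s %d:%02d:%02d\n", $1, int($2 / 3600), int(($2 % 3600) / 60), $2 % 60 }' durations)
echo "$hms"
```

Each of these three records turns out to be just under 24 hours long.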
These examples illustrate the flexibility of using standard command-line tools
to search within the PhysioBank Index. If these tools are already familiar,
it's easy to perform much more complex searches, including many that would be
very difficult to perform using a relational database and SQL.
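As one illustration, the question posed at the top of this page can be approached in a single pass with awk, relying on the fact that all lines for a given record are consecutive. This is only a sketch under several assumptions: fields are tab-separated, respiration signals have a class beginning with Resp, the sixth field of a signal line is its duration in seconds, and the presence of an ECG3 line means that the record has at least three ECG signals. The records and values in sample-index are invented.

```shell
# Hypothetical three-record index (tab-separated); all values are invented.
{
  printf 'db/r1\tAgeSex\t65\tM\n'
  printf 'db/r1\tECG3\tV\t125\t200 adu/mV\t9000\n'
  printf 'db/r1\tResp1\tRESP\t125\t100 adu/mV\t9000\n'
  printf 'db/r2\tAgeSex\t65\tF\n'
  printf 'db/r2\tECG3\tV\t125\t200 adu/mV\t9000\n'
  printf 'db/r2\tResp1\tRESP\t125\t100 adu/mV\t9000\n'
  printf 'db/r3\tAgeSex\t62\tM\n'
  printf 'db/r3\tECG3\tV\t125\t200 adu/mV\t3600\n'
  printf 'db/r3\tResp1\tRESP\t125\t100 adu/mV\t3600\n'
} >sample-index

# Track three flags per record and print the record name when all are set.
matches=$(awk -F'\t' '
  function report() { if (rec != "" && male && ecg3 && resp) print rec }
  $1 != rec { report(); rec = $1; male = ecg3 = resp = 0 }   # new record starts
  $2 == "AgeSex" && $4 == "M" && $3 >= 60 && $3 <= 70 { male = 1 }
  $2 == "ECG3" { ecg3 = 1 }
  $2 ~ /^Resp/ && $6 >= 7200 { resp = 1 }                    # at least two hours
  END { report() }                                           # flush last record
' sample-index)
echo "$matches"
```

Here only db/r1 satisfies every criterion: db/r2 is from a female subject, and the signals of db/r3 are only one hour long.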
How many records are in the index?
As of January 2012, over 36,000 record sets from over 50 collections are included in the PhysioBank Index. Many record sets include two or more records, and some records belong to more than one collection, so the number of record names in the index is nearly 73,000.
The MIMIC Database and the MIMIC II Waveform Database consist of record sets (pairs of records acquired simultaneously from each subject: a waveform record of signals sampled at 125 or 500 Hz, and a numerics record of vital signs sampled once per second or once per minute).
Records (or excerpts of records) may belong to more than one data collection:
Each file listed below contains an index for a PhysioBank data collection (or part of one). These files are concatenated to form the PhysioBank Index.
Name Last modified Size Description
ucddb 03-Mar-2012 08:19 22K UCD Sleep Apnea Database
tpehgdb 25-Feb-2012 03:38 317K Term-Preterm EHG Database
twadb 13-Feb-2012 21:06 53K T-Wave Alternans Challenge Database
sddb 13-Feb-2012 21:04 17K Sudden Cardiac Death Holter Database
drivedb 13-Feb-2012 21:04 5.4K Stress Recognition in Automobile Drivers
incartdb 13-Feb-2012 21:04 60K St Petersburg INCART 12-lead Arrhythmia Database
mvtdb_data 25-Feb-2012 03:37 55K Spontaneous Ventricular Tachyarrhythmia Database
sleep-edf 25-Feb-2012 03:37 3.6K Sleep-EDF Database
shhpsgdb 13-Feb-2012 21:04 1.0K Sleep Heart Health Study Polysomnography Database
excluded 14-Feb-2012 18:53 1.0K Recordings excluded from nsrdb
qtdb 14-Feb-2012 18:53 178K QT Database
szdb 13-Feb-2012 21:02 2.4K Post-Ictal Heart Rate Oscillations in Partial Epilepsy
ptbdb 13-Feb-2012 21:03 2.1M PTB Diagnostic ECG Database
afpdb 25-Feb-2012 03:37 72K PAF Prediction Challenge Database
nsr2db 13-Feb-2012 21:01 14K Normal Sinus Rhythm RR Interval Database
nifecgdb 13-Feb-2012 21:01 33K Non-Invasive Fetal ECG Database
svdb 14-Feb-2012 18:53 23K MIT-BIH Supraventricular Arrhythmia Database
stdb 13-Feb-2012 21:01 5.0K MIT-BIH ST Change Database
slpdb 13-Feb-2012 21:01 10K MIT-BIH Polysomnographic Database
capslpdb 26-Jul-2012 01:58 122K CAP Sleep Database
nsrdb 14-Feb-2012 18:53 6.5K MIT-BIH Normal Sinus Rhythm Database
nstdb 14-Feb-2012 18:53 5.6K MIT-BIH Noise Stress Test Database
vfdb 13-Feb-2012 21:00 5.9K MIT-BIH Malignant Ventricular Ectopy Database
ltdb 14-Feb-2012 18:52 2.7K MIT-BIH Long-Term ECG Database
cdb 13-Feb-2012 21:00 13K MIT-BIH ECG Compression Test Database
afdb 14-Feb-2012 18:52 9.7K MIT-BIH Atrial Fibrillation Database
mitdb 13-Feb-2012 21:00 28K MIT-BIH Arrhythmia Database
mimic2wdb_39 23-Mar-2012 03:47 4.4M MIMIC II Waveform Database, version 3 part 9
mimic2wdb_38 23-Mar-2012 03:27 4.3M MIMIC II Waveform Database, version 3 part 8
mimic2wdb_37 23-Mar-2012 03:08 4.2M MIMIC II Waveform Database, version 3 part 7
mimic2wdb_36 23-Mar-2012 02:48 4.1M MIMIC II Waveform Database, version 3 part 6
mimic2wdb_35 23-Mar-2012 02:30 4.4M MIMIC II Waveform Database, version 3 part 5
mimic2wdb_34 23-Mar-2012 02:09 4.2M MIMIC II Waveform Database, version 3 part 4
mimic2wdb_33 23-Mar-2012 01:50 4.3M MIMIC II Waveform Database, version 3 part 3
mimic2wdb_32 23-Mar-2012 01:30 4.0M MIMIC II Waveform Database, version 3 part 2
mimic2wdb_31 23-Mar-2012 01:11 4.5M MIMIC II Waveform Database, version 3 part 1
mimic2wdb_30 23-Mar-2012 00:49 4.5M MIMIC II Waveform Database, version 3 part 0
mimic2wdb_matched 27-Feb-2012 00:29 8.5M MIMIC II Waveform Database Matched Subset
mimic2db 27-Feb-2012 00:56 4.8M MIMIC II Waveform DB, v2 [deprecated, use v3]
mimic2db_numerics 27-Feb-2012 01:15 3.5M MIMIC II Waveform DB, v2 Numerics [deprecated, use v3]
mimic2cdb 13-Feb-2012 16:59 7.9K MIMIC II Clinical Database Public Subset
mimicdb_numerics 15-May-2012 08:16 97K MIMIC Database Numerics
mimicdb 13-Feb-2012 16:58 64K MIMIC Database
mghdb 13-Feb-2012 16:57 513K MGH/MF Waveform Database
ltstdb 13-Feb-2012 16:57 272K Long Term ST Database
ltafdb 25-Feb-2012 03:36 27K Long Term AF Database
iafdb 13-Feb-2012 16:56 23K Intracardiac Atrial Fibrillation Database
meditation_data 25-Feb-2012 03:36 13K Heart Rate Oscillations during Meditation
gaitndd 13-Feb-2012 16:56 6.3K Gait in Neurodegenerative Disease Database
gaitdb 13-Feb-2012 16:55 1.3K Gait in Aging and Disease Database
gait-maturation-db_data 26-Feb-2012 22:31 5.8K Gait Maturation Database
fantasia 13-Feb-2012 16:55 14K Fantasia Database
emgdb 13-Feb-2012 16:55 1.1K Examples of Electromyograms
earndb 25-Feb-2012 03:35 702K Evoked Auditory Responses in Normals
edb 13-Feb-2012 16:51 50K European ST-T Database
eegmmidb 13-Feb-2012 16:55 5.2M EEG Motor Movement/Imagery Dataset
chf2db 13-Feb-2012 16:41 8.8K Congestive Heart Failure RR Interval Database
challenge_2011_set-a 13-Feb-2012 16:37 690K Challenge 2011 Training Set A
challenge_2011_set-b 13-Feb-2012 16:39 345K Challenge 2011 Test Set B
challenge_2011_sim 25-Feb-2012 03:28 24K Challenge 2011 Pilot Set
challenge_2010_set-a 13-Feb-2012 16:34 36K Challenge 2010 Training Set A
challenge_2010_set-c 13-Feb-2012 16:35 38K Challenge 2010 Test Set C
challenge_2010_set-b 13-Feb-2012 16:35 38K Challenge 2010 Test Set B
challenge_2009_test-set-b 26-Feb-2012 20:56 340K Challenge 2009 Test Set B
challenge_2009_test-set-a 26-Feb-2012 20:55 79K Challenge 2009 Test Set A
cudb 13-Feb-2012 16:45 7.4K CU Ventricular Tachyarrhythmia Database
chbmit 26-Feb-2012 20:58 1.1M CHB-MIT Scalp EEG Database
crisdb 13-Feb-2012 16:45 923K CAST RR Interval Sub-Study Database
bpssrat 26-Jun-2012 14:55 2.7K Blood Pressure in Salt-Sensitive Dahl Rats
chfdb 14-Feb-2012 18:52 6.6K BIDMC Congestive Heart Failure Database
apnea-ecg 13-Feb-2012 16:33 36K Apnea-ECG Database
aami-ec13 13-Feb-2012 16:33 1.0K ANSI/AAMI EC13 Test Waveforms
ahadb 26-Feb-2012 20:50 438 AHA Database [sample excluded record]
aftdb 13-Feb-2012 16:33 24K AF Termination Challenge Database
ob1db 01-Aug-2012 01:40 288
challenge_2013_set-b 03-Apr-2013 00:10 22K
challenge_2013_set-a 03-Apr-2013 00:10 17K
adfecgdb 13-Dec-2012 00:11 2.6K
Updated Friday, 16 December 2011 at 01:34 EST