If you have digital recordings of signals or time series, perhaps with annotations, that you would like to study using PhysioToolkit software such as that in the WFDB software package, or that you would like to contribute to PhysioBank, the information on this page should get you started on creating PhysioBank-compatible records from your data.
Many data formats are WFDB-compatible; there is no single "WFDB format". This tutorial will help you determine whether your data are already in a WFDB-compatible format, and choose a suitable WFDB-compatible format if they are not.
If you haven't done so already, install the WFDB software package before continuing.
The basic component of a PhysioBank data set is a record, which consists of data that describe a single subject, simulation, or experimental run. Typically, a record contains one or more signals and one or more sets of annotations, together with header information (described below).
In this context, a signal is a time series of measured or calculated samples separated by uniform time intervals (sampling intervals). In PhysioBank-compatible records, samples are represented as 8-, 10-, 12-, 16-, 24-, or 32-bit integers.* The accompanying header information provides, among much else, the parameters needed to convert the dimensionless integer samples into calibrated physical quantities (such as blood pressures in mmHg). The sampling frequency of a signal is the number of sampling intervals per second (which may be less than one for infrequently sampled signals). In most cases, all signals belonging to a record are sampled at the same frequency. If they are not, a frame interval must be defined, generally as the least common multiple of the sampling intervals used in the record; the frame frequency is then the number of frame intervals per second.
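As an illustration of the two ideas above, the following Python sketch converts an integer sample to a physical value and derives a frame frequency from mixed sampling frequencies. It assumes the convention described in header(5), in which the gain is given in ADC units per physical unit and the baseline is the sample value corresponding to zero physical units. The function names and the numeric values are hypothetical and are not part of the WFDB software package.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def to_physical(sample, gain, baseline=0):
    """Convert a dimensionless integer sample to physical units.

    Assumes gain is in ADC units per physical unit and baseline is the
    sample value that corresponds to zero physical units.
    """
    return (sample - baseline) / gain


def _lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def frame_frequency(sampling_frequencies):
    """Return the frame frequency (frames per second) for a record.

    The frame interval is taken as the least common multiple of the
    sampling intervals, so every signal has a whole number of samples
    in each frame.
    """
    intervals = [1 / Fraction(f) for f in sampling_frequencies]
    # LCM of reduced fractions: lcm(numerators) / gcd(denominators).
    numerator = reduce(_lcm, (i.numerator for i in intervals))
    denominator = reduce(gcd, (i.denominator for i in intervals))
    return 1 / Fraction(numerator, denominator)
```

With hypothetical signals sampled at 250 Hz and 100 Hz, frame_frequency([250, 100]) gives 50 frames per second, so each frame holds 5 samples of the first signal and 2 of the second.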
Also in this context, an annotation is a label that "points" to a specific sampling interval (or frame interval) in the record (and optionally to a specific signal as well). Each annotation can have a small number of numeric attributes, as well as either a string or a URL, associated with it. Annotations are commonly used in PhysioBank databases to label heart beats, and to record observations and events that do not take place at uniform intervals.
Many medical device manufacturers have either adopted PhysioBank-compatible formats natively, or provide a means of exporting their proprietary data into a PhysioBank-compatible format. Sometimes the term "MIT format" is used to describe a PhysioBank-compatible format. European Data Format (EDF), used widely to store unannotated data, differs from the formats most often used in PhysioBank in that it does not make use of external header files, but it is fully PhysioBank-compatible. The newer EDF+, a variant of EDF that incorporates a limited capability for storing annotations, is mostly PhysioBank-compatible (but see the note about EDF+ annotations at the end of this page).
If you have records that include .hea or .edf files, verify that they are PhysioBank-compatible by trying to read them with wfdbdesc, rdsamp, and (if you have annotation files) rdann. Record names (needed by these WFDB applications to specify their inputs) never include .hea, but they do include .edf when reading EDF files.
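The record-naming rule above can be sketched in Python as follows. The helper names are hypothetical; the sketch only builds the argument lists for wfdbdesc, rdsamp, and rdann and does not run them. The -r (record) and -a (annotator) options are the ones commonly documented for rdsamp and rdann; treat them as assumptions and check them against the manual pages of your installed version.

```python
def record_name(path):
    """Return the record name to pass to WFDB applications.

    A trailing .hea is dropped; any other name, including one ending
    in .edf, is passed through unchanged.
    """
    if path.lower().endswith(".hea"):
        return path[:-len(".hea")]
    return path


def check_commands(path, annotator=None):
    """Build (but do not run) argument lists for a compatibility check."""
    name = record_name(path)
    commands = [["wfdbdesc", name], ["rdsamp", "-r", name]]
    if annotator is not None:
        # Only meaningful if an annotation file exists for the record.
        commands.append(["rdann", "-r", name, "-a", annotator])
    return commands
```

For example, record_name("100.hea") yields "100", while record_name("foo.edf") yields "foo.edf". Each list returned by check_commands could be passed to subprocess.run on a system where the WFDB software package is installed.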
Most files in common binary formats that use fixed-length samples for storing digitized signals (including many that, like EDF and EDF+, contain embedded metadata at the beginning of the file) are also PhysioBank-compatible signal files. If your data are already in such a format, it may be sufficient to create a header file and, if applicable, an annotation file for each record. See signal(5) for details of supported signal file formats, and header(5) for complete specifications of header file format, with examples.
Unlike records in relational databases, each PhysioBank-compatible record is stored in its own files. The files belonging to any given record share a record name (the initial part of the file names), and are distinguished by suffixes. For example, record 100 of the MIT-BIH Arrhythmia Database consists of three files, named 100.hea, 100.dat, and 100.atr.
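To make the naming convention concrete, here is a minimal Python sketch that groups a list of file names by record name and collects the suffixes found for each record. The function name is hypothetical, and the sketch assumes that the record name is everything before the last dot in the file name.

```python
import os
from collections import defaultdict


def group_by_record(file_names):
    """Map each record name to the sorted list of suffixes found for it."""
    records = defaultdict(list)
    for name in file_names:
        # Split "100.dat" into the record name "100" and the suffix ".dat".
        stem, suffix = os.path.splitext(name)
        records[stem].append(suffix.lstrip("."))
    return {record: sorted(suffixes) for record, suffixes in records.items()}
```

Applied to the three files of record 100 named above, this returns a single entry for "100" with the suffixes atr, dat, and hea.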
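WFDB-compatible records generally contain three types of files, although two of them are optional: a header file (such as 100.hea), which is required, and signal files (such as 100.dat) and annotation files (such as 100.atr), which are optional.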
If you don't already have PhysioBank-compatible records, an easy way to make them from the data you have is to begin by creating a CSV file containing one sample of each signal per line, as in this example consisting of samples of two ECG signals:
    927,998
    927,1017
    939,1034
    958,1048
    980,1064
    1010,1086
    1048,1111
    1099,1131
    1148,1140
    1180,1119
    1192,1066
    1177,1007
    1128,978
    1058,974
    991,981
    951,988
    937,987
    939,992
    950,994
    958,994

If you have written your data in this format to a CSV file named foo.csv, create foo.hea and foo.dat using this command:

    wrsamp -F freq -i foo.csv -o foo -s, 0 1

replacing freq by the sampling frequency of your signals. The final command line arguments (0 and 1 in the example) specify the columns of the input file that should be written as signals to the output; column 0 is the leftmost, 1 the next, etc. Columns can be omitted, reordered, or duplicated as desired. See wrsamp for details and additional options that can be used if your samples are not 16-bit integers.
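If your samples are held in a program rather than in a file, a CSV file in the layout described above can be produced with a few lines of Python. The sketch below writes one sample of each signal per line and assembles the same wrsamp argument list as the command above, without running it. The function names are hypothetical, and the sketch assumes that plain newline line endings are wanted.

```python
import csv


def write_samples_csv(path, rows):
    """Write one line per sample time, one column per signal."""
    with open(path, "w", newline="") as f:
        # Use "\n" line endings rather than the csv module's default "\r\n".
        csv.writer(f, lineterminator="\n").writerows(rows)


def wrsamp_command(freq, csv_path, record, columns):
    """Build (but do not run) the wrsamp argument list shown above."""
    return (["wrsamp", "-F", str(freq), "-i", csv_path, "-o", record, "-s,"]
            + [str(c) for c in columns])
```

For instance, write_samples_csv("foo.csv", [(927, 998), (927, 1017)]) writes the first two lines of the example, and wrsamp_command(360, "foo.csv", "foo", [0, 1]) builds the command for a hypothetical sampling frequency of 360 Hz.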
* Support for 24- and 32-bit integer samples was introduced in WFDB version 10.5.0 (March 2010). Previous versions were limited to resolutions of 16 bits or fewer.
Edit the .hea file using any text editor to insert signal names, physical units, and calibration parameters. For records to be contributed to PhysioBank, please add, at the end of the file, an info string (a comment line beginning with '#') that describes, at a minimum, the age, gender, diagnoses, and medications of the subject (other information that does not identify the subject is also welcome). Example:
    # <age>: 35 <sex>: M <diagnoses>: (none) <medications>: (none)

Please use this format to permit indexing software to parse this information reliably. This string may extend over multiple lines if necessary, but begin each such line with '#'.
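The following Python sketch shows one way to generate an info string in this layout and to read it back. It is not the indexing software mentioned above; the function names and the regular expression are assumptions made for illustration, and the parser assumes that field values do not themselves contain text of the form <name>:.

```python
import re

FIELDS = ("age", "sex", "diagnoses", "medications")


def format_info(age, sex, diagnoses="(none)", medications="(none)"):
    """Return a single-line info string in the layout of the example."""
    values = (age, sex, diagnoses, medications)
    return "# " + " ".join("<%s>: %s" % (k, v) for k, v in zip(FIELDS, values))


def parse_info(line):
    """Return a dict mapping each <field> tag on the line to its value."""
    # Each value runs up to the next "<tag>:" or to the end of the line.
    pairs = re.findall(r"<(\w+)>:\s*(.*?)\s*(?=<\w+>:|$)", line)
    return dict(pairs)
```

Calling format_info(35, "M") reproduces the example line, and parse_info applied to that line recovers the four fields as strings.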
If your records include beat labels or other non-periodic observations, they can be stored in annotation files. The easiest way to do this is to put your non-periodic information into the text format produced by rdann; text in this format can be converted into PhysioBank-compatible annotation files using wrann.
EDF+ files are, as noted above, mostly compatible with WFDB applications. Current versions of the WFDB library do not read EDF+ annotations directly, however; it is necessary to extract them from the EDF+ file and rewrite them into a conventional PhysioBank-compatible annotation file in order to read them with WFDB applications. This can be done easily using rdedfann(1) and wrann(1).