ECG-ID Database 1.0.0

File: <base>/biometric.shtml (36,637 bytes)
<!--#set var="TITLE" value="Biometric Human Identification based on ECG"-->
<!--#include virtual="/head.shtml"-->

<p align="center"><b>Tatiana S. Lugovaya</b></p>

<div class="notice">
<p>
This material originally appeared in:

<blockquote>
Lugovaya T.S. Biometric human identification based on electrocardiogram.
[Master's thesis] Faculty of Computing Technologies and Informatics,
Electrotechnical University "LETI",  Saint-Petersburg, Russian Federation;
June 2005.
</blockquote>

<blockquote>
Nemirko A.P., Lugovaya T.S. Biometric human identification based on
electrocardiogram. Proc. XII-th Russian Conference on Mathematical Methods of
Pattern Recognition, Moscow, MAKS Press, 2005, pp. 387-390. ISBN 5-317-01445-X.
<br>
A PDF copy in Russian is available <a href="http://www.mmro.ru/files/2005-mmro-12.pdf">here</a>.
</blockquote>

Original titles:

<blockquote>
Луговая Т.С. Биометрическая идентификация личности по электрокардиограмме:
выпускная квалификационная работа магистра. Санкт-Петербургский государственный
электротехнический университет "ЛЭТИ", Факультет компьютерных технологий и
информатики (ФКТИ), Кафедра математического обеспечения и применения ЭВМ (МО ЭВМ);
Июнь 2005.
</blockquote>

<blockquote>
Немирко А.П., Луговая Т.С. Биометрическая идентификация личности по
электрокардиограмме. Математические методы распознавания образов: 12-я Всероссийская
конференция: Сборник докладов. - М.:МАКС Пресс, 2005. - с. 387-390. ISBN 5-317-01445-X.
<br>
Электронная версия сборника доступна <a href="http://www.mmro.ru/files/2005-mmro-12.pdf">здесь</a>.
</blockquote>

</p>
</div>

<br>

<a name="abstract">
<h2>Abstract</h2></a>

<p><i> This research investigates the possibility of biometric human
identification based on the electrocardiogram (ECG).  The ECG, being a record
of electrical currents generated by the beating heart, is potentially a
distinctive human characteristic, since ECG waveforms and other properties of
the ECG depend on the anatomic features of the human heart and body.
Experimental studies involved 90 volunteers. Heart rate, physical and emotional
state were not limited. For usability, Lead I was chosen, since it is easily
measured and it is not sensitive to minor variations in electrode
locations. ECG fragments containing QRS complexes, P and T waves extracted from
the ECG were processed by Principal Component Analysis and classified using
Linear Discriminant Analysis and a Majority Vote Classifier. Using this method
on a predetermined group of 90 individuals, the experimental results showed
that the rate of correct identification was 96%. The findings support using the
ECG as a new biometric characteristic in various biometric access control
applications.
</i></p>

<a name="introduction">
<h2>Introduction</h2></a>

<p>
Biometric technologies are among the fastest-developing fields of information
security and are gradually entering all spheres of human activity. Today only
three biometric methods have proved their efficiency, namely, identification
based on fingerprints, iris or retina, and face. Hand geometry, voice, writing
and typing dynamics, etc. are also useful, depending on the purpose and range
of application.
</p>

<p>
This research aims to develop an identification system based on the ECG (figure
1).  Electrical currents that are generated by the heart as it beats spread not
only within the heart, but also throughout the body. Therefore, shapes of the
ECG waveforms depend on human heart and body anatomic features. Thus one may
consider ECG as a human biometric characteristic, as previously considered in
[1, 2, 3]. This study suggests data interpretation and classification methods
not used in these previous studies.  The system developed in this study was
tested on a larger set of live input data than used in previous studies, with a
larger number of subjects, and the data from this study have been made
available <a href="./">here</a> to allow objective comparisons to be made with
other methods. The respective studies are compared below (table 1).

<center>
<img src="images/ecg.png" alt="[ECG with standard notations]">
<br>
<b>Figure 1.</b> Example of ECG with standard notations.
</center>
</p>

<a name="biometric">
<h2>ECG as a biometric characteristic</h2></a>

<p>
To determine the potential use of ECG as a biometric, it is necessary to
evaluate how ECG satisfies the requirements for biometric characteristics.
</p>

<p>
A "perfect" biometric characteristic should be:
<ul>
<li type="circle"> universal, i.e., each individual possesses this
characteristic,</li>
<li type="circle"> easily measured, i.e., it is quite easy technically and
convenient for an individual to obtain the characteristic,</li>
<li type="circle"> unique, i.e., there are no two individuals with identical
characteristics, and</li>
<li type="circle">permanent, i.e., the characteristic does not change over
time.</li>
</ul>

"Good" biometric characteristics can to a greater or lesser extent satisfy these
requirements, depending on the purpose and application of the biometric system.
</p>

<p>
The ECG is a universal characteristic, as the heart beat is a necessary sign of
life, and it can be recorded with minimal inconvenience to the individual. Its
uniqueness, and especially its permanence, are much more difficult to evaluate.
The following are just some empirical arguments in support of ECG as a
biometric.
</p>

<p>
It is plausible to assume that an ECG is an almost unique human characteristic
because morphology and amplitudes of recorded cardiac complexes are governed by
multiple individual factors, in particular by the shape and position of the
heart, and the presence and nature of pathologies, among other factors.  As a
result, QRS complexes have a variety of configurations and metrics (figure 2).

<center>
<table>
<tr align="center" valign="middle">
<td>
<img src="images/ecg_biometric_configurations.png" alt="[QRS configurations]">
</td>
<td width=30></td>
<td>
<table border=1>
<tr align="center"><td><b>Wave</b></td><td><b>Amplitude, mm</b></td><td><b>Duration, s</b></td></tr>
<tr align="center"><td>P</td><td>0 - 2.5</td><td>0.08 - 0.10</td></tr>
<tr align="center"><td>Q</td><td>0 - 3</td><td rowspan=3>0.06 - 0.10</td></tr>
<tr align="center"><td>R</td><td>6 - 21</td></tr>
<tr align="center"><td>S</td><td>0 - 6</td></tr>
<tr align="center"><td>T</td><td>0 - 5</td><td>0.10 - 0.25</td></tr>
</table>
</td>
</tr>
</table>

<b>Figure 2.</b> Schematic representation of various QRS complex configurations
and normal ranges of wave amplitudes and durations.
</center>
</p>

<p>
The most obvious source of ECG variability is heart rate variability.
At rest, the heart beats 60-80 times per minute. During physical activity, or
under conditions of stress or excitement, heart rate may increase to 200 beats
per minute.  Such heart rate increases inevitably influence the
waveforms of the cardiac cycle, as its duration is reduced by a factor of 2-3.  Heart
rate increases mainly cause shortening of diastole duration (the baseline
fragment of the ECG) and the ventricular repolarization period (corresponding
to the ST segment in the ECG) and attenuation of the R wave amplitude.  The
duration of the QRS complex doesn't change significantly with heart rate,
however (figure 3). In clinical cardiology, the dependence of QT interval
duration on the heart rate is investigated for diagnostic purposes, and there
are several formulas (of which Bazett's, Fridericia's, and the Framingham Heart
Study's are most widely used) reflecting the dependence of QT intervals on
cardiac cycle length (RR intervals).  These empirically derived relationships
presumably can be used for ECG normalization, i.e., scaling of the cardiac cycle to
a normal heart rate.

<center>
<img src="images/ecg_biometric_heart_rate.png" alt="[ECG variations with different heart rates]">
<br>
<b>Figure 3.</b> Variation of ECG of one individual with different heart rates.
</center>
</p>
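<p>
The three QT correction formulas named above can be written down in a few
lines. The following is a minimal sketch in their commonly published forms,
with intervals in seconds; the function names are illustrative and are not
taken from the original study.
</p>

```python
import math


def rr_from_heart_rate(bpm):
    """Cardiac cycle length (RR interval, seconds) for a given heart rate."""
    return 60.0 / bpm


def qtc_bazett(qt, rr):
    """Bazett: QTc = QT / sqrt(RR)."""
    return qt / math.sqrt(rr)


def qtc_fridericia(qt, rr):
    """Fridericia: QTc = QT / RR^(1/3)."""
    return qt / rr ** (1.0 / 3.0)


def qtc_framingham(qt, rr):
    """Framingham: QTc = QT + 0.154 * (1 - RR)."""
    return qt + 0.154 * (1.0 - rr)


# Example: a QT interval of 0.30 s measured at 120 beats per minute.
rr = rr_from_heart_rate(120)
corrected = {
    "Bazett": qtc_bazett(0.30, rr),
    "Fridericia": qtc_fridericia(0.30, rr),
    "Framingham": qtc_framingham(0.30, rr),
}
```

<p>
All three formulas leave the QT interval unchanged at RR = 1 s (60 beats per
minute) and differ only in how strongly they correct at other heart rates.
</p>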


<p>
Another equally important question is the persistence of an individual's ECG
characteristics over time. Since the shape of ECG waveforms is determined
primarily by human anatomical features, the natural variability of which is
measured in years, it can be assumed that the ECG signal will have slow and
gradual variations (at least in individuals who do not experience traumatic
events such as myocardial infarctions that alter relevant anatomical features
between successive observations).  To evaluate how the ECG varies
over time, a set of ECG records of one individual was collected during the
study. For each record the acquisition procedure, including electrode attachment,
was repeated from the beginning. Figure 4 shows the variability of ECG cycles
recorded within one hour (figure 4.A) and within six months (figure 4.B). The
figures show that the ECG cycles have no significant qualitative or
quantitative variations over time, and the configuration of the QRS complex and the
ratio of amplitudes remain stable. Variations within an hour are almost
identical to variations within six months. The observed ECG variations are more
likely caused by variations in the acquisition procedure and filter distortions. Of
course, this is only the result of empirical observations and six months is too
short a period to judge the variability of the ECG over much longer periods,
but these results support the hypothesis of a slow variation. It is unlikely
that there is a functional relationship that can correct for changes in the ECG
depending on the time interval between observations (although there have
been many clinical studies that report systematic age-related changes in
specific ECG characteristics such as heart rate variability).  Given this, it
is likely that periodic updates of the training set records and classifier
retraining will be needed as components of an identification system.

<center>
<table>
<tr>
<td align=center><img src="images/ecg_biometric_over_day.png" alt="[ECG variations within one hour]"><br>A. within one hour</td>
<td align=center><img src="images/ecg_biometric_over_time.png" alt="[ECG variations within six months]"><br>B. within six months</td>
</tr>
</table>

<b>Figure 4.</b> Variation of ECG of one individual recorded at different times.
</center>
</p>

<p>
Of course, there are many other natural and artificial, intentional and
unintentional causes of ECG variability. For example, taking certain medications
may temporarily change the configuration of the cardiac cycle. Some pathologies
over time gradually change the form of the cardiac cycle. Moreover, human
actions during the ECG recording can significantly distort or change the signal.
Thus, an ECG identification system will face a variety of challenges that are
similar to those posed by various attacks on other types of biometric systems.
</p>

<a name="system">
<h2>Identification system synthesis</h2></a>

<p>
The identification system uses a classical scheme including data acquisition,
data preprocessing, formation of input feature space, transition to reduced
feature space, cardiac cycle classification, and ECG record identification
(figure 5).

<center>
<img src="images/system_synthesis.png" alt="[Identification system structure]">
<br><b>Figure 5.</b> Identification system structure.
</center>
</p>

<p>
The generic system structure (figure 5, left) shows the sequence of essential
data processing stages.  Feedforward links show processed data transfer between
stages. The output of one stage is the input to the subsequent stage. Each
stage can be implemented using different processing methods.  The detailed
system structure (figure 5, right) shows methods considered in this study for
each system stage. For most stages, these methods are alternatives, but the
data preprocessing stage is usually comprised of several complementary methods.
</p>

<p>
Identification system synthesis is a process of selection between alternative
methods, determination of composition and sequence of complementary methods,
and adjustment of parameters for all methods in such a way as to obtain the
best results. It is almost impossible to perform an exhaustive search. In this
study, a strategy of result-directed backtracking was applied. Multiple
feedback links in the detailed structure scheme show that results obtained on
one stage can influence decisions made in previous stages. In other words,
usefulness of some changes in data processing on one stage can be evaluated by
comparing the results of one of the following stages obtained before and after
these changes.  This empirical technique doesn't guarantee optimal results, but
it yields more or less reasonable method selections without requiring excessive
computation.
</p>

<a name="acquisition">
<h3>Data acquisition</h3></a>

<p>
Experimental studies involved 90 volunteers. ECG records were made in
the sitting position. Heart rate, physical and emotional state were not
limited.
</p> 

<p>
For usability, it is necessary to be able to collect the ECG easily and
quickly.  The procedure for ECG acquisition should be convenient for
individuals and should require interaction with a minimal set of equipment;
therefore it was decided to use single-lead ECG.  Since single-lead ECGs vary
significantly within an individual depending on the lead (the locations of the
electrodes used to observe the ECG), the choice of lead is important. Lead I is
the potential difference between the left and right hands (LA - RA).  It was
chosen because it is easily measured and it is not sensitive to minor
variations in electrode locations. Limb clamp electrodes were used. This type
of skin-electrode contact closely imitates the likely scenarios of user
interaction with a practical identification system.
</p>

<p>
The data collected for this study comprise the <a href="./">ECG-ID Database</a>,
consisting of 310 I-lead ECG recordings from 90 individuals, each 20 seconds
long, sampled at 500 Hz with 12-bit precision.
</p>

<p>
To train and test the identification system, the collected data were divided
into two sets. 195 records were assigned to the training set and 115 records to
the test set.  The split between the training and test sets was chosen to make
the identification task as hard as possible, i.e., to maximize the difference between
records in different sets in both recording time and the subject's physical state.
</p>

<a name="preprocessing">
<h3>Data preprocessing</h3></a>

<p> The raw ECG is often rather noisy and contains distortions of various
origins.  Nevertheless, all filters provided by the cardiograph software were
turned off, because it was unclear whether they would suppress informative
features essential for identification. Examples of ECG with different noise
components are presented in figure 6. For each case, possible filtering methods
are proposed.

<center>
<table border=0 cellpadding=5>
<tr>
<td align=center><img src="images/ecg_noise_drift.png" alt="[isoline drift]">
	<br><img src="images/ecg_noise_drift_opers.png" alt="[isoline drift filtering methods]">
	<br>A. ECG with baseline drift</td>
<td align=center><img src="images/ecg_noise_net.png" alt="[power-line noise]">
	<br><img src="images/ecg_noise_net_opers.png" alt="[power-line noise filtering methods]">
	<br>B. ECG with power-line noise</td>
</tr>
<tr>
<td align=center><img src="images/ecg_noise_high.png" alt="[high-frequency noise]">
	<br><img src="images/ecg_noise_high_opers.png" alt="[high-frequency noise filtering methods]">
	<br>C. ECG with high-frequency noise</td>
<td align=center><img src="images/ecg_noise_both.png" alt="[power-line and high-frequency noise]">
	<br><img src="images/ecg_noise_both_opers.png" alt="[power-line and high-frequency noise filtering methods]">
	<br>D. ECG with both power-line and high-frequency noise</td>
</tr>
</table>

<b>Figure 6.</b> Examples of noisy ECG and corresponding filtering methods.
</center>
</p>

<p>
Visual analysis of noisy ECG shows that the preprocessing stage should perform
three major tasks: baseline drift correction, frequency-selective filtering and
signal enhancement. As a result of a series of experiments, the following
combination of methods was selected for the preprocessing stage (figure 7).

<center>
<img src="images/system_preprocessing.png" alt="[Preprocessing methods]">
<br>
<table border=0>
<tr>
<td><img src="images/oper_Wavelet_Drift_Correction.png" alt="[Operator: Wavelet Drift Correction]"></td>
<td>Wavelet decomposition: wname = 'db8', N = 9.</td>
</tr>
<tr>
<td><img src="images/oper_Adaptive_Bandstop_Filter.png" alt="[Operator: Adaptive Bandstop Filter]"></td>
<td>Ws = 50 Hz, dA = 1.5.</td>
</tr>
<tr>
<td><img src="images/oper_Lowpass_Filter.png" alt="[Operator: Lowpass Filter]"></td>
<td>Butterworth filter, Wp = 40 Hz, Ws = 60 Hz, Rp = 0.1 dB, Rs = 30 dB.</td>
</tr>
<tr>
<td><img src="images/oper_Smoothing.png" alt="[Operator: Smoothing]"></td>
<td>N = 5.</td>
</tr>
</table>

<b>Figure 7.</b> Combination of methods for data preprocessing stage.
</center>
</p>

<p>
Baseline drift correction was implemented using multilevel one-dimensional
wavelet analysis.  The original signal was decomposed at level 9 using the
Daubechies 'db8' wavelet (figure 7).  The signal reconstructed using the final approximation
coefficients is assumed to be the drifting baseline, which is subtracted from
the original signal (figure 8). This method gives good results for both
clean and rather noisy ECG signals.

<center>
<table>
<tr>
<td><img src="images/wavelet_drift_correction.png" alt="[ECG baseline drift correction 1]"></td>
<td><img src="images/wavelet_drift_correction_noisy.png" alt="[ECG baseline drift correction 2]"></td>
</tr>
</table>

<b>Figure 8.</b> Examples of ECG baseline drift correction results.
</center>
</p>
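<p>
The idea of treating the coarsest wavelet approximation as the baseline can be
illustrated with a deliberately simplified sketch. It swaps in the Haar wavelet
(in unnormalized, averaging form) for the 'db8' wavelet listed in figure 7,
because Haar can be written in a few lines without a wavelet library: with all
detail coefficients discarded, the level-J reconstruction is simply the mean
over consecutive blocks of 2^J samples. The function names are hypothetical,
and the sketch assumes the signal length is a multiple of 2^J.
</p>

```python
def haar_baseline(signal, level):
    """Baseline estimate: level-`level` Haar approximation of `signal`.

    Simplification: Haar stands in for 'db8', so the estimate is piecewise
    constant rather than smooth.
    """
    block = 2 ** level
    if len(signal) % block != 0:
        raise ValueError("signal length must be a multiple of 2**level")

    # Analysis: keep only approximation coefficients (pairwise averages),
    # halving the length at every level.
    approx = list(signal)
    for _ in range(level):
        approx = [(approx[i] + approx[i + 1]) / 2.0
                  for i in range(0, len(approx), 2)]

    # Synthesis with all detail coefficients set to zero: each coarse value
    # is repeated over the block of samples it summarizes.
    baseline = []
    for value in approx:
        baseline.extend([value] * block)
    return baseline


def remove_baseline(signal, level):
    """Subtract the estimated baseline from the signal."""
    baseline = haar_baseline(signal, level)
    return [s - b for s, b in zip(signal, baseline)]


corrected = remove_baseline([1, 3, 5, 7, 10, 10, 10, 10], level=2)
```

<p>
A smoother wavelet such as 'db8' avoids the step discontinuities that the Haar
approximation introduces at block boundaries; a practical implementation would
also pad the record, since 10,000 samples is not a multiple of 2^9.
</p>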

<p>
Frequency-selective signal filtering was implemented using a set of adaptive
bandstop and lowpass filters (figure 9). The adaptive bandstop filter suppresses
power-line noise fairly well. Its filtering quality is comparable to that of a classic
bandstop filter, and it has some additional advantages, including a
recursive structure, computational simplicity, and the ability to adapt to some
changes in power-line noise characteristics.  A lowpass filter is used to remove
the remaining noise components, caused by possible high-frequency distortions.

<center>
<table>
<tr>
<td><img src="images/noise_filter_1.png" alt="[ECG filtering 1]"></td>
<td><img src="images/noise_filter_2.png" alt="[ECG filtering 2]"></td>
</tr>
</table>

<b>Figure 9.</b> Examples of ECG frequency-selective filtering results.
</center>
</p>
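<p>
The internals of the adaptive bandstop filter are not described here. As an
illustration only, the sketch below shows one classic construction with the
listed properties (recursive, cheap, adaptive): a two-weight LMS noise
canceller driven by sine and cosine references at the power-line frequency.
The step size <code>mu</code> and the function name are assumptions, and this
is not necessarily the filter used in the study.
</p>

```python
import math


def lms_notch(signal, fs, f0, mu=0.01):
    """Suppress a sinusoid at f0 Hz with a two-weight LMS noise canceller.

    fs: sampling rate in Hz; mu: adaptation step size (assumed value).
    """
    w_sin, w_cos = 0.0, 0.0
    cleaned = []
    for n, x in enumerate(signal):
        phase = 2.0 * math.pi * f0 * n / fs
        ref_sin, ref_cos = math.sin(phase), math.cos(phase)
        # Current estimate of the power-line component.
        estimate = w_sin * ref_sin + w_cos * ref_cos
        # The error is the cleaned output sample.
        error = x - estimate
        # LMS update: move the weights toward the interference amplitude/phase.
        w_sin += 2.0 * mu * error * ref_sin
        w_cos += 2.0 * mu * error * ref_cos
        cleaned.append(error)
    return cleaned


# Synthetic check: a slow 1 Hz wave plus 50 Hz interference, sampled at 500 Hz.
fs = 500
slow = [math.sin(2.0 * math.pi * 1.0 * n / fs) for n in range(5000)]
noisy = [s + 0.5 * math.sin(2.0 * math.pi * 50.0 * n / fs + 0.7)
         for n, s in enumerate(slow)]
clean = lms_notch(noisy, fs, 50.0)
```

<p>
Because the weights track both amplitude and phase of the interference, the
filter follows slow changes in the power-line noise; a larger <code>mu</code>
adapts faster at the cost of a wider notch.
</p>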



<a name="feature_space">
<h3>Initial feature space formation</h3></a>

<p>
Initial feature space formation is a key stage of the identification system.
Although decisions made in other stages significantly influence the final
identification results, feature selection determines the potential performance
of the identification system most of all.
</p>

<p>
Obviously, information about cardiac electrical activity is primarily contained
in the cardiac cycle fragment containing the QRS complex, P and T waves. The
QRS complex is the most distinctive among them.  The value for identification
of P and T waves is questionable.  The P wave has a low amplitude and can be
greatly distorted by noise.  The T wave is rather changeable, since its
position depends on heart rate.  Including the P and T waves in the
"informative fragment" (the portion of the cardiac cycle from which features
used for identification will be derived) will probably add some information
about the individual, but it will also add some challenges to processing
methods.  To evaluate the discriminative power of the P and T waves, it was
decided to consider four possible "informative fragments": QRS, P-QRS, QRS-T,
and P-QRS-T (figure 10).

<center>
<img src="images/pqrst_config.png" alt="[Cardiac cycle informative fragments]">
<br>
<b>Figure 10.</b> Variants of cardiac cycle informative fragments.
</center>
</p>

<p>
The informative fragment was selected by running the complete
ECG identification system independently with each of these four variants of
initial feature space formation.  This experiment yielded useful results even
though the complete system was not fully determined, because the prime interest
was not to determine absolute performance values for each variant, but to rank
them.  The results (figure 11) show that P wave inclusion significantly
improves identification results, while T wave inclusion yields a smaller
improvement, provided that the P wave is also included. Thus, the P-QRS-T
fragment was selected; it is referred to below as the PQRST-fragment.

<center>
<img src="images/system_initial_feature_space.png" alt="[Cardiac cycle informative fragment selection]">
<br>
<b>Figure 11.</b> Test set ECG identification results for considered informative fragments.
</center>
</p>
 
<p>
The process of initial space formation begins with extraction of a set of R
peak synchronized PQRST-fragments (figure 12). PR, QRS and especially QT
intervals have variable length depending on individual physiology and heart
rate. Since feature vectors must have equal length, the PQRST-fragment length
was fixed at 0.5 sec or 250 samples. For each cardiac cycle, regardless of the
actual lengths of PR, QRS and QT intervals, 250 samples (80 samples to the left
of R-peak and 170 samples to the right) were extracted and analyzed.

<center>
<table>
<tr><td align="center"><img src="images/pqrst_set_1.png" alt="[PQRST-fragment extraction 1]"></td></tr>
<tr><td align="center"><img src="images/pqrst_set_2.png" alt="[PQRST-fragment extraction 2]"></td></tr>
</table>

<b>Figure 12.</b> Examples of R peak detection and PQRST-fragment extraction.
</center>
</p>
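<p>
The fixed-length extraction described above can be sketched as follows. R peak
positions are assumed to be already detected; the function name and the choice
to place the R peak at index 80 of each window (so that the window spans
samples r - 80 up to, but not including, r + 170) are assumptions made for
illustration.
</p>

```python
def extract_fragments(signal, r_peaks, left=80, right=170):
    """Cut R-peak-synchronized windows of left + right samples.

    At 500 Hz, the default 80 + 170 = 250 samples correspond to 0.5 s.
    Beats whose window would cross the record boundary are skipped.
    """
    fragments = []
    for r in r_peaks:
        start, stop = r - left, r + right
        if start < 0 or stop > len(signal):
            continue  # incomplete window near the start or end of the record
        fragments.append(signal[start:stop])
    return fragments


# Toy record: the sample value equals its index, so positions are easy to check.
record = list(range(1000))
fragments = extract_fragments(record, r_peaks=[50, 300, 900])
```

<p>
In this toy example only the beat at sample 300 yields a complete window; the
beats at 50 and 900 are too close to the record edges.
</p>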

<p>
For each ECG record, 10 PQRST-fragments were extracted.  Since PQRST-fragment
samples are used as informative features, extracted PQRST-fragments are
processed to enhance their similarity as follows:

<table>
<tr>
<td valign="top">1.</td>
<td colspan=3>Correcting PQRST-fragment mutual "vertical" shift due to residual baseline drift.
 <br>Mean values of all PQRST-fragments were subtracted, so that the corrected
segments had mean values of zero.</td>
</tr>
<tr>
<td></td>
<td colspan=3 align="center"><img src="images/pqrst_set_vertical_correction.png"
 alt="[vertical shift correction]"></td>
</tr>
<tr>
<td valign="top">2.</td>
<td colspan=3>Culling PQRST-fragments that are distorted due to breathing or motion artifacts, as well as pathological PQRST-fragments.
 <br>From the set of 10 extracted PQRST-fragments, the "mean" PQRST-fragment
 was estimated (red line); using Euclidean distance, only the 6 closest
 PQRST-fragments were selected (blue lines) for further analysis.</td>
</tr>
<tr>
<td></td>
<td colspan=3 align="center"><img src="images/pqrst_set_atypical.png"
 alt="[atypical PQRST-fragments]"></td>
</tr>
<tr>
<td valign="top">3.</td>
<td colspan=3>Correcting PQRST-fragments depending on heart rate.
 <br>The ST-fragment (samples from the end of the S wave to the end of the T
 wave) of PQRST-fragment was scaled using QT interval correction
 formulas. The Framingham and Bazett's formulas were considered;  use of the
 Framingham formula gave better and more robust results.</td>
</tr>
<tr>
<td></td>
<td colspan=3 align="center"><img src="images/pqrst_set_rate_original.png" alt="[heart rate influence]">
  <br><img src="images/arrow_ver.png">
  <br><img src="images/pqrst_set_rate_corrected.png" alt="[heart rate correction]"></td>
</tr>
</table>
</p>
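<p>
Steps 1 and 2 above (vertical shift correction and culling of atypical
fragments) can be sketched as follows. The helper names are hypothetical; the
sketch subtracts each fragment's own mean and then keeps the fragments closest
to the pointwise mean fragment in Euclidean distance.
</p>

```python
import math


def center(fragment):
    """Step 1: subtract the fragment's mean so that it has zero mean."""
    mean = sum(fragment) / len(fragment)
    return [v - mean for v in fragment]


def mean_fragment(fragments):
    """Pointwise mean of equally long fragments (the 'mean' PQRST-fragment)."""
    count = len(fragments)
    return [sum(column) / count for column in zip(*fragments)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_typical(fragments, keep=6):
    """Step 2: keep the `keep` centered fragments closest to the mean fragment."""
    centered = [center(f) for f in fragments]
    template = mean_fragment(centered)
    ranked = sorted(centered, key=lambda f: euclidean(f, template))
    return ranked[:keep]


# Toy data: nine copies of one shape at different vertical offsets, one outlier.
base = [0, 1, 0, -1]
toy = [[v + offset for v in base] for offset in range(9)]
toy.append([0, 10, 0, -10])
kept = select_typical(toy, keep=6)
```

<p>
After centering, the nine shifted copies coincide, and the distorted fragment
is ranked last and discarded.
</p>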

<p>
Thus in the initial feature space (dimension <i>N</i>=250) the ECG appears as a
set of 6 PQRST-fragments with each seen as a separate pattern at subsequent
system stages, to be interpreted and classified independently.
</p>



<a name="reduction">
<h3>Feature space reduction</h3></a>

<p>
Initially, the feature space dimension (or number of features for each pattern)
is quite large (<i>N</i>=250). This fact may make subsequent processing difficult
or computationally impossible.  The initial features, however, may include
redundant and useless information due to correlation and interdependencies
between features. In other words, the informational content of each initial
feature and its contribution to the distinction of classes vary significantly.
The feature space reduction procedure aims to make the transition from the
initial feature space, with a correlated basis, into a new feature space with
an uncorrelated basis.  This procedure allows significant dimensionality
reduction with minimal information loss.
</p>

<p>
Two methods of feature space reduction were considered: Principal Component
Analysis (PCA) and Wavelet Transform (WT). Principal Component Analysis allows
reduction of the initial feature space dimension <i>N</i> to 30 according to
the Kaiser criterion, or even to 10 according to the scree test. Use of a
Wavelet Transform provides the same space reduction but with slightly poorer
PQRST-fragment classification results (figure 13); furthermore, it is hard to
find reasonable criteria for wavelet and decomposition level selection. In a
series of experiments with different wavelets, the best results were obtained
using Daubechies wavelet ('db3'), and decomposition at level 3.

<center>
<img src="images/system_feature_space_reduction.png" alt="[Feature space reduction methods]">
<br>
<b>Figure 13.</b> Test set PQRST-fragment classification results for
the candidate methods for feature space reduction.
</center>
</p>
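<p>
The Kaiser criterion mentioned above keeps the principal components whose
eigenvalue exceeds the average eigenvalue (equivalently, exceeds 1 when PCA is
performed on a correlation matrix). A minimal sketch, with made-up eigenvalues
that are not data from this study:
</p>

```python
def kaiser_count(eigenvalues):
    """Number of components whose eigenvalue exceeds the mean eigenvalue."""
    mean = sum(eigenvalues) / len(eigenvalues)
    return sum(1 for ev in eigenvalues if ev > mean)


def explained_variance(eigenvalues, k):
    """Fraction of total variance carried by the k largest eigenvalues."""
    ordered = sorted(eigenvalues, reverse=True)
    return sum(ordered[:k]) / sum(ordered)


# Illustrative eigenvalues only.
eigenvalues = [4.0, 2.0, 1.0, 0.5, 0.3, 0.2]
k = kaiser_count(eigenvalues)
share = explained_variance(eigenvalues, k)
```

<p>
The scree test, by contrast, is a visual criterion (the "elbow" of the sorted
eigenvalue plot) and typically retains fewer components.
</p>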

<p>
To explore the reduced feature space, geometric and numerical analyses were
carried out. The class distribution in the 2D feature space based on the first two
principal components is shown in figure 14. Each PQRST-fragment is a
point in this feature space, and each class (set of PQRST-fragments) is a set
of points.  For clarity, each class is represented not as a cloud of points but
as a convex hull containing these points. In the figure, it is apparent that
class points are well-grouped, and that some classes are separated from others
and thus are easily classified;  yet most classes are concentrated in the
same region, and two components are insufficient for classification.

<center>
<img src="images/feature_space_2D.png" alt="[Classes in reduced feature space 2D]">
<br><b>Figure 14.</b> Class distribution in the reduced feature space (first 2
principal components).
</center>
</p>

<p>
To compare the initial and reduced feature spaces quantitatively, intra- and
interclass metrics were considered. Usually, the class centroid is defined as
the mean of all class points, so that the intraclass distance can be defined
as the mean distance between the class points and the class centroid, and
interclass distance as the distance between class centroids. Conceptually,
intraclass distance characterizes the volume of the class or ECG variation
within an individual, and interclass distance characterizes the separation of
classes from each other or ECG variation between individuals. Obviously, good
classification results can be achieved when interclass distances are
significantly greater than intraclass distances.
</p>
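<p>
The intra- and interclass metrics defined above translate directly into code.
The following sketch uses hypothetical function names and toy 2D points; it
returns, for each class, the minimal and mean distance from its centroid to
the centroids of all other classes.
</p>

```python
import math


def centroid(points):
    """Mean of all class points."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def distance(a, b):
    """Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def intraclass_distance(points):
    """Mean distance between the class points and the class centroid."""
    c = centroid(points)
    return sum(distance(p, c) for p in points) / len(points)


def interclass_distances(classes):
    """Map each label to (minimal, mean) centroid distance to other classes."""
    centroids = {label: centroid(pts) for label, pts in classes.items()}
    result = {}
    for label, c in centroids.items():
        others = [distance(c, other_c)
                  for other, other_c in centroids.items() if other != label]
        result[label] = (min(others), sum(others) / len(others))
    return result


# Toy classes (illustrative points only).
classes = {
    "a": [[0, 0], [2, 0]],
    "b": [[4, 0], [6, 0]],
    "c": [[1, 4], [1, 4]],
}
intra_a = intraclass_distance(classes["a"])
inter = interclass_distances(classes)
```

<p>
Here class "a" has an intraclass distance of 1 while its nearest neighbouring
centroid is 4 away, the kind of ratio that makes classification easy.
</p>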

<p>
To analyze the feature spaces, intra- and interclass distances were calculated
for all of the 90 classes, and the distributions of the minimal and mean
interclass distances in the initial and reduced feature spaces were calculated
for each class.  Euclidean distance was used as a measure of
distance. Box-and-whisker plots of the calculated metrics are presented in figure
15.

<center>
<table border=0>
<tr>
<td>(A)&nbsp;</td>
<td align=center>&nbsp;<img src="images/class_metrics_initial.png" alt="[class metrics for initial feature space]"></td>
</tr>
<tr>
<td>(B)&nbsp;</td>
<td align=center>&nbsp;<img src="images/class_metrics_reduced.png" alt="[class metrics for reduced feature space]"></td>
</tr>
</table>
<b>Figure 15.</b> Class metrics for initial (A) and reduced (B) feature spaces.
</center>
</p>

<p>
In the figure, it is clear that feature space reduction slightly increases
intraclass distances.  More important, however, it noticeably improves class
separability, increasing both minimal and mean interclass distances. In other
words, classes in reduced feature space gain some extra volume, but are more
clearly separated from each other. Overlap between the plots of intraclass
distances and minimal interclass distances indicates the presence of some
collisions (ambiguities) in the classification. Significant separation of plots
of intraclass distances and mean interclass distances indicates that,
potentially, classification can achieve high performance. Of course, it is
difficult to predict how the feature space will be filled with new classes
(i.e., as the system is required to identify a larger number of individuals),
but judging by the ratio of distances there is the potential to increase this
number without losing the classification quality.
</p>

<a name="classification">
<h3>Classification and Identification</h3></a>

<p>
As a result of data processing in previous stages, the original ECG record is
represented as a set of six PQRST-fragment patterns in the reduced feature
space.  At this stage, each PQRST-fragment pattern is classified independently
of the others and assigned to some class, and each PQRST-fragment
classification result is a vote for the candidate class of the final ECG record
identification, which is elected by a majority of votes.
</p>
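<p>
The voting step can be sketched in a few lines. The function name and subject
labels are hypothetical. How ties are resolved is not stated in the text; this
sketch simply returns the label that <code>Counter.most_common</code> lists
first.
</p>

```python
from collections import Counter


def identify_record(fragment_labels):
    """Return (winning label, number of votes) for one ECG record."""
    counts = Counter(fragment_labels)
    label, votes = counts.most_common(1)[0]
    return label, votes


# Six fragment-level decisions for one record; two of them are wrong.
decision = identify_record(["p07", "p07", "p12", "p07", "p31", "p07"])
```

<p>
With six votes per record, the record is still identified correctly here even
though two fragments were assigned to other classes.
</p>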

<p>
In this research, several classifiers were considered.  The Nearest Mean
Classifier is a computationally simple but rather effective method when the
class patterns are close to normally distributed and the variances of the
feature values are almost equal (i.e., when the class patterns fill a hyper-sphere).  The
Weighted Nearest Mean Classifier is a modified version that takes into
account the differing variances of different features: the distance metric
is normalized by the variances of the corresponding feature values within the class
(i.e., the class patterns fill a hyper-ellipsoid).  Linear Discriminant
Analysis is a classic method of classification that usually achieves good
performance, even when its assumptions about normality and variances are
violated.  Figure 16 compares the performance of these three classifiers.

<center>
<img src="images/system_classification.png" alt="[classification methods]">
<br>
<b>Figure 16.</b> Final results for considered classification methods.
</center>
</p>
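<p>
The difference between the two nearest-mean variants can be illustrated with a
small sketch. The function names, the small <code>eps</code> guard against
zero variance, and the toy data are assumptions; Linear Discriminant Analysis
is not sketched here.
</p>

```python
def fit_class_stats(training):
    """training: {label: [feature vectors]} -> {label: (means, variances)}."""
    stats = {}
    for label, vectors in training.items():
        n = len(vectors)
        columns = list(zip(*vectors))
        means = [sum(col) / n for col in columns]
        variances = [sum((v - m) ** 2 for v in col) / n
                     for col, m in zip(columns, means)]
        stats[label] = (means, variances)
    return stats


def nearest_mean(x, stats):
    """Assign x to the class with the closest mean (Euclidean distance)."""
    def sq_dist(label):
        means, _ = stats[label]
        return sum((a - m) ** 2 for a, m in zip(x, means))
    return min(stats, key=sq_dist)


def weighted_nearest_mean(x, stats, eps=1e-9):
    """As above, but each squared difference is divided by the feature's
    within-class variance (eps avoids division by zero)."""
    def sq_dist(label):
        means, variances = stats[label]
        return sum((a - m) ** 2 / (v + eps)
                   for a, m, v in zip(x, means, variances))
    return min(stats, key=sq_dist)


# Toy data: class "a" is elongated along the first feature, "b" is compact.
training = {
    "a": [[-10, -1], [10, 1], [0, 0]],
    "b": [[8, 3], [8, 5], [9, 4], [7, 4]],
}
stats = fit_class_stats(training)
query = [6, 0.5]
plain = nearest_mean(query, stats)
weighted = weighted_nearest_mean(query, stats)
```

<p>
The query point is closer to the mean of "b" in plain Euclidean terms, but it
lies well within the spread of the elongated class "a", so the weighted
variant assigns it to "a".
</p>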

<p>
The figure shows that the results obtained are consistent with expectations.
Indeed, the Nearest Mean Classifier provides quite good results. Figure 14
shows that classes typically have elongated shapes, i.e., that the variances of
the different features are not equal, so that the Weighted Nearest Mean
Classifier provides better results.  Linear Discriminant Analysis not only
provides the best recognition result in the test set, but also minimizes the
recognition error in the training set.  The ECG record identification rate is
higher than the PQRST-fragment recognition rate, since a pair of misclassified
PQRST-fragments does not affect the correctness of the ECG record identification.
</p>

<a name="results">
<h2>Results</h2></a>

<p>
As a result of this research, a recognition system was developed to solve the
problem of biometric human identification based on ECG on a sufficiently large
set of input data. The findings support the use of ECG as a new biometric
characteristic in various biometric access control problems. Of course, it is
doubtful that ECG is unique enough to be feasible for identification of large
numbers of individuals in a general population.  More likely, it can be useful
for identification within relatively small predetermined groups, or as an
additional feature in multi-variable biometric identification systems. Thus, it
opens up a brand new perspective for the study of biometric technologies with
potential applications in security and modern life amenity systems.
</p>

<p>
Table 1 summarizes and compares the key features and results of this study with
those of studies [1-3], which were available at the time of this study.

<center>
<table border="1" cellspacing="2" bordercolor="#808080" cellpadding="3">
<tbody>
<tr>
<td align="center"><b>Study</b></td>
<td align="center"><b>Biel et al. [1]</b></td>
<td align="center"><b>Yi et al. [2]</b></td>
<td align="center"><b>Shen et al. [3]</b></td>
<td align="center"><b>[This study]</b></td>
</tr>
<tr>
<td align="center"><b>Number of individuals</b></td>
<td align="center">20<br>(20-55 years)</td>
<td align="center">9<br>(22-28 years)</td>
<td align="center">20</td>
<td align="center">90<br>(13-75 years)</td>
</tr>
<tr>
<td align="center"><b>ECG acquisition method</b></td>
<td align="center">12 leads</td>
<td align="center">Wireless,<br>30 minutes long</td>
<td align="center">Lead I</td>
<td align="center">Lead I,<br>20 seconds long</td>
</tr>
<tr>
<td align="center"><b>Number of ECG records</b></td>
<td align="center">135 records,<br>different days</td>
<td align="center">18 records,<br>different days</td>
<td align="center">20 records</td>
<td align="center">310 records,<br>different days during 6 months</td>
</tr>
<tr>
<td align="center"><b>Number of ECG records for each individual</b></td>
<td align="center">4-10 records</td>
<td align="center">2 records</td>
<td align="center">1 record</td>
<td align="center">2-20 records</td>
</tr>
<tr>
<td align="center"><b>Training set</b></td>
<td align="center">85 records</td>
<td align="center">9 records of one day,<br>30 fragments for each record</td>
<td align="center">20 heartbeats for each record</td>
<td align="center">195 records,<br>6 of 10 heartbeats for each record</td>
</tr>
<tr>
<td align="center"><b>Test set</b></td>
<td align="center">50 records</td>
<td align="center">9 records of another day,<br>all fragments for each record</td>
<td align="center">1 heartbeat for each ECG (different part)</td>
<td align="center">115 records,<br>6 of 10 heartbeats for each record</td>
</tr>
<tr>
<td align="center"><b>Informative features</b></td>
<td align="center">Heartbeat wave amplitudes and interval durations (<i>N</i>=30)</td>
<td align="center">Coefficients of the wavelet decomposition of successive 10-second ECG fragments</td>
<td align="center">QRS complex and T wave amplitudes and interval durations (<i>N</i>=7)</td>
<td align="center">Samples of cardiac cycle fragment containing the QRS complex, P and T waves (<i>N</i>=250)</td>
</tr>
<tr>
<td align="center"><b>Reduction method</b></td>
<td align="center">-</td>
<td align="center">Principal Component Analysis</td>
<td align="center">-</td>
<td align="center">Principal Component Analysis or Wavelet Transform</td>
</tr>
<tr>
<td align="center"><b>Classification method</b></td>
<td align="center">Soft Independent Modeling of Class Analogy (Multivariate Analysis)</td>
<td align="center">Probabilistic neural network</td>
<td align="center">Template matching (TM), Decision-based neural network (DBNN)</td>
<td align="center">Linear Discriminant Analysis and Majority Vote Classifier</td>
</tr>
<tr>
<td align="center"><b>Identification results</b></td>
<td align="center">98 %</td>
<td align="center">95 %</td>
<td align="center">TM: 95 %,<br>DBNN: 80 %,<br>Both: 100 %</td>
<td align="center">96 %</td>
</tr>
</tbody></table>

<b>Table 1.</b> Summary of the key features of different studies.
</center>
</p>



<a name="refs">
<h2>References</h2></a>

<p>[1] Biel L., Pettersson O., Philipson L., Wide P. ECG analysis: a new approach
 in human identification. IEEE Transactions on Instrumentation and Measurement,
 2001 June; 50(3):808-812.</p>

<p>[2] Yi W.J., Park K.S., Jeong D.U. Personal identification from ECG measured
 without body surface electrodes using probabilistic neural networks. Proc.
 2003 World Congress on Medical Physics and Biomedical Engineering,
 Sydney, Australia, 2003 August.</p>

<p>[3] Shen T.W., Tompkins W.J., Hu Y.H. One-lead ECG for identity verification.
 Proc. 2nd Joint Conference of the IEEE Engineering in Medicine and Biology
 Society and the Biomedical Engineering Society, 2002; vol. 1, pp. 62-63.</p>


<!--#include virtual="/footer.shtml" -->