

Here we provide a concise example of how to calculate an information-based similarity index between two time series. The following figure illustrates a sample heartbeat interval time series from a healthy subject (left panel), which shows complex variability. In contrast, a time series from a subject with congestive heart failure (CHF) (right panel) shows less variability. Both sample time series contain 1000 inter-beat intervals. (See RR Intervals, Heart Rate, and HRV Howto for information on how to obtain additional inter-beat interval time series in this format.)

Figure 4: Representative inter-beat interval time series for (a) a healthy subject, and (b) a subject with congestive heart failure (CHF).

We first map each signal to a binary sequence according to whether each inter-beat interval increases or decreases relative to the preceding one. If we set the word length m to 8, there are $2^8 = 256$ possible 8-bit words. We count the occurrences of each 8-bit word and then sort the words by descending frequency. The resulting rank-frequency distribution represents the statistical hierarchy of repetitive patterns in a given time series. For example, the top-ranked 8-bit words correspond to the most frequently occurring patterns in a given heartbeat time series, whereas the lowest-ranked words correspond to the rarest patterns.
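The mapping and ranking steps can be sketched in a few lines of Python. This is a minimal illustration, not the ibs program itself: the function names are hypothetical, and three details are assumptions made here (a bit is 1 when the interval increases and 0 otherwise, words are taken from a window that slides one bit at a time, and ties in frequency are broken lexicographically). A short series with m = 3 keeps the output readable.

```python
from collections import Counter
from itertools import product

def binary_sequence(intervals):
    """Map consecutive intervals to bits: 1 if the interval increases, else 0 (assumed convention)."""
    return [1 if b > a else 0 for a, b in zip(intervals, intervals[1:])]

def word_counts(bits, m):
    """Count overlapping m-bit words, sliding the window one bit at a time (assumed)."""
    words = ("".join(map(str, bits[i:i + m])) for i in range(len(bits) - m + 1))
    return Counter(words)

def rank_words(counts, m):
    """Rank all 2**m words by descending count (rank 1 = most frequent).

    Ties, including words that never occur, are broken lexicographically;
    this tie rule is an assumption of the sketch.
    """
    all_words = ["".join(b) for b in product("01", repeat=m)]
    ordered = sorted(all_words, key=lambda w: (-counts.get(w, 0), w))
    return {w: r for r, w in enumerate(ordered, start=1)}

# Toy inter-beat intervals in seconds (made-up values, for illustration only)
intervals = [0.80, 0.82, 0.81, 0.85, 0.84, 0.86, 0.83, 0.87, 0.88, 0.86]
bits = binary_sequence(intervals)
counts = word_counts(bits, m=3)
ranks = rank_words(counts, m=3)

for word, rank in sorted(ranks.items(), key=lambda item: item[1]):
    print(rank, word, counts.get(word, 0))
```

With a sliding window, a series of 1000 intervals yields 999 bits and therefore 992 overlapping 8-bit words.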

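Table: Rank $R(w_k)$, probability $p(w_k)$, and Shannon entropy $H(w_k) = -p(w_k)\,\log p(w_k)$ (natural logarithm) of the first ten 8-bit words $w_k$ for the two time series (subscripts 1 and 2).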
8-bit word $w_k$   $R_1(w_k)$   $R_2(w_k)$   $p_1(w_k)$   $p_2(w_k)$   $H_1(w_k)$   $H_2(w_k)$
00000000           21           113          0.011089     0.002016     0.049919     0.012513
00000001           9            111          0.018145     0.002016     0.072750     0.012513
00000010           118          74           0.003024     0.005040     0.017544     0.026665
00000011           3            195          0.029234     0.000000     0.103267     0.000000
00000100           45           35           0.006048     0.009073     0.030895     0.042664
00000101           117          86           0.003024     0.004032     0.017544     0.022232
00000110           47           161          0.006048     0.001008     0.030895     0.006955
00000111           1            194          0.037298     0.000000     0.122667     0.000000
00001000           80           24           0.004032     0.012097     0.022232     0.053405
00001001           83           73           0.004032     0.005040     0.022232     0.026665
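As a consistency check, the entropy columns can be recomputed from the probability columns. The sketch below copies the ten rows of the table and evaluates $-p \ln p$ with the usual convention that the term is 0 when $p = 0$. It also recovers integer word counts under the assumption that each probability is a count divided by 992, the number of overlapping 8-bit words in a 1000-interval series; that divisor is inferred here, not stated in the text.

```python
import math

# 1000 intervals -> 999 increments -> 992 overlapping 8-bit words (assumed divisor)
N_WORDS = 1000 - 1 - 8 + 1

# (word, p1, p2, H1, H2), copied from the table above
rows = [
    ("00000000", 0.011089, 0.002016, 0.049919, 0.012513),
    ("00000001", 0.018145, 0.002016, 0.072750, 0.012513),
    ("00000010", 0.003024, 0.005040, 0.017544, 0.026665),
    ("00000011", 0.029234, 0.000000, 0.103267, 0.000000),
    ("00000100", 0.006048, 0.009073, 0.030895, 0.042664),
    ("00000101", 0.003024, 0.004032, 0.017544, 0.022232),
    ("00000110", 0.006048, 0.001008, 0.030895, 0.006955),
    ("00000111", 0.037298, 0.000000, 0.122667, 0.000000),
    ("00001000", 0.004032, 0.012097, 0.022232, 0.053405),
    ("00001001", 0.004032, 0.005040, 0.022232, 0.026665),
]

def entropy_term(p):
    """Return -p * ln(p), taking the term to be 0 when p == 0."""
    return 0.0 if p == 0 else -p * math.log(p)

# Largest gap between the tabulated entropy and the recomputed value
max_entropy_error = max(
    abs(entropy_term(p) - h)
    for _, p1, p2, h1, h2 in rows
    for p, h in ((p1, h1), (p2, h2))
)

# Integer counts implied by the probabilities under the assumed divisor
implied_counts = {
    word: (round(p1 * N_WORDS), round(p2 * N_WORDS))
    for word, p1, p2, _, _ in rows
}

print(f"largest |H - (-p ln p)| = {max_entropy_error:.1e}")
for word, pair in implied_counts.items():
    print(word, pair)
```

The small residual differences come from the six-decimal rounding of the tabulated probabilities.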

The rank order difference between two time series can be visualized by plotting the rank number of each 8-bit word in the first time series against that of the second time series. The dashed diagonal line indicates the case where the rank order of words for both time series is identical.
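To make the comparison concrete, the sketch below takes the rank pairs of the ten tabulated words and measures how far each point lies from the diagonal, expressed simply as $|R_1(w_k) - R_2(w_k)|$. The helper name is hypothetical. The (R1, R2) pairs are exactly the coordinates that would be plotted in the rank order comparison map.

```python
# (word, R1, R2), copied from the table in this section
rank_rows = [
    ("00000000", 21, 113),
    ("00000001", 9, 111),
    ("00000010", 118, 74),
    ("00000011", 3, 195),
    ("00000100", 45, 35),
    ("00000101", 117, 86),
    ("00000110", 47, 161),
    ("00000111", 1, 194),
    ("00001000", 80, 24),
    ("00001001", 83, 73),
]

def diagonal_offsets(rows):
    """Return |R1 - R2| per word; 0 means the point lies on the diagonal."""
    return {word: abs(r1 - r2) for word, r1, r2 in rows}

offsets = diagonal_offsets(rank_rows)
farthest = max(offsets, key=offsets.get)

for word, offset in offsets.items():
    print(word, offset)
print("farthest from the diagonal:", farthest)
```

Among these ten words, 00000111 is the top-ranked word in the first series but ranks 194th in the second, so it lies farthest from the diagonal.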

As demonstrated by the above rank order comparison map, the "distance" (or dissimilarity) between any two time series can be quantified by measuring the scatter of these points from the diagonal line in the rank order comparison plot. Applying Eq. 1 to the rank-frequency lists obtained from the two sample time series gives an information-based similarity index of 0.412725. Using the example data files provided with the ibs software, this result may be reproduced by running the command

    ibs 8 healthy.txt chf.txt
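For readers who want to see the arithmetic behind such an index, here is a minimal Python sketch. Eq. 1 is defined in the Methods section and is not reproduced here, so the form used below is an assumption: the absolute rank difference of each word, weighted by the normalized sum of the two Shannon entropy terms, and divided by $2^m - 1$. The function names and the lexicographic tie rule are also hypothetical, so this sketch is not guaranteed to reproduce the output of the ibs command.

```python
import math
from itertools import product

def entropy_term(p):
    """Return -p * ln(p), taking the term to be 0 when p == 0."""
    return 0.0 if p == 0 else -p * math.log(p)

def rank_by_probability(probs):
    """Rank words by descending probability; ties broken lexicographically (assumed)."""
    ordered = sorted(probs, key=lambda w: (-probs[w], w))
    return {w: r for r, w in enumerate(ordered, start=1)}

def ibs_distance(p1, p2, m):
    """Entropy-weighted rank distance between two word distributions.

    Assumed form of Eq. 1:
        D = (1 / (2**m - 1)) * sum_k |R1(w_k) - R2(w_k)| * F(w_k)
        F(w_k) = (H1(w_k) + H2(w_k)) / Z,  Z = sum_k (H1(w_k) + H2(w_k))
    """
    words = ["".join(b) for b in product("01", repeat=m)]
    # Words missing from a distribution get probability 0
    q1 = {w: p1.get(w, 0.0) for w in words}
    q2 = {w: p2.get(w, 0.0) for w in words}
    r1, r2 = rank_by_probability(q1), rank_by_probability(q2)
    weights = {w: entropy_term(q1[w]) + entropy_term(q2[w]) for w in words}
    z = sum(weights.values())
    if z == 0:
        return 0.0  # degenerate case: no entropy in either distribution
    total = sum(abs(r1[w] - r2[w]) * weights[w] for w in words)
    return total / (z * (2 ** m - 1))

# Toy distributions for m = 2 (made-up values, for illustration only)
same = {"00": 0.5, "01": 0.3, "10": 0.2}
d_same = ibs_distance(same, same, m=2)

# m = 1 with the word order fully reversed between the two series
d_reversed = ibs_distance({"0": 0.75, "1": 0.25}, {"0": 0.25, "1": 0.75}, m=1)

print(d_same, d_reversed)
```

Under this assumed form, identical rank orders give a distance of 0, and the weighting means that rank differences of frequently occurring words contribute more than those of rare words.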

Albert Yang (