

Human cardiac dynamics are driven by the complex nonlinear interaction of two competing forces: sympathetic stimulation increases heart rate, while parasympathetic stimulation decreases it. For this type of intrinsically noisy system, it can be useful to simplify the dynamics by mapping the output to binary sequences, where an increase and a decrease of the inter-beat interval are denoted by 1 and 0, respectively. The resulting binary sequence retains important features of the dynamics generated by the underlying control system, yet is tractable enough to be analyzed as a symbolic sequence.

Consider an inter-beat interval time series, $\{ x_1, x_2, \cdots
,x_N\}$, where $x_i$ is the $i$-th inter-beat interval. We can classify each pair of successive inter-beat intervals into one of two states: a decrease in $x$ or an increase in $x$. These two states are mapped to the symbols 0 and 1, respectively:

\begin{displaymath}
I_n = \left\{ \begin{array}{ll}
0, & {\rm if} \; x_{n} \le x_{n-1} \\
1, & {\rm if} \; x_{n} > x_{n-1}.
\end{array} \right.
\end{displaymath} (1)
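The mapping of Eq. (1) can be sketched in a few lines of Python (the function name is illustrative, not from the original work):

```python
def to_binary_sequence(x):
    """Map inter-beat intervals to symbols per Eq. (1):
    0 if the interval decreased or stayed equal, 1 if it increased."""
    return [1 if x[n] > x[n - 1] else 0 for n in range(1, len(x))]

# N intervals yield a binary sequence of length N - 1.
print(to_binary_sequence([0.80, 0.82, 0.79, 0.79, 0.85]))  # [1, 0, 0, 1]
```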

Figure 1: Schematic illustration of the mapping procedure for 8-bit words from a heartbeat time series.

We map $m + 1$ successive intervals to a binary sequence of length $m$, called an $m$-bit ``word.'' Each $m$-bit word, $w_k$, therefore, represents a unique pattern of fluctuations in a given time series. By shifting one data point at a time, the algorithm produces a collection of $m$-bit words over the whole time series. Therefore, it is plausible that the occurrence of these $m$-bit words reflects the underlying dynamics of the original time series. Different types of dynamics thus produce different distributions of these $m$-bit words.
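The sliding-window extraction of $m$-bit words described above can be sketched as follows (a minimal illustration; the helper name is not from the original work):

```python
def words(bits, m):
    """Slide a window of length m over the binary sequence one symbol
    at a time, collecting every m-bit word as a string."""
    return ["".join(map(str, bits[i:i + m])) for i in range(len(bits) - m + 1)]

print(words([1, 0, 0, 1, 1], 3))  # ['100', '001', '011']
```

A binary sequence of length $n$ thus yields $n - m + 1$ overlapping words.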

In studies of natural languages, it has been observed that authors show characteristic preferences in the frequency with which they use certain words. To apply this concept to symbolic sequences mapped from the inter-beat interval time series, we count the occurrences of each distinct word and then sort the words in descending order of frequency.
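The rank-ordering step can be sketched as below (a minimal illustration; ties are broken arbitrarily, which the original text does not specify):

```python
from collections import Counter

def rank_order(word_list):
    """Rank words by descending frequency; rank 1 = most frequent.
    Returns a dict mapping each word to its rank."""
    counts = Counter(word_list)
    ordered = sorted(counts, key=counts.get, reverse=True)
    return {w: r for r, w in enumerate(ordered, start=1)}

print(rank_order(['10', '10', '01']))  # {'10': 1, '01': 2}
```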

Figure 2: The linear regime (for rank $\le 50$).

The resulting rank-frequency distribution, therefore, represents the statistical hierarchy of symbolic words in the original time series. For example, the first-ranked word corresponds to the fluctuation pattern that occurs most frequently in the time series, while the last-ranked word corresponds to the least frequent pattern.

To define a measurement of similarity between two signals, we plot the rank number of each $m$-bit word in the first time series against that of the second time series.

Figure 3: Rank order comparison of two cardiac inter-beat interval time series from the same subject. For each word, its rank in the first time series is plotted against its rank in the second time series. The dashed diagonal line indicates the case where the rank order of words in both time series is identical.

If two time series are similar in their rank order of the words, the scattered points will be located near the diagonal line. Therefore, the average deviation of these scattered points away from the diagonal line is a measure of the ``distance'' between these two time series. Greater distance indicates less similarity and vice versa. In addition, we incorporate the likelihood of each word in the following definition of a weighted distance, $D_m$, between two symbolic sequences, $S_1$ and $S_2$.

\begin{displaymath}
D_m(S_1, S_2) = {1 \over{2^m-1}}
\sum\limits_{k=1}^{2^m} \vert R_1(w_k)-R_2(w_k)\vert \, F(w_k)
\end{displaymath} (2)

\begin{displaymath}
F(w_k) = {1 \over Z} \left[ -p_1(w_k)\,\log p_1(w_k) - p_2(w_k)\,\log p_2(w_k) \right].
\end{displaymath} (3)

Here $p_1(w_k)$ and $R_1(w_k)$ denote the probability and rank of a specific word, $w_k$, in time series $S_1$; similarly, $p_2(w_k)$ and $R_2(w_k)$ denote the probability and rank of the same $m$-bit word in time series $S_2$. The absolute rank difference of each word is weighted by $F(w_k)$, a normalized Shannon-entropy term built from the word's probabilities in the two series, so that frequently occurring words contribute more to the sum. Finally, the sum is divided by $2^m-1$, the maximum possible rank difference, to keep $D_m$ in the range $[0, 1]$. The normalization factor $Z$ in Eq. 3 is given by $Z=\sum_k [-p_1(w_k)\,\log p_1(w_k) - p_2(w_k)\,\log p_2(w_k)]$.
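Putting Eqs. (2) and (3) together, the weighted distance can be sketched as below. This is a minimal sketch, not the authors' reference implementation: it assumes that a word absent from a series is assigned the lowest possible rank, $2^m$, and probability 0 (with $0 \log 0 := 0$), choices the text does not spell out.

```python
from collections import Counter
from math import log

def distance(bits1, bits2, m):
    """Weighted word-rank distance D_m of Eqs. (2)-(3).
    Assumption: absent words get rank 2**m and probability 0."""
    def stats(bits):
        ws = ["".join(map(str, bits[i:i + m])) for i in range(len(bits) - m + 1)]
        counts = Counter(ws)
        total = len(ws)
        p = {w: counts[w] / total for w in counts}
        ordered = sorted(counts, key=counts.get, reverse=True)
        rank = {w: r for r, w in enumerate(ordered, start=1)}
        return p, rank

    p1, r1 = stats(bits1)
    p2, r2 = stats(bits2)
    all_words = [format(k, f"0{m}b") for k in range(2 ** m)]
    lowest = 2 ** m
    ent = lambda p: -p * log(p) if p > 0 else 0.0
    # Shannon-entropy weights; Z normalizes them so the F(w_k) sum to 1.
    weights = {w: ent(p1.get(w, 0.0)) + ent(p2.get(w, 0.0)) for w in all_words}
    Z = sum(weights.values())
    D = sum(abs(r1.get(w, lowest) - r2.get(w, lowest)) * weights[w]
            for w in all_words)
    # Divide by the maximum rank difference, 2**m - 1, to land in [0, 1].
    return D / (Z * (2 ** m - 1)) if Z > 0 else 0.0
```

Since the weights sum to 1 after dividing by $Z$ and no rank difference exceeds $2^m-1$, the result is guaranteed to lie in $[0, 1]$, with 0 for identical rank orders.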

Albert Yang