
# Open Access Dataset and Toolbox of High-Density Surface Electromyogram Recordings

Published: May 28, 2021. Version: 1.0.0

Jiang, X., Dai, C., Liu, X., & Fan, J. (2021). Open Access Dataset and Toolbox of High-Density Surface Electromyogram Recordings (version 1.0.0). PhysioNet. https://doi.org/10.13026/ym7v-bh53.

X. Jiang et al., "Open Access Dataset, Toolbox and Benchmark Processing Results of High-Density Surface Electromyogram Recordings," in IEEE Transactions on Neural Systems and Rehabilitation Engineering, http://doi.org/10.1109/TNSRE.2021.3082551

Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.

## Abstract

We provide an open access dataset of High densitY Surface Electromyogram (HD-sEMG) Recordings (named "Hyser"), together with a toolbox for neural interface research. We acquired data from 20 subjects, each of whom participated in the experiment twice, on separate days, following the same experimental paradigm. Using our dataset, researchers can develop advanced techniques for pattern recognition of 34 hand gestures and for regression between HD-sEMG and the forces of the five fingers. These techniques are essential for intuitive control of neuroprostheses and neuroexoskeletons. Our toolbox can be used to: (1) analyze each of the five datasets using standard benchmark methods and (2) decompose HD-sEMG signals into motor unit action potentials via independent component analysis.

## Background

Surface electromyogram (sEMG)-based neural interfaces [1] have attracted rapidly growing attention. Specifically, by decoding amputees' movement intent from sEMG signals, contractions of the residual muscles in the stump can be detected automatically and then used to intuitively control neuroprostheses and neuroexoskeletons [2]. With advances in flexible sensing techniques, high-density sEMG (HD-sEMG), which uses a large number of channels to cover a larger area of the skin above a specific muscle, can provide high-resolution muscle activation maps [3]. However, HD-sEMG datasets are scarce, and there are no HD-sEMG datasets for controlling the forces of the five fingers of a hand prosthesis (we use the term fingers loosely to refer to both the fingers and the thumb of a given hand). We therefore provide an open access dataset of High densitY Surface Electromyogram Recordings (named "Hyser").

The Hyser dataset consists of five sub-datasets: (1) a pattern recognition (PR) dataset, acquired while subjects performed 34 hand gestures in common daily use; (2) a maximal voluntary contraction (MVC) dataset, acquired while each subject contracted each of their five fingers individually (MVC signals can be used to normalize force data); (3) a one-degree-of-freedom (1-DoF) dataset, acquired during contraction of each individual finger while tracking a target force trajectory; (4) an N-DoF dataset, acquired during prescribed contractions of multiple finger combinations while tracking target force trajectories; and (5) a random task dataset, acquired during random contractions of finger combinations without following any prescribed force trajectory.

Both pattern recognition of different hand gestures [4] and regression between HD-sEMG and finger forces [2] can be investigated using the Hyser dataset. Additionally, our toolbox can be used to: (1) analyze each of the five datasets using standard benchmark methods [2] and (2) decompose HD-sEMG signals into motor unit action potentials via independent component analysis [5]. We expect that our dataset will provide a platform for a wider range of research on neural interface techniques and for collaboration among engineers working in neural rehabilitation.

## Methods

Twenty subjects (12 males and 8 females, aged 22 to 34 years) with intact fingers participated in this study. All subjects gave written informed consent. The experiment was reviewed and approved by the ethics committee of Fudan University (approval number: BE2035).

For the PR dataset, 256-channel HD-sEMG signals were acquired while subjects performed 34 different hand gestures. For each gesture, subjects performed both a dynamic task (1 s duration, moving from a relaxed state to the required gesture) and a maintenance task (4 s duration, moving from a relaxed state to the required gesture and then holding that gesture).

For the MVC, 1-DoF, N-DoF and random task datasets, 256-channel HD-sEMG signals and ground truth force values were acquired synchronously.

HD-sEMG signals were sampled at 2048 Hz, and ground truth force signals were sampled at 100 Hz.
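The two sampling rates fix the nominal sample counts of the PR tasks and give a non-integer ratio of 2048/100 = 20.48 EMG samples per force sample, which matters when pairing the two streams for regression. The Python sketch below works this out, assuming that the EMG and force streams of a segment both start at t = 0 and that segment durations are exactly as stated above; the function names are hypothetical.

```python
EMG_FS = 2048   # HD-sEMG sampling rate (Hz)
FORCE_FS = 100  # ground-truth force sampling rate (Hz)


def n_samples(duration_s: float, fs: int) -> int:
    """Nominal number of samples in a segment of the given duration."""
    return round(duration_s * fs)


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for b > 0."""
    return -(-a // b)


def emg_window_for_force_sample(n: int) -> tuple[int, int]:
    """Half-open range [start, stop) of EMG sample indices whose time stamps
    fall within the 10 ms period of force sample n (hypothetical helper,
    assuming both streams start together at t = 0)."""
    start = ceil_div(n * EMG_FS, FORCE_FS)
    stop = ceil_div((n + 1) * EMG_FS, FORCE_FS)
    return start, stop


# Dynamic PR task (1 s) and maintenance PR task (4 s)
print(n_samples(1, EMG_FS), n_samples(4, EMG_FS))  # 2048 8192
print(emg_window_for_force_sample(0), emg_window_for_force_sample(1))  # (0, 21) (21, 41)
```

Because 20.48 is not an integer, each window contains either 20 or 21 EMG samples; averaging an EMG feature over each window is one way to bring it to the 100 Hz force rate.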

In the preprocessing stage, the acquired HD-sEMG signals were first filtered with an eighth-order 10–500 Hz Butterworth band-pass filter. A combination of notch filters was then used to attenuate power line interference at 50 Hz and all of its harmonics up to 400 Hz. Force data were filtered with an eighth-order 10 Hz low-pass Butterworth filter. We provide both the raw and the preprocessed HD-sEMG signals in our dataset.
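The notch stage can be sketched as a cascade of second-order IIR notch filters, one per harmonic (50, 100, ..., 400 Hz). The section does not specify the notch design, so the biquad form (taken from the RBJ Audio EQ Cookbook), the quality factor Q = 30 and the function names below are our assumptions, and this filter is causal rather than zero-phase. It uses only the Python standard library for illustration; with SciPy, each stage could instead be designed with `scipy.signal.iirnotch` and applied with `scipy.signal.filtfilt`.

```python
import math

EMG_FS = 2048.0  # HD-sEMG sampling rate (Hz)


def notch_coefficients(f0: float, fs: float, q: float = 30.0):
    """Second-order IIR notch at f0 (RBJ Audio EQ Cookbook form), normalised so a0 = 1."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    b = (1.0 / a0, -2.0 * cos_w0 / a0, 1.0 / a0)
    a = (1.0, -2.0 * cos_w0 / a0, (1.0 - alpha) / a0)
    return b, a


def biquad(x, b, a):
    """Filter the sequence x with one second-order section (direct form I)."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def remove_power_line(x, fs: float = EMG_FS, mains: float = 50.0,
                      max_hz: float = 400.0, q: float = 30.0):
    """Cascade one notch per harmonic of the mains frequency up to max_hz
    (50, 100, ..., 400 Hz with the defaults)."""
    harmonic = mains
    while harmonic <= max_hz:
        b, a = notch_coefficients(harmonic, fs, q)
        x = biquad(x, b, a)
        harmonic += mains
    return x
```

A causal cascade like this shifts the phase of components near each notch; running the same filters forward and then backward over the signal, as `filtfilt` does, cancels that phase shift.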

Additionally, we provide a toolbox for HD-sEMG analysis, which performs: (1) analysis of the PR dataset using linear discriminant analysis (LDA)-based and deep learning-based hand gesture classification, (2) analysis of datasets 2–4 (of the five sub-datasets outlined in the Background section), and (3) decomposition of HD-sEMG signals into motor unit (MU) spike trains using independent component analysis (ICA). All analyses in the toolbox are implemented in MATLAB, so a MATLAB license is required to run the toolbox in full.
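LDA-based gesture classifiers usually operate on per-channel features computed from each segment rather than on raw samples. This section does not list the features the toolbox uses, so the sketch below assumes one common choice, the root-mean-square (RMS) amplitude of each channel, computed over a segment held as N_T rows by 256 channel columns; `rms_features` is a hypothetical helper, and the resulting 256-element vector is the kind of input an LDA classifier could take.

```python
import math


def rms_features(segment):
    """Per-channel RMS of a segment given as N_T rows x C channel columns
    (C = 256 for Hyser). Returns a list with one value per channel."""
    n_t = len(segment)
    if n_t == 0:
        raise ValueError("empty segment")
    n_ch = len(segment[0])
    return [math.sqrt(sum(row[c] ** 2 for row in segment) / n_t) for c in range(n_ch)]
```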

## Data Description

Data for the five sub-datasets are stored in five folders: "pr_dataset" (37.1 GB), "mvc_dataset" (7.8 GB), "1dof_dataset" (29.3 GB), "ndof_dataset" (58.6 GB), and "random_dataset" (9.8 GB). For the PR dataset, ground truth gesture labels are stored in "*.txt" files in comma-separated values (CSV) format. All other signal segments (both HD-sEMG signals and ground truth force trajectories) are stored in the waveform database (WFDB) format, with one "*.dat" file storing all values quantized as 16-bit signed integers and one "*.hea" header file (with the same file name as the ".dat" file apart from the extension) storing the scaling factors.
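To show how the header's scaling factors turn the stored 16-bit integers into physical values, here is a standard-library sketch. It assumes the segments use plain WFDB format 16 (little-endian two's-complement, channels interleaved sample by sample, all channels in the single ".dat" file named on the first signal line) and applies the WFDB rule physical = (digital − baseline) / gain; `read_wfdb_format16` is a hypothetical name. For real work, `wfdb.rdrecord` from the WFDB Python package performs this conversion and returns the physical values in `record.p_signal`.

```python
import struct
from pathlib import Path


def read_wfdb_format16(record_path):
    """Read a WFDB record stored in plain format 16 and return physical values
    as a list of rows (one row per sample, one column per signal)."""
    header = [ln for ln in Path(f"{record_path}.hea").read_text().splitlines()
              if ln.strip() and not ln.lstrip().startswith("#")]
    n_sig = int(header[0].split()[1])  # record line: name n_sig fs n_samples ...
    gains, baselines = [], []
    for line in header[1:1 + n_sig]:  # one signal-specification line per channel
        fields = line.split()
        if fields[1] != "16":
            raise ValueError(f"only plain format 16 is handled, got {fields[1]!r}")
        adc_zero = int(fields[4]) if len(fields) > 4 else 0
        gain_part = fields[2].split("/")[0] if len(fields) > 2 else "0"  # drop '/units'
        if "(" in gain_part:  # 'gain(baseline)'
            gain_str, base_str = gain_part.rstrip(")").split("(")
            baseline = int(base_str)
        else:  # baseline defaults to the ADC zero
            gain_str, baseline = gain_part, adc_zero
        gains.append(float(gain_str) or 200.0)  # WFDB default gain is 200 when 0/absent
        baselines.append(baseline)
    dat_file = Path(record_path).parent / header[1].split()[0]
    raw = dat_file.read_bytes()
    n_values = len(raw) // 2 - (len(raw) // 2) % n_sig  # whole frames only
    values = struct.unpack(f"<{n_values}h", raw[:2 * n_values])  # little-endian int16
    return [
        [(d - b) / g for d, g, b in zip(values[i:i + n_sig], gains, baselines)]
        for i in range(0, n_values, n_sig)
    ]
```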

In each of the five sub-datasets, the data of the 20 subjects acquired in 2 experimental sessions (on 2 separate days) are stored in 40 folders named "subjecti_sessionj", where i $\in$ {'01','02',...,'20'} is the subject index and j $\in$ {'1','2'} is the session index.
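The naming scheme can be enumerated directly. This sketch assumes that each sub-dataset folder directly contains its 40 session folders and that subject indices are zero-padded to two digits, as the set {'01',...,'20'} indicates; `session_folders` is a hypothetical helper.

```python
SUB_DATASETS = ("pr_dataset", "mvc_dataset", "1dof_dataset", "ndof_dataset", "random_dataset")


def session_folders(sub_dataset: str) -> list[str]:
    """Relative paths of the 40 'subjecti_sessionj' folders in one sub-dataset:
    subjects 01-20 (zero-padded), sessions 1-2."""
    if sub_dataset not in SUB_DATASETS:
        raise ValueError(f"unknown sub-dataset: {sub_dataset}")
    return [f"{sub_dataset}/subject{i:02d}_session{j}" for i in range(1, 21) for j in (1, 2)]
```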

For the PR dataset, the data segments in each folder "subjecti_sessionj" are named "taskType_sigType_samplek.dat" and "taskType_sigType_samplek.hea", where taskType $\in$ {'dynamic','maintenance'} denotes the two tasks performed for each gesture, sigType $\in$ {'raw','preprocess'} denotes raw and preprocessed EMG segments, respectively, and k $\in$ {'1','2',...,'$N_s$'} is the segment index for each task ($N_s$ is the total number of signal segments for each task). Each folder also contains "label_dynamic.txt" and "label_maintenance.txt" (i.e., "label_taskType.txt"), which hold the ground truth gesture labels of each data segment.

For the MVC dataset, the data segments in each folder "subjecti_sessionj" are named "mvc_sigType_fingeru_direction.dat" and "mvc_sigType_fingeru_direction.hea", where sigType $\in$ {'raw','preprocess','force'} denotes signal segments of raw EMG, preprocessed EMG and ground truth force, respectively, u $\in$ {'1','2','3','4','5'} denotes contraction of the thumb, index, middle, ring and little finger, respectively, and direction $\in$ {'extension','flexion'} denotes the two contraction directions.

For the 1-DoF dataset, the data segments in each folder "subjecti_sessionj" are named "1dof_sigType_fingeru_samplek.dat" and "1dof_sigType_fingeru_samplek.hea", where sigType $\in$ {'raw','preprocess','force'} denotes signal segments of raw EMG, preprocessed EMG and ground truth force, respectively, u $\in$ {'1','2','3','4','5'} denotes contraction of the thumb, index, middle, ring and little finger, respectively, and k $\in$ {'1','2',...,'$N_s$'} is the segment index for each task ($N_s$ is the total number of signal segments for each task).

For the N-DoF dataset, the data segments in each folder "subjecti_sessionj" are named "ndof_sigType_combinationu_samplek.dat" and "ndof_sigType_combinationu_samplek.hea", where sigType $\in$ {'raw','preprocess','force'} denotes signal segments of raw EMG, preprocessed EMG and ground truth force, respectively, u $\in$ {'1','2',...,'15'} is the index of the 15 finger combinations, and k $\in$ {'1','2',...,'$N_s$'} is the segment index for each task ($N_s$ is the total number of signal segments for each task).

For the random task dataset, the data segments in each folder "subjecti_sessionj" are named "random_sigType_samplek.dat" and "random_sigType_samplek.hea", where sigType $\in$ {'raw','preprocess','force'} denotes signal segments of raw EMG, preprocessed EMG and ground truth force, respectively, and k $\in$ {'1','2',...,'$N_s$'} is the segment index for each task ($N_s$ is the total number of signal segments for each task).
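The five naming patterns above can be captured in a few small builders that return record names without an extension (the form WFDB readers expect; append ".dat" or ".hea" for the files themselves). The function names are hypothetical, and k is not range-checked because $N_s$ varies by task.

```python
PR_SIG_TYPES = ("raw", "preprocess")
SIG_TYPES = ("raw", "preprocess", "force")


def _check(value, allowed, name):
    if value not in allowed:
        raise ValueError(f"{name} must be one of {allowed}, got {value!r}")


def pr_record(task_type: str, sig_type: str, k: int) -> str:
    _check(task_type, ("dynamic", "maintenance"), "task_type")
    _check(sig_type, PR_SIG_TYPES, "sig_type")  # PR has no force files
    return f"{task_type}_{sig_type}_sample{k}"


def mvc_record(sig_type: str, u: int, direction: str) -> str:
    _check(sig_type, SIG_TYPES, "sig_type")
    _check(u, range(1, 6), "u")  # 1 = thumb ... 5 = little finger
    _check(direction, ("extension", "flexion"), "direction")
    return f"mvc_{sig_type}_finger{u}_{direction}"


def one_dof_record(sig_type: str, u: int, k: int) -> str:
    _check(sig_type, SIG_TYPES, "sig_type")
    _check(u, range(1, 6), "u")
    return f"1dof_{sig_type}_finger{u}_sample{k}"


def n_dof_record(sig_type: str, u: int, k: int) -> str:
    _check(sig_type, SIG_TYPES, "sig_type")
    _check(u, range(1, 16), "u")  # 15 finger combinations
    return f"ndof_{sig_type}_combination{u}_sample{k}"


def random_record(sig_type: str, k: int) -> str:
    _check(sig_type, SIG_TYPES, "sig_type")
    return f"random_{sig_type}_sample{k}"
```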

Files with sigType='force' store the ground truth force trajectory of each trial. Files named "label_taskType.txt" store the gesture labels of all segments of the PR dataset (for either the dynamic or the maintenance task, depending on the value of taskType), formatted as $1\times N_s$ comma-separated values (one value per segment). All other files store HD-sEMG data.
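A label file can be paired with its segments as sketched below. This assumes the labels are numeric and that the k-th comma-separated value belongs to segment k (k = 1, ..., $N_s$); both helper names are hypothetical.

```python
from pathlib import Path


def read_gesture_labels(path) -> list[int]:
    """Read a label_taskType.txt file holding one row of N_s comma-separated
    gesture labels (assumed numeric)."""
    text = Path(path).read_text().strip()
    return [int(float(v)) for v in text.split(",") if v.strip()]


def labels_by_record(path, task_type: str, sig_type: str = "preprocess") -> dict[str, int]:
    """Map each segment record name to its label, assuming the k-th value
    belongs to segment k (k = 1..N_s)."""
    labels = read_gesture_labels(path)
    return {f"{task_type}_{sig_type}_sample{k}": lab for k, lab in enumerate(labels, start=1)}
```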

You can access the data using the PhysioNet WFDB toolboxes (MATLAB, Python, C). Alternatively, you can download the data and load it with the MATLAB toolbox supplied with this project, using the "load_pr", "load_mvc", "load_1dof", "load_ndof" and "load_random" functions. Loaded EMG data form an $N_T \times 256$ matrix (one channel per column). Loaded force data form an $N_T \times 5$ matrix with the force of one finger per column (in the order thumb, index, middle, ring and little finger). $N_T$ is the length of the loaded time series.
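With the $N_T \times 5$ column order fixed, per-finger normalization is straightforward. The Background notes that MVC signals can be used to normalize force data; the sketch below assumes one simple scheme, dividing each finger's force by a positive reference magnitude (for example, the peak absolute force in that finger's column of its MVC recording), with force rows held as Python lists. How the reference is chosen per contraction direction (flexion or extension) is not specified here, and both helper names are hypothetical.

```python
FINGERS = ("thumb", "index", "middle", "ring", "little")  # column order of the N_T x 5 force matrix


def peak_abs_force(force_rows, finger: int) -> float:
    """Peak absolute force in one finger's column (0 = thumb ... 4 = little)."""
    return max(abs(row[finger]) for row in force_rows)


def percent_mvc(force_rows, mvc_refs):
    """Express each force sample as a percentage of that finger's MVC reference."""
    if len(mvc_refs) != len(FINGERS) or any(r <= 0 for r in mvc_refs):
        raise ValueError("expected five positive MVC reference values")
    return [[100.0 * f / r for f, r in zip(row, mvc_refs)] for row in force_rows]
```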

## Usage Notes

Research directions which might benefit from our dataset and toolbox include:

(1) HD-sEMG-based neuroprosthetic control. In previous studies, both macroscopic features extracted from global sEMG [2,4] and microscopic features extracted from motor unit spike trains obtained via decomposition [6,7] have been used as inputs to control models. Our dataset and toolbox can be used to develop neuroprostheses based on both pattern recognition and proportional control, using both macroscopic and microscopic features. Our dataset can also be used to develop generalized neural interface techniques that let users with intact fingers manipulate mobile devices in Internet of Things (IoT) applications.

(2) Compression of HD-sEMG signals. HD-sEMG acquires signals from a large number of channels, greatly increasing the burden of data storage and transmission [8] in tele-rehabilitation applications. Several unique properties of HD-sEMG, such as the similarity between neighboring channels, may enable new solutions for multi-channel sEMG compression. To date, investigations of HD-sEMG signal compression are very scarce in the literature.

(3) Signal quality assessment of HD-sEMG. In many applications, sEMG measurement must be performed without supervision. In this case, low-quality signals may disproportionately reduce system robustness [9]. With a signal quality descriptor, low-quality channels can be excluded from the analysis, or the system (a neuroprosthesis or a health monitoring system) can be set to an idle state when signal quality falls below a predefined threshold. For sEMG signal quality assessment, the specific properties of large HD-sEMG arrays may present new challenges and opportunities.

(4) Neuromuscular physiology studies. Neuromuscular physiology studies rely heavily on decoding motor unit discharge activity from non-invasive sEMG measurements [10,11]. The ICA-based HD-sEMG decomposition algorithm in our toolbox can help extend the body of knowledge in neuromuscular physiology.

(5) Neuromuscular biometrics decoded from HD-sEMG for user authentication or identification. Recent studies have demonstrated excellent performance using HD-sEMG as a new cancelable and cross-application discrepant biometric trait, owing to the individually unique characteristics of HD-sEMG signals [12,13]. One of our recent studies used the Hyser PR dataset for biometric authentication and may be a useful reference for anyone using the Hyser dataset for biometrics studies [14]. Our dataset provides HD-sEMG recorded under diverse hand gestures and muscle contraction efforts, which can be used to investigate HD-sEMG-based biometrics. Data acquired on different days can also support the evaluation of cross-day variation in HD-sEMG biometrics.

Additional detail on the dataset, toolbox, and related analyses may be found in our associated paper entitled: "Open Access Dataset, Toolbox and Benchmark Processing Results of High-Density Surface Electromyogram Recordings" [15]. A preprint of the manuscript is included with this dataset for convenience (manuscript_author_version.pdf).

## Acknowledgements

This work is supported by Shanghai Municipal Science and Technology Major Project (Grant No. 2017SHZDZX01) and Shanghai Pujiang Program (Grant No. 19PJ1401100).

## Conflicts of Interest

The authors have no conflicts of interest to declare.

## References

1. McDonald, C. G., Sullivan, J. L., Dennis, T. A., & O’Malley, M. K. (2020). A Myoelectric Control Interface for Upper-Limb Robotic Rehabilitation Following Spinal Cord Injury. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 28(4), 978–987. https://doi.org/10.1109/TNSRE.2020.2979743
2. Dai, C., Bardizbanian, B., & Clancy, E. A. (2017). Comparison of Constant-Posture Force-Varying EMG-Force Dynamic Models About the Elbow. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 25(9), 1529–1538. https://doi.org/10.1109/TNSRE.2016.2639443
3. Farina, D., Lorrain, T., Negro, F., & Jiang, N. (2010). High-density EMG E-Textile systems for the control of active prostheses. 2010 Annual International Conference of the IEEE Engineering in Medicine and Biology, 3591–3593. https://doi.org/10.1109/IEMBS.2010.5627455
4. Phinyomark, A., Quaine, F., Charbonnier, S., Serviere, C., Tarpin-Bernard, F., & Laurillau, Y. (2013). EMG feature evaluation for improving myoelectric pattern recognition robustness. Expert Systems with Applications, 40(12), 4832–4840. https://doi.org/10.1016/j.eswa.2013.02.023
5. Dai, C., & Hu, X. (2019). Independent component analysis based algorithms for high-density electromyogram decomposition: Experimental evaluation of upper extremity muscles. Computers in Biology and Medicine, 108, 42–48. https://doi.org/10.1016/j.compbiomed.2019.03.009
6. Farina, D., Vujaklija, I., Sartori, M., Kapelner, T., Negro, F., Jiang, N., Bergmeister, K., Andalib, A., Principe, J., & Aszmann, O. C. (2017). Man/machine interface based on the discharge timings of spinal motor neurons after targeted muscle reinnervation. Nature Biomedical Engineering, 1(2), 0025. https://doi.org/10.1038/s41551-016-0025
7. Dai, C., & Hu, X. (2019). Finger Joint Angle Estimation Based on Motoneuron Discharge Activities. IEEE Journal of Biomedical and Health Informatics, 1–1. https://doi.org/10.1109/JBHI.2019.2926307
8. Itiki, C., Furuie, S. S., & Merletti, R. (2014). Compression of high-density EMG signals for trapezius and gastrocnemius muscles. BioMedical Engineering OnLine, 13(1), 25. https://doi.org/10.1186/1475-925X-13-25
9. Grönlund, C., Roeleveld, K., Holtermann, A., & Karlsson, J. S. (2005). On-line signal quality estimation of multichannel surface electromyograms. Medical and Biological Engineering and Computing, 43(3), 357–364. https://doi.org/10.1007/BF02345813
10. Dai, C., Shin, H., Davis, B., & Hu, X. (2017). Origins of Common Neural Inputs to Different Compartments of the Extensor Digitorum Communis Muscle. Scientific Reports, 7(1), 13960. https://doi.org/10.1038/s41598-017-14555-x
11. Jiang, X., Ren, H., Xu, K., Ye, X., Dai, C., Clancy, E. A., Zhang, Y.-T., & Chen, W. (2021). Quantifying Spatial Activation Patterns of Motor Units in Finger Extensor Muscles. IEEE Journal of Biomedical and Health Informatics, 25(3), 647–655. https://doi.org/10.1109/JBHI.2020.3002329
12. Jiang, X., Xu, K., Liu, X., Dai, C., Clifton, D. A., Clancy, E. A., Akay, M., & Chen, W. (2021). Neuromuscular Password-Based User Authentication. IEEE Transactions on Industrial Informatics, 17(4), 2641–2652. https://doi.org/10.1109/TII.2020.3001612
13. Jiang, X., Xu, K., Liu, X., Dai, C., Clifton, D. A., Clancy, E. A., Akay, M., & Chen, W. (2021). Cancelable HD-sEMG-Based Biometrics for Cross-Application Discrepant Personal Identification. IEEE Journal of Biomedical and Health Informatics, 25(4), 1070–1079. https://doi.org/10.1109/JBHI.2020.3027389
14. Jiang, X., Liu, X., Fan, J., Ye, X., Dai, C., Clancy, E. A., Farina, D., & Chen, W. (2021). Enhancing IoT Security via Cancelable HD-sEMG-based Biometric Authentication Password, Encoded by Gesture. IEEE Internet of Things Journal, 1–1. https://doi.org/10.1109/JIOT.2021.3074952
15. Jiang, X., et al. (2021). Open Access Dataset, Toolbox and Benchmark Processing Results of High-Density Surface Electromyogram Recordings. IEEE Transactions on Neural Systems and Rehabilitation Engineering. https://doi.org/10.1109/TNSRE.2021.3082551

##### Access

Access Policy:
Anyone can access the files, as long as they conform to the terms of the specified license.


## Files

Total uncompressed size: 142.8 GB.

##### Access the files
```
wget -r -N -c -np https://physionet.org/files/hd-semg/1.0.0/
```
