Database Restricted Access
Kinematic dataset of actors expressing emotions
Published: April 26, 2020. Version: 1.0.0
When using this resource, please cite:
Zhang, M., Yu, L., Zhang, K., Du, B., Zhan, B., Chen, S., Jiang, X., Guo, S., Zhao, J., Wang, Y., Wang, B., Liu, S., & Luo, W. (2020). Kinematic dataset of actors expressing emotions (version 1.0.0). PhysioNet. https://doi.org/10.13026/ckyh-jf25.
Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.
We produced a kinematic dataset to assist in recognizing cues from all parts of the body that indicate human emotions (happy, sad, angry, fearful, neutral, and disgust). The present dataset was created using a portable wireless motion capture system. Twenty-two semi-professional actors (50% female) completed performances. A total of 1190 recordings at 125 Hz were collected, consisting of the position and rotation data of 72 anatomical nodes. We hope this dataset will contribute to multiple fields of research and practice, including social neuroscience, psychiatry, computer vision, and biometric and information forensics.
Recognizing human emotions is crucial for social communication and survival. Emotions are expressed through various carriers and channels. Existing work in both psychology [1] and computer science [2, 3] has mainly focused on human faces and voices. Recently, psychologists have found that body movements can convey a considerable amount of emotional information relative to facial expressions, under both static and dynamic conditions [4, 5]. Therefore, body cues are essential for thorough emotion recognition.
Earlier efforts have created emotional body movement datasets using motion capture techniques, recording emotional expression during dancing, walking, and other actions. However, most of these datasets provide finished products (e.g., point-light displays, videos) rather than raw kinematic data, which may capture more precisely how emotion is encoded in body movements. Public kinematic datasets of humans expressing emotions are currently lacking.
The dataset was created using a wireless motion capture system (Noitom Perception Neuron, Noitom Technology Ltd., Beijing, China) with 17 wearable sensors sampling at 125 Hz [6-9]. The sensors were placed on both sides of the actors' bodies, including the upper and lower arms, hips, spine, head, feet, hands, shoulders, and upper and lower legs. A four-step calibration procedure using four successive static poses was carried out before the performances and whenever necessary (e.g., after a poor Wi-Fi signal or after resting). The actors performed on a square stage of 1 m × 1 m.
Twenty-four college students (13 females, mean age = 20.75 years, SD = 1.92) from the drama and dance clubs of the Dalian University of Technology were recruited as actors. Two female actors (F04 and F13) dropped out, leaving 22 actors. All actors gave written informed consent before performing and were informed that their motion data would be used only for scientific research. The study was approved by the Human Research Institutional Review Board of Liaoning Normal University in accordance with the Declaration of Helsinki (1991). After the recording phase, the actors were paid appropriately.
The actors started in a neutral stance (i.e., facing forward with arms naturally at their sides) and then completed a free performance followed by scenario performances for each emotion (happy, sad, angry, fearful, neutral, and disgust). The free performance was based on the actor's own understanding of the emotion; for the scenario performances, the order of scenarios was random. The actors had six seconds to complete each performance, after which it was reviewed and evaluated for signal quality; hence, some performances were repeated several times.
A total of 1190 trials were collected. Each actor has their own folder (named with the actor ID) consisting of BVH files for all emotions. Each trial was named systematically as "
actor_ID: represents the actor ID;
emotion: includes happy (H), sad (SA), neutral (N), angry (A), disgust (D), and fearful (F);
scenario_ID: 0 for the free performance, or the numeral 1 to 5 for the corresponding scenario performance;
version: denotes the number of repetitions (for details, see
Each BVH file is ASCII text with two sections (i.e., HIERARCHY and MOTION). Beginning with the keyword HIERARCHY, the first section defines the joint tree, the name of each node, the number of channels, and the relative position between joints (i.e., the bone length of each part of the human body). In total there are 72 nodes (1 Root, 58 Joints, and 13 End Sites) in this section, which are calculated from the 17 sensors. The MOTION section records the motion data. The data of each frame are given in the joint sequence defined in the HIERARCHY section, with the position and rotation information of each joint node. The following keywords appear in a BVH file:
HIERARCHY: beginning of the header section
ROOT: location of the Hips
JOINT: location of a skeletal joint relative to its parent-joint
CHANNELS: number of channels including position and rotation channels
OFFSET: X, Y, and Z offsets of the segment relative to its parent-joint
End Site: end of a JOINT which has no child-joint
MOTION: beginning of the second section
FRAMES: number of frames
Frame Time: sampling time per frame
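As a minimal sketch of how the two sections and keywords listed above could be read with plain Python, the example below counts the ROOT, JOINT, and End Site nodes, sums the declared channels, and derives the sampling rate and duration from the frame count and frame time. The embedded BVH text, the joint names, the channel order, and the function name `summarize_bvh` are assumptions made for illustration and are not taken from the dataset; a file recorded at 125 Hz would be expected to report a frame time of 1/125 = 0.008 s.

```python
# Hand-written toy BVH text for illustration only (NOT a file from the dataset).
# Joint names and channel order here are assumptions.
SAMPLE_BVH = """HIERARCHY
ROOT Hips
{
  OFFSET 0.0 0.0 0.0
  CHANNELS 6 Xposition Yposition Zposition Yrotation Xrotation Zrotation
  JOINT Spine
  {
    OFFSET 0.0 10.0 0.0
    CHANNELS 6 Xposition Yposition Zposition Yrotation Xrotation Zrotation
    End Site
    {
      OFFSET 0.0 5.0 0.0
    }
  }
}
MOTION
Frames: 2
Frame Time: 0.008
0 0 0 0 0 0 0 10 0 0 0 0
0 1 0 0 0 0 0 10 0 0 5 0
"""


def summarize_bvh(text):
    """Summarize the HIERARCHY and MOTION sections of a BVH string."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    motion_idx = lines.index("MOTION")
    header, motion = lines[:motion_idx], lines[motion_idx + 1:]

    # Count node types and total channels declared in the HIERARCHY section.
    counts = {"ROOT": 0, "JOINT": 0, "End Site": 0}
    channels = 0
    for ln in header:
        for key in counts:
            if ln.startswith(key):
                counts[key] += 1
        if ln.startswith("CHANNELS"):
            channels += int(ln.split()[1])

    # The first two MOTION lines hold the frame count and the frame time.
    frames = int(motion[0].split(":")[1])
    frame_time = float(motion[1].split(":")[1])

    # Each remaining line is one frame: one value per declared channel.
    data = [[float(v) for v in ln.split()] for ln in motion[2:]]

    return {
        "counts": counts,
        "channels": channels,
        "frames": frames,
        "frame_time": frame_time,
        "sampling_rate_hz": 1.0 / frame_time,
        "duration_s": frames * frame_time,
        "data": data,
    }


summary = summarize_bvh(SAMPLE_BVH)
print(summary["counts"], summary["channels"], summary["frames"])
```

Applied to a recording from the dataset (read with `open(path).read()`), the node counts could be compared against the 1 Root, 58 Joints, and 13 End Sites described above, and each frame row should contain as many values as the total number of declared channels.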
The mass center of the first frame for each recording was used to evaluate the effect of calibration.
BVH files are plain text and can be imported directly into popular software such as 3ds Max, MotionBuilder, and other open access 3D applications. The data can be reused to build different avatars in virtual reality and augmented reality products. Previous studies on emotion recognition in the field of computer and information science have mainly focused on human faces and voices; hence, the dataset created in this study may help to improve technologies and contribute to scientific research in fields such as psychiatry and psychology.
We also thank X. Yi and S. Liu for their contribution to the data collation. This work was supported by the National Natural Science Foundation of China (31871106).
Conflicts of Interest
The authors declare no conflict of interest.
1. de Gelder, B. Why bodies? Twelve reasons for including bodily expressions in affective neuroscience. Philos T R Soc B 364, 3475-3484 (2009).
2. Schuller, B., Rigoll, G. & Lang, M. Hidden Markov model-based speech emotion recognition. 2003 International Conference on Multimedia and Expo, Vol I, Proceedings, 401-404 (2003).
3. Lalitha, S., Madhavan, A., Bhushan, B. & Saketh, S. Speech emotion recognition. 2014 International Conference on Advances in Electronics, Computers and Communications (ICAECC) (2014).
4. de Gelder, B. & Van den Stock, J. The bodily expressive action stimulus test (BEAST). Construction and validation of a stimulus basis for measuring perception of whole body expression of emotions. Front. Psychol. 2, 181 (2011).
5. Atkinson, A. P., Dittrich, W. H., Gemmell, A. J. & Young, A. W. Emotion perception from dynamic and static body expressions in point-light and full-light displays. Perception 33, 717-746 (2004).
6. Sers, R. et al. Validity of the perception neuron inertial motion capture system for upper body motion analysis. Measurement 149 (2020).
7. Kim, H. S. et al. Application of a perception neuron system in simulation-based surgical training. J Clin Med 8 (2019).
8. Robert-Lachaine, X., Mecheri, H., Muller, A., Larue, C. & Plamondon, A. Validation of a low-cost inertial motion capture system for whole-body motion analysis. J. Biomech. 99, 109520 (2020).
9. Perception Neuron website: https://neuronmocap.com/content/axis-neuron [Accessed 20 April 2019]
Only logged in users who sign the specified data use agreement can access the files.
License (for files):
PhysioNet Restricted Health Data License 1.5.0