In recent years, life logging in daily environments has become increasingly important for applications such as health monitoring for older adults, behavior understanding, and optimization of daily activities. In particular, hand-object manipulation appears in a wide range of everyday situations, including smartphone interaction, writing, cooking, and tool usage. Recognizing such manipulations in detail would enable more fine-grained life logging as well as novel interaction interfaces that use everyday objects as input devices.
However, conventional wearable sensing approaches can observe wrist and arm movements but struggle to capture fingertip contact states and subtle differences between manipulations. For example, smartphone scrolling and typing may exhibit similar wrist trajectories while differing significantly in fingertip contact patterns and contact transitions, making them difficult to distinguish using inertial sensors alone. Conversely, ultrasonic sensing alone can capture reflection changes caused by hands and objects, but cannot adequately capture whole-arm motion.
To address this challenge, we propose Shape-N-Motion, a multimodal recognition method that directly integrates ultrasonic reflection signals and IMU time-series signals. The proposed system employs a wrist-worn device equipped with an ultrasonic sensor oriented toward the fingertips together with an IMU. The ultrasonic sensor captures reflection changes associated with fingertip contact states, while the IMU captures wrist and arm movements, allowing the two modalities to provide complementary information.
In the proposed method, ultrasonic and IMU signals are processed by separate encoders and then integrated for classification. Specifically, ultrasonic signals are transformed into time-frequency representations using the Short-Time Fourier Transform (STFT), and local reflection changes are emphasized using BiLSTM and attention mechanisms. IMU signals, in turn, are processed using a combination of convolutional layers and Transformers to capture both short-term motion variations and long-term movement patterns. By jointly modeling contact-related and motion-related information, the proposed method enables fine-grained hand-object manipulation recognition that has been difficult to achieve using conventional approaches.
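As an illustration of the time-frequency step described above, the following is a minimal sketch of an STFT magnitude computation in plain Python. It is not the authors' implementation: the function name `stft_magnitude`, the frame length, the hop size, the Hann window, and the synthetic test tone are all assumptions chosen for illustration, since the text does not state the actual parameters.

```python
import cmath
import math


def stft_magnitude(signal, frame_len=64, hop=32):
    """Return one magnitude spectrum (bins 0..frame_len//2) per frame.

    frame_len and hop are illustrative values, not the paper's settings.
    """
    # Periodic Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        segment = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        # Direct DFT of the windowed segment (non-negative frequencies only).
        for k in range(frame_len // 2 + 1):
            acc = sum(segment[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


# Hypothetical stand-in for a received ultrasonic signal: a pure tone
# centred exactly on frequency bin 8 of a 64-sample frame.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
spec = stft_magnitude(tone)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
```

Each row of `spec` is one time frame, so the result can be read as a (time, frequency) matrix of the kind a sequence encoder would consume. A practical pipeline would more likely use an FFT-based routine than this direct DFT, which is written out only to keep the sketch dependency-free.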
As a preliminary step toward this research, we previously investigated a multimodal wearable sensing system integrating RGB sensors for recognizing manipulated objects, ultrasonic sensors for capturing hand shape and motion, and IMU sensors for capturing arm trajectories and hand movements. In that earlier study, we compared recognition accuracy across different sensor combinations and analyzed the influence of sensor configurations on object manipulation recognition performance. We also evaluated the relationship between image sensor placement and recognition performance.
Experimental results demonstrated that Shape-N-Motion outperformed IMU-only, IMU + audio, and ultrasonic-only baselines. In particular, substantial improvements were observed for manipulations that exhibited similar wrist trajectories but different fingertip contact states and contact transitions. Furthermore, performance improvements were also confirmed under participant-independent evaluation (LOPO: Leave-One-Participant-Out), suggesting robustness against individual differences. These results indicate the effectiveness of integrating contact-state information captured by ultrasonic sensing with motion information captured by IMUs.
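The participant-independent protocol mentioned above can be sketched as follows. This is a generic illustration of Leave-One-Participant-Out splitting, not the authors' evaluation code; the helper name `lopo_splits` and the participant labels are hypothetical.

```python
def lopo_splits(participant_ids):
    """Yield (held_out, train_idx, test_idx), one fold per participant.

    participant_ids[i] is the participant who produced sample i.
    """
    for held_out in sorted(set(participant_ids)):
        # Train on every sample from the other participants.
        train_idx = [i for i, p in enumerate(participant_ids) if p != held_out]
        # Test only on samples from the held-out participant.
        test_idx = [i for i, p in enumerate(participant_ids) if p == held_out]
        yield held_out, train_idx, test_idx


# Hypothetical labels: five samples recorded by three participants.
participants = ["P1", "P1", "P2", "P2", "P3"]
splits = list(lopo_splits(participants))
```

Because no sample from the held-out participant appears in the training indices, accuracy averaged over the folds reflects performance on people the model has never seen.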
Overall, this work introduces a new wearable sensing approach focused on contact-state understanding, enabling fine-grained recognition of hand-object manipulations that has previously been difficult to achieve.
Related Work
[1] L. C. Jung+, EchoWrist: Continuous Hand Pose Tracking and Hand-Object Interaction Recognition Using Low-Power Active Acoustic Sensing On a Wristband, CHI ’24
Publications
Kaito Fujishige, Kota Tsubouchi, Yuuki Nishiyama, Masamichi Shimosaka.
Shape-N-Motion: Fine-Grained Hand Object Manipulation Recognition with Ultrasonic and IMU
PerCom’26: Proceedings of the IEEE International Conference on Pervasive Computing and Communications (PerCom), Pisa, Italy, Mar. 16–20, 2026.
