This paper aims to tackle the practically very challenging problem of efficient and accurate hand pose estimation from single depth images. A dedicated two-step regression forest pipeline is proposed: Given an input hand depth image, step one involves mainly estimation of 3D location and in-plane rotation of the hand using a pixel-wise regression forest. This is utilized in step two which delivers final hand estimation by a similar regression forest model based on the entire hand image patch. Moreover, our estimation is guided by internally executing a 3D hand kinematic chain model. For an unseen test image, the kinematic model parameters are estimated by a proposed dynamically weighted scheme. As a combined effect of these proposed building blocks, our approach is able to deliver more precise estimation of hand poses. In practice, our approach works at 15.6 frame-per-second (FPS) on an average laptop when implemented in CPU, which is further sped-up to 67.2 FPS when running on GPU. In addition, we introduce and make publicly available a data-glove annotated depth image dataset covering various hand shapes and gestures, which enables us conducting quantitative analyses on real-world hand images. The effectiveness of our approach is verified empirically on both synthetic and the annotated real-world datasets for hand pose estimation, as well as related applications including partbased labeling and gesture classification. In addition to empirical studies, the consistency property of our approach is also theoretically analyzed.
International Journal of Computer Vision (IJCV), Vol. 112, No. 3, May 2015, pp.1-25
Date of acceptance