Occlusion-aware Hand Pose Estimation Using Hierarchical Mixture Density Network

11/29/2017

Hand pose estimation aims to predict the pose parameters of a 3D hand model, such as the locations of hand joints. The problem is challenging because of large variations in viewpoint and articulation and severe self-occlusion. Researchers have approached it from two directions: input feature learning and output prediction modelling. Though effective, most existing discriminative methods give only a deterministic estimate of the target pose. Because they learn a single-valued mapping, they cannot adequately handle self-occlusion, where the locations of occluded joints are multi-modal. In this paper, we tackle the self-occlusion issue and provide a complete description of the observed poses given an input depth image through a hierarchical mixture density network (HMDN) framework. In particular, HMDN leverages a state-of-the-art CNN module to facilitate feature learning, and proposes a density in a two-level hierarchy to reconcile single-valued and multi-valued mappings in the output. The whole framework is naturally end-to-end trainable with a mixture of two differentiable density functions. HMDN produces interpretable and diverse candidate samples, and significantly outperforms state-of-the-art algorithms on benchmarks that exhibit occlusions.
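
To make the idea of a two-level density concrete, here is a minimal NumPy sketch of one plausible reading of the abstract; it is not taken from the paper. It assumes the top level is a visibility probability `w` for a joint, that the visible branch is a single Gaussian (single-valued mapping), and that the occluded branch is a mixture of Gaussians (multi-valued mapping). The function names, the parameter names, and the one-dimensional setting are all hypothetical; in a full model a CNN would predict these parameters from the depth image.

```python
import numpy as np

def gaussian_pdf(y, mu, sigma):
    """Univariate Gaussian density, evaluated elementwise."""
    z = (y - mu) / sigma
    return np.exp(-0.5 * z ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def hierarchical_density(y, w, mu_vis, sigma_vis, pi, mu_occ, sigma_occ):
    """Two-level density for one joint coordinate (illustrative assumption).

    Top level:  w is the probability that the joint is visible.
    Visible:    a single Gaussian (mu_vis, sigma_vis).
    Occluded:   a mixture of J Gaussians with weights pi (summing to 1).
    """
    y = np.asarray(y, dtype=float)
    single = gaussian_pdf(y, mu_vis, sigma_vis)
    # Add a trailing axis so y broadcasts against the J mixture components.
    multi = np.sum(pi * gaussian_pdf(y[..., None], mu_occ, sigma_occ), axis=-1)
    return w * single + (1.0 - w) * multi

def negative_log_likelihood(y, w, mu_vis, sigma_vis, pi, mu_occ, sigma_occ):
    """Differentiable training objective; eps guards against log(0)."""
    eps = 1e-12
    p = hierarchical_density(y, w, mu_vis, sigma_vis, pi, mu_occ, sigma_occ)
    return -np.log(p + eps)
```

Because both branches are differentiable in their parameters, a loss of this form could in principle be minimised end to end with gradient descent, which is consistent with the abstract's claim; the exact loss used by HMDN should be taken from the full text.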
