3D human pose estimation is a cornerstone of person-centric augmented and virtual reality applications, where real-time performance is critical for an immersive experience. Although the accuracy of 3D human pose estimation algorithms has improved greatly over the past couple of years, most methods still fall short of real-time performance because they rely on a multi-stage pipeline: first a 2D detection of the person, followed by a separate 3D pose estimation stage. In this project, we aim to build on our existing expertise in the field [1, 2, 3] and improve the efficiency of our pose estimation algorithms by using more compact single-shot convolutional neural network architectures, such as the YOLO design for object detection [4]. We have recently shown that such single-shot architectures are highly accurate and computationally efficient for recovering the 6D pose of multiple objects [5]. We would like to explore this direction further for 3D human pose estimation. Ultimately, we aim to efficiently and accurately recover the 3D pose of both a single person and multiple people.
[1] Tekin et al., "Direct Prediction of 3D Body Poses from Motion Compensated Sequences", CVPR 2016.
[2] Tekin et al., "Structured Prediction of 3D Human Pose with Deep Neural Networks", BMVC 2016.
[3] Tekin et al., "Learning to Fuse 2D and 3D Image Cues for Monocular Body Pose Estimation", ICCV 2017.
[4] Redmon et al., "You Only Look Once: Unified, Real-Time Object Detection", CVPR 2016.
[5] Tekin et al., "Real-Time Seamless Single Shot 6D Object Pose Prediction", arXiv 2017.
The candidate should have strong programming experience, ideally in Python. Prior experience with deep learning is a big plus.
20% Theory, 40% Implementation, 40% Research and Experiments