Research Agenda:

We target top-tier research at the intersection of computer vision, computer graphics, and machine learning. Our research mission is to capture, perceive, and understand human-centric dynamic and static scenes in the complex real world. Our goal is to digitize humans, objects, and events, and ultimately to enable realistic and immersive telepresence in virtual reality and augmented reality.

2022

NeuralHOFusion: Neural Volumetric Rendering under Human-object Interactions

We present NeuralHOFusion, a neural volumetric rendering approach for scenes with human-object interactions.

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
[Paper (Coming soon)]   [Project Page]   [Video (Coming soon)]   [arXiv]   [bibtex (Coming soon)]

Fourier PlenOctrees for Dynamic Radiance Field Rendering in Real-time

We present a Fourier PlenOctree (FPO) technique for neural dynamic scene representation, which enables efficient neural modeling and real-time rendering of unseen dynamic objects with compact memory overhead.

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
[Paper (Coming soon)]   [Project Page]   [Video (Coming soon)]   [arXiv]   [bibtex (Coming soon)]
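The core FPO idea of compactly encoding a time-varying scene can be illustrated with a minimal sketch: each octree leaf stores Fourier coefficients of its time-varying density, and the density at a query time is recovered as a truncated Fourier series. The coefficient layout and function name below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def density_at_time(coeffs, t, period=1.0):
    """Evaluate a truncated Fourier series at time t.

    coeffs: 1-D array of Fourier coefficients stored per octree leaf,
            laid out as [c0, a1, b1, a2, b2, ...] (DC term, then cos/sin pairs).
    This layout is an illustrative assumption, not the paper's exact scheme.
    """
    sigma = coeffs[0]
    n_pairs = (len(coeffs) - 1) // 2
    for k in range(1, n_pairs + 1):
        w = 2.0 * np.pi * k * t / period
        sigma = sigma + coeffs[2 * k - 1] * np.cos(w) + coeffs[2 * k] * np.sin(w)
    return sigma

# A constant (static) density is represented by the DC coefficient alone.
c = np.array([0.7, 0.0, 0.0])
print(round(float(density_at_time(c, 0.25)), 3))  # 0.7
```

Because the coefficients are fixed per leaf, evaluating a new time step is just a small dot product, which is what makes real-time playback of a dynamic scene feasible from a single compact structure.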

HumanNeRF: Efficiently Generated Human Radiance Field from Sparse Inputs

We present a neural free-view synthesis approach for general dynamic humans using only sparse RGB streams, which efficiently optimizes a more generalizable radiance field on-the-fly for unseen performers in an hour.

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
[Paper (Coming soon)]   [Project Page]   [Video (Coming soon)]   [arXiv]   [bibtex (Coming soon)]
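Radiance-field methods such as this one build on the standard NeRF volume-rendering quadrature: densities and colors sampled along a camera ray are alpha-composited into a pixel color. A minimal, self-contained sketch of that generic compositing step (not HumanNeRF's specific pipeline):

```python
import numpy as np

def composite_ray(densities, colors, deltas):
    """Alpha-composite samples along one ray (standard NeRF quadrature).

    densities: [n] volume densities sigma_i at the samples
    colors:    [n, 3] RGB radiance at the samples
    deltas:    [n] distances between adjacent samples
    Returns the rendered pixel color as an RGB triple.
    """
    alphas = 1.0 - np.exp(-densities * deltas)     # per-sample opacity
    trans = np.cumprod(1.0 - alphas + 1e-10)       # transmittance after each sample
    trans = np.concatenate([[1.0], trans[:-1]])    # shift: T_i = prod_{j<i}(1 - a_j)
    weights = alphas * trans                       # contribution of each sample
    return (weights[:, None] * colors).sum(axis=0)

# A single, effectively opaque red sample renders (nearly) pure red.
pixel = composite_ray(np.array([100.0]), np.array([[1.0, 0.0, 0.0]]), np.array([1.0]))
```

The generalizable part of such approaches lies in how the per-sample densities and colors are predicted; the compositing itself is the same differentiable operation throughout.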

LiDARCap: Long-range Marker-less 3D Human Motion Capture with LiDAR Point Clouds

We propose the first monocular LiDAR-based approach for marker-less, long-range 3D human motion capture in a data-driven manner using a new LiDARHuman26M dataset with rich modalities and ground-truth annotations.

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
[Paper (Coming soon)]   [Project Page (Coming soon)]   [Video (Coming soon)]   [bibtex (Coming soon)]

HSC4D: Human-centered 4D Scene Capture in Large-scale Space Using Wearable IMUs and LiDAR

We present a human-centered 4D scene capture method to accurately and efficiently create a dynamic digital world using only body-mounted IMUs and LiDAR.

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
[Paper (Coming soon)]   [Project Page]   [Video]   [arXiv]   [bibtex (Coming soon)]

STCrowd: A Multimodal Dataset for Pedestrian Perception in Crowded Scenes

We propose a new multimodal dataset with diverse crowd densities, multiple scenes, various weather conditions, and different human poses, which can facilitate many perception tasks such as detection, tracking, and prediction.

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
[Paper (Coming soon)]   [Project Page (Coming soon)]   [Video (Coming soon)]   [bibtex (Coming soon)]

Anisotropic Fourier Features for Neural Image-Based Rendering and Relighting

We present an anisotropic random Fourier feature (RFF) mapping scheme for a range of neural implicit image-based rendering and relighting tasks, which improves performance by extending RFF mapping into the anisotropic realm.

Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2022.
[Paper]   [Project Page (Coming soon)]   [Video]   [bibtex (Coming soon)]
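Random Fourier feature mapping lifts a low-dimensional input x to [cos(2πBx), sin(2πBx)] before feeding it to an MLP; the anisotropic variant draws the frequency matrix B with a different scale per input axis rather than one isotropic scale. A hedged sketch of that distinction (the scale values and sampling scheme here are illustrative, not the paper's):

```python
import numpy as np

def rff_encode(x, B):
    """Map inputs x [n, d] to Fourier features via frequency matrix B [m, d]."""
    proj = 2.0 * np.pi * x @ B.T                                   # [n, m]
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=-1)   # [n, 2m]

rng = np.random.default_rng(0)
d, m = 2, 64

# Isotropic RFF: a single frequency scale shared by every input axis.
B_iso = rng.normal(0.0, 10.0, size=(m, d))

# Anisotropic RFF: a distinct frequency scale per axis (illustrative values),
# e.g. high frequencies along one axis, low along the other.
scales = np.array([10.0, 1.0])
B_aniso = rng.normal(0.0, 1.0, size=(m, d)) * scales

x = rng.uniform(size=(5, d))
feats = rff_encode(x, B_aniso)   # shape (5, 128)
```

Matching the frequency spectrum to the signal per axis is the point of the anisotropic extension: signals whose detail varies by direction are fit better than with a single isotropic bandwidth.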


2021

RobustFusion: Robust Volumetric Performance Reconstruction under Human-object Interactions

We present a robust volumetric performance reconstruction approach from a single RGBD stream, which solves the challenging ambiguity and occlusions under human-object interactions without pre-scanned templates.

submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021.
[Paper (Coming soon)]   [Project Page (Coming soon)]   [Video]   [arXiv]   [bibtex (Coming soon)]

TightCap: 3D Human Shape Capture with Clothing Tightness Field

We present a data-driven approach to capture both human shape and dressed garments robustly from only a single complete 3D scanned mesh of the performer using clothing tightness field and the CTD dataset.

ACM Transactions on Graphics (TOG), 2021.
[Paper (Coming soon)]   [Project Page]   [Video (Coming soon)]   [arXiv]   [bibtex (Coming soon)]

SportsCap: Monocular 3D Motion Capture and Fine-Grained Understanding in Challenging Sports Videos

We present the first joint 3D motion capture and fine-grained understanding approach for various challenging sports movements from only a single RGB video input using mid-level sub-motion embedding analysis.

International Journal of Computer Vision (IJCV), 2021.
[Paper]   [Project Page]   [Video]   [bibtex]

GNeRF: GAN-based Neural Radiance Field without Posed Camera

We present GNeRF, a method that jointly estimates camera poses and neural radiance fields when the cameras are initialized at random poses in complex scenarios (outside-in scenes with little texture or intense noise).

International Conference on Computer Vision (ICCV), 2021. Oral
[Paper (Coming soon)]   [Project Page (Coming soon)]   [Video (Coming soon)]   [arXiv]   [bibtex]

Neural Video Portrait Relighting in Real-time via Consistency Modeling

We present an approach for realistic video portrait relighting into new scenes with dynamic illumination in real-time, even on portable devices, by jointly modeling semantic, temporal, and lighting consistency.

International Conference on Computer Vision (ICCV), 2021.
[Paper (Coming soon)]   [Project Page]   [Video]   [arXiv]   [bibtex]

Neural Free-Viewpoint Performance Rendering under Complex Human-object Interactions

We present a human-object neural volumetric rendering approach using only sparse RGB cameras, which generates both high-quality geometry and photo-realistic texture of human activities in novel views for interaction scenarios.

ACM International Conference on Multimedia (ACMMM), 2021. Oral
[Paper (Coming soon)]   [Video (Coming soon)]   [bibtex (Coming soon)]

iButter: Neural Interactive Bullet Time Generator for Human Free-viewpoint Rendering

We present an interactive bullet-time generator for human free-viewpoint rendering from multiple RGB streams. It enables trajectory-aware refinement and real-time dynamic NeRF rendering without tedious per-scene training.

ACM International Conference on Multimedia (ACMMM), 2021. Oral
[Paper (Coming soon)]   [Video (Coming soon)]   [bibtex (Coming soon)]

Towards Controllable and Photorealistic Region-wise Image Manipulation

We build an auto-encoder for photorealistic region-wise style editing on real images, with the aid of code alignment loss and content consistency loss in a self-supervised manner to modulate the training process.

ACM International Conference on Multimedia (ACMMM), 2021.
[Paper (Coming soon)]   [Video (Coming soon)]   [bibtex (Coming soon)]

Few-shot Neural Human Performance Rendering from Sparse RGBD Videos

We present the first few-shot neural human performance rendering approach using six sparse RGBD cameras, which generates photorealistic textures of challenging human activities under the sparse capture setup.

International Joint Conferences on Artificial Intelligence Organization (IJCAI), 2021.
[Paper (Coming soon)]   [Project Page (Coming soon)]   [Video (Coming soon)]   [arXiv]   [bibtex]

PIANO: A Parametric Hand Bone Model from Magnetic Resonance Imaging

We present PIANO, the first statistical hand bone model from MRI data, which is biologically correct, simple to animate, and differentiable. It enables anatomically fine-grained understanding of the hand kinematic structure.

International Joint Conferences on Artificial Intelligence Organization (IJCAI), 2021.
[Paper (Coming soon)]   [Project Page]   [Video]   [arXiv]   [bibtex]

Editable Free-Viewpoint Video using a Layered Neural Representation

We present the first approach to generate editable photo-realistic free-viewpoint videos of large-scale dynamic scenes using a new neural layered representation, which enables numerous photo-realistic visual editing effects.

ACM Transactions on Graphics (Proc. of SIGGRAPH), 2021.
[Paper]   [Project Page]   [Video]   [bibtex]

MirrorNeRF: One-shot Neural Portrait Radiance Field from Multi-mirror Catadioptric Imaging

We present a one-shot neural portrait rendering approach using a catadioptric imaging system with multiple sphere mirrors and a single high-resolution digital camera, which maintains low-cost and casual capture setting.

International Conference on Computational Photography (ICCP), 2021.
[Paper]   [Project Page (Coming soon)]   [Video]   [arXiv]   [bibtex]

Convolutional Neural Opacity Radiance Fields

We present a novel scheme to generate convolutional neural opacity radiance fields for fuzzy objects, which combines explicit opacity modeling with NeRF for high-quality appearance and alpha mattes generation.

International Conference on Computational Photography (ICCP), 2021.
[Paper]   [Project Page (Coming soon)]   [Video]   [arXiv]   [bibtex]

ChallenCap: Monocular 3D Capture of Challenging Human Performances using Multi-Modal References

We propose a robust monocular human motion capture scheme for challenging scenarios with extreme poses and complex motion patterns, which embraces multi-modal references in a data-driven manner.

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021. Oral
[Paper]   [Project Page (Coming soon)]   [Video]   [arXiv]   [bibtex]

NeuralHumanFVV: Real-Time Neural Volumetric Human Performance Rendering using RGB Cameras

We present a real-time human neural volumetric rendering system using only sparse RGB cameras, which generates both high-quality geometry and photo-realistic texture of human activities in arbitrary novel views.

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
[Paper]   [Project Page (Coming soon)]   [Video]   [arXiv]   [bibtex]


2020

BuildingFusion: Semantic-aware Structural Building-scale 3D Reconstruction

We propose an RGBD-based semantic-aware building-scale reconstruction system, which recovers building-scale dense geometry collaboratively and provides semantic and structural reconstruction on-the-fly.

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020.
[Paper]   [Video]   [bibtex]

Neural3D: Light-weight Neural Portrait Scanning via Context-aware Correspondence Learning

We propose Neural3D, a light-weight neural portrait scanning approach via context-aware correspondence learning.

ACM International Conference on Multimedia (ACMMM), 2020.
[Paper]   [Video]   [bibtex]

RobustFusion: Human Volumetric Capture with Data-driven Visual Cues using a RGBD Camera

We propose RobustFusion, a robust template-less human volumetric capture system combined with various data-driven visual cues using only a single RGBD sensor.

European Conference on Computer Vision (ECCV), 2020. Spotlight
[Paper]   [Project Page (Coming soon)]   [Video]   [bibtex]

EventCap: Monocular 3D Capture of High-Speed Human Motions using an Event Camera

We propose the first approach for 3D capture of high-speed human motions using a single event camera, capturing fast motions at millisecond resolution with significantly higher data efficiency.

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020. Oral
[Paper]   [Project Page]   [Video]   [arXiv]   [bibtex]

OccuSeg: Occupancy-aware 3D Instance Segmentation

We propose an occupancy-aware 3D instance segmentation scheme, which achieves state-of-the-art performance on 3 real-world datasets, while maintaining high efficiency.

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[Paper (Coming soon)]   [Project Page (Coming soon)]   [Video]   [arXiv]   [bibtex]

Multiscale-VR: Multiscale Gigapixel 3D Panoramic Videography for Virtual Reality

We propose a VR camera which can zoom-in to local regions at a great distance away, allowing multi-scale, gigapixel-level, and 3D panoramic videography for VR content generation.

International Conference on Computational Photography (ICCP), 2020. Oral
[Paper (Coming soon)]   [Project Page (Coming soon)]   [Video]   [bibtex]

Live Semantic 3D Perception for Immersive Augmented Reality

We present a real-time simultaneous 3D reconstruction and semantic segmentation system working on mobile devices, with a live immersive AR demo, where the users can interact with the environment.

IEEE Transactions on Visualization and Computer Graphics (Proc. IEEE VR), 2020.
[Paper]   [bibtex]


2019

UnstructuredFusion: Realtime 4D Geometry and Texture Reconstruction using Commercial RGBD Cameras

We propose UnstructuredFusion, which allows realtime, high-quality, complete reconstruction of 4D textured models of human performance via only three commercial RGBD cameras.

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2019.
[Paper]   [Video]   [Project Page (Coming soon)]   [bibtex]

FlyFusion: Realtime Dynamic Scene Reconstruction Using a Flying Depth Camera

We explore active dynamic scene reconstruction based on a single flying camera, which can adaptively select the capture view for real-time dynamic scene reconstruction.

IEEE Transactions on Visualization and Computer Graphics (TVCG), 2019.
[Paper]   [Video]   [Project Page (Coming soon)]   [bibtex]

Real-Time Global Registration for Globally Consistent RGB-D SLAM

We achieve globally consistent pose estimation in real-time via CPU computing, with accuracy comparable to state-of-the-art methods that use GPU computing, enabling the practical use of globally consistent RGB-D SLAM.

IEEE Transactions on Robotics (TRO), 2019.
[Paper]   [bibtex]


2018

FlyCap: Markerless motion capture using multiple autonomous flying cameras

We propose to use three autonomous flying cameras for motion capture, simultaneously performing non-rigid reconstruction and camera localization in each frame and each view.

IEEE Transactions on Visualization and Computer Graphics (TVCG), 2018.
[Paper]   [Video]   [Project Page (Coming soon)]   [bibtex]

iHuman3D: Intelligent Human Body 3D Reconstruction using a Single Flying Camera

In this work, we present an adaptive human body 3D reconstruction system using a single flying camera, which removes the extra manual labor constraint.

ACM International Conference on Multimedia (ACMMM), 2018. Oral
[Paper]   [bibtex]

Beyond SIFT using binary features in loop closure detection

We present a binary-feature-based loop closure detection (LCD) approach, which achieves higher accuracy than the state of the art while running at 30 Hz on a laptop.

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2018. Oral
[Paper]   [bibtex]