Research Agenda:

We target top-tier research at the intersection of computer vision, computer graphics and machine learning. Our research mission is to capture, perceive and understand human-centric dynamic and static scenes in the complex real world. Our goal is to digitize humans, objects and events, and eventually to enable realistic and immersive telepresence in virtual reality and augmented reality.


Recent Preprint

Instant Gaussian Splatting Generation for Realtime Facial Rendering

We introduce a diffusion transformer that instantly translates physically-based facial assets into the corresponding GauFace representations, delivering high fidelity and real-time facial interaction.


[Arxiv]   [Project Page]   [Video]   [bibtex (Coming soon)]


2024

Robust Dual Gaussian Splatting for Immersive Human-centric Volumetric Videos

We present DualGS, a novel Gaussian-based representation for volumetric videos, achieving robust human performance tracking and high-fidelity rendering.

ACM Transactions on Graphics (Proc. of SIGGRAPH Asia), 2024.
[Arxiv]   [Project Page]   [Video]   [bibtex (Coming soon)]

V3: Viewing Volumetric Videos on Mobiles via Streamable 2D Dynamic Gaussians

We present a novel approach that enables high-quality mobile rendering through the streaming of dynamic Gaussians. It views the dynamic 3DGS as 2D videos to facilitate the use of hardware video codecs.

ACM Transactions on Graphics (Proc. of SIGGRAPH Asia), 2024.
[Arxiv]   [Project Page]   [Video]   [bibtex (Coming soon)]

LetsGo: Large-Scale Garage Modeling and Rendering via LiDAR-Assisted Gaussian Primitives

We introduce LetsGo, a LiDAR-assisted Gaussian primitive approach for large-scale garage modeling and rendering.

ACM Transactions on Graphics (Proc. of SIGGRAPH Asia), 2024.
[Arxiv]   [Project Page]   [Video]   [bibtex (Coming soon)]

HiSC4D: Human-Centered Interaction and 4D Scene Capture in Large-Scale Space Using Wearable IMUs and LiDAR

We introduce a novel Human-centered interaction and 4D Scene Capture method to create a dynamic digital world with large-scale indoor-outdoor scenes, diverse human motions, and rich human-human/environment interactions.

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024.
[Paper]   [Project Page]   [Video]   [bibtex]

InterGen: Diffusion-based Multi-human Motion Generation under Complex Interactions

We present an effective diffusion-based approach that enables layman users to customize high-quality two-person interaction motions, with only text guidance.

International Journal of Computer Vision (IJCV), 2024.
[Paper]   [Project Page]   [Video]   [bibtex]

CLAY: A Controllable Large-scale Generative Model for Creating High-quality 3D Assets

We introduce CLAY, a controllable large-scale generative model for creating high-quality 3D assets.

ACM Transactions on Graphics (Proc. of SIGGRAPH), 2024. Best Paper Honorable Mention
[Paper]   [Project Page]   [Video]   [bibtex]

DressCode: Autoregressively Sewing and Generating Garments from Text Guidance

We introduce a text-driven 3D garment generation framework, DressCode, which aims to democratize design for novices and offer immense potential in fashion design, virtual try-on, and digital human creation.

ACM Transactions on Graphics (Proc. of SIGGRAPH), 2024. Best Paper Honorable Mention
[Paper]   [Project Page]   [Video]   [bibtex]

Implicit Swept Volume SDF: Enabling Continuous Collision-Free Trajectory Generation for Arbitrary Shapes

We propose a novel hierarchical trajectory generation pipeline, which utilizes the Swept Volume Signed Distance Field (SVSDF) to guide trajectory optimization for Continuous Collision Avoidance.

ACM Transactions on Graphics (Proc. of SIGGRAPH), 2024.
[Paper]   [Project Page]   [Video]   [bibtex]

Media2Face: Co-speech Facial Animation Generation With Multi-Modality Guidance

We present a diffusion model in the latent space of the Generalized Neural Parametric Facial Asset, enabling co-speech facial animation generation with rich multi-modality guidance from audio, text, and image.

Proceedings of the SIGGRAPH Conference, 2024.
[Paper]   [Project Page]   [Video]   [bibtex]

Gait Recognition in Large-scale Free Environment via Single

We propose FreeGait, a new LiDAR-based in-the-wild gait dataset covering various crowd densities and occlusions across different real-life scenes.

ACM International Conference on Multimedia (ACMMM), 2024. Oral
[Arxiv]   [Project Page (coming soon)]   [bibtex (coming soon)]

HmPEAR: A Dataset for Human Pose Estimation and Action Recognition

We propose a novel dataset, named HmPEAR, which integrates imagery and point cloud data for 3D Human pose estimation and human action recognition.

ACM International Conference on Multimedia (ACMMM), 2024.
[Paper (coming soon)]   [Project Page]   [bibtex (coming soon)]

OMG: Towards Open-vocabulary Motion Generation via Mixture of Controllers

We present OMG, a framework for open-vocabulary motion generation that uses a mixture of controllers.

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
[Paper]   [Project Page]   [Video]   [bibtex]

BOTH2Hands: Inferring 3D Hands from Both Text Prompts and Body Dynamics

We present BOTH2Hands, which infers 3D hand motions from both text prompts and body dynamics.

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
[Paper]   [Project Page]   [Video]   [bibtex]

I'M HOI: Inertia-aware Monocular Capture of 3D Human-Object Interactions

We present I'M HOI, an inertia-aware monocular approach for capturing 3D human-object interactions.

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
[Paper]   [Project Page]   [Video]   [bibtex]

HiFi4G: High-fidelity Human Performance Rendering via Compact Gaussian Splatting

We present HiFi4G, a compact Gaussian splatting approach for high-fidelity human performance rendering.

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
[Paper]   [Project Page]   [Video]   [bibtex]

VideoRF: Rendering Dynamic Radiance Fields as 2D Feature Video Streams

We present VideoRF, an approach that renders dynamic radiance fields as 2D feature video streams.

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
[Paper]   [Project Page]   [Video]   [bibtex]

HOI-M^3: Capture Multiple Humans and Objects Interaction within Contextual Environment

We introduce HOI-M3, a novel large-scale dataset for modeling the interactions of Multiple huMans and Multiple objects, covering 199 sequences and 181M frames of diverse humans and objects under rich activities.

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
[Paper]   [Project Page]   [Video]   [bibtex]

RELI11D: A Comprehensive Multimodal Human Motion Dataset and Method

We present RELI11D, a high-quality multimodal human motion dataset involving RGB camera, event camera, LiDAR and IMU systems. It records the motions of 10 actors performing 5 sports in 7 scenes, totaling 3.32 hours.

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
[Paper]   [Project Page]   [Video]   [bibtex]

LiveHPS: LiDAR-based Scene-level Human Pose and Shape Estimation in Free Environment

We present LiveHPS, a novel single-LiDAR-based approach for 3D HPS in large-scale scenarios, which is not limited to fixed studios, light conditions, and wearable devices.

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
[Paper]   [Project Page]   [Video]   [bibtex]

A Unified Diffusion Framework for Scene-aware Human Motion Estimation from Sparse Signals

We introduce a unified diffusion method, S2Fusion, tailored for the scene-aware human motion estimation with sparse signals.

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
[Paper]   [Project Page]   [Video]   [bibtex]


2023

DreamFace: Progressive Generation of Animatable 3D Faces under Text Guidance

We present a progressive scheme to generate personalized 3D faces with text guidance, which can customize 3D facial assets with the desired shape and physically-based textures, as well as empowered animation capabilities.

ACM Transactions on Graphics (Proc. of SIGGRAPH), 2023.
[Paper]   [Project Page]   [Video]   [bibtex]

HACK: Learning a Parametric Head and Neck Model for High-fidelity Animation

We present a novel parametric model for constructing the cervical region of digital humans, which tackles the full spectrum of neck and larynx motions to offer more personalized and anatomically consistent controls.

ACM Transactions on Graphics (Proc. of SIGGRAPH), 2023.
[Paper]   [Project Page]   [Video]   [bibtex]

Free-Bloom: Zero-shot Text-to-Video Generator with LLM Director and LDM Animator

We propose a novel zero-shot and training-free text-to-video approach, which mainly focuses on improving the narrative of the progression of events by harnessing the knowledge from the pre-trained LLM and LDM.

Neural Information Processing Systems (NeurIPS), 2023.
[Paper]   [Project Page]   [Video]

StackFLOW: Monocular Human-Object Reconstruction by Stacked Normalizing Flow with Offset

We propose a novel scheme to encode and capture highly detailed 3D human-object spatial relations from single-view images using Human-Object Offset, with a Stacked Normalizing Flow to infer the posterior distribution.

International Joint Conference on Artificial Intelligence (IJCAI), 2023.
[Paper]   [Project Page]   [Video]   [bibtex]

NeReF: Neural Refractive Field for Fluid Surface Reconstruction and Rendering

We propose a neural scene representation for refractive fluid surfaces, which can render the refraction effect directly from the implicit representation.

International Conference on Computational Photography (ICCP), 2023.
[Paper]   [Project Page]   [Video]   [bibtex]

ReRF: Neural Residual Radiance Fields for Streamably Free-Viewpoint Videos

We present a novel neural modeling technique that we call the Residual Radiance Field or ReRF as a highly compact representation of dynamic scenes, enabling high-quality FVV streaming and rendering.

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
[Paper]   [Project Page]   [Video]   [bibtex]

HumanGen: Generating Human Radiance Fields with Explicit Priors

We present a novel neural scheme to generate high-quality radiance fields for 3D humans, by explicitly utilizing richer priors from the top-tier 2D generation and 3D reconstruction schemes.

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
[Paper]   [Project Page]   [Video]   [bibtex]

Instant-NVR: Instant Neural Volumetric Rendering for Human-object Interactions from RGBD Stream

We present an instant neural volumetric rendering system for human-object interacting scenes using a single RGBD camera, via on-the-fly generation of the radiance fields for both the rigid object and dynamic human.

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
[Paper]   [Project Page]   [Video]   [bibtex]

NeuralDome: A Neural Modeling Pipeline on Multi-View Human-Object Interactions

We present a neural pipeline that takes multi-view dome capture as inputs and conducts accurate 3D modeling and photo-realistic rendering of complex human-object interaction.

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
[Paper]   [Project Page]   [Video]   [bibtex]

Relightable Neural Human Assets from Multi-view Gradient Illuminations

We contribute a new 3D human dataset that contains more than 2,000 high-quality human assets captured under both multi-view and multi-illumination settings.

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
[Paper]   [Project Page]   [Video]   [bibtex]

SLOPER4D: A Scene-Aware Dataset for Global 4D Human Pose Estimation in Urban Environments

We present a novel scene-aware dataset collected in large urban environments to facilitate the research of global human pose estimation with human-scene interaction in the wild.

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
[Paper]   [Project Page]   [Video]   [bibtex]

CIMI4D: A Large Multimodal Climbing Motion Dataset under Human-scene Interactions

We contribute a large rock climbing motion dataset, consisting of around 180,000 frames of inertial measurements, LiDAR point clouds, RGB videos, static point cloud scenes, and reconstructed scene meshes.

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
[Paper]   [Project Page]   [Video]   [bibtex]

LiDAR-aid Inertial Poser: Large-scale Human Motion Capture by Sparse Inertial and LiDAR Sensors

We present a novel approach to accurately capture challenging human motions in large-scale scenarios using a light-weight hardware setup with only a single LiDAR and 4 IMUs.

IEEE Transactions on Visualization and Computer Graphics (Proc. IEEE VR), 2023.
[Paper]   [Project Page]   [Video]   [bibtex]

HybridCap: Inertia-aid Monocular Capture of Challenging Human Motions

We present a high-quality inertia-aid monocular approach for capturing challenging human motions, using a light-weight hybrid setting with a single RGB camera and sparse IMUs.

Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2023.
[Paper]   [Project Page (Coming soon)]   [Video]   [bibtex]

IKOL: Inverse kinematics optimization layer for 3D human pose and shape estimation

We propose an inverse kinematics optimization layer that leverages the strengths of both optimization and regression for end-to-end 3D human pose and shape estimation.

Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2023.
[Paper]   [Project Page]   [Video]   [bibtex]

Weakly Supervised 3D Multi-person Pose Estimation for Large-scale Scenes

We propose a monocular camera and single LiDAR-based method for 3D multi-person pose estimation in large-scale scenes, which is easy to deploy and insensitive to light.

Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2023.
[Paper]   [Project Page]   [Video (Coming soon)]   [bibtex]


2022

Human Performance Modeling and Rendering via Neural Animated Mesh

We present an approach to human performance modeling and rendering based on neural animated meshes.

ACM Transactions on Graphics (Proc. of SIGGRAPH Asia), 2022.
[Paper]   [Project Page]   [Video]   [bibtex]

Video-driven Neural Physically-based Facial Asset for Production

We present a novel learning-based, video-driven approach to generate dynamic facial geometry along with high-quality physically-based textures, including pore-level albedo, specular and normal maps, for production.

ACM Transactions on Graphics (Proc. of SIGGRAPH Asia), 2022.
[Paper]   [Project Page]   [Video]   [bibtex]

SCULPTOR: Skeleton-Consistent Face Creation Using a Learned Parametric Generator

We present SCULPTOR, a novel parametric facial generator, which jointly models the skull, geometry and appearance to create facial features that define a character while maintaining physiological soundness.

ACM Transactions on Graphics (Proc. of SIGGRAPH Asia), 2022.
[Paper]   [Project Page]   [Video]   [bibtex]

Artemis: Articulated Neural Pets with Appearance and Motion Synthesis

We present ARTEMIS, a novel neural modeling and rendering pipeline for generating ARTiculated neural pets with appEarance and Motion synthesIS, for real-time animation and photo-realistic rendering of furry animals.

ACM Transactions on Graphics (Proc. of SIGGRAPH), 2022.
[Paper]   [Project Page]   [Video]   [bibtex]

NIMBLE: a Non-rigid Hand Model with Bones and Muscles

We present NIMBLE, a novel parametric hand model that includes the missing key components, bringing 3D hand models to a new level of realism, learnt from an MRI dataset with detailed annotations of joints, bones and muscles.

ACM Transactions on Graphics (Proc. of SIGGRAPH), 2022.
[Paper]   [Project Page]   [Video]   [bibtex]

Mutual Adaptive Reasoning for Monocular 3D Multi-Person Pose Estimation

We propose to leverage the mutual benefits of both these subtasks. Within the framework, a robust structured 2.5D pose estimation is designed to recognize inter-person occlusion based on depth relationships.

ACM International Conference on Multimedia (ACMMM), 2022.
[Paper]   [bibtex]

NeuralHOFusion: Neural Volumetric Rendering under Human-object Interactions

We propose NeuralHOFusion for volumetric human-object capture and rendering using sparse consumer RGBD sensors, which marries traditional non-rigid fusion with recent neural implicit modeling and blending advances.

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
[Paper]   [Project Page]   [Video]   [Arxiv]   [bibtex]

Fourier PlenOctrees for Dynamic Radiance Field Rendering in Real-time

We present a Fourier PlenOctree (FPO) technique for neural dynamic scene representation, which enables efficient neural modeling and real-time rendering of unseen dynamic objects with compact memory overhead.

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022. Oral
[Paper]   [Project Page]   [Video]   [Arxiv]   [bibtex]

HumanNeRF: Efficiently Generated Human Radiance Field from Sparse Inputs

We present a neural free-view synthesis approach for general dynamic humans using only sparse RGB streams, which efficiently optimizes a more generalizable radiance field on-the-fly for unseen performers in an hour.

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
[Paper]   [Project Page]   [Video]   [Arxiv]   [bibtex]

LiDARCap: Long-range Marker-less 3D Human Motion Capture with LiDAR Point Clouds

We propose the first monocular LiDAR-based approach for marker-less, long-range 3D human motion capture in a data-driven manner using a new LiDARHuman26M dataset with rich modalities and ground-truth annotations.

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
[Paper]   [Project Page]   [bibtex]

HSC4D: Human-centered 4D Scene Capture in Large-scale Space Using Wearable IMUs and LiDAR

We present a Human-centered 4D Scene capture method to accurately and efficiently create a dynamic digital world using only body-mounted IMUs and LiDAR.

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
[Paper]   [Project Page]   [Video]   [Arxiv]   [bibtex]

STCrowd: A Multimodal Dataset for Pedestrian Perception in Crowded Scenes

We propose a new multimodal dataset with diverse crowd densities, multiple scenes, various weather, and different human poses, which can facilitate many perception tasks like detection, tracking, and prediction.

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
[Paper]   [Project Page]   [Video]   [bibtex]

Anisotropic Fourier Features for Neural Image-Based Rendering and Relighting

We present an anisotropic RFF mapping scheme for a range of neural implicit image-based rendering and relighting tasks, which improves the performance by taking the RFF mapping into the new anisotropic realm.

Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2022.
[Paper]   [Project Page]   [Video]   [bibtex]

RobustFusion: Robust Volumetric Performance Reconstruction under Human-object Interactions

We present a robust volumetric performance reconstruction approach from a single RGBD stream, which solves the challenging ambiguity and occlusions under human-object interactions without pre-scanned templates.

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022.
[Paper]   [Video]   [Arxiv]   [bibtex]


2021

TightCap: 3D Human Shape Capture with Clothing Tightness Field

We present a data-driven approach to capture both human shape and dressed garments robustly from only a single complete 3D scanned mesh of the performer using clothing tightness field and the CTD dataset.

ACM Transactions on Graphics (TOG), 2021.
[Paper]   [Project Page]   [Video]   [Arxiv]   [bibtex]

SportsCap: Monocular 3D Motion Capture and Fine-Grained Understanding in Challenging Sports Videos

We present the first joint 3D motion capture and fine-grained understanding approach for various challenging sports movements from only a single RGB video input using mid-level sub-motion embedding analysis.

International Journal of Computer Vision (IJCV), 2021.
[Paper]   [Project Page]   [Video]   [bibtex]

GNeRF: GAN-based Neural Radiance Field without Posed Camera

We present GNeRF, a method that can estimate camera poses and neural radiance fields jointly when the cameras are initialized at random poses in complex scenarios (outside-in scenes with less texture or intense noise).

International Conference on Computer Vision (ICCV), 2021. Oral
[Paper]   [Project Page]   [Video]   [Arxiv]   [bibtex]

Neural Video Portrait Relighting in Real-time via Consistency Modeling

We present an approach for realistic video portrait relighting into new scenes with dynamic illuminations in real-time, even on portable devices, by jointly modeling the semantic, temporal and lighting consistency.

International Conference on Computer Vision (ICCV), 2021.
[Paper]   [Project Page]   [Video]   [Arxiv]   [bibtex]

Neural Free-Viewpoint Performance Rendering under Complex Human-object Interactions

We present a human-object neural volumetric rendering using only sparse RGB cameras, which generates both high-quality geometry and photo-realistic texture of human activities in novel views for interaction scenarios.

ACM International Conference on Multimedia (ACMMM), 2021. Oral
[Paper]   [Project Page]   [Video]   [bibtex]

iButter: Neural Interactive Bullet Time Generator for Human Free-viewpoint Rendering

We present an interactive bullet-time generator for human free-viewpoint rendering from multiple RGB streams. It enables trajectory-aware refinement and real-time dynamic NeRF rendering without tedious per-scene training.

ACM International Conference on Multimedia (ACMMM), 2021. Oral
[Paper]   [Project Page]   [Video]   [bibtex]

Towards Controllable and Photorealistic Region-wise Image Manipulation

We build an auto-encoder for photorealistic region-wise style editing on real images, with the aid of code alignment loss and content consistency loss in a self-supervised manner to modulate the training process.

ACM International Conference on Multimedia (ACMMM), 2021.
[Paper]   [Video]   [bibtex]

Few-shot Neural Human Performance Rendering from Sparse RGBD Videos

We present the first few-shot neural human performance rendering approach using six sparse RGBD cameras which generates photorealistic texture of challenging human activities under the sparse capture setup.

International Joint Conference on Artificial Intelligence (IJCAI), 2021.
[Paper]   [Video]   [arXiv]   [bibtex]

PIANO: A Parametric Hand Bone Model from Magnetic Resonance Imaging

We present PIANO, the first statistical hand bone model from MRI data, which is biologically correct, simple to animate, and differentiable. It enables anatomically fine-grained understanding of the hand kinematic structure.

International Joint Conference on Artificial Intelligence (IJCAI), 2021.
[Paper]   [Project Page]   [Video]   [arXiv]   [bibtex]

Editable Free-Viewpoint Video using a Layered Neural Representation

We present the first approach to generate editable photo-realistic free-viewpoint videos of large-scale dynamic scenes using a new neural layered representation, which enables numerous photo-realistic visual editing effects.

ACM Transactions on Graphics (Proc. of SIGGRAPH), 2021.
[Paper]   [Project Page]   [Video]   [bibtex]

MirrorNeRF: One-shot Neural Portrait Radiance Field from Multi-mirror Catadioptric Imaging

We present a one-shot neural portrait rendering approach using a catadioptric imaging system with multiple sphere mirrors and a single high-resolution digital camera, which maintains low-cost and casual capture setting.

International Conference on Computational Photography (ICCP), 2021.
[Paper]   [Video]   [arXiv]   [bibtex]

Convolutional Neural Opacity Radiance Fields

We present a novel scheme to generate convolutional neural opacity radiance fields for fuzzy objects, which combines explicit opacity modeling with NeRF for high-quality appearance and alpha mattes generation.

International Conference on Computational Photography (ICCP), 2021.
[Paper]   [Project Page]   [Video]   [arXiv]   [bibtex]

ChallenCap: Monocular 3D Capture of Challenging Human Performances using Multi-Modal References

We propose a robust monocular human motion capture scheme for challenging scenarios with extreme poses and complex motion patterns, which embraces multi-modal references in a data-driven manner.

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021. Oral
[Paper]   [Video]   [arXiv]   [bibtex]

NeuralHumanFVV: Real-Time Neural Volumetric Human Performance Rendering using RGB Cameras

We present a real-time human neural volumetric rendering system using only sparse RGB cameras, which generates both high-quality geometry and photo-realistic texture of human activities in arbitrary novel views.

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
[Paper]   [Project Page]   [Video]   [arXiv]   [bibtex]


2020

BuildingFusion: Semantic-aware Structural Building-scale 3D Reconstruction

We propose an RGBD-based semantic-aware building-scale reconstruction system, which recovers building-scale dense geometry collaboratively and provides semantic and structural reconstruction on-the-fly.

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020.
[Paper]   [Video]   [bibtex]

Neural3D: Light-weight Neural Portrait Scanning via Context-aware Correspondence Learning

We propose Neural3D, a light-weight neural portrait scanning approach based on context-aware correspondence learning.

ACM International Conference on Multimedia (ACMMM), 2020.
[Paper]   [Video]   [bibtex]

RobustFusion: Human Volumetric Capture with Data-driven Visual Cues using a RGBD Camera

We propose RobustFusion, a robust template-less human volumetric capture system combined with various data-driven visual cues using only a single RGBD sensor.

European Conference on Computer Vision (ECCV), 2020. Spotlight
[Paper]   [Video]   [bibtex]

EventCap: Monocular 3D Capture of High-Speed Human Motions using an Event Camera

We propose the first approach for 3D capturing of high-speed human motions using a single event camera. We can capture fast motions at millisecond resolution with significantly higher data efficiency.

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020. Oral
[Paper]   [Project Page]   [Video]   [arXiv]   [bibtex]

OccuSeg: Occupancy-aware 3D Instance Segmentation

We propose an occupancy-aware 3D instance segmentation scheme, which achieves state-of-the-art performance on 3 real-world datasets, while maintaining high efficiency.

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[Paper]   [Video]   [arXiv]   [bibtex]

Multiscale-VR: Multiscale Gigapixel 3D Panoramic Videography for Virtual Reality

We propose a VR camera which can zoom-in to local regions at a great distance away, allowing multi-scale, gigapixel-level, and 3D panoramic videography for VR content generation.

International Conference on Computational Photography (ICCP), 2020. Oral
[Paper]   [Video]   [bibtex]

Live Semantic 3D Perception for Immersive Augmented Reality

We present a real-time simultaneous 3D reconstruction and semantic segmentation system working on mobile devices, with a live immersive AR demo, where the users can interact with the environment.

IEEE Transactions on Visualization and Computer Graphics (Proc. IEEE VR), 2020.
[Paper]   [bibtex]


2019

UnstructuredFusion: Realtime 4D Geometry and Texture Reconstruction using Commercial RGBD Cameras

We propose UnstructuredFusion, which allows realtime, high-quality, complete reconstruction of 4D textured models of human performance via only three commercial RGBD cameras.

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2019.
[Paper]   [Video]   [bibtex]

FlyFusion: Realtime Dynamic Scene Reconstruction Using a Flying Depth Camera

We explore active dynamic scene reconstruction based on a single flying camera, which can adaptively select the capture view for real-time dynamic scene reconstruction.

IEEE Transactions on Visualization and Computer Graphics (TVCG), 2019.
[Paper]   [Video]   [bibtex]

Real-Time Global Registration for Globally Consistent RGB-D SLAM

We achieve globally consistent pose estimation in real time using only CPU computing, with accuracy comparable to state-of-the-art methods that use GPU computing, enabling the practical use of globally consistent RGB-D SLAM.

IEEE Transactions on Robotics (TRO), 2019.
[Paper]   [bibtex]


2018

FlyCap: Markerless motion capture using multiple autonomous flying cameras

We propose to use three autonomous flying cameras for motion capture, which simultaneously performs non-rigid reconstruction and localization of the camera in each frame and each view.

IEEE Transactions on Visualization and Computer Graphics (TVCG), 2018.
[Paper]   [Video]   [bibtex]

iHuman3D: Intelligent Human Body 3D Reconstruction using a Single Flying Camera

In this work, we present an adaptive human body 3D reconstruction system using a single flying camera, which removes the extra manual labor constraint.

ACM International Conference on Multimedia (ACMMM), 2018. Oral
[Paper]   [bibtex]

Beyond SIFT using binary features in loop closure detection

A binary feature based LCD approach is presented in this paper, which achieves the highest accuracy compared with state-of-the-art while running at 30Hz on a laptop.

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2018. Oral
[Paper]   [bibtex]