Sifei Liu

@nvidia.com

NVIDIA LPR
NVIDIA


EDUCATION

PhD, University of California, Merced

RESEARCH INTERESTS

Computer Vision, Machine Learning

Scopus Publications: 78
Scholar Citations: 12736
Scholar h-index: 48
Scholar i10-index: 81

SCOPUS PUBLICATIONS

  • SSE: Multimodal Semantic Data Selection and Enrichment for Industrial-scale Data Assimilation
    Maying Shen, Nadine Chang, Sifei Liu, Jose M. Alvarez
    Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2025
  • Synthesizing Consistent Novel Views Via 3D Epipolar Attention Without Re-Training
    Botao Ye, Sifei Liu, Xueting Li, Marc Pollefeys, Ming-Hsuan Yang
    Proceedings of the 2025 International Conference on 3D Vision (3DV 2025), 2025
    Large diffusion models demonstrate remarkable zero-shot capabilities in novel view synthesis from a single image. However, these models often face challenges in maintaining consistency across novel and reference views. A crucial factor leading to this issue is the limited utilization of contextual information from reference views. Specifically, when there is an overlap in the viewing frustum between two views, it is essential to ensure that the corresponding regions maintain consistency in both geometry and appearance. This observation leads to a simple yet effective approach, where we propose to use epipolar geometry to locate and retrieve overlapping information from the input view. This information is then incorporated into the generation of target views, eliminating the need for training or fine-tuning, as the process requires no learnable parameters. Furthermore, to enhance the overall consistency of generated views, we extend the utilization of epipolar attention to a multi-view setting, allowing retrieval of overlapping information from the input view and other target views. Qualitative and quantitative experimental results demonstrate the effectiveness of our method in significantly improving the consistency of synthesized views without the need for any fine-tuning. Moreover, this enhancement also boosts the performance of downstream applications such as 3D reconstruction. The code is available at https://github.com/botaoye/ConsisSyn.
  • M3: 3D-Spatial Multimodal Memory
    13th International Conference on Learning Representations (ICLR 2025), 2025
  • No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images
    13th International Conference on Learning Representations (ICLR 2025), 2025
  • Compositional Text-to-Image Generation with Feedforward Layout Generation
    Sifei Liu, Weili Nie, An-Chieh Cheng, Morteza Mardani, Chao Liu, Benjamin Eckart, Arash Vahdat
    Lecture Notes in Computer Science, 2025
  • Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks
    Miran Heo, Min-Hung Chen, De-An Huang, Sifei Liu, Subhashree Radhakrishnan, Seon Joo Kim, Yu-Chiang Frank Wang, Ryo Hachiuma
    Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2025
    We present Omni-RGPT, a multimodal large language model designed to facilitate region-level comprehension for both images and videos. To achieve consistent region representation across spatio-temporal dimensions, we introduce Token Mark, a set of tokens highlighting the target regions within the visual feature space. These tokens are directly embedded into spatial regions using region prompts (e.g., boxes or masks) and simultaneously incorporated into the text prompt to specify the target, establishing a direct connection between visual and text tokens. To further support robust video understanding without requiring tracklets, we introduce an auxiliary task that guides Token Mark by leveraging the consistency of the tokens, enabling stable region interpretation across the video. Additionally, we introduce a large-scale region-level video instruction dataset (RegVID300k). Omni-RGPT achieves state-of-the-art results on image and video-based commonsense reasoning benchmarks while showing strong performance in captioning and referring expression comprehension tasks.
  • Parallel Sequence Modeling via Generalized Spatial Propagation Network
    Hongjun Wang, Wonmin Byeon, Jiarui Xu, Jinwei Gu, Ka Chun Cheung, Xiaolong Wang, Kai Han, Jan Kautz, Sifei Liu
    Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2025
    We present the Generalized Spatial Propagation Network (GSPN), a new attention mechanism optimized for vision tasks that inherently captures 2D spatial structures. Existing attention models, including transformers, linear attention, and state-space models like Mamba, process multidimensional data as 1D sequences, compromising spatial coherence and efficiency. GSPN overcomes these limitations by directly operating on spatially coherent image data and forming dense pairwise connections through a line-scan approach. Central to GSPN is the Stability-Context Condition, which ensures stable, long-context propagation across 2D sequences and reduces the effective sequence length to $\sqrt{N}$ for a square map with $N$ elements, which significantly enhances computational efficiency. With learnable, input-dependent weights and no reliance on positional embeddings, GSPN achieves superior spatial fidelity and state-of-the-art performance in vision tasks, including ImageNet classification, class-guided image generation, and text-to-image generation. Notably, GSPN accelerates SD-XL with softmax-attention by over 84× when generating 16K images. Project page: https://whj363636.github.io/GSPN/
  • Scaling Vision Pre-Training to 4K Resolution
    Baifeng Shi, Boyi Li, Han Cai, Yao Lu, Sifei Liu, Marco Pavone, Jan Kautz, Song Han, Trevor Darrell, Pavlo Molchanov, Hongxu Yin
    Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2025
    High-resolution perception of visual details is crucial for daily tasks. Current vision pre-training, however, is still limited to low resolutions (e.g., 378×378 pixels) due to the quadratic cost of processing larger images. We introduce PS3 that scales CLIP-style vision pre-training to 4K resolution with a near-constant cost. Instead of contrastive learning on global image representation, PS3 is pre-trained by selectively processing local regions and contrasting them with local detailed captions, enabling high-resolution representation learning with greatly reduced computational overhead. The pre-trained PS3 is able to both encode the global image at low resolution and selectively process local high-resolution regions based on their saliency or relevance to a text prompt. When applying PS3 to multi-modal LLM (MLLM), the resulting model, named VILA-HD, significantly improves high-resolution visual perception compared to baselines without high-resolution vision pre-training such as AnyRes and S² while using up to 4.3× fewer tokens. PS3 also unlocks appealing scaling properties of VILA-HD, including scaling up resolution for free and scaling up test-time compute for better performance. Compared to the state of the art, VILA-HD outperforms previous MLLMs such as NVILA and Qwen2-VL across multiple benchmarks and achieves better efficiency than latest token pruning approaches. Finally, we find current benchmarks do not require 4K-resolution perception, which motivates us to propose 4KPro, a new benchmark of image QA at 4K resolution, on which VILA-HD outperforms all previous MLLMs, including a 14.5% improvement over GPT-4o, and a 3.2% improvement and 2.96× speedup over Qwen2-VL.
  • NVILA: Efficient Frontier Visual Language Models
    Zhijian Liu, Ligeng Zhu, Baifeng Shi, Zhuoyang Zhang, Yuming Lou, Shang Yang, Haocheng Xi, Shiyi Cao, Yuxian Gu, Dacheng Li, Xiuyu Li, Haotian Tang, Yunhao Fang, Yukang Chen, Cheng-Yu Hsieh, De-An Huang, An-Chieh Cheng, Jinyi Hu, Sifei Liu, Ranjay Krishna, Pavlo Molchanov, Jan Kautz, Hongxu Yin, Song Han, Yao Lu
    Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2025
    Visual language models (VLMs) have made significant advances in accuracy in recent years. However, their efficiency has received much less attention. This paper introduces NVILA, a family of open VLMs designed to optimize both efficiency and accuracy. Building on top of VILA, we improve its model architecture by first scaling up the spatial and temporal resolutions, and then compressing visual tokens. This "scale-then-compress" approach enables NVILA to efficiently process high-resolution images and long videos. We also conduct a systematic investigation to enhance the efficiency of NVILA throughout its entire lifecycle, from training to deployment. NVILA matches or surpasses the accuracy of many leading open and proprietary VLMs across a wide range of image and video benchmarks. At the same time, it reduces training costs by 1.9-5.1×, prefilling latency by 1.6-2.2×, and decoding latency by 1.2-2.8×.
  • BlobGEN-3D: Compositional 3D-Consistent Freeview Image Generation with 3D Blobs
    Chao Liu, Weili Nie, Sifei Liu, Abhishek Badki, Hang Su, Morteza Mardani, Benjamin Eckart, Arash Vahdat
    Proceedings of SIGGRAPH Asia 2024 Conference Papers (SA 2024), 2024
  • CosAE: Learnable Fourier Series for Image Restoration
    Advances in Neural Information Processing Systems, 2024
  • Physics-based Indirect Illumination for Inverse Rendering
    Youming Deng, Xueting Li, Sifei Liu, Ming-Hsuan Yang
    Proceedings of the 2024 International Conference on 3D Vision (3DV 2024), 2024
  • RGBD Objects in the Wild: Scaling Real-World 3D Object Learning from RGB-D Videos
    Hongchi Xia, Yang Fu, Sifei Liu, Xiaolong Wang
    Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2024
  • TUVF: Learning Generalizable Texture UV Radiance Fields
    12th International Conference on Learning Representations (ICLR 2024), 2024
  • A Unified Approach for Text- and Image-Guided 4D Scene Generation
    Yufeng Zheng, Xueting Li, Koki Nagano, Sifei Liu, Otmar Hilliges, Shalini De Mello
    Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2024
  • Compositional Text-to-Image Generation with Dense Blob Representations
    Proceedings of Machine Learning Research, 2024
  • HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data
    Mengqi Zhang, Yang Fu, Zheng Ding, Sifei Liu, Zhuowen Tu, Xiaolong Wang
    Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2024
  • COLMAP-Free 3D Gaussian Splatting
    Yang Fu, Xiaolong Wang, Sifei Liu, Amey Kulkarni, Jan Kautz, Alexei A. Efros
    Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2024
  • AGG: Amortized Generative 3D Gaussians for Single Image to 3D
    Transactions on Machine Learning Research, 2024
  • 3D Reconstruction with Generalizable Neural Fields Using Scene Priors
    12th International Conference on Learning Representations (ICLR 2024), 2024
  • RegionGPT: Towards Region Understanding Vision Language Model
    Qiushan Guo, Shalini De Mello, Hongxu Yin, Wonmin Byeon, Ka Chun Cheung, Yizhou Yu, Ping Luo, Sifei Liu
    Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2024
  • SpatialRGPT: Grounded Spatial Reasoning in Vision-Language Models
    Advances in Neural Information Processing Systems, 2024
  • Self-Supervised Super-Plane for Neural 3D Reconstruction
    Botao Ye, Sifei Liu, Xueting Li, Ming-Hsuan Yang
    Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2023
  • Generalizable One-shot 3D Neural Head Avatar
    Advances in Neural Information Processing Systems, 2023
  • Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models
    Jiarui Xu, Sifei Liu, Arash Vahdat, Wonmin Byeon, Xiaolong Wang, Shalini De Mello
    Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2023
  • Zero-shot Pose Transfer for Unrigged Stylized 3D Characters
    Jiashun Wang, Xueting Li, Sifei Liu, Shalini De Mello, Orazio Gallo, Xiaolong Wang, Jan Kautz
    Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2023
  • Affordance Diffusion: Synthesizing Hand-Object Interactions
    Yufei Ye, Xueting Li, Abhinav Gupta, Shalini De Mello, Stan Birchfield, Jiaming Song, Shubham Tulsiani, Sifei Liu
    Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2023
  • Deblurring Dynamic Scenes via Spatially Varying Recurrent Neural Networks
    Wenqi Ren, Jiawei Zhang, Jinshan Pan, Sifei Liu, Jimmy S. J. Ren, Junping Du, Xiaochun Cao, Ming-Hsuan Yang
    IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022
  • Correction to: Learning Contrastive Representation for Semantic Correspondence (International Journal of Computer Vision, (2022), 130, 5, (1293-1309), 10.1007/s11263-022-01602-y)
    Taihong Xiao, Sifei Liu, Shalini De Mello, Zhiding Yu, Jan Kautz, Ming-Hsuan Yang
    International Journal of Computer Vision, 2022
  • Learning Contrastive Representation for Semantic Correspondence
    Taihong Xiao, Sifei Liu, Shalini De Mello, Zhiding Yu, Jan Kautz, Ming-Hsuan Yang
    International Journal of Computer Vision, 2022
  • Autoregressive 3D Shape Generation via Canonical Mapping
    An-Chieh Cheng, Xueting Li, Sifei Liu, Min Sun, Ming-Hsuan Yang
    Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 2022
  • Scraping Textures from Natural Images for Synthesis and Editing
    Xueting Li, Xiaolong Wang, Ming-Hsuan Yang, Alexei A. Efros, Sifei Liu
    Lecture Notes in Computer Science, 2022
  • Learning Continuous Environment Fields via Implicit Functions
    10th International Conference on Learning Representations (ICLR 2022), 2022
  • CoordGAN: Self-Supervised Dense Correspondences Emerge from GANs
    Jiteng Mu, Shalini De Mello, Zhiding Yu, Nuno Vasconcelos, Xiaolong Wang, Jan Kautz, Sifei Liu
    Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2022
  • GroupViT: Semantic Segmentation Emerges from Text Supervision
    Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang
    Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2022
  • Learning Continuous Image Representation with Local Implicit Image Function
    Yinbo Chen, Sifei Liu, Xiaolong Wang
    Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2021
  • Video Autoencoder: self-supervised disentanglement of static 3D structure and motion
    Zihang Lai, Sifei Liu, Alexei A. Efros, Xiaolong Wang
    Proceedings of the IEEE International Conference on Computer Vision, 2021
  • Video Matting via Consistency-Regularized Graph Neural Networks
    Tiantian Wang, Sifei Liu, Yapeng Tian, Kai Li, Ming-Hsuan Yang
    Proceedings of the IEEE International Conference on Computer Vision, 2021
  • Learning 3D Dense Correspondence via Canonical Point Autoencoder
    Advances in Neural Information Processing Systems, 2021
  • Semi-supervised 3D hand-object poses estimation with interactions in time
    Shaowei Liu, Hanwen Jiang, Jiarui Xu, Sifei Liu, Xiaolong Wang
    Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2021
  • Synthesizing Long-Term 3D Human Motion and Interaction in 3D Scenes
    Jiashun Wang, Huazhe Xu, Jingwei Xu, Sifei Liu, X. Wang
    Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2021
  • Regularizing Meta-learning via Gradient Dropout
    Hung-Yu Tseng, Yi-Wen Chen, Yi-Hsuan Tsai, Sifei Liu, Yen-Yu Lin, Ming-Hsuan Yang
    Lecture Notes in Computer Science, 2021
  • Learning to Track Instances without Video Annotations
    Yang Fu, Sifei Liu, Umar Iqbal, Shalini De Mello, Humphrey Shi, J. Kautz
    Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2021
  • Hierarchical Contrastive Motion Learning for Video Action Recognition
    32nd British Machine Vision Conference (BMVC 2021), 2021
  • Contrastive Syn-to-Real Generalization
    9th International Conference on Learning Representations (ICLR 2021), 2021
  • Coupled Segmentation and Edge Learning via Dynamic Graph Propagation
    Advances in Neural Information Processing Systems, 2021
  • Self-Supervised Object Detection via Generative Image Synthesis
    Siva Karthik Mustikovela, Shalini De Mello, Aayush Prakash, Umar Iqbal, Sifei Liu, Thu Nguyen-Phuoc, Carsten Rother, Jan Kautz
    Proceedings of the IEEE International Conference on Computer Vision, 2021
  • Weakly-Supervised Semantic Segmentation by Iterative Affinity Learning
    Xiang Wang, Sifei Liu, Huimin Ma, Ming-Hsuan Yang
    International Journal of Computer Vision, 2020
  • Few-shot viewpoint estimation
    30th British Machine Vision Conference (BMVC 2019), 2020
  • Self-Supervised Viewpoint Learning from Image Collections
    Siva Karthik Mustikovela, V. Jampani, Shalini De Mello, Sifei Liu, Umar Iqbal, C. Rother, J. Kautz
    Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2020
  • Online adaptation for consistent mesh reconstruction in the wild
    Advances in Neural Information Processing Systems, 2020
  • Self-supervised Single-View 3D Reconstruction via Semantic Consistency
    Xueting Li, Sifei Liu, Kihwan Kim, Shalini De Mello, V. Jampani, Ming-Hsuan Yang, J. Kautz
    Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 2020
  • Learning propagation for arbitrarily-structured data
    Sifei Liu, Xueting Li, V. Jampani, Shalini De Mello, J. Kautz
    Proceedings of the IEEE International Conference on Computer Vision, 2019
  • Low-Light Image Enhancement via a Deep Hybrid Network
    Wenqi Ren, Sifei Liu, Lin Ma, Qianqian Xu, Xiangyu Xu, Xiaochun Cao, Junping Du, Ming-Hsuan Yang
    IEEE Transactions on Image Processing, 2019
  • Learning linear transformations for fast image and video style transfer
    Xueting Li, Sifei Liu, J. Kautz, Ming-Hsuan Yang
    Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2019
  • Putting humans in a scene: Learning affordance in 3D indoor environments
    Xueting Li, Sifei Liu, Kihwan Kim, Xiaolong Wang, Ming-Hsuan Yang, J. Kautz
    Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2019
  • SCOPS: Self-supervised co-part segmentation
    W. Hung, V. Jampani, Sifei Liu, Pavlo Molchanov, Ming-Hsuan Yang, J. Kautz
    Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2019
  • Joint-task self-supervised learning for temporal correspondence
    Advances in Neural Information Processing Systems, 2019
  • Learning Dual Convolutional Neural Networks for Low-Level Vision
    Jinshan Pan, Sifei Liu, Deqing Sun, Jiawei Zhang, Yang Liu, Jimmy S. J. Ren, Zechao Li, Jinhui Tang, Huchuan Lu, Yu-Wing Tai, Ming-Hsuan Yang
    Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2018
  • Hallucinating Compressed Face Images
    Chih-Yuan Yang, Sifei Liu, Ming-Hsuan Yang
    International Journal of Computer Vision, 2018
  • Learning video-story composition via recurrent neural network
    Guangyu Zhong, Yi-Hsuan Tsai, Sifei Liu, Zhixun Su, Ming-Hsuan Yang
    Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV 2018), 2018
  • Context-aware synthesis and placement of object instances
    Advances in Neural Information Processing Systems, 2018
  • Rendering portraitures from monocular camera and beyond
    Xiangyu Xu, Deqing Sun, Sifei Liu, Wenqi Ren, Yujin Zhang, Ming-Hsuan Yang, Jian Sun
    Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 2018
  • Switchable temporal propagation network
    Sifei Liu, Guangyu Zhong, Shalini De Mello, Jinwei Gu, Ming-Hsuan Yang, J. Kautz
    Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 2018
  • Unsupervised Domain Adaptation for Face Recognition in Unlabeled Videos
    Kihyuk Sohn, Sifei Liu, Guangyu Zhong, Xiang Yu, Ming-Hsuan Yang, Manmohan Chandraker
    Proceedings of the IEEE International Conference on Computer Vision, 2017
  • Generative face completion
    Yijun Li, Sifei Liu, Jimei Yang, Ming-Hsuan Yang
    Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), 2017
  • Face parsing via recurrent propagation
    Sifei Liu, Jianping Shi, Liang Ji, Ming-Hsuan Yang
    British Machine Vision Conference (BMVC 2017), 2017
  • Learning affinity via spatial propagation networks
    Advances in Neural Information Processing Systems, 2017
  • Learning recursive filters for low-level vision via a hybrid neural network
    Sifei Liu, Jinshan Pan, Ming-Hsuan Yang
    Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 2016
  • Deep cascaded Bi-network for face hallucination
    Shizhan Zhu, Sifei Liu, Chen Change Loy, Xiaoou Tang
    Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 2016
  • Multi-objective convolutional learning for face labeling
    Sifei Liu, Jimei Yang, Chang Huang, Ming-Hsuan Yang
    Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2015
  • Compressed face hallucination
    Sifei Liu, Ming-Hsuan Yang
    2014 IEEE International Conference on Image Processing (ICIP 2014), 2014
  • Structured face hallucination
    Chih-Yuan Yang, Sifei Liu, Ming-Hsuan Yang
    Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2013
  • Heterogeneous face image matching using multi-scale features
    Sifei Liu, Dong Yi, Zhen Lei, Stan Z. Li
    Proceedings of the 2012 5th IAPR International Conference on Biometrics (ICB 2012), 2012
  • Discriminant analysis with Gabor phase for robust face recognition
    Jianfei Zhu, Dong Cao, Sifei Liu, Zhen Lei, Stan Z. Li
    Proceedings of the 2012 5th IAPR International Conference on Biometrics (ICB 2012), 2012
  • A face antispoofing database with diverse attacks
    Zhiwei Zhang, Junjie Yan, Sifei Liu, Zhen Lei, Dong Yi, Stan Z. Li
    Proceedings of the 2012 5th IAPR International Conference on Biometrics (ICB 2012), 2012
  • Face alignment under partial occlusion in near infrared images
    Sifei Liu, Dong Yi, Bin Li, Stan Z. Li
    2010 Chinese Conference on Pattern Recognition (CCPR 2010) Proceedings, 2010
  • Novel method for fire smoke recognition based on Gabor wavelet
    Yi Qi Yi Biao Xue Bao (Chinese Journal of Scientific Instrument), 2010

RECENT SCHOLAR PUBLICATIONS

  • Context-aware synthesis and placement of object instances
    D Lee, S Liu, J Gu, MY Liu, J Kautz
    US Patent App. 19/433,543, 2026
  • Scaling RL to long videos
    Y Chen, W Huang, B Shi, Q Hu, H Ye, L Zhu, Z Liu, P Molchanov, J Kautz, ...
    Advances in Neural Information Processing Systems 38, 172842-172870, 2026
    Citations: 72
  • Diffusion-based open-vocabulary segmentation
    J Xu, S De Mello, S Liu, A Vahdat, W Byeon
    US Patent 12,586,199, 2026
    Citations: 8
  • Compositional 3D-consistent freeview image generation with 3D blobs
    C Liu, W Nie, S Liu, AH Badki, H Su, M Mardani, BD Eckart, A Vahdat
    US Patent App. 19/227,222, 2026
  • Techniques for fine-tuning a machine learning model to reconstruct a three-dimensional scene
    Y Fu, S Liu, J Kautz, X Li, S De Mello, A Kulkarni, M Naphade
    US Patent 12,548,234, 2026
    Citations: 2
  • Techniques for training a machine learning model to reconstruct different three-dimensional scenes
    Y Fu, S Liu, J Kautz, X Li, S De Mello, A Kulkarni, M Naphade
    US Patent 12,548,258, 2026
  • Learnable Fourier series for image restoration
    S Liu, S De Mello, J Kautz
    US Patent App. 18/975,124, 2026
  • Training and inferencing using a neural network to predict orientations of objects in images
    SK Mustikovela, V Jampani, S De Mello, S Liu, U Iqbal, J Kautz
    US Patent App. 19/094,621, 2025
  • Context-aware synthesis and placement of object instances
    D Lee, S Liu, J Gu, MY Liu, J Kautz
    US Patent 12,462,453, 2025
    Citations: 1
  • Segmentation using an unsupervised neural network training technique
    V Jampani, WC Hung, S Liu, P Molchanov, J Kautz
    US Patent 12,450,748, 2025
  • Token-Efficient VLM: High-Resolution Image Understanding Via Dynamic Region Proposal
    Y Jiang, J Gu, T Xue, KC Cheung, P Molchanov, H Yin, S Liu
    2025 IEEE/CVF International Conference on Computer Vision (ICCV), 24147-24158, 2025
    Citations: 5
  • OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM
    H Ye, CHH Yang, A Goel, W Huang, L Zhu, Y Su, S Lin, AC Cheng, Z Wan, ...
    arXiv preprint arXiv:2510.15870, 2025
    Citations: 10
  • QeRL: Beyond Efficiency--Quantization-enhanced Reinforcement Learning for LLMs
    W Huang, Y Ge, S Yang, Y Xiao, H Mao, Y Lin, H Ye, S Liu, KC Cheung, ...
    arXiv preprint arXiv:2510.11696, 2025
    Citations: 7
  • Compositional text-to-image generation with dense blob representations
    W Nie, S Liu, MM Korani, C Liu, BD Eckart, A Vahdat
    US Patent App. 18/889,975, 2025
  • 3D aware region prompted vision language model
    AC Cheng, Y Fu, Y Chen, Z Liu, X Li, S Radhakrishnan, S Han, Y Lu, ...
    arXiv preprint arXiv:2509.13317, 2025
    Citations: 19
  • Region-aware vision language processor
    Q Guo, S De Mello, H Yin, W Byeon, KC Cheung, SCW See, J Kautz, ...
    US Patent App. 19/065,367, 2025
  • Machine learning framework applied in a semi-supervised setting to perform instance tracking in a sequence of image frames
    Y Fu, S Liu, U Iqbal, S De Mello, J Kautz
    US Patent 12,400,341, 2025
    Citations: 1
  • SSE: Multimodal semantic data selection and enrichment for industrial-scale data assimilation
    M Shen, N Chang, S Liu, JM Alvarez
    Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and …, 2025
    Citations: 4
  • EgoVLA: Learning vision-language-action models from egocentric human videos
    R Yang, Q Yu, Y Wu, R Yan, B Li, AC Cheng, X Zou, Y Fang, X Cheng, ...
    arXiv preprint arXiv:2507.12440, 2025
    Citations: 72
  • View synthesis using camera poses learned from a video
    Y Fu, S Liu, A Kulkarni, J Kautz
    US Patent App. 18/963,075, 2025
    Citations: 1

MOST CITED SCHOLAR PUBLICATIONS

  • Learning continuous image representation with local implicit image function
    Y Chen, S Liu, X Wang
    Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2021
    Citations: 1182
  • A face antispoofing database with diverse attacks
    Z Zhang, J Yan, S Liu, Z Lei, D Yi, SZ Li
    2012 5th IAPR international conference on Biometrics (ICB), 26-31, 2012
    Citations: 1120
  • GroupViT: Semantic segmentation emerges from text supervision
    J Xu, S De Mello, S Liu, W Byeon, T Breuel, J Kautz, X Wang
    Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2022
    Citations: 868
  • Generative face completion
    Y Li, S Liu, J Yang, MH Yang
    Proceedings of the IEEE conference on computer vision and pattern …, 2017
    Citations: 849
  • Open-vocabulary panoptic segmentation with text-to-image diffusion models
    J Xu, S Liu, A Vahdat, W Byeon, X Wang, S De Mello
    Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2023
    Citations: 752
  • Low-light image enhancement via a deep hybrid network
    W Ren, S Liu, L Ma, Q Xu, X Xu, X Cao, J Du, MH Yang
    IEEE Transactions on Image Processing 28 (9), 4364-4375, 2019
    Citations: 592
  • SpatialRGPT: Grounded spatial reasoning in vision-language models
    AC Cheng, H Yin, Y Fu, Q Guo, R Yang, J Kautz, X Wang, S Liu
    Advances in Neural Information Processing Systems 37, 135062-135093, 2024
    Citations: 431
  • Learning affinity via spatial propagation networks
    S Liu, S De Mello, J Gu, G Zhong, MH Yang, J Kautz
    Advances in Neural Information Processing Systems 30, 2017
    Citations: 372
  • Learning linear transformations for fast image and video style transfer
    X Li, S Liu, J Kautz, MH Yang
    Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2019
    Citations: 338
  • COLMAP-Free 3D Gaussian Splatting
    Y Fu, S Liu, A Kulkarni, J Kautz, AA Efros, X Wang
    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024
    Citations: 307
  • Self-supervised single-view 3D reconstruction via semantic consistency
    X Li, S Liu, K Kim, S De Mello, V Jampani, MH Yang, J Kautz
    European Conference on Computer Vision, 677-693, 2020
    Citations: 307
  • Deep cascaded bi-network for face hallucination
    S Zhu, S Liu, CC Loy, X Tang
    European conference on computer vision, 614-630, 2016
    Citations: 297
  • Learning dual convolutional neural networks for low-level vision
    J Pan, S Liu, D Sun, J Zhang, Y Liu, J Ren, Z Li, J Tang, H Lu, YW Tai, ...
    Proceedings of the IEEE conference on computer vision and pattern …, 2018
    Citations: 266
  • Semi-supervised 3D hand-object poses estimation with interactions in time
    S Liu, H Jiang, J Xu, S Liu, X Wang
    Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2021
    Citations: 256
  • Learning recursive filters for low-level vision via a hybrid neural network
    S Liu, J Pan, MH Yang
    European conference on computer vision, 560-576, 2016
    Citations: 211
  • Joint-task self-supervised learning for temporal correspondence
    X Li, S Liu, S De Mello, X Wang, J Kautz, MH Yang
    Advances in Neural Information Processing Systems 32, 2019
    Citations: 209
  • SCOPS: Self-supervised co-part segmentation
    WC Hung, V Jampani, S Liu, P Molchanov, MH Yang, J Kautz
    Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2019
    Citations: 204
  • No pose, no problem: Surprisingly simple 3D Gaussian splats from sparse unposed images
    B Ye, S Liu, H Xu, X Li, M Pollefeys, MH Yang, S Peng
    International Conference on Learning Representations 2025, 54009-54033, 2025
    Citations: 194
  • Synthesizing long-term 3D human motion and interaction in 3D scenes
    J Wang, H Xu, J Xu, S Liu, X Wang
    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2021
    Citations: 192
  • NVILA: Efficient frontier visual language models
    Z Liu, L Zhu, B Shi, Z Zhang, Y Lou, S Yang, H Xi, S Cao, Y Gu, D Li, X Li, ...
    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2025
    Citations: 190