Qizhe Zhang

PKU Ph.D. candidate

About Me

I am currently a Ph.D. candidate at the HMI Lab, NELVT, School of Computer Science, Peking University, advised by Prof. Shanghang Zhang. Before that, I graduated from the Turing Class (Honor Class, Artificial Intelligence), CFCS (EECS), PKU. I also obtained a dual degree in Economics from the National School of Development, PKU.

Research Interests

My research interests lie at the intersection of computer vision and machine learning, including visual foundation models, contrastive learning, transfer learning, efficient tuning, continual learning, autonomous driving, and computational neuroscience. The overall goal of my research is to develop a large-scale visual perception system with human-like expression, adaptation, and generalization, equipped with powerful abilities including fundamental perception, cognitive reasoning, and autonomous creativity.

More specifically, my research interests include:

Generalized Visual Foundation Models (VFMs)
Parameter-Efficient Fine-Tuning (PEFT)
Continual Learning for Large-scale Models
Multimodal Large Language Models (MLLMs)
Vehicle-Road Cooperative (V2X) Autonomous Driving
Neural Ordinary Differential Equations (ODEs)

Education

Ph.D. Candidate in Visual Information Processing and Brain-inspired Intelligence
Sep. 2023 -- Jun. 2028 (expected)
Peking University, Beijing, China

Bachelor of Intelligence Science and Technology & Economics (Dual Degree)
Sep. 2019 -- Jun. 2023
Peking University, Beijing, China

High School Education
Sep. 2013 -- Jun. 2019
Middle School affiliated to NPU, Xi'an, China

News

02/2024: Two papers were accepted to CVPR 2024.
01/2024: One paper was accepted to ICRA 2024.
12/2023: One paper was accepted to AAAI 2024.

Experience

Intern at 2050 Lab (Memory Mechanism for LLM)
Sep. 2023 -- Oct. 2023
KUNLUN, Beijing, China

Intern in AGI (Continual Learning for LVM)
Jul. 2023 -- Sep. 2023
BAAI, Beijing, China

Intern in Computer Vision (Autonomous Driving)
Sep. 2022 -- Feb. 2023
OPPO, Beijing, China

Intern at GCV Lab (Multi-Modal Learning)
Oct. 2021 -- Feb. 2022
BIGAI, Beijing, China

Publications

Split & Merge: Unlocking the Potential of Visual Adapters via Sparse Training
Qizhe Zhang, Bocheng Zou, Ruichuan An, Jiaming Liu, Shanghang Zhang†
arXiv 2023 [Paper] [Code]
"We propose Mixture of Sparse Adapters (MoSA), a novel adapter tuning method that fully unleashes the potential of each parameter in the adapter. MoSA achieves significantly better performance than standard adapters without any additional computational or storage overhead."

Gradient-based Parameter Selection for Efficient Fine-Tuning
Zhi Zhang*, Qizhe Zhang*, Zijun Gao, Renrui Zhang, Ekaterina Shutova, Shiji Zhou, Shanghang Zhang†
CVPR 2024 [Paper] [Code]
"We propose a novel gradient-based parameter selection (GPS) method for efficient fine-tuning. GPS introduces no additional storage or computational cost during either training or inference. Moreover, it is model-agnostic and task-adaptive, achieving outstanding performance."

Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation
Jiaming Liu*, Ran Xu*, Senqiao Yang*, Renrui Zhang†, Qizhe Zhang, Zehui Chen, Yandong Guo, Shanghang Zhang‡
CVPR 2024 [Paper] [Code]
"We propose Adaptive Distribution Masked Autoencoders (ADMA), a novel continual self-supervised method. ADMA enhances the extraction of target domain knowledge while mitigating the accumulation of distribution shifts."

Unsupervised Spike Depth Estimation via Cross-modality Cross-domain Knowledge Transfer
Jiaming Liu*, Qizhe Zhang*, Jianing Li, Ming Lu, Tiejun Huang, Shanghang Zhang†
ICRA 2024 [Paper] [Code]
"We propose a novel cross-modality cross-domain (BiCross) framework for unsupervised spike depth estimation. Notably, we are the first to exploit open-source RGB datasets to aid unsupervised learning for spike depth estimation."

Exploring Sparse Visual Prompt for Cross-domain Semantic Segmentation
Senqiao Yang*, Jiarui Wu*, Jiaming Liu*, Xiaoqi Li, Qizhe Zhang, Mingjie Pan, Shanghang Zhang†
AAAI 2024 [Paper] [Code]
"We propose a novel Sparse Visual Domain Prompts (SVDP) approach for dense prediction TTA tasks, which keeps the trainable parameters in the image-level prompt to a minimum and preserves more spatial information of the input."

Plan

Universal Vision-only Foundation Model (ViT / Mamba)
Comparative Analysis of Full Fine-Tuning and PEFT
Hypernetwork-based Adapter for Continual Learning
Large-scale Liquid Time-Constant Networks

Automatic Music Transcription & Voice Conversion
Complex Game AI with Reinforcement Learning (Mahjong)

Contact

theia@pku.edu.cn theia4869@gmail.com
+86 · 18810920885
Theia-4869