Pengfei Song 宋鹏飞

📍 Nanjing, China | 南京

I am a Master's student at Southeast University, majoring in Intelligent Science and Technology. My research focuses on computer vision, multimodal learning, and large language models, particularly on test-time adaptation, semantic segmentation, and vision-language models.

I have interned at Microsoft (AI Agent & LLM) and Huawei (AI Infra & Multimodal LLM Training), gaining hands-on experience in model training, deployment, and AI coding workflows.


Education

Southeast University — M.S. in Intelligent Science and Technology | Sept 2024 – Jun 2027 (Expected)
School of Computer Science and Engineering. GPA: 87/100 (Top 8%)

Southeast University — B.S. in Applied Mathematics | Sept 2020 – Jun 2024
School of Mathematics. GPA: 88/100 (Top 15%)

UC Berkeley — Extension Program | Aug 2022 – Dec 2022
Fully funded by a China Scholarship Council (CSC) government-sponsored study-abroad scholarship (Top 5%)

Research Interests

My research interests center on efficient and robust visual understanding and on large-model systems, particularly:

Test-Time Adaptation (TTA) — Training-free adaptation of vision-language models with semantic prior knowledge
Weakly Supervised Semantic Segmentation (WSSS) — Leveraging diffusion models to enhance CLIP dense representations
Multimodal Learning — Cross-modal semantic enhancement and vision-language model alignment
LLM Agent Systems — Multi-step task planning, tool integration, and intelligent workflow automation
Model Compression & Inference Optimization — RQ-VAE quantization, BF16 mixed precision, distillation, vLLM deployment

Experience

Huawei — AI Infra / Multimodal LLM Training Intern | Nov 2025 – Apr 2026
Worked on adapting a 13B multimodal LLM for on-device AI assistants and domain-specific Q&A. Built end-to-end SFT pipelines with PyTorch, Transformers, and LoRA/QLoRA, and validated inference on Ascend NPUs. Contributed to model compression (RQ-VAE quantization, BF16 mixed precision, distillation) and to vLLM deployment, including custom kernel development in OpenAI Triton.

Microsoft — AI Large Model Intern | Sept 2025 – Nov 2025
Designed and deployed an intelligent agent for automated ticket processing, built on real enterprise workflow data from MS Teams and running on NVIDIA H100 GPUs. Implemented multi-step task orchestration, tool integration, state management, and Azure deployment. Became proficient in Claude Code, GitHub Copilot, and AI-assisted coding workflows.

Publications

DiCLIP: Diffusion Model Enhances CLIP's Dense Knowledge for Weakly Supervised Semantic Segmentation (Accepted)

Zhiwei Yang, Pengfei Song, Yucong Meng, Kexue Fu, Shuo Wang, Zhijian Song

IEEE Transactions on Image Processing (TIP), 2025

ASC for Training-Free Adaptation of VLMs (Under Review)

Pengfei Song, et al.

CCF-A Conference, 2025

Human Semantic Segmentation using Millimeter Wave Radar Point Clouds (Accepted)

Pengfei Song, et al.

IEEE CSCWD, 2023

Research Projects

Cross-Modal Semantic Enhanced Test-Time Adaptation Framework | Nov 2024 – Apr 2025
Proposed CSE, which combines semantic prior knowledge with a caching mechanism to achieve efficient test-time adaptation without backpropagation, improving computational efficiency by over 10× compared with existing training-based methods. Designed a cross-modal semantic enhancement module that uses dictionary-derived semantic priors to compensate for the limitations of visual features. Proposed top-k label exploration and outlier rejection strategies, achieving a 2.93% improvement over the baseline on ImageNet and its variants. Served as first author; one paper is under review at a CCF-A conference.

Human Semantic Segmentation from mmWave Radar Sparse Point Clouds | Jun 2023 – Aug 2024
Compared with cameras and LiDAR, mmWave radar better preserves privacy. Built a semantic segmentation framework with spatio-temporal feature extraction modules for sparse radar point clouds. Served as primary contributor; one first-author paper was accepted at IEEE CSCWD 2023. Further proposed DiCLIP, a WSSS method that uses diffusion models to enhance CLIP's dense representations; the paper was accepted by IEEE TIP in 2025.

Honors & Awards

National Scholarship — 2025
14th Huawei Cup Graduate Mathematical Modeling Competition — Second Prize (Top 5%), 2024
Mathematical Contest in Modeling (MCM/ICM) — Meritorious Winner (First Prize), 2023
China Scholarship Council (CSC) Scholarship (Top 5%), 2022
Zhishan Excellence Scholarship (Top 10%, two consecutive terms), 2022

Skills

Deep Learning: PyTorch, Transformers, Hugging Face, LoRA/QLoRA, Distributed Training
LLM & Agent: Claude Code, GitHub Copilot, vLLM, RQ-VAE, Model Distillation
Hardware & Kernels: NVIDIA H100, Ascend NPU, OpenAI Triton kernel development
Programming: Python, C/C++, Shell, Git, Azure Deployment
Languages: English (CET-4: 603, CET-6: 616, IELTS: 6.5)