My research focuses on computer vision and deep learning, with a particular interest in AI-powered content creation. My expertise includes image, 3D, and video generation, diffusion models, and neural rendering.
We analyze RoPE interpolation failures in mixed-resolution diffusion transformers (DiTs) and develop a phase-coherent attention mechanism for efficient, high-fidelity generation.
We propose a novel image-to-3D framework using multi-view diffusion, which scales well to high resolution by generating explicit surface geometry and texture.