Tokyo, Japan
2026-05-25
Sony
Prestige
East Asia
Internship - Audio-Visual AI Research Scientist
Role Description
**Technology Field**
Computer Vision
Speech/Audio Signal Processing
**Position Summary**
We are seeking Research Scientist Interns to join our fundamental and applied research teams at Sony in Tokyo.
Our aim is to rapidly advance the process of cinematic content creation. To achieve this, we work together with Sony Pictures Entertainment to develop AI technologies that restore and enhance movie content.
With us, you will research and develop innovative computer vision and machine learning technologies for cinematic content creation. You will also have many opportunities to publish your findings and collaborate with a variety of academic institutions worldwide.
See more here: https://www.sony.com/en/SonyInfo/sony_ai/
**Responsibilities**
■ Research and develop novel computer vision technologies in areas including generative methods, audio-visual scene understanding, audio-visual sound separation/localization, and beyond.
■ Collaborate with other teams to turn findings from computer vision research into real products.
■ Work with a strong international team of researchers and engineers with various areas of expertise to develop innovative solutions.
■ Collaborate with Sony's various branches, including Sony Pictures Entertainment.
■ Collaborate with academic institutions to drive state-of-the-art research.
■ Contribute to research publications for top-tier conferences and journals.
**Required qualifications**
■ Experience publishing research about machine learning/computer vision at conferences and/or in journals (e.g. CVPR/ICCV/ECCV/NeurIPS/ICLR/ICML/IJCV/PAMI).
■ Experience developing ML/deep learning models for computer vision tasks.
■ Fluency in Python and deep learning frameworks.
**Preferred qualifications**
■ Ph.D. degree (completed or in progress) in computer science, machine learning, or electrical engineering, OR equivalent practical experience.
■ Experience developing ML/deep learning models for audio-visual tasks or other multi-modal tasks.
■ Experience developing ML/deep learning-based generative models.
■ Professional proficiency in English.
**Product, Service**
Movie production for Sony Pictures Entertainment.
**Development Environment**
OS: Windows/Linux
**Application Requirements**
Essay: Required
Coding test: Not Required
**Required Skills:**
Computer vision, acoustic signal processing, speech processing