Tokyo, Japan
2026-05-25
Sony
Prestige
East Asia
Internship - Researcher / Natural Language Processing (NLP) for Multimodal Machine Learning
Role Description
**Position Summary**
Technologies like generative AI have the potential to transform the lifestyles of consumers and the workflows of professional creators. Sony R\&D is developing large-scale generative AI technologies for content generation and restoration. This effort, called Sony generative AI, falls into three categories: diffusion-based models, stochastic vector quantization techniques, and vision-language pre-training. Sony generative AI is expected to become an integral part of the music, film, and gaming industries in the years ahead. We at Sony R\&D want to make the most of our unique opportunity to work directly with world-leading entertainment groups within those industries. Demonstrations of media generation and restoration are available at https://sony.github.io/creativeai.
Our team is building a wide range of technologies in natural language processing, deep generative AI, and vision-language pre-training. You will conduct advanced research in natural language processing and deep learning (e.g., vision-language pre-training, multimodal LLMs).
**Responsibilities**
You will conduct fundamental research in natural language processing, in areas such as vision-language pre-training, multimodal learning, multimodal LLMs, music/video understanding, agents, reasoning, controllable generative modeling, deep generative models for discrete data, image/audio captioning, text-to-image/audio generation, commonsense knowledge graphs, and large-scale data development.
You will be responsible for a wide range of activities, including paper submissions to top conferences (e.g., ACL, EMNLP, NeurIPS, ICLR, CVPR), collaborative research with universities and/or business groups, and deployment of developed technologies within Sony and/or in third-party products together with product teams.
You will also contribute to improving the efficiency of content creation in Sony's studios for music, movies, and game services by delivering AI-assisted tools developed by R\&D.
**Required qualifications**
All of the following criteria are required.
■ Master's degree in natural language processing, artificial intelligence, machine learning, or closely related areas OR equivalent practical experience.
■ 3 years of experience with Python, C/C\+\+, and Linux/Unix.
■ 2 years of experience in machine learning fields and NLP, using common frameworks such as PyTorch and TensorFlow.
■ Research ability, as demonstrated by a track record of conference papers, open-source software, or other scientific activities.
■ Ability to speak and write in English fluently and idiomatically.
**Preferred qualifications**
Current Ph.D. students in natural language processing, artificial intelligence, or machine learning are preferred.
**Product, Service**
Content creation support for movies/music/games, robots (Aibo), etc.
**Development Environment**
■ OS: Windows and Linux
■ Language: Python, C/C\+\+, etc.
■ Hardware: PC, Server, Cloud Computing
**Application Requirements**
Essay: Not Required
Coding test: Not Required
**Required Skills:**
Machine Learning (ML), Natural Language Processing (NLP)
**Optional Skills:**