Sony
Prestige East Asia

Internship - Researcher / Natural Language Processing (NLP) for Multimodal Machine Learning

Tokyo, Japan
2026-05-25

Role Description

**Position Summary**

Technologies like generative AI have the potential to transform the lifestyles of consumers and the workflows of professional creators. Sony R&D is developing large-scale generative AI technologies for content generation and restoration. This effort, called Sony generative AI, covers three categories: diffusion-based models, stochastic vector quantization techniques, and vision-and-language pre-training. Sony generative AI is expected to become an integral part of the music, film, and gaming industries in the years ahead. We at Sony R&D want to make the most of our unique opportunity to work directly with world-leading entertainment groups within those industries. Demonstrations of media generation and restoration are available at https://sony.github.io/creativeai.

Our team is building a wide range of technologies in natural language processing, deep generative AI, and vision-language pre-training. You will work on advanced research in natural language processing and deep learning (e.g., vision-language pre-training, multimodal LLMs).

**Responsibilities**

Fundamental research in natural language processing, such as vision-language pre-training, multimodal learning, multimodal LLMs, music/video understanding, agents, reasoning, controllable generative modeling, deep generative models for discrete data, image/audio captioning, text-to-image/audio, commonsense knowledge graphs, large-scale data development, etc.

You will be responsible for a wide range of activities, including paper submissions to top conferences (e.g., ACL, EMNLP, NeurIPS, ICLR, CVPR), collaborative research with universities and/or business groups, and deployment of developed technologies within Sony and/or in third-party products together with product teams.

You will also contribute to improving the efficiency of content creation in Sony's studios for music, movies, and game services by delivering AI-assisted tools developed by R&D.
**Required qualifications**

All of the following criteria are required.
■ Master's degree in natural language processing, artificial intelligence, machine learning, or a closely related area, OR equivalent practical experience.
■ 3 years of experience with Python, C/C++, and Linux/Unix.
■ 2 years of experience in machine learning and NLP, using common frameworks such as PyTorch and TensorFlow.
■ Research ability, as demonstrated by a track record of conference papers, open-source software, or other scientific activities.
■ Ability to speak and write English fluently and idiomatically.

**Preferred qualifications**

Current Ph.D. students in natural language processing, artificial intelligence, or machine learning are preferred.

**Product, Service**

Content creation support for movies/music/games, robots (Aibo), etc.

**Development Environment**

■ OS: Windows and Linux
■ Language: Python, C/C++, etc.
■ PC, Server, Cloud Computing

**Application Requirements**

Essay: Not Required
Coding test: Not Required

**Required Skills:** Machine Learning (ML), Natural Language Processing (NLP)

**Optional Skills:**
