Artha Venture Fund
South Asia

Data Scientist Intern

Mumbai, India
2026-03-27

Role Description

**About Artha**

Artha Group is a performance-first investment house managing ₹2,300 crores across domestic and international investment vehicles, including Category I & II AIFs, LLPs, and Private Limited companies. With active investments in 130+ startups, 32+ successful exits, and 10+ renewable energy projects, we operate at the convergence of capital precision and operational depth. Our Technology Division is building the Unified Intelligence Platform (UIP), an AI-first portfolio intelligence system powered by multi-agent orchestration, knowledge graphs, and large language models.

**Location:** Mumbai / Onsite

**Employment Type:** Internship (6 months)

**Reporting To:** CTO, Artha Group

**Team:** Technology Division – AI & Data Science

**Experience Level:** Final-year student or recent graduate (0–1 year)

**Role Overview**

This is a hands-on data science internship focused on fine-tuning language models, building financial data pipelines, and supporting AI workflows for a production-grade intelligence platform. You will work directly with the CTO and the AI team, gaining exposure to real VC data, deal intelligence, and advanced ML systems. This is not a research-only role. You will be expected to ship working components, handle messy real-world data, and contribute to production workflows.

**You will**

* Fine-tune small language models (SLMs) on proprietary VC and portfolio datasets
* Build and clean structured and unstructured financial data pipelines
* Develop embeddings for semantic search on deal memos and financials
* Support multi-agent AI workflows with ML components
* Design evaluation frameworks for LLM outputs in financial contexts
* Perform exploratory data analysis (EDA) on portfolio metrics and market trends
* Enrich knowledge graphs with ML-derived signals

**Key Responsibilities**

* Implement LoRA/QLoRA fine-tuning workflows on HuggingFace
* Work with SLMs (Phi-3, Mistral, Gemma, LLaMA) and understand tokenization and context windows
* Handle financial datasets: P&L, balance sheets, MIS reports, time-series metrics
* Build and maintain Python-based ML pipelines (NumPy, Pandas, Scikit-learn, PyTorch/TensorFlow)
* Integrate vector databases (ChromaDB, Qdrant) for semantic search
* Contribute to evaluation and monitoring of model performance

**What Success Looks Like in 6 Months**

* Delivered at least one fine-tuned model integrated into UIP workflows
* Built robust data pipelines for financial datasets
* Demonstrated ability to work independently on assigned ML tasks
* Produced clear documentation and reproducible experiments
* Received positive feedback from the CTO and AI team on ownership and execution

**Candidate Profile**

* Education: Final-year student or recent graduate in CS, ECE, Statistics, Data Science, or an MBA with a strong quantitative background
* Experience: 0–1 year; prior projects in NLP, ML, or financial data preferred
* Mindset: Ownership-driven, curious, comfortable with ambiguity, strong execution discipline
* Portfolio: GitHub repos, Kaggle notebooks, fine-tuning experiments, or research papers are a strong plus

**Required Skills**

* Strong foundations in statistics, probability, and ML theory
* Hands-on experience with fine-tuning language models (LoRA, PEFT)
* Proficiency in Python and the ML stack (NumPy, Pandas, Scikit-learn, PyTorch/TensorFlow)
* Familiarity with vector databases and semantic search
* Understanding of transformer architectures and attention mechanisms

**Good to Have**

* Exposure to VC/FinTech datasets
* Experience with LangChain/LangGraph, Neo4j, or MLOps tools
* Knowledge of RAG pipelines and LLM evaluation frameworks

**Compensation Structure**

* Stipend: 25,000 per month, with the possibility of converting to a full-time position
* Duration: 6 months
* Start Date: Immediate
* PPO (pre-placement offer): High performers will be considered for a full-time role

**What This Role Is NOT**

* This is not a pure research internship: you will work on production-grade systems
* This is not a remote-only role: full-time presence in Mumbai is expected
* This is not a short-term project: a full 6-month commitment is required
