Role Description
**About Artha**

Artha Group is a performance-first investment house managing ₹2,300 crores across domestic and international investment vehicles, including Category I & II AIFs, LLPs, and Private Limited companies. Our portfolio spans active investments in 130+ startups, 32+ successful exits, and 10+ renewable energy projects. We operate at the convergence of capital precision and operational depth.
Our Technology Division is building the Unified Intelligence Platform (UIP) — an AI-first portfolio intelligence system powered by multi-agent orchestration, knowledge graphs, and large language models.
**Location:** Mumbai / Onsite
**Employment Type:** Internship (6 months)
**Reporting To:** CTO, Artha Group
**Team:** Technology Division – AI & Data Science
**Experience Level:** Final-year students or recent graduates (0–1 year of experience)
**Role Overview**

This is a hands-on data science internship focused on fine-tuning language models, building financial data pipelines, and supporting AI workflows for a production-grade intelligence platform. You will work directly with the CTO and the AI team, gaining exposure to real VC data, deal intelligence, and advanced ML systems.
This is not a research-only role. You will be expected to ship working components, handle messy real-world data, and contribute to production workflows.
**You will**
* Fine-tune small language models (SLMs) on proprietary VC and portfolio datasets
* Build and clean structured/unstructured financial data pipelines
* Develop embeddings for semantic search on deal memos and financials
* Support multi-agent AI workflows with ML components
* Design evaluation frameworks for LLM outputs in financial contexts
* Perform exploratory data analysis (EDA) on portfolio metrics and market trends
* Enrich knowledge graphs with ML-derived signals
**Key Responsibilities**
* Implement LoRA/QLoRA fine-tuning workflows on HuggingFace
* Work with SLMs (Phi-3, Mistral, Gemma, LLaMA) and understand tokenization and context windows
* Handle financial datasets: P&L, balance sheets, MIS reports, time-series metrics
* Build and maintain Python-based ML pipelines (NumPy, Pandas, Scikit-learn, PyTorch/TensorFlow)
* Integrate vector databases (ChromaDB, Qdrant) for semantic search
* Contribute to evaluation and monitoring of model performance
**What Success Looks Like in 6 Months**
* Delivered at least one fine-tuned model integrated into UIP workflows
* Built robust data pipelines for financial datasets
* Demonstrated ability to work independently on assigned ML tasks
* Produced clear documentation and reproducible experiments
* Received positive feedback from CTO and AI team on ownership and execution
**Candidate Profile**
* Education: Final-year student or recent graduate in CS, ECE, Statistics, Data Science, or an MBA with a strong quantitative background
* Experience: 0–1 year; prior projects in NLP, ML, or financial data preferred
* Mindset: Ownership-driven, curious, comfortable with ambiguity, strong execution discipline
* Portfolio: GitHub repos, Kaggle notebooks, fine-tuning experiments, or research papers are a strong plus
**Required Skills**
* Strong foundations in statistics, probability, and ML theory
* Hands-on experience with fine-tuning language models (LoRA, PEFT)
* Proficiency in Python and the ML stack (NumPy, Pandas, Scikit-learn, PyTorch/TensorFlow)
* Familiarity with vector databases and semantic search
* Understanding of transformer architectures and attention mechanisms
**Good to Have**
* Exposure to VC/FinTech datasets
* Experience with LangChain/LangGraph, Neo4j, or MLOps tools
* Knowledge of RAG pipelines and LLM evaluation frameworks
**Compensation Structure**
* Stipend: ₹25,000 per month
* Duration: 6 months
* Start Date: Immediate
* PPO (pre-placement offer): High performers will be considered for a full-time role
**What This Role Is NOT**
* This is not a pure research internship; you will work on production-grade systems
* This is not a remote-only role; full-time presence in Mumbai is expected
* This is not a short-term project; a full 6-month commitment is required