Activities planned for 2025
These are mainly the open-source efforts I will be part of. Most are AI-related. The core principles are defined by the AI Punk’s manifesto.
LLM
Telugu dataset
Research on low-resource datasets
Research on mobilizing crowdsourced dataset collection
Creating a foundational Telugu model
Chunking
Quantizing/shortening LLMs
SLM with synthetic data
Speech
Telugu dataset
Research on low-resource datasets for ASR
Transfer learning from Telugu to all other languages
Mobile models
Foundational models in speech without using Transformers
TTS dataset
10,000 TTS voices
Voice cloning
Vision
New image embeddings
Image RAG
Image generation using low resources
Image-to-text
Community
Dataset and model licenses
Crowdsourcing templates for the Global South
Upskilling template
Upskilling 1 lakh (100,000) engineering students
Intern repository with ranking
Training tools in English and Telugu
Data collection app and tools
Cluster compute for inference
If anyone wants to take part in any of these activities, please reach out and I will get you involved.