Article #11 · April 2026 · 8 min read

How I Built Three 9B MoE Models on a $5K-$10K Budget

Training three 9B Mixture-of-Experts models on 100B tokens each for under $10K using Google's TPU Research Cloud program.

Mallesh Madapathi
Founder & CEO, ThinkingDBx

The Challenge

Frontier model APIs are expensive, free models suck at reasoning, and the AI tech stack costs too much in the long run. Small Language Models (SLMs) excel within their own domain, often beating frontier models there.

The Solution: Domain-Specific SLMs

I set out to build three domain-specific SLMs as a long-term foundation for my data platforms. Google's 6th-gen and 4th-gen TPUs, available free for 30 days under the TPU Research Cloud (TRC) program, were the perfect fit.

Data Sources

Setup & Architecture

To maximize TPU utilization, I used:

The MoE architecture activates 2B parameters per token, so each model carries the knowledge capacity of a 9B model while its inference cost stays closer to that of a 2B dense model.
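
To make the active-versus-total distinction concrete, here is a rough back-of-the-envelope sketch in Python. The split between shared and per-expert parameters, the expert count, and the top-k value are hypothetical numbers chosen only so the totals land on 9B and 2B; they are not the actual configuration of these models. The FLOPs estimate uses the common rule of thumb of roughly 6 FLOPs per parameter per token, applied here to active parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoEConfig:
    """Hypothetical parameter split; not the real config of these models."""
    shared_params: float      # weights every token uses (attention, embeddings, router)
    params_per_expert: float  # feed-forward weights of one expert, summed over layers
    num_experts: int          # experts available to the router
    top_k: int                # experts each token is actually routed to

    def total_params(self) -> float:
        # Everything that must be stored in memory.
        return self.shared_params + self.num_experts * self.params_per_expert

    def active_params(self) -> float:
        # Only the weights a single token passes through.
        return self.shared_params + self.top_k * self.params_per_expert


def approx_training_flops(active_params: float, tokens: float) -> float:
    """Rule-of-thumb estimate: about 6 FLOPs per active parameter per token."""
    return 6.0 * active_params * tokens


# Illustrative numbers only: 1B shared + 8 experts of 1B each, routing to 1 expert.
cfg = MoEConfig(shared_params=1e9, params_per_expert=1e9, num_experts=8, top_k=1)

print(f"total params:  {cfg.total_params() / 1e9:.1f}B")
print(f"active params: {cfg.active_params() / 1e9:.1f}B")
print(f"approx training FLOPs for 100B tokens: "
      f"{approx_training_flops(cfg.active_params(), 100e9):.2e}")
```

With these made-up numbers, only 2B of the 9B weights touch any single token, which is why per-token compute tracks a 2B dense model even though memory still has to hold all 9B.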

Training Process

The most painful part of using free TPUs is that Trillium VMs get preempted frequently. To automate recovery from preemptions:

I initially made the mistake of saving every checkpoint, which piled up 40TB of stored checkpoints and sent Google Cloud Storage costs soaring. Later, I learned to keep only the last one or two checkpoints.
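
A keep-the-last-few retention policy can be sketched roughly as below. This is a minimal illustration, not the script used for these runs: the `step_<N>` directory naming, the `checkpoints_to_delete` and `latest_checkpoint` helpers, and the default of keeping two checkpoints are all assumptions. It works on plain names, so the same logic could sit in front of whatever storage client actually deletes the objects.

```python
import re
from typing import List, Optional

# Assumed naming scheme: one directory per checkpoint, e.g. "step_12000".
_STEP_PATTERN = re.compile(r"^step_(\d+)$")


def _step_of(name: str) -> Optional[int]:
    """Extract the training step from a checkpoint name, or None if it is not one."""
    match = _STEP_PATTERN.match(name)
    return int(match.group(1)) if match else None


def sorted_checkpoints(names: List[str]) -> List[str]:
    """Return checkpoint names ordered by numeric step, ignoring unrelated entries."""
    valid = [n for n in names if _step_of(n) is not None]
    return sorted(valid, key=_step_of)


def checkpoints_to_delete(names: List[str], keep_last: int = 2) -> List[str]:
    """Everything except the newest `keep_last` checkpoints."""
    if keep_last < 1:
        raise ValueError("keep_last must be at least 1")
    return sorted_checkpoints(names)[:-keep_last]


def latest_checkpoint(names: List[str]) -> Optional[str]:
    """The checkpoint a restarted job should resume from, if any exists."""
    ordered = sorted_checkpoints(names)
    return ordered[-1] if ordered else None


# Hypothetical listing of a checkpoint directory.
existing = ["step_1000", "step_3000", "step_2000", "tmp", "step_10000"]
stale = checkpoints_to_delete(existing)
resume_from = latest_checkpoint(existing)
print("delete:", stale)
print("resume from:", resume_from)
```

Sorting by the parsed step number rather than by string matters here: a plain string sort would place `step_10000` before `step_2000` and could delete the newest checkpoint.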

Workflow

  1. Setup
  2. Data Download
  3. Tokenize
  4. Train
  5. SFT + GRPO
  6. Deploy
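
Because a preemption can interrupt any of these stages, one option is to drive the workflow from a small runner that records finished stages and skips them on restart. The sketch below is an assumption about how that could look, not the tooling used here: the stage names mirror the list above, while the `run_pipeline` function, the JSON state file, and the placeholder stage functions are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List

# Stage names follow the workflow list; the identifiers themselves are made up.
STAGES: List[str] = ["setup", "data_download", "tokenize", "train", "sft_grpo", "deploy"]


def load_done(state_file: Path) -> List[str]:
    """Read the list of stages that finished in earlier runs."""
    if state_file.exists():
        return json.loads(state_file.read_text())
    return []


def run_pipeline(stage_fns: Dict[str, Callable[[], None]], state_file: Path) -> List[str]:
    """Run stages in order, skipping recorded ones; return the stages executed now."""
    done = load_done(state_file)
    executed: List[str] = []
    for name in STAGES:
        if name in done:
            continue
        # If the VM is interrupted here, the stage stays unrecorded and reruns later.
        stage_fns[name]()
        done.append(name)
        state_file.write_text(json.dumps(done))  # persist progress after each stage
        executed.append(name)
    return executed


# Placeholder stages that only print; real ones would launch the actual jobs.
demo_fns = {name: (lambda n=name: print(f"running {n}")) for name in STAGES}

with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp) / "pipeline_state.json"
    # Pretend an earlier run was preempted after the first two stages.
    state.write_text(json.dumps(["setup", "data_download"]))
    resumed = run_pipeline(demo_fns, state)
    print("executed on resume:", resumed)
```

In practice the state file would need to live somewhere that survives the VM, such as a storage bucket, since local disk can disappear along with a preempted machine.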

Post-Training
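
GRPO (Group Relative Policy Optimization), the reinforcement learning step named in the workflow, scores each sampled completion relative to the other completions for the same prompt instead of relying on a separate value model. A minimal sketch of that group-relative advantage step is below; the reward values are made up, the function name is hypothetical, and the use of population standard deviation with a small epsilon is an assumption rather than a detail taken from these training runs.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one prompt's group of sampled completions."""
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    # eps avoids division by zero when every completion gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four completions sampled from the same prompt.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 3) for a in advantages])
```

Completions that beat their group's average get a positive advantage and the rest get a negative one, so the policy is pushed toward the better answers without training a critic.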

Timeline & Costs

Key Takeaways

Learn to own your AI stack. Frontier AI models and the companies behind them are a fragile dependency, with a lot changing in their space. Building your own models gives you control and long-term cost savings.

#ModelTraining #SmallLanguageModels #buildinpublic #AI #Data #CyberSecurity #Finance #DataEngineering #DataScience #LLM #Google #TPU

