I pulled 25 months of activity from three live sources: GitHub repos, arXiv papers, and Hugging Face models. I folded them into one table (five categories, 613 data points), then had an agent write and run the analysis entirely inside Bonacci Studio.
Here is what the data actually says.
The Numbers
- Agents and RAG are the two breakout areas. RAG is up 340% year over year, with agents right behind it.
- Inference is growing, but it sits in the middle of the pack - not out front the way people tend to assume.
- Training from scratch is very much alive. Up 32% month over month. The "nobody trains anymore" narrative is wrong.
- Nothing is cooling. Every category was up in the latest month, so the field is broadening, not narrowing.
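For anyone who wants to sanity-check figures like these, here is a minimal sketch of how year-over-year and month-over-month growth can be computed from monthly counts. The counts below are made up for illustration (they are not the dataset behind this post), and `pct_change` and `cooling` are hypothetical names, not the agent's actual code.

```python
def pct_change(series, periods):
    """Percent change of each value versus the value `periods` steps earlier.

    Returns None where there is not enough history or the base is zero.
    """
    out = []
    for i, value in enumerate(series):
        if i < periods or series[i - periods] == 0:
            out.append(None)
        else:
            prev = series[i - periods]
            out.append((value - prev) / prev * 100.0)
    return out


# Hypothetical monthly activity counts per category (illustrative only).
counts = {
    "rag":      [40, 42, 45, 47, 50, 54, 57, 60, 66, 70, 75, 80, 88, 100],
    "training": [30, 31, 31, 33, 34, 34, 36, 37, 39, 40, 41, 43, 44, 46],
}

# Growth for the latest month: 1 step back for MoM, 12 steps back for YoY.
latest = {}
for category, series in counts.items():
    latest[category] = {
        "mom": pct_change(series, 1)[-1],
        "yoy": pct_change(series, 12)[-1],
    }

# "Nothing is cooling" means no category has flat or negative MoM growth.
cooling = [
    c for c, g in latest.items()
    if g["mom"] is not None and g["mom"] <= 0
]

for category, g in latest.items():
    print(f"{category}: MoM {g['mom']:.1f}%, YoY {g['yoy']:.1f}%")
print("cooling:", cooling)
```

Note that year over year and month over month answer different questions, so the two numbers are not directly comparable across categories.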
Research Leads Shipping
arXiv activity moves about a month ahead of GitHub and Hugging Face across most categories. Papers appear, then repos and models follow. That lag is consistent and measurable.
The one exception is inference. There, what ships barely tracks the papers. Inference research is prolific, but the engineering reality of deploying fast inference lags further behind than in any other category.
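One common way to measure a lead like the one described above is a lagged correlation: shift the follower series by 0, 1, 2, ... months and see which shift lines up best with the leader. The sketch below is an assumption about method, not a record of what the agent ran, and the two series are synthetic (the "repos" series is just the "papers" series delayed by one month).

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def best_lag(leader, follower, max_lag=3):
    """Return (lag, r): the shift in months at which follower[t + lag]
    correlates most strongly with leader[t]."""
    scores = {}
    for lag in range(max_lag + 1):
        x = leader[: len(leader) - lag]  # leader values that have a follower match
        y = follower[lag:]               # follower values `lag` months later
        scores[lag] = pearson(x, y)
    lag = max(scores, key=scores.get)
    return lag, scores[lag]


# Synthetic monthly counts: repos repeat the papers pattern one month later.
papers = [5, 9, 4, 12, 7, 15, 6, 11, 14, 8, 17, 10]
repos = [3] + papers[:-1]

lag, r = best_lag(papers, repos)
print(f"best lag: {lag} month(s), r = {r:.2f}")
```

On real data that trends upward in every category, raw counts correlate strongly at every lag, so it is usually safer to run this on month-over-month changes rather than on the counts themselves.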
How It Ran
All of this ran inside Bonacci Studio. The agent wrote the PySpark, ran it against the data, and rendered the charts and the read right in the dock - no notebook switching, no copy-pasting results between tools.
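The agent's job was written in PySpark; as a rough plain-Python analogue, folding three sources into one table amounts to bucketing each record by month, source, and category and counting. The record layout, the category labels, and the `fold` helper below are all assumptions for illustration, not the actual schema or code.

```python
from collections import defaultdict

# Hypothetical raw records: (source, ISO date, category). Illustrative only.
records = [
    ("github", "2024-01-15", "agents"),
    ("arxiv", "2024-01-20", "agents"),
    ("arxiv", "2024-01-03", "rag"),
    ("huggingface", "2024-02-11", "rag"),
    ("github", "2024-02-02", "agents"),
    ("github", "2024-01-28", "agents"),
]


def fold(records):
    """Collapse raw records into one long table with a row per
    (month, source, category) and a count of matching records."""
    counts = defaultdict(int)
    for source, date, category in records:
        month = date[:7]  # "YYYY-MM" prefix of the ISO date
        counts[(month, source, category)] += 1
    return [
        {"month": m, "source": s, "category": c, "count": n}
        for (m, s, c), n in sorted(counts.items())
    ]


table = fold(records)
for row in table:
    print(row)
```

Keeping the table in this long format (one row per month, source, and category) makes both the growth rates and the cross-source lag comparison simple group-by operations.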
What This Means
The AI field is not consolidating around one thing. Agents and RAG are the current inflection points, but inference, fine-tuning, and full pre-training are all growing simultaneously. Anyone telling you the space is narrowing is reading a different dataset.
The research-to-shipping lag also matters practically. If arXiv is your leading indicator, you have roughly a month of runway to build around the ideas before the ecosystem catches up. Inference is the anomaly - papers there are not translating to shipped tooling at the same rate, which means the gap between what is possible and what is easy to deploy is wider than anywhere else.