thinkingdbx / thinkinglanguage
compiled language for data & ai · apache 2.0

The language built for data & ai

Python made data accessible. TL makes it fast, safe, and intelligent, all in one compiled language. The project reports 1,322 tests passing across 34 implementation phases. Already shipping: AI agents with tool-use, a full MCP ecosystem (client + server), generics, pattern matching, Python FFI, LLVM and WASM backends, a package manager, a full LSP, and a comprehensive security audit.

1,322 tests passing · rust-powered · zero GIL
pipeline.tl
    source users = postgres("db").table("users") -> User

    transform active_users(src: table<User>) -> table<User> {
        src
            |> filter(is_active == true)
            |> clean(nulls: { name: "unknown" })
            |> with { tenure = today() - signup_date }
    }

    model churn = train xgboost {
        data: active_users(users)
        target: "is_active"
        features: [tenure, monthly_spend]
    }
philosophy
Seven principles that define TL
  1. DATA IS A TYPE

    Tables, Streams, Tensors are native types in the language.

  2. PIPELINES ARE PROGRAMS

    ETL/ELT flows are composable first-class constructs.

  3. AI IS A VERB

    train, predict, embed, agent are keywords, not libraries.

  4. PARALLEL BY DEFAULT

    No GIL, automatic partitioning across cores.

  5. FAIL LOUD, RECOVER SMART

    Built-in error handling for unreliable data sources.

  6. READABLE BEATS CLEVER

    Python-like readability, Rust-like safety guarantees.

  7. FAST WITHOUT TRYING

    Compiled to native code with lazy evaluation. Performance is the default, not an afterthought.

the problem
Replace your entire stack

Stop duct-taping together a dozen tools. TL unifies the modern data stack into one language.

today's stack | TL equivalent
Python + Pandas | Native table type
SQL in strings | Native query syntax
Spark / PySpark | Built-in distributed execution
Airflow / Dagster | Native pipeline construct
PyTorch / TF / sklearn | Native model / train / predict
Kafka consumers | Native stream type
dbt | Native transformations with typing
Docker + K8s | tl deploy CLI command
LangChain / CrewAI / AutoGen | Native agent construct with tool-use
Custom MCP integrations | Built-in mcp_connect() + mcp_serve()
syntax
Clean, expressive, powerful
schema & source
    schema User {
        id: int64
        name: string
        email: string
        signup_date: date
        is_active: bool
    }

    source users = postgres("db")
        .table("users") -> User
transform & pipeline
    transform clean_users(src: table<User>) {
        src
            |> filter(is_active == true)
            |> clean(nulls: { name: "unknown" })
            |> with { tenure = today() - signup_date }
    }

    pipeline daily_etl {
        users |> clean_users
    }
ai training
    model churn_predictor = train xgboost {
        data: clean_users(users)
        target: "is_active"
        features: [tenure, monthly_spend]
        split: 0.8
    }

    // Use the model
    let result = predict(churn_predictor, new_user)
pattern matching
    match load_users("data.csv") {
        Ok(users) => process(users)
        Err(DataError::FileNotFound(p)) => log("Missing: {p}")
        Err(e) => alert("{e}")
    }

    // Destructuring + guards
    let [head, ...tail] = items
    let Point { x, y } = origin
generics & traits
    fn top_n<T: Comparable>(
        data: table<T>,
        col: fn(T) -> float64,
        n: int
    ) -> table<T> {
        data |> sort(col, desc) |> limit(n)
    }

    trait Connectable {
        fn connect() -> result<Conn, Error>
    }
ai agents
AI agents as language primitives

No frameworks. No glue code. Define autonomous AI agents with tool-use, multi-provider LLM support, and lifecycle hooks — all with a single keyword.

research_agent.tl
    // Define tool functions in pure TL
    fn search(query) {
        let resp = http_request("GET", "https://api.search.com/v1?q=" + query, none, none)
        json_parse(resp.body)
    }

    // Declare the agent
    agent research_bot {
        model: "gpt-4o",
        system: "You are a research assistant.",
        tools {
            search: {
                description: "Search the web",
                parameters: {
                    type: "object",
                    properties: { query: { type: "string" } }
                }
            }
        },
        max_turns: 5,
        on_tool_call {
            println("[LOG] " + tool_name)
        }
    }

    // Run it
    let result = run_agent(research_bot, "What is quantum computing?")
    println(result.response)
  1. First-Class Keyword

    agent is a language keyword, not a library import. Tools are TL functions wired directly to the LLM.

  2. Any LLM Provider

    OpenAI, Anthropic, Ollama, or any OpenAI-compatible endpoint. Auto-detects protocol from model name. One base_url field to switch.

  3. Automatic Tool Loop

    The runtime handles multi-turn tool calling, JSON arg conversion, and result formatting. You just write the function.

  4. Lifecycle Hooks

    on_tool_call and on_complete blocks for logging, metrics, or custom logic at each step.

  5. Pipeline Integration

    Agents use the same table, stream, and connectors as your data pipelines, so no serialization layer is needed.

  6. Conversation Persistence

    run_agent(agent, message, history) maintains context across multi-turn sessions. No external memory store needed.

  7. SSE Streaming

    stream_agent() delivers real-time token-by-token output via Server-Sent Events.

  8. Retry & JSON Mode

    Automatic exponential backoff on 429/5xx errors. output_format: "json" for guaranteed structured output.
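
The retry behavior named in item 8 can be pictured with a generic sketch. It is written in Python rather than TL, because the page does not document TL's actual retry policy. The base delay, the cap, the attempt count, and the names `backoff_delays` and `call_with_retry` are all assumptions made for illustration.

```python
import time

# Status codes treated as retryable: 429 plus the whole 5xx range.
RETRYABLE = {429} | set(range(500, 600))


def backoff_delays(max_attempts, base=0.5, cap=8.0):
    """Delays in seconds between attempts: base * 2**i, limited to cap."""
    return [min(cap, base * (2 ** i)) for i in range(max_attempts - 1)]


def call_with_retry(send, max_attempts=4, base=0.5, cap=8.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    send is any zero-argument callable returning (status_code, body).
    """
    delays = backoff_delays(max_attempts, base, cap)
    status, body = None, None
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE:
            return status, body
        if attempt < len(delays):
            sleep(delays[attempt])  # wait longer after each failure
    return status, body  # still failing after the last attempt


if __name__ == "__main__":
    # Hypothetical endpoint that fails twice, then succeeds.
    responses = iter([(429, ""), (503, ""), (200, "ok")])
    print(call_with_retry(lambda: next(responses), sleep=lambda s: None))
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting. Production retry loops usually add random jitter to the delays, which is left out here for clarity.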

run_agent(agent, msg, history): multi-turn agent with conversation persistence
stream_agent(agent, msg): real-time SSE streaming responses
embed(text) → tensor: vector embeddings + similarity search
mcp ecosystem
Full MCP ecosystem

Model Context Protocol — the open standard that lets AI tools and data systems talk to each other. TL implements both sides: connect to any MCP server as a client, or expose TL functions to any AI tool as a server.

mcp_client.tl
    // Connect to any MCP server
    let github = mcp_connect("github-server")

    // Discover available tools
    let tools = mcp_list_tools(github)

    // Call any tool directly
    let issues = mcp_call_tool(github, "list_issues", { repo: "myorg/myrepo" })

    // Read resources from server
    let schema = mcp_read_resource(github, "repo://myorg/myrepo/schema")

    // Get prompt templates
    let prompt = mcp_get_prompt(github, "summarize_pr", { pr: "42" })

    mcp_disconnect(github)
mcp_server.tl
    // Expose TL functions as MCP tools
    fn query_sales(region, quarter) {
        postgres("warehouse", "sales")
            |> filter(region == region)
            |> filter(q == quarter)
    }

    fn run_report(name) {
        read_csv("reports/" + name + ".csv")
            |> aggregate(sum(revenue))
    }

    // Start the MCP server
    mcp_serve()

    // Claude Desktop, Cursor, Windsurf etc.
    // can now discover & call these functions
    // Config: { "command": "tl run server.tl" }
  1. MCP Client

    mcp_connect() auto-detects stdio or HTTP transport. 10 builtins for tools, resources, prompts, and ping.

  2. MCP Server

    mcp_serve() turns TL functions into MCP tools. Claude Desktop, Cursor, Windsurf can discover and call them.

  3. Agent Integration

    mcp_servers: [...] in agent definitions. LLM sees one unified tool list: MCP and native tools dispatched transparently.

  4. Sampling

    MCP servers can request LLM completions back through TL. Bidirectional AI communication over the protocol.

agent_with_mcp.tl
    // Connect to multiple MCP servers
    let github = mcp_connect("github-server")
    let db = mcp_connect("database-server")

    // Agent auto-discovers tools from all MCP servers
    agent ops_bot {
        model: "claude-sonnet-4-20250514",
        system: "You are a DevOps assistant.",
        mcp_servers: [github, db],
        tools {
            // Native TL tools alongside MCP tools
            deploy: {
                description: "Deploy to production",
                parameters: {
                    type: "object",
                    properties: { service: { type: "string" } }
                }
            }
        }
    }

    // LLM sees unified tool list: GitHub + DB + native
    let result = run_agent(ops_bot, "Check open PRs, verify staging DB, then deploy")
11 builtins: BuiltinId 216–226 (connect, discover, call, read, ping, disconnect)
2 transports: stdio (subprocess) + Streamable HTTP (reqwest/axum)
rmcp 1.1 SDK: feature-gated with --features mcp
sandbox-aware: sandbox blocks subprocess spawning, HTTP always allowed
One protocol, both directions. TL agents gain access to the entire MCP ecosystem — filesystem, GitHub, databases, Slack, and hundreds more — without building each integration natively. Any AI tool gains access to TL's data engine via MCP server. Connect or serve. Your choice.
type system
Data types as first-class citizens
  1. table<T>

    Columnar, lazy-evaluated, and partitionable. The core data type for batch processing.

    let users: table<User> = postgres("db").table("users")
  2. stream<T>

    Infinite, windowed, real-time. For continuous data processing and event streams.

    stream process_events { from: kafka("events") window: tumbling(5m) }
  3. tensor<dtype, shape>

    N-dimensional arrays for AI and machine learning. Shape-checked at compile time.

    let embeddings: tensor<float32, [256, 768]>
  4. model

    A trained AI model as a first-class value. Serialize, version, deploy natively.

    model churn = train xgboost { ... }
  5. agent

    Autonomous AI agent with tool-use, MCP server integration, and lifecycle hooks.

    agent bot { model: "gpt-4o", mcp_servers: [...], tools { ... } }
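
The `tumbling(5m)` window in the stream<T> example refers to fixed-size, non-overlapping time buckets. Here is a minimal sketch of that idea in plain Python, not TL's implementation. The function name `tumbling_windows`, the integer-second timestamps, and the sample events are assumptions for illustration.

```python
from collections import defaultdict


def tumbling_windows(events, size_seconds):
    """Group (timestamp, value) pairs into fixed, non-overlapping windows.

    Each event lands in exactly one window, keyed by the window's start time.
    """
    windows = defaultdict(list)
    for ts, value in events:
        start = ts - (ts % size_seconds)  # round down to the window boundary
        windows[start].append(value)
    return dict(sorted(windows.items()))


if __name__ == "__main__":
    # Hypothetical events with timestamps in seconds; 300 s = 5 minutes.
    events = [(0, "a"), (120, "b"), (299, "c"), (300, "d"), (650, "e")]
    for start, values in tumbling_windows(events, 300).items():
        print(start, values)
```

An event at second 299 stays in the first window while one at second 300 opens the next, which is the property that distinguishes tumbling windows from sliding ones.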
memory safety
Four rules. Zero data races.

Rust-inspired ownership without lifetime annotations. The compiler guarantees memory safety and data-race freedom at compile time.

  1. Every value has one owner

    let users = load("users.parquet") // `users` is the sole owner
  2. Pipe |> moves ownership

    let active = users |> filter(age > 25) // `users` is now consumed
  3. Clone or borrow for reuse

    let copy = users.clone()
    let ref = &users // read-only
  4. Parallel partitions own data

    parallel for shard in users.partition(by: "region")
    // No locks needed: compiler guarantees it
under the hood
Six-stage compilation pipeline
Lexer → Parser → Semantic Analysis → TL-IR → Optimization → Code Generation
backend targets
LLVM: Native
Cranelift: JIT
WASM: Web
CUDA: GPU

Built entirely in Rust. TL-IR doubles as a query plan — enabling data-aware optimizations like predicate pushdown, column pruning, and join reordering.
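
To make predicate pushdown and column pruning concrete, the sketch below compares a naive plan with its rewritten form over a small in-memory table. It is plain Python and does not represent TL-IR or TL's optimizer. The `scan` function, the sample rows, and the column names are hypothetical.

```python
# Hypothetical source table with one column ("notes") the query never needs.
ROWS = [
    {"id": 1, "region": "eu", "revenue": 10.0, "notes": "x"},
    {"id": 2, "region": "us", "revenue": 20.0, "notes": "y"},
    {"id": 3, "region": "eu", "revenue": 30.0, "notes": "z"},
]


def scan(rows, columns=None, predicate=None):
    """Read rows, optionally filtering and projecting while reading."""
    out = []
    for row in rows:
        if predicate is not None and not predicate(row):
            continue  # pushed-down predicate: the row is skipped at the source
        out.append(dict(row) if columns is None else {c: row[c] for c in columns})
    return out


def naive(rows):
    """Scan everything, then filter, then project."""
    full = scan(rows)
    kept = [r for r in full if r["region"] == "eu"]
    return [{"id": r["id"], "revenue": r["revenue"]} for r in kept]


def pushed_down(rows):
    """Same query with the filter and the projection folded into the scan."""
    return scan(rows, columns=["id", "revenue"],
                predicate=lambda r: r["region"] == "eu")


if __name__ == "__main__":
    assert naive(ROWS) == pushed_down(ROWS)
    print(pushed_down(ROWS))
```

Both plans return the same rows, but the naive scan materializes 12 cells (3 rows of 4 columns) while the rewritten scan materializes 4 (2 rows of 2 columns). That reduction is what these optimizations are after.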

benchmarks
Performance that speaks
benchmark | Python | TL
CSV Parse 1B rows | 45s | <4s
Filter + Aggregate | 30s | <2s
ETL Pipeline | 5min | <30s
Stream Processing | 10K/s | 500K/s
Cold Start | 3–5s | <100ms
end-to-end ml pipeline

TL's compiler sees the entire pipeline as one program — eliminating serialization boundaries between tools.

Python Stack (~275s)
pandas.read_csv() | 45s
DataFrame transforms | 30s
df.to_numpy() | 5s
xgboost.train() | 120s
model.predict() | 15s
pandas.to_sql() | 60s

TL Pipeline (~120s)
load + filter + with | 4s
train xgboost { ... } | 110s
predict + with + save | 6s

Speedup: ~2.3x
Zero-copy Arrow handoff. No serialization boundaries.

Targets based on architecture analysis. Benchmarks will be published with reproducible scripts.
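
The totals and the ~2.3x figure follow directly from the per-step targets in the end-to-end breakdown. The short Python check below reproduces the arithmetic. The numbers are the page's stated targets, not measurements; only the variable names are introduced here.

```python
# Target timings in seconds, copied from the end-to-end breakdown.
python_stack = {
    "pandas.read_csv()": 45,
    "DataFrame transforms": 30,
    "df.to_numpy()": 5,
    "xgboost.train()": 120,
    "model.predict()": 15,
    "pandas.to_sql()": 60,
}
tl_pipeline = {
    "load + filter + with": 4,
    "train xgboost { ... }": 110,
    "predict + with + save": 6,
}

python_total = sum(python_stack.values())  # 275
tl_total = sum(tl_pipeline.values())       # 120
speedup = python_total / tl_total          # about 2.29

# Split out model training to see where the projected savings come from.
python_non_training = python_total - python_stack["xgboost.train()"]  # 155
tl_non_training = tl_total - tl_pipeline["train xgboost { ... }"]     # 10
training_share = tl_pipeline["train xgboost { ... }"] / tl_total

print(f"speedup: {speedup:.2f}x")
print(f"non-training: {python_non_training}s -> {tl_non_training}s")
print(f"training share of TL total: {training_share:.0%}")
```

Almost all of the projected saving comes from the steps around training (155s down to 10s). Training itself barely changes, so it bounds the overall speedup.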

error handling
Data is messy. TL handles it.

Rust-inspired result<T, E> with data-specific error types and declarative cleaning — not try/catch bolted on as an afterthought.

result types & pattern match
    fn load_users(path: string) -> result<table<User>, DataError> {
        let raw = read_csv(path)?
        let valid = raw |> validate_schema(User)?
        Ok(valid)
    }

    match load_users("data.csv") {
        Ok(users) => process(users)
        Err(DataError::SchemaViolation(d)) => alert("Drift: {d}")
        Err(e) => log("{e}")
    }
declarative data cleaning
    let users = load("raw_users.csv")
        |> clean {
            nulls: {
                name: fill("UNKNOWN")
                email: drop_row
                age: fill(median)
            }
            duplicates: dedupe(by: email)
            outliers: { age: clamp(0, 150) }
        }
        |> validate {
            assert null_rate(email) == 0.0
            assert unique(id)
        }
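
For readers who want to see what the cleaning rules above amount to, here is one possible reading of them in plain Python. The order in which the rules are applied (drop rows, compute the median, fill, clamp, dedupe keeping the first occurrence) is an assumption, as are the function name `clean` and the sample rows. The page does not specify how TL sequences these steps.

```python
from statistics import median


def clean(rows):
    """Apply the null, duplicate, and outlier rules to a list of row dicts."""
    # email: drop_row -> discard rows with a missing email
    rows = [dict(r) for r in rows if r["email"] is not None]

    # age: fill(median) -> median of the ages that are present
    ages = [r["age"] for r in rows if r["age"] is not None]
    median_age = median(ages) if ages else None

    seen_emails = set()
    cleaned = []
    for r in rows:
        if r["name"] is None:
            r["name"] = "UNKNOWN"              # name: fill("UNKNOWN")
        if r["age"] is None:
            r["age"] = median_age
        if r["age"] is not None:
            r["age"] = max(0, min(150, r["age"]))  # age: clamp(0, 150)
        if r["email"] in seen_emails:
            continue                            # dedupe(by: email), keep first
        seen_emails.add(r["email"])
        cleaned.append(r)
    return cleaned


if __name__ == "__main__":
    raw = [
        {"id": 1, "name": "Ann", "email": "a@x.io", "age": 30},
        {"id": 2, "name": None, "email": "b@x.io", "age": None},
        {"id": 3, "name": "Cy", "email": None, "age": 40},
        {"id": 4, "name": "Ann2", "email": "a@x.io", "age": 31},
        {"id": 5, "name": "Di", "email": "d@x.io", "age": 200},
    ]
    users = clean(raw)
    # Equivalent of the validate block: no null emails, unique ids.
    assert all(u["email"] is not None for u in users)
    assert len({u["id"] for u in users}) == len(users)
    print(users)
```

The declarative form states the same rules without fixing an order, which is why the ordering here has to be labeled an assumption.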
connectors
Connect to everything

First-class connectors for databases, object storage, message queues, and APIs. All type-safe and schema-aware.

Shipped: PostgreSQL, MySQL, SQLite, DuckDB, Redshift, MSSQL, Snowflake, BigQuery, Databricks, ClickHouse, MongoDB, Kafka, AWS S3, Redis, GraphQL, HTTP/REST, Parquet/CSV, SFTP/SCP

20 connectors shipped — more coming with every release.

python interop
Use any Python library

Bidirectional Python FFI via pyo3. Import Python modules, call functions, convert tensors to NumPy — all from TL code.

interop.tl
    // Import any Python library
    let np = py_import("numpy")
    let pd = py_import("pandas")
    let sklearn = py_import("sklearn.metrics")

    // Call Python functions with TL values
    let score = py_call(
        sklearn.accuracy_score,
        y_true,
        y_pred
    )

    // TL Tensor <-> NumPy ndarray
    let pi = np.pi
    let arr = np.sqrt(16)
  1. Bidirectional Conversion

    int, float, string, bool, list, map, set: all auto-converted between TL and Python.

  2. Tensor ↔ NumPy

    TL tensors convert seamlessly to/from NumPy ndarrays for ML workflows.

  3. Dot Notation Access

    Use natural math.sqrt(16) syntax on Python objects via method dispatch.

  4. Feature-Gated

    Python FFI is opt-in via feature flag. Zero overhead when not used.

developer experience
Everything you need, built in
terminal
    $ tl init my-project
    Created my-project/ with tl.toml
    $ tl build
    Compiled 12 modules in 0.3s
    $ tl test
    1,322 tests passed, 0 failed
    $ tl add kafka-connector
    Added kafka-connector v0.8
    $ tl fmt && tl lint && tl check
    Formatted 12 files. No warnings. Types OK.
    $ tl debug pipeline.tl
    Breakpoint hit at pipeline.tl:42
    dbg> inspect rows
    → 1,248 records
    $ tl doc src/ --public-only
    Generated docs for 8 public modules
    $ tl deploy pipeline.tl --target k8s
    Deployed to cluster: prod-east
  1. VS Code Extension & LSP

    Syntax highlighting, diagnostics, go-to-definition, hover docs, document symbols, rename refactoring, and find-references across files.

  2. Package Manager

    tl add, tl update, tl outdated: full dependency management with lockfile and transitive resolution.

  3. Formatter, Linter & Type Checker

    tl fmt, tl lint, tl check: AST-guided formatting, naming conventions, and compile-time type safety.

  4. Doc Generation

    tl doc generates HTML, Markdown, or JSON docs from /// doc comments with cross-references.

  5. Interactive Step Debugger

    tl debug: breakpoints, variable inspection, source listing, and stack traces. Debug pipelines interactively.

  6. Data Inspection & Lineage

    tl inspect, tl profile, tl lineage: preview data, statistical profiles, and lineage graphs.

standard library
Batteries included

Rich standard library methods, native DateTime, window functions, and data engineering primitives — all built in.

15+ List Methods

find, sort_by, group_by, unique, flatten, chunk, zip, each

Map & String Methods

merge, entries, map_values, trim_start, is_numeric, strip_prefix, count

Math & Randomization

exp, sign, clamp, is_nan, random(), random_int(), sample()

Native DateTime

First-class VmValue::DateTime type with full arithmetic.

today(), date_add(), date_diff(), date_trunc(), date_extract()

Window Functions

DataFusion UDWF-backed analytics on tables.

rank, row_number, dense_rank, lag, lead, ntile
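
The first three window functions are easy to confuse, so here is a small Python sketch of how they differ on tied values, following standard SQL semantics. It illustrates the definitions only; it is not TL or DataFusion code, and the function name `window_ranks` and the sample scores are hypothetical.

```python
def window_ranks(values):
    """Return (value, row_number, rank, dense_rank) tuples, ordered descending.

    row_number: always increases by one.
    rank:       ties share a rank, and the following rank skips ahead.
    dense_rank: ties share a rank, with no gaps afterwards.
    """
    ordered = sorted(values, reverse=True)
    out = []
    rank = 0
    dense = 0
    previous = object()  # sentinel that never equals a real value
    for row_number, value in enumerate(ordered, start=1):
        if value != previous:
            rank = row_number
            dense += 1
            previous = value
        out.append((value, row_number, rank, dense))
    return out


if __name__ == "__main__":
    for row in window_ranks([50, 70, 70, 40]):
        print(row)
```

With two rows tied at 70, both get rank 1 and dense_rank 1. The next row then gets rank 3 but dense_rank 2.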

Table Operations

Pipeline-native data manipulation.

    table1 |> union(table2)
    table |> sample(100)
    table |> sample(fraction: 0.1)
    assert_table_eq(t1, t2)

11 MCP Builtins

Full Model Context Protocol client + server.

mcp_connect, mcp_list_tools, mcp_call_tool, mcp_read_resource, mcp_get_prompt, mcp_serve, mcp_ping, mcp_disconnect

"""..."""  Triple-quoted strings with automatic dedentation.
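
Automatic dedentation means the common leading whitespace of an indented multi-line literal is removed. Python's standard `textwrap.dedent` does something comparable and is used below as an analogy only. TL's exact rules, such as how the first and last lines are treated, are not specified on this page.

```python
import textwrap

# An indented literal, as it might appear inside a nested block of code.
raw = """
    SELECT id, name
      FROM users
    """

# dedent strips the whitespace prefix shared by all non-blank lines;
# the surrounding newlines are trimmed separately.
dedented = textwrap.dedent(raw).strip("\n")

print(dedented)
```

Only the shared four-space margin is removed, so the extra two spaces before `FROM` survive and the relative indentation is preserved.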

positioning
How TL compares

The only language where data pipelines, SQL-like queries, ML training, AI agents, MCP ecosystem, and real-time streaming are all first-class features — not libraries.

tool | their strength | TL's advantage
Python | Largest ML/data ecosystem | 10-50× faster, type-safe, compiled
Mojo | Compiled ML, Python superset | Better data engineering: pipelines, streaming, connectors
Rust | Max performance, memory safety | Domain-specific abstractions as primitives
DuckDB | Embedded analytics, great SQL | Full language, not just SQL, plus ML and streaming
Polars | Fastest DataFrame library | First-class syntax, integrated ML/streaming
Scala + Spark | Battle-tested distributed computing | Simpler syntax, faster single-node, no JVM
SQL / dbt | Declarative, universally understood | Full programming language + AI + streaming
LangChain / CrewAI | Rich agent ecosystem, Python flexibility | Native syntax, no Python dep, compiled speed, type-safe tools
Custom MCP SDKs | Protocol-level flexibility | Client + server built-in, agent-integrated, zero config
open source

Open source.

ThinkingLanguage is licensed under Apache 2.0.


— pull requests welcome ✎