Open Source · MIT + Apache 2.0

The Language Built for Data & AI

Python made data accessible. TL makes it fast, safe, and intelligent — in one compiled language. 1,272 tests passing across 33 implementation phases — generics, pattern matching, Python FFI, LLVM & WASM backends, package manager, and a full LSP already shipping.

1,272 Tests Passing · Rust-Powered · Zero GIL
pipeline.tl
source users = postgres("db").table("users") -> User

transform active_users(src: table<User>) -> table<User> {
    src
    |> filter(is_active == true)
    |> clean(nulls: { name: "unknown" })
    |> with { tenure = today() - signup_date }
}

model churn = train xgboost {
    data: active_users(users)
    target: "is_active"
    features: [tenure, monthly_spend]
}
Philosophy

Seven Principles That Define TL

DATA IS A TYPE

Tables, Streams, Tensors are native types in the language.

PIPELINES ARE PROGRAMS

ETL/ELT flows are composable first-class constructs.

AI IS A VERB

train, predict, embed are keywords — not libraries.

PARALLEL BY DEFAULT

No GIL, automatic partitioning across cores.

FAIL LOUD, RECOVER SMART

Built-in error handling for unreliable data sources.

READABLE BEATS CLEVER

Python-like readability, Rust-like safety guarantees.

FAST WITHOUT TRYING

Compiled to native code with lazy evaluation. Performance is the default, not an afterthought.

The Problem

Replace Your Entire Stack

Stop duct-taping together a dozen tools. TL unifies the modern data stack into one language.

Today's Stack          | TL Equivalent
Python + Pandas        | Native table type
SQL in strings         | Native query syntax
Spark / PySpark        | Built-in distributed execution
Airflow / Dagster      | Native pipeline construct
PyTorch / TF / sklearn | Native model / train / predict
Kafka consumers        | Native stream type
dbt                    | Native transformations with typing
Docker + K8s           | tl deploy CLI command
Syntax

Clean, Expressive, Powerful

Schema & Source
schema User {
    id:          int64
    name:        string
    email:       string
    signup_date: date
    is_active:   bool
}

source users =
    postgres("db")
    .table("users")
    -> User
Transform & Pipeline
transform clean_users(src: table<User>) {
    src
    |> filter(is_active == true)
    |> clean(nulls: {
        name: "unknown"
    })
    |> with {
        tenure = today() - signup_date
    }
}

pipeline daily_etl {
    users |> clean_users
}
AI Training
model churn_predictor =
    train xgboost {
        data: clean_users(users)
        target: "is_active"
        features: [
            tenure,
            monthly_spend
        ]
        split: 0.8
    }

// Use the model
let result =
    predict(churn_predictor, new_user)
Pattern Matching
match load_users("data.csv") {
    Ok(users) => process(users)
    Err(DataError::FileNotFound(p)) =>
        log("Missing: {p}")
    Err(e) => alert("{e}")
}

// Destructuring + guards
let [head, ...tail] = items
let Point { x, y } = origin
Generics & Traits
fn top_n<T: Comparable>(
    data: table<T>,
    col: fn(T) -> float64,
    n: int
) -> table<T> {
    data |> sort(col, desc) |> limit(n)
}

trait Connectable {
    fn connect() -> result<Conn, Error>
}
Type System

Data Types as First-Class Citizens

table<T>

Columnar, lazy-evaluated, and partitionable. The core data type for batch processing.

let users: table<User> = postgres("db").table("users")

stream<T>

Infinite, windowed, real-time. For continuous data processing and event streams.

stream process_events {
  from: kafka("events")
  window: tumbling(5m)
}

tensor<dtype, shape>

N-dimensional arrays for AI and machine learning. Shape-checked at compile time.

let embeddings: tensor<float32, [256, 768]>
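As a sketch of what compile-time shape checking could catch (`matmul` and the loader names here are illustrative, not documented TL API):

```
let a: tensor<float32, [256, 768]> = load_embeddings()
let w: tensor<float32, [768, 64]>  = load_weights()

let h = matmul(a, w)    // ok: tensor<float32, [256, 64]>

// matmul(w, a) would be rejected at compile time:
// inner dimensions 64 and 256 do not match
```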

model

A trained AI model as a first-class value. Serialize, version, deploy natively.

model churn = train xgboost { ... }
Memory Safety

Four Rules. Zero Data Races.

Rust-inspired ownership without lifetime annotations. The compiler guarantees memory safety and data-race freedom at compile time.

1. Every value has one owner

let users = load("users.parquet")
// `users` is the sole owner

2. Pipe |> moves ownership

let active = users |> filter(age > 25)
// `users` is now consumed

3. Clone or borrow for reuse

let copy = users.clone()
let ref = &users // read-only

4. Parallel partitions own data

parallel for shard in users.partition(by: "region")
// No locks needed — compiler guarantees it
Under the Hood

Six-Stage Compilation Pipeline

Lexer → Parser → Semantic Analysis → TL-IR → Optimization → Code Generation

Backends: LLVM (Native) · Cranelift (JIT) · WASM (Web) · CUDA (GPU)

Built entirely in Rust. TL-IR doubles as a query plan — enabling data-aware optimizations like predicate pushdown, column pruning, and join reordering.
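To make the query-plan angle concrete, here is a conceptual illustration of predicate pushdown (not actual compiler output; the `scan` form is assumed for illustration). The filter written at the end of a pipeline is moved into the source scan, so the database does the filtering and only the referenced columns are read:

```
// What you write
users
  |> with { tenure = today() - signup_date }
  |> filter(is_active == true)

// What the optimized TL-IR effectively executes (conceptual):
// the predicate is pushed into the Postgres scan, and unused
// columns are pruned before any rows cross the wire
scan("users",
     where: is_active == true,
     cols: [id, signup_date, is_active])
  |> with { tenure = today() - signup_date }
```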

Benchmarks

Performance That Speaks

Benchmark           | Python | TL
CSV Parse (1B rows) | 45s    | <4s
Filter + Aggregate  | 30s    | <2s
ETL Pipeline        | 5min   | <30s
Stream Processing   | 10K/s  | 500K/s
Cold Start          | 3-5s   | <100ms

End-to-End ML Pipeline

TL's compiler sees the entire pipeline as one program — eliminating serialization boundaries between tools.

Python Stack (~275s)
  pandas.read_csv()        45s
  DataFrame transforms     30s
  df.to_numpy()             5s
  xgboost.train()         120s
  model.predict()          15s
  pandas.to_sql()          60s

TL Pipeline (~120s)
  load + filter + with      4s
  train xgboost { ... }   110s
  predict + with + save     6s

Speedup: ~2.3x
Zero-copy Arrow handoff. No serialization boundaries.

Performance figures are targets based on architecture analysis. Benchmarks will be published with reproducible scripts.

Error Handling

Data Is Messy. TL Handles It.

Rust-inspired result<T, E> with data-specific error types and declarative cleaning — not try/catch bolted on as an afterthought.

Result Types & Pattern Match
fn load_users(path: string)
    -> result<table<User>, DataError> {
    let raw = read_csv(path)?
    let valid = raw |> validate_schema(User)?
    Ok(valid)
}

match load_users("data.csv") {
    Ok(users) => process(users)
    Err(DataError::SchemaViolation(d))
        => alert("Drift: {d}")
    Err(e) => log("{e}")
}
Declarative Data Cleaning
let users = load("raw_users.csv")
    |> clean {
        nulls: {
            name: fill("UNKNOWN")
            email: drop_row
            age: fill(median)
        }
        duplicates: dedupe(by: email)
        outliers: {
            age: clamp(0, 150)
        }
    }
    |> validate {
        assert null_rate(email) == 0.0
        assert unique(id)
    }
Connectors

Connect to Everything

First-class connectors for databases, object storage, message queues, and APIs. All type-safe and schema-aware.

Connector   | Status
PostgreSQL  | Shipped
MySQL       | Shipped
SQLite      | Shipped
Kafka       | Shipped
AWS S3      | Shipped
Redis       | Shipped
GraphQL     | Shipped
HTTP/REST   | Shipped
Parquet/CSV | Shipped
Custom      | User-Defined

BigQuery, Snowflake, and more connectors are planned for the production release.
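Based on the `Connectable` trait shown in the syntax section, a user-defined connector might be written like this. This is an illustrative sketch only: `struct`, `impl`, `http_get`, and `Conn::open` are assumed forms, not documented TL syntax.

```
// Hypothetical connector for an internal metrics API
struct MetricsApi {
    base_url: string
}

impl Connectable for MetricsApi {
    fn connect() -> result<Conn, Error> {
        // Verify the endpoint is reachable before handing back a connection
        http_get(base_url + "/health")?
        Ok(Conn::open(base_url))
    }
}

source metrics =
    MetricsApi { base_url: "https://metrics.internal" }
    .table("latency")
    -> Metric
```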

Python Interop

Use Any Python Library

Bidirectional Python FFI via pyo3. Import Python modules, call functions, convert tensors to NumPy — all from TL code.

interop.tl
// Import any Python library
let np = py_import("numpy")
let pd = py_import("pandas")
let sklearn = py_import("sklearn.metrics")

// Call Python functions with TL values
let score = py_call(
    sklearn.accuracy_score,
    y_true, y_pred
)

// TL Tensor <-> NumPy ndarray
let pi = np.pi
let arr = np.sqrt(16)

Bidirectional Conversion

int, float, string, bool, list, map, set — all auto-converted between TL and Python.
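A small sketch of automatic conversion at the FFI boundary, reusing the `py_import`/`py_call` forms shown above (the `statistics` call is illustrative):

```
let stats = py_import("statistics")

// TL list<float64> is converted to a Python list at the call site
let xs = [1.0, 2.5, 4.0]

// The Python float result converts back to a TL float64
let m = py_call(stats.mean, xs)   // 2.5
```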

Tensor ↔ NumPy

TL tensors convert seamlessly to/from NumPy ndarrays for ML workflows.

Dot Notation Access

Use natural math.sqrt(16) syntax on Python objects via method dispatch.

Feature-Gated

Python FFI is opt-in via feature flag. Zero overhead when not used.

Developer Experience

Everything You Need, Built In

terminal
$ tl init my-project
Created my-project/ with tl.toml
$ tl build
Compiled 12 modules in 0.3s
$ tl test
1,272 tests passed, 0 failed
$ tl add kafka-connector
Added kafka-connector v0.8
$ tl fmt && tl lint && tl check
Formatted 12 files. No warnings. Types OK.
$ tl doc src/ --public-only
Generated docs for 8 public modules
$ tl deploy pipeline.tl --target k8s
Deployed to cluster: prod-east

VS Code Extension & LSP

Syntax highlighting, diagnostics, go-to-definition, hover docs, and document symbols.

Package Manager

tl add, tl update, tl outdated — full dependency management with lockfile and transitive resolution.

Formatter, Linter & Type Checker

tl fmt, tl lint, tl check — AST-guided formatting, naming conventions, and compile-time type safety.

Doc Generation

tl doc generates HTML, Markdown, or JSON docs from /// doc comments with cross-references.
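Assuming `///` doc comments attach to the declaration that follows (as in Rust), documentation for a transform might look like:

```
/// Removes inactive users and derives a `tenure` column.
///
/// Rows with `is_active == false` are dropped; the output
/// schema is `User` plus `tenure`.
transform clean_users(src: table<User>) -> table<User> {
    src
    |> filter(is_active == true)
    |> with { tenure = today() - signup_date }
}
```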

Data Inspection & Lineage

tl inspect, tl profile, tl lineage — preview data, statistical profiles, and lineage graphs.

Positioning

How TL Compares

The only language where data pipelines, SQL-like queries, ML training, and real-time streaming are all first-class features — not libraries.

Tool          | Their Strength                      | TL's Advantage
Python        | Largest ML/data ecosystem           | 10-50x faster, type-safe, compiled
Mojo          | Compiled ML, Python superset        | Better data engineering: pipelines, streaming, connectors
Rust          | Max performance, memory safety      | Domain-specific abstractions as primitives
DuckDB        | Embedded analytics, great SQL       | Full language, not just SQL — plus ML and streaming
Polars        | Fastest DataFrame library           | First-class syntax, integrated ML/streaming
Scala + Spark | Battle-tested distributed computing | Simpler syntax, faster single-node, no JVM
SQL / dbt     | Declarative, universally understood | Full programming language + AI + streaming

Open Source.

ThinkingLanguage is dual-licensed under MIT and Apache 2.0.

MIT + Apache 2.0 · Built with Rust · 33 Phases Complete