RACK-SCALE INFRASTRUCTURE · PACIFIC NORTHWEST

Rack-Scale Inference and Fine-Tuning, Operated for You

GB200 NVL72 in the Pacific Northwest — for teams that need a full rack, not just a GPU.


You build the model. Mantle runs the rack.

The workloads defining this moment — sophisticated inference at rack scale, enterprise fine-tuning, research-lab experimentation — need more than a GPU-hour. They need rack-scale compute: full GB200 NVL72 cabinets, not fractional GPUs, plus storage to feed them and a platform operator that keeps the system running.

That is what Mantle is building and scaling. Our initial deployment starts with two NVL72 racks, 144 Blackwell GPUs, and 1.2 PB of all-flash storage — a managed platform with Kubernetes, bare metal, networking, and a full software stack, available now.

We are entering the market with live capacity, a team that has built hyperscale AI infrastructure before, and a growth path that extends well beyond what is running today. This post is about who we built for, what a full rack unlocks, and how to get started.

§ 01  —  WORKLOADS THAT CONSUME A FULL RACK

Workloads that consume a full rack

Mantle is built for teams whose unit of compute is the NVL72 cabinet — 72 GPUs in one NVLink domain, 13.5 TB of unified HBM, operated as a single system. Our ideal customers need the performance of full racks: large-model inference, sustained fine-tuning, and training runs scoped to rack-scale capacity.

Inference at rack scale: serve the largest models

High-throughput serving of 400B–600B+ MoE models with expert parallelism

Long-context inference (128K–1M+ tokens) with coherent KV cache at rack scale

Production platforms consolidating large models onto isolated capacity

Largest open- and closed-weight models on infrastructure built for them

Fine-tuning at rack scale: train on dedicated capacity

Enterprise fine-tuning of 70B–400B+ models on proprietary data

Full-parameter, LoRA, or continued pre-training

Research-lab experimentation: ablations, architecture and dataset sweeps

Training runs scoped to a single NVL72 rack

FULL NVL72 DOMAIN · 72 GPUS · 13.5 TB COHERENT HBM · ONE OPERATOR

The platform supports multi-tenancy — workloads can start at the tray level and grow into larger allocations. We built Mantle for customers who will ultimately require full racks, because that is where NVL72 delivers what no smaller configuration can: one coherent domain, one integrated platform, one operator.
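As a rough sanity check on the model sizes listed in this section, the sketch below compares parameter memory against the 13.5 TB of HBM in one rack. The bytes-per-parameter values (1 for FP8, 2 for BF16, about 16 for full-parameter fine-tuning with mixed-precision Adam) are common rules of thumb, and the 80% planning margin is a hypothetical value chosen for illustration; neither is a Mantle figure.

```python
# Rough HBM budget check for one NVL72 rack (rack figure from this post).
RACK_HBM_TB = 13.5  # unified HBM per rack


def footprint_tb(params_billion: float, bytes_per_param: float) -> float:
    """Memory needed for model state, in TB (1 TB = 1e12 bytes)."""
    return params_billion * 1e9 * bytes_per_param / 1e12


def fits_in_rack(required_tb: float, margin: float = 0.8) -> bool:
    """True if the footprint fits within a fraction of rack HBM.

    margin is an assumed planning allowance that leaves room for
    KV cache, activations, and communication buffers.
    """
    return required_tb <= RACK_HBM_TB * margin


# Serving a 600B-parameter model: weights only.
serve_fp8 = footprint_tb(600, 1)    # about 0.6 TB
serve_bf16 = footprint_tb(600, 2)   # about 1.2 TB

# Full-parameter fine-tuning of a 400B model, assuming ~16 bytes/param
# (BF16 weights and gradients plus FP32 master weights and Adam moments).
finetune_400b = footprint_tb(400, 16)  # about 6.4 TB

for label, tb in [("serve 600B FP8", serve_fp8),
                  ("serve 600B BF16", serve_bf16),
                  ("fine-tune 400B", finetune_400b)]:
    print(f"{label}: {tb:.1f} TB, fits in one rack: {fits_in_rack(tb)}")
```

Activation memory and KV cache vary widely with batch size and context length, so treat the result as a first-pass filter rather than a sizing guarantee.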

§ 02  —  WHAT IS LIVE NOW

Two racks. Live today.

GPUs: 144 × Blackwell
Racks: 2 × GB200 NVL72
Coherent HBM / rack: 13.5 TB
NVLink / GPU: 1.8 TB/s
Storage: 1.2 PB VAST all-flash
North–south fabric: 4×200G / tray · redundant
Location: Pacific Northwest

Platform, day one

Dedicated GPU Kubernetes or bare-metal access

Per-tenant VAST storage — namespace, VMS admin, CSI mounts

Mantle-managed stack: DGX OS 7.5, CUDA 13.0, NCCL 2.30, GPU operator

Per-tenant private registry, network isolation, secure access

Container images target linux/arm64 (Grace CPU)

Direct engineering support — further customization on request
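For teams planning to use the Kubernetes option, a first smoke test might look like the sketch below. It builds a Pod manifest as JSON, which `kubectl apply -f` accepts. The `nvidia.com/gpu` resource name is the one the NVIDIA device plugin normally exposes, and `kubernetes.io/arch` is a standard node label, but whether a Mantle tenant cluster uses exactly these is an assumption. The pod name, registry host, and image are hypothetical placeholders.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int) -> dict:
    """Build a minimal Pod manifest that requests GPUs on arm64 nodes."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            # Grace CPUs are arm64, so pin the pod to arm64 nodes.
            "nodeSelector": {"kubernetes.io/arch": "arm64"},
            "containers": [{
                "name": "main",
                # The image must be built for linux/arm64.
                "image": image,
                "command": ["nvidia-smi"],
                # Assumed resource name (NVIDIA device plugin default).
                "resources": {"limits": {"nvidia.com/gpu": gpus}},
            }],
        },
    }


# Hypothetical image in a per-tenant private registry.
manifest = gpu_pod_manifest(
    "gpu-smoke-test", "registry.example.com/team/cuda-base:latest", 4
)
print(json.dumps(manifest, indent=2))
```

Saving the output to a file and applying it should list the requested GPUs in the pod logs if the image architecture and resource name match the cluster.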

§ 03  —  WHY A FULL RACK, NOT A BOX OF GPUS

Why a full rack, not a box of GPUs

An 8×Blackwell node uses the same chips. The difference is what fits in one coherent domain. Inside an NVL72 rack, all 72 GPUs share NVLink at 1.8 TB/s per GPU and 130 TB/s aggregate, with 13.5 TB of unified HBM as a single memory space.
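The aggregate figure follows from the per-GPU one. A quick check, using only the numbers quoted in this post:

```python
GPUS_PER_RACK = 72
GPUS_PER_NODE = 8
NVLINK_TB_S_PER_GPU = 1.8

# Sum of per-GPU NVLink bandwidth inside one domain.
rack_aggregate = GPUS_PER_RACK * NVLINK_TB_S_PER_GPU  # 129.6, quoted as 130 TB/s
node_aggregate = GPUS_PER_NODE * NVLINK_TB_S_PER_GPU  # 14.4 TB/s

print(f"NVL72 rack: {rack_aggregate:.1f} TB/s across {GPUS_PER_RACK} GPUs")
print(f"8-GPU node: {node_aggregate:.1f} TB/s across {GPUS_PER_NODE} GPUs")
```

The per-GPU rate is the same in both cases. What changes is how many GPUs can reach each other at that rate before traffic has to leave the NVLink domain.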

[Figure: 8 × B200 node (NVLink domain stops at 8 GPUs; interconnect penalty across nodes) versus GB200 NVL72 rack (72 GPUs in one NVLink domain; 13.5 TB unified HBM; 1.8 TB/s per GPU; 130 TB/s aggregate NVLink; no interconnect penalty per step).]

For inference, that means serving a large MoE model in one rack — expert parallelism on one fabric, without an interconnect penalty on every step. For fine-tuning, it means tensor and pipeline parallelism that stays inside the NVLink domain for the life of the run.
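To make the interconnect penalty concrete, the sketch below counts how many NVLink domains a parallel group spans and what share of its GPU-to-GPU pairs stay inside a single domain. It assumes ranks are packed contiguously into domains, and the 64-way group size is a hypothetical example rather than a recommended configuration.

```python
from math import comb


def domains_spanned(group_size: int, domain_size: int) -> int:
    """Number of NVLink domains needed to hold the group (ceiling division)."""
    return -(-group_size // domain_size)


def intra_domain_pair_fraction(group_size: int, domain_size: int) -> float:
    """Share of GPU pairs in the group that sit in the same NVLink domain.

    Assumes ranks fill domains contiguously, and group_size >= 2.
    """
    full_domains, remainder = divmod(group_size, domain_size)
    intra_pairs = full_domains * comb(domain_size, 2) + comb(remainder, 2)
    return intra_pairs / comb(group_size, 2)


GROUP = 64  # hypothetical expert-parallel group size

for label, domain in [("8-GPU nodes", 8), ("NVL72 rack", 72)]:
    print(f"{label}: {domains_spanned(GROUP, domain)} domain(s), "
          f"{intra_domain_pair_fraction(GROUP, domain):.0%} of pairs on NVLink")
```

With all-to-all traffic, as in expert parallelism, every pair of GPUs exchanges data each step. On 8-GPU nodes roughly one pair in nine stays on NVLink under these assumptions; in a single rack, all of them do.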

A full rack is a different machine. That is what Mantle operates.

§ 04  —  A PLATFORM, NOT A SKU

A platform, not a SKU

Renting bare GPUs gives you silicon. Mantle gives you a rack-scale system — compute, storage, fabric, orchestration, and operations integrated and running.

[Figure: the handoff. You: data, models, fine-tuning, inference. Mantle: land, building, power, cooling, racks, fabric, storage, platform ops.]

Storage is included and integrated — not a separate procurement. Every tenant gets a dedicated namespace wired into the cluster from day one.

Networking is engineered for the rack — 4×200G per tray, redundant leaf–spine, on a converged compute/storage network.

Operations are Mantle's responsibility: telemetry across GPUs, fabric, cooling, and storage, plus automation that detects, isolates, and remediates faults before they reach your job.

You keep the top of the stack — data, models, training code, inference services, product logic.
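The 4×200G per-tray networking figure can be turned into a back-of-envelope data-loading estimate. The sketch assumes "200G" means 200 Gbit/s per link and that all four links carry traffic at once; because the fabric is described as redundant, the usable count may be lower, so the number of active links is a parameter. Protocol overhead and storage-side limits are ignored, which makes the result a lower bound on transfer time.

```python
LINKS_PER_TRAY = 4
LINK_GBIT_S = 200  # assumption: "200G" is 200 Gbit/s per link


def tray_line_rate_gbyte_s(active_links: int = LINKS_PER_TRAY) -> float:
    """Raw line rate into one tray in GB/s (8 bits per byte)."""
    return active_links * LINK_GBIT_S / 8


def min_transfer_seconds(size_tb: float, active_links: int = LINKS_PER_TRAY) -> float:
    """Best-case time to move size_tb (1 TB = 1000 GB) into one tray."""
    return size_tb * 1000 / tray_line_rate_gbyte_s(active_links)


# Example: 1.2 TB of weights (a 600B-parameter model at 2 bytes per parameter).
print(f"all 4 links: {min_transfer_seconds(1.2):.0f} s")
print(f"2 links:     {min_transfer_seconds(1.2, active_links=2):.0f} s")
```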

BUILT WITH   NVIDIA  ·  SUPERMICRO  ·  VAST

§ 05  —  AVAILABLE NOW, AND BUILT TO GROW

Available now — and built to grow

Two liquid-cooled NVL72 racks are live at our first data center in the Pacific Northwest. The platform is operational and capacity is available now.

Standing up GB200 NVL72 at rack scale typically takes the industry 12–18 months from decision to production. Mantle did it twice as fast — from empty floor to a running, multi-tenant platform with storage, fabric, and managed software included.

That speed reflects the team behind the build. Two racks are running today. An additional 2 MW of capacity comes online through 2026, with more power and locations beyond that. Over the next 18–24 months, Mantle is expanding through a pipeline of land, expertise, and purpose-built data center development — vertically integrated capacity designed for AI workloads.

§ 06  —  WHO WE ARE

Who we are

Mantle is a vertically integrated business spanning data center development and GPU-as-a-service.

75+ HYPERSCALE DATA CENTERS  ·  7+ GW IN OPERATION  ·  FASTER THAN INDUSTRY BUILD TIME

The team behind Mantle has delivered 75+ hyperscale data centers and 7+ GW in operation, and has spent careers building the infrastructure that powers large-scale AI. That is how we stood up rack-scale Blackwell capacity in months while the market is still queuing. Mantle is pointed forward: inference and fine-tuning at rack scale, with more capacity, more locations, and more platform depth ahead.

§ 07  —  GET STARTED

Get started

If your team is running — or planning — rack-scale inference or fine-tuning on the largest models, Mantle is available now.

Our platform supports multi-tenancy and tray-level entry; our ideal engagements scale to full NVL72 cabinets and beyond as your workload grows.

We are one of the companies that can deliver full-rack Blackwell performance — operated, integrated, and live. We would like to be your infrastructure partner.

144 BLACKWELL GPUS · AVAILABLE NOW
PACIFIC NORTHWEST  ·  Request compute →