RACK-SCALE INFRASTRUCTURE · PACIFIC NORTHWEST

Rack-Scale Inference and Fine-Tuning, Operated for You

GB200 NVL72 in the Pacific Northwest — for teams that need a full rack, not just a GPU.

You build the model. Mantle runs the rack.

The workloads defining this moment — sophisticated inference at rack scale, enterprise fine-tuning, research-lab experimentation — need more than a GPU-hour. They need rack-scale compute: full GB200 NVL72 cabinets, not fractional GPUs, plus storage to feed them and a platform operator that keeps the system running.

That is what Mantle is building and scaling. Our initial deployment starts with two NVL72 racks, 144 Blackwell GPUs, and 1.2 PB of all-flash storage — a managed platform with Kubernetes, bare metal, networking, and a full software stack, available now.

We are entering the market with live capacity, a team that has built hyperscale AI infrastructure before, and a growth path that extends well beyond what is running today. This post is about who we built for, what a full rack unlocks, and how to get started.

§ 01 — WORKLOADS THAT CONSUME A FULL RACK

Mantle is built for teams whose unit of compute is the NVL72 cabinet — 72 GPUs in one NVLink domain, 13.5 TB of unified HBM, operated as a single system. Our ideal customers need the performance of full racks: large-model inference, sustained fine-tuning, and training runs scoped to rack-scale capacity.

The platform supports multi-tenancy — workloads can start at the tray level and grow into larger allocations. We built Mantle for customers who will ultimately require full racks, because that is where NVL72 delivers what no smaller configuration can: one coherent domain, one integrated platform, one operator.
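To make tray-level entry concrete, here is a minimal sizing sketch. It assumes the standard NVL72 layout of 18 compute trays per rack with 4 Blackwell GPUs per tray (18 × 4 = 72), and it assumes allocations are rounded up to whole trays. The function name `allocation_for` is hypothetical and is not part of any Mantle API.

```python
import math

GPUS_PER_TRAY = 4     # assumption: standard NVL72 compute tray (2 GB200 superchips)
TRAYS_PER_RACK = 18   # assumption: 18 compute trays per rack, 18 * 4 = 72 GPUs


def allocation_for(gpus_needed: int) -> dict:
    """Round a GPU request up to whole trays and express it in racks plus trays."""
    if gpus_needed <= 0:
        raise ValueError("gpus_needed must be positive")
    trays = math.ceil(gpus_needed / GPUS_PER_TRAY)
    full_racks, extra_trays = divmod(trays, TRAYS_PER_RACK)
    return {
        "trays": trays,
        "gpus_allocated": trays * GPUS_PER_TRAY,
        "full_racks": full_racks,
        "extra_trays": extra_trays,
    }


if __name__ == "__main__":
    for request in (6, 72, 144):
        print(request, allocation_for(request))
```

Under these assumptions, a request for 72 GPUs maps to exactly one full rack, which is the point where the workload occupies a whole NVLink domain.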

§ 02 — WHAT IS LIVE NOW

Two racks. Live today.

GPUs: 144 × Blackwell

Racks: 2 × GB200 NVL72

Coherent HBM / rack: 13.5 TB

Storage: 1.2 PB VAST all-flash

North–south fabric: 4×200G / tray · redundant

Location: Pacific Northwest

Platform, day one

Dedicated GPU Kubernetes or bare-metal access

Per-tenant VAST storage — namespace, VMS admin, CSI mounts

Mantle-managed stack: DGX OS 7.5, CUDA 13.0, NCCL 2.30, GPU operator

Per-tenant private registry, network isolation, secure access

Container images target linux/arm64 (Grace CPU)

Direct engineering support — further customization on request
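Because images target linux/arm64, a job that was built on an x86 workstation can fail at start-up with an architecture mismatch. The sketch below shows one way a job might check its platform early, using only the Python standard library. The helper names `oci_platform` and `is_target_platform` are hypothetical, and the alias table covers only the common cases.

```python
import platform

# Kernel-reported machine names mapped to the architecture names
# used in OCI platform strings such as "linux/arm64".
_ARCH_ALIASES = {
    "aarch64": "arm64",
    "arm64": "arm64",
    "x86_64": "amd64",
    "amd64": "amd64",
}


def oci_platform(system=None, machine=None) -> str:
    """Return an OCI-style platform string for this host (or for the given values)."""
    system = (system or platform.system()).lower()
    machine = (machine or platform.machine()).lower()
    return f"{system}/{_ARCH_ALIASES.get(machine, machine)}"


def is_target_platform(target="linux/arm64", system=None, machine=None) -> bool:
    """True when the host (or the given values) matches the target platform."""
    return oci_platform(system, machine) == target


if __name__ == "__main__":
    print("running on", oci_platform())
    if not is_target_platform():
        print("warning: this host is not linux/arm64")
```

Running the check at container start gives a clear message instead of an opaque exec-format error later in the job.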

§ 03 — WHY A FULL RACK, NOT A BOX OF GPUS

An 8×Blackwell node uses the same chips. The difference is what fits in one coherent domain. Inside an NVL72 rack, all 72 GPUs share NVLink at 1.8 TB/s per GPU and 130 TB/s aggregate — 13.5 TB of unified HBM as a single memory space.
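The rack-level figures follow from the per-GPU numbers. A short check, using only values quoted in this post (72 GPUs per rack, 1.8 TB/s of NVLink per GPU, 13.5 TB of HBM per rack, two racks), might look like this. The function name `rack_summary` is illustrative.

```python
GPUS_PER_RACK = 72           # from the text
NVLINK_TB_S_PER_GPU = 1.8    # from the text
HBM_TB_PER_RACK = 13.5       # from the text
RACKS = 2                    # from the text


def rack_summary(racks: int = RACKS) -> dict:
    """Derive aggregate figures from the per-GPU and per-rack numbers."""
    return {
        "total_gpus": racks * GPUS_PER_RACK,
        # Aggregate NVLink bandwidth inside ONE rack; the text rounds this to 130 TB/s.
        "nvlink_tb_s_per_rack": GPUS_PER_RACK * NVLINK_TB_S_PER_GPU,
        # Average HBM per GPU, in decimal gigabytes.
        "hbm_gb_per_gpu": HBM_TB_PER_RACK * 1000 / GPUS_PER_RACK,
        # Sum across racks; each rack remains its own NVLink domain.
        "total_hbm_tb": racks * HBM_TB_PER_RACK,
    }


if __name__ == "__main__":
    for name, value in rack_summary().items():
        print(f"{name}: {value:g}")
```

The per-rack NVLink product comes to about 129.6 TB/s, consistent with the rounded 130 TB/s figure.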

For inference, that means serving a large MoE model in one rack — expert parallelism on one fabric, without an interconnect penalty on every step. For fine-tuning, it means tensor and pipeline parallelism that stays inside the NVLink domain for the life of the run.
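As a rough illustration of "one model, one rack", the sketch below estimates whether a model's weights fit in a rack's 13.5 TB of HBM and how experts could be spread across 72 GPUs. Everything beyond the 13.5 TB and 72 GPU figures is an assumption for illustration: the 30% reserve for KV cache and activations, the example model sizes, and the even round-robin expert placement. Real serving stacks make these choices differently.

```python
RACK_HBM_TB = 13.5   # from the text
RACK_GPUS = 72       # from the text


def weights_tb(params_billion: float, bytes_per_param: float) -> float:
    """Weight footprint in decimal terabytes."""
    return params_billion * 1e9 * bytes_per_param / 1e12


def fits_in_rack(params_billion: float, bytes_per_param: float,
                 reserve_fraction: float = 0.3) -> bool:
    """True if weights fit after reserving a fraction of HBM.

    reserve_fraction is a hypothetical allowance for KV cache,
    activations, and runtime overhead.
    """
    usable_tb = RACK_HBM_TB * (1.0 - reserve_fraction)
    return weights_tb(params_billion, bytes_per_param) <= usable_tb


def experts_per_gpu(num_experts: int, gpus: int = RACK_GPUS) -> list:
    """Spread experts as evenly as possible; the first GPUs take any remainder."""
    base, extra = divmod(num_experts, gpus)
    return [base + 1 if i < extra else base for i in range(gpus)]


if __name__ == "__main__":
    # Hypothetical 1-trillion-parameter model stored at 2 bytes per parameter.
    print(weights_tb(1000, 2), "TB of weights; fits:", fits_in_rack(1000, 2))
    placement = experts_per_gpu(256)  # hypothetical expert count
    print("experts per GPU range:", min(placement), "to", max(placement))
```

When the model fits under this kind of budget, expert-to-expert traffic stays on NVLink rather than crossing a slower network between nodes.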

A full rack is a different machine. That is what Mantle operates.

§ 04 — A PLATFORM, NOT A SKU

Renting bare GPUs gives you silicon. Mantle gives you a rack-scale system — compute, storage, fabric, orchestration, and operations integrated and running.

Storage is included and integrated — not a separate procurement. Every tenant gets a dedicated namespace wired into the cluster from day one.

Networking is engineered for the rack — 4×200G per tray, redundant leaf–spine, on a converged compute/storage network.

Operations are Mantle's responsibility: telemetry across GPUs, fabric, cooling, and storage, plus automation that detects, isolates, and remediates faults before your job notices them.

You keep the top of the stack — data, models, training code, inference services, product logic.
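To put the 4×200G per-tray figure in working terms, the sketch below converts it to bytes per second and gives an ideal lower bound on transfer time from storage. It assumes "200G" means 200 Gb/s per link, that all four links carry traffic at once, and that a rack has 18 compute trays. Those are assumptions, not published Mantle figures, and real throughput depends on protocol overhead, storage performance, and how redundancy is configured. The `efficiency` parameter is a hypothetical knob for that gap.

```python
LINKS_PER_TRAY = 4     # from the text
LINK_GBITS_PER_S = 200 # assumption: "200G" means 200 Gb/s per link
TRAYS_PER_RACK = 18    # assumption: standard NVL72 compute tray count


def tray_gbytes_per_s() -> float:
    """Line-rate north-south bandwidth of one tray in GB/s (8 bits per byte)."""
    return LINKS_PER_TRAY * LINK_GBITS_PER_S / 8


def min_transfer_seconds(size_tb: float, trays: int = 1,
                         efficiency: float = 1.0) -> float:
    """Ideal lower bound on moving size_tb (decimal TB) across `trays` trays."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    rate_gb_s = tray_gbytes_per_s() * trays * efficiency
    return size_tb * 1000 / rate_gb_s


if __name__ == "__main__":
    print("per tray:", tray_gbytes_per_s(), "GB/s")
    # Hypothetical 2 TB checkpoint pulled through one tray, then through a full rack.
    print("one tray:", min_transfer_seconds(2.0), "s")
    print("full rack:", round(min_transfer_seconds(2.0, trays=TRAYS_PER_RACK), 2), "s")
```

These are best-case numbers; treat them as a ceiling when planning checkpoint loads and dataset staging.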

BUILT WITH NVIDIA · SUPERMICRO · VAST

§ 05 — AVAILABLE NOW, AND BUILT TO GROW

Available now — and built to grow

Two liquid-cooled NVL72 racks are live at our first data center in the Pacific Northwest. The platform is operational and capacity is available now.

Standing up GB200 NVL72 at rack scale typically takes the industry 12–18 months from decision to production. Mantle did it twice as fast — from empty floor to a running, multi-tenant platform with storage, fabric, and managed software included.

That speed reflects the team behind the build. Two racks are running today. An additional 2 MW of capacity comes online through 2026, with more power and locations beyond that. Over the next 18–24 months, Mantle is expanding through a pipeline of land, expertise, and purpose-built data center development — vertically integrated capacity designed for AI workloads.

§ 06 — WHO WE ARE

Mantle is a vertically integrated business spanning data center development and GPU-as-a-service.

75+ hyperscale data centers

7+ GW in operation

2× faster than industry build time

The team behind Mantle has delivered 75+ hyperscale data centers, with 7+ GW in operation, and has spent careers building the infrastructure that powers large-scale AI. That is how we stood up rack-scale Blackwell capacity in months while the market is still queuing. Mantle is pointed forward: inference and fine-tuning at rack scale, with more capacity, more locations, and more platform depth ahead.

§ 07 — GET STARTED

If your team is running — or planning — rack-scale inference or fine-tuning on the largest models, Mantle is available now.

Our platform supports multi-tenancy and tray-level entry; our ideal engagements scale to full NVL72 cabinets and beyond as your workload grows.

Mantle delivers full-rack Blackwell performance: operated, integrated, and live. We would like to be your infrastructure partner.
