Musk's Compute Centers Dissected: Colossus 1 and Colossus 2

Colossus refers to the two AI data centers that xAI[C1] (merged into SpaceX[C2] in February 2026, the combined entity valued at roughly $1.25 trillion) is building around Memphis, Tennessee, and Southaven, Mississippi.[24] Colossus 1 came online in July–September 2024 with roughly 230,000 GPUs and about 300 MW of installed power; Colossus 2 came online in January 2026, targeting 555,000 GPUs and 2 GW. This article dissects its suppliers, contracts, and controversies from the outside in across four spatial layers — the infrastructure layer (land, power supply, water supply, external networking), the internal environment layer (cooling, machine rooms, racks), the server layer (whole machines and OEM/ODM), and the component layer (GPU, CPU, interconnect) — and adds two dimensions that cut across the full stack: financing and downstream tenants. Five subsystem categories — power, cooling, compute, storage, networking — run vertically through every layer; networking in turn splits into the in-rack vertical scale-up (NVLink) and the cluster- and cross-campus scale-out (Ethernet fabric).

Liquid-cooled GPU server racks inside the xAI Colossus data center (source: ServeTheHome[3])

Infrastructure layer: land, power, water, external networking

Land & buildings

Colossus 1 was converted from a 785,000-sqft abandoned Electrolux factory in South Memphis; the building was purchased by Phoenix Investors[C13] in December 2023 for $35 million. xAI was initially told a new-build data center would take 18–24 months, so it pivoted to an existing factory, settling on the site in about a week from seven or eight candidates — the fundamental precondition for its subsequent 122-day build.[2]

For Colossus 2, the xAI-affiliated entity CTC Property LLC purchased a 1-million-sqft warehouse on Tulane Road plus roughly 100 adjacent acres for nearly $80 million in March 2025; separately, the affiliated entity MZX Tech LLC acquired a former Duke Energy[C10] power plant asset in Southaven, Mississippi (114 acres, with power transmission lines, adjacent to TVA’s[C16] combined-cycle gas plant). A third building was purchased in late 2025 and named MACROHARDRR.[2][14]

Both the buildings and the parcels have public title records. The controversy is that residents, city council members, and environmental agencies had almost no advance knowledge — most learned of the project through local news.[2][31]

Power supply

On the grid side, the Colossus 1 site initially had only 8 MW of grid interconnection, upgraded by MLGW[C15] to 50 MW, and in November 2024 TVA approved an increase to roughly 150 MW; xAI committed to fund $24 million to build a 150 MW substation itself, to be handed over to MLGW once complete.[12]

On-site gas turbines are the core of power supply. Early temporary power came from VoltaGrid’s[C9] 14 mobile 2.5 MW units (35 MW total) and Solar Turbines’[C7] 16 MW SMT-130.[12] Scaled power is provided by Solaris Energy Infrastructure (SEI)[C6] on a power-as-a-service basis: its roughly 400 MW currently serves xAI, with xAI accounting for about 67% of its order book (roughly 1,140 MW), and the two formed a joint venture, Stateline Power (Solaris 50.1% / xAI 49.9%).[13][14] The turbines are manufactured by Caterpillar subsidiary Solar Turbines, in the SMT-130 and Titan-350 models (35–38 MW each).[2][7]

Battery storage is provided by Tesla[C8] Megapack (about 168–208 units at Colossus 1, about 200 units at Colossus 2), used to smooth the power fluctuations of training loads; Tesla’s filings disclose that xAI purchased about $191 million of Megapack in 2024 and another $36.8 million in the first two months of 2025.[13]

Power is where the project’s controversies are densest. April 2025 aerial photos showed Colossus 1 had already deployed 35 gas turbines on-site (about 422 MW combined), far exceeding the permitted 15, exploiting the loophole that “portable equipment at the same location for no more than 364 days is exempt from regulation.” The Southern Environmental Law Center (SELC) said the facility is likely Memphis’s largest industrial source of NOx emissions, at an estimated 1,200–2,000 tons per year; in January 2026 the EPA revised the New Source Performance Standards (NSPS), stipulating that large methane gas turbines require a permit even when operating temporarily; in April 2026 the NAACP sued xAI subsidiary MZX Tech, alleging it was illegally operating 27 methane gas turbines at Colossus 2.[2][16][14]

Water supply

The facility is expected to need more than 5 million gallons of water per day, drawn from the Mississippi River and the Memphis aquifer; xAI committed to building a roughly $80 million wastewater (graywater) recycling plant. Memphis’s peak electricity demand is about 3 GW, and SELC has written to TVA urging it to prioritize reliable power for residents.[2]

External and cross-campus networking

In-campus networking performance is extremely strong (see the interconnect discussion in the component layer section), but the link between campuses is another matter. According to Bloomberg, citing people familiar with the matter, xAI originally planned to link three campuses more than 10 miles apart into a single cluster for training, but was stymied by cross-campus latency and aging networking infrastructure, which became one of the technical motivations for later leasing out Colossus 1 wholesale.[21]

Internal environment layer: cooling, machine rooms, racks

Cooling system

Colossus 1 is primarily liquid-cooled: cold-plate direct cooling of the GPUs, with waste heat removed by rear-door heat exchangers, and a coolant distribution unit (CDU) at the bottom of each rack; this system is delivered integrated with the Supermicro[C4] whole machines.[3] Colossus 2 uses a hybrid approach: about half the cooling comes from xAI’s graywater facility, and the other half is air-cooled, with roughly 119 air-cooled chillers providing about 200 MW of cooling capacity by August 2025. Early in construction, xAI at one point rented out about a quarter of all mobile cooling capacity in the U.S. to race the schedule.[7][4]

Machine rooms and racks

In the H100/H200 generation, each rack holds 8 servers totaling 64 GPUs, and 8 racks form a group (512 GPUs), with the full system exceeding 3,000 GPU racks at about 80 kW/rack.[3] The Blackwell generation shifts to the GB200/GB300 NVL72 rack-scale architecture: 72 GPUs per rack, fully interconnected via NVLink, mandatorily liquid-cooled; a fully loaded Colossus 2 is estimated at about 7,700 compute racks, with per-rack power evolving from about 120 kW toward the megawatt level.[7]
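The rack counts above follow from simple division, which the sketch below reproduces. It is a back-of-envelope check, not xAI's own sizing method: the 200,000-GPU figure for the H100/H200 fleet comes from the GPU breakdown later in this article, and the IT-load estimate assumes every rack draws the quoted 80 kW at the same time, which is an assumption.

```python
import math

# Figures quoted in this article; the constant names are illustrative.
GPUS_PER_HGX_SERVER = 8
SERVERS_PER_RACK = 8
RACKS_PER_GROUP = 8
HOPPER_GPUS = 150_000 + 50_000   # H100 + H200 at Colossus 1
HOPPER_KW_PER_RACK = 80          # "about 80 kW/rack"
NVL72_GPUS_PER_RACK = 72
COLOSSUS_2_TARGET_GPUS = 555_000


def racks_needed(total_gpus: int, gpus_per_rack: int) -> int:
    """Whole racks required to house total_gpus (rounds up)."""
    return math.ceil(total_gpus / gpus_per_rack)


gpus_per_hopper_rack = GPUS_PER_HGX_SERVER * SERVERS_PER_RACK    # 64
gpus_per_group = gpus_per_hopper_rack * RACKS_PER_GROUP           # 512
hopper_racks = racks_needed(HOPPER_GPUS, gpus_per_hopper_rack)    # 3125
colossus_2_racks = racks_needed(COLOSSUS_2_TARGET_GPUS, NVL72_GPUS_PER_RACK)  # 7709

# Assumption: every Hopper rack draws the quoted 80 kW simultaneously.
hopper_it_load_mw = hopper_racks * HOPPER_KW_PER_RACK / 1000      # 250.0

print(gpus_per_hopper_rack, gpus_per_group, hopper_racks,
      colossus_2_racks, hopper_it_load_mw)
```

The 3,125 and 7,709 results line up with the "exceeding 3,000" and "about 7,700" rack figures in the text.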

Server layer: whole machines and OEM/ODM

Dual OEM: Supermicro and Dell

Colossus’s servers are supplied by two vendors, Supermicro and Dell[C5], roughly splitting the volume in half.[42] Supermicro supplies 4U Universal GPU liquid-cooled servers, 1U storage nodes, CDUs, and racks, pre-assembled in San Jose before shipping to the site; Dell co-builds servers and systems, with the Blackwell generation using its IR7000 GB200 NVL72 rack.[3][41]

On contracts, Dell’s GB200 server contract exceeds $5 billion and was publicly confirmed in its 8-K; Michael Dell said the signed orders including xAI brought AI server backlog to about $9 billion.[37][38] Supermicro’s contract value and share are undisclosed.[3][6]

Whole-machine form factors: HGX, NVL72, and DGX

xAI uses NVIDIA’s[C3] rack-scale architecture but does not purchase NVIDIA-branded finished whole machines. In the H100/H200 phase it uses NVIDIA’s HGX 8-GPU baseboard (with NVLink/NVSwitch integrated on the board), assembled into servers by the OEMs; in the Blackwell phase it uses NVIDIA’s GB200/GB300 NVL72 rack-scale reference design, with NVIDIA supplying the core parts (Blackwell/Grace, NVLink switch trays, network cards) and the OEMs assembling them into full racks (Supermicro SuperCluster, Dell IR7000, at about $3.7 million per rack); NVIDIA-branded DGX/DGX SuperPOD whole machines are not adopted.[41] There are three reasons for choosing OEM integration over DGX: avoiding the DGX brand premium; ease of customization (xAI uses Spectrum-X Ethernet rather than DGX’s default InfiniBand, and needs a specific liquid-cooling solution); and parallel supply from multiple OEMs to speed deployment and spread capacity bottlenecks.[3]

Storage

The storage platform runs on Supermicro hardware, with VAST Data[C11] as the mainstay (the company self-identifies as Colossus’s data platform); DDN[C12] is also involved (it claims to be the primary storage, and NVIDIA executive Dion Harris publicly named it), but no DDN equipment was seen in ServeTheHome’s teardown video. Both are private companies, contract terms are undisclosed, and there is a dispute over each one’s “primary storage” claim.[10][11]

Component layer: GPU, CPU, interconnect

GPU

The GPUs are supplied by NVIDIA. Colossus 1 has 150,000 H100, 50,000 H200, and 30,000 GB200 (about 230,000 total); Colossus 2 is primarily GB200/GB300, targeting 555,000, with at least 110,000 GB200 NVL72 in the first batch.[2][7]

The hybrid architecture constitutes Colossus 1’s core engineering flaw. Mixing three chip generations (a passive result of the extreme schedule), combined with distributed training requiring each card to synchronize step by step, means the fast GB200 must wait for the slow H100 — the straggler effect — which is exponentially amplified at the 220,000-card scale, dragging Colossus 1’s model FLOPs utilization (MFU) down to about 11% (the industry production-grade level is 35–45%). This is the fundamental technical reason xAI moved training to the pure-Blackwell Colossus 2 and freed up Colossus 1 to lease out externally.[5][21]
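The straggler effect can be illustrated with a toy model of synchronous training, in which every step ends only when the slowest worker finishes. The sketch below is not a reconstruction of xAI's workload. The relative step times (1.0 for an H100-class worker, 0.4 for a GB200-class worker) and the per-step slowdown probability are made-up values chosen only to show the mechanism, and the model does not attempt to reproduce the 11% MFU figure.

```python
def sync_step_time(per_worker_times):
    """A synchronous step ends when the slowest worker finishes."""
    return max(per_worker_times)


def busy_fraction(worker_time, per_worker_times):
    """Share of each step one worker spends computing rather than waiting."""
    return worker_time / sync_step_time(per_worker_times)


def p_any_straggler(n_workers, p_slow):
    """Chance that at least one of n independent workers is slow on a step."""
    return 1.0 - (1.0 - p_slow) ** n_workers


# Hypothetical relative step times: slow (H100-class) = 1.0, fast (GB200-class) = 0.4.
mixed_fleet = [1.0] * 150 + [0.4] * 30
print("step time:", sync_step_time(mixed_fleet))
print("fast worker busy fraction:", busy_fraction(0.4, mixed_fleet))

# Hypothetical 0.01% chance that any one worker hiccups on a given step.
for n in (8, 1_000, 100_000):
    print(n, round(p_any_straggler(n, 1e-4), 4))
```

In this toy setup the fast workers sit idle for most of every step, and the chance that some worker stalls a given step approaches certainty as the worker count grows, which is the intuition behind the claim that the problem worsens with scale.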

CPU

The H100/H200 phase uses x86 CPUs (two per HGX server) rather than NVIDIA’s own ARM-architecture Grace, mainly because schedule took priority and the x86 software environment is mature; the GB200/GB300 phase introduces NVIDIA’s Grace CPU alongside NVL72 (every 2 Grace paired with 4 Blackwell).[3][7]

Interconnect and networking

In-rack vertical interconnect (scale-up) uses NVIDIA NVLink, connecting 72 GPUs within an NVL72 into a single coherent domain.[7] Inter-rack horizontal interconnect (scale-out) uses NVIDIA Spectrum-X Ethernet rather than InfiniBand: the switch is the SN5600 (51.2 Tbps, providing 64 800GbE ports in 2U), and the network card is the BlueField-3 SuperNIC; each GPU gets a dedicated 400GbE network card, with one more added per server, giving each HGX H100 server 3.6 Tbps of Ethernet bandwidth. According to NVIDIA, at the hundred-thousand-GPU scale Spectrum-X achieves about 95% data throughput with zero flow-collision packet loss, versus only about 60% for traditional Ethernet.[1]
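The bandwidth figures above are straightforward to re-derive. The sketch below recomputes them from the quoted port counts and speeds. Its last two lines apply NVIDIA's claimed 95% and 60% throughput figures to a single server's line rate, which is an illustrative simplification (the claim is about cluster-wide behavior at scale), not a measurement.

```python
# Port counts and speeds quoted in the text; names are illustrative.
GPUS_PER_HGX_SERVER = 8
NIC_GBPS = 400
EXTRA_NICS_PER_SERVER = 1
SN5600_PORTS = 64
SN5600_PORT_GBPS = 800


def aggregate_tbps(ports: int, gbps_per_port: int) -> float:
    """Total line rate in Tbps for a number of equal-speed ports."""
    return ports * gbps_per_port / 1000


server_tbps = aggregate_tbps(GPUS_PER_HGX_SERVER + EXTRA_NICS_PER_SERVER, NIC_GBPS)
switch_tbps = aggregate_tbps(SN5600_PORTS, SN5600_PORT_GBPS)

# Assumption: vendor-claimed fabric efficiency applied to one server's line rate.
spectrum_x_tbps = server_tbps * 0.95
traditional_tbps = server_tbps * 0.60

print(f"per server: {server_tbps:.1f} Tbps, per switch: {switch_tbps:.1f} Tbps")
print(f"effective: {spectrum_x_tbps:.2f} vs {traditional_tbps:.2f} Tbps")
```

Nine 400GbE ports give the 3.6 Tbps per server cited above, and 64 ports at 800GbE give the switch's 51.2 Tbps.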

Capital and financing

xAI completed a $20 billion Series E round in January 2026 (originally targeting $15 billion, oversubscribed), at a valuation of about $230 billion, making it the world’s third-largest AI startup at the time; the structure was a GPU-collateralized SPV (about $7.5 billion equity + $12.5 billion debt), with strategic investors including NVIDIA (planning up to about $2 billion) and Cisco[C20].[17][18][19]

GPU procurement is done through a dedicated leasing structure. Valor Compute Infrastructure (VCI), under Valor Equity Partners[C18], purchased NVIDIA GB200 for about $5.4 billion and leases it to xAI subsidiaries on a triple-net lease basis; within this, Apollo[C17] provides $3.5 billion of debt and NVIDIA acts as an anchor limited partner with about $1.9 billion, so that neither buyer nor seller records the asset on its own balance sheet. Apollo deployed about $7 billion into xAI-related transactions within five weeks.[35][36] In addition, xAI has another direct purchase commitment of roughly $18 billion for roughly 300,000 GPUs, whose relationship to the above SPV is not fully disclosed; separately, in July 2025 Morgan Stanley[C19] arranged $5 billion of debt (plus $5 billion of equity).[19]
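The leverage implied by these structures can be read off directly. The sketch below computes the debt share of the VCI vehicle (about $5.4 billion, of which $3.5 billion is Apollo debt and about $1.9 billion is NVIDIA equity) and of the Series E SPV (about $7.5 billion equity plus $12.5 billion debt). The inputs are this article's rounded figures, so the percentages are approximate, and the `CapitalStack` class is a hypothetical helper introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapitalStack:
    """Hypothetical helper: a vehicle funded by debt plus equity (USD billions)."""
    name: str
    debt_usd_b: float
    equity_usd_b: float

    @property
    def total_usd_b(self) -> float:
        return self.debt_usd_b + self.equity_usd_b

    @property
    def debt_share(self) -> float:
        return self.debt_usd_b / self.total_usd_b


vci = CapitalStack("Valor Compute Infrastructure", debt_usd_b=3.5, equity_usd_b=1.9)
series_e = CapitalStack("Series E SPV", debt_usd_b=12.5, equity_usd_b=7.5)

for stack in (vci, series_e):
    print(f"{stack.name}: ${stack.total_usd_b:.1f}B total, "
          f"{stack.debt_share:.1%} debt")
```

Both vehicles come out at roughly five-eighths debt, which is why the GPUs serving as collateral matter so much to the structure.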

This structure has circular-financing features: NVIDIA both sells GPUs to the SPV and holds equity in it, Apollo issues debt through its insurance subsidiary Athene, and Google is both a SpaceX shareholder and its compute tenant. Investor Michael Burry has publicly questioned the NVIDIA/xAI deal, calling it “fugazi.”[36][39]

In 2026, Colossus shifted from an in-house factory for xAI to an operating model of leasing out compute externally. SpaceX went public on the Nasdaq on June 12, 2026 (ticker SPCX): priced at $135 for a valuation of about $1.77 trillion (the largest IPO in history), it closed its first day up 19% at $161 (after a confidential April filing and a public S-1 on May 20). This section lists, in one place, the public contracts on both sides of the two data centers: the customer side (leasing out compute) and the supplier side (procuring equipment, land, power, and capital for the data centers).

Customer side: downstream compute leases (all publicly searchable)

| Customer | Contract content | Term and amount | Source |
| --- | --- | --- | --- |
| Anthropic[C21] | Compute spanning Colossus and Colossus II, about 220,000 GPUs and 300 MW, running Claude inference | $1.25 billion/month, through May 2029, about $45 billion total; 90-day cancellation either way | [33][20][22] |
| Alphabet / Google[C22] | About 110,000 GPUs plus accompanying CPU/memory | $920 million/month, October 2026–June 2029, about $30 billion total; 90-day cancellation | [24][25] |
| Anysphere (Cursor)[C23] | Uses xAI compute; on June 16, 2026, SpaceX announced it would exercise its acquisition right to buy Anysphere all-stock (valued at $60 billion), expected to close in Q3 2026 | Includes an $8.5 billion deferred service fee; partnership disclosed in April 2026 | [34][29] |

The Anthropic and Google leases together total about $2.17 billion/month (about $26 billion/year), roughly 13 times Grok’s annual subscription and API revenue.[28] On verification, no compute contracts between SAP or other vendors and Colossus were found; the agreements with Amazon, Microsoft, and NVIDIA mentioned in Anthropic’s announcement belong to its broader compute sourcing and are not Colossus contracts; the specific campus Google leases is not named in the filings.[25][20]
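The lease totals can be cross-checked with a few lines of arithmetic. The sketch below uses the monthly rates and dates from the customer table ($1.25 billion/month for Anthropic, $920 million/month for Google from October 2026 through June 2029). Counting both the start and end month of the Google term is an assumption, and the Anthropic term length is inferred from the quoted $45 billion total because its start date is not stated here.

```python
# Monthly rates and totals in USD billions, as quoted in the customer table.
ANTHROPIC_USD_B_PER_MONTH = 1.25
ANTHROPIC_TOTAL_USD_B = 45.0
GOOGLE_USD_B_PER_MONTH = 0.92


def months_inclusive(start_year: int, start_month: int,
                     end_year: int, end_month: int) -> int:
    """Number of calendar months in a term, counting both endpoints (assumption)."""
    return (end_year - start_year) * 12 + (end_month - start_month) + 1


google_months = months_inclusive(2026, 10, 2029, 6)
google_total_usd_b = google_months * GOOGLE_USD_B_PER_MONTH

# Inferred, not stated in the source: term implied by total / monthly rate.
anthropic_implied_months = ANTHROPIC_TOTAL_USD_B / ANTHROPIC_USD_B_PER_MONTH

combined_monthly_usd_b = ANTHROPIC_USD_B_PER_MONTH + GOOGLE_USD_B_PER_MONTH
combined_annual_usd_b = combined_monthly_usd_b * 12
implied_grok_revenue_usd_b = combined_annual_usd_b / 13  # "roughly 13 times"

print(f"Google: {google_months} months, about ${google_total_usd_b:.1f}B")
print(f"Anthropic: implied {anthropic_implied_months:.0f} months")
print(f"Combined: ${combined_monthly_usd_b:.2f}B/month, "
      f"${combined_annual_usd_b:.1f}B/year")
print(f"Implied Grok revenue: about ${implied_grok_revenue_usd_b:.1f}B/year")
```

The results agree with the rounded figures in the text: about $30 billion for Google, about $26 billion a year combined, and an implied Grok revenue of about $2 billion a year.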

Supplier side: publicly searchable contracts

| Supplier | Contract content | Amount | Layer | Source |
| --- | --- | --- | --- | --- |
| NVIDIA (via Valor VCI) | GB200 triple-net lease | SPV about $5.4B (Apollo $3.5B debt + NVIDIA about $1.9B equity) | Component layer | [35][36] |
| Dell | GB200 servers | Over $5B (confirmed by 8-K) | Server layer | [37][38] |
| Solaris (SEI) | About 1,140 MW of turbines + Stateline JV | JV has invested about $112M | Infrastructure layer | [14] |
| Tesla | Megapack storage | About $191M in 2024 + $36.8M in first two months of 2025 | Infrastructure layer | [13] |
| Phoenix Investors | Colossus 1 building | $35M | Infrastructure layer | [2] |
| CTC Property / MZX Tech | Land parcels | About $80M for the Colossus 2 area | Infrastructure layer | [14] |
| Morgan Stanley | Arranged debt | $5B | Capital layer | [19] |

Supplier side: known to exist, terms undisclosed

Supermicro (whole machines and cooling), VAST Data and DDN (storage), Solar Turbines/Caterpillar (turbines), VoltaGrid (units), xAI’s direct GPU purchase commitment (about $18 billion), Introl[C14] (cabling subcontract), and MLGW and TVA (power, water, substation) are all confirmed as involved, but their contract amounts, shares, or terms are undisclosed.[3][6][10][11][12][19][30]

Business model assessment

Commentators have debated whether this is a “frontier lab or a compute REIT”: xAI’s pretraining team is reported to have shrunk to fewer than 5 people, Grok’s roughly $2 billion in 2026 product revenue is out of proportion to the $230 billion valuation, and rent has become the core support for the valuation. One irony: Anthropic revoked xAI’s API access in January 2026 because xAI engineers used Claude output to train their own models, yet four months later the two signed a compute contract worth about $15 billion a year.[26][28]

Controversies and risks

Environmental and legal: Colossus 1’s 35 unpermitted gas turbines (only 15 permitted), Colossus 2’s 27 turbines (NAACP lawsuit), Clean Air Act litigation, the EPA’s January 2026 loophole-closing rule, an estimated 1,200–2,000 tons/year of NOx, water consumption, and community health.

Engineering: the engineering debt from the heterogeneous hybrid architecture and cross-campus latency has materially damaged Colossus 1’s training value (MFU about 11%).

Capital: burning over $1 billion a month, with about $12.5 billion of the Series E being GPU-collateralized debt, the $230 billion valuation out of proportion to about $2 billion in product revenue, and a fragile circular-financing structure.

Commercial: the “frontier lab” versus “compute REIT” characterization debate; the trust irony of Anthropic revoking API access and then renting its compute.

Data caveats: many figures are affected by NDAs, vendor self-reporting, and the gap between “nameplate installed” and “actually running” (for example, Colossus 1’s nameplate is about 300 MW while actual running in the S-1 is about 130 MW; Colossus 2’s actual running is about 210 MW). Any single number should be cross-verified.[23]
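The size of that gap is easy to quantify. The sketch below divides the reported running load by the nameplate figure for Colossus 1. For Colossus 2 it uses the 2 GW target from the introduction as the denominator, which is a target rather than an installed nameplate, so that ratio is only indicative.

```python
def running_fraction(running_mw: float, reference_mw: float) -> float:
    """Share of a reference capacity that is actually running."""
    if reference_mw <= 0:
        raise ValueError("reference_mw must be positive")
    return running_mw / reference_mw


# Colossus 1: about 130 MW running against about 300 MW nameplate.
colossus_1_fraction = running_fraction(130, 300)

# Colossus 2: about 210 MW running; 2,000 MW is the stated target, not nameplate.
colossus_2_fraction = running_fraction(210, 2000)

print(f"Colossus 1: {colossus_1_fraction:.0%} of nameplate")
print(f"Colossus 2: {colossus_2_fraction:.1%} of the 2 GW target")
```

On these figures Colossus 1 runs at well under half of its nameplate power, which is why headline megawatt numbers should not be read as operating load.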

Referenced companies

This section lists the companies mentioned in the text along with their official sites and listing status (U.S.-listed ones note the exchange and ticker; private companies and municipal or federal entities are noted separately).

  • [C1] xAI — x.ai — Private (merged into SpaceX in February 2026); includes affiliated entities CTC Property LLC and MZX Tech LLC
  • [C2] SpaceX (SpaceXAI) — spacex.com — Listed June 12, 2026, NASDAQ: SPCX (IPO valuation about $1.77 trillion, the largest ever)
  • [C3] NVIDIA — nvidia.com — NASDAQ: NVDA
  • [C4] Super Micro Computer (Supermicro) — supermicro.com — NASDAQ: SMCI
  • [C5] Dell Technologies — dell.com — NYSE: DELL
  • [C6] Solaris Energy Infrastructure — solaris-energy.com — NYSE: SEI (formerly Solaris Oilfield, ticker SOI)
  • [C7] Caterpillar (subsidiary Solar Turbines, solarturbines.com) — caterpillar.com — NYSE: CAT
  • [C8] Tesla — tesla.com — NASDAQ: TSLA
  • [C9] VoltaGrid — voltagrid.com — Private (Houston)
  • [C10] Duke Energy — duke-energy.com — NYSE: DUK (former owner/seller of the C2 parcel)
  • [C11] VAST Data — vastdata.com — Private
  • [C12] DDN (DataDirect Networks) — ddn.com — Private
  • [C13] Phoenix Investors — phoenixinvestors.com — Private (C1 building owner)
  • [C14] Introl (Introl Solutions) — introl.com — Private (cabling subcontract)
  • [C15] Memphis Light, Gas and Water (MLGW) — mlgw.com — Memphis municipal utility (not investable)
  • [C16] Tennessee Valley Authority (TVA) — tva.com — U.S. federal agency (issues bonds, not stock)
  • [C17] Apollo Global Management — apollo.com — NYSE: APO
  • [C18] Valor Equity Partners (with subsidiary Valor Compute Infrastructure) — valorep.com — Private
  • [C19] Morgan Stanley — morganstanley.com — NYSE: MS
  • [C20] Cisco Systems — cisco.com — NASDAQ: CSCO
  • [C21] Anthropic — anthropic.com — Private
  • [C22] Alphabet (Google) — abc.xyz — NASDAQ: GOOGL / GOOG
  • [C23] Anysphere (Cursor) — cursor.com — Private (being acquired by SpaceX, expected to close in Q3 2026)

References