xAI Used Claude for Distillation — The Bigger Problem

xAI reportedly tapped Claude models to train Grok after being cut off. The scandal isn't the practice — it's how common it probably is.

According to The Information, xAI used Claude models for distillation and training purposes — specifically through personal accounts and the intermediary service Blackbox AI after Anthropic cut them off. This is being framed as a scandal. But the real story is more mundane: this is almost certainly happening across the entire industry.

What Distillation Actually Is

Model distillation is the practice of training a smaller model to mimic a larger one's outputs. It's how the industry has always worked — take a frontier model, use its responses as training data, and you get a cheaper model that inherits much of its capability. Every major AI lab does this. The only difference is whether anyone catches them.

"Every major AI lab does this. The only difference is whether anyone catches them."

The interesting part of this story isn't the distillation itself — it's the method. xAI allegedly used personal accounts and a third-party service (Blackbox AI) to access Claude after being cut off. That's not sophisticated reverse engineering. That's improvising with whatever API was still open.

Why It Matters Anyway

Despite it being common practice, the xAI story matters for three reasons:

First, it shows the dependency. xAI — with billions in funding and access to some of the world's most powerful compute — still found Claude valuable enough to work around restrictions. That's a stronger endorsement than any benchmark.

Second, it's a diplomatic problem. Anthropic embedded engineers inside the NSA for offensive cyber operations. They have a government relationship that's evolving. And now xAI is revealed to have been stealing Claude for training. The timing is awkward.

Third, it exposes the gap between stated values and actual behavior. Every lab talks about safety and responsible development. Few volunteer that they built their models on outputs from competitors' APIs.

The Takeaway

Expect more of these stories. As AI development accelerates, so does the temptation to shortcut. Frontier models are expensive to train, and distilled copies aren't. The xAI story is a data point in a much larger pattern — not an anomaly.

The industry will keep this quiet where it can. But the Anthropic vs. xAI dynamic just got a lot more interesting.

Data via TEXXR