They Spent $100 Billion Training AI. They Forgot 8th Grade Math. 😁

The Hessian has been screaming for 75 years. Nobody listened.

Your Tesla phantom brakes for no reason.

ChatGPT confidently tells you about your dead grandmother’s favorite recipe. She’s not dead. You don’t have a grandmother named Ethel.

Your model? Trained for three weeks, beautiful loss curve, ships to prod, immediately falls on its face.

What the hell is going on?

Same root cause. Every single time.



Here’s the thing nobody tells you: there are three numbers that predict whether training will work or blow up in your face. Whether inference will be stable or hallucinatory. Whether your minimum is real or a trap.

Condition number. Ratio of largest to smallest Hessian eigenvalue. High means your optimizer is zigzagging through a canyon instead of descending a bowl.

Eigenvalue magnitude. Big eigenvalues = sharp minimum = your model memorized noise and will choke on real data. Small = flat minimum = actually learned something generalizable.

Negative eigenvalue count. If any are negative, you’re not at a minimum. You’re at a saddle point. Gradient says “we’re done here” but you’re stuck on a ridge.
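
All three come from the same place: Hessian-vector products, which autograd gives you without ever materializing the n² matrix. Below is a minimal sketch, assuming a PyTorch model and a scalar loss from one batch; the function names (`flat_grad`, `hvp`, `extreme_eigenvalues`) are mine for illustration, not any framework's built-in API, and the power-iteration estimates are deliberately rough.

```python
import torch


def flat_grad(loss, params):
    """Gradient of the loss, flattened, with the graph kept for a second backprop."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    return torch.cat([g.reshape(-1) for g in grads])


def hvp(g, params, vec):
    """Hessian-vector product H @ vec via double backprop; never builds H itself."""
    hv = torch.autograd.grad(torch.dot(g, vec), params, retain_graph=True)
    return torch.cat([h.reshape(-1) for h in hv])


def extreme_eigenvalues(loss, params, iters=50):
    """Rough estimates of the largest and smallest Hessian eigenvalues."""
    g = flat_grad(loss, params)
    n, device = g.numel(), g.device

    # Power iteration -> eigenvalue of largest magnitude (assumed to be the top one).
    v = torch.randn(n, device=device)
    v /= v.norm()
    for _ in range(iters):
        hv = hvp(g, params, v)
        v = hv / hv.norm()
    lam_max = torch.dot(v, hvp(g, params, v)).item()

    # Power iteration on (lam_max * I - H) -> distance down to the smallest eigenvalue.
    w = torch.randn(n, device=device)
    w /= w.norm()
    for _ in range(iters):
        hw = lam_max * w - hvp(g, params, w)
        w = hw / hw.norm()
    lam_min = lam_max - torch.dot(w, lam_max * w - hvp(g, params, w)).item()
    return lam_max, lam_min


# Usage sketch: all three numbers from two estimates.
# params = [p for p in model.parameters() if p.requires_grad]
# lam_max, lam_min = extreme_eigenvalues(loss_fn(model(x), y), params)
# print("sharpness:", lam_max)                                  # big -> sharp minimum
# print("saddle?  :", lam_min < 0)                              # True -> not a minimum
# print("condition:", lam_max / lam_min if lam_min > 0 else float("inf"))
```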

This math has existed since 1950.

It’s not in PyTorch. Not in TensorFlow. Not in JAX. Not anywhere in your stack.


You know what your framework shows you? Loss value. Gradient norm. Learning rate.

That’s it. That’s the whole dashboard.

You’re flying a 747 with a speedometer and good intentions.

OpenAI doesn’t compute this. Google doesn’t compute this. Nobody running a $400K/month GPU cluster is looking at eigenvalues. They’re looking at loss curves and praying. 💀

Why? Because the Hessian is huge — n² entries for n parameters. Except... you don’t need the full matrix. A Hungarian physicist named Cornelius Lanczos figured out how to extract the important eigenvalues in 1950. Linear time. Basically free.
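
For the curious, here is what that 1950 iteration looks like: a minimal textbook-style sketch, written against a generic matrix-vector product (for a network, that would be the Hessian-vector product above, which costs about one extra backward pass per application). The name `lanczos_spectrum` and the toy matrix are mine for illustration, not library code; k products buy you a k-by-k tridiagonal matrix whose eigenvalues approximate the Hessian's extremes.

```python
import numpy as np


def lanczos_spectrum(matvec, dim, k=30, seed=0):
    """Approximate the extreme eigenvalues of a symmetric operator from k matvecs."""
    rng = np.random.default_rng(seed)
    q = rng.standard_normal(dim)
    q /= np.linalg.norm(q)
    q_prev = np.zeros(dim)
    alphas, betas = [], []
    beta = 0.0

    for _ in range(k):
        w = matvec(q) - beta * q_prev       # one operator application per step
        alpha = q @ w
        w -= alpha * q
        beta = np.linalg.norm(w)
        alphas.append(alpha)
        if beta < 1e-10:                    # exhausted the reachable subspace; stop early
            break
        betas.append(beta)
        q_prev, q = q, w / beta

    # Eigenvalues of the small tridiagonal matrix (Ritz values) track the extreme
    # eigenvalues of the full operator -- no n-by-n matrix is ever built.
    off = betas[: len(alphas) - 1]
    T = np.diag(alphas) + np.diag(off, 1) + np.diag(off, -1)
    return np.linalg.eigvalsh(T)


# Toy check on a matrix small enough to inspect directly:
A = np.diag([100.0, 3.0, 1.0, 0.5, -2.0])   # ill-conditioned, one negative direction
print(lanczos_spectrum(lambda v: A @ v, dim=5, k=5))
# ~ [-2, 0.5, 1, 3, 100]: saddle flag, sharpness, and condition number in one pass
```

Swap the toy matvec for the Hessian-vector product and the same handful of lines scales to real models: the price is roughly k extra backward passes per diagnostic, not n² anything.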

Seventy-five years later, still not productized.

The industry scaled to $100 billion in compute. Nobody spent a weekend adding spectral diagnostics to the training loop.


I wrote the whole thing up: what the Hessian actually tells you, why frameworks ignore it, and the 75-year-old algorithms that could fix training failures before they happen.

→ The Missing Mathematics of AI


If you’ve ever watched a loss plateau for several hours wondering whether to kill the run or wait — this essay is the answer you didn’t have.

Send it to your ML team. Or don’t. Keep flying blind. Your call.
