Build First Brain Journal

Why AI Video Hallucinates Physics (and How to Spot It)

The machine renders a convincing surface over a world it does not understand. The cracks are physical.

TL;DR

The most reliable way to detect AI-generated video is not a detector tool, which lags every new model, but physics. AI video learns which pixels tend to follow which, not how the world works, so it routinely breaks gravity, object permanence, and cause and effect: objects morph or vanish, shadows disagree with the light, actions leave no consequence. Provenance signals like C2PA help when present. The durable detector is a mind with a strong physical model of reality.

How to detect AI-generated video: start with physics

The fastest, most durable way to spot an AI-generated video is not a detector app. It is physics. An AI video model does not understand the world; it has learned, from enormous amounts of footage, which pixels tend to follow which other pixels. That is a statistical model of appearance, not a model of reality, and the gap shows up wherever the real world obeys a rule the model never actually learned.

This is not a critic’s guess. When OpenAI introduced its video model, Sora, it acknowledged that the model struggles to simulate complex physics, to understand cause and effect, and to keep objects consistent. The often-cited example is a character who bites a cookie that then shows no bite mark. The cognitive scientist Gary Marcus catalogued the deeper pattern in Sora’s surreal physics: objects that vanish, deform, or multiply, a gross and general failure of object permanence, something even a young infant has mastered. The machine renders a plausible-looking surface over a world it does not comprehend.

The checklist of physical tells

Because the failure is physical, the detection is too. Run a clip against the rules the world never breaks.

| What to test | How the real world behaves | The AI’s typical failure |
| --- | --- | --- |
| Object permanence | Objects persist when briefly hidden | Things appear, vanish, deform, or multiply |
| Gravity and momentum | Mass falls and moves predictably | Floating, sliding, weightless or jerky motion |
| Shadows and reflections | Match the light source and the scene | Disagree with the light, or are missing entirely |
| Cause and effect | A bite leaves a mark, a foot leaves a print | Actions that leave no consequence behind |
| Fine structure | Hands, teeth, and text stay stable | Extra fingers, melting letters, shifting detail |

You do not need all five. One clear violation of physics is usually enough, and it is far more reliable than squinting at skin texture. Newer models have cleaned up the old surface-level tells, while forensic detection still leans on temporal coherence and physics, precisely because those are the hardest things to fake.
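For anyone who reviews clips regularly, the checklist can be kept as a small log of what a human reviewer noticed. The sketch below is one possible way to do that, not an automated detector: the names `PHYSICAL_TELLS`, `Observation`, and `review` are hypothetical, and every observation is assumed to be entered by a person watching the clip.

```python
from dataclasses import dataclass

# The five checks from the checklist above. The keys are illustrative names.
PHYSICAL_TELLS = {
    "object_permanence": "Things appear, vanish, deform, or multiply",
    "gravity_and_momentum": "Floating, sliding, weightless or jerky motion",
    "shadows_and_reflections": "Disagree with the light, or are missing entirely",
    "cause_and_effect": "Actions that leave no consequence behind",
    "fine_structure": "Extra fingers, melting letters, shifting detail",
}


@dataclass
class Observation:
    """One violation a human reviewer noted while watching a clip."""
    tell: str           # must be a key of PHYSICAL_TELLS
    timestamp_s: float  # where in the clip it was seen, in seconds
    note: str           # free-text description of what looked wrong


def review(observations):
    """Summarise a reviewer's notes. One clear violation is enough to flag."""
    unknown = sorted({o.tell for o in observations if o.tell not in PHYSICAL_TELLS})
    if unknown:
        raise ValueError(f"unknown tell(s): {unknown}")
    violated = sorted({o.tell for o in observations})
    return {"suspect": bool(violated), "violated": violated}


if __name__ == "__main__":
    notes = [Observation("cause_and_effect", 3.2, "cookie bitten, no bite mark")]
    print(review(notes))
```

The function flags a clip on a single logged violation and never declares a clip genuine, which mirrors the rule of thumb above: an empty log only means nothing was noticed.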

Why detector tools and watermarks are not enough

Two tempting shortcuts both fall short. The first is an AI-detector tool, but detectors lag every new model and the targets keep moving. OpenAI’s own update notes that a newer model improved at physics and that object permanence began to emerge from sheer scale, which means today’s reliable tell can quietly disappear in the next release. The second is provenance: standards like C2PA attach a verifiable record of where a file came from and how it was edited, and they genuinely help. But missing or stripped credentials do not prove that a video is fake, and their presence does not guarantee that it is authentic, so provenance is a strong signal, not a verdict.

Both shortcuts share the same flaw. They try to move the judgment outside your head, onto a tool or a label, exactly when the tools are in a losing race with the generators. You cannot fully outsource this.
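The "signal, not a verdict" distinction can be made concrete with a small sketch. It assumes the inputs have been gathered elsewhere: whether a provenance record was found and verified, how many physics violations a human reviewer counted, and optionally a score from some detector tool. The function name `weigh`, its parameters, and the labels are hypothetical, and no real C2PA or detector API is called.

```python
from typing import Optional


def weigh(credentials_present: bool,
          physics_violations: int,
          detector_score: Optional[float] = None) -> dict:
    """Collect evidence about a clip without ever returning 'real' or 'fake'."""
    evidence = []

    # Provenance: useful when present, uninformative when absent.
    if credentials_present:
        evidence.append("provenance record present: strong signal, not proof")
    else:
        evidence.append("no provenance record: proves nothing either way")

    # Detector output is recorded as one input and never decides the label.
    if detector_score is not None:
        evidence.append(f"detector score {detector_score:.2f}: may lag new models")

    # Physics violations observed by a person are the strongest single tell.
    if physics_violations > 0:
        evidence.append(f"{physics_violations} physics violation(s) observed")
        label = "suspect"
    elif credentials_present:
        label = "no red flags"
    else:
        label = "undetermined"

    return {"label": label, "evidence": evidence}


if __name__ == "__main__":
    print(weigh(credentials_present=False, physics_violations=0))
```

The design choice worth noticing is that missing credentials lead to "undetermined" rather than "suspect", and that the final judgment is still left to the person reading the evidence list.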

The real detector is a physical model of reality

Spotting the hallucination is, at bottom, an act of comparison. You notice the floating cup or the impossible shadow because you hold a dense, accurate internal model of how the world behaves, and the clip violates it. The richer that model, the faster and more confidently you catch the error, even on footage no detector has been trained against yet. The skill is not media literacy as a checklist; it is a well-built First Brain with a strong grip on physical reality.

That is the same lesson we drew about text in bypassing the AI sludge web: when the flood is endless and the tools lag, the filter has to live in your head. It helps to understand why these systems hallucinate in the first place, which is a matter of how large language and generative models work and whether they understand what they produce. They do not; they model patterns, not meaning. Your defense is to model meaning better than they do, through the connecting work of cognitive mapping. That is the argument of Building Your First Brain, free for the first 1,000 readers.

Frequently asked questions

How do you detect AI-generated video?

Test its physics. AI video models learn pixel patterns, not how the world works, so they break object permanence, gravity, shadows, and cause and effect: objects vanish or multiply, things float, shadows disagree with the light, and actions leave no consequence. Detector tools and provenance signals like C2PA help but lag and are not definitive. As Building Your First Brain by Lawrence Arya argues, the durable detector is a mind with a strong physical model of reality, a First Brain.

Why does AI video look wrong or weird?

Because the model is rendering a plausible-looking surface over a world it does not understand. It predicts what pixels usually come next rather than simulating real objects, so it produces convincing texture but breaks the underlying rules, hence morphing objects, impossible motion, and effects without causes.

Can AI video detectors be trusted?

Only partially. Detectors lag behind each new generation of models, and the tells they rely on can vanish as the models improve. They are a useful input, not a verdict. Combining them with provenance checks, source history, and your own physical judgment is far more reliable than trusting any single tool.

What is the easiest sign of an AI video?

A clear physics violation, usually object permanence: an object that appears, disappears, deforms, or duplicates between frames, or an action that leaves no consequence, like eating without a bite mark. One unmistakable break in the world’s rules is the strongest single tell.

Does C2PA prove a video is real?

No. C2PA provenance records where a file came from and how it was edited, which is valuable when present, but missing or stripped credentials do not prove a video is fake, and their presence does not guarantee authenticity. Treat provenance as a strong signal to weigh alongside physics and source checks, not as proof on its own.

Tagged: AI Video, Deepfake Detection, Physics, First Brain, Synthetic Media