In July 2025, researchers unveiled an AI called Centaur, built on a standard large language model and fine-tuned on data from psychological experiments. The goal was to make the machine mimic human-like thinking, handling everything from simple choices to complex executive-control tasks. In early trials, Centaur appeared to breeze through 160 different cognitive tests, sparking headlines that it might be a step toward true artificial reasoning.

But a fresh analysis now warns that the hype may be premature. The study found that while Centaur could produce the right answers, it often did so by spotting statistical patterns rather than grasping the underlying concepts, much like a student who memorizes test formats without understanding the material. The researchers liken it to a black box that can hallucinate or misinterpret when faced with unfamiliar phrasing.

The takeaway? Impressive scores alone don't prove genuine comprehension. To truly gauge AI intelligence, scientists need diverse, real-world challenges that test meaning, not just pattern matching. Until then, models like Centaur remain powerful tools that lack the deep, human-like understanding researchers hope to achieve.