January 10, 2026 - AI pioneer Andrew Ng has introduced a rigorous new benchmark titled the 'Turing-AGI Test' to address the rampant hype and lack of clarity surrounding Artificial General Intelligence (AGI). As tech giants increasingly claim to be on the cusp of achieving human-level intelligence, Ng argues that current metrics, such as static datasets like GPQA or SWE-bench, are insufficient and often misleading because they can be indirectly tuned or 'gamed' by developers. This proposal aims to provide a practical, work-oriented standard that aligns more closely with public expectations of what a truly intelligent system should be able to achieve in a professional environment.
Unlike the original Turing Test, which focused on text-based deception, Ng's version requires an AI to perform complex, multi-day work tasks (such as operating as a call centre agent or a research assistant) using standard software and internet access. Writing on X, as noted by the Indian Express, Ng explained: "A computer passes the Turing-AGI Test if it can carry out the work task as well as a skilled human." He suggests that this approach prevents developers from relying on narrow slivers of intelligence and instead probes the genuine generality and adaptability of the model's capabilities across diverse, unannounced scenarios.
The debate over AGI definitions has intensified recently, with industry leaders like Demis Hassabis and Yann LeCun clashing over whether intelligence is broadly general or highly specialised. Ng's intervention is timely, as the industry faces a potential AI bubble driven by over-promising and semantic ambiguity. By shifting the focus to functional utility and professional competence, the Turing-AGI Test could help recalibrate market expectations and guide more realistic investment in machine learning and neural networks. It also addresses the ethical concern of misleading students and CEOs about the imminent arrival of systems that can replace human labour entirely, which Ng argues can lead to harmful investment decisions.
Our view: We welcome this shift toward objective, performance-based evaluation in the AI sector. The term AGI has become so diluted by marketing departments that it has almost lost its scientific utility. By anchoring the definition of intelligence to the ability to execute meaningful, multi-step professional tasks, Ng provides a much-needed reality check for the industry. This focus on 'work-ready' AI encourages the development of systems that provide tangible economic value rather than just impressive, yet narrow, conversational tricks. It is a necessary step toward mature, accountable technology development that prioritises utility over speculative hype.