Will LLMs actually lead to AGI?
François Chollet (ex-Google, AI Researcher) & Mike Knoop (Co-Founder, Head of AI at Zapier)
Credit and Thanks:
Based on insights from Dwarkesh Patel.
Today’s Podcast Host: Dwarkesh Patel
Title
LLMs won’t lead to AGI - $1,000,000 Prize to find true solution
Guests
François Chollet
Mike Knoop
Guest Credentials
François Chollet is a prominent figure in artificial intelligence, best known as the creator of the Keras deep learning library. He worked as a Senior Staff Engineer at Google for over 9 years, focusing on deep learning research and the development of Keras and TensorFlow. Chollet holds a Master of Engineering from ENSTA Paris and has made significant contributions to the field, including publishing influential papers and books on deep learning. He was also mentioned in TIME's list of the 100 most influential people in AI.
Mike Knoop is a co-founder of Zapier and currently serves as the Head of Zapier AI. He has been with the company since its inception in 2011, holding various leadership roles including President and Head of Labs. Prior to Zapier, Knoop worked as a Graduate Researcher at the University of Missouri, where he also earned his Bachelor of Science in Mechanical Engineering. Under his leadership, Zapier has reached $150 million in revenue and over 10 million users.
Podcast Duration
1:34:39
This Newsletter Read Time
Approx. 6 mins
Brief Summary
François Chollet, an AI researcher, creator of Keras, and formerly of Google, discusses the challenges and implications of achieving artificial general intelligence (AGI) with Dwarkesh Patel. They delve into the limitations of large language models (LLMs) on the Abstraction and Reasoning Corpus (ARC) benchmark, also introduced by Chollet, which is designed to measure an AI's ability to generalize and solve novel problems. The conversation highlights the need for innovative approaches to advance AI research and the importance of core knowledge in developing intelligent systems.
Deep Dive
François Chollet and Mike Knoop explored the intricacies of the ARC benchmark, a novel test designed to evaluate machine intelligence in a way that resists the pitfalls of memorization. Unlike traditional benchmarks that often allow models to rely on rote learning, ARC requires a deeper understanding and the ability to adapt to new, unseen challenges. Each puzzle within the ARC framework is crafted to be novel, meaning that even if a model has access to vast amounts of data, it cannot simply recall a solution; it must synthesize a response based on core knowledge, akin to what a young child might possess.
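For readers who have not seen an ARC task, each one consists of a handful of input/output grid pairs plus a test input; the solver must infer the transformation from the demonstrations alone and apply it to the test grid. Below is a minimal illustrative sketch in Python of what such a task looks like; the grids and the hidden rule are invented for illustration and are not taken from the actual ARC dataset.

```python
# Illustrative ARC-style task: grids are small 2D arrays of color indices (0-9).
# The hidden rule in this made-up example is "mirror the grid left-to-right".
task = {
    "train": [  # demonstration pairs the solver may study
        {"input":  [[1, 0, 0],
                    [1, 1, 0]],
         "output": [[0, 0, 1],
                    [0, 1, 1]]},
        {"input":  [[2, 2, 0],
                    [0, 2, 0]],
         "output": [[0, 2, 2],
                    [0, 2, 0]]},
    ],
    "test": [   # the solver must produce the output for this input
        {"input": [[3, 0, 0],
                   [0, 3, 3]]}
    ],
}

def solve(grid):
    """Apply the inferred rule: mirror each row."""
    return [row[::-1] for row in grid]

# The inferred rule must reproduce every demonstration exactly...
for pair in task["train"]:
    assert solve(pair["input"]) == pair["output"]

# ...before it is applied to the unseen test input.
print(solve(task["test"][0]["input"]))  # [[0, 0, 3], [3, 3, 0]]
```

The point of the benchmark is that the rule differs from task to task, so a solver cannot hard-code `solve`; it has to derive it afresh from the few demonstrations given.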
Chollet expressed skepticism that large language models (LLMs) will lead to artificial general intelligence (AGI). He argued that while LLMs excel at tasks involving memorization and pattern recognition, they fundamentally lack the ability to adapt to novel situations on the fly, which is a critical component of true intelligence. For a model to be considered on the path to AGI, he emphasized, it would need to demonstrate genuine reasoning and the capacity to synthesize new solutions from core knowledge rather than relying on memorized information. In his view, current models are essentially large interpolative memories: their performance depends heavily on the data they were trained on, and they cannot generate genuinely new solutions.
Chollet pointed out that if a future LLM were to achieve a high score on the ARC benchmark, it would be essential to analyze how that score was obtained. If the model had simply trained on a vast number of similar tasks, it would still be relying on memorization rather than demonstrating true intelligence. He believes the current focus on LLMs has diverted attention and resources away from other innovative approaches that could contribute to the development of AGI, and suggests that a hybrid model combining deep learning with discrete program synthesis may be necessary to advance toward genuine machine intelligence.
The discussion also touched on the implications of achieving artificial general intelligence (AGI) and whether it is necessary for automating most jobs. Chollet suggested that while AGI would undoubtedly enhance automation capabilities, many tasks could be automated without reaching that level of intelligence. The current trajectory of AI development, particularly with LLMs, indicates that while they can perform specific tasks efficiently, they do not possess the broader cognitive abilities that characterize human intelligence.
Knoop shared his personal journey of becoming fascinated by the ARC benchmark, describing how he initially encountered Chollet's work during the COVID-19 pandemic. This led him to delve deeper into the concept of AGI and the challenges posed by the ARC puzzles. His enthusiasm culminated in the launch of the million-dollar ARC Prize, aimed at incentivizing researchers to tackle the benchmark and explore new methodologies that could push the boundaries of AI capabilities.
The ARC Prize is not just a financial incentive; it represents a critical opportunity to resist benchmark saturation, a phenomenon where models become overly specialized in solving specific tasks. Chollet expressed concern that the current focus on LLMs has led to a stagnation in innovative research directions. By encouraging diverse approaches to solving ARC, the prize aims to foster a more open and collaborative research environment, reminiscent of the early days of AI development when sharing ideas and methodologies was commonplace.
As the conversation progressed, they examined the performance of frontier models compared to open-source alternatives on the ARC benchmark. Chollet noted that while some frontier models have achieved impressive results, they often rely on extensive pre-training on similar tasks, which can lead to overfitting. In contrast, open-source models, which may not have the same level of resources, can still demonstrate significant potential if they leverage innovative techniques such as test-time fine-tuning.
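Test-time fine-tuning, as referenced here, briefly adapts a copy of the model on the demonstration pairs of the specific puzzle it is about to solve, rather than predicting from frozen weights alone. The sketch below is a rough illustration of that idea assuming a HuggingFace-style causal language model; the function names, serialization step, and hyperparameters are placeholders, not the exact recipe used by any particular ARC entry.

```python
import copy
import torch

def test_time_finetune(model, demo_pairs, encode, steps=32, lr=1e-4):
    """Take a few gradient steps on this task's demonstration pairs,
    then return the adapted copy for predicting the test output."""
    adapted = copy.deepcopy(model)      # leave the base weights untouched
    adapted.train()
    optimizer = torch.optim.AdamW(adapted.parameters(), lr=lr)

    for _ in range(steps):
        for inp, out in demo_pairs:
            tokens = encode(inp, out)   # serialize the grid pair into token ids
            loss = adapted(input_ids=tokens, labels=tokens).loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

    adapted.eval()
    return adapted                      # use this copy only for this one task
```

The key point is that the adaptation happens per task, at inference time, which is one way of adding the on-the-fly adaptability Chollet argues pure pretraining lacks.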
Possible solutions to the challenge posed by the ARC benchmark include a hybrid approach that combines the strengths of deep learning with discrete program synthesis. This would allow models not only to memorize patterns but also to generate new solutions from a limited set of examples. Chollet emphasized that the future of AI progress lies in merging these two paradigms, creating systems that can adapt and learn efficiently in the face of novel challenges.
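One way to picture this hybrid is a search over a small domain-specific language (DSL) of grid operations, keeping only programs that reproduce every demonstration pair. The toy sketch below enumerates programs by brute force; in the hybrid vision described above, a learned model would instead propose which primitives to try first. The DSL and examples here are invented for illustration.

```python
from itertools import product

# Toy DSL of grid primitives; a real system would use a much richer set.
PRIMITIVES = {
    "identity":  lambda g: g,
    "mirror":    lambda g: [row[::-1] for row in g],
    "flip":      lambda g: g[::-1],
    "transpose": lambda g: [list(col) for col in zip(*g)],
}

def synthesize(demo_pairs, max_depth=2):
    """Return the first composition of primitives that maps every
    demonstration input to its demonstration output."""
    names = list(PRIMITIVES)
    for depth in range(1, max_depth + 1):
        for program in product(names, repeat=depth):
            def run(grid, program=program):
                for name in program:
                    grid = PRIMITIVES[name](grid)
                return grid
            if all(run(inp) == out for inp, out in demo_pairs):
                return program, run
    return None, None

demos = [([[1, 0], [1, 1]], [[0, 1], [1, 1]])]   # hidden rule: mirror left-to-right
program, run = synthesize(demos)
print(program)                # ('mirror',)
print(run([[2, 0], [0, 2]]))  # [[0, 2], [2, 0]]
```

The program found this way is discrete and interpretable, while the deep-learning half of the hybrid would supply the intuition about which programs are worth trying, keeping the search tractable.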
In summary, the dialogue between Chollet and Knoop illuminated the complexities of AI development, particularly in the context of the ARC benchmark. Their insights underscore the importance of fostering innovation and collaboration in the field, as well as the need to redefine our understanding of intelligence in machines. The ARC Prize stands as a testament to this vision, inviting researchers to explore uncharted territories in the quest for true machine intelligence.
Key Takeaways
The ARC benchmark is designed to test machine intelligence by requiring adaptability and core knowledge rather than memorization.
Current LLMs struggle with tasks that demand novel problem-solving capabilities, highlighting their limitations in achieving AGI.
The shift towards LLMs in AI research may be hindering progress in other innovative areas, necessitating a more open research environment.
Actionable Insights
Researchers should explore hybrid models that integrate deep learning with discrete program synthesis to enhance problem-solving capabilities in AI.
AI practitioners can focus on developing benchmarks that prioritize adaptability and core knowledge to better assess machine intelligence.
Organizations should advocate for open research practices to encourage collaboration and innovation in the AI community.
Why it’s Important
The discussion emphasizes the critical need for a paradigm shift in AI research methodologies. By recognizing the limitations of current models and the importance of adaptability, the field can move closer to achieving AGI. This understanding is vital for developing systems that can genuinely learn and adapt, rather than merely regurgitating learned information.
What it Means for Thought Leaders
Thought leaders must grasp the limitations of current AI models, particularly large language models (LLMs), in achieving true artificial general intelligence (AGI). By emphasizing the importance of adaptability and core knowledge, leaders can better navigate the evolving landscape of AI research and development. Furthermore, the introduction of the ARC Prize highlights the necessity of collaboration and open research, urging leaders to foster environments that prioritize knowledge sharing and interdisciplinary approaches. This perspective not only informs strategic decision-making but also positions thought leaders at the forefront of advancing AI technology in meaningful ways.
Mind Map

Key Quote
"General intelligence is the ability to approach any problem, any skill, and very quickly master it using very little data."
Future Trends & Predictions
As the AI landscape evolves, there is likely to be a growing emphasis on developing models that can adapt to new situations without extensive retraining. This could lead to the emergence of more sophisticated AI systems capable of genuine reasoning and problem-solving. Additionally, the push for open research may foster a resurgence of innovative methodologies, potentially accelerating the path toward AGI. The integration of multimodal models that can process and understand various types of data may also become a focal point in future AI developments.
Check out the podcast here:
Latest in AI
1. Tech companies are rapidly developing AI shopping agents to automate online purchasing, enhancing convenience for consumers. These agents can browse multiple retail websites, compare prices, and complete transactions, significantly streamlining the shopping experience. For instance, Perplexity has launched an AI shopping assistant that allows users to issue simple commands to find and purchase products directly. As competition heats up, major players like OpenAI and Google are also entering the space, aiming to redefine how consumers engage with e-commerce through personalized and efficient AI-driven solutions.
2. Salesforce CEO Marc Benioff envisions a future where humans collaborate with AI agents and robots, emphasizing the concept of "digital labor" in the workplace. During a recent earnings call, he highlighted that companies are already integrating AI agents to enhance productivity and streamline operations, positioning Salesforce as a key provider of these technologies. Benioff believes that rather than replacing jobs, AI will augment human capabilities, allowing workers to focus on more complex tasks while automation handles routine responsibilities.
3. Google's attempts to keep AI out of the search trial remedies are facing significant challenges as the Department of Justice targets the company's position in generative AI. The DOJ is considering remedies that include requiring Google to allow websites to opt out of training or appearing in AI products, and potentially sharing data and models used for AI-assisted search features with competitors. These proposed remedies aim to prevent Google from leveraging its search monopoly into emerging AI markets, reflecting concerns about the tech giant's ability to control the AI ecosystem.
Useful AI Tools
1. LoRA-SB is a new fine-tuning method that brings full fine-tuning performance to low-rank adapters for large language models.
2. Researchers introduced a new method called Diversity-driven EarlyLate Training (DELT) to improve dataset distillation for large-scale tasks.
3. Chat-logger is a simple utility that builds a JSONL dataset from the messages sent back and forth between a user and an API.
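For context on the last item: JSONL simply means one JSON object per line, which makes such logs easy to append to and easy to feed into fine-tuning pipelines. The snippet below is a generic illustration of that format; the field names are assumptions for the example, not Chat-logger's actual schema.

```python
import json

# Hypothetical chat turns; field names are illustrative only.
turns = [
    {"role": "user", "content": "Summarize the ARC benchmark in one sentence."},
    {"role": "assistant", "content": "ARC asks a solver to infer a novel grid "
                                     "transformation from a few examples."},
]

# Append one JSON object per line to build up a JSONL dataset over time.
with open("chat_log.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps({"messages": turns}) + "\n")
```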
Startup World
1. Protect AI, a US-based startup focused on AI Security Posture Management, has enhanced its capabilities by acquiring SydeLabs, enabling automated simulation of attacks against GenAI systems. The company closed a $60 million Series B funding round in August, bringing its total funding to $108.5 million. Protect AI aims to provide greater visibility, management, and security for AI/ML environments, along with remediation and governance capabilities.
2. WitnessAI raised $27.5 million in Series A funding in May and recently added retired U.S. Army General Paul M. Nakasone to its board of directors. WitnessAI, a US startup founded in 2023, launched its Secure AI Enablement Platform in October, offering security and governance guardrails for third-party GenAI applications. The platform provides unified policy control, shadow IT visibility, and controls for first-party apps powered by Large Language Models.
3. Magnetar, a US-based hedge fund, has launched its first venture capital fund, Magnetar AI Ventures, with a substantial commitment of $235 million aimed at investing in early to growth-stage AI companies. The fund will focus on various sectors within the AI stack, including models, infrastructure, and applications, and is designed to provide portfolio companies with access to high-performance computing resources in exchange for equity.
Analogy
Traditional AI benchmarks are like exams where students can ace the test by studying past papers. The ARC benchmark, however, is more like a riddle contest: each puzzle is fresh and unfamiliar, demanding genuine reasoning and adaptability. Just as you can't memorize your way to solving a novel riddle, AI models must synthesize solutions from fundamental principles, not pre-learned patterns, to truly demonstrate intelligence.
Thanks for reading, have a lovely day!
Jiten-One Cerebral
All summaries are based on publicly available content from podcasts. One Cerebral provides complementary insights and encourages readers to support the original creators by engaging directly with their work; by listening, liking, commenting or subscribing.