The Race to AI Supremacy: Analyzing OpenAI, Google, and Meta’s Strategic Approaches
I have long wondered how OpenAI managed to deliver the breakthrough in the AI industry before Google, and why it now seems so difficult to catch up. Meta, with its radical “open source” strategy, could indeed disrupt the industry, but it too is constrained by the scaling laws: most independent researchers are GPU-poor. Although modern LLMs can be developed more cheaply and efficiently with fewer resources, the scaling laws remain a key issue, particularly regarding data and computational power. This challenge is significant for independent researchers and even leading university labs, as Fei-Fei Li has noted with concern. That concern led to government subsidies for shared GPU projects during the Biden administration, aiming to democratize access to advanced computing resources through initiatives like the National Artificial Intelligence Research Resource (NAIRR). Over the past years, I tried to investigate this question in the field: most of the hackathons I joined revolved around Google PaLM 2 and Gemini, since public information about those products was scarce at the time. Here is what I found.
OpenAI has distinguished itself with a unique approach to AI development. It focuses on reverse engineering human thinking rather than attempting to build a brain-like system from scratch. OpenAI’s philosophy is to investigate the functions and outcomes of human cognition and design its language models to replicate these mechanisms as closely as possible. This approach emphasizes squeezing the most out of a model’s capabilities to mimic human-like reasoning and adaptability. The result is a versatile and intuitive AI that mirrors human cognitive functions, providing a competitive edge in creating more human-like interactions and responses.
On the other hand, Google’s approach is heavily driven by engineering and frontier AI research. Google’s philosophy does not focus on human thinking or brain abstraction. Instead, it relies on strict engineering principles to explore all possibilities, in NLP as elsewhere (see this evolution), much like its development of search algorithms. This engineering-driven approach leverages Google’s strength in scaling and optimizing cloud infrastructure, allowing it to develop practical and scalable AI solutions. Its focus on engineering expertise ensures that its AI solutions are robust, efficient, and highly scalable, making them well suited for a wide range of applications.
Meta, led by Yann LeCun’s vision, takes a brain-inspired, research-intensive approach. Meta deeply analyzes the brain’s functions and anatomy, aiming to integrate these insights into its AI models, the Llama series. The team is skeptical of relying solely on LLM scaling and believes that human intervention is needed to build components analogous to the functions of the brain’s cortex. Meta’s focus is on creating a “world model” for AI, combining LLMs with brain-inspired components to enhance understanding of and interaction with the real world. This long-term, research-oriented approach aims to develop AI with more profound and integrated cognitive functions, setting Meta apart in its ambition to create truly brain-like AI.
Interestingly, despite Google’s pioneering role in AI, it has faced internal challenges that may have slowed its progress. Reports suggest that Google’s bureaucratic procedures and siloed structure have led to friction between intellectual factions: some groups within Google are strongly pro-AI-safety, while others believe these concerns are overblown and rooted in fantasy. This internal discord may have contributed to Google’s slower response to the rapid advances made by competitors. OpenAI’s successful launch of ChatGPT on November 30, 2022, caught Google off guard, even though Google had pioneered similar technologies years before.
While Google’s bureaucratic and siloed structure may have hindered its agility in responding to market changes that OpenAI navigated successfully, this is a challenge larger organizations inevitably face. OpenAI is beginning to encounter similar issues as it grows, highlighted by its shift toward prioritizing business demands over AI (or AGI) safety, which led to the resignation of key figures like Ilya Sutskever. This mirrors the departure of notable AI scholars such as Geoffrey Hinton, who left Google to advocate for AI safety. Google’s organizational culture prioritizes engineering over mimicking human thinking, but that does not mean the latter focus is entirely absent within Google. Other factors being equal, I tend to think the underlying strategy of each actor will ultimately dictate its success in AI. The Strawberry project may, once again, make history repeat itself (see the inspiring idea STaR, cf. this link and this link, aka “chain-of-reason”).
Ultimately, the foundational strategy of each entity shapes its AI success. OpenAI’s emphasis on replicating human cognitive mechanisms drives its passion for delivering state-of-the-art AI services, whether through its flagship chatbot or API offerings. This approach parallels how Midjourney thrives in image generation by empowering artists to create memorable works, emphasizing AI as a tool to augment human creativity rather than replace it. Meta’s innovative brain-inspired research represents a long-term, research-intensive approach that could potentially revolutionize the field but requires significant time and investment.
PS: We acknowledge Google’s latest all-in AI strategy to shift the tide by integrating its flagship AI into “all products and services,” including the next-generation browser, with Gemini Nano shipping free of charge in Chrome Canary. Google dominates browser market share at around 65%, search engine market share at 91%, and Android at 70%, which still makes it a viable “internet resource” player. However, it is also true that Google cannot beat OpenAI on the chatbot battlefield. The “connectivity architecture” war is more dynamic and cannot be assessed on a short-term horizon.
I don’t understand the “hype” in the AI community speculating that open-source models like the latest Llama 3.1 405B can “close the gap” with leading closed-source models like GPT-4o or Claude 3.5 Sonnet. After spending time reading its paper, “The Llama 3 Herd of Models,” which details across 92 pages how the team adopted a straightforward dense architecture over a sparse MoE architecture and optimized its training scaling to fit a limited budget, it is crucial to understand the model’s limitations.
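The dense-versus-MoE distinction the paper makes can be sketched in a few lines. This is a toy illustration only: the dimensions, expert count, and random weights below are invented for the example and bear no relation to Llama 3’s actual configuration. The point is that a dense layer applies all of its parameters to every token, while a sparse MoE layer holds many expert weight matrices but routes each token through only a top-k subset, decoupling total capacity from per-token compute.

```python
# Toy contrast of a dense feed-forward layer vs. a sparse mixture-of-experts
# (MoE) layer. All sizes and weights are illustrative, not Llama 3's real config.
import random

random.seed(0)
DIM, N_EXPERTS, TOP_K = 4, 8, 2

def make_matrix(dim):
    # one toy weight matrix (dim x dim) with random entries
    return [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]

def apply(weight, x):
    # plain matrix-vector product
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]

# Dense layer: one weight matrix, every parameter touches every token.
dense_weight = make_matrix(DIM)

def dense_forward(x):
    return apply(dense_weight, x)

# Sparse MoE: N_EXPERTS weight matrices, but a router activates only TOP_K of
# them per token, so active compute is a fraction of total parameter count.
experts = [make_matrix(DIM) for _ in range(N_EXPERTS)]
router = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(N_EXPERTS)]

def moe_forward(x):
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in router]
    top = sorted(range(N_EXPERTS), key=lambda i: scores[i], reverse=True)[:TOP_K]
    out = [0.0] * DIM
    for i in top:  # only TOP_K of N_EXPERTS experts do any work
        y = apply(experts[i], x)
        out = [o + scores[i] * yi for o, yi in zip(out, y)]
    return out

x = [1.0, 0.5, -0.5, 0.25]
print(dense_forward(x), moe_forward(x))
```

Both layers map a `DIM`-sized input to a `DIM`-sized output; the trade-off Meta describes is that the dense path is simpler to train and serve on a fixed budget, while the MoE path buys capacity at the cost of routing complexity.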
First, this model is not a “multimodal model” (it is worth clarifying the difference between the GPT-4o API and ChatGPT, whose platform includes several agentic abilities). Furthermore, although it might pass the “strawberry test,” it struggles with our tests in decoding Morse code and in retrieving certain “historical geopolitical inquiries” correctly. These two tests alone, which GPT-4o passes, significantly reduce our trust in Llama 3.1 405B. As for the smaller model, Llama 3.1 70B: it can also pass the “strawberry test,” but not reliably, returning right and wrong answers with roughly equal frequency. Still, it is practical to use this model in-house for any corporation that is sensitive about internal data usage.
Judging from the Llama 3.1 paper, the Meta team clearly understands how to develop a state-of-the-art model. However, much like Google’s research teams, which pioneered frontier work such as word2vec and the subsequent transformers, Meta struggles to produce a “neat AI model” the way OpenAI does with GPT-4o. Moreover, where dense models of this size typically cost more and run slower, OpenAI still delivers high-quality answers at speed. This suggests OpenAI has mastered several techniques for producing models that meet very high standards. These might include its signature RLHF (reinforcement learning from human feedback, building heavily on its RL research from the Dota 2 project), scaling laws, early adoption of transformer-based GPT models, and transferring knowledge from larger models into smaller ones (which are faster and consume less GPU) through techniques like distillation and pruning. There may be much more, but to claim that open-source models can beat closed-source models is an overstatement.
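The distillation idea mentioned above can be shown in miniature. In the standard formulation (Hinton et al.’s soft-label distillation), a small “student” is trained to match the temperature-softened output distribution of a larger “teacher” via a KL-divergence loss. The logits below are hard-coded stand-ins for real model outputs; an actual pipeline would backpropagate this loss through the student.

```python
# Minimal sketch of soft-label knowledge distillation. Logits are invented
# stand-ins for teacher/student outputs over a 3-token vocabulary.
import math

def softmax(logits, temperature=1.0):
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    # KL(p || q): how far the student's distribution q is from the teacher's p
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

T = 2.0  # temperature > 1 softens the teacher's distribution
teacher_logits = [3.0, 1.0, 0.2]
student_logits = [2.5, 1.2, 0.4]

p = softmax(teacher_logits, T)
q = softmax(student_logits, T)
loss = (T * T) * kl_divergence(p, q)  # T^2 scaling keeps gradient magnitudes comparable
print(round(loss, 4))
```

Minimizing this loss pulls the student toward the teacher’s full output distribution rather than just its top answer, which is one reason distilled small models can punch above their parameter count.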
Attached file: the results of the Morse code decoding test between GPT-4o and Llama 3.1 405B via LMSYS. GPT-4o correctly decoded the message, while Llama 3.1 could not.
(From my Twitter)