Every Step You Take: The Planning Unit as the Heart of Agentic AI
As the AI industry has rapidly progressed from “turn-based” Q&A chatbots toward more agentic paradigms, the foundational components of agentic AI have become increasingly sophisticated. According to Lilian Weng’s blog, the core components of agentic AI include: 1) memory (both long- and short-term) to maintain state across operations, 2) a planning unit encompassing reflection, self-critique, chain-of-thought reasoning, and subgoal decomposition to coherently pursue the final objective, 3) an action unit for assessment and decision-making, and 4) tool selection, akin to Python function calls for invoking relevant APIs. Let’s focus on the planning unit first.
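To make the division of labor concrete, here is a minimal sketch of how these four components might be wired together. The class, method, and tool names are purely illustrative and do not follow any particular framework.

```python
from dataclasses import dataclass, field

# Illustrative sketch only: names below are hypothetical and simply mirror the
# four components described above (memory, planning, action, tool selection).

@dataclass
class Agent:
    memory: list = field(default_factory=list)   # 1) short/long-term state
    tools: dict = field(default_factory=dict)    # 4) callable APIs, like Python functions

    def plan(self, goal, observation):
        # 2) planning unit: reflect on memory and decompose the goal into subgoals
        return [f"subgoal for {goal} given {observation}"]

    def act(self, subgoal):
        # 3) action unit: pick and invoke a tool for the current subgoal
        tool = self.tools.get("search", lambda q: f"no tool for {q}")
        return tool(subgoal)

    def step(self, goal, observation):
        self.memory.append(observation)
        for subgoal in self.plan(goal, observation):
            self.memory.append(self.act(subgoal))
        return self.memory[-1]

agent = Agent(tools={"search": lambda q: f"result({q})"})
print(agent.step("answer the question", "user asked about MCTS"))
```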
The planning unit is a critical component of agentic AI, responsible for developing strategies and making decisions that guide the AI toward its goals. Within this unit, different frameworks and concepts from AI search and reasoning are employed. Among these, Q*, PCHS (Phase-Change Heuristic Search), and MCTS (Monte Carlo Tree Search; see also DeepMind’s paper) stand out as prominent methods within the macro paradigm. These frameworks offer high-level strategies for complex problem-solving, integrating multiple steps and adaptive processes. By contrast, micro-level techniques such as A* Search and Q-Learning focus on specific, well-defined tasks or steps, often employing simpler, more direct methods.
Q* represents a hybrid approach, balancing deterministic heuristic planning with probabilistic Q-value estimation to guide decision-making. It operates within the macro paradigm by leveraging both fine-grained and broader strategies, making it versatile and efficient for tasks like multi-step reasoning, math problems, and code generation. Its adaptability and moderate complexity allow it to bridge the gap between detailed, step-by-step optimization and high-level strategic planning.
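Since Q*’s internals have not been published, the following is only a speculative sketch of the general idea described above: rank partial solutions by a deterministic accumulated cost combined with a learned Q-value estimate. The `expand`, `q_estimate`, and `is_goal` functions are hypothetical placeholders, not any real API.

```python
import heapq, itertools

def q_star_style_search(start, expand, q_estimate, is_goal, max_steps=1000):
    # Hybrid scoring: deterministic path cost g minus a learned Q-value estimate.
    counter = itertools.count()        # tie-breaker so the heap never compares states
    frontier = [(-q_estimate(start), next(counter), 0.0, start, [start])]
    for _ in range(max_steps):
        if not frontier:
            break
        _, _, g, state, path = heapq.heappop(frontier)
        if is_goal(state):
            return path
        for next_state, step_cost in expand(state):
            g_next = g + step_cost                    # deterministic accumulated cost
            score = g_next - q_estimate(next_state)   # learned Q-value pulls promising states forward
            heapq.heappush(frontier, (score, next(counter), g_next, next_state, path + [next_state]))
    return None

# Toy usage: reach 10 from 0 by adding 1 or 2, with a toy "Q-value" favoring states near 10.
path = q_star_style_search(
    start=0,
    expand=lambda s: [(s + 1, 1.0), (s + 2, 1.0)],
    q_estimate=lambda s: -abs(10 - s),
    is_goal=lambda s: s == 10,
)
print(path)
```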
PCHS, on the other hand, has a deterministic form but a probabilistic soul: it dynamically adjusts its heuristics across different phases of the search process. This makes PCHS particularly suitable for complex, dynamic problems requiring long-term planning and strategic adjustments. Its structured, phase-based approach allows for high adaptability, though it comes with increased conceptual complexity due to the need for dynamic heuristic changes and phase management.
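Because PCHS is not a standard, off-the-shelf algorithm, the following is only an illustrative sketch of its central idea: the evaluation function the search trusts changes with the phase of the search. The phase rule, feature names, and thresholds here are invented for the example.

```python
def phase_of(depth, frontier_size):
    # Hypothetical phase rule: explore broadly early, commit under pressure, refine late.
    if depth < 3:
        return "explore"
    if frontier_size > 50:
        return "commit"
    return "refine"

# Each heuristic maps a state (here just a dict of features) to a score; lower is better.
HEURISTICS = {
    "explore": lambda s: -s["novelty"],        # reward visiting unfamiliar regions
    "commit":  lambda s: s["goal_distance"],   # push hard toward the goal
    "refine":  lambda s: s["local_cost"],      # polish the best candidate cheaply
}

def score(state, depth, frontier_size):
    """Phase-dependent evaluation used to order the search frontier."""
    return HEURISTICS[phase_of(depth, frontier_size)](state)

example_state = {"novelty": 0.8, "goal_distance": 4.0, "local_cost": 1.2}
print(score(example_state, depth=1, frontier_size=10))   # scored with the "explore" heuristic
```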
MCTS is distinguished by its probabilistic approach with a deterministic intent, systematically exploring and exploiting the search space through random simulations. It excels in applications like game playing and strategic planning, where detailed exploration and evaluation within a broad search space are crucial. The extensive research and numerous modifications of MCTS highlight its versatility and empirical success in high-profile applications, though it requires significant computational resources for simulations.
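The core MCTS loop itself is well established. Below is a compact UCT-style sketch on a toy subtraction game (take 1–3 stones; whoever takes the last stone wins). The game is merely a stand-in so the four steps, selection, expansion, simulation, and backpropagation, can run end to end.

```python
import math, random

ACTIONS = (1, 2, 3)

class Node:
    def __init__(self, stones, player, parent=None):
        self.stones, self.player, self.parent = stones, player, parent
        self.children, self.visits, self.value = {}, 0, 0.0

    def ucb(self, c=1.4):
        # Upper Confidence Bound: balance exploitation (win rate) and exploration.
        return self.value / self.visits + c * math.sqrt(math.log(self.parent.visits) / self.visits)

def rollout(stones, player):
    # Random playout: whoever removes the last stone wins.
    while stones > 0:
        stones -= random.choice([a for a in ACTIONS if a <= stones])
        player = 1 - player
    return 1 - player        # the player who just moved took the last stone

def mcts(root_stones, iterations=3000):
    root = Node(root_stones, player=0)
    for _ in range(iterations):
        node = root
        # 1) Selection: follow UCB while the node is fully expanded and non-terminal.
        while node.stones > 0 and len(node.children) == len([a for a in ACTIONS if a <= node.stones]):
            node = max(node.children.values(), key=lambda n: n.ucb())
        # 2) Expansion: add one untried move (if the game is not over).
        if node.stones > 0:
            a = random.choice([a for a in ACTIONS if a <= node.stones and a not in node.children])
            node.children[a] = Node(node.stones - a, 1 - node.player, parent=node)
            node = node.children[a]
        # 3) Simulation: random playout from the new node's position.
        winner = rollout(node.stones, node.player)
        # 4) Backpropagation: credit every node whose incoming move belonged to the winner.
        while node is not None:
            node.visits += 1
            if node.player != winner:   # the move into this node was made by `winner`
                node.value += 1.0
            node = node.parent
    return max(root.children, key=lambda a: root.children[a].visits)

print(mcts(10))   # with enough iterations this converges on taking 2, leaving a multiple of 4
```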
Integrating micro-level techniques like A* Search and Q-Learning into these macro-level frameworks enhances their overall efficiency, adaptability, and scalability. A* Search contributes a structured, heuristic-guided approach that can be used within the broader phases of PCHS and the guided simulations of MCTS. Q-Learning introduces reinforcement learning principles that help dynamically adjust strategies and optimize search policies over time, making both PCHS and MCTS more responsive and effective in changing environments.
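To ground the Q-Learning side of this, here is a minimal tabular sketch on a toy one-dimensional corridor. The environment and hyperparameters are invented for the example, but the update rule is the standard one such integrations rely on for gradually steering which branches a planner explores first.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON, GOAL = 0.1, 0.9, 0.2, 5   # toy hyperparameters

def step(state, action):
    # Corridor of cells 0..GOAL; reward arrives only at the right end.
    next_state = max(0, min(GOAL, state + (1 if action == "right" else -1)))
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

Q = defaultdict(float)
for episode in range(500):
    state, done = 0, False
    while not done:
        # Epsilon-greedy action selection: mostly exploit, sometimes explore.
        if random.random() < EPSILON:
            action = random.choice(["left", "right"])
        else:
            action = max(["left", "right"], key=lambda a: Q[(state, a)])
        next_state, reward, done = step(state, action)
        best_next = max(Q[(next_state, a)] for a in ["left", "right"])
        # Core Q-Learning update: nudge Q(s, a) toward reward + discounted future value.
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = next_state

print(max(["left", "right"], key=lambda a: Q[(0, a)]))   # learned first move: "right"
```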
As Andrew Ng recently demonstrated, even smaller language models, generally considered inferior to their larger counterparts, can outperform larger models on certain benchmarks when equipped with an agentic framework. Ng illustrated this by showing how smaller models, when integrated with advanced planning, reasoning, and action components, can excel at a variety of tasks. This success is attributed to the enhanced capabilities provided by the agentic AI paradigm, which leverages memory, planning units, action units, and tool selection to optimize performance.
Ng shared several key “toolkits” in his presentation that enable smaller LLMs to achieve superior results. Measured against zero-shot prompting as the baseline, these include Intervenor for dynamic process adjustments, ANPL for autonomous program generation, Language Agent Tree Search (LATS) for integrating reasoning and planning, CodeT for optimizing coding tasks, MetaGPT for general-purpose applications, LDB for behavior-driven tasks, Reflexion for iterative self-improvement, and AgentCoder for automating coding tasks. These toolkits enhance smaller LLMs’ functionality and performance by equipping them with the necessary components for memory, planning, action, and tool selection.
As Yann LeCun has pointed out, next-generation AI research should shift away from ever-larger LLMs, since the leading tech corporations (e.g., OpenAI, Google) already command GPU resources that small AI labs, and even leading universities, cannot match. The focus should instead be on developing smaller LLMs and enhancing them with agentic AI frameworks. Moreover, new architectures beyond transformer-based models (the core component of both GPT and BERT) are also possible. Sepp Hochreiter, for example, has reinvented the LSTM (an advanced form of the RNN) with extended memory in his new project, Extended Long Short-Term Memory (xLSTM), to address shortcomings of the attention mechanism in transformers; he also founded NXAI, with xLSTM as the core building block of a European LLM, aiming for independence from American AI dominance. His work highlights the potential of new architectures, such as Modern Hopfield Networks, which aim to provide efficient associative memory.
In conclusion, the evolution of AI from simple Q&A systems to sophisticated agentic paradigms involves a deep integration of both micro- and macro-level planning strategies. By understanding and leveraging the strengths of frameworks like Q*, PCHS, and MCTS, along with techniques such as A* Search and Q-Learning, we can develop more intelligent, adaptable, and efficient AI systems capable of tackling a wide range of tasks and environments. This comprehensive approach helps ensure that AI continues to advance in a manner that is both innovative and practical, meeting the complex demands of modern applications.