Adaptive Intelligence Framework (AIF)
The term was coined by Kan Yuenyong (http://www.twitter.com/sikkha). This page records the coinage and working definition of AIF, based on the forthcoming paper “Adaptive Intelligence Framework (AIF): Enhancing Human-AI Interaction through the LLM Equation”.
Definition: The Adaptive Intelligence Framework (AIF) is a model for understanding the relationship between artificial intelligence (AI) systems and human cognition. It comprises three key components: LLM (thoughts, mental models, or algorithms), RR (translation of these into actions), and R (external reality). AIF emphasizes the continuous feedback loop between these components, allowing for ongoing adaptation and learning. This cyclical nature of interactions fosters improved decision-making and better alignment with external reality over time. Despite differences in reasoning processes between AI and humans, AIF allows for meaningful comparisons and highlights the potential for mutual learning and collaboration. The framework can be applied to various domains, and encourages collaboration between AI systems and humans. AIF offers a comprehensive model for understanding and comparing AI algorithms and human cognition, promoting adaptability and continuous learning.
Definition (translated from the Thai version): The Adaptive Intelligence Framework (AIF), in Thai “แบบจำลองปัญญาที่สามารถปรับตัวได้” (literally, "a model of intelligence that can adapt"), is a model for understanding the relationship between artificial intelligence (AI) systems and human cognition. It consists of three key components: LLM (thoughts, mental models, or algorithms), RR (the translation of these thoughts into actions), and R (external reality). AIF emphasizes the continuous feedback loop among these components, which allows adaptation and learning to continue. The cyclical nature of this interaction improves decision-making and supports adaptation to the external environment over the long term. Although AI and humans reason differently, AIF lets us compare those reasoning processes meaningfully, and it emphasizes learning and collaboration between AI and humans in many domains. AIF offers a comprehensive model for understanding and comparing AI algorithms and human cognition, and for strengthening adaptability and continuous learning.
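The feedback loop in the definition can be made concrete with a toy example. The sketch below is only an illustration under stated assumptions, not the formulation from the forthcoming paper: the "mental model" is a single number, the action is a guess, and reality answers with a signed error. The names `MentalModel`, `act`, `reality`, and `run_loop` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MentalModel:
    """LLM component (assumed form): a belief about reality, here one number."""
    estimate: float
    learning_rate: float = 0.5

    def update(self, error: float) -> None:
        # Move the belief part of the way toward what reality reported.
        self.estimate += self.learning_rate * error


def act(model: MentalModel) -> float:
    """RR component: translate the current belief into an action (a guess)."""
    return model.estimate


def reality(action: float, hidden_value: float = 10.0) -> float:
    """R component: external reality answers with a signed error signal."""
    return hidden_value - action


def run_loop(steps: int = 8) -> list[float]:
    """Run the LLM -> RR -> R -> LLM cycle and record the absolute errors."""
    model = MentalModel(estimate=0.0)
    errors = []
    for _ in range(steps):
        action = act(model)       # LLM -> RR: belief becomes action
        error = reality(action)   # RR -> R: reality responds to the action
        model.update(error)       # R -> LLM: feedback adjusts the belief
        errors.append(abs(error))
    return errors


if __name__ == "__main__":
    print(run_loop())
```

With a learning rate of 0.5 the error halves on every pass through the loop, which is the "better alignment with external reality over time" that the definition describes.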
See also: Cybernetics and the Triarchic Theory of Intelligence.
The Triarchic Theory of Intelligence was developed by Robert J. Sternberg and is a theory that helps us understand what intelligence is and how it works. The theory suggests that intelligence is made up of three parts: analytical intelligence, creative intelligence, and practical intelligence.
Analytical intelligence is what we typically think of when we talk about intelligence — it’s the ability to solve problems, think critically, and make sense of complex information. This is the kind of intelligence that is measured by IQ tests.
Creative intelligence, on the other hand, is the ability to come up with new and innovative ideas, to think outside the box, and to see things in new and different ways. This kind of intelligence is important for things like artistic and scientific creativity.
Finally, practical intelligence is the ability to apply what we know to real-world situations, to adapt to new situations, and to solve problems in everyday life. This kind of intelligence is important for things like social skills, common sense, and street smarts.
According to the Triarchic Theory, people who are intelligent in one area may not necessarily be intelligent in the others. For example, someone who is highly analytical may not be very creative or practical. However, people who are strong in all three areas are considered to be highly intelligent.
Intelligence can be developed and improved over time with practice and learning, so improving it means working on all three areas: analytical, creative, and practical.
Updated version: blending AIF with Deep TAMER
Definition: Deep TAMER (TAMER stands for Training an Agent Manually via Evaluative Reinforcement) is a machine learning algorithm that lets an agent learn from human feedback how to act in an environment. It has several components: an agent that observes the state of the environment and executes actions according to a policy, a human trainer who provides scalar evaluative feedback that serves as the reward signal, and a memory that stores past experience for replay. Stochastic gradient descent (SGD) updates the agent's estimate of the human's reinforcement function, and the policy picks the action that this estimate rates highest. Compared with autonomous learning algorithms, Deep TAMER can greatly reduce the number of episodes needed to learn a good policy, which makes it suitable for domains where learning trials are costly.
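The agent, trainer, and SGD update described above can be sketched on a toy problem. This is a minimal illustration, not the Deep TAMER implementation: the estimate of the human's reinforcement function is a lookup table rather than a deep network, the "human" is a scripted function, the environment is a line of five positions, and the memory component is left out for brevity. All names (`simulated_trainer`, `greedy_action`, `sgd_step`, `train`) are hypothetical.

```python
import random

ACTIONS = (-1, +1)   # step left or right on a line of positions 0..GOAL
GOAL = 4


def simulated_trainer(action: int) -> float:
    """Hypothetical stand-in for the human: praise moves toward GOAL."""
    return 1.0 if action == +1 else -1.0


def greedy_action(h_table: dict, state: int, rng: random.Random) -> int:
    """Policy: pick the action the current estimate H rates highest."""
    values = [h_table.get((state, a), 0.0) for a in ACTIONS]
    best = max(values)
    # Ties (for example, an unvisited state) are broken at random.
    return rng.choice([a for a, v in zip(ACTIONS, values) if v == best])


def sgd_step(h_table: dict, state: int, action: int, feedback: float,
             lr: float = 0.5) -> None:
    """One SGD step on the squared error 0.5 * (feedback - H(s, a))**2."""
    old = h_table.get((state, action), 0.0)
    h_table[(state, action)] = old + lr * (feedback - old)


def train(episodes: int = 3, max_steps: int = 20, seed: int = 0) -> dict:
    """Run the interaction loop and return the learned table H."""
    rng = random.Random(seed)
    h_table: dict = {}
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = greedy_action(h_table, state, rng)   # agent acts
            feedback = simulated_trainer(action)          # trainer evaluates
            sgd_step(h_table, state, action, feedback)    # estimate is updated
            state = min(max(state + action, 0), GOAL)     # environment moves
            if state == GOAL:
                break
    return h_table


if __name__ == "__main__":
    for key, value in sorted(train().items()):
        print(key, round(value, 3))
```

Because every action receives immediate feedback, the agent needs at most one wrong move per position before the table prefers stepping toward the goal, which illustrates why human feedback can shorten learning.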
Deep TAMER extends the original TAMER framework with several improvements. The main one is that Deep TAMER models the human's reinforcement function with a deep neural network, whereas the original TAMER used linear models. This lets Deep TAMER learn more complex and nuanced reinforcement functions and work in high-dimensional state spaces, which can lead to better performance in complex environments. A second improvement is a replay buffer: feedback samples are stored in memory and randomly resampled during training, so the network can be updated more often than the human gives feedback, which improves sample efficiency. In addition, Warnell et al. (2017) pretrain the convolutional part of the network as an autoencoder, which reduces the number of parameters that have to be learned from human feedback. Overall, these improvements make Deep TAMER a more powerful and flexible framework for training agents with human feedback than the original TAMER algorithm.
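The replay idea can be sketched in a few lines. This is a generic bounded buffer with uniform random sampling, written as an assumption about how such a memory might look, not the buffer from the Deep TAMER paper. The class name `FeedbackReplayBuffer` and its methods are hypothetical.

```python
import random
from collections import deque


class FeedbackReplayBuffer:
    """Bounded store of (state, action, feedback) samples for replay."""

    def __init__(self, capacity: int = 1000, seed: int = 0) -> None:
        # A deque with maxlen silently drops the oldest sample when full.
        self._items = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, state, action, feedback: float) -> None:
        self._items.append((state, action, feedback))

    def sample(self, batch_size: int) -> list:
        """Draw a uniform random mini-batch without replacement."""
        k = min(batch_size, len(self._items))
        return self._rng.sample(list(self._items), k)

    def __len__(self) -> int:
        return len(self._items)


def demo() -> tuple:
    """Fill a small buffer past capacity and sample everything it kept."""
    buffer = FeedbackReplayBuffer(capacity=3)
    for i in range(5):
        buffer.add(state=i, action=+1, feedback=1.0)
    kept_states = sorted(state for state, _, _ in buffer.sample(3))
    return len(buffer), kept_states


if __name__ == "__main__":
    print(demo())
```

A training loop would call `add` whenever the human gives feedback and call `sample` before each extra SGD step, so one piece of feedback can contribute to many updates.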
The accompanying Graphviz code (not reproduced here) creates a directed graph with three main components: LLM (Thoughts, Mental Models, or Algorithms), R (External Reality), and the Deep TAMER Algorithm. LLM and R are drawn as in the original AIF diagram: LLM translates mental models into actions and receives feedback from R. The Deep TAMER Algorithm is embedded inside the LLM component, and its parts (Agent, Environment, Memory, Trainer, and Hk_1) are connected to each other as in the original algorithm. The edges represent the flow of information and control in the blended concept. For example, feedback from R updates the mental models in LLM, and the policy learned by the Deep TAMER Algorithm is used to interact with the environment in R. Dashed edges mark interaction with external reality, while solid edges mark the internal workings of the algorithm. This is only one possible way to blend the two concepts, and their relationship could be visualized in many other ways.
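Since the Graphviz source itself does not appear on this page, the following is a hypothetical reconstruction from the description above, not the author's code. It builds DOT text in plain Python. The node names come from the text, but the exact edges and edge labels are assumptions.

```python
def build_dot() -> str:
    """Return Graphviz DOT text for the blended AIF / Deep TAMER diagram."""
    internal = [  # solid edges: inner workings of the algorithm (assumed wiring)
        ("Environment", "Agent", "state"),
        ("Agent", "Environment", "action"),
        ("Agent", "Memory", "transition"),
        ("Memory", "Trainer", "mini-batch"),
        ("Trainer", "Hk_1", "SGD update"),
        ("Hk_1", "Agent", "updated policy"),
    ]
    external = [  # dashed edges: interaction with external reality
        ("LLM", "R", "actions (RR)"),
        ("R", "LLM", "feedback"),
        ("Hk_1", "R", "learned policy"),
    ]
    lines = [
        "digraph AIF_DeepTAMER {",
        "  subgraph cluster_llm {",
        '    label="LLM (Thoughts, Mental Models, or Algorithms)";',
        '    LLM [label="Mental models"];',
        "    subgraph cluster_deep_tamer {",
        '      label="Deep TAMER Algorithm";',
        "      Agent; Environment; Memory; Trainer; Hk_1;",
        "    }",
        "  }",
        '  R [label="R (External Reality)"];',
    ]
    for src, dst, label in internal:
        lines.append(f'  {src} -> {dst} [label="{label}"];')
    for src, dst, label in external:
        lines.append(f'  {src} -> {dst} [label="{label}", style=dashed];')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_dot())
```

Saving the printed text to a file such as `aif.dot` and running `dot -Tpng aif.dot -o aif.png` should render the diagram, with the Deep TAMER cluster nested inside the LLM cluster.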
In the Deep TAMER algorithm, `Hk_1` is the diagram's node name for `Hk+1`: the updated estimate of the human's reinforcement function, which determines the agent's policy, after a single iteration of stochastic gradient descent (SGD). The trainer component samples a mini-batch from the memory component and uses it to compute an update to the current estimate. This update is denoted `ΔHk` in the algorithm, and adding it to the current `Hk` gives the updated `Hk+1`. The updated `Hk+1` is then sent back to the agent component, which uses it to interact with the environment. So `Hk_1` is the result of applying one iteration of SGD: a slightly improved version of the previous estimate, based on the reinforcement the human trainer has given.
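Written as an equation, the update step looks as follows. The first identity restates the text. The gradient form of `ΔHk` is the standard SGD rule, and the symbols for the learning rate, loss, and mini-batch are assumptions introduced here for illustration.

```latex
H_{k+1} = H_k + \Delta H_k,
\qquad
\Delta H_k = -\,\eta_k \, \nabla_{H}\, \mathcal{L}\!\left(H_k;\, B_k\right)
```

Here $\eta_k$ is the learning rate at iteration $k$, $B_k$ is the mini-batch sampled from memory, and $\mathcal{L}$ is a loss that measures how far the predictions of $H_k$ are from the human feedback in $B_k$.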
References (TAMER, the founding idea behind Reinforcement Learning from Human Feedback (RLHF)):
- Knox and Stone (2009), Interactively shaping agents via human reinforcement: the TAMER framework
- Warnell, Waytowich, Lawhern, and Stone (2017), Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces
- Lambert, Castricato, von Werra, and Havrilla (2022), Illustrating Reinforcement Learning from Human Feedback (RLHF), Hugging Face blog.
- Goldman, VentureBeat (2022), DeepMind isn’t deploying its chatbot, Sparrow, because of ethical concerns.
- Chaumond (2020), How to train a new language model from scratch using Transformers and Tokenizers
- Gymnasium (the Farama Foundation's maintained successor to OpenAI’s Gym)
- OpenAI’s Universe Starter Agent
More information:
- Explainable AI: [source]
- Human-AI Interaction (HAII): [source] [google scholar]
- AI paper tracking: [source]
- How to open Satoshi Nakamoto’s Bitcoin whitepaper on a Mac: `open /System/Library/Image\ Capture/Devices/VirtualScanner.app/Contents/Resources/simpledoc.pdf`