ReAct Agent

本示例展示了如何使用 outlines 构建您自己的 agent，该 agent 使用开放权重本地模型并能生成结构化输出。它受到 Simon Willison 的博客文章 A simple Python implementation of the ReAct pattern for LLMs 的启发。

ReAct 模式（Reason+Act 的缩写，意为“推理+行动”）在论文 ReAct: Synergizing Reasoning and Acting in Language Models 中有详细描述。在这种模式下，您可以实现 LLM 可以执行的额外行动，例如搜索维基百科或进行计算，然后教导 LLM 如何请求执行这些行动，并将结果反馈给 LLM。

此外，我们还赋予 LLM 使用草稿本的能力，这在论文 Show Your Work: Scratchpads for Intermediate Computation with Language Models 中有所描述，它提高了 LLM 执行多步计算的能力。

我们使用 llama.cpp 及其 llama-cpp-python 库。Outlines 支持 llama-cpp-python，但我们需要自行安装它。

pip install llama-cpp-python

我们通过传递 HuggingFace Hub 上的仓库名称以及文件名（或通配符模式）来下载模型权重。

import llama_cpp
from outlines import generate, models

model = models.llamacpp("NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF",
            "Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf",
            tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained(
            "NousResearch/Hermes-2-Pro-Llama-3-8B"
            ),
            n_gpu_layers=-1,
            flash_attn=True,
            n_ctx=8192,
            verbose=False)

(可选) 将模型权重存储在自定义文件夹中

默认情况下，模型权重会下载到 hub 缓存中，但如果想将权重存储在自定义文件夹中，我们可以从 HuggingFace 上拉取 NousResearch 的量化 GGUF 模型 Hermes-2-Pro-Llama-3-8B。

wget https://hugging-face.cn/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF/resolve/main/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf

我们初始化模型

import llama_cpp
from llama_cpp import Llama
from outlines import generate, models

llm = Llama(
    "/path/to/model/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf",
    tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained(
        "NousResearch/Hermes-2-Pro-Llama-3-8B"
    ),
    n_gpu_layers=-1,
    flash_attn=True,
    n_ctx=8192,
    verbose=False
)

构建 ReAct Agent

在本示例中，我们使用两个工具

wikipedia: \<search term> - 搜索维基百科并返回第一个结果的摘要
calculate: \<expression> - 使用 Python 的 eval() 函数评估表达式

import httpx

def wikipedia(q):
    return httpx.get("https://en.wikipedia.org/w/api.php", params={
        "action": "query",
        "list": "search",
        "srsearch": q,
        "format": "json"
    }).json()["query"]["search"][0]["snippet"]


def calculate(numexp):
    return eval(numexp)

我们通过 Pydantic 类定义 agent 的逻辑。首先，我们希望 LLM 只能在这两个预先定义的工具之间做出选择

from enum import Enum

class Action(str, Enum):
    wikipedia = "wikipedia"
    calculate = "calculate"

我们的 agent 将循环执行 Thought (思考) 和 Action (行动)。我们明确指定 Action Input 字段，以便它不会忘记添加 Action 的参数。我们还添加了一个草稿本（可选）。

from pydantic import BaseModel, Field

class Reason_and_Act(BaseModel):
    Scratchpad: str = Field(..., description="Information from the Observation useful to answer the question")
    Thought: str = Field(..., description="It describes your thoughts about the question you have been asked")
    Action: Action
    Action_Input: str = Field(..., description="The arguments of the Action.")

我们的 agent 将得出 Final Answer (最终答案)。我们还添加了一个草稿本（可选）。

class Final_Answer(BaseModel):
    Scratchpad: str = Field(..., description="Information from the Observation useful to answer the question")
    Final_Answer: str = Field(..., description="Answer to the question grounded on the Observation")

我们的 agent 将自行决定何时得出 Final Answer，从而停止 Thought 和 Action 的循环。

from typing import Union

class Decision(BaseModel):
    Decision: Union[Reason_and_Act, Final_Answer]

我们可以使用 json schema 生成响应，但我们将使用 regex 并检查一切是否按预期工作

from outlines.fsm.json_schema import convert_json_schema_to_str
from outlines_core.fsm.json_schema import build_regex_from_schema

json_schema = Decision.model_json_schema()
schema_str = convert_json_schema_to_str(json_schema=json_schema)
regex_str = build_regex_from_schema(schema_str)
print(regex_str)
# '\\{[ ]?"Decision"[ ]?:[ ]?(\\{[ ]?"Scratchpad"[ ]?:[ ]?"([^"\\\\\\x00-\\x1F\\x7F-\\x9F]|\\\\["\\\\])*"[ ]?,[ ]?"Thought"[ ]?:[ ]?"([^"\\\\\\x00-\\x1F\\x7F-\\x9F]|\\\\["\\\\])*"[ ]?,[ ]?"Action"[ ]?:[ ]?("wikipedia"|"calculate")[ ]?,[ ]?"Action_Input"[ ]?:[ ]?"([^"\\\\\\x00-\\x1F\\x7F-\\x9F]|\\\\["\\\\])*"[ ]?\\}|\\{[ ]?"Scratchpad"[ ]?:[ ]?"([^"\\\\\\x00-\\x1F\\x7F-\\x9F]|\\\\["\\\\])*"[ ]?,[ ]?"Final_Answer"[ ]?:[ ]?"([^"\\\\\\x00-\\x1F\\x7F-\\x9F]|\\\\["\\\\])*"[ ]?\\})[ ]?\\}'

然后我们需要将我们的提示词调整为 Hermes 的 JSON schema 提示词格式，并解释 agent 的逻辑

import datetime

def generate_hermes_prompt(question, schema=""):
    return (
        "<|im_start|>system\n"
        "You are a world class AI model who answers questions in JSON with correct Pydantic schema. "
        f"Here's the json schema you must adhere to:\n<schema>\n{schema}\n</schema>\n"
        "Today is " + datetime.datetime.today().strftime('%Y-%m-%d') + ".\n" +
        "You run in a loop of Scratchpad, Thought, Action, Action Input, PAUSE, Observation. "
        "At the end of the loop you output a Final Answer. "
        "Use Scratchpad to store the information from the Observation useful to answer the question "
        "Use Thought to describe your thoughts about the question you have been asked "
        "and reflect carefully about the Observation if it exists. "
        "Use Action to run one of the actions available to you. "
        "Use Action Input to input the arguments of the selected action - then return PAUSE. "
        "Observation will be the result of running those actions. "
        "Your available actions are:\n"
        "calculate:\n"
        "e.g. calulate: 4**2 / 3\n"
        "Runs a calculation and returns the number - uses Python so be sure to use floating point syntax if necessary\n"
        "wikipedia:\n"
        "e.g. wikipedia: Django\n"
        "Returns a summary from searching Wikipedia\n"
        "DO NOT TRY TO GUESS THE ANSWER. Begin! <|im_end|>"
        "\n<|im_start|>user\n" + question + "<|im_end|>"
        "\n<|im_start|>assistant\n"
    )

我们定义一个 ChatBot 类

class ChatBot:
    def __init__(self, prompt=""):
        self.prompt = prompt

    def __call__(self, user_prompt):
        self.prompt += user_prompt
        result = self.execute()
        return result

    def execute(self):
        generator = generate.regex(model, regex_str)
        result = generator(self.prompt, max_tokens=1024, temperature=0, seed=42)
        return result

我们定义一个查询函数

import json

def query(question, max_turns=5):
    i = 0
    next_prompt = (
        "\n<|im_start|>user\n" + question + "<|im_end|>"
        "\n<|im_start|>assistant\n"
    )
    previous_actions = []
    while i < max_turns:
        i += 1
        prompt = generate_hermes_prompt(question=question, schema=Decision.model_json_schema())
        bot = ChatBot(prompt=prompt)
        result = bot(next_prompt)
        json_result = json.loads(result)['Decision']
        if "Final_Answer" not in list(json_result.keys()):
            scratchpad = json_result['Scratchpad'] if i == 0 else ""
            thought = json_result['Thought']
            action = json_result['Action']
            action_input = json_result['Action_Input']
            print(f"\x1b[34m Scratchpad: {scratchpad} \x1b[0m")
            print(f"\x1b[34m Thought: {thought} \x1b[0m")
            print(f"\x1b[36m  -- running {action}: {str(action_input)}\x1b[0m")
            if action + ": " + str(action_input) in previous_actions:
                observation = "You already run that action. **TRY A DIFFERENT ACTION INPUT.**"
            else:
                if action=="calculate":
                    try:
                        observation = eval(str(action_input))
                    except Exception as e:
                        observation = f"{e}"
                elif action=="wikipedia":
                    try:
                        observation = wikipedia(str(action_input))
                    except Exception as e:
                        observation = f"{e}"
            print()
            print(f"\x1b[33m Observation: {observation} \x1b[0m")
            print()
            previous_actions.append(action + ": " + str(action_input))
            next_prompt += (
                "\nScratchpad: " + scratchpad +
                "\nThought: " + thought +
                "\nAction: " + action  +
                "\nAction Input: " + action_input +
                "\nObservation: " + str(observation)
            )
        else:
            scratchpad = json_result["Scratchpad"]
            final_answer = json_result["Final_Answer"]
            print(f"\x1b[34m Scratchpad: {scratchpad} \x1b[0m")
            print(f"\x1b[34m Final Answer: {final_answer} \x1b[0m")
            return final_answer
    print(f"\nFinal Answer: I am sorry, but I am unable to answer your question. Please provide more information or a different question.")
    return "No answer found"

现在我们可以测试我们的 ReAct agent 了

print(query("What's 2 to the power of 10?"))
# Scratchpad:
# Thought: I need to perform a mathematical calculation to find the result of 2 to the power of 10.
#  -- running calculate: 2**10
#
# Observation: 1024
#
# Scratchpad: 2 to the power of 10 is 1024.
# Final Answer: 2 to the power of 10 is 1024.
# 2 to the power of 10 is 1024.

print(query("What does England share borders with?"))
# Scratchpad:
# Thought: To answer this question, I will use the 'wikipedia' action to gather information about England's geographical location and its borders.
#  -- running wikipedia: England borders
#
# Observation: Anglo-Scottish <span class="searchmatch">border</span> (Scottish Gaelic: Crìochan Anglo-Albannach) is an internal <span class="searchmatch">border</span> of the United Kingdom separating Scotland and <span class="searchmatch">England</span> which runs for
#
# Scratchpad: Anglo-Scottish border (Scottish Gaelic: Crìochan Anglo-Albannach) is an internal border of the United Kingdom separating Scotland and England which runs for
# Final Answer: England shares a border with Scotland.
# England shares a border with Scotland.

正如 Simon 的博客文章中提到的，这远非一个非常鲁棒的实现，还有很大的改进空间。但用几行 Python 代码就能让 LLM 拥有这些额外的能力，这真是太棒了。现在，您可以使用开放权重 LLM 在本地运行它了。

本示例最初由 Alonso Silva 贡献。