Chain of thought

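Chain of thought is a prompting technique introduced in the paper "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models". The authors prompt the model to generate a series of intermediate reasoning steps before it answers, which improves the ability of LLMs to perform complex reasoning.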

In this guide, we use outlines to apply chain of thought through structured output.

We run the model with llama.cpp through the llama-cpp-python library. Outlines supports llama-cpp-python, but we need to install it ourselves.

pip install llama-cpp-python

We download the model weights by passing the name of the repository on the HuggingFace Hub and the file name (or a glob pattern).

import llama_cpp
from outlines import generate, models

model = models.llamacpp("NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF",
            "Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf",
            tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained(
            "NousResearch/Hermes-2-Pro-Llama-3-8B"
            ),
            n_gpu_layers=-1,
            flash_attn=True,
            n_ctx=8192,
            verbose=False)
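
As a rough illustration of how a glob pattern can select one file from a repository listing, the sketch below uses Python's standard `fnmatch` module. The file list is made up for the example, and whether the library resolves patterns in exactly this way is an assumption.

```python
from fnmatch import fnmatch

# Hypothetical listing of GGUF files in a repository (illustrative names only).
repo_files = [
    "Hermes-2-Pro-Llama-3-8B-F16.gguf",
    "Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf",
    "Hermes-2-Pro-Llama-3-8B-Q8_0.gguf",
]


def match_files(pattern, files):
    """Return the files whose names match the glob pattern."""
    return [name for name in files if fnmatch(name, pattern)]


matches = match_files("*Q4_K_M.gguf", repo_files)
print(matches)
```

A pattern that matches several files (for example `*.gguf` here) is ambiguous, so a pattern naming the quantization level is usually the safer choice.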

(Optional) Store the model weights in a custom folder

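By default the model weights are downloaded to the hub cache. If we want to store the weights in a custom folder instead, we can pull the quantized GGUF model Hermes-2-Pro-Llama-3-8B by NousResearch from HuggingFace ourselves.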

wget https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF/resolve/main/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf

We then initialize the model from the downloaded file:

import llama_cpp
from llama_cpp import Llama
from outlines import generate, models

llm = Llama(
    "/path/to/model/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf",
    tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained(
        "NousResearch/Hermes-2-Pro-Llama-3-8B"
    ),
    n_gpu_layers=-1,
    flash_attn=True,
    n_ctx=8192,
    verbose=False
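)

# Wrap the llama-cpp-python instance so the rest of the guide can use `model`.
model = models.LlamaCpp(llm)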

Chain of thought

We first define the Pydantic class for a single reasoning step:

from pydantic import BaseModel, Field

class Reasoning_Step(BaseModel):
    reasoning_step: str = Field(..., description="Reasoning step")

We then define the Pydantic class for the full reasoning, which consists of a list of reasoning steps and a conclusion, and we get its JSON schema:

from typing import List

class Reasoning(BaseModel):
    reasoning: List[Reasoning_Step] = Field(..., description="List of reasoning steps")
    conclusion: str = Field(..., description="Conclusion")

json_schema = Reasoning.model_json_schema()
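
To see what these classes accept, the following minimal sketch (assuming Pydantic v2) repeats the two definitions, inspects the required fields of the schema, and validates two hand-written payloads. The sample strings are illustrative and are not model output.

```python
from typing import List

from pydantic import BaseModel, Field, ValidationError


class Reasoning_Step(BaseModel):
    reasoning_step: str = Field(..., description="Reasoning step")


class Reasoning(BaseModel):
    reasoning: List[Reasoning_Step] = Field(..., description="List of reasoning steps")
    conclusion: str = Field(..., description="Conclusion")


json_schema = Reasoning.model_json_schema()

# Top-level fields the schema marks as required.
print(json_schema["required"])

# A hand-written sample (not model output) that fits the schema.
sample = '{"reasoning": [{"reasoning_step": "9.9 equals 9.90."}], "conclusion": "9.9 is bigger."}'
parsed = Reasoning.model_validate_json(sample)
print(parsed.reasoning[0].reasoning_step)

# A payload that lacks "conclusion" is rejected.
try:
    Reasoning.model_validate_json('{"reasoning": []}')
    rejected = False
except ValidationError:
    rejected = True
print(rejected)
```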

We could generate a response directly from the JSON schema, but for a change we will convert the schema to a regex and use that:

from outlines.fsm.json_schema import convert_json_schema_to_str
from outlines_core.fsm.json_schema import build_regex_from_schema

schema_str = convert_json_schema_to_str(json_schema=json_schema)
regex_str = build_regex_from_schema(schema_str)
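
To make the idea of a regex that describes a JSON shape concrete, here is a hand-written and much simplified pattern for the same structure, using only the standard library. It is not the pattern that `build_regex_from_schema` produces: it assumes a fixed key order, single spaces after separators, at least one reasoning step, and strings without escapes.

```python
import json
import re

# Simplified building block: a JSON string without escapes or embedded quotes.
STRING = r'"[^"\\]*"'
STEP = r'\{"reasoning_step": ' + STRING + r'\}'
PATTERN = re.compile(
    r'\{"reasoning": \[' + STEP + r'(?:, ' + STEP + r')*\], "conclusion": ' + STRING + r'\}'
)


def fits_shape(text):
    """Return True if the whole text matches the simplified pattern."""
    return PATTERN.fullmatch(text) is not None


good = json.dumps(
    {"reasoning": [{"reasoning_step": "a"}, {"reasoning_step": "b"}], "conclusion": "c"}
)
bad = json.dumps({"reasoning": [{"reasoning_step": "a"}]})  # no conclusion

print(fits_shape(good))
print(fits_shape(bad))
```

Constraining generation to such a pattern means the model can only emit text that keeps a full match possible, which is why the output later parses cleanly with `json.loads`.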

We then need to adapt our prompt to the Hermes prompt format for JSON schema:

def generate_hermes_prompt(user_prompt):
    return (
        "<|im_start|>system\n"
        "You are a world class AI model who answers questions in JSON "
        f"Here's the json schema you must adhere to:\n<schema>\n{json_schema}\n</schema><|im_end|>\n"
        "<|im_start|>user\n"
        + user_prompt
        + "<|im_end|>"
        + "\n<|im_start|>assistant\n"
        "<schema>"
    )
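
Because the function above interpolates the `json_schema` dict directly into the f-string, the schema is rendered as a Python dict repr with single quotes. One possible variation, which is not part of the original guide, is to serialize the schema with `json.dumps` so the prompt contains valid JSON. The function name and the trimmed-down schema below are hypothetical, and whether this changes the model's output is untested here.

```python
import json

# Hypothetical, trimmed-down schema used only to keep the example short.
example_schema = {
    "type": "object",
    "properties": {"conclusion": {"type": "string"}},
    "required": ["conclusion"],
}


def generate_hermes_prompt_json(user_prompt, schema):
    """Build the same ChatML-style prompt, with the schema rendered as JSON text."""
    schema_text = json.dumps(schema)  # double-quoted JSON, not a Python repr
    return (
        "<|im_start|>system\n"
        "You are a world class AI model who answers questions in JSON "
        f"Here's the json schema you must adhere to:\n<schema>\n{schema_text}\n</schema><|im_end|>\n"
        "<|im_start|>user\n"
        + user_prompt
        + "<|im_end|>"
        + "\n<|im_start|>assistant\n"
        "<schema>"
    )


prompt = generate_hermes_prompt_json("9.11 and 9.9 -- which is bigger?", example_schema)
print(prompt)
```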

For a given user prompt:

user_prompt = "9.11 and 9.9 -- which is bigger?"

We can use generate.regex by passing the regex we built from the JSON schema of our Pydantic class, and then call the generator with the Hermes prompt:

generator = generate.regex(model, regex_str)
prompt = generate_hermes_prompt(user_prompt)
response = generator(prompt, max_tokens=1024, temperature=0, seed=42)

We obtain a series of intermediate reasoning steps as well as the conclusion:

import json

json_response = json.loads(response)

print(json_response["reasoning"])
print(json_response["conclusion"])
# [{'reasoning_step': 'Both 9.11 and 9.9 are decimal numbers.'},
#  {'reasoning_step': 'When comparing decimal numbers, we look at the numbers after the decimal point.'},
#  {'reasoning_step': 'In this case, 9.11 has the number 1 after the decimal point, while 9.9 has the number 9.'},
#  {'reasoning_step': 'Since 1 is greater than 9, 9.11 is greater than 9.9.'}]
# '9.11 is bigger.'
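
For reading longer traces, a small helper can turn the JSON response into a numbered list. This is a sketch using only the standard library; `format_reasoning` is a hypothetical name, and the response string is a shortened copy of the example output above rather than a fresh model run.

```python
import json

# Shortened copy of the example output, re-encoded as a JSON string.
response = json.dumps({
    "reasoning": [
        {"reasoning_step": "Both 9.11 and 9.9 are decimal numbers."},
        {"reasoning_step": "Since 1 is greater than 9, 9.11 is greater than 9.9."},
    ],
    "conclusion": "9.11 is bigger.",
})


def format_reasoning(raw):
    """Turn the JSON response into a numbered, human-readable trace."""
    data = json.loads(raw)
    lines = [
        f"{i}. {step['reasoning_step']}"
        for i, step in enumerate(data["reasoning"], start=1)
    ]
    lines.append(f"Conclusion: {data['conclusion']}")
    return "\n".join(lines)


print(format_reasoning(response))
```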

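We notice that the 4th reasoning step is wrong: "Since 1 is greater than 9, 9.11 is greater than 9.9.", and the conclusion is wrong as a result (9.9 is the bigger number). We should probably give the model some examples for this particular task.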
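
One way to supply such examples is to prepend a few worked comparisons to the user prompt. The sketch below is only an illustration: the example questions, their wording, and the helper name are hypothetical, and it has not been checked whether they actually fix the model's answer.

```python
from decimal import Decimal

# Hypothetical worked examples; the wording is illustrative only.
FEW_SHOT_EXAMPLES = [
    ("3.5 and 3.45 -- which is bigger?",
     "Pad to the same number of decimal places: 3.50 and 3.45. Since 50 > 45, 3.5 is bigger."),
    ("2.07 and 2.7 -- which is bigger?",
     "Pad to the same number of decimal places: 2.07 and 2.70. Since 70 > 07, 2.7 is bigger."),
]


def build_few_shot_user_prompt(question):
    """Prepend the worked examples to the question as plain text."""
    parts = [f"Q: {q}\nA: {a}" for q, a in FEW_SHOT_EXAMPLES]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


few_shot_prompt = build_few_shot_user_prompt("9.11 and 9.9 -- which is bigger?")
print(few_shot_prompt)

# Sanity check of the expected answer with exact decimal arithmetic.
print(Decimal("9.9") > Decimal("9.11"))
```

The resulting string can be passed to generate_hermes_prompt in place of the plain user prompt.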

This example was originally contributed by Alonso Silva.