Generation

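Once an Outlines model is built, you can use outlines.generate to generate text. Standard LLM generation is available through outlines.generate.text, alongside the structured generation methods described below. (For a detailed technical explanation of how structured generation works, see the Structured Generation Explanation page.)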

Before generating text, you must construct a model with outlines.models. For example:

import outlines

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct", device="cuda")

Text generator

generator = outlines.generate.text(model)

result = generator("Question: What's 2+2? Answer:", max_tokens=100)
print(result)
# The answer is 4

# Outlines also supports streaming output
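stream = generator.stream("What's 2+2?", max_tokens=5)
# Iterate over the stream directly so the loop ends when generation stops,
# instead of calling next() a fixed number of times
for token in stream:
    print(repr(token))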
# '2'
# '+'
# '2'
# ' equals'
# '4'

Multi-label classification

Outlines supports classification over several labels by guiding the model so that it can only output one of the specified choices.

import outlines

model = outlines.models.transformers("microsoft/Phi-3-mini-128k-instruct")
generator = outlines.generate.choice(model, ["Blue", "Red", "Yellow"])

color = generator("What is the closest color to Indigo? ")
print(color)
# Blue
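
One way to picture this constraint: restricting the output to a fixed list of options is equivalent to requiring a match against a regular expression that alternates between those options. The sketch below uses only Python's re module and a hypothetical helper, choices_to_regex. It illustrates the idea and is not meant as a description of how outlines.generate.choice is implemented.

```python
import re

def choices_to_regex(choices):
    """Build a regex matching exactly one of the given strings (hypothetical helper)."""
    # re.escape protects characters such as "+" or "." inside a choice.
    return "(" + "|".join(re.escape(c) for c in choices) + ")"

pattern = choices_to_regex(["Blue", "Red", "Yellow"])
print(pattern)
# (Blue|Red|Yellow)

# Only a single, exact option matches; anything else is rejected.
print(bool(re.fullmatch(pattern, "Blue")))       # True
print(bool(re.fullmatch(pattern, "Indigo")))     # False
print(bool(re.fullmatch(pattern, "Blue, Red")))  # False
```

Note that a combination such as "Blue, Red" is rejected: each call yields exactly one label.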

JSON-structured generation

Outlines can guide the model so that its output is valid JSON 100% of the time. You can specify the structure using either Pydantic or a string that contains a JSON Schema.

from enum import Enum
from pydantic import BaseModel, constr, conint

import outlines

class Armor(str, Enum):
    leather = "leather"
    chainmail = "chainmail"
    plate = "plate"


class Character(BaseModel):
    name: constr(max_length=10)
    age: conint(gt=18, lt=99)
    armor: Armor
    strength: conint(gt=1, lt=100)

model = outlines.models.transformers("microsoft/Phi-3-mini-128k-instruct")
generator = outlines.generate.json(model, Character)

character = generator(
    "Generate a new character for my awesome game: "
    + "name, age (between 1 and 99), armor and strength. "
    )
print(character)
# name='Orla' age=21 armor=<Armor.plate: 'plate'> strength=8

The same structure, specified as a JSON Schema string:

import outlines

schema = """{
    "$defs": {
        "Armor": {
            "enum": ["leather", "chainmail", "plate"],
            "title": "Armor",
            "type": "string"
        }
    },
    "properties": {
        "name": {"maxLength": 10, "title": "Name", "type": "string"},
        "age": {"title": "Age", "type": "integer"},
        "armor": {"$ref": "#/$defs/Armor"},
        "strength": {"title": "Strength", "type": "integer"}
    },
    "required": ["name", "age", "armor", "strength"],
    "title": "Character",
    "type": "object"
}"""

model = outlines.models.transformers("microsoft/Phi-3-mini-128k-instruct")
generator = outlines.generate.json(model, schema)
character = generator(
    "Generate a new character for my awesome game: "
    + "name, age (between 1 and 99), armor and strength. "
    )
print(character)
# {'name': 'Yuki', 'age': 24, 'armor': 'plate', 'strength': 3}

Note

We recommend that you constrain the length of string fields when first testing your schema, especially with small models.
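
As a rough illustration of that advice, the sketch below walks a JSON Schema held as a Python dict and adds a maxLength to every string field that lacks one. The helper name cap_string_lengths, the default limit, and the sample schema are assumptions made for this example; none of them are part of Outlines.

```python
import copy
import json

def cap_string_lengths(schema, max_length=20):
    """Return a copy of `schema` in which every string-typed node without a
    maxLength gets one (hypothetical helper, not part of Outlines)."""
    capped = copy.deepcopy(schema)

    def visit(node):
        if isinstance(node, dict):
            # Enum strings are already bounded by their allowed values.
            if node.get("type") == "string" and "enum" not in node:
                node.setdefault("maxLength", max_length)
            for value in node.values():
                visit(value)
        elif isinstance(node, list):
            for item in node:
                visit(item)

    visit(capped)
    return capped

# Sample schema invented for this example.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "maxLength": 10},
        "bio": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "bio", "age"],
}

capped = cap_string_lengths(schema, max_length=40)
schema_str = json.dumps(capped)  # string form of the capped schema
print(capped["properties"]["bio"])
# {'type': 'string', 'maxLength': 40}
```

Existing limits (such as the one on name) are left alone, and the input schema is not modified.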

Grammar-structured generation

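Outlines can also generate text that follows any context-free grammar (CFG) written in EBNF format. Grammars can be intimidating, but they are a very powerful tool! Indeed, they determine the syntax of every programming language, valid chess moves, and molecular structures, and they can help with procedural graphics generation, among other things.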

Here is a simple example of a grammar that defines arithmetic operations:

from outlines import models, generate

arithmetic_grammar = """
    ?start: sum

    ?sum: product
        | sum "+" product   -> add
        | sum "-" product   -> sub

    ?product: atom
        | product "*" atom  -> mul
        | product "/" atom  -> div

    ?atom: NUMBER           -> number
         | "-" atom         -> neg
         | "(" sum ")"

    %import common.NUMBER
    %import common.WS_INLINE

    %ignore WS_INLINE
"""

model = models.transformers("microsoft/Phi-3-mini-128k-instruct")
generator = generate.cfg(model, arithmetic_grammar, max_tokens=100)

result = generator("Question: How can you write 5*5 using addition?\nAnswer:")
print(result)
# 5+5+5+5+5

EBNF grammars can be cumbersome to write. For this reason, Outlines provides grammar definitions in the outlines.grammars module:

from outlines import models, generate, grammars

model = models.transformers("microsoft/Phi-3-mini-128k-instruct")
generator = generate.cfg(model, grammars.arithmetic, max_tokens=100)

result = generator("Question: How can you write 5*5 using addition?\nAnswer:")
print(result)
# 5+5+5+5+5

The available grammars are listed here.
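
A grammar guarantees that the output is a well-formed expression, not that it is the right one. A possible follow-up check, sketched below with only the standard library, evaluates an arithmetic string such as 5+5+5+5+5 and compares it with the expected value. The evaluate function is a hypothetical helper. It relies on Python's own expression parser as a stand-in for the arithmetic grammar, so the two may disagree on edge cases such as numbers with leading zeros.

```python
import ast
import operator

# Binary operators allowed by the arithmetic grammar: +, -, *, /
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def evaluate(expression):
    """Evaluate +, -, *, / and unary minus over numbers; reject anything else."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("unsupported expression")

    return walk(ast.parse(expression.strip(), mode="eval"))

result = "5+5+5+5+5"  # stand-in for the generator's output
print(evaluate(result) == 5 * 5)
# True
```

Walking the syntax tree, rather than calling eval, keeps the check limited to plain arithmetic.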

Regex-structured generation

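Outlines can also generate text that belongs to the language of a regular expression, which is slightly simpler but no less useful. For example, to force the model to generate an IP address: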

from outlines import models, generate

model = models.transformers("microsoft/Phi-3-mini-128k-instruct")

regex_str = r"((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)"
generator = generate.regex(model, regex_str)

result = generator("What is the IP address of localhost?\nIP: ")
print(result)
# 127.0.0.100
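
To see what this pattern accepts, it can be tried directly with Python's re module, without a model. The sketch below reuses the same regex_str. The helper is_ipv4 and the candidate strings are made up for illustration.

```python
import re

regex_str = r"((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)"
ip_pattern = re.compile(regex_str)

def is_ipv4(text):
    """True when the whole string matches the IPv4 pattern (hypothetical helper)."""
    return ip_pattern.fullmatch(text) is not None

for candidate in ["127.0.0.1", "127.0.0.100", "256.1.1.1", "1.2.3"]:
    print(candidate, is_ipv4(candidate))
# 127.0.0.1 True
# 127.0.0.100 True
# 256.1.1.1 False
# 1.2.3 False
```

The pattern constrains the shape of the output (four octets between 0 and 255), not whether the address answers the question asked.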

Generate a given Python type

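For simple use cases, we provide a shortcut on top of regex-structured generation. Pass a Python type to the outlines.generate.format function and the LLM will output text that matches this type: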

from outlines import models, generate

model = models.transformers("microsoft/Phi-3-mini-128k-instruct")
generator = generate.format(model, int)

result = generator("What is 2+2?")
print(result)
# 4
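
The idea of a type acting as shorthand for a regular expression can be sketched in a few lines. The patterns in TYPE_PATTERNS and the helper parse_as below are assumptions chosen for this example. They are not the patterns or functions that Outlines uses.

```python
import re

# Illustrative patterns only; assumptions, not the patterns Outlines ships.
TYPE_PATTERNS = {
    int: r"[+-]?(0|[1-9][0-9]*)",
    bool: r"(True|False)",
}

def parse_as(python_type, text):
    """Check `text` against the pattern for `python_type`, then convert it."""
    if re.fullmatch(TYPE_PATTERNS[python_type], text) is None:
        raise ValueError(f"{text!r} does not look like {python_type.__name__}")
    if python_type is bool:
        return text == "True"
    return python_type(text)

print(parse_as(int, "4"))
# 4
print(parse_as(bool, "False"))
# False
```

Text that matches the pattern is guaranteed to convert cleanly, which is what makes a type usable as a generation constraint.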