Run Outlines on Modal

Modal is a serverless platform that lets you easily run code in the cloud, including on GPUs. It is convenient if you don't have a powerful GPU at home but still want to provision and orchestrate cloud infrastructure quickly and easily.

This guide will show you how to use Modal to run programs written with Outlines on a GPU in the cloud.

Requirements

We recommend installing modal and outlines in a virtual environment. You can create one with:

python -m venv venv
source venv/bin/activate

and then install the required packages:

pip install modal outlines

Build the image

First we need to define our container image. If you need access to a gated model, you will need to provide an access token. See the .env call below on how to provide a HuggingFace token.

The best way to set the token is with an environment variable named HF_TOKEN whose value is your token. If you would rather not do that, the code below includes a commented-out line that sets the token directly in the code.
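For example, on Linux or macOS you can export the variable in your shell before running Modal (the token value below is a placeholder; use your own token from the Hugging Face settings page):

```shell
# Export your Hugging Face access token so the image definition below can
# copy it into the container. Replace the placeholder with your real token.
export HF_TOKEN="hf_your_token_here"
```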

from modal import Image, App, gpu
import os

# This creates a modal App object. Here we set the name to "outlines-app".
# There are other optional parameters like modal secrets, schedules, etc.
# See the documentation here: https://modal.com/docs/reference/modal.App
app = App(name="outlines-app")

# Specify a language model to use.
# Another good model to use is "NousResearch/Hermes-2-Pro-Mistral-7B"
language_model = "mistral-community/Mistral-7B-v0.2"

# Please set an environment variable HF_TOKEN with your Hugging Face API token.
# The code below (the .env({...}) part) will copy the token from your local
# environment to the container.
# More info on Image here: https://modal.com/docs/reference/modal.Image
outlines_image = Image.debian_slim(python_version="3.11").pip_install(
    "outlines",
    "transformers",
    "datasets",
    "accelerate",
    "sentencepiece",
).env({
    # This will pull in your HF_TOKEN environment variable if you have one.
    # (.get avoids a KeyError when the variable is not set.)
    'HF_TOKEN': os.environ.get('HF_TOKEN', '')

    # To set the token directly in the code, uncomment the line below and replace
    # 'YOUR_TOKEN' with the HuggingFace access token.
    # 'HF_TOKEN':'YOUR_TOKEN'
})

Setting the container up

When running longer Modal apps, it is recommended to download the language model when the container starts up, rather than when the function is called. This caches the model for future runs.

# This function imports the model from Hugging Face. The modal container
# will call this function when it starts up. This is useful for
# downloading models, setting up environment variables, etc.
def import_model():
    import outlines
    outlines.models.transformers(language_model)

# This line tells the container to run the import_model function when it starts.
outlines_image = outlines_image.run_function(import_model)

Define a schema

We will run the JSON-structured generation example from the README (link), with the following schema:

# Specify a schema for the character description. In this case,
# we want to generate a character with a name, age, armor, weapon, and strength.
schema = """{
    "title": "Character",
    "type": "object",
    "properties": {
        "name": {
            "title": "Name",
            "maxLength": 10,
            "type": "string"
        },
        "age": {
            "title": "Age",
            "type": "integer"
        },
        "armor": {"$ref": "#/definitions/Armor"},
        "weapon": {"$ref": "#/definitions/Weapon"},
        "strength": {
            "title": "Strength",
            "type": "integer"
        }
    },
    "required": ["name", "age", "armor", "weapon", "strength"],
    "definitions": {
        "Armor": {
            "title": "Armor",
            "description": "An enumeration.",
            "enum": ["leather", "chainmail", "plate"],
            "type": "string"
        },
        "Weapon": {
            "title": "Weapon",
            "description": "An enumeration.",
            "enum": ["sword", "axe", "mace", "spear", "bow", "crossbow"],
            "type": "string"
        }
    }
}"""
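This is not part of the original example, but since the schema travels around as a plain string, a quick stdlib-only sanity check can catch typos (a stray comma, a misspelled field name) before the generator ever sees it:

```python
import json

# The schema string from above, reproduced verbatim.
schema = """{
    "title": "Character",
    "type": "object",
    "properties": {
        "name": {
            "title": "Name",
            "maxLength": 10,
            "type": "string"
        },
        "age": {
            "title": "Age",
            "type": "integer"
        },
        "armor": {"$ref": "#/definitions/Armor"},
        "weapon": {"$ref": "#/definitions/Weapon"},
        "strength": {
            "title": "Strength",
            "type": "integer"
        }
    },
    "required": ["name", "age", "armor", "weapon", "strength"],
    "definitions": {
        "Armor": {
            "title": "Armor",
            "description": "An enumeration.",
            "enum": ["leather", "chainmail", "plate"],
            "type": "string"
        },
        "Weapon": {
            "title": "Weapon",
            "description": "An enumeration.",
            "enum": ["sword", "axe", "mace", "spear", "bow", "crossbow"],
            "type": "string"
        }
    }
}"""

# json.loads raises a ValueError if the string is not well-formed JSON.
parsed = json.loads(schema)

# Check that every required field is actually declared under "properties".
missing = [f for f in parsed["required"] if f not in parsed["properties"]]
assert not missing, f"required fields missing from properties: {missing}"

print(parsed["title"])  # Character
```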

To make the inference work on Modal, we need to wrap the corresponding function in an @app.function decorator. We pass to this decorator the image and the GPU on which we want the function to run.

Let's choose an A100 GPU with 80GB memory. A list of valid GPUs can be found here.

# Define a function that uses the image we chose, and specify the GPU
# and memory we want to use.
@app.function(image=outlines_image, gpu=gpu.A100(size='80GB'))
def generate(
    prompt: str = "Amiri, a 53 year old warrior woman with a sword and leather armor.",
):
    # Remember, this function is being executed in the container,
    # so we need to import the necessary libraries here. You should
    # do this with any other libraries you might need.
    import outlines

    # Load the model into memory. The import_model function above
    # should have already downloaded the model, so this call
    # only loads the model into GPU memory.
    model = outlines.models.transformers(
        language_model, device="cuda"
    )

    # Generate a character description based on the prompt.
    # We use the .json generation method -- we provide the
    # - model: the model we loaded above
    # - schema: the JSON schema we defined above
    generator = outlines.generate.json(model, schema)

    # Make sure you wrap your prompt in instruction tags ([INST] and [/INST])
    # to indicate that the prompt is an instruction. Instruction tags can vary
    # by models, so make sure to check the model's documentation.
    character = generator(
        f"<s>[INST]Give me a character description. Describe {prompt}.[/INST]"
    )

    # Print out the generated character.
    print(character)

We then need to define a local_entrypoint to call our function generate remotely:

@app.local_entrypoint()
def main(
    prompt: str = "Amiri, a 53 year old warrior woman with a sword and leather armor.",
):
    # We use the "generate" function defined above -- note too that we are calling
    # .remote() on the function. This tells modal to run the function in our cloud
    # machine. If you want to run the function locally, you can call .local() instead,
    # though this will require additional setup.
    generate.remote(prompt)

Here the @app.local_entrypoint() decorator defines main as the function to start from locally when launching with the Modal CLI. You can save the above code to a file example.py (or use this implementation). Let's now see how to run the code on the cloud using the Modal CLI.

Run on the cloud

First, install the Modal client from PyPI, if you have not already:

pip install modal

You will also need a token from Modal. To get one, run:

modal setup

Once the setup is complete, you can run inference on the cloud using:

modal run example.py

You should see the Modal app initialize, and soon after, the output of the print function in your terminal. That's it!