Run Outlines on Modal

Modal is a serverless platform that lets you easily run code in the cloud, including on GPUs. It is convenient if you don't have a powerful GPU at home but still want to provision and orchestrate cloud infrastructure quickly and easily.

This guide will show you how to use Modal to run programs written with Outlines on a GPU in the cloud.

Requirements

We recommend installing modal and outlines in a virtual environment. You can create one with:

python -m venv venv
source venv/bin/activate

and then install the required packages:

pip install modal outlines

Build the image

First we need to define our container image. If you need access to a gated model, you will need to provide an access token. See the .env call below on how to provide a HuggingFace token.

The best way to set the token is with an environment variable named HF_TOKEN whose value is your token. If you would rather not do that, the code below includes a commented-out line that sets the token directly in the code.
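For example, on Linux or macOS you can export the variable in your shell before running Modal (the token value below is a placeholder; use your own token from the Hugging Face settings page):

```shell
# Export your Hugging Face access token so the image definition below can
# copy it into the container. Replace the placeholder with your real token.
export HF_TOKEN="hf_your_token_here"
```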

from modal import Image, App, gpu
import os

# This creates a modal App object. Here we set the name to "outlines-app".
# There are other optional parameters like modal secrets, schedules, etc.
# See the documentation here: https://modal.com/docs/reference/modal.App
app = App(name="outlines-app")

# Specify a language model to use.
# Another good model to use is "NousResearch/Hermes-2-Pro-Mistral-7B"
language_model = "mistral-community/Mistral-7B-v0.2"

# Please set an environment variable HF_TOKEN with your Hugging Face API token.
# The code below (the .env({...}) part) will copy the token from your local
# environment to the container.
# More info on Image here: https://modal.com/docs/reference/modal.Image
outlines_image = Image.debian_slim(python_version="3.11").pip_install(
    "outlines",
    "transformers",
    "datasets",
    "accelerate",
    "sentencepiece",
).env({
    # This will pull in your HF_TOKEN environment variable if you have one.
    # (.get avoids a KeyError when the variable is not set.)
    'HF_TOKEN': os.environ.get('HF_TOKEN', '')

    # To set the token directly in the code, uncomment the line below and replace
    # 'YOUR_TOKEN' with the HuggingFace access token.
    # 'HF_TOKEN':'YOUR_TOKEN'
})

Setting the container up

When running longer Modal apps, it is recommended to download the language model when the container starts up, rather than when the function is called. This caches the model for future runs.

# This function imports the model from Hugging Face. The modal container
# will call this function when it starts up. This is useful for
# downloading models, setting up environment variables, etc.
def import_model():
    import outlines
    outlines.models.transformers(language_model)

# This line tells the container to run the import_model function when it starts.
outlines_image = outlines_image.run_function(import_model)

Define a schema

We will run the JSON-structured generation example from the README (link), with the following schema:

# Specify a schema for the character description. In this case,
# we want to generate a character with a name, age, armor, weapon, and strength.
schema = """{
    "title": "Character",
    "type": "object",
    "properties": {
        "name": {
            "title": "Name",
            "maxLength": 10,
            "type": "string"
        },
        "age": {
            "title": "Age",
            "type": "integer"
        },
        "armor": {"$ref": "#/definitions/Armor"},
        "weapon": {"$ref": "#/definitions/Weapon"},
        "strength": {
            "title": "Strength",
            "type": "integer"
        }
    },
    "required": ["name", "age", "armor", "weapon", "strength"],
    "definitions": {
        "Armor": {
            "title": "Armor",
            "description": "An enumeration.",
            "enum": ["leather", "chainmail", "plate"],
            "type": "string"
        },
        "Weapon": {
            "title": "Weapon",
            "description": "An enumeration.",
            "enum": ["sword", "axe", "mace", "spear", "bow", "crossbow"],
            "type": "string"
        }
    }
}"""
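This is not part of the original example, but since the schema travels around as a plain string, a quick stdlib-only sanity check can catch typos (a stray comma, a misspelled field name) before the generator ever sees it:

```python
import json

# The schema string from above, reproduced verbatim.
schema = """{
    "title": "Character",
    "type": "object",
    "properties": {
        "name": {
            "title": "Name",
            "maxLength": 10,
            "type": "string"
        },
        "age": {
            "title": "Age",
            "type": "integer"
        },
        "armor": {"$ref": "#/definitions/Armor"},
        "weapon": {"$ref": "#/definitions/Weapon"},
        "strength": {
            "title": "Strength",
            "type": "integer"
        }
    },
    "required": ["name", "age", "armor", "weapon", "strength"],
    "definitions": {
        "Armor": {
            "title": "Armor",
            "description": "An enumeration.",
            "enum": ["leather", "chainmail", "plate"],
            "type": "string"
        },
        "Weapon": {
            "title": "Weapon",
            "description": "An enumeration.",
            "enum": ["sword", "axe", "mace", "spear", "bow", "crossbow"],
            "type": "string"
        }
    }
}"""

# json.loads raises a ValueError if the string is not well-formed JSON.
parsed = json.loads(schema)

# Check that every required field is actually declared under "properties".
missing = [f for f in parsed["required"] if f not in parsed["properties"]]
assert not missing, f"required fields missing from properties: {missing}"

print(parsed["title"])  # Character
```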

To make the inference work on Modal, we need to wrap the corresponding function in an @app.function decorator. We pass to this decorator the image and the GPU on which we want the function to run.

Let's choose an A100 GPU with 80GB memory. A list of valid GPUs can be found here.

# Define a function that uses the image we chose, and specify the GPU
# and memory we want to use.
@app.function(image=outlines_image, gpu=gpu.A100(size='80GB'))
def generate(
    prompt: str = "Amiri, a 53 year old warrior woman with a sword and leather armor.",
):
    # Remember, this function is being executed in the container,
    # so we need to import the necessary libraries here. You should
    # do this with any other libraries you might need.
    import outlines

    # Load the model into memory. The import_model function above
    # should have already downloaded the model, so this call
    # only loads the model into GPU memory.
    model = outlines.models.transformers(
        language_model, device="cuda"
    )

    # Generate a character description based on the prompt.
    # We use the .json generation method -- we provide the
    # - model: the model we loaded above
    # - schema: the JSON schema we defined above
    generator = outlines.generate.json(model, schema)

    # Make sure you wrap your prompt in instruction tags ([INST] and [/INST])
    # to indicate that the prompt is an instruction. Instruction tags can vary
    # by models, so make sure to check the model's documentation.
    character = generator(
        f"<s>[INST]Give me a character description. Describe {prompt}.[/INST]"
    )

    # Print out the generated character.
    print(character)

We then need to define a local_entrypoint to call our function generate remotely:

@app.local_entrypoint()
def main(
    prompt: str = "Amiri, a 53 year old warrior woman with a sword and leather armor.",
):
    # We use the "generate" function defined above -- note too that we are calling
    # .remote() on the function. This tells modal to run the function in our cloud
    # machine. If you want to run the function locally, you can call .local() instead,
    # though this will require additional setup.
    generate.remote(prompt)

Here the @app.local_entrypoint() decorator defines main as the function to start from locally when launching with the Modal CLI. You can save the above code to a file example.py (or use this implementation). Let's now see how to run the code on the cloud using the Modal CLI.

Run on the cloud

First, install the Modal client from PyPI, if you have not already:

pip install modal

You will also need a token from Modal. To get one, run:

modal setup

Once the setup is complete, you can run inference on the cloud using:

modal run example.py

You should see the Modal app initialize, and soon after, the output of the print function in your terminal. That's it!