Run Outlines with Modal
Modal is a serverless platform that lets you easily run code in the cloud, including on GPUs. It is convenient for those of us who don't have a beefy GPU at home but still want to quickly and easily provision and orchestrate cloud infrastructure.
This guide will show you how to use Modal to run programs written with Outlines on a GPU in the cloud.
Requirements
We recommend installing modal and outlines in a virtual environment. You can create one with:
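A minimal way to do this uses Python's built-in venv module (the directory name .venv is just a common convention):

```shell
# Create a virtual environment in the .venv directory
python -m venv .venv

# Activate it (on Linux/macOS; on Windows use .venv\Scripts\activate)
source .venv/bin/activate
```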
Then install the required packages:
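With the environment activated, both packages can be installed from PyPI (a sketch; pin versions if you need reproducibility):

```shell
pip install modal outlines
```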
Build the image
First, we need to define our container image. If you need access to a gated model, you will need to provide an access token. See the .env call below on how to provide a HuggingFace token.
The best way to set the token is to create an environment variable named HF_TOKEN that contains your token. If you prefer not to do that, we include a commented-out line in the code that lets you set the token directly in the code.
from modal import Image, App, gpu
import os
# This creates a modal App object. Here we set the name to "outlines-app".
# There are other optional parameters like modal secrets, schedules, etc.
# See the documentation here: https://modal.com/docs/reference/modal.App
app = App(name="outlines-app")
# Specify a language model to use.
# Another good model to use is "NousResearch/Hermes-2-Pro-Mistral-7B"
language_model = "mistral-community/Mistral-7B-v0.2"
# Please set an environment variable HF_TOKEN with your Hugging Face API token.
# The code below (the .env({...}) part) will copy the token from your local
# environment to the container.
# More info on Image here: https://modal.com/docs/reference/modal.Image
outlines_image = Image.debian_slim(python_version="3.11").pip_install(
    "outlines",
    "transformers",
    "datasets",
    "accelerate",
    "sentencepiece",
).env({
    # This will pull in your HF_TOKEN environment variable if you have one.
    'HF_TOKEN': os.environ['HF_TOKEN']
    # To set the token directly in the code, uncomment the line below and replace
    # 'YOUR_TOKEN' with the HuggingFace access token.
    # 'HF_TOKEN':'YOUR_TOKEN'
})
Setting the container up
When running longer Modal apps, it is recommended to download the language model when the container starts, rather than when the function is called. This caches the model for future runs.
# This function imports the model from Hugging Face. The modal container
# will call this function when it starts up. This is useful for
# downloading models, setting up environment variables, etc.
def import_model():
    import outlines
    outlines.models.transformers(language_model)
# This line tells the container to run the import_model function when it starts.
outlines_image = outlines_image.run_function(import_model)
Define a schema
We will run the JSON-structured generation example from the README (link), with the following schema:
# Specify a schema for the character description. In this case,
# we want to generate a character with a name, age, armor, weapon, and strength.
schema = """{
    "title": "Character",
    "type": "object",
    "properties": {
        "name": {
            "title": "Name",
            "maxLength": 10,
            "type": "string"
        },
        "age": {
            "title": "Age",
            "type": "integer"
        },
        "armor": {"$ref": "#/definitions/Armor"},
        "weapon": {"$ref": "#/definitions/Weapon"},
        "strength": {
            "title": "Strength",
            "type": "integer"
        }
    },
    "required": ["name", "age", "armor", "weapon", "strength"],
    "definitions": {
        "Armor": {
            "title": "Armor",
            "description": "An enumeration.",
            "enum": ["leather", "chainmail", "plate"],
            "type": "string"
        },
        "Weapon": {
            "title": "Weapon",
            "description": "An enumeration.",
            "enum": ["sword", "axe", "mace", "spear", "bow", "crossbow"],
            "type": "string"
        }
    }
}"""
For inference to work on Modal, we need to wrap the corresponding function in an @app.function decorator. We pass to this decorator the image and GPU we want the function to run with.
We choose an A100 GPU with 80GB of memory. A list of valid GPUs can be found here.
# Define a function that uses the image we chose, and specify the GPU
# and memory we want to use.
@app.function(image=outlines_image, gpu=gpu.A100(size='80GB'))
def generate(
    prompt: str = "Amiri, a 53 year old warrior woman with a sword and leather armor.",
):
    # Remember, this function is being executed in the container,
    # so we need to import the necessary libraries here. You should
    # do this with any other libraries you might need.
    import outlines

    # Load the model into memory. The import_model function above
    # should have already downloaded the model, so this call
    # only loads the model into GPU memory.
    model = outlines.models.transformers(
        language_model, device="cuda"
    )

    # Generate a character description based on the prompt.
    # We use the .json generation method -- we provide the
    # - model: the model we loaded above
    # - schema: the JSON schema we defined above
    generator = outlines.generate.json(model, schema)

    # Make sure you wrap your prompt in instruction tags ([INST] and [/INST])
    # to indicate that the prompt is an instruction. Instruction tags can vary
    # by models, so make sure to check the model's documentation.
    character = generator(
        f"<s>[INST]Give me a character description. Describe {prompt}.[/INST]"
    )

    # Print out the generated character.
    print(character)
We then need to define a local_entrypoint to call our function generate remotely.
@app.local_entrypoint()
def main(
    prompt: str = "Amiri, a 53 year old warrior woman with a sword and leather armor.",
):
    # We use the "generate" function defined above -- note too that we are calling
    # .remote() on the function. This tells modal to run the function in our cloud
    # machine. If you want to run the function locally, you can call .local() instead,
    # though this will require additional setup.
    generate.remote(prompt)
Here the @app.local_entrypoint() decorator defines main as the function to start from locally when invoked via the Modal CLI. You can save the above code to example.py (or use this implementation). Let's now see how to run the code on the cloud using the Modal CLI.
Run on the cloud
First, install the Modal client from PyPI, if you have not already:
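If you followed the Requirements section above, modal is already installed; otherwise it is one pip command away:

```shell
pip install modal
```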
You then need to obtain a token from Modal. Run the following command:
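At the time of writing, the Modal CLI sets up a token with the command below; it opens a browser window so you can log in (check Modal's docs if the CLI has changed):

```shell
modal token new
```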
Once that is set, you can run inference on the cloud using:
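Assuming you saved the program as example.py as suggested above, the Modal CLI runs its local entrypoint with:

```shell
modal run example.py
```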
You should see the Modal app initialize, and soon after, the result of the print function should appear in your terminal. That's it!