Deploying the chatglm2-6b Model with Triton (Part 2)

The model's inputs, outputs, and parameters can be processed here with a Python script:
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    @staticmethod
    def auto_complete_config(auto_complete_model_config):
        """`auto_complete_config` is called only once when loading the model."""
        # Nothing is auto-completed here; hand the config back unchanged.
        return auto_complete_model_config

    def initialize(self, args):
        """`initialize` is called only once when the model is being loaded.
        Implementing `initialize` function is optional. This function allows
        the model to initialize any state associated with this model.

        Parameters
        ----------
        args : dict
          Both keys and values are strings. The dictionary keys and values are:
          * model_config: A JSON string containing the model configuration
          * model_instance_kind: A string containing model instance kind
          * model_instance_device_id: A string containing model instance device
            ID
          * model_repository: Model repository path
          * model_version: Model version
          * model_name: Model name
        """
        print('Initialized...')

    def execute(self, requests):
        """`execute` must be implemented in every Python model. `execute`
        function receives a list of pb_utils.InferenceRequest as the only
        argument. This function is called when an inference is requested
        for this model.

        Parameters
        ----------
        requests : list
          A list of pb_utils.InferenceRequest

        Returns
        -------
        list
          A list of pb_utils.InferenceResponse. The length of this list must
          be the same as `requests`
        """
        responses = []
        # The per-request processing is omitted in this template; it must
        # append one pb_utils.InferenceResponse to `responses` per request.
        return responses

    def finalize(self):
        """`finalize` is called only once when the model is being unloaded.
        Implementing `finalize` function is optional. This function allows
        the model to perform any necessary clean ups before exit.
        """
        print('Cleaning up...')
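Step 5: Install the inference environment and supporting software
The CUDA version must match the GPU driver version. For the mapping between CUDA toolkit and driver versions,
see the official site: https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#cuda-major-component-versions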
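To make the version-matching requirement concrete, here is a minimal sketch of comparing an installed driver version against a required minimum. The helper names are hypothetical, and the floor of 460.27.04 is an assumption taken from the driver version embedded in the CUDA 11.2.0 installer file name used in this article; the authoritative minimum is the one listed in the NVIDIA release notes linked above.

```python
def parse_version(version: str) -> tuple:
    """Turn a dotted version such as '460.106.00' into (460, 106, 0)."""
    return tuple(int(part) for part in version.split("."))


def driver_satisfies(installed: str, minimum: str) -> bool:
    """Return True when the installed driver is at least the minimum version."""
    return parse_version(installed) >= parse_version(minimum)


# Assumption: the driver version in the CUDA 11.2.0 installer file name
# (cuda_11.2.0_460.27.04_linux.run) is used as the minimum here.
ASSUMED_MIN_DRIVER = "460.27.04"

if __name__ == "__main__":
    # 460.106.00 is the driver installer version used in this article.
    print(driver_satisfies("460.106.00", ASSUMED_MIN_DRIVER))
```

Comparing integer tuples rather than raw strings avoids the trap where "460.106.00" sorts before "460.27.04" lexicographically.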
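1) torch: introduction and installation:
torch is a scientific computing framework that provides efficient matrix operations and automatic differentiation for machine learning and other scientific computing tasks.
It offers a rich set of pretrained models and algorithm libraries, so users can quickly build and train models for a variety of machine learning tasks.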
pip install ./torch-1.12.1+cu116-cp38-cp38-linux_x86_64.whl
2) GPU driver:
sh ./NVIDIA-Linux-x86_64-460.106.00.run
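3) cuDNN: introduction and installation:
cuDNN (CUDA Deep Neural Network library) is a GPU-accelerated deep neural network (DNN) library provided by NVIDIA. It is designed to optimize and accelerate neural network training and inference in deep learning workloads.
cuDNN provides a set of core algorithms and functions for common deep learning workloads such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs). These algorithms and functions are heavily optimized for GPU architectures to deliver the best performance and efficiency.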
wget https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu1804/x86_64/libcudnn8_8.1.1.33-1+cuda11.2_amd64.deb
dpkg -i libcudnn8_8.1.1.33-1+cuda11.2_amd64.deb
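4) CUDA:
CUDA (Compute Unified Device Architecture) is a parallel computing platform and API developed by NVIDIA for GPU programming.
With the CUDA libraries, model inference can run on the GPU synchronously or asynchronously, with support for batching and multi-GPU parallel computation to improve inference speed and efficiency.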
wget https://developer.download.nvidia.com/compute/cuda/11.2.0/local_installers/cuda_11.2.0_460.27.04_linux.run
sudo sh cuda_11.2.0_460.27.04_linux.run
5) Other software
nohup apt-get update
nohup apt-get install -y autoconf autogen clangd gdb git-lfs libb64-dev libz-dev locales-all mosh openssh-server python3-dev rapidjson-dev sudo tmux unzip zstd zip zsh
Step 6: Start triton-server
CUDA_VISIBLE_DEVICES=0 setsid tritonserver --model-repository=/opt/tritonserver/python_backend/models --backend-config=python,shm-region-prefix-name=prefix1_ --http-port 8000 --grpc-port 8001 --metrics-port 8002 --log-verbose 1 --log-file /opt/tritonserver/logs/triton_server_gpu0.log
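Once the server has been launched, a quick way to confirm it is accepting requests is to poll the HTTP readiness endpoint. The sketch below assumes the server is reachable on localhost at the HTTP port 8000 passed above; the helper names are hypothetical, and `/v2/health/ready` is the readiness path of the KServe v2 HTTP protocol that Triton implements.

```python
import urllib.error
import urllib.request


def ready_url(host: str = "localhost", http_port: int = 8000) -> str:
    """Build the readiness URL for a Triton HTTP endpoint."""
    return f"http://{host}:{http_port}/v2/health/ready"


def server_is_ready(host: str = "localhost", http_port: int = 8000,
                    timeout: float = 2.0) -> bool:
    """Return True if the readiness endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(ready_url(host, http_port), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status all count as "not ready".
        return False


if __name__ == "__main__":
    print(server_is_ready())
```

Because the command above starts the server detached with `setsid`, a check like this can be called in a loop until it returns True before sending the first inference request.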

Startup succeeded: HTTP port 8000, gRPC port 8001, metrics port 8002.
III. Testing a simple call: invoking the HTTP endpoint from Python code
import requests
# Define the model's input data
data = {
    "inputs": [
        {
            "name": "QUERY",
            "shape": [1,1],
            "datatype": "BYTES",
            "data": ["川普是不是四川人"]

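As a self-contained illustration of this kind of request, the sketch below builds a request body with a single BYTES input named QUERY of shape [1, 1] and the URL it would be posted to. It is not the article's full client: the model name `chatglm2-6b` and the helper names are hypothetical, the host and port are assumed from Step 6, and the `/v2/models/<name>/infer` path comes from the KServe v2 HTTP protocol that Triton implements.

```python
import json

# Hypothetical: the real name is the model's directory name in the model repository.
MODEL_NAME = "chatglm2-6b"


def infer_url(model_name: str, host: str = "localhost", http_port: int = 8000) -> str:
    """Build the v2 inference URL for a given model."""
    return f"http://{host}:{http_port}/v2/models/{model_name}/infer"


def build_payload(query: str) -> dict:
    """Wrap one query string as a single BYTES input tensor of shape [1, 1]."""
    return {
        "inputs": [
            {
                "name": "QUERY",
                "shape": [1, 1],
                "datatype": "BYTES",
                "data": [query],
            }
        ]
    }


if __name__ == "__main__":
    body = json.dumps(build_payload("川普是不是四川人"), ensure_ascii=False)
    print(infer_url(MODEL_NAME))
    print(body)
    # To send it with the requests library used in this article:
    #   requests.post(infer_url(MODEL_NAME), data=body.encode("utf-8"))
```

Keeping the payload construction in a function makes it easy to check that the number of items in `data` matches the declared `shape` before anything is sent.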