Large Language Model (LLM) Environment Setup and Basic Testing
Install Ubuntu 22.04 and switch to the Tsinghua mirror
sudo nano /etc/apt/sources.list
Replace the file contents with the following:
# Source-code (deb-src) mirrors are commented out by default to speed up apt update; uncomment them if needed
deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ jammy main restricted universe multiverse
# deb-src https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ jammy main restricted universe multiverse
deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ jammy-updates main restricted universe multiverse
# deb-src https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ jammy-updates main restricted universe multiverse
deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ jammy-backports main restricted universe multiverse
# deb-src https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ jammy-backports main restricted universe multiverse
# deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ jammy-security main restricted universe multiverse
# # deb-src https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ jammy-security main restricted universe multiverse
deb http://security.ubuntu.com/ubuntu/ jammy-security main restricted universe multiverse
# deb-src http://security.ubuntu.com/ubuntu/ jammy-security main restricted universe multiverse
# Pre-release sources; enabling them is not recommended
# deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ jammy-proposed main restricted universe multiverse
# # deb-src https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ jammy-proposed main restricted universe multiverse
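After saving the file, refresh the package index so the new mirror takes effect:
sudo apt update
sudo apt upgrade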
Switch the input method to Sogou Pinyin
Install the fcitx input method framework
sudo apt install fcitx
Download the Sogou Pinyin installer package and install it, then add the Qt libraries it depends on:
cd ~/Downloads
sudo dpkg -i sogoupinyin_4.2.1.145_amd64.deb
sudo apt install libqt5qml5 libqt5quick5 libqt5quickwidgets5 qml-module-qtquick2
sudo apt install libgsettings-qt1
Remove the ibus input method framework
sudo apt purge ibus
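To make fcitx the default framework after removing ibus, one option is the im-config tool (assuming it is installed; log out and back in afterwards):
im-config -n fcitx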
Miniconda
Following the reference link, download this installer script and run it:
wget https://repo.anaconda.com/miniconda/Miniconda3-py310_23.1.0-1-Linux-x86_64.sh
bash Miniconda3-py310_23.1.0-1-Linux-x86_64.sh
source ~/.bashrc
Create environments on demand as needed:
conda create -n 31009text python=3.10.9
conda create -n 31006sd python=3.10.6
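To work inside an environment, activate it first; packages installed with pip then go into that environment:
conda activate 31009text
python --version  # should report Python 3.10.9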
Git
sudo apt install git
If you cannot bypass the firewall, install a Tampermonkey userscript to speed up GitHub access in the browser, or use another workaround.
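If a local proxy is running, git itself can also be routed through it; a minimal sketch (127.0.0.1:7890 is only a placeholder for your proxy's actual address and port):
git config --global http.proxy http://127.0.0.1:7890
git config --global https.proxy http://127.0.0.1:7890
# to undo later: git config --global --unset http.proxy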
NVIDIA driver
Reference article
# remove the driver shipped with the system
sudo apt purge nvidia*
sudo vi /etc/modprobe.d/blacklist.conf
Append these two lines at the end:
blacklist nouveau
options nouveau modeset=0
sudo update-initramfs -u
sudo reboot
After rebooting, check whether nouveau is blacklisted; if the following command prints nothing, it worked:
lsmod | grep nouveau
Download the driver matching your GPU model and install it:
sudo chmod a+x NVIDIA-Linux-x86_64-525.89.02.run
sudo ./NVIDIA-Linux-x86_64-525.89.02.run -no-x-check -no-nouveau-check -no-opengl-files
- -no-x-check: skip the check for a running X server during installation;
- -no-nouveau-check: skip the nouveau check during installation;
- -no-opengl-files: install only the driver files, not the OpenGL files (optional)
After installation, use the command below to see the highest CUDA version the driver supports; on a 2080 Ti this goes up to CUDA 12.0:
nvidia-smi
Install CUDA
Visit https://developer.nvidia.com/cuda-toolkit-archive
Make your selections there, then run the corresponding commands to download and install:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.0.0/local_installers/cuda-repo-ubuntu2204-12-0-local_12.0.0-525.60.13-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204-12-0-local_12.0.0-525.60.13-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2204-12-0-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda
Verify the installation is complete (note the V is uppercase):
nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Mon_Oct_24_19:12:58_PDT_2022
Cuda compilation tools, release 12.0, V12.0.76
Build cuda_12.0.r12.0/compiler.31968024_0
!!! Because the GPTQ quantized models will be used, version 11.7.1 was actually installed instead, as follows:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.7.1/local_installers/cuda-repo-ubuntu2204-11-7-local_11.7.1-515.65.01-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204-11-7-local_11.7.1-515.65.01-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2204-11-7-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda
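If nvcc cannot be found after installation, the CUDA directories usually need to be added to the shell environment; a minimal sketch, assuming the default /usr/local/cuda-11.7 install path:
echo 'export PATH=/usr/local/cuda-11.7/bin${PATH:+:${PATH}}' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-11.7/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}' >> ~/.bashrc
source ~/.bashrc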
Install cuDNN
Register an NVIDIA Developer account
https://developer.nvidia.com/rdp/cudnn-download
Download the matching version, choose the TAR archive, then install it following the instructions; this really just means extracting the archive and copying the files into place, as sketched below.
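A sketch of the documented tar install; the archive name below is a placeholder for the version actually downloaded:
tar -xvf cudnn-linux-x86_64-8.x.x.x_cuda11-archive.tar.xz
sudo cp cudnn-*-archive/include/cudnn*.h /usr/local/cuda/include
sudo cp -P cudnn-*-archive/lib/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*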
Text-generation-webui
https://github.com/oobabooga/text-generation-webui
sudo apt install build-essential
# conda create -n 31009text python=3.10.9
conda env list
conda activate 31009text
# install pytorch; the command differs depending on your GPU
pip3 install torch torchvision torchaudio
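A quick check that PyTorch can actually see the GPU (prints the version, and True if CUDA works):
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"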
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
pip install -r requirements.txt
python server.py
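The server prints a local Gradio URL once it is up (port 7860 by default). To reach the UI from another machine on the LAN, text-generation-webui also accepts a --listen flag:
python server.py --listen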
GPTQ installation
Activate the 31009text conda environment.
Install following this branch, into the repositories directory of text-generation-webui; if it still errors, see the HF configuration below.
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
cd GPTQ-for-LLaMa
pip install -r requirements.txt
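Depending on the branch checked out, the CUDA kernel may still need to be compiled in place; older branches of this repo shipped a setup_cuda.py for that step (run from inside GPTQ-for-LLaMa, with the matching CUDA toolkit on PATH):
python setup_cuda.py install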
The HF configuration approach, which mainly pins GPTQ-for-LLaMa to a specific commit:
git clone -n https://github.com/qwopqwop200/GPTQ-for-LLaMa gptq-safe
cd gptq-safe && git checkout 58c8ab4c7aaccc50f507fd08cce941976affe5e0
Because this approach renames the folder to gptq-safe, GPTQ_loader.py needs a corresponding change: add the extra path entries below so that modelutils and find_layers can still be found at runtime.
import sys
from pathlib import Path

# make both checkout locations importable before the GPTQ imports
sys.path.insert(0, str(Path("repositories/GPTQ-for-LLaMa")))
sys.path.insert(0, str(Path("repositories/GPTQ-for-LLaMa/utils")))
sys.path.insert(0, str(Path("repositories/gptq-safe")))

import llama_inference_offload
import modelutils
from modelutils import find_layers
Test loading models
llama-13b-hf: fails. The model is about 40 GB and triggers an out-of-memory error, so a 4-bit quantized model is required.
GPT4-x-alpaca-30b-4bit: fails; there is a bug in text-generation-webui.
vicuna-13B-1.1-GPTQ-4bit-128g: works.
chatglm-6b: works when tested standalone, but errors inside text-generation-webui with a trust_remote_code problem (see Troubleshooting).
Load the 4-bit vicuna model as follows (it can also be selected in the web UI):
python server.py --model vicuna-13B-1.1-GPTQ-4bit-128g --wbits 4 --groupsize 128 --model_type Llama
Loading the vicuna model
# first apply the delta weights to the original llama to produce a directly usable model
python -m fastchat.model.apply_delta --base-model-path ./llama-13b-hf --target-model-path ~/TDowload/vicuna-13b --delta-path ~/TDowload/vicuna-13b-delta-v1.1/
# this step needs more than 64 GB of RAM, so it could not be completed for now; to be revisited
Converting the original llama weights to an HF model and quantizing it
# convert LLaMA to hf
python convert_llama_weights_to_hf.py --input_dir /path/to/downloaded/llama/weights --model_size 7B --output_dir ./llama-hf
# Benchmark language generation with 4-bit LLaMA-7B:
# Save compressed model
CUDA_VISIBLE_DEVICES=0 python llama.py {MODEL_DIR} c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save_safetensors llama7b-4bit-128g.safetensors
# Benchmark generating a 2048 token sequence with the saved model
CUDA_VISIBLE_DEVICES=0 python llama.py {MODEL_DIR} c4 --benchmark 2048 --check
# model inference with the saved model
CUDA_VISIBLE_DEVICES=7 python llama_inference.py {MODEL_DIR} --wbits 4 --groupsize 128 --load llama7b-4bit-128g.safetensors --text "this is llama" --device=0
# model inference with the saved model with offload (this is very slow; it is a simple implementation and could be improved with technologies like FlexGen: https://github.com/FMInference/FlexGen)
CUDA_VISIBLE_DEVICES=0 python llama_inference_offload.py ${MODEL_DIR} --wbits 4 --groupsize 128 --load llama7b-4bit-128g.pt --text "this is llama" --pre_layer 16
It takes about 180 seconds to generate 45 tokens (5->50 tokens) on a single RTX 3090 with LLaMA-65B; pre_layer is set to 50.
Summary: downloading an already-quantized model directly is recommended.
Testing the MOSS model
Note that there are separate plugin and non-plugin versions.
Install Xunlei (Thunder) or another download manager
Used to download models from huggingface.co.
Install v2rayA or another tool for bypassing the firewall
sudo snap install v2raya
# then configure it
Langchain
Install langchain in the 31009text conda environment
conda env list
conda activate 31009text
pip install langchain
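A minimal smoke test, assuming a local transformers model so no API key is needed; HuggingFacePipeline wraps a transformers pipeline as a LangChain LLM (gpt2 here is just a small stand-in model):
from transformers import pipeline
from langchain import PromptTemplate, LLMChain
from langchain.llms import HuggingFacePipeline

# wrap a small local model as a LangChain-compatible LLM
pipe = pipeline("text-generation", model="gpt2", max_new_tokens=64)
llm = HuggingFacePipeline(pipeline=pipe)

prompt = PromptTemplate(template="Question: {question}\nAnswer:", input_variables=["question"])
chain = LLMChain(prompt=prompt, llm=llm)
print(chain.run("What is 4-bit quantization?"))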
Data independent
References:
https://www.youtube.com/watch?v=v6sF8Ed3nTE
https://colab.research.google.com/drive/115ba3EFCT0PvyXzFNv9E18QnKiyyjsm5?usp=sharing
pip -q install git+https://github.com/huggingface/transformers
pip install -q datasets loralib sentencepiece
pip -q install bitsandbytes accelerate
pip -q install langchain
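The linked notebook loads a model in 8-bit via bitsandbytes and accelerate; a minimal sketch of that pattern (the decapoda-research/llama-7b-hf model name is an assumption, substitute whatever model is on disk):
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("decapoda-research/llama-7b-hf")
model = AutoModelForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    load_in_8bit=True,   # requires bitsandbytes
    device_map="auto",   # requires accelerate
)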
Troubleshooting
Common issue 1
Hi @candowu, thanks for raising this issue. This is arising because the tokenizer in the config on the hub points to LLaMATokenizer. However, the tokenizer in the library is LlamaTokenizer. This is likely due to the configuration files being created before the final PR was merged in.
Change the LLaMATokenizer in tokenizer_config.json into lowercase LlamaTokenizer and it works like a charm.
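One way to apply that fix from the shell (run inside the model's directory):
sed -i 's/LLaMATokenizer/LlamaTokenizer/' tokenizer_config.json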
Common issue 2
ValueError: The current `device_map` had weights offloaded to the disk. Please provide an
`offload_folder` for them.
Adding this argument in text-generation-webui/modules/models.py (around line 56) should resolve the error:
import torch
from transformers import AutoModelForCausalLM
# Will go out of RAM on Colab
checkpoint = "facebook/opt-13b"
model = AutoModelForCausalLM.from_pretrained(
checkpoint, device_map="auto", offload_folder="offload", torch_dtype=torch.float16
)
Issue 3: while running MOSS
TypeError: '<' not supported between instances of 'tuple' and 'float'
This happens because an OutOfResources exception makes the benchmark return a float, which then cannot be compared.
The referenced fix is to rebuild and reinstall triton from source. It really amounts to a one-line change: the float returned when resources run out becomes a tuple.
Remove: return float('inf')
Add:    return (float('inf'), float('inf'), float('inf'))
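For context, a sketch of the patched exception handler inside triton's autotuner (the exact file and surrounding code vary by triton version, so treat this as illustrative only):
try:
    return do_bench(kernel_call)
except OutOfResources:
    # the benchmark normally returns a (mean, min, max) tuple of timings,
    # so return a matching tuple instead of a bare float
    return (float('inf'), float('inf'), float('inf'))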
Issue 4 (to be verified)
In the current version, we have model_class not being an Auto class for (TF) ImageClassificationPipelineTests, and we get the test failure TypeError: ('Keyword argument not understood:', 'trust_remote_code').
https://github.com/huggingface/transformers/runs/7421505300?check_suite_focus=true
Adding TFAutoModelForImageClassification in src/transformers/pipelines/__init__.py will fix the issue.