Description
System Info / 系統信息
The GLM-Z1-32B-0414 and GLM-Z1-Rumination-32B-0414 models output nothing but:
"!!!!!!!"
I have re-downloaded the models three times using the official download method, but the garbled output persists. On CPU the same models run fine and produce normal output.
The download code (the official code from the GLM-Z1-Rumination-32B-0414 files, plus cache_dir):

```python
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, cache_dir="./cache", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, cache_dir="./cache", device_map="auto")
```
The inference code:

```python
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3,4,5,6"
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_PATH = "/mnt/data/zhaoshukuo/try/GLM-Z1-32B-0414/cache/models--zai-org--GLM-Z1-32B-0414/snapshots/8eb2858992c1f749e2a6d4075455decc2484722d"
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype="float16",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",  # do not add llm_int8_enable_fp32_cpu_offload
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH, trust_remote_code=True,
    quantization_config=quantization_config, device_map="auto",
)
message = [{"role": "user", "content": "你好"}]
inputs = tokenizer.apply_chat_template(
    message, return_tensors="pt", add_generation_prompt=True, return_dict=True,
).to(model.device)
generate_kwargs = {
    "input_ids": inputs["input_ids"],
    "attention_mask": inputs["attention_mask"],
    "max_new_tokens": 200,
    "do_sample": False,
}
out = model.generate(**generate_kwargs)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
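A variant worth testing (my assumption, not confirmed from the logs): all-"!" output under greedy decoding is often a symptom of NaN/inf logits, since the argmax of an all-NaN row is index 0, which many vocabularies decode as "!". If the cause is float16 overflow in the 4-bit compute path, switching the compute dtype to bfloat16 (same exponent range as float32; requires an Ampere-class GPU or newer) might avoid it:

```python
import torch
from transformers import BitsAndBytesConfig

# Assumption: fp16 accumulation overflow produces the NaN logits.
# bfloat16 keeps float32's exponent range, so it cannot overflow where
# float16 (max finite value ~65504) would.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # instead of "float16"
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)
```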
The output is nothing but exclamation marks. On GPU the mean of the logits is 0.0184, lower than on CPU. Garbled output also occurs on GPU with 8-bit quantization and with the original unquantized model. Attached are the scripts I used to download and run the model:
tryload1.py
tryload.py
tryrag1-gpu-luanma1.py
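As a self-contained illustration of the suspected failure mode (an assumption, not something confirmed by the report): IEEE half precision tops out at 65504, so intermediate values that fit comfortably in float32 on CPU can overflow to infinity in float16 on GPU, after which downstream logits become inf/NaN. Python's `struct` module exposes the half-precision format directly:

```python
import struct

def fits_in_fp16(x: float) -> bool:
    """Return True if x can be packed as IEEE 754 half precision."""
    try:
        struct.pack("<e", x)  # "e" is the half-precision (float16) format
        return True
    except OverflowError:
        return False

print(fits_in_fp16(65504.0))  # True  - the largest finite float16 value
print(fits_in_fp16(70000.0))  # False - overflows to infinity in float16
```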
Who can help? / 谁可以帮助到您?
No response
Information / 问题信息
- The official example scripts / 官方的示例脚本
- My own modified scripts / 我自己修改的脚本和任务
Reproduction / 复现过程
I have submitted all the code; downloading the model and running the scripts reproduces the error.
Expected behavior / 期待表现
Normal text output, not "!!!!!!" or other garbage. The attached files are the scripts I used to download and run the model.