Description
System Info / 系統信息
The GLM-Z1-32B-0414 and GLM-Z1-Rumination-32B-0414 models output nothing but:
"!!!!!!!"
I have re-downloaded the models three times using the official download method, but the garbled output persists. On CPU the same models run fine and produce normal output.
The download code (the official code from the GLM-Z1-Rumination-32B-0414 files, plus cache_dir):

```python
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, cache_dir="./cache", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, cache_dir="./cache", device_map="auto")
```
The inference code:

```python
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3,4,5,6"
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_PATH = "/mnt/data/zhaoshukuo/try/GLM-Z1-32B-0414/cache/models--zai-org--GLM-Z1-32B-0414/snapshots/8eb2858992c1f749e2a6d4075455decc2484722d"
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype="float16",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",  # do not add llm_int8_enable_fp32_cpu_offload
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH, trust_remote_code=True,
    quantization_config=quantization_config, device_map="auto",
)
message = [{"role": "user", "content": "你好"}]
inputs = tokenizer.apply_chat_template(
    message, return_tensors="pt", add_generation_prompt=True, return_dict=True,
).to(model.device)
generate_kwargs = {
    "input_ids": inputs["input_ids"],
    "attention_mask": inputs["attention_mask"],
    "max_new_tokens": 200,
    "do_sample": False,
}
out = model.generate(**generate_kwargs)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
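A variant worth testing (my assumption, not confirmed from the logs): all-"!" output under greedy decoding is often a symptom of NaN/inf logits, since the argmax of an all-NaN row is index 0, which many vocabularies decode as "!". If the cause is float16 overflow in the 4-bit compute path, switching the compute dtype to bfloat16 (same exponent range as float32; requires an Ampere-class GPU or newer) might avoid it:

```python
import torch
from transformers import BitsAndBytesConfig

# Assumption: fp16 accumulation overflow produces the NaN logits.
# bfloat16 keeps float32's exponent range, so it cannot overflow where
# float16 (max finite value ~65504) would.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # instead of "float16"
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)
```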
The output is nothing but exclamation marks. On GPU the mean of the logits is 0.0184, lower than on CPU. Garbled output also occurs on GPU with 8-bit quantization and with the original unquantized model. Attached are the scripts I used to download and run the model:
tryload1.py
tryload.py
tryrag1-gpu-luanma1.py
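As a self-contained illustration of the suspected failure mode (an assumption, not something confirmed by the report): IEEE half precision tops out at 65504, so intermediate values that fit comfortably in float32 on CPU can overflow to infinity in float16 on GPU, after which downstream logits become inf/NaN. Python's `struct` module exposes the half-precision format directly:

```python
import struct

def fits_in_fp16(x: float) -> bool:
    """Return True if x can be packed as IEEE 754 half precision."""
    try:
        struct.pack("<e", x)  # "e" is the half-precision (float16) format
        return True
    except OverflowError:
        return False

print(fits_in_fp16(65504.0))  # True  - the largest finite float16 value
print(fits_in_fp16(70000.0))  # False - overflows to infinity in float16
```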
Who can help? / 谁可以帮助到您?
No response
Information / 问题信息
- The official example scripts / 官方的示例脚本
- My own modified scripts / 我自己修改的脚本和任务
Reproduction / 复现过程
I have submitted all the code; downloading the model and running the scripts reproduces the error.
Expected behavior / 期待表现
Normal text output, not "!!!!!!" or other garbage. The attached files are the scripts I used to download and run the model.