Open
Description
This code fails with an error:
import tkseem as tk
tokenizer_path = 'model.pl'
tokenizer = tk.SentencePieceTokenizer()
tokenizer.train(dataset_file)
# save the tokenizer to a file
tokenizer.save_model(tokenizer_path)
# load the tokenizer from a file
tokenizer = tk.SentencePieceTokenizer()
tokenizer.load_model(tokenizer_path)
# test the tokenizer
a = tokenizer.tokenize("السلام عليكم")
Error message is:
Traceback (most recent call last):
  File "/Users/user/Desktop/Projects/train-tokenizer.py", line 15, in <module>
    a = tokenizer.tokenize("السلام عليكم")
  File "/Users/user/.pyenv/versions/3.10.0/lib/python3.10/site-packages/tkseem/sentencepiece_tokenizer.py", line 50, in tokenize
    return self.sp.encode(text, out_type=str)
AttributeError: 'bool' object has no attribute 'encode'
The traceback shows that self.sp is a bool rather than a processor object by the time tokenize runs. The solution is to update "load_model" to:
def load_model(self, file_path):
    """Load a saved sp model

    Args:
        file_path (str): file path of the trained model
    """
    # Keep the processor object itself on self.sp, and close the file after reading
    with open(file_path, "rb") as f:
        self.sp = spm.SentencePieceProcessor(model_proto=f.read())
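As a rough sketch of how self.sp could end up as a bool, assume the loader stores the return value of a load call that reports success with True. The snippet below reproduces that pattern without tkseem or sentencepiece installed. FakeProcessor, BuggyTokenizer, FixedTokenizer and try_tokenize are hypothetical names made up for illustration, and this is not a copy of the library's actual code.

```python
class FakeProcessor:
    """Hypothetical stand-in for a SentencePiece-style processor (not the real API)."""

    def __init__(self):
        self.loaded = False

    def load(self, file_path):
        # Assumption: the loader reports success with a bool instead of returning self.
        self.loaded = True
        return True

    def encode(self, text, out_type=str):
        # Toy whitespace split, only to show that encode lives on the processor object.
        return text.split()


class BuggyTokenizer:
    def load_model(self, file_path):
        # Stores the bool returned by load(), not the processor itself.
        self.sp = FakeProcessor().load(file_path)

    def tokenize(self, text):
        return self.sp.encode(text, out_type=str)


class FixedTokenizer(BuggyTokenizer):
    def load_model(self, file_path):
        # Keep a reference to the processor, then load into it.
        self.sp = FakeProcessor()
        self.sp.load(file_path)


def try_tokenize(tokenizer, text):
    """Load a (pretend) model and return tokens, or the error text on failure."""
    tokenizer.load_model("model.pl")
    try:
        return tokenizer.tokenize(text)
    except AttributeError as exc:
        return str(exc)


buggy_result = try_tokenize(BuggyTokenizer(), "السلام عليكم")
fixed_result = try_tokenize(FixedTokenizer(), "السلام عليكم")
print(buggy_result)
print(fixed_result)
```

If the real load_model follows the buggy pattern, any fix that leaves the processor object on self.sp should work, including the constructor-based version proposed above.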