Model loading doesn't work for SentencePieceTokenizer

Saving a trained `SentencePieceTokenizer`, loading it back with `load_model`, and then calling `tokenize` fails with an error:

```python
import tkseem as tk

tokenizer_path = 'model.pl'
tokenizer = tk.SentencePieceTokenizer()
tokenizer.train(dataset_file)

# save the tokenizer to a file
tokenizer.save_model(tokenizer_path)

# load the tokenizer from a file
tokenizer = tk.SentencePieceTokenizer()
tokenizer.load_model(tokenizer_path)

# test the tokenizer
a = tokenizer.tokenize("السلام عليكم")
```

The error message is:
```
Traceback (most recent call last):
  File "/Users/user/Desktop/Projects/train-tokenizer.py", line 15, in <module>
    a = tokenizer.tokenize("السلام عليكم")
  File "/Users/user/.pyenv/versions/3.10.0/lib/python3.10/site-packages/tkseem/sentencepiece_tokenizer.py", line 50, in tokenize
    return self.sp.encode(text, out_type=str)
AttributeError: 'bool' object has no attribute 'encode'
```
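
The traceback is consistent with `self.sp` holding the boolean return value of a load call instead of the processor object itself. This is an assumption and has not been checked against the tkseem source. A minimal sketch of that failure pattern follows, using a hypothetical stand-in class (`FakeProcessor` is not part of tkseem or sentencepiece):

```python
class FakeProcessor:
    """Hypothetical stand-in for a model processor (not the sentencepiece API)."""

    def __init__(self):
        self.loaded = False

    def load(self, path):
        # Mimics a loader that reports success with a boolean
        self.loaded = True
        return True

    def encode(self, text):
        # Trivial whitespace split, only here to have something to call
        return text.split()


# Buggy pattern: keeps the boolean returned by load(), not the processor
buggy = FakeProcessor().load("model.pl")

# Working pattern: keep the processor and call load() on it separately
fixed = FakeProcessor()
fixed.load("model.pl")

try:
    buggy.encode("a b")
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

tokens = fixed.encode("a b")
```

If the real `load_model` follows the buggy pattern, any later call such as `self.sp.encode(...)` would raise the same `AttributeError` shown above.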

This can be fixed by updating `load_model` to:
```python
    def load_model(self, file_path):
        """Load a saved sp model

        Args:
            file_path (str): file path of the trained model
        """
        # Read the serialized model and close the file before building the processor
        with open(file_path, "rb") as f:
            self.sp = spm.SentencePieceProcessor(model_proto=f.read())
```
