Releases · segment-any-text/wtpsplit · GitHub

19 Nov 08:41

markus583

Release 2.1.7 Latest

Suppress noisy warnings from upstream dependencies in some Python versions
Add an option to not merge LoRA weights (merging remains the default for efficiency reasons)

Full Changelog: 2.1.6...2.1.7

23 Jun 03:49

markus583

Release 2.1.6

What's Changed

Improve postprocessing efficiency by @kevinhu in #157

New Contributors

@kevinhu made their first contribution in #157

Full Changelog: 2.1.5...2.1.6

01 Apr 13:35

markus583

Release 2.1.5

Changelog

Avoid an unnecessary len check on the tokenizer by testing with is None instead, leading to major speedups (#150)
Change the default onnxruntime install from CPU-only to a flexible install that supports both GPU and CPU (#152)
Allow using a pre-downloaded tokenizer so SaT can be used offline (#151)
Add checks when setting an ONNX model object (#149)
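
The speedup in #150 can be illustrated with a minimal sketch. The actual change is not shown in these notes, so `CountingTokenizer` and both helper functions below are hypothetical stand-ins. The point is only that a truthiness test on an object without `__bool__` falls back to `__len__`, whereas an `is None` test does not:

```python
class CountingTokenizer:
    """Hypothetical stand-in for a tokenizer whose __len__ is costly."""

    def __init__(self):
        self.len_calls = 0

    def __len__(self):
        # A real tokenizer may have to build or walk its vocabulary here.
        self.len_calls += 1
        return 250_000


def has_tokenizer_truthy(tokenizer):
    # Truthiness falls back to __len__ because __bool__ is not defined.
    return bool(tokenizer)


def has_tokenizer_identity(tokenizer):
    # An identity comparison never calls __len__.
    return tokenizer is not None


tok = CountingTokenizer()

has_tokenizer_truthy(tok)
calls_after_truthy = tok.len_calls      # __len__ was invoked once

has_tokenizer_identity(tok)
calls_after_identity = tok.len_calls    # unchanged by the identity check
```

If such a check runs once per batch or per call, the repeated `__len__` cost can add up, which is consistent with the speedup reported above.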

25 Jan 16:43

markus583

Release 2.1.4

Introduce optional hat weighting by @lsorber
Clarify LoRA adaptation
Rename treat_newline_as_space to split_on_input_newlines for clarity. treat_newline_as_space will be deprecated in a future release.

14 Dec 11:06

markus583

Release 2.1.2

Fixes #142: AssertionError when the string consists only of newlines or whitespace, or is an empty string.

27 Oct 14:19

markus583

Release 2.1.1

Change default behaviour for newlines in SaT.split.
- The model still ignores newlines, but they are now used to split the text in a simple post-processing step.
Small bugfixes for LoRA training
Update Readme for advanced usage
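
The newline post-processing described above can be sketched in plain Python. This is an illustration and not the library's code. The function name `split_on_newlines` is hypothetical, and keeping the newline character attached to the piece it ends is an assumption made here so that joining the output reproduces the input:

```python
def split_on_newlines(segments):
    """Split each model-produced segment further at newline characters."""
    result = []
    for segment in segments:
        parts = segment.split("\n")
        # Every part except the last was followed by a newline: re-attach it.
        for part in parts[:-1]:
            result.append(part + "\n")
        # Keep the trailing part only if it is non-empty.
        if parts[-1]:
            result.append(parts[-1])
    return result


# Segments as a sentence splitter that ignores newlines might return them.
model_segments = ["First sentence. ", "A heading\nSecond sentence."]
pieces = split_on_newlines(model_segments)
```

Here the second segment is split at the newline the model ignored, giving three pieces in total.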

24 Sep 21:37

markus583

Release 2.1.0

Adds ONNX support for SaT models.
- Including export scripts and an updated README.
- This improves inference time on GPU by 50%.

09 Sep 10:49

markus583

Release 2.0.8

Fix splitting of short sequences into individual characters (#127)

02 Sep 13:26

markus583

Release 2.0.7

Allow numpy>=2.0
Fix adaptation code
Add some comments

08 Jul 07:41

bminixhofer

Release 2.0.5

Fixes potential CUDA device error when the input has exactly 511 tokens (#121).
