fix: improve the reproducibility system#303
Conversation
There was a problem hiding this comment.
Code Review
This pull request refactors the reproducibility system to support multiple levels of detail and includes new settings for model commits and shard sizes. Key changes include the addition of a privacy-focused settings filter and updated logic for generating reproduction assets. Feedback identifies style guide violations regarding missing default configurations, unrelated whitespace changes, and missing type annotations, alongside a suggestion to use more idiomatic Pydantic practices for attribute exclusion.
| {accelerator_report} | ||
| - **Heretic:** v{version_info.version}{f" (Origin: {version_info.origin})" if version_info.origin else ""} | ||
| - **PyTorch:** {pytorch_version} | ||
| - **Other dependencies:** See [`requirements.txt`](requirements.txt). |
There was a problem hiding this comment.
isn't this line redundant as the 1st line just after it in contents section has requirements.txt info
There was a problem hiding this comment.
I noticed that too, but the fact is that the information legitimately belongs in both places. How would you structure it? The dependencies are part of the environment.
There was a problem hiding this comment.
maybe it's very obvious for anyone, so keeping it in any one of them is enough "See [requirements.txt]" is there not not does not matter.
There was a problem hiding this comment.
I agree that this looks a bit strange, but I tried removing both references and something is clearly missing then. I prefer a bit of redundancy over confusion.
|
I noticed in your conducted tests:
I suppose |
Previously was it not working? |
Correct. This is how the system expresses the "global" direction (
Yes, I noticed that. I guess not every CPU provides that information.
No? Did you intend to add that functionality? Because there was no relevant code at all yet. The |
|
Note to self: The dependencies are not sorted in |
maybe I haven't checked that time was it being used or not. Correct |
Not sure what you are talking about, but I do see |
Yes, but it's not sorted. |
|
I'm also planning for the final puzzle piece: This seems quite complicated to implement, because it will require Heretic to create a temporary venv, install the correct Heretic version and dependencies there, and then quit to hand processing over to that version of Heretic. |
if 1.3 will release after the reproducibility suite is automatically useable that'll take long,
We can have a |
Correct, that part won't be in the 1.3 release most likely. |
| exclude=True, | ||
| ) | ||
|
|
||
| max_shard_size: int | str = Field( |
There was a problem hiding this comment.
I don't know if This sharding logic is related to reproducibility
There was a problem hiding this comment.
The default size may change between Transformers versions (apparently, this has already happened at least once). Storing the size in settings increases the chance of reproducibility if the Transformers version doesn't match.
| > [!TIP] | ||
| > **This model is reproducible!** | ||
| > | ||
| > See the [README](reproduce/README.md) in the `reproduce` directory for more information. |
There was a problem hiding this comment.
| > See the [README](reproduce/README.md) in the `reproduce` directory for more information. | |
| > See the [README](reproduce/README.md) in the [reproduce](https://huggingface.co/{settings.model}/tree/main/reproduce) directory for more information. |
There was a problem hiding this comment.
I've thought about that but decided against it because if the model is forked, quantized etc., absolute paths will link to the wrong (parent) model. It's unfortunate that HF doesn't support linking directories properly, but hacks only create more problems.
There was a problem hiding this comment.
Also, your suggestion is linking to the wrong model anyway, namely to the base model, which doesn't have a reproduce directory.
|
Merged! Several remaining issues have been fixed and the exclusion logic for settings in particular is now much cleaner. I have re-tested everything before merging and found no more problems. |
yes i saw all 3 test repos |
|
I'm deleting all my test models now because many people follow my account and I don't want them to be quantized/benchmarked etc. |
Yeah I was thinking why you didn't deleted old tests, as they don't look good 👍 |
|
how was |
|
It wasn't "fixed", I changed nothing there. I believe some CPUs provide that value, while others don't. I was on a different server so I probably just got one with a CPU that has the |
* fix: various cleanups and improvements for the reproducibility system * fix: save only essential settings * fix: improve model commit handling * feat: make including system information optional * fix: improve formatting of reproducibility README * fix: fix remaining issues
This is a pretty major overhaul of #191, aiming to get it ready for the Heretic 1.3 release.
Changes include:
reproduce.jsonwith a view to the future.Examples:
@Vinay-Umrethe Requesting review!