Skip to content

Conversation

@luiz0992
Copy link

@luiz0992 luiz0992 commented Nov 9, 2025

What does this PR do?

This PR adds artifact logging support to the MLFlowLogger by implementing the upload_file() method from the LoggerDestination interface.

Key Changes:

  • Implements upload_file(): Adds the ability to log files and directories as MLflow artifacts
  • Smart file/directory detection: Automatically uses MlflowClient.log_artifact() for single files and MlflowClient.log_artifacts() for directories
  • Distributed training support: Properly respects the _enabled flag to ensure artifacts are only logged from rank-zero in distributed settings
  • Implements can_upload_files(): Returns True to advertise the artifact upload capability

Motivation:

Previously, MLFlowLogger could not upload arbitrary files as artifacts (e.g., checkpoints, model weights, configuration files). Using mlflow.log_artifact() directly in training code caused conflicts in distributed training scenarios with world_size > 1. This implementation provides a proper, safe way to log artifacts that works correctly in both single-node and distributed training.

Implementation Details:

The upload_file() method:

  • Accepts both file_path pointing to files or directories
  • Preserves directory structure when specified in remote_file_name
  • Validates that the MLflow run is initialized before logging
  • Returns early when _enabled=False (distributed training, non-rank-zero)

What issue(s) does this change relate to?

  • Related to artifact logging in distributed training environments
  • Addresses the gap where MLFlowLogger couldn't upload files through the standard LoggerDestination interface

Before submitting

  • Have you read the contributor guidelines?
  • Is this change a documentation change or typo fix? If so, skip the rest of this checklist.
  • Was this change discussed/approved in a GitHub issue first? It is much more likely to be merged if so.
  • Did you update any related docs and document your change?
  • Did you update any related tests and add any new tests related to your change? (see testing)
    • Added test_mlflow_upload_single_file() - tests single file upload
    • Added test_mlflow_upload_directory() - tests directory upload with multiple files
    • Added test_mlflow_upload_file_not_enabled() - tests distributed training behavior
    • Added test_mlflow_can_upload_files() - tests capability advertisement
  • Did you run the tests locally to make sure they pass?
  • Did you run pre-commit on your change? (see the pre-commit section of prerequisites)

Testing Notes:

All tests follow existing patterns in test_mlflow_logger.py:

  • Use pytest.importorskip('mlflow') for conditional test execution
  • Use tmp_path fixtures for isolated testing
  • Verify artifacts are correctly stored in MLflow directory structure
  • Test both success paths and edge cases (disabled logger)

@luiz0992 luiz0992 requested a review from a team as a code owner November 9, 2025 15:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant