Releases: bsc-pm/dlb
Version 3.6.0
Added
- Initial support for GPU TALP metrics (includes NVIDIA and AMD support)
- Added new, more robust support for MPI Fortran 2008 bindings
- New binary dlb_mpi to check affinity in MPI environments
- Flags --gpu-affinity and --uuid for dlb and dlb_mpi to show GPU visibility
- New instrumentation events for OMPT callbacks
- New API DLB_DROM_SetProcessMaskStr to set masks using a human-readable input
- New DLB Python bindings
Changed
- --talp-output-file now creates missing directories if able
- DLB_TALP_CollectPOPMetrics can be called now from non-MPI apps
Fixed
- Fixed TALP Global region not being started if no MPI or OpenMP
- Replace PAPI reset calls with regular reads to improve performance
- Add several PAPI init checks
- Fixed several LeWI features in async mode
Version 3.5.3
Added
- Added documentation for DLB_DROM_FLAGS_NONE argument
- Other minor documentation corrections
Changed
- Remove MPI Fortran 2008 bindings check at configure time; bindings remain
disabled pending new interception method
Fixed
- Fixed compilation error with GCC 15
- Fixed errors in the OpenMP thread manager during initialization
- Fixed some OpenMP thread manager logic for SMT systems
- Fixed bug in DLB_MonitoringRegionReset; region was not being removed from an internal list of open regions
- Stop instrumentation after MPI_Finalize to avoid unwanted interactions with external libraries
Version 3.5.2
Added
- Add dlb_mpi binary to display CPU affinity and MPI rank
- Add more verbose messages for OMPT events
- Add implicit-task-end OMPT event to add consistency to Cray OpenMP
Fixed
- Fixed several errors with CPU topology parsing
- Removed deprecated options and struct members in examples
Version 3.5.1
Added
- Add --disable-sphinx-doc, --disable-doxygen, and --disable-pandoc to configure script to disable the automatic detection of each tool
Fixed
- Fixed several bugs with CPU affinity masks detection and parsing
- POP metrics conditional printing improved for MPI/OpenMP detection
- Fix some compilation errors with clang-19, nvc, and nvfortran
- Improved documentation in user guide
- Several other minor fixes
TALP-Pages 3.5.1
- Bugfix: Enable the generation of OpenMP only scaling tables
- Bugfix: Disable warning for chained assignments in newer pandas versions
Version 3.5.0
Added
- Asynchronous support for classic LeWI
- Several SMT enhancements for LeWI policies
- Allow overriding LeWI classic/mask with --lewi-affinity
- TALP POP metrics now include experimental OpenMP hybrid metrics
- TALP global region is now exposed in the API
- TALP-Pages, a new tool for Continuous Performance Monitoring in static HTML pages
- Add flag --talp-region-select to filter active regions
- SLURM integration via dlb_taskset
- CMake config for other projects to link with DLB
- Several examples and documentation reworked
- DLB version information can be accessed through the API
Changed
- --talp-summary has been simplified and now pop-metrics also includes raw metrics if using an output file, and process metrics now include node identifiers
- TALP now only stores monitoring regions in shared memory if --talp-external-profiler is set
- TALP output structure has been reworked
- TALP main region is now called "Global"
Fixed
- LeWI mask now correctly supports threads blocked in MPI calls while pinned to multiple CPUs
- Add sanity checks for hardware counters in TALP
- Print JSON and CSV files in the proper locale
Deprecated
- --talp-summary values for pop-raw and node are deprecated
- TALP output format XML is now deprecated
- --talp-regions-per-proc flag is deprecated for a new experimental --shm-size-multiplier flag
- Several fields in dlb_monitor_t are now deprecated
- Several fields in dlb_pop_metrics_t are now deprecated
- DLB_MonitoringRegionGetMPIRegion deprecated in favor of DLB_MonitoringRegionGetGlobal
- DLB_Stats_GetCpuStateIdle functionality no longer provided
- DLB_Stats_GetCpuStateOwned functionality no longer provided
- DLB_Stats_GetCpuStateGuested functionality no longer provided
Version 3.4.1
Fixed
- Fix an error in the shared memory alignment that was causing segmentation faults when compiling with -march=native
- Avoid registering role shifting callbacks for other non-related OpenMP thread managers
- Update examples with supported options
- Fix some parameters in the Fortran'08 interface
- Be more resilient if PAPI fails to initialize
- Enhance compatibility in other systems
- Quote string names in csv files
- Several other minor fixes
Version 3.4
Added
- PAPI support for TALP metrics
- libdlb_mpic.so and libdlb_mpic_*.so are C MPI only libraries that may be built using --enable-c-mpi-library at configure time
- Functions to reset, stop, start and report monitoring regions now accept the special argument DLB_MPI_REGION for the implicit region
- Function DLB_TALP_QueryPOPNodeMetrics for third-party applications to query pop metrics. Requires --talp-external-profiler.
- Named barriers and several API functions to manage them
- Added --lewi-barrier and --lewi-barrier-select to fine-tune which barriers activate LeWI.
- Added --lewi-color to select specific key only for LeWI
Changed
- libdlb_mpif.so and libdlb_mpif_*.so are no longer built by default, only if --enable-fortran-mpi-library is set at configure time
- Flag --quiet now only suppresses INFO and VERBOSE, added new flag --silent to keep the old functionality to suppress all messages
- Refactor DLB_TALP_CollectNodeMetrics to DLB_TALP_CollectPOPNodeMetrics and add communication efficiency
- TALP now appends to CSV files if they already exist
Fixed
- Fixed wrong generated code for MPI_Initialized and MPI_Finalized
Deprecated
- --lewi-ompt no longer accepts "mpi" or "aggressive" as values. Automatic LeWI via synchronization calls is now done with --lewi-mpi-calls for MPI and --lewi-barrier or --lewi-barrier-select for DLB Barriers.
Version 3.3.1
Fixed
- Fixed wrong generated code for MPI_Initialized and MPI_Finalized
Version 3.3
Added
- Free agent and Role-shift OMPT thread managers to support LeWI with both implementations
- Flag --ompt-thread-manager to select which OpenMP implementation to use
- MPI Fortran 2008 bindings
- TALP flag --talp-output-file to generate output files in different formats
- New TALP collective functions to gather and compute metrics: DLB_TALP_CollectPOPMetrics and DLB_TALP_CollectNodeMetrics
Changed
- libdlb_mpi.so and libdlb_mpi_*.so now have both C and Fortran MPI symbols
Fixed
- Fixed DROM pre-initialization if child had empty cpuset affinity
- Fixed --lewi-max-parallelism
- Fixed several TALP bugs
- Fixed some finalization errors during MPI finalize
- Fixed cpuset parsing when provided a non-contiguous mask
Version 3.2
Added
- Flag --verbose to enable all verbose modes
- Flag --talp-summary=pop-raw to print raw POP metrics
- Flag --lewi-respect-cpuset to allow LeWI to use CPUs not yet registered
Changed
- DROM can now steal all CPUs from one process
- DROM can now inherit a subset of CPUs from another process
- DLB_DROM_SetProcessMask to oneself no longer requires a DLB_pollDROM
- DLB_Lend in OpenMP applications now invokes the OpenMP runtime to change the number of threads
Fixed
- Fixed TALP regions enabled or registered only on some processes
- Fixed minor option parsing