Releases: mercury-hpc/mercury

mercury 2.4.1rc4

01 Oct 22:29
v2.4.1rc4
Pre-release

Summary

This new version brings both bug fixes and feature updates to mercury.

New features

  • [NA]
    • Remove NA_DEFAULT_PLUGIN_PATH and use NA_PLUGIN_RELATIVE_PATH instead
      • Use relative path for NA plugin search
      • Calculate relative path at build time and use it at runtime to find the plugin directory
  • [NA OFI]
    • Fix compatibility with libfabric 2.0
    • Pass down NA flags for firewall support in prov/tcp
      • Indicate if client bulk address is behind firewall by using address deserialization callback functions
  • [HG/NA perf]
    • Add -N option to keep perf server up after client exits
    • Remove barrier by default in perf loop and add optional --barrier option to re-enable it
      • Add min/max measurements when barrier is not used
    • Print only first and last targets when reading config
    • Re-organize and clean up printed fields
    • Add -K option to increment key based on rank (used for testing)
  • [HG Util]
    • Add fatal and info log levels
    • This replaces the previous fatal log subsys; the default log level is now fatal
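
The relative plugin path lookup listed under [NA] above could be sketched roughly as follows. This is only an illustration under stated assumptions, not mercury's actual code: the macro `DEMO_PLUGIN_RELATIVE_PATH`, its value, and the helper `demo_plugin_dir()` are hypothetical. It assumes the build system injects the relative path as a compile-time definition and that the runtime already knows the full path of the loaded library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical value; a build system would compute and inject this. */
#ifndef DEMO_PLUGIN_RELATIVE_PATH
#    define DEMO_PLUGIN_RELATIVE_PATH "mercury/plugins"
#endif

/* Build "<directory of lib_path>/<DEMO_PLUGIN_RELATIVE_PATH>" into out.
 * Returns 0 on success, -1 if lib_path has no directory component or
 * out is too small to hold the result. */
static int
demo_plugin_dir(const char *lib_path, char *out, size_t out_size)
{
    const char *slash;
    int n;

    assert(lib_path != NULL && out != NULL);

    /* Locate the last path separator to isolate the library directory. */
    slash = strrchr(lib_path, '/');
    if (slash == NULL)
        return -1;

    /* Join the library directory and the build-time relative path. */
    n = snprintf(out, out_size, "%.*s/%s", (int) (slash - lib_path),
        lib_path, DEMO_PLUGIN_RELATIVE_PATH);

    return (n < 0 || (size_t) n >= out_size) ? -1 : 0;
}
```

Because the plugin directory is derived from the library location rather than from an absolute default, an install tree resolved this way can be relocated without rebuilding.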

Bug fixes

  • [HG]
    • Ensure that one-way RPCs can overflow
      • Use existing ack notifications to ensure send buffer remains available
    • Fix handling of multi-recv operations returning NULL buffers and repost multi-recv buffer if released
    • Fix possible erroneous refcount when bulk create/transfer fails
    • Enable diagnostic counters outside of debug builds
    • Enable HG proc overflow when using XDR
      • Fix hg_proc_save_ptr() error handling and allocation with XDR
      • Multiple proc fixes for XDR encoding
  • [HG Core]
    • Check for mismatching builds when using checksums
  • [HG Core/Bulk]
    • Print destination address string in error messages
  • [NA]
    • Fix plugin scan to continue if one plugin cannot load
    • Add na_context parameter to context_create plugin callback
  • [NA OFI]
    • Check against FI_REMOTE_CQ_DATA before accessing cq_event->data
    • Fix case of FI_MULTI_RECV event returned without buffer
    • Fix completion of multi-recv cancelation with prov/cxi
      • Only complete in error path when FI_MULTI_RECV is set
      • Multi-recv operations may still be used even after an error has occurred
    • Improve logging of canceled events
    • Add missing op type to op completed error log
    • Fix compile error on older prov/cxi platforms
    • Attempt to use ip_subnet with FI_SOCKADDR_IN format
    • Refactor msg_send/msg_recv calls and add debug info
    • Fix compilation under FreeBSD
    • Disable RNR protocol by default when using prov/cxi
    • Prevent the use of FI_AV_AUTH_KEY with prov/cxi when number of auth keys is 1
  • [NA UCX]
    • Use ucp_worker_query() instead of deprecated ucp_worker_get_address()
    • Switch to using ucp_ep_close_nbx()
    • Rework address EP close to be async and check on address close list during progress
    • Ensure address is resolved on RMA
    • Queue up pending connection if address exists and reject connection after timeout if no progress is made
  • [NA BMI]
    • Do not BMI_initialize() servers with address 0.0.0.0 and detect address to use
  • [HG/NA Perf]
    • Fix potential race when re-using exp op ID
    • Add spin_flag to prevent excessive sleeping
      • Reduce overhead of hg_poll_wait()
  • [HG util]
    • Fix global buffer overflow in hg_log_outlet_active and hg_log_get_subsys(void)
    • Fix error return of hg_mem_pool_extend()
    • Fix kqueue implementation
    • Ensure parent log is registered first
      • Fix rare case where log was not being printed even if environment variables were set
  • [CMake]
    • Fix tirpc to be an external dependency

⚠️ Known Issues

  • [NA OFI]
    • [tcp/verbs;ofi_rxm] Using more than 256 peers requires FI_UNIVERSE_SIZE to be set.
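
One way to satisfy this requirement from inside an application is to set the variable before the library is initialized. The sketch below assumes that libfabric reads FI_UNIVERSE_SIZE from the environment during its own initialization, so the call has to happen before any mercury/NA class is created. The helper name `demo_ensure_universe_size()` is hypothetical and the peer count is up to the caller.

```c
#define _POSIX_C_SOURCE 200112L /* for setenv() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: export FI_UNIVERSE_SIZE unless the user already
 * set it. Returns 0 on success, -1 on failure (as setenv() does). */
static int
demo_ensure_universe_size(unsigned int expected_peers)
{
    char value[16];

    assert(expected_peers > 0);
    snprintf(value, sizeof(value), "%u", expected_peers);

    /* overwrite = 0 keeps any value already exported by the user */
    return setenv("FI_UNIVERSE_SIZE", value, 0);
}
```

Calling `demo_ensure_universe_size(1024)` early in `main()` leaves an existing user setting untouched and only fills in a value when none is present.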

mercury 2.4.1rc3

26 Sep 20:25
v2.4.1rc3
Pre-release

Summary

This new version brings both bug fixes and feature updates to mercury.

New features

  • [NA]
    • Remove NA_DEFAULT_PLUGIN_PATH and use NA_PLUGIN_RELATIVE_PATH instead
      • Use relative path for NA plugin search
      • Calculate relative path at build time and use it at runtime to find the plugin directory
  • [NA OFI]
    • Fix compatibility with libfabric 2.0
    • Pass down NA flags for firewall support in prov/tcp
      • Indicate if client bulk address is behind firewall by using address deserialization callback functions
  • [HG/NA perf]
    • Add -N option to keep perf server up after client exits
    • Remove barrier by default in perf loop and add optional --barrier option to re-enable it
      • Add min/max measurements when barrier is not used
    • Print only first and last targets when reading config
    • Re-organize and clean up printed fields
    • Add -K option to increment key based on rank (used for testing)
  • [HG Util]
    • Add fatal and info log levels
    • This replaces the previous fatal log subsys; the default log level is now fatal

Bug fixes

  • [HG]
    • Ensure that one-way RPCs can overflow
      • Use existing ack notifications to ensure send buffer remains available
    • Fix handling of multi-recv operations returning NULL buffers and repost multi-recv buffer if released
    • Fix possible erroneous refcount when bulk create/transfer fails
    • Enable diagnostic counters outside of debug builds
    • Enable HG proc overflow when using XDR
      • Fix hg_proc_save_ptr() error handling and allocation with XDR
      • Multiple proc fixes for XDR encoding
  • [HG Core]
    • Check for mismatching builds when using checksums
  • [HG Core/Bulk]
    • Print destination address string in error messages
  • [NA]
    • Fix plugin scan to continue if one plugin cannot load
    • Add na_context parameter to context_create plugin callback
  • [NA OFI]
    • Check against FI_REMOTE_CQ_DATA before accessing cq_event->data
    • Fix case of FI_MULTI_RECV event returned without buffer
    • Fix completion of multi-recv cancelation with prov/cxi
      • Only complete in error path when FI_MULTI_RECV is set
      • Multi-recv operations may still be used even after an error has occurred
    • Improve logging of canceled events
    • Add missing op type to op completed error log
    • Fix compile error on older prov/cxi platforms
    • Attempt to use ip_subnet with FI_SOCKADDR_IN format
    • Refactor msg_send/msg_recv calls and add debug info
    • Fix compilation under FreeBSD
  • [NA UCX]
    • Use ucp_worker_query() instead of deprecated ucp_worker_get_address()
    • Switch to using ucp_ep_close_nbx()
    • Rework address EP close to be async and check on address close list during progress
    • Ensure address is resolved on RMA
    • Queue up pending connection if address exists and reject connection after timeout if no progress is made
  • [NA BMI]
    • Do not BMI_initialize() servers with address 0.0.0.0 and detect address to use
  • [HG/NA Perf]
    • Fix potential race when re-using exp op ID
    • Add spin_flag to prevent excessive sleeping
      • Reduce overhead of hg_poll_wait()
  • [HG util]
    • Fix global buffer overflow in hg_log_outlet_active and hg_log_get_subsys(void)
    • Fix error return of hg_mem_pool_extend()
    • Fix kqueue implementation
    • Ensure parent log is registered first
      • Fix rare case where log was not being printed even if environment variables were set
  • [CMake]
    • Fix tirpc to be an external dependency

⚠️ Known Issues

  • [NA OFI]
    • [tcp/verbs;ofi_rxm] Using more than 256 peers requires FI_UNIVERSE_SIZE to be set.

mercury 2.4.1rc2

25 Sep 19:10
v2.4.1rc2
Pre-release

Summary

This new version brings both bug fixes and feature updates to mercury.

New features

  • [NA]
    • Remove NA_DEFAULT_PLUGIN_PATH and use NA_PLUGIN_RELATIVE_PATH instead
      • Use relative path for NA plugin search
      • Calculate relative path at build time and use it at runtime to find the plugin directory
  • [NA OFI]
    • Fix compatibility with libfabric 2.0
    • Pass down NA flags for firewall support in prov/tcp
      • Indicate if client bulk address is behind firewall by using address deserialization callback functions
  • [HG/NA perf]
    • Add -N option to keep perf server up after client exits
    • Remove barrier by default in perf loop and add optional --barrier option to re-enable it
      • Add min/max measurements when barrier is not used
    • Print only first and last targets when reading config
    • Re-organize and clean up printed fields
    • Add -K option to increment key based on rank (used for testing)
  • [HG Util]
    • Add fatal and info log levels
    • This replaces the previous fatal log subsys; the default log level is now fatal

Bug fixes

  • [HG]
    • Ensure that one-way RPCs can overflow
      • Use existing ack notifications to ensure send buffer remains available
    • Fix handling of multi-recv operations returning NULL buffers and repost multi-recv buffer if released
    • Fix possible erroneous refcount when bulk create/transfer fails
    • Enable diagnostic counters outside of debug builds
    • Enable HG proc overflow when using XDR
      • Fix hg_proc_save_ptr() error handling and allocation with XDR
      • Multiple proc fixes for XDR encoding
  • [HG Core]
    • Check for mismatching builds when using checksums
  • [HG Core/Bulk]
    • Print destination address string in error messages
  • [NA]
    • Fix plugin scan to continue if one plugin cannot load
    • Add na_context parameter to context_create plugin callback
  • [NA OFI]
    • Check against FI_REMOTE_CQ_DATA before accessing cq_event->data
    • Fix case of FI_MULTI_RECV event returned without buffer
    • Fix completion of multi-recv cancelation with prov/cxi
      • Only complete in error path when FI_MULTI_RECV is set
      • Multi-recv operations may still be used even after an error has occurred
    • Improve logging of canceled events
    • Add missing op type to op completed error log
    • Fix compile error on older prov/cxi platforms
    • Attempt to use ip_subnet with FI_SOCKADDR_IN format
    • Refactor msg_send/msg_recv calls and add debug info
    • Fix compilation under FreeBSD
  • [NA UCX]
    • Use ucp_worker_query() instead of deprecated ucp_worker_get_address()
    • Switch to using ucp_ep_close_nbx()
    • Rework address EP close to be async and check on address close list during progress
    • Ensure address is resolved on RMA
    • Queue up pending connection if address exists and reject connection after timeout if no progress is made
  • [NA BMI]
    • Do not BMI_initialize() servers with address 0.0.0.0 and detect address to use
  • [HG/NA Perf]
    • Fix potential race when re-using exp op ID
    • Add spin_flag to prevent excessive sleeping
      • Reduce overhead of hg_poll_wait()
  • [HG util]
    • Fix global buffer overflow in hg_log_outlet_active and hg_log_get_subsys(void)
    • Fix error return of hg_mem_pool_extend()
    • Fix kqueue implementation
  • [CMake]
    • Fix tirpc to be an external dependency

⚠️ Known Issues

  • [NA OFI]
    • [tcp/verbs;ofi_rxm] Using more than 256 peers requires FI_UNIVERSE_SIZE to be set.

mercury 2.4.1rc1

10 Jul 22:16
v2.4.1rc1
Pre-release

Summary

This new version brings both bug fixes and feature updates to mercury.

New features

  • [NA]
    • Remove NA_DEFAULT_PLUGIN_PATH and use NA_PLUGIN_RELATIVE_PATH instead
      • Use relative path for NA plugin search
      • Calculate relative path at build time and use it at runtime to find the plugin directory
  • [NA OFI]
    • Fix compatibility with libfabric 2.0
    • Pass down NA flags for firewall support in prov/tcp
      • Indicate if client bulk address is behind firewall by using address deserialization callback functions
  • [HG/NA perf]
    • Add -N option to keep perf server up after client exits
    • Remove barrier by default in perf loop and add optional --barrier option to re-enable it
      • Add min/max measurements when barrier is not used
    • Print only first and last targets when reading config
    • Re-organize and clean up printed fields
  • [HG Util]
    • Add fatal and info log levels
    • This replaces the previous fatal log subsys; the default log level is now fatal

Bug fixes

  • [HG]
    • Ensure that one-way RPCs can overflow
      • Use existing ack notifications to ensure send buffer remains available
    • Fix handling of multi-recv operations returning NULL buffers and repost multi-recv buffer if released
    • Fix possible erroneous refcount when bulk create/transfer fails
    • Enable diagnostic counters outside of debug builds
    • Enable HG proc overflow when using XDR
      • Fix hg_proc_save_ptr() error handling and allocation with XDR
      • Multiple proc fixes for XDR encoding
  • [NA]
    • Fix plugin scan to continue if one plugin cannot load
  • [NA OFI]
    • Check against FI_REMOTE_CQ_DATA before accessing cq_event->data
    • Fix case of FI_MULTI_RECV event returned without buffer
    • Fix completion of multi-recv cancelation with prov/cxi
      • Only complete in error path when FI_MULTI_RECV is set
      • Multi-recv operations may still be used even after an error has occurred
    • Improve logging of canceled events
    • Add missing op type to op completed error log
    • Fix compile error on older prov/cxi platforms
    • Attempt to use ip_subnet with FI_SOCKADDR_IN format
  • [NA BMI]
    • Do not BMI_initialize() servers with address 0.0.0.0 and detect address to use
  • [HG/NA Perf]
    • Fix potential race when re-using exp op ID
    • Add spin_flag to prevent excessive sleeping
      • Reduce overhead of hg_poll_wait()
  • [HG util]
    • Fix global buffer overflow in hg_log_outlet_active
    • Fix error return of hg_mem_pool_extend()
  • [CMake]
    • Fix tirpc to be an external dependency

⚠️ Known Issues

  • [NA OFI]
    • [tcp/verbs;ofi_rxm] Using more than 256 peers requires FI_UNIVERSE_SIZE to be set.

mercury 2.4.0

28 Oct 16:52
v2.4.0

Summary

This new version brings both bug fixes and feature updates to mercury. Notable additions are a new progress mechanism, new initialization parameters for handling multi-recv buffers, and support for cxi with HPE SHS 11.0.

New features

  • [HG]
    • Add HG_Get_input_payload_size()/HG_Get_output_payload_size()
      • Add the ability to query input / output payload sizes
    • Add HG_Diag_dump_counters() to dump diagnostic counters
      • Add rpc_req_recv_active_count and rpc_multi_recv_copy_count counters
    • Add HG_Class_get_counters() to retrieve internal counters
    • Add multi_recv_copy_threshold init parameter
      • Use this new parameter to fall back to memcpy to prevent starvation of multi-recv buffers
    • Add multi_recv_op_max init parameter
      • This allows users to control number of multi-recv buffers posted (libfabric plugin only)
    • Add no_overflow init option to prevent use of overflow buffers
    • Improve multi-recv buffer warning messages
    • Associate handle to HG proc
      • hg_proc_get_handle() can be used to retrieve handle within proc functions
    • Add HG_Event_get_wait_fd() to retrieve internal wait object
    • Add HG_Event_ready() / HG_Event_progress() / HG_Event_trigger() to support wait fd progress model
      • Simplify progress mechanism and remove use of internal timers
      • Always make NA progress when HG_Event_progress() is called
      • Update HG progress to use new NA progress routines
    • Add missing HG_WARN_UNUSED_RESULT to HG calls
    • Switch to using standard types and align with NA
      • Keep some uint8_t instances instead of hg_bool_t for ABI compatibility
    • Add HG_IO_ERROR return code
  • [NA]
    • Bump NA version to v5.0.0
    • Add NA_Poll() and NA_Poll_wait() routines
    • Deprecate NA_Progress() in favor of poll routines
    • Add NA_Context_get_completion_count() to retrieve size of completion queue
    • Update plugins to use new poll and poll_wait callbacks
      • poll_wait plugin callback remains for compatibility
    • Fix documentation of NA_Poll_get_fd()
    • Add missing NA_WARN_UNUSED_RESULT qualifiers
    • Remove deprecated CCI plugin
    • Return last known error when plugin loading fails
    • Add init info version compatibility wrappers
    • Add support for traffic_class init info (only supported by ofi plugin)
    • Add NA_IO_ERROR return code for generic I/O errors
      • Update OFI and UCX plugins to use new code
  • [NA OFI]
    • Support use of cxi provider with SHS 11.0
    • Add support for FI_AV_AUTH_KEY (requires libfabric >= 1.20)
      • Add runtime check for cxi provider version
      • Setting multiple auth keys disables FI_DIRECTED_RECV
      • Separate opening of AV and auth key insertion
      • Parse auth key range when FI_AV_AUTH_KEY is available
      • Encode/decode auth key when serializing addrs
    • Add support for FI_AV_USER_ID
    • Always use FI_SOURCE and FI_SOURCE_ERR when both are supported
      • Clean up handling of FI_SOURCE_ERR
      • Remove support of FI_SOURCE w/o FI_SOURCE_ERR
    • Add support for new CXI address format
    • Attempt to distribute multi-NIC domains based on selected CPU ID
    • Support selection of traffic classes (single class per NA class)
    • Add support for FI_PROTO_CXI_RNR
    • Add NA_OFI_SKIP_DOMAIN_OPS env variable to skip cxi domain ops
    • Remove unused NA_OFI_DOM_SHARED flag
  • [NA UCX]
    • Add ucx log outlet and redirect UCX log
      • Use default HG log level if UCX_LOG_LEVEL is not set
  • [HG/NA perf]
    • Add hg_first perf test to measure cost of initial RPC
    • Add -u option to control number of multi-recv ops (server only)
    • Add -i option to control number of handles posted (server only)
    • Add -f/--hostfile option to select hostfile to write to / read from
    • Add -T/--tclass option to select traffic class
    • Autodetect MPI implementation in perf utilities
      • MPI can now be autodetected and dynamically loaded in utilities, even if MERCURY_TESTING_ENABLE_PARALLEL was turned off. If MERCURY_TESTING_ENABLE_PARALLEL is turned on, tests remain manually linked against MPI as they used to be.
    • Print registration and deregistration times when -R option is used
    • Update to use new HG/NA progress routines and remove use of hg_request
    • Support forced registration in hg_bw_read/hg_bw_write
  • [HG Util]
    • Add hg_log_vwrite() to write log from va_list
    • Add hg_log_level_to_string()
    • Clean up mercury_event code and add const qualifier to hg_poll_get_fd()
    • Add const on atomic gets
    • Switch to using sys/queue.h directly
    • Remove HG_QUEUE and HG_LIST definitions
    • Add hg_dl_error() to return last error
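
The wait fd progress model listed under [HG] above can be illustrated with a small self-contained sketch. It does not call mercury: every `demo_*` name is a hypothetical stand-in, a pipe plays the role of the internal wait object, and the real HG_Event_*() calls may differ in signature and semantics. The shape of the loop is the point: check whether work is already ready, block on the file descriptor only when it is not, then progress and trigger.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-in for a context that exposes a wait fd. */
struct demo_ctx {
    int wait_fd;            /* read end of the pipe, passed to poll() */
    int notify_fd;          /* write end, signaled when an event arrives */
    unsigned int queued;    /* completions ready to be triggered */
    unsigned int completed; /* callbacks executed so far */
};

static int
demo_ctx_init(struct demo_ctx *ctx)
{
    int fds[2];

    if (pipe(fds) != 0)
        return -1;
    ctx->wait_fd = fds[0];
    ctx->notify_fd = fds[1];
    ctx->queued = 0;
    ctx->completed = 0;
    return 0;
}

static void
demo_ctx_destroy(struct demo_ctx *ctx)
{
    close(ctx->wait_fd);
    close(ctx->notify_fd);
}

/* Simulate a network event: one byte written makes the wait fd readable. */
static int
demo_post_event(struct demo_ctx *ctx)
{
    char byte = 1;

    return (write(ctx->notify_fd, &byte, 1) == 1) ? 0 : -1;
}

/* "Progress": drain the wait fd and queue one completion per byte read.
 * Call only after poll() reported the fd readable; the pipe is blocking. */
static int
demo_progress(struct demo_ctx *ctx)
{
    char buf[16];
    ssize_t n = read(ctx->wait_fd, buf, sizeof(buf));

    if (n < 0)
        return -1;
    ctx->queued += (unsigned int) n;
    return 0;
}

/* "Trigger": run up to max_count queued callbacks, return how many ran. */
static unsigned int
demo_trigger(struct demo_ctx *ctx, unsigned int max_count)
{
    unsigned int n = (ctx->queued < max_count) ? ctx->queued : max_count;

    ctx->queued -= n;
    ctx->completed += n; /* a real callback would run here */
    return n;
}

/* One loop iteration: block on the fd only if nothing is ready yet.
 * Returns the number of callbacks run, 0 on timeout, -1 on error. */
static int
demo_wait_once(struct demo_ctx *ctx, int timeout_ms)
{
    assert(ctx != NULL);
    if (ctx->queued == 0) { /* "ready" check */
        struct pollfd pfd = {.fd = ctx->wait_fd, .events = POLLIN};
        int rc = poll(&pfd, 1, timeout_ms);

        if (rc < 0)
            return -1;
        if (rc == 0)
            return 0;
        if (demo_progress(ctx) != 0)
            return -1;
    }
    return (int) demo_trigger(ctx, 16);
}
```

Exposing a file descriptor lets an application fold progress into its own poll()/epoll loop alongside other descriptors instead of dedicating a thread to a blocking progress call.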

Bug fixes

  • [HG]
    • Fix shared-memory path that was previously disabled in conjunction with libfabric transports that use the multi-recv capability
    • Fix behavior of request_post_incr init parameter
      • request_post_incr cannot be disabled (set to -1) with multi-recv
  • [HG/NA]
    • HG NA init info is fixed to v4.0 for now and duplicates tclass info
  • [NA]
    • Fix missing free of dynamic plugin entries
  • [NA BMI/MPI]
    • Return actual msg size through cb info
  • [NA OFI]
    • Fix cxi domain ops settings and disable PROV_KEY_CACHE
    • Fix shm provider flags
    • Remove excessive MR count warning message
  • [NA UCX]
    • Fix hg_info not filtering protocol
      • Allow na_ucx_get_protocol_info() to resolve ucx tl name aliases
    • Fix context thread mode to default to UCS_THREAD_MODE_MULTI
  • [HG/NA Perf]
    • Ensure NA perf tests wait on send completion
    • Fix bulk permission flag in hg_bw_read
    • Add some missing error checks in mercury_perf
  • [HG util]
    • Multiple logging fixes:
      • Fix dlog_free not called when parent/child have separate dlogs
      • Fix mercury log to correctly generate outlet names
      • Fix log outlets to use prefixed subsys name
      • Fix use of macros in debug log
      • Use destructor to free log outlets
    • Add missing prototype to hg_atomic_fence() definition
  • [CMake]
    • Fix cmake_minimum_required() warning
    • Update kwsys and mchecksum dependencies

⚠️ Known Issues

  • [NA OFI]
    • [tcp/verbs;ofi_rxm] Using more than 256 peers requires FI_UNIVERSE_SIZE to be set.

mercury 2.4.0rc5

26 Aug 22:01
v2.4.0rc5
Pre-release

Summary

This is a preview release of the v2.4.0 release.

New features

Added in rc5

  • [HG]
    • Add HG_Get_input_payload_size()/HG_Get_output_payload_size()
      • Add the ability to query input / output payload sizes
    • Add HG_Diag_dump_counters() to dump diagnostic counters
      • Add rpc_req_recv_active_count and rpc_multi_recv_copy_count counters
    • Add HG_Class_get_counters() to retrieve internal counters

Added in rc4

  • [HG]
    • Add multi_recv_copy_threshold init parameter
      • Use this new parameter to fall back to memcpy to prevent starvation of multi-recv buffers
    • Associate handle to HG proc
      • hg_proc_get_handle() can be used to retrieve handle within proc functions

Added in rc3

  • [HG]
    • Add multi_recv_op_max init parameter
      • This allows users to control number of multi-recv buffers posted (libfabric plugin only)
    • Add no_overflow init option to prevent use of overflow buffers
    • Improve multi-recv buffer warning messages
    • Add HG_Event_get_wait_fd() to retrieve internal wait object
    • Add HG_Event_ready() / HG_Event_progress() / HG_Event_trigger() to support wait fd progress model
      • Simplify progress mechanism and remove use of internal timers
      • Always make NA progress when HG_Event_progress() is called
      • Update HG progress to use new NA progress routines
    • Add missing HG_WARN_UNUSED_RESULT to HG calls
    • Switch to using standard types and align with NA
      • Keep some uint8_t instances instead of hg_bool_t for ABI compatibility
  • [NA]
    • Add NA_Poll() and NA_Poll_wait() routines
    • Deprecate NA_Progress() in favor of poll routines
    • Add NA_Context_get_completion_count() to retrieve size of completion queue
    • Update plugins to use new poll and poll_wait callbacks
      • poll_wait plugin callback remains for compatibility
    • Fix documentation of NA_Poll_get_fd()
    • Add missing NA_WARN_UNUSED_RESULT qualifiers
    • Bump NA version to 5.0.0
    • Remove deprecated CCI plugin
    • Return last known error when plugin loading fails
  • [NA OFI]
    • Remove unused NA_OFI_DOM_SHARED flag
    • Always use FI_SOURCE and FI_SOURCE_ERR when both are supported
  • [NA UCX]
    • Add ucx log outlet and redirect UCX log
      • Use default HG log level if UCX_LOG_LEVEL is not set
  • [HG Util]
    • Add hg_log_vwrite() to write log from va_list
    • Add hg_log_level_to_string()
    • Clean up mercury_event code and add const qualifier to hg_poll_get_fd()
    • Add const on atomic gets
    • Switch to using sys/queue.h directly
    • Remove HG_QUEUE and HG_LIST definitions
    • Add hg_dl_error() to return last error
  • [HG/NA Perf Test]
    • Add -u option to control number of multi-recv ops (server only)
    • Add -i option to control number of handles posted (server only)
    • Update to use new HG/NA progress routines and remove use of hg_request

Added in rc2

  • [NA OFI]
    • Add support for FI_AV_AUTH_KEY (requires libfabric >= 1.20)
      • Add runtime check for cxi provider version
      • Setting multiple auth keys disables FI_DIRECTED_RECV
      • Separate opening of AV and auth key insertion
      • Parse auth key range when FI_AV_AUTH_KEY is available
      • Encode/decode auth key when serializing addrs
    • Add support for FI_AV_USER_ID
    • Clean up handling of FI_SOURCE_ERR
    • Remove support of FI_SOURCE w/o FI_SOURCE_ERR
    • Add support for new CXI address format

Added in rc1

  • [NA]
    • Add init info version compatibility wrappers
    • Bump NA version to v4.1.0
    • Add support for traffic_class init info (only supported by ofi plugin)
  • [NA OFI]
    • Attempt to distribute multi-NIC domains based on selected CPU ID
    • Support selection of traffic classes (single class per NA class)
  • [HG/NA Perf Test]
    • Add -f/--hostfile option to select hostfile to write to / read from
    • Add -T/--tclass option to select traffic class
    • Autodetect MPI implementation in perf utilities
      • MPI can now be autodetected and dynamically loaded in utilities, even if MERCURY_TESTING_ENABLE_PARALLEL was turned off. If MERCURY_TESTING_ENABLE_PARALLEL is turned on, tests remain manually linked against MPI as they used to be.

Bug fixes

Added in rc5

  • [HG]
    • Make HG_Core_event_ready() non-inline to fix NA dependency and remove HG_Core_event_ready_loopback() from public API
    • Fix NA init info not correctly set from HG
  • [NA BMI/MPI]
    • Return actual msg size through cb info

Added in rc4

  • [HG]
    • Fix a couple of type changes introduced in rc1 that could have broken ABI
    • Fix shared-memory path that was previously disabled in conjunction with libfabric transports that use the multi-recv capability
  • [HG util]
    • Fix dlog_free not called when parent/child have separate dlogs
  • [HG/NA]
    • Fix init info changes made in previous rcs to prevent ABI breakage
    • HG NA init info is fixed to v4.0 for now and duplicates tclass info

Added in rc3

  • [HG]
    • Fix behavior of request_post_incr init parameter
      • request_post_incr cannot be disabled (set to -1) with multi-recv
  • [HG Util]
    • Fix mercury log to correctly generate outlet names
    • Fix log outlets to use prefixed subsys name
    • Fix use of macros in debug log
  • [CMake]
    • Fix cmake_minimum_required() warning
    • Update kwsys and mchecksum dependencies

Added in rc2

  • [HG Util]
    • Use destructor to free log outlets
  • [NA]
    • Fix missing free of dynamic plugin entries
  • [NA UCX]
    • Fix hg_info not filtering protocol
      • Allow na_ucx_get_protocol_info() to resolve ucx tl name aliases
  • [NA OFI]
    • Fix shm provider flags
  • [NA Test]
    • Remove "could not find MPI" message

Added in rc1

  • [HG Util]
    • Add missing prototype to hg_atomic_fence() definition
  • [NA OFI]
    • Remove excessive MR count warning message
  • [NA Perf]
    • Ensure perf tests wait on send completion

⚠️ Known Issues

  • [NA OFI]
    • [tcp/verbs;ofi_rxm] Using more than 256 peers requires FI_UNIVERSE_SIZE to be set.

mercury 2.4.0rc4

02 Aug 22:27
v2.4.0rc4
Pre-release

Summary

This is a preview release of the v2.4.0 release.

New features

Added in rc4

  • [HG]
    • Add multi_recv_copy_threshold init parameter
      • Use this new parameter to fall back to memcpy to prevent starvation of multi-recv buffers
    • Associate handle to HG proc
      • hg_proc_get_handle() can be used to retrieve handle within proc functions

Added in rc3

  • [HG]
    • Add multi_recv_op_max init parameter
      • This allows users to control number of multi-recv buffers posted (libfabric plugin only)
    • Add no_overflow init option to prevent use of overflow buffers
    • Improve multi-recv buffer warning messages
    • Add HG_Event_get_wait_fd() to retrieve internal wait object
    • Add HG_Event_ready() / HG_Event_progress() / HG_Event_trigger() to support wait fd progress model
      • Simplify progress mechanism and remove use of internal timers
      • Always make NA progress when HG_Event_progress() is called
      • Update HG progress to use new NA progress routines
    • Add missing HG_WARN_UNUSED_RESULT to HG calls
    • Switch to using standard types and align with NA
      • Keep some uint8_t instances instead of hg_bool_t for ABI compatibility
  • [NA]
    • Add NA_Poll() and NA_Poll_wait() routines
    • Deprecate NA_Progress() in favor of poll routines
    • Add NA_Context_get_completion_count() to retrieve size of completion queue
    • Update plugins to use new poll and poll_wait callbacks
      • poll_wait plugin callback remains for compatibility
    • Fix documentation of NA_Poll_get_fd()
    • Add missing NA_WARN_UNUSED_RESULT qualifiers
    • Bump NA version to 5.0.0
    • Remove deprecated CCI plugin
    • Return last known error when plugin loading fails
  • [NA OFI]
    • Remove unused NA_OFI_DOM_SHARED flag
    • Always use FI_SOURCE and FI_SOURCE_ERR when both are supported
  • [NA UCX]
    • Add ucx log outlet and redirect UCX log
      • Use default HG log level if UCX_LOG_LEVEL is not set
  • [HG Util]
    • Add hg_log_vwrite() to write log from va_list
    • Add hg_log_level_to_string()
    • Clean up mercury_event code and add const qualifier to hg_poll_get_fd()
    • Add const on atomic gets
    • Switch to using sys/queue.h directly
    • Remove HG_QUEUE and HG_LIST definitions
    • Add hg_dl_error() to return last error
  • [HG/NA Perf Test]
    • Add -u option to control number of multi-recv ops (server only)
    • Add -i option to control number of handles posted (server only)
    • Update to use new HG/NA progress routines and remove use of hg_request

Added in rc2

  • [NA OFI]
    • Add support for FI_AV_AUTH_KEY (requires libfabric >= 1.20)
      • Add runtime check for cxi provider version
      • Setting multiple auth keys disables FI_DIRECTED_RECV
      • Separate opening of AV and auth key insertion
      • Parse auth key range when FI_AV_AUTH_KEY is available
      • Encode/decode auth key when serializing addrs
    • Add support for FI_AV_USER_ID
    • Clean up handling of FI_SOURCE_ERR
    • Remove support of FI_SOURCE w/o FI_SOURCE_ERR
    • Add support for new CXI address format

Added in rc1

  • [NA]
    • Add init info version compatibility wrappers
    • Bump NA version to v4.1.0
    • Add support for traffic_class init info (only supported by ofi plugin)
  • [NA OFI]
    • Attempt to distribute multi-NIC domains based on selected CPU ID
    • Support selection of traffic classes (single class per NA class)
  • [HG/NA Perf Test]
    • Add -f/--hostfile option to select hostfile to write to / read from
    • Add -T/--tclass option to select traffic class
    • Autodetect MPI implementation in perf utilities
      • MPI can now be autodetected and dynamically loaded in utilities, even if MERCURY_TESTING_ENABLE_PARALLEL was turned off. If MERCURY_TESTING_ENABLE_PARALLEL is turned on, tests remain manually linked against MPI as they used to be.

Bug fixes

Added in rc4

  • [HG]
    • Fix a couple of type changes introduced in rc1 that could have broken ABI
    • Fix shared-memory path that was previously disabled in conjunction with libfabric transports that use the multi-recv capability
  • [HG util]
    • Fix dlog_free not called when parent/child have separate dlogs
  • [HG/NA]
    • Fix init info changes made in previous rcs to prevent ABI breakage
    • HG NA init info is fixed to v4.0 for now and duplicates tclass info

Added in rc3

  • [HG]
    • Fix behavior of request_post_incr init parameter
      • request_post_incr cannot be disabled (set to -1) with multi-recv
  • [HG Util]
    • Fix mercury log to correctly generate outlet names
    • Fix log outlets to use prefixed subsys name
    • Fix use of macros in debug log
  • [CMake]
    • Fix cmake_minimum_required() warning
    • Update kwsys and mchecksum dependencies

Added in rc2

  • [HG Util]
    • Use destructor to free log outlets
  • [NA]
    • Fix missing free of dynamic plugin entries
  • [NA UCX]
    • Fix hg_info not filtering protocol
      • Allow na_ucx_get_protocol_info() to resolve ucx tl name aliases
  • [NA OFI]
    • Fix shm provider flags
  • [NA Test]
    • Remove "could not find MPI" message

Added in rc1

  • [HG Util]
    • Add missing prototype to hg_atomic_fence() definition
  • [NA OFI]
    • Remove excessive MR count warning message
  • [NA Perf]
    • Ensure perf tests wait on send completion

⚠️ Known Issues

  • [NA OFI]
    • [tcp/verbs;ofi_rxm] Using more than 256 peers requires FI_UNIVERSE_SIZE to be set.
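
One way to handle the known issue above from application code is to export FI_UNIVERSE_SIZE before the transport is initialized. The sketch below is only an illustration under stated assumptions: the helper name set_universe_size_hint() is hypothetical (it is not part of mercury or libfabric), and a suitable value depends on how many peers the job actually expects. Setting the variable in the job environment before launch has the same effect.

```c
#define _POSIX_C_SOURCE 200809L

#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not a mercury or libfabric function).
 * Exports FI_UNIVERSE_SIZE=<expected_peers> unless it is already set.
 * It has to run before the networking library is initialized, since
 * the variable is assumed to be read at initialization time.
 * Returns 0 on success, -1 on failure.
 */
int
set_universe_size_hint(unsigned int expected_peers)
{
    char value[16];
    int len = snprintf(value, sizeof(value), "%u", expected_peers);

    if (len < 0 || (size_t) len >= sizeof(value))
        return -1;

    /* overwrite = 0 keeps a value chosen by the user or job launcher */
    return setenv("FI_UNIVERSE_SIZE", value, 0);
}
```

A caller expecting, say, 1024 peers would invoke set_universe_size_hint(1024) at the very start of the program. Passing 0 for the overwrite argument is a deliberate choice here: a value already exported by the user is left untouched.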

mercury 2.4.0rc3

25 Jun 23:28
v2.4.0rc3
Pre-release

Summary

This is a preview release of the v2.4.0 release.

New features

Added in rc3

  • [HG]
    • Add multi_recv_op_max init parameter
      • This allows users to control number of multi-recv buffers posted (libfabric plugin only)
    • Add no_overflow init option to prevent use of overflow buffers
    • Improve multi-recv buffer warning messages
    • Add HG_Event_get_wait_fd() to retrieve internal wait object
    • Add HG_Event_ready() / HG_Event_progress() / HG_Event_trigger() to support wait fd progress model
      • Simplify progress mechanism and remove use of internal timers
      • Always make NA progress when HG_Event_progress() is called
      • Update HG progress to use new NA progress routines
    • Add missing HG_WARN_UNUSED_RESULT to HG calls
    • Switch to using standard types and align with NA
      • Keep some uint8_t instances instead of hg_bool_t for ABI compatibility
  • [NA]
    • Add NA_Poll() and NA_Poll_wait() routines
    • Deprecate NA_Progress() in favor of poll routines
    • Add NA_Context_get_completion_count() to retrieve size of completion queue
    • Update plugins to use new poll and poll_wait callbacks
      • poll_wait plugin callback remains for compatibility
    • Fix documentation of NA_Poll_get_fd()
    • Add missing NA_WARN_UNUSED_RESULT qualifiers
    • Bump NA version to 5.0.0
    • Remove deprecated CCI plugin
    • Return last known error when plugin loading fails
  • [NA OFI]
    • Remove unused NA_OFI_DOM_SHARED flag
    • Always use FI_SOURCE and FI_SOURCE_ERR when both are supported
  • [NA UCX]
    • Add ucx log outlet and redirect UCX log
      • Use default HG log level if UCX_LOG_LEVEL is not set
  • [HG Util]
    • Add hg_log_vwrite() to write log from va_list
    • Add hg_log_level_to_string()
    • Clean up mercury_event code and add const qualifier to hg_poll_get_fd()
    • Add const on atomic gets
    • Switch to using sys/queue.h directly
    • Remove HG_QUEUE and HG_LIST definitions
    • Add hg_dl_error() to return last error
  • [HG/NA Perf Test]
    • Add -u option to control number of multi-recv ops (server only)
    • Add -i option to control number of handles posted (server only)
    • Update to use new HG/NA progress routines and remove use of hg_request
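
The wait fd progress model mentioned above (HG_Event_get_wait_fd() with HG_Event_ready() / HG_Event_progress() / HG_Event_trigger()) follows a common pattern: the application polls a file descriptor and only makes progress when that descriptor becomes readable. The sketch below illustrates the general pattern only. It does not call mercury, since the exact signatures are not given in these notes; every demo_* name is hypothetical, and a pipe stands in for the library's internal wait object.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical stand-in for a library that exposes a wait object. */
struct demo_events {
    int wait_fd;          /* read end: what the application polls on */
    int notify_fd;        /* write end: signalled when work is pending */
    unsigned int handled; /* total number of events processed so far */
};

/* Create the wait object. Returns 0 on success, -1 on failure. */
int
demo_events_init(struct demo_events *ev)
{
    int fds[2];

    if (pipe(fds) != 0)
        return -1;
    ev->wait_fd = fds[0];
    ev->notify_fd = fds[1];
    ev->handled = 0;
    return 0;
}

/* Signal one pending event by writing a single byte. */
int
demo_events_notify(struct demo_events *ev)
{
    char c = 1;

    return (write(ev->notify_fd, &c, 1) == 1) ? 0 : -1;
}

/*
 * Wait up to timeout_ms for the wait fd to become readable, then drain
 * it and count one event per byte read. Returns the number of events
 * handled by this call, 0 on timeout, -1 on error.
 */
int
demo_events_wait(struct demo_events *ev, int timeout_ms)
{
    struct pollfd pfd = {.fd = ev->wait_fd, .events = POLLIN, .revents = 0};
    char buf[64];
    ssize_t n;
    int rc;

    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR); /* retry if interrupted */
    if (rc <= 0)
        return rc;
    if (!(pfd.revents & POLLIN))
        return -1;

    n = read(ev->wait_fd, buf, sizeof(buf));
    if (n < 0)
        return -1;
    ev->handled += (unsigned int) n;
    return (int) n;
}

/* Release both ends of the pipe. */
void
demo_events_destroy(struct demo_events *ev)
{
    close(ev->wait_fd);
    close(ev->notify_fd);
}
```

In a real event loop the wait fd would typically be registered alongside the application's other descriptors, so that a single poll() call covers both network progress and unrelated I/O.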

Added in rc2

  • [NA OFI]
    • Add support for FI_AV_AUTH_KEY (requires libfabric >= 1.20)
      • Add runtime check for cxi provider version
      • Setting multiple auth keys disables FI_DIRECTED_RECV
      • Separate opening of AV and auth key insertion
      • Parse auth key range when FI_AV_AUTH_KEY is available
      • Encode/decode auth key when serializing addrs
    • Add support for FI_AV_USER_ID
    • Clean up handling of FI_SOURCE_ERR
    • Remove support of FI_SOURCE w/o FI_SOURCE_ERR
    • Add support for new CXI address format

Added in rc1

  • [NA]
    • Add init info version compatibility wrappers
    • Bump NA version to v4.1.0
    • Add support for traffic_class init info (only supported by ofi plugin)
  • [NA OFI]
    • Attempt to distribute multi-NIC domains based on selected CPU ID
    • Support selection of traffic classes (single class per NA class)
  • [HG/NA Perf Test]
    • Add -f/--hostfile option to select hostfile to write to / read from
    • Add -T/--tclass option to select traffic class
    • Autodetect MPI implementation in perf utilities
      • MPI can now be autodetected and dynamically loaded in the utilities, even if MERCURY_TESTING_ENABLE_PARALLEL is turned off. If MERCURY_TESTING_ENABLE_PARALLEL is turned on, tests remain manually linked against MPI, as before.

Bug fixes

Added in rc3

  • [HG]
    • Fix behavior of request_post_incr init parameter
      • request_post_incr cannot be disabled (set to -1) with multi-recv
  • [HG Util]
    • Fix mercury log to correctly generate outlet names
    • Fix log outlets to use prefixed subsys name
    • Fix use of macros in debug log
  • [CMake]
    • Fix cmake_minimum_required() warning
    • Update kwsys and mchecksum dependencies

Added in rc2

  • [HG Util]
    • Use destructor to free log outlets
  • [NA]
    • Fix missing free of dynamic plugin entries
  • [NA UCX]
    • Fix hg_info not filtering protocol
      • Allow na_ucx_get_protocol_info() to resolve ucx tl name aliases
  • [NA OFI]
    • Fix shm provider flags
  • [NA Test]
    • Remove "could not find MPI" message

Added in rc1

  • [HG Util]
    • Add missing prototype to hg_atomic_fence() definition
  • [NA OFI]
    • Remove excessive MR count warning message
  • [NA Perf]
    • Ensure perf tests wait on send completion

⚠️ Known Issues

  • [NA OFI]
    • [tcp/verbs;ofi_rxm] Using more than 256 peers requires FI_UNIVERSE_SIZE to be set.

mercury 2.4.0rc2

07 May 20:10
v2.4.0rc2
Pre-release

Summary

This is a preview release of the v2.4.0 release.

New features

Added in rc2

  • [NA OFI]
    • Add support for FI_AV_AUTH_KEY (requires libfabric >= 1.20)
      • Add runtime check for cxi provider version
      • Setting multiple auth keys disables FI_DIRECTED_RECV
      • Separate opening of AV and auth key insertion
      • Parse auth key range when FI_AV_AUTH_KEY is available
      • Encode/decode auth key when serializing addrs
    • Add support for FI_AV_USER_ID
    • Clean up handling of FI_SOURCE_ERR
    • Remove support of FI_SOURCE w/o FI_SOURCE_ERR
    • Add support for new CXI address format

Added in rc1

  • [NA]
    • Add init info version compatibility wrappers
    • Bump NA version to v4.1.0
    • Add support for traffic_class init info (only supported by ofi plugin)
  • [HG/NA Perf Test]
    • Add -f/--hostfile option to select hostfile to write to / read from
    • Add -T/--tclass option to select traffic class
    • Autodetect MPI implementation in perf utilities
      • MPI can now be autodetected and dynamically loaded in the utilities, even if MERCURY_TESTING_ENABLE_PARALLEL is turned off. If MERCURY_TESTING_ENABLE_PARALLEL is turned on, tests remain manually linked against MPI, as before.
  • [NA OFI]
    • Attempt to distribute multi-NIC domains based on selected CPU ID
    • Support selection of traffic classes (single class per NA class)

Bug fixes

Added in rc2

  • [HG Util]
    • Use destructor to free log outlets
  • [NA]
    • Fix missing free of dynamic plugin entries
  • [NA UCX]
    • Fix hg_info not filtering protocol
      • Allow na_ucx_get_protocol_info() to resolve ucx tl name aliases
  • [NA OFI]
    • Fix shm provider flags
  • [NA Test]
    • Remove "could not find MPI" message

Added in rc1

  • [HG Util]
    • Add missing prototype to hg_atomic_fence() definition
  • [NA OFI]
    • Remove excessive MR count warning message
  • [NA Perf]
    • Ensure perf tests wait on send completion

⚠️ Known Issues

  • [NA OFI]
    • [tcp/verbs;ofi_rxm] Using more than 256 peers requires FI_UNIVERSE_SIZE to be set.

mercury 2.4.0rc1

20 Dec 21:22
v2.4.0rc1
Pre-release

Summary

This is a preview release of the v2.4.0 release.

New features

  • [NA]
    • Add init info version compatibility wrappers
    • Bump NA version to v4.1.0
    • Add support for traffic_class init info (only supported by ofi plugin)
  • [HG/NA Perf Test]
    • Add -f/--hostfile option to select hostfile to write to / read from
    • Add -T/--tclass option to select traffic class
    • Autodetect MPI implementation in perf utilities
      • MPI can now be autodetected and dynamically loaded in the utilities, even if MERCURY_TESTING_ENABLE_PARALLEL is turned off. If MERCURY_TESTING_ENABLE_PARALLEL is turned on, tests remain manually linked against MPI, as before.
  • [NA OFI]
    • Attempt to distribute multi-NIC domains based on selected CPU ID
    • Support selection of traffic classes (single class per NA class)

Bug fixes

  • [HG Util]
    • Add missing prototype to hg_atomic_fence() definition
  • [NA OFI]
    • Remove excessive MR count warning message
  • [NA Perf]
    • Ensure perf tests wait on send completion

⚠️ Known Issues

  • [NA OFI]
    • [tcp/verbs;ofi_rxm] Using more than 256 peers requires FI_UNIVERSE_SIZE to be set.