CUDA: Add support for compilation to LTO-IR #9274

gmarkall · 2023-11-08T12:51:19Z

This adds support for compiling to LTO-IR. It provides an alternative to PTX for linking Python code with non-Python source code, with greater potential for optimization because the optimizer can work at link time across the whole body of source in the different languages.

A summary of the changes:

The first three commits are small refactors / tidy-ups of some paths that could be simplified now we don't have to deal with NVVM 3.4 anymore (556bb82, b6c3b85, 6d8a22c)
The next commit clarifies the compilation behaviour with / without return types being specified, which was a bit of a hole in the documentation: 357d67d
Then we add support for LTO-IR to CUDA codegen, without exposing it externally: 8d6fd90
Then we add a public interface for LTO-IR code generation: ed99b9e
Documentation is also added / updated: 0938da4

gmarkall · 2023-11-08T12:51:39Z

gpuci run tests

gmarkall · 2023-11-08T12:53:16Z

gpuci run tests

gmarkall · 2023-11-08T16:49:25Z

gpuci run tests

gmarkall · 2023-11-09T10:08:08Z

gpuci run tests

This is a bit clearer about what it does, and will be a more representative name when compilation to LTO-IR is also supported. We also rename the buffer held by `CompilationUnit.compile()`, because the buffer will no longer be limited to holding PTX only.

We never have multiple PTX outputs anymore (this was only necessary with NVVM 3.4), so there's no need to make lists of them or join them.

This addition explicitly states the behaviour when a return type is or is not supplied as part of the signature; previously the user would have had to guess this, or discover it through accident / experiment.

This follows a very similar process to PTX compilation - LTO generation is enabled with NVVM's `-gen-lto` flag.

We create a more generic function, `numba.cuda.compile()`, that provides similar functionality to `compile_ptx()`, but allowing the choice of PTX or LTO-IR output. This function defaults to the C ABI rather than the Numba one, as this is expected to be more convenient for most use cases. We also add a variant to target the current device. The original `compile_ptx()` and variant for the current device are left in to support existing use cases that use them and expect generated code to use the Numba ABI.
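As a rough illustration of the interface described above, the following sketch compiles a small device function to LTO-IR. It assumes Numba 0.60 or later and a CUDA toolkit recent enough for NVVM to generate LTO-IR; the function name `add`, the wrapper `compile_add_to_ltoir`, and the choice of `float32` arguments are illustrative only. The Numba import is kept inside the wrapper so that the sketch can be loaded on a machine without the toolkit.

```python
def add(x, y):
    """Plain Python function to be compiled as a CUDA device function."""
    return x + y


def compile_add_to_ltoir():
    """Compile `add` to LTO-IR and return (code, return_type).

    Illustrative wrapper, not part of Numba. Calling it requires Numba
    and a CUDA toolkit whose NVVM supports LTO-IR generation.
    """
    # Imported lazily so that defining this sketch needs no CUDA toolkit.
    from numba import cuda, float32

    # output="ltoir" selects LTO-IR instead of the default PTX output.
    # The C ABI is the default for cuda.compile(), per the text above.
    ltoir, resty = cuda.compile(
        add, (float32, float32), device=True, output="ltoir"
    )
    return ltoir, resty
```

The resulting LTO-IR would then be handed to an LTO-capable linker together with LTO-IR produced from other languages; that step is outside the scope of this sketch.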

gmarkall · 2023-12-05T16:13:51Z

gpuci run tests

gmarkall · 2024-04-03T11:54:09Z

gpuci run tests

gmarkall · 2024-04-08T16:18:44Z

gpuci run tests

Previous commits added support for compiling Python functions to CUDA LTO-IR via the compilation interfaces. This commit adds stub code for supporting compilation of `@cuda.jit`-decorated functions to LTO-IR.

The only functional change, unused in Numba at present, is that if the linker has LTO enabled, the CUDA codegen uses NVVM to generate LTO-IR instead of PTX, and passes that to the linker.

The `lto` attribute is added to the linker classes in `numba.cuda.cudadrv.driver`. It is always `False` for the built-in linkers, but a linker from pynvjitlink (or any other external linker, in theory) could set it to `True` to signal that LTO is enabled.

Some tests must be skipped if LTO is enabled, because it becomes difficult to use the functionality they test:

- Some inspect the PTX, which is difficult to do when LTO-IR is generated instead.
- Others check for exceptions, but the exception flags get optimized away by LTO because Numba fails to add them to the used list (see numba#9526).
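The role of the `lto` attribute can be sketched in a few lines. The classes and the `select_output_kind` helper below are hypothetical stand-ins written for this illustration; they are not the classes in `numba.cuda.cudadrv.driver`, and they only mirror the decision described in the commit message.

```python
class BuiltinLinkerSketch:
    """Hypothetical stand-in for a built-in linker: LTO is never enabled."""
    lto = False


class ExternalLinkerSketch:
    """Hypothetical stand-in for an external linker that enables LTO."""
    lto = True


def select_output_kind(linker):
    """Return which kind of code the codegen would pass to `linker`."""
    # With LTO enabled the codegen produces LTO-IR; otherwise it produces PTX.
    return "ltoir" if linker.lto else "ptx"


print(select_output_kind(BuiltinLinkerSketch()))
print(select_output_kind(ExternalLinkerSketch()))
```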

gmarkall · 2024-04-08T17:11:50Z

gpuci run tests

This is needed to allow the "skip under LTO" test functionality to run successfully (and not skip on the simulator, since it does not simulate LTO).

gmarkall · 2024-04-08T18:53:35Z

gpuci run tests

- `compile_for_current_device()` needs an `output` kwarg so it can generate LTO-IR or PTX.
- `compile_ptx()` now calls `compile()` with an explicit `output` kwarg so that it compiles to PTX even if the default for `compile()` changes in future.
- `compile_ptx_for_current_device()` now calls `compile_ptx()` with the CC for the current device.
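The delegation between these four functions can be sketched with stand-ins. Everything below is hypothetical and simplified: the `*_sketch` functions do not call NVVM, `current_device_cc()` pretends the device has compute capability 7.5, and the `ValueError` is a placeholder rather than the exception Numba actually raises.

```python
def current_device_cc():
    """Hypothetical helper: pretend the current device has CC 7.5."""
    return (7, 5)


def compile_sketch(pyfunc, sig, cc=None, output="ptx"):
    """Stand-in for the generic entry point; records what was requested."""
    if output not in ("ptx", "ltoir"):
        # Placeholder error type, chosen for this sketch only.
        raise ValueError(f"unsupported output kind: {output}")
    return {"func": pyfunc.__name__, "sig": sig, "cc": cc, "output": output}


def compile_for_current_device_sketch(pyfunc, sig, output="ptx"):
    """Forwards `output` so either PTX or LTO-IR can be requested."""
    return compile_sketch(pyfunc, sig, cc=current_device_cc(), output=output)


def compile_ptx_sketch(pyfunc, sig, cc=None):
    """Passes output explicitly, so a changed default cannot affect it."""
    return compile_sketch(pyfunc, sig, cc=cc, output="ptx")


def compile_ptx_for_current_device_sketch(pyfunc, sig):
    """Delegates to the PTX variant with the current device's CC."""
    return compile_ptx_sketch(pyfunc, sig, cc=current_device_cc())


def add(x, y):
    return x + y


print(compile_for_current_device_sketch(add, ("f4", "f4"), output="ltoir"))
print(compile_ptx_for_current_device_sketch(add, ("f4", "f4")))
```

Passing `output="ptx"` explicitly in the PTX variant is what keeps it stable if the default of the generic function ever changes.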

This is just implementing `get_asm_str()` now, which is part of the codegen object's interface. To align better with the rest of Numba, the `_get_ptx()` body is moved into `get_asm_str()` and `get_asm_str()` is used in its place.

gmarkall · 2024-04-17T11:46:06Z

gpuci run tests

stuartarchibald

Thanks for the patch @gmarkall, it's great to see this feature implemented. This has been through an OOB pair review between us already in which the feature, expectations and implementation were discussed. The review provided below is just catching a few small things in the resultant change set, the contents otherwise is good. Thanks again for working on this!

numba/cuda/codegen.py

numba/cuda/compiler.py

docs/source/cuda/cuda_compilation.rst

numba/cuda/codegen.py

numba/cuda/tests/cudapy/test_userexc.py

numba/cuda/tests/cudapy/test_compiler.py

- Wording edits to docs on CUDA compilation.
- Check for `if cc is not None` rather than just `if cc`, etc., in the codegen, for greater robustness.
- Add a test that checks the error reported when specifying an illegal output kind.
- Cross-reference numba#9526 in the comment in `TestUserExc`.
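The difference between the two checks on `cc` can be shown with a small, self-contained sketch. The helper names and `DEFAULT_CC` are hypothetical and are not taken from the Numba codegen; the point is only that a truthiness test treats every falsy value as "not supplied", whereas `is not None` reserves that meaning for `None`.

```python
DEFAULT_CC = (5, 0)  # hypothetical default, for illustration only


def pick_cc_truthy(cc=None):
    """Truthiness check: any falsy value silently falls back to the default."""
    return cc if cc else DEFAULT_CC


def pick_cc_explicit(cc=None):
    """Identity check: only None means 'no compute capability supplied'."""
    return cc if cc is not None else DEFAULT_CC


# Both agree for ordinary inputs.
print(pick_cc_truthy((7, 5)), pick_cc_explicit((7, 5)))
print(pick_cc_truthy(None), pick_cc_explicit(None))

# They differ for a falsy, non-None value such as an empty tuple: the
# truthy variant hides it behind the default, while the explicit variant
# passes it on so later validation can reject it.
print(pick_cc_truthy(()), pick_cc_explicit(()))
```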

gmarkall · 2024-04-24T13:08:47Z

gpuci run tests

gmarkall · 2024-04-24T13:09:11Z

@stuartarchibald Many thanks for the review - I believe all comments are now addressed, and I'm just waiting on CI.

stuartarchibald

Thanks for the patch and fixes!

Version 0.60.0 (13 June 2024) ============================= This is a major Numba release. Numba now has binary support for NumPy 2.0. Users should note that this does NOT yet include NEP 50 related type-level changes which are still in progress. This release does not guarantee execution level compatibility with NumPy 2.0 and hence users should expect some type and numerical deviations with respect to normal Python behavior while using Numba with NumPy 2.0. Please find a summary of all noteworthy items below. Highlights ~~~~~~~~~~ NumPy 2.0 Binary Support ------------------------ Added Binary Support for NumPy 2.0. However, this does not yet include NEP 50 related type-level changes which are still in progress. Following is a summary of the user facing changes: * The ``ptp()`` method previously available for arrays has been deprecated. Instead, it is recommended to use the ``np.ptp(arr)`` function. * The data type ``np.bool8`` has been deprecated and replaced with ``np.bool``. * The ``np.product`` function is deprecated; users are advised to use ``np.prod`` instead. * Starting from NumPy version 2.0, the ``itemset()`` method has been removed from the ``ndarray`` class. To achieve the same functionality, utilize the assignment operation ``arr[index] = value``. * Deprecated constants ``np.PINF`` and ``np.NINF`` should be replaced with ``np.inf`` for positive infinity and ``-np.inf`` for negative infinity, respectively. (`PR-#9466 <https://github.com/numba/numba/pull/9466>`__) New Features ~~~~~~~~~~~~ Enhance guvectorize support in JIT code --------------------------------------- Generalized universal function support is extended, it is now possible to call a ``@guvectorize`` decorated function from within a JIT-compiled function. However, please note that broadcasting is not supported yet. Calling a guvectorize function in a scenario where broadcast is needed may result in incorrect behavior. 
(`PR-#8984 <https://github.com/numba/numba/pull/8984>`__) Add experimental support for ufunc.at ------------------------------------- Experimental support for ``ufunc.at`` is added. (`PR-#9239 <https://github.com/numba/numba/pull/9239>`__) Add ``float(<string literal>)`` ctor ------------------------------------ Support for ``float(<string literal>)`` is added. (`PR-#9378 <https://github.com/numba/numba/pull/9378>`__) Add support for ``math.log2``. ------------------------------ Support for ``math.log2`` is added. (`PR-#9416 <https://github.com/numba/numba/pull/9416>`__) Add math.nextafter support for nopython mode. --------------------------------------------- Support ``math.nextafter`` in nopython mode. (`PR-#9438 <https://github.com/numba/numba/pull/9438>`__) Add support for parfor binop reductions. ---------------------------------------- Previously, only operations with inplace operations like `+=` could be used as reductions in `prange`s. Now, with this PR, binop reductions of the form `a = a binop b` can be used. (`PR-#9521 <https://github.com/numba/numba/pull/9521>`__) Improvements ~~~~~~~~~~~~ Expand ``isinstance()`` support for NumPy datetime types -------------------------------------------------------- Adds support of ``numpy.datetime64`` and ``numpy.timedelta64`` types in ``isinstance()``. (`PR-#9455 <https://github.com/numba/numba/pull/9455>`__) Python 3.12 ``sys.monitoring`` support is added to Numba's dispatcher. ---------------------------------------------------------------------- Python 3.12 introduced a new module ``sys.monitoring`` that makes available an event driven monitoring API for use in tools that need to monitor execution e.g. debuggers or profilers. Numba's dispatcher class (the code that handles transfer of control between the Python interpreter and compiled code) has been updated to emit ``sys.monitoring.events.PY_START`` and ``sys.monitoring.events.PY_RETURN`` as appropriate. 
This allows tools that are watching for these events to identify when control has entered and returned from compiled code. As a result of this change, Numba compiled code is now identified by ``cProfile`` in the same way that it has been historically i.e. it will be present in performance profiles. (`PR-#9482 <https://github.com/numba/numba/pull/9482>`__) NumPy Support ~~~~~~~~~~~~~ Added support for ``np.size()`` ------------------------------- Added ``np.size()`` support for NumPy, which was previously unsupported. (`PR-#9504 <https://github.com/numba/numba/pull/9504>`__) CUDA API Changes ~~~~~~~~~~~~~~~~ Support for compilation to LTO-IR --------------------------------- Support for compiling device functions to LTO-IR in the compilation API is added. (`PR-#9274 <https://github.com/numba/numba/pull/9274>`__) Support math.log, math.log2 and math.log10 in CUDA -------------------------------------------------- CUDA target now supports ``np.log``, ``np.log2`` and ``np.log10``. (`PR-#9417 <https://github.com/numba/numba/pull/9417>`__) Bug Fixes ~~~~~~~~~ Fix parfor variable hoisting analysis. -------------------------------------- If a variable is used to build a container (e.g., tuple, list, map, set) or is passed as an argument to a call then conservatively assume it could escape the current iteration of the parfor and so should not be hoisted. (`PR-#9532 <https://github.com/numba/numba/pull/9532>`__) Deprecations ~~~~~~~~~~~~ Deprecate `old_style` error-capturing ------------------------------------- Per deprecation schedule, `old_style` error-capturing is deprecated and the `default` is now `new_style`. (`PR-#9549 <https://github.com/numba/numba/pull/9549>`__) Expired Deprecations ~~~~~~~~~~~~~~~~~~~~ Removal of ``numba.core.retarget`` ---------------------------------- The experimental features implemented in ``numba.core.retarget`` have been removed. 
These features were primarily used in numba-dpex, but that project has replaced its use of ``numba.core.retarget`` with a preference for *target extension API*. (`PR-#9539 <https://github.com/numba/numba/pull/9539>`__) Documentation Changes ~~~~~~~~~~~~~~~~~~~~~ ``numba.cuda.gpus.current`` documentation correction ---------------------------------------------------- ``numba.cuda.gpus.current`` was erroneously described as a function, is now described as an attribute. (`PR-#9394 <https://github.com/numba/numba/pull/9394>`__) CUDA 12 conda installation documentation ---------------------------------------- Installation instructions have been added for CUDA 12 conda users. (`PR-#9487 <https://github.com/numba/numba/pull/9487>`__) Version 0.59.1 (18 March 2024) ------------------------------ This is a bug-fix release to fix regressions in 0.59.0. CUDA API Changes ~~~~~~~~~~~~~~~~ Fixed caching of kernels that use target-specific overloads =========================================================== Caching of kernels using target-specific overloads now works. This includes use of cooperative group sync, which is now implemented with a target-specific overload. (`PR-#9447 <https://github.com/numba/numba/pull/9447>`__) Performance Improvements and Changes ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Improvement to ``np.searchsorted`` ================================== Fixed a performance regression introduced in Numba 0.59 which made ``np.searchsorted`` considerably slower. (`PR-#9448 <https://github.com/numba/numba/pull/9448>`__) Bug Fixes ~~~~~~~~~ Fix issues with ``np.searchsorted`` not handling ``np.datetime64`` ================================================================== This patch fixes two issues with ``np.searchsorted``. First, a regression is fixed in the support of ``np.datetime64``. Second, adopt ``NAT``-aware comparisons to fix mishandling of ``NAT`` value. 
(`PR-#9445 <https://github.com/numba/numba/pull/9445>`__) Allow use of Python 3.12 PEP-695 type parameter syntax ====================================================== A patch is added to properly parse the PEP 695 syntax. While Numba does not yet take advantage of type parameters, it will no longer erroneously reject functions defined with the new Python 3.12 syntax. (`PR-#9459 <https://github.com/numba/numba/pull/9459>`__) Version 0.59.0 (31 January 2024) -------------------------------- This is a major Numba release. Numba now supports Python 3.12, please find a summary of all noteworthy items below. Highlights ~~~~~~~~~~ Python 3.12 Support =================== The standout feature of this release is the official support for Python 3.12 in Numba. Please note that profiling support is temporarily disabled in this release (for Python 3.12) and several known issues have been identified during development. The Numba team is actively working on resolving them. Please refer to the respective issue pages (`Numba #9289 <https://github.com/numba/numba/pull/9289>`_ and `Numba #9291 <https://github.com/numba/numba/pull/9291>`_) for a list of ongoing issues and updates on progress. (`PR-#9246 <https://github.com/numba/numba/pull/9246>`__) Move minimum supported Python version to 3.9. ============================================= Support for Python 3.8 has been removed, Numba's minimum supported Python version is now Python 3.9. (`PR-#9310 <https://github.com/numba/numba/pull/9310>`__) New Features ~~~~~~~~~~~~ Add support for ufunc attributes and reduce =========================================== Support for ``ufunc.reduce`` and most ufunc attributes is added. (`PR-#9123 <https://github.com/numba/numba/pull/9123>`__) Add a config variable to enable / disable the llvmlite memory manager ===================================================================== A config variable to force enable or disable the llvmlite memory manager is added. 
(`PR-#9341 <https://github.com/numba/numba/pull/9341>`__) Improvements ~~~~~~~~~~~~ Add ``TargetLibraryInfo`` pass to CPU LLVM pipeline. ==================================================== The ``TargetLibraryInfo`` pass makes sure that the optimisations that take place during call simplification are appropriate for the target, without this the target is assumed to be Linux and code will be optimised to produce e.g. math symbols that do not exit on Windows. Historically this issue has been avoided through the use of Numba internal libraries carrying wrapped symbols, but doing so potentially detriments performance. As a result of this change Numba internal libraries are smaller and there is an increase in optimisation opportunity in code using ``exp2`` and ``log2`` functions. (`PR-#9336 <https://github.com/numba/numba/pull/9336>`__) Numba deprecation warning classes are now subclasses of builtin ones ==================================================================== To help users manage and suppress deprecation warnings from Numba, the ``NumbaDeprecationWarning`` and ``NumbaPendingDeprecationWarning`` classes are now subclasses of the builtin ``DeprecationWarning`` and ``PendingDeprecationWarning`` respectively. Therefore, warning filters on ``DeprecationWarning`` and ``PendingDeprecationWarning`` will apply to Numba deprecation warnings. (`PR-#9347 <https://github.com/numba/numba/pull/9347>`__) NumPy Support ~~~~~~~~~~~~~ Added support for np.indices() function. ======================================== Support is added for ``numpy.indices()``. (`PR-#9126 <https://github.com/numba/numba/pull/9126>`__) Added support for ``np.polynomial.polynomial.Polynomial`` class. ================================================================ Support is added for the `Polynomial` class from the package ``np.polynomial.polynomial``. 
(`PR-#9140 <https://github.com/numba/numba/pull/9140>`__) Added support for functions ``np.polynomial.polyutils.as_series()``, as well as functions ``polydiv()``, ``polyint()``, ``polyval()`` from ``np.polynomial.polynomial``. ======================================================================================================================================================================== Support is added for ``np.polynomial.polyutils.as_series()``, ``np.polynomial.polynomial.polydiv()``, ``np.polynomial.polynomial.polyint()`` (only the first 2 arguments), ``np.polynomial.polynomial.polyval()`` (only the first 2 arguments). (`PR-#9141 <https://github.com/numba/numba/pull/9141>`__) Added support for np.unwrap() function. ======================================= Support is added for ``numpy.unwrap()``. The ``axis`` argument is only supported when its value equals -1. (`PR-#9154 <https://github.com/numba/numba/pull/9154>`__) Adds support for checking if dtypes are equal. ============================================== Support is added for checking if two dtype objects are equal, for example ``assert X.dtype == np.dtype(np.float64)``. (`PR-#9249 <https://github.com/numba/numba/pull/9249>`__) CUDA API Changes ~~~~~~~~~~~~~~~~ Added support for compiling device functions with a C ABI ========================================================= Support for compiling device functions with a C ABI through the :func:`compile_ptx() <numba.cuda.compile_ptx>` API, for easier interoperability with CUDA C/C++ and other languages. (`PR-#9223 <https://github.com/numba/numba/pull/9223>`__) Make grid() and gridsize() use 64-bit integers ============================================== ``cuda.grid()`` and ``cuda.gridsize()`` now use 64-bit integers, so they no longer overflow when the grid contains more than ``2 ** 31`` threads. 
(`PR-#9235 <https://github.com/numba/numba/pull/9235>`__) Prevent kernels being dropped by implementing the used list =========================================================== Kernels are no longer dropped when being compiled and linked using nvJitLink, because they are added to the ``@"llvm.used"`` list. (`PR-#9267 <https://github.com/numba/numba/pull/9267>`__) Support for Windows CUDA 12.0 toolkit conda packages ==================================================== The library paths used in CUDA toolkit 12.0 conda packages on Windows are added to the search paths used when detecting CUDA libraries. (`PR-#9279 <https://github.com/numba/numba/pull/9279>`__) Performance Improvements and Changes ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Improvement to IR copying speed =============================== Improvements were made to the deepcopying of ``FunctionIR``. In one case, the ``InlineInlineables`` pass is 3x faster. (`PR-#9245 <https://github.com/numba/numba/pull/9245>`__) Bug Fixes ~~~~~~~~~ Dynamically Allocate Parfor Schedules ===================================== This PR fixes an issue where a parallel region is executed in a loop many times. The previous code used an alloca to allocate the parfor schedule on the stack but if there are many such parfors in a loop then the stack will overflow. The new code does a pair of allocation/deallocation calls into the Numba parallel runtime before and after the parallel region respectively. At the moment, these calls redirect to malloc/free although other mechanisms such as pooling are possible and may be implemented later. This PR also adds a warning in cases where a prange loop is not converted to a parfor. This can happen if there is exceptional control flow in the loop. These are related in that the original issue had a prange loop that wasn't converted to a parfor and therefore all the parfors inside the body of the prange were running in parallel and adding to the stack each time. 
(`PR-#9048 <https://github.com/numba/numba/pull/9048>`__) Support multiple outputs in a ``@guvectorize`` function ======================================================= This PR fixes `Numba #9058 <https://github.com/numba/numba/pull/9058>`_ where it is now possible to call a guvectorize with multiple outputs. (`PR-#9049 <https://github.com/numba/numba/pull/9049>`__) Handling of ``None`` args fixed in ``PythonAPI.call``. ====================================================== Fixing segfault when ``args=None`` was passed to ``PythonAPI.call``. (`PR-#9089 <https://github.com/numba/numba/pull/9089>`__) Fix propagation of literal values in PHI nodes. =============================================== Fixed a bug in the literal propagation pass where a PHI node could be wrongly replaced by a constant. (`PR-#9144 <https://github.com/numba/numba/pull/9144>`__) ``numpy.digitize`` implementation behaviour aligned with numpy ============================================================== The implementation of ``numpy.digitize`` is updated to behave per numpy in a wider set of cases, including where the supplied bins are not in fact monotonic. (`PR-#9169 <https://github.com/numba/numba/pull/9169>`__) ``numpy.searchsorted`` and ``numpy.sort`` behaviour updates =========================================================== * ``numpy.searchsorted`` implementation updated to produce identical outputs to numpy for a wider set of use cases, including where the provided array `a` is in fact not properly sorted. * ``numpy.searchsorted`` implementation bugfix for the case where side='right' and the provided array `a` contains NaN(s). * ``numpy.searchsorted`` implementation extended to support complex inputs. * ``numpy.sort`` (and ``array.sort``) implementation extended to support sorting of complex data. 
(`PR-#9189 <https://github.com/numba/numba/pull/9189>`__) Fix SSA to consider variables where use is not dominated by the definition ========================================================================== A SSA problem is fixed such that a conditionally defined variable will receive a phi node showing that there is a path where the variable is undefined. This affects extension code that relies on SSA behavior. (`PR-#9242 <https://github.com/numba/numba/pull/9242>`__) Fixed ``RecursionError`` in ``prange`` ====================================== A problem with certain loop patterns using ``prange`` leading to ``RecursionError`` in the compiler is fixed. An example of such loop is shown below. The problem would cause the compiler to fall into an infinite recursive cycle trying to determine the definition of ``var1`` and ``var2``. The pattern involves definitions of variables within an if-else tree and not all branches are defining the variables. .. code-block:: for i in prange(N): for j in inner: if cond1: var1 = ... elif cond2: var1, var2 = ... elif cond3: pass if cond4: use(var1) use(var2) (`PR-#9244 <https://github.com/numba/numba/pull/9244>`__) Support negative axis in ufunc.reduce ===================================== Fixed a bug in ufunc.reduce to correctly handle negative axis values. (`PR-#9296 <https://github.com/numba/numba/pull/9296>`__) Fix issue with parfor reductions and Python 3.12. ================================================= The parfor reduction code has certain expectations on the order of statements that it discovers, these are based on the code that previous versions of Numba generated. With Python 3.12, one assignment that used to follow the reduction operator statement, such as a binop, is now moved to its own basic block. This change reorders the set of discovered reduction nodes so that this assignment is right after the reduction operator as it was in previous Numba versions. 
This only affects internal parfor reduction code and doesn't actually change the Numba IR. (`PR-#9334 <https://github.com/numba/numba/pull/9334>`__) Changes ~~~~~~~ Make test listing not invoke CPU compilation. ============================================= Numba's test listing command ``python -m numba.runtests -l`` has historically triggered CPU target compilation due to the way in which certain test functions were declared within the test suite. It has now been made such that the CPU target compiler is not invoked on test listing and a test is added to ensure that it remains the case. (`PR-#9309 <https://github.com/numba/numba/pull/9309>`__) Semantic differences due to Python 3.12 variable shadowing in comprehensions ============================================================================ Python 3.12 introduced a new bytecode ``LOAD_FAST_AND_CLEAR`` that is only used in comprehensions. It has dynamic semantics that Numba cannot model. For example, .. code-block:: python def foo(): if False: x = 1 [x for x in (1,)] return x # This return uses undefined variable The variable `x` is undefined at the return statement. Instead of raising an ``UnboundLocalError``, Numba will raise a ``TypingError`` at compile time if an undefined variable is used. However, Numba cannot always detect undefined variables. For example, .. code-block:: python def foo(a): [x for x in (0,)] if a: x = 3 + a x += 10 return x Calling ``foo(0)`` returns ``10`` instead of raising ``UnboundLocalError``. This is because Numba does not track variable liveness at runtime. The return value is ``0 + 10`` since Numba zero-initializes undefined variables. (`PR-#9315 <https://github.com/numba/numba/pull/9315>`__) Refactor and remove legacy APIs/testing internals. 
================================================== A number of internally used functions have been removed to aid with general maintenance by reducing the number of ways in which it is possible to invoke compilation, specifically: * ``numba.core.compiler.compile_isolated`` is removed. * ``numba.tests.support.TestCase::run_nullary_func`` is removed. * ``numba.tests.support.CompilationCache`` is removed. Additionally, the concept of "nested context" is removed from ``numba.core.registry.CPUTarget`` along with the implementation details. Maintainers of target extensions (those using the API in ``numba.core.target_extension`` to extend Numba support to custom/synthetic hardware) should note that the same can be deleted from target extension implementations of ``numba.core.descriptor.TargetDescriptor`` if it is present. i.e. the ``nested_context`` method and associated implementation details can just be removed from the custom target's ``TargetDescriptor``. Further, a bug was discovered, during the refactoring, in the typing of record arrays. It materialised that two record types that only differed in their mutability could alias, this has now been fixed. (`PR-#9330 <https://github.com/numba/numba/pull/9330>`__) Deprecations ~~~~~~~~~~~~ Explicitly setting ``NUMBA_CAPTURED_ERRORS=old_style`` will raise deprecation warnings ====================================================================================== As per deprecation schedule of old-style error-capturing, explicitly setting ``NUMBA_CAPTURED_ERRORS=old_style`` will raise deprecation warnings. This release is the last to use "old_style" as the default. Details are documented at https://numba.readthedocs.io/en/0.58.1/reference/deprecation.html#deprecation-of-old-style-numba-captured-errors (`PR-#9346 <https://github.com/numba/numba/pull/9346>`__) Expired Deprecations ~~~~~~~~~~~~~~~~~~~~ Object mode *fall-back* support has been removed. 
================================================= As per the deprecation schedule for Numba 0.59.0, support for "object mode fall-back" is removed from all Numba ``jit``-family decorators. Further, the default for the ``nopython`` key-word argument has been changed to ``True``, this means that all Numba ``jit``-family decorated functions will now compile in ``nopython`` mode by default. (`PR-#9352 <https://github.com/numba/numba/pull/9352>`__) Removal of deprecated API ``@numba.generated_jit``. =================================================== As per the deprecation schedule for 0.59.0, support for ``@numba.generated_jit`` has been removed. Use of ``@numba.extending.overload`` and the high-level extension API is recommended as a replacement. (`PR-#9353 <https://github.com/numba/numba/pull/9353>`__) Infrastructure Related Changes ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Add validation capability for user generated towncrier ``.rst`` files. ====================================================================== Added a validation script for user generated towncrier ``.rst`` files. The script will run as a part of towncrier Github workflow automatically on every PR. (`PR-#9335 <https://github.com/numba/numba/pull/9335>`__)

gmarkall added 2 - In Progress CUDA CUDA related issue/PR Effort - medium Medium size effort needed labels Nov 8, 2023

gmarkall added a commit to gmarkall/numba that referenced this pull request Nov 8, 2023

Add release note for numba#9274

c4011b5

gmarkall force-pushed the cuda-ltoir branch from 9bbe644 to f932dae Compare November 9, 2023 08:58

gmarkall added 3 - Ready for Review and removed 2 - In Progress labels Nov 13, 2023

gmarkall added 10 commits December 5, 2023 16:13

CUDA codegen: refactor common CC ensuring functionality

57410d1

CUDA codegen refactor: We only generate one PTX

ec74a39

We never have multiple PTX outputs anymore (this was only necessary with NVVM 3.4), so there's no need to make lists of them or join them.

compile_ptx() docs: Explain return type behaviour

e146972

This addition explicitly states the behaviour when a return type is or is not supplied as part of the signature; previously the user would have had to guess this, or discover it through accident / experiment.

CUDA codegen: add support for compilation to LTO-IR

5adfbc0

This follows a very similar process to PTX compilation - LTO generation is enabled with NVVM's `-gen-lto` flag.

CUDA: Document compilation to LTO-IR

40d8c98

Add release note for numba#9274

6734671

CUDA: Add dummy compile APIs to simulator

30109ea

Skip LTO-IR test when toolkit < 11.5

0d9a729

gmarkall force-pushed the cuda-ltoir branch from f932dae to 0d9a729 Compare December 5, 2023 16:13

leofang mentioned this pull request Mar 15, 2024

Support for CUFFT callbacks JuliaGPU/CUDA.jl#75

Open

gmarkall added 2 commits April 3, 2024 12:16

Merge remote-tracking branch 'numba/main' into cuda-ltoir

86000ce

Fix formatting in PR numba#9274 notes

1086df4

gmarkall force-pushed the cuda-ltoir branch from 75f0d81 to 6286b22 Compare April 8, 2024 17:11

Add simulator stubs for Linker LTO stub functionality

58d87fc

This is needed to allow the "skip under LTO" test functionality to run successfully (and not skip on the simulator, since it does not simulate LTO).

gmarkall added this to the 0.60.0-rc1 milestone Apr 9, 2024

isVoid mentioned this pull request Apr 16, 2024

LTO Support NVIDIA/numbast#33

Closed

gmarkall added 2 commits April 17, 2024 12:43

CUDA: refactor codegen to remove _get_ptx()

eb04196

This is just implementing `get_asm_str()` now, which is part of the codegen object's interface. To align better with the rest of Numba, the `_get_ptx()` body is moved into `get_asm_str()` and `get_asm_str()` is used in its place.

stuartarchibald reviewed Apr 24, 2024

stuartarchibald added 4 - Waiting on author Waiting for author to respond to review and removed 3 - Ready for Review labels Apr 24, 2024

gmarkall added 2 commits April 24, 2024 13:51

Assert correct return type in

1dd95c4

gmarkall added 4 - Waiting on CI Review etc done, waiting for CI to finish and removed 4 - Waiting on author Waiting for author to respond to review labels Apr 24, 2024

stuartarchibald approved these changes Apr 24, 2024

stuartarchibald added 5 - Ready to merge Review and testing done, is ready to merge and removed 4 - Waiting on CI Review etc done, waiting for CI to finish labels Apr 24, 2024

stuartarchibald mentioned this pull request Apr 24, 2024

Numba 0.60.0rc1 Checklist #9544

Closed

41 tasks

sklam merged commit 6bf8b6a into numba:main Apr 24, 2024

gmarkall deleted the cuda-ltoir branch May 2, 2024 11:17
