Autogeneration of Python bindings from manually annotated C++ headers
genpybind is a clang-based tool that automatically generates the code
necessary to expose a C++ API as a Python extension via pybind11.
To reduce the complexity and required heuristics, it relies on additional manual hints that are
added to the header file in the form of unobtrusive annotation macros [1].
While this mandates that you are able to change the actual interface declaration, it results in
a succinct file that describes both the C++ and Python interface of your library. However, as a
consequence, manually writing pybind11 code is still necessary for code which is not under your
control.
That said, a simple class that should be exposed via a Python extension could look as follows:
#pragma once
#include "genpybind.h"
class GENPYBIND(visible) Example {
public:
static constexpr int GENPYBIND(hidden) not_exposed = 10;
/// \brief Do a complicated calculation.
int calculate(int some_argument = 5) const;
GENPYBIND(getter_for(something))
int getSomething() const;
GENPYBIND(setter_for(something))
void setSomething(int value);
private:
int _value = 0;
};
The resulting extension can then be used like this:
>>> import pyexample as m
>>> obj = m.Example()
>>> obj.something
0
>>> obj.something = 42
>>> obj.something
42
>>> obj.calculate() # default argument
47
>>> obj.calculate(2)
44
>>> help(obj.calculate)
Help on method calculate in module pyexample:
calculate(...) method of pyexample.Example instance
calculate(self: pyexample.Example, some_argument: int=5) -> int
Do a complicated calculation.
As you can see, annotations are included inline to control what is exposed to the Python extension,
whether getters or setters are exposed as a class property, and so on.
The resulting Python extension will, among other things, include docstrings, argument names and
default arguments for functions. Imagine how much time you will save by not manually keeping the
Python bindings and header files in sync! For the example presented above, genpybind will generate
the following:
auto genpybind_class_decl__Example_Example =
py::class_<::Example>(m, "Example");
{
typedef int (::Example::*genpybind_calculate_type)(int) const;
genpybind_class_decl__Example_Example.def(
"calculate", (genpybind_calculate_type) & ::Example::calculate,
"Do a complicated calculation.", py::arg("some_argument") = 5);
}
genpybind_class_decl__Example_Example.def(py::init<>(), "");
genpybind_class_decl__Example_Example.def(py::init<const ::Example &>(), "");
genpybind_class_decl__Example_Example.def_property(
"something", py::cpp_function(&::Example::getSomething),
py::cpp_function(&::Example::setSomething));
The current implementation was started as a proof-of-concept to see whether the described approach
was viable for an existing code base. Due to its prototypical and initially fast-changing nature
it was based off the libclang bindings. However, as of clang 5.0.0 not all necessary information
was available via this API. (For example, implicitly instantiated constructors are not exposed.)
To work around this issue, several patches on top of libclang are included in this repository (some
of them have already been merged upstream; some are rather hacky or not yet finished/fully tested).
In addition, a genpybind-parse tool based on the internal libtooling clang API is used to
extend/amend the abstract syntax tree (e.g. instantiate implicit member functions) and store it in
a temporary file. This file is then read by the Python-based tool via the patched libclang API.
Evidently, now that the approach has been shown to work, the implementation could transition to be
a single C++ tool based on the internal libtooling API. I eventually plan to go down that road.
- Documentation is minimal at the moment. If you want to look at example use cases, the integration tests might provide a starting point.
- Expressions and types in default arguments, return values or GENPYBIND_MANUAL instructions are not consistently expanded to their fully qualified form. As a workaround, use the fully qualified name where necessary.
1. Build and install llvm/clang 5.0.0 with the patches provided in llvm-patches. You can use a different prefix when installing, to prevent the patched clang from interfering with the version provided by your distribution. Let's assume you unpacked the source code to $HOME/llvm-src and used -DCMAKE_INSTALL_PREFIX=$HOME/llvm.
2. Make sure genpybind can find the libclang Python bindings:
   export PYTHONPATH=$HOME/llvm-src/tools/clang/bindings/python \
     LD_LIBRARY_PATH=$HOME/llvm/lib
3. Build the genpybind-parse executable:
   PYTHON=/usr/bin/python2 CXX=/bin/clang++ CC=/bin/clang \
     LLVM_CONFIG=$HOME/llvm/bin/llvm-config \
     ./waf configure --disable-tests
   ./waf build
   Note that custom Python/compiler/llvm-config executables can be provided via environment variables. If you used the -DBUILD_SHARED_LIBS=ON option when building clang, you need to pass --clang-use-shared to waf configure.
   Optional: If you want to build and run the integration tests, you need to install pytest and pybind11 and should remove the --disable-tests argument to waf configure. You can use the --pybind11-includes option to point to the include path required for pybind11.
4. Install the genpybind tool:
   ./waf install
   By default genpybind will be installed to /usr/local/. Use the --prefix argument of waf configure if you prefer a different location.
5. Create Python bindings from your C++ header files:
   # Remember to set up your environment as done in step 2 each time you run genpybind:
   export PYTHONPATH=$HOME/llvm-src/tools/clang/bindings/python \
     LD_LIBRARY_PATH=$HOME/llvm/lib
   # The following assumes that both `genpybind` and `genpybind-parse` are on your path.
   genpybind --genpybind-module pyexample --genpybind-include example.h -- \
     /path/to/example.h -- \
     -D__GENPYBIND__ -xc++ -std=c++14 \
     -I/path/to/some/includes \
     -resource-dir=$HOME/llvm/lib/clang/5.0.0
   The flags after the second -- are essentially what you would pass to the compiler when processing the translation unit corresponding to the header file.
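The layout of that command line, with its two -- separators, can be sketched as a small Python helper. This is only an illustration: build_genpybind_command is a hypothetical name, the helper merely assembles the argument list and does not run anything, and the -resource-dir flag is left out for brevity.

```python
import shlex


def build_genpybind_command(module, include, header, compiler_flags):
    """Assemble a genpybind invocation as an argument list (nothing is executed)."""
    return [
        "genpybind",
        "--genpybind-module", module,
        "--genpybind-include", include,
        "--",              # first separator, as in the command shown above
        header,
        "--",              # flags after the second separator go to the compiler
        *compiler_flags,
    ]


command = build_genpybind_command(
    module="pyexample",
    include="example.h",
    header="/path/to/example.h",
    compiler_flags=["-D__GENPYBIND__", "-xc++", "-std=c++14",
                    "-I/path/to/some/includes"],
)
print(shlex.join(command))
```

Keeping the compiler flags in a separate list makes it easy to reuse the flags your build system already passes for the corresponding translation unit.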
enum GENPYBIND(arithmetic) Access { Read = 4, Write = 2, Execute = 1 };
To allow arithmetic on enum elements, use the arithmetic keyword.
See enums.h and enums_test.py.
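To give a feeling for what arithmetic on such flag values looks like from Python, here is a pure-Python stand-in that uses enum.IntFlag with the same values. It does not use the generated binding, and the type returned by the real extension may differ (for example a plain int).

```python
import enum


class Access(enum.IntFlag):
    # Same values as the C++ enum above; this class is only a stand-in.
    Read = 4
    Write = 2
    Execute = 1


read_write = Access.Read | Access.Write
print(int(read_write))                    # 6
print(bool(read_write & Access.Write))    # True
print(bool(read_write & Access.Execute))  # False
```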
The dynamic_attr keyword controls whether dynamic attributes (adding additional members at run-time) are allowed:
struct GENPYBIND(visible) Default {
void some_function() const {}
bool existing_field = true;
};
struct GENPYBIND(dynamic_attr) WithDynamic {
void some_function() const {}
bool existing_field = true;
};
See dynamic_attr.h and dynamic_attr_test.py.
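The difference as seen from Python can be imitated in pure Python. In this sketch __slots__ stands in for a bound class without dynamic_attr, and a regular class stands in for one with it; neither class is produced by genpybind, and try_set_new_attribute is a hypothetical helper.

```python
class Default:
    # __slots__ removes the per-instance __dict__, so unknown attributes are rejected.
    __slots__ = ("existing_field",)

    def __init__(self):
        self.existing_field = True


class WithDynamic:
    # A regular class keeps a __dict__, so new attributes can be added at run-time.
    def __init__(self):
        self.existing_field = True


def try_set_new_attribute(obj):
    """Return True if a previously unknown attribute could be added to obj."""
    try:
        obj.new_field = 1
    except AttributeError:
        return False
    return True


print(try_set_new_attribute(Default()))      # False
print(try_set_new_attribute(WithDynamic()))  # True
```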
expose_as allows giving the Python binding a name different from the one in the C++ source:
GENPYBIND(expose_as(some_other_name))
bool name;
This can also be used to define private or special (double-underscore) Python names, such as __hash__:
GENPYBIND(expose_as(__hash__))
int hash() const;
See expose_as.h and expose_as_test.py.
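Why exposing a member function as __hash__ is useful can be shown with a pure-Python stand-in. The class below is hypothetical and not generated by genpybind; it only demonstrates that Python consults __hash__ (together with __eq__) when objects are placed in sets or used as dictionary keys.

```python
class Example:
    def __init__(self, key):
        self.key = key

    def __hash__(self):
        # Plays the role of the C++ hash() member exposed as __hash__.
        return hash(self.key)

    def __eq__(self, other):
        return isinstance(other, Example) and self.key == other.key


seen = {Example(1), Example(1), Example(2)}
print(len(seen))  # 2
```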
Python properties are supported by the getter_for and setter_for keywords:
GENPYBIND(getter_for(value))
int get_value() const;
GENPYBIND(setter_for(value))
void set_value(int value);
GENPYBIND(getter_for(readonly))
bool computed() const;
getter_for and setter_for are shorthands for accessor_for(..., get) and accessor_for(..., set).
See properties.h and properties_test.py.
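The resulting Python-side behaviour corresponds to what the built-in property gives you. The following pure-Python sketch mirrors the three accessors above; it is an analogy rather than generated code, and the body of computed is made up for illustration.

```python
class Example:
    def __init__(self):
        self._value = 0

    def get_value(self):
        return self._value

    def set_value(self, value):
        self._value = value

    def computed(self):
        # Hypothetical computation; the C++ declaration above has no body.
        return self._value > 0

    # Getter and setter combined into one read/write property.
    value = property(get_value, set_value)
    # A getter without a matching setter yields a read-only property.
    readonly = property(computed)


obj = Example()
obj.value = 3
print(obj.value)     # 3
print(obj.readonly)  # True
try:
    obj.readonly = False
except AttributeError:
    print("readonly cannot be assigned")
```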
hidden
See visible.
See hide_base.h and hide_base_test.py.
Cf. pybind11's PYBIND11_DECLARE_HOLDER_TYPE.
See holder_type.h and holder_type_test.py.
See inline_base.h and inline_base_test.py.
To control the lifetime of objects passed to or returned from (member) functions, keep_alive can be used.
keep_alive(bound, who) indicates that who should be kept alive at least until bound is garbage collected.
An argument to keep_alive can be either the name of a function parameter or one of return or this, where
return refers to the return value of the function and this refers to the instance a member function is called on.
GENPYBIND(keep_alive(this, child))
Parent(Child *child);
Here child is kept alive at least until the Parent instance is garbage collected, even if no other Python reference to it remains.
See keep_alive.h and keep_alive_test.py.
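The effect of keep_alive(this, child) can be imitated in pure Python by letting the parent hold a reference to the child. This is only a model of the lifetime relationship, not how the binding implements it; the _kept_alive attribute and the demo function are hypothetical, and the immediate collection relies on CPython's reference counting.

```python
import gc
import weakref


class Child:
    pass


class Parent:
    def __init__(self, child):
        # Stand-in for keep_alive(this, child): the parent references the child.
        self._kept_alive = child


def demo():
    child = Child()
    child_ref = weakref.ref(child)  # lets us observe the child without owning it
    parent = Parent(child)

    del child                       # drop our own reference
    gc.collect()
    alive_with_parent = child_ref() is not None

    del parent                      # now nothing keeps the child alive
    gc.collect()
    alive_without_parent = child_ref() is not None
    return alive_with_parent, alive_without_parent


print(demo())  # (True, False) on CPython
```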
Using the module keyword, C++ namespaces can be turned into submodules
of the generated Python module. In the following example, X would be exposed
as name_of_module.submodule.X, where name_of_module is the name of the
outer Python module.
namespace submodule GENPYBIND(module) {
class GENPYBIND(visible) X {};
} // namespace submodule
See submodule.h and submodule_test.py.
Implicit conversion of function arguments can be controlled with the noconvert keyword:
GENPYBIND(noconvert(value))
double noconvert(double value);
GENPYBIND(noconvert(first))
double noconvert_first(double first, double second);
If noconvert(...) is called with anything but type double, a TypeError is raised.
For multi-argument functions, the behaviour can be controlled on a per-argument basis.
See noconvert.h and noconvert_test.py.
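The per-argument behaviour can be modelled in pure Python as follows. The function below is a stand-in, not the generated binding, and its return value (the sum of both arguments) is made up because the C++ declaration above has no body.

```python
def noconvert_first(first, second):
    # 'first' is annotated with noconvert: only a real float is accepted.
    if not isinstance(first, float):
        raise TypeError("first: expected float, implicit conversion is disabled")
    # 'second' is not annotated: it is converted implicitly.
    return first + float(second)


print(noconvert_first(1.0, 2))  # 3.0
try:
    noconvert_first(1, 2.0)
except TypeError as error:
    print("TypeError:", error)
```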
Allows "inlining" the underlying type at the location of a typedef, as if it were defined
there. As the name of this feature may lead to confusion with pybind11's
PYBIND11_MAKE_OPAQUE, it will likely be renamed or redesigned in an upcoming release.
More details can be found in issue #24.
See expose_as.h and expose_as_test.py.
Unscoped GENPYBIND_MANUAL macros can be used to add preamble and postamble code to the
generated bindings, e.g. for importing required libraries or executing Python code that
dynamically patches the generated bindings:
GENPYBIND(postamble)
GENPYBIND_MANUAL({
auto env = parent->py::module::import("os").attr("environ");
// should not have any effect as this will be run after preamble code
env.attr("setdefault")("genpybind", "postamble");
env.attr("setdefault")("genpybind_post", "postamble");
})
GENPYBIND_MANUAL({
auto env = parent->py::module::import("os").attr("environ");
env.attr("setdefault")("genpybind", "preamble");
})
See manual.h and manual_test.py.
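The comment in the postamble block relies on the semantics of setdefault, which only writes a key that is not yet present. A pure-Python sketch of the same sequence, using a plain dict as a stand-in for os.environ so that the real environment is left untouched (the function names preamble and postamble are only labels for the two code blocks):

```python
def preamble(env):
    env.setdefault("genpybind", "preamble")


def postamble(env):
    # No effect on "genpybind": the preamble has already set that key.
    env.setdefault("genpybind", "postamble")
    env.setdefault("genpybind_post", "postamble")


env = {}          # stand-in for os.environ
preamble(env)     # preamble code runs first
postamble(env)    # postamble code runs afterwards
print(env)        # {'genpybind': 'preamble', 'genpybind_post': 'postamble'}
```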
readonly is an alias for writable(false).
GENPYBIND(required(child))
void required(Child *child);
Calling a function from Python with None for a pointer argument that is annotated with required
raises a TypeError.
See required.h and required_test.py.
The return value policy controls how returned references are exposed to Python:
Nested &ref();
const Nested &cref() const;
GENPYBIND(return_value_policy(copy))
Nested &ref_as_copy();
struct GENPYBIND(visible) Parent {
GENPYBIND(return_value_policy(reference_internal))
Nested &ref_as_ref_int();
};
By default, the automatic return value policy
of pybind11 is used. In the case of ref and cref in the example this amounts to
"return by value" for the wrapped Python functions. This behavior is unchanged
when the function is explicitly annotated to return by value (see ref_as_copy).
As ref_as_ref_int demonstrates, any other return value policy supported by
pybind11 can be set. In this case reference_internal is used to return a reference
to an existing object, whose lifetime is tied to the parent object.
See return_by_value_policy.h and return_by_value_policy_test.py.
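The observable difference between returning a copy and returning an internal reference can be illustrated in pure Python. The classes below are stand-ins, not generated bindings; placing both accessors on Parent and giving Nested a value field are assumptions made for this sketch.

```python
import copy


class Nested:
    def __init__(self):
        self.value = 0


class Parent:
    def __init__(self):
        self._nested = Nested()

    def ref_as_copy(self):
        # Models the copy policy: the caller gets an independent object.
        return copy.copy(self._nested)

    def ref_as_ref_int(self):
        # Models reference_internal: the caller sees the parent's own object.
        return self._nested


parent = Parent()

parent.ref_as_copy().value = 1
print(parent.ref_as_ref_int().value)  # 0, the copy was modified, not the original

parent.ref_as_ref_int().value = 2
print(parent.ref_as_ref_int().value)  # 2, the change went through the reference
```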
The stringstream keyword populates the str and repr functionality:
GENPYBIND(stringstream)
friend std::ostream &operator<<(std::ostream &os, const Something &) {
return os << "uiae";
}
See stringstream.h and stringstream_test.py.
Whether a binding is generated is controlled by the visibility keywords visible and hidden:
class Unannotated {};
class GENPYBIND(hidden) Hidden {};
class GENPYBIND(visible) Visible {};
Any GENPYBIND annotation will make the annotated entity visible.
As a consequence visible can be removed from the argument list,
as soon as there are any other arguments to GENPYBIND.
Anything without an annotation is excluded by default, but the intent of hiding it
from bindings can be explicitly stated by the keyword hidden.
If a namespace is annotated with visible, any contained entity will be made visible
by default, even if it has no GENPYBIND annotations. The hidden keyword can then
be used to hide it.
See visibility.h and visibility_test.py.
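The rules above can be condensed into a small decision function. This is a simplified model of the behaviour as described in this section, not genpybind's actual implementation; is_exposed and its parameters are hypothetical, and nested scopes are reduced to a single flag.

```python
def is_exposed(keywords=None, namespace_visible=False):
    """Decide whether an entity would get a binding.

    keywords: None if the entity has no GENPYBIND annotation, otherwise the
              set of keywords passed to GENPYBIND(...).
    namespace_visible: True if the enclosing namespace is annotated visible.
    """
    if keywords is None:
        # Unannotated entities are excluded unless the namespace opts them in.
        return namespace_visible
    if "hidden" in keywords:
        return False
    # Any other annotation makes the entity visible.
    return True


print(is_exposed())                             # False (Unannotated)
print(is_exposed({"hidden"}))                   # False (Hidden)
print(is_exposed({"visible"}))                  # True  (Visible)
print(is_exposed({"dynamic_attr"}))             # True  (any annotation)
print(is_exposed(None, namespace_visible=True)) # True  (visible namespace)
```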
Constness is transported from C++ to Python automatically. In addition, variables can be made read-only with the writable keyword:
const int const_field = 2;
GENPYBIND(writable(false))
int readonly_field = 4;
For both const_field and readonly_field, an AttributeError will be raised if set from Python.
See variables.h and variables_test.py.
See License.
Footnotes
[1] During normal compilation these macros have no effect on the generated code, as they are defined to be empty. The annotation system is implemented using the annotate attribute specifier, which is available as a GNU language extension via __attribute__((...)). As the annotation macros only have to be parsed by clang and are empty during normal compilation, the annotated code can still be compiled by any C++ compiler. See genpybind.h for the definition of the macros.