C ABI¶
This page documents the public C ABI shipped with bocpy. Use it when
writing a downstream C extension that needs to participate in
behavior-oriented concurrency at the C level — typically by registering
a custom Python type as cross-interpreter shareable so Cown
can hand instances of it across worker interpreters.
When do I need this?¶
You do not need this header to use a custom Python type with
Cown. Any type that has been registered as cross-interpreter
shareable through CPython’s _PyXIData_* / _PyCrossInterpreterData_*
machinery — by whatever means — is automatically handled by bocpy’s
scheduler and message queue. Registration is what makes a type usable
across worker sub-interpreters; bocpy does not impose any additional
requirement on top of that.
The public C ABI is provided as a convenience for extension authors who want to do that registration from C. It exists for two reasons:
Cross-platform atomics. <bocpy/bocpy.h> exposes a small sequentially-consistent atomics surface (atomic_load / atomic_store / atomic_fetch_add / atomic_compare_exchange_strong over atomic_int_least64_t, plus a thread_local macro) that compiles on MSVC as well as every toolchain that ships <stdatomic.h>. On non-MSVC builds the umbrella simply pulls in <stdatomic.h>; on MSVC it provides the missing prototypes so the same source compiles unchanged.
Portability across Python versions. The _PyXIData_* API has changed shape several times between CPython 3.12, 3.13, 3.14, and 3.15 (free-threaded builds included), and differs again on the legacy BOC_NO_MULTIGIL path. <bocpy/bocpy.h> and <bocpy/xidata.h> paper over those differences with a single set of macros, so one source file builds unchanged across every Python version that bocpy itself supports.
If neither of those concerns applies to your extension, you can ignore the C ABI entirely and rely on whatever cross-interpreter registration your type already has.
Note
The bocpy public C ABI is C only. Including bocpy.h from a
C++ translation unit is not supported in this release. C++ consumers
must wrap the bocpy ABI in a thin C translation unit and call into
that from C++.
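One possible shape for such a wrapper is sketched below. The header half exposes only plain C types behind an extern "C" guard, so a C++ translation unit can include it without ever seeing bocpy.h. All myext_* names are hypothetical, and the implementation half uses <stdatomic.h> directly as a stand-in so that the sketch compiles without bocpy installed; a real shim would include <bocpy/bocpy.h> in the .c file instead.

```c
/* ---- myext_shim.h (hypothetical): safe to include from C or C++ ---- */
#include <stdint.h>

#ifdef __cplusplus
extern "C" {
#endif

typedef struct myext_counter myext_counter;    /* opaque to C++ callers */

myext_counter *myext_counter_new(void);
int64_t myext_counter_bump(myext_counter *c);  /* returns the new value */
void myext_counter_free(myext_counter *c);

#ifdef __cplusplus
}
#endif

/* ---- myext_shim.c (hypothetical): compiled as C only ---- */
#include <stdatomic.h>  /* stand-in; a real shim would include <bocpy/bocpy.h> */
#include <stdlib.h>

struct myext_counter {
    atomic_int_least64_t value;
};

myext_counter *myext_counter_new(void) {
    myext_counter *c = malloc(sizeof *c);
    if (c != NULL) {
        atomic_store(&c->value, 0);
    }
    return c;
}

int64_t myext_counter_bump(myext_counter *c) {
    /* atomic_fetch_add returns the previous value, so add 1 for the new one. */
    return (int64_t)atomic_fetch_add(&c->value, 1) + 1;
}

void myext_counter_free(myext_counter *c) {
    free(c);
}
```

The C++ side then links against the compiled C object and calls only the myext_counter_* functions.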
Quickstart¶
In your downstream setup.py:
import bocpy
from setuptools import setup, Extension
setup(
    ext_modules=[
        Extension(
            "myext",
            sources=["myext.c"] + bocpy.get_sources(),
            include_dirs=[bocpy.get_include()],
        )
    ],
)
In your C source:
#include <bocpy/bocpy.h>
/* <bocpy/bocpy.h> includes <Python.h> internally and is
 * order-insensitive with respect to <Python.h> itself (which is
 * idempotent). It must still appear *before* any system header
 * (<stdio.h>, <string.h>, ...) in the same translation unit, the
 * same way <Python.h> must: CPython forbids system headers
 * before Python.h. */
ABI versioning¶
bocpy.h defines a single integer macro BOCPY_ABI. Compare it
with >= if you want to gate code on a minimum bocpy ABI revision.
The value is bumped on any incompatible change to bocpy.h or
xidata.h. Wheels are CPython-version-tagged (currently cp310,
cp311, cp312, cp313, cp314), so a runtime ABI mismatch
between bocpy and its host CPython cannot occur: each wheel embeds
the xidata.h ladder arm appropriate for its target CPython.
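A minimal sketch of such a gate follows. The fallback definition of BOCPY_ABI and the value 1 are placeholders so that the fragment compiles without bocpy installed; real code takes the macro from <bocpy/bocpy.h>. MYEXT_MIN_BOCPY_ABI and myext_compiled_bocpy_abi are hypothetical names.

```c
/* Stand-in so this sketch compiles on its own. Real code gets BOCPY_ABI
 * from <bocpy/bocpy.h>; the value 1 here is only a placeholder. */
#ifndef BOCPY_ABI
#define BOCPY_ABI 1
#endif

/* Hypothetical: the oldest bocpy ABI revision this extension targets. */
#define MYEXT_MIN_BOCPY_ABI 1

#if BOCPY_ABI >= MYEXT_MIN_BOCPY_ABI
#define MYEXT_HAVE_BOCPY_ABI 1
#else
#error "myext requires a newer bocpy ABI revision"
#endif

/* Report the ABI revision this translation unit was compiled against,
 * for example to surface it in a module-level diagnostic. */
int myext_compiled_bocpy_abi(void) {
    return BOCPY_ABI;
}
```

Failing at compile time keeps the mismatch out of the shipped binary, which fits the per-CPython wheel model described above.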
Atomic surface¶
The atomic surface is a minimal, sequentially-consistent shim over
int_least64_t. On non-MSVC compilers it is just <stdatomic.h>;
on MSVC the four functions below have out-of-line bodies in
bocpy_msvc.c, which downstream extensions pick up automatically
via bocpy.get_sources().
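| Symbol | Description |
|---|---|
| atomic_int_least64_t | Type alias for a 64-bit atomic integer. |
| atomic_load | Sequentially-consistent load of the stored value. |
| atomic_store | Sequentially-consistent store of a new value. |
| atomic_fetch_add | Sequentially-consistent fetch-and-add. |
| atomic_compare_exchange_strong | Sequentially-consistent CAS. Returns true if the exchange took place. |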
All operations are sequentially consistent on every supported MSVC
target (x86, x64, ARM64). The MSVC shim implements atomic_load
via InterlockedOr64(ptr, 0) and atomic_store via
InterlockedExchange64 on x64/ARM64 (full barriers); on x86 both
go through InterlockedCompareExchange64. The RMW ops are
InterlockedExchangeAdd64 / InterlockedCompareExchange64 on
x64/ARM64 and CAS-loops on x86 — all already full barriers. The shim
deliberately does not expose _explicit variants or weaker memory
orders.
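The four operations can be exercised with a small counter, sketched here against <stdatomic.h> directly (which is what the text above says non-MSVC builds receive through bocpy.h). The demo_* names are illustrative only and not part of bocpy.

```c
#include <stdatomic.h>  /* pulled in by <bocpy/bocpy.h> on non-MSVC builds */
#include <stdint.h>

static atomic_int_least64_t demo_hits;  /* static storage: starts at zero */

/* Overwrite the counter unconditionally. */
void demo_set(int_least64_t v) {
    atomic_store(&demo_hits, v);
}

/* Record one event; returns how many had been recorded before it. */
int_least64_t demo_record_hit(void) {
    return atomic_fetch_add(&demo_hits, 1);
}

/* Read the current count. */
int_least64_t demo_hit_count(void) {
    return atomic_load(&demo_hits);
}

/* Reset to zero only if the counter still holds `seen`.
 * Returns nonzero when the reset happened. */
int demo_reset_if(int_least64_t seen) {
    int_least64_t expected = seen;
    return atomic_compare_exchange_strong(&demo_hits, &expected, 0);
}
```

Because every operation is sequentially consistent, no memory-order argument appears anywhere, which matches the shim's decision not to expose _explicit variants.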
Ownership helpers¶
The XIData callbacks shown in the worked example below flip a single
atomic_int_least64_t owner field as the resource crosses
interpreter boundaries. bocpy.h exposes two helpers for that
pattern so downstream code does not have to redefine them:
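| Symbol | Description |
|---|---|
| BOCPY_NO_OWNER | Sentinel value stored in the owner field while the resource is in flight between interpreters. |
| bocpy_interpid() | Identifier of the calling interpreter, used as the owner value. |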
The two are designed to be used together: producer-side, CAS the
owner from bocpy_interpid() to BOCPY_NO_OWNER before calling
XIDATA_INIT; consumer-side, CAS it back from BOCPY_NO_OWNER to
bocpy_interpid() inside the new_object callback. See the
worked example below.
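Stripped of the CPython machinery, the two CAS steps reduce to the following model. It is a sketch under assumptions: DEMO_NO_OWNER and its value -1 are placeholders for BOCPY_NO_OWNER, and the interpreter id is passed in explicitly where real code would call bocpy_interpid().

```c
#include <stdatomic.h>
#include <stdint.h>

/* Placeholder for this sketch only; real code uses BOCPY_NO_OWNER from
 * <bocpy/bocpy.h>, and -1 is an assumed value. */
#define DEMO_NO_OWNER ((int_least64_t)-1)

typedef struct {
    atomic_int_least64_t owner;  /* interpreter id, or DEMO_NO_OWNER in flight */
    int payload;
} demo_impl;

/* Producer side: give up ownership. Returns 0 on success, or -1 if
 * `self` is not the current owner, in which case the handoff is aborted. */
int demo_release(demo_impl *impl, int_least64_t self) {
    int_least64_t expected = self;
    return atomic_compare_exchange_strong(&impl->owner, &expected,
                                          DEMO_NO_OWNER) ? 0 : -1;
}

/* Consumer side: claim an in-flight resource. Returns 0 on success, or
 * -1 if the resource is not in flight (someone already owns it). */
int demo_acquire(demo_impl *impl, int_least64_t self) {
    int_least64_t expected = DEMO_NO_OWNER;
    return atomic_compare_exchange_strong(&impl->owner, &expected,
                                          self) ? 0 : -1;
}
```

Using a CAS rather than a plain store is what makes a second, competing acquire fail instead of silently overwriting the first owner.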
Proto-Region semantics¶
The ownership pattern shown in the worked example is a deliberately
narrow approximation of the Region discipline from Pyrona’s
Lungfish model (Stoldt et al., Dynamic Region Ownership for
Concurrency Safety,
PLDI 2025). Lungfish is a dynamic ownership model for Python in which mutable state is grouped into
regions: at any point in time at most one thread has access to a
region, transferring a region into a cown makes it sharable
between threads, and acquiring the cown moves the region into the
acquiring thread for the duration of the behavior. The bocpy public
ABI does not implement Lungfish — there is no region graph, no
freeze, no merge, no borrow tracking — but the BOCPY_NO_OWNER /
bocpy_interpid() pair gives downstream C extensions enough
machinery to model the single most important invariant a region
provides: a mutable C resource is owned by exactly one interpreter
at a time, and any other interpreter that still holds a wrapper
around it cannot read or write its contents.
What proto-Region buys you¶
Wrapping a C struct in a refcounted Python type and registering it
through XIDATA_REGISTERCLASS already moves the pointer between
sub-interpreters efficiently — no copy, no pickle. But pointers alone
are unsafe: nothing stops a worker from racing the previous owner on
the same impl after the handoff, exactly the unrestricted-shared-
mutable-state hazard regions are designed to eliminate (see Pyrona §1
and Fig. 1: share(x, T2) between threads is unsafe). The owner
field, flipped atomically as the impl crosses the XIData boundary,
turns that hazard into a deterministic RuntimeError rather than a
data race.
The contract¶
A custom C resource opting into proto-Region semantics commits to all of the following:
Single owner field. The resource carries one atomic_int_least64_t owner field, initialised to bocpy_interpid() of the constructing interpreter. Matrix’s matrix_impl and the consumer template’s counter_impl are the canonical examples.
CAS in the producer callback. XIDATA_GETDATA_FUNC CASes the owner from bocpy_interpid() to BOCPY_NO_OWNER before calling XIDATA_INIT. A failed CAS surfaces as a RuntimeError and aborts the handoff: the resource is not transferred.
CAS in the consumer callback. The new_object callback CASes the owner from BOCPY_NO_OWNER to bocpy_interpid() before constructing the new wrapper. If wrapper allocation fails after the CAS succeeds, the callback must store the owner back to BOCPY_NO_OWNER so a future retry of the handoff can succeed.
Ownership check on data accessors. Any method or getter that reads or writes the resource’s payload must verify bocpy_interpid() == atomic_load(&impl->owner) and raise RuntimeError otherwise. Matrix’s impl_check_acquired and the consumer template’s counter_impl_check_acquired are the canonical helpers. Identity-only accessors (e.g. Counter.address, Counter.refcount) are allowed to skip the check, the same way you may print the address of a Lungfish bridge object without acquiring its region.
No raw send of the cown’s value. Inside a @when, send("tag", c.value) is not the right primitive for shipping a proto-Region resource to another behavior: it would atomically move the impl out of the cown mid-behavior and leave the worker unable to release the cown afterwards. Send a copy (c.value.copy(), like examples/boids.py) or send primitive summary data (c.value.address, c.value.count). The cown itself is the right primitive for handing the resource to a different behavior: schedule a downstream @when on the same cown.
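The accessor check and the consumer-side rollback from the contract above can be modelled in plain C11 as follows. This is a sketch, not bocpy code: the sketch_* names, the SKETCH_NO_OWNER value, and the injectable allocator are all assumptions made so that the failure path can be exercised, and a return of -1 stands in for raising RuntimeError.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_NO_OWNER ((int_least64_t)-1)  /* assumed sentinel value */

typedef struct {
    atomic_int_least64_t owner;
    long count;                  /* payload guarded by the owner field */
} sketch_impl;

typedef struct {
    sketch_impl *impl;
} sketch_wrapper;

/* Accessor guard: nonzero only when `self` currently owns the impl. */
int sketch_check_acquired(sketch_impl *impl, int_least64_t self) {
    return atomic_load(&impl->owner) == self;
}

/* Guarded getter: writes the payload to *out, or returns -1 (where real
 * code would raise RuntimeError) when the caller is not the owner. */
int sketch_get_count(sketch_impl *impl, int_least64_t self, long *out) {
    if (!sketch_check_acquired(impl, self)) {
        return -1;
    }
    *out = impl->count;
    return 0;
}

/* Consumer callback shape: claim the impl, then allocate the wrapper. */
sketch_wrapper *sketch_new_object(sketch_impl *impl, int_least64_t self,
                                  void *(*alloc)(size_t)) {
    int_least64_t expected = SKETCH_NO_OWNER;
    if (!atomic_compare_exchange_strong(&impl->owner, &expected, self)) {
        return NULL;  /* not in flight: refuse the handoff */
    }
    sketch_wrapper *w = alloc(sizeof *w);
    if (w == NULL) {
        /* Roll back so a later retry of the handoff can succeed. */
        atomic_store(&impl->owner, SKETCH_NO_OWNER);
        return NULL;
    }
    w->impl = impl;
    return w;
}

/* Allocator that always fails, for exercising the rollback path. */
void *sketch_failing_alloc(size_t n) {
    (void)n;
    return NULL;
}
```

Passing sketch_failing_alloc leaves the owner at SKETCH_NO_OWNER, so a second call with malloc can still claim the resource.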
What proto-Region does not give you¶
The bocpy ABI does not implement these parts of Lungfish, and a downstream extension that wants them must build them on top:
Transitive closure. Lungfish regions enforce isolation across a whole object graph; proto-Region tracks ownership of a single impl pointer. If your resource holds further heap-allocated state, you are responsible for keeping that state private (no outgoing references) or applying the same owner-CAS pattern to each piece. Matrix does this implicitly: matrix_impl encapsulates data and row_ptrs, both private to the impl.
Freezing and immutability. There is no equivalent of freeze(b). Resources are either owned-and-mutable or in flight.
Merge / region trees. There is no nesting; one resource = one owner field.
Borrowed references from a “local region”. The bocpy worker loop is the closest equivalent (a worker behavior is the period during which the impl is owned by that interpreter), but there is no first-class borrow type, and a borrowed-style reference held past the end of a behavior is undefined behavior.
In short: proto-Region is the smallest thing that turns a
shareable C struct into a Region-like resource, and it slots into
the existing XIDATA_REGISTERCLASS lifecycle without any
additional machinery beyond the two ABI symbols
BOCPY_NO_OWNER and bocpy_interpid().
XIData ladder¶
The cross-interpreter data (“XIData”) API is a thin macro ladder over
CPython’s internal cross-interpreter data primitives, smoothing over
the rename from _PyCrossInterpreterData (3.12, 3.13) to
_PyXIData (3.14+). All macros and the typedef below are exposed
by bocpy.h via its #include "xidata.h".
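| Symbol | Description |
|---|---|
| XIDATA_T | Opaque struct holding a serialised cross-interpreter handoff. |
| | Allocate and zero a fresh XIDATA_T. |
| XIDATA_INIT | Initialise an allocated XIDATA_T. |
| XIDATA_GETXIDATA | Ask CPython to populate an XIDATA_T for a Python object. |
| | Field type used for the consumer-side reconstruction callback. |
| XIDATA_FREE | Release any resources owned by an XIDATA_T. |
| | Install a custom free callback into an XIDATA_T. |
| XIDATA_REGISTERCLASS | Register a Python type as cross-interpreter shareable. |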
The lifecycle is register once, then per-handoff init/get/free:
register each shareable type at module init; on the producer side,
allocate an XIDATA_T, populate it with XIDATA_GETXIDATA (which
internally calls the registered callback that will end up calling
XIDATA_INIT), hand the buffer to the consumer interpreter; on the
consumer side, call the new_object callback recorded during init
to reconstruct the Python object, then XIDATA_FREE to release.
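The ordering of those steps can be illustrated with a toy model. Nothing below is the real xidata.h: every toy_* type and function is a hypothetical stand-in, the "registry" holds a single callback, and a plain pointer plays the role of the Python object. The sketch only shows who calls what, and when.

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins: none of these are the real xidata.h definitions. */
typedef struct toy_xidata toy_xidata;
typedef void *(*toy_new_object_func)(toy_xidata *);
typedef int (*toy_getdata_func)(void *obj, toy_xidata *xidata);

struct toy_xidata {
    void *data;                      /* C-level payload being handed off */
    toy_new_object_func new_object;  /* consumer-side reconstruction */
};

static toy_getdata_func toy_registered;  /* one-slot "registry" */

/* Step 1, once at module init: register the producer callback. */
int toy_registerclass(toy_getdata_func cb) {
    toy_registered = cb;
    return 0;
}

/* Step 2, producer: allocate a zeroed buffer and let the registered
 * callback fill it in. Returns NULL if nothing is registered or the
 * callback reports failure. */
toy_xidata *toy_getxidata(void *obj) {
    if (toy_registered == NULL) {
        return NULL;
    }
    toy_xidata *x = calloc(1, sizeof *x);
    if (x != NULL && toy_registered(obj, x) != 0) {
        free(x);
        x = NULL;
    }
    return x;
}

/* Step 3, consumer: rebuild the object, then release the buffer. */
void *toy_consume(toy_xidata *x) {
    void *obj = x->new_object(x);
    free(x);
    return obj;
}

/* Example callbacks for a payload that is handed over by pointer. */
static void *toy_int_new_object(toy_xidata *x) {
    return x->data;
}

int toy_int_shared(void *obj, toy_xidata *x) {
    x->data = obj;                        /* the "init" step */
    x->new_object = toy_int_new_object;
    return 0;
}
```

In the real ladder the init step also records the owning interpreter, which this toy omits entirely.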
Safety contract¶
These contracts are mirrored from the canonical doc-comments in
xidata.h; the C header is the single source of truth.
XIDATA_INIT(xidata, interp, …): interp must be the interpreter that currently owns data. Passing the wrong interp produces a use-after-free across the worker handoff. The xidata buffer must be freshly allocated (or zeroed) and must not have been initialised before; double-init is undefined.
XIDATA_REGISTERCLASS(type, cb): must be called once per (type, cb) pair per interpreter, from inside that interpreter’s module-exec slot (or equivalent module-init code that runs on every import in every interpreter). The standard idiom is to call it from a Py_mod_exec slot; see _core.c / _math.c for the canonical pattern, mirrored by the consumer template under templates/c_abi_consumer/. Registering the same type twice with different callbacks in the same interpreter is undefined.
BOC_NO_MULTIGIL: internal marker. Defined only on CPython <3.12, where the host interpreter has no per-interpreter GIL and the bocpy runtime falls back to a single-interpreter mode. The XIDATA_* ladder still exposes the same macros on these versions; downstream consumers do not need to special-case this macro themselves. (The XIDATA_GETDATA_FUNC macro, described below, hides the only consumer-visible signature change for you.)
Worked example: bocpy.Matrix¶
The bocpy-shipped Matrix type uses every entry in the table
above. The annotated extracts below come from src/bocpy/_math.c.
1. Module init: register the type from the exec slot. This runs
once per interpreter, on every import — main interpreter and every
worker sub-interpreter that imports the module. Use multi-phase
initialisation (Py_mod_exec) and declare
Py_MOD_PER_INTERPRETER_GIL_SUPPORTED so the bocpy runtime can
load the module inside worker sub-interpreters:
/* _math_module_exec */
if (XIDATA_REGISTERCLASS(state->matrix_type, _matrix_shared)) {
    Py_FatalError(
        "could not register MatrixObject for cross-interpreter sharing");
    return -1;
}
A single-phase PyModule_Create module that registers from
PyInit will load fine in the main interpreter but cannot satisfy
Py_MOD_PER_INTERPRETER_GIL_SUPPORTED; if a transpiled @when
body imports it, the worker sub-interpreter import will not run the
registration and the consumer callback will see no type registered
in its registry. See templates/c_abi_consumer/src/_bocpy_probe.c for
the full multi-phase template.
2. Producer side: prepare the underlying C matrix for handoff.
_matrix_shared is the cb registered above; CPython invokes it
when XIDATA_GETXIDATA is called against a MatrixObject from
the producer interpreter. Declare the callback with
XIDATA_GETDATA_FUNC so the body has the same
(tstate, obj, xidata) parameter list on every supported CPython —
the macro emits a small trampoline on Python <3.12 (where the runtime
calls the callback with (obj, xidata) only) so the body never has
to special-case BOC_NO_MULTIGIL:
XIDATA_GETDATA_FUNC(_matrix_shared) {
    MatrixObject *matrix = (MatrixObject *)obj;
    matrix_impl *impl = matrix->impl;
    /* Atomically transfer ownership: this interpreter -> BOCPY_NO_OWNER. */
    int_least64_t expected = bocpy_interpid();
    int_least64_t desired = BOCPY_NO_OWNER;
    if (!atomic_compare_exchange_strong(&impl->owner,
                                        &expected, desired)) {
        PyErr_Format(PyExc_RuntimeError, /* … */);
        return -1;
    }
    XIDATA_INIT(xidata, tstate->interp, impl, obj, _new_matrix_object);
    return 0;
}
Note the use of atomic_compare_exchange_strong from the atomic
surface to flip the matrix’s owner field, and XIDATA_INIT to wire
xidata to the C-level impl, the original Python obj, and
the consumer-side reconstruction callback _new_matrix_object.
3. Consumer side: reconstruct a Python wrapper for the C matrix.
_new_matrix_object is the new_object callback recorded by
XIDATA_INIT; CPython invokes it on the consumer interpreter:
static PyObject *_new_matrix_object(XIDATA_T *xidata) {
    matrix_impl *impl = (matrix_impl *)xidata->data;
    /* Atomically take ownership: BOCPY_NO_OWNER -> this interpreter. */
    int_least64_t expected = BOCPY_NO_OWNER;
    int_least64_t desired = bocpy_interpid();
    if (!atomic_compare_exchange_strong(&impl->owner,
                                        &expected, desired)) {
        PyErr_Format(PyExc_RuntimeError, /* … */);
        return NULL;
    }
    PyTypeObject *type = LOCAL_STATE->matrix_type;
    MatrixObject *matrix = (MatrixObject *)type->tp_alloc(type, 0);
    /* … wrap and return … */
}
Consumer modules and worker sub-interpreters¶
Workers always run in sub-interpreters, on every supported CPython.
What varies across versions is whether each sub-interpreter owns its
own GIL (3.12+) or shares the legacy global GIL (3.10/3.11, marked
internally by BOC_NO_MULTIGIL). The execution model —
per-interpreter module state, multi-phase init, the per-interpreter
XIDATA_REGISTERCLASS registry — is identical across versions.
Because every XIDATA_REGISTERCLASS call lives in a per-interpreter
exec slot (see step 1 above), a consumer extension’s Matrix-like
type is only registered in interpreters that actually imported the
extension. Any consumer module whose XIData wrappers will travel into
a @when body must therefore be imported at module scope in the file
the worker is exec’d from.
The transpiler propagates module-scope import statements into the
exported per-worker module, but it does not see runtime imports
(importlib.import_module(...), pytest.importorskip(...),
__import__(...) from inside a function, …). A worker that imports
the transpiled body without the consumer extension will load the
shared object via the OS loader but skip the per-interpreter exec
slot, leaving its LOCAL_STATE (or equivalent module-state cache)
NULL. The consumer callback will then segfault on the first reconstruction.
Practical rules for downstream authors:
Use a top-level import _your_extension in any test or example file that schedules @when bodies which observe your extension’s types. pytest.importorskip is not transpiler-visible.
Mirror the per-interpreter state pattern (heap-allocated type from PyType_FromModuleAndSpec, per-module state, thread_local cache primed in the exec slot) shown in templates/c_abi_consumer/src/_bocpy_probe.c.
What is NOT public¶
The wheel ships only bocpy.h, xidata.h, and bocpy_msvc.c
under the package directory. The following internal headers and
surfaces are not part of the public C ABI; do not depend on them:
Internal headers: boc_compat.h, boc_cown.h, boc_sched.h, boc_tags.h, boc_terminator.h, boc_noticeboard.h. As a general rule, every boc_* file in the package directory is private; only bocpy.h, xidata.h, and bocpy_msvc.c are public.
Ordered atomics (boc_atomic_*_explicit and the typed boc_atomic_*_u64 / _intptr API).
BOC mutex / condition-variable types (BOCMutex, BOCCond) and the boc_mtx_* / boc_cnd_* helpers.
boc_yield, boc_now_s, boc_now_ns, boc_sleep_ns, the physical-CPU helpers, and any other boc_-prefixed function not exposed via bocpy.h.
C++ consumer support is also a non-goal for this release.
CPython version skew¶
bocpy wheels are tagged cp310 / cp311 / cp312 / cp313 / cp314.
Downstream extensions ship the same per-CPython matrix and thereby
pick up the matching bocpy.h view of the cross-interpreter data
ladder. BOCPY_ABI is bumped on any incompatible change to
bocpy.h or xidata.h.