Python, compilation and acceleration

Which Foreign Function Interface am I supposed to be using now?

Want to call a function in C, C++, Fortran etc. from Python?

If you are just talking to C, ctypes is a Python standard-library module that translates Python objects to C with minimal fuss and no compiler requirement. See the ctypes tutorial.
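To make the ctypes route concrete, here is a minimal sketch that calls `strlen` from the system C library. It assumes a POSIX-like system (Linux or macOS) where the C library can be loaded this way; `strlen` is just a convenient illustration, not anything the tutorial prescribes.

```python
import ctypes
import ctypes.util

# Locate the C library. find_library may return None on minimal systems;
# on POSIX, CDLL(None) then exposes the symbols already loaded into the
# running process, which include libc.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

# Python bytes are passed as a NUL-terminated char*.
n = libc.strlen(b"hello")
print(n)
```

Declaring `argtypes` and `restype` is optional for simple cases, but without them ctypes assumes `int` return values and cannot check your arguments, which is an easy way to get a segfault.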

And of course, if you have your compiler lying about, Python was made to talk to other languages and has a normal C API.

If you want something closer to Python for your development process, Cython compiles Python using a special extended syntax and makes calling foreign functions easy, all in one package. SWIG wraps function interfaces between various languages, but looks like a PITA. (See a comparison on stackoverflow.)

There is also Boost.Python if you want to talk to C++. Boost comes with lots of other fancy bits, like numerical libraries.

There are many other options, but in practice I’ve never needed to go further than Cython, so I can’t even talk about all the options listed here knowledgeably.

Related, overlapping:


There are too many options for interfacing with external libraries and/or compiling python code.

FFI, ctypes, Cython, Boost-Python, numba, SWIG…


Cython is lowish-friction, well tested, well documented, and works everywhere that CPython extensions can be compiled. It compiles most Python code (older releases could not handle generators and inner functions; current ones do). It optimises Python code using type defs and extended syntax. Here, read Max Burstein’s intro.

Highlights: It works seamlessly with numpy. It makes calling C code easy.

Problems: No generic dispatch. Debugging is nasty, like debugging C with extra crap in your way.


Numba is more specialised than Cython and uses LLVM instead of a generic C compiler. Numba makes optimising inner numeric loops easy.

Highlights: jit-compiles plain Python, so it’s easy to use normal debuggers, then switch on the compiler for performance improvements using the @jit decorator. Generic dispatch using the @generated_jit decorator. Compiles to multi-core vectorisations as well as CUDA. In principle this means you can do your calculations on the GPU.
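The "debug as plain Python, then switch on the compiler" workflow looks roughly like this. This is a sketch, not a benchmark: it uses `njit` (Numba's shorthand for `@jit` in nopython mode), and the no-op fallback decorator and the function name `sum_of_squares` are my own illustrative assumptions so the snippet still runs where Numba is not installed.

```python
try:
    from numba import njit  # Numba's nopython-mode JIT decorator
except ImportError:
    def njit(func):
        # Hypothetical fallback: leave the function as ordinary Python.
        return func


def sum_of_squares(n):
    # A plain Python inner loop; you can step through this in a normal debugger.
    total = 0.0
    for i in range(n):
        total += i * i
    return total


# Same source, now compiled on first call (if Numba is available).
fast_sum_of_squares = njit(sum_of_squares)

print(sum_of_squares(10), fast_sum_of_squares(10))
```

Keeping the undecorated function around means you can always compare the compiled result against the interpreted one when something looks off.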

Problems: LLVM is a shifty beast and its sensitive version dependencies are annoying. Documentation is a bit crap, or at least unfriendly to outsiders. Practically, getting performance out of a GPU is trickier than working out you can optimise away one tedious matrix op, and doing it at this level is hard. There is too much messing with details of how many processors to allocate to what.

You might find it easier to use Julia if a well-maintained and documented LLVM infrastructure is a real selling point for you.

Jax, tensorflow etc