Options for doing SIMD computation with fewer tears, for people, like me, who give not a damn about implementation details, but just want it to work fast enough.
Aside: you’re going to have to mess around with downloading proprietary GPU toolkits from the manufacturer. Tedious. Consider instead paying some cloud provider to rent their pre-configured machines.
Is all this too 3-years-ago for your tastes? Why not try FPGA computation?
Just write GLSL shaders using your compiler and the relevant manufacturer toolboxes. Laborious and tangential, unless you are a GPU-algorithm researcher. But could be fun I s’pose. See The Book of Shaders.
For computations that fit naturally into a data-flow graph, use one of those toolkits from the deep learning community. These are easy and performant, although not quite as general as just writing a damn shader.
OK, try these.
numba compiles a subset of Python to run on CPUs or GPUs; this sounds uninspiring, but it turns out to be amazing because the debugging affordances are really good when you can switch between the Python interpreter and compiled code for the same function. It turns plain Python loops into native machine code (via LLVM) at speeds that can approach hand-written C, which is incredible. OTOH the GPU stuff is not seamless and requires a little too much parallelism hinting to be plausibly useful to amateurs like me.
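To make the interpreter-versus-compiled switching concrete, here is a minimal sketch rather than anything canonical. The function name `sum_of_squares` is made up for illustration, and the no-op fallback decorator is my own assumption so that the snippet still runs (slowly, in the plain interpreter) on a machine without numba.

```python
try:
    # With numba present, njit compiles the decorated function on first call.
    from numba import njit
except ImportError:
    # Assumption for illustration only: a stand-in decorator that does
    # nothing, so the same source runs uncompiled in the interpreter.
    def njit(func=None, **kwargs):
        if func is None:
            return lambda f: f
        return func


@njit
def sum_of_squares(n):
    # A plain Python loop; this is the kind of code numba can compile.
    total = 0.0
    for i in range(n):
        total += i * i
    return total


result = sum_of_squares(1000)
print(result)
```

With numba installed, the first call triggers compilation and later calls with the same argument types reuse the compiled version. The dispatcher's `py_func` attribute hands back the original uncompiled function, which is handy when you want to step through the logic in an ordinary Python debugger.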
Taichi is a library oriented towards physics simulation and graphics, with clever compilation to various backends including CUDA and CPUs.
cupy is an NVIDIA-backed numpy clone which includes bonus CUDA libraries and DNN operations.
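Because cupy mirrors the numpy interface, the same array code can usually target either one. The following is a rough sketch under assumptions: it tries cupy and a working GPU first, and otherwise falls back to numpy on the CPU. The `xp` alias, the `on_gpu` flag and the `normalize_rows` function are illustrative names of mine, not part of either library.

```python
try:
    import cupy as xp
    xp.zeros(1).sum()  # touch the device; raises if there is no usable GPU
    on_gpu = True
except Exception:
    # Assumed fallback: same interface, computed on the CPU by numpy.
    import numpy as xp
    on_gpu = False


def normalize_rows(a):
    """Scale each row of a 2-D array to unit Euclidean norm."""
    norms = xp.sqrt((a * a).sum(axis=1, keepdims=True))
    return a / norms


x = xp.asarray([[3.0, 4.0], [6.0, 8.0]])
y = normalize_rows(x)

# Copy back to host memory if the result lives on the GPU.
y_host = (y.get() if on_gpu else y).tolist()
print(on_gpu, y_host)
```

Writing against an `xp` alias like this keeps the GPU dependency in one place, so the laptop-first, GPU-later workflow needs no changes to the numerical code itself.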
Gnumpy isn’t fashionable but has been around, and has a fancy pedigree:
Do you want to have both the compute power of GPUs and the programming convenience of Python numpy? Gnumpy + Cudamat will bring you that.
Gnumpy is a simple Python module that interfaces in a way almost identical to numpy, but does its computations on your computer’s GPU. […]
Gnumpy runs on top of, and therefore requires, the excellent cudamat library, written by Vlad Mnih.
Gnumpy can run in simulation mode: everything happens on the CPU, but the interface is the same. This can be helpful if you like to write your programs on your GPU-less laptop before running them on a GPU-equipped machine. It also allows you to easily test what performance gain you get from using a GPU. The simulation mode requires npmat, written by Ilya Sutskever.
The aim of the cudamat project is to make it easy to perform basic matrix calculations on CUDA-enabled GPUs from Python. cudamat provides a Python matrix class that performs calculations on a GPU. At present, some of the operations our GPU matrix class supports include…
This book will focus on the use of GLSL pixel shaders. First we’ll define what shaders are; then we’ll learn how to make procedural shapes, patterns, textures and animations with them. You’ll learn the foundations of shading language and apply it to more useful scenarios such as: image processing (image operations, matrix convolutions, blurs, color filters, lookup tables and other effects) and simulations (Conway’s game of life, Gray-Scott’s reaction-diffusion, water ripples, watercolor effects, Voronoi cells, etc.). Towards the end of the book we’ll see a set of advanced techniques based on Ray Marching.