GPU computation

October 7, 2014 — August 4, 2021

computers are awful
concurrency hell
number crunching
premature optimization

Options for doing SIMD computation with fewer tears, for people like me who give few hoots about implementation details but just want it to work fast enough.

Aside: setting up local machines is boring and expensive unless I believe I can somehow keep my GPU continually occupied, which empirically I cannot. I would generally rather rent pre-configured machines from a cloud provider.

Is all this too 3-years-ago for your tastes? Why not try FPGA computation?

Figure 1: NVIDIA headquarters, 1902

1 Theory

2 Methods

  • Just writing GLSL shaders using your compiler and the relevant manufacturer toolboxes. Laborious and tangential, unless you are a GPU-algorithm researcher; GPU coding is a whole highly specialised career path. But it could be fun, I s’pose, for someone who is not me. See the book of shaders.

  • For data-oriented computational dataflow graphs, use one of the toolkits from the deep learning community. These are easy and performant, although not quite as general as writing shader code. (A minimal sketch follows just below.)
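
To make that concrete, here is a minimal sketch in PyTorch, one of several such toolkits; the array sizes and the toy computation are my own inventions for illustration:

    import torch

    # Use the GPU if one is visible, otherwise fall back to the CPU;
    # the code is identical either way.
    device = "cuda" if torch.cuda.is_available() else "cpu"

    x = torch.randn(4096, 4096, device=device)
    y = torch.randn(4096, 4096, device=device)

    # Describe the computation as array operations; the framework
    # dispatches the corresponding kernels to the chosen device.
    z = torch.tanh(x @ y).sum()
    print(z.item())

The point is that you never write a kernel yourself; you describe a dataflow over arrays and the toolkit worries about the hardware.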

Neither fits? OK, try these libraries at intermediate levels of complexity:

  • GPU usage in julia

  • RAPIDS is a python toolkit which attempts to run whole ML pipelines on GPUs. I wonder how effective it is.

  • numba compiles a subset of python to run on CPUs or GPUs; this sounds uninspiring, but it turns out to be amazing, because the debugging affordances are excellent when you can switch between the plain python interpreter and natively compiled code for the same source. It generates fast compiled loops from plain python, which is incredible. OTOH the GPU support is not seamless and requires a little too much parallelism hinting to be plausibly useful to amateurs like me. (A minimal sketch of the pleasant CPU path appears after this list.)

  • Taichi is a physics-simulation-and-graphics oriented library with clever compilation to various backends including CUDA and CPUs. (A short sketch appears after the list.)

  • cupy is a CUDA-backed numpy clone which includes bonus deep learning operations. (Again, see the sketches after this list.)

  • Gnumpy isn’t fashionable, but it has been around a long time and has a fancy pedigree:

    Do you want to have both the compute power of GPUs and the programming convenience of Python numpy? Gnumpy + Cudamat will bring you that.

    Gnumpy is a simple Python module that interfaces in a way almost identical to numpy, but does its computations on your computer’s GPU. […]

    Gnumpy runs on top of, and therefore requires, the excellent cudamat library, written by Vlad Mnih.

    Gnumpy can run in simulation mode: everything happens on the CPU, but the interface is the same. This can be helpful if you like to write your programs on your GPU-less laptop before running them on a GPU-equipped machine. It also allows you to easily test what performance gain you get from using a GPU. The simulation mode requires npmat, written by Ilya Sutskever.

  • cudamat:

    The aim of the cudamat project is to make it easy to perform basic matrix calculations on CUDA-enabled GPUs from Python. cudamat provides a Python matrix class that performs calculations on a GPU. At present, some of the operations our GPU matrix class supports include…
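
A minimal sketch of the numba CPU path mentioned above; the function and the array sizes are my own, invented for illustration:

    import numpy as np
    from numba import njit, prange

    @njit(parallel=True)
    def row_norms(x):
        # Plain python loops, JIT-compiled to native code by numba;
        # prange hints that the outer loop may be parallelised.
        out = np.empty(x.shape[0])
        for i in prange(x.shape[0]):
            s = 0.0
            for j in range(x.shape[1]):
                s += x[i, j] * x[i, j]
            out[i] = np.sqrt(s)
        return out

    x = np.random.randn(10_000, 128)
    print(row_norms(x)[:5])

Delete the decorator and the identical function runs in the plain interpreter, which is exactly the debugging affordance praised above.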
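
Taichi’s pitch, as I understand it, looks roughly like the following; the field and kernel here are my own toy example, not from the Taichi docs:

    import taichi as ti

    ti.init(arch=ti.gpu)  # falls back to the CPU if no GPU is found

    n = 512
    pixels = ti.field(dtype=ti.f32, shape=(n, n))

    @ti.kernel
    def fill():
        # The struct-for over a field is compiled to a parallel
        # kernel on whichever backend ti.init selected.
        for i, j in pixels:
            pixels[i, j] = (i + j) / (2 * n)

    fill()
    print(pixels.to_numpy().mean())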
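
And the cupy pitch is that you mostly just swap the import; a minimal sketch (the array sizes are mine):

    import cupy as cp  # near drop-in for numpy; arrays live on the GPU

    x = cp.random.randn(4096, 4096)

    # The familiar numpy-ish API, executed as CUDA kernels.
    y = cp.tanh(x @ x.T).sum()

    print(float(y))  # pulls the scalar result back to the host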

The book of shaders

This book will focus on the use of GLSL pixel shaders. First we’ll define what shaders are; then we’ll learn how to make procedural shapes, patterns, textures and animations with them. You’ll learn the foundations of shading language and apply it to more useful scenarios such as: image processing (image operations, matrix convolutions, blurs, color filters, lookup tables and other effects) and simulations (Conway’s game of life, Gray-Scott’s reaction-diffusion, water ripples, watercolor effects, Voronoi cells, etc). Towards the end of the book we’ll see a set of advanced techniques based on Ray Marching.