2026/04/17

A Minimal GPU Jacobi Solver for Rigid Body Point Constraints

This post contains my derivation notes for the Jacobi iteration of a system of spherical rigid bodies constrained together in a chain by “point” constraints (ie. ball joints) - ie. a pearl necklace :). I’ve been working on a little framework project for comparing different GPU based physics solvers, and this is meant to be a solver/scene which can function as a baseline by which to compare the stability and performance of various solvers.

I’m sticking with a specific, very simple constraint type for the scope of this derivation for concreteness, and so that it can immediately map to an actual simulation.

The result: 2.5k chains of 40 beads (100k beads and constraints total) at 60fps on a RTX4060.

2025/04/19

Particle Life in WebGL

I spent this weekend fascinated by the interesting patterns that can emerge from the Particle Life algorithm by Jeffrey Ventrella. While this algorithm needs no new implementations (there are many!) I decided to use it as an exercise in WebGL programming.

Full screen

2025/02/24

NBody Simulation - 262k Stars

In the past few weeks, I learned about the Barnes-Hut algorithm for approximating long-range forces between many 2d particles from this video, and decided to try my own hand at a 3d implementation. My goal: to see how many stars I can fit into a single galaxy in real time, on my M1 Macbook Air. The video below was captured on that computer, with 262k stars. The code I wrote is here.

2018/02/02

The BIN-CSR Sparse Matrix Format

BIN-CSR is a sparse, symmetric matrix format, originally described by Daniel Weber et. al in their 2012 paper on GPU data structures. BIN-CSR is suitable for applications which use massively parallelized matrix operations, like some GPU-based finite element method solvers.

The diagonal elements of the matrix are stored separately, to enable fast preconditioning of the matrix. Preconditioning is a step which allows for faster convergence when performing a system solving method like the conjugate gradient method.

The WebGL demos on this page aim to demonstrate the construction of a BIN-CSR matrix in an interactive way.

The following is an interactive representation of a full matrix. Black squares represent non-zero values, and they’re populated randomly as rows are added. Drag the slider to change the dimensions of the matrix.

Size
Density

The elements of this matrix are packed into padded bins. Each bin contains a set number of rows worth of data. The number of rows contained in a bin is called the bin width. The following is an illustration of bin data packing. The slider can be used to adjust the bin width.

The above visual shows the theoretical grouping of the data. The actual layout of the data in memory is optimized for coallesced memory access on the GPU. In actuality, the data is stored in four separate arrays, visualized here:

The values in the pointer array (gray) are indices into the value and column arrays (black), indicating the starting index for each row. The pointer values are represented by gray connecting lines.

2017/10/15

FEM Jello Cube

The second installment in my progress on a ductile fracture physics engine. So far, it supports neither ductile nor fracturable materials, but elastodynamic deformation is a step in the right direction.

So far, it’s pretty much a jello cube, created mostly following the FEM derivations by Erleben in Physics Based Animation. More technical updates to follow.

2017/10/10

Detecting Islands in Tetrahedral Meshes

In the following video a teapot made of 149210 tetrahedra is generated from a regular surface-bounded mesh. Fifty tetrahedra are removed each frame. After each removal the mesh is traversed in search of “islands” - groups of adjacent tetrahedra which share no faces with other groups - which are indicated by color.

I am beginning work on an FEM based model for deforming and fracturing physics objects. Such a model requires that objects to be composed of assemblies of tetrahedrons. This post contains a high level overview of the datastructures and algorithms I’m experimenting with.

2017/06/12
2017/05/30

Inline Tensor Convolution Using Template Expansion

Last year I faced a problem which required me to compute products (convolutions) between vectors, matrixes, vectors of matrixes (rank-3 tensors).

I wanted to do this in C++ and the best option I could find seemed to be FTensor. Although it probably would have worked for my particular case, FTensor seems to have been written in an older style which failed to make use of variadics, and I wanted something a bit more generic.

In a quest for further flexibility I began to work on my own tensor library, Weyl.

Weyl is header only and contains just one core class, tensor, whose entire source code is very short (at the moment, 367 lines including comments). Thanks to fancy-pants template expansion, large tensor convolutions are (usually!) compiled into a list of simple add and multiply operations.

Here’s the sort of thing I aim to do.

weyl::tensor<float, 3, 3, 5> a({ ... });
weyl::tensor<float, 2, 5, 4> b({ ... });
weyl::tensor<float, 3, 3, 2, 4> c = weyl::sum<2, 1>(a, b);

This ought to be equivalent to the tensor convolution through a Riemann sum over one index.

\[\begin{align*} C_{ijlm} = \sum_{k=1}^5 A_{ijk}B_{lkm} \end{align*}\]

In order to produce this result, we need the template function sum to expand to something equivalent to the following set of embedded loops (assuming that the values in the resulting tensor were initialized to zero).

for (i = 0; i < 3; ++i) // i and j are a's contribution to c's rank indexes
for (j = 0; j < 3; ++j)
for (l = 0; l < 2; ++l) // l and m are b's contribution to c's rank indexes
for (m = 0; m < 4; ++m)
for (k = 0; k < 5; ++k) // k is the index of the dimension to sum over
    c[i][j][l][m] += a[i][j][k] * b[l][k][m];

This is essentially what weyl::tensor produces, and since the bounds of each loop are known, they are unrolled at compile time. You can see the sort of assembly it produces here on Godbolt. For brevity, the full source is not included in this example.

Older Stuff:

2017/04/16 Rigidbody Physics Engine - Log 7 - Cached Constraint Results
2017/04/09 Rigidbody Physics Engine - Log 6 - Jenga Tower
2017/04/05 Rigidbody Physics Engine - Log 5 - Box Stacking
2017/03/30 Rigidbody Physics Engine - Log 4 - 3D Picking With Raycasts
2017/03/24 Rigidbody Physics Engine - Log 3 - MLCP Solver
2017/03/17 Rigidbody Physics Engine - Log 2 - Dynamic AABB Tree
2017/03/06 Rigidbody Physics Engine - Log 1
2017/03/02 Generating Quick & Dirty Debug Meshes From Convex Mathematical Geometry
2016/11/25 Constraining IK Joints With Arbitrary Geometry
2016/04/21 Interactive Water Simulation With Finite Differencing
2016/04/13 Stuff I learned from working on my first DigiPen Game Team
2016/04/01 Simple 2D Waves With FFTs
2016/02/29 Cumulative Git User Statistics
2016/02/18 Projecting Screen Coordinates Onto A 3D Plane
2015/12/30 Q-Sim
2015/12/28 A Subtle Sin Of Macro Expansion
2015/12/05 Toy Planes
2015/10/07 Super Simple Fluid Dynamics in a 2D Tile Based Platformer
2015/09/01 Constraining Physics Along A Spline In Unity
2015/08/28 Gen 0 - My Imitative Little Foray Into Generative Art
2015/05/02 Last Man Standin' - Prototype
2014/12/20 Check My Logic
2014/11/20 Frame
2014/09/01 Making The MetaBox Shader
2014/08/14 Tall Rhinoceros
2014/07/18 Metabox - Proof Of Concept
2014/02/02 Clover
2013/07/25 MetaVoxel
2012/08/22 Rigor
2012/08/20 Optimism
2011/05/19 Time Development Of 1D Quantum Systems