It has been a few years since I last experimented with OpenCL for gravity simulations.

At that time I made some mistakes in my code that resulted in not every object being compared to every other object in the gravity calculations. For the purists this is not acceptable: “How can you call it a gravity simulation if you are not using all objects in the calculations?!” The other opinion raised at the time was that if it looks good enough, use the speed-ups. None of the comments on my YouTube gravity videos were (in typical YouTube speak) “haha lol! looks fake! u no do gud gravity!”. So, if you are struggling to get gravity working fast, try reducing the number of actual particle interactions.

Here is the latest CL kernel code.

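```
// Each work-item accumulates a softened gravitational acceleration for one
// object and then adds it to that object's velocity.
// Note: mass, stopindex and gridsize are not used in this version.
__kernel void Gravity3DKernel(__global float4* pos, __global float4* vel, __global float4* acc, __global float4* mass, float mingravdist, float forcescalefactor, int startindex, int stopindex, int gridsize)
{
    // Object handled by this work-item, offset by startindex.
    int index=get_global_id(0)+startindex;
    float dx,dy,dz,distance,force;
    float positionx=pos[index].x;
    float positiony=pos[index].y;
    float positionz=pos[index].z;

    // Reset this object's acceleration for the current step.
    acc[index].x=0.0f;
    acc[index].y=0.0f;
    acc[index].z=0.0f;

    // The loop bound is the work-group size, so only objects
    // 0 .. get_local_size(0)-1 act as sources of gravity.
    for(int a=0; a<get_local_size(0); a++) {
        if (a!=index) {
            // Vector from this object to object a.
            dx=pos[a].x-positionx;
            dy=pos[a].y-positiony;
            dz=pos[a].z-positionz;
            distance=sqrt(dx*dx+dy*dy+dz*dz);

            // Normalize to a unit direction.
            dx=dx/distance;
            dy=dy/distance;
            dz=dz/distance;

            // Inverse-square force, softened by mingravdist so that
            // very close objects do not produce huge accelerations.
            force=1.0f/(distance*distance+mingravdist*mingravdist)*forcescalefactor;
            acc[index].x+=dx*force;
            acc[index].y+=dy*force;
            acc[index].z+=dz*force;
        }
    }

    // Add the accumulated acceleration to the velocity.
    vel[index].x+=acc[index].x;
    vel[index].y+=acc[index].y;
    vel[index].z+=acc[index].z;
}
```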

The kernel processes all of the 3D objects once using one of the other objects (passed as startindex).

You can call the kernel multiple times each frame/step of the simulation. If you want strictly accurate (and slow) gravity simulation results, call it inside a loop that sets startindex from 1 to the number of objects. In my experiments this is not necessary. If you want “good enough” nice-looking gravity simulations, you can call the kernel with as few as 10 different startindex values (use a set of objects spread out among all of them, or just use a random set of 10 objects each frame).
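One way to pick a set of objects "spread out among all of them" is sketched below in plain C. The function name `pick_spread_indices` and the `offset` parameter are hypothetical, and the indices are 0-based, which is an assumption about how the host code numbers objects. Changing `offset` each frame (for example to a random value) gives a different evenly spaced subset each time.

```c
#include <assert.h>
#include <stddef.h>

/* Fill out[0..k-1] with k distinct indices spread evenly across n objects.
   `offset` shifts the whole set, so a different subset can be used per frame.
   The caller would then run one kernel pass per entry in `out`. */
void pick_spread_indices(size_t n, size_t k, size_t offset, size_t *out)
{
    assert(k > 0 && k <= n);
    for (size_t i = 0; i < k; i++) {
        /* Widen before multiplying so i * n cannot overflow a 32-bit size_t. */
        size_t step = (size_t)(((unsigned long long)i * n) / k);
        out[i] = (offset % n + step) % n;
    }
}
```

With 5,000,000 objects and 10 samples and no offset, this yields every 500,000th index starting at 0.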

Here is a recent 4K resolution example movie. This one used 1,000 objects out of 5,000,000 objects for the gravity calculations. The objects start by being randomly placed within an oblate spheroid. Their velocities are initialized so they are rotating around the center axis.
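A starting state like that could be set up along the following lines. This is a plain C sketch under stated assumptions, not the code used for the movie: the names `Vec4`, `rand_unit` and `init_spheroid` are hypothetical, the rotation axis is taken to be z, and the velocity is a simple rigid rotation scaled by a made-up `spin` factor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical host-side stand-in for the kernel's float4 (w is unused). */
typedef struct { float x, y, z, w; } Vec4;

/* Uniform random float in [-1, 1]. */
float rand_unit(void)
{
    return 2.0f * (float)rand() / (float)RAND_MAX - 1.0f;
}

/* Place n objects uniformly inside a spheroid with equatorial radius a and
   polar radius c along z (c < a gives an oblate shape), and give each one a
   velocity that circles the z axis: v = spin * (-y, x, 0). */
void init_spheroid(Vec4 *pos, Vec4 *vel, size_t n, float a, float c, float spin)
{
    assert(a > 0.0f && c > 0.0f);
    for (size_t i = 0; i < n; i++) {
        float x, y, z;
        /* Rejection sampling: draw from the cube until inside the unit sphere. */
        do {
            x = rand_unit();
            y = rand_unit();
            z = rand_unit();
        } while (x * x + y * y + z * z > 1.0f);

        /* Stretch the unit sphere into the spheroid. */
        pos[i].x = a * x;
        pos[i].y = a * y;
        pos[i].z = c * z;
        pos[i].w = 0.0f;

        /* Tangential velocity, perpendicular to the radius in the xy plane. */
        vel[i].x = -spin * pos[i].y;
        vel[i].y =  spin * pos[i].x;
        vel[i].z = 0.0f;
        vel[i].w = 0.0f;
    }
}
```

Seeding with `srand` before the call makes the layout repeatable between runs.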

Jason.