More adventures with 3D Gravity simulations and OpenCL


3D Gravity simulations have interested me for many years now. Over that time I have written several: some worked, some didn’t, and some were more realistic than others.

The oldest 3D Gravity simulation movie still on my YouTube channel is this one from way back in May 2007: low resolution, with only a bunch of blurry objects.

Since then I have increased the level of detail and the object counts. I also started experimenting with OpenCL, whose big speedups allowed many more objects to be simulated in a reasonable time frame.

Moving forward to now

For this latest post I went back and rewrote my code and the OpenCL kernel code so that the gravity calculations correctly compare every object against every other object. The simulation uses Newton’s law of universal gravitation.

Newtonian Gravity

Every point mass attracts every single other point mass by a force acting along the line intersecting both points. The force is proportional to the product of the two masses and inversely proportional to the square of the distance between them.

How this simulation works

I am using software OpenGL on the CPU for all the rendering of the visuals, and a mix of CPU code and OpenCL for the gravity calculations. Here the OpenCL code runs on the graphics card (GPU), and GPUs are great at running lots of small bits of code at the same time. The gravity formula maths is perfect for this kind of parallelism: every object’s velocity and acceleration can be calculated at the same time as those of all the other objects.

The basic workflow with OpenCL is: fill arrays with the information you want the OpenCL code to use (for 3D gravity I am passing the position, velocity, acceleration and mass of the objects), pass them to OpenCL, run the code on the GPU, and then read the results back from the GPU when it is done.
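A minimal C sketch of the host-side arrays, assuming a structure-of-arrays layout with one array per component (the `Bodies` type and the `bodies_alloc`/`bodies_free` helpers are hypothetical names, not the post's code):

```c
#include <stdlib.h>

#define BODY_FIELDS 10

/* Hypothetical host-side container in structure-of-arrays layout:
   posx[i], posy[i], ..., mass[i] all describe object i, and each array
   can be uploaded to the GPU as its own buffer. */
typedef struct {
    size_t n;
    float *posx, *posy, *posz;
    float *velx, *vely, *velz;
    float *accx, *accy, *accz;
    float *mass;
} Bodies;

/* Collects the address of every array field so they can be looped over. */
static void bodies_fields(Bodies *b, float **f[BODY_FIELDS])
{
    f[0] = &b->posx; f[1] = &b->posy; f[2] = &b->posz;
    f[3] = &b->velx; f[4] = &b->vely; f[5] = &b->velz;
    f[6] = &b->accx; f[7] = &b->accy; f[8] = &b->accz;
    f[9] = &b->mass;
}

/* Allocates every array zero-filled for n objects. Returns 0 on success,
   or -1 after freeing whatever it had already allocated. */
int bodies_alloc(Bodies *b, size_t n)
{
    float **f[BODY_FIELDS];
    bodies_fields(b, f);
    b->n = n;
    for (int k = 0; k < BODY_FIELDS; k++) {
        *f[k] = calloc(n, sizeof(float));
        if (*f[k] == NULL) {
            for (int j = 0; j < k; j++) {   /* undo earlier allocations */
                free(*f[j]);
                *f[j] = NULL;
            }
            return -1;
        }
    }
    return 0;
}

/* Releases every array and resets the pointers to NULL. */
void bodies_free(Bodies *b)
{
    float **f[BODY_FIELDS];
    bodies_fields(b, f);
    for (int k = 0; k < BODY_FIELDS; k++) {
        free(*f[k]);
        *f[k] = NULL;
    }
    b->n = 0;
}
```

For 1,000,000 objects each array is 4,000,000 bytes (about 4 MB), so these ten arrays come to roughly 40 MB; the kernel's per-object minimum gravity distance would be one more array of the same size. Copying the arrays to the GPU, running the kernel and reading the results back goes through the OpenCL host API (for example `clCreateBuffer`, `clEnqueueWriteBuffer`, `clEnqueueNDRangeKernel` and `clEnqueueReadBuffer`), which this sketch does not cover.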

This is the current OpenCL kernel I am using for these latest simulations. Each of the arrays passed in (posx, posy, etc.) holds a value for every object; i.e., for a 1 million object simulation the posx array has 1,000,000 floating point values, one for each object’s X position in 3D space.

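__kernel void Gravity3DKernel( __global float * posx,
		               __global float * posy,
		               __global float * posz,
		               __global float * velx,
		               __global float * vely,
		               __global float * velz,
		               __global float * accx,
		               __global float * accy,
		               __global float * accz,
		               __global float * mass,
		               __global float * mingravdist)
{
	int index=get_global_id(0);
	// loop over ALL objects (get_local_size(0) only covers one work-group)
	int count=get_global_size(0);
	float dx,dy,dz,distancesqr,distance,force;
	float positionx=posx[index];
	float positiony=posy[index];
	float positionz=posz[index];
	float mingravdistsqr=mingravdist[index]*mingravdist[index];
	float accelerationx=0;
	float accelerationy=0;
	float accelerationz=0;
	float thismass=mass[index];
	for(int a=0; a<count; a++) {
		if (a!=index) {
			//old method - all objects are assumed to have the same mass
			//new method - allows objects to have different masses
			// (assumed reconstruction: the rest of the kernel is missing from the post)
			dx=posx[a]-positionx;
			dy=posy[a]-positiony;
			dz=posz[a]-positionz;
			// clamp to the minimum gravity distance so close pairs stay finite
			distancesqr=fmax(dx*dx+dy*dy+dz*dz,mingravdistsqr);
			distance=sqrt(distancesqr);
			// Newton: F = m1*m2/r^2 (G folded into the masses), then a = F/m1
			force=thismass*mass[a]/distancesqr;
			accelerationx+=force*dx/(distance*thismass);
			accelerationy+=force*dy/(distance*thismass);
			accelerationz+=force*dz/(distance*thismass);
		}
	}
	// store results; the velocity update assumes a unit time step
	accx[index]=accelerationx;
	accy[index]=accelerationy;
	accz[index]=accelerationz;
	velx[index]+=accelerationx;
	vely[index]+=accelerationy;
	velz[index]+=accelerationz;
}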

The kernel code loops through every object and calculates the forces against every other object. This is the naive, unoptimized O(n²) version of the algorithm: each of the n objects is compared against the other n − 1, so doubling the object count roughly quadruples the work. Once all the loops are finished, the new object velocity and acceleration values are read back from GPU memory into main (host) memory, where the CPU can access the results. All of the object positions are then updated using the new velocity and acceleration values and displayed.

For displaying the objects I am using the old software-only OpenGL billboard quads. A billboard quad is a texture on a quad (rectangle) that always faces the “camera” in OpenGL. If you use a nicely shaded and transparent “blob” as the texture, it looks like a simple star and blends in with the other stars.
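A minimal C sketch of the billboard geometry, assuming the classic trick of taking the camera’s right and up axes from the legacy OpenGL modelview matrix. The `Vec3f`, `camera_axes` and `billboard_corners` names are hypothetical, and the OpenGL calls that would actually draw the textured quad are not shown.

```c
/* Illustrative vector type (not from the post). */
typedef struct { float x, y, z; } Vec3f;

/* Reads the camera's right and up axes (in world space) from a modelview
   matrix m stored column-major, as glGetFloatv(GL_MODELVIEW_MATRIX, m)
   returns it. Assumes the matrix holds only rotation and translation. */
void camera_axes(const float m[16], Vec3f *right, Vec3f *up)
{
    right->x = m[0]; right->y = m[4]; right->z = m[8];  /* first row  */
    up->x    = m[1]; up->y    = m[5]; up->z    = m[9];  /* second row */
}

/* Four corners of a camera-facing quad of half-size h centred on c,
   counter-clockwise from bottom-left, matching texture coordinates
   (0,0), (1,0), (1,1), (0,1). */
void billboard_corners(Vec3f c, float h, Vec3f right, Vec3f up, Vec3f out[4])
{
    const float sr[4] = { -h,  h, h, -h };   /* steps along right */
    const float su[4] = { -h, -h, h,  h };   /* steps along up    */
    for (int k = 0; k < 4; k++) {
        out[k].x = c.x + sr[k] * right.x + su[k] * up.x;
        out[k].y = c.y + sr[k] * right.y + su[k] * up.y;
        out[k].z = c.z + sr[k] * right.z + su[k] * up.z;
    }
}
```

Each star is then drawn as one quad with those corners and the blob texture. Because every quad is built from the same right and up vectors, all of them stay parallel to the screen no matter how the camera is rotated.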

Calculation Times

Using the above code I can process millions of objects in a reasonable time frame. For these simulations the slowest parts are always the display and the CPU-side calculations; the OpenCL step is always the fastest part.

Here are a few stats for how quick (or slow) these simulations are:

GeForce GTX 750 Ti – 1 million particles – 600 ms per frame – 58 ms OpenCL time.
GeForce GTX 750 Ti – 5 million particles – 1554 ms per frame – 144 ms OpenCL time.
GeForce GTX 1080 – 5 million particles – 2200 ms per frame – 140 ms OpenCL time.
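Dividing the OpenCL time by the frame time for each of these runs shows how small a share of each frame the gravity calculation takes:

```latex
\frac{58\ \text{ms}}{600\ \text{ms}} \approx 9.7\%, \qquad
\frac{144\ \text{ms}}{1554\ \text{ms}} \approx 9.3\%, \qquad
\frac{140\ \text{ms}}{2200\ \text{ms}} \approx 6.4\%
```

In every case more than 90% of the frame goes to rendering and the CPU-side work.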

The per frame time is so much longer than the OpenCL time because I am still using software OpenGL to render all the particles. Software OpenGL falls back to the old OpenGL 1.1 DLLs that Microsoft provides with Windows, which get no benefit from hardware acceleration. Updating the OpenGL code to use hardware acceleration is on my to-do list, and once that is done the above times should come way down.

The GeForce GTX 1080 frame time is longer than the GTX 750 Ti’s because the 1080 was rendering full 4K resolution frames at the time.


Here is a new sample 4K resolution 3D Gravity movie.

After the last movie I went back and improved the color shading code and added the option of a “black hole”, which in this case is just a single object with a much larger mass than the others: around 100 to 500 times the mass of the other stars. Any higher and all the stars are flung out of the simulation area too quickly. Here are some of the latest results.

The spiral-galaxy-like results are mostly a fluke. I started the simulation with a disk or oblate spheroid (squashed sphere) of particles rotating around the Y axis through the origin, with a central black hole, and let it run.
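That starting setup can be sketched in C as below. This is a hypothetical generator, not the code in Visions of Chaos: the rejection sampling, the rigid rotation with angular speed `omega`, and all parameter names are assumptions. Only the disk/spheroid shape, the rotation about the Y axis, and the central black hole 100 to 500 times heavier than the stars come from the post.

```c
#include <stdlib.h>

/* Uniform random float in [-1, 1) from the C library generator. */
static float rand_signed(void)
{
    return (float)(2.0 * (rand() / ((double)RAND_MAX + 1.0)) - 1.0);
}

/* Hypothetical initial conditions. Object 0 is the "black hole": at the
   origin, at rest, with bh_ratio times the mass of each star. Stars fill
   an oblate spheroid of equatorial radius `radius` and polar half-height
   radius*flatten (flatten < 1 squashes it along Y), and start in rigid
   rotation about the Y axis: v = (0, omega, 0) x p = (omega*z, 0, -omega*x). */
void init_disk(int n, float radius, float flatten, float omega,
               float star_mass, float bh_ratio, unsigned seed,
               float *px, float *py, float *pz,
               float *vx, float *vy, float *vz, float *m)
{
    srand(seed);
    px[0] = py[0] = pz[0] = 0.0f;
    vx[0] = vy[0] = vz[0] = 0.0f;
    m[0] = star_mass * bh_ratio;
    for (int i = 1; i < n; i++) {
        float x, y, z;
        do {                               /* rejection-sample the unit ball */
            x = rand_signed();
            y = rand_signed();
            z = rand_signed();
        } while (x * x + y * y + z * z > 1.0f);
        px[i] = x * radius;                /* stretch ball into a spheroid */
        py[i] = y * radius * flatten;
        pz[i] = z * radius;
        vx[i] = omega * pz[i];             /* tangential velocity about Y */
        vy[i] = 0.0f;
        vz[i] = -omega * px[i];
        m[i] = star_mass;
    }
}
```

A flatten value well below 1 gives a thin disk, while values closer to 1 give a rounder spheroid. Rigid rotation is simply the easiest choice to write down; giving each star the speed of a circular orbit around the black hole would be another option.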

Try It Yourself

The latest 3D Gravity code is now included in Visions of Chaos. For now I am finally happy that at least the basic Newtonian gravity is working and that every object is compared against all the others in the calculations.

