COMP 426 Assignment 3
Sam Anthony 40271987


= Manycore implementation of 2D bouncing balls simulation on the GPU =


# Simulation

This version uses OpenCL to run the simulation on the GPU.  The C
functions from the earlier implementations were ported to three OpenCL
kernels: move(), which updates the positions of the balls; collideWalls(),
which handles collisions between balls and the edges of the screen; and
collideBalls(), which handles collisions between pairs of balls.

The positions, velocities, and radii of the balls are stored in OpenCL
buffers in GPU memory.  Vectors are now native OpenCL float2 values
rather than the structs used in prior implementations.

The move() and collideWalls() kernels run with the global work size set
to the number of balls.  Each thread works on one of the balls.
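As a rough illustration of the one-ball-per-thread mapping, the C sketch below emulates both kernels on the CPU: the loop index `i` plays the role of `get_global_id(0)`, and the two loops run one after the other just as two separate kernel launches would. The `float2` struct stands in for OpenCL's built-in vector type, and the time step `dt`, the bounds `[-1, 1]` (normalized device coordinates) and the reflect-and-clamp wall response are assumptions, not details taken from the program.

```c
#include <stddef.h>

/* Stand-in for OpenCL's built-in float2 vector type. */
typedef struct { float x, y; } float2;

/* Assumed screen bounds in normalized device coordinates (hypothetical). */
#define BOUND_MIN (-1.0f)
#define BOUND_MAX ( 1.0f)

/* Body of one move() thread: advance ball i by one time step. */
static void move_item(size_t i, float2 *pos, const float2 *vel, float dt) {
    pos[i].x += vel[i].x * dt;
    pos[i].y += vel[i].y * dt;
}

/* Reflect one coordinate off the [BOUND_MIN, BOUND_MAX] interval,
 * clamping the ball back inside and flipping its velocity component. */
static void reflect(float *p, float *v, float r) {
    if (*p - r < BOUND_MIN && *v < 0.0f) { *p = BOUND_MIN + r; *v = -*v; }
    else if (*p + r > BOUND_MAX && *v > 0.0f) { *p = BOUND_MAX - r; *v = -*v; }
}

/* Body of one collideWalls() thread: bounce ball i off the screen edges. */
static void collide_walls_item(size_t i, float2 *pos, float2 *vel, const float *radius) {
    reflect(&pos[i].x, &vel[i].x, radius[i]);
    reflect(&pos[i].y, &vel[i].y, radius[i]);
}

/* Host-side emulation: global work size == n, one iteration per thread.
 * All move() work finishes before any collideWalls() work starts. */
void step_balls(size_t n, float2 *pos, float2 *vel, const float *radius, float dt) {
    for (size_t i = 0; i < n; ++i) move_item(i, pos, vel, dt);
    for (size_t i = 0; i < n; ++i) collide_walls_item(i, pos, vel, radius);
}
```

Because each thread reads and writes only its own ball's entries, no synchronization is needed inside either kernel.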

The collideBalls() kernel relies on the partitioning scheme from the
TBB implementation.  The host partitions the set of collisions between
pairs of balls so that dependent collisions (those that share a ball)
fall into different cells of the partition.  Because no ball appears more
than once within a cell, all collisions in a cell may run in parallel
without synchronization.  The partition is represented as a 2D array of
pairs of ball indices: each row of the array represents a cell, and each
pair of indices within a row represents a collision between those two
balls.
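The report does not spell out how the TBB partition is built. One simple way to obtain the property it relies on (no ball appears twice within a cell) is a greedy first-fit pass that places each pair in the first cell where neither ball is already used. The sketch below is that greedy scheme, offered as an assumption rather than a description of the actual TBB code; the fixed-size arrays and the `MAX_*` limits are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_BALLS 64
#define MAX_CELLS 128                 /* first-fit needs at most 2*n - 3 cells */
#define MAX_PAIRS_PER_CELL (MAX_BALLS / 2)

typedef struct { int a, b; } Pair;    /* indices of two balls */

typedef struct {
    Pair pairs[MAX_CELLS][MAX_PAIRS_PER_CELL]; /* row = cell, entry = collision */
    size_t size[MAX_CELLS];                    /* number of pairs in each cell */
    bool used[MAX_CELLS][MAX_BALLS];           /* is ball b already in cell c? */
    size_t cells;                              /* number of non-empty cells */
} Partition;

/* Greedy first-fit: put each pair (i, j), i < j, into the first cell that
 * contains neither ball. Returns false if n exceeds the fixed limits. */
bool build_partition(Partition *p, int n) {
    if (n < 0 || n > MAX_BALLS) return false;
    memset(p, 0, sizeof *p);
    for (int i = 0; i < n; ++i) {
        for (int j = i + 1; j < n; ++j) {
            size_t c = 0;
            while (c < MAX_CELLS && (p->used[c][i] || p->used[c][j])) ++c;
            if (c == MAX_CELLS) return false;
            p->pairs[c][p->size[c]++] = (Pair){i, j};
            p->used[c][i] = p->used[c][j] = true;
            if (c + 1 > p->cells) p->cells = c + 1;
        }
    }
    return true;
}
```

Each row `pairs[c][0 .. size[c] - 1]` is then a cell whose pairs are disjoint, which is the shape of data the host would copy into one buffer per cell.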

The host creates an OpenCL buffer for each cell of the partition and
copies the arrays into GPU memory.  To run the collideBalls() kernel,
the host iterates over the partition, setting the kernel argument to the
buffer that holds the current cell and launching the kernel with the
global work size set to the number of pairs in that cell.  Each thread
handles the collision between one pair of balls.
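The dispatch loop can be emulated on the CPU to make the ordering explicit. In the real program each iteration would correspond to binding the cell's buffer with `clSetKernelArg()` and launching with `clEnqueueNDRangeKernel()`; on an in-order command queue each launch completes before the next one begins, which is what keeps collisions from different cells apart. The `Pair` type, the `PairKernel` callback signature, `dispatch_cells()` and the `count_touches()` sample kernel are hypothetical names for this sketch.

```c
#include <stddef.h>

typedef struct { int a, b; } Pair;   /* indices of two balls */

/* Signature of one collideBalls() thread: handle pair cell[gid]. */
typedef void (*PairKernel)(const Pair *cell, size_t gid, void *state);

/* Emulate the host loop: one "launch" per cell, global size = cell size.
 * Returns the total number of threads executed. */
size_t dispatch_cells(const Pair *const *cells, const size_t *sizes, size_t ncells,
                      PairKernel kernel, void *state) {
    size_t items = 0;
    for (size_t c = 0; c < ncells; ++c) {
        /* Real program: clSetKernelArg() with cell c's buffer, then
         * clEnqueueNDRangeKernel() with global work size sizes[c]. */
        for (size_t gid = 0; gid < sizes[c]; ++gid)
            kernel(cells[c], gid, state);
        items += sizes[c];
        /* In-order queue: this launch finishes before cell c + 1 starts. */
    }
    return items;
}

/* Sample kernel for testing: count how often each ball is touched. */
void count_touches(const Pair *cell, size_t gid, void *state) {
    int *touches = state;
    touches[cell[gid].a]++;
    touches[cell[gid].b]++;
}
```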

This version of the program uses a different collision formula, based
on the impulse (J) exchanged between the two colliding balls, as
described on this page:

https://introcs.cs.princeton.edu/java/assignments/collisions.html


# Graphics

OpenGL is used to draw the balls on screen.  The program leverages
interoperability between OpenCL and OpenGL to minimize data transfer
between host and GPU.

There are two OpenGL vertex buffer objects (VBOs) that reside on the
GPU: one containing vertices and one containing colors.  An additional
OpenCL buffer, created with clCreateFromGLBuffer(), shares the same
storage as the vertex VBO, so OpenCL kernels can write vertex data that
OpenGL then draws without a round trip through the host.

The genVertices() OpenCL kernel fills the vertex buffer from the
positions of the balls.  The local work size is the number of vertices
per ball (currently 24), so there is one work group per ball and each
thread sets one vertex.  The group ID is used to index the position
array, and the global ID is used to index the vertex array.

After the vertices are set, the program calls glDrawArrays() with
GL_TRIANGLE_FAN to draw the balls on screen.