Mark A Becker
 Parallel Processing With a GPU
Project Overview

Project team:
Jason Allen, Mark Becker, Cameron Brown, Michael Sawaya

The project description:
General-purpose computing on graphics processing units (GPGPU) is a relatively new field that applies the processing architecture of the graphics processing unit (GPU) to general computation. GPUs are normally used for computer graphics. We use this many-core architecture, and the GPU’s parallelization capabilities, to perform computations on very large data sets.

We designed and developed a ray tracer that demonstrates the OpenCL framework. A suite of benchmark tests was developed to compare relative run times. The benchmarks were run on several systems using native C code on the CPU, and OpenCL on both the CPU and the GPU. The project had three main parts: building the ray tracer, utilizing the OpenCL framework, and creating the benchmarks.

This page is in six sections:
  1. Parallel Processing
  2. Ray tracer
  3. GPGPU and OpenCL
  4. The Program
  5. The Benchmarks
  6. Extra Material
What is a Senior Project @ SJSU

Senior projects are to be done during your last two semesters.

Semester 1 course description: Individual or group design projects. Proposal preparation with plans and specifications; oral and written reports; professional seminars.

Semester 2 course description: Construction, testing, and evaluation of the design from 195A culminating in demonstrations and written and oral presentations to faculty and peers.
 Section 1: Parallel Processing
What is Parallel Processing?

Wikipedia: Parallel processing
"The simultaneous use of more than one CPU or processor core to execute a program or multiple computational threads."

Wikipedia: Parallel computing
Parallel computing is a form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently ("in parallel").
Why is Parallel Processing Important

The problem today is that as clock frequency increases, power consumption increases too.

Primary driver is Moore’s Law:
  • Number of Transistors Doubles every 18-24 Months
  • Stated by Gordon Moore, Intel Co-Founder, in 1965
  • Prediction has proven valid over the long term
  • "Prediction" has been the "Law" for over 40 years
  • Clock frequency scaling limits have been reached
  • Instruction Level Parallelism limits have been reached
  • Era of Single Core Performance increases has ended
  • No More "Free Lunch" for Software Programmers
    • Multiple cores will directly expose parallelism to software
  • All future micro-processor designs will be Multi-Core
    • Evident in Chip Manufacturer’s Road Maps
 Section 2: Ray tracer
What is a Ray tracer?

Ray tracing is a technique that generates an image by tracing the path of light through pixels in a view plane and simulating the effects of its intersections with virtual objects. Ray tracing is capable of producing images of photo-realistic quality, usually higher than that of scan-line rendering methods. Unfortunately, this level of realism comes at a greater computational cost.

Parts of a Ray tracer
  • Coordinate System – A Ray tracer uses a basic Cartesian coordinate system. All of the objects, including the rays themselves, have an origin in the X Y Z system.
  • Camera – This is the single point of origin of all primary rays, which pass through the area of the View Plane.
  • View Plane – The View Plane, in a sense, is the image that is created. It has a width and height of pixels that the rays, originating from the Camera, pass through.
  • Primitive – A primitive in ray tracing is an object that has several attributes: a position (center), size, material, reflective and refractive values, and color.
  • Light Source – The light source is a primitive where a Boolean variable "is_light" is set to true.
  • Plane – A plane is a primitive that has a center, a material, and a vector that determines its size and angle in the three-dimensional scene.
  • Sphere – A sphere is a primitive that has a center, a material, and a vector that determines its size and position in the three-dimensional scene.
  • Rays – A ray, in the program, is just a pair of vectors: an origin and a direction.
Our Ray tracer Architecture

[Illustration: general flow of the Ray tracer]

The illustration above shows the general flow of the Ray tracer portion of the program. The Ray tracer is a partially recursive algorithm: it recurses only when a new ray needs to be spawned, which happens when a reflection or refraction influences the color returned.
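
As a rough sketch of what spawning a reflected ray involves, the standard mirror formula r = d - 2(d . n)n gives the new direction from the incoming direction d and the unit surface normal n. The function names and the depth-limit rule below are assumptions for illustration; the project's actual spawn conditions are not shown on this page.

```c
#include <assert.h>
#include <math.h>

typedef struct { float x, y, z; } Vec3;

/* Dot product of two vectors. */
static float vec3_dot(Vec3 a, Vec3 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* Mirror the incoming direction d about the unit normal n: r = d - 2 (d . n) n. */
static Vec3 reflect_dir(Vec3 d, Vec3 n) {
    assert(fabsf(vec3_dot(n, n) - 1.0f) < 1e-4f);   /* n must be unit length */
    float k = 2.0f * vec3_dot(d, n);
    Vec3 r = { d.x - k * n.x, d.y - k * n.y, d.z - k * n.z };
    return r;
}

/*
 * Hypothetical spawn rule: recurse only if the material reflects at all
 * and the trace-depth limit has not been reached.
 */
static int should_spawn_reflection(float reflectivity, int depth, int max_depth) {
    return reflectivity > 0.0f && depth < max_depth;
}
```

A ray travelling down and to the right, (1, -1, 0), that hits a floor with normal (0, 1, 0) leaves as (1, 1, 0).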
  • Camera – The Camera is a position in the three-dimensional scene that is the origin for all the primary rays. The Camera is positioned relative to the View Plane, and the two work together to create the view.
  • View Plane – The View Plane is a position as well, but it has a height and width derived from the actual height and width of the output image. The View Plane is divided according to the granularity the image is to have. In other words, the View Plane is the set of pixels that the ray vectors go through. One ray is sent for every pixel. The information collected by each ray is added to an array of integers called pixels. This array has a size of height x width.
  • Primitive – A primitive in ray tracing is an object that has several attributes: a position (center), size, material, reflective and refractive values, and color. Primitives also have types assigned to them, for example "sphere", "plane", and "light source".
  • Scene – The scene is made up of a list of primitives. This list exists both on the Host side and the Kernel side.
  • Intersect – The Intersect step calculates whether a ray hits a primitive. For a sphere, for example, the test takes the center and the size (radius) of the sphere and checks whether the ray crosses the surface, that is, whether the ray passes within one radius of the center. If it does, the color of the material is returned.
  • Queue – The Queue is used to collect the virtual rays (origin and direction) so that each trace depth can be recorded and then walked back in reverse once the ray has reached the end.
  • Output – The Output for the program is the run time for the kernel and the output image file in a bitmap format.
 Section 3: GPGPU and OpenCL
What is GPGPU and OpenCL?

GPGPU (general-purpose computing on graphics processing units) is a relatively new field of computing that attempts to use standard GPUs, traditionally used for graphics rendering, to perform more general computational tasks.

OpenCL (Open Computing Language) is an industry standard framework for programming computers composed of a combination of CPUs, GPUs, and other processors. OpenCL includes a language (based on C99) for writing kernels (functions that execute on OpenCL devices) and APIs that are used to define and control the platforms. OpenCL provides parallel computing using task-based and data-based parallelism.

Modern GPUs are very efficient at manipulating computer graphics, and their highly parallel structure makes them more effective than general-purpose CPUs for algorithms where processing of large blocks of data is done in parallel.

The GPU is divided into multiple "Compute Units", which are divided into multiple "Stream Cores", which are divided into multiple "Processing Elements".
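
This hardware hierarchy has a software counterpart in OpenCL: work-items are grouped into work-groups, and each work-group is scheduled on a single compute unit. The sketch below emulates a one-dimensional launch sequentially in plain C to show how the IDs relate. In a real kernel the built-ins `get_global_id(0)`, `get_group_id(0)`, and `get_local_id(0)` supply these values; `trace_pixel` and `run_ndrange` are hypothetical names, and the value written per pixel is a placeholder rather than a color.

```c
#include <assert.h>

/* Stand-in for a kernel body: one work-item handles one pixel. */
static void trace_pixel(int global_id, int width, int *pixels) {
    int x = global_id % width;            /* column of this pixel */
    int y = global_id / width;            /* row of this pixel */
    pixels[global_id] = y * 1000 + x;     /* placeholder value, not a color */
}

/*
 * Sequentially emulate a 1-D launch of global_size work-items split into
 * work-groups of local_size. On a device the work-items run concurrently.
 * Returns the number of work-groups.
 */
static int run_ndrange(int global_size, int local_size, int width, int *pixels) {
    /* OpenCL 1.x requires the global size to be a multiple of the local size. */
    assert(width > 0 && local_size > 0 && global_size % local_size == 0);

    int num_groups = global_size / local_size;
    for (int group_id = 0; group_id < num_groups; ++group_id) {
        for (int local_id = 0; local_id < local_size; ++local_id) {
            /* With a zero global offset: global = group * local_size + local. */
            int global_id = group_id * local_size + local_id;
            trace_pixel(global_id, width, pixels);
        }
    }
    return num_groups;
}
```

Launching 16 work-items in groups of 4 over a 4 x 4 image yields 4 work-groups, one per image row in this layout.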
GPU Architecture

[Illustration: GPU architecture]
 Section 4: The Program
The Program Architecture

The program contains two separate but similar ray tracers. One is a sequential, CPU-based tracer that is used as a control when benchmarking performance. The other is an OpenCL-based, parallel tracer. Since both the program and the OpenCL kernel are written in C, the two tracers share the majority of their code, which allows valid comparisons to be made between their execution times.
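
One way such code sharing can work is to guard the few OpenCL-specific keywords behind the `__OPENCL_VERSION__` macro, which OpenCL C compilers predefine and host C compilers do not. The sketch below is an assumption about how shared code could be organized, not the project's actual source; `TRACER_GLOBAL`, `TRACER_ASSERT`, `pack_rgb`, `store_pixel`, and the 0xRRGGBB pixel format are all hypothetical.

```c
/*
 * __OPENCL_VERSION__ is predefined when an OpenCL C compiler builds this
 * code as part of a kernel; a host C compiler leaves it undefined.
 */
#ifdef __OPENCL_VERSION__
#define TRACER_GLOBAL __global            /* device buffers live in global memory */
#define TRACER_ASSERT(cond) ((void)0)     /* no assert() inside a kernel */
#else
#include <assert.h>
#define TRACER_GLOBAL                     /* plain pointer on the host */
#define TRACER_ASSERT(cond) assert(cond)
#endif

/* Clamp a color channel to the range [0, 1]. */
float clamp01(float v) {
    return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v);
}

/* Pack three channels into one int as 0xRRGGBB (an assumed pixel format). */
int pack_rgb(float r, float g, float b) {
    int ri = (int)(clamp01(r) * 255.0f + 0.5f);
    int gi = (int)(clamp01(g) * 255.0f + 0.5f);
    int bi = (int)(clamp01(b) * 255.0f + 0.5f);
    return (ri << 16) | (gi << 8) | bi;
}

/* Write one pixel into the height x width integer array, row by row. */
void store_pixel(TRACER_GLOBAL int *pixels, int width, int x, int y,
                 float r, float g, float b) {
    TRACER_ASSERT(width > 0 && x >= 0 && x < width && y >= 0);
    pixels[y * width + x] = pack_rgb(r, g, b);
}
```

Because the same functions compile in both environments, any difference in run time comes from how the work is scheduled rather than from different arithmetic.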
The Program Components
  • User Interface design - The User Interface is written in Java. Java was used for portability and ease of creating the interface. The User Interface directly calls the executable file created by the program. The UI also displays the runtime log and the image created by the program.
  • Host's component design - The Host component is responsible for discovering the hardware, creating the OpenCL program, and creating and executing the OpenCL Kernel. The Host is written in C and includes the OpenCL API so that it can use the OpenCL-specific data types. The Host is tightly coupled with the rest of the program, but it will not execute if the arguments passed to the Main function explicitly say to run Non-OpenCL.
  • Kernel's component design - The Kernel design is purposefully simple. It is not expected to, nor can it, interact with any component other than the Host. Each Kernel is designed to be independent from the rest of the program, even from the other Kernels.
  • Non-OpenCL's component design - The Non-OpenCL's component is the most traditional part, in a programming sense, of the program. The Non-OpenCL component consists of the Ray tracing algorithm and interfaces with the data structure created by the Ray tracer Engine. But because of the possible use of OpenCL, the interaction with the other components is kept to a minimum.
The Program UI

The Ray tracer frontend is the program entry point and simply serves to create the main JFrame, which consists of a single OptionsPanel. The OptionsPanel provides easy access to every option available in the command-line program. When the user clicks the Execute Run button, an instance of RaytracerRun is created and started in a new thread. The user interface also locks itself, preventing multiple runs from executing simultaneously.

The RaytracerRun object is responsible for launching and monitoring the Parallel Processing With a GPU command-line process. When it detects that the ray tracer has completed, it creates an instance of OutputFrame, which is a window displaying the results of the ray trace.

The main window is implemented using a JPanel with a GroupLayout consisting of numerous text fields, labels, and checkboxes.
The Program Output

The run display window is a JFrame with two components: a scrolling text field that contains the output text of the ray trace, and a panel with the output image. Having access to the text allows for viewing the device and settings used in that run, as well as the program runtime.

The image is displayed using the BackgroundPanel class, which takes an Image and sets it as the background of a JPanel. After finishing execution, the main window unlocks so another run can be executed, potentially with different options.
 Section 5: The Benchmarks
The Benchmark Data

Using different combinations of workgroups, trace iterations, image sizes, etc., we compared run times across different graphics cards and central processing units.

The Decreasing Trace-depth benchmark is a collection of run times in which each run differs in the number of recursive depths created.

Cameron’s tests show that the AMD OpenCL CPU runs took uncharacteristically long. We found that even though the CPU has four physical cores, the AMD OpenCL driver recognized only one of them, which is why those times are close to the Non-OpenCL run times. We believe this is because the CPU, a second-generation Intel i7, is reasonably new and AMD’s driver has not yet been updated for it. The tests do show that Intel’s OpenCL driver sees all the cores as expected.
The Benchmark Speed-ups

As all the charts show, the AMD OpenCL GPU times were the fastest. In general AMD OpenCL GPU ran 7 to 13 times faster than the native C run times.
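
"Times faster" here is the ratio of the native C run time to the OpenCL run time for the same scene and settings. A minimal sketch of that arithmetic follows; the function name `speedup` is hypothetical.

```c
#include <assert.h>

/* Speed-up of an OpenCL run relative to the native C run of the same scene. */
static double speedup(double native_seconds, double opencl_seconds) {
    assert(native_seconds > 0.0 && opencl_seconds > 0.0);
    return native_seconds / opencl_seconds;
}
```

For example, a placeholder native run of 10.0 s against an OpenCL run of 1.0 s gives a speed-up of 10. These numbers are illustrative only and are not measurements from the project.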

What was a bit unexpected were the AMD and Intel OpenCL CPU times, which were fast for four-core processors. More exploration would be needed to see whether optimization can yield further improvement.

Even though we developed a ray tracer, the purpose of this project was to demonstrate the performance improvement of OpenCL over a generic variant of a ray tracer using native C. As shown in our benchmark tests, we found that we can develop a ray tracer that can run over 10 times faster than its native C counterpart. The future of this project is further testing on additional hardware, code optimization and comparison to nVidia’s CUDA framework.
 Section 6: Extra Material
The Presentation Board

A presentation board was designed for the project. The design and photo printing were done by me. The left and right panels were 12" x 36" and the center panel was 24" x 36". Click the images or the links below to view a PDF version of the art.

The Left Panel... [A new window will open]
The Center Panel... [A new window will open]
The Right Panel... [A new window will open]
The Report

The above information, in greater detail, can be found in the report created for this project. The program code is in the appendix of the report. The report is a 4.1MB PDF file.

The Report... [A new window will open]
Contact   ©2012 Mark Becker