GPU-accelerated Neural Networks with Julia and oneAPI

November 28, '21

Hello again, dear readers! This time it’s going to be a quick post about how to get hardware acceleration in Julia for scientific computing, using neural network execution as an example. This is going to be specifically for Intel graphics cards (there’s a lot of Intel GPU stuff in this blog, huh? 😁).

About Julia

Julia is a good language for scientific computing for several reasons, in my opinion:

On Julia’s Documentation

One point that’s still quite lacking, however, is the documentation: you can easily find corners of the language that are not well documented, or simple idioms that you can’t seem to find anywhere. In those cases, reading existing projects is often the best way to get back on track.


As explained in the oneAPI.jl intro blogpost (be sure to also check the oneAPI.jl homepage and the actual Intel documentation about oneAPI), you can quickly download everything with a simple command (see what I was talking about when I said “pretty good packaging”?):

pkg> add oneAPI

📒 To check how to enter the pkg> prompt read the Pkg reference @ Julia docs

After everything is downloaded, we can simply start to type a new Julia program with:

using oneAPI

and then use the oneArray API to get things ACCELERATED!
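As a quick sanity check before building anything bigger, here is a minimal round trip through the GPU. It is only a sketch: it assumes a supported Intel GPU and driver are present, and it uses Float32 on the assumption that not every Intel GPU supports Float64. The variable names are made up for illustration.

```julia
using oneAPI

host_a = rand(Float32, 4, 4)     # ordinary array in main memory
gpu_a  = oneArray(host_a)        # copies the data to GPU memory
gpu_b  = gpu_a .* 2f0 .+ 1f0     # the broadcast is compiled for and run on the GPU
host_b = Array(gpu_b)            # copies the result back to main memory

# The GPU result should match the same expression evaluated on the CPU
println(host_b ≈ host_a .* 2f0 .+ 1f0)
```

The only GPU-specific calls are `oneArray(...)` to upload and `Array(...)` to download; everything in between is regular Julia array code.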

Feed Forward Neural Network

Let’s start out by defining the key data structure for neural networks: a matrix of real numbers.

const RealMat = AbstractArray{<:Real}

In this case we set the type to AbstractArray so later on we can construct a neural network for different types of arrays.
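To see what this alias actually accepts, here is a small check on plain CPU arrays (nothing here needs a GPU). Note that despite the name, the alias does not fix the number of dimensions, so vectors match as well.

```julia
const RealMat = AbstractArray{<:Real}

# Any array whose element type is a subtype of Real matches, whatever its
# dimensionality or storage
@show rand(2, 2) isa RealMat          # matrix of Float64
@show Float32[1, 2, 3] isa RealMat    # vectors match too
@show [1 2; 3 4] isa RealMat          # integers are Real as well
@show ["a", "b"] isa RealMat          # strings are not real numbers
```

The first three checks give `true` and the last one gives `false`.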

Now, let’s define some data structures that will help us write clearer code for computing the neuron layers:

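const LayerInput = RealMat

struct Layer
    weights::RealMat
    biases::RealMat
end

struct LayerOutput
    # The value of the layer output before running the activation function over
    # each element (the field name `partial` is a placeholder; only `final` is
    # referenced later in the post)
    partial::RealMat
    # And after running ϕ (activation function) over each element
    final::RealMat
end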

Once that’s done we have the entire structure of a neural network quite well defined. We can do the activation function next:

ϕ(x) = max(x, 0)

Here we are using the ReLU function.
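A tiny example of what ReLU does to a row of values, using a plain CPU array. The dot in `ϕ.(x)` is Julia's broadcasting syntax, which applies the function to every element.

```julia
ϕ(x) = max(x, 0)

x = [-1.5 0.0 2.0]
y = ϕ.(x)        # negative entries are clamped to zero, the rest pass through

println(y)       # [0.0 0.0 2.0]
```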

Okay, great work so far, but what about actually computing the output value of the nn? 😬 That comes now:

# Compute the outputs for each layer, given the input for the initial layer and
# the actual data (weights and biases) for each layer
function feed_forward(input::LayerInput, network::Vector{Layer})::Vector{LayerOutput}
    # If there's nothing in the network, then there should be nothing in the
    # output too... 🤔
    isempty(network) && return []

    # We are calling `feed_forward` recursively, so first(network) is always the
    # "current" element we want to calculate with
    layer = first(network)

    partial_output = input * layer.weights + layer.biases
    output = LayerOutput(partial_output, ϕ.(partial_output))

    if length(network) == 1
        # Last layer: there is nothing left to recurse into
        [output]
    else
        # Concatenate the current output with the outputs of the next layers
        [output; feed_forward(output.final, network[2:end])]
    end
end
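To make the shapes concrete, here is a single layer computed by hand on small CPU arrays. The numbers are made up for illustration; the point is the row-vector convention: a 1×n input times an n×m weight matrix gives a 1×m output, with one bias per neuron.

```julia
ϕ(x) = max(x, 0)

input   = [1.0 2.0]            # 1×2: one sample with two features
weights = [1.0 -1.0  0.5;
           0.0  2.0 -1.0]      # 2×3: two inputs feeding three neurons
biases  = [0.5 -4.0 0.0]       # 1×3: one bias per neuron

partial_output = input * weights + biases   # [1.5 -1.0 -1.5]
final = ϕ.(partial_output)                  # [1.5 0.0 0.0]

println(size(final))                        # (1, 3)
```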

And we are done! The only step that’s now missing is actually computing the value for some network. We are going to do this now ✨

gpu_network = [
    Layer(oneArray(rand(4096, 4096)), oneArray(rand(1, 4096))),
    Layer(oneArray(rand(4096, 4096)), oneArray(rand(1, 4096))),
    Layer(oneArray(rand(4096, 1)), oneArray(rand(1, 1))),
]

gpu_input = oneArray(rand(1, 4096))

println(last(feed_forward(gpu_input, gpu_network)).final)

This is where the actual magic happens. Because the layers are built from oneArray values, the matrix multiplication, the addition and the broadcast of ϕ in feed_forward all dispatch to GPU implementations: the data is copied to GPU memory when each oneArray is constructed, and the operations themselves run on the GPU.

Pretty simple when you think about it, right? It’s one of the best APIs for GPU programming I have ever seen, actually!
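One nice side effect of writing everything against AbstractArray is that the same code runs on the CPU if you just leave the oneArray calls out. Here is a minimal, self-contained sketch of that idea with a much smaller network. The `layer_outputs` helper is a simplified stand-in written for this example (a loop that keeps only the activated outputs), not the recursive function from above.

```julia
const RealMat = AbstractArray{<:Real}

struct Layer
    weights::RealMat
    biases::RealMat
end

ϕ(x) = max(x, 0)

# Hypothetical helper: same arithmetic as the post's feed_forward, written as
# a loop and returning only each layer's activated output
function layer_outputs(input::RealMat, network::Vector{Layer})
    outputs = RealMat[]
    current = input
    for layer in network
        current = ϕ.(current * layer.weights + layer.biases)
        push!(outputs, current)
    end
    outputs
end

# Plain CPU arrays: no oneArray anywhere
cpu_network = [
    Layer(rand(8, 8), rand(1, 8)),
    Layer(rand(8, 1), rand(1, 1)),
]
cpu_input = rand(1, 8)

outs = layer_outputs(cpu_input, cpu_network)
println(length(outs))         # one output per layer: 2
println(size(last(outs)))     # (1, 1)
```

Nothing in `layer_outputs` mentions a concrete array type, so which implementation of `*`, `+` and broadcasting gets called depends only on the arrays you pass in. That also makes a CPU run a handy reference when checking GPU results.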

Future Work

In this example there are still some things we can probably optimize, especially around how we collect the output: in a growable vector in main memory. That means we are probably pausing the GPU a lot to copy data back.

If instead we did all operations writing back to GPU memory and copied only at the end, we would probably see better performance. Those are things to play around with, however. This post was just an introduction for you to see how easy it is to turn your GPU into a heater with Julia!
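For the curious, here is one possible shape of that "copy only at the end" idea. It is just a sketch, not something I have benchmarked: `forward_last` is a hypothetical helper that folds the input through a list of `(weights, biases)` pairs and keeps only the latest activation. It is shown with CPU arrays so it runs anywhere; the assumption is that passing oneArray values instead would keep the intermediates in GPU memory until the final `Array(...)` call.

```julia
ϕ(x) = max(x, 0)

# Hypothetical helper: thread the input through every layer, keeping only the
# most recent activation instead of collecting all of them
forward_last(input, layers) =
    foldl((x, (w, b)) -> ϕ.(x * w + b), layers; init = input)

layers = [
    ([1.0 0.0; 0.0 1.0], [0.0 -5.0]),    # 2×2 weights, 1×2 biases
    (ones(2, 1),         fill(0.5, 1, 1)) # 2×1 weights, 1×1 bias
]
input = [2.0 3.0]

# A single conversion at the very end; for GPU arrays this would be the only
# explicit copy back to main memory
result = Array(forward_last(input, layers))
println(result)   # [2.5;;] on recent Julia versions, i.e. a 1×1 matrix holding 2.5
```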

See you all! 💕