@omaralvarez
Last active July 12, 2016 07:12
Shared Memory Gather
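// Tiled gather: each thread accumulates one output element, staging the input
// through shared memory one tile at a time.
// TILE_SIZE, outInvariant() and outDependent() are not defined in this snippet
// and must be provided by the surrounding code.
// The tile is indexed by threadIdx.x, so launch with blockDim.x == TILE_SIZE.
__global__ void s2g_sm_gpu_gather_kernel(int *in, int *out, int len) {
    __shared__ int values[TILE_SIZE];

    // Global index of the output element owned by this thread
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    int sum = 0;

    // Every thread runs every iteration, including threads with id >= len,
    // so each __syncthreads() is reached by the whole block.
    for (int k = 0; k < (len - 1) / TILE_SIZE + 1; ++k) {
        int start = k * TILE_SIZE;
        int lid = start + threadIdx.x;

        // Stage one tile, applying outInvariant() as each element is loaded
        if (lid < len)
            values[threadIdx.x] = outInvariant(in[lid]);
        else
            values[threadIdx.x] = 0;
        __syncthreads();  // tile fully loaded before any thread reads it

        for (int l = 0; l < TILE_SIZE; ++l) {
            if (start + l < len)
                sum += outDependent(values[l], start + l, id);
        }
        __syncthreads();  // all reads done before the next tile overwrites values
    }

    if (id < len)
        out[id] = sum;
}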
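
A host-side reference can help when checking the kernel's output. The sketch below mirrors the same tile-by-tile accumulation on the CPU in plain C++, which also compiles as host code under nvcc. The bodies of `outInvariant` and `outDependent` here are hypothetical stand-ins chosen only so the result is easy to check by hand (an inclusive prefix sum); the gist does not show the real ones. `gather_reference` and its `tileSize` parameter are likewise names invented for this example.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins: the gist does not define these helpers.
static int outInvariant(int value) { return value; }
static int outDependent(int value, int inIdx, int outIdx) {
    // Count an input only if it sits at or before the output position.
    return inIdx <= outIdx ? value : 0;
}

// CPU reference that walks the input tile by tile, like the kernel does.
std::vector<int> gather_reference(const std::vector<int>& in, int tileSize) {
    assert(tileSize > 0);
    const int len = static_cast<int>(in.size());
    std::vector<int> out(len, 0);
    std::vector<int> values(tileSize);  // plays the role of the shared tile
    const int numTiles = (len + tileSize - 1) / tileSize;

    for (int k = 0; k < numTiles; ++k) {
        const int start = k * tileSize;

        // Stage the tile, zero-padding past the end of the input.
        for (int t = 0; t < tileSize; ++t) {
            const int lid = start + t;
            values[t] = lid < len ? outInvariant(in[lid]) : 0;
        }

        // Every output element accumulates its contribution from this tile.
        for (int id = 0; id < len; ++id) {
            for (int l = 0; l < tileSize && start + l < len; ++l) {
                out[id] += outDependent(values[l], start + l, id);
            }
        }
    }
    return out;
}
```

With these stand-ins, calling `gather_reference` on `{1, 2, 3, 4, 5}` yields the running totals `{1, 3, 6, 10, 15}` for any positive tile size. Comparing that vector against the kernel's `out` array only makes sense if the device-side helpers are defined the same way.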