c++ - Make for (rowIdx = 1 ...) work using CUDA threads
I have this loop in C++:

    for (rowidx = 1; rowidx < (nbrows - 1); rowidx++)

How should I handle it when using CUDA?

In CUDA I normally write:

    if (rowidx < arraysize) ...

but if I set rowidx = 1 before the if (rowidx < arraysize) check, it doesn't work.
----update ----------------------------
A simple example to illustrate:

    __global__ void test_func(int *a_in, int *b_in, int *c_out)
    {
        size_t rowidx = blockIdx.x * blockDim.x + threadIdx.x;
        rowidx = 1;
        if (rowidx < array_size)
            c_out[rowidx] = a_in[rowidx] * b_in[rowidx];
    }

    // fill matrices
    for (int i = 0; i < array_size; i++) {
        a_in[i] = i;
        b_in[i] = i + 1;
        c_out[i] = 0;
    }

If I use rowidx = 1, only the first result is computed correctly. The rest are zeros.
For a simple replacement of the loop, given the functionality in your example, the kernel can look like this:

    __global__ void test_func(int *a_in, int *b_in, int *c_out)
    {
        size_t rowidx = blockIdx.x * blockDim.x + threadIdx.x;
        if (rowidx > 0 &&          // ensure rowidx is at least 1
            rowidx < array_size)   // ensure rowidx is not out of bounds
        {
            c_out[rowidx] = a_in[rowidx] * b_in[rowidx];
        }
    }

All threads compute different array elements, from index 1 to array_size - 1. Assigning rowidx = 1 inside the kernel makes every thread work on element 1, which is why the rest stayed zero. Be aware that the "real" first element, c_out[0], won't be computed in this case.
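An alternative to guarding with rowidx > 0 is to shift the thread index by one, so that thread 0 handles row 1 and no thread is left idle at the start. The following is only a minimal sketch under some assumptions: the names multiply_interior and multiply_interior_on_gpu are hypothetical, the array length is passed as a parameter nbrows instead of the array_size constant, the upper bound nbrows - 1 is taken from the loop in the question, and error checking of the CUDA calls is omitted for brevity.

```cuda
#include <cassert>
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: one thread per interior row, i.e. rowidx = 1 .. nbrows - 2,
// mirroring for (rowidx = 1; rowidx < (nbrows - 1); rowidx++).
__global__ void multiply_interior(const int *a_in, const int *b_in,
                                  int *c_out, size_t nbrows)
{
    // Shift the global thread index by 1 so that thread 0 maps to row 1.
    size_t rowidx = blockIdx.x * blockDim.x + threadIdx.x + 1;

    // Equivalent to rowidx < nbrows - 1, written to avoid unsigned underflow.
    if (rowidx + 1 < nbrows)
        c_out[rowidx] = a_in[rowidx] * b_in[rowidx];
}

// Hypothetical host wrapper (not from the original post).
// Assumption: return codes of the CUDA calls are not checked here.
std::vector<int> multiply_interior_on_gpu(const std::vector<int> &a,
                                          const std::vector<int> &b)
{
    assert(a.size() == b.size());
    const size_t n = a.size();
    std::vector<int> c(n, 0);
    if (n < 3)
        return c;  // no interior rows to compute

    const size_t bytes = n * sizeof(int);
    int *d_a = nullptr, *d_b = nullptr, *d_c = nullptr;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemset(d_c, 0, bytes);  // rows 0 and n - 1 stay zero

    // Launch enough threads to cover the n - 2 interior rows.
    const unsigned int threads_per_block = 256;
    const size_t interior = n - 2;
    const unsigned int blocks = static_cast<unsigned int>(
        (interior + threads_per_block - 1) / threads_per_block);
    multiply_interior<<<blocks, threads_per_block>>>(d_a, d_b, d_c, n);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(c.data(), d_c, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return c;
}
```

With the fill from the question (a_in[i] = i, b_in[i] = i + 1), this sketch leaves the first and last elements of the result at zero and fills the elements in between. Whether to shift the index or to guard with rowidx > 0 is largely a matter of taste; the shifted form just avoids launching one thread that does nothing.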