C++ - Make for (rowIdx = 1...) work using CUDA threads
I have this loop in C++:
    for (rowIdx = 1; rowIdx < (nbRows - 1); rowIdx++)
How should I handle it in CUDA?
In CUDA I normally write:
    if (rowIdx < arraysize) ...
If I set rowIdx = 1 before the if (rowIdx < arraysize) check, it doesn't work.
---- Update ----------------------------
A simple example as an illustration:
    __global__ void test_func(int *a_in, int *b_in, int *c_out)
    {
        size_t rowIdx = blockIdx.x * blockDim.x + threadIdx.x;
        rowIdx = 1;
        if (rowIdx < array_size)
            c_out[rowIdx] = a_in[rowIdx] * b_in[rowIdx];
    }

    // fill matrices
    for (int i = 0; i < array_size; i++) {
        a_in[i] = i;
        b_in[i] = i + 1;
        c_out[i] = 0;
    }
If I use rowIdx = 1, only the first result comes out correctly; the rest are zeros.
For a simple replacement of the loop, given the functionality shown in the example, the kernel can look like this:
    __global__ void test_func(int *a_in, int *b_in, int *c_out)
    {
        size_t rowIdx = blockIdx.x * blockDim.x + threadIdx.x;
        if (rowIdx > 0 &&           // ensure rowIdx is at least 1
            rowIdx < array_size)    // ensure rowIdx is not out of bounds
        {
            c_out[rowIdx] = a_in[rowIdx] * b_in[rowIdx];
        }
    }
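Assigning rowIdx = 1 inside the kernel overwrites every thread's own index, so all threads write to the same element; that is why only one result appeared. With the guard above, each thread keeps its own index and the threads compute different array elements, from index 1 to array_size-1. Be aware that the "real" first element, c_out[0], won't be computed in this case.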
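As an alternative to the guard, here is a minimal sketch (not from the original answer) that shifts the thread index by one, so thread 0 maps to row 1 and no thread sits idle at index 0. The upper bound nbRows - 1 mirrors the loop in the question. The names test_func_offset and multiply_interior_rows, the nbRows parameter, and the block size of 256 are assumptions made for illustration, and error checking of the CUDA calls is left out for brevity.

```cuda
#include <vector>
#include <cuda_runtime.h>

// One thread per row. The "+ 1" shifts the range so that thread 0 handles
// row 1, mirroring: for (rowIdx = 1; rowIdx < nbRows - 1; rowIdx++)
__global__ void test_func_offset(const int *a_in, const int *b_in,
                                 int *c_out, int nbRows)
{
    int rowIdx = blockIdx.x * blockDim.x + threadIdx.x + 1;
    if (rowIdx < nbRows - 1)  // same upper bound as the original loop
        c_out[rowIdx] = a_in[rowIdx] * b_in[rowIdx];
}

// Hypothetical host helper: copies the inputs to the device, launches the
// kernel over rows 1 .. nbRows-2, and copies the result back.
// Return codes of the CUDA calls are not checked in this sketch.
std::vector<int> multiply_interior_rows(const std::vector<int> &a,
                                        const std::vector<int> &b)
{
    const int nbRows = static_cast<int>(a.size());
    std::vector<int> c(nbRows, 0);
    if (nbRows < 3 || b.size() != a.size())
        return c;  // no interior rows, or mismatched inputs

    const size_t bytes = static_cast<size_t>(nbRows) * sizeof(int);
    int *d_a = nullptr, *d_b = nullptr, *d_c = nullptr;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemset(d_c, 0, bytes);  // rows 0 and nbRows-1 stay zero

    const int threads = 256;                            // assumed block size
    const int work = nbRows - 2;                        // number of interior rows
    const int blocks = (work + threads - 1) / threads;  // round up
    test_func_offset<<<blocks, threads>>>(d_a, d_b, d_c, nbRows);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(c.data(), d_c, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return c;
}
```

With the fill pattern from the question (a[i] = i, b[i] = i + 1) and five elements, this should leave c[0] and c[4] at zero and fill c[1] through c[3]. Sizing the grid from the number of interior rows rather than the full array avoids launching threads that would only fail the bounds check.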