I need the kernel code and a host stub function that can be called with four parameters:
pointer to the output vector, pointer to the input matrix, pointer to the input
vector, and the number of elements in each dimension. Use one thread to
calculate each output vector element. The kernel takes an input matrix B and a vector C and
produces one output vector A. Each element of the output vector A is the dot
product of one row of the input matrix B and C, i.e., A[i] = Σj B[i][j] * C[j].
For simplicity, we will only handle square matrices whose elements are
single-precision floating-point numbers.
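
One possible shape for this is sketched below; treat it as a minimal sketch, not a definitive implementation. The names matVecMulKernel and matVecMul, the block size of 256 threads, and the row-major layout of B are assumptions that the description above does not fix. The sketch also assumes the three pointers passed to the host stub are host pointers, and it omits error checking on the CUDA calls for brevity.

```cuda
#include <cuda_runtime.h>

// Kernel: one thread computes one output element,
// A[row] = sum over j of B[row][j] * C[j].
// B is assumed to be an n x n matrix stored in row-major order.
__global__ void matVecMulKernel(float* A, const float* B, const float* C, int n)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n) {                       // guard threads past the last row
        float sum = 0.0f;
        for (int j = 0; j < n; ++j) {
            sum += B[row * n + j] * C[j];
        }
        A[row] = sum;
    }
}

// Host stub (hypothetical name). A, B, and C are assumed to be host pointers;
// n is the number of elements in each dimension.
void matVecMul(float* A, const float* B, const float* C, int n)
{
    size_t matBytes = (size_t)n * n * sizeof(float);
    size_t vecBytes = (size_t)n * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;

    // Allocate device memory and copy the inputs over.
    cudaMalloc((void**)&dA, vecBytes);
    cudaMalloc((void**)&dB, matBytes);
    cudaMalloc((void**)&dC, vecBytes);
    cudaMemcpy(dB, B, matBytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dC, C, vecBytes, cudaMemcpyHostToDevice);

    // Launch enough blocks to cover all n rows (block size is an assumption).
    int threadsPerBlock = 256;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    matVecMulKernel<<<blocks, threadsPerBlock>>>(dA, dB, dC, n);

    // Copy the result back; this cudaMemcpy waits for the kernel to finish.
    cudaMemcpy(A, dA, vecBytes, cudaMemcpyDeviceToHost);

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
}
```

As a quick check, calling matVecMul with n = 2, B = {1, 2, 3, 4}, and C = {1, 1} should give A = {3, 7}. Because each thread walks its own row of B, neighbouring threads read addresses n floats apart, so this layout is simple but not coalesced; that trade-off is acceptable for a first version.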