The core idea in vector processing is to use a single instruction to operate on multiple elements at once. Instead of programming at the assembly level, we are going to work with vector programming at the C level using intrinsics.

Intrinsic (definition): These are special functions and types that we have implemented and that you can use within a C program to invoke vector operations and vector registers, without having to deal with assembly-level syntax or a fixed set of 32 registers.

__cs295_vec_int, __cs295_vec_float
: The vector register types, holding elements of either int or float type. Each vector register holds VLEN elements. These registers can be passed as parameters to the vector functions.

{: .table-striped .table-bordered}
Type | Description |
---|---|
__cs295_vec_int | Vector of type int |
__cs295_vec_float | Vector of type float |
__cs295_mask | Mask vector of type bool |

e.g.,

    __cs295_vec_int x_v, y_v;

__cs295_mask
: Masks are bool vectors of width VLEN that specify whether a particular lane (index) is active in an operation. If the mask bit is 1, the lane performs the operation; if it is 0, the lane is inactive and its result element is not modified. We illustrate with vmult below.

      for i = 0 to VLEN
          if mask[i] == 1 then
              result[i] = x_v[i] * y_v[i]
          else
              // DO NOTHING
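
The masking rule above can be tried out without the course library. The following is a minimal scalar sketch, not the real implementation: `Vec`, `Mask`, `emu_vmult`, `demo_masked_mult`, and the value `VLEN = 4` are all hypothetical stand-ins chosen for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for illustration; the course library defines the real types.
constexpr std::size_t VLEN = 4;       // assumed vector width
using Vec = std::array<int, VLEN>;    // plays the role of __cs295_vec_int
using Mask = std::array<bool, VLEN>;  // plays the role of __cs295_mask

// Scalar emulation of a masked multiply: inactive lanes are left untouched.
void emu_vmult(Vec &result, const Vec &x_v, const Vec &y_v, const Mask &mask) {
    for (std::size_t i = 0; i < VLEN; ++i) {
        if (mask[i]) {
            result[i] = x_v[i] * y_v[i];
        }
    }
}

// Multiply with only lanes 0 and 2 active; lanes 1 and 3 keep the old value -1.
Vec demo_masked_mult() {
    Vec result = {-1, -1, -1, -1};
    const Vec x_v = {1, 2, 3, 4};
    const Vec y_v = {10, 20, 30, 40};
    const Mask mask = {true, false, true, false};
    emu_vmult(result, x_v, y_v, mask);
    assert(result[1] == -1 && result[3] == -1);  // inactive lanes unchanged
    return result;
}
```

The point to notice is that the result register must already hold sensible values in the inactive lanes, because the operation never writes them.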

{: .table-striped .table-bordered}
Return value | Function | Description |
---|---|---|
__cs295_mask | _cs295_init_ones(width) | Return a mask initialized to 1 in the first width lanes and 0 in the others |
int | _cs295_cntbits(mask) | Count the number of 1s in mask |

_cs295_init_ones

    for i = 0 to VLEN
        mask[i] = 0
    for i = 0 to width
        mask[i] = 1
    return mask

_cs295_cntbits

    ones = 0
    for i = 0 to VLEN
        if (mask[i] == 1)
            ones = ones + 1
    return ones
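
As a rough illustration of the two mask helpers above, here is a scalar sketch. The names `emu_init_ones`, `emu_cntbits`, `Mask`, and the width `VLEN = 8` are assumptions made for this example and are not part of the course library.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t VLEN = 8;       // assumed vector width
using Mask = std::array<bool, VLEN>;  // hypothetical stand-in for __cs295_mask

// Emulates _cs295_init_ones: the first `width` lanes are 1, the rest are 0.
Mask emu_init_ones(std::size_t width) {
    assert(width <= VLEN);
    Mask mask{};  // value-initialized, so every lane starts at 0
    for (std::size_t i = 0; i < width; ++i) {
        mask[i] = true;
    }
    return mask;
}

// Emulates _cs295_cntbits: counts the lanes that are set.
int emu_cntbits(const Mask &mask) {
    int ones = 0;
    for (bool bit : mask) {
        if (bit) {
            ++ones;
        }
    }
    return ones;
}
```

For example, `emu_cntbits(emu_init_ones(3))` evaluates to 3, since exactly the first three lanes are set.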

{: .table-striped .table-bordered}
Return value | Function | Description |
---|---|---|
__cs295_mask | _cs295_mask_not(maskA) | Return the inverse of maskA |
__cs295_mask | _cs295_mask_and(maskA, maskB) | Return (maskA & maskB) |
__cs295_mask | _cs295_mask_or(maskA, maskB) | Return (maskA or maskB) |

_cs295_mask_and(maskA, maskB)

    # The result has 1s only where both maskA and maskB are 1
    for i = 0 to VLEN
        mask[i] = maskA[i] & maskB[i]
    return mask
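
The three mask operations compose in the usual Boolean way. The sketch below emulates them lane by lane under assumed names (`emu_mask_not`, `emu_mask_and`, `emu_mask_or`, `lanes_in_a_not_b`, `check_de_morgan`, `Mask`, `VLEN = 4`), none of which come from the course library.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t VLEN = 4;       // assumed vector width
using Mask = std::array<bool, VLEN>;  // hypothetical stand-in for __cs295_mask

Mask emu_mask_not(const Mask &a) {
    Mask r{};
    for (std::size_t i = 0; i < VLEN; ++i) r[i] = !a[i];
    return r;
}

Mask emu_mask_and(const Mask &a, const Mask &b) {
    Mask r{};
    for (std::size_t i = 0; i < VLEN; ++i) r[i] = a[i] && b[i];
    return r;
}

Mask emu_mask_or(const Mask &a, const Mask &b) {
    Mask r{};
    for (std::size_t i = 0; i < VLEN; ++i) r[i] = a[i] || b[i];
    return r;
}

// Lanes that are set in `a` but not in `b`.
Mask lanes_in_a_not_b(const Mask &a, const Mask &b) {
    return emu_mask_and(a, emu_mask_not(b));
}

// Sanity check: De Morgan's law holds lane by lane.
void check_de_morgan(const Mask &a, const Mask &b) {
    assert(emu_mask_not(emu_mask_and(a, b)) ==
           emu_mask_or(emu_mask_not(a), emu_mask_not(b)));
}
```

Combining masks this way is how nested conditions are typically expressed in vector code: each condition produces a mask, and the masks are merged before the guarded operation runs.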

In the tables below, vec refers to __cs295_vec_int and mask refers to __cs295_mask. We use this shorthand notation for readability. If you want to use the float operations, use the float variant of the function, e.g., _cs295_vadd_float. You also need to use the float vector registers, __cs295_vec_float.

{: .table-striped .table-bordered}
Function | Description |
---|---|
_cs295_vset_int(int value, mask m) | For the user's convenience, returns a vector register with all active lanes set to value; inactive lanes keep their old value |

    for i = 0 to VLEN
        if (m[i] == 1)
            v[i] = value;

{: .table-striped .table-bordered}
Function | Description |
---|---|
_cs295_vmove_int(vec dest, vec src, mask m) | Copies src into dest in the active lanes; inactive lanes of dest keep their old value |

    for i = 0 to VLEN
        if (m[i] == 1)
            dest[i] = src[i];

{: .table-striped .table-bordered}
Function | Description |
---|---|
_cs295_vadd_int(vec &res, vec &x_v, vec &y_v, mask m) | Return calculation of (x_v + y_v) if vector lane active |

    for i = 0 to VLEN
        if (m[i] == 1)
            res[i] = x_v[i] + y_v[i];

{: .table-striped .table-bordered}
Function | Description |
---|---|
_cs295_vsub_int(vec &res, vec &x_v, vec &y_v, mask m) | Return calculation of (x_v - y_v) if vector lane active |

    for i = 0 to VLEN
        if (m[i] == 1)
            res[i] = x_v[i] - y_v[i];

{: .table-striped .table-bordered}
Function | Description |
---|---|
_cs295_vmult_int(vec &res, vec &x_v, vec &y_v, mask m) | Return calculation of (x_v * y_v) if vector lane active |

    for i = 0 to VLEN
        if (m[i] == 1)
            res[i] = x_v[i] * y_v[i];

{: .table-striped .table-bordered}
Function | Description |
---|---|
_cs295_vdiv_int(vec &res, vec &x_v, vec &y_v, mask m) | Return calculation of (x_v / y_v) if vector lane active |

    for i = 0 to VLEN
        if (m[i] == 1)
            res[i] = x_v[i] / y_v[i];

{: .table-striped .table-bordered}
Function | Description |
---|---|
_cs295_vshiftright_int(vec &res, vec &x_v, vec &y_v, mask m) | Return calculation of (x_v >> y_v) if vector lane active |

    for i = 0 to VLEN
        if (m[i] == 1)
            res[i] = x_v[i] >> y_v[i];

{: .table-striped .table-bordered}
Function | Description |
---|---|
_cs295_vbitand_int(vec &res, vec &x_v, vec &y_v, mask m) | Return calculation of (x_v & y_v) if vector lane active |

    for i = 0 to VLEN
        if (m[i] == 1)
            res[i] = x_v[i] & y_v[i];

{: .table-striped .table-bordered}
Function | Description |
---|---|
_cs295_vabs_int(vec &res, vec &x_v, mask m) | Return calculation of abs(x_v) if vector lane active |

    for i = 0 to VLEN
        if (m[i] == 1)
            res[i] = abs(x_v[i]);

This set of operations returns a mask of true/false values based on a comparison between two vectors. Note that there are two masks in these operations: the input mask selects which lanes are active, and the result mask receives the outcome of the comparison.

{: .table-striped .table-bordered}
Function | Description |
---|---|
_cs295_vgt_int(mask &resmask, vec &x_v, vec &y_v, mask &m) | Return a mask of (x_v > y_v) if vector lane active; otherwise keeps old value. |

    for i = 0 to VLEN
        # This mask controls whether the lane is active
        if (m[i] == 1)
            # The result of the comparison controls
            # the result mask
            if (x_v[i] > y_v[i])
                resmask[i] = 1
            else
                resmask[i] = 0

{: .table-striped .table-bordered}
Function | Description |
---|---|
_cs295_vlt_int(mask &resmask, vec &x_v, vec &y_v, mask &m) | Return a mask of (x_v < y_v) if vector lane active; otherwise keeps old value. |

    for i = 0 to VLEN
        # This mask controls whether the lane is active
        if (m[i] == 1)
            # The result of the comparison controls
            # the result mask
            if (x_v[i] < y_v[i])
                resmask[i] = 1
            else
                resmask[i] = 0

{: .table-striped .table-bordered}
Function | Description |
---|---|
_cs295_veq_int(mask &resmask, vec &x_v, vec &y_v, mask &m) | Return a mask of (x_v == y_v) if vector lane active; otherwise keeps old value. |

    for i = 0 to VLEN
        # This mask controls whether the lane is active
        if (m[i] == 1)
            # The result of the comparison controls
            # the result mask
            if (x_v[i] == y_v[i])
                resmask[i] = 1
            else
                resmask[i] = 0
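
A comparison is usually followed by a masked operation that uses the result mask. The sketch below shows one such pattern, clamping each lane to an upper limit. It is a scalar emulation under assumed names (`emu_vgt`, `emu_vmove`, `clamp_to_limit`, `Vec`, `Mask`, `VLEN = 4`), not the library's code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t VLEN = 4;       // assumed vector width
using Vec = std::array<int, VLEN>;    // hypothetical stand-in for __cs295_vec_int
using Mask = std::array<bool, VLEN>;  // hypothetical stand-in for __cs295_mask

// Emulates a greater-than comparison: `m` picks the active lanes,
// `resmask` receives the comparison result in those lanes only.
void emu_vgt(Mask &resmask, const Vec &x_v, const Vec &y_v, const Mask &m) {
    for (std::size_t i = 0; i < VLEN; ++i) {
        if (m[i]) {
            resmask[i] = (x_v[i] > y_v[i]);
        }
    }
}

// Emulates a masked move: copy src into dest in the active lanes only.
void emu_vmove(Vec &dest, const Vec &src, const Mask &m) {
    for (std::size_t i = 0; i < VLEN; ++i) {
        if (m[i]) {
            dest[i] = src[i];
        }
    }
}

// Clamp every lane of x_v to at most `limit` using a compare plus a masked move.
Vec clamp_to_limit(Vec x_v, int limit) {
    Vec limit_v;
    limit_v.fill(limit);
    Mask all_lanes;
    all_lanes.fill(true);
    Mask too_big{};                             // starts with every lane at 0
    emu_vgt(too_big, x_v, limit_v, all_lanes);  // 1 where x_v > limit
    emu_vmove(x_v, limit_v, too_big);           // overwrite only those lanes
    for (int v : x_v) {
        assert(v <= limit);
    }
    return x_v;
}
```

With input lanes `{1, 9, 5, 12}` and a limit of 5, only lanes 1 and 3 are overwritten, giving `{1, 5, 5, 5}`.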

{: .table-striped .table-bordered}
Function | Description |
---|---|
_cs295_vmove_int(vec &dest, vec &src, mask &m) | Copies one vector to another in the active lanes. |

    for i = 0 to VLEN
        # This mask controls whether the lane is active
        if (m[i] == 1)
            dest[i] = src[i]

{: .table-striped .table-bordered}
Function | Description |
---|---|
_cs295_vload_int(vec &x_v, int* src, mask &m) | Load values from array src into vector register x_v if vector lane active |

    for i = 0 to VLEN
        # This mask controls whether the lane is active
        if (m[i] == 1)
            x_v[i] = src[i]

{: .table-striped .table-bordered}
Function | Description |
---|---|
_cs295_vstore_int(int* dst, vec &x_v, mask &m) | Store values from vector register x_v to array dst if vector lane active |

WARNING: x_v is a register; dst is an array in memory.

    for i = 0 to VLEN
        # This mask controls whether the lane is active
        if (m[i] == 1)
            dst[i] = x_v[i]
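
Loads, arithmetic, and stores are typically combined in a loop that walks an array VLEN elements at a time, with a partial mask for the final chunk when the length is not a multiple of VLEN. The following is a minimal scalar sketch of that pattern. All names (`emu_init_ones`, `emu_vload`, `emu_vadd`, `emu_vstore`, `add_arrays`) and `VLEN = 4` are assumptions for illustration, not the course library.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t VLEN = 4;       // assumed vector width
using Vec = std::array<int, VLEN>;    // hypothetical stand-in for __cs295_vec_int
using Mask = std::array<bool, VLEN>;  // hypothetical stand-in for __cs295_mask

Mask emu_init_ones(std::size_t width) {
    Mask m{};
    for (std::size_t i = 0; i < width && i < VLEN; ++i) m[i] = true;
    return m;
}

// Inactive lanes are never read from memory, so a partial mask keeps
// the load inside the array bounds.
void emu_vload(Vec &x_v, const int *src, const Mask &m) {
    for (std::size_t i = 0; i < VLEN; ++i) if (m[i]) x_v[i] = src[i];
}

void emu_vadd(Vec &res, const Vec &x_v, const Vec &y_v, const Mask &m) {
    for (std::size_t i = 0; i < VLEN; ++i) if (m[i]) res[i] = x_v[i] + y_v[i];
}

void emu_vstore(int *dst, const Vec &x_v, const Mask &m) {
    for (std::size_t i = 0; i < VLEN; ++i) if (m[i]) dst[i] = x_v[i];
}

// Element-wise sum of two equally sized arrays, processed VLEN lanes at a time.
std::vector<int> add_arrays(const std::vector<int> &a, const std::vector<int> &b) {
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    std::vector<int> out(n);
    for (std::size_t i = 0; i < n; i += VLEN) {
        // Full mask for whole chunks, partial mask for the tail.
        const Mask m = emu_init_ones(std::min(VLEN, n - i));
        Vec x_v{}, y_v{}, res{};
        emu_vload(x_v, a.data() + i, m);
        emu_vload(y_v, b.data() + i, m);
        emu_vadd(res, x_v, y_v, m);
        emu_vstore(out.data() + i, res, m);
    }
    return out;
}
```

With six elements and `VLEN = 4`, the second iteration runs with only two active lanes, so nothing past the end of either array is read or written.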

Converts an array of structs into multiple arrays.

{: .table-striped .table-bordered}
Function | Description |
---|---|
_cs295_vload_seg_int(vec dst[], int* src, int fields) | Loads a vector of tuples of elements from memory such that each component of the tuple is loaded into a different vector. This operation is useful to convert a memory representation of Array-of-Structures into a register representation of Structure-of-Arrays. |

Note that the result is an array of vector registers.

    for i = 0 to VLEN
        for f = 0 to fields
            if (m[i] == 1)
                dst[f][i] = *src
            # Pointer scaled by int
            src = src + 1
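
To make the Array-of-Structures to Structure-of-Arrays conversion concrete, here is a scalar sketch of a segmented load. It assumes every lane is active (no mask), returns the vectors by value instead of filling a `dst[]` parameter, and uses the hypothetical names `emu_vload_seg`, `Vec`, and `VLEN = 4`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t VLEN = 4;     // assumed vector width
using Vec = std::array<int, VLEN>;  // hypothetical stand-in for __cs295_vec_int

// Emulates a segmented load with all lanes active: `src` holds VLEN tuples
// of `fields` ints each, stored back to back.
std::vector<Vec> emu_vload_seg(const int *src, std::size_t fields) {
    assert(fields > 0);
    std::vector<Vec> dst(fields);
    for (std::size_t i = 0; i < VLEN; ++i) {
        for (std::size_t f = 0; f < fields; ++f) {
            dst[f][i] = *src;  // component f of tuple i goes to lane i of vector f
            ++src;
        }
    }
    return dst;
}
```

For an interleaved input such as `{x0, y0, x1, y1, x2, y2, x3, y3}` with `fields = 2`, the first returned vector holds all the x values and the second holds all the y values, so each can be fed directly to the arithmetic operations.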

{: .table-striped .table-bordered}
Function | Description |
---|---|
int _cs295_firstbit(__cs295_mask &maska) | Finds the first index, counting from the left, whose mask bit is non-zero |

    for i = 0 to VLEN
        if (maska[i] != 0)
            return i
    return 0
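
Read literally, the pseudocode above returns 0 both when lane 0 is set and when no lane is set. The sketch below emulates that behaviour and adds a hypothetical wrapper, `first_active_lane`, that reports an empty mask as -1. The names `emu_firstbit`, `first_active_lane`, `Mask`, and `VLEN = 4` are assumptions for illustration only.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t VLEN = 4;       // assumed vector width
using Mask = std::array<bool, VLEN>;  // hypothetical stand-in for __cs295_mask

// Emulates the first-bit search as written in the pseudocode.
int emu_firstbit(const Mask &m) {
    for (std::size_t i = 0; i < VLEN; ++i) {
        if (m[i]) {
            return static_cast<int>(i);
        }
    }
    return 0;  // same value as "lane 0 is set"
}

// Hypothetical helper: returns -1 for an empty mask to remove the ambiguity.
int first_active_lane(const Mask &m) {
    bool any = false;
    for (bool bit : m) {
        any = any || bit;
    }
    if (!any) {
        return -1;
    }
    const int lane = emu_firstbit(m);
    assert(m[static_cast<std::size_t>(lane)]);
    return lane;
}
```

In code that uses the real intrinsics, the same effect could be had by checking the bit count of the mask before trusting a result of 0.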

{: .table-striped .table-bordered}
Function | Description |
---|---|
_cs295_hadd_int(vec result, vec x_v) | Adds up adjacent pairs of elements, so [0 1 2 3] -> [0+1 0+1 2+3 2+3] |

    for i = 0 to VLEN
        # Even lanes compute the pair sum; odd lanes copy it from the lane to their left
        if (i % 2 == 0)
            result[i] = x_v[i] + x_v[i+1]
        else
            result[i] = result[i-1]

{: .table-striped .table-bordered}
Function | Description |
---|---|
_cs295_interleave_int(vec result, vec x_v) | Performs an even-odd interleaving where all even-indexed elements move to the front half of the array and odd-indexed elements to the back half, so [0 1 2 3 4 5 6 7] -> [0 2 4 6 1 3 5 7] |
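
One way these two operations can be combined is to sum all lanes of a vector in log2(VLEN) rounds: each round adds adjacent pairs, then interleaves so the partial sums become adjacent again. The sketch below is a scalar emulation of that idea. It assumes VLEN is a power of two, returns vectors by value instead of through a result parameter, and uses hypothetical names (`emu_hadd`, `emu_interleave`, `reduce_sum`).

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t VLEN = 8;     // assumed width; must be a power of two here
using Vec = std::array<int, VLEN>;  // hypothetical stand-in for __cs295_vec_int

// Emulates hadd: each adjacent pair is replaced by its sum, in both lanes.
Vec emu_hadd(const Vec &x_v) {
    Vec res{};
    for (std::size_t i = 0; i < VLEN; i += 2) {
        res[i] = x_v[i] + x_v[i + 1];
        res[i + 1] = res[i];
    }
    return res;
}

// Emulates interleave: even-indexed lanes go to the front half,
// odd-indexed lanes go to the back half.
Vec emu_interleave(const Vec &x_v) {
    Vec res{};
    for (std::size_t i = 0; i < VLEN; ++i) {
        const std::size_t half = (i % 2 == 0) ? 0 : VLEN / 2;
        res[half + i / 2] = x_v[i];
    }
    return res;
}

// Sum all lanes in log2(VLEN) rounds; every lane ends up holding the total.
int reduce_sum(Vec x_v) {
    static_assert((VLEN & (VLEN - 1)) == 0, "VLEN must be a power of two");
    for (std::size_t width = 1; width < VLEN; width *= 2) {
        x_v = emu_interleave(emu_hadd(x_v));
    }
    assert(x_v[0] == x_v[VLEN - 1]);
    return x_v[0];
}
```

For eight lanes this takes three rounds instead of seven sequential additions, which is the reason to prefer it over extracting and adding lanes one at a time.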