Lab 9

The core idea in vector processing is to apply a single instruction to multiple integers at once. Instead of programming at the assembly level, we will work with vector operations at the C level using intrinsics.

Intrinsic (definition): Intrinsics are special functions and types that we have implemented. You can use them within a C program to invoke vector operations and vector registers without dealing with assembly-level syntax or the fixed limit of 32 registers.

With intrinsics you can create an arbitrary number of vector registers in your program and invoke any of the operations defined in our vector library.

Vector types

__cs295_vec_int, __cs295_vec_float: The implementation of the vector registers, of either int or float type. Each vector register holds VLEN integers (or floats). These registers can be passed as parameters to the vector functions.

{: .table-striped .table-bordered}

| Type | Description |
|---|---|
| __cs295_vec_int | Vector of type int |
| __cs295_vec_float | Vector of type float |
| __cs295_mask | Mask vector of type bool |

e.g.,

__cs295_vec_int x_v, y_v;

__cs295_mask: Masks are bool vectors of width VLEN that specify whether a particular lane (index) is active in an operation. We illustrate with vmult below.

If the mask bit is 1, the lane performs the operation. If it is 0, the lane is inactive and its result is not modified.

for i = 0 to VLEN
   if mask[i] == 1 then
     result[i] = x_v[i] * y_v[i]
   else
     // DO NOTHING
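
The masked multiply above can be written out in plain C. This is only a scalar sketch of the semantics, not the lab library: the name ref_vmult_int, the array-based signature, and VLEN = 8 are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define VLEN 8 /* assumed width for illustration; the lab's VLEN may differ */

/* Hypothetical scalar model of a masked multiply (not the lab's intrinsic). */
void ref_vmult_int(int result[VLEN], const int x_v[VLEN], const int y_v[VLEN],
                   const bool mask[VLEN]) {
    assert(result && x_v && y_v && mask);
    for (int i = 0; i < VLEN; i++) {
        if (mask[i]) {
            result[i] = x_v[i] * y_v[i]; /* active lane: compute */
        }
        /* inactive lane: result[i] keeps its old value */
    }
}
```

Lanes whose mask bit is 0 still hold whatever was in result before the call, so a caller that needs defined values there has to initialize result first.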

Mask instructions

{: .table-striped .table-bordered}

| Return value | Function | Description |
|---|---|---|
| __cs295_mask | _cs295_init_ones(width) | Returns a mask initialized to 1 in the first width lanes and 0 in the others |
| int | _cs295_cntbits(mask) | Counts the number of 1s in mask |

_cs295_init_ones(width)
for i = 0 to VLEN
     if i < width
        mask[i] = 1
     else
        mask[i] = 0
return mask

_cs295_cntbits(mask)
ones = 0
for i = 0 to VLEN
    if (mask[i] == 1)
       ones = ones + 1
return ones
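
As a rough C sketch of the two mask instructions above: the type ref_mask, the ref_ function names, and VLEN = 8 are hypothetical stand-ins for the lab's __cs295_mask type and intrinsics, chosen so the example is self-contained.

```c
#include <assert.h>
#include <stdbool.h>

#define VLEN 8 /* assumed width for illustration; the lab's VLEN may differ */

/* Hypothetical stand-in for __cs295_mask: one bool per lane. */
typedef struct { bool lane[VLEN]; } ref_mask;

/* Model of init_ones: 1 in the first `width` lanes, 0 in the rest. */
ref_mask ref_init_ones(int width) {
    assert(width >= 0 && width <= VLEN);
    ref_mask m;
    for (int i = 0; i < VLEN; i++) {
        m.lane[i] = (i < width);
    }
    return m;
}

/* Model of cntbits: number of lanes set to 1. */
int ref_cntbits(ref_mask m) {
    int ones = 0;
    for (int i = 0; i < VLEN; i++) {
        if (m.lane[i]) {
            ones = ones + 1;
        }
    }
    return ones;
}
```

With these definitions, ref_cntbits(ref_init_ones(w)) equals w for any w between 0 and VLEN.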

Mask logical operations

{: .table-striped .table-bordered}

| Return value | Function | Description |
|---|---|---|
| __cs295_mask | _cs295_mask_not(maskA) | Returns the inverse of maskA |
| __cs295_mask | _cs295_mask_and(maskA, maskB) | Returns (maskA & maskB) |
| __cs295_mask | _cs295_mask_or(maskA, maskB) | Returns (maskA or maskB) |

_cs295_mask_and(maskA, maskB)
for i = 0 to VLEN
     mask[i] = maskA[i] & maskB[i]
return mask
# mask is 1 only in lanes where both maskA and maskB are 1
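
The logical operations are typically combined to split lanes into the two sides of a branch. The sketch below models them in scalar C; ref_mask, the ref_ function names, the helper ref_else_lanes, and VLEN = 8 are assumptions for illustration and are not part of the lab library.

```c
#include <assert.h>
#include <stdbool.h>

#define VLEN 8 /* assumed width for illustration; the lab's VLEN may differ */

/* Hypothetical stand-in for __cs295_mask: one bool per lane. */
typedef struct { bool lane[VLEN]; } ref_mask;

ref_mask ref_mask_not(ref_mask a) {
    ref_mask out;
    for (int i = 0; i < VLEN; i++) out.lane[i] = !a.lane[i];
    return out;
}

ref_mask ref_mask_and(ref_mask a, ref_mask b) {
    ref_mask out;
    for (int i = 0; i < VLEN; i++) out.lane[i] = a.lane[i] && b.lane[i];
    return out;
}

ref_mask ref_mask_or(ref_mask a, ref_mask b) {
    ref_mask out;
    for (int i = 0; i < VLEN; i++) out.lane[i] = a.lane[i] || b.lane[i];
    return out;
}

/* Lanes that are active but failed a condition: the "else" side of a branch. */
ref_mask ref_else_lanes(ref_mask active, ref_mask cond) {
    ref_mask out = ref_mask_and(active, ref_mask_not(cond));
    for (int i = 0; i < VLEN; i++) {
        /* a lane is never on both the "then" side and the "else" side */
        assert(!(out.lane[i] && cond.lane[i]));
    }
    return out;
}
```

The "then" side of the same branch would be ref_mask_and(active, cond), so the two sides together cover exactly the active lanes.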

Vector compute instructions

In the tables below, vec refers to __cs295_vec_int and mask refers to __cs295_mask.

To use the float operations, replace int with float in the function name, e.g., _cs295_vadd_float. You also need to use the float vector registers, __cs295_vec_float.

We use this shorthand for readability.

_cs295_vset_int

{: .table-striped .table-bordered}

| Function | Description |
|---|---|
| _cs295_vset_int(int value, mask m) | Returns a vector register with every active lane set to value; inactive lanes keep their old value |

for i = 0 to VLEN
   if (m[i] == 1)
     v[i] = value;

_cs295_vmove_int

{: .table-striped .table-bordered}

| Function | Description |
|---|---|
| _cs295_vmove_int(vec dest, vec src, mask m) | Copies src to dest in every active lane; inactive lanes of dest keep their old value |

for i = 0 to VLEN
   if (m[i] == 1)
     dest[i] = src[i];

_cs295_vadd_int

{: .table-striped .table-bordered}

| Function | Description |
|---|---|
| _cs295_vadd_int(vec &res, vec &x_v, vec &y_v, mask m) | Computes (x_v + y_v) into res in each active lane |

for i = 0 to VLEN
   if (m[i] == 1)
     res[i] = x_v[i] + y_v[i];

_cs295_vsub_int

{: .table-striped .table-bordered}

| Function | Description |
|---|---|
| _cs295_vsub_int(vec &res, vec &x_v, vec &y_v, mask m) | Computes (x_v - y_v) into res in each active lane |

for i = 0 to VLEN
   if (m[i] == 1)
     res[i] = x_v[i] - y_v[i];

_cs295_vmult_int

{: .table-striped .table-bordered}

| Function | Description |
|---|---|
| _cs295_vmult_int(vec &res, vec &x_v, vec &y_v, mask m) | Computes (x_v * y_v) into res in each active lane |

for i = 0 to VLEN
   if (m[i] == 1)
     res[i] = x_v[i] * y_v[i];

_cs295_vdiv_int

{: .table-striped .table-bordered}

| Function | Description |
|---|---|
| _cs295_vdiv_int(vec &res, vec &x_v, vec &y_v, mask m) | Computes (x_v / y_v) into res in each active lane |

for i = 0 to VLEN
   if (m[i] == 1)
     res[i] = x_v[i] / y_v[i];

_cs295_vshiftright_int

{: .table-striped .table-bordered}

| Function | Description |
|---|---|
| _cs295_vshiftright_int(vec &res, vec &x_v, vec &y_v, mask m) | Computes (x_v >> y_v) into res in each active lane |

for i = 0 to VLEN
   if (m[i] == 1)
     res[i] = x_v[i] >> y_v[i];

_cs295_vbitand_int

{: .table-striped .table-bordered}

| Function | Description |
|---|---|
| _cs295_vbitand_int(vec &res, vec &x_v, vec &y_v, mask m) | Computes (x_v & y_v) into res in each active lane |

for i = 0 to VLEN
   if (m[i] == 1)
     res[i] = x_v[i] & y_v[i];

_cs295_vabs_int

{: .table-striped .table-bordered}

| Function | Description |
|---|---|
| _cs295_vabs_int(vec &res, vec &x_v, mask m) | Computes abs(x_v) into res in each active lane |

for i = 0 to VLEN
   if (m[i] == 1)
     res[i] = abs(x_v[i]);

Comparison instructions

This set of operations returns a mask of true/false values based on a comparison between two vectors. Note that there are two masks in these operations:

The mask operand, which controls which lanes are active.
The result mask, which holds the result of the comparisons.

_cs295_vgt_int

{: .table-striped .table-bordered}

| Function | Description |
|---|---|
| _cs295_vgt_int(mask &resmask, vec &x_v, vec &y_v, mask &m); | Writes (x_v > y_v) to resmask in each active lane; inactive lanes of resmask keep their old value. |

for i = 0 to VLEN
# This mask controls whether lane is active
   if (m[i] == 1)
    # The result of the comparison controls
    # result mask
     if (x_v[i] > y_v[i])
        resmask[i] = 1
     else
        resmask[i] = 0

_cs295_vlt_int

{: .table-striped .table-bordered}

| Function | Description |
|---|---|
| _cs295_vlt_int(mask &resmask, vec &x_v, vec &y_v, mask &m); | Writes (x_v < y_v) to resmask in each active lane; inactive lanes of resmask keep their old value. |

for i = 0 to VLEN
# This mask controls whether lane is active
   if (m[i] == 1)
    # The result of the comparison controls
    # result mask
     if (x_v[i] < y_v[i])
        resmask[i] = 1
     else
        resmask[i] = 0

_cs295_veq_int

{: .table-striped .table-bordered}

| Function | Description |
|---|---|
| _cs295_veq_int(mask &resmask, vec &x_v, vec &y_v, mask &m); | Writes (x_v == y_v) to resmask in each active lane; inactive lanes of resmask keep their old value. |

for i = 0 to VLEN
# This mask controls whether lane is active
   if (m[i] == 1)
    # The result of the comparison controls
    # result mask
     if (x_v[i] == y_v[i])
        resmask[i] = 1
     else
        resmask[i] = 0
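
A comparison result is usually fed straight back in as the mask of a later operation. The scalar C sketch below clamps every active lane to an upper limit using models of vset, vgt, and vmove. The ref_ names, the pointer-based signatures, the function clamp_max, and VLEN = 8 are hypothetical; they only mirror the behavior described in this handout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define VLEN 8 /* assumed width for illustration; the lab's VLEN may differ */

typedef struct { int lane[VLEN]; } ref_vec;   /* stand-in for __cs295_vec_int */
typedef struct { bool lane[VLEN]; } ref_mask; /* stand-in for __cs295_mask */

static void ref_vset_int(ref_vec *v, int value, ref_mask m) {
    for (int i = 0; i < VLEN; i++)
        if (m.lane[i]) v->lane[i] = value;
}

static void ref_vgt_int(ref_mask *resmask, const ref_vec *x_v,
                        const ref_vec *y_v, ref_mask m) {
    for (int i = 0; i < VLEN; i++)
        if (m.lane[i]) resmask->lane[i] = (x_v->lane[i] > y_v->lane[i]);
}

static void ref_vmove_int(ref_vec *dest, const ref_vec *src, ref_mask m) {
    for (int i = 0; i < VLEN; i++)
        if (m.lane[i]) dest->lane[i] = src->lane[i];
}

/* In every active lane: if (x > limit) x = limit. */
void clamp_max(ref_vec *x, int limit, ref_mask active) {
    assert(x != NULL);
    ref_vec lim = {{0}};
    /* Start all-zero: the comparison leaves inactive lanes untouched,
       so they must already be 0 to stay out of the move below. */
    ref_mask too_big = {{false}};
    ref_vset_int(&lim, limit, active);
    ref_vgt_int(&too_big, x, &lim, active);
    ref_vmove_int(x, &lim, too_big); /* only lanes where x > limit change */
}
```

The point of the example is the initialization of too_big: because the result mask keeps its old value in inactive lanes, an uninitialized result mask could enable lanes that were never compared.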

Vector memory instructions

_cs295_vmove_int

{: .table-striped .table-bordered}

| Function | Description |
|---|---|
| _cs295_vmove_int(vec &dest, vec &src, mask &m); | Copies src to dest in each active lane. |

for i = 0 to VLEN
# This mask controls whether lane is active
   if (m[i] == 1)
     dest[i] = src[i]

_cs295_vload_int

{: .table-striped .table-bordered}

| Function | Description |
|---|---|
| _cs295_vload_int(vec &x_v, int* src, mask &m); | Loads values from array src into vector register x_v in each active lane |

for i = 0 to VLEN
# This mask controls whether lane is active
   if (m[i] == 1)
     x_v[i] = src[i]

_cs295_vstore_int

{: .table-striped .table-bordered}

| Function | Description |
|---|---|
| _cs295_vstore_int(int* dst, vec &x_v, mask &m); | Stores values from vector register x_v to array dst in each active lane |

WARNING: x_v is a register; dst is an array in memory.

for i = 0 to VLEN
# This mask controls whether lane is active
   if (m[i] == 1)
     dst[i] = x_v[i]
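
Load, compute, and store are normally combined in a loop that walks an array VLEN elements at a time, with a mask from init_ones covering the final partial chunk. The following is a scalar C sketch of that pattern under stated assumptions: ref_vec, ref_mask, the ref_ functions, array_add, and VLEN = 8 are hypothetical models, not the lab's intrinsics.

```c
#include <assert.h>
#include <stdbool.h>

#define VLEN 8 /* assumed width for illustration; the lab's VLEN may differ */

typedef struct { int lane[VLEN]; } ref_vec;   /* stand-in for __cs295_vec_int */
typedef struct { bool lane[VLEN]; } ref_mask; /* stand-in for __cs295_mask */

static ref_mask ref_init_ones(int width) {
    ref_mask m;
    for (int i = 0; i < VLEN; i++) m.lane[i] = (i < width);
    return m;
}

static void ref_vload_int(ref_vec *x_v, const int *src, ref_mask m) {
    for (int i = 0; i < VLEN; i++)
        if (m.lane[i]) x_v->lane[i] = src[i];
}

static void ref_vadd_int(ref_vec *res, const ref_vec *x_v,
                         const ref_vec *y_v, ref_mask m) {
    for (int i = 0; i < VLEN; i++)
        if (m.lane[i]) res->lane[i] = x_v->lane[i] + y_v->lane[i];
}

static void ref_vstore_int(int *dst, const ref_vec *x_v, ref_mask m) {
    for (int i = 0; i < VLEN; i++)
        if (m.lane[i]) dst[i] = x_v->lane[i];
}

/* c[i] = a[i] + b[i] for i in [0, n), processed VLEN elements per step. */
void array_add(int *c, const int *a, const int *b, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; i += VLEN) {
        /* The last chunk may be shorter than VLEN: enable only valid lanes. */
        int width = (n - i < VLEN) ? (n - i) : VLEN;
        ref_mask m = ref_init_ones(width);
        ref_vec x = {{0}}, y = {{0}}, r = {{0}};
        ref_vload_int(&x, a + i, m);
        ref_vload_int(&y, b + i, m);
        ref_vadd_int(&r, &x, &y, m);
        ref_vstore_int(c + i, &r, m);
    }
}
```

Because both the loads and the store are masked, the final iteration never reads or writes past element n - 1, even when n is not a multiple of VLEN.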

_cs295_vload_seg_int

Converts array-of-structs to multiple arrays.

{: .table-striped .table-bordered}

| Function | Description |
|---|---|
| _cs295_vload_seg_int(vec dst[], int* src, int fields); | Loads a vector of tuples of integers from memory such that each component of the tuple is loaded into a different vector. This operation is useful for converting a memory representation of Array-of-Structures into a register representation of Structure-of-Arrays. |

Note that the result is an array of vector registers.

  for i = 0 to VLEN
    for f = 0 to fields
      if (m[i] == 1)
        dst[f][i] = *src
        # Pointer scaled by int
        src = src + 1
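
To make the Array-of-Structures to Structure-of-Arrays conversion concrete, here is a scalar C sketch. It follows the signature in the table, which takes no mask, so every lane is treated as active. It also assumes the tuples are packed back to back in memory, so that field f of tuple i sits at src[i * fields + f]. The names ref_vec and ref_vload_seg_int and VLEN = 8 are hypothetical.

```c
#include <assert.h>

#define VLEN 8 /* assumed width for illustration; the lab's VLEN may differ */

typedef struct { int lane[VLEN]; } ref_vec; /* stand-in for __cs295_vec_int */

/* Model of a segmented load: tuple i, field f goes to dst[f].lane[i].
   src must hold at least VLEN * fields ints; dst must hold `fields` vectors. */
void ref_vload_seg_int(ref_vec dst[], const int *src, int fields) {
    assert(fields > 0);
    for (int i = 0; i < VLEN; i++) {
        for (int f = 0; f < fields; f++) {
            dst[f].lane[i] = *src;
            src = src + 1; /* pointer arithmetic is scaled by sizeof(int) */
        }
    }
}
```

For an array of (x, y) points, a call with fields = 2 leaves all x values in dst[0] and all y values in dst[1], ready for lane-wise arithmetic.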

Vector index instructions

_cs295_firstbit

{: .table-striped .table-bordered}

| Function | Description |
|---|---|
| int _cs295_firstbit(__cs295_mask &maska); | Returns the index of the first non-zero lane, scanning from lane 0; returns 0 if no lane is set |

for i = 0 to VLEN
  if (maska[i] != 0)
     return i
return 0
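
In the pseudocode above, a return value of 0 can mean either "lane 0 is set" or "no lane is set". The C sketch below models that behavior and adds two hypothetical helpers (ref_any and ref_firstbit_nonempty) that tell the two cases apart; none of these names, nor VLEN = 8, come from the lab library.

```c
#include <assert.h>
#include <stdbool.h>

#define VLEN 8 /* assumed width for illustration; the lab's VLEN may differ */

typedef struct { bool lane[VLEN]; } ref_mask; /* stand-in for __cs295_mask */

/* Model of firstbit as described above: lowest set lane, or 0 if none. */
int ref_firstbit(ref_mask m) {
    for (int i = 0; i < VLEN; i++) {
        if (m.lane[i]) return i;
    }
    return 0;
}

/* Hypothetical helper: true if at least one lane is set. */
bool ref_any(ref_mask m) {
    for (int i = 0; i < VLEN; i++) {
        if (m.lane[i]) return true;
    }
    return false;
}

/* Variant that refuses an empty mask, so a result of 0 always means lane 0. */
int ref_firstbit_nonempty(ref_mask m) {
    assert(ref_any(m));
    return ref_firstbit(m);
}
```

With the lab's own intrinsics, a similar guard could be built by checking that _cs295_cntbits returns a non-zero count before trusting the index.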

Vector permutation instructions

_cs295_hadd_int

{: .table-striped .table-bordered}

| Function | Description |
|---|---|
| _cs295_hadd_int(vec result, vec x_v); | Adds up adjacent pairs of integers, so [0 1 2 3] -> [0+1 0+1 2+3 2+3] |

for i = 0 to VLEN
   if (i % 2 == 0)
     result[i] = x_v[i] + x_v[i+1]
   else
     result[i] = result[i-1]
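
A scalar C sketch of the pairwise add, for checking results by hand. The names ref_vec and ref_hadd_int, the pointer-based signature, and VLEN = 8 are assumptions for illustration.

```c
#include <assert.h>

#define VLEN 8 /* assumed width for illustration; must be even here */

typedef struct { int lane[VLEN]; } ref_vec; /* stand-in for __cs295_vec_int */

/* Model of hadd: both lanes of each adjacent pair receive the pair's sum. */
void ref_hadd_int(ref_vec *result, const ref_vec *x_v) {
    assert(VLEN % 2 == 0);
    for (int i = 0; i < VLEN; i += 2) {
        /* Compute the sum first so result may be the same register as x_v. */
        int sum = x_v->lane[i] + x_v->lane[i + 1];
        result->lane[i] = sum;
        result->lane[i + 1] = sum;
    }
}
```

For the input [0 1 2 3 4 5 6 7] this produces [1 1 5 5 9 9 13 13].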

_cs295_interleave_int

{: .table-striped .table-bordered}

| Function | Description |
|---|---|
| _cs295_interleave_int(vec result, vec x_v); | Performs an even-odd interleaving where all even-indexed integers move to the front half of the vector and all odd-indexed integers to the back half, so [0 1 2 3 4 5 6 7] -> [0 2 4 6 1 3 5 7] |
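
One common use of interleave is to pair it with hadd to sum all lanes of a register in log2(VLEN) steps. The scalar C sketch below models both operations and the reduction. The names ref_vec, ref_hadd_int, ref_interleave_int, and ref_sum_lanes, the pointer-based signatures, and VLEN = 8 are hypothetical, and the reduction assumes VLEN is a power of two.

```c
#include <assert.h>

#define VLEN 8 /* assumed width; must be a power of two for the reduction */

typedef struct { int lane[VLEN]; } ref_vec; /* stand-in for __cs295_vec_int */

/* Model of hadd: both lanes of each adjacent pair receive the pair's sum. */
void ref_hadd_int(ref_vec *result, const ref_vec *x_v) {
    for (int i = 0; i < VLEN; i += 2) {
        int sum = x_v->lane[i] + x_v->lane[i + 1];
        result->lane[i] = sum;
        result->lane[i + 1] = sum;
    }
}

/* Model of interleave: even lanes to the front half, odd lanes to the back. */
void ref_interleave_int(ref_vec *result, const ref_vec *x_v) {
    ref_vec tmp; /* temporary so result may be the same register as x_v */
    for (int i = 0; i < VLEN / 2; i++) {
        tmp.lane[i] = x_v->lane[2 * i];
        tmp.lane[VLEN / 2 + i] = x_v->lane[2 * i + 1];
    }
    *result = tmp;
}

/* Sum of all lanes: each hadd + interleave round halves the number of
   distinct partial sums, so log2(VLEN) rounds leave the total in lane 0. */
int ref_sum_lanes(ref_vec v) {
    assert((VLEN & (VLEN - 1)) == 0);
    for (int span = 1; span < VLEN; span *= 2) {
        ref_hadd_int(&v, &v);
        ref_interleave_int(&v, &v);
    }
    return v.lane[0];
}
```

Tracing [0 1 2 3 4 5 6 7]: the first round gives [1 5 9 13 1 5 9 13], the second [6 22 6 22 6 22 6 22], and the third leaves 28 in every lane.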