Lab 7

This lab will help you get acquainted with the routines needed for Assignment 3. Note, however, that the file formats and the read/write routines are different from those in Assignment 3.

Acknowledgements

This lab has been modified by your CMPT 295 instructor.

Goals

The purpose of this project is to have you implement a simple, configurable dot product in RISC-V assembly language.
You will learn to use registers efficiently, write functions, use calling conventions for calling your functions, as well as external ones, allocate memory on the stack and heap, work with pointers and more!

Check Yourself

Do you know how to use Venus?
Have you looked at the RISC-V CALL card and the pseudo-instructions?

wget "https://www2.cs.sfu.ca/~ashriram/Courses/CS295/assets/distrib/Venus/jvm/venus.jar"

Background

What is the RISC-V Calling Convention?

We will be testing all of your code on RISC-V calling conventions, as described in lecture/lab/discussion. All functions that overwrite registers that are preserved by convention must have a prologue and epilogue where they save those register values to the stack at the start of the function and restore them at the end.

Follow the calling conventions. It is extremely important for this project, as you’ll be writing functions that call your other functions, and maintaining the abstraction barrier provided by the conventions will make your life a lot easier.

We’ve provided # Prologue and # Epilogue comments in each function as a reminder. Note that depending on your implementation, some functions won’t end up needing a prologue and epilogue. In these cases, feel free to delete or ignore the comments we’ve provided.

For a closer look at RISC-V calling conventions, refer here.

What are the prologue and epilogue in the factorial example?
What is the purpose of addi sp, sp, -8 and addi sp, sp, 8?
Why do we need the statement sw s0, 4(sp)?

Source

test_files/ : contains the driver programs. These programs set up the arguments before invoking the functions and then display the results.
main.s : Main driver program for running the neural network end-to-end

Source	Description
test_dot.s, dot.s	Driver for dot product, implement dot product
test_read_vector.s, read_vector.s	Driver for reading matrix, read the matrix
main.s	Main starting file for end-to-end run

In the test_files subdirectory, you’ll find several RISC-V files to test your code with. There is a test file corresponding to every function you’ll have to write, except for part 2.

DO NOT MODIFY THE INPUTS WHEN COMMITTING THE FILES TO GIT. IT WILL FAIL THE REFERENCE CHECKS

Inputs and read_vector

inputs/ : the various input files. We have included two input vectors, m0.bin and m1.bin.

In this project, vectors are stored in row-major order. We can think of vectors as one-dimensional matrices with all values flattened out.

In this part, you will implement functions to read matrices from the binary files. Then, you’ll write a main function putting together all of the functions you’ve written so far into an MNIST classifier, and run it using pre-trained weight matrices that we’ve provided.

Our vector files are provided in binary format. We recommend the xxd command for viewing a binary file (DO NOT USE YOUR EDITOR). You can find its man page here, but its default behaviour is to print the raw bytes of the file in hexadecimal.

The first 4 bytes of the binary file represent one 4-byte integer: the number of elements in the vector. Every following group of 4 bytes represents one integer element of the vector, in row-major order. In this case, each 4-byte element represents a pixel value. There are no gaps between elements. It is important to note that the bytes are in little-endian order. This means the least significant byte is placed at the lowest memory address. For files, the start of the file is considered the lowest address. This relates to how we read files into memory, and to the fact that the first element of an array is usually at the lowest memory address.
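The layout described above can be checked off-line with a few lines of Python. This is only a sketch for inspecting files on your own machine, not part of the lab's RISC-V code. The helper name parse_vector and the sample bytes are made up for illustration, and the elements are assumed to be signed 32-bit integers.

```python
import struct

def parse_vector(data):
    """Decode a buffer laid out as a 4-byte little-endian count
    followed by that many 4-byte little-endian integers."""
    # "<i" means: little-endian, one signed 32-bit integer (assumed signed).
    (count,) = struct.unpack_from("<i", data, 0)
    # The elements start right after the 4-byte header, with no gaps.
    return list(struct.unpack_from("<%di" % count, data, 4))

# Hypothetical file contents: a count of 3, then the elements 1, 2 and 513.
# 513 is 0x00000201, so its least significant byte (0x01) comes first.
sample = bytes([3, 0, 0, 0,
                1, 0, 0, 0,
                2, 0, 0, 0,
                1, 2, 0, 0])
print(parse_vector(sample))  # [1, 2, 513]
```

Comparing this against the xxd output of the same bytes is a quick way to convince yourself of the byte order.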

$ xxd ./inputs/m0.bin | more
# hit q to exit the viewer

The stride of a vector is the number of memory locations between consecutive accesses to the vector. If our stride is n, then consecutive accesses access vector[i] and vector[i+n]. If the address of vector[i] is address, then the memory address of vector[i+n] = address + n * sizeof(element).

So far, all the arrays/vectors we’ve worked with have had stride 1, meaning there is no gap between consecutive elements. For a closer look at strides in vectors/arrays, see this Wikipedia page.
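As a worked example of the address formula, the Python sketch below lists the byte addresses touched by the first few accesses of a strided vector. The function name access_address and the base address 0x1000 are hypothetical, and 4-byte elements are assumed.

```python
ELEMENT_SIZE = 4  # assumed: each element is a 4-byte integer

def access_address(base, k, stride, size=ELEMENT_SIZE):
    """Byte address of the k-th access (k = 0, 1, 2, ...) to a vector
    that starts at `base` and is walked with the given stride."""
    return base + k * stride * size

# Stride 1: consecutive elements, 4 bytes apart.
print([hex(access_address(0x1000, k, 1)) for k in range(3)])
# ['0x1000', '0x1004', '0x1008']

# Stride 2: every other element, 8 bytes apart.
print([hex(access_address(0x1000, k, 2)) for k in range(3)])
# ['0x1000', '0x1008', '0x1010']
```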

Task 1: Read Matrix

In read_vector.s, implement the read_vector function, which uses the file operations we described above to read a binary matrix file into memory. If any file operation fails or doesn’t return the expected number of bytes, exit the program with exit code 1. The code to do this has been provided for you: simply jump to the eof_or_error label at the end of the file.

Recall that the first 4 bytes indicate the size of the vector, which will tell you how many bytes to read from the rest of the file.
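The two-step read (header first, then a payload whose size depends on the header) and the "expected number of bytes" check can be sketched in Python as follows. The names read_exact and read_vector_bytes are hypothetical, and raising EOFError stands in for the jump to eof_or_error in the RISC-V code.

```python
import io
import struct

def read_exact(stream, n):
    """Read exactly n bytes, or raise if the stream returns fewer."""
    chunk = stream.read(n)
    if len(chunk) != n:
        raise EOFError("wanted %d bytes, got %d" % (n, len(chunk)))
    return chunk

def read_vector_bytes(stream):
    """Read the 4-byte element count, then exactly 4 * count payload bytes."""
    (count,) = struct.unpack("<i", read_exact(stream, 4))
    payload = read_exact(stream, 4 * count)
    return count, payload

# Hypothetical in-memory "file": the header says 2 elements, followed by 10 and 20.
good = io.BytesIO(struct.pack("<3i", 2, 10, 20))
count, payload = read_vector_bytes(good)
print(count, len(payload))  # 2 8
```

A file whose header claims more elements than it actually holds fails at the second read_exact call, which is the case the exit-code-1 path is meant to catch.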

You’ll need to allocate memory for the matrix in this function as well. This will require calls to malloc, which is in util.s and is also described in the background section above.

Finally, note that RISC-V only allows for a0 and a1 to be return registers, and our function needs to return three values: The pointer to the matrix in memory, the number of rows, and the number of columns. We get around this by having two int pointers passed in as arguments. We set these integers to the number of rows and columns, and return just the pointer to the matrix.

Testing: Read Matrix

$ java -jar venus.jar ./test_files/test_read_vector.s
## See the screen output
## To validate
$ java -jar venus.jar ./test_files/test_read_vector.s > ./out/test_read_vector.out
$ diff ./out/test_read_vector.out ./ref/test_read_vector.out
# If diff reports any lines, then check your output

Task 2: Dot Product

In dot.s, implement the dot function to compute the dot product of two integer vectors. The dot product of two vectors a and b is defined as dot(a, b) = \sum_{i=0}^{n-1} a_ib_i = a_0 * b_0 + a_1 * b_1 + \cdots + a_{n-1} * b_{n-1}, where a_i is the ith element of a.

Notice that this function takes a stride as an argument for each of the two vectors. Make sure you account for this when calculating your memory addresses. We’ve described strides in more detail in the background section above, which also explains how stride affects the memory addresses of vector elements.

Also note that we do not expect you to handle overflow when multiplying. This means you won’t need to use the mulh instruction.
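To see what is being left out here: with 32-bit registers, mul keeps only the low 32 bits of a product, while mulh returns the upper 32 bits of the signed product. The Python sketch below imitates both. The helper names mul32 and mulh32 are hypothetical, and 32-bit registers are an assumption about the simulated machine.

```python
def mul32(a, b):
    """Low 32 bits of a * b, reinterpreted as a signed 32-bit value."""
    low = (a * b) & 0xFFFFFFFF
    return low - (1 << 32) if low & 0x80000000 else low

def mulh32(a, b):
    """Upper 32 bits of the signed 64-bit product of two 32-bit values."""
    return (a * b) >> 32

# Small operands: the whole product fits in the low word.
print(mul32(3, 4), mulh32(3, 4))          # 12 0

# 65536 * 65536 = 2**32: the low word is 0, and the lost bit shows up in the high word.
print(mul32(65536, 65536), mulh32(65536, 65536))  # 0 1
```

Since the lab does not require overflow handling, only the mul32 behaviour matters for your dot product.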

For a closer look at dot products, see this Wikipedia page.

Testing: Dot Product

This time, you’ll need to fill out test_dot.s, using the starter code and comments we’ve provided. Overall, this test should call your dot product on two vectors in static memory, and print the result (285 for the sample input).

$ java -jar venus.jar ./test_files/test_dot.s
## See the screen output
## To validate
$ java -jar venus.jar ./test_files/test_dot.s > ./out/test_dot.out
$ diff ./out/test_dot.out ./ref/test_dot.out
# If diff reports any lines, then check your output

By default, in the starter code we’ve provided, v0 and v1 point to the start of an array of the integers 1 to 9, contiguous in memory. Let’s assume we set the length and stride of both vectors to 9 and 1 respectively. We should get the following:

v0 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
v1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
dot(v0, v1) = 1 * 1 + 2 * 2 + ... + 9 * 9 = 285

What if we changed the length to 3 and the stride of the second vector v1 to 2, without changing the values in static memory? Now, the vectors contain the following:

v0 = [1, 2, 3]
v1 = [1, 3, 5]
dot(v0, v1) = 1 * 1 + 2 * 3 + 3 * 5 = 22

Note that v1 now has stride 2, so we skip over elements in memory when calculating the dot product. However, the pointer v1 still points to the same place as before: the start of the sequence of integers 1 to 9 in memory.
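Both examples above can be reproduced with a short Python model of a strided dot product. This is a sketch of the arithmetic only, not the required RISC-V implementation; the argument order of dot and the name memory are assumptions made for illustration.

```python
def dot(v0, v1, length, stride0, stride1):
    """Sum of v0[i * stride0] * v1[i * stride1] for i in 0 .. length - 1."""
    total = 0
    for i in range(length):
        total += v0[i * stride0] * v1[i * stride1]
    return total

# Both "pointers" refer to the same static data: the integers 1 to 9.
memory = list(range(1, 10))

print(dot(memory, memory, 9, 1, 1))  # 285
print(dot(memory, memory, 3, 1, 2))  # 22
```

In assembly, the index multiplication i * stride is usually replaced by advancing each pointer by stride * 4 bytes per iteration.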

Task 3: Putting it all Together

In main.s, implement the main function. This will bring together everything you’ve written so far into a basic sequence of function calls that computes the dot product of two vectors and displays the result on the screen. You may need to malloc space when reading in the vectors and computing the output of the dot product.

Note that for THIS PROJECT/FUNCTION ONLY, we will NOT require you to follow RISC-V calling convention by preserving saved registers in the main function. This is to make testing the main function easier, and to reduce its length. Normally, main functions do follow convention with a prologue and epilogue.

Command Line Arguments and File Paths

The file paths for m0 and m1 will be passed in on the command line. RISC-V handles command line arguments in the same way as C: at the start of the main function, a0 and a1 will be set to argc and argv respectively.

We will call main.s in the following way:

java -jar venus.jar ./test_files/main.s ./inputs/m0.bin ./inputs/m1.bin

Note that this means the pointer to the string M0_PATH will be located at index 1 of argv, M1_PATH at index 2, and so on.

If the number of command line arguments is different from what is expected, your code should exit with exit code 3. This will require a call to a helper function in utils.s. Take a look at the starter code for matmul, read_vector, and write_matrix for hints on how to do this.
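The argument check can be modelled in Python as below. The name parse_args and the expected count of 3 (program name plus two paths) are assumptions based on M0_PATH sitting at index 1 of argv; the real check belongs in main.s and uses the helper from the starter code.

```python
EXPECTED_ARGC = 3  # assumed: argv[0], M0_PATH, M1_PATH

def parse_args(argv):
    """Return (m0_path, m1_path), or exit with code 3 on a wrong argument count."""
    if len(argv) != EXPECTED_ARGC:
        raise SystemExit(3)
    return argv[1], argv[2]

# Hypothetical argv matching the invocation shown above.
m0_path, m1_path = parse_args(["main.s", "./inputs/m0.bin", "./inputs/m1.bin"])
print(m0_path, m1_path)  # ./inputs/m0.bin ./inputs/m1.bin
```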

Questions

Questions regarding dot.s

Line 18 of dot.s: sw ra, 0(sp). What does this instruction do, and why do we need it?
Lines 19-26 of dot.s: Why do we save registers s0-s7?
Describe what is in each of the registers s0-s7.
Line 36 of dot.s: slli s3, s3, 2. What does this instruction do?
Line 17 (addi sp sp -36) and line 71 (addi sp sp 36) of dot.s: What do these instructions do?