Github Clone Link

Goals

  • The purpose of this project is to have you implement a simple, yet extremely useful system in RISC-V assembly language.
  • You will learn to use registers efficiently, write functions, use calling conventions for calling your functions, as well as external ones, allocate memory on the stack and heap, work with pointers and more!

Overview

You will implement functions which operate on matrices and vectors, building up to sparse matrix-dense vector multiplication (SPMV).

Note: Although the spec is quite long, please make sure to read through the whole thing as it contains a lot of important information about the functions provided, running Venus, and testing your code.

wget https://www.cs.sfu.ca/~ashriram/Courses/CS295/assets/distrib/Venus/jvm/venus.jar
# You can also use the online version (link below).
# Do not check venus.jar into your repository under any circumstance, or you may fail the travis test.
# https://www.cs.sfu.ca/~ashriram/Courses/CS295/assets/distrib/Venus/

Background

In C, dense matrices are stored as a one-dimensional vector in row-major order. One way to think about it is that we can create a 1D vector from a 2D matrix by concatenating together all the rows in the matrix. Alternatively, we could concatenate all the columns together instead, which is known as column-major order.

<iframe allowfullscreen frameborder="0" style="width:640px; height:480px" src="https://www.lucidchart.com/documents/embeddedchart/31ecc8e4-0d9f-47fd-9503-a7292362546d" id="~RpV4.vERaT9">
For a more in-depth look at row-major vs. column-major order, see this [Wikipedia page](https://en.wikipedia.org/wiki/Row-_and_column-major_order).

The stride of a vector is the number of memory locations between consecutive elements of our vector, measured in the size of our vector’s elements. If our stride is n, then the memory addresses of our vector elements are n * sizeof(element) bytes apart.

So far, all the arrays/vectors we’ve worked with have had stride 1, meaning there is no gap between consecutive elements. Now, to do the row * column dot products with our row-major matrices that we’ll need for matrix multiplication, we will need to consider vectors with varying strides. Specifically, we’ll need to do this when considering a column vector in a flattened, row-major representation of a 2D matrix.

Let’s take a look at a practical example. We have the vector int *a with 3 elements.

  • If the stride is 1, then our vector elements are *(a), *(a + 1), and *(a + 2), in other words a[0], a[1], and a[2].
  • However, if our stride is 4, then our elements are at *(a), *(a + 4), and *(a + 8) or in other words a[0], a[4], and a[8].

To summarize in C code, to access the ith element of a vector int *a with stride s, we use *(a + i * s), or a[i * s]. We leave it up to you to translate this memory access into RISC-V.
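The indexing rule above can be sketched in a few lines of Python before you translate it to RISC-V. This is only an illustration: the helper names `strided_elements` and `byte_offset` are made up for this example, and 4-byte integers are assumed for the byte-offset calculation.

```python
def strided_elements(a, n, stride, start=0):
    """Return the n elements a[start], a[start + stride], a[start + 2*stride], ..."""
    return [a[start + i * stride] for i in range(n)]


def byte_offset(i, stride, elem_size=4):
    """Byte offset of element i from the base address (4-byte ints assumed)."""
    return i * stride * elem_size


# A 3x3 matrix flattened in row-major order; flat[k] == k keeps the indices visible.
flat = list(range(9))

print(strided_elements(flat, 3, 1))           # [0, 1, 2]
print(strided_elements(flat, 3, 4))           # [0, 4, 8]
print(strided_elements(flat, 3, 3, start=1))  # column 1 of the 3x3 matrix: [1, 4, 7]
print(byte_offset(2, 4))                      # 32
```

The last call shows the quantity your assembly actually needs: the byte distance from the base pointer, which is the element index times the stride times the element size.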

For a closer look at strides in vectors/arrays, see this Wikipedia page.

Check Yourself

Do you know how to run Venus at the command line and in the browser?

All the code you write will be in RISC-V, and will be run in Venus. There are two ways to run your code with Venus, either through the web interface or from the command line using a Java .jar file. We recommend that you do most of your development locally with the .jar file, and only use the web interface for debugging or making changes to one file at a time.

  • Load factorial, assemble and run link
  • Download factorial.s to your computer
  • Change the number for which factorial is calculated.
  • Upload back to venus
  • Assemble and simulate
  • Get venus on your computer
  • Run program at the command line.

What is the RISC-V Calling Convention?

We will be testing all of your code on RISC-V calling conventions, as described in lecture/lab/discussion. All functions that overwrite registers that are preserved by convention must have a prologue and epilogue where they save those register values to the stack at the start of the function and restore them at the end.

Follow the calling conventions. It is extremely important for this project, as you’ll be writing functions that call your other functions, and maintaining the abstraction barrier provided by the conventions will make your life a lot easier.

We’ve provided # Prologue and # Epilogue comments in each function as a reminder. Note that depending on your implementation, some functions won’t end up needing a prologue and epilogue. In these cases, feel free to delete/ignore the comments we’ve provided.

For a closer look at RISC-V calling conventions, refer here.

  • Watch lab 6.
  • What is the prologue and epilogue in the factorial?
  • What is the purpose of addi sp, sp, -8 and addi sp, sp, 8?
  • Why do we need this statement ? sw s0, 4 (sp)

How to trace and debug values in Venus ?

  • what is the value in a0 register when the program returns from factorial
  • what is the value in ra ?
  • VSCODE Venus

Source

  • test_files/ : contains the driver programs. These programs set up the arguments before invoking the functions and then display the results.
  • main_spmv.s : Main driver program for running SPMV end-to-end
  • other .s files : Implementations of various functionality. See table below
  • You will have to modify the files listed below.
Source                         Description
test_dot.s, dot.s              Driver for dot product; TODO: implement dot product
test_read_bin.s, read_bin.s    Driver for reading a dense matrix/vector stored in bin format
test_read_coo.s, read_coo.s    Driver for reading a sparse matrix stored in coo format
test_spmv.s, spmv.s            Tester, driver, and implementation of sparse matrix-dense vector multiplication
main_spmv.s                    Main starting file for an SPMV run

In the test_files subdirectory, you’ll find several RISC-V files to test your code with. There is a test file corresponding to every function you’ll have to write.

DO NOT MODIFY THE INPUTS WHEN COMMITTING THE FILES TO GIT. IT WILL FAIL THE REFERENCE CHECKS

Inputs, Out and Ref Outputs

  • inputs/ : the various input files. There are four sets of inputs: mnist, simple0, simple1, and simple2.
# e.g.,  listing of inputs for part 1. mnist
$ ls inputs/
bin txt

Each input folder contains a bin and txt subfolder. The bin subfolder contains the binary files that you’ll run main_spmv.s on, while the txt subfolder contains the plaintext versions for debugging and calculating the expected output.

Within the bin and txt subfolders, you’ll find files for m and v which define the matrix and vector respectively.

$ ls inputs/mnist/bin/
m.bin  m.coo  v0.bin v1.bin v2.bin v3.bin v4.bin v5.bin v6.bin v7.bin v8.bin
  • out/

To aid in the process of testing and debugging, create an output folder after you clone.

# DO NOT CHECK IN out/. IT MIGHT BREAK THE AUTOGRADERS
mkdir -p out/gemv/mnist
mkdir -p out/gemv/simple
mkdir -p out/spmv/mnist
mkdir -p out/spmv/simple
  • ref/ : The reference outputs.

The test_*.out files contain the expected output of each sub-function test.

$ ls ./ref/
gemv/                 test_gemv.out        test_sdot.out
spmv/                 test_read_coo.out
test_dot.out         test_read_bin.out

$ ls ref/spmv/*
ref/spmv/mnist:
v0.trace v2.trace v4.trace v6.trace v8.trace
v1.trace v3.trace v5.trace v7.trace

ref/spmv/simple:
simple0.trace simple1.trace simple2.trace

We include the reference outputs for each of the inputs. The traces display the out array of each of the stages in main (look for the jal print_array calls in main.s), so you can verify the output of every stage.

Pre Assignment Dense Dot

Task 1 Dot Product

In dot.s, implement the dot function to compute the dot product of two integer vectors. The dot product of two vectors a and b is defined as dot(a, b) = $\sum_{i=0}^{n-1} a_i \times b_i = a_0 * b_0 + a_1 * b_1 + \cdots + a_{n-1} * b_{n-1}$, where $a_i$ is the ith element of a.

Notice that when this function is used for gemv, the stride is always 1 for both vectors; make sure you account for this when calculating your memory addresses. We’ve described strides in more detail in the background section above, which also contains a detailed example of how stride affects memory addresses for vector elements.

Also note that we do not expect you to handle overflow when multiplying. This means you won’t need to use the mulh instruction.

For a closer look at dot products, see this Wikipedia page.

Testing: Dot Product

This time, you’ll need to fill out test_dot.s, using the starter code and comments we’ve provided. Overall, this test should call your dot product on two vectors in static memory, and print the result (285 for the sample input).

$ java -jar venus.jar ./test_files/test_dot.s
## See the screen output
## To validate
$ java -jar venus.jar ./test_files/test_dot.s > ./out/test_dot.out
$ diff ./out/test_dot.out ./ref/test_dot.out
# If diff reports any lines, then check your output

By default, in the starter code we’ve provided, v0 and v1 point to the start of an array of the integers 1 to 9, contiguous in memory. Let’s assume we set the length and stride of both vectors to 9 and 1 respectively. We should get the following:

v0 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
v1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
dot(v0, v1) = 1 * 1 + 2 * 2 + ... + 9 * 9 = 285
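As a cross-check for the expected value, here is a minimal Python sketch of a strided dot product. The function name and argument order (`dot(v0, v1, length, stride0, stride1)`) are assumptions chosen for this illustration; the argument registers your RISC-V `dot` must use are given by the starter code, not by this sketch.

```python
def dot(v0, v1, length, stride0=1, stride1=1):
    """Sum of v0[i * stride0] * v1[i * stride1] for i in 0..length-1."""
    total = 0
    for i in range(length):
        total += v0[i * stride0] * v1[i * stride1]
    return total


v0 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
v1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]

print(dot(v0, v1, 9))        # 285
print(dot(v0, v1, 3, 1, 2))  # 1*1 + 2*3 + 3*5 = 22
```

The loop body maps directly onto the assembly you will write: two loads, one multiply, one add, and two pointer increments scaled by the stride.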

SPMV

Sparse Matrix - Dense vector multiplication (SPMV)

SPMV operates on sparse matrices, which store only the non-zero elements of the matrix. Sparse matrices can be represented in memory using a variety of formats. Unlike dense matrices, which are commonly represented as contiguous column-major or row-major arrays, sparse matrices present an additional degree of information, which allows for memory-efficient representations. In the coordinate format, sometimes known as "edge list" or COO, the non-zero elements are represented as a list of triplets [row, col, non-zero value]. The matrix/vector is described here as a single array of triplets (Array of Structs). In COO format, we first store all the non-zeros of row 0, followed by row 1, etc. This will come in handy when we perform multiplication, since we can stream over the COO array and process one non-zero at a time.
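To make the triplet layout concrete, the following Python sketch converts a small dense matrix into COO triplets and then flattens them into the single array of integers that would sit in memory. The helper names `dense_to_coo` and `flatten` and the 3x3 sample matrix are illustrative choices, not part of the starter code.

```python
def dense_to_coo(matrix):
    """Collect (row, col, value) triplets for every non-zero, row by row."""
    triplets = []
    for r, row in enumerate(matrix):
        for c, value in enumerate(row):
            if value != 0:
                triplets.append((r, c, value))
    return triplets


def flatten(triplets):
    """Lay the triplets out back to back, as an array of structs would be in memory."""
    return [field for triplet in triplets for field in triplet]


dense = [
    [1, 0, 3],
    [0, 5, 6],
    [7, 0, 9],
]
coo = dense_to_coo(dense)
print(coo)           # [(0, 0, 1), (0, 2, 3), (1, 1, 5), (1, 2, 6), (2, 0, 7), (2, 2, 9)]
print(flatten(coo))  # [0, 0, 1, 0, 2, 3, 1, 1, 5, 1, 2, 6, 2, 0, 7, 2, 2, 9]
```

Because the outer loop walks rows in order, all non-zeros of row 0 come first, then row 1, and so on, which matches the ordering described above.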

<iframe allowfullscreen frameborder="0" style="width:640px; height:480px" src="https://lucid.app/documents/embeddedchart/2be1f91f-f7d7-48c6-a444-3d1a4e5a7b23" id="hlSK84oT03uo">

Task 2 Sparse Dot

<iframe allowfullscreen frameborder="0" style="width:960px; height:480px" src="https://lucid.app/documents/embeddedchart/dcfc40a0-ea43-4f92-849b-dae14593f933" id="HSX146oRV06q">

In dense dot product we iterate over every element. In sparse dot product we only iterate over the non-zeros. The key difference is that we need to match up the coordinates of the sparse vector with the dense vector (and output) and reference the non-zero positions. For instance, in the above figure notice how C[0] (the red element) is generated. We identify the coordinates of the non-zero elements in matrix A ([0,0], [0,3]). We need to reference elements 0 and 3 from the dense vector B ([0] and [3]) corresponding to the columns of the non-zeros in row 0. We then perform the multiplication A[0,0]*B[0] + A[0,3]*B[3].

Testing: Sparse dot product

This time, you’ll need to fill out test_sdot.s, using the starter code and comments we’ve provided. Overall, this test should print out a single value matrix. Note that vector v0 is stored in COO format and v1 in dense row-major format.

Check yourself

  • Do you know what structs are? (Revise: see Week 2 slide deck)
  • Do you know how fields are organized within a struct ?
  • Do you know how to iterate over an array of structs?
  • In the code below, if the matrix starts at address 0x1000, what are the values contained in 0x1004, 0x100C, and 0x1010? Do not proceed without answering this question; if you cannot answer it, you will not be able to complete the next steps. Revisit the previous bullet and visualize the layout before trying again.
  • In the code below if the matrix starts at address 0x1000; what address contains the value 6, what address contains 9, what address contains 5?

By default, in the starter code we’ve provided, m points to the start of the sparse vector shown in the figure above, with values 1,3,5,6,7,9. It is stored in COO format as an array of structs.

coo matrix[6] = {{0,0,1},{0,2,3},{0,4,5},{0,5,6},{0,6,7},{0,8,9}};
# In assembly and memory this would be laid out as
# vector v0 COO format: 0 0 1 0 2 3 0 4 5 0 5 6 0 6 7 0 8 9
# Actual 1x9 vector laid out in dense format: [1 0 3 0 5 6 7 0 9]
# Dense vector v1 is [1 2 3 4 5 6 7 8 9]
# Sparse vector * dense vector = [1*1 + 3*3 + 5*5 + 6*6 + 7*7 + 9*9] = [201]
# The output of test_sdot should be **201**
$ java -jar venus.jar ./test_files/test_sdot.s
## See the screen output
## To validate
$ java -jar venus.jar ./test_files/test_sdot.s > ./out/test_sdot.out
$ diff ./out/test_sdot.out ./ref/test_sdot.out
# If diff reports any lines, then check your output
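The sparse dot product over this layout can be sketched in Python as follows. The function name `sparse_dot` and its arguments are assumptions made for this illustration; the point is the access pattern: step through the flattened COO array three integers at a time, use the column field to index the dense vector, and ignore the row field (it is always 0 for a vector).

```python
def sparse_dot(coo_flat, nnz, dense):
    """Dot product of a COO vector (flattened triplets) with a dense vector."""
    total = 0
    for k in range(nnz):
        base = 3 * k                 # each struct is 3 ints: row, col, value
        col = coo_flat[base + 1]     # second field: column of the non-zero
        value = coo_flat[base + 2]   # third field: the non-zero itself
        total += value * dense[col]  # pair it with the matching dense element
    return total


v0_coo = [0, 0, 1,  0, 2, 3,  0, 4, 5,  0, 5, 6,  0, 6, 7,  0, 8, 9]
v1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]

print(sparse_dot(v0_coo, 6, v1))  # 1 + 9 + 25 + 36 + 49 + 81 = 201
```

In RISC-V, `base + 1` and `base + 2` become fixed byte offsets of 4 and 8 from the current struct pointer, and advancing to the next non-zero adds 12 bytes.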

Task 3: File Operations

BIN File Format

BIN is a dense format; both 0s and non-zeros are stored

Our vector files come in two forms: binary and plaintext. The binary format is the one we will be using for all our programs, as it is easier for a program to read. The plaintext version has been provided to help with debugging. The usage is as follows:

We recommend the xxd command to open the binary file (DO NOT USE YOUR EDITOR). You can find its man page here, but its default functionality is to output the raw bits of the file in a hex representation.

The first 8 bytes of the binary file represent two 4-byte integers. These integers are the number of rows and columns of the matrix. Every following 4 bytes represent an integer that is an element of the matrix, in row-major order. In this case, each 4-byte integer represents the value of a pixel. There are no gaps between elements. It is important to note that the bytes are in little-endian order. This means the least significant byte is placed at the lowest memory address. For files, the start of the file is considered the lower address. This relates to how we read files into memory, and the fact that the start/first element of an array is usually at the lowest memory address.
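A short Python sketch of this layout may help when you inspect files with xxd. It builds a tiny BIN image in memory and parses it back using the standard `struct` module; the function name `parse_bin` and the sample values are made up for this illustration, and the sketch assumes signed 4-byte integers.

```python
import struct


def parse_bin(data):
    """Parse a BIN image: 8-byte header (rows, cols), then rows*cols little-endian ints."""
    if len(data) < 8:
        raise ValueError("file too short for the 8-byte header")
    rows, cols = struct.unpack_from("<2i", data, 0)
    count = rows * cols
    if len(data) != 8 + 4 * count:
        raise ValueError("unexpected number of bytes")
    values = list(struct.unpack_from(f"<{count}i", data, 8))
    return rows, cols, values


# A 3x1 vector [1, 2, 3]: header first, then the elements with no gaps.
image = struct.pack("<2i", 3, 1) + struct.pack("<3i", 1, 2, 3)

print(image[:4].hex())   # 03000000 (least significant byte first)
print(parse_bin(image))  # (3, 1, [1, 2, 3])
```

The length check mirrors what read_bin has to do: a read that returns fewer bytes than the header promises should be treated as an error.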

  • Here we will only deal with vectors.
  • The second integer in the header (the number of columns) is therefore always 1.
$ xxd ./inputs/mnist/bin/v0.bin 
# hit q to exit the viewer
<iframe allowfullscreen frameborder="0" style="width:960px; height:720px" src="https://www.lucidchart.com/documents/embeddedchart/8c6d9b8a-d7aa-47f6-9baf-5d4c3fa1770f" id="mCmV10zCyCV7">

We’ve included a Python script called txt2bin.py to convert between the text and binary formats:

  • python3 txt2bin.py file.bin file.txt --to-ascii to go from binary to plaintext
  • python3 txt2bin.py file.txt file.bin --to-binary to go from plaintext to binary

For example, let’s say the plaintext example in the previous section is stored in file.txt. We can run python txt2bin.py file.txt file.bin --to-binary to convert it to a binary format.

Read BIN FORMAT

In read_bin.s, implement the read_bin function which uses the file operations we described above to read a binary matrix file into memory. If any file operation fails or doesn’t return the expected number of bytes, exit the program with exit code 1. The code to do this has been provided for you; simply jump to the eof_or_error label at the end of the file.

Recall that the first 8 bytes contain the two 4-byte dimensions of the matrix, which will tell you how many bytes to read from the rest of the file. Additionally, recall that the binary matrix file is already in row-major order.

You’ll need to allocate memory for the matrix in this function as well. This will require calls to malloc, which is in util.s and also described in the background section above.

Finally, note that the RISC-V calling convention only allows for a0 and a1 to be return registers. Here we will violate that rule, because our function needs to return three values: we return three pointers in a0, a1, and a2. a1 is a pointer to an integer, which we set to the number of rows. a2 is a pointer to an integer, which we set to the number of columns.

Testing: Read Bin

Testing this function is a bit different from testing the others, as the input will need to be a properly formatted binary file that we can read in.

We’ve provided a skeleton for test_read_bin.s, which will read the file test_input.bin, and then print the output. The file test_input.bin is the binary format of the plaintext matrix file test_input.txt. To change the input file read by the test you’ll need to edit test_input.txt first, then run the convert.py script with the --to-binary flag to update the binary.

From the root directory, it should look something like this: python convert.py --to-binary test_files/test_input.txt test_files/test_input.bin. After this, you can run the test again, and it’ll read your updated test_input.bin.

Another thing to note is that you’ll need to allocate space for two integers, and pass in those memory addresses as arguments to read_bin. You can do this with malloc.

$ java -jar venus.jar ./test_files/test_read_bin.s
## See the screen output
## To validate
$ java -jar venus.jar ./test_files/test_read_bin.s > ./out/test_read_bin.out
$ diff ./out/test_read_bin.out ./ref/test_read_bin.out
# If diff reports any lines, then check your output

# If you are using the Venus extension within VS Code, you may have to change file paths to absolute paths.
# e.g., file_path: .asciiz "/test_files/test_input.coo"
# becomes file_path: .asciiz "$REPO/test_files/test_input.coo"
# Expand $REPO to the full path of wherever you cloned the repo.
# You also will not be able to use command line arguments.
# However, it does provide convenient debugging and tracing.

Coo File Format

COO is a sparse format; only non zeros are stored

Sparse matrix files are stored in the coo format. The coo format is the one we will be using for all our programs, as it is easier for a program to read. The plaintext version has been provided to help with debugging. The usage is as follows:

We recommend the xxd command to open the binary file (DO NOT USE YOUR EDITOR). You can find its man page here, but its default functionality is to output the raw bits of the file in a hex representation.

The first 12 bytes of the binary file represent three 4-byte integers: the number of rows, the number of columns, and the number of non-zeros of the matrix. Every following 4 bytes represent an integer that belongs to a non-zero element of the matrix (its row, column, or value), stored in row-major order. It is important to note that the bytes are in little-endian order.

$ cd $REPO
$ xxd ./test_files/test_input.coo | more
# hit q to exit the viewer
# If you want to create your own test case coo files from bin
python3 dense2sparse.py -input_file file.bin -output_file file.coo
<iframe allowfullscreen frameborder="0" style="width:640px; height:480px" src="https://lucid.app/documents/embedded/af64c4cd-f448-4aa2-8646-8a73bcf6af58" id="5iNee7Jg1BzS">

In read_coo.s, implement the read_coo function which uses the file operations we described above to read a binary matrix file into memory. If any file operation fails or doesn’t return the expected number of bytes, exit the program with exit code 1. The code to do this has been provided for you; simply jump to the eof_or_error label at the end of the file.

The first 8 bytes contain the two dimensions of the matrix (similar to the dense format), and the next 4 bytes specify the number of non-zeros in the matrix, which will tell you how many bytes to read from the rest of the file. Additionally, recall that the non-zeros in the file are already in row-major order.

You’ll need to allocate memory for the matrix in this function as well. This will require calls to malloc, which is in util.s and also described in the background section above. To display the coo matrices we have also provided a print_coo_array.

Finally, note that the RISC-V calling convention only allows for a0 and a1 to be return registers, and our function needs to return four values: the pointer to the matrix in memory, the number of rows, the number of columns, and the number of non-zeros.
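The COO file layout can be sketched in Python in the same spirit as the BIN format. This sketch assumes a 12-byte header of three little-endian 4-byte integers (rows, columns, non-zeros) followed by one [row, col, value] triplet of 4-byte integers per non-zero; check that assumption against test_input.coo with xxd before relying on it. The name `parse_coo` is hypothetical.

```python
import struct


def parse_coo(data):
    """Parse a COO image: header (rows, cols, nnz), then nnz (row, col, value) triplets."""
    if len(data) < 12:
        raise ValueError("file too short for the 12-byte header")
    rows, cols, nnz = struct.unpack_from("<3i", data, 0)
    if len(data) != 12 + 12 * nnz:  # 3 ints of 4 bytes per non-zero
        raise ValueError("unexpected number of bytes")
    fields = struct.unpack_from(f"<{3 * nnz}i", data, 12)
    triplets = [tuple(fields[3 * k:3 * k + 3]) for k in range(nnz)]
    return rows, cols, nnz, triplets


# A 3x3 matrix with two non-zeros: A[0,0] = 1 and A[2,1] = 7.
image = struct.pack("<3i", 3, 3, 2) + struct.pack("<6i", 0, 0, 1, 2, 1, 7)

print(parse_coo(image))  # (3, 3, 2, [(0, 0, 1), (2, 1, 7)])
```

Note how the number of non-zeros, not rows times columns, determines how many bytes follow the header; that is the main difference from reading a dense file.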

Testing: Read COO

We’ve provided a skeleton for test_read_coo.s, which will read the file test_input.coo, and then print the output. The file test_input.coo is the coo format of the dense matrix in test_input.bin and test_input.txt. The coo format stores only the non-zeros in the matrix.

Another thing to note is that you’ll need to allocate space for two integers, and pass in those memory addresses as arguments to read_coo. You can do this with malloc.

$ java -jar venus.jar ./test_files/test_read_coo.s
## See the screen output
## To validate
$ java -jar venus.jar ./test_files/test_read_coo.s > ./out/test_read_coo.out
$ diff ./out/test_read_coo.out ./ref/test_read_coo.out
# If diff reports any lines, then check your output

Task 5: SPMV

Now we are ready for SPMV. In spmv.s, implement the spmv function to compute the product of a sparse matrix and a dense vector. SPMV builds on sparse dot. While sparse dot generates a single output, SPMV generates a dense vector of dimensions [row x 1], where the sparse matrix is of dimensions [row x col]. Each element in the coo matrix can belong to a different row (whereas in sparse dot every element belongs to row 0; see the first field in the coo matrix above). Thus we explicitly obtain the row of each element and accumulate into the result[row] element. Note that the output of SPMV is a dense vector (not COO format).

<iframe allowfullscreen frameborder="0" style="width:640px; height:480px" src="https://lucid.app/documents/embedded/a3b33aee-3c33-4246-a3bb-5cd5c5155aed" id="ka-FvKceZBpT">

Testing

We have provided starter code in test_spmv.s to test your sparse matrix-vector multiplication. The completed test file should let you set the values and dimensions for two matrices in .data as 1D vectors in row-major order. When run, it should print the result of your matrix multiplication.

$ java -jar venus.jar ./test_files/test_spmv.s
## See the screen output
## To validate
$ java -jar venus.jar ./test_files/test_spmv.s > ./out/test_spmv.out
$ diff ./out/test_spmv.out ./ref/test_spmv.out
# If diff reports any lines, then check your output

# The inputs are matrix and vector
# |1, 0, 3|    |1|    |10|
# |0, 5, 6| *  |2|  = |28| 
# |7, 0, 9|    |3|    |34|

For test_spmv, we have provided the sample output (./ref/test_spmv.out).
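The 3x3 example can be reproduced with a short Python sketch of SPMV over a flattened COO array. The function name `spmv` and its argument list are chosen for this illustration and do not describe the register interface in spmv.s.

```python
def spmv(coo_flat, nnz, rows, vec):
    """Multiply a COO matrix (flattened triplets) by a dense vector; result is dense."""
    result = [0] * rows                  # one output element per matrix row
    for k in range(nnz):
        row, col, value = coo_flat[3 * k:3 * k + 3]
        result[row] += value * vec[col]  # accumulate into the row this non-zero belongs to
    return result


# |1, 0, 3|
# |0, 5, 6|  stored as (row, col, value) triplets, row by row
# |7, 0, 9|
matrix_coo = [0, 0, 1,  0, 2, 3,  1, 1, 5,  1, 2, 6,  2, 0, 7,  2, 2, 9]
vector = [1, 2, 3]

print(spmv(matrix_coo, 6, 3, vector))  # [10, 28, 34]
```

The only change from sparse dot is that the row field now selects which output element receives the product, so the result array must be zero-initialized before the loop.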

Completing main

Fill out the starter code in test_files/main_spmv.s and spmv.s to test your SPMV function. The completed test file should let you set the values and dimensions for two matrices in .data as 1D vectors in row-major order. When run, it should print the result of your SPMV. Note that you’ll need to allocate space for an output matrix as well.

In main_spmv.s, implement the main function. You may need to malloc space when reading in the matrix and vector and computing the result, but remember to always free all data allocated at the end of this process. More information about the free function is available in utils.s and the background section above.

Command Line Arguments and File Paths

The file paths for v and m will be passed in on the command line. RISC-V handles command line arguments in the same way as C: at the start of the main function, registers a0 and a1 will be set to argc and argv respectively.

We will call main_spmv.s in the following way: java -jar venus.jar main_spmv.s <VECTOR_PATH> <MATRIX_COO_PATH>

Note that this means the pointer to the string VECTOR_PATH will be located at index 1 of argv, and MATRIX_COO_PATH at index 2.

If the number of command line arguments is different from what is expected, your code should exit with exit code 3. This will require a call to a helper function in utils.s. Take a look at the starter code for gemv, read_coo_matrix for hints on how to do this.
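The argument check can be sketched in Python as follows. It assumes that argv[0] holds the program name, as in C, so the expected argument count is 3; that count is an inference from the indices given above, so confirm it against the starter code. The name `parse_args` is hypothetical.

```python
import sys

EXPECTED_ARGC = 3  # assumption: program name + VECTOR_PATH + MATRIX_COO_PATH


def parse_args(argv):
    """Return (vector_path, matrix_coo_path), or exit with code 3 on a wrong count."""
    if len(argv) != EXPECTED_ARGC:
        sys.exit(3)
    return argv[1], argv[2]


print(parse_args(["main_spmv.s", "v.bin", "m.coo"]))  # ('v.bin', 'm.coo')
```

In RISC-V the same check is a single comparison of a0 against the expected count, and each path is a word loaded from a1 at byte offset 4 times its index.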

Algorithm steps

  1. Read matrix: Read the input coo format matrix by making calls to read_coo_matrix. The path to the matrix is passed on the command line. Remember to store the results and pass two integer pointers as arguments.
  2. Read vector: Read the input bin format vector by making calls to read_bin_matrix. The path to the vector is passed on the command line. Remember to store the results and pass two integer pointers as arguments.
  3. Next, you’ll invoke the spmv operation and display the result.
  4. You can assume the output is in dense format. Note that when calling spmv, you should hardcode the number of columns in the vector to 1.
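The order of the steps above can be sketched end to end in Python. This is a simplified model under stated assumptions, not the reference implementation: it assumes the COO file has a three-integer header followed by [row, col, value] triplets, it reads whole files at once instead of using the file helpers in the starter code, and the names `read_ints`, `run`, and `demo` are invented for this example.

```python
import os
import struct
import tempfile


def read_ints(path):
    """Read a whole file as a list of little-endian 4-byte integers."""
    with open(path, "rb") as f:
        data = f.read()
    return list(struct.unpack(f"<{len(data) // 4}i", data))


def run(vector_path, matrix_path):
    # Step 1: read the COO matrix (assumed header: rows, cols, nnz).
    m = read_ints(matrix_path)
    rows, cols, nnz, coo = m[0], m[1], m[2], m[3:]
    # Step 2: read the dense vector (header: rows, cols; cols must be 1).
    v = read_ints(vector_path)
    v_rows, v_cols, vec = v[0], v[1], v[2:]
    if v_cols != 1 or v_rows != cols:
        raise ValueError("vector shape does not match matrix")
    # Step 3: SPMV into a zero-initialized dense result.
    result = [0] * rows
    for k in range(nnz):
        r, c, value = coo[3 * k:3 * k + 3]
        result[r] += value * vec[c]
    return result


def demo():
    """Write a tiny vector and matrix to temporary files and multiply them."""
    with tempfile.TemporaryDirectory() as d:
        vp, mp = os.path.join(d, "v.bin"), os.path.join(d, "m.coo")
        with open(vp, "wb") as f:
            f.write(struct.pack("<5i", 3, 1, 1, 2, 3))
        with open(mp, "wb") as f:
            f.write(struct.pack("<21i", 3, 3, 6,
                                0, 0, 1, 0, 2, 3, 1, 1, 5,
                                1, 2, 6, 2, 0, 7, 2, 2, 9))
        return run(vp, mp)


print(demo())  # [10, 28, 34]
```

Python frees the buffers automatically; in your RISC-V main, the matching step is to free the matrix, vector, and result allocations before exiting.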

Validation

Validating Simple

Apart from MNIST, we’ve provided several smaller input networks for you to run your main function on. simple0, simple1, and simple2 are all smaller networks that will be easier to debug.

# Testing simple1. input0
java -jar venus.jar ./test_files/main_spmv.s ./inputs/simple1/bin/v.bin ./inputs/simple1/bin/m.coo -ms -1
# To validate
java -jar venus.jar ./test_files/main_spmv.s ./inputs/simple1/bin/v.bin ./inputs/simple1/bin/m.coo > ./out/spmv/simple/simple1.trace
python3 part2_tester.py spmv/simple/simple1

All the files for testing the mnist input are contained in inputs/mnist. There are both binary and plaintext versions of m, and 9 input vector files (inputs/mnist/bin/v0.bin through v8.bin).

To test on the first input file for example, run the following:

# Testing mnist.
java -jar venus.jar ./test_files/main_spmv.s ./inputs/mnist/bin/v0.bin ./inputs/mnist/bin/m.coo -ms -1
# To validate
java -jar venus.jar ./test_files/main_spmv.s ./inputs/mnist/bin/v0.bin ./inputs/mnist/bin/m.coo > ./out/spmv/mnist/v0.trace -ms -1
python3 part2_tester.py spmv/mnist/v0
# you can try inputs from v0 through v8.
# (Note that we run with the `-ms -1` flag, as real-dataset inputs are large and we need to increase the max instructions Venus will run)

To check the final output, we have also included a Python script that implements SPMV.

python3 spmv.py ./inputs/mnist/bin/v0.bin ./inputs/mnist/bin/m.coo
# if scipy is not installed, you can install it with `pip install --user scipy`

You can check the printed output against the references for each of the input files in ref/spmv/mnist.

End-to-End tests

$ bash ./scripts/localci.sh
# if you see SUCCESS and *.log.sucess then you passed. You can also check your *_Grade.json to see your tentative grade.
# If you see FAILED, then inspect *.log.failed. Check the failed section to see what tests failed.

Remember from the testing framework section that these sanity tests are not comprehensive, and you should rely on your own tests to decide whether your code is correct. Your score will be determined mostly by hidden tests that will be run after the submission deadline has passed.

We will also be limiting the number of submissions you can make to travis-ci. Each test can take up to 5-6 minutes, so to give the 150-200 students a fair chance, you’re limited to 6 submissions in any given 2 hour period.

Overall, there aren’t that many edge cases for this project. We’re mainly testing for correctness, RISC-V calling convention, and exiting with the correct code in the functions where you’ve been instructed to.

Grading

Test                   Points
test_dot               10
test_read_bin          10
test_sdot              10
test_spmv              10
test_read_coo          10
simple[0,1,2] (coo)    15 (5pts each)
mnist/[v0-8] (coo)     40 (5pts each)

Debugging hints

  • venusbackend.assembler.AssemblerError: Could not find the library: Check the path of the imported files.

  • unable to create ./out/....trace : Check if the out/mnist, out/simple0, out/simple1 and out/simple2 folders exist.

  • SPMV result is dense (not coo): Note that the output of a sparse matrix-dense vector is a dense vector (not coo format).

  • File read errors: Note that the file paths we specify (./) are relative and assume you are in the top folder of the repo. In web-based Venus, either use the complete path or make sure you are at the top folder so that you can invoke the program in the same manner as described here. If you are in a different folder (e.g., test_files), then the relative path will start with ../ instead.

  • Ran for more than max allowed steps!: Venus will automatically terminate if your program runs for too many steps. This is expected for large MNIST sized inputs, and you can workaround it with the -ms flag. If you’re getting this for small inputs, you might have an infinite loop.

  • Attempting to access uninitialized memory between the stack and heap.: Your code is trying to read or write to a memory address between the stack and heap pointers, which is causing a segmentation fault. Check that you’re allocating enough memory, and that you’re accessing the correct addresses.

  • Assembler errors : You cannot have any .word and .data sections in your included files.

  • Check your calling convention. We show an example below on how to use venus to check the calling convention

# Sample execution only
cd $REPO
java -jar venus.jar -cc --retToAllA ./test_files/main_gemv.s ./inputs/simple1/bin/v.bin ./inputs/simple1/bin/m.bin -ms -1 > ./out/gemv/simple1.trace

# You forgot to save and restore a register that is expected. Check your prologue and epilogue in gemv.s
Save register s8 not correctly restored before return! Expected 0x10008030, Actual 0x00000000. ../gemv.s:91 ret

# You have not written to t3 but are trying to use it. Initialize or see what t3 should be.
# t3 is being used in line 150
150 Usage of unset register t3! ./test_files/main_gemv.s

TA Help

TAs are not magicians to say what's wrong with your code by just staring at assembly for 5-6 minutes. Since TA hours are limited per student we will be imposing the following rules for TA help.

  • TA help is available only for the following: simple inputs and test_ files
  • You have to run venus in command line and show there are no calling convention errors and no return errors
# important venus options
  -cc, --callingConvention  Runs the calling convention checker.
  --retToAllA               If this flag is enabled, the calling convention checker will assume all `a` register can be used to return values. If this is not there, then it will assume only a0 will be returned.
  • You should print the registers of interest (e.g., loop counters, base addresses) using the print int and print array helpers.
  • You should preferably have your program loaded in browser-based Venus for a TA OH.

If you do not follow the above instructions, TAs will simply direct you to follow these steps and move to the next student.

Acknowledgments

This assignment has been modified by the CMPT 295 instructor.