Learning Objectives:
In the first part of this assignment, we build a simulator for modeling a single-level cache. We run traces extracted from valgrind through this cache model and count the number of hits, misses, and evictions.
In the second part, we try to reverse engineer and discover cache geometries. Instead of running programs on real processors and inferring the cache layout from timing results, you will approximate this work by using a simulator.
WARNING: Note that the logical variables used to represent the block offset differ from the lecture slides. In the representation below, case matters:
{: .table-striped .table-bordered }
Lecture | Assignment | Description |
---|---|---|
s | s | Number of set bits |
S | S | $2^s$ Number of sets |
E | E | Number of ways |
k | b | Number of block bits or byte offset. |
K | B | $2^b$ Size of block in bytes |
This image shows the last 2 bytes of the address of each cache block as they align to the 32x32 matrices. Red squares mark the start of a section that would fit in the cache, green lines divide individual integers, and black lines divide cache blocks.
This assignment will help you understand the impact that cache memories can have on the performance of your C programs. You will write a small C (not C++!) program of about 200-300 lines that simulates the behavior of a cache memory.
What is the difference between an `int *` and an `int`?
{: .table-striped .table-bordered }
Tag comparison | Selects the set | Block offset |
---|---|---|
Tag (t) | Set index (s) | Byte offset (b) |
Tag (t) | Set index (s) | 0s for byte offset (block-aligned address) |
```
> cd $REPO/model
> ls traces/
dave.trace    fifo_m2.trace  fifo_s3.trace  yi2.trace
fifo_l.trace  fifo_s1.trace  long.trace     trans.trace
fifo_m1.trace fifo_s2.trace  small.trace    yi.trace
```
The `traces/` directory contains a collection of reference trace files that we will use as input to evaluate the correctness of your cache simulator. The trace files are generated by a Linux program called `valgrind`. For example, the following command runs the executable program `ls -l`, captures a trace of each of its memory accesses in the order they occur, and prints them to stdout.

```
valgrind --log-fd=1 --tool=lackey -v --trace-mem=yes ls -l
```
Memory traces have the following form:
```
// Format: operation address,size
M 0421c7f0,4
L 04f6b868,8
S 7ff0005c8,8
```
Each line denotes one or two memory accesses. The format of each line is `operation address,size`:

- The `operation` field denotes the type of memory access: `L` is a data load, `S` is a data store, and `M` is a data modify (i.e., a data load followed by a data store).
- The `address` field specifies a 64-bit hexadecimal memory address.
- The `size` field specifies the number of bytes accessed by the operation.

We have provided you with the binary executable of a reference cache simulator, called `cache-ref`, that simulates the behavior of a cache with arbitrary size and associativity on a `valgrind` trace file. Your assignment is to create a cache model (called `cache`) that mimics the behavior of `cache-ref`.
- `-h`: Optional help flag that prints usage info
- `-s`: S = number of sets = $2^s$ (s is the number of bits used for the set index)
- `-E`: Number of lines per set (i.e., associativity or number of ways)
- `-b`: Number of bits used for the byte offset within the line. Number of bytes per block (i.e., block size) = $2^b$
- `-t`: Trace file to be processed. `cache-ref` reads the file line-by-line and feeds it to the cache simulator
- `-v`: Optional verbose flag that displays what happens on each trace line: `hit`, `miss eviction`, or `miss` for each access, based on what happened when operating on the cache
- `-L`: LRU policy. Your own model `cache` only needs to implement LRU and does not need to implement this option.

WARNING: When running `cache-ref`, you have to include the `-L` option. But when running your own model, you should remove this option.
```
# Reference simulator
> ./cache-ref -v -s 2 -E 1 -b 5 -t traces/yi.trace -L
# sets = 2^2 = 4
# E = 1: number of lines per set (number of ways)
# b = 5
# Block size = 2^5 = 32 bytes
# -L : LRU policy
> ./cache -v -s 2 -E 1 -b 5 -t traces/yi.trace
L 10, L1 insert [status: 0 insert_block: 0x0]
M 20, L1 insert [status: 0 insert_block: 0x20]
L 22, L1 hit [status: 1]
S 18, L1 hit [status: 1]
L 110, L1 insert + eviction [status: 2 victim_block: 0x0 insert_block: 0x100]
L 210, L1 insert + eviction [status: 2 victim_block: 0x100 insert_block: 0x200]
M 12, L1 insert + eviction [status: 2 victim_block: 0x200 insert_block: 0x0]
L1 hits:4 misses:5 evictions:3

# If you have implemented cache.c correctly, run as such:
./cache -v -s 2 -E 1 -b 5 -t traces/yi.trace
```
Here is a commentary on the output format required:
You will write a cache simulator in `cache.c` that takes a `valgrind` memory trace as input (we have provided the traces), simulates the hit/miss behavior of a cache memory on this trace, and outputs the total number of hits, misses, and evictions.
WARNING: The output has to match the reference simulator, in both verbose and concise form.
Functions you will have to fill in:
{: .table-striped .table-bordered }
Function to be implemented | Description |
---|---|
address_to_block | Given an address, return the block-aligned address, i.e., the block bits are zeroed out |
cache_tag | Return the tag of an address |
cache_set | Return the set index of the address |
cacheSetUp | Allocate memory and initialize the cache sets and each line in every set |
probe_cache | Check if the block of the address is in the cache. If yes, return true; else return false |
allocate_cache | Insert block in cache |
avail_cache | Check if an empty way is available. If yes, return true; else return false |
victim_cache | Find the victim line to be removed from the set. Return the way of the block (i.e., the corresponding line index within the set). The victim will be the least-recently-used block |
evict_cache | Remove/Invalidate the cache block in corresponding set and way |
evict_address | Remove/Invalidate the cache given address |
flush_cache | Given a block address, find it in the corresponding set and way and invalidate it |
operateCache | Top-level method that receives an address and invokes other methods to implement the cache model. Returns a struct `result`. Fill in the required fields for each result (hit, miss, eviction + miss) |
cacheSetup | Allocate and initialize cache |
Already Implemented Helpers | |
print_result(result r) | Print the result of the cache operation. This function is provided to you for printing in the required format. |
print_candidate_set_way | This function is provided to you to help debug |
main.c | You do not need to modify this file. However, you may benefit from reading how `runTrace` invokes `operateCache`. It will be useful in Part 2 |
```c
// Per cache-line data structure
typedef struct Line {
    unsigned long long block_addr;
    short valid; // True: cache block valid; False: cache block invalid
    unsigned long long tag;
    // Holds this line's place in the recency order:
    // the greater the value, the more recently it was used
    int timestamp;
} Line;

typedef struct Set {
    Line *lines; // Lines in each set (# of ways)
    int clock;
    int placementRate;
} Set;

typedef struct Cache {
    // Configuration
    int setBits;     // s
    int linesPerSet; // E
    int blockBits;   // b
    Set *sets;       // Array of sets
    // Stats
    int eviction_count;
    int hit_count;
    int miss_count;
    short displayTrace;
    int lfu; // 0: Least Recently Used 1: Least Frequently Used
    char *name;
} Cache;
```
When selecting a victim to evict from a cache set, you need to implement the LRU policy. To do so, we give you a `timestamp` field per line and a `clock` per set. The `clock` is incremented on each access to the set. Thus, the `timestamp` value when a cache line is accessed indicates recency. Try to figure out:

- When is `clock` incremented? (Hint: think of it as a counter that ticks over on accesses to the set and acts as a virtual clock.)
- When is `timestamp` set?

In a set-associative cache, each set has multiple ways (or blocks), and the LRU policy decides which block to evict when a new block is loaded into the set. The per-set counter keeps track of access history, while each block has a flag to indicate its position in the recency order.

- Per-set counter: each set has a counter that tracks the total number of accesses to the set. This counter is incremented with each access, representing a timestamp or clock that ticks at the rate of accesses.
- Per-block flags: each block in the set has a flag (or a small counter) that indicates its recent usage relative to other blocks. This value is updated from the per-set counter to track which blocks are more recently used.

Here is a concise pseudocode version for implementing LRU using a per-set counter and per-block flags:
```
Initialize all counters to 0.

On Cache Access(Set S, Block B):
    Increment SetCounter[S]
    If Block B is in cache:
        Flag[B] = SetCounter[S]   // Update accessed block's flag to current counter

When evicting a block:
    Find LRU block L with smallest Flag[L] in set S
```
Here are some hints and suggestions for working on this assignment. Debug on the small traces first; the trace sizes (in lines) are:

```
     2 small.trace
     5 dave.trace
     7 yi.trace
    10 fifo_s1.trace
    14 fifo_s2.trace
    16 fifo_s3.trace
    16 yi2.trace
   400 fifo_m1.trace
269998 fifo_l.trace
267988 long.trace
```
The reference simulator takes an optional `-v` argument that enables verbose output, displaying the hits, misses, and evictions that occur as a result of each memory access. It helps you debug by allowing you to directly compare the behavior of your simulator with the reference simulator on the reference trace files.

The data modify operation (`M`) is treated as a load followed by a store to the same address. Thus, an `M` operation can result in either two cache hits, or a miss and a hit plus a possible eviction (even when only 1 byte is modified).
This part will help you understand how parallel caches interact and the impact cache memories can have on your C programs' performance. You will write a top-level controller function that maintains cache blocks across two caches at the same level (L1). We will be using a new set of traces to evaluate the correctness of your cache simulator (these traces are provided in the `traces/2x/` directory). Each trace entry now includes an additional field identifying the program making the access (here the traces have two programs, 1 and 2). So entries will look like the following:

```
L 000013F8,4,1 # Program 1's access
L 000013FC,4,2 # Program 2's access
```

Each program uses its own cache (accesses tagged `1` go to P1, accesses tagged `2` go to P2). However, the two programs might access the same block, e.g., `000013F8` and `000013FC` above, and miss locally. When this happens, we will supply the data from the adjacent cache instead of from memory; this is how the dual-cache system works. When the remote cache checks and supplies the data, e.g., P2 supplies an access that originated at P1's cache, we call this a remote hit. On a remote hit, remove the block from the remote cache and insert it into the local one.
Implement the dual-cache policy with 2 caches, P1 and P2.

Access trace says P1 access `L 00001000,4,1`:
{: .table-striped .table-bordered }
P1 cache | P2 remote cache | Action |
---|---|---|
Miss | Miss | Insert block into P1 cache. P1.miss++ |
Miss | Hit | P2.Hit++ in P2 cache. Remove from P2 Cache. Insert into P1 Cache. P1.miss++ |
Hit | Don't Care | Operate Local. P1.hit++. |
Access trace says P2 access `L 00001000,4,2`:
{: .table-striped .table-bordered }
P2 cache | P1 remote cache | Action |
---|---|---|
Miss | Miss | Insert block into P2 cache. P2.miss++ |
Miss | Hit | P1.Hit++ in P1 cache. Remove from P1 Cache. Insert into P2 Cache. P2.miss++ |
Hit | Don't Care | Operate Local. P2.hit++. |
Exclusive means a block can be in either P1 or P2, but never both, i.e., P1 $\cap$ P2 $= \emptyset$. We have included a validate function that checks this condition after each access.
We have provided you with the binary executable of a reference cache simulator, called `1level-shared-ref`, that simulates the behavior of a dual cache. Your assignment is to create a cache model `1level-shared.c` that mimics the behavior of `1level-shared-ref`.
- `-i1`: Interactive mode; stops the program after each access and shows the current state
- `-h`: Optional help flag that prints usage info
- `-c`: Configuration file for the two caches
- `-t`: Trace file to be processed. The reference simulator reads the file line-by-line and feeds it to the cache simulator
- `-v`: Optional verbose flag that displays what happens on each trace line: `hit`, `insert + eviction`, or `insert` for each access, based on what happened when operating on the cache

```
# You do not need to modify this file. This is how we load configurations for `2-level.c`
$ cat 2x_1.config
{
  "P1_SetBits (s)": 2,
  "P1_Ways (E)": 1,
  "BlockBits (b)": 4,
  "P2_SetBits (s)": 2,
  "P2_Ways (E)": 1
}
```
```
$ ./1level-shared-ref -c ./2x_1.config -t ./traces/2x/sample.trace
###### Configuration #########
P1_setBits: 2
P1_ways: 1
P1 and P2 blockBits: 4
P2_setBits: 2
P2_ways: 1
############################
L 1000, P1 insert [status: 0 insert_block: 0x1000]
L 1200, P2 insert [status: 0 insert_block: 0x1200]
L 1202, P2 hit [status: 1] P1 insert + eviction [status: 2 victim_block: 0x1000 insert_block: 0x1200]
L 1002, P2 insert [status: 0 insert_block: 0x1000]
L 2000, P1 insert + eviction [status: 2 victim_block: 0x1200 insert_block: 0x2000]
P1 hits:0 misses:3 evictions:2
P2 hits:1 misses:2 evictions:0

# Implement the 1level simulator
$ make 1level-shared
$ ./1level-shared -c ./2x_1.config -t ./traces/2x/sample.trace
```
Here is a commentary on the output format required:
You will write a cache simulator in `1level-shared-main.c` that takes a `valgrind` memory trace as input (we have provided the traces), simulates the hit/miss behavior of the dual cache, and outputs the total number of hits, misses, and evictions.
WARNING: The output has to match the reference simulator, both verbose and concise.
Functions you will have to fill in:
{: .table-striped .table-bordered }
Function to be implemented | Description |
---|---|
runTrace | Inside this function, invoke the appropriate methods on your P1 and P2 caches to implement an exclusive hierarchy. See the attached diagram on how they interact. Accesses first go to the local cache, with 3 possible outcomes (`hit`, `miss`, `eviction + miss`). On `miss` and `evict-miss`, you need to operate the remote cache |
main | Set cache parameters and instantiate cache |
The key condition that needs to be obeyed in the shared model is that at any given time a block is in either P1's cache or P2's cache, never both. To validate this, we have provided a `validate` function. After each access we check the condition:

```
for each block in P1:
    assert not probe_cache(block, P2)
for each block in P2:
    assert not probe_cache(block, P1)
```

If a probe returns true, we assert and stop.
Hint: you always need to ensure that no block in P1 is also in P2. Think about what you need to do on insertions and evictions to enforce this condition.
Remember from the testing framework section that these sanity tests are not comprehensive, and you should rely on your own tests to decide whether your code is correct. Your score will be determined mostly by hidden tests that are run after the submission deadline has passed.
- Use the `validate` function to double check P1 and P2 blocks against the reference simulator.
- `evict_address(address,..)`: Remove/Invalidate the cache block of the given address.

```
> bash ./scripts/localci.sh
# If you see SUCCESS and *.log.success then you passed. You can also check your *_Grade.json to see your tentative grade.
# If you see FAILED, then inspect *.log.failed. Check the failed section to see what tests failed.
```
This assignment has been created by the CMPT 295 instructors based on the CS:APP textbook.