Skip to content

bsc-loca/FetchFlare

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

65 Commits
 
 
 
 
 
 
 
 

Repository files navigation

FetchFlare: An Open-Source Strided Data Prefetcher for High-Performance Cache Hierarchies

This repository contains the RTL design for the FetchFlare module of our core-tile L1D-caches project.


Overview

This project presents FetchFlare, an open-source stride prefetcher for high-performance cache hierarchies. FetchFlare is abale to capture the memory access patterns of the applications, predict their future memory accesses, and issue prefetch requests for them. FetchFlare works at the microarchitecture level, without any software intervention, it can identify positive and negative strides, and it can be configured to adjust its aggressiveness.


Contributions

FetchFlare presents a stride prefetcher for high-performance cache hierarchies, and provide an open-source RTL implementation that can be easily reused and extended. It is integrated in a system that includes teh HPDCache, the Sargantana core and the Openpiton cache hierarchy. All the elements of the system are open-source. It is evaluated using a set of memory-intensive benchmarks. Results show that, compared to a system without prefetching, FetchFlare provides an average speedup of 63%, reduces the miss ratio in the L1D and the L2 caches, and achieves average accuracy, coverage, and timeliness of 86%, 39%, and 99%, respectively.


Hardware Structure

The internal design of FetchFlare consists of several hardware structures, including a Reference Prediction Table (RPT) that identifies the strides, a FIFO queue and an engine arbiter that distribute the strides across the engines, the engines that generate the prefetch request, and a request arbiter that issues the prefetch requests to the L1D cache. Figure shows an scheme of the hardware structures of FetchFlare.

Graph_Prf

Methodology

In this work, a set of bare-metal benchmark tests is employed to evaluate the proposed prefetcher. The benchmarks have been compiled using the RISC-V C compiler toolchain. Furthermore, an RTL implementation of the proposed prefetcher has been developed in SystemVerilog. For simulation, we utilized two frameworks: Verilator 4.03 (an open-source, C++-based RTL simulator) and QuestaSim-64 2020 (a proprietary RTL simulator from Siemens). All the results reported in the evaluation have been extracted from simulations using QuestaSim.

Evaluation

We have evaluated the performance benefits of FetchFlare. Various standard metrics are used to assess the performance benefits, such as speedup, Instructions Per Cycle (IPC), Misses Per Kilo Instruction (MPKI), and miss ratio of the caches. Additionally, key prefetching metrics, including accuracy, coverage, and timeliness, are presented. The results demonstrate that, compared to a baseline system without prefetching, FetchFlare achieves an average performance speedup of 63%, it significantly reduces cache miss ratios in the L1D and the L2 caches, and it demonstrates average accuracy, coverage, and timeliness of 86%, 39%, and 99%, respectively.

About

Strided Data Prefetcher for High-Performance Cache Hierarchies

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •