1. Most components in our computer are, in fact, memories. CPU registers, RAM, and our
   main storage all store bits, i.e. 1s and 0s. There are many ways to store those bits: a
   simple mechanical device that opens and closes, an electric voltage level, etc. The choice
   among these many options drives the design of the memory system, and the main trade-off
   is between speed and price. Motivated by this, the memory hierarchy organizes the
   available choices of memory for building a computer.
    Overall, we can distinguish two categories of memory, volatile and non-volatile. They
    differ in their ability to keep stored information once the system is turned off: volatile
    memory loses the information, while non-volatile memory retains it. The ability of
    non-volatile memory to keep information even when power is off comes at a price: it
    usually performs more slowly.
    Volatile memory usually sits at the top of the hierarchy, starting with the ultrafast CPU
    registers, followed by cache memories (SRAM), which are a bit slower, and then DRAM.
    SRAM caches bridge the speed difference between registers and DRAM by exploiting
    temporal and spatial locality. Even though DRAM is far slower than the CPU registers or
    SRAM, it is still much faster than the fastest non-volatile memory.
    NAND-Flash and HDDs fall into the non-volatile category. They are far slower than
    volatile memories, but they retain data even after the power is completely off. They are
    also much cheaper per bit stored, and thus usually come in large capacities. For this
    reason, they are mainly used to store the OS and other data such as movies, music, etc.
    In terms of speed, NAND-Flash is typically at least an order of magnitude faster than an
    HDD, as the two differ completely in operating mechanism. NAND-Flash is also more
    robust because it has no spinning disks, unlike an HDD.
    To summarize, the memory hierarchy from top to bottom is: CPU registers, SRAM
    (cache), DRAM, NAND-Flash, and finally HDD.
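    The hierarchy above can be sketched with rough, illustrative access latencies. The figures below are order-of-magnitude assumptions (they vary widely by device generation), not measurements:

```python
# Illustrative (order-of-magnitude) access latencies for each level of the
# memory hierarchy, in seconds. These are assumed typical values, not
# measurements from any specific device.
hierarchy = [
    ("CPU register", 0.3e-9),  # sub-nanosecond
    ("SRAM cache",   1e-9),    # ~1 ns
    ("DRAM",         50e-9),   # ~50 ns
    ("NAND-Flash",   10e-6),   # ~10 us
    ("HDD",          5e-3),    # ~5 ms (seek + rotation)
]

# Each level is slower than the one above it; that is the whole point of
# the hierarchy: trade speed for capacity and cost as we go down.
for (fast_name, t_fast), (slow_name, t_slow) in zip(hierarchy, hierarchy[1:]):
    print(f"{slow_name} is ~{t_slow / t_fast:.0f}x slower than {fast_name}")
```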
2. In the previous discussion (question 1), we discussed the boundary between volatile and
   non-volatile memory, or more specifically between DRAM and NAND-Flash. Even though
   DRAM sits at the bottom of the volatile part of the hierarchy while NAND-Flash sits at the
   top of the non-volatile part, their speeds differ by more than two orders of magnitude,
   which is significant. Specifically, DRAM accesses take around 50 ns while NAND-Flash
   accesses take around 10 µs. This creates a huge latency gap that can degrade
   performance.
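    As a quick check of that gap, with the figures quoted above (50 ns for DRAM versus roughly 10 µs for NAND-Flash), the ratio works out to about 200x, i.e. between two and three orders of magnitude:

```python
t_dram = 50e-9  # DRAM access time from the text, seconds
t_nand = 10e-6  # NAND-Flash access time from the text, seconds

# How many DRAM accesses fit into one NAND-Flash access.
ratio = t_nand / t_dram
print(f"NAND-Flash is ~{ratio:.0f}x slower than DRAM")  # ~200x
```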
    It is for this reason that researchers are trying to bridge the latency gap by introducing
    SCM, storage class memory. The goal is a memory whose cost per bit is lower than
    DRAM's while its speed is faster than NAND-Flash's. Unlike DRAM, SCM is designed to
    be persistent in nature and retains data written to it across power cycles. SCM should
    also tolerate data re-writes better, i.e. have much higher endurance.
    Several candidates are available for SCM: Resistive RAM (RRAM), Spin-Transfer-Torque
    MRAM (STT-MRAM), and Ferroelectric RAM (FeRAM), to name a few. In principle, RRAM
    operates by changing the resistance across a dielectric solid-state material. It involves
    generating defects, known as oxygen vacancies, in a thin oxide layer. Upon set, oxygen
    ions drift away from the oxygen vacancies, putting the cell into a low-resistance state;
    upon reset, a voltage of the opposite polarity is applied, causing the oxygen ions to
    recombine with the vacancies and leading to a high-resistance state. RRAM shows decent
    endurance, with a demonstrated 10-year data retention at 85 °C after 100k cycles (Wei,
    IEDM 2015). One of RRAM's advantages is its cell size, which is significantly smaller than
    DRAM's and enables packing a large capacity into a small area.
    STT-MRAM, on the other hand, also works by changing resistance to change state. It
    writes data via spin-transfer torque: a parallel spin alignment gives a low-resistance state,
    while an anti-parallel alignment gives a high-resistance state. A high
    Tunnel-Magneto-Resistance ratio is also needed for a robust read margin. STT-MRAM
    looks very promising as it offers very fast read and write times and very long endurance.
    Its cell size is of the same order as DRAM's, although a bit bigger. While STT-MRAM
    offers very durable, high-speed reads and writes, it is still very expensive in cost per bit at
    the moment, so in my view RRAM looks more promising for SCM for the time being.
3.   Scaling down can degrade transistor performance. Up to this point, planar MOSFET
     technology is thought to have reached the end of its scaling, i.e. around the 14 nm node.
     Further scaling seems impossible, or at least very prone to unpleasant effects, including
     leakage and wide variability.
     To make this clear: in a planar MOSFET, a single gate controls the source-drain channel.
     Intuitively, a single gate does not have good electrostatic control of the channel, leading
     to leakage between source and drain even when the gate is off. To alleviate this, a new
     technology called the FinFET was proposed. In principle, the FinFET replaces the planar
     channel with a vertical fin that passes through the gate, so that the gate wraps around
     the source-drain channel. This enables better control of the electric field, and thus a
     more robust gate that is less prone to leakage. This is the main reason the FinFET
     alleviates the scaling problem of the planar MOSFET.
     The FinFET has other advantages over the planar MOSFET as well. These include higher
     integration density, thanks to its naturally 3D shape, and smaller variability, mainly the
     variability due to random dopant fluctuation. It is because of all these advantages that
     FinFETs enable further scaling, even down to 7 nm.
4. The average instruction cycle count defines how many clock cycles are required to
   perform one instruction. Intuitively, it comprises the register cycle plus the memory
   access cycles. For the memory access itself, thanks to cache memory the processor often
   finds the required data in the cache instead of main memory, enabling a faster
   instruction cycle.
For case 1, since only a level-1 cache is present, we have:
Case 1:
Tam (DRAM) = 60 ns
Tac (cache) = 0.8 ns
Clock freq = 2.5 GHz
H = 95%
T = 1 + 0.3 × h × Tc + 0.3 × (1 − h) × Tm
T = 1 + 0.3 × 0.95 × (0.8 × 10^-9 × 2.5 × 10^9) + 0.3 × 0.05 × (60 × 10^-9 × 2.5 × 10^9)
T = 1 + 0.57 + 2.25
T = 3.82 cycles
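The case-1 arithmetic can be checked with a short script (variable names are mine; the 0.3 memory-reference fraction, the hit rate, and the access times come from the problem statement):

```python
f_clk = 2.5e9     # clock frequency, Hz
t_cache = 0.8e-9  # level-1 cache access time, s
t_dram = 60e-9    # DRAM (main memory) access time, s
h = 0.95          # cache hit rate
mem_frac = 0.3    # fraction of instructions that access memory

# Convert access times to clock cycles.
c_cache = t_cache * f_clk  # 2 cycles
c_dram = t_dram * f_clk    # 150 cycles

# Average cycles per instruction: 1 base cycle plus the expected
# memory-access penalty (cache hit or DRAM access on a miss).
T = 1 + mem_frac * h * c_cache + mem_frac * (1 - h) * c_dram
print(T)  # ~3.82
```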
For case 2, however, the process is a bit longer: on a level-1 cache miss, the processor tries
to fetch from the level-2 cache, and only if that also misses does it go to main memory. The
formula is thus adapted to:
Case 2:
Tam (DRAM) = 60 ns
Tac1 (cache1) = 0.4 ns
Tac2 (cache2) = 6 ns
Clock freq = 2.5 GHz
H1 = 95%
H2 = 97%
T = 1 + 0.3 × h1 × Tc1 + 0.3 × (1 − h1) × (h2 × Tc2 + (1 − h2) × Tm)
T = 1 + 0.3 × 0.95 × (0.4 × 10^-9 × 2.5 × 10^9) + 0.3 × 0.05 × (0.97 × (6 × 10^-9 × 2.5 × 10^9) +
0.03 × (60 × 10^-9 × 2.5 × 10^9))
T = 1 + 0.285 + 0.28575
T = 1.57075 cycles
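The two-level case can be checked the same way (again, the variable names are mine; the rates and access times come from the problem statement):

```python
f_clk = 2.5e9        # clock frequency, Hz
t_l1 = 0.4e-9        # level-1 cache access time, s
t_l2 = 6e-9          # level-2 cache access time, s
t_dram = 60e-9       # DRAM access time, s
h1, h2 = 0.95, 0.97  # L1 and L2 hit rates
mem_frac = 0.3       # fraction of instructions that access memory

# Access times in clock cycles: 1, 15, and 150 respectively.
c_l1 = t_l1 * f_clk
c_l2 = t_l2 * f_clk
c_dram = t_dram * f_clk

# On an L1 miss we try L2; only on an L2 miss do we go to DRAM.
T = (1 + mem_frac * h1 * c_l1
       + mem_frac * (1 - h1) * (h2 * c_l2 + (1 - h2) * c_dram))
print(T)  # ~1.57075
```

Adding the second cache level cuts the average from 3.82 to about 1.57 cycles, because the 150-cycle DRAM penalty is now paid only on the rare double miss.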