

### From Embedded World to High Performance Computing using STT-MRAM

### Lionel Torres, Sophiane Senni



Paris, France May 29, 2017



Workshop NVRAM

# OUTLINE

- 1. Motivation
- 2. Spintronics
  - 1. Basics
  - 2. STT-MRAM technology
- 3. STT-MRAM exploration at system level
  - 1. Embedded systems & High Performance Computing
- 4. Conclusions and Future Work

# Motivation

- CMOS scaling issues are observed...
  - Heat dissipation
  - Performance saturation
- Due to...
  - High leakage current
  - High power density
- Thermal constraints → partially turn off the system
- Turning off the memory part  $\rightarrow$  the execution state is lost



### Need to go beyond CMOS



#### Non-volatile system-on-chip

[Figure: current system-on-chip vs. non-volatile system-on-chip. Labels: embedded STT-MRAM, high performance bus, memory controller, GPU, DDR controller, external STT-MRAM]



#### Anisotropic magnetoresistance



William Thomson 1824-1907

The electrical resistance of a magnetic metal varies in the presence of an external magnetic field



**Resistance variation**  $\rightarrow$  2% - 5% at room temperature

#### Giant magnetoresistance



Peter Grünberg, Albert Fert: 2007 Nobel Prize (Physics)



Large increase of the conductance in structures alternating ferromagnetic / non-magnetic layers

#### Tunnel magnetoresistance



T. Miyazaki, J. Moodera (not in the pictures: M. Jullière)



- Unlike GMR, the barrier is an insulator
- With MgO, TMR of 608% reached at room temperature
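
For reference, the TMR ratio quoted above is conventionally defined from the junction resistance in the parallel ($R_P$) and antiparallel ($R_{AP}$) magnetization states. Assuming that standard definition (the slide does not spell it out), a ratio of 608% means the antiparallel resistance is roughly seven times the parallel one:

$$\mathrm{TMR} = \frac{R_{AP} - R_P}{R_P} \quad\Rightarrow\quad \frac{R_{AP}}{R_P} = 1 + \mathrm{TMR} = 1 + 6.08 = 7.08$$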


### Tunnel magnetoresistance principle

• The transport of the electrons through the material is spin-dependent



[Figure labels: spin-up, spin-down]



# STT-MRAM technology

### **Bit Cell Structure**



- STT-MRAM can be used to build:
  - Flip-Flops
  - Cache memories
  - Main memories





#### 4Gb LPDDR2 STT-MRAM [2]



[1] B. Jovanovic et al., "A hybrid magnetic/complementary metal oxide semiconductor three-context memory bit cell for non-volatile circuit design," AIP Journal of Applied Physics, April 2014.

[2] K. Rho et al., "A 4Gb LPDDR2 STT-MRAM with compact 9F² 1T1MTJ cell and hierarchical bitline architecture," Solid-State Circuits Conference (ISSCC), February 2017.

# **STT-MRAM** exploration

- The main objectives are...
  - Evaluate the impact at system level of using STT-MRAM
  - Explore new applications
    - Non-volatile working memories (registers, cache...)
    - In-memory computing

- This talk focuses on...
  - Non-volatile processor for embedded applications
  - STT-MRAM exploration framework for High Performance Computing



- Two applications under study...
  - Normally-off computing
    - The system is normally off
    - The execution state is preserved after a shutdown
    - Fast wakeup, near-zero leakage power in sleep mode
  - Checkpoint/Rollback
    - Restore a safe state of the processor, for instance after an execution error or a power failure
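
The checkpoint/rollback idea can be illustrated with a minimal software sketch. Everything here is hypothetical: `ToyProcessor`, its register count, and its methods are illustrative names, not a model of the processors or flip-flop circuits used in this work. The `_checkpoint` field only stands in for state that would be held in non-volatile storage.

```python
import copy


class ToyProcessor:
    """Hypothetical stand-in for a processor (not Secretblaze or Amber)."""

    def __init__(self, n_registers=16):
        self.registers = [0] * n_registers  # volatile working state
        self.pc = 0                         # volatile program counter
        self._checkpoint = None             # models the non-volatile backup

    def checkpoint(self):
        """Back up the execution state (STT-MRAM would hold this in hardware)."""
        self._checkpoint = (copy.copy(self.registers), self.pc)

    def rollback(self):
        """Restore the last safe state, e.g. after an error or a power failure."""
        if self._checkpoint is None:
            raise RuntimeError("no checkpoint available")
        regs, pc = self._checkpoint
        self.registers = copy.copy(regs)
        self.pc = pc

    def power_failure(self):
        """Volatile state is lost; the checkpoint survives."""
        self.registers = [0] * len(self.registers)
        self.pc = 0


cpu = ToyProcessor()
cpu.registers[1] = 42
cpu.pc = 0x100
cpu.checkpoint()

cpu.registers[1] = 99   # work done after the checkpoint
cpu.pc = 0x140
cpu.power_failure()     # everything volatile is wiped
cpu.rollback()          # execution resumes from the checkpointed state
```

Work done after the last checkpoint is lost on rollback, which is why the backup cost per checkpoint matters.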

- Two 32-bit RISC processors considered...
  - Secretblaze (MIPS-like)
  - Amber (ARM-like)









Restore the register's state



- Conventional system
  - Leakage power during sleep mode
- Non-volatile system with instant-on/off
  - Near-zero leakage during sleep mode
  - Backup energy

$P_{leakage} = 973\ \mu\mathrm{W}$

$E_{backup} = 1\ \mathrm{nJ}$

$T_{backup} = 20\ \mathrm{ns}$

$P_{leakage} \times T_{backup} + E_{backup} < P_{leakage} \times T_{sleep}$

$\frac{P_{leakage} \times T_{backup} + E_{backup}}{P_{leakage}} < T_{sleep}$
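
As a quick check of the inequality above, the break-even sleep time can be computed directly from the three quoted values. This is a minimal sketch: the function and constant names are illustrative, and, like the inequality itself, it ignores the restore energy.

```python
def break_even_sleep_time(p_leakage_w, e_backup_j, t_backup_s):
    """Minimum sleep duration (s) above which backing up and powering off
    costs less energy than leaking during sleep."""
    return (p_leakage_w * t_backup_s + e_backup_j) / p_leakage_w


P_LEAKAGE = 973e-6  # W, leakage power of the conventional system
E_BACKUP = 1e-9     # J, backup energy
T_BACKUP = 20e-9    # s, backup time

t_min = break_even_sleep_time(P_LEAKAGE, E_BACKUP, T_BACKUP)
print(f"Break-even sleep time: {t_min * 1e6:.2f} us")  # prints 1.05 us
```

Almost all of the break-even time comes from the $E_{backup} / P_{leakage}$ term (about 1.03 µs). The backup latency itself adds only 20 ns.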

#### Synthesis of the Amber processor

(Industrial 40nm CMOS low-power process)

#### Synthesis of the Secretblaze processor

(Industrial 40nm CMOS low-power process)



**Non-volatile flip-flop performance**

| Technology | Restore latency (ns) | Back-up latency (ns) | Restore energy (pJ) | Back-up energy (pJ) |
|------------|----------------------|----------------------|---------------------|---------------------|
| STT-MRAM   | 0.2                  | 4                    | 0.012               | 0.5                 |

$T_{sleep} > 1.05\ \mu\mathrm{s}$

\*D. Chabi et al., "Ultra low power magnetic flip-flop based on checkpointing/power gating and self-enable mechanisms," IEEE Transactions on Circuits and Systems I, January 2014.









• Validation of the backup/recovery of the system



#### **Blowfish application**

- Evaluation of the cost
  - Register level (data from real flip-flop design)
    - Backup: ≈1nJ (<20ns)
    - Restore: <25pJ (≈1ns)
  - Main memory level (data from NVSim)
    - 1MB main memory / 4kB checkpoint memory
      - Backup: <100nJ (<20µs)
      - Restore: <100nJ (<20µs)

- A simulation framework has been developed to...
  - Explore the impact of STT-MRAM at system level
  - Provide essential feedback to enhance the development of STT-MRAM devices
  - Explore different memory technologies

- A cross-layer investigation is done...
  - Device level  $\rightarrow$  Physical Design Kit
  - Circuit level  $\rightarrow$  Bit cell
  - Memory level  $\rightarrow$  Cache, main memory...
  - System level  $\rightarrow$  Multi-core architectures



#### Case study...

- Architecture considered
  - 4-core out-of-order (ARMv7 ISA)
  - 32kB L1 instruction cache (SRAM)
  - 32kB L1 data cache (SRAM)
  - 1MB shared L2 cache
    - Two scenarios (SRAM / STT-MRAM)
  - 512MB DRAM DDR3 main memory

- Benchmarks
  - PARSEC
  - SPLASH-2
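
The case study can be written down as a small configuration record, which makes explicit that the two scenarios differ only in the L2 technology. This is an illustrative sketch: the class and field names are hypothetical and do not reflect the input format of the actual simulation framework.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SystemConfig:
    """Hypothetical description of the simulated architecture."""
    cores: int = 4
    isa: str = "ARMv7"
    out_of_order: bool = True
    l1i_kb: int = 32
    l1d_kb: int = 32
    l1_tech: str = "SRAM"
    l2_mb: int = 1
    l2_shared: bool = True
    l2_tech: str = "SRAM"
    main_memory: str = "512MB DDR3 DRAM"


BASELINE = SystemConfig()                         # SRAM L2 scenario
STT_L2 = replace(BASELINE, l2_tech="STT-MRAM")    # only the L2 changes
BENCHMARK_SUITES = ("PARSEC", "SPLASH-2")

# Every (L2 technology, benchmark suite) pair to evaluate.
scenarios = [
    (cfg.l2_tech, suite)
    for cfg in (BASELINE, STT_L2)
    for suite in BENCHMARK_SUITES
]
```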



- Circuit-level analysis...
  - Area
    - STT-MRAM is denser for large cache capacity
      - STT-MRAM cell size smaller than that of SRAM
      - STT-MRAM needs large transistors for write operations

- Circuit-level analysis...
  - 1MB cache performance
    - Based on NVSim

| Node | Technology | Read latency (ns) | Read energy (nJ) | Write latency (ns) | Write energy (nJ) | Standby leakage (mW) |
|------|------------|-------------------|------------------|--------------------|-------------------|----------------------|
| 45nm | SRAM       | 10.6              | 0.51             | 10.6               | 0.05              | 630                  |
| 45nm | STT-MRAM   | 7.6               | 0.15             | 16.7               | 0.65              | 24                   |

- STT-MRAM < SRAM for reads
  - Small area of STT-MRAM
- STT-MRAM > SRAM for writes
- STT-MRAM << SRAM for static power
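
The comparison above can be turned into a first-order energy model: dynamic energy per access plus leakage integrated over the runtime. The per-access energies and leakage figures come from the 45nm, 1MB table. The access counts and runtime are hypothetical placeholders rather than measured benchmark data, and the model assumes the same runtime for both technologies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheTech:
    read_energy_nj: float
    write_energy_nj: float
    leakage_mw: float


# Values from the 45nm, 1MB cache table (NVSim).
SRAM = CacheTech(read_energy_nj=0.51, write_energy_nj=0.05, leakage_mw=630.0)
STT_MRAM = CacheTech(read_energy_nj=0.15, write_energy_nj=0.65, leakage_mw=24.0)


def l2_energy_j(tech, reads, writes, runtime_s):
    """Total L2 energy in joules: dynamic accesses plus standby leakage."""
    dynamic = (reads * tech.read_energy_nj + writes * tech.write_energy_nj) * 1e-9
    static = tech.leakage_mw * 1e-3 * runtime_s
    return dynamic + static


# Hypothetical workload: one million reads and writes over one second.
READS, WRITES, RUNTIME_S = 1_000_000, 1_000_000, 1.0

e_sram = l2_energy_j(SRAM, READS, WRITES, RUNTIME_S)
e_stt = l2_energy_j(STT_MRAM, READS, WRITES, RUNTIME_S)
saving = 1.0 - e_stt / e_sram
print(f"SRAM: {e_sram:.4f} J, STT-MRAM: {e_stt:.4f} J, saving: {saving:.0%}")
```

With these placeholder counts, leakage dominates the total, so the higher write energy of STT-MRAM barely affects the result. A much more write-intensive workload would narrow the gap.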

- Set of results...
  - Runtime
    - Similar performance when using STT-MRAM







- Set of results...
  - L2 cache energy
    - STT-MRAM-based L2 cache consumes >80% less energy than SRAM-based L2 cache

#### **PARSEC** benchmarks





- Set of results...
  - System energy
    - Evaluate the impact of the memory part compared to the rest of the system





[Figure: system energy; axis: Energy (J)]

# Conclusions

- STT-MRAM is promising for:
  - Energy-efficient & reliable embedded systems
    - Normally-off computing
    - Checkpoint / Rollback
  - Cache memories for High Performance Computing
- A system-level simulation framework is developed to enhance the development of STT-MRAM and other memory technologies

# Future Work

- Strengthen the results by designing a real system-on-chip based on STT-MRAM
  - Ongoing work (European Project  $\rightarrow$  GREAT)
- Explore STT-MRAM at main memory level
  - Ongoing work
    - Extension of the simulation framework
- Explore other memory technologies
  - Spin-Orbit-Torque MRAM
  - Voltage-Controlled MRAM