## **Safeguard Your Cloud Workloads and Then Accelerate:** An In-depth Look at CPU and GPU Confidential Computing

Jingyao Zhang

**Advisor: Elaheh Sadredini** 



The content of these slides is partly adapted from online materials.

# Agenda


- What is Confidential Computing
- Why Confidential Computing is the Future Infrastructure
- How does Confidential Computing Work
- **GPU** Confidential Computing

# Confidential computing on IBM Cloud

Protect your data at rest, in transit and in use. Get a higher level of privacy assurance.













[Figure: screenshots of cloud providers' confidential computing pages, e.g. "Protect data in use with OCI Confidential Computing"; use cases shown include blockchain, key management, finance, IPR, and edge.]

# Intel is all-in on Confidential Computing



# **Confidential Computing Consortium**

Premier Members



# What is Confidential Computing (CC)

# **Confidential Computing takes you from here...**







## ... to here



# Why Confidential Computing is the Future

- Customers want to use public cloud:
  - Lower operational costs with **public cloud vs. on-premises servers**
- However, they have concerns about **data privacy and security**:
  - Remote computer and software stack owned by an **untrusted party** (e.g., cloud service providers, CSPs)
    - Can manipulate everything
    - Can directly see and modify application code and data
  - Software bugs
    - SMM-based rootkits
    - Xen: 150K LOC, 40+ vulnerabilities per year
    - Monolithic kernels, e.g., Linux: 17M LOC, 100+ vulnerabilities per year
    - About 70% of security bugs are memory safety issues, according to Microsoft
  - Compliance with the General Data Protection Regulation (GDPR), the Health Insurance Portability and Accountability Act (HIPAA), etc.

**CSPs** need more customers, especially security-sensitive customers:

- Utilization of computing resources is often below 20%
- Security-sensitive customers are **rich**

However, CSPs cannot **gain trust from security-sensitive customers**:

- No guarantee of data privacy and security
  - Agreements only guarantee "won't" instead of "can't"
  - Malicious infrastructure administrators, hackers
- Compliance with GDPR, HIPAA, etc.

# **Confidential Computing is a Win-Win**

"Confidential Computing addresses the **trust issue** between the **data/code owner** and the **platform owner** when they are not the same entity."

*For customers:*

- CC guarantees that remote computers CAN'T manipulate data/code
- Reduce operational costs by utilizing public cloud

*For cloud service providers (CSPs):*

- CC helps to **gain trust** from security-sensitive customers
- Attain higher ROI from new security-sensitive customers (e.g., health centers)

# **Confidential Computing is the Future**



#### **Confidential Computing can be Future Infrastructure**





# How does Confidential Computing Work

# **Confidential Computing Definition**



# **Trusted Execution Environment (TEE) Hardware**



#### **Remote Attestation**



#### **TEE Examples**





# **SGX Enclave Programming Model**

Examples from: https://github.com/intel/linux-sgx



How do we ensure that runtime execution behaves as expected? (confidentiality and integrity of the execution)

How do we ensure that the enclave code is the code we want to execute? (code integrity during initialization)

How do we secure DRAM against physical attacks such as Rowhammer and cold boot attacks?

### **Intel SGX Overview**

- Enclave code and data map to Processor Reserved Memory (PRM)
- Each enclave can access only its own memory region



# **Virtual Memory Abstraction**



### **Intel SGX Address Translation Overview**

Virtual Address Space (Programmer's View)







#### Check for security invariant:

- Enclave VA, enclave mode -> PRM
- Non-enclave mode is not allowed to access PRM through any address
- For each page in the PRM, remember the mapping
  - [PPN] -> [VPN, Enclave ID]
  - Keep this inverted page table in PRM, so privileged software cannot modify it
- When to perform the check? (Review the address translation process)
  - After each address translation

PPN = Physical Page Number, VPN = Virtual Page Number
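The post-translation check can be sketched as a lookup in the inverted page table. This is a toy model, not how the hardware is implemented: the `Prm` and `Owner` types and the `access_allowed` function are hypothetical names, and the sketch does not model the part of the invariant that forces enclave virtual addresses to land in the PRM.

```rust
use std::collections::HashMap;

/// One inverted-page-table entry: which enclave owns a PRM page and at
/// which virtual page number it must be mapped. (Hypothetical type.)
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct Owner {
    pub enclave_id: u32,
    pub vpn: u64,
}

/// Toy model of the PRM plus its inverted page table. (Hypothetical type.)
pub struct Prm {
    /// Physical page numbers in [start, end) belong to the PRM.
    pub start: u64,
    pub end: u64,
    /// Inverted page table: PPN -> (VPN, enclave ID).
    pub inverted: HashMap<u64, Owner>,
}

impl Prm {
    /// Check performed after every address translation (vpn -> ppn).
    /// `enclave` is `Some(id)` in enclave mode and `None` otherwise.
    pub fn access_allowed(&self, enclave: Option<u32>, vpn: u64, ppn: u64) -> bool {
        let in_prm = self.start <= ppn && ppn < self.end;
        match (in_prm, enclave) {
            // Pages outside the PRM are not covered by this check.
            (false, _) => true,
            // Non-enclave code may never touch the PRM.
            (true, None) => false,
            // Enclave code may touch a PRM page only if the page tables
            // produced exactly the (VPN, enclave) pair recorded for it.
            (true, Some(id)) => {
                self.inverted.get(&ppn) == Some(&Owner { enclave_id: id, vpn })
            }
        }
    }
}
```

Because the check uses the result of the translation rather than the page tables themselves, a malicious OS that remaps an enclave VPN to another enclave's physical page is caught even though it fully controls the page tables.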

A memory mapping attack that does not require modifying the page tables.

#### Page tables and DRAM before swapping



# **Solution: Page Encryption and Authentication**

Physical Address Space (limited by DRAM size)



#### A memory mapping attack that exploits stale TLB entries.



TLB = Translation lookaside buffer

## Solution: Keep TLB up-to-date

- Keep an extra state in the inverted page table
  - Before: [PPN] -> [VPN, Enclave ID]
  - After: [PPN, state] -> [VPN, Enclave ID]
  - When a page is about to be swapped out, mark it "blocked"
  - Unset "blocked" only after all the corresponding TLB entries (a page can be mapped by multiple enclaves) have been flushed
- If the TLB has stale data, the post-translation check will see that the physical page is "blocked"
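A minimal sketch of the "blocked" state, assuming the same toy inverted page table as a plain map. The names `PageState`, `Entry`, `InvertedPageTable`, `set_state`, and `check` are hypothetical; real hardware tracks outstanding TLB flushes itself, which this sketch leaves to the caller.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum PageState {
    Valid,
    Blocked,
}

/// Inverted-page-table entry extended with a state field.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct Entry {
    pub enclave_id: u32,
    pub vpn: u64,
    pub state: PageState,
}

#[derive(Default)]
pub struct InvertedPageTable {
    entries: HashMap<u64, Entry>, // keyed by PPN
}

impl InvertedPageTable {
    /// Record that `ppn` backs `vpn` of enclave `enclave_id`.
    pub fn map(&mut self, ppn: u64, enclave_id: u32, vpn: u64) {
        self.entries.insert(ppn, Entry { enclave_id, vpn, state: PageState::Valid });
    }

    /// Mark a page blocked before swap-out, or valid again once the caller
    /// knows every TLB entry for it has been flushed.
    pub fn set_state(&mut self, ppn: u64, state: PageState) {
        if let Some(entry) = self.entries.get_mut(&ppn) {
            entry.state = state;
        }
    }

    /// Post-translation check: a stale TLB entry may still yield `ppn`,
    /// but the access is refused while the page is blocked.
    pub fn check(&self, enclave_id: u32, vpn: u64, ppn: u64) -> bool {
        match self.entries.get(&ppn) {
            Some(e) => {
                e.state == PageState::Valid && e.enclave_id == enclave_id && e.vpn == vpn
            }
            None => false,
        }
    }
}
```

The point of the design is that correctness does not depend on the TLB being flushed promptly: until the flush is confirmed, every access through a stale entry fails the check.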

- #1: Maintain an inverted page table and check after every address translation
  - Physical page in PRM -> (enclave ID, virtual page number)
- #2: Encrypt/decrypt upon page swap to non-PRM region
  - (nonce, enclave ID, virtual page number, key, page content) -> MAC
- #3: Keep TLB state up-to-date
  - Upon page swap, block the page in the inverted page table and unblock it only after all the corresponding TLB entries are flushed

# **Security Tasks**

How do we ensure the runtime execution follows our expectation (confidentiality and integrity of the execution)?

How do we ensure the enclave code is the code that we want to execute? (code integrity during initialization)

DRAM security? How to deal with Rowhammer and Coldboot attacks? (physical attacks)

## **Review: SGX Enclave Programming Model**

How to ensure the enclave is initialized correctly?



[Figure: the enclave build sequence. Each EADD (page location and metadata) and EEXTEND (page data) step is hashed with SHA-256 into the running measurement, giving MRENCLAVE<sub>0</sub>, MRENCLAVE<sub>1</sub>, ..., MRENCLAVE<sub>4</sub>.]

Hardware generates a cryptographic log of the build process

- Code, data, stack, and heap contents
- Location of each page within the enclave
- Security attributes (e.g., page permissions) and enclave capabilities
- Enclave identity (MRENCLAVE) is a 256-bit digest of the log that represents the enclave
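The build log can be pictured as a running digest that absorbs one record per build step. The sketch below is only an illustration: `BuildOp` and `measure` are hypothetical names, the record layout is invented, and the standard library's 64-bit `DefaultHasher` stands in for SHA-256 (it is not cryptographically secure).

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// One record in the enclave build log. (Hypothetical layout.)
#[derive(Hash)]
pub enum BuildOp<'a> {
    /// EADD-like step: a page is added at `offset` with permission bits.
    Add { offset: u64, permissions: u8 },
    /// EEXTEND-like step: a chunk of page contents at `offset` is measured.
    Extend { offset: u64, data: &'a [u8] },
}

/// Fold the log into a running measurement.
/// Stand-in: a 64-bit non-cryptographic hash instead of SHA-256.
pub fn measure(log: &[BuildOp]) -> u64 {
    let mut measurement: u64 = 0; // plays the role of MRENCLAVE_0
    for op in log {
        let mut hasher = DefaultHasher::new();
        measurement.hash(&mut hasher); // previous measurement
        op.hash(&mut hasher); // next build record
        measurement = hasher.finish(); // next intermediate measurement
    }
    measurement
}
```

Changing any page content, page location, or permission bit changes the final value, which is why a verifier that knows the expected measurement can tell whether the enclave was built as intended.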



Hardware-based attestation provides evidence that "this is the right application executing on an authentic platform" (an approach similar to secure boot attestation)



# SGX Infrastructure Services – Chain of Trust



DRAM attacks: Rowhammer and cold boot attacks



**Confidentiality**:

- DATA written to the DRAM cannot be distinguished from random data.

**Integrity + freshness**:

- DATA read back from DRAM to the last-level cache (LLC) is the same DATA that was most recently written from LLC to DRAM.

What attacks can be mitigated?

*Rowhammer? Bus tapping?* 

# Confidentiality

#### AES 128-CTR mode



Counter (CTR) mode encryption
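As a reminder, counter mode turns a block cipher into a stream cipher. The generic form is shown below; the exact counter-block layout of a memory encryption engine is not given on the slide, so including the line address in the counter block is an assumption.

$$
\begin{aligned}
\text{keystream} &= \mathrm{AES}_K(\text{address} \,\|\, \text{CTR}) \\
\text{ciphertext} &= \text{plaintext} \oplus \text{keystream} \\
\text{plaintext} &= \text{ciphertext} \oplus \text{keystream}
\end{aligned}
$$

The same (key, counter block) pair must never be reused for different plaintexts, which is why the CTR is incremented on every write to a cache line.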



#### Hash(plaintext)

- Keyed hash
  - hash = SHA(message)
  - MAC = enc(hash, key)
- Freshness
  - hash = SHA(message || nonce)
  - MAC = enc(hash, key)
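Why the nonce matters can be shown with a replay: an attacker who controls DRAM puts back an old (data, tag) pair. A sketch under loud assumptions: `toy_mac` and `verify` are hypothetical names, and `DefaultHasher` is a non-cryptographic stand-in for a real keyed MAC.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in keyed tag over (message, nonce). NOT cryptographically secure;
/// a real design would use a proper MAC.
pub fn toy_mac(key: u64, message: &[u8], nonce: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    message.hash(&mut hasher);
    nonce.hash(&mut hasher);
    hasher.finish()
}

/// Verify data read back from untrusted memory against the nonce that the
/// trusted side currently holds for that location.
pub fn verify(key: u64, message: &[u8], tag: u64, trusted_nonce: u64) -> bool {
    toy_mac(key, message, trusted_nonce) == tag
}
```

With a keyed hash alone, the old pair would still verify, because it was a valid pair once. Binding the tag to a nonce that changes on every write makes the stale pair fail, provided the current nonce itself is kept somewhere the attacker cannot roll back.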



- ·
- .

47

□ For each cache line: {ciphertext + CTR + MAC}

- MAC 56 bits
- CTR 56 bits

- □ For each cache line: {ciphertext + CTR + MAC}
  - MAC 56 bits
  - CTR 56 bits
- □ Can we store all the three components off-chip?

- For each cache line: {ciphertext + CTR + MAC}
  - MAC: 56 bits
  - CTR: 56 bits
- Can we store all three components off-chip?
  - Not safely: an attacker who controls DRAM could replay a stale {ciphertext, CTR, MAC} triple
- Problem: storing the CTRs on-chip requires a large amount of on-chip storage
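A back-of-the-envelope estimate shows the scale of the problem. The 64-byte cache line and the 128 MiB protected region are assumptions chosen for illustration, not figures from the slides:

$$
\frac{128\ \text{MiB}}{64\ \text{B}} = 2^{21}\ \text{cache lines}, \qquad 2^{21} \times 56\ \text{bits} = 2^{21} \times 7\ \text{B} = 14\ \text{MiB of counters}
$$

Under these assumptions the counters alone would need about 14 MiB of on-chip storage, which motivates keeping them off-chip and protecting them with a hash tree.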



- Only the root node needs to be stored on chip
- How to verify block B1?
- Write to block B3?

Hash tree over eight memory blocks $B_0, \dots, B_7$; the secure processor (trusted) holds the root:

$$
\begin{aligned}
root &= Hash(f_0 \,\|\, f_1) \\
f_i &= Hash(g_{2i} \,\|\, g_{2i+1}), \quad i \in \{0, 1\} \\
g_i &= Hash(h_{2i} \,\|\, h_{2i+1}), \quad i \in \{0, \dots, 3\} \\
h_i &= Hash(B_i), \quad i \in \{0, \dots, 7\}
\end{aligned}
$$
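To verify $B_1$, the processor reads $B_1$ plus the sibling hashes $h_0$, $g_1$, and $f_1$ from untrusted memory, recomputes the path to the top, and compares the result with the on-chip root. A write to $B_3$ updates $h_3$, $g_1$, $f_0$, and the root. The sketch below illustrates this with hypothetical function names, a non-cryptographic stand-in hash (`DefaultHasher`), and the assumption that the number of blocks is a power of two.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in hashes: a real design would use a cryptographic hash or MAC.
fn leaf_hash(block: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    block.hash(&mut hasher);
    hasher.finish()
}

fn node_hash(left: u64, right: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    left.hash(&mut hasher);
    right.hash(&mut hasher);
    hasher.finish()
}

/// Build every level bottom-up: levels[0] holds the h_i, the last level
/// holds only the root. Assumes a power-of-two number of blocks.
pub fn build(blocks: &[&[u8]]) -> Vec<Vec<u64>> {
    let mut levels = vec![blocks.iter().map(|b| leaf_hash(b)).collect::<Vec<u64>>()];
    while levels.last().unwrap().len() > 1 {
        let next: Vec<u64> = levels
            .last()
            .unwrap()
            .chunks(2)
            .map(|pair| node_hash(pair[0], pair[1]))
            .collect();
        levels.push(next);
    }
    levels
}

/// Sibling hashes on the path from leaf `index` up to (excluding) the root.
pub fn proof(levels: &[Vec<u64>], mut index: usize) -> Vec<u64> {
    let mut siblings = Vec::new();
    for level in &levels[..levels.len() - 1] {
        siblings.push(level[index ^ 1]);
        index /= 2;
    }
    siblings
}

/// Recompute the root from a block and its sibling hashes, then compare
/// it with the trusted on-chip root.
pub fn verify(root: u64, block: &[u8], mut index: usize, siblings: &[u64]) -> bool {
    let mut acc = leaf_hash(block);
    for &sibling in siblings {
        acc = if index % 2 == 0 { node_hash(acc, sibling) } else { node_hash(sibling, acc) };
        index /= 2;
    }
    acc == root
}
```

Verification and update both touch one node per level, so the cost grows with the logarithm of the number of protected blocks while the trusted storage stays at a single hash.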

### **Summary**

- How typical Confidential Computing (Intel SGX) works
- Design tradeoffs between TCB size, flexibility, performance overhead, cost, etc.
  - Intel SGX, AMD SEV, ARM CCA
  - Keystone, Sanctum, Penglai, etc.





#### **Function and Use Cases Comparison**

| Intel SGX | AMD Memory Encryption Technology (SEV) |
|-----------|----------------------------------------|
| Initial design targeted microservices and small workloads (a small amount of secure memory; featured mainly in mobile and desktop processor families). | Initial design targeted cloud and Infrastructure as a Service (a large amount of secure memory; featured in server processor families). |
| Requires major software changes and code refactoring (not suitable for securing legacy applications). | Does not require software changes or code refactoring (suitable for securing legacy applications). |
| SGX works with ring 3 and is not suitable for workloads with many system calls. | SEV works with ring 0 and is suitable for a broader range of workloads, especially those with many system calls. |
| SGX is suitable for small but security-sensitive workloads (SGX has a small TCB). | SEV is suitable for securing legacy, large, and enterprise-level applications (SEV has a large TCB). |

## **Security and Vulnerability Comparison**

| Intel SGX | AMD SEV, SEV-ES, SEV-SNP |
|-----------|--------------------------|
| Provides memory integrity protection. | Provides memory integrity protection (with SEV-SNP). |
| Vulnerable to memory side channels. | Vulnerable to memory side channels. |
| Vulnerable to denial-of-service attacks (the OS handles system calls). | Vulnerable to denial-of-service attacks (the hypervisor handles VM requests). |
| Small TCB (the TCB is the CPU package). | Large TCB (the VM's OS is inside the TCB). |
| Vulnerable to synchronization attacks (TOCTTOU, use-after-free). | AMD Secure Processor firmware bugs discovered (MASTERKEY and FALLOUT). |

# **GPU Confidential Computing**

## I/O Device TEE

66% overhead when running the Deep Learning Recommendation Model (DLRM) on AMD SEV-SNP compared to a non-secure environment





#### All data and code in GPU TCB



#### **GPU TEE Examples**

| System | Venue | Key idea |
|--------|-------|----------|
| Graviton | 2018 OSDI | Builds a GPU TEE |
| Slalom | 2019 ICLR | Offloads linear layers to an untrusted GPU, using blinding for privacy and Freivalds' algorithm to verify results |
| HIX | 2019 ASPLOS | Extends enclave memory to the GPU |
| DeepAttest | 2019 ISCA | Designs a device-specific fingerprint that is encoded in the weights of the DNN deployed on the target platform |
| Telekine | 2020 NSDI | Addresses one of the side-channel attacks |
| HETEE | 2020 S&P | Uses a separate FPGA for access control; the PCIe fabric is within the TCB, which is not the case in common setups (e.g., SGX) |
| Goten | 2021 AAAI | Extends Slalom to support training |
| Gramine + SGX | Report | Interface implementation only, without protection |
| StrongBox | 2022 CCS | Supports ARM GPUs for general computation; existing Arm-based GPU defenses are intended for secure machine learning and lack generality |
| Honeycomb | 2023 OSDI | Provides a software-based GPU TEE by validating the offloaded GPU program, so there is no run-time overhead when executing it; the method depends heavily on the quality of the validation software (SFI); validation before execution works because offloaded GPU workloads rarely contain long divisions, nested branches, or indirect memory references |
| SAGE | 2023 ATC | Supports software-based attestation; protects code integrity and secrecy, computation integrity, and data integrity and secrecy |

#### **NVIDIA Roadmap to Confidential Computing**



RoT = Root of Trust

### **NVIDIA Confidential Computing Goals**



#### **Threats and Mitigations of H100's CC Modes**

| Category        | Threat                                                                                                 | Mitigated |
|-----------------|--------------------------------------------------------------------------------------------------------|-----------|
| Confidentiality | Use PCIE/NVLINK to read tenant data (e.g. Hypervisor, another VM, PCIE interposer)                     | ✓         |
| Confidentiality | Use Out-of-band management/debug channels to read tenant data (e.g. SMBus, JTAG)                       | ✓         |
| Confidentiality | Use memory remapping to read tenant data                                                               | ✓         |
| Confidentiality | Use GPU Cache/Memory based side channels to read tenant data                                           | ✓         |
| Confidentiality | Use GPU TLB based side channels to read tenant data                                                    | ✓         |
| Confidentiality | Use GPU Performance Counters to read tenant data or fingerprint tenant                                 | ✓         |
| Confidentiality | Read tenant data via hypothetical physical attacks (physical side channels / DPA / EM, HBM interposer) | ✗         |
| Integrity       | Use PCIE/NVLINK to modify tenant data (e.g. Hypervisor, another VM, PCIE interposer)                   | ✓         |
| Integrity       | Use Out-of-band management/debug channels to modify tenant data (e.g. SMBus, JTAG)                     | ✓         |
| Integrity       | Corrupt tenant data by replaying previous data or MMIO transactions (replay attacks)                   | ✓         |
| Integrity       | Corrupt tenant data via hypothetical physical attacks (fault injection, HBM interposer)                | ✗         |
| Availability    | Denial of Service to hypervisor by tenant                                                              | ✓         |
| Availability    | Denial of Service to tenant by another tenant                                                          | ✓         |
| Availability    | Permanent denial of service of GPU by tenant                                                           | ✓         |
| Availability    | Denial of Service to tenant by hypervisor                                                              | ✗         |
| General         | Use a spoofed, non-genuine, or known vulnerable TCB component                                          | ✓         |
| General         | Use hardware side channels (e.g. DPA) to extract persistent device keys                                | ✓         |
| General         | Use hardware side channels (e.g. DPA) to extract tenant ephemeral session key                          | ✗         |

#### **NVIDIA CC Introduction**

#### Prerequisites:

- CPU with support for a virtualization-based TEE ("Confidential VM")
- Supported CPUs are AMD Milan or later, or Intel Sapphire Rapids (SPR) or later

#### Capabilities:

- Trusted Execution Environment
- Virtualization-based
- Secure Transfers
- Hardware Root of Trust
  - Authenticated firmware; measurement & attestation for the GPU





Encrypted Transfers



- CVM = Confidential Virtual Machine
- NVIDIA Driver allocates bounce buffers in the Shared Memory area and encrypts data in those buffers with the session key

- Compute Protected Region (CPR) is protected by hardware firewalls
- GPU memory outside of the CPR:
  - Encrypted CUDA Command Buffers
  - Encrypted Bounce Buffers for NVLINK Peer to Peer

#### H100 CC with AMD SEV-SNP




Phases: Mode Enable, Device Boot, Tenant Initialization, Tenant Shutdown

- 1) BMC issues out-of-band request to persistently enable CC mode.
  - NVIDIA OOB Specification will provide APIs to integrate into customer tools and OpenBMC.
- 2) Host triggers GPU reset for mode to take effect
- 3) GPU firmware scrubs GPU state & memory
- 4) GPU firmware configures firewall to prevent unauthorized access, then enables PCIe
- 5) GPU PF driver uses SPDM for session establishment & attestation report
- 6) Tenant attestation service gathers measurements and device certificate using NVML APIs. Verification is done locally or transmitted to a remote service
- 7) CUDA programs allowed to use GPU
- 8) Host triggers PF-FLR to reset GPU; returns to 3) device boot for scrubbing GPU state & memory
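
The numbered steps above can be read as a small state machine. The sketch below is a toy Python model of that reading, not NVIDIA's firmware or driver logic: the `Phase` enum, the `GpuCcLifecycle` class, and all method names are hypothetical, and the attestation verdict is passed in as a plain boolean.

```python
from enum import Enum, auto


class Phase(Enum):
    MODE_ENABLE = auto()
    DEVICE_BOOT = auto()
    TENANT_INIT = auto()
    TENANT_RUNNING = auto()


class GpuCcLifecycle:
    """Toy model of the CC-mode lifecycle; every name here is hypothetical."""

    def __init__(self):
        self.phase = Phase.MODE_ENABLE
        self.cc_mode = False
        self.scrubbed = False
        self.attested = False

    def enable_cc_mode(self):
        # Step 1: the BMC persistently enables CC mode out of band.
        self.cc_mode = True

    def reset(self):
        # Step 2 (host reset) and step 8 (PF-FLR) both return to device boot.
        self.phase = Phase.DEVICE_BOOT
        self.scrubbed = False
        self.attested = False

    def boot(self):
        # Steps 3-4: scrub state and memory, set up the firewall, enable PCIe.
        if self.phase is not Phase.DEVICE_BOOT:
            raise RuntimeError("boot is only valid after a reset")
        self.scrubbed = True
        self.phase = Phase.TENANT_INIT

    def attest(self, measurements_ok):
        # Steps 5-6: SPDM session plus attestation; the verifier's verdict
        # is passed in as a boolean because verification is out of scope here.
        if self.phase is not Phase.TENANT_INIT:
            raise RuntimeError("attestation requires a freshly booted device")
        self.attested = measurements_ok
        if measurements_ok:
            self.phase = Phase.TENANT_RUNNING

    def can_run_cuda(self):
        # Step 7: CUDA work is allowed only after successful attestation.
        return self.phase is Phase.TENANT_RUNNING and self.attested


gpu = GpuCcLifecycle()
gpu.enable_cc_mode()
gpu.reset()
gpu.boot()
gpu.attest(measurements_ok=True)
ready = gpu.can_run_cuda()              # True: tenant may launch CUDA programs
gpu.reset()                             # step 8: tenant shutdown via PF-FLR
ready_after_reset = gpu.can_run_cuda()  # False until boot and attestation repeat
```

The point of the model is the loop: tenant shutdown does not return to mode enable but to device boot, so the next tenant always starts from scrubbed state and must attest again.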




#### Where Should Users be?








#### What are Confidential Containers?



- Confidential Containers (CoCo) is a new sandbox project of the Cloud Native Computing Foundation (CNCF) that enables cloud-native confidential computing by taking advantage of a variety of hardware platforms and technologies. The project brings together software and hardware companies including Alibaba Cloud, AMD, ARM, IBM, Intel, Microsoft, Red Hat, Rivos, and others.

#### High-Level View of the CoCo Key Broker Service (CoCo-KBS)



#### **H100 Tenant Attestation**




#### **GPU Attestation in CoCo-KBS**

- CoCo-KBS (Rust-based)
- nvTrust (Python-based)
- POC sample code:
  - Soon to release!

```rust
use std::fs;

use pyo3::prelude::*;
use pyo3::types::IntoPyDict;

// The slide shows only the body; the enclosing function is added so it compiles.
fn main() -> PyResult<()> {
    // Initialize the embedded Python interpreter
    pyo3::prepare_freethreaded_python();

    // Acquire the GIL (older PyO3 API; newer releases use Python::with_gil)
    let gil = Python::acquire_gil();
    let py = gil.python();

    // Create a global dictionary containing __file__
    let globals = [("__file__", "./LocalGPUTest.py")]
        .into_py_dict(py);

    // Read the content of the Python script
    let code = fs::read_to_string("./LocalGPUTest.py")
        .expect("Could not read file");

    // Execute the Python script
    py.run(&code, Some(globals), None)?;

    Ok(())
}
```
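
The POC passes `__file__` in the globals dictionary because the Python script it runs builds the policy path from `os.path.dirname(__file__)`, and source text executed directly has no `__file__` of its own. The sketch below shows the same effect in plain Python, with `exec` standing in as a rough equivalent of the embedded `py.run` call. The one-line `script` string is a hypothetical stand-in, not the real `LocalGPUTest.py`.

```python
import os

# Hypothetical one-line stand-in for a script that locates files next to itself.
script = "policy_dir = os.path.dirname(__file__)"

# Executing source text without __file__ in globals fails with NameError.
try:
    exec(script, {"os": os})
    error = None
except NameError as exc:
    error = str(exc)

# Supplying __file__, as the POC's globals dictionary does, lets it resolve.
script_globals = {"os": os, "__file__": "./LocalGPUTest.py"}
exec(script, script_globals)
policy_dir = script_globals["policy_dir"]  # "."
```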

#### Example of a Workload with a Low Compute-to-I/O Ratio

- BS is the batch size

Figure: ResNet-50 v1.5 Training Performance

#### Example of a Workload with a High Compute-to-I/O Ratio

- BS is the batch size, and SL is the sequence length

Figure: BERT LLM Inference Performance
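
One way to see why the compute-to-I/O ratio matters is a toy cost model. It assumes the CC-mode overhead falls on CPU-GPU data transfer while on-GPU compute time is unchanged, and that compute and I/O do not overlap. The function `cc_slowdown`, its parameters, and all numbers below are illustrative assumptions, not measurements from the slides.

```python
def cc_slowdown(compute_s, io_s, io_penalty):
    """Return CC-on time divided by CC-off time for one step.

    Assumes compute time is unchanged and only I/O time is scaled
    by io_penalty (a hypothetical multiplier).
    """
    baseline = compute_s + io_s
    protected = compute_s + io_s * io_penalty
    return protected / baseline


# Illustrative numbers only.
low_ratio = cc_slowdown(compute_s=1.0, io_s=1.0, io_penalty=2.0)   # 3.0 / 2.0
high_ratio = cc_slowdown(compute_s=9.0, io_s=1.0, io_penalty=2.0)  # 11.0 / 10.0
```

Under these assumptions the same I/O penalty costs the low-ratio workload far more in relative terms than the high-ratio one.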

- How typical Confidential Computing (Intel SGX) works
- Design tradeoffs between TCB size, flexibility, performance overhead, cost, etc.
  - Intel SGX, AMD SEV, ARM CCA
  - Keystone, Sanctum, Penglai, etc.
- What a GPU TEE is
- How NVIDIA H100 Confidential Computing works
- How we apply H100 CC to the open-source project CoCo
- Performance of H100 CC



# Backup

```python
from nv_attestation_sdk import attestation
import os
import json

# Create the attestation client and name this node
client = attestation.Attestation()
client.set_name("thisNode1")
print("[LocalGPUTest] node name :", client.get_name())
file = "NVGPULocalPolicyExample.json"

# Register a local GPU verifier, then load the policy stored next to this script
client.add_verifier(attestation.Devices.GPU, attestation.Environment.LOCAL, "", "")
with open(os.path.join(os.path.dirname(__file__), file)) as json_file:
    json_data = json.load(json_file)
    att_result_policy = json.dumps(json_data)

print(client.get_verifiers())

# Run attestation against the registered verifier
print("[LocalGPUTest] call attest() - expecting True")
print(client.attest())

print("[LocalGPUTest] token : " + str(client.get_token()))

# Check the resulting token against the policy
print("[LocalGPUTest] call validate_token() - expecting True")
print(client.validate_token(att_result_policy))
```
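
A caller such as a key broker would typically release a secret only when both checks in the script above succeed. The sketch below illustrates that gating without a GPU. `StubAttestationClient`, `release_secret`, the dict-shaped token, and the `required_claims` policy layout are all hypothetical stand-ins: only the method names `attest()`, `get_token()`, and `validate_token()` mirror the script, and the real SDK's token and policy formats differ.

```python
import json


class StubAttestationClient:
    """Hypothetical stand-in for the SDK client so the sketch runs without a GPU."""

    def __init__(self, attest_ok, token):
        self._attest_ok = attest_ok
        self._token = token

    def attest(self):
        return self._attest_ok

    def get_token(self):
        return self._token

    def validate_token(self, policy):
        # Toy check: every claim required by the policy must match the token.
        required = json.loads(policy)["required_claims"]
        return all(self._token.get(k) == v for k, v in required.items())


def release_secret(client, policy, secret):
    """Return the secret only if attestation and policy validation both pass."""
    if not client.attest():
        return None
    if not client.validate_token(policy):
        return None
    return secret


policy = json.dumps({"required_claims": {"secure_boot": True}})
good = StubAttestationClient(attest_ok=True, token={"secure_boot": True})
bad = StubAttestationClient(attest_ok=True, token={"secure_boot": False})

released = release_secret(good, policy, "model-key")  # "model-key"
withheld = release_secret(bad, policy, "model-key")   # None
```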

|                      | HW TEE               | Homomorphic Encryption        | TPM       |
|----------------------|----------------------|-------------------------------|-----------|
| Data integrity       | Y                    | Y (subject to code integrity) | Keys only |
| Data confidentiality | Y                    | Y                             | Keys only |
| Code integrity       | Y                    | No                            | Y         |
| Code confidentiality | Y (may require work) | No                            | Y         |
| Authenticated Launch | Varies               | No                            | No        |
| Programmability      | Y                    | Partial ("circuits")          | No        |
| Attestability        | Y                    | No                            | Y         |
| Recoverability       | Y                    | No                            | Y         |

|                                           | Native | HW TEE      | Homomorphic Encryption |
|-------------------------------------------|--------|-------------|------------------------|
| Data size limits                          | High   | Medium      | Low                    |
| Computation Speed                         | High   | High-Medium | Low                    |
| Scale out across machines                 | Yes    | More work   | Yes                    |
| Ability to combine data across sets (MPC) | Yes    | Yes         | Very limited           |