Skip to content

Latest commit

 

History

History
 
 

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

README.md

Features

Compatibility Matrix

The tables below show mutually exclusive features and the support on some hardware.

The symbols used have the following meanings:

  • ✅ = Full compatibility
  • 🟠 = Partial compatibility
  • ❌ = No compatibility
  • ❔ = Unknown or TBD

!!! note Check the ❌ or 🟠 with links to see tracking issue for unsupported feature/hardware combination.

Feature x Feature

<style> td:not(:first-child) { text-align: center !important; } td { padding: 0.5rem !important; white-space: nowrap; } th { padding: 0.5rem !important; min-width: 0 !important; } th:not(:first-child) { writing-mode: vertical-lr; transform: rotate(180deg) } </style>
Feature CP APC LoRA SD CUDA graph pooling enc-dec logP prmpt logP async output multi-step mm best-of beam-search prompt-embeds
CP
APC
LoRA
SD
CUDA graph
pooling 🟠* 🟠*
enc-dec
logP
prmpt logP
async output
multi-step
mm 🟠^
best-of
beam-search
prompt-embeds

* Chunked prefill and prefix caching are only applicable to last-token or all pooling with causal attention.
^ LoRA is only applicable to the language backbone of multimodal models.

Feature x Hardware

Feature Volta Turing Ampere Ada Hopper CPU AMD Intel GPU
CP
APC
LoRA
SD
CUDA graph
pooling
enc-dec
mm
prompt-embeds
logP
prmpt logP
async output
multi-step
best-of
beam-search

!!! note For information on feature support on Google TPU, please refer to the TPU-Inference Recommended Models and Features documentation.