# OCI Registry Support
vLLM supports loading models directly from OCI (Open Container Initiative) registries, enabling efficient distribution and deployment of large language models through container registries.
## Overview
OCI Registry support allows you to:
- Store and distribute models as OCI artifacts in container registries
- Pull models directly from Docker Hub, GitHub Container Registry (ghcr.io), or any OCI-compliant registry
- Leverage existing registry infrastructure for model distribution
- Benefit from registry features like versioning, access control, and content-addressable storage
## OCI Model Format
Models are stored as OCI artifacts with the following structure:
### Layers

- **Safetensors Layers** (`application/vnd.docker.ai.safetensors`) - Contain model weights in safetensors format
    - Multiple layers are supported for sharded models
    - Layer order in the manifest defines shard order
    - Each layer is a single safetensors file
- **Config Tar Layer** (`application/vnd.docker.ai.vllm.config.tar`) - Contains model configuration files (tokenizer config, vocab files, etc.)
    - Packaged as a tar archive
    - Automatically extracted after downloading
    - Optional but recommended for complete model functionality
- **Additional Layers** (optional)
    - License files (`application/vnd.docker.ai.license`) - Model cards, documentation, etc.
### Example Manifest

```json
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "config": {
    "mediaType": "application/vnd.docker.ai.model.config.v0.1+json",
    "digest": "sha256:...",
    "size": 465
  },
  "layers": [
    {
      "mediaType": "application/vnd.docker.ai.safetensors",
      "digest": "sha256:...",
      "size": 3968658944
    },
    {
      "mediaType": "application/vnd.docker.ai.safetensors",
      "digest": "sha256:...",
      "size": 2203268048
    },
    {
      "mediaType": "application/vnd.docker.ai.vllm.config.tar",
      "digest": "sha256:...",
      "size": 11530752
    }
  ]
}
```
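To make the layer semantics concrete, the sketch below separates weight shards from config layers in a manifest like the one above, preserving manifest order for the shards. The media-type constants come from the layer description; the script itself is illustrative and not part of vLLM's API.

```python
import json

# Media types from the layer description above.
SAFETENSORS_MEDIA_TYPE = "application/vnd.docker.ai.safetensors"
CONFIG_TAR_MEDIA_TYPE = "application/vnd.docker.ai.vllm.config.tar"

with open("manifest.json") as f:  # a manifest like the example above
    manifest = json.load(f)

# Layer order in the manifest defines shard order, so no sorting is done.
shards = [l for l in manifest["layers"] if l["mediaType"] == SAFETENSORS_MEDIA_TYPE]
configs = [l for l in manifest["layers"] if l["mediaType"] == CONFIG_TAR_MEDIA_TYPE]

print(f"{len(shards)} weight shard(s), {len(configs)} config layer(s)")
```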
## Usage

### Basic Usage

```python
from vllm import LLM

# Load model from Docker Hub (default registry)
llm = LLM(
    model="username/modelname:tag",
    load_format="oci"
)
```
### Model Reference Format

The model reference follows the standard OCI reference format.

Examples:

- `username/model:tag` → `docker.io/username/model:tag` (default registry)
- `docker.io/username/model:v1.0` → explicit Docker Hub reference
- `ghcr.io/org/model:latest` → GitHub Container Registry
- `username/model@sha256:abc123...` → reference by digest (immutable)
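To illustrate the defaulting rules above, here is a rough sketch of how such a reference can be split into registry, repository, and tag or digest. The `parse_reference` helper is hypothetical and only mirrors the behaviour shown in the examples; it is not vLLM's internal API.

```python
def parse_reference(ref: str) -> tuple[str, str, str]:
    """Split an OCI reference into (registry, repository, tag_or_digest).

    Illustrative only; follows the defaulting rules described above.
    """
    # Digest references use '@', tag references use ':'.
    if "@" in ref:
        name, version = ref.split("@", 1)
    elif ":" in ref.rsplit("/", 1)[-1]:
        name, version = ref.rsplit(":", 1)
    else:
        name, version = ref, "latest"

    parts = name.split("/")
    # A leading component containing '.' or ':' (e.g. ghcr.io) is a registry;
    # otherwise the default registry docker.io is assumed.
    if len(parts) > 1 and ("." in parts[0] or ":" in parts[0]):
        registry, repository = parts[0], "/".join(parts[1:])
    else:
        registry, repository = "docker.io", name
    return registry, repository, version


print(parse_reference("username/model:tag"))       # ('docker.io', 'username/model', 'tag')
print(parse_reference("ghcr.io/org/model:latest"))  # ('ghcr.io', 'org/model', 'latest')
```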
### Advanced Usage

#### Specify Custom Download Directory

```python
from vllm import LLM

llm = LLM(
    model="username/model:tag",
    load_format="oci",
    download_dir="/path/to/cache"
)
```
#### Using Different Registries

```python
# GitHub Container Registry
llm = LLM(
    model="ghcr.io/organization/model:v1.0",
    load_format="oci"
)

# Google Container Registry
llm = LLM(
    model="gcr.io/project/model:latest",
    load_format="oci"
)

# Azure Container Registry
llm = LLM(
    model="myregistry.azurecr.io/model:tag",
    load_format="oci"
)
```
#### Reference by Digest (Immutable)

For reproducible deployments, use digest references:
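The call is the same as the tag-based examples above, with the digest pinned in the model reference (the digest shown is a placeholder):

```python
from vllm import LLM

# Pinning to a digest makes the reference immutable and the deployment
# reproducible (placeholder digest shown).
llm = LLM(
    model="username/model@sha256:abc123...",
    load_format="oci"
)
```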
## Installation

OCI Registry support is built into vLLM and uses the standard `requests` library for HTTP operations. No additional dependencies are required beyond the standard vLLM installation.
## Caching

### Cache Location

Downloaded OCI layers are cached in:

```
~/.cache/vllm/oci/{normalized-reference}/
├── manifest.json
├── layers/
│   ├── 0000_<digest>.safetensors
│   ├── 0001_<digest>.safetensors
│   └── ...
└── config/
    ├── config.json
    ├── tokenizer_config.json
    └── ...
```
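As a small illustration, the snippet below walks the default cache root shown above and reports the on-disk size of each cached reference. The layout assumption comes from the tree above; the script itself is not part of vLLM.

```python
from pathlib import Path

# Default cache root for OCI layers, matching the layout shown above.
cache_root = Path.home() / ".cache" / "vllm" / "oci"

if cache_root.exists():
    for model_dir in sorted(cache_root.iterdir()):
        # Sum the size of every cached file (manifest, layers, extracted config).
        size_bytes = sum(f.stat().st_size for f in model_dir.rglob("*") if f.is_file())
        print(f"{model_dir.name}: {size_bytes / 1e9:.2f} GB")
else:
    print("No OCI models cached yet.")
```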
### Cache Behavior

- Layers are downloaded only once and reused across multiple loads
- Layer downloads check for existing files before downloading (sketched below)
- Each model reference has a separate cache directory
- Cache is shared across all vLLM instances on the same machine
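A minimal sketch of the check-before-download idea, assuming a known blob URL and expected layer size; the `fetch_layer` helper and its naming are illustrative, not vLLM internals:

```python
from pathlib import Path

import requests


def fetch_layer(blob_url: str, dest: Path, expected_size: int) -> Path:
    """Download a layer blob only if it is not already cached (illustrative)."""
    if dest.exists() and dest.stat().st_size == expected_size:
        return dest  # reuse the previously downloaded layer

    dest.parent.mkdir(parents=True, exist_ok=True)
    with requests.get(blob_url, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        with open(dest, "wb") as f:
            for chunk in resp.iter_content(chunk_size=1 << 20):
                f.write(chunk)
    return dest
```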