OCI Registry Support

vLLM supports loading models directly from OCI (Open Container Initiative) registries, enabling efficient distribution and deployment of large language models through container registries.

Overview

OCI Registry support allows you to:

  • Store and distribute models as OCI artifacts in container registries
  • Pull models directly from Docker Hub, GitHub Container Registry (ghcr.io), or any OCI-compliant registry
  • Leverage existing registry infrastructure for model distribution
  • Benefit from registry features like versioning, access control, and content-addressable storage

OCI Model Format

Models are stored as OCI artifacts with the following structure:

Layers

  1. Safetensors Layers (application/vnd.docker.ai.safetensors)

     • Contains model weights in safetensors format
     • Multiple layers are supported for sharded models
     • Layer order in the manifest defines shard order
     • Each layer is a single safetensors file

  2. Config Tar Layer (application/vnd.docker.ai.vllm.config.tar)

     • Contains model configuration files (tokenizer config, vocab files, etc.)
     • Packaged as a tar archive
     • Automatically extracted after downloading
     • Optional but recommended for complete model functionality

  3. Additional Layers (optional)

     • License files (application/vnd.docker.ai.license)
     • Model cards, documentation, etc.

Example Manifest

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "config": {
    "mediaType": "application/vnd.docker.ai.model.config.v0.1+json",
    "digest": "sha256:...",
    "size": 465
  },
  "layers": [
    {
      "mediaType": "application/vnd.docker.ai.safetensors",
      "digest": "sha256:...",
      "size": 3968658944
    },
    {
      "mediaType": "application/vnd.docker.ai.safetensors",
      "digest": "sha256:...",
      "size": 2203268048
    },
    {
      "mediaType": "application/vnd.docker.ai.vllm.config.tar",
      "digest": "sha256:...",
      "size": 11530752
    }
  ]
}
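
The loader reads the manifest to determine shard order and to locate the optional config layer. Below is a minimal sketch of that logic, assuming the media types listed above; it is illustrative only, not vLLM's internal implementation.

import json

# Media types used by the OCI model format described above.
SAFETENSORS_TYPE = "application/vnd.docker.ai.safetensors"
CONFIG_TAR_TYPE = "application/vnd.docker.ai.vllm.config.tar"

with open("manifest.json") as f:
    manifest = json.load(f)

# Shard order is defined by the order of safetensors layers in the manifest.
shard_digests = [
    layer["digest"]
    for layer in manifest["layers"]
    if layer["mediaType"] == SAFETENSORS_TYPE
]

# The config tar layer is optional; take the first one if present.
config_digest = next(
    (layer["digest"] for layer in manifest["layers"]
     if layer["mediaType"] == CONFIG_TAR_TYPE),
    None,
)

print(f"{len(shard_digests)} weight shard(s); config layer: {config_digest}")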

Usage

Basic Usage

from vllm import LLM

# Load model from Docker Hub (default registry)
llm = LLM(
    model="username/modelname:tag",
    load_format="oci"
)

Model Reference Format

The model reference follows the standard OCI reference format:

[registry/]repository[:tag|@digest]

Examples:

  • username/model:tag → docker.io/username/model:tag (default registry)
  • docker.io/username/model:v1.0 → explicit Docker Hub reference
  • ghcr.io/org/model:latest → GitHub Container Registry
  • username/model@sha256:abc123... → reference by digest (immutable)
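
The default registry and tag are filled in when they are omitted. The sketch below shows one way to split a reference into its parts, assuming docker.io and latest as the defaults; the parse_reference helper is hypothetical and not part of vLLM's API.

def parse_reference(ref, default_registry="docker.io", default_tag="latest"):
    """Split an OCI reference into (registry, repository, tag_or_digest)."""
    # A first path component containing '.' or ':' (or 'localhost') names a registry.
    first, _, rest = ref.partition("/")
    if rest and ("." in first or ":" in first or first == "localhost"):
        registry, remainder = first, rest
    else:
        registry, remainder = default_registry, ref

    # Digest references use '@sha256:...'; tag references use ':'.
    if "@" in remainder:
        repository, _, digest = remainder.partition("@")
        return registry, repository, digest
    repository, _, tag = remainder.rpartition(":")
    if not repository:  # no tag given, fall back to the default
        repository, tag = remainder, default_tag
    return registry, repository, tag

print(parse_reference("username/model:v1.0"))
# ('docker.io', 'username/model', 'v1.0')
print(parse_reference("ghcr.io/org/model@sha256:abc123"))
# ('ghcr.io', 'org/model', 'sha256:abc123')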

Advanced Usage

Specify Custom Download Directory

from vllm import LLM

llm = LLM(
    model="username/model:tag",
    load_format="oci",
    download_dir="/path/to/cache"
)

Using Different Registries

# GitHub Container Registry
llm = LLM(
    model="ghcr.io/organization/model:v1.0",
    load_format="oci"
)

# Google Container Registry
llm = LLM(
    model="gcr.io/project/model:latest",
    load_format="oci"
)

# Azure Container Registry
llm = LLM(
    model="myregistry.azurecr.io/model:tag",
    load_format="oci"
)

Reference by Digest (Immutable)

For reproducible deployments, use digest references:

llm = LLM(
    model="username/model@sha256:1234567890abcdef...",
    load_format="oci"
)

Installation

OCI Registry support is built into vLLM and uses the standard requests library for HTTP operations. No additional dependencies are required beyond the standard vLLM installation.

Caching

Cache Location

Downloaded OCI layers are cached in:

~/.cache/vllm/oci/{normalized-reference}/
├── manifest.json
├── layers/
│   ├── 0000_<digest>.safetensors
│   ├── 0001_<digest>.safetensors
│   └── ...
└── config/
    ├── config.json
    ├── tokenizer_config.json
    └── ...
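
The exact normalization of the reference into a directory name is not specified here; the sketch below assumes one plausible scheme (replacing separator characters with underscores) purely for illustration.

from pathlib import Path

def cache_dir_for(reference: str) -> Path:
    # Assumed normalization: replace path/tag/digest separators with '_'.
    normalized = reference.replace("/", "_").replace(":", "_").replace("@", "_")
    return Path.home() / ".cache" / "vllm" / "oci" / normalized

print(cache_dir_for("docker.io/username/model:v1.0"))
# e.g. /home/<user>/.cache/vllm/oci/docker.io_username_model_v1.0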

Cache Behavior

  • Layers are downloaded only once and reused across multiple loads
  • Layer downloads check for existing files before downloading (see the sketch after this list)
  • Each model reference has a separate cache directory
  • Cache is shared across all vLLM instances on the same machine
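
The check-before-download behavior can be pictured as follows. This sketch uses the standard OCI distribution blob endpoint and the requests library mentioned under Installation; the fetch_layer helper and its arguments are illustrative, not vLLM's actual implementation.

from pathlib import Path

import requests

def fetch_layer(registry, repository, digest, cache_dir, filename, token=None):
    """Download a layer blob unless it is already present in the cache."""
    target = Path(cache_dir) / "layers" / filename
    if target.exists():
        # Already downloaded by a previous load; reuse it.
        return target

    target.parent.mkdir(parents=True, exist_ok=True)
    # Standard OCI distribution API endpoint for fetching a blob by digest.
    url = f"https://{registry}/v2/{repository}/blobs/{digest}"
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    with requests.get(url, headers=headers, stream=True) as resp:
        resp.raise_for_status()
        with open(target, "wb") as f:
            for chunk in resp.iter_content(chunk_size=1 << 20):
                f.write(chunk)
    return target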