Getting Started with Docker's new CLI Model Runner

Introduction

Docker has revolutionized the way developers build, ship, and run applications. Now, with the introduction of the Docker Model CLI, working with AI models has never been easier. This new CLI provides a simple yet powerful way to manage, download, and run AI models within Docker environments.

In this article, we’ll explore how to get started with Docker Model CLI, from installation to running AI models efficiently. Whether you are an AI enthusiast, a machine learning practitioner, or just curious about Docker's AI capabilities, this guide will walk you through each step with practical examples and clear explanations.

Installing and Setting Up Docker Model CLI

To get started, ensure that you have Docker Desktop 4.40 or later installed on your machine. You can download it from the official Docker website.

Enable Docker Model Runner

Once Docker Desktop is installed, make sure that Docker Model Runner is enabled. This feature is turned on by default in version 4.40.

  1. Open Docker Desktop.
  2. Navigate to Settings > Features in Development.
  3. Check if Enable Model Runner is enabled.
  4. If you don’t see this option, enable Use nightly builds under Software Updates and restart Docker.

To apply changes, click Apply & Restart.

Enabling TCP Support (Optional)

Docker Model Runner accepts connections via the host Docker socket (/var/run/docker.sock). However, you can also enable host-side TCP support to allow connections on a specified port (default: 12434).

To enable this feature:

  1. Open Docker Desktop settings.
  2. Locate Enable host-side TCP support and set a custom port if needed.
  3. Click Apply & Restart.

This setting is useful when working with external applications that need to interact with AI models running in Docker.
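
Once TCP support is enabled, you can quickly verify it from the host. Here is a minimal check, assuming the default port 12434 and the llama.cpp engine path used later in this article:

# List the models known to the Model Runner over TCP (default port 12434)
curl http://localhost:12434/engines/llama.cpp/v1/models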

Using Docker Model CLI

Once everything is set up, open your terminal and verify that Docker Model CLI is available by running:

docker model --help

This should display the available commands.

Checking Model Runner Status

To verify if the Model Runner is running, execute:

docker model status

If it’s active, you should see a message confirming that Docker Model Runner is running.

Managing AI Models with Docker Model CLI

Listing Available Models

docker model ls

Initially, this will return an empty list. Let’s download a model to see how it works.

Downloading an AI Model

Docker hosts several AI models on Docker Hub under the ai/ namespace. To download a model, use the following command:

docker model pull ai/llama3.2:1B-Q8_0

Available models include:

  • ai/gemma3
  • ai/llama3.2
  • ai/mistral
  • ai/phi4
  • ai/qwen2.5

More models will be added in the future.

Listing Downloaded Models

docker model ls

Running a Model

To run a model and send a message:

docker model run ai/llama3.2:1B-Q8_0 "Hi"

To start an interactive chat session:

docker model run ai/llama3.2:1B-Q8_0

Type /bye to exit.

Removing a Model

docker model rm ai/llama3.2:1B-Q8_0

Connection Methods

Docker Model Runner offers three primary ways to interact with AI models:

  1. From within containers
    • Access via internal DNS: http://model-runner.docker.internal/
  2. From the host via Docker Socket
    • Example request:
      curl --unix-socket /var/run/docker.sock \
        localhost/exp/vDD4.40/engines/llama.cpp/v1/chat/completions \
        -H "Content-Type: application/json" \
        -d '{"model":"ai/llama3.2:1B-Q8_0"}'
  3. From the host via TCP
    • If TCP support is enabled, connect directly via port 12434.
    • Alternatively, use a reverse proxy (an example request through the proxy follows this list):
      docker run -d --name model-runner-proxy -p 8080:80 \
        alpine/socat tcp-listen:80,fork,reuseaddr tcp:model-runner.docker.internal:80
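
With the socat proxy above in place, the Model Runner becomes reachable at localhost:8080. Here is a hedged example request through the proxy; the path mirrors the internal DNS base URL used elsewhere in this article, so adjust it if your setup differs:

# Send a chat request through the reverse proxy started above
curl http://localhost:8080/engines/llama.cpp/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"ai/llama3.2:1B-Q8_0","messages":[{"role":"user","content":"Say hello in one sentence."}]}'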

OpenAI API Compatibility

The Model Runner implements OpenAI-compatible endpoints:

  • GET /engines/{backend}/v1/models
  • GET /engines/{backend}/v1/models/{namespace}/{name}
  • POST /engines/{backend}/v1/chat/completions
  • POST /engines/{backend}/v1/completions
  • POST /engines/{backend}/v1/embeddings

Models are automatically loaded when specified in requests.
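
For example, the completions endpoint can be exercised over the Docker socket in the same way as the chat endpoint shown earlier. This is a sketch that assumes the endpoint accepts the standard OpenAI prompt and max_tokens fields:

# Legacy-style completion request against the llama.cpp backend
curl --unix-socket /var/run/docker.sock \
  localhost/exp/vDD4.40/engines/llama.cpp/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"ai/llama3.2:1B-Q8_0","prompt":"Docker is","max_tokens":32}'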

Checking Model Runner Logs

While docker model logs is not yet available, you can monitor logs manually:

tail -f ~/Library/Containers/com.docker.docker/Data/log/host/inference-llama.cpp.log

This helps in debugging and monitoring model performance.
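
If you are only interested in problems, you can filter the same log file; for example:

# Show only warning and error lines from the inference engine log (macOS path shown above)
grep -iE "error|warn" ~/Library/Containers/com.docker.docker/Data/log/host/inference-llama.cpp.log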

Building a GenAI Application with Docker Model Runner

Step 1. Download the model

If you did not download the model in the previous section, do so now:

docker model pull ai/llama3.2:1B-Q8_0

Step 2. Clone the Sample Repository

git clone https://github.com/dockersamples/genai-app-demo
cd genai-app-demo
  • This project uses a Go backend that connects to a model runner API at http://model-runner.docker.internal/engines/llama.cpp/v1/
  • The backend uses the OpenAI Go client to make requests to this endpoint
  • The frontend is a React application that talks to the backend

Step 3. Set Environment Variables

Set the following variables in the backend.env file:

BASE_URL=http://model-runner.docker.internal/engines/llama.cpp/v1/
MODEL=ai/llama3.2:1B-Q8_0
API_KEY=${API_KEY:-ollama}

Step 4. Start the Application

docker compose up -d

This command brings up the complete deployment with all of the necessary components (you can verify that the services started with the commands shown after this list):

  • The frontend React app
  • The backend Go server
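
To confirm that both services came up correctly, you can check their status and follow their logs; the service names will match whatever is defined in the repository's compose file:

# List the services started by docker compose and their current state
docker compose ps

# Follow the logs from all services to confirm the backend reached the Model Runner
docker compose logs -f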

Step 5. Access the application

Access the frontend at http://localhost:3000.

Notice how quickly the responses come back. Because the model runs locally through Docker Model Runner, latency stays low.

Step 6. Using Activity Monitor

On your Mac, press Command + Spacebar to open Spotlight, launch Activity Monitor, and choose Window > GPU History to watch GPU usage in real time.

You’ll see GPU activity spike whenever you send a new prompt in the chat window.

Building a GenAI Application with Docker Model Runner (Using Host-side TCP Support)

Let’s enable host-side TCP support and see how it works.

  • Enable Model Runner and specify port 12434 for host-side TCP support.

Step 1. Clone the repository

If you didn't clone the repo in the previous demo, run the following command to do so:

git clone https://github.com/dockersamples/genai-app-demo
cd genai-app-demo

Step 2. Setting the required environment variables

To demonstrate the TCP host support, let’s change the BASE_URL in the backend.env file to http://host.docker.internal:12434/engines/llama.cpp/v1/ as shown below:

BASE_URL=http://host.docker.internal:12434/engines/llama.cpp/v1/
MODEL=ai/llama3.2:1B-Q8_0
API_KEY=${API_KEY:-ollama}

Note that here we’re using host.docker.internal, a special hostname that allows containers to access applications running natively on the host.
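
You can verify this hostname resolution from a throwaway container before starting the stack. Here is a minimal sketch using the public curlimages/curl image, assuming TCP support is enabled on port 12434:

# From inside a temporary container, reach the Model Runner on the host via TCP
docker run --rm curlimages/curl \
  http://host.docker.internal:12434/engines/llama.cpp/v1/models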

Step 3. Start the application using Docker Compose

docker compose up -d

Step 4. Access the application

It’s time to access the frontend at http://localhost:3000.

The Docker Model Runner configuration has both options - internal socket connections and TCP support - giving developers flexibility in how they connect to and utilize the inference services.

When working with Docker containers and inference engines, developers need TCP support in addition to internal DNS for two important reasons:

  1. External communication: TCP allows containers to communicate with services outside the Docker network, which is essential for accepting inference requests from external applications or services.
  2. Direct network access: While internal DNS works for container-to-container communication within Docker's network, TCP support provides a more direct way to access specific containers from the host machine or external networks.

Additional Demos

Do check out these additional demos built around the Model Runner.

  • Product Catalog AI Demo: Product Catalog AI is a Node.js-based application that leverages AI for generating and recommending products in a catalog system. The application seamlessly integrates with LangChain for AI processing, using Docker Model Runner as the execution framework.
  • Goose House Demo: A lightweight containerized deployment of the Goose AI command-line assistant that seamlessly integrates with Docker Model Runner to provide powerful, tools-aware conversational AI capabilities through a web-based terminal interface (uses the AI tools catalog by Docker for MCP tools).
  • Visual Chatbot: Visual Chatbot is an educational web application that provides transparent visibility into LLM-based interactions. This app visualizes the entire conversation flow with an LLM, including system messages, tool execution requests, and responses.

Conclusion

Docker Model CLI simplifies AI model management within Docker environments. Whether you're experimenting with AI models, building inference applications, or deploying large-scale AI systems, Docker’s new tooling makes it seamless and efficient.

By following this article, you should now be comfortable installing, managing, and running AI models using Docker Model CLI.
