Introduction
Docker has revolutionized the way developers build, ship, and run applications. Now, with the introduction of the Docker Model CLI, working with AI models has never been easier. This new CLI provides a simple yet powerful way to manage, download, and run AI models within Docker environments.
![Getting Started with Docker's new CLI Model Runner]()
In this article, we’ll explore how to get started with Docker Model CLI, from installation to running AI models efficiently. Whether you are an AI enthusiast, a machine learning practitioner, or just curious about Docker's AI capabilities, this guide will walk you through each step with practical examples and clear explanations.
Installing and Setting Up Docker Model CLI
To get started, ensure that you have Docker Desktop 4.40 or later installed on your machine. You can download it from the official Docker website.
Enable Docker Model Runner
Once Docker Desktop is installed, make sure that Docker Model Runner is enabled. By default, this feature is turned on in version 4.40.
- Open Docker Desktop.
- Navigate to Settings > Features in Development.
- Make sure Enable Model Runner is checked.
![Enable Docker Model Runner]()
- If you don’t see this option, enable Use nightly builds under Software Updates and restart Docker.
![enable Use nightly builds]()
To apply changes, click Apply & Restart.
Enabling TCP Support (Optional)
Docker Model Runner accepts connections via the host Docker socket (/var/run/docker.sock). However, you can also enable host-side TCP support to allow connections on a specified port (default: 12434).
To enable this feature:
- Open Docker Desktop settings.
- Locate Enable host-side TCP support and set a custom port if needed.
- Click Apply & Restart.
This setting is useful when working with external applications that need to interact with AI models running in Docker.
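Once TCP support is enabled, a quick sanity check from the host is possible with curl. This is a minimal sketch, assuming the default port 12434 and the llama.cpp backend (the endpoint paths are described in the OpenAI API Compatibility section later in this article):
curl http://localhost:12434/engines/llama.cpp/v1/models
A JSON response listing the locally available models confirms that the TCP endpoint is reachable.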
Using Docker Model CLI
Once everything is set up, open your terminal and verify that Docker Model CLI is available by running:
docker model --help
This should display the available commands:
![docker model --help]()
Checking Model Runner Status
To verify if the Model Runner is running, execute:
docker model status
If it’s active, you should see:
![docker model status]()
Managing AI Models with Docker Model CLI
Listing Available Models
docker model ls
Initially, this will return an empty list. Let’s download a model to see how it works.
Downloading an AI Model
Docker hosts several AI models on Docker Hub under the ai/ namespace. To download a model, use the following command:
docker model pull ai/llama3.2:1B-Q8_0
Available models include:
- ai/gemma3
- ai/llama3.2
- ai/mistral
- ai/phi4
- ai/qwen2.5
More models will be added in the future.
Listing Downloaded Models
docker model ls
![Listing Downloaded Models]()
Running a Model
To run a model and send a message:
docker model run ai/llama3.2:1B-Q8_0 "Hi"
Example output
![run a model and send a message]()
To start an interactive chat session:
docker model run ai/llama3.2:1B-Q8_0
Type /bye to exit.
Removing a Model
docker model rm ai/llama3.2:1B-Q8_0
Connection Methods
Docker Model Runner offers three primary ways to interact with AI models (a quick example follows the list):
- From within containers, via the internal DNS name http://model-runner.docker.internal/
- From the host via the Docker socket (/var/run/docker.sock)
- From the host via TCP (when host-side TCP support is enabled, default port 12434)
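As a quick illustration of the first method, you can call the internal DNS name from a throwaway container. This is a minimal sketch, assuming an image that ships with curl (curlimages/curl here) and the llama.cpp backend; the endpoint path is covered in the next section:
docker run --rm curlimages/curl -s http://model-runner.docker.internal/engines/llama.cpp/v1/models
If the Model Runner is reachable, this returns a JSON list of the models available locally.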
OpenAI API Compatibility
The Model Runner implements OpenAI-compatible endpoints:
GET /engines/{backend}/v1/models
GET /engines/{backend}/v1/models/{namespace}/{name}
POST /engines/{backend}/v1/chat/completions
POST /engines/{backend}/v1/completions
POST /engines/{backend}/v1/embeddings
Models are automatically loaded when specified in requests.
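For instance, you can send an OpenAI-style chat completion request with curl. This is a minimal sketch, assuming host-side TCP support on port 12434, the llama.cpp backend, and the model pulled earlier:
curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/llama3.2:1B-Q8_0",
    "messages": [
      {"role": "user", "content": "Say hello in one sentence."}
    ]
  }'
Because the response follows the standard OpenAI chat completions schema, existing OpenAI client libraries can be pointed at this endpoint by changing only the base URL.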
Checking Model Runner Logs
While a docker model logs command is not yet available, you can monitor the logs manually. On macOS, Docker Desktop writes them to a file you can tail:
tail -f ~/Library/Containers/com.docker.docker/Data/log/host/inference-llama.cpp.log
This helps in debugging and monitoring model performance.
![Model Runner Logs]()
Building a GenAI Application with Docker Model Runner
Step 1. Download the model
If you did not download the model earlier in this article, do so now:
docker model pull ai/llama3.2:1B-Q8_0
Step 2. Clone the Sample Repository
git clone https://github.com/dockersamples/genai-app-demo
cd genai-app-demo
- This project uses a Go backend that connects to a model runner API at http://model-runner.docker.internal/engines/llama.cpp/v1/
- The backend uses the OpenAI Go client to make requests to this endpoint
- The frontend is a React application that talks to the backend
Step 3. Set Environment Variables
Configure the backend via the backend.env file in the repository with the following values:
BASE_URL=http://model-runner.docker.internal/engines/llama.cpp/v1/
MODEL=ai/llama3.2:1B-Q8_0
API_KEY=${API_KEY:-ollama}
Step 4. Start the Application
docker compose up -d
This command combines the project's Compose files into a complete deployment with all the necessary components (a quick way to verify them is shown after the list):
- The frontend React app
- The backend Go server
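To confirm that the stack came up cleanly, you can list the running services and follow their logs. This is a minimal sketch; the exact service names (backend here) depend on the project's Compose file:
docker compose ps
docker compose logs -f backend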
Step 5. Access the application
Access the frontend at http://localhost:3000.
Did you notice how quickly the responses come back? It's quite fast, isn't it?
Step 6. Using Activity Monitor
On your Mac, press Command + Spacebar to open Spotlight, search for Activity Monitor, and choose Window > GPU History to watch GPU usage in real time.
You'll see GPU activity spike whenever you submit a new prompt in the chat window.
Building a GenAI Application with Docker Model Runner (Using Host-Side TCP Support)
Let's enable host-side TCP support and see how it works.
- In Docker Desktop settings, enable Model Runner and specify port 12434 for host-side TCP support.
Step 1. Clone the repository
If you didn't clone the repo in the previous demo, run the following command to do so:
git clone https://github.com/dockersamples/genai-app-demo
cd genai-app-demo
Step 2. Set the Required Environment Variables
To demonstrate TCP host support, change the BASE_URL in the backend.env file to http://host.docker.internal:12434/engines/llama.cpp/v1/ as shown below:
BASE_URL=http://host.docker.internal:12434/engines/llama.cpp/v1/
MODEL=ai/llama3.2:1B-Q8_0
API_KEY=${API_KEY:-ollama}
Note that here we're using host.docker.internal, a special hostname that allows containers to reach applications running natively on the host.
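Before starting the stack, you can verify that containers are able to reach the Model Runner through this hostname. This is a minimal sketch, assuming TCP support is enabled on port 12434 and using a throwaway curl image (curlimages/curl):
docker run --rm curlimages/curl -s http://host.docker.internal:12434/engines/llama.cpp/v1/models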
Step 3. Start the application using Docker Compose
docker compose up -d
Step 4. Access the application
It's time to access the frontend at http://localhost:3000.
The Docker Model Runner configuration supports both options, internal socket connections and TCP support, giving developers flexibility in how they connect to and use the inference services.
When working with Docker containers and inference engines, developers need TCP support in addition to internal DNS for several important reasons:
- External communication: TCP allows containers to communicate with services outside the Docker network, which is essential for accepting inference requests from external applications or services.
- Direct network access: While internal DNS works for container-to-container communication within Docker's network, TCP support provides a more direct way to reach the Model Runner from the host machine or external networks.
Additional Demos
Do check out these additional demos built around the Model Runner.
| Demo | Description |
| --- | --- |
| Product Catalog AI Demo | Product Catalog AI is a Node.js-based application that leverages AI for generating and recommending products in a catalog system. The application seamlessly integrates with LangChain for AI processing, using Docker Model Runner as the execution framework. |
| Goose House Demo | A lightweight containerized deployment of the Goose AI command-line assistant that seamlessly integrates with Docker Model Runner to provide powerful, tools-aware conversational AI capabilities through a web-based terminal interface (uses the AI tools catalog by Docker for MCP tools). |
| Visual Chatbot | Visual Chatbot is an educational web application that provides transparent visibility into LLM-based interactions. This app visualizes the entire conversation flow with an LLM, including system messages, tool execution requests, and responses. |
Conclusion
Docker Model CLI simplifies AI model management within Docker environments. Whether you're experimenting with AI models, building inference applications, or deploying large-scale AI systems, Docker’s new tooling makes it seamless and efficient.
By following this article, you should now be comfortable installing, managing, and running AI models using Docker Model CLI.