Running AI Models Locally with Docker and Spring AI

Are you tired of sending your data to cloud APIs just to use AI in your applications? What if you could run powerful AI models right on your machine with zero API keys, zero data sharing, and zero monthly fees?

Docker Desktop recently introduced an exciting new feature called Model Runner that allows developers to run open-source AI models locally. When combined with Spring AI, this creates a powerful platform for building AI-powered applications that respect privacy, control costs, and simplify development workflows.

In this post, I'll show you how to use Docker's Model Runner feature with Spring Boot applications to create fully local AI experiences in just 15 minutes.

Why Run AI Models Locally?

Before diving into the technical details, let's consider why running AI models locally matters:

  1. Privacy - Your data never leaves your machine
  2. Cost control - No usage-based billing or subscription fees
  3. Reliability - No dependency on external API availability
  4. Development simplicity - Test and iterate without API keys or quotas
  5. Learning opportunity - Understand how AI models actually work

For Spring developers building modern applications, this local approach provides a compelling alternative to cloud-based AI services while maintaining all the benefits of Spring's programming model.

Understanding Docker Model Runner

Docker Model Runner is a plugin for Docker Desktop that allows you to:

  • Pull open-source models from Docker Hub
  • Run models directly from the command line
  • Manage local model installations
  • Interact with models via prompts or chat mode
  • Access models via an OpenAI-compatible API endpoint

Currently, Docker Model Runner only works on Docker Desktop for Mac with Apple Silicon (M1/M2/M3/M4 chips), but support for other platforms is coming soon.

Setting Up Docker for Local AI

To get started, you'll need Docker Desktop 4.40 or later. Once installed, follow these steps:

  1. Open Docker Desktop
  2. Navigate to Settings → Features in development (Beta tab)
  3. Enable "Docker Model Runner"
  4. Also enable "Enable host-side TCP support" (leave the default port of 12434)
  5. Apply and restart Docker Desktop
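Before pulling anything, you can confirm the runner is active. Recent Docker Desktop versions include a status subcommand for this:

docker model status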

Once configured, you can pull your first model using the Docker CLI:

docker model pull ai/gemma3

This will download Google's Gemma 3 model, which strikes a good balance between capability and resource usage. You can check which models you have installed with:

docker model list

To test your setup, try running the model in interactive mode:

docker model run ai/gemma3

This launches an interactive chat session where you can talk to the model right in your terminal:

> What is an interesting fact about Docker?
Docker was originally developed as an internal project at a company called dotCloud, 
which was a Platform-as-a-Service company. The technology was later open-sourced in 2013 
and became immensely popular, eventually leading to dotCloud pivoting their entire 
business to focus on Docker. This pivot transformed the company into Docker, Inc.
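Because Model Runner also exposes an OpenAI-compatible REST API, you can verify the HTTP endpoint before writing any Java. Here's a quick curl check, assuming the host-side TCP support you enabled above on the default port of 12434 (the engines/llama.cpp path is the same one we'll hand to Spring AI later):

curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "ai/gemma3", "messages": [{"role": "user", "content": "Say hello!"}]}'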

Creating a Spring Boot Application with Spring AI

Now let's create a Spring Boot application that connects to this locally running model. The first step is to initialize a new project with the right dependencies:

  1. Visit https://start.spring.io
  2. Choose Maven, Java 17+, and the latest Spring Boot version
  3. Add the following dependencies:
    • Spring Web
    • Spring AI OpenAI (we'll use the OpenAI client since Docker exposes an OpenAI-compatible API)

Your pom.xml should include these key dependencies:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
</dependency>

<!-- In the dependency management section -->
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>${spring-ai.version}</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
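Note that ${spring-ai.version} must be defined in your POM's properties. Spring Initializr generates this for you; if you're adding it by hand, use whichever Spring AI version is current (the milestone below is just an example):

<properties>
    <java.version>17</java.version>
    <spring-ai.version>1.0.0-M6</spring-ai.version>
</properties>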

Configuration

The crucial step is configuring Spring AI to connect to your local Docker model instead of the actual OpenAI API. Add these properties to your application.properties file:

spring.ai.openai.api-key=_
spring.ai.openai.chat.base-url=http://localhost:12434/engines/llama.cpp
spring.ai.openai.chat.options.model=ai/gemma3

Let's break down what each property does:

  1. spring.ai.openai.api-key=_ - Spring AI requires a non-empty API key, but since we never actually connect to OpenAI, any placeholder value works
  2. spring.ai.openai.chat.base-url=http://localhost:12434/engines/llama.cpp - Points at the local Docker Model Runner endpoint; Spring AI appends the standard OpenAI request paths (such as /v1/chat/completions) to this base URL
  3. spring.ai.openai.chat.options.model=ai/gemma3 - Specifies which locally installed model to use

Application Code

With our configuration in place, we can write a simple application to test the integration. Here's a basic example using Spring's CommandLineRunner to prompt the model after startup:

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;

@SpringBootApplication
public class Application {

    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
    }

    @Bean
    CommandLineRunner commandLineRunner(ChatClient.Builder builder) {
        return args -> {
            // The builder is auto-configured from application.properties,
            // so this client talks to the local model via the OpenAI-compatible API
            var client = builder.build();
            String response = client.prompt("When was Docker created?")
                    .call()
                    .content();

            System.out.println(response);
        };
    }
}

This minimal example:

  1. Creates a Spring Boot application
  2. Configures a CommandLineRunner that executes after startup
  3. Uses Spring AI's ChatClient to send a prompt to the AI model
  4. Prints the model's response to the console

The beauty of this approach is that you're using Spring AI's abstractions, which means your code remains identical whether you're using a local model or a cloud-based one. If you later decide to switch to actual OpenAI or another provider, you only need to change your configuration, not your code.
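For example, pointing the same application at the hosted OpenAI API would (hypothetically) just mean swapping the properties along these lines; removing the base-url falls back to the OpenAI default:

spring.ai.openai.api-key=${OPENAI_API_KEY}
spring.ai.openai.chat.options.model=gpt-4o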

Running the Application

To run the application, make sure Docker Desktop is running with your model pulled, then use:

./mvnw spring-boot:run

You should see output similar to:

Docker was officially created in July of 2013. Here's a breakdown of the key milestones:

1. The project began as an internal project at dotCloud, a Platform-as-a-Service company
2. Solomon Hykes presented Docker at PyCon in March 2013 with the famous "Docker in 5 minutes" demo
3. The open-source release was in March 2013
4. Docker, Inc. (the company) was officially formed in July 2013 when dotCloud pivoted to focus on Docker

Docker's containerization technology quickly gained popularity because it solved many deployment challenges by packaging applications with their dependencies, making them portable across different environments.

Beyond Basic Integration

Once you have the basic integration working, you can expand your application in several ways:

Building a Conversational Interface

Instead of one-off prompts, you can create a conversational interface by adding memory:

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.MessageChatMemoryAdvisor;
import org.springframework.ai.chat.memory.InMemoryChatMemory;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class ChatController {

    private final ChatClient chatClient;
    private final InMemoryChatMemory memory = new InMemoryChatMemory();

    public ChatController(ChatClient.Builder builder) {
        // The memory advisor replays earlier messages so the model sees the whole conversation
        this.chatClient = builder
                .defaultAdvisors(new MessageChatMemoryAdvisor(memory))
                .build();
    }

    @PostMapping("/chat")
    public String chat(@RequestBody String prompt) {
        return chatClient.prompt()
                .user(prompt)
                .call()
                .content();
    }
}
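With the application running on the default port of 8080, you can exercise the endpoint with curl, for example:

curl -X POST http://localhost:8080/chat \
  -H "Content-Type: text/plain" \
  -d "What is Spring AI?"

Because the memory advisor carries the conversation along, a follow-up like "Can you summarize that in one sentence?" is answered in context.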

Streaming Responses

For a more interactive experience, you can stream responses as they're generated:

// Add alongside the /chat endpoint in ChatController
// (needs reactor.core.publisher.Flux and org.springframework.http.MediaType imports)
@GetMapping(value = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<String> stream(@RequestParam String prompt) {
    // stream() returns a Flux that emits content tokens as the model generates them
    return chatClient.prompt()
            .user(prompt)
            .stream()
            .content();
}
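You can watch the tokens arrive as server-sent events with curl's no-buffer flag, for example:

curl -N "http://localhost:8080/stream?prompt=Tell%20me%20about%20Docker"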

Performance Considerations

Local AI performance depends heavily on your hardware. Apple Silicon Macs with 16GB+ RAM generally provide good performance with smaller models like Gemma 3. If you encounter performance issues:

  1. Try smaller models (Gemma 3 is a good starting point)
  2. Close other memory-intensive applications
  3. Adjust Docker Desktop's resource allocation
  4. Consider using quantized models, which are smaller and faster at a slight cost in accuracy (see the example below)
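Models in Docker Hub's ai namespace are published under size- and quantization-specific tags. The tag below is illustrative only; check the model's Docker Hub page for the tags that actually exist:

docker model pull ai/gemma3:1B-Q4_K_M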

Troubleshooting Common Issues

Missing Docker Model Command

If your system doesn't recognize the docker model command, create a symlink:

mkdir -p ~/.docker/cli-plugins
ln -s /Applications/Docker.app/Contents/Resources/cli-plugins/docker-model ~/.docker/cli-plugins/docker-model

Connection Refused Errors

If your Spring application can't connect to the Docker Model Runner API, check:

  1. Docker Desktop is running
  2. The "Enable host-side TCP support" option is enabled
  3. The model is running (check with docker model list)
  4. Your base URL configuration is correct (you can verify the endpoint directly, as shown below)
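A quick connectivity test from the host, assuming the default port and the standard OpenAI-style models endpoint:

curl http://localhost:12434/engines/llama.cpp/v1/models

If this returns a JSON list that includes ai/gemma3, the runner is reachable and the problem is likely in your Spring configuration.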

Conclusion

Running AI models locally with Docker Model Runner and Spring AI creates a powerful combination for development. It lets you:

  • Keep sensitive data on your machine
  • Develop without API keys or rate limits
  • Maintain full control over your AI infrastructure
  • Use Spring's programming model for AI applications

While these locally-run models may not match the capabilities of the latest cloud-based offerings, they're more than sufficient for many applications and provide an excellent development environment.

As Docker expands Model Runner support to more platforms and as open-source models continue to improve, this local approach to AI will become increasingly viable even for production use cases.

Ready to try it yourself? Check out the complete example on GitHub and the Docker Model Runner documentation.

Happy coding!
