Running AI Models Locally with Docker and Spring AI
Are you tired of sending your data to cloud APIs just to use AI in your applications? What if you could run powerful AI models right on your machine with zero API keys, zero data sharing, and zero monthly fees?
Docker Desktop recently introduced an exciting new feature called Model Runner that allows developers to run open-source AI models locally. When combined with Spring AI, this creates a powerful platform for building AI-powered applications that respect privacy, control costs, and simplify development workflows.
In this post, I'll show you how to use Docker's Model Runner feature with Spring Boot applications to create fully local AI experiences in just 15 minutes.
Why Run AI Models Locally?
Before diving into the technical details, let's consider why running AI models locally matters:
- Privacy - Your data never leaves your machine
- Cost control - No usage-based billing or subscription fees
- Reliability - No dependency on external API availability
- Development simplicity - Test and iterate without API keys or quotas
- Learning opportunity - Understand how AI models actually work
For Spring developers building modern applications, this local approach provides a compelling alternative to cloud-based AI services while maintaining all the benefits of Spring's programming model.
Understanding Docker Model Runner
Docker Model Runner is a plugin for Docker Desktop that allows you to:
- Pull open-source models from Docker Hub
- Run models directly from the command line
- Manage local model installations
- Interact with models via prompts or chat mode
- Access models via an OpenAI-compatible API endpoint
Currently, Docker Model Runner only works on Docker Desktop for Mac with Apple Silicon (M1/M2/M3/M4 chips), but support for other platforms is coming soon.
Setting Up Docker for Local AI
To get started, you'll need Docker Desktop 4.40 or later. Once installed, follow these steps:
- Open Docker Desktop
- Navigate to Settings → Features in development (Beta tab)
- Enable "Docker Model Runner"
- Also enable "Enable host-side TCP support" (leave the default port of 12434)
- Apply and restart Docker Desktop
Once configured, you can pull your first model using the Docker CLI:
docker model pull ai/gemma3
This will download Google's Gemma 3 model, which offers a good balance of capability and resource usage. You can check which models you have installed with:
docker model list
To test your setup, try running the model in interactive mode:
docker model run ai/gemma3
This launches an interactive chat session where you can talk to the model right in your terminal:
> What is an interesting fact about Docker?
Docker was originally developed as an internal project at a company called dotCloud,
which was a Platform-as-a-Service company. The technology was later open-sourced in 2013
and became immensely popular, eventually leading to dotCloud pivoting their entire
business to focus on Docker. This pivot transformed the company into Docker, Inc.
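Before writing any Java, you can also sanity-check the HTTP side of Model Runner. Since the API is OpenAI-compatible, a standard chat-completions request should work; this sketch assumes you enabled host-side TCP support on the default port 12434 earlier, and it targets the same base URL we'll hand to Spring AI shortly:

# Hit the OpenAI-compatible endpoint directly (port 12434 from the setup step)
curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "ai/gemma3", "messages": [{"role": "user", "content": "Say hello"}]}'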
Creating a Spring Boot Application with Spring AI
Now let's create a Spring Boot application that connects to this locally running model. The first step is to initialize a new project with the right dependencies:
- Visit https://start.spring.io
- Choose Maven, Java 17+, and the latest Spring Boot version
- Add the following dependencies:
- Spring Web
- Spring AI OpenAI (we'll use the OpenAI client since Docker exposes an OpenAI-compatible API)
Your pom.xml should include these key dependencies:
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
</dependency>

<!-- In the dependency management section -->
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>${spring-ai.version}</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
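One detail worth calling out: the ${spring-ai.version} placeholder needs a concrete value in your <properties> section. The exact version depends on when you read this; the milestone below is purely illustrative, and milestone releases typically also require the Spring Milestones repository in your pom.xml:

<properties>
    <java.version>17</java.version>
    <!-- Illustrative only; use the current Spring AI release -->
    <spring-ai.version>1.0.0-M6</spring-ai.version>
</properties>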
Configuration
The crucial step is configuring Spring AI to connect to your local Docker model instead of the actual OpenAI API. Add these properties to your application.properties file:
spring.ai.openai.api-key=_
spring.ai.openai.chat.base-url=http://localhost:12434/engines/llama.cpp
spring.ai.openai.chat.options.model=ai/gemma3
Let's break down what each property does:
- spring.ai.openai.api-key=_ - We need to provide a value here because Spring AI expects one, but since we're not actually connecting to OpenAI, any placeholder works
- spring.ai.openai.chat.base-url=http://localhost:12434/engines/llama.cpp - Points to the local Docker Model Runner API endpoint
- spring.ai.openai.chat.options.model=ai/gemma3 - Specifies which model to use
Application Code
With our configuration in place, we can write a simple application to test the integration. Here's a basic example using Spring's CommandLineRunner to prompt the model after startup:
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;

@SpringBootApplication
public class Application {

    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
    }

    // Runs once at startup and sends a single prompt to the local model
    @Bean
    CommandLineRunner commandLineRunner(ChatClient.Builder builder) {
        return args -> {
            var client = builder.build();
            String response = client.prompt("When was Docker created?")
                    .call()
                    .content();
            System.out.println(response);
        };
    }
}
This minimal example:
- Creates a Spring Boot application
- Configures a CommandLineRunner that executes after startup
- Uses Spring AI's ChatClient to send a prompt to the AI model
- Prints the model's response to the console
The beauty of this approach is that you're using Spring AI's abstractions, which means your code remains identical whether you're using a local model or a cloud-based one. If you later decide to switch to actual OpenAI or another provider, you only need to change your configuration, not your code.
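For example, switching to the hosted OpenAI API could be as small as a second properties file selected by a Spring profile. This is a hypothetical application-openai.properties (activated with spring.profiles.active=openai; gpt-4o is just an illustrative model name):

# application-openai.properties - hypothetical profile for the hosted OpenAI API
spring.ai.openai.api-key=${OPENAI_API_KEY}
spring.ai.openai.chat.base-url=https://api.openai.com
spring.ai.openai.chat.options.model=gpt-4o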
Running the Application
To run the application, make sure Docker Desktop is running with your model pulled, then use:
./mvnw spring-boot:run
You should see output similar to:
Docker was officially created in July of 2013. Here's a breakdown of the key milestones:
1. The project began as an internal project at dotCloud, a Platform-as-a-Service company
2. Solomon Hykes presented Docker at PyCon in March 2013 with the famous "Docker in 5 minutes" demo
3. The open-source release was in March 2013
4. Docker, Inc. (the company) was officially formed in July 2013 when dotCloud pivoted to focus on Docker
Docker's containerization technology quickly gained popularity because it solved many deployment challenges by packaging applications with their dependencies, making them portable across different environments.
Beyond Basic Integration
Once you have the basic integration working, you can expand your application in several ways:
Building a Conversational Interface
Instead of one-off prompts, you can create a conversational interface by adding memory:
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.MessageChatMemoryAdvisor;
import org.springframework.ai.chat.memory.InMemoryChatMemory;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class ChatController {

    private final ChatClient chatClient;
    private final InMemoryChatMemory memory = new InMemoryChatMemory();

    public ChatController(ChatClient.Builder builder) {
        // The memory advisor carries prior messages into each new prompt
        this.chatClient = builder
                .defaultAdvisors(new MessageChatMemoryAdvisor(memory))
                .build();
    }

    @PostMapping("/chat")
    public String chat(@RequestBody String prompt) {
        return chatClient.prompt()
                .user(prompt)
                .call()
                .content();
    }
}
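With the application running on Spring Boot's default port 8080, you can try the endpoint with curl. The second request only works because the memory advisor replays the earlier exchange to the model:

curl -X POST http://localhost:8080/chat -H "Content-Type: text/plain" -d "My name is Ada."
curl -X POST http://localhost:8080/chat -H "Content-Type: text/plain" -d "What is my name?"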
Streaming Responses
For a more interactive experience, you can stream responses as they're generated:
// Add this to the ChatController above; it also needs org.springframework.http.MediaType,
// reactor.core.publisher.Flux, and the @GetMapping/@RequestParam imports
@GetMapping(value = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<String> stream(@RequestParam String prompt) {
    return chatClient.prompt()
            .user(prompt)
            .stream()
            .content();
}
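To watch tokens arrive incrementally, call the endpoint with curl's buffering disabled (again assuming the default port 8080):

curl -N "http://localhost:8080/stream?prompt=Tell+me+about+Docker"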
Performance Considerations
Local AI performance depends heavily on your hardware. Apple Silicon Macs with 16GB+ RAM generally provide good performance with smaller models like Gemma 3. If you encounter performance issues:
- Try smaller models (Gemma 3 is a good starting point)
- Close other memory-intensive applications
- Adjust Docker Desktop's resource allocation
- Consider using quantized models (smaller, faster, but slightly less accurate)
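Beyond model choice, you can also cap how much work each request does. Here's a minimal sketch using Spring AI's OpenAI chat-options properties (the property names assume the standard starter; a lower token limit shortens generation time at the cost of shorter answers):

# Cap response length and keep sampling conservative (illustrative values)
spring.ai.openai.chat.options.temperature=0.7
spring.ai.openai.chat.options.max-tokens=256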
Troubleshooting Common Issues
Missing Docker Model Command
If your system doesn't recognize the docker model command, create a symlink:
ln -s /Applications/Docker.app/Contents/Resources/cli-plugins/docker-model ~/.docker/cli-plugins/docker-model
Connection Refused Errors
If your Spring application can't connect to the Docker Model Runner API, check:
- Docker Desktop is running
- The "Enable host-side TCP support" option is enabled
- The model is running (check with docker model list)
- Your base URL configuration is correct
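To isolate where the failure is, hit the endpoint straight from a terminal. This quick check assumes the default port and that Model Runner exposes the usual OpenAI-compatible model listing at /v1/models; if it fails, the problem is on the Docker side rather than in your Spring configuration:

curl http://localhost:12434/engines/llama.cpp/v1/models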
Conclusion
Running AI models locally with Docker Model Runner and Spring AI creates a powerful combination for development. It lets you:
- Keep sensitive data on your machine
- Develop without API keys or rate limits
- Maintain full control over your AI infrastructure
- Use Spring's programming model for AI applications
While these locally-run models may not match the capabilities of the latest cloud-based offerings, they're more than sufficient for many applications and provide an excellent development environment.
As Docker expands Model Runner support to more platforms and as open-source models continue to improve, this local approach to AI will become increasingly viable even for production use cases.
Ready to try it yourself? Check out the complete example on GitHub and the Docker Model Runner documentation.
Happy coding!