
One Interface to Rule Them All: Open WebUI with Gemini and Home Assistant
I've been running Home Assistant for a while now — managing blinds, gates, a pool cover, AC units, and probably too many Shelly devices. It works great. But I kept wanting something more: a single conversational interface where I could manage my home, ask questions, plan my day, and eventually tie everything together. Not just a smart home dashboard — a personal AI assistant that happens to control my house.
That's what led me to wire up Open WebUI with Google's Gemini and Home Assistant's MCP server. The result is a chat interface running on a Shelly Wall Display XL in my kitchen where anyone in the family can type (and eventually talk to) an assistant that knows our home inside out.
This post walks through exactly how I set it up, including every wrong turn and dead end so you don't have to repeat them.
The Architecture
Before diving into configs, it helps to understand how the pieces connect:
Shelly Wall Display XL (browser)
↓
Open WebUI (port 8080, same machine as HA)
↓
Gemini 2.5 Flash (via OpenAI-compatible API)
↓
ha-mcp Add-on (70+ tools for controlling devices, fuzzy entity search)
↓
Home Assistant (the actual smart home)
Open WebUI is the chat interface. Gemini is the brain. The ha-mcp add-on gives Gemini hands — it can search for entities, read device states, call services, manage automations, and interact with Home Assistant on your behalf. The whole thing runs on the same machine as Home Assistant.
There's also an integration in the other direction — Home Assistant's Assist pipeline can route conversations to Open WebUI, which means you can use HA's voice satellites and built-in conversation UI to reach the same Gemini-powered assistant. But the primary interface for us is Open WebUI running in a browser on the wall display.
Step 1: Getting Open WebUI Running
I run Open WebUI as a Docker container on the same machine as Home Assistant. If you're on Home Assistant OS, you can install it as an add-on. On a Supervised or Container install, just add it to your Docker setup.
The key thing is making sure Open WebUI is reachable from your local network. In my case it runs on port 8080:
http://homeassistant.local:8080
After first launch, create an admin account in the Open WebUI interface. You'll need this to configure everything else.
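For the plain-Docker route, a minimal docker-compose sketch looks like this. The image and volume names follow Open WebUI's documented defaults; the host port matches the URL above, so adjust it if yours differs:

```yaml
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "8080:8080"   # host:container; Open WebUI listens on 8080 internally
    volumes:
      - open-webui:/app/backend/data   # persists accounts, chats, settings
    restart: unless-stopped

volumes:
  open-webui:
```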
Step 2: Connecting Gemini
This was the easiest part of the whole setup. Open WebUI supports Gemini natively through its OpenAI-compatible endpoint feature.
- Get a Gemini API key from Google AI Studio
- In Open WebUI, go to Settings → Connections
- Add a new OpenAI-compatible connection with the Gemini API endpoint
- Paste your API key
Open WebUI will auto-detect all available models. I'm using Gemini 2.5 Flash — it's fast, cheap (basically free for personal use), and handles home automation commands well. The 2.5 Pro model is more capable but slower and pricier, and for "turn on the kitchen lights" you really don't need it.
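If you want to sanity-check the key outside Open WebUI, Gemini's OpenAI-compatible base URL is the one documented for Google AI Studio keys. This sketch only builds the model-listing request (sending it requires network access); `list_models_request` is my own helper name, not part of any library:

```python
import os
import urllib.request

# Gemini's OpenAI-compatible base URL, as documented for Google AI Studio keys.
BASE_URL = "https://generativelanguage.googleapis.com/v1beta/openai"

def list_models_request(api_key):
    """Build the GET /models request Open WebUI uses to auto-detect models."""
    return urllib.request.Request(
        f"{BASE_URL}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )

req = list_models_request(os.environ.get("GEMINI_API_KEY", "placeholder"))
print(req.full_url)
```

Opening that request with `urllib.request.urlopen` (or pasting the URL into Open WebUI's connection form) should return the model list if the key is valid.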
Step 3: The ha-mcp Add-on — Giving Gemini Hands
This is where things get interesting. An LLM without tools can only talk about your smart home. With the ha-mcp (Home Assistant MCP) add-on, it can actually do things — and crucially, it can discover things too.
Why ha-mcp Over the Basic Approach
A common approach is to inject the full entity list into the system prompt using HA's {{exposed_entities}} template — the model sees every device name and entity ID upfront. That works for small setups, but it has real limitations: the prompt gets huge with hundreds of entities, the model can't discover new devices added after the prompt was written, and you end up playing whack-a-mole with entity ID formatting.
The ha-mcp add-on (by homeassistant-ai) is a much better solution. It provides 70+ tools including fuzzy entity search with BM25 scoring. Instead of stuffing entity IDs into the prompt, the model searches for entities by natural language — "pool temperature", "living room light" — and gets back the correct entity IDs. It also supports automation management, calendars, todo lists, dashboards, and more.
Installing ha-mcp
- In Home Assistant, go to Settings → Add-ons → Add-on Store
- Add the ha-mcp repository: https://github.com/homeassistant-ai/ha-mcp
- Install "Home Assistant MCP Server"
- Start the add-on and enable "Start on boot" and "Watchdog"
Once running, the add-on generates a unique MCP endpoint URL with a cryptographic secret path. You can find it in the add-on's Log tab — it will look something like:
http://homeassistant.local:9583/private_HJ4H-8nGTkZZGaxmZwNbmw
Save this URL — you'll need it for the next step.
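You can sanity-check the endpoint with MCP's initialize handshake. This sketch only builds the JSON-RPC payload that an MCP client sends first (shaped per the MCP specification; the protocol version string is one published revision). POSTing it to your endpoint URL with `Accept: application/json, text/event-stream` should return the server's capabilities:

```python
import json

# Placeholder: substitute the private URL from your add-on's Log tab.
MCP_URL = "http://homeassistant.local:9583/private_XXXXXXXX"

def initialize_request(request_id=1):
    """Build the JSON-RPC 'initialize' request that opens an MCP session."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",
            "capabilities": {},
            "clientInfo": {"name": "endpoint-check", "version": "0.1"},
        },
    }

print(json.dumps(initialize_request(), indent=2))
```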
Connecting ha-mcp to Open WebUI
Open WebUI v0.8.12+ supports MCP natively:
- In Open WebUI: Admin → Settings → Integrations
- Under "Manage Tool Servers", click the + button
- Give it a name like "Home Assistant"
- Select type "MCP Streamable HTTP"
- Paste the MCP endpoint URL from the add-on logs
- Save and verify the connection
If it connects successfully, you'll see "Home Assistant" listed under Manage Tool Servers with a green toggle.
The Critical Step: Enabling Tools on the Model
Here's something that tripped me up for a while. Adding the MCP server to Open WebUI doesn't automatically make it available to your model. You also need to enable it at the model level:
- Go to Admin → Settings → Models
- Find your Gemini model and click the edit (pencil) icon
- Scroll down to the Tools section
- Check the "Home Assistant" checkbox
- Click "Save & Update"
If you skip this step, Gemini will happily chat with you but won't have access to any Home Assistant tools. I spent an embarrassing amount of time wondering why some chats could control HA and others couldn't — turns out the tool toggle is per-chat and can be set as a default on the model. Setting it at the model level means every new chat gets it automatically.
Step 4: The System Prompt — Much Simpler with ha-mcp
Because ha-mcp has built-in entity search, the system prompt is much simpler than it would be with a basic MCP server. You don't need to inject the full entity list into the prompt — the model just searches for what it needs.
Here's what I'm using:
You are a helpful home assistant AI. You have access to Home Assistant
tools via MCP.
IMPORTANT: When the user asks about any device, sensor, entity, or
wants to control something:
1. ALWAYS use ha_search_entities() first to find the correct entity ID.
Never guess entity IDs.
2. Use the entity ID returned by the search to call ha_get_state() or
ha_call_service().
3. If the search returns multiple matches, pick the most relevant one
or ask the user to clarify.
Examples:
- "What's the pool temperature?" → First call
ha_search_entities("pool temperature"), then use the returned entity
ID with ha_get_state().
- "Turn on the living room lights" → First call
ha_search_entities("living room light"), then use ha_call_service()
with the correct entity ID.
Always be helpful and provide clear, concise answers about the home state.
Set this in the model's System Prompt field (the same edit page where you enabled the Tools checkbox).
A few things worth noting:
"ALWAYS use ha_search_entities() first" — This is the key instruction. Without it, Gemini will try to guess entity IDs like sensor.pool_temperature and get 404 errors. The fuzzy search in ha-mcp handles natural language queries and returns the actual entity IDs.
No entity list injection needed — With the old approach, you had to template in {{exposed_entities}} which bloated the prompt. With ha-mcp, the model discovers entities on demand, keeping the prompt lean and the context window available for actual conversation.
Concise answers — Critical if you're also using this through Assist's voice pipeline. Nobody wants to listen to a three-paragraph response about their living room lights.
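The search-first flow the prompt enforces is easy to see with stubs. The functions below are local stand-ins for the real MCP tools (only the names `ha_search_entities`, `ha_get_state`, and `ha_call_service` come from the prompt above; the matching logic here is a crude illustration):

```python
# Stub state registry standing in for Home Assistant: illustration only.
STATES = {"sensor.pool_water_temp": "26.4", "light.living_room": "off"}

def ha_search_entities(query):
    """Crude stand-in for ha-mcp's fuzzy search: word-level substring match."""
    words = query.lower().split()
    return [eid for eid in STATES if any(w.rstrip("s") in eid for w in words)]

def ha_get_state(entity_id):
    return STATES[entity_id]

def ha_call_service(domain, service, entity_id):
    STATES[entity_id] = "on" if service == "turn_on" else "off"

# "What's the pool temperature?" -- search first, never guess the ID.
matches = ha_search_entities("pool temperature")
reading = ha_get_state(matches[0])

# "Turn on the living room lights" -- same pattern for actions.
target = ha_search_entities("living room light")[0]
ha_call_service("light", "turn_on", target)
```

The point of the pattern: the entity ID that reaches `ha_get_state()` or `ha_call_service()` always came out of a search result, never out of the model's imagination.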
Step 5: Wiring It Into Home Assistant Assist (Optional but Useful)
If you want HA's built-in conversation UI and voice satellites to use your Open WebUI + Gemini setup, you'll need the OpenWebUI Conversation integration:
- In HACS, add https://github.com/TheRealPSV/ha-openwebui-conversation as a custom repository
- Download "OpenWebUI Conversation" and restart HA
- Go to Settings → Devices & Services → Add Integration → "OpenWebUI Conversation"
- Configure:
  - Base URL: http://localhost:8080 (or whatever your Open WebUI address is)
  - API Key: Generate one in Open WebUI → Settings → Account → API Keys
  - Verify SSL: OFF (for local HTTP connections)
  - Strip Markdown: ON (essential for voice — Assist can't render markdown)
- Go to Settings → Voice assistants and set your conversation agent to "OpenWebUI Conversation"
Tip: Keep "Prefer handling commands locally" enabled. Simple commands like "turn on the lights" get handled instantly by HA's built-in intents without a round-trip to Gemini. Complex or free-form queries fall through to Open WebUI.
Note that the Assist pipeline and the MCP tool approach solve the same problem differently, and you don't strictly need both. The Assist pipeline (via the Extended OpenAI Conversation integration) injects entity lists directly into the prompt and sends it straight to Gemini, which works well for voice and the built-in HA conversation UI. The MCP tool approach (via ha-mcp and Open WebUI) gives the model tools to discover entities on demand, which suits a chat interface where you want the model to explore and learn. I use both: Assist for voice satellites and the HA conversation UI, and MCP via Open WebUI for the kitchen wall display.
The End Result
The setup works remarkably well. I can walk up to the Wall Display in the kitchen and type things like:
- "What devices measure temperature?" → Searches entities, finds all temperature sensors, reports their values
- "Close the blinds in William's room" → Searches for the entity, asks for confirmation, then calls the service
- "What should I cook for dinner with chicken and rice?" → Just answers like a normal AI assistant
- "Is the garage gate open?" → Searches for the entity, checks its state, responds
The transition between "smart home controller" and "general assistant" is seamless. It doesn't try to route everything through Home Assistant — it only reaches for the HA tools when the request is actually about the smart home.
The fuzzy entity search is the real game-changer here. Ask for "pool temperature" and it gracefully reports that no matching sensor exists, instead of crashing on a guessed entity ID. Ask for "temperature" and it discovers every temperature sensor in the system. No need to memorize entity IDs or maintain a massive prompt with the full entity list.
The Gotchas
Let me save you some debugging time.
HTTPS Breaking MCP Connections
If you enable HTTPS on Home Assistant (via SSL certificates in configuration.yaml), the MCP connection from Open WebUI may break silently. Since both Open WebUI and ha-mcp are running on the same local network, you likely don't need HTTPS for this internal communication. If you hit connection issues, check whether your http: config in configuration.yaml has SSL enabled and consider whether it's necessary for your setup.
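The block to check in configuration.yaml looks like this (the certificate paths shown are the usual HAOS defaults; yours may differ). If these lines are present, HA is serving HTTPS only, and clients expecting plain HTTP on the same host can fail silently:

```yaml
http:
  ssl_certificate: /ssl/fullchain.pem
  ssl_key: /ssl/privkey.pem
```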
API Key Confusion
The OpenWebUI Conversation integration (for Assist) uses an API key generated inside Open WebUI (Settings → Account → API Keys), not your Gemini API key, and not a Home Assistant long-lived access token. I mixed these up multiple times. The key should start with sk-. If you're getting 401 errors, this is almost certainly the issue.
You can verify the stored key in /config/.storage/core.config_entries on the HA side.
The "Model Not Found" Error
Error code: 400 - {'detail': 'Model not found'}
This shows up when the OpenWebUI Conversation integration's model selection gets out of sync. Fix it by going to the integration's Configure dialog and re-selecting the model from the dropdown.
SSL Verification on Local Connections
If Open WebUI is running on the same machine or your local network over plain HTTP, you must set Verify SSL to OFF in the integration. Leaving it on causes silent failures — you'll get a generic "something went wrong" error with no useful details.
HTTPS and Microphone Access — The Unsolved Problem
This is the big one, and it's still a work in progress for me.
Browsers require HTTPS to access the microphone. My Shelly Wall Display XL runs a browser to show Open WebUI. But here's the problem:
- Self-signed certificates don't work — The Shelly Wall Display XL's browser doesn't accept self-signed certs. I tried setting up nginx as a reverse proxy with a self-signed certificate, and while it technically works on a regular desktop browser (after clicking through the warning), the Wall Display just refuses the connection.
- Nabu Casa would work — It provides valid HTTPS URLs. But I don't want to depend on the cloud for something running on my local network. Every request would route through Nabu Casa's servers and back, adding latency to what should be a local operation.
- Let's Encrypt on a local domain — Not straightforward, since homeassistant.local isn't a real domain you can get certificates for.
So right now, the text chat works perfectly on the Wall Display, but voice input doesn't. I'm still exploring solutions — a local ACME server, or possibly running an ESPHome voice satellite (like an M5Stack Atom Echo) next to the display and using it purely as a mic input.
If you've solved HTTPS on a local-only network for a device that doesn't accept self-signed certs, I'd love to hear about it.
Markdown in Voice Responses
When using the Assist pipeline, Gemini's responses come back with markdown formatting — bold text, bullet points, code blocks. This sounds terrible when read aloud by a TTS engine. The "Strip Markdown" option in the OpenWebUI Conversation integration fixes this, but it's easy to forget to enable it.
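A minimal sketch of what "Strip Markdown" has to do before text reaches TTS. The integration's actual implementation will differ; this is just the shape of the problem:

```python
import re

def strip_markdown(text):
    """Remove common markdown syntax so TTS reads plain sentences."""
    text = re.sub(r"```.*?```", "", text, flags=re.DOTALL)      # fenced code blocks
    text = re.sub(r"`([^`]*)`", r"\1", text)                    # inline code
    text = re.sub(r"\*\*([^*]+)\*\*", r"\1", text)              # bold
    text = re.sub(r"\*([^*]+)\*", r"\1", text)                  # italics
    text = re.sub(r"^#{1,6}\s*", "", text, flags=re.MULTILINE)  # headings
    text = re.sub(r"^\s*[-*]\s+", "", text, flags=re.MULTILINE) # bullet markers
    return text.strip()

print(strip_markdown("**Living room** lights are `on`:\n- lamp\n- spots"))
```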
Per-Chat Tool Toggle
Even after enabling the Home Assistant tool as a default on the model, Open WebUI has a per-chat tool toggle. It's the wrench/chain icon next to the "+" button at the bottom of the chat input. If a new chat isn't using HA tools, check that this toggle is on. Setting the default at the model level (Step 3) prevents this from being an issue for new chats, but existing chats may still have it off.
What's Next
The immediate priorities are:
- Solving the HTTPS/microphone problem for the Wall Display so voice input works locally
- Adding more context to the prompt — calendar events, weather forecasts, family routines
- Exploring automations triggered by conversation — "remind me to close the pool cover at sunset" that actually creates an HA automation
- Adding a pool temperature sensor — turns out the Shelly Wall Display XL doesn't have a built-in temperature sensor (the -275°C reading was a giveaway). A Shelly BLU H&T paired via Bluetooth would be the natural companion.
The vision is a single interface that knows my home, my schedule, and can take action on both. We're about 70% there. The bones are solid — it's the polish that takes time.
Running Home Assistant OS, Open WebUI v0.8.12, Gemini 2.5 Flash, ha-mcp v7.3.0, on a single machine. Wall display is a Shelly Wall Display XL.