
Run vMCP locally with the CLI

Most of the Virtual MCP Server (vMCP) guides deploy vMCP to Kubernetes using the operator. The ToolHive CLI also includes a local mode for the same vMCP runtime, so you can aggregate MCP servers from a ToolHive group on your workstation without a cluster.

Local mode is useful for:

  • Prototyping aggregation against a handful of MCP servers before committing to a VirtualMCPServer resource in Kubernetes.
  • Local development against the same optimizer, conflict resolution, and composite-tool logic that the operator uses.
  • Demos where you want a single endpoint fronting several MCP servers without the operator overhead.

The CLI command is thv vmcp with three subcommands: serve, init, and validate. They share the pkg/vmcp/ runtime with the Kubernetes operator, so configuration, tool aggregation, and the optimizer behave the same way in both environments.
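As a quick sketch, the three subcommands fit together like this (flags as documented in the steps below):

```shell
thv vmcp init --group demo-tools --output vmcp.yaml   # scaffold a config from a group
thv vmcp validate --config vmcp.yaml                  # check syntax and schema
thv vmcp serve --config vmcp.yaml                     # run the aggregator
```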

Prerequisites

  • ToolHive CLI v0.24.0 or later. See Install the ToolHive CLI.
  • One or more MCP servers running in a ToolHive group. See Run MCP servers and Group management to create a group and add servers to it.
  • Docker or Podman if you plan to enable the Tier 2 semantic optimizer, which starts a Text Embeddings Inference (TEI) container.

Quick mode: aggregate a group in one command

When you already have a ToolHive group with running MCP servers, the fastest way to start vMCP is quick mode. Pass --group and vMCP generates a minimal in-memory configuration from the group at startup:

thv vmcp serve --group demo-tools

This is equivalent to creating a config file with:

  • groupRef: demo-tools
  • incomingAuth.type: anonymous
  • outgoingAuth.source: inline
  • aggregation.conflictResolution: prefix with prefix format {workload}_

vMCP binds to 127.0.0.1:4483 and aggregates every accessible MCP server in the group behind that single endpoint.
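Put together, quick mode behaves as if you had passed a minimal config file like the sketch below. The field names match the generated config shown later in this guide; the file quick mode actually builds is in-memory and may differ in detail:

```yaml
groupRef: demo-tools
incomingAuth:
  type: anonymous
outgoingAuth:
  source: inline
aggregation:
  conflictResolution: prefix
  conflictResolutionConfig:
    prefixFormat: "{workload}_"
```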

Quick mode is loopback-only

Because quick mode uses anonymous authentication, thv vmcp serve --group rejects non-loopback bind addresses. Valid values for --host are 127.0.0.1, localhost, or any other loopback address. To expose vMCP on another interface or add real authentication, switch to a config file (see below).

Connect a client

In another terminal, register the vMCP endpoint as a remote MCP server so your configured AI clients can discover it:

thv run http://localhost:4483/mcp --name local-vmcp

If you haven't registered any AI clients yet, run thv client setup first. See Client configuration for details.

Config-file mode: generate, review, serve

When you need incoming OIDC auth, a non-loopback bind, per-backend outgoing auth, or composite tools, use a configuration file. The init subcommand scaffolds one from an existing group, validate checks it, and serve runs it.

Step 1: generate a starter config

thv vmcp init enumerates the workloads in a group and emits a YAML config with one backend entry per accessible server:

thv vmcp init --group demo-tools --output vmcp.yaml

The --group flag is required. If you omit --output, the generated YAML is written to stdout so you can pipe it through less or redirect it yourself.
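Because the YAML goes to stdout, redirecting it is equivalent to passing --output:

```shell
# Same result as --output vmcp.yaml, via shell redirection.
thv vmcp init --group demo-tools > vmcp.yaml
```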

A generated file looks similar to this (comments trimmed):

vmcp.yaml
name: demo-tools-vmcp
groupRef: demo-tools

incomingAuth:
  type: anonymous

outgoingAuth:
  source: inline

aggregation:
  conflictResolution: prefix
  conflictResolutionConfig:
    prefixFormat: "{workload}_"

backends:
  - name: fetch
    url: http://localhost:24162/mcp
    transport: streamable-http
  - name: osv
    url: http://localhost:24163/mcp
    transport: streamable-http

Edit the file to switch to OIDC, change the conflict-resolution strategy, add composite tool definitions, or enable the optimizer. See Configure vMCP servers for the complete schema.
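For example, to change how conflicting tool names are prefixed, you only need to touch the aggregation block. The alternative prefixFormat value here is illustrative; see Configure vMCP servers for the values the schema actually accepts:

```yaml
aggregation:
  conflictResolution: prefix
  conflictResolutionConfig:
    # Hypothetical alternative; "{workload}_" is the generated default.
    prefixFormat: "{workload}-"
```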

Step 2: validate the config

Before starting the server, run the validator to catch syntax errors and schema violations:

thv vmcp validate --config vmcp.yaml

A valid config exits with status 0 and no output. An invalid config exits non-zero with a descriptive error, for example:

Error: validation failed: aggregation.conflictResolution: unknown strategy "preffix"
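Because validate exits with status 0 only on success, you can chain it with serve so a broken config never starts the server:

```shell
# Serve only if the config passes validation (exit status 0).
thv vmcp validate --config vmcp.yaml && thv vmcp serve --config vmcp.yaml
```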

Step 3: start the server

Once the config validates, start vMCP:

thv vmcp serve --config vmcp.yaml

When --config is set, --group is ignored. Pass --host and --port to override the default 127.0.0.1:4483 bind address (non-loopback addresses are allowed in config-file mode).

Enable the optimizer

vMCP supports the same tool optimizer in local mode as in Kubernetes. The CLI exposes three tiers through flags on thv vmcp serve.

| Tier | Flag | What it does | External service |
|------|------|--------------|------------------|
| 0 | (none) | Pass-through: backends' tools are exposed as-is. | None |
| 1 | --optimizer | FTS5 keyword optimizer: clients see only find_tool and call_tool, backed by an in-process SQLite FTS5 index. | None |
| 2 | --optimizer-embedding | Tier 1 plus TEI semantic search. Implies --optimizer. | Managed TEI container |
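Tier 1 needs no external service, so enabling it is a single flag:

```shell
# Keyword-only optimizer: in-process SQLite FTS5, no container required.
thv vmcp serve --group demo-tools --optimizer
```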

Tier 2 starts a managed TEI container on first run and stops it when the server exits. Customize the model and image with --embedding-model and --embedding-image:

thv vmcp serve --group demo-tools \
--optimizer-embedding \
--embedding-model BAAI/bge-small-en-v1.5 \
--embedding-image ghcr.io/huggingface/text-embeddings-inference:cpu-latest

The defaults use the upstream CPU image; switch to a GPU image if you have one available. First-start container pulls can take 30-60 seconds.

Enable audit logging

Pass --enable-audit to log each incoming request with the default audit configuration. If your config file already defines an audit section, the flag has no effect and the file's configuration wins.

thv vmcp serve --config vmcp.yaml --enable-audit

See Audit logging for the event format and how to customize it from a config file.

Local CLI vs Kubernetes deployment

| Aspect | thv vmcp serve | VirtualMCPServer CRD |
|--------|----------------|----------------------|
| Where it runs | Your workstation | Kubernetes cluster |
| Backend discovery | Local ToolHive group | MCPGroup with MCPServer/MCPRemoteProxy |
| Default bind | 127.0.0.1:4483 | Service in the cluster |
| Authentication | Anonymous (quick mode) or as configured | Full OIDC, token exchange, embedded auth |
| Lifecycle | Foreground process; Ctrl-C stops | Operator-managed Deployment |
| Optimizer | Flag-driven (Tier 0/1/2) | embeddingServerRef on the CRD |

Both paths load the same vMCP runtime from pkg/vmcp/, so a config file that validates locally will behave the same way when the operator loads it as a ConfigMap. This makes local mode useful for iterating on aggregation, conflict resolution, and composite-tool definitions before moving to Kubernetes.

Next steps