oBeaver
Build local. Keep it yours.
Local-first LLM inference toolkit. Run models on your own hardware with an OpenAI-compatible API — no cloud, no API keys, no data leaving your machine.
Quick Install
Requires Python 3.12+. Clone the repo and install — all dependencies are included:
git clone https://github.com/microsoft/obeaver.git
cd obeaver
pip install -e .
Then set up the default model directory and verify your environment:
# View oBeaver banner and configuration info
obeaver
# Set the default model save location
obeaver init
# Check environment and dependencies
obeaver check
💡 Tip: Running obeaver with no arguments displays the welcome banner showing your current model directory paths. Use obeaver init to configure where models are saved — all sub-folders (ORT, Foundry Local, HF cache) are created automatically. Use obeaver models to list all local models.
Why oBeaver?
Dual Inference Engine
Foundry Local for macOS/Windows with NPU acceleration, or ONNX Runtime GenAI for Linux — choose the right backend for your platform.
OpenAI-Compatible API
Drop-in /v1/chat/completions and /v1/embeddings endpoints. Switch between local and cloud by changing one URL.
Tool Calling
Full OpenAI function-calling support for building agentic workflows that run entirely on-device.
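Because the endpoint speaks the OpenAI protocol, a tool-calling round trip follows the standard SDK flow. A minimal sketch, assuming a server started with `obeaver serve Phi-4-mini` on the default port; the `get_weather` tool and its stub reply are hypothetical:

```python
import json

def get_weather(city: str) -> str:
    # Hypothetical tool: a real implementation would query a weather service.
    return f"Sunny and 22 degC in {city}"

TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def dispatch(tool_call) -> str:
    """Run the local function named in a tool call and return its result."""
    args = json.loads(tool_call.function.arguments)
    if tool_call.function.name == "get_weather":
        return get_weather(**args)
    raise ValueError(f"unknown tool: {tool_call.function.name}")

if __name__ == "__main__":
    from openai import OpenAI

    client = OpenAI(base_url="http://127.0.0.1:18000/v1", api_key="unused")
    messages = [{"role": "user", "content": "What's the weather in Paris?"}]
    # First turn: the model decides to call the tool.
    first = client.chat.completions.create(
        model="Phi-4-mini", messages=messages, tools=TOOLS
    )
    call = first.choices[0].message.tool_calls[0]
    # Second turn: feed the tool result back so the model can answer.
    messages.append(first.choices[0].message)
    messages.append({"role": "tool", "tool_call_id": call.id,
                     "content": dispatch(call)})
    final = client.chat.completions.create(model="Phi-4-mini", messages=messages)
    print(final.choices[0].message.content)
```

The tool executes locally alongside the model, so the whole agentic loop stays on-device.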
Text Embeddings
Dedicated ONNX embedding engine with Qwen3-Embedding and EmbeddingGemma for RAG pipelines.
Vision-Language Models
Auto-detected VL models — send images alongside text for multimodal understanding with Qwen VL.
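Images travel in the standard OpenAI multimodal message format: a content list mixing text and base64 data URLs. A sketch assuming a VL model is being served; the model id `Qwen2.5-VL` and the file name are hypothetical:

```python
import base64

def image_message(path: str, prompt: str) -> dict:
    """Build an OpenAI-style multimodal user message from a local image file."""
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }

if __name__ == "__main__":
    from openai import OpenAI

    client = OpenAI(base_url="http://127.0.0.1:18000/v1", api_key="unused")
    resp = client.chat.completions.create(
        model="Qwen2.5-VL",  # assumed id; list yours with `obeaver models`
        messages=[image_message("photo.png", "What is in this image?")],
    )
    print(resp.choices[0].message.content)
```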
Docker Ready
CPU container for linux/amd64 and linux/arm64 — deploy locally, in CI, or on headless servers.
Dual Engine Architecture
| Engine | Platform | Description |
|---|---|---|
| Foundry Local | macOS, Windows | Powered by Microsoft Foundry Local. Automatic model download, hardware acceleration (NPU > GPU > CPU). |
| ORT | macOS, Windows, Linux | Powered by ONNX Runtime GenAI. Loads a local .onnx model directory — fully offline. |
| Embedding | All platforms | ONNX-only text embeddings for RAG & retrieval pipelines. |
Quick Example
Chat in the terminal
obeaver run phi-4-mini
Serve an OpenAI-compatible API
obeaver serve Phi-4-mini
Use with the OpenAI Python SDK
from openai import OpenAI
client = OpenAI(base_url="http://127.0.0.1:18000/v1", api_key="unused")
response = client.chat.completions.create(
model="Phi-4-mini",
messages=[{"role": "user", "content": "What is the capital of France?"}],
stream=True,
)
for chunk in response:
print(chunk.choices[0].delta.content or "", end="", flush=True)
💡 Tip: Run obeaver models to list all local models. On macOS/Windows, run foundry model list to browse the full Foundry Local catalog.