oBeaver
Build local. Keep it yours.
Local-first LLM inference toolkit. Run models on your own hardware with an OpenAI-compatible API — no cloud, no API keys, no data leaving your machine.
Quick Install
Requires Python 3.12+. Clone the repo and install — all dependencies are included:
git clone https://github.com/microsoft/obeaver.git
cd obeaver
pip install -e .
Then set up the default model directory and verify your environment:
# View oBeaver banner and configuration info
obeaver
# Set the default model save location
obeaver init
# Check environment and dependencies
obeaver check
💡 Tip: Running obeaver with no arguments displays the welcome banner showing your current model directory paths. Use obeaver init to configure where models are saved — all sub-folders (ORT, Foundry Local, HF cache) are created automatically. Use obeaver models to list all local models.
Why oBeaver?
Dual Inference Engine
Foundry Local for macOS/Windows with NPU acceleration, or ONNX Runtime GenAI for Linux — choose the right backend for your platform.
OpenAI-Compatible API
Drop-in /v1/chat/completions and /v1/embeddings endpoints. Switch between local and cloud by changing one URL.
Tool Calling
Full OpenAI function-calling support for building agentic workflows that run entirely on-device.
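Because the endpoint speaks the OpenAI protocol, a tool-calling round trip follows the standard SDK flow. A minimal sketch, assuming a server started with `obeaver serve Phi-4-mini` on the default port; the `get_weather` tool and its stub reply are hypothetical:

```python
import json

def get_weather(city: str) -> str:
    # Hypothetical tool: a real implementation would query a weather service.
    return f"Sunny and 22 degC in {city}"

TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def dispatch(tool_call) -> str:
    """Run the local function named in a tool call and return its result."""
    args = json.loads(tool_call.function.arguments)
    if tool_call.function.name == "get_weather":
        return get_weather(**args)
    raise ValueError(f"unknown tool: {tool_call.function.name}")

if __name__ == "__main__":
    from openai import OpenAI

    client = OpenAI(base_url="http://127.0.0.1:18000/v1", api_key="unused")
    messages = [{"role": "user", "content": "What's the weather in Paris?"}]
    # First turn: the model decides to call the tool.
    first = client.chat.completions.create(
        model="Phi-4-mini", messages=messages, tools=TOOLS
    )
    call = first.choices[0].message.tool_calls[0]
    # Second turn: feed the tool result back so the model can answer.
    messages.append(first.choices[0].message)
    messages.append({"role": "tool", "tool_call_id": call.id,
                     "content": dispatch(call)})
    final = client.chat.completions.create(model="Phi-4-mini", messages=messages)
    print(final.choices[0].message.content)
```

The tool executes locally alongside the model, so the whole agentic loop stays on-device.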
Text Embeddings
Dedicated ONNX embedding engine with Qwen3-Embedding and EmbeddingGemma for RAG pipelines.
Vision-Language Models
Auto-detected VL models — send images alongside text for multimodal understanding with Qwen VL.
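Images travel in the standard OpenAI multimodal message format: a content list mixing text and base64 data URLs. A sketch assuming a VL model is being served; the model id `Qwen2.5-VL` and the file name are hypothetical:

```python
import base64

def image_message(path: str, prompt: str) -> dict:
    """Build an OpenAI-style multimodal user message from a local image file."""
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }

if __name__ == "__main__":
    from openai import OpenAI

    client = OpenAI(base_url="http://127.0.0.1:18000/v1", api_key="unused")
    resp = client.chat.completions.create(
        model="Qwen2.5-VL",  # assumed id; list yours with `obeaver models`
        messages=[image_message("photo.png", "What is in this image?")],
    )
    print(resp.choices[0].message.content)
```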
Docker Ready
CPU container for linux/amd64 and linux/arm64 — deploy locally, in CI, or on headless servers.
Dual Engine Architecture
| Engine | Platform | Description |
|---|---|---|
| Foundry Local | macOS, Windows | Powered by Microsoft Foundry Local. Automatic model download, hardware acceleration (NPU > GPU > CPU). |
| ORT | macOS, Windows, Linux | Powered by ONNX Runtime GenAI. Loads a local .onnx model directory — fully offline. |
| Embedding | All platforms | ONNX-only text embeddings for RAG & retrieval pipelines. |
Quick Example
Chat in the terminal
obeaver run phi-4-mini
Serve an OpenAI-compatible API
obeaver serve Phi-4-mini
Use with the OpenAI Python SDK
from openai import OpenAI
client = OpenAI(base_url="http://127.0.0.1:18000/v1", api_key="unused")
response = client.chat.completions.create(
model="Phi-4-mini",
messages=[{"role": "user", "content": "What is the capital of France?"}],
stream=True,
)
for chunk in response:
print(chunk.choices[0].delta.content or "", end="", flush=True)
💡 Tip: Run obeaver models to list all local models. On macOS/Windows, run foundry model list to browse the full Foundry Local catalog.