# oBeaver

Build local. Keep it yours.

Local-first LLM inference toolkit. Run models on your own hardware with an OpenAI-compatible API: no cloud, no API keys, no data leaving your machine.

## Quick Install

Requires Python 3.12+. Clone the repo and install; all dependencies are included:

```bash
git clone https://github.com/microsoft/obeaver.git
cd obeaver
pip install -e .
```

Then set up the default model directory and verify your environment:

```bash
# View the oBeaver banner and configuration info
obeaver

# Set the default model save location
obeaver init

# Check environment and dependencies
obeaver check
```

> 💡 **Tip:** Running `obeaver` with no arguments displays the welcome banner, including your current model directory paths. Use `obeaver init` to configure where models are saved; all sub-folders (ORT, Foundry Local, HF cache) are created automatically. Use `obeaver models` to list all local models.

## Why oBeaver?

### Dual Engine Architecture

| Engine | Platform | Description |
| --- | --- | --- |
| Foundry Local | macOS, Windows | Powered by Microsoft Foundry Local. Automatic model download; hardware acceleration in NPU > GPU > CPU priority order. |
| ORT | macOS, Windows, Linux | Powered by ONNX Runtime GenAI. Loads a local `.onnx` model directory; fully offline. |
| Embedding | All platforms | ONNX-only text embeddings for RAG and retrieval pipelines. |
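The Embedding engine feeds retrieval pipelines: you embed the query and your candidate documents, then rank documents by vector similarity. A minimal sketch of that ranking step, using hand-written illustrative vectors in place of real embedding output:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Illustrative vectors standing in for embeddings of a query and two documents.
query = [0.1, 0.9, 0.0]
docs = {"doc_a": [0.1, 0.8, 0.1], "doc_b": [0.9, 0.1, 0.0]}

# Rank documents by similarity to the query and pick the best match.
best = max(docs, key=lambda name: cosine(query, docs[name]))
print(best)  # doc_a
```

In a real pipeline the vectors would come from the embedding engine; only the ranking logic is shown here.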

## Quick Example

### Chat in the terminal

```bash
obeaver run phi-4-mini
```

### Serve an OpenAI-compatible API

```bash
obeaver serve Phi-4-mini
```

### Use with the OpenAI Python SDK

```python
from openai import OpenAI

# Point the client at the local oBeaver server; no real API key is needed.
client = OpenAI(base_url="http://127.0.0.1:18000/v1", api_key="unused")

response = client.chat.completions.create(
    model="Phi-4-mini",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    stream=True,
)
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
```
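Each streamed chunk arrives over the wire as a Server-Sent Events `data:` line in the standard OpenAI streaming-chunk format, which the SDK decodes for you. A minimal sketch of extracting the incremental text from one such line by hand (the payload below is illustrative, not actual oBeaver output):

```python
import json

def extract_delta(line: str) -> str:
    """Pull the incremental text out of one SSE 'data:' line."""
    payload = line.removeprefix("data: ").strip()
    if payload == "[DONE]":  # stream-terminator sentinel
        return ""
    chunk = json.loads(payload)
    # delta.content may be null (e.g. on the final role-only chunk).
    return chunk["choices"][0]["delta"].get("content") or ""

# Representative chunk in the OpenAI streaming format (illustrative payload).
sse_line = 'data: {"choices": [{"delta": {"content": "Paris"}, "index": 0}]}'
print(extract_delta(sse_line))  # Paris
```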

> 💡 **Tip:** Run `obeaver models` to list all local models. On macOS/Windows, run `foundry model list` to browse the full Foundry Local catalog.