## Features

oBeaver goes beyond simple chat — it supports embeddings, tool calling, vision-language models, and model conversion to cover the full spectrum of local AI development.
### Dual Inference Engine

oBeaver supports two inference backends, automatically selecting the right one for your platform:
| Engine | Platform | Description |
|---|---|---|
| Foundry Local | macOS, Windows | Powered by Microsoft Foundry Local. Automatic model download, hardware acceleration (NPU > GPU > CPU), launched via catalog alias. |
| ORT | macOS, Windows, Linux | Powered by ONNX Runtime GenAI. Loads a local `.onnx` model directory — fully offline, zero cloud dependency. |
| Condition | Engine | Model argument |
|---|---|---|
| macOS / Windows (default) | Foundry Local | Catalog alias (e.g. `Phi-4-mini`) |
| `--engine ort` or Linux (default) | ONNX Runtime GenAI | Local directory path |
| `embed` / `serve-embed` | Embedding Engine (ONNX) | Local model directory |
> **Linux note:** Foundry Local is not available on Linux. The engine is fixed to `ort`; passing `--engine foundry` will exit with an error.
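The selection rules above can be sketched in a few lines (an illustrative helper for this table only; `pick_engine` is a hypothetical name, not part of oBeaver's code):

```python
import sys

def pick_engine(engine_flag=None, platform=sys.platform):
    """Mirror the selection table: Foundry Local by default on
    macOS/Windows; ONNX Runtime ("ort") on Linux or when requested."""
    if platform.startswith("linux"):
        if engine_flag == "foundry":
            # Foundry Local is unavailable on Linux: exit with an error
            raise SystemExit("Foundry Local is not available on Linux")
        return "ort"
    return engine_flag or "foundry"
```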
### Text Embeddings

The embedding engine is ONNX-only — no `--engine` flag needed. Perfect for RAG and retrieval pipelines.
#### CLI — One-shot Embedding

```bash
obeaver embed ./models/Qwen3-Embedding-0.6B "Hello, world!"

# Interactive loop
obeaver embed ./models/embeddinggemma-300m-ONNX
```
#### Embedding Server

```bash
obeaver serve-embed ./models/Qwen3-Embedding-0.6B           # default port 18001
obeaver serve-embed ./models/embeddinggemma-300m-ONNX -p 8002
```
#### Use with OpenAI SDK

```python
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:18001/v1", api_key="unused")

response = client.embeddings.create(
    model="Qwen3-Embedding-0.6B",
    input=["Hello, world!", "Embeddings are useful."],
)

for item in response.data:
    print(f"index={item.index} dim={len(item.embedding)}")
```
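In a retrieval pipeline, the vectors returned above are typically ranked by cosine similarity. A minimal, dependency-free sketch (the vectors here are toy values, not real embeddings):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Rank documents by similarity to a query embedding
query = [0.1, 0.3, 0.6]
docs = {"doc_a": [0.1, 0.29, 0.61], "doc_b": [0.9, 0.1, 0.0]}
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)
print(ranked[0])  # doc_a, the vector closest to the query
```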
### Tool Calling (Agentic Workflows)

Both engines support the OpenAI function-calling interface — build agents that plan, call tools, and reason entirely on-device.
#### How It Works

| Engine | Strategy |
|---|---|
| Foundry Local | Tools are forwarded natively via the standard OpenAI `tools` parameter. The response may carry a native `tool_calls` field or the `functools[...]` format — both are normalised automatically. |
| ORT | Tools are serialised as JSON Schema inside the system prompt. The model replies with a `<tool_call>{...}</tool_call>` block; `parse_tool_call()` extracts and validates it. |
#### Supported Output Formats (auto-detected)

| Format | Example |
|---|---|
| `<tool_call>` block | `<tool_call>{"name": "get_weather", "arguments": {"city": "Paris"}}</tool_call>` |
| Phi-3 native | `<\|function_calls\|>{...}<\|/function_calls\|>` |
| Mistral-style | `<functioncall>{...}</functioncall>` |
| Markdown code block | `` ```json {...}``` `` |
| OpenAI-legacy wrapper | `{"function_call": {"name": ..., "arguments": ...}}` |
| Bare JSON | `{"name": "...", "arguments": {...}}` |
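To illustrate the detection idea, here is a stripped-down extractor covering just the first and last formats (illustrative only; oBeaver's actual `parse_tool_call()` handles all six formats and validates the result):

```python
import json
import re

def extract_tool_call(text):
    """Extract a {'name': ..., 'arguments': ...} dict from model output,
    handling the <tool_call> block and bare-JSON formats."""
    # Case 1: <tool_call>{...}</tool_call> block
    m = re.search(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", text, re.DOTALL)
    if m:
        return json.loads(m.group(1))
    # Case 2: bare JSON with the expected keys
    try:
        obj = json.loads(text)
        if isinstance(obj, dict) and "name" in obj and "arguments" in obj:
            return obj
    except json.JSONDecodeError:
        pass
    return None
```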
#### Example: Two-turn Tool-Calling Agent

```python
from openai import OpenAI
import json

client = OpenAI(base_url="http://127.0.0.1:18000/v1", api_key="unused")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What is the weather in Tokyo?"}]

# Turn 1 — model decides to call a tool
resp = client.chat.completions.create(model="Phi-4-mini", messages=messages, tools=tools)
choice = resp.choices[0]

if choice.finish_reason == "tool_calls":
    tool_call = choice.message.tool_calls[0]
    args = json.loads(tool_call.function.arguments)

    # Execute your function
    weather_result = {"city": args["city"], "temperature": "18°C", "condition": "Sunny"}

    # Turn 2 — send the result back
    messages += [
        choice.message,
        {"role": "tool", "tool_call_id": tool_call.id, "content": json.dumps(weather_result)},
    ]
    final = client.chat.completions.create(model="Phi-4-mini", messages=messages)
    print(final.choices[0].message.content)
```
### Vision-Language Models

VL models are auto-detected and automatically switch to the ORT engine. Send images (local or remote) alongside text:

```bash
# Launch a VL model (auto-detected)
obeaver serve ./models/Qwen3-VL-2B-Instruct_VL_ONNX_INT4_CPU

# Send a local image
curl -s http://127.0.0.1:18000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{
      "role": "user",
      "content": [
        {"type": "image_url", "image_url": {"url": "./cat.jpg"}},
        {"type": "text", "text": "Describe this image"}
      ]
    }]
  }'
```
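A local path such as `./cat.jpg` only resolves when client and server share a filesystem. For remote clients, images are commonly inlined as base64 data URLs in the same `image_url` field (a sketch of the standard OpenAI-compatible pattern; treat data-URL support as an assumption to verify against your oBeaver version):

```python
import base64
from pathlib import Path

def image_to_data_url(path, mime="image/jpeg"):
    """Encode a local image file as a data URL for the image_url field."""
    b64 = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{b64}"
```

The returned string replaces `./cat.jpg` in the `image_url.url` field of the request above.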
### Web Dashboard

Run `obeaver dashboard` to launch a real-time monitoring dashboard at http://127.0.0.1:1573/:

- **Model Selector** — switch between cached models at runtime; NPU-accelerated models are marked with ⚡
- **System Info Bar** — model name, engine type, platform, Python version, live health status
- **Memory Gauges** — CPU, GPU, NPU, and process memory in real time
- **Inference Parameters** — temperature, top-p, top-k, and max tokens, with presets
- **Chat Interface** — send messages with streaming responses and performance stats (TTFT, tok/s)
- **Conversation History** — saved conversations and system prompt configuration
- **Server Logs** — live request log with method, path, status, and timing
- **Export** — export conversations as JSON or Markdown
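As a rough sketch, a Markdown export of an OpenAI-style message list might look like this (`conversation_to_markdown` is a hypothetical helper illustrating the output shape, not the dashboard's actual exporter):

```python
def conversation_to_markdown(messages, title="Conversation"):
    """Render a chat transcript as a Markdown document."""
    lines = [f"# {title}", ""]
    for msg in messages:
        # One bold role header per turn, then the message body
        lines.append(f"**{msg['role'].capitalize()}:**")
        lines.append("")
        lines.append(msg["content"])
        lines.append("")
    return "\n".join(lines)

md = conversation_to_markdown([
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello! How can I help?"},
])
print(md)
```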