## Features

oBeaver goes beyond simple chat — it supports embeddings, tool calling, vision-language models, and model conversion to cover the full spectrum of local AI development.
### Dual Inference Engine

oBeaver supports two inference backends, automatically selecting the right one for your platform:
| Engine | Platform | Description |
|---|---|---|
| Foundry Local | macOS, Windows | Powered by Microsoft Foundry Local. Automatic model download, hardware acceleration (NPU > GPU > CPU), launched via catalog alias. |
| ORT | macOS, Windows, Linux | Powered by ONNX Runtime GenAI. Loads a local `.onnx` model directory — fully offline, zero cloud dependency. |
| Condition | Engine | Model argument |
|---|---|---|
| macOS / Windows (default) | Foundry Local | Catalog alias (e.g. `Phi-4-mini`) |
| `--engine ort` or Linux (default) | ONNX Runtime GenAI | Local directory path |
| `embed` / `serve-embed` | Embedding Engine (ONNX) | Local model directory |
> **Linux note:** Foundry Local is not available on Linux. The engine is fixed to `ort`; passing `--engine foundry` will exit with an error.
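The selection rules above can be sketched in a few lines (an illustrative helper for this table only; `pick_engine` is a hypothetical name, not part of oBeaver's code):

```python
import sys

def pick_engine(engine_flag=None, platform=sys.platform):
    """Mirror the selection table: Foundry Local by default on
    macOS/Windows; ONNX Runtime ("ort") on Linux or when requested."""
    if platform.startswith("linux"):
        if engine_flag == "foundry":
            # Foundry Local is unavailable on Linux: exit with an error
            raise SystemExit("Foundry Local is not available on Linux")
        return "ort"
    return engine_flag or "foundry"
```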
### Text Embeddings

The embedding engine is ONNX-only — no `--engine` flag needed. Perfect for RAG and retrieval pipelines.
#### CLI — One-shot Embedding

```bash
obeaver embed ./models/Qwen3-Embedding-0.6B "Hello, world!"

# Interactive loop
obeaver embed ./models/embeddinggemma-300m-ONNX
```
#### Embedding Server

```bash
obeaver serve-embed ./models/Qwen3-Embedding-0.6B           # default port 18001
obeaver serve-embed ./models/embeddinggemma-300m-ONNX -p 8002
```
#### Use with OpenAI SDK

```python
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:18001/v1", api_key="unused")

response = client.embeddings.create(
    model="Qwen3-Embedding-0.6B",
    input=["Hello, world!", "Embeddings are useful."],
)

for item in response.data:
    print(f"index={item.index} dim={len(item.embedding)}")
```
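In a retrieval pipeline, the vectors returned above are typically ranked by cosine similarity. A minimal, dependency-free sketch (the vectors here are toy values, not real embeddings):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Rank documents by similarity to a query embedding
query = [0.1, 0.3, 0.6]
docs = {"doc_a": [0.1, 0.29, 0.61], "doc_b": [0.9, 0.1, 0.0]}
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)
print(ranked[0])  # doc_a, the vector closest to the query
```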
### Tool Calling (Agentic Workflows)

Both engines support the OpenAI function-calling interface — build agents that plan, call tools, and reason entirely on-device.
#### How It Works

| Engine | Strategy |
|---|---|
| Foundry Local | Tools are forwarded natively via the standard OpenAI `tools` parameter. The response may carry a native `tool_calls` field or the `functools[...]` format — both are normalised automatically. |
| ORT | Tools are serialised as JSON Schema inside the system prompt. The model replies with a `<tool_call>{...}</tool_call>` block; `parse_tool_call()` extracts and validates it. |
#### Supported Output Formats (auto-detected)

| Format | Example |
|---|---|
| `<tool_call>` block | `<tool_call>{"name": "get_weather", "arguments": {"city": "Paris"}}</tool_call>` |
| Phi-3 native | `<\|function_calls\|>{...}<\|/function_calls\|>` |
| Mistral-style | `<functioncall>{...}</functioncall>` |
| Markdown code block | `` ```json {...}``` `` |
| OpenAI-legacy wrapper | `{"function_call": {"name": ..., "arguments": ...}}` |
| Bare JSON | `{"name": "...", "arguments": {...}}` |
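To illustrate the detection idea, here is a stripped-down extractor covering just the first and last formats (illustrative only; oBeaver's actual `parse_tool_call()` handles all six formats and validates the result):

```python
import json
import re

def extract_tool_call(text):
    """Extract a {'name': ..., 'arguments': ...} dict from model output,
    handling the <tool_call> block and bare-JSON formats."""
    # Case 1: <tool_call>{...}</tool_call> block
    m = re.search(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", text, re.DOTALL)
    if m:
        return json.loads(m.group(1))
    # Case 2: bare JSON with the expected keys
    try:
        obj = json.loads(text)
        if isinstance(obj, dict) and "name" in obj and "arguments" in obj:
            return obj
    except json.JSONDecodeError:
        pass
    return None
```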
#### Example: Two-turn Tool-Calling Agent

```python
from openai import OpenAI
import json

client = OpenAI(base_url="http://127.0.0.1:18000/v1", api_key="unused")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What is the weather in Tokyo?"}]

# Turn 1 — model decides to call a tool
resp = client.chat.completions.create(model="Phi-4-mini", messages=messages, tools=tools)
choice = resp.choices[0]

if choice.finish_reason == "tool_calls":
    tool_call = choice.message.tool_calls[0]
    args = json.loads(tool_call.function.arguments)

    # Execute your function
    weather_result = {"city": args["city"], "temperature": "18°C", "condition": "Sunny"}

    # Turn 2 — send the result back
    messages += [
        choice.message,
        {"role": "tool", "tool_call_id": tool_call.id, "content": json.dumps(weather_result)},
    ]
    final = client.chat.completions.create(model="Phi-4-mini", messages=messages)
    print(final.choices[0].message.content)
```
### Vision-Language Models

VL models are auto-detected and automatically switch to the ORT engine. Send images (local or remote) alongside text:

```bash
# Launch a VL model (auto-detected)
obeaver serve ./models/Qwen3-VL-2B-Instruct_VL_ONNX_INT4_CPU

# Send a local image
curl -s http://127.0.0.1:18000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{
      "role": "user",
      "content": [
        {"type": "image_url", "image_url": {"url": "./cat.jpg"}},
        {"type": "text", "text": "Describe this image"}
      ]
    }]
  }'
```
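A local path such as `./cat.jpg` only resolves when client and server share a filesystem. For remote clients, images are commonly inlined as base64 data URLs in the same `image_url` field (a sketch of the standard OpenAI-compatible pattern; treat data-URL support as an assumption to verify against your oBeaver version):

```python
import base64
from pathlib import Path

def image_to_data_url(path, mime="image/jpeg"):
    """Encode a local image file as a data URL for the image_url field."""
    b64 = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{b64}"
```

The returned string replaces `./cat.jpg` in the `image_url.url` field of the request above.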
### Web Dashboard

Run `obeaver dashboard` to launch a real-time monitoring dashboard at http://127.0.0.1:1573/:

- **Model Selector** — switch between cached models at runtime; NPU-accelerated models are marked with ⚡
- **System Info Bar** — model name, engine type, platform, Python version, live health status
- **Memory Gauges** — CPU, GPU, NPU, and process memory in real time
- **Inference Parameters** — temperature, top-p, top-k, and max tokens, with presets
- **Chat Interface** — send messages with streaming responses and performance stats (TTFT, tok/s)
- **Conversation History** — saved conversations and system prompt configuration
- **Server Logs** — live request log with method, path, status, and timing
- **Export** — export conversations as JSON or Markdown
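As a rough sketch, a Markdown export of an OpenAI-style message list might look like this (`conversation_to_markdown` is a hypothetical helper illustrating the output shape, not the dashboard's actual exporter):

```python
def conversation_to_markdown(messages, title="Conversation"):
    """Render a chat transcript as a Markdown document."""
    lines = [f"# {title}", ""]
    for msg in messages:
        # One bold role header per turn, then the message body
        lines.append(f"**{msg['role'].capitalize()}:**")
        lines.append("")
        lines.append(msg["content"])
        lines.append("")
    return "\n".join(lines)

md = conversation_to_markdown([
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello! How can I help?"},
])
print(md)
```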