# Features

oBeaver goes beyond simple chat — it supports embeddings, tool calling, vision-language models, and model conversion to cover the full spectrum of local AI development.

## Dual Inference Engine

oBeaver supports two inference backends, automatically selecting the right one for your platform:

| Engine | Platform | Description |
| --- | --- | --- |
| Foundry Local | macOS, Windows | Powered by Microsoft Foundry Local. Automatic model download, hardware acceleration (NPU > GPU > CPU), launched via catalog alias. |
| ORT | macOS, Windows, Linux | Powered by ONNX Runtime GenAI. Loads a local `.onnx` model directory — fully offline, zero cloud dependency. |

| Condition | Engine | Model argument |
| --- | --- | --- |
| macOS / Windows (default) | Foundry Local | Catalog alias (e.g. `Phi-4-mini`) |
| `--engine ort` or Linux (default) | ONNX Runtime GenAI | Local directory path |
| `embed` / `serve-embed` | Embedding Engine (ONNX) | Local model directory |

> **Linux note:** Foundry Local is not available on Linux. The engine is fixed to `ort`; passing `--engine foundry` will exit with an error.
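The platform rules above amount to a small resolver. The sketch below is illustrative only — the function name and return values are hypothetical, not oBeaver's actual code:

```python
import sys
from typing import Optional

def resolve_engine(engine_flag: Optional[str]) -> str:
    """Pick a backend per the table above.

    engine_flag: the value of --engine if the user passed one, else None.
    Returns "foundry" or "ort".
    """
    is_linux = sys.platform.startswith("linux")
    if engine_flag == "foundry":
        if is_linux:
            # Foundry Local is not available on Linux.
            raise SystemExit("error: --engine foundry is not supported on Linux")
        return "foundry"
    if engine_flag == "ort" or is_linux:
        # Explicit --engine ort, or the Linux default.
        return "ort"
    # macOS / Windows default.
    return "foundry"
```

Note that `embed` / `serve-embed` bypass this choice entirely, since the embedding engine is ONNX-only.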

## Text Embeddings

The embedding engine is ONNX-only — no `--engine` flag needed. Perfect for RAG and retrieval pipelines.

### CLI — One-shot Embedding

```bash
# One-shot embedding
obeaver embed ./models/Qwen3-Embedding-0.6B "Hello, world!"

# Interactive loop
obeaver embed ./models/embeddinggemma-300m-ONNX
```

### Embedding Server

```bash
obeaver serve-embed ./models/Qwen3-Embedding-0.6B           # default port 18001
obeaver serve-embed ./models/embeddinggemma-300m-ONNX -p 8002
```

### Use with OpenAI SDK

```python
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:18001/v1", api_key="unused")

response = client.embeddings.create(
    model="Qwen3-Embedding-0.6B",
    input=["Hello, world!", "Embeddings are useful."],
)
for item in response.data:
    print(f"index={item.index}  dim={len(item.embedding)}")
```
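The vectors returned above plug directly into a retrieval step: rank candidate documents by cosine similarity against the query embedding. A minimal, dependency-free sketch (generic code, independent of oBeaver, shown here with toy vectors):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Rank candidate embeddings against a query embedding (toy 3-d vectors;
# real embeddings from the server have hundreds of dimensions).
query = [0.1, 0.9, 0.2]
docs = {"doc-a": [0.1, 0.8, 0.3], "doc-b": [0.9, 0.1, 0.0]}
ranked = sorted(docs, key=lambda k: cosine_similarity(query, docs[k]), reverse=True)
print(ranked)  # doc-a is closer to the query than doc-b
```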

## Tool Calling (Agentic Workflows)

Both engines support the OpenAI function-calling interface — build agents that plan, call tools, and reason entirely on-device.

### How It Works

| Engine | Strategy |
| --- | --- |
| Foundry Local | Tools are forwarded natively via the standard OpenAI `tools` parameter. The response may carry native `tool_calls` or the `functools[...]` format — both are normalised automatically. |
| ORT | Tools are serialised as JSON Schema inside the system prompt. The model replies with a `<tool_call>{...}</tool_call>` block; `parse_tool_call()` extracts and validates it. |

### Supported Output Formats (auto-detected)

| Format | Example |
| --- | --- |
| `<tool_call>` block | `<tool_call>{"name": "get_weather", "arguments": {"city": "Paris"}}</tool_call>` |
| Phi-3 native | `<\|function_calls\|>{...}<\|/function_calls\|>` |
| Mistral-style | `<functioncall>{...}</functioncall>` |
| Markdown code block | `` ```json {...} ``` `` |
| OpenAI-legacy wrapper | `{"function_call": {"name": ..., "arguments": ...}}` |
| Bare JSON | `{"name": "...", "arguments": {...}}` |
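Auto-detection of wrappers like these usually comes down to trying a list of patterns, then falling back to bare JSON. The sketch below illustrates the general approach for a subset of the formats; it is not oBeaver's actual `parse_tool_call()` implementation:

```python
import json
import re

# Regexes for a subset of the wrappers listed above.
_WRAPPERS = [
    re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL),
    re.compile(r"<functioncall>\s*(\{.*?\})\s*</functioncall>", re.DOTALL),
    re.compile(r"```json\s*(\{.*?\})\s*```", re.DOTALL),
]

def extract_tool_call(text: str):
    """Return {"name": ..., "arguments": ...} if text contains a tool call, else None."""
    for pattern in _WRAPPERS:
        m = pattern.search(text)
        if m:
            return json.loads(m.group(1))
    # Fall back to bare JSON.
    try:
        call = json.loads(text.strip())
        if isinstance(call, dict) and "name" in call:
            return call
    except json.JSONDecodeError:
        pass
    return None

reply = '<tool_call>{"name": "get_weather", "arguments": {"city": "Paris"}}</tool_call>'
print(extract_tool_call(reply))
```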

### Example: Two-turn Tool-Calling Agent

```python
from openai import OpenAI
import json

client = OpenAI(base_url="http://127.0.0.1:18000/v1", api_key="unused")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What is the weather in Tokyo?"}]

# Turn 1 — model decides to call a tool
resp = client.chat.completions.create(model="Phi-4-mini", messages=messages, tools=tools)
choice = resp.choices[0]

if choice.finish_reason == "tool_calls":
    tool_call = choice.message.tool_calls[0]
    args = json.loads(tool_call.function.arguments)

    # Execute your function
    weather_result = {"city": args["city"], "temperature": "18°C", "condition": "Sunny"}

    # Turn 2 — send result back
    messages += [
        choice.message,
        {"role": "tool", "tool_call_id": tool_call.id, "content": json.dumps(weather_result)},
    ]
    final = client.chat.completions.create(model="Phi-4-mini", messages=messages)
    print(final.choices[0].message.content)
```

## Vision-Language Models

When a vision-language (VL) model is detected, oBeaver automatically switches to the ORT engine. Send images (local or remote) alongside text:

```bash
# Launch VL model (auto-detected)
obeaver serve ./models/Qwen3-VL-2B-Instruct_VL_ONNX_INT4_CPU
```

```bash
# Send a local image
curl -s http://127.0.0.1:18000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{
      "role": "user",
      "content": [
        {"type": "image_url", "image_url": {"url": "./cat.jpg"}},
        {"type": "text", "text": "Describe this image"}
      ]
    }]
  }'
```

## Web Dashboard

Run `obeaver dashboard` to launch a real-time monitoring dashboard at http://127.0.0.1:1573/.