Models
oBeaver supports three model sources: the Foundry Local catalog, ONNX GenAI models, and dedicated embedding models. Use obeaver models to list all local models at any time.
obeaver models
Foundry Local Catalog
On macOS and Windows, Foundry Local handles model downloads automatically. Just pass a catalog alias:
obeaver run phi-4-mini
Popular Catalog Models
| Alias | Family | Parameters | Notes |
|---|---|---|---|
| Phi-4-mini | Phi-4 | 3.8B | Latest Phi generation (recommended) |
💡 Tip: Run obeaver models to list all locally cached models. On macOS/Windows, run foundry model list to browse the full Foundry Local catalog for more aliases.
ONNX GenAI Models
For all platforms (including Linux), you can download ONNX models from Hugging Face or convert them with obeaver convert.
🔑 Hugging Face Authentication Required: Downloading models with hf download or obeaver convert requires Hugging Face CLI authentication. If you haven't logged in yet, run:
pip install -U huggingface-hub
hf auth login
You will be prompted to enter an access token. If you don't have one, visit huggingface.co/settings/tokens to create a free account and generate a token. Some gated models (Llama, Mistral, etc.) also require accepting the model's license on the Hugging Face model page before download.
You can also use an environment variable instead of hf auth login:
export HF_TOKEN='your_token_here'
Run obeaver check to verify your Hugging Face authentication status.
A valid ONNX model directory contains genai_config.json and one or more .onnx files.
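That layout check is easy to script. The sketch below is illustrative (the helper name and exact validation logic are assumptions, not oBeaver's internal code); it only tests for the two markers the docs name:

```python
from pathlib import Path

def is_onnx_genai_model(model_dir: str) -> bool:
    """Heuristic check for an ONNX GenAI model directory:
    it must contain genai_config.json and at least one .onnx file."""
    d = Path(model_dir)
    has_config = (d / "genai_config.json").is_file()
    has_onnx = any(d.glob("*.onnx"))
    return has_config and has_onnx
```

A directory holding only *.safetensors or pytorch_model.bin would fail this check and needs conversion first.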
Vision-Language Models
VL models enable multimodal understanding: send images alongside text. They are auto-detected by the presence of vision.onnx and always run on the ORT engine.
Supported VL Models
| Model | HF Source | Quantization |
|---|---|---|
| Qwen 2.5 VL 3B | Qwen/Qwen2.5-VL-3B-Instruct | INT4 CPU |
| Qwen 3 VL 2B | Qwen/Qwen3-VL-2B-Instruct | INT4 CPU |
Embedding Models
Embedding models are ONNX-only and power RAG & retrieval pipelines.
| Model | Params | HF Repo |
|---|---|---|
| Qwen3-Embedding | 0.6B | onnx-community/Qwen3-Embedding-0.6B |
| Qwen3-Embedding | 4B | onnx-community/Qwen3-Embedding-4B |
| Qwen3-Embedding | 8B | onnx-community/Qwen3-Embedding-8B |
| EmbeddingGemma | 300M | onnx-community/embeddinggemma-300m-ONNX |
hf download onnx-community/Qwen3-Embedding-0.6B \
--local-dir ./models/ort/Qwen3-Embedding-0.6B
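The retrieval step these models power boils down to comparing embedding vectors. A minimal NumPy sketch, with made-up vectors standing in for real model output:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_k(query_vec: np.ndarray, doc_vecs: list, k: int = 2) -> list:
    """Return indices of the k documents most similar to the query."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
```

In a real pipeline, query_vec and doc_vecs would come from running the embedding model over your query and document chunks.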
Model Conversion
Convert Hugging Face models to optimized ONNX format directly from the CLI. This step is required when using the ORT engine with models downloaded from Hugging Face in raw format.
Why convert? The ORT engine requires models in ONNX GenAI format (containing genai_config.json + .onnx files). Raw Hugging Face model weights (PyTorch / SafeTensors) cannot be loaded directly. Use obeaver convert to transform them into the correct format.
When Do You Need to Convert?
🟢 Foundry Local Engine
No conversion needed. Foundry Local downloads pre-optimized models from the Microsoft catalog automatically.
obeaver run phi-4-mini
🔶 ORT Engine
Conversion required if you download a raw model from Hugging Face.
obeaver convert Qwen/Qwen3-0.6B
obeaver run --engine ort ./models/ort/Qwen3-0.6B_ONNX_INT4_CPU
Text Model Conversion
For standard text generation models (Qwen, Phi, Llama, etc.), use the default text type:
# INT4 quantized (smallest, fastest on CPU — recommended)
obeaver convert Qwen/Qwen3-0.6B
# FP16 precision
obeaver convert Qwen/Qwen3-0.6B -p fp16
# Custom output + extra options
obeaver convert Qwen/Qwen3-0.6B -o ./my_model --extra-options 'shared_embeddings=true'
Complete Example: Qwen3-0.6B
1. Convert the model

   obeaver convert Qwen/Qwen3-0.6B

   The model is downloaded, converted, and saved to ./models/ort/Qwen3-0.6B_ONNX_INT4_CPU.

2. Verify the output

   ls ./models/ort/Qwen3-0.6B_ONNX_INT4_CPU/
   # → genai_config.json model.onnx tokenizer.json ...

3. Run with the ORT engine

   obeaver run --engine ort ./models/ort/Qwen3-0.6B_ONNX_INT4_CPU
Vision-Language (VL) Model Conversion
VL model conversion uses the Olive optimization pipeline. It exports three sub-models (text decoder, vision encoder, text embedding) and quantizes them to INT4.
# Qwen 2.5 VL
obeaver convert Qwen/Qwen2.5-VL-3B-Instruct --type vl
# Qwen 3 VL (requires building from source)
obeaver convert Qwen/Qwen3-VL-2B-Instruct --type vl --build-from-source
CMake and a C++ compiler are required when using --build-from-source. On Windows you also need Visual Studio 2022+ with the "Desktop development with C++" workload (including the MSVC v143+ x64/x86 build tools component). Install CMake and the compiler before converting Qwen 3 VL or other models that need a source build of onnxruntime-genai.
# macOS
brew install cmake
# Windows — Step 1: Install Visual Studio 2022 Build Tools
# Download from https://visualstudio.microsoft.com/downloads/
# In the installer, select:
# ✅ Desktop development with C++
# → MSVC v143+ x64/x86 build tools (checked by default)
# → Windows SDK (checked by default)
# Windows — Step 2: Install CMake
winget install Kitware.CMake
# Linux (Debian/Ubuntu)
sudo apt-get update && sudo apt-get install -y cmake build-essential
# Linux (Fedora/RHEL)
sudo dnf install -y cmake gcc-c++
VL models always use the ORT engine. When a VL model directory is detected (contains vision.onnx), the engine is automatically set to ort.
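The auto-detection rule can be sketched like this (an illustrative approximation of the behavior described above, not oBeaver's actual source):

```python
from pathlib import Path

def select_engine(model_dir: str, requested: str) -> str:
    """If the model directory contains vision.onnx, it is a VL model
    and must run on the ORT engine; otherwise honor the requested engine."""
    if (Path(model_dir) / "vision.onnx").is_file():
        return "ort"
    return requested
```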
Using Pre-converted Models
Some models on Hugging Face are already in ONNX GenAI format, so no conversion is needed. Look for repos tagged onnxruntime-genai:
# Download a pre-converted ONNX model (no conversion needed)
hf download microsoft/Phi-3-mini-4k-instruct-onnx \
--include "cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/*" \
--local-dir ./models/ort/phi3-mini-int4
# Run directly
obeaver run --engine ort ./models/ort/phi3-mini-int4
How to tell if conversion is needed: Check if the downloaded directory contains genai_config.json and .onnx files. If it contains *.safetensors or pytorch_model.bin instead, you need to convert it first.
Conversion Flags Reference
| Flag | Default | Description |
|---|---|---|
| MODEL_NAME | (required) | Hugging Face model name or local path (e.g. Qwen/Qwen3-0.6B) |
| -t / --type | text | Model type: text (text-only) or vl (vision-language) |
| -o / --output | Auto | Output directory. Default: <models_dir>/ort/<model>_ONNX_<PREC>_<EP> |
| -p / --precision | int4 | Quantization precision: fp32, fp16, int4 |
| -e / --ep | cpu | Execution provider: cpu or cuda (GPU) |
| -c / --cache-dir | <models_dir>/cache_dir | Cache directory for downloaded Hugging Face files |
| --extra-options | — | KEY=VALUE pairs passed to the onnxruntime-genai model builder (text only) |
| --build-from-source | false | Build onnxruntime-genai from source before VL conversion (VL only) |
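The default output path follows the <models_dir>/ort/<model>_ONNX_<PREC>_<EP> pattern from the table. A hypothetical helper showing how those pieces combine (the function name and exact casing rules are assumptions inferred from the documented examples):

```python
def default_output_dir(models_dir: str, model_name: str,
                       precision: str = "int4", ep: str = "cpu") -> str:
    """Assemble <models_dir>/ort/<model>_ONNX_<PREC>_<EP>,
    dropping the Hugging Face org prefix from the model name."""
    short = model_name.split("/")[-1]
    return f"{models_dir}/ort/{short}_ONNX_{precision.upper()}_{ep.upper()}"
```

With the defaults, Qwen/Qwen3-0.6B lands in ./models/ort/Qwen3-0.6B_ONNX_INT4_CPU, matching the paths used in the examples above.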
Troubleshooting
| Problem | Solution |
|---|---|
| ModuleNotFoundError: onnxruntime_genai | Re-run pip install -e . from the oBeaver project root. |
| 401 / authentication error on hf download | Run hf auth login or set HF_TOKEN. See the authentication guide above. |
| Qwen 3 VL conversion fails | Add --build-from-source: obeaver convert Qwen/Qwen3-VL-2B-Instruct --type vl --build-from-source |
| Out of memory during conversion | Close other applications. INT4 quantization requires significant RAM. |