Vector DB Comparison for Offline RAG Search on an In-Car Embedded System

Status

Draft
Proposed
Accepted
Deprecated

Context

With the increasing complexity of in-car assistance systems, searching technical documentation has become essential to a good user experience. On resource-constrained in-car embedded systems, a Retrieval-Augmented Generation (RAG) solution is a typical approach to document search. Traditional cloud-based solutions are not viable because the system must work offline, so on-device vector search is a crucial requirement in our scenario.

The solution scenario focuses on RAG search over car manuals in an offline in-car environment on a Qualcomm Android Automotive OS (AAOS) device. The vector DB manages the index of the manuals, searches it, and returns the search results to the small language model (SLM) that generates the RAG response. A proof of concept (PoC) has been developed using ChromaDB for this use case. However, we need to compare ChromaDB with other vector DBs and justify which one is best suited to this edge environment.

A decision needs to be made to select the most suitable vector database for continued development.

The following architecture diagram illustrates the role a vector DB plays in the RAG on Edge application:

[Figure: rag-on-edge-indexing-architecture]
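To make that role concrete, here is a minimal, self-contained Python sketch of the index-then-retrieve flow. It assumes a toy bag-of-words vector in place of a real embedding model and a brute-force scan in place of the vector DB's index. The `InProcessVectorStore` class, the `build_prompt` format, and the sample manual chunks are all hypothetical. The string returned by `build_prompt` is what would be handed to the SLM.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a real embedding model: a bag-of-words term-frequency vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InProcessVectorStore:
    """Hypothetical minimal store: index manual chunks, then return the top-k matches."""

    def __init__(self) -> None:
        self._chunks: list[tuple[str, str, Counter]] = []

    def add(self, chunk_id: str, text: str) -> None:
        # Indexing step: embed each chunk once and keep it with its ID and text.
        self._chunks.append((chunk_id, text, embed(text)))

    def query(self, question: str, k: int = 2) -> list[tuple[str, str, float]]:
        # Search step: score every chunk against the question (exact brute-force scan).
        q = embed(question)
        scored = [(cid, text, cosine(q, vec)) for cid, text, vec in self._chunks]
        scored.sort(key=lambda item: item[2], reverse=True)
        return scored[:k]


def build_prompt(question: str, hits: list[tuple[str, str, float]]) -> str:
    """Hypothetical prompt format handed to the SLM for the generation step."""
    context = "\n".join(f"[{cid}] {text}" for cid, text, _ in hits)
    return f"Answer using only these car manual excerpts:\n{context}\nQuestion: {question}"


store = InProcessVectorStore()
store.add("tires-01", "Check the tire pressure monthly when the tires are cold.")
store.add("wipers-02", "Replace the wiper blades if they streak the windshield.")
store.add("lights-03", "The fog lights turn on from the light switch stalk.")

question = "How often should I check tire pressure?"
hits = store.query(question, k=1)
print(build_prompt(question, hits))
```

In the real solution, `embed` would be the embedding model and `query` the vector DB call. Only the division of work between indexing, retrieval, and generation is meant to carry over.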

Decision

ChromaDB would be the best fit for the current PoC solution requirements due to its small footprint, easy deployment, acceptable speed, and compatibility with the solution requirements.

Its indexing is slower, but the latency measured in an experiment on Ubuntu ARM64 is acceptable for now. It is not optimized for massive datasets, but the car manual dataset is not expected to grow to a large scale. However, Java is not officially supported in serverless (in-process) mode.

FAISS (Facebook AI Similarity Search) could be another choice, but it has high RAM consumption and does not officially support Java, which is a customer requirement.

Qdrant is the last option, but the main concern is that its native server mode may require heavy engineering effort to deploy on AAOS.

Decision Drivers

The following decision drivers are based on the requirements of the in-car offline RAG search scenario.

Works offline (no cloud dependency): A suitable vector DB should support fully local indexing and querying without requiring cloud services.
Consumes low resources (RAM, CPU, and storage): An AAOS environment running on a Qualcomm chip has constrained CPU, RAM, and storage. A suitable vector DB should have a low storage footprint and low RAM usage, along with low CPU consumption for faster search.
Compatible with the edge environment: A suitable vector DB should be compatible with the edge environment, the required frameworks, language SDKs, etc.
Performance on an embedded system: A suitable vector DB should support fast indexing and searching on an embedded system.

Considered Options

The following lightweight vector DB candidates are compared:

ChromaDB
Qdrant
FAISS

Comparison on Offline Support and Ease of Packaging

The comparison metrics below assess whether each vector DB can be packaged easily with minimal dependencies and deployed to a resource-constrained embedded device.

VDB	Open Source	Lightweight	Offline Support	Serverless Mode	Dependencies	Packaging for AAOS
ChromaDB	Yes (Apache 2.0)	Yes	Fully offline	Yes, can be used as an in-process DB within the same application	Minimal (uses SQLite by default)	Easier due to its serverless nature
Qdrant	Yes (Apache 2.0)	Medium	Fully offline	No, requires running a separate server	Requires a Rust-based server with its dependencies	Harder, as it requires a running database service
FAISS	Yes (MIT)	Yes	Fully offline	Yes, runs fully in memory as a local library	Minimal (library only)	Easier, since it is a library-only solution

ChromaDB and FAISS are easier to deploy on AAOS because they don't require running a separate database server.
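Running in-process keeps packaging simple. The database becomes a library call plus a local file, so there is no service to install, start, or supervise on AAOS. Below is a minimal sketch of that pattern using Python's standard `sqlite3` module. The `SqliteVectorStore` class, its table schema, and the JSON encoding of vectors are assumptions for illustration and do not reflect ChromaDB's internal storage layout.

```python
import json
import math
import os
import sqlite3
import tempfile


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SqliteVectorStore:
    """Hypothetical in-process vector store persisted to one SQLite file; no server process."""

    def __init__(self, path: str) -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS chunks (id TEXT PRIMARY KEY, text TEXT, embedding TEXT)"
        )

    def add(self, chunk_id: str, text: str, embedding: list[float]) -> None:
        # Vectors are stored as JSON text purely for readability in this sketch.
        self.conn.execute(
            "INSERT OR REPLACE INTO chunks (id, text, embedding) VALUES (?, ?, ?)",
            (chunk_id, text, json.dumps(embedding)),
        )
        self.conn.commit()

    def query(self, embedding: list[float], k: int = 3) -> list[tuple[str, float]]:
        # Exhaustive scan of every row; a real vector DB would consult an ANN index instead.
        rows = self.conn.execute("SELECT id, embedding FROM chunks").fetchall()
        scored = [(cid, cosine(embedding, json.loads(vec))) for cid, vec in rows]
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:k]

    def close(self) -> None:
        self.conn.close()


db_path = os.path.join(tempfile.mkdtemp(), "manuals.db")
store = SqliteVectorStore(db_path)
store.add("brakes", "Check the brake fluid level every service.", [1.0, 0.0, 0.0])
store.add("battery", "Jump-start instructions for a flat battery.", [0.0, 1.0, 0.0])
store.close()

# Reopening the file restores the stored vectors; no server had to be running.
reopened = SqliteVectorStore(db_path)
print(reopened.query([0.9, 0.1, 0.0], k=1))
```

A server-based DB such as Qdrant would replace the `sqlite3.connect` call with a network client. That client would depend on a separately deployed and supervised service, which is the extra packaging work noted in the table.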

Comparison on Software Features & Compatibility

The comparison metrics below cover software features and compatibility with the edge environment and development requirements, including development complexity and the supported indexing mechanisms.

The indexing mechanism affects both indexing speed (how fast new documents are added to the database) and search speed (how quickly a query finds the closest vector matches). Hierarchical Navigable Small World (HNSW) indexing is well known for efficient approximate nearest neighbor (ANN) search, so the comparison includes whether HNSW indexing is supported.
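HNSW builds on a navigable small world graph. Every vector is a node linked to nearby vectors, and a query walks the graph greedily toward closer nodes instead of scanning the whole dataset. The sketch below shows only a single layer of such a graph. HNSW adds a hierarchy of sparser layers to find a good entry point, and this sketch also links new nodes by brute force for simplicity. The `NSWGraph` class and its `m` and `ef` parameters are illustrative assumptions named after the usual HNSW parameters; they are not any library's API.

```python
import heapq
import math
import random


class NSWGraph:
    """Single-layer navigable small world graph, the building block HNSW stacks into layers.

    Simplification (assumption): each new node links to its m nearest existing nodes found
    by brute force. Real HNSW finds them with graph search and adds sparser upper layers.
    """

    def __init__(self, m: int = 4) -> None:
        self.m = m
        self.vectors: list[list[float]] = []
        self.neighbors: list[set[int]] = []

    def add(self, vec: list[float]) -> int:
        new_id = len(self.vectors)
        nearest = sorted(range(new_id), key=lambda i: math.dist(vec, self.vectors[i]))[: self.m]
        self.vectors.append(vec)
        self.neighbors.append(set(nearest))
        for i in nearest:
            self.neighbors[i].add(new_id)  # links are bidirectional
        return new_id

    def search(self, query: list[float], k: int = 3, ef: int = 16, entry: int = 0):
        """Greedy best-first beam search keeping the ef closest nodes seen so far."""
        d0 = math.dist(query, self.vectors[entry])
        visited = {entry}
        candidates = [(d0, entry)]  # min-heap: closest unexpanded node first
        best = [(-d0, entry)]       # max-heap via negation: worst kept result on top
        while candidates:
            d, node = heapq.heappop(candidates)
            if len(best) >= ef and d > -best[0][0]:
                break  # nothing left to expand can improve the current result set
            for nb in self.neighbors[node]:
                if nb in visited:
                    continue
                visited.add(nb)
                d_nb = math.dist(query, self.vectors[nb])
                if len(best) < ef or d_nb < -best[0][0]:
                    heapq.heappush(candidates, (d_nb, nb))
                    heapq.heappush(best, (-d_nb, nb))
                    if len(best) > ef:
                        heapq.heappop(best)
        return sorted((-neg_d, node) for neg_d, node in best)[:k]


random.seed(0)
points = [[random.random(), random.random()] for _ in range(200)]
graph = NSWGraph(m=4)
for p in points:
    graph.add(p)

query = [0.5, 0.5]
approx = graph.search(query, k=3, ef=16)
exact = sorted((math.dist(query, p), i) for i, p in enumerate(points))[:3]
print("approx:", approx)
print("exact: ", exact)
```

Raising `ef` widens the beam, trading extra distance computations for higher recall. In this connected graph, setting `ef` equal to the number of stored vectors turns the search into an exhaustive scan, so it returns the exact neighbors.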

Vector quantization support is also compared, since quantization reduces the memory footprint and can improve search speed.
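As a rough illustration of the memory saving, the sketch below applies one simple form of scalar quantization. Each float32 dimension is mapped to a signed 8-bit code using a single per-vector scale. The 384-dimension embedding size and the random vector are assumptions. The quantization schemes in Qdrant and FAISS differ in detail (for example, in how the value range is chosen), so this only shows the basic trade-off: about 4x less memory per vector in exchange for a bounded rounding error.

```python
import array
import random


def quantize_int8(vec: list[float]) -> tuple[array.array, float]:
    """Symmetric scalar quantization: one float scale per vector, one int8 code per dimension."""
    max_abs = max(abs(x) for x in vec) or 1.0  # guard against an all-zero vector
    scale = max_abs / 127.0
    codes = array.array("b", (round(x / scale) for x in vec))
    return codes, scale


def dequantize(codes: array.array, scale: float) -> list[float]:
    """Approximate reconstruction; each value is off by at most scale / 2."""
    return [c * scale for c in codes]


random.seed(42)
dim = 384  # assumed embedding size for illustration
embedding = [random.uniform(-1.0, 1.0) for _ in range(dim)]

float32_bytes = array.array("f", embedding).itemsize * dim
codes, scale = quantize_int8(embedding)
int8_bytes = codes.itemsize * len(codes)
max_error = max(abs(a - b) for a, b in zip(embedding, dequantize(codes, scale)))

print(f"float32: {float32_bytes} bytes, int8: {int8_bytes} bytes (plus one stored scale)")
print(f"max reconstruction error: {max_error:.5f} (bound: {scale / 2:.5f})")
```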

VDB	ARM-Compatible	LangChain Support	Supported Languages	HNSW Indexing	Built-in Feature Extraction	Quantization	Production Ready	Persistence
ChromaDB	Yes (runs on ARM64)	Official integration	Python (primary), HTTP (any language that can send HTTP requests)	Yes (uses HNSW via hnswlib for its vector index)	Yes	No (not natively)	Not fully production-ready; actively developed and may introduce breaking changes	Yes
Qdrant	Yes (ARM64 Docker images available)	Official integration	Rust (primary), Python, Java, Go, HTTP	Yes (used natively as the primary index structure)	No	Yes	Fully production-ready (feature extraction not included)	Yes
FAISS	Yes (Meta officially supports ARM)	Official integration	C++ (primary), Python, unofficial community wrappers for Java; no built-in HTTP API	Yes (one of several index options)	No	Yes	Production-ready for specific use cases (feature extraction not included), though it lacks built-in persistence and a REST API	No (manual save/load only)

Qdrant is strong at scalable vector search where high availability in a Kubernetes (k8s) cluster is required. However, scalable vector search is not needed for the car manual search scenario.

Qdrant is generally considered more production-ready than ChromaDB.

Comparison on Resource Consumption & Performance

The comparison metrics below measure resource consumption, which affects vector search efficiency on a resource-constrained device.

VDB	Storage	RAM Usage	CPU Usage
ChromaDB	Uses SQLite by default to store data	Configurable; supports both RAM- and disk-based indexing. Depends on dataset size when indexing in RAM	Moderate. Runs in-process, so there is no extra server overhead
Qdrant	Uses RocksDB for on-disk storage	Configurable; supports both RAM- and disk-based indexing. The server-based design increases CPU and RAM usage	Higher. Server mode consumes more CPU than an in-process library. Its default HNSW index keeps search CPU cost low
FAISS	Fully in memory unless indexes are saved and loaded manually	High. Vectors are stored in memory	Low. Running as a library avoids extra server overhead. Using HNSW for indexing lowers CPU consumption further

In RAM-based search, the vector DB loads all vectors into RAM, which enables fast search. In disk-based search, the vector DB loads vectors from disk into RAM only when needed. This is slower than RAM-based search but can handle larger datasets.
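The difference between the two modes can be sketched with a flat file of float32 vectors. The RAM-based reader loads everything once, while the disk-based reader seeks to a vector only when asked for it. The class names and file layout are hypothetical. `estimate_ram_bytes` applies a rough rule of thumb for HNSW-style indexes: raw float32 vectors plus about `2 * m` four-byte neighbor links per vector. That formula is an assumption for back-of-the-envelope sizing, not a measured figure for any of the three DBs.

```python
import array
import os
import tempfile

DIM = 4  # tiny dimension to keep the example readable


def write_vectors(path: str, vectors: list[list[float]]) -> None:
    """Store vectors back to back as raw float32 values."""
    with open(path, "wb") as f:
        for vec in vectors:
            array.array("f", vec).tofile(f)


class RamVectors:
    """RAM-based: load every vector up front; lookups are fast, memory grows with the dataset."""

    def __init__(self, path: str) -> None:
        self.data = array.array("f")
        with open(path, "rb") as f:
            self.data.frombytes(f.read())

    def get(self, i: int) -> list[float]:
        return self.data[i * DIM:(i + 1) * DIM].tolist()


class DiskVectors:
    """Disk-based: keep vectors in the file and read one only when it is needed."""

    def __init__(self, path: str) -> None:
        self.f = open(path, "rb")
        self.row_bytes = array.array("f").itemsize * DIM

    def get(self, i: int) -> list[float]:
        self.f.seek(i * self.row_bytes)  # extra I/O per lookup, but little resident memory
        row = array.array("f")
        row.frombytes(self.f.read(self.row_bytes))
        return row.tolist()

    def close(self) -> None:
        self.f.close()


def estimate_ram_bytes(num_vectors: int, dim: int, m: int = 16) -> int:
    """Rough rule of thumb (assumption) for an in-memory HNSW-style index:
    4 bytes per float32 dimension plus about 2 * m four-byte neighbor links per vector."""
    return num_vectors * (4 * dim + 2 * m * 4)


path = os.path.join(tempfile.mkdtemp(), "vectors.f32")
write_vectors(path, [[float(i), i + 0.5, i + 1.0, i + 1.5] for i in range(1000)])
ram, disk = RamVectors(path), DiskVectors(path)
print(ram.get(42), disk.get(42))
disk.close()

# Hypothetical sizing: 100,000 chunks with 384-dimensional embeddings.
print(f"{estimate_ram_bytes(100_000, 384) / 1e6:.1f} MB")
```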

ChromaDB could be slower than FAISS on large datasets due to its indexing mechanism. However, in our experiment in an Ubuntu ARM64 edge environment with an 8-core CPU, we indexed 40 HTML files (32 MB of data in total), and ChromaDB's search speed was acceptable for the required use case. Further latency tests with a larger dataset are still needed.
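For the larger-scale latency test, reporting percentiles rather than a single average makes results comparable across the three candidates. Percentiles also expose tail latency, which matters for an interactive in-car assistant. A minimal timing harness is sketched below. `search_fn` stands for whichever query call the candidate DB exposes, and `placeholder_search` is a hypothetical stand-in workload, not a vector DB.

```python
import statistics
import time
from typing import Callable, Sequence


def measure_latency(search_fn: Callable[[str], object], queries: Sequence[str],
                    warmup: int = 3, repeats: int = 5) -> dict[str, float]:
    """Time each query several times and report latency percentiles in milliseconds."""
    for q in queries[:warmup]:
        search_fn(q)  # warm-up runs absorb one-off costs such as lazy index loading
    samples_ms = []
    for _ in range(repeats):
        for q in queries:
            start = time.perf_counter()
            search_fn(q)
            samples_ms.append((time.perf_counter() - start) * 1000.0)
    # "inclusive" keeps every cut point within the observed min/max range.
    cut_points = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {
        "p50_ms": statistics.median(samples_ms),
        "p95_ms": cut_points[94],
        "max_ms": max(samples_ms),
    }


def placeholder_search(query: str) -> list[int]:
    """Hypothetical stand-in workload; replace with the candidate DB's real query call."""
    return sorted(range(5000), key=lambda i: (i * len(query)) % 997)[:5]


stats = measure_latency(placeholder_search, ["tire pressure", "fog lights", "wiper blades"])
print({name: round(value, 3) for name, value in stats.items()})
```

Running the same query set through each candidate on the target AAOS hardware, rather than on a development machine, keeps the comparison meaningful.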

Consequences

We have justified that ChromaDB is suitable for the current edge environment and solution scenario.

This document provides alternative vector DB options if the solution requirements change.

Future Considerations

Further latency tests on a larger dataset are needed for all three vector DBs to assess real-world performance.
Once a vector DB is selected, evaluate its actual RAM consumption under different indexing methods and quantization configurations, balancing latency and recall trade-offs.
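Balancing latency and recall requires a recall figure for each indexing and quantization configuration. A common measure is recall@k: the share of the true top-k neighbors (obtained by an exact brute-force scan) that the approximate index also returns. Below is a minimal sketch with a hypothetical function name and made-up example IDs.

```python
from typing import Sequence


def recall_at_k(approx_ids: Sequence[Sequence[int]],
                exact_ids: Sequence[Sequence[int]], k: int) -> float:
    """Mean fraction of each query's true top-k neighbors that the approximate search returned."""
    if len(approx_ids) != len(exact_ids) or not exact_ids:
        raise ValueError("need one approximate and one exact result list per query")
    hits = sum(len(set(a[:k]) & set(e[:k])) for a, e in zip(approx_ids, exact_ids))
    return hits / (k * len(exact_ids))


# Hypothetical results for two queries: exact IDs from a brute-force scan, approximate IDs
# from an ANN index (for example an HNSW configuration or a quantized index).
exact = [[1, 2, 3], [4, 5, 6]]
approx = [[1, 2, 3], [4, 5, 9]]
print(recall_at_k(approx, exact, k=3))
```

Recording recall@k next to measured latency and RAM for each configuration gives the trade-off curve from which a configuration can be chosen.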

