
Vector DB Comparison for Offline RAG Search on an In-Car Embedded System

Status

  • Draft
  • Proposed
  • Accepted
  • Deprecated

Context

With the increasing complexity of in-car assistance systems, searching technical documentation has become essential to the user experience. On resource-constrained in-car embedded systems, Retrieval-Augmented Generation (RAG) is a typical approach to document search. Cloud-based solutions are not viable because the system must work offline, so on-device vector search is a key requirement in our scenario.

The scenario focuses on RAG search over car manuals in an offline in-car environment on a Qualcomm Android Automotive OS (AAOS) device. The vector DB manages the index built from the manuals, runs the search, and returns the results to a small language model (SLM), which generates the RAG response. A proof of concept (PoC) has been developed with ChromaDB for this use case. However, we need to compare the options and justify whether ChromaDB or another vector DB is better suited to this edge environment.

A decision needs to be made to select the most suitable vector database for continued development.

The architecture diagrams below illustrate the role the vector DB plays in the RAG on Edge application: rag-on-edge-indexing-architecture rag-on-edge-searching-architecture
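
To make the indexing and searching flows concrete, here is a minimal, self-contained Python sketch of the role a vector DB plays: it stores one vector per manual chunk at indexing time and returns the closest chunks for a query at search time. The `embed` function is a toy hashed bag-of-words stand-in for a real embedding model, and `TinyVectorIndex` is a hypothetical name. Neither is part of the PoC, which uses ChromaDB for this role.

```python
import hashlib
import math
from typing import List, Tuple

DIM = 256  # toy embedding size, chosen only for this illustration


def embed(text: str) -> List[float]:
    """Toy embedding: hashed bag-of-words, L2-normalized (stand-in for a real model)."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class TinyVectorIndex:
    """Hypothetical in-process index using exact (brute-force) cosine search."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, List[float]]] = []

    def add(self, chunk: str) -> None:
        # Indexing flow: embed the manual chunk and store it with its vector.
        self._items.append((chunk, embed(chunk)))

    def query(self, text: str, k: int = 2) -> List[Tuple[float, str]]:
        # Searching flow: embed the query and rank chunks by cosine similarity.
        q = embed(text)
        scored = [(sum(a * b for a, b in zip(q, vec)), chunk) for chunk, vec in self._items]
        return sorted(scored, reverse=True)[:k]


index = TinyVectorIndex()
for chunk in [
    "check tire pressure monthly using the gauge",
    "replace the cabin air filter every year",
    "pair your phone with bluetooth in settings",
]:
    index.add(chunk)

# The top-k chunks would be passed to the SLM as context for the RAG response.
results = index.query("how to check tire pressure", k=1)
print(results[0][1])
```

The brute-force scan compares the query against every stored vector, which is why the indexing mechanism of a real vector DB matters once the number of chunks grows.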

Decision

ChromaDB is the best fit for the current PoC requirements because of its small footprint, easy deployment, acceptable speed, and compatibility with the solution requirements.

Its indexing mechanism is slower, but the latency measured in our experiment on Ubuntu ARM64 is acceptable for now. It is not optimized for massive datasets, but the car manual dataset is not expected to grow to a large scale. However, in serverless mode, Java is not officially supported.

FAISS (Facebook AI Similarity Search) could be another choice, but it has high RAM consumption and does not officially support Java, which is a customer requirement.

The last option is Qdrant. The main concern is that its native server mode may require heavy engineering effort to deploy on AAOS.

Decision Drivers

The decision drivers below are based on the requirements of the in-car offline RAG search scenario.

  • Works offline (no cloud dependency). A suitable vector DB should support fully local indexing and querying without requiring cloud services.
  • Consumes low resources (RAM, CPU, and storage). An AAOS environment running on a Qualcomm chip has constrained CPU, RAM, and storage. A suitable vector DB should have a low storage footprint and low RAM usage for resource-constrained devices, along with low CPU consumption for faster search.
  • Compatible with the edge environment, required frameworks, language SDKs, etc.
  • Performs well on an embedded system. A suitable vector DB should support fast indexing and searching on an embedded system.

Considered Options

The following lightweight vector DB candidates are compared:

  • ChromaDB
  • Qdrant
  • FAISS

Comparison on Works Offline and Easily Packaged

The comparison metrics below measure whether each option is easily packaged with minimal dependencies and deployable to a resource-constrained embedded device.

| VDB | Open source | Lightweight | Offline support | Serverless mode | Dependencies | Packaging for AAOS |
|---|---|---|---|---|---|---|
| ChromaDB | Yes (Apache 2.0) | Yes | Fully offline | Yes, can be used as an in-process DB within the same application | Minimal (uses SQLite by default) | Easier due to its serverless nature |
| Qdrant | Yes (Apache 2.0) | Medium | Fully offline | No, requires running a separate server | Requires a Rust-based server with dependencies | Harder, as it requires a running database service |
| FAISS | Yes (MIT) | Yes | Fully offline | Yes, runs fully in memory as a local library | Minimal (library-only) | Easier, since it is a library-only solution |

ChromaDB and FAISS are easier to deploy on AAOS because they don't require running a separate database server.

Comparison on Software Features & Compatibility

The comparison metrics below cover software features and compatibility with the edge environment and development requirements, including development complexity and the supported indexing mechanism.

The indexing mechanism affects both indexing speed (how fast new documents are added to the database) and search speed (how quickly a query finds the closest vector matches). Hierarchical Navigable Small World (HNSW) indexing is well known for efficient approximate nearest neighbor (ANN) search, so the comparison includes whether HNSW indexing is supported.

In addition, vector quantization support is compared because it is an important feature for reducing the memory footprint and improving search speed.
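
As a rough illustration of why quantization matters on a constrained device, the sketch below applies simple int8 scalar quantization to one vector and compares its storage cost with float32. The 384-dimension size, the 10,000-vector count, and the symmetric per-vector scaling scheme are assumptions chosen for illustration. Real vector DBs may use different schemes, such as product quantization.

```python
from array import array
from typing import List, Tuple


def quantize_int8(vec: List[float]) -> Tuple[array, float]:
    """Symmetric scalar quantization: map floats to int8 codes with one scale per vector."""
    max_abs = max(abs(v) for v in vec) or 1.0
    scale = max_abs / 127.0
    codes = array("b", (round(v / scale) for v in vec))
    return codes, scale


def dequantize(codes: array, scale: float) -> List[float]:
    """Approximate reconstruction of the original floats."""
    return [c * scale for c in codes]


DIM = 384             # assumed embedding dimension
NUM_VECTORS = 10_000  # assumed number of indexed manual chunks

# Deterministic dummy vector with values in [-1, 1].
vec = [((i * 37) % 101 - 50) / 50.0 for i in range(DIM)]
codes, scale = quantize_int8(vec)
restored = dequantize(codes, scale)

float32_bytes = array("f", vec).itemsize * DIM  # bytes per vector stored as float32
int8_bytes = codes.itemsize * DIM               # bytes per vector stored as int8
max_error = max(abs(a - b) for a, b in zip(vec, restored))

print(f"float32: {float32_bytes * NUM_VECTORS / 1e6:.2f} MB for {NUM_VECTORS} vectors")
print(f"int8:    {int8_bytes * NUM_VECTORS / 1e6:.2f} MB for {NUM_VECTORS} vectors")
print(f"max reconstruction error: {max_error:.4f}")
```

The int8 codes take a quarter of the float32 storage, at the cost of a small reconstruction error that can lower recall.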

| VDB | ARM-compatible | LangChain support | Supported languages | HNSW indexing | Feature extraction | Quantization | Production ready | Persistence |
|---|---|---|---|---|---|---|---|---|
| ChromaDB | Yes (runs on ARM64) | Official integration | Python (primary); HTTP (any language that supports HTTP requests) | Yes, HNSW is its default index | Yes | No (not natively) | Not fully production-ready. Actively developed and may introduce breaking changes. | Yes |
| Qdrant | Yes (has ARM64 Docker images) | Official integration | Rust (primary), Python, Java, Go, HTTP | Yes, natively, as the primary index structure | No | Yes | Fully production-ready (feature extraction not included) | Yes |
| FAISS | Yes (Meta officially supports ARM) | Official integration | C++ (primary), Python, unofficial community wrappers for Java; no built-in HTTP | Yes, as one of its indexing options | No | Yes | Production-ready for specific use cases (feature extraction not included); it lacks built-in persistence and a REST API | No |

Qdrant is strong at scalable vector search where high availability in a Kubernetes cluster is required. However, scalable vector search is not applicable to our car manual search scenario.

Qdrant is generally considered more production-ready than ChromaDB.

Comparison on Resource Consumption & Performance

The comparison metrics below measure resource consumption, which can affect vector search efficiency on a resource-constrained device.

| VDB | Storage usage | RAM usage | CPU usage |
|---|---|---|---|
| ChromaDB | Uses SQLite by default to store data | Configurable. Supports both RAM-based and disk-based indexing. With RAM-based indexing, usage depends on dataset size. | Moderate. Running in-process avoids extra server overhead. |
| Qdrant | Uses RocksDB for on-disk storage | Configurable. Supports both RAM-based and disk-based indexing. The server-based design increases CPU and RAM usage. | Higher. Running as a server consumes more CPU than an in-process library. Its default HNSW index reduces CPU consumption per search. |
| FAISS | Fully in-memory unless the index is manually saved and loaded | High. Vectors are stored in memory, with no disk-based indexing. | Low. Running as a library avoids extra server overhead and consumes less CPU. Using HNSW for indexing lowers CPU consumption further. |

In RAM-based search, the vector DB loads all vectors into RAM, which enables fast search. In disk-based search, the vector DB loads vectors from disk into RAM when needed, which is slower than RAM-based search but can handle larger datasets.
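
The difference between the two modes can be sketched with the standard library alone. In this illustrative example, `search_in_ram` ranks vectors that are already resident in memory, while `search_on_disk` streams fixed-size records from a file in small batches and keeps only the current top-k. The record layout, the function names, and the batch size are assumptions for illustration, not how any of the compared vector DBs stores data.

```python
import os
import struct
import tempfile
from typing import Iterator, List, Sequence, Tuple

DIM = 4  # tiny dimension to keep the example readable
RECORD = struct.Struct(f"{DIM}f")  # one float32 vector per fixed-size record


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def search_in_ram(vectors: List[Tuple[float, ...]], query: Sequence[float], k: int) -> List[int]:
    """RAM-based search: every vector is already held in memory."""
    order = sorted(range(len(vectors)), key=lambda i: dot(vectors[i], query), reverse=True)
    return order[:k]


def iter_from_disk(path: str, batch: int) -> Iterator[Tuple[int, Tuple[float, ...]]]:
    """Read vectors in batches so that only `batch` records are in RAM at a time."""
    with open(path, "rb") as f:
        idx = 0
        while True:
            chunk = f.read(RECORD.size * batch)
            if not chunk:
                break
            for vec in RECORD.iter_unpack(chunk):
                yield idx, vec
                idx += 1


def search_on_disk(path: str, query: Sequence[float], k: int, batch: int = 2) -> List[int]:
    """Disk-based search: stream vectors from the file and keep only the top-k so far."""
    best: List[Tuple[float, int]] = []
    for idx, vec in iter_from_disk(path, batch):
        best.append((dot(vec, query), idx))
        best = sorted(best, key=lambda t: (-t[0], t[1]))[:k]
    return [idx for _, idx in best]


vectors = [
    (1.0, 0.0, 0.0, 0.0),
    (0.5, 0.5, 0.0, 0.0),
    (0.0, 1.0, 0.0, 0.0),
    (0.0, 0.0, 1.0, 0.0),
    (0.25, 0.75, 0.0, 0.0),
]
query = (0.0, 1.0, 0.0, 0.0)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "vectors.bin")
    with open(path, "wb") as f:
        for vec in vectors:
            f.write(RECORD.pack(*vec))
    ram_hits = search_in_ram(vectors, query, k=2)
    disk_hits = search_on_disk(path, query, k=2)

print(ram_hits, disk_hits)
```

Both functions return the same neighbors. The disk variant trades extra I/O per query for a memory footprint bounded by the batch size and `k`.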

ChromaDB could be slower than FAISS for large datasets due to its indexing mechanism. However, in our experiment in an Ubuntu ARM64 8-core CPU edge environment, indexing 40 HTML files (32 MB of data in total), ChromaDB's search speed was acceptable for the required use case. A further latency test with a larger dataset is still needed.

chromadb-latency-test
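
A latency test of this kind can be structured as in the sketch below, which times repeated queries and reports mean, median, and 95th-percentile latency. `measure_latency` and `placeholder_search` are hypothetical names. The placeholder only stands in for the real vector DB query call, and the queries are made up, so the numbers it prints say nothing about ChromaDB itself.

```python
import math
import statistics
import time
from typing import Callable, Dict, List, Sequence


def measure_latency(
    search: Callable[[str], object], queries: Sequence[str], repeats: int = 5
) -> Dict[str, float]:
    """Time each search call in milliseconds and summarize the distribution."""
    samples: List[float] = []
    for _ in range(repeats):
        for q in queries:
            start = time.perf_counter()
            search(q)
            samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    # Nearest-rank 95th percentile.
    p95_index = max(0, math.ceil(0.95 * len(samples)) - 1)
    return {
        "count": float(len(samples)),
        "mean_ms": statistics.mean(samples),
        "p50_ms": statistics.median(samples),
        "p95_ms": samples[p95_index],
        "max_ms": samples[-1],
    }


def placeholder_search(query: str) -> list:
    """Hypothetical stand-in; a real test would wrap the vector DB query call here."""
    return sorted(query.split())


queries = [
    "how to check tire pressure",
    "replace cabin air filter",
    "pair phone with bluetooth",
]
stats = measure_latency(placeholder_search, queries, repeats=5)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

Reporting a high percentile alongside the mean helps on an embedded device, where occasional slow queries affect the user experience more than the average does.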

Consequences

We concluded that ChromaDB is suitable for the current edge environment and solution scenario.

This document provides alternative vector DB options if the solution requirements change.

Future Considerations

  • A further latency test on a larger dataset is needed for the three vector DBs to assess real-world performance.

  • Once a vector DB is selected, evaluate its actual RAM consumption under different indexing methods and quantization configurations, balancing latency and recall trade-offs.
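
One way to quantify the recall side of that trade-off is recall@k: the fraction of the exact top-k neighbors that the approximate or quantized index also returns. The sketch below is a minimal version that assumes exact results are available from a brute-force search. The chunk IDs and query names are made-up illustration data, not measurements.

```python
from typing import Dict, Sequence


def recall_at_k(approx: Sequence[int], exact: Sequence[int], k: int) -> float:
    """Fraction of the exact top-k neighbors that the approximate search also returned."""
    truth = set(exact[:k])
    if not truth:
        return 0.0
    return len(truth & set(approx[:k])) / len(truth)


def mean_recall_at_k(
    approx: Dict[str, Sequence[int]], exact: Dict[str, Sequence[int]], k: int
) -> float:
    """Average recall@k over all queries present in the exact results."""
    scores = [recall_at_k(approx[q], exact[q], k) for q in exact]
    return sum(scores) / len(scores)


# Made-up chunk IDs: exact results from brute force, approximate from an ANN index.
exact_results = {"q1": [3, 7, 1, 9], "q2": [2, 5, 8, 4]}
approx_results = {"q1": [3, 1, 7, 6], "q2": [2, 8, 0, 5]}

mean_recall = mean_recall_at_k(approx_results, exact_results, k=3)
print(f"mean recall@3: {mean_recall:.3f}")
```

Measuring recall@k together with latency and RAM for each configuration makes it possible to compare indexing and quantization settings on equal terms.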

Reference

AI and automation capabilities described in this scenario should be implemented following responsible AI principles, including fairness, reliability, safety, privacy, inclusiveness, transparency, and accountability. Organizations should ensure appropriate governance, monitoring, and human oversight are in place for all AI-powered solutions.