Core Document Search And Summarization
Core Document Search And Summarization
Version implemented: 0.241.007
Overview and Purpose
This feature adds a shared backend document-search service, a dedicated authenticated search API, and an always-loaded Semantic Kernel core plugin for search and summarization.
The goal is to give agents and backend callers a common way to:
- Run relevance-ranked hybrid search with a bounded, caller-overridable top-N.
- Retrieve ordered chunks for a specific document instead of relying only on distilled search hits.
- Generate hierarchical summaries across the full document by summarizing chunk windows and then reducing the intermediate summaries.
Dependencies:
- Azure AI Search chunk indexes for personal, group, and public workspaces
- Azure OpenAI chat model configuration for summary generation
- Semantic Kernel core plugin loading in the shared kernel loader
Technical Specifications
Architecture Overview
The feature is split into three thin layers over one shared service:
functions_search.pykeeps hybrid search and now exposes shared top-N and scope normalization helpers.functions_search_service.pyresolves document scope, retrieves ordered chunks, builds chunk windows, and performs hierarchical summarization.- Adapters expose the shared service through:
route_backend_search.pyfor authenticated backend APIssemantic_kernel_plugins/document_search_plugin.pyfor always-loaded Semantic Kernel access
This keeps the HTTP API and the Semantic Kernel plugin behavior aligned and makes future comparison workflows easier to add.
API Endpoints
The backend route module adds three authenticated endpoints:
POST /api/search/documents- Runs hybrid search.
- Supports
query,doc_scope,top_n,document_idordocument_ids, optional tags, and optional scope hints.
POST /api/search/document-chunks- Resolves an accessible document and returns ordered chunks.
- Supports optional windowing by pages or chunks.
POST /api/search/document-summary- Resolves a document, retrieves its ordered chunks, windows the content, summarizes each window, and reduces the intermediate summaries into a final result.
- Supports optional focus instructions and target-length overrides.
Semantic Kernel Plugin Functions
The core plugin is loaded for model-only and agent-backed kernel sessions and exposes three functions:
search_documentsretrieve_document_chunkssummarize_document
The plugin resolves the current signed-in user at invocation time and calls the shared service directly rather than going through HTTP.
Summary Workflow
The summarization workflow is hierarchical:
- Resolve an accessible document in personal, group, or public scope.
- Retrieve ordered chunks for the document.
- Build chunk windows by pages when page numbers are available, otherwise by chunk count.
- Summarize each window to a configurable intermediate target.
- Reduce intermediate summaries in batches until a single final summary remains or the configured reduction limit is reached.
Default behavior keeps the first-pass window summaries and final summary at two pages unless the caller overrides those values.
File Structure
application/single_app/functions_search.pyapplication/single_app/functions_documents.pyapplication/single_app/functions_search_service.pyapplication/single_app/route_backend_search.pyapplication/single_app/semantic_kernel_plugins/document_search_plugin.pyapplication/single_app/semantic_kernel_loader.pyfunctional_tests/test_document_search_api_and_plugin.py
Usage Instructions
Backend API Usage
Use the backend endpoints when server-side code or future UI features need direct access to search, chunk retrieval, or summarization without invoking an agent.
Typical request patterns:
- Search a workspace with default distilled behavior and an explicit
top_n. - Retrieve all ordered chunks for a document before downstream processing.
- Summarize a document with focus instructions such as risks, deadlines, implementation details, or policy requirements.
Agent Usage
Agents can use the core plugin to:
- Search for relevant chunks across accessible documents.
- Pull the full ordered chunk stream for one document when distilled search is not enough.
- Generate a hierarchical summary with caller-specified focus and length guidance.
This is intended to improve document-grounded summarization today and to support future document-to-document or version-to-version comparison workflows without changing the core retrieval contract.
Testing and Validation
Test Coverage
Functional coverage validates:
- Shared search helper exposure and document-id propagation
- Shared search service entry points for search, retrieval, and summarization
- Backend route registration and authentication decorators
- Core plugin kernel-function contract
- Loader and Flask app registration
Related test:
functional_tests/test_document_search_api_and_plugin.py
Known Limitations
- Summaries are generated on demand and are not persisted or cached in this implementation.
- Very large documents can require multiple summarization stages and multiple model calls.
- Comparison workflows are not included yet, but the retrieval and scope-resolution shapes are designed to support them.