promptflow.rag module#

promptflow.rag.build_index(*, name: str, vector_store: str = 'azure_ai_search', input_source: Union[AzureAISearchSource, LocalSource], index_config: Optional[AzureAISearchConfig] = None, embeddings_model_config: EmbeddingsModelConfig, data_source_url: Optional[str] = None, tokens_per_chunk: int = 1024, token_overlap_across_chunks: int = 0, input_glob: str = '**/*', max_sample_files: Optional[int] = None, chunk_prepend_summary: Optional[bool] = None, document_path_replacement_regex: Optional[Dict[str, str]] = None, embeddings_cache_path: Optional[str] = None) str#

Generates embeddings locally and stores the index reference in memory.

Parameters:
  • name (str) – The name of the output index.

  • vector_store (str) – The vector store in which to create the index.

  • input_source (Union[AzureAISearchSource, LocalSource]) – The configuration for the input data source.

  • index_config (Optional[AzureAISearchConfig]) – The configuration for the Azure AI Search output.

  • embeddings_model_config (EmbeddingsModelConfig) – The configuration for the embeddings model.

  • data_source_url (Optional[str]) – The URL of the data source.

  • tokens_per_chunk (int) – The maximum number of tokens per chunk.

  • token_overlap_across_chunks (int) – The number of tokens that overlap between consecutive chunks.

  • input_glob (str) – The glob pattern used to select input files.

  • max_sample_files (Optional[int]) – The maximum number of files to sample from the input source.

  • chunk_prepend_summary (Optional[bool]) – Whether to prepend a summary to each chunk.

  • document_path_replacement_regex (Optional[Dict[str, str]]) – The regex for document path replacement.

  • embeddings_cache_path (Optional[str]) – The path to the embeddings cache.

Returns:

The local path to the created index.

Return type:

str
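
Example (a minimal sketch): the snippet below builds an Azure AI Search index from a local folder of documents. The subscription, resource group, workspace, connection names, deployment name, and data path are hypothetical placeholders, and the configuration classes are assumed to be importable from promptflow.rag.config.

    from promptflow.rag import build_index
    from promptflow.rag.config import (
        AzureAISearchConfig,
        ConnectionConfig,
        EmbeddingsModelConfig,
        LocalSource,
    )

    # Hypothetical connection details; replace with your own Azure resources.
    embeddings_connection = ConnectionConfig(
        subscription_id="<subscription-id>",
        resource_group_name="<resource-group>",
        workspace_name="<project-name>",
        connection_name="<aoai-connection-name>",
    )
    search_connection = ConnectionConfig(
        subscription_id="<subscription-id>",
        resource_group_name="<resource-group>",
        workspace_name="<project-name>",
        connection_name="<search-connection-name>",
    )

    index_path = build_index(
        name="product-info-index",
        vector_store="azure_ai_search",
        input_source=LocalSource(input_data="data/product-info/"),
        index_config=AzureAISearchConfig(
            ai_search_index_name="product-info-index",
            ai_search_connection_config=search_connection,
        ),
        embeddings_model_config=EmbeddingsModelConfig(
            model_name="text-embedding-ada-002",
            deployment_name="text-embedding-ada-002",
            connection_config=embeddings_connection,
        ),
        tokens_per_chunk=1024,
        token_overlap_across_chunks=0,
    )
    print(index_path)  # local path to the created index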

promptflow.rag.get_langchain_retriever_from_index(path: str)#

Creates a LangChain retriever from the index stored at the given local path.
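
Example (a minimal sketch): the snippet below wraps a previously built index in a LangChain retriever and runs a query. The index path and query string are placeholders, and the returned object is assumed to implement the standard LangChain BaseRetriever interface.

    from promptflow.rag import get_langchain_retriever_from_index

    index_path = "<local-index-path>"  # e.g. the value returned by build_index(...)
    retriever = get_langchain_retriever_from_index(index_path)

    # Query through the standard LangChain retriever interface;
    # returns a list of LangChain Document objects.
    docs = retriever.get_relevant_documents("What is the return policy?")
    for doc in docs:
        print(doc.page_content)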

Subpackages#