Security Filters

This document provides guidance on how to organize your documents in order to secure your data, e.g. making sure users can access only the data meant for them.

Kernel Memory lets you organize memories using two main approaches, which can also be combined for maximum flexibility:

Storing information in separate collections called “Indexes”.
Labeling information with custom keywords called “Tags”.

Indexes

Depending on the storage engine, using multiple indexes can be expensive, so we recommend using indexes only to scale horizontally, based on your application’s scalability metrics, such as the number of users, projects, or chats.

Currently, indexes are completely isolated: Kernel Memory doesn’t support searching across indexes, so you should consider whether that’s compatible with your scenarios.

When uploading and searching, unless an index is specified, Kernel Memory uses a default index name, i.e. a single container for all the memories.

Code examples

Here are some examples of how to use indexes and tags.

Simple file upload, without tags or an explicit index name. The associated memory records can’t be filtered by tags and are stored in the default index.

// Upload a file into memory. This file has no tags.
var docId = await memory.ImportDocumentAsync("project.docx");

// Ask a question, without tags. This will search the entire index.
var answer = await memory.AskAsync("what's the project timeline?");

Simple file upload without tags, stored in a custom index. The associated memory records can’t be filtered by tags, but are isolated in a dedicated index.

// Upload a file in a specific index.
var docId = await memory.ImportDocumentAsync("project.docx", index: "index001");

// NO ANSWER: the data is not in the default index
var answer = await memory.AskAsync("what's the project timeline?");

// OK
var answer = await memory.AskAsync("what's the project timeline?", index: "index001");

Security Filters

These examples use the user tag to secure data retrieval, making sure the current user can see only data tagged with their user ID.

Example 1

File upload with a user tag. The associated memory records can be filtered using the user tag.

Note that filters are not mandatory, so records are also visible when no filter is used.

var docId = await memory.ImportDocumentAsync(new Document()
                                                .AddFile("project.docx")
                                                .AddTag("user", "USER-333"));

// OK
var answer = await memory.AskAsync("what's the project timeline?");

// OK
var answer = await memory.AskAsync("what's the project timeline?",
                                   filter: MemoryFilters.ByTag("user", "USER-333"));

// NO ANSWER: memories are tagged with 'USER-333', so filter 'USER-444'
//            will not match the information extracted from project.docx
var answer = await memory.AskAsync("what's the project timeline?",
                                   filter: MemoryFilters.ByTag("user", "USER-444"));

Example 2

Very similar to the previous example, but using a specific index.

// Upload a document into a specific index and tag it with the user ID.
var docId = await memory.ImportDocumentAsync(new Document()
                                                .AddFile("project.docx")
                                                .AddTag("user", "USER-333"),
                                             index: "index002");

// NO ANSWER: the data is not in the default index
var answer = await memory.AskAsync("what's the project timeline?");

// NO ANSWER: even if the filter is correct, the data is not in the default index
var answer = await memory.AskAsync("what's the project timeline?",
                                   filter: MemoryFilters.ByTag("user", "USER-333"));

// OK
var answer = await memory.AskAsync("what's the project timeline?",
                                   filter: MemoryFilters.ByTag("user", "USER-333"),
                                   index: "index002");

// IMPORTANT: this call has no user filter, so the service will return the data.
//            This is equivalent to an admin having full access.
var answer = await memory.AskAsync("what's the project timeline?",
                                   index: "index002");

Example 3

Example showing how to apply multiple tags, even for the same tag name.

In this case the document information is tagged with two user IDs, so both users can ask questions about it.

// Upload file, allow two users to access
var docId = await memory.ImportDocumentAsync(new Document()
                                                .AddFile("project.docx")
                                                .AddTag("user", "USER-333")
                                                .AddTag("user", "USER-444"));

// OK: USER-333 tag matches
var answer = await memory.AskAsync("what's the project timeline?",
                                   filter: MemoryFilters.ByTag("user", "USER-333"));

// OK: USER-444 tag matches
var answer = await memory.AskAsync("what's the project timeline?",
                                   filter: MemoryFilters.ByTag("user", "USER-444"));
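
When the list of allowed users is only known at runtime, the same tagging can be done in a loop. The following is a minimal sketch, not an official pattern: SharedDocuments and BuildSharedDocument are hypothetical names, and the code assumes, as the example above does, that calling AddTag again with the same tag name adds another value instead of replacing the previous one.

```csharp
using System.Collections.Generic;
using Microsoft.KernelMemory;

public static class SharedDocuments
{
    // Hypothetical helper (not part of Kernel Memory):
    // tags one file with every user ID that is allowed to read it.
    public static Document BuildSharedDocument(string filePath, IEnumerable<string> userIds)
    {
        var document = new Document().AddFile(filePath);

        foreach (var userId in userIds)
        {
            // Same tag name, one value per user.
            document.AddTag("user", userId);
        }

        return document;
    }
}
```

The resulting document can be passed to memory.ImportDocumentAsync(...) exactly like the inline version in Example 3.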

Example 4

Finally, tags can also be used for categorizing data:

// Upload file, allow two users to access, and add a content type tag for extra filtering
var docId = await memory.ImportDocumentAsync(new Document()
                                                .AddFile("project.docx")
                                                .AddTag("user", "USER-333")
                                                .AddTag("user", "USER-444")
                                                .AddTag("type", "planning"));

// No information found, the type tag doesn't match
var answer = await memory.AskAsync("what's the project timeline?",
                                   filter: MemoryFilters.ByTag("user", "USER-333")
                                                        .ByTag("type", "email"));

// OK
var answer = await memory.AskAsync("what's the project timeline?",
                                   filter: MemoryFilters.ByTag("user", "USER-333")
                                                        .ByTag("type", "planning"));

Security best practices

Summarizing, we recommend these best practices to secure Kernel Memory usage:

Use Kernel Memory as a private backend component, similar to a SQL Server, without granting direct access. When using Kernel Memory as a service, consider assigning the service a reserved IP, accessible only to your services, and using HTTPS only.
Authenticate users in your backend using a secure solution like Azure Active Directory, extract the user ID from signed credentials such as JWT tokens or client certificates, and tag every interaction with Kernel Memory with this user ID.
Use Kernel Memory Tags as Security Filters. Make sure every API call to Kernel Memory uses a User tag, both when reading and writing to memory.
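
One way to make the last two recommendations hard to forget is to route all calls through a small wrapper that is bound to the authenticated user. The sketch below is only an illustration under stated assumptions: UserScopedMemory is a hypothetical class, not part of Kernel Memory, and the user ID is assumed to come from credentials your backend has already validated.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.KernelMemory;

// Hypothetical wrapper: every read and write carries the same "user" tag.
public sealed class UserScopedMemory
{
    private const string UserTag = "user";
    private readonly IKernelMemory _memory;
    private readonly string _userId;

    public UserScopedMemory(IKernelMemory memory, string userId)
    {
        _memory = memory ?? throw new ArgumentNullException(nameof(memory));
        if (string.IsNullOrWhiteSpace(userId))
        {
            throw new ArgumentException("A user ID is required", nameof(userId));
        }

        _userId = userId;
    }

    // Writing: the document is always tagged with the current user ID.
    public Task<string> ImportAsync(string filePath, string? index = null, CancellationToken ct = default)
    {
        var document = new Document().AddFile(filePath).AddTag(UserTag, _userId);
        return _memory.ImportDocumentAsync(document, index: index, cancellationToken: ct);
    }

    // Reading: the question is always filtered by the current user ID.
    public Task<MemoryAnswer> AskAsync(string question, string? index = null, CancellationToken ct = default)
    {
        return _memory.AskAsync(
            question,
            index: index,
            filter: MemoryFilters.ByTag(UserTag, _userId),
            cancellationToken: ct);
    }
}
```

Because the wrapper exposes no method that omits the filter, application code cannot accidentally issue the unfiltered "admin" query shown at the end of Example 2.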

Complex filters

When filtering memories it’s possible to combine filters with AND and OR logic. For instance, consider these scenarios:

Reply using memories belonging to “Taylor OR Andrea”
Reply using memories belonging to “Taylor AND Andrea”
Reply using “News belonging to Taylor AND Blogs belonging to Andrea”

Using OR logic

Example:

Reply using memories belonging to “Taylor OR Andrea”

Code:

var answer = await memory.AskAsync(question,
                                   filters: new List<MemoryFilter>
                                   {
                                      MemoryFilters.ByTag("user", "Taylor"),
                                      // ... OR ...
                                      MemoryFilters.ByTag("user", "Andrea"),
                                   });
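
If the set of users is computed at runtime, the OR list can be generated instead of written by hand. This is a minimal sketch: OrFilters and AnyUser are hypothetical names introduced here, while MemoryFilters.ByTag and MemoryFilter are the same types used in the example above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.KernelMemory;

public static class OrFilters
{
    // Hypothetical helper: builds one filter per user ID.
    // Passing the resulting list as "filters" combines them with OR.
    public static List<MemoryFilter> AnyUser(IEnumerable<string> userIds)
    {
        var filters = userIds.Select(id => MemoryFilters.ByTag("user", id)).ToList();

        // Design choice of this sketch: refuse to build an empty list,
        // so a caller never ends up sending a request with no restriction.
        if (filters.Count == 0)
        {
            throw new ArgumentException("At least one user ID is required", nameof(userIds));
        }

        return filters;
    }
}
```

It would then be used as filters: OrFilters.AnyUser(new[] { "Taylor", "Andrea" }) in the AskAsync call.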

AND vs OR syntax

Example:

Reply using memories belonging to “Taylor AND Andrea”

Code:

var answer = await memory.AskAsync(question,
                                   filters: new List<MemoryFilter>
                                   {
                                      MemoryFilters.ByTag("user", "Taylor")
                                                   // ... AND ...
                                                   .ByTag("user", "Andrea"),
                                   });

This can also be written more concisely as a single filter (using the filter parameter instead of filters):

var answer = await memory.AskAsync(question,
                                   filter: MemoryFilters.ByTag("user", "Taylor")
                                                        // ... AND ...
                                                        .ByTag("user", "Andrea"));
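
The two forms shown so far follow a simple rule: conditions chained inside one filter must all match (AND), while a record only needs to match one of the filters in a list (OR). The sketch below models that rule in plain C#, without calling Kernel Memory. FilterModel and its methods are hypothetical names used only for illustration; the actual matching is performed by the service, and this model does not cover the case where no filter is passed at all.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative model only, NOT Kernel Memory's implementation.
public static class FilterModel
{
    // One filter = a list of (tag name, value) conditions that must ALL be present.
    public static bool MatchesFilter(
        IReadOnlyDictionary<string, HashSet<string>> recordTags,
        IEnumerable<(string Name, string Value)> filter)
    {
        return filter.All(condition =>
            recordTags.TryGetValue(condition.Name, out var values)
            && values.Contains(condition.Value));
    }

    // A list of filters = the record is selected if ANY single filter matches.
    public static bool MatchesAny(
        IReadOnlyDictionary<string, HashSet<string>> recordTags,
        IEnumerable<IEnumerable<(string Name, string Value)>> filters)
    {
        return filters.Any(filter => MatchesFilter(recordTags, filter));
    }
}
```

For example, a record tagged with both user values "Taylor" and "Andrea" satisfies the AND filter above, while a record tagged only with "Taylor" satisfies the OR list but not the AND filter.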

Using both AND and OR

Example:

Reply using “News belonging to Taylor AND Blogs belonging to Andrea”

In this case the “AND” is not a logical AND that intersects two sets, but a request to merge (union) two results. The sentence can therefore be interpreted and implemented in two different ways:

Ground the answer on memories that are both “news” and “blogs” and belong to both “Taylor” and “Andrea”:

var answer = await memory.AskAsync(question,
                                   filters: new List<MemoryFilter>
                                   {
                                      MemoryFilters.ByTag("user", "Taylor")
                                                   // ... AND ...
                                                   .ByTag("type", "News")
                                                   // ... AND ...
                                                   .ByTag("user", "Andrea")
                                                   // ... AND ...
                                                   .ByTag("type", "Blog"),
                                   });

Ground the answer on memories that are “news owned by Taylor” or “blogs owned by Andrea”:

var answer = await memory.AskAsync(question,
                                   filters: new List<MemoryFilter>
                                   {
                                      MemoryFilters.ByTag("user", "Taylor")
                                                   // ... AND ...
                                                   .ByTag("type", "News"),
                                      // ... OR ...
                                      MemoryFilters.ByTag("user", "Andrea")
                                                   // ... AND ...
                                                   .ByTag("type", "Blog"),
                                   });

The latter is what users would typically expect. There are, however, several ways to ask a question, and ultimately the logic depends on the language (English, Spanish, Portuguese, etc.) and on user expectations.

For instance:

Reply using “News written by Taylor using only News about Space travel”

all these conditions must be met:

a memory must belong to Taylor
a memory must be of type News
a memory must be of type Space Travel

var answer = await memory.AskAsync(question,
                                   filters: new List<MemoryFilter>
                                   {
                                      MemoryFilters.ByTag("user", "Taylor")
                                                   // ... AND ...
                                                   .ByTag("type", "News")
                                                   // ... AND ...
                                                   .ByTag("type", "Space Travel"),
                                   });

And one last example:

Reply using “News written by Taylor using only News about Science or Space travel”

which translates to these conditions:

a memory must belong to Taylor
a memory must be a News about Science, OR a News about Space travel

var answer = await memory.AskAsync(question,
                                   filters: new List<MemoryFilter>
                                   {
                                      MemoryFilters.ByTag("user", "Taylor")
                                                   // ... AND ...
                                                   .ByTag("type", "News")
                                                   // ... AND ...
                                                   .ByTag("type", "Science"),
                                      // ... OR ...
                                      MemoryFilters.ByTag("user", "Taylor")
                                                   // ... AND ...
                                                   .ByTag("type", "News")
                                                   // ... AND ...
                                                   .ByTag("type", "Space Travel"),
                                   });
