CedarTrie

CedarTrie is ported from the C++ cedar library, an efficiently updatable double-array trie.

APIs

class pyis.python.ops.CedarTrie(self: ops.CedarTrie) None

CedarTrie implements a dual-array trie tree with speedy update support.

build(self: ops.CedarTrie, data: List[Tuple[str, int]]) int

Insert/Update all key-value pair in a given list.

Parameters

data (List[Tuple[str, int]]) – key-value pairs to be inserted

Returns

Number of key-value pairs inserted.

build_from_file(self: ops.CedarTrie, path: str) int

Insert/Update all key-value pair in a given file.

The data file should contains two columns, separated by seperators. The first and second columns are key and value correspondingly. The row missing the second column will have a default value 0.

Throws RuntimeError exception if failed to read the given file.

For example

Alpha 1
Beta 2
Delta 3
AlphaBeta

(AlphaBeta row has the default value 0)

Parameters

path (str) – The path of given file.

Returns

Number of key-value pairs inserted.

contains(self: ops.CedarTrie, key: str) bool

Check if the key exists in the trie.

Returns

True if the key exists, otherwise False.

erase(self: ops.CedarTrie, key: str) int

Remove a key-value pair from trie. Returns -1 if the key not exists. Returns 0 if the pair is successfully removed.

Parameters

key (str) – The key to be removed.

Returns

int, 0 for removed, -1 for key not exists.

insert(self: ops.CedarTrie, key: str, value: int = 0) int

Insert a key-value pair into trie. Returns -1 if the key is already exists (In this case, the value will be updated), or 0 for inserted key-value pair.

Value should be in range [INT32_MIN+2, INT32_MAX]. INT32_MIN and INT32_MIN+1 are reserved values and you should never use them.

Parameters
  • key (str) – The key string

  • val (int) – The associated value.

Returns

int, 0 for inserted, -1 for updated, 1 if value is reserved.

items(self: ops.CedarTrie) List[Tuple[str, int]]

Dump all key-value pair in trie into a list of tuple. Each tuple stores a pair of key and value.

Returns

A list stores all key-value pair in the trie.

load(self: ops.CedarTrie, path: str) None

Load a binary trie file from given path. Throws RuntimeError exception if failed to load from file.

Parameters

path (str) – The trie file path.

longest_prefix(self: ops.CedarTrie, query: str) Tuple[str, int]

Find a key-value pair which the key is the longest prefix of a specific string. A RuntimeError exception will be thrown if no key in trie is the prefix of query.

For example, a trie with folloing key-value pairs.

Alpha 1
Beta 2
Delta 3
AlphaBeta 4

Will return (‘AlphaBeta’, 4) for longest_prefix(‘AlphaBetaDelta’)

Parameters

query (str) – The specific string

Returns

A tuple of key-value pair.

lookup(self: ops.CedarTrie, key: str) int

Look up a specific key in trie. Returns the corresponding value if key-value pair exists.

A RuntimeError exception will be thrown if the key is not in the trie tree.

Parameters

key (str) – The key to look up.

Returns

The corresponding value of the queried key.

numkeys(self: ops.CedarTrie) int

Get the key-value pair count in the trie.

Returns

int, indicates the number of key-value pair in the trie.

predict(self: ops.CedarTrie, prefix: str) List[Tuple[str, int]]

Find all key-value pair which the key starts with a specific prefix.

Parameters

prefix (str) – The specific prefix.

Returns

A list stores key-value pair(s).

prefix(self: ops.CedarTrie, query: str) List[Tuple[str, int]]

Find all key-value pair which the key is prefix of a specific string

Parameters

query (str) – The specific string.

Returns

A list stores key-value pair(s).

reset(self: ops.CedarTrie) None

Reset the trie to the empty state.

save(self: ops.CedarTrie, path: str) None

Store the trie into a binary trie file. Throws RuntimeError exception if failed to save to file.

Parameters

path (str) – The trie file path;

Example

# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT license.

import os
from pyis.python import ops

trie = ops.CedarTrie()
trie.insert("Alpha", 1)
trie.insert("Beta", 2)
trie.insert("Delta", 3)
trie.insert("AlphaBeta", 4)

trie.predict("Alpha") # [('Alpha', 1), ('AlphaBeta', 4)]
trie.predict("Gamma") # []
trie.prefix("AlphaBetaGamma") # [('Alpha', 1), ('AlphaBeta', 4)]
trie.numkeys() # 4
trie.lookup("Alpha") # 1

# RuntimeError raised if key not found
try:
    trie.lookup("Gamma")
except RuntimeError as e:
    print("not found")

trie.insert("Alpha", 5)
trie.lookup("Alpha") # 5

trie.erase("Beta")

# Runtime Error raised
try:
    trie.lookup("Beta")
except RuntimeError as e:
    print("not found")

os.makedirs('tmp', exist_ok=True)
trie.save("tmp/trie.bin") # Store to a binary file

trie2 = ops.CedarTrie()
# Load from a binary file
trie2.load("tmp/trie.bin")

# [('Alpha, 5'), ('Delta, 3'), ('AlphaBeta', 4)]
trie2.items() # Dump all key-value pairs from the trie