CedarTrie¶
CedarTrie is ported from the C++ cedar library, an efficiently updatable double-array trie.
APIs¶
- class pyis.python.ops.CedarTrie(self: ops.CedarTrie) None ¶
CedarTrie implements a dual-array trie tree with speedy update support.
- build(self: ops.CedarTrie, data: List[Tuple[str, int]]) int ¶
Insert/Update all key-value pair in a given list.
- Parameters
data (List[Tuple[str, int]]) – key-value pairs to be inserted
- Returns
Number of key-value pairs inserted.
- build_from_file(self: ops.CedarTrie, path: str) int ¶
Insert/Update all key-value pair in a given file.
The data file should contains two columns, separated by seperators. The first and second columns are key and value correspondingly. The row missing the second column will have a default value 0.
Throws RuntimeError exception if failed to read the given file.
For example
Alpha 1 Beta 2 Delta 3 AlphaBeta
(AlphaBeta row has the default value 0)
- Parameters
path (str) – The path of given file.
- Returns
Number of key-value pairs inserted.
- contains(self: ops.CedarTrie, key: str) bool ¶
Check if the key exists in the trie.
- Returns
True if the key exists, otherwise False.
- erase(self: ops.CedarTrie, key: str) int ¶
Remove a key-value pair from trie. Returns -1 if the key not exists. Returns 0 if the pair is successfully removed.
- Parameters
key (str) – The key to be removed.
- Returns
int, 0 for removed, -1 for key not exists.
- insert(self: ops.CedarTrie, key: str, value: int = 0) int ¶
Insert a key-value pair into trie. Returns -1 if the key is already exists (In this case, the value will be updated), or 0 for inserted key-value pair.
Value should be in range [INT32_MIN+2, INT32_MAX]. INT32_MIN and INT32_MIN+1 are reserved values and you should never use them.
- Parameters
key (str) – The key string
val (int) – The associated value.
- Returns
int, 0 for inserted, -1 for updated, 1 if value is reserved.
- items(self: ops.CedarTrie) List[Tuple[str, int]] ¶
Dump all key-value pair in trie into a list of tuple. Each tuple stores a pair of key and value.
- Returns
A list stores all key-value pair in the trie.
- load(self: ops.CedarTrie, path: str) None ¶
Load a binary trie file from given path. Throws RuntimeError exception if failed to load from file.
- Parameters
path (str) – The trie file path.
- longest_prefix(self: ops.CedarTrie, query: str) Tuple[str, int] ¶
Find a key-value pair which the key is the longest prefix of a specific string. A RuntimeError exception will be thrown if no key in trie is the prefix of query.
For example, a trie with folloing key-value pairs.
Alpha 1 Beta 2 Delta 3 AlphaBeta 4
Will return (‘AlphaBeta’, 4) for longest_prefix(‘AlphaBetaDelta’)
- Parameters
query (str) – The specific string
- Returns
A tuple of key-value pair.
- lookup(self: ops.CedarTrie, key: str) int ¶
Look up a specific key in trie. Returns the corresponding value if key-value pair exists.
A RuntimeError exception will be thrown if the key is not in the trie tree.
- Parameters
key (str) – The key to look up.
- Returns
The corresponding value of the queried key.
- numkeys(self: ops.CedarTrie) int ¶
Get the key-value pair count in the trie.
- Returns
int, indicates the number of key-value pair in the trie.
- predict(self: ops.CedarTrie, prefix: str) List[Tuple[str, int]] ¶
Find all key-value pair which the key starts with a specific prefix.
- Parameters
prefix (str) – The specific prefix.
- Returns
A list stores key-value pair(s).
- prefix(self: ops.CedarTrie, query: str) List[Tuple[str, int]] ¶
Find all key-value pair which the key is prefix of a specific string
- Parameters
query (str) – The specific string.
- Returns
A list stores key-value pair(s).
- reset(self: ops.CedarTrie) None ¶
Reset the trie to the empty state.
- save(self: ops.CedarTrie, path: str) None ¶
Store the trie into a binary trie file. Throws RuntimeError exception if failed to save to file.
- Parameters
path (str) – The trie file path;
Example¶
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT license.
import os
from pyis.python import ops
trie = ops.CedarTrie()
trie.insert("Alpha", 1)
trie.insert("Beta", 2)
trie.insert("Delta", 3)
trie.insert("AlphaBeta", 4)
trie.predict("Alpha") # [('Alpha', 1), ('AlphaBeta', 4)]
trie.predict("Gamma") # []
trie.prefix("AlphaBetaGamma") # [('Alpha', 1), ('AlphaBeta', 4)]
trie.numkeys() # 4
trie.lookup("Alpha") # 1
# RuntimeError raised if key not found
try:
trie.lookup("Gamma")
except RuntimeError as e:
print("not found")
trie.insert("Alpha", 5)
trie.lookup("Alpha") # 5
trie.erase("Beta")
# Runtime Error raised
try:
trie.lookup("Beta")
except RuntimeError as e:
print("not found")
os.makedirs('tmp', exist_ok=True)
trie.save("tmp/trie.bin") # Store to a binary file
trie2 = ops.CedarTrie()
# Load from a binary file
trie2.load("tmp/trie.bin")
# [('Alpha, 5'), ('Delta, 3'), ('AlphaBeta', 4)]
trie2.items() # Dump all key-value pairs from the trie