ImmutableTrie¶
Immutable Trie has a much smaller memory footprint, compared to cedar trie. See the benchmark section for details.
APIs¶
- class pyis.python.ops.ImmutableTrie(self: ops.ImmutableTrie) None ¶
An Implementation of compile-stype immutable trie.
Constructs an empty immutable trie.
- static compile(data: List[Tuple[str, int]], path: str) None ¶
Compile a list of key-value pairs into a immutable trie file. This process could costs lots of memory. Values in the list should not exceeded the presenting capacity of unsigned 32bit integer. Throws RuntimeError if compile process could not finish (e.g. file cannot write).
- Parameters
data (List[Tuple[str, int]]) – The key-value pairs to be compiled.
path (str) – The path the compiled file to be stored.
- contains(self: ops.ImmutableTrie, key: str) bool ¶
Look for a specific key.
- Parameters
key (str) – The specific key to look up.
- Returns
True if the trie contains the key, False otherwise.
- Return type
result (bool)
- items(self: ops.ImmutableTrie) List[Tuple[str, int]] ¶
Dump all key-value pairs from the trie.
- Returns
All key-value pairs in the trie.
- Return type
data (List[Tuple[str, int]])
- load(self: ops.ImmutableTrie, path: str) None ¶
Load the trie from a pre-compiled binary file.
- Parameters
path (str) – Binary file path.
- load_items(self: ops.ImmutableTrie, data: List[Tuple[str, int]]) None ¶
Re-construct the trie with given data.
- Parameters
data (List[Tuple[str, int]]) – Key-value pairs to construct the new trie.
- lookup(self: ops.ImmutableTrie, key: str) int ¶
Look up a specific key from the trie. Return the corrsponding value if found. Throws RuntimeError if trie is not initialized or the key not found.
- Parameters
key (str) – The specific key to look up.
- Returns
The corrsponding value.
- Return type
value (int)
Example¶
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT license.
import os
from pyis.python import ops
data = [('Alpha', 1), ('Beta', 2), ('Delta', 3), ('AlphaBeta', 4)]
os.makedirs('tmp', exist_ok=True)
ops.ImmutableTrie.compile(data, 'tmp/trie.bin')
trie = ops.ImmutableTrie()
trie.load('tmp/trie.bin')
set(data) == set(trie.items())
1 == trie.lookup('Alpha')
2 == trie.lookup('Beta')
4 == trie.lookup('AlphaBeta')
trie.lookup('NonExists') # raise RuntimeError
Benchmark¶
Test Item |
Immutable Trie |
Cedar Trie |
---|---|---|
Construction Time |
49.319 sec |
15.245 sec |
Memory used during construction |
5.1 GB |
199 MB |
Runtime memory footprint |
85 MB |
170 MB |
Query all keys |
0.192 sec |
0.979 sec |
Dump |
1.05 sec |
1.428 sec |
The test dataset contains 5,000,000 randomly generated strings, length from 5 to 20 characters including upper/lower characters and digits. The following is the script used to generate the dataset.
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT license.
#! /bin/usr/env python3
# -*- encoding: ascii -*-
import random
def randstr():
lst = [chr(i + ord('a')) for i in range(26)] + \
[chr(i + ord('A')) for i in range(26)] + [chr(i + ord('0')) for i in range(10)]
result = ""
length = random.randint(5,20)
for i in range(length):
result += random.choice(lst)
return result
with open('dict.txt', 'w') as f:
for i in range(5000000):
f.write(randstr() + '\n')