ImmutableTrie

Immutable Trie has a much smaller memory footprint, compared to cedar trie. See the benchmark section for details.

APIs

class pyis.python.ops.ImmutableTrie(self: ops.ImmutableTrie) None

An Implementation of compile-stype immutable trie.

Constructs an empty immutable trie.

static compile(data: List[Tuple[str, int]], path: str) None

Compile a list of key-value pairs into a immutable trie file. This process could costs lots of memory. Values in the list should not exceeded the presenting capacity of unsigned 32bit integer. Throws RuntimeError if compile process could not finish (e.g. file cannot write).

Parameters
  • data (List[Tuple[str, int]]) – The key-value pairs to be compiled.

  • path (str) – The path the compiled file to be stored.

contains(self: ops.ImmutableTrie, key: str) bool

Look for a specific key.

Parameters

key (str) – The specific key to look up.

Returns

True if the trie contains the key, False otherwise.

Return type

result (bool)

items(self: ops.ImmutableTrie) List[Tuple[str, int]]

Dump all key-value pairs from the trie.

Returns

All key-value pairs in the trie.

Return type

data (List[Tuple[str, int]])

load(self: ops.ImmutableTrie, path: str) None

Load the trie from a pre-compiled binary file.

Parameters

path (str) – Binary file path.

load_items(self: ops.ImmutableTrie, data: List[Tuple[str, int]]) None

Re-construct the trie with given data.

Parameters

data (List[Tuple[str, int]]) – Key-value pairs to construct the new trie.

lookup(self: ops.ImmutableTrie, key: str) int

Look up a specific key from the trie. Return the corrsponding value if found. Throws RuntimeError if trie is not initialized or the key not found.

Parameters

key (str) – The specific key to look up.

Returns

The corrsponding value.

Return type

value (int)

Example

# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT license.

import os
from pyis.python import ops

data = [('Alpha', 1), ('Beta', 2), ('Delta', 3), ('AlphaBeta', 4)]

os.makedirs('tmp', exist_ok=True)
ops.ImmutableTrie.compile(data, 'tmp/trie.bin')

trie = ops.ImmutableTrie()
trie.load('tmp/trie.bin')

set(data) == set(trie.items())

1 == trie.lookup('Alpha')
2 == trie.lookup('Beta')
4 == trie.lookup('AlphaBeta')
trie.lookup('NonExists') # raise RuntimeError  

Benchmark

Performance Comparsion of Immutable Trie and Cedar Trie

Test Item

Immutable Trie

Cedar Trie

Construction Time

49.319 sec

15.245 sec

Memory used during construction

5.1 GB

199 MB

Runtime memory footprint

85 MB

170 MB

Query all keys

0.192 sec

0.979 sec

Dump

1.05 sec

1.428 sec

The test dataset contains 5,000,000 randomly generated strings, length from 5 to 20 characters including upper/lower characters and digits. The following is the script used to generate the dataset.

# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT license.

#! /bin/usr/env python3
# -*- encoding: ascii -*-
import random

def randstr():
    lst = [chr(i + ord('a')) for i in range(26)] + \
          [chr(i + ord('A')) for i in range(26)] + [chr(i + ord('0')) for i in range(10)]
    result = ""
    length = random.randint(5,20)
    for i in range(length):
        result += random.choice(lst)

    return result

with open('dict.txt', 'w') as f:
    for i in range(5000000):
        f.write(randstr() + '\n')