marisa-rs
Safe Rust wrapper for the marisa-trie C++ library.
marisa-trie is a static and space-efficient trie data structure library. This crate provides safe Rust bindings to the C++ library.
Installation
Add this to your Cargo.toml:
[dependencies]
marisa-rs = "0.1"
Quick Start
use marisa_rs::{Keyset, Trie};
fn main() {
// Create a keyset and add words
let mut keyset = Keyset::new();
keyset.push("apple");
keyset.push("application");
keyset.push("apply");
// Build the trie
let mut trie = Trie::new();
trie.build(&mut keyset).unwrap();
// Lookup a word
if let Some(id) = trie.lookup("apple") {
println!("Found 'apple' with ID: {}", id);
}
// Search for words starting with "app"
trie.predictive_search("app", |word, id| {
println!("Found: {} (ID: {})", word, id);
});
}
Basic Usage
Creating and Building a Trie
use marisa_rs::{Keyset, Trie};
// Create a keyset
let mut keyset = Keyset::new();
// Add words to the keyset
keyset.push("cat");
keyset.push("car");
keyset.push("card");
keyset.push("care");
// Build the trie
let mut trie = Trie::new();
trie.build(&mut keyset)?;
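The snippets in this README use `?`, which only compiles inside a function that returns a compatible `Result`. The following is a minimal sketch of one way to set that up, assuming the fallible calls return `Result<_, &str>` as listed in the API Reference. `build_index` and `run` are hypothetical stand-ins, not part of marisa-rs, so the example runs with the standard library alone.

```rust
use std::error::Error;

// Hypothetical stand-in for a fallible call such as `trie.build(&mut keyset)`.
// It is not part of marisa-rs; it simply fails on an empty key list.
fn build_index(keys: &[&str]) -> Result<usize, &'static str> {
    if keys.is_empty() {
        Err("no keys to build from")
    } else {
        Ok(keys.len())
    }
}

// `?` converts the `&str` error into `Box<dyn Error>` through the standard
// `From<&str>` implementation, so no custom error type is needed.
fn run(keys: &[&str]) -> Result<usize, Box<dyn Error>> {
    let count = build_index(keys)?;
    Ok(count)
}

fn main() {
    match run(&["cat", "car"]) {
        Ok(n) => println!("built from {} keys", n),
        Err(e) => eprintln!("build failed: {}", e),
    }
}
```

Returning `Box<dyn Error>` keeps the calling code short; a dedicated error enum may be a better fit when callers need to distinguish failure kinds.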
Saving and Loading Tries
use marisa_rs::{Keyset, Trie};
// Build a trie
let mut keyset = Keyset::new();
keyset.push("hello");
keyset.push("world");
let mut trie = Trie::new();
trie.build(&mut keyset)?;
// Save the trie to a file
trie.save("my_trie.marisa")?;
// Load the trie from a file
let mut loaded_trie = Trie::new();
loaded_trie.load("my_trie.marisa")?;
// Or use memory mapping for better performance with large tries
let mut mmapped_trie = Trie::new();
mmapped_trie.mmap("my_trie.marisa")?;
// Query the serialized size in bytes (this can also be checked before saving)
println!("Trie size: {} bytes", trie.io_size());
Lookup Operations
// Exact lookup
match trie.lookup("car") {
Some(id) => println!("Found with ID: {}", id),
None => println!("Not found"),
}
// Reverse lookup (get word by ID)
match trie.reverse_lookup(0) {
Ok(word) => println!("ID 0 corresponds to: {}", word),
Err(_) => println!("Invalid ID"),
}
Search Operations
// Find all prefixes of a word
trie.common_prefix_search("cards", |word, id| {
println!("Prefix: {} (ID: {})", word, id);
// Output: "car", "card"
});
// Find all words starting with a prefix
trie.predictive_search("car", |word, id| {
println!("Word: {} (ID: {})", word, id);
// Output: "car", "card", "care"
});
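The difference between the two search modes is easy to mix up, so here is a naive reference model over a plain slice. It does not use marisa-rs and ignores IDs; `common_prefixes` and `with_prefix` are hypothetical helper names used only to illustrate which keys each search is expected to report.

```rust
/// Keys that are prefixes of `query` (the common-prefix direction).
fn common_prefixes<'a>(keys: &[&'a str], query: &str) -> Vec<&'a str> {
    keys.iter()
        .copied()
        .filter(|k| query.starts_with(*k))
        .collect()
}

/// Keys that start with `prefix` (the predictive direction).
fn with_prefix<'a>(keys: &[&'a str], prefix: &str) -> Vec<&'a str> {
    keys.iter()
        .copied()
        .filter(|k| k.starts_with(prefix))
        .collect()
}

fn main() {
    let keys = ["cat", "car", "card", "care"];

    // "car" and "card" are both prefixes of "cards"; "care" and "cat" are not.
    println!("{:?}", common_prefixes(&keys, "cards"));

    // Every key beginning with "car", including "car" itself.
    println!("{:?}", with_prefix(&keys, "car"));
}
```

This model scans every key and returns matches in slice order; the order in which a real trie reports matches may differ.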
Working with Weights
let mut keyset = Keyset::new();
// Add words with custom weights
keyset.push_back("important", 10.0);
keyset.push_back("normal", 1.0);
keyset.push_back("less_important", 0.1);
let mut trie = Trie::new();
trie.build(&mut keyset)?;
API Reference
Keyset
Keyset::new() - Create a new empty keyset
keyset.push(key) - Add a key with default weight (1.0)
keyset.push_back(key, weight) - Add a key with specified weight
keyset.size() - Get the number of keys
keyset.is_empty() - Check if the keyset is empty
Trie
Trie::new() - Create a new empty trie
trie.build(&mut keyset) - Build the trie from a keyset
trie.lookup(key) - Find the ID of a key (returns Option<usize>)
trie.reverse_lookup(id) - Find the key for an ID (returns Result<String, &str>)
trie.common_prefix_search(query, callback) - Find all keys that are prefixes of query
trie.predictive_search(query, callback) - Find all keys that start with query
trie.size() - Get the number of keys in the trie
trie.is_empty() - Check if the trie is empty
trie.save(path) - Save the trie to a file (returns Result<(), &str>)
trie.load(path) - Load a trie from a file (returns Result<(), &str>)
trie.mmap(path) - Memory-map a trie file for efficient read-only access (returns Result<(), &str>)
trie.io_size() - Get the serialized size of the trie in bytes
trie.clear() - Clear the trie, removing all keys (returns Result<(), &str>)
Japanese Text Example
use marisa_rs::{Keyset, Trie};
let mut keyset = Keyset::new();
keyset.push("あ"); // a
keyset.push("あい"); // ai (love)
keyset.push("あいて"); // aite (partner)
let mut trie = Trie::new();
trie.build(&mut keyset).unwrap();
// Works with UTF-8 strings
if let Some(id) = trie.lookup("あい") {
println!("Found Japanese word with ID: {}", id);
}
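Because keys are Rust strings, they are stored as UTF-8 bytes, and a prefix relation between strings is also a prefix relation between their byte sequences. The sketch below uses only the standard library to show this for the keys above. It assumes, without confirmation from this README, that the trie compares keys byte by byte; `byte_and_char_len` is a hypothetical helper.

```rust
/// Returns (length in bytes, length in characters) for a string.
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    let ai = "あい";
    let aite = "あいて";

    // Each hiragana character occupies three bytes in UTF-8.
    println!("{:?}", byte_and_char_len(aite));

    // The string prefix relation also holds at the byte level.
    assert!(aite.as_bytes().starts_with(ai.as_bytes()));

    // Byte offset 1 falls inside the first character, so a query should
    // never be cut there; offset 3 is a valid character boundary.
    assert!(!aite.is_char_boundary(1));
    assert!(aite.is_char_boundary(3));
}
```

When truncating a query to build a prefix, slicing at `char_indices` offsets avoids splitting a multi-byte character.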
Thread Safety
All types (Keyset, Trie, Agent) implement Send and can be transferred between threads. However, they are not Sync and cannot be shared between threads without additional synchronization.
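One common way to share a `Send`-but-not-`Sync` value between threads is to wrap it in `Arc<Mutex<_>>`. The sketch below shows the pattern with a hypothetical stand-in type, `Index`, that has the same `Send`/`!Sync` property; it is not part of marisa-rs, and whether a mutex is the right choice for a real workload is a design decision left to the caller.

```rust
use std::cell::Cell;
use std::marker::PhantomData;
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical stand-in for a type that is `Send` but not `Sync`.
// The `PhantomData<Cell<()>>` marker removes the automatic `Sync` impl.
struct Index {
    keys: Vec<String>,
    _not_sync: PhantomData<Cell<()>>,
}

impl Index {
    fn lookup(&self, key: &str) -> Option<usize> {
        self.keys.iter().position(|k| k.as_str() == key)
    }
}

// Runs one lookup per query, each on its own thread, against a shared index.
fn lookup_from_threads(keys: &[&str], queries: &[&str]) -> Vec<Option<usize>> {
    let index = Index {
        keys: keys.iter().map(|k| k.to_string()).collect(),
        _not_sync: PhantomData,
    };
    // `Mutex<T>` is `Sync` whenever `T: Send`, so the Arc can be cloned
    // into several threads even though `Index` itself is not `Sync`.
    let shared = Arc::new(Mutex::new(index));

    let handles: Vec<_> = queries
        .iter()
        .map(|q| {
            let shared = Arc::clone(&shared);
            let q = q.to_string();
            thread::spawn(move || {
                let guard = shared.lock().unwrap();
                guard.lookup(&q)
            })
        })
        .collect();

    // Joining in spawn order keeps results aligned with `queries`.
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    let results = lookup_from_threads(&["cat", "car"], &["car", "dog", "cat"]);
    println!("{:?}", results);
}
```

The mutex serializes all lookups. For read-heavy workloads, giving each thread its own value (for example, by loading the same file once per thread) avoids that contention at the cost of extra setup.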
License
This project is licensed under the LGPL, Version 2.0.
This crate is built on top of the excellent marisa-trie library by Susumu Yata.