# marisa-rs
Safe Rust bindings for [marisa-trie](https://github.com/s-yata/marisa-trie), a C++ library that implements a static, space-efficient trie data structure.
## Installation
Add this to your `Cargo.toml`:
```toml
[dependencies]
marisa-rs = "0.1"
```
## Quick Start
```rust
use marisa_rs::{Keyset, Trie};

fn main() {
    // Create a keyset and add words
    let mut keyset = Keyset::new();
    keyset.push("apple");
    keyset.push("application");
    keyset.push("apply");

    // Build the trie
    let mut trie = Trie::new();
    trie.build(&mut keyset).unwrap();

    // Lookup a word
    if let Some(id) = trie.lookup("apple") {
        println!("Found 'apple' with ID: {}", id);
    }

    // Search for words starting with "app"
    trie.predictive_search("app", |word, id| {
        println!("Found: {} (ID: {})", word, id);
    });
}
```
## Basic Usage
### Creating and Building a Trie
```rust
use marisa_rs::{Keyset, Trie};
// Create a keyset
let mut keyset = Keyset::new();
// Add words to the keyset
keyset.push("cat");
keyset.push("car");
keyset.push("card");
keyset.push("care");
// Build the trie
let mut trie = Trie::new();
trie.build(&mut keyset)?;
```
### Saving and Loading Tries
```rust
use marisa_rs::{Keyset, Trie};
// Build a trie
let mut keyset = Keyset::new();
keyset.push("hello");
keyset.push("world");
let mut trie = Trie::new();
trie.build(&mut keyset)?;
// Save the trie to a file
trie.save("my_trie.marisa")?;
// Load the trie from a file
let mut loaded_trie = Trie::new();
loaded_trie.load("my_trie.marisa")?;
// Or use memory mapping for better performance with large tries
let mut mmapped_trie = Trie::new();
mmapped_trie.mmap("my_trie.marisa")?;
// Report the serialized size in bytes (also useful to check before saving)
println!("Trie size: {} bytes", trie.io_size());
```
### Lookup Operations
```rust
// Exact lookup
match trie.lookup("car") {
    Some(id) => println!("Found with ID: {}", id),
    None => println!("Not found"),
}

// Reverse lookup (get word by ID)
match trie.reverse_lookup(0) {
    Ok(word) => println!("ID 0 corresponds to: {}", word),
    Err(_) => println!("Invalid ID"),
}
```
### Search Operations
```rust
// Find all keys that are prefixes of a word
trie.common_prefix_search("cards", |word, id| {
    println!("Prefix: {} (ID: {})", word, id);
    // Reported keys: "car", "card"
});

// Find all keys starting with a prefix
trie.predictive_search("car", |word, id| {
    println!("Word: {} (ID: {})", word, id);
    // Reported keys: "car", "card", "care"
});
```
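The two searches are easy to mix up, so here is a minimal reference sketch of what each one reports, written with plain standard-library code rather than the trie. The helper names `common_prefixes` and `predictions` are hypothetical and are not part of marisa-rs. The result order here follows the input slice, whereas the order in which the trie invokes the callback may differ.

```rust
/// Keys that are prefixes of `query`: what a common-prefix search reports.
fn common_prefixes<'a>(keys: &[&'a str], query: &str) -> Vec<&'a str> {
    keys.iter().copied().filter(|k| query.starts_with(*k)).collect()
}

/// Keys that start with `query`: what a predictive search reports.
fn predictions<'a>(keys: &[&'a str], query: &str) -> Vec<&'a str> {
    keys.iter().copied().filter(|k| k.starts_with(query)).collect()
}

fn main() {
    let keys = ["cat", "car", "card", "care"];

    // "car" and "card" are both prefixes of "cards"
    println!("{:?}", common_prefixes(&keys, "cards")); // ["car", "card"]

    // "car", "card" and "care" all start with "car"
    println!("{:?}", predictions(&keys, "car")); // ["car", "card", "care"]
}
```

This linear scan exists only to pin down the semantics. The trie answers the same questions without visiting every key.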
### Working with Weights
```rust
let mut keyset = Keyset::new();
// Add words with custom weights
keyset.push_back("important", 10.0);
keyset.push_back("normal", 1.0);
keyset.push_back("less_important", 0.1);
let mut trie = Trie::new();
trie.build(&mut keyset)?;
```
## API Reference
### Keyset
- `Keyset::new()` - Create a new empty keyset
- `keyset.push(key)` - Add a key with default weight (1.0)
- `keyset.push_back(key, weight)` - Add a key with specified weight
- `keyset.size()` - Get the number of keys
- `keyset.is_empty()` - Check if the keyset is empty
### Trie
- `Trie::new()` - Create a new empty trie
- `trie.build(&mut keyset)` - Build the trie from a keyset
- `trie.lookup(key)` - Find the ID of a key (returns `Option<usize>`)
- `trie.reverse_lookup(id)` - Find the key for an ID (returns `Result<String, &str>`)
- `trie.common_prefix_search(query, callback)` - Find all keys that are prefixes of query
- `trie.predictive_search(query, callback)` - Find all keys that start with query
- `trie.size()` - Get the number of keys in the trie
- `trie.is_empty()` - Check if the trie is empty
- `trie.save(path)` - Save the trie to a file (returns `Result<(), &str>`)
- `trie.load(path)` - Load a trie from a file (returns `Result<(), &str>`)
- `trie.mmap(path)` - Memory-map a trie file for efficient read-only access (returns `Result<(), &str>`)
- `trie.io_size()` - Get the serialized size of the trie in bytes
- `trie.clear()` - Clear the trie, removing all keys (returns `Result<(), &str>`)
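Several methods above return `Result<_, &str>`, and the earlier snippets use `?` on them. The sketch below shows one way such errors can propagate out of a function that returns `Box<dyn Error>`. `load_stub` and `open_dictionary` are hypothetical stand-ins that only mimic the listed return shape; they do not call marisa-rs. The `'static` lifetime on the error string is an assumption.

```rust
use std::error::Error;

// Hypothetical stand-in for a call such as `trie.load(path)`.
// It only mimics the `Result<(), &str>` shape and is not part of marisa-rs.
fn load_stub(path: &str) -> Result<(), &'static str> {
    if path.ends_with(".marisa") {
        Ok(())
    } else {
        Err("unexpected file extension")
    }
}

// `?` converts the `&str` error into `Box<dyn Error>` through the standard
// library's `From<&str>` implementation, so no manual mapping is needed.
fn open_dictionary(path: &str) -> Result<(), Box<dyn Error>> {
    load_stub(path)?;
    Ok(())
}

fn main() {
    match open_dictionary("words.txt") {
        Ok(()) => println!("loaded"),
        Err(e) => println!("failed: {}", e),
    }
}
```

Wrapping the calls in a function like this is also what makes the `?` in the snippets above compile, since `?` needs an enclosing function with a compatible return type.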
## Japanese Text Example
```rust
use marisa_rs::{Keyset, Trie};
let mut keyset = Keyset::new();
keyset.push("あ"); // a
keyset.push("あい"); // ai (love)
keyset.push("あいて"); // aite (partner)
let mut trie = Trie::new();
trie.build(&mut keyset).unwrap();
// Works with UTF-8 strings
if let Some(id) = trie.lookup("あい") {
    println!("Found Japanese word with ID: {}", id);
}
```
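The underlying marisa-trie library stores keys as byte sequences, so prefix relationships hold between the UTF-8 bytes of the keys rather than between characters. The sketch below uses only the standard library; `is_byte_prefix` is a hypothetical helper. It shows that each hiragana character occupies three bytes and that a byte-level prefix can end in the middle of a character. How marisa-rs itself treats such partial sequences is not covered here.

```rust
/// True when `prefix` is a byte-wise prefix of `key`.
fn is_byte_prefix(prefix: &[u8], key: &str) -> bool {
    key.as_bytes().starts_with(prefix)
}

fn main() {
    for key in ["あ", "あい", "あいて"] {
        println!("{}: {} chars, {} bytes", key, key.chars().count(), key.len());
    }

    // A whole character is a prefix of the longer keys.
    assert!(is_byte_prefix("あ".as_bytes(), "あいて"));

    // The first byte alone is also a byte-wise prefix,
    // but it is not valid UTF-8 on its own.
    let first_byte = &"あ".as_bytes()[..1];
    assert!(is_byte_prefix(first_byte, "あい"));
    assert!(std::str::from_utf8(first_byte).is_err());
}
```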
## Thread Safety
All types (`Keyset`, `Trie`, `Agent`) implement `Send` and can be transferred between threads. However, they are not `Sync` and cannot be shared between threads without additional synchronization.
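One common form of "additional synchronization" is an `Arc<Mutex<T>>`, which is `Sync` whenever `T` is `Send`. The sketch below demonstrates the pattern with a hypothetical stand-in type, `FakeTrie`, that is `Send` but deliberately not `Sync`. It is not the marisa-rs `Trie`, and its `lookup` is a plain linear scan used only for illustration.

```rust
use std::cell::Cell;
use std::marker::PhantomData;
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical stand-in for a type that is `Send` but not `Sync`.
// `PhantomData<Cell<()>>` opts the struct out of `Sync` while keeping `Send`.
struct FakeTrie {
    keys: Vec<String>,
    _not_sync: PhantomData<Cell<()>>,
}

impl FakeTrie {
    fn lookup(&self, key: &str) -> Option<usize> {
        self.keys.iter().position(|k| k.as_str() == key)
    }
}

/// Runs one lookup per query, each on its own thread, through a shared mutex.
fn parallel_lookups(keys: &[&str], queries: &[&str]) -> Vec<Option<usize>> {
    let trie = FakeTrie {
        keys: keys.iter().map(|k| k.to_string()).collect(),
        _not_sync: PhantomData,
    };
    // The mutex serializes access, which makes sharing across threads sound.
    let shared = Arc::new(Mutex::new(trie));

    let handles: Vec<_> = queries
        .iter()
        .map(|q| {
            let shared = Arc::clone(&shared);
            let query = q.to_string();
            thread::spawn(move || {
                let guard = shared.lock().unwrap();
                guard.lookup(&query)
            })
        })
        .collect();

    // Join in spawn order so results line up with `queries`.
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    let ids = parallel_lookups(&["cat", "car"], &["car", "dog", "cat"]);
    println!("{:?}", ids); // [Some(1), None, Some(0)]
}
```

Because every lookup takes the lock, this design trades parallelism for simplicity. Moving a separate instance into each thread avoids the lock and relies only on `Send`.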
## License
This project is licensed under the LGPL, Version 2.0.

This crate is built on top of the excellent [marisa-trie](https://github.com/s-yata/marisa-trie) library by Susumu Yata.