marisa-rs

Safe Rust wrapper for the marisa-trie C++ library.

marisa-trie is a static, space-efficient trie data structure library: a trie is built once from a set of keys and cannot be modified afterwards.

Installation

Add this to your Cargo.toml:

[dependencies]
marisa-rs = "0.1"

Requirements

This crate requires the marisa-trie C++ library to be installed on your system.

Ubuntu/Debian

sudo apt-get install libmarisa-dev

macOS

brew install marisa-trie

Quick Start

use marisa_rs::{Keyset, Trie};

fn main() {
    // Create a keyset and add words
    let mut keyset = Keyset::new();
    keyset.push("apple");
    keyset.push("application");
    keyset.push("apply");

    // Build the trie
    let mut trie = Trie::new();
    trie.build(&mut keyset).unwrap();

    // Lookup a word
    if let Some(id) = trie.lookup("apple") {
        println!("Found 'apple' with ID: {}", id);
    }

    // Search for words starting with "app"
    trie.predictive_search("app", |word, id| {
        println!("Found: {} (ID: {})", word, id);
    });
}

Basic Usage

Creating and Building a Trie

use marisa_rs::{Keyset, Trie};

// Create a keyset
let mut keyset = Keyset::new();

// Add words to the keyset
keyset.push("cat");
keyset.push("car");
keyset.push("card");
keyset.push("care");

// Build the trie
let mut trie = Trie::new();
// `?` requires an enclosing function that returns a compatible Result
trie.build(&mut keyset)?;

Lookup Operations

// Exact lookup
match trie.lookup("car") {
    Some(id) => println!("Found with ID: {}", id),
    None => println!("Not found"),
}

// Reverse lookup (get word by ID)
match trie.reverse_lookup(0) {
    Ok(word) => println!("ID 0 corresponds to: {}", word),
    Err(_) => println!("Invalid ID"),
}
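In marisa-trie, each distinct key gets a unique ID in the range 0..size, so lookup and reverse_lookup should be inverses of each other. The sketch below checks that property through two closures. The `round_trips` helper is hypothetical (it is not part of marisa-rs), and a `Vec` stands in for the trie so the example runs without the C++ library.

```rust
/// Hypothetical helper: returns true if every key survives a
/// lookup -> reverse_lookup round trip. The closures are meant to wrap
/// trie.lookup and trie.reverse_lookup, but any functions of these shapes work.
fn round_trips<E>(
    keys: &[&str],
    lookup: impl Fn(&str) -> Option<usize>,
    reverse_lookup: impl Fn(usize) -> Result<String, E>,
) -> bool {
    keys.iter().all(|&key| match lookup(key) {
        // The ID must map back to exactly the same key.
        Some(id) => reverse_lookup(id).map_or(false, |word| word == key),
        // A key that was pushed into the keyset should always be found.
        None => false,
    })
}

fn main() {
    // Stand-in for a built trie: a key's ID is its index in this Vec.
    let words: Vec<String> = ["car", "card", "care", "cat"]
        .iter()
        .map(|w| w.to_string())
        .collect();
    let lookup = |k: &str| words.iter().position(|w| w == k);
    let reverse_lookup = |id: usize| words.get(id).cloned().ok_or("invalid ID");

    println!("{}", round_trips(&["cat", "car", "card", "care"], lookup, reverse_lookup));
}
```

With the real crate, the closures would wrap the trie's methods, e.g. `|k: &str| trie.lookup(k)` and `|id: usize| trie.reverse_lookup(id)`. That assumes both methods take `&self`, which this README does not state.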

Search Operations

// Find all prefixes of a word
trie.common_prefix_search("cards", |word, id| {
    println!("Prefix: {} (ID: {})", word, id);
    // Matches "car" and "card" (shortest prefix first)
});

// Find all words starting with a prefix
trie.predictive_search("car", |word, id| {
    println!("Word: {} (ID: {})", word, id);
    // Matches "car", "card" and "care"
});
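To make the difference between the two searches concrete, here is a minimal std-only model of what each one reports. It uses linear scans over a slice instead of a trie and ignores IDs, so it illustrates the semantics only; the function names are hypothetical. The order of predictive_search results in marisa-trie may not be alphabetical, which is why the model sorts explicitly.

```rust
/// Keys that are prefixes of `query`, shortest first.
/// Models what `common_prefix_search` reports.
fn common_prefixes<'a>(keys: &[&'a str], query: &str) -> Vec<&'a str> {
    let mut hits: Vec<&'a str> = keys
        .iter()
        .copied()
        .filter(|k| query.starts_with(*k))
        .collect();
    // A trie walk reaches shorter prefixes first, so sort by length.
    hits.sort_by_key(|k| k.len());
    hits
}

/// Keys that start with `prefix`.
/// Models what `predictive_search` reports; sorted here only so the
/// output is deterministic.
fn predictive<'a>(keys: &[&'a str], prefix: &str) -> Vec<&'a str> {
    let mut hits: Vec<&'a str> = keys
        .iter()
        .copied()
        .filter(|k| k.starts_with(prefix))
        .collect();
    hits.sort_unstable();
    hits
}

fn main() {
    let keys = ["cat", "car", "card", "care"];
    println!("{:?}", common_prefixes(&keys, "cards")); // ["car", "card"]
    println!("{:?}", predictive(&keys, "car")); // ["car", "card", "care"]
}
```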

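Working with Weights

Each key can carry a weight. In the underlying marisa-trie library, weights are a hint for arranging trie nodes (by default, nodes with heavier weights come first), which can make frequently searched keys faster to find; they do not change which keys a search returns.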

let mut keyset = Keyset::new();

// Add words with custom weights
keyset.push_back("important", 10.0);
keyset.push_back("normal", 1.0);
keyset.push_back("less_important", 0.1);

let mut trie = Trie::new();
// `?` requires an enclosing function that returns a compatible Result
trie.build(&mut keyset)?;
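One common way to choose weights is to use observed frequencies. The sketch below counts words in a text and returns (word, weight) pairs ready to be pushed into a keyset. `frequency_weights` is a hypothetical helper, and the `f32` weight type is an assumption: the README only shows literals such as 10.0.

```rust
use std::collections::HashMap;

/// Hypothetical helper: counts whitespace-separated words in `corpus` and
/// returns (word, weight) pairs, heaviest first. Weights are raw counts.
fn frequency_weights(corpus: &str) -> Vec<(String, f32)> {
    let mut counts: HashMap<&str, u32> = HashMap::new();
    for word in corpus.split_whitespace() {
        *counts.entry(word).or_insert(0) += 1;
    }
    let mut pairs: Vec<(String, f32)> = counts
        .into_iter()
        .map(|(word, count)| (word.to_string(), count as f32))
        .collect();
    // Descending weight; ties broken alphabetically so the output is stable
    // despite HashMap's arbitrary iteration order.
    pairs.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    pairs
}

fn main() {
    let corpus = "apple apply apple application apple apply";
    for (word, weight) in frequency_weights(corpus) {
        // With marisa-rs this might become: keyset.push_back(&word, weight);
        println!("{word}: {weight}");
    }
}
```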

API Reference

Keyset

  • Keyset::new() - Create a new empty keyset
  • keyset.push(key) - Add a key with default weight (1.0)
  • keyset.push_back(key, weight) - Add a key with specified weight
  • keyset.size() - Get the number of keys
  • keyset.is_empty() - Check if the keyset is empty

Trie

  • Trie::new() - Create a new empty trie
  • trie.build(&mut keyset) - Build the trie from a keyset
  • trie.lookup(key) - Find the ID of a key (returns Option<usize>)
  • trie.reverse_lookup(id) - Find the key for an ID (returns Result<String, &str>)
  • trie.common_prefix_search(query, callback) - Find all keys that are prefixes of query
  • trie.predictive_search(query, callback) - Find all keys that start with query
  • trie.size() - Get the number of keys in the trie
  • trie.is_empty() - Check if the trie is empty

Japanese Text Example

use marisa_rs::{Keyset, Trie};

let mut keyset = Keyset::new();
keyset.push("あ");      // a
keyset.push("あい");    // ai (love)
keyset.push("あいて");  // aite (partner)

let mut trie = Trie::new();
trie.build(&mut keyset).unwrap();

// Works with UTF-8 strings
if let Some(id) = trie.lookup("あい") {
    println!("Found Japanese word with ID: {}", id);
}
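marisa-trie compares keys as raw bytes, not characters. That is safe for UTF-8: a valid UTF-8 key always ends on a complete character, so any key that is a byte prefix of a valid query also ends on a character boundary of that query. The std-only sketch below (the helper name is hypothetical) shows this for the keys above, where each character takes 3 bytes.

```rust
/// Hypothetical helper: for each key that is a byte prefix of `query`,
/// returns (length in bytes, length in chars).
fn utf8_prefix_lengths(keys: &[&str], query: &str) -> Vec<(usize, usize)> {
    keys.iter()
        // Byte-wise comparison, the way a byte-oriented trie matches keys.
        .filter(|k| query.as_bytes().starts_with(k.as_bytes()))
        .map(|k| {
            // A valid UTF-8 key ends on a complete character, so its byte
            // length is also a character boundary inside `query`.
            assert!(query.is_char_boundary(k.len()));
            (k.len(), k.chars().count())
        })
        .collect()
}

fn main() {
    let keys = ["あ", "あい", "あいて"];
    for (bytes, chars) in utf8_prefix_lengths(&keys, "あいて") {
        println!("{chars} char(s) = {bytes} bytes");
    }
}
```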

Thread Safety

All types (Keyset, Trie, Agent) implement Send, so they can be moved to another thread. They are not Sync, however, so a single instance cannot be used from several threads at once without external synchronization.
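The usual way to share a value that is Send but not Sync is Arc<Mutex<T>>: Mutex<T> is Sync whenever T is Send, while Arc<T> on its own can only be sent across threads when T is Sync. The sketch below uses std's RefCell, which has the same Send-but-not-Sync combination, as a stand-in for Trie so it runs without the C++ library; the helper functions are hypothetical.

```rust
use std::cell::RefCell;
use std::sync::{Arc, Mutex};
use std::thread;

// Stand-in for `Trie`: RefCell<T> is Send but not Sync, the combination
// described above. It is NOT the marisa-rs type.
type Handle = RefCell<Vec<String>>;

/// Moving a Send value into another thread needs no extra wrapper.
fn move_to_thread(handle: Handle) -> usize {
    thread::spawn(move || {
        let len = handle.borrow().len();
        len
    })
    .join()
    .unwrap()
}

/// Sharing one value among threads: Arc<Handle> alone would not compile,
/// because Arc<T> is only Send when T is Sync, but Mutex<Handle> is Sync.
fn share_across_threads(handle: Handle, workers: usize) -> usize {
    let shared = Arc::new(Mutex::new(handle));
    let joins: Vec<_> = (0..workers)
        .map(|i| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                // Only one thread at a time can hold the lock.
                let guard = shared.lock().unwrap();
                guard.borrow_mut().push(format!("worker {i}"));
            })
        })
        .collect();
    for join in joins {
        join.join().unwrap();
    }
    let len = shared.lock().unwrap().borrow().len();
    len
}

fn main() {
    let moved = move_to_thread(RefCell::new(vec!["apple".to_string()]));
    let shared = share_across_threads(RefCell::new(Vec::new()), 4);
    println!("moved: {moved}, shared: {shared}");
}
```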

License

This project is licensed under either of

  • Apache License, Version 2.0
  • MIT license

at your option.

This crate is built on top of the excellent marisa-trie library by Susumu Yata.