I'm writing a bog-standard Unicode tokeniser to replace the crap one SQLite ships, and I'm wondering why I'm wasting my life writing C code again.
In the same vein as the old post I re-tooted, is there a Rust guru out there who can tell me if:
* Decent ICU bindings, or equivalent Unicode normalisation, case folding, and word-break analysis, exist for Rust? (the last being key)
* Decent SQLite FTS5 custom tokeniser bindings or equivalent exist for Rust?
@YaLTeR thanks for the pointer! I took a look and it might not handle word segmentation for CJK/Thai/etc., which is mostly the point of using ICU.
I'll check it out though, probably a good project to get my feet wet.