Commit Graph

11 Commits

Author SHA1 Message Date
Clément Renault
a799470997
fix: Change the tokenizer to measure CJK char positions 2019-02-22 23:06:42 +01:00
Clément Renault
10414791a2
fix: Remove debug println from the tokenizer 2019-02-22 22:34:37 +01:00
Clément Renault
0e267cae4b
feat: Make the Tokenizer support Kanji 2019-02-22 19:37:19 +01:00
Clément Renault
5070b27728
feat: Make the tokenizer support parentheses
Interpreting them as hard punctuation (like a dot).
2019-02-22 18:18:17 +01:00
Clément Renault
b53ef08d05
feat: Make WordArea be based on char index and length 2019-01-09 20:14:08 +01:00
Clément Renault
b32c96cdc9
feat: Introduce a WordArea struct
Useful to highlight matching areas in the original text.
2018-12-24 15:58:46 +01:00
Clément Renault
731ed11153
feat: Index and store/serialize attributes while creating the update 2018-12-07 11:32:27 +01:00
Clément Renault
b2cec98805
feat: Implemented a basic deserialization 2018-12-06 17:22:54 +01:00
Clément Renault
b3249d515d
feat: Introduce an Index system based on RocksDB 2018-12-02 12:00:29 +01:00
Clément Renault
98899d3ea0
fix: Change the tokenizer to accept quotes 2018-10-17 17:00:49 +02:00
Clément Renault
7a668dde98
chore: Make the repo use examples and keep the library 2018-10-09 18:23:35 +02:00