433: fix(filter): Fix two bugs. r=Kerollmops a=irevoire
- Stop lowercasing the field when looking in the field id map
- When a field id does not exist it means there is currently zero
documents containing this field thus we return an empty RoaringBitmap
instead of throwing an internal error
Will fix https://github.com/meilisearch/MeiliSearch/issues/2082 once meilisearch is released
Co-authored-by: Tamo <tamo@meilisearch.com>
- Stop lowercasing the field when looking in the field id map
- When a field id does not exist it means there is currently zero
documents containing this field thus we returns an empty RoaringBitmap
instead of throwing an internal error
The `matching_bytes` function takes a `&Token` now and:
- gets the number of bytes to highlight (unchanged).
- uses `Token.num_graphemes_from_bytes` to get the number of grapheme
clusters to highlight.
In essence, the `matching_bytes` function returns the number of matching
grapheme clusters instead of bytes. Should this function be renamed
then?
Added proper highlighting in the HTTP UI:
- requires dependency on `unicode-segmentation` to extract grapheme
clusters from tokens
- `<mark>` tag is put around only the matched part
- before this change, the entire word was highlighted even if only a
part of it matched
402: Optimize document transform r=MarinPostma a=MarinPostma
This pr optimizes the transform of documents additions in the obkv format. Instead on accepting any serializable objects, we instead treat json and CSV specifically:
- For json, we build a serde `Visitor`, that transform the json straight into obkv without intermediate representation.
- For csv, we directly write the lines in the obkv, applying other optimization as well.
Co-authored-by: marin postma <postma.marin@protonmail.com>