Commit Graph

10 Commits

Author SHA1 Message Date
Samyak S Sarnayak
e752bd06f7
Fix matching_words tests to compile successfully
The tests still fail due to a bug in https://github.com/meilisearch/tokenizer/pull/59
2022-01-17 11:37:45 +05:30
Samyak S Sarnayak
30247d70cd
Fix search highlight for non-unicode chars
The `matching_bytes` function takes a `&Token` now and:
- gets the number of bytes to highlight (unchanged).
- uses `Token.num_graphemes_from_bytes` to get the number of grapheme
  clusters to highlight.

In essence, the `matching_bytes` function returns the number of matching
grapheme clusters instead of bytes. Should this function be renamed
then?

Added proper highlighting in the HTTP UI:
- requires dependency on `unicode-segmentation` to extract grapheme
  clusters from tokens
- `<mark>` tag is put around only the matched part
    - before this change, the entire word was highlighted even if only a
      part of it matched
2022-01-17 11:37:44 +05:30
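
To illustrate the byte-versus-grapheme distinction described in commit 30247d70cd above, here is a minimal sketch using only the Rust standard library. It is not the code from the commit: the function names `num_chars_from_bytes` and `highlight_prefix` are hypothetical, and the sketch counts `char`s (Unicode scalar values) as a stand-in for grapheme clusters, which the commit obtains via `unicode-segmentation`. For text with combining marks the two counts can differ.

```rust
/// Hypothetical helper: converts a matched byte length into a count of `char`s.
/// Assumption: `char`s approximate grapheme clusters here; the commit itself
/// relies on the `unicode-segmentation` crate for real grapheme clusters.
/// Returns `None` if `num_bytes` is out of range or splits a multi-byte char.
fn num_chars_from_bytes(word: &str, num_bytes: usize) -> Option<usize> {
    word.get(..num_bytes).map(|prefix| prefix.chars().count())
}

/// Hypothetical helper: wraps only the first `num_chars` chars of `word`
/// in a `<mark>` tag, leaving the unmatched remainder outside the tag.
fn highlight_prefix(word: &str, num_chars: usize) -> String {
    // Byte offset where the unmatched remainder starts (end of word if the
    // match covers every char).
    let split = word
        .char_indices()
        .nth(num_chars)
        .map(|(idx, _)| idx)
        .unwrap_or(word.len());
    let (matched, rest) = word.split_at(split);
    if matched.is_empty() {
        // Nothing matched, so there is nothing to mark.
        return word.to_string();
    }
    format!("<mark>{}</mark>{}", matched, rest)
}
```

With the word `Crème` and a match covering its first 4 bytes, `num_chars_from_bytes` yields 3, and `highlight_prefix` with that count marks `Crè` only. Slicing by the raw byte count as if it were a character count would mark one character too many.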
many
9f62149b94
Fix matching length in matching_words
2021-07-01 19:03:28 +02:00
Irevoire
6044b80362
Update milli/src/search/matching_words.rs
Co-authored-by: Clément Renault <renault.cle@gmail.com>
2021-06-30 00:35:26 +02:00
Tamo
be75e738b1
add more tests
2021-06-29 16:24:58 +02:00
Tamo
56fceb1928
re-implement the Damerau-Levenshtein used for the highlighting
2021-06-29 15:36:03 +02:00
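
For readers unfamiliar with the distance named in commit 56fceb1928, the following is a minimal sketch of the optimal string alignment variant (often called restricted Damerau-Levenshtein). It is an illustration only and not the implementation from the commit: the function name `osa_distance` is hypothetical, and the commit's version may differ, for example in how it handles prefixes for highlighting.

```rust
/// Hypothetical sketch: optimal string alignment distance (restricted
/// Damerau-Levenshtein) between two strings, computed over `char`s.
/// Allowed edits: insertion, deletion, substitution, and transposition of
/// two adjacent chars, where no substring is edited more than once.
fn osa_distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let (n, m) = (a.len(), b.len());

    // d[i][j] = distance between the first i chars of a and the first j of b.
    let mut d = vec![vec![0usize; m + 1]; n + 1];
    for i in 0..=n {
        d[i][0] = i; // delete all i chars
    }
    for j in 0..=m {
        d[0][j] = j; // insert all j chars
    }

    for i in 1..=n {
        for j in 1..=m {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            let mut best = (d[i - 1][j] + 1) // deletion
                .min(d[i][j - 1] + 1) // insertion
                .min(d[i - 1][j - 1] + cost); // substitution or match
            // Transposition of two adjacent chars.
            if i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1] {
                best = best.min(d[i - 2][j - 2] + 1);
            }
            d[i][j] = best;
        }
    }
    d[n][m]
}
```

Under this definition a swapped pair such as `hlelo` versus `hello` costs one edit, where plain Levenshtein distance would count two.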
Tamo
9716fb3b36
format the whole project
2021-06-16 18:33:33 +02:00
many
e923a3ed6a
Replace Consecutive by Phrase in query tree
Replace Consecutive by Phrase in the query tree in order to remove theoretical bugs
caused by the Consecutive enum type.
2021-06-10 11:16:16 +02:00
many
225ae6fd25
Resolve PR comments
2021-06-01 11:53:09 +02:00
many
1df68d342a
Make the MatchingWords return the number of matching bytes
2021-05-31 18:22:29 +02:00