mirror of https://github.com/meilisearch/meilisearch.git synced 2025-02-20 17:45:54 +08:00

History

426: Fix search highlight for non-unicode chars r=ManyTheFish a=Samyak2

# Pull Request

## What does this PR do?
Fixes https://github.com/meilisearch/MeiliSearch/issues/1480
<!-- Please link the issue you're trying to fix with this PR, if none then please create an issue first. -->

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

## Changes

The `matching_bytes` function takes a `&Token` now and:
- gets the number of bytes to highlight (unchanged).
- uses `Token.num_graphemes_from_bytes` to get the number of grapheme clusters to highlight.

In essence, the `matching_bytes` function now returns the number of matching grapheme clusters instead of bytes.

Added proper highlighting in the HTTP UI:
- requires dependency on `unicode-segmentation` to extract grapheme clusters from tokens
- `<mark>` tag is put around only the matched part
    - before this change, the entire word was highlighted even if only a part of it matched

## Questions

Since `matching_bytes` does not return number of bytes but grapheme clusters, should it be renamed to something like `matching_chars` or `matching_graphemes`? Will this break the API?

Thank you very much `@ManyTheFish` for helping 😄 

Co-authored-by: Samyak S Sarnayak <samyak201@gmail.com>

2022-01-17 13:39:00 +00:00

fuzz

apply review comments

2022-01-13 18:51:08 +01:00

src

Run cargo fmt on matching_words.rs

2022-01-17 13:04:33 +05:30

tests

remove update_id in UpdateBuilder

2021-11-16 13:07:04 +01:00

Cargo.toml

Update tokenizer to v0.2.7

2022-01-17 13:02:00 +05:30

README.md

update the readme + dependencies

2022-01-12 18:30:11 +01:00

README.md

Milli

Fuzzing milli

Currently you can only fuzz the indexation. To execute the fuzzer run:

cargo +nightly fuzz run indexing

To execute the fuzzer on multiple thread you can also run:

cargo +nightly fuzz run -j4 indexing

Since the fuzzer is going to create a lot of temporary file to let milli index its documents I would also recommand to execute it on a ramdisk. Here is how to setup a ramdisk on linux:

sudo mount -t tmpfs none path/to/your/ramdisk

And then set the TMPDIR environment variable to make the fuzzer create its file in it:

export TMPDIR=path/to/your/ramdisk