mirror of https://github.com/meilisearch/meilisearch.git synced 2024-11-23 10:37:41 +08:00

History

bors[bot] 38d23546a5 Merge #431 431: Fix and improve word prefix pair proximity r=ManyTheFish a=Kerollmops This PR first fixes the algorithm we used to select and compute the word prefix pair proximity database. The previous version was skipping nearly all of the prefixes. The issue is that this fix made this method to take more time and we were trying to reduce the time spent in it. With `@ManyTheFish` we found out that we could skip some of the work we were doing by: - discarding the prefixes that were shorter than a specific threshold (default: 2). - discarding the word prefix pairs with proximity bigger than a specific threshold (default: 4). - remove the unused threshold that was specifying a minimum amount of word docids to merge. We will take more time to do some more optimization, like stop clearing and recomputing from scratch the database, we will compute the subsets of keys to create, keep and merge. This change is a little bit more complex than what this PR does. I keep this PR as a draft as I want to further test the real gain if it is enough or not if it is valid or not. I advise reviewers to review commit by commit to see the changes bit by bit, reviewing the whole PR can be hard. Co-authored-by: Clément Renault <clement@meilisearch.com>		2022-01-27 07:04:56 +00:00
..
fuzz	fix(fuzzer): fix the fuzzer after #430	2022-01-25 12:08:47 +01:00
src	Apply suggestions from code review	2022-01-26 11:28:11 +01:00
tests	document batch support	2022-01-19 12:40:20 +01:00
Cargo.toml	bump milli	2022-01-17 16:41:34 +01:00
README.md	update the readme + dependencies	2022-01-12 18:30:11 +01:00

README.md

Milli

Fuzzing milli

Currently you can only fuzz the indexation. To execute the fuzzer run:

cargo +nightly fuzz run indexing

To execute the fuzzer on multiple thread you can also run:

cargo +nightly fuzz run -j4 indexing

Since the fuzzer is going to create a lot of temporary file to let milli index its documents I would also recommand to execute it on a ramdisk. Here is how to setup a ramdisk on linux:

sudo mount -t tmpfs none path/to/your/ramdisk

And then set the TMPDIR environment variable to make the fuzzer create its file in it:

export TMPDIR=path/to/your/ramdisk