meilisearch

mirror of https://github.com/meilisearch/meilisearch.git synced 2025-02-20 09:38:00 +08:00

Author	SHA1	Message	Date
bors[bot]	c3f4835e8e	Merge #733 733: Avoid a prefix-related worst-case scenario in the proximity criterion r=loiclec a=loiclec # Pull Request ## Related issue Somewhat fixes (until merged into meilisearch) https://github.com/meilisearch/meilisearch/issues/3118 ## What does this PR do? When a query ends with a word and a prefix, such as: ``` word pr ``` Then we first determine whether `pre` could possibly be in the proximity prefix database before querying it. There are then three possibilities: 1. `pr` is not in any prefix cache because it is not the prefix of many words. We don't query the proximity prefix database. Instead, we list all the word derivations of `pre` through the FST and query the regular proximity databases. 2. `pr` is in the prefix cache but cannot be found in the proximity prefix databases. In this case, we partially disable the proximity ranking rule for the pair `word pre`. This is done as follows: 1. Only find the documents where `word` is in proximity to `pre` exactly (no derivations) 2. Otherwise, assume that their proximity in all the documents in which they coexist is >= 8 3. `pr` is in the prefix cache and can be found in the proximity prefix databases. In this case we simply query the proximity prefix databases. Note that if a prefix is longer than 2 bytes, then it cannot be in the proximity prefix databases. Also, proximities larger than 4 are not present in these databases either. Therefore, the impact on relevancy is: 1. For common prefixes of one or two letters: we no longer distinguish between proximities from 4 to 8 2. For common prefixes of more than two letters: we no longer distinguish between any proximities 3. For uncommon prefixes: nothing changes Regarding (1), it means that these two documents would be considered equally relevant according to the proximity rule for the query `heard pr` (IF `pr` is the prefix of more than 200 words in the dataset): ```json [ { "text": "I heard there is a faster proximity criterion" }, { "text": "I heard there is a faster but less relevant proximity criterion" } ] ``` Regarding (2), it means that two documents would be considered equally relevant according to the proximity rule for the query "faster pro": ```json [ { "text": "I heard there is a faster but less relevant proximity criterion" } { "text": "I heard there is a faster proximity criterion" }, ] ``` But the following document would be considered more relevant than the two documents above: ```json { "text": "I heard there is a faster swimmer who is competing in the pro section of the competition " } ``` Note, however, that this change of behaviour only occurs when using the set-based version of the proximity criterion. In cases where there are fewer than 1000 candidate documents when the proximity criterion is called, this PR does not change anything. --- ## Performance I couldn't use the existing search benchmarks to measure the impact of the PR, but I did some manual tests with the `songs` benchmark dataset. ``` 1. 10x 'a': - 640ms ⟹ 630ms = no significant difference 2. 10x 'b': - set-based: 4.47s ⟹ 7.42 = bad, ~2x regression - dynamic: 1s ⟹ 870 ms = no significant difference 3. 'Someone I l': - set-based: 250ms ⟹ 12 ms = very good, x20 speedup - dynamic: 21ms ⟹ 11 ms = good, x2 speedup 4. 'billie e': - set-based: 623ms ⟹ 2ms = very good, x300 speedup - dynamic: ~4ms ⟹ 4ms = no difference 5. 'billie ei': - set-based: 57ms ⟹ 20ms = good, ~2x speedup - dynamic: ~4ms ⟹ ~2ms. = no significant difference 6. 'i am getting o' - set-based: 300ms ⟹ 60ms = very good, 5x speedup - dynamic: 30ms ⟹ 6ms = very good, 5x speedup 7. 'prologue 1 a 1: - set-based: 3.36s ⟹ 120ms = very good, 30x speedup - dynamic: 200ms ⟹ 30ms = very good, 6x speedup 8. 'prologue 1 a 10': - set-based: 590ms ⟹ 18ms = very good, 30x speedup - dynamic: 82ms ⟹ 35ms = good, ~2x speedup ``` Performance is often significantly better, but there is also one regression in the set-based implementation with the query `b b b b b b b b b b`. Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>	2023-01-04 09:00:50 +00:00
bors[bot]	49f58b2c47	Merge #732 732: Interpret synonyms as phrases r=loiclec a=loiclec # Pull Request ## Related issue Fixes (when merged into meilisearch) https://github.com/meilisearch/meilisearch/issues/3125 ## What does this PR do? We now map multi-word synonyms to phrases instead of loose words. Such that the request: ``` btw I am going to nyc soon ``` is interpreted as (when the synonym interpretation is chosen for both `btw` and `nyc`): ``` "by the way" I am going to "New York City" soon ``` instead of: ``` by the way I am going to New York City soon ``` This prevents queries containing multi-word synonyms to exceed to word length limit and degrade the search performance. In terms of relevancy, there is a debate to have. I personally think this could be considered an improvement, since it would be strange for a user to search for: ``` good DIY project ``` and have a result such as: ``` { "text": "whether it is a good project to do, you'll have to decide for yourself" } ``` However, for synonyms such as `NYC -> New York City`, then we will stop matching documents where `New York` is separated from `City`. This is however solvable by adding an additional mapping: `NYC -> New York`. ## Performance With the old behaviour, some long search requests making heavy uses of synonyms could take minutes to be executed. This is no longer the case, these search requests now take an average amount of time to be resolved. Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>	2023-01-04 08:34:18 +00:00
bors[bot]	6a10e85707	Merge #736 736: Update charabia r=curquiza a=ManyTheFish Update Charabia to the last version. > We are now Romanizing Chinese characters into Pinyin. > Note that we keep the accent because they are in fact never typed directly by the end-user, moreover, changing an accent leads to a different Chinese character, and I don't have sufficient knowledge to forecast the impact of removing accents in this context. Co-authored-by: ManyTheFish <many@meilisearch.com>	2023-01-03 15:44:41 +00:00
bors[bot]	c505fa9d7d	Merge #758 758: Bump taiki-e/install-action from 1 to 2 r=curquiza a=dependabot[bot] Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 1 to 2. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/taiki-e/install-action/releases">taiki-e/install-action's releases</a>.</em></p> <blockquote> <h2>2.0.0</h2> <p>This release implements a mechanism to automatically track the latest version of the tool on our end. (<a href="https://github-redirect.dependabot.com/taiki-e/install-action/pull/27">#27</a>) Hopefully, this will avoid situations such as "new version of the tool has been released, but the maintainer has not been aware of it for a number of months". This also makes it easier to add support for new tools.</p> <p>This release also includes the following improvements:</p> <ul> <li> <p>Verify SHA256 checksums for downloaded files in all tools installed from GH releases. (<a href="https://github-redirect.dependabot.com/taiki-e/install-action/pull/27">#27</a>)</p> </li> <li> <p>Support omitting the patch/minor version in all tools installed from GH releases. (<a href="https://github-redirect.dependabot.com/taiki-e/install-action/pull/27">#27</a>)</p> <p>For example:</p> <pre lang="yaml"><code>- uses: taiki-e/install-action@v2 with: tool: cargo-hack@0.5 </code></pre> <p>You can also omit the minor version if the major version of tool is 1 or greater.</p> </li> <li> <p>Support <code>just</code>. (<a href="https://github-redirect.dependabot.com/taiki-e/install-action/pull/34">#34</a>)</p> </li> <li> <p>Support <code>dprint</code>. (<a href="https://github-redirect.dependabot.com/taiki-e/install-action/pull/34">#34</a>)</p> </li> </ul> <p>Note: This release is considered a breaking change because installing on versions not yet recognized by the action or on pre-release versions will no longer work with this release. (They were never officially supported, but they could work before.) Please submit an issue if you need these supports again.</p> <h2>1.17.3</h2> <ul> <li>Update <code>wasmtime@latest</code> to 4.0.0.</li> </ul> <h2>1.17.2</h2> <ul> <li>Update <code>mdbook@latest</code> to 0.4.25.</li> </ul> <h2>1.17.1</h2> <ul> <li>Update <code>mdbook@latest</code> to 0.4.23.</li> <li>Support <code>mdbook</code> on Linux (musl).</li> <li>Update <code>cargo-llvm-cov@latest</code> to 0.5.3.</li> </ul> <h2>1.17.0</h2> <ul> <li>Update <code>protoc@latest</code> to 3.21.12.</li> <li>Support aarch64 self-hosted runners (Linux, macOS, Windows).</li> <li>Improve support for Fedora/RHEL based containers/self-hosted runners.</li> </ul> <h2>1.16.0</h2> <ul> <li> <p>Update <code>cargo-binstall@latest</code> to 0.18.1. (<a href="https://github-redirect.dependabot.com/taiki-e/install-action/pull/32">#32</a>, thanks <a href="https://github.com/NobodyXu"><code>`@NobodyXu</code></a>)</p>` </li> <li> <p>If the host environment lacks packages required for installation, such as <code>curl</code> or <code>tar</code>, install them if possible.</p> <p>It is mainly intended to make the use of this action easy on containers or self-hosted runners, and currently supports Debian-based distributions (including Ubuntu) and Alpine.</p> </li> </ul> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/taiki-e/install-action/blob/main/CHANGELOG.md">taiki-e/install-action's changelog</a>.</em></p> </details> <details> <summary>Commits</summary> <ul> <li><a href="`8ffc26aecd`"><code>8ffc26a</code></a> Release 2.0.1</li> <li><a href="`82e9eb5996`"><code>82e9eb5</code></a> Update DEVELOPMENT.md</li> <li><a href="`d3b7ad8380`"><code>d3b7ad8</code></a> Update changelog</li> <li><a href="`f1a96ee3ed`"><code>f1a96ee</code></a> Update changelog when manifest update</li> <li><a href="`46063c186c`"><code>46063c1</code></a> Update cargo-minimal-versions</li> <li><a href="`048586d7a8`"><code>048586d</code></a> Update cargo-hack</li> <li><a href="`d117b8d41a`"><code>d117b8d</code></a> Remove outdated todo</li> <li><a href="`76828c33cd`"><code>76828c3</code></a> Release 2.0.0</li> <li><a href="`eea8c318de`"><code>eea8c31</code></a> Update readme and changelog</li> <li><a href="`ab0e193cf5`"><code>ab0e193</code></a> Support dprint</li> <li>Additional commits viewable in <a href="https://github.com/taiki-e/install-action/compare/v1...v2">compare view</a></li> </ul> </details> <br /> [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=taiki-e/install-action&package-manager=github_actions&previous-version=1&new-version=2)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) You can trigger a rebase of this PR by commenting ``@dependabot` rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - ``@dependabot` rebase` will rebase this PR - ``@dependabot` recreate` will recreate this PR, overwriting any edits that have been made to it - ``@dependabot` merge` will merge this PR after your CI passes on it - ``@dependabot` squash and merge` will squash and merge this PR after your CI passes on it - ``@dependabot` cancel merge` will cancel a previously requested merge and block automerging - ``@dependabot` reopen` will reopen this PR if it is closed - ``@dependabot` close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - ``@dependabot` ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - ``@dependabot` ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - ``@dependabot` ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) </details> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-01-03 10:44:36 +00:00
bors[bot]	9519e60f97	Merge #709 709: Optimise the `ExactWords` sub-criterion within `Exactness` r=loiclec a=loiclec # Pull Request ## Related issue Fixes (partially) https://github.com/meilisearch/meilisearch/issues/3116 ## What does this PR do? 1. Reduces the algorithmic complexity of finding the documents containing N exact words from something that is exponential to something that is polynomial. 2. Cache intermediary results between different calls to the `exactness` criterion. ## Performance Results On the `smol_songs.csv` dataset, a request containing 10 common words now takes about 60ms instead of 5 seconds to execute. For example, this is the case with this (admittedly nonsensical) request: `Rock You Hip Hop Folk World Country Electronic Love The`. Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>	2023-01-02 12:28:30 +00:00
Loïc Lecrenier	b5df889dcb	Apply review suggestions: simplify implementation of exactness criterion	2023-01-02 13:11:47 +01:00
bors[bot]	31155dce4c	Merge #752 752: Simplify primary key inference r=irevoire a=dureuill # Pull Request ## Related issue Related to https://github.com/meilisearch/meilisearch/issues/3233 ## What does this PR do? ### User PoV - Change primary key inference to only consider a value as a candidate when it ends with "id", rather than when it simply contains "id". - Change primary key inference to always fail when there are multiple candidates. - Replace UserError::MissingPrimaryKey with `UserError::NoPrimaryKeyCandidateFound` and `UserError::MultiplePrimaryKeyCandidatesFound` ### Implementation-wise - Remove uses of UserError::MissingPrimaryKey not pertaining to inference. This introduces a possible panicking path. ## PR checklist Please check if your PR fulfills the following requirements: - [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)? - [x] Have you read the contributing guidelines? - [x] Have you made sure that the title is accurate and descriptive of the changes? Thank you so much for contributing to Meilisearch! Co-authored-by: Louis Dureuil <louis@meilisearch.com>	2023-01-02 11:44:22 +00:00
Loïc Lecrenier	8d36570958	Add explicit criterion impl strategy to proximity search tests	2023-01-02 10:37:01 +01:00
dependabot[bot]	939e7faf31	Bump taiki-e/install-action from 1 to 2 Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 1 to 2. - [Release notes](https://github.com/taiki-e/install-action/releases) - [Changelog](https://github.com/taiki-e/install-action/blob/main/CHANGELOG.md) - [Commits](https://github.com/taiki-e/install-action/compare/v1...v2) --- updated-dependencies: - dependency-name: taiki-e/install-action dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com>	2023-01-01 10:02:00 +00:00
Loïc Lecrenier	32c6062e65	Optimise exactness criterion 1. Cache some results between calls to next() 2. Compute the combinations of exact words more efficiently	2022-12-22 12:28:45 +01:00
Loïc Lecrenier	f097aafa1c	Add unit test for prefix handling by the proximity criterion	2022-12-22 12:08:00 +01:00
Loïc Lecrenier	777b387dc4	Avoid a prefix-related worst-case scenario in the proximity criterion	2022-12-22 12:08:00 +01:00
Loïc Lecrenier	b0f3dc2c06	Interpret synonyms as phrases	2022-12-22 12:07:51 +01:00
Louis Dureuil	4b166bea2b	Add primary_key_inference test	2022-12-21 15:13:38 +01:00
Louis Dureuil	5943100754	Fix existing tests	2022-12-21 15:13:38 +01:00
Louis Dureuil	b24def3281	Add logging when inference took place. Displays log message in the form: ``` [2022-12-21T09:19:42Z INFO milli::update::index_documents::enrich] Primary key was not specified in index. Inferred to 'id' ```	2022-12-21 15:13:38 +01:00
Louis Dureuil	402dcd6b2f	Simplify primary key inference	2022-12-21 15:13:38 +01:00
Louis Dureuil	13c95d25aa	Remove uses of UserError::MissingPrimaryKey not related to inference	2022-12-21 15:13:36 +01:00
bors[bot]	a8defb585b	Merge #742 742: Add a "Criterion implementation strategy" parameter to Search r=irevoire a=loiclec Add a parameter to search requests which determines the implementation strategy of the criteria. This can be either `set-based`, `iterative`, or `dynamic` (ie choosing between set-based or iterative at search time). See https://github.com/meilisearch/milli/issues/755 for more context about this change. Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>	2022-12-21 12:18:49 +00:00
Loïc Lecrenier	339a4b0789	Make clippy happy	2022-12-21 12:49:34 +01:00
Loïc Lecrenier	904fd2f6d1	Add a search strategy option to the cli	2022-12-21 12:48:53 +01:00
Loïc Lecrenier	229405aeb9	Choose implementation strategy of criterion at runtime	2022-12-21 09:29:39 +01:00
bors[bot]	249e051cd4	Merge #750 750: Fix hard-deletion of an external id that was soft-deleted and then reimported - main r=irevoire a=loiclec # Pull Request ## Related issue Fixes (when merged into meilisearch) https://github.com/meilisearch/meilisearch/issues/3021 ## What does this PR do? There was a bug happening when: 1. Documents were added 2. Some of these documents were replaced using soft-deletion 3. A deletion of another non-replaced document takes place and triggers a hard-deletion 4. Documents with the same identifiers as the replaced documents are added again Then, search results would return duplicate documents. No crash would happen at any time (this is the reason it wasn't caught by the previous fuzz test. I have updated the new one such that it also checks the result of a placeholder search request, which then finds the bug immediately). The cause of the bug is: 1. When a hard-deletion is triggered, we try to retrieve the external document id associated with each soft-deleted document id. 2. Then, we take this list of external document ids and remove each of them from the `ExternalDocumentsIds` structure. 3. However, this is not correct in case an existing (non-deleted) document shares the external id of a soft-deleted document. ## Implementation of the fix 1. Before we process a permanent deletion, we update the list of soft-deleted document ids. 2. Then, the permanent deletion's job is to remove the soft-deleted documents from all data structures. Therefore, to update `ExternalDocumentsIds`, we can simply call the `delete_soft_deleted_documents_ids_from_fsts` method, which is faster and simpler. ## Correctness A unit test was added to reproduce the bug. The new fuzz test, when adjusted to check the correctness of a placeholder search, could also instantly reproduce the bug, but now does not find any other problem. Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>	2022-12-20 16:13:20 +00:00
Loïc Lecrenier	fc0e7382fe	Fix hard-deletion of an external id that was soft-deleted	2022-12-20 15:33:31 +01:00
bors[bot]	97fb64e40e	Merge #747 747: Soft-deletion computation no longer depends on the mapsize r=irevoire a=dureuill # Pull Request ## Related issue Related to https://github.com/meilisearch/meilisearch/issues/3231: After removing `--max-index-size`, the `mapsize` will always be unrelated to the actual max size the user wants for their DB, so it doesn't make sense to use these values any longer. This implements solution 2.3 from https://github.com/meilisearch/meilisearch/issues/3231#issuecomment-1348628824 ## What does this PR do? ### User-visible - Soft-deleted are no longer deleted when there is less than 10% of the mapsize available or when they take more than 10% of the mapsize - Instead, they are deleted when they are more soft deleted than regular documents, or when they take more than 1GiB disk space (estimated). ### Implementation standpoint 1. Adds a `DeletionStrategy` struct to replace the boolean `disable_soft_deletion` that we had up until now. This enum allows us to specify that we want "always hard", "always soft", or to use the dynamic soft-deletion strategy (default). 2. Uses the current strategy when deleting documents, with the new heuristics being used in the `DeletionStrategy::Dynamic` variant. 3. Updates the tests to use the appropriate DeletionStrategy whenever needed (one of `AlwaysHard` or `AlwaysSoft` depending on the test) Note to reviewers: this PR is optimized for a commit-by-commit review. ## PR checklist Please check if your PR fulfills the following requirements: - [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)? - [x] Have you read the contributing guidelines? - [x] Have you made sure that the title is accurate and descriptive of the changes? Thank you so much for contributing to Meilisearch! Co-authored-by: Louis Dureuil <louis@meilisearch.com> Co-authored-by: Tamo <tamo@meilisearch.com>	2022-12-19 17:46:18 +00:00
Tamo	69edbf9f6d	Update milli/src/update/delete_documents.rs	2022-12-19 18:23:50 +01:00
bors[bot]	8957251eed	Merge #751 751: Update version for the next release (v0.38.0) in Cargo.toml files r=curquiza a=meili-bot ⚠️ This PR is automatically generated. Check the new version is the expected one before merging. Co-authored-by: curquiza <curquiza@users.noreply.github.com>	2022-12-19 17:02:39 +00:00
curquiza	c72535531b	Update version for the next release (v0.38.0) in Cargo.toml files	2022-12-19 16:35:38 +00:00
Louis Dureuil	916c23e7be	Tests: rename snapshots	2022-12-19 10:07:17 +01:00
Louis Dureuil	ad9937c755	Fix tests after adding DeletionStrategy	2022-12-19 10:07:17 +01:00
Louis Dureuil	171c942282	Soft-deletion computation no longer takes into account the mapsize Implemented solution 2.3 from https://github.com/meilisearch/meilisearch/issues/3231#issuecomment-1348628824	2022-12-19 10:07:17 +01:00
Louis Dureuil	e2ae3b24aa	Hard or soft delete according to the deletion strategy	2022-12-19 10:00:13 +01:00
Louis Dureuil	fc7618d49b	Add DeletionStrategy	2022-12-19 09:49:58 +01:00
ManyTheFish	7f88c4ff2f	Fix #1714 test	2022-12-15 18:22:28 +01:00
ManyTheFish	96d4242b93	Update charabia	2022-12-15 18:22:22 +01:00
bors[bot]	5114686394	Merge #743 743: Fix finite pagination with placeholder search r=Kerollmops a=ManyTheFish this bug is reproducible on real datasets and is hard to isolate in a simple test. related to: https://github.com/meilisearch/meilisearch/issues/3200 poke `@curquiza` Co-authored-by: ManyTheFish <many@meilisearch.com>	2022-12-15 09:31:47 +00:00
ManyTheFish	3322018c06	Fix placeholder search	2022-12-14 20:09:47 +01:00
bors[bot]	0276d5212a	Merge #728 728: Add some integration tests on the sort criterion r=ManyTheFish a=loiclec This is simply an integration test ensuring that the sort criterion works properly. However, only one version of the algorithm is tested here (the iterative one). To test the version that uses the facet DB, one has to manually set the `CANDIDATES_THRESHOLD` constant to `0`. I have done that and ensured that the test still succeeds. However, in the future, we will probably want to have an option to force which algorithm is used at runtime, for testing purposes. Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>	2022-12-14 09:27:12 +00:00
bors[bot]	e2ffc3d69a	Merge #741 741: Add test reproducing the bug fixed by #737 r=Kerollmops a=ManyTheFish related to #737 Co-authored-by: ManyTheFish <many@meilisearch.com>	2022-12-13 15:02:19 +00:00
ManyTheFish	739da9fd4d	Add test	2022-12-13 15:54:43 +01:00
bors[bot]	2af93966e0	Merge #740 740: Fix two nightly errors r=Kerollmops a=irevoire Currently, we have these two errors on rust nightly. It would be nice to help rustc understand what's going on ``` error[E0658]: anonymous lifetimes in `impl Trait` are unstable --> filter-parser/src/lib.rs:173:53 \| 173 \| fn ws<'a, O>(inner: impl FnMut(Span<'a>) -> IResult<O>) -> impl FnMut(Span<'a>) -> IResult<O> { \| ^ expected named lifetime parameter \| = help: add `#![feature(anonymous_lifetime_in_impl_trait)]` to the crate attributes to enable help: consider introducing a named lifetime parameter \| 173 \| fn ws<'a, 'a, O>(inner: impl FnMut(Span<'a>) -> IResult<'a, O>) -> impl FnMut(Span<'a>) -> IResult<O> { \| +++ +++ error[E0658]: anonymous lifetimes in `impl Trait` are unstable --> filter-parser/src/error.rs:36:49 \| 36 \| mut parser: impl FnMut(Span<'a>) -> IResult<O>, \| ^ expected named lifetime parameter \| = help: add `#![feature(anonymous_lifetime_in_impl_trait)]` to the crate attributes to enable help: consider introducing a named lifetime parameter \| 35 ~ pub fn cut_with_err<'a, 'a, O>( 36 ~ mut parser: impl FnMut(Span<'a>) -> IResult<'a, O>, \| For more information about this error, try `rustc --explain E0658`. error: could not compile `filter-parser` due to 2 previous errors ``` Co-authored-by: Tamo <tamo@meilisearch.com>	2022-12-13 14:33:40 +00:00
Tamo	2c47500bc3	fix two nightly errors	2022-12-13 15:29:52 +01:00
bors[bot]	406ee31d1a	Merge #737 737: Fix typo initial candidates computation r=Kerollmops a=ManyTheFish When `Typo` criterion was after a different criterion than `Words` and the previous criterion wasn't returning any candidates at the first iteration of the bucket sort, then the `initial_candidates` were lost. Now, `Typo`ensure to keep the `initial_candidates` between iterations. related to https://github.com/meilisearch/meilisearch/issues/3200#issuecomment-1345179578 related to https://github.com/meilisearch/meilisearch/issues/3228 Co-authored-by: ManyTheFish <many@meilisearch.com>	2022-12-13 10:29:28 +00:00
ManyTheFish	2d8d0af1a6	Rename short name bc by ic for initial_candidates	2022-12-13 10:56:38 +01:00
bors[bot]	e0a8f8cb5a	Merge #734 734: Fix bug 2945/3021 (missing key in documents database) r=Kerollmops a=loiclec # Pull Request ## Related issue Fixes (partially, until merged into meilisearch) https://github.com/meilisearch/meilisearch/issues/2945 (until we integrate the new milli bump into meilisearch). Note that a dump will not be sufficient to upgrade from meilisearch v0.30.2 to meilisearch v0.30.3 due to this fix because the bug could have caused the `documents` database to be corrupted. Instead, a full manual reimport of the documents will be necessary. ## What does this PR do? There was a bug happening when: 1. A few documents are added to the index 2. Some of these documents are soft-deleted 3. New documents are added, replacing existing ones and triggering a hard-deletion The `IndexDocuments::execute` method would then perform the hard-deletion but forget to change the `external_document_ids` structure appropriately. As a result, the `external_document_ids` would contain keys corresponding to documents that do no exist anymore. To fix this bug, I split the `DeleteDocuments::execute` method into two: `execute_inner` and `execute`. - `execute_inner` returns a `DetailedDocumentDeletionResult` which says whether soft-deletion was used or not - `execute` keeps the exact same signature and behaviour Then, when deleting replaced documents inside `IndexDocuments::execute`, we call `DeleteDocuments::execute_inner` instead of `DeleteDocuments::execute`. If soft-deletion was used, nothing more is done. But if hard-deletion was used, we remove every reference to soft-deleted documents in the new `external_documents_ids` structure. ## Correctness - Every other test still passes - The reproduction test case now passes - In a different branch ([`update-fuzz-test`](https://github.com/meilisearch/milli/pull/735)), I created a fuzz-test that reproduces the past two bugs. This fuzz test cannot find this bug through any combination of some hand-selected `DocumentAddition / DocumentDeletion / DocumentClear / SettingsUpdate` operations. In that test, each relevant operations can be executed with or without soft-deletion, and document additions can be done in batches, replacing or updating existing documents. Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>	2022-12-13 09:45:57 +00:00
Loïc Lecrenier	be3b00350c	Apply review suggestions: naming and documentation	2022-12-13 10:15:22 +01:00
ManyTheFish	80d34a4169	Fix typo initial candiddates computation	2022-12-12 19:02:48 +01:00
Loïc Lecrenier	e3ee553dcc	Remove soft deleted ids from ExternalDocumentIds during document import If the document import replaces a document using hard deletion	2022-12-12 14:16:09 +01:00
Loïc Lecrenier	bebd050961	Add new test for bug 3021	2022-12-08 19:19:40 +01:00
bors[bot]	1f1beae077	Merge #729 729: Fix distincted exhaustive hits r=Kerollmops a=ManyTheFish This PR changes the name and behavior of `bucket_candidates`: - `bucket_candidates` become `initial_candidates` that is less confusing - `initial_candidates` is no more a simple `RoaringBitmap` but an enum allowing us to precise if the candidates are exhaustive or not - this enum ensures that any modification is allowed only if the candidates are not already exhaustive. The bug occurred because `initial_candidates` are modified during the bucket sort allowing the estimation to be more and more precise along the search, and this was an issue when the `initial_candidates` were already exhaustive, now, if candidates are exhaustive, then no modifications are made. Co-authored-by: ManyTheFish <many@meilisearch.com>	2022-12-08 09:26:34 +00:00

1 2 3 4 5 ...

2374 Commits