meilisearch

mirror of https://github.com/meilisearch/meilisearch.git synced 2024-11-23 02:27:40 +08:00

Author	SHA1	Message	Date
Louis Dureuil	5d7061682e	Add tracing to milli	2024-02-08 15:03:31 +01:00
Louis Dureuil	02e6c8a440	Add tracing to index-scheduler	2024-02-08 15:03:31 +01:00
Louis Dureuil	89401d097b	Add tracing-trace	2024-02-08 15:03:30 +01:00
Louis Dureuil	698ea5139d	Update Cargo.lock	2024-02-01 10:40:23 +01:00
curquiza	c57f7f7379	Update version for the next release (v1.6.1) in Cargo.toml	2024-02-01 10:33:26 +01:00
meili-bors[bot]	b6fc181993	Merge #4304 4304: Add CUDA GPU support for Hugging Face embedders r=Kerollmops a=dureuill Adds a "cuda" feature to `milli`. Compiling with this feature requires that the CUDA support library be installed (see "with CUDA support" paragraph in https://huggingface.github.io/candle/guide/installation.html), and adds CUDA support to the `huggingFace` embedder. To enable GPU support, users will need to: 1. Have a compatible NVIDIA GPU under Linux 2. Follow [the guide](https://huggingface.github.io/candle/guide/installation.html) to install the CUDA dependencies 3. Compile Meilisearch with the `cuda` feature: `cargo build --release --features cuda` # Impact Enabling the CUDA feature allows using an available GPU to compute embeddings with a `huggingFace` embedder. On an AWS Graviton 2, this yields a 3x to 5x improvement in indexing time. # Technical details - I had to change the CI so that the cuda feature is not included in the `Tests all features` workflow - To achieve that, I had to add a binary following the `cargo xtask` design pattern, to list all features except the cuda one. - I then changed the workflow accordingly (renamed to "Tests almost all features" 😉) - A test run of the new feature was done on a temporary version of this PR that had it enabled for PRs: [See the results here](https://github.com/meilisearch/meilisearch/actions/runs/7461331929/job/20301216732) Co-authored-by: Louis Dureuil <louis@meilisearch.com>	2024-01-22 13:55:04 +00:00
Louis Dureuil	d35fe43fd5	Update lock file	2024-01-22 10:49:17 +01:00
Louis Dureuil	4aa4a15dc9	Add to Cargo.lock	2024-01-22 10:25:54 +01:00
Louis Dureuil	84f49d76cd	Add cuda feature	2024-01-22 10:25:16 +01:00
dependabot[bot]	b5b2333a05	Bump h2 from 0.3.20 to 0.3.24 Bumps [h2](https://github.com/hyperium/h2) from 0.3.20 to 0.3.24. - [Release notes](https://github.com/hyperium/h2/releases) - [Changelog](https://github.com/hyperium/h2/blob/v0.3.24/CHANGELOG.md) - [Commits](https://github.com/hyperium/h2/compare/v0.3.20...v0.3.24) --- updated-dependencies: - dependency-name: h2 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com>	2024-01-19 16:20:22 +00:00
Clément Renault	50e1d34c66	Rollback http to 0.2.11	2024-01-16 16:57:33 +01:00
Clément Renault	01e2c3d6bb	Bump arroy to v0.2.0	2024-01-16 16:45:55 +01:00
Clément Renault	0c8d1644a6	Rollback rustls to 0.20.9	2024-01-16 15:55:16 +01:00
Clément Renault	7f125bfb12	Update incompatible dependencies	2024-01-16 15:15:54 +01:00
Clément Renault	5869ca7716	Upgrade all compatible dependencies	2024-01-16 15:05:03 +01:00
Clément Renault	d9d0419845	Update the dependencies	2024-01-16 14:38:48 +01:00
Louis Dureuil	12edc2c20a	Update arroy to a fixed version	2024-01-03 15:59:37 +01:00
meili-bors[bot]	43e822e802	Merge #4238 4238: Task queue webhook r=dureuill a=irevoire # Prototype `prototype-task-queue-webhook-1` The prototype is available through Docker by using the following command: ```bash docker run -p 7700:7700 -v $(pwd)/meili_data:/meili_data getmeili/meilisearch:prototype-task-queue-webhook-1 ``` # Pull Request Implements the task queue webhook. ## Related issue Fixes https://github.com/meilisearch/meilisearch/issues/4236 ## What does this PR do? - Provide a new CLI parameter and env var for the webhook, respectively called `--task-webhook-url` and `MEILI_TASK_WEBHOOK_URL` - Also supports sending the requests with a custom `Authorization` header by specifying the optional `--task-webhook-authorization-header` CLI parameter or `MEILI_TASK_WEBHOOK_AUTHORIZATION_HEADER` env variable. - Throw an error if the specified URL is invalid - Every time a batch is processed, send all the finished tasks to the webhook, serialized with our public `TaskView` type as a gzip-compressed JSON Lines body. - Add one test. ## PR checklist ### Before becoming ready to review - [x] Add a test - [x] Compress the data we send - [x] Chunk and stream the data we send - [x] Remove the unwrap in the index-scheduler when sending the data fails - [x] The analytics are missing ### Before merging - [x] Release a prototype Co-authored-by: Tamo <tamo@meilisearch.com> Co-authored-by: Clément Renault <clement@meilisearch.com>	2023-12-21 14:43:46 +00:00
Tamo	be72326c0a	gzip the tasks	2023-12-19 10:35:51 +01:00
Tamo	547379abb0	parse the url correctly	2023-12-19 10:35:51 +01:00
Tamo	d78ad51082	Implement the webhook	2023-12-19 10:35:50 +01:00
Louis Dureuil	942d49314c	Remove dependency that requires libstdc++	2023-12-18 22:17:18 +01:00
curquiza	50d6317ec0	Update version for the next release (v1.6.0) in Cargo.toml	2023-12-18 13:57:46 +00:00
Louis Dureuil	61bd2fb7a9	Update arroy	2023-12-14 16:08:41 +01:00
Louis Dureuil	65e49b7092	Remove stuff, add distribution shift (WIP)	2023-12-14 16:08:38 +01:00
Louis Dureuil	cb4ebe163e	WIP	2023-12-14 16:07:49 +01:00
Louis Dureuil	dde3a04679	WIP arroy integration	2023-12-14 16:07:49 +01:00
Louis Dureuil	13c2c6c16b	Small commit to add hybrid search and autoembedding	2023-12-14 16:07:48 +01:00
Louis Dureuil	21bcf32109	Add candle and hg_hub, updating a lot of deps in the process	2023-12-14 16:07:48 +01:00
Clément Renault	56571f762a	Merge remote-tracking branch 'origin/main' into tmp-release-v1.5.1	2023-12-13 11:57:01 +01:00
curquiza	4b644f6bc0	Update version for the next release (v1.5.1) in Cargo.toml	2023-12-11 17:15:11 +00:00
Clément Renault	d32eb11329	Move to the v0.20.0-alpha.9 of heed	2023-11-27 11:52:22 +01:00
Clément Renault	0d4482625a	Make the changes to use heed v0.20-alpha.6	2023-11-23 11:43:58 +01:00
Clément Renault	56a0d91ecd	Update the heed dependency and lock file	2023-11-22 15:11:09 +01:00
Clément Renault	7cb7e37ba8	Merge branch 'main' into tmp-release-v1.5.0	2023-11-21 16:30:46 +01:00
Clément Renault	b10c060bf7	Cleanup TOML	2023-11-01 14:03:04 +01:00
Clément Renault	c71b1d33ae	Sort entries using rayon in the transform sorters	2023-11-01 11:07:16 +01:00
Clément Renault	b57b818b67	Don't use the last version of clap	2023-10-30 16:57:31 +01:00
Clément Renault	f7ea94e5f4	Modify the Dockerfile to compile meilisearch and meilitool	2023-10-30 16:32:17 +01:00
Clément Renault	13416ccbf7	Introduce a new meilitool to help the cloud team	2023-10-30 14:30:20 +01:00
Louis Dureuil	5be569e3e2	Update obkv	2023-10-30 11:40:20 +01:00
ManyTheFish	17b647dfe5	Wip	2023-10-30 11:13:08 +01:00
ManyTheFish	4c6fddb1cb	update charabia	2023-10-26 17:01:10 +02:00
curquiza	ee6f79d60b	Update version for the next release (v1.5.0) in Cargo.toml	2023-10-23 11:49:07 +00:00
curquiza	2042229927	Update version for the next release (v1.4.2) in Cargo.toml	2023-10-23 12:02:45 +02:00
dependabot[bot]	e761db582f	Bump rustix from 0.36.15 to 0.36.16 Bumps [rustix](https://github.com/bytecodealliance/rustix) from 0.36.15 to 0.36.16. - [Release notes](https://github.com/bytecodealliance/rustix/releases) - [Commits](https://github.com/bytecodealliance/rustix/compare/v0.36.15...v0.36.16) --- updated-dependencies: - dependency-name: rustix dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com>	2023-10-18 18:42:12 +00:00
Clément Renault	c5f7893fbb	Remove the puffin http dependency	2023-10-13 13:11:08 +02:00
meili-bors[bot]	0913373a5e	Merge #4122 4122: Bring back changes from `release-v1.4.1` into `main` r=Kerollmops a=curquiza Co-authored-by: curquiza <curquiza@users.noreply.github.com> Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com> Co-authored-by: Tamo <tamo@meilisearch.com> Co-authored-by: Vivek Kumar <vivek.26@outlook.com> Co-authored-by: Clément Renault <clement@meilisearch.com>	2023-10-12 15:57:47 +00:00
curquiza	8a95bf28e5	Update version for the next release (v1.4.1) in Cargo.toml	2023-10-10 09:01:45 +00:00
dependabot[bot]	c668a29ed5	Bump webpki from 0.22.1 to 0.22.2 Bumps [webpki](https://github.com/briansmith/webpki) from 0.22.1 to 0.22.2. - [Commits](https://github.com/briansmith/webpki/commits) --- updated-dependencies: - dependency-name: webpki dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com>	2023-10-02 21:53:45 +00:00
meili-bors[bot]	86b314626d	Merge #4080 4080: Bring back changes from v1.4.0 into main r=Kerollmops a=curquiza Co-authored-by: ManyTheFish <many@meilisearch.com> Co-authored-by: Clément Renault <clement@meilisearch.com> Co-authored-by: Kerollmops <clement@meilisearch.com> Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com> Co-authored-by: curquiza <curquiza@users.noreply.github.com> Co-authored-by: Tamo <tamo@meilisearch.com> Co-authored-by: curquiza <clementine@meilisearch.com> Co-authored-by: Vivek Kumar <vivek.26@outlook.com> Co-authored-by: dogukanakkaya <doguakkaya27@hotmail.com>	2023-09-26 08:13:49 +00:00
meili-bors[bot]	b4c44603db	Merge #4009 4009: Bump rustls-webpki from 0.100.1 to 0.100.2 r=Kerollmops a=dependabot[bot] Bumps [rustls-webpki](https://github.com/rustls/webpki) from 0.100.1 to 0.100.2. Release notes (sourced from [rustls-webpki's releases](https://github.com/rustls/webpki/releases)), v/0.100.2: - certificate path building and verification is now capped at 100 signature validation operations to avoid the risk of CPU usage denial-of-service attack when validating crafted certificate chains producing quadratic runtime. This risk affected both clients, as well as servers that verified client certificates. What's Changed: - v0.100.2 prep by `@cpu` in [rustls/webpki#154](https://redirect.github.com/rustls/webpki/pull/154) Full Changelog: https://github.com/rustls/webpki/compare/v/0.100.1...v/0.100.2 Commits: - `c8b8214` Bump MSRV to 1.60 - `8557522` Avoid testing MSRV of dev-dependencies - `73a7f0c` Cargo: version 0.100.1 -> 0.100.2 - `4ea0523` verify_cert: enforce maximum number of signatures. - See full diff in [compare view](https://github.com/rustls/webpki/compare/v/0.100.1...v/0.100.2) Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-09-11 13:11:07 +00:00
meili-bors[bot]	487d493f49	Merge #4043 4043: Bring back hotfixes from v1.3.3 into v1.4.0 r=Kerollmops a=curquiza Co-authored-by: curquiza <curquiza@users.noreply.github.com> Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com> Co-authored-by: Kerollmops <clement@meilisearch.com> Co-authored-by: curquiza <clementine@meilisearch.com>	2023-09-11 12:27:34 +00:00
dependabot[bot]	9636c5f558	Bump webpki from 0.22.0 to 0.22.1 Bumps [webpki](https://github.com/briansmith/webpki) from 0.22.0 to 0.22.1. - [Commits](https://github.com/briansmith/webpki/commits) --- updated-dependencies: - dependency-name: webpki dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com>	2023-09-11 10:32:34 +00:00
curquiza	651657c03e	Fix git conflicts	2023-09-07 16:48:13 +02:00
meili-bors[bot]	9945cbf9db	Merge #4038 4038: Fix filter escaping issues r=ManyTheFish a=Kerollmops This PR fixes #4034 by always escaping the sequences. Users must always put quotes (single or double) around filter values to escape them. Co-authored-by: Kerollmops <clement@meilisearch.com>	2023-09-06 12:29:29 +00:00
Kerollmops	03d0f628bd	Use the unescaper crate to unescape any char sequence	2023-09-06 13:59:45 +02:00
curquiza	93285041a9	Update version for the next release (v1.3.3) in Cargo.toml	2023-09-06 09:23:20 +00:00
Clément Renault	af0f6f0bf0	Merge branch 'main' into update-version-v1.4.0	2023-08-28 15:08:59 +02:00
meili-bors[bot]	ccf3ba3f32	Merge #4019 4019: Bringing back changes from `v1.3.2` onto `main` r=irevoire a=Kerollmops Co-authored-by: Kerollmops <clement@meilisearch.com> Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com> Co-authored-by: irevoire <irevoire@users.noreply.github.com> Co-authored-by: Clément Renault <clement@meilisearch.com>	2023-08-28 12:14:11 +00:00
Kerollmops	65528a3e06	Update version for the next release (v1.4.0) in Cargo.toml	2023-08-28 11:52:28 +00:00
dependabot[bot]	e59d7f238c	Bump rustls-webpki from 0.100.1 to 0.100.2 Bumps [rustls-webpki](https://github.com/rustls/webpki) from 0.100.1 to 0.100.2. - [Release notes](https://github.com/rustls/webpki/releases) - [Commits](https://github.com/rustls/webpki/compare/v/0.100.1...v/0.100.2) --- updated-dependencies: - dependency-name: rustls-webpki dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com>	2023-08-22 18:10:53 +00:00
Kerollmops	717b069907	Bump charabia to 0.8.3	2023-08-22 16:25:00 +02:00
irevoire	b947f3bb9d	Update version for the next release (v1.3.2) in Cargo.toml	2023-08-16 08:20:36 +00:00
ManyTheFish	cab27c2ab4	upgrade indexmap = "2.0.0"	2023-08-10 18:09:02 +02:00
ManyTheFish	624fa9052f	upgrade deserr = "0.6.0"	2023-08-10 18:09:02 +02:00
ManyTheFish	359ede4862	upgrade fastrand = "2.0.0"	2023-08-10 18:09:02 +02:00
ManyTheFish	60c11dbdbd	upgrade rstar = "0.11.0"	2023-08-10 18:09:02 +02:00
ManyTheFish	dacee40ebc	upgrade memmap2 = "0.7.1"	2023-08-10 18:09:02 +02:00
ManyTheFish	6089083a8e	upgrade sysinfo = "0.29.7"	2023-08-10 18:09:02 +02:00
ManyTheFish	cc2c19d4c3	upgrade itertools = "0.10.5"	2023-08-10 18:09:02 +02:00
ManyTheFish	a5c56fac8a	Update dependencies	2023-08-10 18:09:02 +02:00
meili-bors[bot]	e4e49e63d0	Merge #3993 3993: Bringing back changes from v1.3.1 to `main` r=irevoire a=curquiza Co-authored-by: irevoire <irevoire@users.noreply.github.com> Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com> Co-authored-by: Tamo <tamo@meilisearch.com> Co-authored-by: ManyTheFish <many@meilisearch.com>	2023-08-10 14:30:02 +00:00
irevoire	75c87d5391	Update version for the next release (v1.3.1) in Cargo.toml	2023-08-08 10:30:06 +00:00
ManyTheFish	b45c36cd71	Merge branch 'main' into tmp-release-v1.3.0	2023-08-01 15:05:17 +02:00
Clément Renault	d8b47b689e	Use the new read-txn-no-tls heed feature	2023-07-26 15:45:15 +02:00
Kerollmops	29ab54b259	Replace the hnsw crate by the instant-distance one	2023-07-25 12:37:35 +02:00
ManyTheFish	0497f93494	Update Charabia to the last version	2023-07-19 15:19:32 +02:00
meili-bors[bot]	2dfbb6813a	Merge #3913 3913: Expose a Puffin server to profile the indexing process r=Kerollmops a=Kerollmops This PR exposes a puffin HTTP server that reports the internal timings of indexing documents, deleting documents, or updating the settings of an index. ## To be done - [x] Move the puffin HTTP server under a feature flag. - [x] Use [the `puffin::set_scopes_on` function](https://docs.rs/puffin/latest/puffin/fn.set_scopes_on.html) to toggle it (by using the feature directly). When this function is called with `false`, [a call to `profile_scope!` takes 1-2ns](https://docs.rs/puffin/latest/puffin/fn.set_scopes_on.html). - [x] Create a _PROFILING.md_ file explaining how to use it. - [x] Explain that merging scopes on the interface is not always useful. - [x] Add more info on the number of batched tasks (using the `puffin::profile_scope!` macro data). - I added more info, but that's more continuous work when we consider we need more info here and there. - [x] Clean up some scopes, and don't touch too much code to inject puffin. - I am not sure that the _index_documents/mod.rs_ function is that complex with the addition of the scope. - [x] Think about what we consider frames. One indexation operation or the whole program. When must we stop the frame, then? - What we consider a frame is one single `IndexScheduler::tick` execution. - We can change that later. Co-authored-by: Kerollmops <clement@meilisearch.com> Co-authored-by: Clément Renault <clement@meilisearch.com>	2023-07-19 09:44:01 +00:00
Kerollmops	eef95de30e	First iteration on exposing puffin profiling	2023-07-18 17:38:13 +02:00
ManyTheFish	c106906f8f	deactivate camelCase segmentation	2023-07-13 12:06:27 +02:00
Kerollmops	8ba1c8f88f	Update proc-macro2 to compile with the latest nightly	2023-07-12 11:47:27 +02:00
Kerollmops	e7f8daaf86	Update criterion to 0.5.1 to remove the atty dependency	2023-07-03 18:51:42 +02:00
Kerollmops	d1ff631df8	Replace the atty dependency with the is-terminal one	2023-07-03 18:51:42 +02:00
gillian-meilisearch	1d40452057	Update version for the next release (v1.3.0) in Cargo.toml	2023-07-03 08:32:21 +00:00
meili-bors[bot]	661d1f90dc	Merge #3866 3866: Update charabia v0.8.0 r=dureuill a=ManyTheFish # Pull Request Update Charabia: - enhance Japanese segmentation - enhance Latin Tokenization - words containing `_` are now properly segmented into several words - brackets `{([])}` are no longer considered context separators, so words separated by brackets are now considered close together for the proximity ranking rule - fixes #3815 - fixes #3778 - fixes [product#151](https://github.com/meilisearch/product/discussions/151) > Important note: float numbers are now segmented around the `.`, so `3.22` is segmented as [`3`, `.`, `22`], but the middle dot isn't considered a hard separator, which means that a search for `3.22` still finds documents containing `3.22` Co-authored-by: ManyTheFish <many@meilisearch.com>	2023-06-29 15:24:36 +00:00
ManyTheFish	e8dee3ca65	Update lock file	2023-06-29 17:02:24 +02:00
ManyTheFish	84845de9ef	Update Charabia	2023-06-29 15:56:32 +02:00
Kerollmops	a385642ec3	Replace the BTreeMap by an IndexMap to return values in order	2023-06-29 14:33:31 +02:00
Kerollmops	737aec1705	Expose an _semanticSimilarity as a dot product in the documents	2023-06-27 12:32:41 +02:00
Kerollmops	c79e82c62a	Move back to the hnsw crate This reverts commit 7a4b6c065482f988b01298642f4c18775503f92f.	2023-06-27 12:32:39 +02:00
Kerollmops	268a9ef416	Move to the hgg crate	2023-06-27 12:32:38 +02:00
Clément Renault	4571e512d2	Store the vectors in an HNSW in LMDB	2023-06-27 12:32:38 +02:00
Clément Renault	34349faeae	Create a new _vector extractor	2023-06-27 12:32:37 +02:00
meili-bors[bot]	45636d315c	Merge #3670 3670: Fix addition deletion bug r=irevoire a=irevoire The first commit of this PR is a revert of https://github.com/meilisearch/meilisearch/pull/3667. It re-enables the auto-batching of addition and deletion of tasks. No new changes have been introduced outside of `milli`. So all the changes you see on the autobatcher have actually already been reviewed. It fixes https://github.com/meilisearch/meilisearch/issues/3440. ### What was happening? The issue was that the `external_documents_ids` generated in the `transform` were used in a very strange way that wasn’t compatible with the deletion of documents. Instead of doing a clear merge between the external document IDs of the DB and the one returned by the transform + writing it on disk, we were doing some weird tricks with the soft-deleted to avoid writing the fst on disk as much as possible. The new algorithm may be a bit slower but is way more straightforward and doesn’t change depending on whether the soft deletion was used or not. Here is a list of the changes introduced: 1. We now make a clear distinction between the `new_external_documents_ids` coming from the transform and only held in RAM and the `external_documents_ids` coming from the DB. 2. The `new_external_documents_ids` (coming out of the transform) are now represented as an `fst`. We don't need to struggle with the hard, soft distinction + the soft_deleted => That's easier to understand 3. When indexing documents, we merge the `external_documents_ids` coming from the DB and the `new_external_documents_ids` coming from the transform. ### Other things introduced in this PR Since we constantly have to write small, very specialized fuzzers for this kind of bug, we decided to push the one used to reproduce this bug. It's not perfect, but it's easy to improve in the future. It'll also run for as long as possible on every merge on the main branch. Co-authored-by: Tamo <tamo@meilisearch.com> Co-authored-by: Loïc Lecrenier <loic.lecrenier@icloud.com>	2023-06-19 09:09:30 +00:00
Tamo	6c6387d05e	move the fuzzer to its own crate	2023-05-29 12:27:39 +02:00
Tamo	4391cba6ca	fix the addition + deletion bug	2023-05-17 18:28:57 +02:00
Kerollmops	1a79fd0c3c	Use the new heed v0.12.6	2023-05-15 11:42:30 +02:00
Kerollmops	c4a40e7110	Use the writemap flag to reduce the memory usage	2023-05-15 10:15:33 +02:00
curquiza	3533d4f2bb	Update version for the next release (v1.2.0) in Cargo.toml	2023-05-08 17:52:33 +00:00
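
The task queue webhook merge (#4238) in the list above describes the request body as finished tasks sent as JSON Lines, gzip-compressed, with an optional custom `Authorization` header. A minimal sketch of how a receiver might decode such a body is given here. The function names `decode_task_payload` and `authorization_matches` are hypothetical, and the sample fields are placeholders rather than the real `TaskView` schema, which this log does not show.

```python
import gzip
import hmac
import json


def decode_task_payload(body):
    """Decompress a gzip body and parse it as JSON Lines (one JSON object per line)."""
    text = gzip.decompress(body).decode("utf-8")
    # Skip blank lines so a trailing newline does not produce a parse error.
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def authorization_matches(headers, expected):
    """Check the Authorization header against the expected value, if one is configured."""
    if expected is None:
        return True  # no header configured, nothing to check
    # HTTP header names are case-insensitive, so normalize before lookup.
    received = {k.lower(): v for k, v in headers.items()}.get("authorization", "")
    # Constant-time comparison avoids leaking the secret through timing.
    return hmac.compare_digest(received.encode(), expected.encode())


if __name__ == "__main__":
    # Illustrative payload only: these field names are assumptions.
    sample = b'{"uid": 1, "status": "succeeded"}\n{"uid": 2, "status": "failed"}\n'
    tasks = decode_task_payload(gzip.compress(sample))
    print([task["uid"] for task in tasks])
```

Parsing line by line keeps the receiver independent of how many tasks a batch contains, which fits the PR's note that the data is chunked and streamed.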


666 Commits