Commit Graph

667 Commits

Author SHA1 Message Date
Louis Dureuil
7e47cea0c4
Add tracing to Meilisearch 2024-02-08 15:03:31 +01:00
Louis Dureuil
5d7061682e
Add tracing to milli 2024-02-08 15:03:31 +01:00
Louis Dureuil
02e6c8a440
Add tracing to index-scheduler 2024-02-08 15:03:31 +01:00
Louis Dureuil
89401d097b
Add tracing-trace 2024-02-08 15:03:30 +01:00
Louis Dureuil
698ea5139d
Update Cargo.lock 2024-02-01 10:40:23 +01:00
curquiza
c57f7f7379
Update version for the next release (v1.6.1) in Cargo.toml 2024-02-01 10:33:26 +01:00
meili-bors[bot]
b6fc181993
Merge #4304
4304: Add CUDA GPU support for Hugging Face embedders r=Kerollmops a=dureuill

Adds a "cuda" feature to `milli`.

Compiling with this feature requires that the CUDA support library be installed (see "with CUDA support" paragraph in https://huggingface.github.io/candle/guide/installation.html), and adds CUDA support to the `huggingFace` embedder.

To enable GPU support, users will need to:

1. Have a compatible NVidia GPU under Linux
2. Follow [the guide](https://huggingface.github.io/candle/guide/installation.html) to install the CUDA dependencies
3. Compile Meilisearch with the `cuda` feature: `cargo build --release --features cuda`

# Impact

Enabling the CUDA feature allows to use an available GPU to compute embeddings with a `huggingFace` embedder. 
On an AWS Graviton 2, this yields a x3 - x5 improvement on indexing time.

# Technical details

- I had to change the CI so that the cuda feature is not included in the `Tests all features` workflow
- To achieve that, I had to add a binary following the `cargo xtask` design pattern, to list all features excepted the cuda one.
- I then changed the workflow accordingly (renamed to "Tests almost all features" 😉)
- A test run of the new feature was done on a temporary version of this PR that had it enabled for PRs: [See the results here](https://github.com/meilisearch/meilisearch/actions/runs/7461331929/job/20301216732)

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-01-22 13:55:04 +00:00
Louis Dureuil
d35fe43fd5
Update lock file 2024-01-22 10:49:17 +01:00
Louis Dureuil
4aa4a15dc9
Add to Cargo.lock 2024-01-22 10:25:54 +01:00
Louis Dureuil
84f49d76cd
Add cuda feature 2024-01-22 10:25:16 +01:00
dependabot[bot]
b5b2333a05
Bump h2 from 0.3.20 to 0.3.24
Bumps [h2](https://github.com/hyperium/h2) from 0.3.20 to 0.3.24.
- [Release notes](https://github.com/hyperium/h2/releases)
- [Changelog](https://github.com/hyperium/h2/blob/v0.3.24/CHANGELOG.md)
- [Commits](https://github.com/hyperium/h2/compare/v0.3.20...v0.3.24)

---
updated-dependencies:
- dependency-name: h2
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-01-19 16:20:22 +00:00
Clément Renault
50e1d34c66
Rollback http to 0.2.11 2024-01-16 16:57:33 +01:00
Clément Renault
01e2c3d6bb
Bump arroy to v0.2.0 2024-01-16 16:45:55 +01:00
Clément Renault
0c8d1644a6
Rollback rustls to 0.20.9 2024-01-16 15:55:16 +01:00
Clément Renault
7f125bfb12
Update incompatible dependencies 2024-01-16 15:15:54 +01:00
Clément Renault
5869ca7716
Upgrade all compatible dependencies 2024-01-16 15:05:03 +01:00
Clément Renault
d9d0419845
Update the dependencies 2024-01-16 14:38:48 +01:00
Louis Dureuil
12edc2c20a
Update arroy to a fixed version 2024-01-03 15:59:37 +01:00
meili-bors[bot]
43e822e802
Merge #4238
4238: Task queue webhook r=dureuill a=irevoire

# Prototype `prototype-task-queue-webhook-1`

The prototype is available through Docker by using the following command:

```bash
docker run -p 7700:7700 -v $(pwd)/meili_data:/meili_data getmeili/meilisearch:prototype-task-queue-webhook-1
```

# Pull Request

Implements the task queue webhook.

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4236

## What does this PR do?
- Provide a new cli and env var for the webhook, respectively called `--task-webhook-url` and `MEILI_TASK_WEBHOOK_URL`
- Also supports sending the requests with a custom `Authorization` header by specifying the optional `--task-webhook-authorization-header` CLI parameter or `MEILI_TASK_WEBHOOK_AUTHORIZATION_HEADER` env variable.
- Throw an error if the specified URL is invalid
- Every time a batch is processed, send all the finished tasks into the webhook with our public `TaskView` type as a JSON Line GZIPed body.
- Add one test.

## PR checklist

### Before becoming ready to review
- [x] Add a test
- [x] Compress the data we send
- [x] Chunk and stream the data we send
- [x] Remove the unwrap in the index-scheduler when sending the data fails
- [x] The analytics are missing

### Before merging
- [x] Release a prototype



Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2023-12-21 14:43:46 +00:00
Tamo
be72326c0a
gzip the tasks 2023-12-19 10:35:51 +01:00
Tamo
547379abb0
parse the url correctly 2023-12-19 10:35:51 +01:00
Tamo
d78ad51082
Implement the webhook 2023-12-19 10:35:50 +01:00
Louis Dureuil
942d49314c
Remove dependency that requires libstdc++ 2023-12-18 22:17:18 +01:00
curquiza
50d6317ec0 Update version for the next release (v1.6.0) in Cargo.toml 2023-12-18 13:57:46 +00:00
Louis Dureuil
61bd2fb7a9
Update arroy 2023-12-14 16:08:41 +01:00
Louis Dureuil
65e49b7092
Remove stuff, add distribution shift (WIP) 2023-12-14 16:08:38 +01:00
Louis Dureuil
cb4ebe163e
WIP 2023-12-14 16:07:49 +01:00
Louis Dureuil
dde3a04679
WIP arroy integration 2023-12-14 16:07:49 +01:00
Louis Dureuil
13c2c6c16b
Small commit to add hybrid search and autoembedding 2023-12-14 16:07:48 +01:00
Louis Dureuil
21bcf32109
Add candle and hg_hub, updating a lot of deps in the process 2023-12-14 16:07:48 +01:00
Clément Renault
56571f762a
Merge remote-tracking branch 'origin/main' into tmp-release-v1.5.1 2023-12-13 11:57:01 +01:00
curquiza
4b644f6bc0 Update version for the next release (v1.5.1) in Cargo.toml 2023-12-11 17:15:11 +00:00
Clément Renault
d32eb11329
Move to the v0.20.0-alpha.9 of heed 2023-11-27 11:52:22 +01:00
Clément Renault
0d4482625a
Make the changes to use heed v0.20-alpha.6 2023-11-23 11:43:58 +01:00
Clément Renault
56a0d91ecd
Update the heed dependency and lock file 2023-11-22 15:11:09 +01:00
Clément Renault
7cb7e37ba8
Merge branch 'main' into tmp-release-v1.5.0 2023-11-21 16:30:46 +01:00
Clément Renault
b10c060bf7
Cleanup TOML 2023-11-01 14:03:04 +01:00
Clément Renault
c71b1d33ae
Sort entries using rayon in the transform sorters 2023-11-01 11:07:16 +01:00
Clément Renault
b57b818b67
Don't use the last version of clap 2023-10-30 16:57:31 +01:00
Clément Renault
f7ea94e5f4
Modify the Dockerfile to compile meilisearch and meilitool 2023-10-30 16:32:17 +01:00
Clément Renault
13416ccbf7
Introduce a new meilitool to help the cloud team 2023-10-30 14:30:20 +01:00
Louis Dureuil
5be569e3e2
Update obkv 2023-10-30 11:40:20 +01:00
ManyTheFish
17b647dfe5
Wip 2023-10-30 11:13:08 +01:00
ManyTheFish
4c6fddb1cb update charabia 2023-10-26 17:01:10 +02:00
curquiza
ee6f79d60b Update version for the next release (v1.5.0) in Cargo.toml 2023-10-23 11:49:07 +00:00
curquiza
2042229927
Update version for the next release (v1.4.2) in Cargo.toml 2023-10-23 12:02:45 +02:00
dependabot[bot]
e761db582f
Bump rustix from 0.36.15 to 0.36.16
Bumps [rustix](https://github.com/bytecodealliance/rustix) from 0.36.15 to 0.36.16.
- [Release notes](https://github.com/bytecodealliance/rustix/releases)
- [Commits](https://github.com/bytecodealliance/rustix/compare/v0.36.15...v0.36.16)

---
updated-dependencies:
- dependency-name: rustix
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-10-18 18:42:12 +00:00
Clément Renault
c5f7893fbb
Remove the puffin http dependency 2023-10-13 13:11:08 +02:00
meili-bors[bot]
0913373a5e
Merge #4122
4122: Bring back changes from `release-v1.4.1` into `main` r=Kerollmops a=curquiza



Co-authored-by: curquiza <curquiza@users.noreply.github.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Vivek Kumar <vivek.26@outlook.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2023-10-12 15:57:47 +00:00
curquiza
8a95bf28e5 Update version for the next release (v1.4.1) in Cargo.toml 2023-10-10 09:01:45 +00:00
dependabot[bot]
c668a29ed5
Bump webpki from 0.22.1 to 0.22.2
Bumps [webpki](https://github.com/briansmith/webpki) from 0.22.1 to 0.22.2.
- [Commits](https://github.com/briansmith/webpki/commits)

---
updated-dependencies:
- dependency-name: webpki
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-10-02 21:53:45 +00:00
meili-bors[bot]
86b314626d
Merge #4080
4080: Bring back changes from v1.4.0 into main r=Kerollmops a=curquiza



Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: curquiza <curquiza@users.noreply.github.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: curquiza <clementine@meilisearch.com>
Co-authored-by: Vivek Kumar <vivek.26@outlook.com>
Co-authored-by: dogukanakkaya <doguakkaya27@hotmail.com>
2023-09-26 08:13:49 +00:00
meili-bors[bot]
b4c44603db
Merge #4009
4009: Bump rustls-webpki from 0.100.1 to 0.100.2 r=Kerollmops a=dependabot[bot]

Bumps [rustls-webpki](https://github.com/rustls/webpki) from 0.100.1 to 0.100.2.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/rustls/webpki/releases">rustls-webpki's releases</a>.</em></p>
<blockquote>
<h2>v/0.100.2</h2>
<h2>Release notes</h2>
<ul>
<li>certificate path building and verification is now capped at 100 signature validation operations to avoid the risk of CPU usage denial-of-service attack when validating crafted certificate chains producing quadratic runtime. This risk affected both clients, as well as servers that verified client certificates.</li>
</ul>
<h2>What's Changed</h2>
<ul>
<li>v0.100.2 prep by <a href="https://github.com/cpu"><code>`@​cpu</code></a>` in <a href="https://redirect.github.com/rustls/webpki/pull/154">rustls/webpki#154</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a href="https://github.com/rustls/webpki/compare/v/0.100.1...v/0.100.2">https://github.com/rustls/webpki/compare/v/0.100.1...v/0.100.2</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="c8b821450b"><code>c8b8214</code></a> Bump MSRV to 1.60</li>
<li><a href="855752292e"><code>8557522</code></a> Avoid testing MSRV of dev-dependencies</li>
<li><a href="73a7f0c7d7"><code>73a7f0c</code></a> Cargo: version 0.100.1 -&gt; 0.100.2</li>
<li><a href="4ea052366f"><code>4ea0523</code></a> verify_cert: enforce maximum number of signatures.</li>
<li>See full diff in <a href="https://github.com/rustls/webpki/compare/v/0.100.1...v/0.100.2">compare view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=rustls-webpki&package-manager=cargo&previous-version=0.100.1&new-version=0.100.2)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting ``@dependabot` rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- ``@dependabot` rebase` will rebase this PR
- ``@dependabot` recreate` will recreate this PR, overwriting any edits that have been made to it
- ``@dependabot` merge` will merge this PR after your CI passes on it
- ``@dependabot` squash and merge` will squash and merge this PR after your CI passes on it
- ``@dependabot` cancel merge` will cancel a previously requested merge and block automerging
- ``@dependabot` reopen` will reopen this PR if it is closed
- ``@dependabot` close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- ``@dependabot` show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
- ``@dependabot` ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/meilisearch/meilisearch/network/alerts).

</details>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-09-11 13:11:07 +00:00
meili-bors[bot]
487d493f49
Merge #4043
4043: Bring back hotfixes from v1.3.3 into v1.4.0 r=Kerollmops a=curquiza



Co-authored-by: curquiza <curquiza@users.noreply.github.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: curquiza <clementine@meilisearch.com>
2023-09-11 12:27:34 +00:00
dependabot[bot]
9636c5f558
Bump webpki from 0.22.0 to 0.22.1
Bumps [webpki](https://github.com/briansmith/webpki) from 0.22.0 to 0.22.1.
- [Commits](https://github.com/briansmith/webpki/commits)

---
updated-dependencies:
- dependency-name: webpki
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-09-11 10:32:34 +00:00
curquiza
651657c03e Fix git conflicts 2023-09-07 16:48:13 +02:00
meili-bors[bot]
9945cbf9db
Merge #4038
4038: Fix filter escaping issues r=ManyTheFish a=Kerollmops

This PR fixes #4034 by always escaping the sequences. Users must always put quotes (simple or double) to escape the filter values.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2023-09-06 12:29:29 +00:00
Kerollmops
03d0f628bd
Use the unescaper crate to unescape any char sequence 2023-09-06 13:59:45 +02:00
curquiza
93285041a9 Update version for the next release (v1.3.3) in Cargo.toml 2023-09-06 09:23:20 +00:00
Clément Renault
af0f6f0bf0
Merge branch 'main' into update-version-v1.4.0 2023-08-28 15:08:59 +02:00
meili-bors[bot]
ccf3ba3f32
Merge #4019
4019: Bringing back changes from `v1.3.2` onto `main` r=irevoire a=Kerollmops



Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: irevoire <irevoire@users.noreply.github.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2023-08-28 12:14:11 +00:00
Kerollmops
65528a3e06 Update version for the next release (v1.4.0) in Cargo.toml 2023-08-28 11:52:28 +00:00
dependabot[bot]
e59d7f238c
Bump rustls-webpki from 0.100.1 to 0.100.2
Bumps [rustls-webpki](https://github.com/rustls/webpki) from 0.100.1 to 0.100.2.
- [Release notes](https://github.com/rustls/webpki/releases)
- [Commits](https://github.com/rustls/webpki/compare/v/0.100.1...v/0.100.2)

---
updated-dependencies:
- dependency-name: rustls-webpki
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-08-22 18:10:53 +00:00
Kerollmops
717b069907
Bump charabia to 0.8.3 2023-08-22 16:25:00 +02:00
irevoire
b947f3bb9d Update version for the next release (v1.3.2) in Cargo.toml 2023-08-16 08:20:36 +00:00
ManyTheFish
cab27c2ab4 upgrade indexmap = "2.0.0" 2023-08-10 18:09:02 +02:00
ManyTheFish
624fa9052f upgrade deserr = "0.6.0" 2023-08-10 18:09:02 +02:00
ManyTheFish
359ede4862 upgrade fastrand = "2.0.0" 2023-08-10 18:09:02 +02:00
ManyTheFish
60c11dbdbd upgrade rstar - "0.11.0" 2023-08-10 18:09:02 +02:00
ManyTheFish
dacee40ebc upgrade memmap2 = "0.7.1" 2023-08-10 18:09:02 +02:00
ManyTheFish
6089083a8e upgrade sysinfo = "0.29.7" 2023-08-10 18:09:02 +02:00
ManyTheFish
cc2c19d4c3 upgrade itertools = "0.10.5" 2023-08-10 18:09:02 +02:00
ManyTheFish
a5c56fac8a Update dependencies 2023-08-10 18:09:02 +02:00
meili-bors[bot]
e4e49e63d0
Merge #3993
3993: Bringing back changes from v1.3.1 to `main` r=irevoire a=curquiza



Co-authored-by: irevoire <irevoire@users.noreply.github.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-08-10 14:30:02 +00:00
irevoire
75c87d5391 Update version for the next release (v1.3.1) in Cargo.toml 2023-08-08 10:30:06 +00:00
ManyTheFish
b45c36cd71 Merge branch 'main' into tmp-release-v1.3.0 2023-08-01 15:05:17 +02:00
Clément Renault
d8b47b689e
Use the new read-txn-no-tls heed feature 2023-07-26 15:45:15 +02:00
Kerollmops
29ab54b259
Replace the hnsw crate by the instant-distance one 2023-07-25 12:37:35 +02:00
ManyTheFish
0497f93494 Update Charabia to the last version 2023-07-19 15:19:32 +02:00
meili-bors[bot]
2dfbb6813a
Merge #3913
3913: Expose a Puffin server to profile the indexing process r=Kerollmops a=Kerollmops

This PR exposes a puffin HTTP server to expose the internal timing it takes to index documents, delete documents, or update the settings of an index.

<img width="1752" alt="Capture d’écran 2023-07-10 à 18 44 58" src="https://github.com/meilisearch/meilisearch/assets/3610253/a3c7a6bf-db5b-42f4-8be1-c4e31c869843">

## To be done

 - [x] Move the puffin HTTP server under a feature flag.
 - [x] Use [the `puffin::set_scopes_on` function](https://docs.rs/puffin/latest/puffin/fn.set_scopes_on.html) to toggle it (by using the feature directly).
     When this function is called with `false`, [a call to `profile_scope!` talked 1-2ns](https://docs.rs/puffin/latest/puffin/fn.set_scopes_on.html).
 - [x] Create a _PROFILING.md_ file explaining how to use it.
   - [x] Explain that merging scopes on the interface is not always useful.
 - [x] Add more info on the number of batched tasks (using the `puffin::profile_scope!` macro data).
   - I added more info, but that's more continuous work when we consider we need more info here and there.
 - [x] Clean up some scopes, and don't touch too much code to inject puffin.
   - I am not sure that the _index_documents/mod.rs_ function is that complex with the addition of the scope.
 - [x] Think about what we consider frames. One indexation operation or the wall program. When must we stop the frame, then?
   - What we consider a frame is one single `IndexScheduler::tick` execution.
   - We can change that later.

Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2023-07-19 09:44:01 +00:00
Kerollmops
eef95de30e
First iteration on exposing puffin profiling 2023-07-18 17:38:13 +02:00
ManyTheFish
c106906f8f deactivate camelCase segmentation 2023-07-13 12:06:27 +02:00
Kerollmops
8ba1c8f88f
Update proc-macro2 to compile with the latest nightly 2023-07-12 11:47:27 +02:00
Kerollmops
e7f8daaf86
Update criterion to 0.5.1 to remove the atty dependency 2023-07-03 18:51:42 +02:00
Kerollmops
d1ff631df8
Replace the atty dependency with the is-terminal one 2023-07-03 18:51:42 +02:00
gillian-meilisearch
1d40452057 Update version for the next release (v1.3.0) in Cargo.toml 2023-07-03 08:32:21 +00:00
meili-bors[bot]
661d1f90dc
Merge #3866
3866: Update charabia v0.8.0 r=dureuill a=ManyTheFish

# Pull Request

Update Charabia:
- enhance Japanese segmentation
- enhance Latin Tokenization
  - words containing `_` are now properly segmented into several words
  - brackets `{([])}` are no more considered as context separators so word separated by brackets are now considered near together for the proximity ranking rule
- fixes #3815
- fixes #3778
- fixes [product#151](https://github.com/meilisearch/product/discussions/151)

> Important note: now the float numbers are segmented around the `.` so `3.22` is segmented as [`3`, `.`, `22`] but the middle dot isn't considered as a hard separator, which means that if we search `3.22` we find documents containing `3.22`

Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-06-29 15:24:36 +00:00
ManyTheFish
e8dee3ca65 Update lock file 2023-06-29 17:02:24 +02:00
ManyTheFish
84845de9ef Update Charabia 2023-06-29 15:56:32 +02:00
Kerollmops
a385642ec3
Replace the BTreeMap by an IndexMap to return values in order 2023-06-29 14:33:31 +02:00
Kerollmops
737aec1705
Expose an _semanticSimilarity as a dot product in the documents 2023-06-27 12:32:41 +02:00
Kerollmops
c79e82c62a
Move back to the hnsw crate
This reverts commit 7a4b6c065482f988b01298642f4c18775503f92f.
2023-06-27 12:32:39 +02:00
Kerollmops
268a9ef416
Move to the hgg crate 2023-06-27 12:32:38 +02:00
Clément Renault
4571e512d2
Store the vectors in an HNSW in LMDB 2023-06-27 12:32:38 +02:00
Clément Renault
34349faeae
Create a new _vector extractor 2023-06-27 12:32:37 +02:00
meili-bors[bot]
45636d315c
Merge #3670
3670: Fix addition deletion bug r=irevoire a=irevoire

The first commit of this PR is a revert of https://github.com/meilisearch/meilisearch/pull/3667. It re-enable the auto-batching of addition and deletion of tasks. No new changes have been introduced outside of `milli`. So all the changes you see on the autobatcher have actually already been reviewed.

It fixes https://github.com/meilisearch/meilisearch/issues/3440.

### What was happening?

The issue was that the `external_documents_ids` generated in the `transform` were used in a very strange way that wasn’t compatible with the deletion of documents.
Instead of doing a clear merge between the external document IDs of the DB and the one returned by the transform + writing it on disk, we were doing some weird tricks with the soft-deleted to avoid writing the fst on disk as much as possible.
The new algorithm may be a bit slower but is way more straightforward and doesn’t change depending on if the soft deletion was used or not. Here is a list of the changes introduced:
1. We now do a clear distinction between the `new_external_documents_ids` coming from the transform and only held on RAM and the `external_documents_ids` coming from the DB.
2. The `new_external_documents_ids` (coming out of the transform) are now represented as an `fst`. We don't need to struggle with the hard, soft distinction + the soft_deleted => That's easier to understand
3. When indexing documents, we merge the `external_documents_ids` coming from the DB and the `new_external_documents_ids` coming from the transform.

### Other things introduced in this  PR

Since we constantly have to write small, very specialized fuzzers for this kind of bug, we decided to push the one used to reproduce this bug.
It's not perfect, but it's easy to improve in the future.
It'll also run for as long as possible on every merge on the main branch.

Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Loïc Lecrenier <loic.lecrenier@icloud.com>
2023-06-19 09:09:30 +00:00
Tamo
6c6387d05e
move the fuzzer to its own crate 2023-05-29 12:27:39 +02:00
Tamo
4391cba6ca
fix the addition + deletion bug 2023-05-17 18:28:57 +02:00
Kerollmops
1a79fd0c3c
Use the new heed v0.12.6 2023-05-15 11:42:30 +02:00
Kerollmops
c4a40e7110
Use the writemap flag to reduce the memory usage 2023-05-15 10:15:33 +02:00