379: Revert "Change chunk size to 4MiB to fit more the end user usage" r=curquiza a=ManyTheFish
Reverts meilisearch/milli#370
Co-authored-by: Many <legendre.maxime.isn@gmail.com>
378: Hotfix meilisearch#1707 r=Kerollmops a=ManyTheFish
This PR contains an ugly quick fix of [meilisearch#1707](https://github.com/meilisearch/MeiliSearch/issues/1707).
- remove comparison reverse on rank. Enhancing relevancy and performances
- iterate over level 0 only. Enhancing performances.
A better fix is in development.
Co-authored-by: many <maxime@meilisearch.com>
Co-authored-by: Many <legendre.maxime.isn@gmail.com>
374: Enhance CSV document parsing r=Kerollmops a=ManyTheFish
Benchmarks on `search_songs` were crashing because of the CSV parsing.
Co-authored-by: many <maxime@meilisearch.com>
376: Stop casting integer docids to string r=Kerollmops a=irevoire
When a docid is an integer, we stop casting it to a string, and thus we don't add `"` around it.
Co-authored-by: Tamo <tamo@meilisearch.com>
373: Improve error message for bad sort syntax with geosearch r=Kerollmops a=irevoire
`@Kerollmops` This should be the last PR for the geosearch and error handling, sorry for doing it in so many steps 😬
Co-authored-by: Tamo <tamo@meilisearch.com>
372: Fix Meilisearch 1714 r=Kerollmops a=ManyTheFish
The bug comes from the typo tolerance, to know how many typos are accepted we were counting bytes instead of characters in a word.
On Chinese Script characters, we were allowing 2 typos on 3 characters words.
We are now counting the number of char instead of counting bytes to assign the typo tolerance.
Related to [Meilisearch#1714](https://github.com/meilisearch/MeiliSearch/issues/1714)
Co-authored-by: many <maxime@meilisearch.com>
371: Provide a sort error handler r=Kerollmops a=irevoire
This PR simplify the error handling of asc-desc rules for Meilisearch or any other wrapper by providing directly in milli a new error type called `SortError` that can be generated from an `AscDescError` and that can be automatically converted to a `UserError`.
Basically now, wherever you are in the code as a user or in milli you can parse an `AscDesc` syntax and depending on the context, cast it either as a `SortError` or a `CriterionError` in one line with improved error messages.
Co-authored-by: Tamo <tamo@meilisearch.com>
370: Change chunk size to 4MiB to fit more the end user usage r=ManyTheFish a=ManyTheFish
We made several indexing tests using different sizes of datasets (5 datasets from 9MiB to 100MiB) on several typologies of VMs (`XS: 1GiB RAM, 1 VCPU`, `S: 2GiB RAM, 2 VCPU`, `M: 4GiB RAM, 3 VCPU`, `L: 8GiB RAM, 4 VCPU`).
The result of these tests shows that the `4MiB` chunk size seems to be the best size compared to other chunk sizes (`2Mib`, `4MiB`, `8Mib`, `16Mib`, `32Mib`, `64Mib`, `128Mib`).
below is the average time per chunk size:
![Capture d’écran 2021-09-27 à 14 27 50](https://user-images.githubusercontent.com/6482087/134909368-ef0bc45e-68d5-49d1-aaf9-91113b7c410f.png)
<details>
<summary>Detailled data</summary>
<br>
![Capture d’écran 2021-09-27 à 14 39 48](https://user-images.githubusercontent.com/6482087/134909952-a36b1457-bbbd-4a6c-bbe5-519e4b926b5a.png)
</br>
</details>
Co-authored-by: many <maxime@meilisearch.com>
369: Add test checking the bug reported in meilisearch issue 1716 r=Kerollmops a=ManyTheFish
The bug is not present in the newer milli version.
Related to [Meilisearch#1716](https://github.com/meilisearch/MeiliSearch/issues/1716)
Co-authored-by: many <maxime@meilisearch.com>
366: Geosearch error handling r=Kerollmops a=irevoire
Rewrite most of geosearch error handling and another batch of tests on the criterion parsing.
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Irevoire <tamo@meilisearch.com>
364: Fix all the benchmarks r=Kerollmops a=irevoire
#324 broke all benchmarks.
I fixed everything and noticed that `cargo check --all` was insufficient to check the bench in multiple workspaces, so I also updated the CI to use `cargo check --workspace --all-targets`.
Co-authored-by: Tamo <tamo@meilisearch.com>
363: Fix the returned `AscDesc` error r=Kerollmops a=irevoire
With my previous PR on the geosearch I erased the change I've introduced with my pre-previous PR about the new error type when we fail to parse the `AscDesc` type.
Sorry for that, here is the fix
Co-authored-by: Tamo <tamo@meilisearch.com>
357: Add benchmarks for the geosearch r=Kerollmops a=irevoire
closes#336
Should I merge this PR in #322 and then we merge everything in `main` or should we wait for #322 to be merged and then merge this one in `main` later?
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Irevoire <tamo@meilisearch.com>
324: Implement documents API r=Kerollmops a=MarinPostma
This pr implement the intermediary document representation for milli. The JSON, JSONL and CSV formats are replaced with the format instead, to push the serialization duty on the client side.
The `documents` module contains the interface to the new document format:
- The `DocumentsBuilder` allows the creation of a writer backed document addition, when documents are added either one by one, or as arrays of depth 1. This is made possible by the fact that the seriliazer used by the `add_documents` methods only accepts `[Object]` and `Object`. The related serialization logic is located in the `serde.rs` file.
- The `DocumentsReader` allows to to iterate over the documents created by a `DocumentsBuilder`. A call to `next_document_with_index` returns the next obkv reader in the document addition, along with a reference to the index used to map the field ids in the obkv reader to the field names
All references to json, jsonl or csv in the tests have been replaced with the `documents!` macro, works exaclty like the `serde_json::json` macro, as a convenient way to create a `DocumentsReader`.
Rewrote the search cli, to the `cli` crate, to also allow index manipulation. This only offers basic functionalities for now, but is meant to be easier to extend than http ui
blocked by #308
Co-authored-by: mpostma <postma.marin@protonmail.com>