417: Change chunk size to 4MiB to better fit end-user usage r=Kerollmops a=ManyTheFish

Reverts meilisearch/milli#379

We ran several indexing tests on datasets of different sizes (5 datasets from 9MiB to 100MiB) and on several types of VMs (`XS: 1GiB RAM, 1 VCPU`, `S: 2GiB RAM, 2 VCPU`, `M: 4GiB RAM, 3 VCPU`, `L: 8GiB RAM, 4 VCPU`).
The results of these tests show that `4MiB` appears to be the best of the chunk sizes tested (`2MiB`, `4MiB`, `8MiB`, `16MiB`, `32MiB`, `64MiB`, `128MiB`).
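
To give a feel for the sizes involved, here is a rough sketch of how many chunks a dataset would be split into at each tested chunk size. It assumes every chunk is filled to capacity and ignores any encoding overhead, so it is only an approximation; `chunks_needed` and `chunk_counts` are hypothetical helpers written for this illustration and are not part of milli.

```rust
/// One mebibyte in bytes.
const MIB: usize = 1024 * 1024;

/// Chunk sizes compared in the tests, in MiB.
const TESTED_CHUNK_SIZES_MIB: [usize; 7] = [2, 4, 8, 16, 32, 64, 128];

/// Number of chunks needed to hold `payload_bytes` when each chunk holds at
/// most `chunk_bytes` (ceiling division). Hypothetical helper, not milli code.
fn chunks_needed(payload_bytes: usize, chunk_bytes: usize) -> usize {
    assert!(chunk_bytes > 0, "chunk size must be non-zero");
    (payload_bytes + chunk_bytes - 1) / chunk_bytes
}

/// Pairs of (chunk size in MiB, number of chunks) for one dataset size,
/// covering every tested chunk size. Hypothetical helper, not milli code.
fn chunk_counts(dataset_mib: usize) -> Vec<(usize, usize)> {
    TESTED_CHUNK_SIZES_MIB
        .iter()
        .map(|&size_mib| (size_mib, chunks_needed(dataset_mib * MIB, size_mib * MIB)))
        .collect()
}
```

Under these assumptions, `chunks_needed(100 * MIB, 4 * MIB)` gives 25 chunks for the largest dataset, while `chunks_needed(100 * MIB, 128 * MIB)` gives a single chunk.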

Below is the average time per chunk size:

![Screenshot 2021-09-27 at 14 27 50](https://user-images.githubusercontent.com/6482087/134909368-ef0bc45e-68d5-49d1-aaf9-91113b7c410f.png)

<details>
<summary>Detailed data</summary>
<br>

![Screenshot 2021-09-27 at 14 39 48](https://user-images.githubusercontent.com/6482087/134909952-a36b1457-bbbd-4a6c-bbe5-519e4b926b5a.png)
</br>
</details> 


Co-authored-by: Many <many@meilisearch.com>
This commit is contained in:
bors[bot] 2022-02-02 18:30:59 +00:00 committed by GitHub
commit fda4f229bb

@@ -243,7 +243,7 @@ where
     let chunk_iter = grenad_obkv_into_chunks(
         documents_file,
         params.clone(),
-        self.indexer_config.documents_chunk_size.unwrap_or(1024 * 1024 * 128), // 128MiB
+        self.indexer_config.documents_chunk_size.unwrap_or(1024 * 1024 * 4), // 4MiB
     );
     let result = chunk_iter.map(|chunk_iter| {
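
The changed line only affects the fallback used when `documents_chunk_size` is not set. A minimal sketch of that behavior follows. The `IndexerConfig` struct here is a simplified stand-in, and the `usize` field type, the constant name, and the `effective_chunk_size` helper are assumptions made for illustration, not milli's actual definitions.

```rust
/// Fallback used when no chunk size is configured: 4MiB, as in this change.
/// The constant name is an assumption; the diff inlines the value.
const DEFAULT_DOCUMENTS_CHUNK_SIZE: usize = 1024 * 1024 * 4;

/// Simplified stand-in for the indexer configuration. Assumption: the real
/// struct has more fields, and its integer type may differ.
#[derive(Debug, Default, Clone)]
struct IndexerConfig {
    documents_chunk_size: Option<usize>,
}

impl IndexerConfig {
    /// Hypothetical helper: returns the explicit setting when present,
    /// otherwise the 4MiB default.
    fn effective_chunk_size(&self) -> usize {
        self.documents_chunk_size
            .unwrap_or(DEFAULT_DOCUMENTS_CHUNK_SIZE)
    }
}
```

In this sketch, a configuration that sets `documents_chunk_size` explicitly keeps its own value. Only configurations that leave it as `None` pick up the new default.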