meilisearch

mirror of https://github.com/meilisearch/meilisearch.git synced 2024-11-23 18:45:06 +08:00

Author	SHA1	Message	Date
bors[bot]	68c758a533	Merge #376 376: Stop casting integer docids to string r=Kerollmops a=irevoire When a docid is an integer, we stop casting it to a string, and thus we don't add `"` around it. Co-authored-by: Tamo <tamo@meilisearch.com>	2021-09-29 08:32:48 +00:00
many	d2427f18e5	Enhance CSV document parsing	2021-09-29 10:25:33 +02:00
bors[bot]	00f94b1ffd	Merge #377 377: Update version for the next release (v0.17.0) r=Kerollmops a=curquiza Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2021-09-28 20:43:33 +00:00
Clémentine Urquizar	0e8665bf18	Update version for the next release (v0.17.0)	2021-09-28 19:38:12 +02:00
Tamo	f65153ad64	stop casting integer docids to string	2021-09-28 18:35:54 +02:00
bors[bot]	adddf3f179	Merge #375 375: Fixes #365 r=Kerollmops a=vishnugt Co-authored-by: Vishnu Ganesan <vganesan@microsoft.com> Co-authored-by: Vishnu Gt <vishnugt@hotmail.com>	2021-09-28 14:42:48 +00:00
Vishnu Gt	785c1372f2	Change "settings" to "setting" Co-authored-by: Clément Renault <renault.cle@gmail.com>	2021-09-28 20:11:32 +05:30
Vishnu Ganesan	3580b2d803	Fixes #365	2021-09-28 19:30:23 +05:30
bors[bot]	3a12f5887e	Merge #373 373: Improve error message for bad sort syntax with geosearch r=Kerollmops a=irevoire `@Kerollmops` This should be the last PR for the geosearch and error handling, sorry for doing it in so many steps 😬 Co-authored-by: Tamo <tamo@meilisearch.com>	2021-09-28 12:39:32 +00:00
Tamo	a80dcfd4a3	improve error message for bad sort syntax with geosearch	2021-09-28 14:32:24 +02:00
bors[bot]	b2a332599e	Merge #372 372: Fix Meilisearch 1714 r=Kerollmops a=ManyTheFish The bug comes from the typo tolerance: to know how many typos are accepted, we were counting bytes instead of characters in a word. On Chinese script characters, we were allowing 2 typos on 3-character words. We now count the number of chars instead of bytes to assign the typo tolerance. Related to [Meilisearch#1714](https://github.com/meilisearch/MeiliSearch/issues/1714) Co-authored-by: many <maxime@meilisearch.com>	2021-09-28 11:59:45 +00:00
many	8046ae4bd5	Count the number of char instead of counting bytes to assign the typo tolerance	2021-09-28 12:10:43 +02:00
many	1988416295	Add failing test related to Meilisearch#1714	2021-09-28 12:05:11 +02:00
bors[bot]	3b479948c6	Merge #371 371: Provide a sort error handler r=Kerollmops a=irevoire This PR simplifies the error handling of asc-desc rules for Meilisearch or any other wrapper by providing directly in milli a new error type called `SortError` that can be generated from an `AscDescError` and that can be automatically converted to a `UserError`. Basically, wherever you are in the code, as a user or in milli, you can now parse an `AscDesc` syntax and, depending on the context, cast it either as a `SortError` or a `CriterionError` in one line with improved error messages. Co-authored-by: Tamo <tamo@meilisearch.com>	2021-09-28 09:28:32 +00:00
Tamo	cc732fe95e	update http-ui to use the sort-error	2021-09-28 11:15:24 +02:00
Tamo	c7cb816ae1	simplify the error handling of the sort syntax for meilisearch	2021-09-27 19:07:22 +02:00
bors[bot]	4c09f6838f	Merge #370 370: Change chunk size to 4MiB to fit more the end user usage r=ManyTheFish a=ManyTheFish We made several indexing tests using different sizes of datasets (5 datasets from 9MiB to 100MiB) on several types of VMs (`XS: 1GiB RAM, 1 VCPU`, `S: 2GiB RAM, 2 VCPU`, `M: 4GiB RAM, 3 VCPU`, `L: 8GiB RAM, 4 VCPU`). The results of these tests show that the `4MiB` chunk size seems to be the best size compared to other chunk sizes (`2MiB`, `4MiB`, `8MiB`, `16MiB`, `32MiB`, `64MiB`, `128MiB`). Below is the average time per chunk size: ![Capture d’écran 2021-09-27 à 14 27 50](https://user-images.githubusercontent.com/6482087/134909368-ef0bc45e-68d5-49d1-aaf9-91113b7c410f.png) <details> <summary>Detailed data</summary> <br> ![Capture d’écran 2021-09-27 à 14 39 48](https://user-images.githubusercontent.com/6482087/134909952-a36b1457-bbbd-4a6c-bbe5-519e4b926b5a.png) </br> </details> Co-authored-by: many <maxime@meilisearch.com>	2021-09-27 12:57:52 +00:00
many	b188063869	Change chunk size to 4MiB to fit more the end user usage	2021-09-27 14:26:21 +02:00
bors[bot]	0f8320bdc2	Merge #369 369: Add test checking the bug reported in meilisearch issue 1716 r=Kerollmops a=ManyTheFish The bug is not present in the newer milli version. Related to [Meilisearch#1716](https://github.com/meilisearch/MeiliSearch/issues/1716) Co-authored-by: many <maxime@meilisearch.com>	2021-09-23 14:27:34 +00:00
many	551df0cb77	Add test checking the bug reported in meilisearch issue 1716	2021-09-23 15:55:39 +02:00
bors[bot]	87dd441a3a	Merge #367 367: Update version for the next release (v0.16.0) r=Kerollmops a=curquiza Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2021-09-22 15:20:20 +00:00
Clémentine Urquizar	1eacab2169	Update version for the next release (v0.15.1)	2021-09-22 17:18:54 +02:00
bors[bot]	b806097141	Merge #366 366: Geosearch error handling r=Kerollmops a=irevoire Rewrite most of geosearch error handling and another batch of tests on the criterion parsing. Co-authored-by: Tamo <tamo@meilisearch.com> Co-authored-by: Irevoire <tamo@meilisearch.com>	2021-09-22 15:08:11 +00:00
Irevoire	218f0a6661	Apply suggestions from code review Co-authored-by: Clément Renault <clement@meilisearch.com>	2021-09-22 17:00:27 +02:00
Tamo	47ee93b0bd	return an error when _geoPoint is used but _geo is not sortable	2021-09-22 16:37:41 +02:00
Tamo	1e5e3d57e2	auto convert AscDescError into CriterionError	2021-09-22 16:37:41 +02:00
Tamo	023446ecf3	create a smaller and easier to maintain CriterionError type	2021-09-22 16:37:41 +02:00
Tamo	86e272856a	create an asc_desc error type that is never supposed to be returned to the end user	2021-09-22 16:37:41 +02:00
Tamo	257e621d40	create an asc_desc module	2021-09-22 16:37:41 +02:00
Tamo	113a061bee	fix the error handling on the criterion side	2021-09-22 15:09:07 +02:00
bors[bot]	ad3befaaf5	Merge #364 364: Fix all the benchmarks r=Kerollmops a=irevoire #324 broke all benchmarks. I fixed everything and noticed that `cargo check --all` was insufficient to check the bench in multiple workspaces, so I also updated the CI to use `cargo check --workspace --all-targets`. Co-authored-by: Tamo <tamo@meilisearch.com>	2021-09-22 12:40:34 +00:00
Tamo	176160d32f	fix all benchmarks and add the compile time checking of the benchmarks in the ci	2021-09-22 12:10:21 +02:00
bors[bot]	16790ee620	Merge #363 363: Fix the returned `AscDesc` error r=Kerollmops a=irevoire With my previous PR on the geosearch I erased the change I've introduced with my pre-previous PR about the new error type when we fail to parse the `AscDesc` type. Sorry for that, here is the fix Co-authored-by: Tamo <tamo@meilisearch.com>	2021-09-22 09:53:35 +00:00
Tamo	78b0bce9a1	fix the returned error when asc desc fails to be parsed	2021-09-22 11:37:05 +02:00
bors[bot]	2837cab5da	Merge #362 362: Remove the `Cargo.lock` again r=Kerollmops a=irevoire Co-authored-by: Tamo <tamo@meilisearch.com>	2021-09-22 09:33:09 +00:00
Tamo	2e99fa8251	remove the cargo.lock again	2021-09-22 11:30:33 +02:00
bors[bot]	fe9f380993	Merge #361 361: Update version for the next release (v0.15.0) r=Kerollmops a=curquiza Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2021-09-21 16:19:16 +00:00
Clémentine Urquizar	f8ecbc28e2	Update version for the next release (v0.15.0)	2021-09-21 18:09:14 +02:00
bors[bot]	700318dc62	Merge #357 357: Add benchmarks for the geosearch r=Kerollmops a=irevoire closes #336 Should I merge this PR in #322 and then we merge everything in `main` or should we wait for #322 to be merged and then merge this one in `main` later? Co-authored-by: Tamo <tamo@meilisearch.com> Co-authored-by: Irevoire <tamo@meilisearch.com>	2021-09-21 16:08:06 +00:00
bors[bot]	9d9010e45f	Merge #324 324: Implement documents API r=Kerollmops a=MarinPostma This PR implements the intermediary document representation for milli. The JSON, JSONL and CSV formats are replaced with this format instead, to push the serialization duty on the client side. The `documents` module contains the interface to the new document format: - The `DocumentsBuilder` allows the creation of a writer-backed document addition, where documents are added either one by one, or as arrays of depth 1. This is made possible by the fact that the serializer used by the `add_documents` methods only accepts `[Object]` and `Object`. The related serialization logic is located in the `serde.rs` file. - The `DocumentsReader` allows iterating over the documents created by a `DocumentsBuilder`. A call to `next_document_with_index` returns the next obkv reader in the document addition, along with a reference to the index used to map the field ids in the obkv reader to the field names. All references to json, jsonl or csv in the tests have been replaced with the `documents!` macro, which works exactly like the `serde_json::json` macro, as a convenient way to create a `DocumentsReader`. Rewrote the search cli, to the `cli` crate, to also allow index manipulation. This only offers basic functionalities for now, but is meant to be easier to extend than http ui. Blocked by #308 Co-authored-by: mpostma <postma.marin@protonmail.com>	2021-09-21 15:40:03 +00:00
mpostma	aa6c5df0bc	Implement documents format document reader transform remove update format support document sequences fix document transform clean transform improve error handling add documents! macro fix transform bug fix tests remove csv dependency Add comments on the transform process replace search cli fmt review edits fix http ui fix clippy warnings Revert "fix clippy warnings" This reverts commit a1ce3cd96e603633dbf43e9e0b12b2453c9c5620. fix review comments remove smallvec in transform loop review edits	2021-09-21 16:58:33 +02:00
bors[bot]	94764e5c7c	Merge #360 360: Update version for the next release (v0.14.0) r=Kerollmops a=curquiza Release containing the geosearch, cf #322 Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2021-09-21 08:43:27 +00:00
bors[bot]	31c8de1cca	Merge #322 322: Geosearch r=ManyTheFish a=irevoire This PR introduces [basic geo-search functionalities](https://github.com/meilisearch/specifications/pull/59); it makes the engine able to index, filter, and sort by geo-point. We decided to use [the rstar library](https://docs.rs/rstar) and to save the points in [an RTree](https://docs.rs/rstar/0.9.1/rstar/struct.RTree.html) that we de/serialize in the index database [by using serde](https://serde.rs/) with [bincode](https://docs.rs/bincode). This is not an efficient way to query this tree as it will consume a lot of CPU and memory when a search is made, but at least it is an easy first way to do so. ### What we will have to do on the indexing part: - [x] Index the `_geo` fields from the documents. - [x] Create a new module with an extractor in the `extract` module that takes the `obkv_documents` and retrieves the latitude and longitude coordinates, outputting them in a `grenad::Reader` for further processing. - [x] Call the extractor in the `extract::extract_documents_data` function and send the result to the `TypedChunk` module. - [x] Get the `grenad::Reader` in the `typed_chunk::write_typed_chunk_into_index` function and store all the points in the `rtree` - [x] Delete the documents from the `RTree` when deleting documents from the database. All this can be done in the `delete_documents.rs` file by getting the data structure and removing the points from it, inserting it back after the modification. - [x] Clearing the `RTree` entirely when we clear the documents from the database, everything happens in the `clear_documents.rs` file. - [x] save a Roaring bitmap of all documents containing the `_geo` field ### What we will have to do on the query part: - [x] Filter the documents at a certain distance around a point, this is done by [collecting the documents from the searched point](https://docs.rs/rstar/0.9.1/rstar/struct.RTree.html#method.nearest_neighbor_iter) while they are in range. - [x] We must introduce new `geoLowerThan` and `geoGreaterThan` variants to the `Operator` filter enum. - [x] Implement the `negative` method on both variants where the `geoGreaterThan` variant is implemented by executing the `geoLowerThan` and removing the results found from the whole list of geo faceted documents. - [x] Add the `_geoRadius` function in the pest parser. - [x] Introduce a `_geo` ascending ranking function that takes a point as a parameter, ~~this function must keep the iterator on the `RTree` and make it peekable~~ This was not possible for now, we had to collect the whole iterator. Only the documents that are part of the candidates must be sent too! - [x] This ascending ranking rule will only be active if the search is set up with the `_geoPoint` parameter that indicates the center point of the ascending ranking rule. ----------- - On Meilisearch part: We must introduce a new concept, returning the documents with a new `_geoDistance` field when they have passed through the `_geo` ranking rule; this has never been done before. We could maybe just do it afterward when the documents have been retrieved from the database, computing the distance between the `_geoPoint` and each of the documents to be returned. Co-authored-by: Irevoire <tamo@meilisearch.com> Co-authored-by: cvermand <33010418+bidoubiwa@users.noreply.github.com> Co-authored-by: Tamo <tamo@meilisearch.com>	2021-09-20 19:04:57 +00:00
Irevoire	0d104a0fce	Update milli/src/criterion.rs Co-authored-by: Clément Renault <clement@meilisearch.com>	2021-09-20 18:13:17 +02:00
Clémentine Urquizar	3f1453f470	Update version for the next release (v0.14.0)	2021-09-20 18:12:23 +02:00
Tamo	f4b8e5675d	move the reserved keyword logic for the criterion and sort + add test	2021-09-20 17:21:02 +02:00
Irevoire	3b7a2cdbce	fix typo Co-authored-by: Clément Renault <clement@meilisearch.com>	2021-09-20 16:10:39 +02:00
bors[bot]	203aa727a7	Merge #359 359: Improve the benchmark comparison script r=irevoire a=irevoire This modification allows us to compare more than 2 benchmarks or to only print the results of one benchmark. Co-authored-by: Irevoire <tamo@meilisearch.com> Co-authored-by: Tamo <tamo@meilisearch.com>	2021-09-20 12:39:59 +00:00
Tamo	eaba772f21	update the README to better match the new critcmp usage Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2021-09-20 10:59:55 +02:00
Irevoire	9a920d1f93	Fix datasets links in the readme Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2021-09-20 10:44:37 +02:00
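
The typo-tolerance fix described in Merge #372 (counting characters rather than bytes) can be illustrated with a minimal Rust sketch. The function names and the length thresholds (5 and 9) are assumptions chosen for illustration and are not taken from milli's source; they only reproduce the symptom in the commit message, where a 3-character Chinese word was granted 2 typos.

```rust
// Hypothetical thresholds, assumed for this sketch only:
// words of at least 5 units get 1 typo, at least 9 units get 2 typos.
const ONE_TYPO_MIN_LEN: usize = 5;
const TWO_TYPOS_MIN_LEN: usize = 9;

/// Maps a word length to the number of accepted typos.
fn typos_for_len(len: usize) -> u8 {
    if len >= TWO_TYPOS_MIN_LEN {
        2
    } else if len >= ONE_TYPO_MIN_LEN {
        1
    } else {
        0
    }
}

/// Buggy variant: `str::len` returns the number of UTF-8 bytes,
/// so multi-byte scripts look longer than they are.
fn allowed_typos_by_bytes(word: &str) -> u8 {
    typos_for_len(word.len())
}

/// Fixed variant: counts Unicode scalar values (`char`s) instead.
fn allowed_typos_by_chars(word: &str) -> u8 {
    typos_for_len(word.chars().count())
}

fn main() {
    // Three Chinese characters, each encoded on 3 bytes in UTF-8.
    let word = "图书馆";
    println!("bytes: {}, chars: {}", word.len(), word.chars().count());
    println!("typos (byte count): {}", allowed_typos_by_bytes(word));
    println!("typos (char count): {}", allowed_typos_by_chars(word));
}
```

Counting `char`s still treats a grapheme made of several code points as several units; whether that matters depends on the scripts being indexed.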

1451 Commits