meilisearch

mirror of https://github.com/meilisearch/meilisearch.git synced 2024-11-27 04:25:06 +08:00

Author	SHA1	Message	Date
bors[bot]	15bbde1022	Merge #432 432: Fuzzer r=Kerollmops a=irevoire Provide a first way of fuzzing the indexing part of milli. It depends on [cargo-fuzz](https://rust-fuzz.github.io/book/cargo-fuzz.html) Co-authored-by: Tamo <tamo@meilisearch.com>	2022-01-17 12:50:26 +00:00
Tamo	0605c0ac68	apply review comments	2022-01-13 18:51:08 +01:00
Tamo	b22c80106f	add some settings to the fuzzed milli and use the published version of arbitrary json	2022-01-13 15:35:24 +01:00
Tamo	c94952e25d	update the readme + dependencies	2022-01-12 18:30:11 +01:00
Tamo	e1053989c0	add a fuzzer on milli	2022-01-12 17:57:54 +01:00
bors[bot]	559e019de1	Merge #424 424: Store the geopoint in three dimensions r=Kerollmops a=irevoire Related to this issue: https://github.com/meilisearch/MeiliSearch/issues/1872 Fix the whole computation of distance for any “geo” operations (sort or filter). Now when you sort points they are returned to you in the right order. And when you filter on a specific radius you only get points included in the radius. This PR changes the way we store the geo points in the RTree. Instead of considering the latitude and longitude as orthogonal coordinates, we convert them to real orthogonal coordinates projected on a sphere with a radius of 1. This is the conversion formulae. ![image](https://user-images.githubusercontent.com/7032172/145990456-eefe840a-384f-4486-848b-81d0036814ec.png) Which, in rust, translate to this function: ```rust pub fn lat_lng_to_xyz(coord: &[f64; 2]) -> [f64; 3] { let [lat, lng] = coord.map(\|f\| f.to_radians()); let x = lat.cos() * lng.cos(); let y = lat.cos() * lng.sin(); let z = lat.sin(); [x, y, z] } ``` Storing the points on a sphere is easier / faster to compute than storing the point on an approximation of the real earth shape. But when we need to compute the distance between two points we still need to use the haversine distance which works with latitude and longitude. So, to do the fewest search-time computation possible I'm now associating every point with its `DocId` and its lat/lng. Co-authored-by: Tamo <tamo@meilisearch.com>	2022-01-10 15:23:43 +00:00
bors[bot]	660eac50b2	Merge #427 427: Handle escaped characters in filters r=Kerollmops a=irevoire Co-authored-by: Tamo <tamo@meilisearch.com>	2022-01-10 15:01:23 +00:00
Tamo	92804f6f45	apply clippy suggestions	2022-01-10 15:59:04 +01:00
Tamo	0fcde35a20	Update filter-parser/src/value.rs Co-authored-by: Clément Renault <clement@meilisearch.com>	2022-01-10 15:53:44 +01:00
Tamo	3c7ea1d298	Apply code suggestions Co-authored-by: Clément Renault <clement@meilisearch.com>	2022-01-10 15:19:21 +01:00
bors[bot]	74594be234	Merge #429 429: Benchmark CIs: not use a default label to call the GH runner r=irevoire a=curquiza Since we now have multiple self-hosted github runners, we need to differentiate them calling them in the CI. The `self-hosted` label is the default one, so we need to use the unique and appropriate one for the benchmark machine <img width="925" alt="Capture d’écran 2022-01-04 à 15 42 18" src="https://user-images.githubusercontent.com/20380692/148079840-49cd7878-5912-46ff-8ab8-bf646777f782.png"> Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2022-01-04 15:41:08 +00:00
Clémentine Urquizar	3d99686f7a	Change self-hosted label by benchmarks	2022-01-04 16:01:01 +01:00
bors[bot]	c039562723	Merge #428 428: Reintroduce the gitignore for the fuzzer r=Kerollmops a=irevoire Reintroduce the gitignore in the fuzz directory Co-authored-by: Tamo <tamo@meilisearch.com>	2022-01-04 12:09:06 +00:00
Tamo	9bdcd42b9b	reintroduce the gitignore for the fuzzer	2022-01-04 13:07:32 +01:00
bors[bot]	4cae691b86	Merge #425 425: Push the result of the benchmarks to influxdb r=irevoire a=irevoire Now execute a benchmark for every PR merged into main and then upload the results to influxdb. Co-authored-by: Tamo <tamo@meilisearch.com>	2022-01-04 11:04:16 +00:00
Tamo	6a1216bd51	Integrate telegraf into our CI	2022-01-04 11:59:05 +01:00
Tamo	02a21fd309	Handle the escapes of quote in the filters	2022-01-04 04:04:10 +01:00
Tamo	98a365aaae	store the geopoint in three dimensions	2021-12-14 12:21:24 +01:00
bors[bot]	11a056d116	Merge #422 422: Prefer returning `None` instead of using an `FilterCondition::Empty` state r=Kerollmops a=Kerollmops This PR is related to the issue comment https://github.com/meilisearch/MeiliSearch/issues/1338#issuecomment-989322889 which exhibits the fact that when a filter is known to be empty no results are returned which is wrong, the filter should not apply as no restriction is done on the documents set. The filter system on the milli side has introduced an Empty state which was used in this kind of situation but I found out that it is not needed and that when we parse a filter and that it is empty we can simply return `None` as the `Filter::from_array` constructor does. So I removed it and added tests! On the MeiliSearch side, we just need to match on a `None` and completely ignore the filter in such a case. Co-authored-by: Clément Renault <clement@meilisearch.com>	2021-12-09 15:03:04 +00:00
Clément Renault	94011bb9a8	Fix the benchmarks to work with optional filters	2021-12-09 12:14:16 +01:00
Clément Renault	1c6c89f345	Fix the binaries that use the new optional filters	2021-12-09 11:57:53 +01:00
Clément Renault	25faef67d0	Remove the database setup in the filter_depth test	2021-12-09 11:57:53 +01:00
Clément Renault	65519bc04b	Test that empty filters return a None	2021-12-09 11:57:53 +01:00
Clément Renault	ef59762d8e	Prefer returning None instead of the Empty Filter state	2021-12-09 11:57:52 +01:00
bors[bot]	80dcfd5c3e	Merge #421 421: Introduce the depth method on FilterCondition r=Kerollmops a=Kerollmops This PR introduces the depth method on the FilterCondition type to be able to react to it. It is meant to be used to reject filters that go too deep and can make the engine stack overflow. Co-authored-by: Clément Renault <clement@meilisearch.com>	2021-12-09 10:28:52 +00:00
Clément Renault	ee856a7a46	Limit the max filter depth to 2000	2021-12-07 17:36:45 +01:00
Clément Renault	32bd9f091f	Detect the filters that are too deep and return an error	2021-12-07 17:20:11 +01:00
Clément Renault	90f49eab6d	Check the filter max depth limit and reject the invalid ones	2021-12-07 16:32:48 +01:00
Clément Renault	49c2db9485	Change the depth function to return the token depth	2021-12-07 16:06:10 +01:00
Clément Renault	57502fcf6a	Introduce the depth method on FilterCondition	2021-12-06 17:35:20 +01:00
bors[bot]	c83b77304a	Merge #420 420: Update milli 0.21.0 r=ManyTheFish a=ManyTheFish Update all modules to 0.21.0 Co-authored-by: many <maxime@meilisearch.com>	2021-11-30 17:22:12 +00:00
many	1b3923b5ce	Update all packages to 0.21.0	2021-11-29 12:17:59 +01:00
bors[bot]	26629a3f9e	Merge #419 419: fix word pair proximity indexing r=ManyTheFish a=ManyTheFish # Pull Request Sort positions before iterating over them during word pair proximity extraction. fixes [Meilisearch#1913](https://github.com/meilisearch/MeiliSearch/issues/1913) Co-authored-by: many <maxime@meilisearch.com>	2021-11-23 10:21:05 +00:00
many	8970246bc4	Sort positions before iterating over them during word pair proximity extraction	2021-11-22 18:16:54 +01:00
bors[bot]	cc32519a2d	Merge #418 418: change visibility of DocumentDeletionResult r=Kerollmops a=MarinPostma Change the visibility of `DocumentDeletionResult`, so its fields can be accesses from outside milli. Co-authored-by: Marin Postma <postma.marin@protonmail.com>	2021-11-22 14:45:55 +00:00
Marin Postma	6e977dd8e8	change visibility of DocumentDeletionResult	2021-11-22 15:44:44 +01:00
bors[bot]	68f1db123a	Merge #416 416: Update tokenizer v0.2.6 r=Kerollmops a=ManyTheFish Co-authored-by: many <maxime@meilisearch.com>	2021-11-18 16:01:11 +00:00
many	35f9499638	Export tokenizer from milli	2021-11-18 16:57:12 +01:00
many	64ef5869d7	Update tokenizer v0.2.6	2021-11-18 16:56:05 +01:00
bors[bot]	2c14efa8a2	Merge #409 409: remove update_id in UpdateBuilder r=ManyTheFish a=MarinPostma Removing the `update_id` from `UpdateBuidler`, since it serves no purpose. I had introduced it when working in HA some time ago, but I think there are better ways to do it now, so it can be removed an stop being in our way. Co-authored-by: Marin Postma <postma.marin@protonmail.com>	2021-11-16 14:59:09 +00:00
Marin Postma	6eb47ab792	remove update_id in UpdateBuilder	2021-11-16 13:07:04 +01:00
bors[bot]	21b78f3926	Merge #414 414: improve update result types r=ManyTheFish a=MarinPostma Inprove the returned meta when performing document additions and deletions: - On document addition return the number of indexed documents and the total number of documents in the index after the indexion - On document deletion return the number of deleted documents, and the remaining number of documents in the index after the deletion is performed I also fixed a potential bug when performing a document deletion and the primary key couldn't be found: before we assumed that the db was empty and returned that no documents were deleted, but since we checked before that the db wasn't empty, entering this branch is actually a bug, and now returns a 'MissingPrimaryKey' error. Co-authored-by: Marin Postma <postma.marin@protonmail.com>	2021-11-15 09:06:10 +00:00
Marin Postma	09b4281cff	improve document addition returned metaimprove document addition returned metaimprove document addition returned metaimprove document addition returned metaimprove document addition returned metaimprove document addition returned metaimprove document addition returned metaimprove document addition returned meta	2021-11-10 14:08:36 +01:00
Marin Postma	721fc294be	improve document deletion returned meta returns both the remaining number of documents and the number of deleted documents.	2021-11-10 14:08:18 +01:00
bors[bot]	8dff08d772	Merge #400 400: Rewrite the filter parser and add a lot of tests r=irevoire a=irevoire This PR is a complete rewrite of #358, which was reverted in #403. You can already try this PR in Meilisearch here https://github.com/meilisearch/MeiliSearch/pull/1880. Since writing a parser is quite complicated, I moved all the logic to another workspace called `filter_parser`. In this workspace, we don't know anything about milli, the filterable fields / field ID or anything. As you can see in its `cargo.toml`, it has only three dependencies entirely focused on the parsing part: ``` nom = "7.0.0" nom_locate = "4.0.0" ``` But introducing this new workspace made some changes necessary on the “AST”. Now the parser only returns `Tokens` (a simple `&str` with a bit of context). Everything is interpreted when we execute the filter later in milli. This crate provides a new error type for all filter related errors. --------- ## Errors Currently, we have multiple kinds of errors. Sometimes we are generating errors looking like that: (for `name = truc`) ``` Attribute `name` is not filterable. Available filterable attributes are: ``. ``` While sometimes pest was generating errors looking like that: ``` Invalid syntax for the filter parameter: ` --> 1:7 \| 1 \| name = \| ^--- \| = expected word`. ``` Which most people were seeing like that: (for `name =`) ``` Invalid syntax for the filter parameter: ` --> 1:7\n \|\n1 \| name =\n \| ^---\n \|\n = expected word`. ``` ----------- With this PR, the error format is unified between all errors. All errors follow this more straightforward format: ``` The error message. [from char]:[to char] filter ``` This should be way easier to read when embedded in the JSON for a human. And it should also allow us to parse the errors easily and provide highlighting or something with a frontend playground. Here is an example of the two previous errors with the new format: For `name = truc`: ``` Attribute `name` is not filterable. Available filterable attributes are: ``. 1:4 name = truc ``` Or in one line: ``` Attribute `name` is not filterable. Available filterable attributes are: ``.\n1:4 name = truc ``` And for `name =`: ``` Was expecting a value but instead got nothing. 7:7 name = ``` Or in one line: ``` Was expecting a value but instead got nothing.\n7:7 name = ``` Also, since we now have control over the parser, we can generate more explicit error messages so a lot of new errors have been created. I tried to be as helpful as possible for the user; here is a little overview of the new error message you can get when misusing a filter: ``` Expression `"truc` is missing the following closing delimiter: `"`. 8:13 name = "truc ``` The `_geoRadius` filter is an operation and can't be used as a value. 8:30 name = _geoRadius(12, 13, 14) ``` etc ## Tests A lot of tests have been written in the `filter_parser` crate. I think there is a unit test for every part of the syntax. But since we can never be sure we covered all the cases, I also fuzzed the new parser A LOT (for ±8 hours on 20 threads). And the code to fuzz the parser is included in the workspace, so if one day we need to change something to the syntax, we'll be able to re-use it by simply running: ``` cargo fuzz run --release parse ``` ## Milli I renamed the type and module `filter_condition.rs` / `FilterCondition` to `filter.rs` / `Filter`. Co-authored-by: Tamo <tamo@meilisearch.com>	2021-11-09 16:09:34 +00:00
Irevoire	7c3017734a	re-ignore the ! symbol when generating a good error message	2021-11-09 17:08:04 +01:00
Tamo	bff48681d2	Re-order the operator Co-authored-by: Clément Renault <clement@meilisearch.com>	2021-11-09 17:05:36 +01:00
Irevoire	519d6b2bf3	remove the `!` syntax for the not	2021-11-09 16:47:54 +01:00
Irevoire	73df873f44	fix typos	2021-11-09 16:41:10 +01:00
Irevoire	99197387af	fix the test with the new escaped format	2021-11-09 16:41:10 +01:00

1 2 3 4 5 ...

1452 Commits