meilisearch

mirror of https://github.com/meilisearch/meilisearch.git synced 2024-11-27 12:35:05 +08:00

Author	SHA1	Message	Date
mpostma	aa6c5df0bc	Implement documents format document reader transform remove update format support document sequences fix document transform clean transform improve error handling add documents! macro fix transform bug fix tests remove csv dependency Add comments on the transform process replace search cli fmt review edits fix http ui fix clippy warnings Revert "fix clippy warnings" This reverts commit a1ce3cd96e603633dbf43e9e0b12b2453c9c5620. fix review comments remove smallvec in transform loop review edits	2021-09-21 16:58:33 +02:00
bors[bot]	31c8de1cca	Merge #322 322: Geosearch r=ManyTheFish a=irevoire This PR introduces [basic geo-search functionalities](https://github.com/meilisearch/specifications/pull/59), it makes the engine able to index, filter and, sort by geo-point. We decided to use [the rstar library](https://docs.rs/rstar) and to save the points in [an RTree](https://docs.rs/rstar/0.9.1/rstar/struct.RTree.html) that we de/serialize in the index database [by using serde](https://serde.rs/) with [bincode](https://docs.rs/bincode). This is not an efficient way to query this tree as it will consume a lot of CPU and memory when a search is made, but at least it is an easy first way to do so. ### What we will have to do on the indexing part: - [x] Index the `_geo` fields from the documents. - [x] Create a new module with an extractor in the `extract` module that takes the `obkv_documents` and retrieves the latitude and longitude coordinates, outputting them in a `grenad::Reader` for further process. - [x] Call the extractor in the `extract::extract_documents_data` function and send the result to the `TypedChunk` module. - [x] Get the `grenad::Reader` in the `typed_chunk::write_typed_chunk_into_index` function and store all the points in the `rtree` - [x] Delete the documents from the `RTree` when deleting documents from the database. All this can be done in the `delete_documents.rs` file by getting the data structure and removing the points from it, inserting it back after the modification. - [x] Clearing the `RTree` entirely when we clear the documents from the database, everything happens in the `clear_documents.rs` file. - [x] save a Roaring bitmap of all documents containing the `_geo` field ### What we will have to do on the query part: - [x] Filter the documents at a certain distance around a point, this is done by [collecting the documents from the searched point](https://docs.rs/rstar/0.9.1/rstar/struct.RTree.html#method.nearest_neighbor_iter) while they are in range. - [x] We must introduce new `geoLowerThan` and `geoGreaterThan` variants to the `Operator` filter enum. - [x] Implement the `negative` method on both variants where the `geoGreaterThan` variant is implemented by executing the `geoLowerThan` and removing the results found from the whole list of geo faceted documents. - [x] Add the `_geoRadius` function in the pest parser. - [x] Introduce a `_geo` ascending ranking function that takes a point in parameter, ~~this function must keep the iterator on the `RTree` and make it peekable~~ This was not possible for now, we had to collect the whole iterator. Only the documents that are part of the candidates must be sent too! - [x] This ascending ranking rule will only be active if the search is set up with the `_geoPoint` parameter that indicates the center point of the ascending ranking rule. ----------- - On Meilisearch part: We must introduce a new concept, returning the documents with a new `_geoDistance` field when it passed by the `_geo` ranking rule, this has never been done before. We could maybe just do it afterward when the documents have been retrieved from the database, computing the distance from the `_geoPoint` and all of the documents to be returned. Co-authored-by: Irevoire <tamo@meilisearch.com> Co-authored-by: cvermand <33010418+bidoubiwa@users.noreply.github.com> Co-authored-by: Tamo <tamo@meilisearch.com>	2021-09-20 19:04:57 +00:00
Irevoire	0d104a0fce	Update milli/src/criterion.rs Co-authored-by: Clément Renault <clement@meilisearch.com>	2021-09-20 18:13:17 +02:00
Tamo	f4b8e5675d	move the reserved keyword logic for the criterion and sort + add test	2021-09-20 17:21:02 +02:00
Irevoire	3b7a2cdbce	fix typo Co-authored-by: Clément Renault <clement@meilisearch.com>	2021-09-20 16:10:39 +02:00
Tamo	c695a1ffd2	add the possibility to sort by descending order on geoPoint	2021-09-15 11:49:58 +02:00
Tamo	91ce4d1721	Stop iterating through the whole list of points We stop when there are no possible candidates left	2021-09-15 11:49:58 +02:00
Tamo	cfc62a1c15	use geoutils instead of haversine	2021-09-09 18:11:38 +02:00
many	26deeb45a3	Add lacking parameter to word level position builder	2021-09-09 17:49:04 +02:00
Tamo	3fc145c254	if we have no rtree we return all other provided documents	2021-09-09 17:44:09 +02:00
Irevoire	a84f3a8b31	Apply suggestions from code review Co-authored-by: Clément Renault <clement@meilisearch.com>	2021-09-09 15:09:35 +02:00
Tamo	c81ff22c5b	delete the invalid criterion name error in favor of invalid ranking rule name	2021-09-08 19:17:00 +02:00
Tamo	bad8ea47d5	edit the last two TODO comments	2021-09-08 18:24:09 +02:00
Tamo	b15c77ebc4	return an error in case a user tries to sort with :desc	2021-09-08 18:24:09 +02:00
Tamo	e5ef0cad9a	use meters in the filters	2021-09-08 18:24:09 +02:00
Tamo	4f69b190bc	remove the distance from the search, the computation of the distance will be made on meilisearch side	2021-09-08 18:24:09 +02:00
Tamo	7ae2a7341c	introduce the reserved keywords in the filters	2021-09-08 18:24:09 +02:00
Tamo	6d5762a6c8	handle the case where you forgot the parentheses entirely	2021-09-08 18:24:09 +02:00
Tamo	ebf82ac28c	improve the error messages and add tests for the filters	2021-09-08 18:24:09 +02:00
Tamo	bd4c248292	improve the error handling in general and introduce the concept of reserved keywords	2021-09-08 18:24:09 +02:00
Tamo	e8c093c1d0	fix the error handling in the filters	2021-09-08 18:24:09 +02:00
Tamo	f0b74637dc	fix all the tests	2021-09-08 18:24:09 +02:00
Tamo	b1bf7d4f40	reformat	2021-09-08 18:24:09 +02:00
Tamo	aca707413c	remove the memory leak	2021-09-08 18:24:09 +02:00
Tamo	a8a1f5bd55	move the geosearch criteria out of asc_desc.rs	2021-09-08 18:24:09 +02:00
Tamo	dc84ecc40b	fix a bug	2021-09-08 18:24:09 +02:00
Tamo	4820ac71a6	allow spaces in a geoRadius	2021-09-08 18:24:09 +02:00
Tamo	13c78e5aa2	Implement the _geoPoint in the sortable	2021-09-08 18:24:09 +02:00
Tamo	5bb175fc90	only index _geo if it's set as sortable OR filterable and only allow the filters if geo was set to filterable	2021-09-08 17:51:08 +02:00
Tamo	f73273d71c	only call the extractor if needed	2021-09-08 17:51:08 +02:00
Irevoire	ea2f2ecf96	create a new database containing all the documents that were geo-faceted	2021-09-08 17:51:08 +02:00
Irevoire	4b459768a0	create the _geoRadius filter	2021-09-08 17:51:07 +02:00
Irevoire	6d70978edc	update the facet filter grammar	2021-09-08 17:51:07 +02:00
Irevoire	216a8aa3b2	add a test for the indexation of the geosearch	2021-09-08 17:51:07 +02:00
Irevoire	a21c854790	handle errors	2021-09-08 17:51:07 +02:00
Irevoire	70ab2c37c5	remove multiple bugs	2021-09-08 17:51:07 +02:00
Irevoire	b4b6ba6d82	rename all the ’long’ into ’lng’ like written in the specification	2021-09-08 17:51:07 +02:00
Irevoire	3b9f1db061	implement the clear of the rtree	2021-09-08 17:51:07 +02:00
Irevoire	d344489c12	implement the deletion of geo points	2021-09-08 17:51:07 +02:00
Irevoire	44d6b6ae9e	Index the geo points	2021-09-08 17:51:07 +02:00
Irevoire	8d9c2c4425	create a new db with getters and setters	2021-09-08 17:51:07 +02:00
bors[bot]	b22aac92ac	Merge #342 342: Let the caller decide what kind of error they want to return when parsing `AscDesc` r=Kerollmops a=irevoire This is one possible fix for #339 We would then need to patch these lines https://github.com/meilisearch/MeiliSearch/blob/main/meilisearch-http/src/index/search.rs#L110-L114 to return the error we want. Another solution would be to add a parameter to the `from_str` to specify which context we are in. Co-authored-by: Tamo <tamo@meilisearch.com>	2021-09-08 14:18:57 +00:00
Tamo	932998f5cc	let the caller decide if they want to return an invalidSortName or an invalidCriterionName error	2021-09-08 16:17:31 +02:00
many	e54280fbfc	Skip empty normalized words	2021-09-08 15:25:23 +02:00
many	d18ee58ab9	Check that keys are not empty in validator	2021-09-08 15:25:23 +02:00
many	9961b78b06	Drop sorter before creating a new one	2021-09-08 13:30:26 +02:00
bors[bot]	48d211b8b0	Merge #344 344: Move the sort ranking rule before the exactness ranking rule r=ManyTheFish a=Kerollmops This PR moves the sort ranking rule at the 5th position by default, right before the exactness one. Co-authored-by: Kerollmops <clement@meilisearch.com>	2021-09-07 15:47:15 +00:00
Kerollmops	e2cefc9b4f	Move the sort ranking rule before the exactness ranking rule	2021-09-07 16:41:33 +02:00
Kerollmops	fd3daa4423	Throw a query time error when a sort param is used but sort ranking rule is missing	2021-09-07 11:02:00 +02:00
Kerollmops	8dca36433c	Introduce the new SortRankingRuleMissing user error variant	2021-09-07 11:01:59 +02:00
Alexey Shekhirin	c2517e7d5f	fix(facet): string fields sorting	2021-09-03 11:58:26 +03:00
bors[bot]	5cbe879325	Merge #308 308: Implement a better parallel indexer r=Kerollmops a=ManyTheFish Rewrite the indexer: - enhance memory consumption control - optimize parallelism using rayon and crossbeam channel - factorize the different parts and make new DB implementation easier - optimize and fix prefix databases Co-authored-by: many <maxime@meilisearch.com>	2021-09-02 15:03:52 +00:00
many	741a4444a9	Remove log in chunk generator	2021-09-02 16:57:46 +02:00
many	7f7fafb857	Make document_chunk_size settable from update builder	2021-09-02 15:25:39 +02:00
many	db0c681bae	Fix PR comments	2021-09-02 15:17:52 +02:00
many	4860fd4529	Ignore empty facet values	2021-09-01 16:48:40 +02:00
many	b3a22f31f6	Fix memory consumption in word pair proximity extractor	2021-09-01 16:48:40 +02:00
many	9452fabfb2	Optimize cbo roaring bitmaps merge	2021-09-01 16:48:40 +02:00
many	8f702828ca	Ignore errors coming from crossbeam channel senders	2021-09-01 16:48:40 +02:00
many	e09eec37bc	Handle distance addition with hard separators	2021-09-01 16:48:40 +02:00
many	fc7cc770d4	Add logging timers	2021-09-01 16:48:40 +02:00
many	a2f59a28f7	Remove unwrap sending errors in channel	2021-09-01 16:48:40 +02:00
many	5c962c03dd	Fix and optimize word_prefix_pair_proximity_docids database	2021-09-01 16:48:40 +02:00
many	2d1727697d	Take stop words into account	2021-09-01 16:48:40 +02:00
many	823da19745	Fix test and use progress callback	2021-09-01 16:48:39 +02:00
many	1d314328f0	Plug new indexer	2021-09-01 16:48:36 +02:00
many	3aaf1d62f3	Publish grenad CompressionType type in milli	2021-09-01 16:42:08 +02:00
Alexey Shekhirin	0e379558a1	fix(search): get sortable_fields only if criteria present	2021-08-31 21:35:41 +03:00
bors[bot]	d6bba0663a	Merge #334 334: Wrap long values into BStr for warn logs r=Kerollmops a=shekhirin Resolves https://github.com/meilisearch/milli/issues/263 Co-authored-by: Alexey Shekhirin <a.shekhirin@gmail.com>	2021-08-31 17:38:54 +00:00
Alexey Shekhirin	0b02eb456c	chore(update): wrap long values into BStr for warn logs	2021-08-31 20:28:16 +03:00
Kerollmops	f230ae6fd5	Introduce the reset_sortable_fields Settings method	2021-08-25 17:44:16 +02:00
Kerollmops	af65485ba7	Reexport the grenad CompressionType from milli	2021-08-24 18:15:31 +02:00
Clément Renault	89d0758713	Revert "Revert "Sort at query time""	2021-08-24 11:55:16 +02:00
Clément Renault	c084f7f731	Fix the facet string docids filterable deletion bug	2021-08-23 10:50:39 +02:00
Clémentine Urquizar	922f9fd4d5	Revert "Sort at query time"	2021-08-20 18:09:17 +02:00
many	d1df0d20f9	Add integration test of SortBy criterion	2021-08-18 16:21:51 +02:00
Kerollmops	1b7f6ea1e7	Return a new error when the sort criteria is not sortable	2021-08-18 15:04:07 +02:00
Kerollmops	71602e0f1b	Add the sortable fields into the settings and in the index	2021-08-18 15:04:07 +02:00
Kerollmops	407f53872a	Add a sort_criteria method to the Search builder struct	2021-08-18 15:04:07 +02:00
Kerollmops	687cd2e205	Introduce the new Sort criterion and AscDesc enum	2021-08-18 15:04:07 +02:00
Kerollmops	5b88df508e	Use the new Asc/Desc syntax everywhere	2021-08-17 14:15:22 +02:00
Kerollmops	fcedff95e8	Change the Asc/Desc criterion syntax to use a colon (:)	2021-08-17 14:03:21 +02:00
Kerollmops	e9ada44509	AscDesc criterion returns documents ordered by numbers then by strings	2021-08-17 13:21:31 +02:00
Kerollmops	110bf6b778	Make the FacetStringIter work in both, ascending and descending orders	2021-08-17 11:18:40 +02:00
Kerollmops	22ebd2658f	Introduce the EitherString/RevRange private aliases	2021-08-17 10:47:15 +02:00
Kerollmops	7a5889bc5a	Introduce the highest_reverse_iter private method	2021-08-17 10:45:26 +02:00
Kerollmops	ad0d311f8a	Introduce the FacetStringLevelZeroRevRange struct	2021-08-17 10:44:43 +02:00
Kerollmops	6214c38da9	Introduce the FacetStringGroupRevRange struct	2021-08-17 10:44:27 +02:00
Kerollmops	1c604de158	Introduce the highest_iter private method on the FacetStringIter struct	2021-08-17 10:41:11 +02:00
Kerollmops	64df159057	Introduce the new_reducing constructor on the FacetStringIter struct	2021-08-17 10:35:06 +02:00
Kerollmops	01a4052828	Move the FacetStringIter creation logic into a private new method	2021-08-17 10:29:43 +02:00
many	7dbefae1e3	Make facet string iterator non reducing	2021-08-12 17:23:39 +02:00
many	8fdf860c17	Remove max values by facet limit for facet distribution	2021-08-12 11:29:20 +02:00
bors[bot]	89b9b61840	Merge #300 300: Fix prefix level position docids database r=curquiza a=ManyTheFish The prefix search was inverted when we generated the DB. Instead of checking whether a word had a prefix in the prefix fst, we were checking whether the word was a prefix of a prefix contained in the prefix fst. The indexer now iterates over the prefixes contained in the fst and searches for them by prefix in the word-level-position-docids database, aggregating the matches in a sorter. Fix #299 Co-authored-by: many <maxime@meilisearch.com>	2021-08-04 16:52:09 +00:00
many	cdeb07f0fd	Fix prefix level position docids database The prefix search was inverted when we generated the DB. Instead of checking whether a word had a prefix in the prefix fst, we were checking whether the word was a prefix of a prefix contained in the prefix fst. The indexer now iterates over the prefixes contained in the fst and searches for them by prefix in the word-level-position-docids database, aggregating the matches in a sorter. Fix #299	2021-08-04 14:11:49 +02:00
Kerollmops	90514e03d1	Fix invalid faceted documents ids buffer size	2021-07-29 15:49:23 +02:00
bors[bot]	200e98c211	Merge #293 293: Make sure that the relevancy is not impacted by other settings r=Kerollmops a=Kerollmops Fix https://github.com/meilisearch/meilisearch/issues/1505. fix https://github.com/meilisearch/MeiliSearch/issues/1529 Co-authored-by: Kerollmops <clement@meilisearch.com>	2021-07-27 16:04:52 +00:00
Kerollmops	dc2b63abdf	Introduce an empty FilterCondition variant to support unknown fields	2021-07-27 16:34:04 +02:00
Kerollmops	b12738cfe9	Use the right DB prefixes to store the faceted fields	2021-07-22 19:18:22 +02:00
Kerollmops	7aa6cc9b04	Do not insert fields in the map when changing the settings	2021-07-22 18:40:12 +02:00


448 Commits