meilisearch

mirror of https://github.com/meilisearch/meilisearch.git synced 2024-11-23 18:45:06 +08:00

Author	SHA1	Message	Date
bors[bot]	fd177b63f8	Merge #423 423: Remove an unused file r=irevoire a=irevoire This empty file is not included anywhere Co-authored-by: Tamo <tamo@meilisearch.com>	2022-01-19 14:18:05 +00:00
Marin Postma	0c84a40298	document batch support reusable transform rework update api add indexer config fix tests review changes Co-authored-by: Clément Renault <clement@meilisearch.com> fmt	2022-01-19 12:40:20 +01:00
Tamo	01968d7ca7	ensure we get no documents and no error when filtering on an empty db	2022-01-18 11:40:30 +01:00
bors[bot]	8f4499090b	Merge #433 433: fix(filter): Fix two bugs. r=Kerollmops a=irevoire - Stop lowercasing the field when looking in the field id map - When a field id does not exist it means there is currently zero documents containing this field thus we return an empty RoaringBitmap instead of throwing an internal error Will fix https://github.com/meilisearch/MeiliSearch/issues/2082 once meilisearch is released Co-authored-by: Tamo <tamo@meilisearch.com>	2022-01-17 14:06:53 +00:00
Tamo	d1ac40ea14	fix(filter): Fix two bugs. - Stop lowercasing the field when looking in the field id map - When a field id does not exist it means there is currently zero documents containing this field thus we return an empty RoaringBitmap instead of throwing an internal error	2022-01-17 13:51:46 +01:00
Samyak S Sarnayak	2d7607734e	Run cargo fmt on matching_words.rs	2022-01-17 13:04:33 +05:30
Samyak S Sarnayak	5ab505be33	Fix highlight by replacing num_graphemes_from_bytes num_graphemes_from_bytes has been renamed in the tokenizer to num_chars_from_bytes. Highlight now works correctly!	2022-01-17 13:02:55 +05:30
Samyak S Sarnayak	e752bd06f7	Fix matching_words tests to compile successfully The tests still fail due to a bug in https://github.com/meilisearch/tokenizer/pull/59	2022-01-17 11:37:45 +05:30
Samyak S Sarnayak	30247d70cd	Fix search highlight for non-unicode chars The `matching_bytes` function takes a `&Token` now and: - gets the number of bytes to highlight (unchanged). - uses `Token.num_graphemes_from_bytes` to get the number of grapheme clusters to highlight. In essence, the `matching_bytes` function returns the number of matching grapheme clusters instead of bytes. Should this function be renamed then? Added proper highlighting in the HTTP UI: - requires dependency on `unicode-segmentation` to extract grapheme clusters from tokens - `<mark>` tag is put around only the matched part - before this change, the entire word was highlighted even if only a part of it matched	2022-01-17 11:37:44 +05:30
Tamo	98a365aaae	store the geopoint in three dimensions	2021-12-14 12:21:24 +01:00
Tamo	d671d6f0f1	remove an unused file	2021-12-13 19:27:34 +01:00
Clément Renault	25faef67d0	Remove the database setup in the filter_depth test	2021-12-09 11:57:53 +01:00
Clément Renault	65519bc04b	Test that empty filters return a None	2021-12-09 11:57:53 +01:00
Clément Renault	ef59762d8e	Prefer returning None instead of the Empty Filter state	2021-12-09 11:57:52 +01:00
Clément Renault	ee856a7a46	Limit the max filter depth to 2000	2021-12-07 17:36:45 +01:00
Clément Renault	32bd9f091f	Detect the filters that are too deep and return an error	2021-12-07 17:20:11 +01:00
Clément Renault	90f49eab6d	Check the filter max depth limit and reject the invalid ones	2021-12-07 16:32:48 +01:00
many	8970246bc4	Sort positions before iterating over them during word pair proximity extraction	2021-11-22 18:16:54 +01:00
Marin Postma	6e977dd8e8	change visibility of DocumentDeletionResult	2021-11-22 15:44:44 +01:00
many	35f9499638	Export tokenizer from milli	2021-11-18 16:57:12 +01:00
Marin Postma	6eb47ab792	remove update_id in UpdateBuilder	2021-11-16 13:07:04 +01:00
Marin Postma	09b4281cff	improve document addition returned meta	2021-11-10 14:08:36 +01:00
Marin Postma	721fc294be	improve document deletion returned meta returns both the remaining number of documents and the number of deleted documents.	2021-11-10 14:08:18 +01:00
Irevoire	0ea0146e04	implement deref &str on the tokens	2021-11-09 11:34:10 +01:00
Tamo	7483c7513a	fix the filterable fields	2021-11-07 01:52:19 +01:00
Tamo	e5af3ac65c	rename the filter_condition.rs to filter.rs	2021-11-06 16:37:55 +01:00
Tamo	6831c23449	merge with main	2021-11-06 16:34:30 +01:00
Tamo	b249989bef	fix most of the tests	2021-11-06 01:32:12 +01:00
Tamo	27a6a26b4b	makes the parse function part of the filter_parser	2021-11-05 10:46:54 +01:00
Tamo	76d961cc77	implements the last errors	2021-11-04 17:42:06 +01:00
Tamo	8234f9fdf3	recreate most filter error except for the geosearch	2021-11-04 17:24:55 +01:00
Tamo	07a5ffb04c	update http-ui	2021-11-04 15:52:22 +01:00
Tamo	a58bc5bebb	update milli with the new parser_filter	2021-11-04 15:02:36 +01:00
many	7b3bac46a0	Change Attribute and Ranking rules errors	2021-11-04 13:19:32 +01:00
many	0c0038488c	Change last error messages	2021-11-03 11:24:06 +01:00
Tamo	76a2adb7c3	re-enable the tests in the parser and start the creation of an error type	2021-11-02 17:35:17 +01:00
bors[bot]	08ae47e475	Merge #405 405: Change some error messages r=ManyTheFish a=ManyTheFish Co-authored-by: many <maxime@meilisearch.com>	2021-10-28 13:35:55 +00:00
many	9f1e0d2a49	Refine asc/desc error messages	2021-10-28 14:47:17 +02:00
many	ed6db19681	Fix PR comments	2021-10-28 11:18:32 +02:00
marin postma	183d3dada7	return document count from builder	2021-10-28 10:33:04 +02:00
many	2be755ce75	Lower error check, already check in meilisearch	2021-10-27 19:50:41 +02:00
many	3599df77f0	Change some error messages	2021-10-27 19:33:01 +02:00
bors[bot]	d7943fe225	Merge #402 402: Optimize document transform r=MarinPostma a=MarinPostma This PR optimizes the transform of document additions into the obkv format. Instead of accepting any serializable object, we treat JSON and CSV specifically: - For JSON, we build a serde `Visitor` that transforms the JSON straight into obkv without an intermediate representation. - For CSV, we directly write the lines into the obkv, applying other optimizations as well. Co-authored-by: marin postma <postma.marin@protonmail.com>	2021-10-26 09:55:28 +00:00
marin postma	baddd80069	implement review suggestions	2021-10-25 18:29:12 +02:00
marin postma	f9445c1d90	return float parsing error context in csv	2021-10-25 17:27:10 +02:00
Clémentine Urquizar	208903ddde	Revert "Replacing pest with nom "	2021-10-25 11:58:00 +02:00
marin postma	3fcccc31b5	add document builder example	2021-10-25 10:26:43 +02:00
marin postma	430e9b13d3	add csv builder tests	2021-10-25 10:26:43 +02:00
marin postma	53c79e85f2	document errors	2021-10-25 10:26:43 +02:00
marin postma	2e62925a6e	fix tests	2021-10-25 10:26:42 +02:00
marin postma	0f86d6b28f	implement csv serialization	2021-10-25 10:26:42 +02:00
marin postma	8d70b01714	optimize document deserialization	2021-10-25 10:26:42 +02:00
Tamo	1327807caa	add some error messages	2021-10-22 19:00:33 +02:00
Tamo	c8d03046bf	add a check on the fid in the geosearch	2021-10-22 18:08:18 +02:00
Tamo	3942b3732f	re-implement the geosearch	2021-10-22 18:03:39 +02:00
Tamo	7cd9109e2f	lowercase value extracted from Token	2021-10-22 17:50:15 +02:00
Tamo	e25ca9776f	start updating the exposed function to makes other modules happy	2021-10-22 17:23:22 +02:00
Tamo	6c9165b6a8	provide a helper to parse the token but to not handle the errors	2021-10-22 16:52:13 +02:00
Tamo	efb2f8b325	convert the errors	2021-10-22 16:38:35 +02:00
Tamo	c27870e765	integrate a first version without any error handling	2021-10-22 14:33:18 +02:00
Tamo	01dedde1c9	update some names and move some parser out of the lib.rs	2021-10-22 01:59:38 +02:00
Tamo	c634d43ac5	add a simple test on the filters with an integer	2021-10-21 17:10:27 +02:00
Tamo	6c15f50899	rewrite the parser logic	2021-10-21 16:45:42 +02:00
Tamo	e1d81342cf	add test on the or and and operator	2021-10-21 13:01:25 +02:00
Tamo	423baac08b	fix the tests	2021-10-21 12:45:40 +02:00
Tamo	36281a653f	write all the simple tests	2021-10-21 12:40:11 +02:00
Tamo	661bc21af5	Fix the filter parser And add a bunch of tests on the filter::from_array	2021-10-21 11:45:03 +02:00
bors[bot]	59cc59e93e	Merge #358 358: Replacing pest with nom r=Kerollmops a=CNLHC Co-authored-by: 刘瀚骋 <cn_lhc@qq.com>	2021-10-16 20:44:38 +00:00
刘瀚骋	7666e4f34a	follow the suggestions	2021-10-14 21:37:59 +08:00
刘瀚骋	2ea2f7570c	use nightly cargo to format the code	2021-10-14 16:46:13 +08:00
刘瀚骋	e750465e15	check logic for geolocation.	2021-10-14 16:12:00 +08:00
bors[bot]	aa5e099718	Merge #390 390: Add helper methods on the settings r=Kerollmops a=irevoire This would be a good addition to look at the content of a setting without consuming it. It’s useful for analytics. Co-authored-by: Irevoire <tamo@meilisearch.com>	2021-10-13 20:36:30 +00:00
bors[bot]	c7db4176f3	Merge #384 384: Replace memmap with memmap2 r=Kerollmops a=palfrey [memmap is unmaintained](https://rustsec.org/advisories/RUSTSEC-2020-0077.html) and needs replacing. memmap2 is a drop-in replacement fork that's well maintained. Note that the version numbers got reset on fork, hence the lower values. Co-authored-by: Tom Parker-Shemilt <palfrey@tevp.net>	2021-10-13 13:47:23 +00:00
Irevoire	a3e7c468cd	add helper methods on the settings	2021-10-13 13:05:07 +02:00
刘瀚骋	cd359cd96e	WIP: extract the error trait bound to new trait.	2021-10-13 18:04:15 +08:00
刘瀚骋	5de5dd80a3	WIP: remove '_nom' suffix/redundant error enum/...	2021-10-13 11:06:15 +08:00
刘瀚骋	2c65781d91	format	2021-10-12 22:20:22 +08:00
bors[bot]	6e3b869e6a	Merge #388 388: fix primary key inference r=MarinPostma a=MarinPostma The primary key was inferred from a hashtable index of the field. For this reason the order in which the fields were iterated upon was not deterministic, and the primary key was chosen from the first field containing "id". This fix sorts the index by field_id when inferring the primary key. Co-authored-by: mpostma <postma.marin@protonmail.com>	2021-10-12 09:25:16 +00:00
mpostma	86ead92ed5	infer primary key on sorted fields	2021-10-12 11:15:11 +02:00
mpostma	9a266a531b	test correct primary key inference	2021-10-12 11:08:53 +02:00
many	c5a6075484	Make max_position_per_attributes changeable	2021-10-12 10:10:50 +02:00
many	360c5ff3df	Remove limit of 1000 position per attribute Instead of using an arbitrary limit we encode the absolute position in a u32 using one strong u16 for the field id and a weak u16 for the relative position in the attribute.	2021-10-12 10:10:50 +02:00
刘瀚骋	d323e35001	add a test case	2021-10-12 13:30:40 +08:00
刘瀚骋	70f576d5d3	error handling	2021-10-12 13:30:40 +08:00
刘瀚骋	28f9be8d7c	support syntax	2021-10-12 13:30:40 +08:00
刘瀚骋	469d92c569	tweak error handling	2021-10-12 13:30:40 +08:00
刘瀚骋	7a90a101ee	reorganize parser logic	2021-10-12 13:30:40 +08:00
刘瀚骋	f7796edc7e	remove everything about pest	2021-10-12 13:30:40 +08:00
刘瀚骋	ac1df9d9d7	fix typo and remove pest	2021-10-12 13:30:40 +08:00
刘瀚骋	50ad750ec1	enhance error handling	2021-10-12 13:30:40 +08:00
刘瀚骋	8748df2ca4	draft without error handling	2021-10-12 13:30:40 +08:00
mpostma	99889a0ed0	add obkv document serialization test	2021-10-11 15:13:17 +02:00
mpostma	799f3d43c8	fix serialization to obkv format	2021-10-11 15:04:47 +02:00
Tom Parker-Shemilt	2dfe24f067	memmap -> memmap2	2021-10-10 22:47:12 +01:00
Irevoire	b65aa7b5ac	Apply suggestions from code review Co-authored-by: Clément Renault <clement@meilisearch.com>	2021-10-07 17:51:52 +02:00
Tamo	11dfe38761	Update the check on the latitude and longitude Latitudes are not supposed to go beyond 90 degrees or below -90. The same goes for longitudes with 180 or -180. This was badly implemented in the filters, and was not implemented for the AscDesc rules.	2021-10-07 16:10:43 +02:00
many	085bc6440c	Apply PR comments	2021-10-06 11:12:26 +02:00
many	1bd15d849b	Reduce candidates threshold	2021-10-05 18:52:14 +02:00
many	ea4bd29d14	Apply PR comments	2021-10-05 17:35:07 +02:00
many	3296bb243c	Simplify word level position DB into a word position DB	2021-10-05 12:15:02 +02:00
many	75d341d928	Re-implement set based algorithm for attribute criterion	2021-10-05 12:14:50 +02:00
Tamo	d9eba9d145	improve and test the sort error message	2021-09-30 14:38:27 +02:00
Tamo	0ee67bb7d1	improve the reserved keyword error message for the filters	2021-09-30 14:38:27 +02:00
bors[bot]	22551d0941	Merge #379 379: Revert "Change chunk size to 4MiB to fit more the end user usage" r=curquiza a=ManyTheFish Reverts meilisearch/milli#370 Co-authored-by: Many <legendre.maxime.isn@gmail.com>	2021-09-29 13:20:53 +00:00
Many	26b5dad042	Revert "Change chunk size to 4MiB to fit more the end user usage"	2021-09-29 15:08:39 +02:00
Many	2e49230ca2	Update milli/src/search/criteria/attribute.rs Co-authored-by: Clément Renault <clement@meilisearch.com>	2021-09-29 14:49:45 +02:00
Many	7ad0214089	Update milli/src/search/criteria/attribute.rs Co-authored-by: Clément Renault <clement@meilisearch.com>	2021-09-29 14:49:41 +02:00
many	1df5b8712b	Hotfix meilisearch#1707	2021-09-29 14:41:56 +02:00
Tamo	f65153ad64	stop casting integer docids to string	2021-09-28 18:35:54 +02:00
Vishnu Gt	785c1372f2	Change "settings" to "setting" Co-authored-by: Clément Renault <renault.cle@gmail.com>	2021-09-28 20:11:32 +05:30
Vishnu Ganesan	3580b2d803	Fixes #365	2021-09-28 19:30:23 +05:30
bors[bot]	3a12f5887e	Merge #373 373: Improve error message for bad sort syntax with geosearch r=Kerollmops a=irevoire `@Kerollmops` This should be the last PR for the geosearch and error handling, sorry for doing it in so many steps 😬 Co-authored-by: Tamo <tamo@meilisearch.com>	2021-09-28 12:39:32 +00:00
Tamo	a80dcfd4a3	improve error message for bad sort syntax with geosearch	2021-09-28 14:32:24 +02:00
bors[bot]	b2a332599e	Merge #372 372: Fix Meilisearch 1714 r=Kerollmops a=ManyTheFish The bug comes from the typo tolerance, to know how many typos are accepted we were counting bytes instead of characters in a word. On Chinese Script characters, we were allowing 2 typos on 3 characters words. We are now counting the number of char instead of counting bytes to assign the typo tolerance. Related to [Meilisearch#1714](https://github.com/meilisearch/MeiliSearch/issues/1714) Co-authored-by: many <maxime@meilisearch.com>	2021-09-28 11:59:45 +00:00
many	8046ae4bd5	Count the number of char instead of counting bytes to assign the typo tolerance	2021-09-28 12:10:43 +02:00
many	1988416295	Add failing test related to Meilisearch#1714	2021-09-28 12:05:11 +02:00
Tamo	c7cb816ae1	simplify the error handling of the sort syntax for meilisearch	2021-09-27 19:07:22 +02:00
many	b188063869	Change chunk size to 4MiB to fit more the end user usage	2021-09-27 14:26:21 +02:00
many	551df0cb77	Add test checking the bug reported in meilisearch issue 1716	2021-09-23 15:55:39 +02:00
Irevoire	218f0a6661	Apply suggestions from code review Co-authored-by: Clément Renault <clement@meilisearch.com>	2021-09-22 17:00:27 +02:00
Tamo	47ee93b0bd	return an error when _geoPoint is used but _geo is not sortable	2021-09-22 16:37:41 +02:00
Tamo	1e5e3d57e2	auto convert AscDescError into CriterionError	2021-09-22 16:37:41 +02:00
Tamo	023446ecf3	create a smaller and easier to maintain CriterionError type	2021-09-22 16:37:41 +02:00
Tamo	86e272856a	create an asc_desc error type that is never supposed to be returned to the end user	2021-09-22 16:37:41 +02:00
Tamo	257e621d40	create an asc_desc module	2021-09-22 16:37:41 +02:00
Tamo	113a061bee	fix the error handling on the criterion side	2021-09-22 15:09:07 +02:00
Tamo	78b0bce9a1	fix the returned error when asc desc fails to be parsed	2021-09-22 11:37:05 +02:00
mpostma	aa6c5df0bc	Implement documents format document reader transform remove update format support document sequences fix document transform clean transform improve error handling add documents! macro fix transform bug fix tests remove csv dependency Add comments on the transform process replace search cli fmt review edits fix http ui fix clippy warnings Revert "fix clippy warnings" This reverts commit a1ce3cd96e603633dbf43e9e0b12b2453c9c5620. fix review comments remove smallvec in transform loop review edits	2021-09-21 16:58:33 +02:00
bors[bot]	31c8de1cca	Merge #322 322: Geosearch r=ManyTheFish a=irevoire This PR introduces [basic geo-search functionalities](https://github.com/meilisearch/specifications/pull/59), it makes the engine able to index, filter and, sort by geo-point. We decided to use [the rstar library](https://docs.rs/rstar) and to save the points in [an RTree](https://docs.rs/rstar/0.9.1/rstar/struct.RTree.html) that we de/serialize in the index database [by using serde](https://serde.rs/) with [bincode](https://docs.rs/bincode). This is not an efficient way to query this tree as it will consume a lot of CPU and memory when a search is made, but at least it is an easy first way to do so. ### What we will have to do on the indexing part: - [x] Index the `_geo` fields from the documents. - [x] Create a new module with an extractor in the `extract` module that takes the `obkv_documents` and retrieves the latitude and longitude coordinates, outputting them in a `grenad::Reader` for further process. - [x] Call the extractor in the `extract::extract_documents_data` function and send the result to the `TypedChunk` module. - [x] Get the `grenad::Reader` in the `typed_chunk::write_typed_chunk_into_index` function and store all the points in the `rtree` - [x] Delete the documents from the `RTree` when deleting documents from the database. All this can be done in the `delete_documents.rs` file by getting the data structure and removing the points from it, inserting it back after the modification. - [x] Clearing the `RTree` entirely when we clear the documents from the database, everything happens in the `clear_documents.rs` file. - [x] save a Roaring bitmap of all documents containing the `_geo` field ### What we will have to do on the query part: - [x] Filter the documents at a certain distance around a point, this is done by [collecting the documents from the searched point](https://docs.rs/rstar/0.9.1/rstar/struct.RTree.html#method.nearest_neighbor_iter) while they are in range. 
- [x] We must introduce new `geoLowerThan` and `geoGreaterThan` variants to the `Operator` filter enum. - [x] Implement the `negative` method on both variants where the `geoGreaterThan` variant is implemented by executing the `geoLowerThan` and removing the results found from the whole list of geo faceted documents. - [x] Add the `_geoRadius` function in the pest parser. - [x] Introduce a `_geo` ascending ranking function that takes a point in parameter, ~~this function must keep the iterator on the `RTree` and make it peekable~~ This was not possible for now, we had to collect the whole iterator. Only the documents that are part of the candidates must be sent too! - [x] This ascending ranking rule will only be active if the search is set up with the `_geoPoint` parameter that indicates the center point of the ascending ranking rule. ----------- - On Meilisearch part: We must introduce a new concept, returning the documents with a new `_geoDistance` field when it passed by the `_geo` ranking rule, this has never been done before. We could maybe just do it afterward when the documents have been retrieved from the database, computing the distance from the `_geoPoint` and all of the documents to be returned. Co-authored-by: Irevoire <tamo@meilisearch.com> Co-authored-by: cvermand <33010418+bidoubiwa@users.noreply.github.com> Co-authored-by: Tamo <tamo@meilisearch.com>	2021-09-20 19:04:57 +00:00
Irevoire	0d104a0fce	Update milli/src/criterion.rs Co-authored-by: Clément Renault <clement@meilisearch.com>	2021-09-20 18:13:17 +02:00
Tamo	f4b8e5675d	move the reserved keyword logic for the criterion and sort + add test	2021-09-20 17:21:02 +02:00
Irevoire	3b7a2cdbce	fix typo Co-authored-by: Clément Renault <clement@meilisearch.com>	2021-09-20 16:10:39 +02:00
Tamo	c695a1ffd2	add the possibility to sort by descending order on geoPoint	2021-09-15 11:49:58 +02:00
Tamo	91ce4d1721	Stop iterating through the whole list of points We stop when there is no possible candidates left	2021-09-15 11:49:58 +02:00
Tamo	cfc62a1c15	use geoutils instead of haversine	2021-09-09 18:11:38 +02:00
many	26deeb45a3	Add lacking parameter to word level position builder	2021-09-09 17:49:04 +02:00
Tamo	3fc145c254	if we have no rtree we return all other provided documents	2021-09-09 17:44:09 +02:00
Irevoire	a84f3a8b31	Apply suggestions from code review Co-authored-by: Clément Renault <clement@meilisearch.com>	2021-09-09 15:09:35 +02:00
Tamo	c81ff22c5b	delete the invalid criterion name error in favor of invalid ranking rule name	2021-09-08 19:17:00 +02:00
Tamo	bad8ea47d5	edit the two lasts TODO comments	2021-09-08 18:24:09 +02:00
Tamo	b15c77ebc4	return an error in case a user try to sort with :desc	2021-09-08 18:24:09 +02:00
Tamo	e5ef0cad9a	use meters in the filters	2021-09-08 18:24:09 +02:00
Tamo	4f69b190bc	remove the distance from the search, the computation of the distance will be made on meilisearch side	2021-09-08 18:24:09 +02:00
Tamo	7ae2a7341c	introduce the reserved keywords in the filters	2021-09-08 18:24:09 +02:00
Tamo	6d5762a6c8	handle the case where you forgot entirely the parenthesis	2021-09-08 18:24:09 +02:00
Tamo	ebf82ac28c	improve the error messages and add tests for the filters	2021-09-08 18:24:09 +02:00
Tamo	bd4c248292	improve the error handling in general and introduce the concept of reserved keywords	2021-09-08 18:24:09 +02:00
Tamo	e8c093c1d0	fix the error handling in the filters	2021-09-08 18:24:09 +02:00
Tamo	f0b74637dc	fix all the tests	2021-09-08 18:24:09 +02:00
Tamo	b1bf7d4f40	reformat	2021-09-08 18:24:09 +02:00

625 Commits