meilisearch

mirror of https://github.com/meilisearch/meilisearch.git synced 2024-11-30 17:14:59 +08:00

Author	SHA1	Message	Date
Marin Postma	45c45e11dd	implement distinct attribute distinct can return error facet distinct on numbers return distinct error review fixes make get_facet_value more generic fixes	2021-04-15 16:25:55 +02:00
Clémentine Urquizar	2c5c79d68e	Update Tokenizer version to v0.2.1	2021-04-14 18:54:04 +02:00
tamo	dcb00b2e54	test a new implementation of the stop_words	2021-04-12 18:35:33 +02:00
tamo	da036dcc3e	Revert "Integrate the stop_words in the querytree" This reverts commit `12fb509d84`. We revert this commit because it's causing the bug #150. The initial algorithm we implemented for the stop_words was: 1. remove the stop_words from the dataset 2. keep the stop_words in the query to see if we can generate new words by integrating typos or if the word was a prefix => This was causing the bug since, in the case of “The hobbit”, we were always looking for something starting with “t he” or “th e” instead of ignoring the word completely. For now we are going to fix the bug by completely ignoring the stop_words in the query. This could cause another problem where someone mistypes a normal word and ends up typing a stop_word. For example, imagine someone searching for the music “Won't he do it”. If that person misplaces one space and writes “Won' the do it” then we will lose a part of the request. One fix would be to update our query tree to something like this: --------------------- OR OR TOLERANT hobbit # the first option is to ignore the stop_word AND CONSECUTIVE # the second option is to do as we are doing EXACT t # currently EXACT he TOLERANT hobbit --------------------- This would drastically increase the size of our query tree on requests with a lot of stop_words. For example, think of “The Lord Of The Rings”. For now, however, we decided to ignore this problem and consider that doing so doesn't reduce the relevancy of the search too much, while it improves performance.	2021-04-12 18:35:33 +02:00
Alexey Shekhirin	84c1dda39d	test(http): setting enum serialize/deserialize	2021-04-08 17:03:40 +03:00
Alexey Shekhirin	dc636d190d	refactor(http, update): introduce setting enum	2021-04-08 17:03:40 +03:00
tamo	0a4bde1f2f	update the default ordering of the criterion	2021-04-01 19:45:31 +02:00
Alexey Shekhirin	2658c5c545	feat(index): update fields distribution in clear & delete operations fixes after review bump the version of the tokenizer implement a first version of the stop_words The front must provide a BTreeSet containing the stop words The stop_words are set at None if an empty Set is provided add the stop-words in the http-ui interface Use maplit in the test and remove all the useless drop(rtxn) at the end of all tests Integrate the stop_words in the querytree remove the stop_words from the querytree except if it was a prefix or a typo more fixes after review	2021-04-01 19:12:35 +03:00
Alexey Shekhirin	27c7ab6e00	feat(index): store fields distribution in index	2021-04-01 18:35:19 +03:00
tamo	12fb509d84	Integrate the stop_words in the querytree remove the stop_words from the querytree except if it was a prefix or a typo	2021-04-01 13:57:55 +02:00
tamo	a2f46029c7	implement a first version of the stop_words The front must provide a BTreeSet containing the stop words The stop_words are set at None if an empty Set is provided add the stop-words in the http-ui interface Use maplit in the test and remove all the useless drop(rtxn) at the end of all tests	2021-04-01 13:57:55 +02:00
tamo	62a8f1d707	bump the version of the tokenizer	2021-04-01 13:49:22 +02:00
Alexey Shekhirin	9205b640a4	feat(index): introduce fields_ids_distribution	2021-03-31 18:44:47 +03:00
Alexey Shekhirin	2cb32edaa9	fix(criterion): compile asc/desc regex only once use once_cell instead of lazy_static reorder imports	2021-03-30 16:07:14 +03:00
Alexey Shekhirin	1e3f05db8f	use fixed number of candidates as a threshold	2021-03-30 11:57:10 +03:00
Alexey Shekhirin	a776ec9718	fix division	2021-03-29 19:16:58 +03:00
Alexey Shekhirin	522e79f2e0	feat(search, criteria): introduce a percentage threshold to the asc/desc	2021-03-29 19:08:31 +03:00
tamo	73dcdb27f6	select a specific release of the tokenizer instead of using the latest git commit	2021-03-25 15:00:18 +01:00
mpostma	9c27183876	fix broken offset	2021-03-15 20:23:50 +01:00
mpostma	f0210453a6	add updated at on put primary key	2021-03-15 14:05:48 +01:00
mpostma	615fe095e1	update index updated at on index writes	2021-03-15 14:05:47 +01:00
mpostma	80d0f9c49d	methods to update index time metadata	2021-03-15 14:05:47 +01:00
Kerollmops	d48008339e	Introduce two new optional_words and authorize_typos Search options	2021-03-10 11:16:30 +01:00
Kerollmops	54b97ed8e1	Update the fetcher comments	2021-03-10 10:56:26 +01:00
Kerollmops	d301859bbd	Introduce a special word_derivations function for Proximity	2021-03-10 10:42:53 +01:00
Kerollmops	facfb4b615	Fix the bucket candidates	2021-03-10 10:42:53 +01:00
Kerollmops	42fd7dea78	Remove the useless typo cache	2021-03-10 10:42:53 +01:00
many	62a70c300d	Optimize words criterion	2021-03-10 10:42:53 +01:00
Kerollmops	f51eb46c69	Use the RoaringBitmapLenCodec to retrieve the count of documents	2021-03-09 10:25:39 +01:00
Kerollmops	d781a6164a	Rewrite some code with idiomatic Rust	2021-03-08 16:27:52 +01:00
Clément Renault	b18ec00a7a	Add a logging_timer macro to the criterion next methods	2021-03-08 16:12:06 +01:00
Kerollmops	82a0f678fb	Introduce a cache on the docid_word_positions database method	2021-03-08 16:12:03 +01:00
Clément Renault	5fcaedb880	Introduce a WordDerivationsCache struct	2021-03-08 16:00:53 +01:00
many	2606c92ef9	use plane sweep in proximity criterion	2021-03-08 15:58:39 +01:00
many	ae47bb3594	Introduce plane_sweep function in proximity criterion	2021-03-08 15:58:38 +01:00
Kerollmops	636a9df177	Temporarily fix the tinytemplate doc hidden issue	2021-03-08 15:57:45 +01:00
Clément Renault	3c76b3548d	Rework the Asc/Desc criteria to be facet iterator based	2021-03-08 13:32:25 +01:00
Clément Renault	a58d2b6137	Print the Asc/Desc criterion field name in the debug prints	2021-03-08 13:32:25 +01:00
mpostma	e3095be85c	Remove Debug use in Display impl	2021-03-08 12:09:09 +01:00
mpostma	9e1eb25232	implement display for criterion Update milli/src/criterion.rs Co-authored-by: Clément Renault <clement@meilisearch.com>	2021-03-08 11:00:30 +01:00
Clément Renault	e5bb96bc3b	Fix the searchable settings test	2021-03-06 12:48:41 +01:00
Kerollmops	9b6b35d9b7	Clean up some comments	2021-03-03 18:19:10 +01:00
Kerollmops	2cc4a467a6	Change the criterion output that cannot fail	2021-03-03 18:18:33 +01:00
Kerollmops	1fc25148da	Remove useless where clauses for the criteria	2021-03-03 18:09:19 +01:00
Kerollmops	07784c8990	Tune the words prefixes threshold to compute for 1/1000 instead	2021-03-03 15:51:28 +01:00
Kerollmops	f376c6a728	Make sure we retrieve the docid word positions	2021-03-03 15:45:03 +01:00
Kerollmops	5c5e51095c	Fix the Asc/Desc criteria to always return the QueryTree when available	2021-03-03 15:45:03 +01:00
many	cdaa96df63	optimize proximity criterion	2021-03-03 15:45:03 +01:00
many	246286f0eb	take hard separator into account	2021-03-03 15:45:03 +01:00
Kerollmops	6bf6b40495	Remove unused files	2021-03-03 15:45:03 +01:00
Kerollmops	f118d7e067	build criteria from settings	2021-03-03 15:45:03 +01:00
Kerollmops	025835c5b2	Fix the criteria to avoid always returning a placeholder	2021-03-03 15:45:03 +01:00
Kerollmops	36c1f93ceb	Do a union of the bucket candidates	2021-03-03 15:45:03 +01:00
many	b0e0c5eba0	remove option of bucket_candidates	2021-03-03 15:45:03 +01:00
Kerollmops	daf126a638	Introduce the final Fetcher criterion	2021-03-03 15:45:03 +01:00
many	7ac09d7b7c	remove option of bucket_candidates	2021-03-03 15:45:03 +01:00
Kerollmops	5af63c74e0	Speed-up the MatchingWords highlighting struct	2021-03-03 15:45:03 +01:00
Kerollmops	4510bbccca	Add a lot of debug	2021-03-03 15:43:44 +01:00
Kerollmops	ae4a237e58	Fix the maximum_proximity function	2021-03-03 15:43:44 +01:00
Kerollmops	9bc9b36645	Introduce the Proximity criterion	2021-03-03 15:43:44 +01:00
Kerollmops	22b84fe543	Use the words criterion in the search module	2021-03-03 15:43:44 +01:00
many	3d731cc861	remove option on bucket_candidates	2021-03-03 15:43:44 +01:00
Clément Renault	14f9f85c4b	Introduce the AscDesc criterion	2021-03-03 15:43:44 +01:00
many	b5b7ec0162	implement initial state for words criterion	2021-03-03 15:43:44 +01:00
Kerollmops	3415812b06	Improve the intersection speed in the words criterion	2021-03-03 15:43:43 +01:00
Clément Renault	ef381e17bb	Compute the candidates for each sub query tree	2021-03-03 15:43:43 +01:00
Kerollmops	e174ccbd8e	Use the words criterion in the search module	2021-03-03 15:43:43 +01:00
Clément Renault	1e47f9b3ff	Introduce the Words criterion	2021-03-03 15:43:43 +01:00
many	2d068bd45b	implement Context trait for criteria	2021-03-03 15:43:43 +01:00
many	d92ad5640a	remove option on bucket_candidates	2021-03-03 15:43:43 +01:00
many	64688b3786	fix query tree builder	2021-03-03 15:43:43 +01:00
many	fb7e6df790	add tests on typo criterion	2021-03-03 15:43:43 +01:00
Kerollmops	c5a32fd4fa	Fix the typo criterion	2021-03-03 15:43:42 +01:00
many	a273c46559	clean warnings	2021-03-03 15:43:42 +01:00
many	9e093d5ff3	add cache on alterate_query_tree function	2021-03-03 15:43:42 +01:00
many	41fc51ebcf	optimize alterate_query_tree when number_typos is zero	2021-03-03 15:43:42 +01:00
many	4da6e1ea9c	add cache in typo criterion	2021-03-03 15:43:42 +01:00
Kerollmops	67c71130df	Reduce the number of calls to alterate_query_tree	2021-03-03 15:43:42 +01:00
many	9ccaea2afc	simplify criterion context	2021-03-03 15:43:42 +01:00
Clément Renault	fea9ffc46a	Use the bucket candidates in the search module	2021-03-03 15:43:42 +01:00
Clément Renault	229130ed25	Correctly compute the bucket candidates for the Typo criterion	2021-03-03 15:43:42 +01:00
Clément Renault	5344abc008	Introduce the CriterionResult return type	2021-03-03 15:43:41 +01:00
many	86bcecf840	change variable's name from distance to proximity	2021-03-03 15:43:41 +01:00
many	4128bdc859	reduce match possibilities in docids fetchers	2021-03-03 15:43:41 +01:00
many	907482c8ac	clean docids fetchers	2021-03-03 15:43:41 +01:00
many	774a255f2e	use prefix cache in criteria	2021-03-03 15:43:41 +01:00
many	98e69e63d2	implement Context trait for criteria	2021-03-03 15:43:41 +01:00
Clément Renault	f091f370d0	Use the Typo criteria in the search module	2021-03-03 15:43:41 +01:00
Clément Renault	ad20d72a39	Introduce the Typo criterion	2021-03-03 15:43:41 +01:00
Clément Renault	f0ddea821c	Introduce the Typo criterion	2021-03-03 15:43:41 +01:00
many	73286dc8bf	Introduce the query tree data structure	2021-03-03 15:43:40 +01:00
Kerollmops	240b02e175	Remove unused Operation constructors	2021-03-03 13:40:19 +01:00
many	a463ae821e	Add methods optional_words and authorize_typos on the query tree	2021-03-03 13:40:19 +01:00
Kerollmops	6d135beb21	Introduce the maximum_proximity helper function	2021-03-03 13:40:18 +01:00
Kerollmops	6008f528d0	Introduce the maximum_typo helper function	2021-03-03 13:40:18 +01:00
Kerollmops	1dc857a4b2	Fix the query tree optional word generation with phrases	2021-03-03 13:40:18 +01:00
Kerollmops	4f19749252	Introduce the word_documents_count method on the Context trait	2021-03-03 13:40:18 +01:00
Kerollmops	79a143b32f	Introduce the query tree data structure	2021-03-03 13:40:18 +01:00
mpostma	e08b6b3ec7	add primary key to fields_id_map when not present	2021-03-01 16:10:16 +01:00
Clément Renault	c318373b88	Expose the WordsPrefixes update on the UpdateBuilder	2021-02-21 12:15:35 +01:00
Kerollmops	519b1cb5c9	Update dependencies	2021-02-21 10:26:04 +01:00
Kerollmops	c2ffcc4bd1	Return a heed error from the word_documents_count method	2021-02-18 14:59:37 +01:00
Kerollmops	2f561c77f5	Introduce the word documents count method on the index	2021-02-18 14:35:14 +01:00
Kerollmops	8d710c5130	Introduce heed codecs to retrieve the length of roaring bitmaps	2021-02-18 14:30:47 +01:00
Kerollmops	fcfb39c5de	Move the RoaringBitmap related codecs into a module	2021-02-18 13:56:28 +01:00
Kerollmops	a4a48be923	Run the words prefixes update inside of the indexing documents update	2021-02-17 11:22:26 +01:00
Kerollmops	616ed8f73c	Clean up the word prefix pair proximities when deleting documents	2021-02-17 11:22:26 +01:00
Clément Renault	ea37fd821d	Clean up the words prefixes when deleting documents and words	2021-02-17 11:22:25 +01:00
Clément Renault	62eee9c69e	Introduce the sorter_into_lmdb_database helper function	2021-02-17 11:12:39 +01:00
Clément Renault	b5b89990eb	Compute and write the word prefix pair proximities database	2021-02-17 11:12:38 +01:00
Kerollmops	9b03b0a1b2	Introduce the word prefix pair proximity docids database	2021-02-17 11:12:38 +01:00
Clément Renault	f365de636f	Compute and write the word-prefix-docids database	2021-02-17 11:12:38 +01:00
Clément Renault	ee5a60e1c5	Clear the words prefixes when clearing an index	2021-02-17 10:45:17 +01:00
Clément Renault	b3a21d5a50	Introduce the getters and setters for the words prefixes FST	2021-02-17 10:45:17 +01:00
Clément Renault	89ce4e74fe	Do not change the primary key type when we serialize documents	2021-02-15 21:24:36 +01:00
Clément Renault	69acdd437e	Deserialize documents ids into JSON Values on deletion	2021-02-15 21:24:36 +01:00
Clément Renault	b3776598d8	Add a test to check deletion of documents with number as primary key	2021-02-15 21:24:35 +01:00
Clément Renault	fecf3d6fc1	Move the command lines helpers into different crates	2021-02-14 18:55:15 +01:00
Clément Renault	d8f3421608	Update the dependencies and remove the unused ones	2021-02-14 18:32:46 +01:00
Clément Renault	e8639517da	Change the project to become a workspace with milli as a default-member	2021-02-12 16:15:09 +01:00
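
Commit da036dcc3e above describes the interim fix for stop words: ignore them completely in the query instead of keeping them for typo or prefix derivation. The following is a minimal sketch of that idea only, not the actual milli implementation. The function name `strip_stop_words`, the whitespace splitting, and the lowercase comparison are assumptions made for illustration; the `BTreeSet` container follows the commit message for a2f46029c7.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper (not part of milli): keep only the query words
/// that are not stop words. Words are split on whitespace and compared
/// in lowercase, which is an assumption of this sketch; the real engine
/// relies on its tokenizer.
fn strip_stop_words<'a>(query: &'a str, stop_words: &BTreeSet<String>) -> Vec<&'a str> {
    query
        .split_whitespace()
        // A word is dropped when its lowercase form is in the stop word set.
        .filter(|word| !stop_words.contains(&word.to_lowercase()))
        .collect()
}
```

With a stop word set containing "the" and "of", the query “The hobbit” keeps only "hobbit". The same rule also shows the trade-off the commit message accepts: the mistyped query “Won' the do it” loses the word "the", so part of the intended request is dropped.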

270 Commits