meilisearch

mirror of https://github.com/meilisearch/meilisearch.git synced 2024-11-30 09:04:59 +08:00

Author	SHA1	Message	Date
Louis Dureuil	4310928803	Fixes #3912	2023-07-12 10:08:56 +02:00
Louis Dureuil	74315b4ea8	Fixes #3911	2023-07-12 10:08:29 +02:00
Louis Dureuil	55cd7738b9	Update snapshots	2023-07-04 16:31:01 +02:00
Louis Dureuil	48409c9183	Add missing exactness.matchingWords, exactness.maxMatchingWords	2023-07-04 16:31:01 +02:00
Louis Dureuil	324d448236	Format let-else ❤️ 🎉	2023-07-03 10:20:28 +02:00
ManyTheFish	6ec7541026	Update insta snapshots	2023-06-29 17:18:39 +02:00
ManyTheFish	84845de9ef	Update Charabia	2023-06-29 15:56:32 +02:00
meili-bors[bot]	d4f10800f2	Merge #3834

3834: Define searchable fields at runtime r=Kerollmops a=ManyTheFish

## Summary

This feature allows the end-user to search in one or multiple attributes using the search parameter `attributesToSearchOn`:

```json
{
  "q": "Captain Marvel",
  "attributesToSearchOn": ["title"]
}
```

This feature acts like a filter, forcing Meilisearch to only return the documents containing the requested words in the attributes-to-search-on. Note that, with the matching strategy `last`, Meilisearch will only ensure that the first word is in the attributes-to-search-on, but the retrieved documents will be ordered taking into account the words contained in the attributes-to-search-on.

## Trying the prototype

A dedicated docker image has been released for this feature:

#### Latest prototype version:

```bash
docker pull getmeili/meilisearch:prototype-define-searchable-fields-at-search-time-1
```

#### Other prototype versions:

```bash
docker pull getmeili/meilisearch:prototype-define-searchable-fields-at-search-time-0
```

## Technical Detail

The attributes-to-search-on list is given to the search context. The search context then uses the `fid_word_docids` database, restricted to the allowed field ids, instead of the global `word_docids` database. The same applies to the prefix databases.
The database cache is updated with the merged values, meaning that the union of the field-id-database values is only made if the requested key is missing from the cache.

### Relevancy limits

Almost all ranking rules behave as expected when ordering the documents.
Only `proximity` could misorder documents if all the searched words are in the restricted attribute but a better proximity is found in an ignored attribute in a document that should be ranked lower.

The failing test below shows it:

```rust
#[actix_rt::test]
async fn proximity_ranking_rule_order() {
    let server = Server::new().await;
    let index = index_with_documents(
        &server,
        &json!([
        {
            "title": "Captain super mega cool. A Marvel story",
            // Perfect distance between words in an ignored attribute
            "desc": "Captain Marvel",
            "id": "1",
        },
        {
            "title": "Captain America from Marvel",
            "desc": "a Shazam ersatz",
            "id": "2",
        }]),
    )
    .await;

    // Document 2 should appear before document 1.
    index
        .search(json!({"q": "Captain Marvel", "attributesToSearchOn": ["title"], "attributesToRetrieve": ["id"]}), |response, code| {
            assert_eq!(code, 200, "{}", response);
            assert_eq!(
                response["hits"],
                json!([
                    {"id": "2"},
                    {"id": "1"},
                ])
            );
        })
        .await;
}
```

Fixing this would force us to create `fid_word_pair_proximity_docids` and `fid_word_prefix_pair_proximity_docids` databases, which may multiply the keys of `word_pair_proximity_docids` and `word_prefix_pair_proximity_docids` by the number of attributes in the searchable_attributes list.
If we think we should fix this test, I suggest doing it in another PR.

## Related

Fixes #3772

Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>	2023-06-28 08:19:23 +00:00
Clément Renault	29d8268c94	Fix the vector query part by using the correct universe	2023-06-27 12:32:43 +02:00
Kerollmops	ab9f2269aa	Normalize the vectors during indexation and search	2023-06-27 12:32:41 +02:00
Kerollmops	3b560ef7d0	Make clippy happy	2023-06-27 12:32:40 +02:00
Kerollmops	3c31e1cdd1	Support more pages but in an ugly way	2023-06-27 12:32:39 +02:00
Kerollmops	c79e82c62a	Move back to the hnsw crate. This reverts commit 7a4b6c065482f988b01298642f4c18775503f92f.	2023-06-27 12:32:39 +02:00
Kerollmops	268a9ef416	Move to the hgg crate	2023-06-27 12:32:38 +02:00
Clément Renault	642b0f3a1b	Expose a new vector field on the search route	2023-06-27 12:32:38 +02:00
ManyTheFish	63ca25290b	Take into account small Review requests	2023-06-26 14:56:19 +02:00
ManyTheFish	59f64a5256	Return an error when an attribute is not searchable	2023-06-26 14:56:19 +02:00
ManyTheFish	42709ea9a5	Fix clippy warnings	2023-06-26 14:55:57 +02:00
ManyTheFish	fb8fa07169	Restrict field ids in search context	2023-06-26 14:55:57 +02:00
ManyTheFish	0ccf1e2e40	Allow the search cache to store owned values	2023-06-26 14:55:57 +02:00
ManyTheFish	461b5118bd	Add API search setting	2023-06-26 14:55:14 +02:00
Louis Dureuil	d26e9a96ec	Add score details to new search tests	2023-06-22 12:39:14 +02:00
Louis Dureuil	49c8bc4de6	Fix tests	2023-06-22 12:39:14 +02:00
Louis Dureuil	da833eb095	Expose the scores and detailed scores in the API	2023-06-22 12:39:14 +02:00
Louis Dureuil	701d44bd91	Store the scores for each bucket. Remove optimization where ranking rules are not executed on buckets of a single document when the score needs to be computed	2023-06-22 12:39:14 +02:00
Louis Dureuil	c621a250a7	Score for graph based ranking rules. Count phrases in matchingWords and maxMatchingWords	2023-06-22 12:39:14 +02:00
Louis Dureuil	8939e85f60	Add rank_to_score for graph based ranking rules	2023-06-22 12:39:14 +02:00
Louis Dureuil	fa41d2489e	Score for sort	2023-06-22 12:39:14 +02:00
Louis Dureuil	59c5b992c2	Score for geosort	2023-06-22 12:39:14 +02:00
Louis Dureuil	2ea8194c18	Score for exact_attributes	2023-06-22 12:39:14 +02:00
Louis Dureuil	421df64602	RankingRuleOutput now contains a Score	2023-06-22 12:39:14 +02:00
Louis Dureuil	f050634b1e	add virtual conditions to fid and position to always have the max cost	2023-06-20 10:07:18 +02:00
Louis Dureuil	becf1f066a	Change how the cost of removing words is computed	2023-06-20 09:45:43 +02:00
Louis Dureuil	701d299369	Remove out-of-date comment	2023-06-20 09:45:42 +02:00
Louis Dureuil	a20e4d447c	Position now takes into account the distance to the position of the word in the query; it used to be based on the distance to position 0	2023-06-20 09:45:42 +02:00
Louis Dureuil	af57c3c577	Proximity costs 0 for documents that are perfectly matching	2023-06-20 09:45:42 +02:00
Louis Dureuil	0c40ef6911	Fix sort id	2023-06-20 09:45:42 +02:00
Loïc Lecrenier	2da86b31a6	Remove comments and add documentation	2023-06-14 12:39:42 +02:00
Louis Dureuil	a2a3b8c973	Fix offset difference between query and indexing for hard separators	2023-06-08 12:07:12 +02:00
Louis Dureuil	1dfc4038ab	Add test that fails before PR and passes now	2023-05-29 11:58:26 +02:00
Louis Dureuil	73198179f1	Consistently use wrapping add to avoid overflow in debug when query starts with a separator	2023-05-29 11:54:12 +02:00
meili-bors[bot]	2e49d6aec1	Merge #3768

3768: Fix bugs in graph-based ranking rules + make `words` a graph-based ranking rule r=dureuill a=loiclec

This PR contains three changes:

## 1. Don't call the `words` ranking rule if the term matching strategy is `All`

This is because the purpose of `words` is only to remove nodes from the query graph. It would never do any useful work when the matching strategy was `All`. Remember that the universe was already computed before by computing all the docids corresponding to the "maximally reduced" query graph, which, in the case of `All`, is equal to the original graph.

## 2. The `words` ranking rule is replaced by a graph-based ranking rule.

This is for three reasons:

1. performance: graph-based ranking rules benefit from a lot of optimisations by default, which ensures that they are never too slow. The previous implementation of `words` could call `compute_query_graph_docids` many times if some words had to be removed from the query, which would be quite expensive. I was especially worried about its performance in cases where it is placed right after the `sort` ranking rule. Furthermore, `compute_query_graph_docids` would clone a lot of bitmaps many times unnecessarily.

2. consistency: every other ranking rule (except `sort`) is graph-based. It makes sense to implement `words` like that as well. It will automatically benefit from all the features, optimisations, and bug fixes that all the other ranking rules get.

3. surfacing bugs: as the first ranking rule to be called (most of the time), I'd like `words` to behave the same as the other ranking rules so that we can quickly detect bugs in our graph algorithms. This actually already happened, which is why this PR also contains a bug fix.

## 3. Fix the `update_all_costs_before_nodes` function

It is a bit difficult to explain what was wrong, but I'll try. The bug happened when we had graphs like:

<img width="730" alt="Screenshot 2023-05-16 at 10 58 57" src="https://github.com/meilisearch/meilisearch/assets/6040237/40db1a68-d852-4e89-99d5-0d65757242a7">

and we gave the node `is` as argument. Then, we'd walk backwards from the node breadth-first. We'd update the costs of:

1. `sun`
2. `thesun`
3. `start`
4. `the`

which is an incorrect order. The correct order is:

1. `sun`
2. `thesun`
3. `the`
4. `start`

That is, we can only update the cost of a node when all of its successors have either already been visited or were not affected by the update to the node passed as argument.

To solve this bug, I factored out the graph-traversal logic into a `traverse_breadth_first_backward` function.

Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>	2023-05-23 13:28:08 +00:00
Louis Dureuil	51043f78f0	Remove trailing whitespace	2023-05-23 15:27:25 +02:00
Louis Dureuil	a490a11325	Add explanatory comment on the way we're recomputing costs	2023-05-23 15:24:24 +02:00
Loïc Lecrenier	ec8f685d84	Fix bug in cheapest path algorithm	2023-05-16 17:01:30 +02:00
Loïc Lecrenier	5758268866	Don't compute split_words for phrases	2023-05-16 17:01:18 +02:00
Loïc Lecrenier	3e19702de6	Update snapshot tests	2023-05-16 12:22:46 +02:00
Loïc Lecrenier	f6524a6858	Adjust costs of edges in position ranking rule to ensure good performance	2023-05-16 11:28:56 +02:00
meili-bors[bot]	65ad8cce36	Merge #3741 3741: Add ngram support to the highlighter r=ManyTheFish a=loiclec This PR fixes a bug introduced by the search refactor, where ngrams were not highlighted. The solution was to add the ngrams to the vector of `LocatedQueryTerm` that is given to the `MatchingWords` structure. Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>	2023-05-16 09:03:31 +00:00
Loïc Lecrenier	a37da36766	Implement `words` as a graph-based ranking rule and fix some bugs	2023-05-16 10:42:11 +02:00
Loïc Lecrenier	85d96d35a8	Highlight ngram matches as well	2023-05-16 10:39:36 +02:00
Loïc Lecrenier	4d352a21ac	Compute split words derivations of terms that don't accept typos	2023-05-10 13:31:19 +02:00
Loïc Lecrenier	3625389057	Highlight ngram matches as well	2023-05-08 15:35:41 +02:00
meili-bors[bot]	eace6df91b	Merge #3726 3726: Fix prefix highlighting r=loiclec a=ManyTheFish The prefix queries were not properly highlighted, this PR now highlights only the start of a word when it matched with a prefix Co-authored-by: ManyTheFish <many@meilisearch.com> Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>	2023-05-08 07:46:46 +00:00
Loïc Lecrenier	83ab8cf4e5	Remove dbg!(..) expression in highlighter tests	2023-05-08 09:45:23 +02:00
ManyTheFish	cd2573fcc3	Fix prefix highlighting	2023-05-04 16:53:50 +02:00
Jakub Jirutka	13f1277637	Allow to disable specialized tokenizations (again). In PR #2773, I added the `chinese`, `hebrew`, `japanese` and `thai` feature flags to allow meilisearch to be built without huge specialized tokenizations that took up 90% of the meilisearch binary size. Unfortunately, due to some recent changes, this doesn't work anymore. The problem lies in excessive use of the `default` feature flag, which infects the dependency graph. Instead of adding `default-features = false` here and there, it's easier and more future-proof to not declare `default` in `milli` and `meilisearch-types`. I've renamed it to `all-tokenizers`, which also makes it a bit clearer what it's about.	2023-05-04 15:45:40 +02:00
Louis Dureuil	f8f190cd40	Update exactness tests following charabia camelCase tokenization	2023-05-03 14:45:09 +02:00
Louis Dureuil	1aaf24ccbf	Cargo fmt	2023-05-03 12:21:58 +02:00
Louis Dureuil	342c4ff85d	geosort: Remove rtree unwrap	2023-05-03 09:52:16 +02:00
Tamo	c85392ce40	make the descending geosort fast	2023-05-03 09:13:12 +02:00
Tamo	8875d24a48	deserialize the rtree only when it's needed, and keep it in memory once it has been deserialized	2023-05-03 09:13:12 +02:00
Tamo	c470b67fa2	revamp the test to use execute_iterative_and_rtree_returns_the_same	2023-05-03 09:13:12 +02:00
Louis Dureuil	b60840ebff	Remove self.iterating from words	2023-05-02 18:54:23 +02:00
Louis Dureuil	fdc1763838	Use MultiOps for resolve_query_graph	2023-05-02 18:54:09 +02:00
Louis Dureuil	75819bc940	Remove too many arguments on resolve_maximally_reduced_query_graph	2023-05-02 18:53:40 +02:00
Louis Dureuil	7b8cc25625	rename located_query_terms_from_string -> located_query_terms_from_tokens	2023-05-02 18:53:01 +02:00
Loïc Lecrenier	aa63091752	Fix bug in exact_attribute	2023-05-02 10:48:32 +02:00
Loïc Lecrenier	1b514517f5	Fix bug in computation of query term at a position	2023-05-02 10:48:32 +02:00
Loïc Lecrenier	11f814821d	Minor cleanup	2023-05-02 10:48:32 +02:00
Loïc Lecrenier	30fb1153cc	Speed up graph based ranking rule when a lot of different costs exist	2023-05-02 09:59:42 +02:00
Loïc Lecrenier	3b2c8b9f25	Improve performance of position rr	2023-05-02 09:59:42 +02:00
Loïc Lecrenier	2a7f9adf78	Build query graph more correctly from paths. Update snapshots	2023-05-02 09:59:42 +02:00
Loïc Lecrenier	608ceea440	Fix bug in position rr	2023-05-02 09:59:42 +02:00
Loïc Lecrenier	79001b9c97	Improve performance of the cheapest path finder algorithm	2023-05-02 09:59:42 +02:00
Loïc Lecrenier	59b12fca87	Fix errors, clippy warnings, and add review comments	2023-04-29 11:48:11 +02:00
Loïc Lecrenier	48f5bb1693	Implements the geo-sort ranking rule	2023-04-29 11:02:16 +02:00
Loïc Lecrenier	bc4efca611	Add more tests for the attribute ranking rule	2023-04-29 10:56:48 +02:00
Loïc Lecrenier	899baa0ea5	Update forgotten snapshot from previous commit	2023-04-27 13:43:04 +02:00
Loïc Lecrenier	374095d42c	Add tests for stop words and fix a couple of bugs	2023-04-27 13:30:09 +02:00
Louis Dureuil	b41a6cbd7a	Check sort criteria also in placeholder search	2023-04-26 16:28:17 +02:00
Louis Dureuil	c8af572697	Add tests for exact words and exact attributes	2023-04-26 16:13:01 +02:00
Loïc Lecrenier	b448aca49c	Add more tests for exactness rr	2023-04-26 11:04:18 +02:00
Loïc Lecrenier	55bad07c16	Fix bug in exact_attribute rr implementation	2023-04-26 10:40:05 +02:00
Loïc Lecrenier	3421125a55	Prevent the `exactness` ranking rule from removing random words. Make it strictly follow the term matching strategy	2023-04-26 09:09:19 +02:00
Loïc Lecrenier	d3a94e8b25	Fix bugs and add tests to exactness ranking rule	2023-04-25 16:49:08 +02:00
Loïc Lecrenier	8f2e971879	Add tests for "exactness" rr, make correct universe computation	2023-04-24 16:57:34 +02:00
Loïc Lecrenier	d1fdbb63da	Make all search tests pass, fix distinctAttribute bug	2023-04-24 12:12:08 +02:00
Loïc Lecrenier	84d9c731f8	Fix bug in encoding of word_position_docids and word_fid_docids	2023-04-24 09:59:30 +02:00
Loïc Lecrenier	bd9aba4d77	Add "position" part of the attribute ranking rule	2023-04-13 10:46:09 +02:00
Loïc Lecrenier	8edad8291b	Add logger to attribute rr, fix a bug	2023-04-13 10:25:00 +02:00
Kerollmops	d9cebff61c	Add a simple test to check that attributes are ranking correctly	2023-04-13 08:27:09 +02:00
Loïc Lecrenier	30f7bd03f6	Fix compiler warning/errors caused by previous merge	2023-04-13 08:27:09 +02:00
Kerollmops	df0d9bb878	Introduce the attribute ranking rule in the list of ranking rules	2023-04-13 08:27:09 +02:00
Kerollmops	5230ddb3ea	Resolve the attribute ranking rule conditions	2023-04-13 08:27:09 +02:00
Kerollmops	d6a7c28e4d	Implement the attribute ranking rule edge computation	2023-04-13 08:27:09 +02:00
Kerollmops	e55efc419e	Introduce a new cache for the words fids	2023-04-13 08:27:09 +02:00
Loïc Lecrenier	644e136aee	Merge branch 'search-refactor-typo-attributes' into search-refactor	2023-04-13 08:26:56 +02:00
Louis Dureuil	38b7b31beb	Decide to use prefix DB if the word is not an ngram	2023-04-12 16:45:38 +02:00
Louis Dureuil	7a01f20df7	Use word_prefix_docids, make get_word_prefix_docids private	2023-04-12 16:45:38 +02:00
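
The traversal-order rule stated in merge #3768 (a node's cost may only be updated once all of its affected successors have been visited) can be illustrated with a small standalone sketch. This is not the code of `traverse_breadth_first_backward`: the function name `backward_update_order`, the edge-list representation, and the assumption that the graph is acyclic are all hypothetical choices made for illustration.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

/// Returns the ancestors of `from` in an order where each node appears only
/// after all of its affected successors. Hypothetical helper; assumes a DAG.
fn backward_update_order<'a>(edges: &[(&'a str, &'a str)], from: &'a str) -> Vec<&'a str> {
    // Build forward and backward adjacency lists from the (source, target) edges.
    let mut predecessors: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    let mut successors: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for &(src, dst) in edges {
        predecessors.entry(dst).or_default().push(src);
        successors.entry(src).or_default().push(dst);
    }

    // A node is "affected" if it is `from` itself or one of its ancestors.
    let mut affected: BTreeSet<&str> = BTreeSet::new();
    let mut stack = vec![from];
    while let Some(node) = stack.pop() {
        if affected.insert(node) {
            if let Some(preds) = predecessors.get(node) {
                stack.extend(preds.iter().copied());
            }
        }
    }

    // For every affected node, count the affected successors not yet visited.
    let mut pending: BTreeMap<&str, usize> = affected
        .iter()
        .map(|&node| {
            let count = successors
                .get(node)
                .map_or(0, |succs| succs.iter().filter(|&&s| affected.contains(s)).count());
            (node, count)
        })
        .collect();

    // Walk backward breadth-first, releasing a node only when its count hits zero.
    let mut order = Vec::new();
    let mut queue = VecDeque::from([from]);
    while let Some(node) = queue.pop_front() {
        if node != from {
            order.push(node);
        }
        for &pred in predecessors.get(node).into_iter().flatten() {
            let remaining = pending.get_mut(pred).expect("every predecessor is affected");
            *remaining -= 1;
            if *remaining == 0 {
                queue.push_back(pred);
            }
        }
    }
    order
}
```

With a graph guessed from the node names in the message (`start → the → sun → is` and `start → thesun → is`; the screenshot itself is not reproduced here), calling `backward_update_order(&edges, "is")` visits `start` last, because `start` still has the unvisited affected successor `the` when `thesun` is processed.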
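
The caching behaviour described under "Technical Detail" in merge #3834 (the union over the allowed field ids is only computed when the key is missing from the cache) can be sketched as follows. Everything here is an assumption made for illustration: the `SearchContext` layout, the `index` and `word_docids` methods, the `unions_computed` counter, and the use of `BTreeSet<u32>` as a stand-in for a bitmap of document ids. It does not reproduce milli's actual types or database access.

```rust
use std::collections::{BTreeSet, HashMap};

type DocIds = BTreeSet<u32>; // stand-in for a docid bitmap (assumption)
type FieldId = u16;

/// Hypothetical, in-memory model of a search context restricted to some fields.
struct SearchContext {
    /// Models a per-field word database keyed by (field id, word).
    fid_word_docids: HashMap<(FieldId, String), DocIds>,
    /// Field ids the current search is allowed to look at.
    restricted_fids: Vec<FieldId>,
    /// Cache of merged results, keyed by word.
    cache: HashMap<String, DocIds>,
    /// Counts how many unions were actually computed (for demonstration only).
    unions_computed: usize,
}

impl SearchContext {
    fn new(restricted_fids: Vec<FieldId>) -> Self {
        Self {
            fid_word_docids: HashMap::new(),
            restricted_fids,
            cache: HashMap::new(),
            unions_computed: 0,
        }
    }

    /// Records that `word` occurs in field `fid` of document `docid`.
    fn index(&mut self, fid: FieldId, word: &str, docid: u32) {
        self.fid_word_docids.entry((fid, word.to_owned())).or_default().insert(docid);
    }

    /// Returns the documents containing `word` in any allowed field.
    /// The union is computed only on a cache miss, then stored as an owned value.
    fn word_docids(&mut self, word: &str) -> &DocIds {
        if !self.cache.contains_key(word) {
            let mut merged = DocIds::new();
            for &fid in &self.restricted_fids {
                if let Some(docids) = self.fid_word_docids.get(&(fid, word.to_owned())) {
                    merged.extend(docids.iter().copied());
                }
            }
            self.unions_computed += 1;
            self.cache.insert(word.to_owned(), merged);
        }
        &self.cache[word]
    }
}
```

In this sketch a word that only occurs in an ignored field yields an empty set, which is the filter-like behaviour the summary describes. The sketch does not invalidate the cache when `index` is called after a lookup; it assumes all indexing happens before searching.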


320 Commits