meilisearch

mirror of https://github.com/meilisearch/meilisearch.git synced 2024-11-23 18:45:06 +08:00

Author	SHA1	Message	Date
meili-bors[bot]	d4f10800f2	Merge #3834 3834: Define searchable fields at runtime r=Kerollmops a=ManyTheFish ## Summary This feature allows the end-user to search in one or multiple attributes using the search parameter `attributesToSearchOn`: ```json { "q": "Captain Marvel", "attributesToSearchOn": ["title"] } ``` This feature act like a filter, forcing Meilisearch to only return the documents containing the requested words in the attributes-to-search-on. Note that, with the matching strategy `last`, Meilisearch will only ensure that the first word is in the attributes-to-search-on, but, the retrieved documents will be ordered taking into account the word contained in the attributes-to-search-on. ## Trying the prototype A dedicated docker image has been released for this feature: #### last prototype version: ```bash docker pull getmeili/meilisearch:prototype-define-searchable-fields-at-search-time-1 ``` #### others prototype versions: ```bash docker pull getmeili/meilisearch:prototype-define-searchable-fields-at-search-time-0 ``` ## Technical Detail The attributes-to-search-on list is given to the search context, then, the search context uses the `fid_word_docids`database using only the allowed field ids instead of the global `word_docids` database. This is the same for the prefix databases. The database cache is updated with the merged values, meaning that the union of the field-id-database values is only made if the requested key is missing from the cache. ### Relevancy limits Almost all ranking rules behave as expected when ordering the documents. Only `proximity` could miss-order documents if all the searched words are in the restricted attribute but a better proximity is found in an ignored attribute in a document that should be ranked lower. I put below a failing test showing it: ```rust #[actix_rt::test] async fn proximity_ranking_rule_order() { let server = Server::new().await; let index = index_with_documents( &server, &json!([ { "title": "Captain super mega cool. A Marvel story", // Perfect distance between words in an ignored attribute "desc": "Captain Marvel", "id": "1", }, { "title": "Captain America from Marvel", "desc": "a Shazam ersatz", "id": "2", }]), ) .await; // Document 2 should appear before document 1. index .search(json!({"q": "Captain Marvel", "attributesToSearchOn": ["title"], "attributesToRetrieve": ["id"]}), \|response, code\| { assert_eq!(code, 200, "{}", response); assert_eq!( response["hits"], json!([ {"id": "2"}, {"id": "1"}, ]) ); }) .await; } ``` Fixing this would force us to create a `fid_word_pair_proximity_docids` and a `fid_word_prefix_pair_proximity_docids` databases which may multiply the keys of `word_pair_proximity_docids` and `word_prefix_pair_proximity_docids` by the number of attributes in the searchable_attributes list. If we think we should fix this test, I'll suggest doing it in another PR. ## Related Fixes #3772 Co-authored-by: Tamo <tamo@meilisearch.com> Co-authored-by: ManyTheFish <many@meilisearch.com>	2023-06-28 08:19:23 +00:00
meili-bors[bot]	dc293911ad	Merge #3745 3745: tests: add unit test for `PayloadTooLarge` error r=curquiza a=cymruu # Pull Request Add a unit test for the `Payload`, which verifies that a request with a payload that is too large is rejected with the appropriate message. This was requested in this PR https://github.com/meilisearch/meilisearch/pull/3739 ## Related issue https://github.com/meilisearch/meilisearch/pull/3739 ## What does this PR do? - Adds requested test ## PR checklist Please check if your PR fulfills the following requirements: - [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)? - [ ] Have you read the contributing guidelines? - [ ] Have you made sure that the title is accurate and descriptive of the changes? Thank you so much for contributing to Meilisearch! Co-authored-by: Filip Bachul <filipbachul@gmail.com>	2023-06-27 14:58:23 +00:00
meili-bors[bot]	9d68e6969e	Merge #3859 3859: Merge all analytics events pertaining to updating the experimental features r=Kerollmops a=dureuill Follow-up to #3850 Co-authored-by: Louis Dureuil <louis@meilisearch.com>	2023-06-27 13:26:01 +00:00
Louis Dureuil	b4b686d253	Merge all analytics events pertaining to updating the experimental features	2023-06-27 15:16:23 +02:00
meili-bors[bot]	98ec476198	Merge #3855 3855: Change and add links to the Cloud r=Kerollmops a=dureuill - add cloud link in banner - add utm to existing links following https://github.com/meilisearch/integration-guides/issues/277#issuecomment-1592054536 Co-authored-by: Louis Dureuil <louis@meilisearch.com>	2023-06-27 12:29:36 +00:00
Louis Dureuil	c47b8a8bfe	Fix typo Co-authored-by: Guillaume Mourier <guillaume@meilisearch.com>	2023-06-27 14:27:54 +02:00
Louis Dureuil	054f81a021	Make message consistent with the one in integration repos	2023-06-27 14:20:55 +02:00
meili-bors[bot]	d8ea688481	Merge #3825 3825: Accept semantic vectors and allow users to query nearest neighbors r=Kerollmops a=Kerollmops This Pull Request brings a new feature to the current API. The engine accepts a new `_vectors` field akin to the `_geo` one. This vector is stored in Meilisearch and can be retrieved via search. This work is the first step toward hybrid search, bringing the best of both worlds: keyword and semantic search ❤️‍🔥 ## ToDo - [x] Make it possible to get the `limit` nearest neighbors from a user-generated vector by using the `vector` field of search route. - [x] Delete the documents and vectors from the HNSW-related data structures. - [x] Do it the slow and ugly way (we need to be able to iterate over all the values). - [ ] Do it the efficient way (Wait for a new method or implement it myself). - [ ] ~~Move from the `hnsw` crate to the hgg one~~ The hgg crate is too slow. Meilisearch takes approximately 88s to answer a query. It is related to the time it takes to deserialize the `Hgg` data structure or search in it. I didn't take the time to measure precisely. We moved back to the hnsw crate which takes approximately 40ms to answer. - [ ] ~~Wait for a fix for https://github.com/rust-cv/hgg/issues/4.~~ - [x] Fix the current dot product function. - [x] Fill in the other `SearchResult` fields. - [x] Remove the `hnsw` dependency of the meilisearch crate. - [x] Fix the pages by taking the offset into account. - [x] Release a first prototype https://github.com/meilisearch/product/discussions/621#discussioncomment-6183647 - [x] Make the pagination and filtering faster and more correct. - [x] Return the original vector in the output search results (like `query`). - [x] Return an `_semanticSimilarity` field in the documents (it's a dot product) - [x] Return this score even if the `_vectors` field is not displayed - [x] Rename the field `_semanticScore`. - [ ] Return the `_geoDistance` value even if the `_geo` field is not displayed - [x] Store the HNSW on possibly multiple LMDB values. - [ ] Measure it and make it faster if needed - [ ] Export the `ReadableSlices` type into a small external crate - [x] Accept an `_vectors` field instead of the `_vector` one. - [x] Normalize all vectors. - [ ] Remove the `_vectors` field from the default searchable attributes (as we do with `_geo`?). - [ ] Correctly compute the candidates by remembering the documents having a valid `_vectors` field. - [ ] Return the right errors: - [ ] Return an error when the query vector is not the same length as the vectors in the HNSW. - [ ] We must return the user document id that triggered the vector dimension issue. - [x] If an indexation error occurs. - [ ] Fix the error codes when using the search route. - [ ] ~~Introduce some settings:~~ We currently ensure that the vector length is consistent over the whole set of documents and return an error for when a vector dimension doesn't follow the current number of dimensions. - [ ] The length of the vector the user will provide. - [ ] The distance function (we only support dot as of now). - [ ] Introduce other distance functions - [ ] Euclidean - [ ] Dot Product - [ ] Cosine - [ ] Make them SIMD optimized - [ ] Give credit to qdrant - [ ] Add tests. - [ ] Write a mini spec. - [ ] Release it in v1.3 as an experimental feature. Co-authored-by: Clément Renault <clement@meilisearch.com> Co-authored-by: Kerollmops <clement@meilisearch.com>	2023-06-27 11:17:07 +00:00
Clément Renault	e69be93e42	Log warn about using both q and vector field parameters	2023-06-27 12:32:44 +02:00
Clément Renault	b2b413db12	Return all the _semanticScore values in the documents	2023-06-27 12:32:43 +02:00
Clément Renault	30741d17fa	Change the TODO message	2023-06-27 12:32:43 +02:00
Clément Renault	ebad1f396f	Remove the useless euclidean distance implementation	2023-06-27 12:32:43 +02:00
Clément Renault	29d8268c94	Fix the vector query part by using the correct universe	2023-06-27 12:32:43 +02:00
Clément Renault	63bfe1cee2	Ignore when there are too many vectors	2023-06-27 12:32:43 +02:00
Clément Renault	f3e4d70638	Send analytics about the query vector length	2023-06-27 12:32:43 +02:00
Kerollmops	eecf20f109	Introduce a new invalid_vector_store	2023-06-27 12:32:42 +02:00
Kerollmops	816d7ed174	Update the Vector Store product feature link	2023-06-27 12:32:42 +02:00
Louis Dureuil	864ad2a23c	Check that vector store feature is enabled	2023-06-27 12:32:42 +02:00
Kerollmops	66fb5c150c	Rename _semanticSimilarity into _semanticScore	2023-06-27 12:32:42 +02:00
Kerollmops	7c2f5f77b8	Make clippy and fmt happy	2023-06-27 12:32:42 +02:00
Kerollmops	66b8cfd8c8	Introduce a way to store the HNSW on multiple LMDB entries	2023-06-27 12:32:42 +02:00
Kerollmops	ff3664431f	Make rustfmt happy	2023-06-27 12:32:42 +02:00
Kerollmops	531748c536	Return a user error when the _vectors type is invalid	2023-06-27 12:32:41 +02:00
Kerollmops	7aa1275337	Display the _semanticSimilarity even if the `_vectors` field is not displayed	2023-06-27 12:32:41 +02:00
Kerollmops	737aec1705	Expose an _semanticSimilarity as a dot product in the documents	2023-06-27 12:32:41 +02:00
Kerollmops	3e3c743392	Make Rustfmt happy	2023-06-27 12:32:41 +02:00
Kerollmops	5c5a4e075d	Make clippy happy	2023-06-27 12:32:41 +02:00
Kerollmops	ab9f2269aa	Normalize the vectors during indexation and search	2023-06-27 12:32:41 +02:00
Kerollmops	321ec5f3fa	Accept multiple vectors by documents using the _vectors field	2023-06-27 12:32:40 +02:00
Kerollmops	1b2923f7c0	Return the vector in the output of the search routes	2023-06-27 12:32:40 +02:00
Kerollmops	717d4fddd4	Remove the unused distance	2023-06-27 12:32:40 +02:00
Kerollmops	a7e0f0de89	Introduce a new error message for invalid vector dimensions	2023-06-27 12:32:40 +02:00
Kerollmops	3b560ef7d0	Make clippy happy	2023-06-27 12:32:40 +02:00
Kerollmops	2cf747cb89	Fix the tests	2023-06-27 12:32:40 +02:00
Kerollmops	3c31e1cdd1	Support more pages but in an ugly way	2023-06-27 12:32:39 +02:00
Kerollmops	23eaaf1001	Change the name of the distance module	2023-06-27 12:32:39 +02:00
Kerollmops	c2a402f3ae	Implement an ugly deletion of values in the HNSW	2023-06-27 12:32:39 +02:00
Kerollmops	436a10bef4	Replace the euclidean with a dot product	2023-06-27 12:32:39 +02:00
Kerollmops	8debf6fe81	Use a basic euclidean distance function	2023-06-27 12:32:39 +02:00
Kerollmops	c79e82c62a	Move back to the hnsw crate This reverts commit 7a4b6c065482f988b01298642f4c18775503f92f.	2023-06-27 12:32:39 +02:00
Kerollmops	aca305bb77	Log more to make sure we insert vectors in the hgg data-structure	2023-06-27 12:32:38 +02:00
Kerollmops	5816008139	Introduce an optimized version of the euclidean distance function	2023-06-27 12:32:38 +02:00
Kerollmops	268a9ef416	Move to the hgg crate	2023-06-27 12:32:38 +02:00
Clément Renault	642b0f3a1b	Expose a new vector field on the search route	2023-06-27 12:32:38 +02:00
Clément Renault	cad90e8cbc	Add a vector field to the search routes	2023-06-27 12:32:38 +02:00
Clément Renault	4571e512d2	Store the vectors in an HNSW in LMDB	2023-06-27 12:32:38 +02:00
Clément Renault	7ac2f1489d	Extract the vectors from the documents	2023-06-27 12:32:37 +02:00
Clément Renault	34349faeae	Create a new _vector extractor	2023-06-27 12:32:37 +02:00
meili-bors[bot]	ed0a5be4b6	Merge #3853 3853: docs: fixed some broken links r=gillian-meilisearch a=0xflotus Some of the links in the README file were broken. Co-authored-by: 0xflotus <0xflotus@gmail.com>	2023-06-27 10:30:13 +00:00
meili-bors[bot]	f105df6599	Merge #3850 3850: Experimental features r=Kerollmops a=dureuill # Pull Request ## Related issue - Fixes https://github.com/meilisearch/meilisearch/issues/3857 - Related to https://github.com/meilisearch/meilisearch/issues/3771 ## What does this PR do? ### Example <details> <summary>Using the feature to enable `scoreDetails`</summary> ```json ❯ curl \ -X POST 'http://localhost:7700/indexes/index-word-count-10-count/search' \ -H 'Content-Type: application/json' \ --data-binary '{ "q": "Batman", "limit": 1, "showRankingScoreDetails": true, "attributesToRetrieve": ["title"]}' \| jsonxf { "message": "Computing score details requires enabling the `score details` experimental feature. See https://github.com/meilisearch/product/discussions/674", "code": "feature_not_enabled", "type": "invalid_request", "link": "https://docs.meilisearch.com/errors#feature_not_enabled" } ``` ```json ❯ curl \ -X PATCH 'http://localhost:7700/experimental-features/' \ -H 'Content-Type: application/json' \ --data-binary '{ "scoreDetails": true }' {"scoreDetails":true,"vectorSearch":false} ``` ```json ❯ curl \ -X POST 'http://localhost:7700/indexes/index-word-count-10-count/search' \ -H 'Content-Type: application/json' \ --data-binary '{ "q": "Batman", "limit": 1, "showRankingScoreDetails": true, "attributesToRetrieve": ["title"]}' \| jsonxf { "hits": [ { "title": "Batman", "_rankingScoreDetails": { "words": { "order": 0, "matchingWords": 1, "maxMatchingWords": 1, "score": 1.0 }, "typo": { "order": 1, "typoCount": 0, "maxTypoCount": 1, "score": 1.0 }, "proximity": { "order": 2, "score": 1.0 }, "attribute": { "order": 3, "attribute_ranking_order_score": 1.0, "query_word_distance_score": 1.0, "score": 1.0 }, "exactness": { "order": 4, "matchType": "exactMatch", "score": 1.0 } } } ], "query": "Batman", "processingTimeMs": 3, "limit": 1, "offset": 0, "estimatedTotalHits": 46 } ``` </details> ### User standpoint - Add new route GET/POST/PATCH/DELETE `/experimental-features` to switch on or off some of the experimental features in a manner persistent between instance restarts - Use these new routes to allow setting on/off the following experimental features: - vector store TODO: fill in issue - score details (related to https://github.com/meilisearch/meilisearch/issues/3771) - Make the way of checking feature availability and error message uniform for the Prometheus metrics experimental feature - Save the enabled features in dump, restore from dumps - TODO: tests: - Test new security permissions (do they allow access with ALL, do they prevent access when missing) - Test dump behavior, in particular ability to import existing v6 dumps - Test basic behavior when calling the rule ### Implementation standpoint - New DB "experimental-features" - dumps are modified to save the state of that new DB as a `experimental-features.json` file, that is then loaded back when importing the dump. This doesn't change the dump version, as the file is optional and it missing will not cause the dump to fail Co-authored-by: Louis Dureuil <louis@meilisearch.com>	2023-06-26 15:13:43 +00:00

... 4 5 6 7 8 ...

8431 Commits