meilisearch

mirror of https://github.com/meilisearch/meilisearch.git synced 2025-02-20 01:27:52 +08:00

Author	SHA1	Message	Date
meili-bors[bot]	73bb080a26	Merge #3699
3699: Search for Facet Values r=Kerollmops a=Kerollmops

This PR introduces the first version of [the _Search for Facet Values_ feature](https://github.com/meilisearch/product/discussions/515). The feature lets a user search for facet values. The user can optionally supply a prefix string. The user can also optionally pass the original `q` and `filter` search parameters to restrict the candidates searched.

Merging it into Meilisearch starts with prototype Docker images, so that users can test the prototypes before relying on them.

The route for the _Search for Facet Values_ feature is currently `POST /indexes/{index}/facet-search`. Its body is a JSON object like the following:

```json5
{
  "q": "spiderman",        // optional
  "filter": "rating > 10", // optional
  "facetName": "genres",
  "facetQuery": "a"        // optional
}
```

## What is missing?

- [x] Send some analytics.
- [x] Support the `matchingStrategy` parameter.
- [x] Make sure that the errors are the right ones.
- [x] Use the [Index typo tolerance settings](https://www.meilisearch.com/docs/learn/configuration/typo_tolerance#minwordsizefortypos) when matching facet values.
  - [x] minWordSizeForTypos.oneTypo
  - [x] minWordSizeForTypos.twoTypo
- [x] Add tests
- [x] Log the time it took to compute the results.
- [x] Fix the compilation warnings.
- [x] [Create an issue to fix potential performance issues when indexing](https://github.com/meilisearch/meilisearch/issues/3862).

Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>	2023-06-29 09:08:55 +00:00
Clément Renault	44b5b9e1a7	Improve the documentation of the FacetSearchQuery struct	2023-06-29 10:28:23 +02:00
Louis Dureuil	605c1dd54a	Fix analytics	2023-06-28 16:41:56 +02:00
Clément Renault	3e3f73ba1e	Fix the analytics	2023-06-28 15:45:09 +02:00
Clément Renault	efbe7ce78b	Clean the facet string FSTs when we clear the documents	2023-06-28 15:36:32 +02:00
Louis Dureuil	82e1f59f1e	Add attributes_to_search_on	2023-06-28 15:28:24 +02:00
Clément Renault	362e9ff845	Add more tests	2023-06-28 15:28:24 +02:00
Clément Renault	32f2556d22	Move the additional_search_parameters_provided analytic inside facets	2023-06-28 15:06:09 +02:00
Kerollmops	63fd10aaa5	Fix the invalid facet name field error code	2023-06-28 15:06:09 +02:00
Kerollmops	29b40295b8	Ignore unknown facet search query parameters	2023-06-28 15:06:09 +02:00
Kerollmops	26f0fa678d	Change the error message when a facet is not searchable	2023-06-28 15:06:09 +02:00
Kerollmops	60ddd53439	Return one of the original facet values when doing a facet search	2023-06-28 15:06:09 +02:00
Kerollmops	2bcd8d2983	Make sure the facet queries are normalized	2023-06-28 15:06:09 +02:00
Kerollmops	09079a4e88	Remove useless InvalidSearchFacet error	2023-06-28 15:06:09 +02:00
Kerollmops	904f6574bf	Make rustfmt happy	2023-06-28 15:06:08 +02:00
Kerollmops	6fb8af423c	Rename the hits and query output into facetHits and facetQuery respectively	2023-06-28 15:06:08 +02:00
Kerollmops	cb0bb399fa	Fix the error code returned when the facetName field is missing	2023-06-28 15:06:08 +02:00
Kerollmops	41760a9306	Introduce a new invalid_facet_search_facet_name error code	2023-06-28 15:06:07 +02:00
Kerollmops	e9a3029c30	Use the right field id to write the string facet values FST	2023-06-28 15:01:51 +02:00
Kerollmops	ed0ff47551	Return an empty list of results if attribute is set as filterable	2023-06-28 15:01:51 +02:00
Clément Renault	e1b8fb48ee	Use the minWordSizeForTypos index settings	2023-06-28 15:01:51 +02:00
Clément Renault	87e22e436a	Fix compilation issues	2023-06-28 15:01:51 +02:00
Clément Renault	0252cfe8b6	Simplify the placeholder search of the facet-search route	2023-06-28 15:01:50 +02:00
Clément Renault	f35ad96afa	Use the disableOnAttributes parameter on the facet-search route	2023-06-28 15:01:50 +02:00
Clément Renault	2ceb781c73	Use the disableOnWords parameter on the facet-search route	2023-06-28 15:01:50 +02:00
Clément Renault	7bd67543dd	Support the typoTolerant.enabled parameter	2023-06-28 15:01:50 +02:00
Clément Renault	8e86eb91bb	Log an error when a facet value is missing from the database	2023-06-28 15:01:50 +02:00
Clément Renault	55c17aa38b	Rename the SearchForFacetValues struct	2023-06-28 15:01:50 +02:00
Clément Renault	aadbe88048	Return an internal error when a field id is missing	2023-06-28 15:01:50 +02:00
Clément Renault	f36de2115f	Make clippy happy	2023-06-28 15:01:50 +02:00
Clément Renault	702041b7e1	Improve the returned errors from the facet-search route	2023-06-28 15:01:48 +02:00
Clément Renault	a05074e675	Fix the max number of facets to be returned to 100	2023-06-28 14:58:42 +02:00
Clément Renault	93f30e65a9	Return the correct response JSON object from the facet-search route	2023-06-28 14:58:42 +02:00
Clément Renault	893592c5e9	Send analytics about the facet-search route	2023-06-28 14:58:42 +02:00
Clément Renault	e81809aae7	Make the search for facet work	2023-06-28 14:58:41 +02:00
Kerollmops	ce7e7f12c8	Introduce the facet search route	2023-06-28 14:58:41 +02:00
Kerollmops	addb21f110	Restrict the number of facet search results to 1000	2023-06-28 14:58:41 +02:00
Kerollmops	c34de05106	Introduce the SearchForFacetValue struct	2023-06-28 14:58:41 +02:00
Clément Renault	15a4c05379	Store the facet string values in multiple FSTs	2023-06-28 14:58:41 +02:00
meili-bors[bot]	9deeec88e0	Merge #3861
3861: Add "meilisearch" prefix to last metrics that were missing it r=Kerollmops a=dureuill

# Pull Request

## Related issue

Related to #3790

## What does this PR do?

- change implementation to follow the spec on metrics name
- regenerate grafana dashboard from the code

## PR checklist

Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [ ] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!

Co-authored-by: Louis Dureuil <louis@meilisearch.com>	2023-06-28 09:28:31 +00:00
Louis Dureuil	167ac55a2d	Update dashboard generated from grafana	2023-06-28 11:22:16 +02:00
Louis Dureuil	ea68ccd034	prefix http_* metrics by meilisearch	2023-06-28 11:21:50 +02:00
meili-bors[bot]	d4f10800f2	Merge #3834
3834: Define searchable fields at runtime r=Kerollmops a=ManyTheFish

## Summary

This feature lets the end-user search in one or more attributes using the `attributesToSearchOn` search parameter:

```json
{
  "q": "Captain Marvel",
  "attributesToSearchOn": ["title"]
}
```

This feature acts like a filter. It forces Meilisearch to return only the documents that contain the requested words in the attributes-to-search-on.

Note the behavior with the `last` matching strategy. Meilisearch then only ensures that the first word is in the attributes-to-search-on. However, the retrieved documents are still ordered according to the words contained in the attributes-to-search-on.

## Trying the prototype

A dedicated docker image has been released for this feature:

#### last prototype version:

```bash
docker pull getmeili/meilisearch:prototype-define-searchable-fields-at-search-time-1
```

#### other prototype versions:

```bash
docker pull getmeili/meilisearch:prototype-define-searchable-fields-at-search-time-0
```

## Technical Detail

The attributes-to-search-on list is given to the search context. The search context then queries the `fid_word_docids` database, restricted to the allowed field ids, instead of the global `word_docids` database. The same goes for the prefix databases.
The database cache is updated with the merged values. This means the union of the field-id-database values is only computed when the requested key is missing from the cache.

### Relevancy limits

Almost all ranking rules behave as expected when ordering the documents. Only `proximity` can misorder documents. This happens when all the searched words are in the restricted attribute, but a document that should be ranked lower has a better proximity in an ignored attribute.
The failing test below shows it:

```rust
#[actix_rt::test]
async fn proximity_ranking_rule_order() {
    let server = Server::new().await;
    let index = index_with_documents(
        &server,
        &json!([
        {
            "title": "Captain super mega cool. A Marvel story",
            // Perfect distance between words in an ignored attribute
            "desc": "Captain Marvel",
            "id": "1",
        },
        {
            "title": "Captain America from Marvel",
            "desc": "a Shazam ersatz",
            "id": "2",
        }]),
    )
    .await;

    // Document 2 should appear before document 1.
    index
        .search(json!({"q": "Captain Marvel", "attributesToSearchOn": ["title"], "attributesToRetrieve": ["id"]}), |response, code| {
            assert_eq!(code, 200, "{}", response);
            assert_eq!(
                response["hits"],
                json!([
                    {"id": "2"},
                    {"id": "1"},
                ])
            );
        })
        .await;
}
```

Fixing this would require creating `fid_word_pair_proximity_docids` and `fid_word_prefix_pair_proximity_docids` databases. These could multiply the keys of `word_pair_proximity_docids` and `word_prefix_pair_proximity_docids` by the number of attributes in the searchable_attributes list. If we decide this test should be fixed, I suggest doing it in another PR.

## Related

Fixes #3772

Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>	2023-06-28 08:19:23 +00:00
meili-bors[bot]	dc293911ad	Merge #3745
3745: tests: add unit test for `PayloadTooLarge` error r=curquiza a=cymruu

# Pull Request

Add a unit test for the `Payload`, which verifies that a request with a payload that is too large is rejected with the appropriate message. This was requested in this PR: https://github.com/meilisearch/meilisearch/pull/3739

## Related issue

https://github.com/meilisearch/meilisearch/pull/3739

## What does this PR do?

- Adds requested test

## PR checklist

Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [ ] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!

Co-authored-by: Filip Bachul <filipbachul@gmail.com>	2023-06-27 14:58:23 +00:00
meili-bors[bot]	9d68e6969e	Merge #3859
3859: Merge all analytics events pertaining to updating the experimental features r=Kerollmops a=dureuill

Follow-up to #3850

Co-authored-by: Louis Dureuil <louis@meilisearch.com>	2023-06-27 13:26:01 +00:00
Louis Dureuil	b4b686d253	Merge all analytics events pertaining to updating the experimental features	2023-06-27 15:16:23 +02:00
meili-bors[bot]	98ec476198	Merge #3855
3855: Change and add links to the Cloud r=Kerollmops a=dureuill

- add cloud link in banner
- add utm to existing links following https://github.com/meilisearch/integration-guides/issues/277#issuecomment-1592054536

Co-authored-by: Louis Dureuil <louis@meilisearch.com>	2023-06-27 12:29:36 +00:00
Louis Dureuil	c47b8a8bfe	Fix typo Co-authored-by: Guillaume Mourier <guillaume@meilisearch.com>	2023-06-27 14:27:54 +02:00
Louis Dureuil	054f81a021	Make message consistent with the one in integration repos	2023-06-27 14:20:55 +02:00
meili-bors[bot]	d8ea688481	Merge #3825
3825: Accept semantic vectors and allow users to query nearest neighbors r=Kerollmops a=Kerollmops

This Pull Request brings a new feature to the current API. The engine accepts a new `_vectors` field akin to the `_geo` one. This vector is stored in Meilisearch and can be retrieved via search. This work is the first step toward hybrid search, bringing the best of both worlds: keyword and semantic search ❤️‍🔥

## ToDo

- [x] Make it possible to get the `limit` nearest neighbors from a user-generated vector by using the `vector` field of the search route.
- [x] Delete the documents and vectors from the HNSW-related data structures.
  - [x] Do it the slow and ugly way (we need to be able to iterate over all the values).
  - [ ] Do it the efficient way (wait for a new method or implement it myself).
- [ ] ~~Move from the `hnsw` crate to the hgg one~~ The hgg crate is too slow. Meilisearch takes approximately 88s to answer a query. This is related to the time it takes to deserialize the `Hgg` data structure or to search in it; I didn't take the time to measure precisely. We moved back to the hnsw crate, which takes approximately 40ms to answer.
  - [ ] ~~Wait for a fix for https://github.com/rust-cv/hgg/issues/4.~~
- [x] Fix the current dot product function.
- [x] Fill in the other `SearchResult` fields.
- [x] Remove the `hnsw` dependency of the meilisearch crate.
- [x] Fix the pages by taking the offset into account.
- [x] Release a first prototype https://github.com/meilisearch/product/discussions/621#discussioncomment-6183647
- [x] Make the pagination and filtering faster and more correct.
- [x] Return the original vector in the output search results (like `query`).
- [x] Return a `_semanticSimilarity` field in the documents (it's a dot product)
  - [x] Return this score even if the `_vectors` field is not displayed
  - [x] Rename the field to `_semanticScore`.
- [ ] Return the `_geoDistance` value even if the `_geo` field is not displayed
- [x] Store the HNSW on possibly multiple LMDB values.
  - [ ] Measure it and make it faster if needed
  - [ ] Export the `ReadableSlices` type into a small external crate
- [x] Accept a `_vectors` field instead of the `_vector` one.
- [x] Normalize all vectors.
- [ ] Remove the `_vectors` field from the default searchable attributes (as we do with `_geo`?).
- [ ] Correctly compute the candidates by remembering the documents having a valid `_vectors` field.
- [ ] Return the right errors:
  - [ ] Return an error when the query vector is not the same length as the vectors in the HNSW.
    - [ ] We must return the user document id that triggered the vector dimension issue.
  - [x] If an indexation error occurs.
  - [ ] Fix the error codes when using the search route.
- [ ] ~~Introduce some settings:~~ We currently ensure that the vector length is consistent over the whole set of documents, and we return an error when a vector's dimension doesn't match the current number of dimensions.
  - [ ] The length of the vector the user will provide.
  - [ ] The distance function (we only support dot as of now).
- [ ] Introduce other distance functions
  - [ ] Euclidean
  - [ ] Dot Product
  - [ ] Cosine
  - [ ] Make them SIMD optimized
  - [ ] Give credit to qdrant
- [ ] Add tests.
- [ ] Write a mini spec.
- [ ] Release it in v1.3 as an experimental feature.

Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>	2023-06-27 11:17:07 +00:00
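
Merge #3699 and its commits describe the facet-search flow:
- search the values of one facet, optionally by a `facetQuery` prefix;
- normalize facet queries;
- return one of the original facet values;
- cap the results at 100.

The Rust sketch below illustrates that flow under stated assumptions. It is not Meilisearch's implementation:
- a `BTreeMap` range scan stands in for the facet-string FSTs;
- lowercasing stands in for the real normalizer;
- the `q`/`filter` candidate restriction is not modeled;
- `FacetIndex`, `FacetHit`, and `MAX_FACET_HITS` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Hypothetical cap on returned facet hits (the commit log mentions a limit of 100).
pub const MAX_FACET_HITS: usize = 100;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FacetHit {
    /// One of the original (non-normalized) spellings of the facet value.
    pub value: String,
    /// How many times this normalized value was inserted (one per document here).
    pub count: u64,
}

/// Hypothetical normalization: lowercasing only; a real engine normalizes more.
fn normalize(s: &str) -> String {
    s.to_lowercase()
}

/// Hypothetical facet index keyed by normalized value, kept sorted so that
/// a prefix search is a contiguous range scan (standing in for an FST).
#[derive(Default)]
pub struct FacetIndex {
    values: BTreeMap<String, (String, u64)>,
}

impl FacetIndex {
    /// Records one occurrence of a facet value; the first original spelling is kept.
    pub fn insert(&mut self, original: &str) {
        let entry = self
            .values
            .entry(normalize(original))
            .or_insert_with(|| (original.to_string(), 0));
        entry.1 += 1;
    }

    /// Returns up to `MAX_FACET_HITS` values whose normalized form starts with
    /// the normalized `facet_query`; `None` matches every value.
    pub fn search(&self, facet_query: Option<&str>) -> Vec<FacetHit> {
        let prefix = facet_query.map(normalize).unwrap_or_default();
        self.values
            .range(prefix.clone()..)
            .take_while(|(key, _)| key.starts_with(&prefix))
            .take(MAX_FACET_HITS)
            .map(|(_, (value, count))| FacetHit { value: value.clone(), count: *count })
            .collect()
    }
}
```

Inserting "Action" and "action" yields one hit whose `value` is "Action" and whose `count` is 2. This mirrors the idea of matching on normalized values while returning an original spelling.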

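The "Technical Detail" of Merge #3834 has three steps:
- look words up in `fid_word_docids` for the allowed field ids only;
- union the per-field results;
- cache that union so it is only computed on a miss.

The Rust sketch below illustrates those steps. It is a simplified illustration, not milli's code:
- `SearchContext`, its `word_docids` method, and the in-memory maps are hypothetical;
- a `BTreeSet<u32>` stands in for the compressed document-id sets a real engine would store.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical document-id set (stand-in for a compressed bitmap).
pub type DocIds = BTreeSet<u32>;
pub type FieldId = u16;

/// Hypothetical, in-memory search context.
pub struct SearchContext {
    /// Stand-in for the `fid_word_docids` database: (field id, word) -> documents.
    fid_word_docids: HashMap<(FieldId, String), DocIds>,
    /// Field ids allowed by `attributesToSearchOn`.
    allowed_fids: Vec<FieldId>,
    /// Cache of per-word unions over the allowed field ids.
    word_docids_cache: HashMap<String, DocIds>,
}

impl SearchContext {
    pub fn new(fid_word_docids: HashMap<(FieldId, String), DocIds>, allowed_fids: Vec<FieldId>) -> Self {
        Self { fid_word_docids, allowed_fids, word_docids_cache: HashMap::new() }
    }

    /// Returns the documents containing `word` in any allowed field.
    /// The union over field ids is computed only when the word is not cached yet.
    pub fn word_docids(&mut self, word: &str) -> &DocIds {
        if !self.word_docids_cache.contains_key(word) {
            let mut merged = DocIds::new();
            for &fid in &self.allowed_fids {
                if let Some(docids) = self.fid_word_docids.get(&(fid, word.to_string())) {
                    merged.extend(docids.iter().copied());
                }
            }
            self.word_docids_cache.insert(word.to_string(), merged);
        }
        &self.word_docids_cache[word]
    }
}
```

With `allowed_fids` set to only the `title` field id, a word that appears only in `desc` returns no documents. This matches the "acts like a filter" behavior described in the PR.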

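Merge #3825 mentions several vector behaviors:
- normalizing all vectors;
- scoring with a dot product;
- returning an error when a vector's dimension does not match the current number of dimensions;
- returning the `limit` nearest neighbors.

The Rust sketch below combines those steps under explicit assumptions:
- it scans every stored vector by brute force instead of querying an HNSW graph;
- `VectorStore`, `VectorError`, `add`, and `nearest` are hypothetical names, not Meilisearch's API.

```rust
/// Hypothetical errors for the sketch.
#[derive(Debug, PartialEq)]
pub enum VectorError {
    /// The vector's length differs from the dimension already in use.
    DimensionMismatch { expected: usize, found: usize },
    /// A zero vector cannot be normalized.
    ZeroVector,
}

/// Scales `v` to unit length.
fn normalize(v: &[f32]) -> Result<Vec<f32>, VectorError> {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm == 0.0 {
        return Err(VectorError::ZeroVector);
    }
    Ok(v.iter().map(|x| x / norm).collect())
}

/// Plain dot product of two equal-length slices.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Hypothetical brute-force vector store (no HNSW).
#[derive(Default)]
pub struct VectorStore {
    /// Dimension fixed by the first accepted vector.
    dimensions: Option<usize>,
    /// (document id, normalized vector) pairs.
    vectors: Vec<(u32, Vec<f32>)>,
}

impl VectorStore {
    fn check_dimensions(&self, found: usize) -> Result<(), VectorError> {
        match self.dimensions {
            Some(expected) if expected != found => Err(VectorError::DimensionMismatch { expected, found }),
            _ => Ok(()),
        }
    }

    /// Validates, normalizes, and stores a document vector.
    pub fn add(&mut self, doc_id: u32, vector: &[f32]) -> Result<(), VectorError> {
        self.check_dimensions(vector.len())?;
        let normalized = normalize(vector)?;
        self.dimensions = Some(vector.len());
        self.vectors.push((doc_id, normalized));
        Ok(())
    }

    /// Returns up to `limit` (document id, dot-product score) pairs, best first.
    pub fn nearest(&self, query: &[f32], limit: usize) -> Result<Vec<(u32, f32)>, VectorError> {
        self.check_dimensions(query.len())?;
        let query = normalize(query)?;
        let mut scored: Vec<(u32, f32)> = self
            .vectors
            .iter()
            .map(|(id, v)| (*id, dot(&query, v)))
            .collect();
        scored.sort_by(|a, b| b.1.total_cmp(&a.1));
        scored.truncate(limit);
        Ok(scored)
    }
}
```

Both the stored vectors and the query are unit length, so the dot product equals cosine similarity. This is one reason normalizing up front is convenient when dot is the only supported distance.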
8222 Commits