3952: Use the new safe `read-txn-no-tls` heed feature r=ManyTheFish a=Kerollmops
[We recently found out](https://github.com/meilisearch/heed/issues/191#issuecomment-1650280513) that the `sync-read-txn` heed feature was invalid and must be removed from this crate. We were declaring it in milli/meilisearch but, fortunately, not sharing the `RoTxn`s across threads 😮💨
[I recently introduced the `read-txn-no-tls` heed feature](https://github.com/meilisearch/heed/pull/194), which implements `RoTxn: Send` and allows multiple read transactions on a single thread (which we use).
This PR removes the `sync-read-txn` heed feature from the _Cargo.toml_ file. I will fix this in heed v0.20.0 and will file a RustSec advisory in the meantime.
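For context, here is a minimal sketch (not Meilisearch code) of what the `read-txn-no-tls` feature permits, assuming a heed version exposing that feature and the `tempfile` crate for a throwaway environment directory:
```rust
use heed::EnvOpenOptions;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Hypothetical setup: a temporary LMDB environment.
    let dir = tempfile::tempdir()?;
    let env = EnvOpenOptions::new().open(dir.path())?;

    // Multiple read transactions open at the same time on a single thread,
    // which is the pattern Meilisearch relies on.
    let rtxn_a = env.read_txn()?;
    let rtxn_b = env.read_txn()?;

    // With `read-txn-no-tls`, `RoTxn` is `Send`: it can be *moved* to another
    // thread. It is still not `Sync`, so it cannot be shared by reference.
    std::thread::scope(|s| {
        s.spawn(move || drop(rtxn_a));
    });

    drop(rtxn_b);
    Ok(())
}
```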
Co-authored-by: Clément Renault <clement@meilisearch.com>
3942: Normalize for the search the facets values r=ManyTheFish a=Kerollmops
This PR improves and fixes the search for facet values feature. Searching for _bre_ wasn't returning facet values like _brévent_ or _brô_.
The issue was related to the fact that facets are normalized but not in the same way as the `searchableAttributes` are. We decided to normalize them further and add another intermediate database where the key is the normalized facet value, and the value is a set of the non-normalized facets. We then use these non-normalized ones to get the correct counts by fetching the associated databases.
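To make the lookup concrete, here is an illustrative sketch modeled with plain in-memory maps rather than milli's LMDB databases (all names are hypothetical): the key is the normalized facet value, the value is a set of non-normalized facets, and the counts are fetched from the non-normalized values.
```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical stand-in for the real normalization (milli normalizes further,
/// e.g. removing diacritics so that "brévent" can match "bre").
fn normalize(s: &str) -> String {
    s.to_lowercase()
}

/// Sketch of the search-for-facet-values lookup described above.
fn search_facet_values(
    // normalized facet value -> set of original (non-normalized) facet values
    normalized_to_originals: &HashMap<String, BTreeSet<String>>,
    // original facet value -> document count
    original_value_counts: &HashMap<String, u64>,
    query: &str,
) -> Vec<(String, u64)> {
    let normalized_query = normalize(query);
    normalized_to_originals
        .iter()
        // Prefix match on the normalized side, e.g. "bre" matches "brevent".
        .filter(|(normalized, _)| normalized.starts_with(normalized_query.as_str()))
        .flat_map(|(_, originals)| originals.iter())
        // The counts come from the non-normalized values, as in the PR.
        .map(|original| {
            let count = original_value_counts.get(original).copied().unwrap_or(0);
            (original.clone(), count)
        })
        .collect()
}
```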
### What's missing in this PR?
- [x] Apply the change to the whole set of `SearchForFacetValue::execute` conditions.
- [x] Factorize the code that does an intermediate normalized value fetch in a function.
- [x] Add or modify the search for facet value test.
Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
3921: Deactivate camel case segmentation r=dureuill a=ManyTheFish
# Pull Request
This PR deactivates camel case segmentation to restore the ability to accept typos on camel-cased words.
## Related issue
Fixes #3869
Fixes #3818
## What does this PR do?
- deactivates camelcase segmentation
related to #3919
Co-authored-by: ManyTheFish <many@meilisearch.com>
3866: Update charabia v0.8.0 r=dureuill a=ManyTheFish
# Pull Request
Update Charabia:
- enhance Japanese segmentation
- enhance Latin Tokenization
- words containing `_` are now properly segmented into several words
- brackets `{([])}` are no longer considered context separators, so words separated by brackets are now considered near each other for the proximity ranking rule
- fixes #3815
- fixes #3778
- fixes [product#151](https://github.com/meilisearch/product/discussions/151)
> Important note: float numbers are now segmented around the `.`, so `3.22` is segmented as [`3`, `.`, `22`], but the dot isn't considered a hard separator, which means that searching for `3.22` still finds documents containing `3.22`
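As a rough illustration, the sketch below uses Charabia's `Tokenize` trait to inspect how `3.22` is segmented; it assumes separator tokens are yielded by the token iterator and only checks the produced lemmas, so treat it as a hedged example rather than an exact test from the PR.
```rust
use charabia::Tokenize;

fn main() {
    // Collect the lemma of every produced token, separators included.
    let lemmas: Vec<String> = "3.22"
        .tokenize()
        .map(|token| token.lemma().to_string())
        .collect();

    // Per the note above: segmented around the dot, but the dot is a soft
    // separator, so searching `3.22` still matches documents containing `3.22`.
    assert_eq!(lemmas, vec!["3", ".", "22"]);
}
```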
Co-authored-by: ManyTheFish <many@meilisearch.com>
3834: Define searchable fields at runtime r=Kerollmops a=ManyTheFish
## Summary
This feature allows the end-user to search in one or multiple attributes using the search parameter `attributesToSearchOn`:
```json
{
  "q": "Captain Marvel",
  "attributesToSearchOn": ["title"]
}
```
This feature acts like a filter, forcing Meilisearch to only return documents containing the requested words in the attributes-to-search-on. Note that, with the matching strategy `last`, Meilisearch only ensures that the first word is in the attributes-to-search-on; however, the retrieved documents are still ordered taking into account the words contained in the attributes-to-search-on.
## Trying the prototype
A dedicated docker image has been released for this feature:
#### Last prototype version:
```bash
docker pull getmeili/meilisearch:prototype-define-searchable-fields-at-search-time-1
```
#### Other prototype versions:
```bash
docker pull getmeili/meilisearch:prototype-define-searchable-fields-at-search-time-0
```
## Technical Detail
The attributes-to-search-on list is given to the search context; the search context then uses the `fid_word_docids` database, restricted to the allowed field ids, instead of the global `word_docids` database. The same applies to the prefix databases.
The database cache is updated with the merged values, meaning that the union of the field-id-database values is only made if the requested key is missing from the cache.
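Below is a simplified model of that lookup, illustrative only and not milli's actual code (the `roaring` crate is assumed for the docid sets, and all names are stand-ins):
```rust
use std::collections::HashMap;

use roaring::RoaringBitmap;

type FieldId = u16;

/// Toy model of the search context's word -> docids lookup described above.
struct SearchContextModel {
    word_docids: HashMap<String, RoaringBitmap>,
    fid_word_docids: HashMap<(FieldId, String), RoaringBitmap>,
    cache: HashMap<String, RoaringBitmap>,
}

impl SearchContextModel {
    fn word_docids(
        &mut self,
        word: &str,
        attributes_to_search_on: Option<&[FieldId]>,
    ) -> RoaringBitmap {
        // The cache stores the already merged values: the union is only
        // computed when the requested key is missing.
        if let Some(docids) = self.cache.get(word) {
            return docids.clone();
        }

        let docids = match attributes_to_search_on {
            // Restricted search: union of the allowed per-field databases.
            Some(field_ids) => field_ids
                .iter()
                .filter_map(|fid| self.fid_word_docids.get(&(*fid, word.to_string())))
                .fold(RoaringBitmap::new(), |acc, ids| acc | ids),
            // Unrestricted search: the global `word_docids` database.
            None => self.word_docids.get(word).cloned().unwrap_or_default(),
        };

        self.cache.insert(word.to_string(), docids.clone());
        docids
    }
}
```
Caching the merged bitmaps means the union over the allowed field ids is paid once per key rather than on every lookup, mirroring the behavior described above.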
### Relevancy limits
Almost all ranking rules behave as expected when ordering the documents.
Only `proximity` could mis-order documents if all the searched words are in the restricted attribute but a better proximity is found in an ignored attribute of a document that should be ranked lower. Below is a failing test showing this:
```rust
#[actix_rt::test]
async fn proximity_ranking_rule_order() {
    let server = Server::new().await;
    let index = index_with_documents(
        &server,
        &json!([
            {
                "title": "Captain super mega cool. A Marvel story",
                // Perfect distance between words in an ignored attribute
                "desc": "Captain Marvel",
                "id": "1",
            },
            {
                "title": "Captain America from Marvel",
                "desc": "a Shazam ersatz",
                "id": "2",
            }]),
    )
    .await;

    // Document 2 should appear before document 1.
    index
        .search(json!({"q": "Captain Marvel", "attributesToSearchOn": ["title"], "attributesToRetrieve": ["id"]}), |response, code| {
            assert_eq!(code, 200, "{}", response);
            assert_eq!(
                response["hits"],
                json!([
                    {"id": "2"},
                    {"id": "1"},
                ])
            );
        })
        .await;
}
```
Fixing this would force us to create `fid_word_pair_proximity_docids` and `fid_word_prefix_pair_proximity_docids` databases, which may multiply the number of keys of `word_pair_proximity_docids` and `word_prefix_pair_proximity_docids` by the number of attributes in the searchable_attributes list. If we think this test should be fixed, I suggest doing it in another PR.
## Related
Fixes #3772
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>