meilisearch

mirror of https://github.com/meilisearch/meilisearch.git synced 2024-11-30 17:14:59 +08:00

Author	SHA1	Message	Date
ManyTheFish	bddc168d83	List TODOs	2023-12-06 14:59:23 +01:00
ManyTheFish	3b3fa38f27	Put the restrict list in a sub-struct	2023-11-28 18:37:57 +01:00
ManyTheFish	d6c2ee15a9	Filter on attributes before computing the docids when attribute restriction is on	2023-11-28 14:55:29 +01:00
Clément Renault	d32eb11329	Move to the v0.20.0-alpha.9 of heed	2023-11-27 11:52:22 +01:00
Clément Renault	58dac8af42	Remove the panics and unwraps	2023-11-23 15:00:48 +01:00
Clément Renault	0dbf1a16ff	Make clippy happy	2023-11-23 14:11:38 +01:00
Clément Renault	0d4482625a	Make the changes to use heed v0.20-alpha.6	2023-11-23 11:43:58 +01:00
Clément Renault	7cb7e37ba8	Merge branch 'main' into tmp-release-v1.5.0	2023-11-21 16:30:46 +01:00
ManyTheFish	1f36410541	Update tests	2023-11-13 13:36:39 +01:00
Louis Dureuil	8c649d8061	Throw error when the vector search is sent with the wrong size	2023-11-13 09:57:42 +01:00
ManyTheFish	688266c83e	Remove word pair proximity prefix cache and compute it at search time	2023-11-08 14:16:01 +01:00
Louis Dureuil	1bccf2079e	Correctly mark non-tests as non-tests	2023-11-06 11:03:56 +01:00
ManyTheFish	94206b0055	Update tests	2023-10-31 13:48:47 +01:00
Louis Dureuil	113527f466	Remove soft-deleted related methods from Index	2023-10-30 11:41:22 +01:00
ManyTheFish	1c5705c164	clean PR warnings	2023-10-30 11:22:05 +01:00
ManyTheFish	df9e5c8651	Generalize usage of CboRoaringBitmap codec to ease the use	2023-10-30 11:15:02 +01:00
ManyTheFish	17b647dfe5	Wip	2023-10-30 11:13:08 +01:00
Tamo	e7244aa485	fix warnings	2023-10-30 11:00:46 +01:00
Louis Dureuil	2bae9550c8	Add explanatory comment	2023-10-23 12:06:28 +02:00
Vivek Kumar	5fe7c4545a	compute all candidates correctly when skipping	2023-10-23 12:02:45 +02:00
meili-bors[bot]	5e0485d8dd	Merge #4131	2023-10-18 14:56:08 +00:00

4131: Reduce proximity range from 7 to 3 r=Kerollmops a=ManyTheFish

## Summary

This PR aims to reduce the impact of the proximity databases on the indexing time and on the database size by reducing the maximum distance between two words to be indexed in the proximity database.

## Stats

### Impact on database size and indexing time

![Impact on datasets](https://github.com/meilisearch/meilisearch/assets/6482087/28ed3d96-bdde-41c1-bdac-e90c1b1dbb23)

### Impact on search relevancy

<details>

| dataset_name | host_name | Relevancy rate (Precision) | completion_rate 25.00% | completion_rate 50.00% | completion_rate 75.00% | completion_rate 100.00% |
|--------------|------------------|------------------------------------|-----------------|-----------------|-----------------|-----------------|
| FBIS | 1_4_0 | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FBIS | 1_4_0 | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FBIS | 1_4_0 | percentile-50 | 0.00% | 0.00% | 5.00% | 5.56% |
| FBIS | 1_4_0 | percentile-75 | 0.00% | 12.50% | 35.00% | 45.00% |
| FBIS | 1_4_0 | percentile-90 | 20.00% | 40.00% | | 100.00% |
| FBIS | 1_4_0 | average | 5.78% | 11.16% | 21.90% | 26.29% |
| FBIS | reduce_proximity | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FBIS | reduce_proximity | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FBIS | reduce_proximity | percentile-50 | 0.00% | 0.00% | 5.00% | 5.56% |
| FBIS | reduce_proximity | percentile-75 | 0.00% | 15.00% | 35.00% | 40.00% |
| FBIS | reduce_proximity | percentile-90 | 20.00% | 40.00% | 85.00% | 100.00% |
| FBIS | reduce_proximity | average | 5.55% | 11.34% | 21.75% | 26.14% |
| FR94 | 1_4_0 | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | 1_4_0 | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | 1_4_0 | percentile-50 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | 1_4_0 | percentile-75 | 0.00% | 5.00% | 15.00% | 42.11% |
| FR94 | 1_4_0 | percentile-90 | 15.00% | 54.55% | 100.00% | 100.00% |
| FR94 | 1_4_0 | average | 5.95% | 12.07% | 18.70% | 25.57% |
| FR94 | reduce_proximity | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | reduce_proximity | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | reduce_proximity | percentile-50 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | reduce_proximity | percentile-75 | 0.00% | 5.00% | 15.00% | 42.11% |
| FR94 | reduce_proximity | percentile-90 | 15.00% | 54.55% | 100.00% | 100.00% |
| FR94 | reduce_proximity | average | 5.79% | 12.00% | 18.70% | 25.53% |
| FT | 1_4_0 | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FT | 1_4_0 | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FT | 1_4_0 | percentile-50 | 0.00% | 0.00% | 5.00% | 10.00% |
| FT | 1_4_0 | percentile-75 | 0.00% | 15.00% | 30.00% | 40.00% |
| FT | 1_4_0 | percentile-90 | 20.00% | 50.00% | 65.00% | 100.00% |
| FT | 1_4_0 | average | 5.08% | 12.58% | 20.00% | 25.49% |
| FT | reduce_proximity | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FT | reduce_proximity | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FT | reduce_proximity | percentile-50 | 0.00% | 0.00% | 5.00% | 10.00% |
| FT | reduce_proximity | percentile-75 | 0.00% | 15.00% | 30.00% | 40.00% |
| FT | reduce_proximity | percentile-90 | 10.00% | 45.00% | 60.00% | 100.00% |
| FT | reduce_proximity | average | 5.01% | 12.64% | 20.10% | 25.53% |
| LAT | 1_4_0 | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| LAT | 1_4_0 | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| LAT | 1_4_0 | percentile-50 | 0.00% | 0.00% | 5.00% | 5.00% |
| LAT | 1_4_0 | percentile-75 | 5.00% | 15.00% | 30.00% | 30.00% |
| LAT | 1_4_0 | percentile-90 | 15.00% | 45.00% | 60.00% | 80.00% |
| LAT | 1_4_0 | average | 4.80% | 11.80% | 17.88% | 21.62% |
| LAT | reduce_proximity | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| LAT | reduce_proximity | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| LAT | reduce_proximity | percentile-50 | 0.00% | 0.00% | 5.00% | 5.00% |
| LAT | reduce_proximity | percentile-75 | 0.00% | 11.11% | 25.00% | 35.00% |
| LAT | reduce_proximity | percentile-90 | 15.00% | 45.00% | 55.00% | 80.00% |
| LAT | reduce_proximity | average | 4.43% | 11.23% | 17.32% | 21.45% |

</details>

### Impact on Search time

| dataset_name | host_name | 25.00% | 50.00% | 75.00% | 100.00% | Average |
|--------------|------------------|------------:|------------:|------------:|------------:|-------------|
| FBIS | 1_4_0 | 3.45 | 7.446666667 | 9.773489933 | 9.620300752 | 7.572614338 |
| FBIS | reduce_proximity | 2.983333333 | 5.316666667 | 6.911073826 | 7.637218045 | 5.712072968 |
| FR94 | 1_4_0 | 2.236666667 | 4.45 | 5.523489933 | 4.560150376 | 4.192576744 |
| FR94 | reduce_proximity | 2.09 | 3.991666667 | 4.981543624 | 4.266917293 | 3.832531896 |
| FT | 1_4_0 | 5.956666667 | 9.656666667 | 13.86912752 | 10.83270677 | 10.0787919 |
| FT | reduce_proximity | 4.51 | 5.981666667 | 7.701342282 | 6.766917293 | 6.23998156 |
| LAT | 1_4_0 | 5.856666667 | 9.233333333 | 12.98322148 | 10.78759398 | 9.715203865 |
| LAT | reduce_proximity | 6.91 | 6.706666667 | 8.463087248 | 8.265037594 | 7.586197877 |

## Technical approach

- Ensure the MAX_DISTANCE constant is used everywhere needed
- Reduce the MAX_DISTANCE from 8 to 4

## Related

TBD

Co-authored-by: ManyTheFish <many@meilisearch.com>

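
The technical approach of #4131 can be illustrated with a minimal sketch. It assumes that `MAX_DISTANCE` is an exclusive upper bound, which would reconcile "range from 7 to 3" with "MAX_DISTANCE from 8 to 4". The function `word_pair_proximities` and its signature are hypothetical and are not taken from milli; only the constant name comes from the PR description.

```rust
use std::collections::BTreeSet;

/// Assumed exclusive upper bound on the indexed distance (named after the
/// constant mentioned in the PR; the value 4 follows the PR description).
const MAX_DISTANCE: u32 = 4;

/// Hypothetical helper: returns the (left word, right word, distance) triples
/// for every pair of words that sit strictly less than `MAX_DISTANCE`
/// positions apart in `words`.
fn word_pair_proximities(words: &[&str]) -> BTreeSet<(String, String, u32)> {
    let mut pairs = BTreeSet::new();
    for (i, left) in words.iter().enumerate() {
        for (j, right) in words.iter().enumerate().skip(i + 1) {
            let distance = (j - i) as u32;
            if distance >= MAX_DISTANCE {
                // Every later word is even farther away, so stop early.
                break;
            }
            pairs.insert((left.to_string(), right.to_string(), distance));
        }
    }
    pairs
}
```

With a lower bound, each word contributes fewer pairs, which is the plausible source of the smaller proximity database and shorter indexing time reported in the stats.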
ManyTheFish	27eec21415	Fix tests	2023-10-18 16:03:22 +02:00
Vivek Kumar	d4da06ff47	fix bug where distinct search with no ranking returns offset+limit hits	2023-10-11 19:02:16 +05:30
ManyTheFish	43989fe2e4	Reduce proximity range from 7 to 3	2023-10-03 12:16:48 +02:00
Vivek Kumar	abfa7ded25	use a new temp index in the test	2023-09-08 12:32:47 +05:30
Vivek Kumar	f2837aaec2	add another test case	2023-09-08 11:39:54 +05:30
Vivek Kumar	11df155598	fix highlighting bug when searching for a phrase with cropping	2023-09-08 11:39:52 +05:30
meili-bors[bot]	ccf3ba3f32	Merge #4019 4019: Bringing back changes from `v1.3.2` onto `main` r=irevoire a=Kerollmops Co-authored-by: Kerollmops <clement@meilisearch.com> Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com> Co-authored-by: irevoire <irevoire@users.noreply.github.com> Co-authored-by: Clément Renault <clement@meilisearch.com>	2023-08-28 12:14:11 +00:00
Clément Renault	8c0ebd1331	Update milli/src/search/new/bucket_sort.rs Co-authored-by: Louis Dureuil <louis@meilisearch.com>	2023-08-23 16:40:39 +02:00
Kerollmops	5130e06b41	Temporarily disable an assert in the ranking rules	2023-08-23 16:11:54 +02:00
meili-bors[bot]	914b125c5f	Merge #3945 3945: Do not leak field information on error r=Kerollmops a=vivek-26 # Pull Request ## Related issue Fixes #3865 ## What does this PR do? This PR ensures that `InvalidSortableAttribute`and `InvalidFacetSearchFacetName` errors do not leak field information i.e. fields which are not part of `displayedAttributes` in the settings are hidden from the error message. ## PR checklist Please check if your PR fulfills the following requirements: - [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)? - [x] Have you read the contributing guidelines? - [x] Have you made sure that the title is accurate and descriptive of the changes? Thank you so much for contributing to Meilisearch! Co-authored-by: Vivek Kumar <vivek.26@outlook.com>	2023-08-22 18:55:27 +00:00
ManyTheFish	4a21fecf67	Merge branch 'main' into settings-customizing-tokenization	2023-08-08 16:08:16 +02:00
Vivek Kumar	dd57873f8e	hide fields not in the displayedAttributes list from errors	2023-08-05 16:03:10 +05:30
ManyTheFish	b0c1a9504a	ensure the synonyms are updated when the tokenizer settings are changed	2023-07-26 09:33:42 +02:00
meili-bors[bot]	be72be7c0d	Merge #3942	2023-07-25 14:37:17 +00:00

3942: Normalize for the search the facets values r=ManyTheFish a=Kerollmops

This PR improves and fixes the search for facet values feature. Searching for _bre_ wasn't returning facet values like _brévent_ or _brô_.

The issue was related to the fact that facets are normalized but not in the same way as the `searchableAttributes` are. We decided to normalize them further and add another intermediate database where the key is the normalized facet value, and the value is a set of the non-normalized facets. We then use these non-normalized ones to get the correct counts by fetching the associated databases.

### What's missing in this PR?

- [x] Apply the change to the whole set of `SearchForFacetValue::execute` conditions.
- [x] Factorize the code that does an intermediate normalized value fetch in a function.
- [x] Add or modify the search for facet value test.

Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>

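
The intermediate database described in #3942 can be sketched as an in-memory map from a normalized facet value to the set of original values. Everything here is an illustrative assumption: `normalize` is a toy stand-in for the real normalizer, `build_normalized_index` and `search_facet_values` are hypothetical names, and the final step of fetching counts from the associated databases is left out.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Toy normalizer (assumption): lowercases and folds a few accented letters.
fn normalize(value: &str) -> String {
    value
        .chars()
        .flat_map(char::to_lowercase)
        .map(|c| match c {
            'é' | 'è' | 'ê' => 'e',
            'ô' | 'ö' => 'o',
            other => other,
        })
        .collect()
}

/// Builds the intermediate index: normalized value -> original facet values.
fn build_normalized_index(facets: &[&str]) -> BTreeMap<String, BTreeSet<String>> {
    let mut index: BTreeMap<String, BTreeSet<String>> = BTreeMap::new();
    for facet in facets {
        index.entry(normalize(facet)).or_default().insert(facet.to_string());
    }
    index
}

/// Returns the original facet values whose normalized form starts with the
/// normalized query, in the order of their normalized keys.
fn search_facet_values(index: &BTreeMap<String, BTreeSet<String>>, query: &str) -> Vec<String> {
    let prefix = normalize(query);
    index
        .range(prefix.clone()..)
        .take_while(|(key, _)| key.starts_with(&prefix))
        .flat_map(|(_, originals)| originals.iter().cloned())
        .collect()
}
```

Because the query and the keys go through the same normalization, searching for `bre` in this sketch reaches `Brévent` even though the stored facet value keeps its accent and capital letter.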
Kerollmops	29ab54b259	Replace the hnsw crate by the instant-distance one	2023-07-25 12:37:35 +02:00
ManyTheFish	9c485f8563	Make the search and the indexing work	2023-07-24 18:35:20 +02:00
Kerollmops	691a536893	Implement the facet search with the normalized index	2023-07-24 17:56:17 +02:00
Clément Renault	df528b41d8	Normalize for the search the facets values	2023-07-20 17:57:07 +02:00
Kerollmops	d383afc82b	Fix the geo sort when lat and lng are strings	2023-07-17 18:28:04 +02:00
Louis Dureuil	4310928803	Fixes #3912	2023-07-12 10:08:56 +02:00
Louis Dureuil	74315b4ea8	Fixes #3911	2023-07-12 10:08:29 +02:00
Louis Dureuil	55cd7738b9	Update snapshots	2023-07-04 16:31:01 +02:00
Louis Dureuil	48409c9183	Add missing exactness.matchingWords, exactness.maxMatchingWords	2023-07-04 16:31:01 +02:00
Louis Dureuil	324d448236	Format let-else ❤️ 🎉	2023-07-03 10:20:28 +02:00
meili-bors[bot]	661d1f90dc	Merge #3866	2023-06-29 15:24:36 +00:00

3866: Update charabia v0.8.0 r=dureuill a=ManyTheFish

# Pull Request

Update Charabia:

- enhance Japanese segmentation
- enhance Latin Tokenization
- words containing `_` are now properly segmented into several words
- brackets `{([])}` are no more considered as context separators so word separated by brackets are now considered near together for the proximity ranking rule
- fixes #3815
- fixes #3778
- fixes [product#151](https://github.com/meilisearch/product/discussions/151)

> Important note: now the float numbers are segmented around the `.` so `3.22` is segmented as [`3`, `.`, `22`] but the middle dot isn't considered as a hard separator, which means that if we search `3.22` we find documents containing `3.22`

Co-authored-by: ManyTheFish <many@meilisearch.com>

ManyTheFish	6ec7541026	Update insta snapshots	2023-06-29 17:18:39 +02:00
ManyTheFish	84845de9ef	Update Charabia	2023-06-29 15:56:32 +02:00
Clément Renault	7c157fc442	Document that the LevelEntry fields order is important	2023-06-29 14:33:32 +02:00
Clément Renault	0b97596c93	Replace unwraps with ?	2023-06-29 14:33:32 +02:00
Clément Renault	a0e0fce677	Simplify a Rust lifetime trick	2023-06-29 14:33:32 +02:00
Clément Renault	b951830461	Add more tests	2023-06-29 14:33:32 +02:00
Kerollmops	b132e859f7	Make clippy happy	2023-06-29 14:33:31 +02:00
Kerollmops	9917bf046a	Move the sortFacetValuesBy in the faceting settings	2023-06-29 14:33:31 +02:00
Kerollmops	d9fea0143f	Make Clippy happy	2023-06-29 14:33:31 +02:00
Kerollmops	a385642ec3	Replace the BTreeMap by an IndexMap to return values in order	2023-06-29 14:33:31 +02:00
Kerollmops	34b2e98fe9	Expose a sortFacetValuesBy parameter to the user	2023-06-29 14:33:00 +02:00
Kerollmops	80bbd4b6f3	Clean and make the facet order configurable internally	2023-06-29 14:31:17 +02:00
Kerollmops	f42bef2f66	Make the search to always return the facets ordered by count	2023-06-29 14:31:17 +02:00
Kerollmops	bd3c026406	First to-test version of the algorithm	2023-06-29 14:31:17 +02:00
Kerollmops	84f8938f33	Rename facet distribution to be explicit on the order to find them	2023-06-29 14:31:15 +02:00
Kerollmops	60ddd53439	Return one of the original facet values when doing a facet search	2023-06-28 15:06:09 +02:00
Kerollmops	2bcd8d2983	Make sure the facet queries are normalized	2023-06-28 15:06:09 +02:00
Kerollmops	41760a9306	Introduce a new invalid_facet_search_facet_name error code	2023-06-28 15:06:07 +02:00
Kerollmops	ed0ff47551	Return an empty list of results if attribute is set as filterable	2023-06-28 15:01:51 +02:00
Clément Renault	e1b8fb48ee	Use the minWordSizeForTypos index settings	2023-06-28 15:01:51 +02:00
Clément Renault	87e22e436a	Fix compilation issues	2023-06-28 15:01:51 +02:00
Clément Renault	0252cfe8b6	Simplify the placeholder search of the facet-search route	2023-06-28 15:01:50 +02:00
Clément Renault	f35ad96afa	Use the disableOnAttributes parameter on the facet-search route	2023-06-28 15:01:50 +02:00
Clément Renault	2ceb781c73	Use the disableOnWords parameter on the facet-search route	2023-06-28 15:01:50 +02:00
Clément Renault	7bd67543dd	Support the typoTolerant.enabled parameter	2023-06-28 15:01:50 +02:00
Clément Renault	8e86eb91bb	Log an error when a facet value is missing from the database	2023-06-28 15:01:50 +02:00
Clément Renault	55c17aa38b	Rename the SearchForFacetValues struct	2023-06-28 15:01:50 +02:00
Clément Renault	aadbe88048	Return an internal error when a field id is missing	2023-06-28 15:01:50 +02:00
Clément Renault	702041b7e1	Improve the returned errors from the facet-search route	2023-06-28 15:01:48 +02:00
Clément Renault	a05074e675	Fix the max number of facets to be returned to 100	2023-06-28 14:58:42 +02:00
Clément Renault	93f30e65a9	Return the correct response JSON object from the facet-search route	2023-06-28 14:58:42 +02:00
Clément Renault	e81809aae7	Make the search for facet work	2023-06-28 14:58:41 +02:00
Kerollmops	ce7e7f12c8	Introduce the facet search route	2023-06-28 14:58:41 +02:00
Kerollmops	addb21f110	Restrict the number of facet search results to 1000	2023-06-28 14:58:41 +02:00
Kerollmops	c34de05106	Introduce the SearchForFacetValue struct	2023-06-28 14:58:41 +02:00
meili-bors[bot]	d4f10800f2	Merge #3834	2023-06-28 08:19:23 +00:00

3834: Define searchable fields at runtime r=Kerollmops a=ManyTheFish

## Summary

This feature allows the end-user to search in one or multiple attributes using the search parameter `attributesToSearchOn`:

```json
{
  "q": "Captain Marvel",
  "attributesToSearchOn": ["title"]
}
```

This feature acts like a filter, forcing Meilisearch to only return the documents containing the requested words in the attributes-to-search-on. Note that, with the matching strategy `last`, Meilisearch will only ensure that the first word is in the attributes-to-search-on, but the retrieved documents will be ordered taking into account the words contained in the attributes-to-search-on.

## Trying the prototype

A dedicated docker image has been released for this feature:

#### last prototype version:

```bash
docker pull getmeili/meilisearch:prototype-define-searchable-fields-at-search-time-1
```

#### other prototype versions:

```bash
docker pull getmeili/meilisearch:prototype-define-searchable-fields-at-search-time-0
```

## Technical Detail

The attributes-to-search-on list is given to the search context, then the search context uses the `fid_word_docids` database using only the allowed field ids instead of the global `word_docids` database. This is the same for the prefix databases.
The database cache is updated with the merged values, meaning that the union of the field-id-database values is only made if the requested key is missing from the cache.

### Relevancy limits

Almost all ranking rules behave as expected when ordering the documents.
Only `proximity` could misorder documents if all the searched words are in the restricted attribute but a better proximity is found in an ignored attribute in a document that should be ranked lower.

I put below a failing test showing it:

```rust
#[actix_rt::test]
async fn proximity_ranking_rule_order() {
    let server = Server::new().await;
    let index = index_with_documents(
        &server,
        &json!([
        {
            "title": "Captain super mega cool. A Marvel story",
            // Perfect distance between words in an ignored attribute
            "desc": "Captain Marvel",
            "id": "1",
        },
        {
            "title": "Captain America from Marvel",
            "desc": "a Shazam ersatz",
            "id": "2",
        }]),
    )
    .await;

    // Document 2 should appear before document 1.
    index
        .search(json!({"q": "Captain Marvel", "attributesToSearchOn": ["title"], "attributesToRetrieve": ["id"]}), |response, code| {
            assert_eq!(code, 200, "{}", response);
            assert_eq!(
                response["hits"],
                json!([
                    {"id": "2"},
                    {"id": "1"},
                ])
            );
        })
        .await;
}
```

Fixing this would force us to create the `fid_word_pair_proximity_docids` and `fid_word_prefix_pair_proximity_docids` databases, which may multiply the keys of `word_pair_proximity_docids` and `word_prefix_pair_proximity_docids` by the number of attributes in the searchable_attributes list.
If we think we should fix this test, I'll suggest doing it in another PR.

## Related

Fixes #3772

Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>

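
The caching behaviour from the Technical Detail section of #3834 can be sketched roughly as follows. This is an illustration under stated assumptions, not milli's code: `SearchContext` here is a simplified stand-in, `word_docids` and `restricted_fids` are hypothetical names, a `BTreeSet<u32>` replaces the bitmaps stored in the real databases, and plain `HashMap`s replace the on-disk `fid_word_docids` database.

```rust
use std::collections::{BTreeSet, HashMap};

type FieldId = u16;
/// Stand-in for a set of document ids (assumption: the real type is a bitmap).
type DocIds = BTreeSet<u32>;

/// Simplified, hypothetical search context.
struct SearchContext {
    /// Per-field word database: (field id, word) -> document ids.
    fid_word_docids: HashMap<(FieldId, String), DocIds>,
    /// Field ids allowed by `attributesToSearchOn`.
    restricted_fids: Vec<FieldId>,
    /// Cache of merged results, keyed by word.
    cache: HashMap<String, DocIds>,
}

impl SearchContext {
    /// Returns the documents containing `word` in one of the allowed fields.
    /// The union over the fields is only computed on a cache miss.
    fn word_docids(&mut self, word: &str) -> &DocIds {
        if !self.cache.contains_key(word) {
            let mut merged = DocIds::new();
            for fid in &self.restricted_fids {
                if let Some(docids) = self.fid_word_docids.get(&(*fid, word.to_string())) {
                    merged.extend(docids);
                }
            }
            self.cache.insert(word.to_string(), merged);
        }
        &self.cache[word]
    }
}
```

Storing the merged value in the cache means later lookups of the same word skip the per-field union entirely, which matches the description that the union "is only made if the requested key is missing from the cache".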
Clément Renault	29d8268c94	Fix the vector query part by using the correct universe	2023-06-27 12:32:43 +02:00
Kerollmops	ab9f2269aa	Normalize the vectors during indexation and search	2023-06-27 12:32:41 +02:00
Kerollmops	3b560ef7d0	Make clippy happy	2023-06-27 12:32:40 +02:00
Kerollmops	3c31e1cdd1	Support more pages but in an ugly way	2023-06-27 12:32:39 +02:00
Kerollmops	c79e82c62a	Move back to the hnsw crate This reverts commit 7a4b6c065482f988b01298642f4c18775503f92f.	2023-06-27 12:32:39 +02:00
Kerollmops	268a9ef416	Move to the hgg crate	2023-06-27 12:32:38 +02:00
Clément Renault	642b0f3a1b	Expose a new vector field on the search route	2023-06-27 12:32:38 +02:00
ManyTheFish	63ca25290b	Take into account small Review requests	2023-06-26 14:56:19 +02:00
ManyTheFish	59f64a5256	Return an error when an attribute is not searchable	2023-06-26 14:56:19 +02:00
ManyTheFish	42709ea9a5	Fix clippy warnings	2023-06-26 14:55:57 +02:00
ManyTheFish	fb8fa07169	Restrict field ids in search context	2023-06-26 14:55:57 +02:00
ManyTheFish	0ccf1e2e40	Allow the search cache to store owned values	2023-06-26 14:55:57 +02:00
ManyTheFish	461b5118bd	Add API search setting	2023-06-26 14:55:14 +02:00
Tamo	a3716c5678	add the new parameter to the search builder of milli	2023-06-26 14:55:14 +02:00
Louis Dureuil	d26e9a96ec	Add score details to new search tests	2023-06-22 12:39:14 +02:00
Louis Dureuil	49c8bc4de6	Fix tests	2023-06-22 12:39:14 +02:00
Louis Dureuil	da833eb095	Expose the scores and detailed scores in the API	2023-06-22 12:39:14 +02:00
Louis Dureuil	701d44bd91	Store the scores for each bucket Remove optimization where ranking rules are not executed on buckets of a single document when the score needs to be computed	2023-06-22 12:39:14 +02:00
Louis Dureuil	c621a250a7	Score for graph based ranking rules Count phrases in matchingWords and maxMatchingWords	2023-06-22 12:39:14 +02:00
Louis Dureuil	8939e85f60	Add rank_to_score for graph based ranking rules	2023-06-22 12:39:14 +02:00
Louis Dureuil	fa41d2489e	Score for sort	2023-06-22 12:39:14 +02:00
Louis Dureuil	59c5b992c2	Score for geosort	2023-06-22 12:39:14 +02:00
Louis Dureuil	2ea8194c18	Score for exact_attributes	2023-06-22 12:39:14 +02:00
Louis Dureuil	421df64602	RankingRuleOutput now contains a Score	2023-06-22 12:39:14 +02:00
Louis Dureuil	f050634b1e	add virtual conditions to fid and position to always have the max cost	2023-06-20 10:07:18 +02:00
Louis Dureuil	becf1f066a	Change how the cost of removing words is computed	2023-06-20 09:45:43 +02:00
Louis Dureuil	701d299369	Remove out-of-date comment	2023-06-20 09:45:42 +02:00
Louis Dureuil	a20e4d447c	Position now takes into account the distance to the position of the word in the query; it used to be based on the distance to position 0	2023-06-20 09:45:42 +02:00
Louis Dureuil	af57c3c577	Proximity costs 0 for documents that are perfectly matching	2023-06-20 09:45:42 +02:00
Louis Dureuil	0c40ef6911	Fix sort id	2023-06-20 09:45:42 +02:00
Loïc Lecrenier	2da86b31a6	Remove comments and add documentation	2023-06-14 12:39:42 +02:00
Louis Dureuil	a2a3b8c973	Fix offset difference between query and indexing for hard separators	2023-06-08 12:07:12 +02:00
Louis Dureuil	1dfc4038ab	Add test that fails before PR and passes now	2023-05-29 11:58:26 +02:00
Louis Dureuil	73198179f1	Consistently use wrapping add to avoid overflow in debug when query starts with a separator	2023-05-29 11:54:12 +02:00
meili-bors[bot]	2e49d6aec1	Merge #3768	2023-05-23 13:28:08 +00:00

3768: Fix bugs in graph-based ranking rules + make `words` a graph-based ranking rule r=dureuill a=loiclec

This PR contains three changes:

## 1. Don't call the `words` ranking rule if the term matching strategy is `All`

This is because the purpose of `words` is only to remove nodes from the query graph. It would never do any useful work when the matching strategy was `All`. Remember that the universe was already computed before by computing all the docids corresponding to the "maximally reduced" query graph, which, in the case of `All`, is equal to the original graph.

## 2. The `words` ranking rule is replaced by a graph-based ranking rule.

This is for three reasons:

1. performance: graph-based ranking rules benefit from a lot of optimisations by default, which ensures that they are never too slow. The previous implementation of `words` could call `compute_query_graph_docids` many times if some words had to be removed from the query, which would be quite expensive. I was especially worried about its performance in cases where it is placed right after the `sort` ranking rule. Furthermore, `compute_query_graph_docids` would clone a lot of bitmaps many times unnecessarily.

2. consistency: every other ranking rule (except `sort`) is graph-based. It makes sense to implement `words` like that as well. It will automatically benefit from all the features, optimisations, and bug fixes that all the other ranking rules get.

3. surfacing bugs: as the first ranking rule to be called (most of the time), I'd like `words` to behave the same as the other ranking rules so that we can quickly detect bugs in our graph algorithms. This actually already happened, which is why this PR also contains a bug fix.

## 3. Fix the `update_all_costs_before_nodes` function

It is a bit difficult to explain what was wrong, but I'll try. The bug happened when we had graphs like:

<img width="730" alt="Screenshot 2023-05-16 at 10 58 57" src="https://github.com/meilisearch/meilisearch/assets/6040237/40db1a68-d852-4e89-99d5-0d65757242a7">

and we gave the node `is` as argument. Then, we'd walk backwards from the node breadth-first. We'd update the costs of:

1. `sun`
2. `thesun`
3. `start`
4. `the`

which is an incorrect order. The correct order is:

1. `sun`
2. `thesun`
3. `the`
4. `start`

That is, we can only update the cost of a node when all of its successors have either already been visited or were not affected by the update to the node passed as argument.

To solve this bug, I factored out the graph-traversal logic into a `traverse_breadth_first_backward` function.

Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>

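
The ordering rule stated in #3768 (update a node only once all of its affected successors have been handled) can be sketched with a counter per node. This is an assumption-laden illustration, not the actual `traverse_breadth_first_backward` from milli: the function `traverse_backward`, its signature, and the edge-list representation are hypothetical, and the graph used with it would be a guess at the shape in the screenshot, which is not reproduced here.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical sketch: visits every ancestor of `from`, yielding a node only
/// after all of its successors that are affected by the update were yielded.
/// `edges` holds (source, target) pairs and is assumed to have no duplicates.
fn traverse_backward(edges: &[(&str, &str)], from: &str) -> Vec<String> {
    let mut predecessors: HashMap<&str, Vec<&str>> = HashMap::new();
    for &(source, target) in edges {
        predecessors.entry(target).or_default().push(source);
    }

    // 1. Affected nodes: `from` itself and all of its ancestors.
    let mut affected: HashSet<&str> = HashSet::new();
    let mut stack = vec![from];
    while let Some(node) = stack.pop() {
        if affected.insert(node) {
            if let Some(preds) = predecessors.get(node) {
                stack.extend(preds.iter().copied());
            }
        }
    }

    // 2. For each affected node, count its successors that are affected too.
    let mut pending: HashMap<&str, usize> = HashMap::new();
    for &(source, target) in edges {
        if affected.contains(source) && affected.contains(target) {
            *pending.entry(source).or_insert(0) += 1;
        }
    }

    // 3. Breadth-first walk that releases a node when its counter hits zero.
    let mut order = Vec::new();
    let mut queue = VecDeque::from([from]);
    while let Some(node) = queue.pop_front() {
        for &pred in predecessors.get(node).into_iter().flatten() {
            let count = pending.get_mut(pred).expect("predecessors are affected");
            *count -= 1;
            if *count == 0 {
                order.push(pred.to_string());
                queue.push_back(pred);
            }
        }
    }
    order
}
```

On an assumed graph with edges `start -> the -> sun -> is` and `start -> thesun -> is`, calling it with `is` yields `start` last, because `start` has to wait for both `the` and `thesun`.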
Louis Dureuil	51043f78f0	Remove trailing whitespace	2023-05-23 15:27:25 +02:00
Louis Dureuil	a490a11325	Add explanatory comment on the way we're recomputing costs	2023-05-23 15:24:24 +02:00
Loïc Lecrenier	ec8f685d84	Fix bug in cheapest path algorithm	2023-05-16 17:01:30 +02:00
Loïc Lecrenier	5758268866	Don't compute split_words for phrases	2023-05-16 17:01:18 +02:00
Loïc Lecrenier	3e19702de6	Update snapshot tests	2023-05-16 12:22:46 +02:00
Loïc Lecrenier	f6524a6858	Adjust costs of edges in position ranking rule To ensure good performance	2023-05-16 11:28:56 +02:00
meili-bors[bot]	65ad8cce36	Merge #3741 3741: Add ngram support to the highlighter r=ManyTheFish a=loiclec This PR fixes a bug introduced by the search refactor, where ngrams were not highlighted. The solution was to add the ngrams to the vector of `LocatedQueryTerm` that is given to the `MatchingWords` structure. Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>	2023-05-16 09:03:31 +00:00
Loïc Lecrenier	a37da36766	Implement `words` as a graph-based ranking rule and fix some bugs	2023-05-16 10:42:11 +02:00
Loïc Lecrenier	85d96d35a8	Highlight ngram matches as well	2023-05-16 10:39:36 +02:00
Loïc Lecrenier	4d352a21ac	Compute split words derivations of terms that don't accept typos	2023-05-10 13:31:19 +02:00
Loïc Lecrenier	3625389057	Highlight ngram matches as well	2023-05-08 15:35:41 +02:00
meili-bors[bot]	eace6df91b	Merge #3726 3726: Fix prefix highlighting r=loiclec a=ManyTheFish The prefix queries were not properly highlighted, this PR now highlights only the start of a word when it matched with a prefix Co-authored-by: ManyTheFish <many@meilisearch.com> Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>	2023-05-08 07:46:46 +00:00
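
The behaviour described in #3726 (highlight only the start of a word when it matched a prefix query) could look like the sketch below. `highlight_prefix` is a hypothetical helper, not the highlighter from milli, and the `<em>` tags are used purely as example markers.

```rust
/// Hypothetical helper: wraps only the matched prefix of `word` in tags and
/// leaves the rest of the word untouched. Non-matching words are returned as-is.
fn highlight_prefix(word: &str, query_prefix: &str) -> String {
    match word.strip_prefix(query_prefix) {
        Some(rest) if !query_prefix.is_empty() => {
            format!("<em>{}</em>{}", query_prefix, rest)
        }
        _ => word.to_string(),
    }
}
```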
Loïc Lecrenier	83ab8cf4e5	Remove dbg!(..) expression in highlighter tests	2023-05-08 09:45:23 +02:00
ManyTheFish	cd2573fcc3	Fix prefix highlighting	2023-05-04 16:53:50 +02:00
Jakub Jirutka	e615fa5ec6	Fix unused_imports warning in milli when japanese is not enabled	2023-05-04 15:46:11 +02:00
Jakub Jirutka	13f1277637	Allow to disable specialized tokenizations (again) In PR #2773, I added the `chinese`, `hebrew`, `japanese` and `thai` feature flags to allow meilisearch to be built without huge specialized tokenizations that took up 90% of the meilisearch binary size. Unfortunately, due to some recent changes, this doesn't work anymore. The problem lies in excessive use of the `default` feature flag, which infects the dependency graph. Instead of adding `default-features = false` here and there, it's easier and more future-proof to not declare `default` in `milli` and `meilisearch-types`. I've renamed it to `all-tokenizers`, which also makes it a bit clearer what it's about.	2023-05-04 15:45:40 +02:00
Louis Dureuil	732c52093d	Processing time without autobatching implementation	2023-05-03 17:41:48 +02:00
Louis Dureuil	f8f190cd40	Update exactness tests following charabia camelCase tokenization	2023-05-03 14:45:09 +02:00
Louis Dureuil	1aaf24ccbf	Cargo fmt	2023-05-03 12:21:58 +02:00
Louis Dureuil	90bc230820	Merge remote-tracking branch 'origin/main' into search-refactor	2023-05-03 12:19:06 +02:00

Conflicts | resolution
----------|-----------
Cargo.lock | added mimalloc
Cargo.toml | took origin/main version
milli/src/search/criteria/exactness.rs | deleted after checking it was only clippy changes
milli/src/search/query_tree.rs | deleted after checking it was only clippy changes

Louis Dureuil	342c4ff85d	geosort: Remove rtree unwrap	2023-05-03 09:52:16 +02:00
Tamo	c85392ce40	make the descendent geosort fast	2023-05-03 09:13:12 +02:00
Tamo	8875d24a48	deserialize the rtree only when its needed, and keep it in memory once it has been deserialized	2023-05-03 09:13:12 +02:00
Tamo	c470b67fa2	revamp the test to use execute_iterative_and_rtree_returns_the_same	2023-05-03 09:13:12 +02:00
Louis Dureuil	b60840ebff	Remove self.iterating from words	2023-05-02 18:54:23 +02:00
Louis Dureuil	fdc1763838	Use MultiOps for resolve_query_graph	2023-05-02 18:54:09 +02:00
Louis Dureuil	75819bc940	Remove too many arguments on resolve_maximally_reduced_query_graph	2023-05-02 18:53:40 +02:00
Louis Dureuil	7b8cc25625	rename located_query_terms_from_string -> located_query_terms_from_tokens	2023-05-02 18:53:01 +02:00
Loïc Lecrenier	aa63091752	Fix bug in exact_attribute	2023-05-02 10:48:32 +02:00
Loïc Lecrenier	1b514517f5	Fix bug in computation of query term at a position	2023-05-02 10:48:32 +02:00
Loïc Lecrenier	11f814821d	Minor cleanup	2023-05-02 10:48:32 +02:00
Loïc Lecrenier	30fb1153cc	Speed up graph based ranking rule when a lot of different costs exist	2023-05-02 09:59:42 +02:00
Loïc Lecrenier	3b2c8b9f25	Improve performance of position rr	2023-05-02 09:59:42 +02:00


1050 Commits