Commit Graph

96 Commits

Author SHA1 Message Date
ManyTheFish
6f55e7844c Add some code comments 2022-10-17 14:41:57 +02:00
ManyTheFish
d71bc1e69f Compute an exact count when using distinct 2022-10-17 14:13:44 +02:00
ManyTheFish
a396806343 Add settings to force milli to exhaustively compute the total number of hits 2022-10-17 14:13:44 +02:00
ManyTheFish
5391e3842c replace optional_words by term_matching_strategy 2022-08-22 17:47:19 +02:00
ManyTheFish
9640976c79 Rename TermMatchingPolicies 2022-08-18 17:36:08 +02:00
Tamo
3b309f654a
Fasten the document deletion
When a document deletion occurs, instead of deleting the document we mark it as deleted
in the new “soft deleted” bitmap. It is then removed from the search, and all the other
endpoints.
2022-07-05 15:30:33 +02:00
Kerollmops
d2f84a9d9e
Improve the estimatedNbHits when distinct is enabled 2022-06-22 11:39:21 +02:00
Kerollmops
69931e50d2
Add the max_values_by_facet setting to the database 2022-06-08 17:54:56 +02:00
ManyTheFish
86ac8568e6 Use Charabia in milli 2022-06-02 16:59:11 +02:00
ad hoc
ac975cc747
cache context's exact words 2022-05-24 09:43:17 +02:00
bors[bot]
ea4bb9402f
Merge #483
483: Enhance matching words r=Kerollmops a=ManyTheFish

# Summary

Enhance the milli word matcher so that it handles both match computation and cropping.

# Implementation

## Computing best matches for cropping

Previously, we considered the first match in the attribute to be the best one. This was accurate when only one word was searched, but it missed the target when more than one word was searched.

Now we search for the best interval of matches to crop around. The chosen interval is the one:
1) that has the highest count of unique matches
> for example, if we have a query `split the world`, then the interval `the split the split the` has 5 matches but only 2 unique matches (1 for `split` and 1 for `the`), whereas the interval `split of the world` has 3 matches and 3 unique matches. So the interval `split of the world` is considered better.
2) that has the minimum distance between matches
> for example, if we have a query `split the world`, then the interval `split of the world` has a distance of 3 (2 between `split` and `the`, and 1 between `the` and `world`), whereas the interval `split the world` has a distance of 2. So the interval `split the world` is considered better.
3) that has the highest count of ordered matches
> for example, if we have a query `split the world`, then the interval `the world split` has 2 ordered words, whereas the interval `split the world` has 3. So the interval `split the world` is considered better.
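
The three criteria above can be sketched as a score tuple compared lexicographically. This is a minimal illustration, not milli's actual code: the `Match` type, `interval_score`, and `best_interval` are hypothetical names, and the sketch assumes that the criteria are applied in the listed order of priority, that matches are sorted by word position, and that a match counts as "ordered" when its query index is greater than the previous one's.

```rust
use std::collections::HashSet;

/// One matched query word in the text (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Match {
    /// Position of the matched word in the text, counted in words.
    pub word_position: usize,
    /// Index of the matched word in the query.
    pub query_index: usize,
}

/// (unique matches, negated distance, ordered matches).
/// Tuples compare lexicographically, so a larger score is a better interval.
pub type Score = (usize, i64, usize);

/// Scores an interval of matches; `matches` is assumed sorted by position.
pub fn interval_score(matches: &[Match]) -> Score {
    // 1) number of distinct query words present in the interval
    let unique: HashSet<usize> = matches.iter().map(|m| m.query_index).collect();
    let mut distance: i64 = 0;
    // The first match counts as ordered (assumption made to reproduce the
    // "2 ordered words" / "3 ordered words" figures of the example).
    let mut ordered: usize = if matches.is_empty() { 0 } else { 1 };
    for pair in matches.windows(2) {
        // 2) sum of the gaps between consecutive matches
        distance += pair[1].word_position.abs_diff(pair[0].word_position) as i64;
        // 3) a match is ordered when it follows the previous one in the query
        if pair[1].query_index > pair[0].query_index {
            ordered += 1;
        }
    }
    // The distance is negated because a smaller distance is better.
    (unique.len(), -distance, ordered)
}

/// Returns the best-scoring run of matches that spans fewer than
/// `crop_size` words, or `None` when there is no usable match.
pub fn best_interval(matches: &[Match], crop_size: usize) -> Option<&[Match]> {
    let mut best: Option<(&[Match], Score)> = None;
    for start in 0..matches.len() {
        let first = matches[start].word_position;
        // Extend the candidate while the matches still fit in the crop window.
        let len = matches[start..]
            .iter()
            .take_while(|m| m.word_position.abs_diff(first) < crop_size)
            .count();
        if len == 0 {
            continue;
        }
        let candidate = &matches[start..start + len];
        let score = interval_score(candidate);
        // Keep the first interval found on ties.
        if best.map_or(true, |(_, best_score)| score > best_score) {
            best = Some((candidate, score));
        }
    }
    best.map(|(interval, _)| interval)
}
```

With the query `split the world`, this sketch scores `split of the world` as `(3, -3, 3)` and `split the world` as `(3, -2, 3)`, so the second interval wins on distance, as in the example for criterion 2.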

## Cropping around the best matches interval

Previously, we cropped around the interval without checking the context.

Now we crop around words that are in the same context as the matching words.
This means that we keep words that are farther from the matching words but in the same phrase, rather than words that are nearer but separated by a dot.

> For instance, for the matching word `Split` the text:
`Natalie risk her future. Split The World is a book written by Emily Henry. I never read it.`
will be cropped like:
`…. Split The World is a book written by Emily Henry. …`
and not like:
`Natalie risk her future. Split The World is a book …`
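
The context-aware cropping described above can be sketched as a window that grows from the matching word and prefers the side that is still in the same phrase. This is a simplified illustration under stated assumptions, not milli's implementation: `crop_range` and `ends_sentence` are hypothetical names, words are split on whitespace, and a phrase boundary is assumed to be any word ending with a dot, whereas a real tokenizer would classify separators itself.

```rust
use std::ops::Range;

/// Assumption for this sketch: a word ending with a dot closes a phrase.
fn ends_sentence(word: &str) -> bool {
    word.ends_with('.')
}

/// Returns the range of words to keep around `words[match_index]`,
/// at most `crop_size` words long. An invalid input yields `0..0`.
pub fn crop_range(words: &[&str], match_index: usize, crop_size: usize) -> Range<usize> {
    if match_index >= words.len() || crop_size == 0 {
        return 0..0;
    }
    let (mut start, mut end) = (match_index, match_index + 1);
    // Whether each side of the window is still inside the match's phrase.
    let (mut left_ctx, mut right_ctx) = (true, true);
    while end - start < crop_size {
        let can_left = start > 0;
        let can_right = end < words.len();
        if !can_left && !can_right {
            break; // the whole text is already in the window
        }
        // Once a side has crossed a phrase boundary, it stays out of context.
        left_ctx = left_ctx && can_left && !ends_sentence(words[start - 1]);
        right_ctx = right_ctx && can_right && !ends_sentence(words[end - 1]);
        let take_left = match (can_left, can_right) {
            (true, false) => true,
            (false, true) => false,
            _ => {
                if left_ctx != right_ctx {
                    // Exactly one side is in the same phrase: prefer it,
                    // even if its next word is farther from the match.
                    left_ctx
                } else {
                    // Otherwise take the word that is nearer to the match.
                    match_index - (start - 1) < end - match_index
                }
            }
        };
        if take_left {
            start -= 1;
        } else {
            end += 1;
        }
    }
    start..end
}
```

Applied to the example text with the match on `Split` (word index 4) and a crop size of 10 words, the sketch keeps `Split The World is a book written by Emily Henry.` and leaves out the nearer words of the previous phrase.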


Co-authored-by: ManyTheFish <many@meilisearch.com>
2022-04-19 11:42:32 +00:00
ad hoc
dda28d7415
exclude excluded candidates from search result candidates 2022-04-13 12:10:35 +02:00
ad hoc
bbb6728d2f
add distinct attributes to cli 2022-04-13 12:10:35 +02:00
ManyTheFish
5809d3ae0d Add first benchmarks on formatting 2022-04-12 16:31:58 +02:00
ManyTheFish
827cedcd15 Add format option structure 2022-04-12 13:42:14 +02:00
Irevoire
4f3ce6d9cd
nested fields 2022-04-07 16:58:46 +02:00
ManyTheFish
3bb1e35ada Fix match count 2022-04-05 17:48:45 +02:00
ManyTheFish
b3f0f39106 Make some cleaning 2022-04-05 17:41:32 +02:00
ManyTheFish
734d0899d3 Publish Matcher 2022-04-05 17:41:32 +02:00
ManyTheFish
d96e72e5dc Create formatter with some tests 2022-04-05 17:41:32 +02:00
ad hoc
9fe40df960
add word derivations tests 2022-04-01 11:05:18 +02:00
ad hoc
d5ddc6b080
fix 2 typos word derivation bug 2022-04-01 10:51:22 +02:00
ad hoc
6ef3bb9d83
fmt 2022-03-31 14:06:23 +02:00
ad hoc
f782fe2062
add authorize_typo_test 2022-03-31 10:08:39 +02:00
ad hoc
c4653347fd
add authorize typo setting 2022-03-31 10:05:44 +02:00
ad hoc
3f24555c3d
custom fst automatons 2022-03-15 17:38:35 +01:00
ad hoc
628c835a22
fix tests 2022-03-15 17:38:34 +01:00
mpostma
7541ab99cd
review changes 2022-02-02 12:59:01 +01:00
mpostma
d0aabde502
optimize 2 typos case 2022-02-02 12:56:09 +01:00
mpostma
55e6cb9c7b
typos on first letter counts as 2 2022-02-02 12:56:09 +01:00
Tamo
6831c23449
merge with main 2021-11-06 16:34:30 +01:00
Tamo
a58bc5bebb
update milli with the new parser_filter 2021-11-04 15:02:36 +01:00
many
ed6db19681
Fix PR comments 2021-10-28 11:18:32 +02:00
Clémentine Urquizar
208903ddde
Revert "Replacing pest with nom " 2021-10-25 11:58:00 +02:00
Tamo
e25ca9776f
start updating the exposed function to makes other modules happy 2021-10-22 17:23:22 +02:00
Tamo
c27870e765
integrate a first version without any error handling 2021-10-22 14:33:18 +02:00
Tamo
01dedde1c9
update some names and move some parser out of the lib.rs 2021-10-22 01:59:38 +02:00
刘瀚骋
7a90a101ee reorganize parser logic 2021-10-12 13:30:40 +08:00
刘瀚骋
f7796edc7e remove everything about pest 2021-10-12 13:30:40 +08:00
Tamo
47ee93b0bd
return an error when _geoPoint is used but _geo is not sortable 2021-09-22 16:37:41 +02:00
Tamo
257e621d40
create an asc_desc module 2021-09-22 16:37:41 +02:00
Tamo
13c78e5aa2
Implement the _geoPoint in the sortable 2021-09-08 18:24:09 +02:00
Kerollmops
fd3daa4423
Throw a query time error when a sort param is used but sort ranking rule is missing 2021-09-07 11:02:00 +02:00
Alexey Shekhirin
0e379558a1
fix(search): get sortable_fields only if criteria present 2021-08-31 21:35:41 +03:00
Clément Renault
89d0758713
Revert "Revert "Sort at query time"" 2021-08-24 11:55:16 +02:00
Clémentine Urquizar
922f9fd4d5
Revert "Sort at query time" 2021-08-20 18:09:17 +02:00
Kerollmops
1b7f6ea1e7
Return a new error when the sort criteria is not sortable 2021-08-18 15:04:07 +02:00
Kerollmops
407f53872a
Add a sort_criteria method to the Search builder struct 2021-08-18 15:04:07 +02:00
Kerollmops
687cd2e205
Introduce the new Sort criterion and AscDesc enum 2021-08-18 15:04:07 +02:00
Kerollmops
7aa6cc9b04
Do not insert fields in the map when changing the settings 2021-07-22 18:40:12 +02:00