Kerollmops
69931e50d2
Add the max_values_by_facet setting to the database
2022-06-08 17:54:56 +02:00
ManyTheFish
86ac8568e6
Use Charabia in milli
2022-06-02 16:59:11 +02:00
ad hoc
ac975cc747
cache context's exact words
2022-05-24 09:43:17 +02:00
bors[bot]
ea4bb9402f
Merge #483
483: Enhance matching words r=Kerollmops a=ManyTheFish
# Summary
Enhance milli's word matcher so that it handles match computation and cropping.
# Implementation
## Computing best matches for cropping
Previously, we considered the first match in the attribute to be the best one. This was accurate when only one word was searched, but it missed the target when the query contained more than one word.
Now we search for the best interval of matches to crop around. The chosen interval is the one that, in order of priority:
1) has the highest count of unique matches
> for example, with the query `split the world`, the interval `the split the split the` has 5 matches but only 2 unique matches (1 for `split` and 1 for `the`), whereas the interval `split of the world` has 3 matches and 3 unique matches. So the interval `split of the world` is considered better.
2) has the smallest distance between matches
> for example, with the query `split the world`, the interval `split of the world` has a distance of 3 (2 between `split` and `the`, and 1 between `the` and `world`), whereas the interval `split the world` has a distance of 2. So the interval `split the world` is considered better.
3) has the highest count of ordered matches
> for example, with the query `split the world`, the interval `the world split` has 2 ordered words, whereas the interval `split the world` has 3. So the interval `split the world` is considered better.
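The ranking above can be sketched in Rust as a lexicographic comparison of three scores. This is a minimal sketch, not milli's code: the `Match` type, `find_matches`, `score` and `best_interval` are hypothetical, words are split on whitespace only, matches are assumed sorted by word position, and "ordered matches" is assumed to mean the first match plus every match whose query word comes later in the query than the previous match's word.

```rust
use std::collections::HashSet;

/// Hypothetical match type: where a query word was found in the text.
#[derive(Clone, Copy, Debug)]
struct Match {
    /// Index of the matched word in the text.
    word_position: usize,
    /// Index, in the query, of the word that matched.
    query_index: usize,
}

/// The derived `Ord` compares fields in declaration order, which encodes
/// the priority of the three criteria.
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
struct IntervalScore {
    /// 1) Distinct query words matched: higher is better.
    unique_matches: usize,
    /// 2) Sum of the gaps between consecutive matches, negated so that a
    ///    smaller distance compares as greater.
    neg_distance: i64,
    /// 3) Matches following query order: higher is better.
    ordered_matches: usize,
}

/// Finds every whitespace-separated word equal (ASCII case-insensitively)
/// to a query word; results come out sorted by `word_position`.
fn find_matches(text: &str, query: &[&str]) -> Vec<Match> {
    text.split_whitespace()
        .enumerate()
        .filter_map(|(word_position, word)| {
            query
                .iter()
                .position(|q| q.eq_ignore_ascii_case(word))
                .map(|query_index| Match { word_position, query_index })
        })
        .collect()
}

/// Scores an interval of matches sorted by `word_position`.
fn score(interval: &[Match]) -> IntervalScore {
    let unique_matches = interval.iter().map(|m| m.query_index).collect::<HashSet<_>>().len();
    let distance: usize = interval
        .windows(2)
        .map(|w| w[1].word_position - w[0].word_position)
        .sum();
    // The first match counts as ordered; each later match counts when its
    // query word comes after the previous match's query word.
    let ordered_pairs = interval.windows(2).filter(|w| w[1].query_index > w[0].query_index).count();
    let ordered_matches = if interval.is_empty() { 0 } else { ordered_pairs + 1 };
    IntervalScore { unique_matches, neg_distance: -(distance as i64), ordered_matches }
}

/// Returns the best-scoring contiguous run of matches spanning at most
/// `max_span` words, or `None` if there are no matches.
fn best_interval(matches: &[Match], max_span: usize) -> Option<&[Match]> {
    let mut best: Option<&[Match]> = None;
    for start in 0..matches.len() {
        for end in start + 1..=matches.len() {
            let interval = &matches[start..end];
            let span = interval[interval.len() - 1].word_position - interval[0].word_position;
            if span >= max_span {
                break; // positions are sorted, so longer runs only get wider
            }
            // Strict `>` keeps the earliest interval on ties.
            if best.map_or(true, |b| score(interval) > score(b)) {
                best = Some(interval);
            }
        }
    }
    best
}
```

Because the derived `Ord` compares fields in declaration order, a higher `unique_matches` always wins, and `neg_distance` and `ordered_matches` only break ties, mirroring the 1), 2), 3) priority.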
## Cropping around the best matches interval
Previously, we cropped around the interval without checking the context.
Now we crop around words that are in the same context as the matching words.
This means that we prefer keeping words that are farther from the matching words but in the same sentence over words that are nearer but separated by a period.
> For instance, with the matching word `Split`, the text:
`Natalie risk her future. Split The World is a book written by Emily Henry. I never read it.`
will be cropped like:
`…. Split The World is a book written by Emily Henry. …`
and not like:
`Natalie risk her future. Split The World is a book …`
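The sentence-aware cropping can be sketched as follows. This is a simplified, hypothetical `crop_in_sentence` helper, not milli's implementation: it assumes a sentence ends at any word ending with `.`, counts the crop size in whitespace-separated words, and marks each cropped side with `…`, so its output differs slightly from the example above (no `.` after the leading `…`).

```rust
/// Hypothetical helper: crops `text` to at most `crop_size` words around the
/// word at `match_index`, growing inside the match's sentence first.
fn crop_in_sentence(text: &str, match_index: usize, crop_size: usize) -> String {
    // Tag each whitespace-separated word with a sentence id; a word ending
    // with '.' is assumed to close its sentence.
    let mut sentence = 0;
    let words: Vec<(&str, usize)> = text
        .split_whitespace()
        .map(|w| {
            let tagged = (w, sentence);
            if w.ends_with('.') {
                sentence += 1;
            }
            tagged
        })
        .collect();
    if match_index >= words.len() || crop_size == 0 {
        return String::new();
    }

    let target = words[match_index].1;
    let (mut start, mut end) = (match_index, match_index + 1);
    let mut grow_right = true;
    // Pass 1 only takes words from the match's sentence; pass 2 only does
    // work when that sentence has fewer than `crop_size` words.
    for &same_sentence_only in &[true, false] {
        while end - start < crop_size {
            let right_ok = end < words.len() && (!same_sentence_only || words[end].1 == target);
            let left_ok = start > 0 && (!same_sentence_only || words[start - 1].1 == target);
            match (right_ok, left_ok) {
                (false, false) => break,
                (true, false) => end += 1,
                (false, true) => start -= 1,
                // Both sides available: alternate to stay roughly centered.
                (true, true) => {
                    if grow_right {
                        end += 1;
                    } else {
                        start -= 1;
                    }
                    grow_right = !grow_right;
                }
            }
        }
    }

    let body: Vec<&str> = words[start..end].iter().map(|&(w, _)| w).collect();
    let mut out = String::new();
    if start > 0 {
        out.push_str("… ");
    }
    out.push_str(&body.join(" "));
    if end < words.len() {
        out.push_str(" …");
    }
    out
}
```

With the text above, `Split` at word index 4 and a crop size of 10, the sketch keeps exactly the ten words of the `Split The World` sentence; it only spills into the neighboring sentences when the crop size exceeds that sentence's length.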
Co-authored-by: ManyTheFish <many@meilisearch.com>
2022-04-19 11:42:32 +00:00
ad hoc
dda28d7415
exclude excluded candidates from search result candidates
2022-04-13 12:10:35 +02:00
ad hoc
bbb6728d2f
add distinct attributes to cli
2022-04-13 12:10:35 +02:00
ManyTheFish
5809d3ae0d
Add first benchmarks on formatting
2022-04-12 16:31:58 +02:00
ManyTheFish
827cedcd15
Add format option structure
2022-04-12 13:42:14 +02:00
Irevoire
4f3ce6d9cd
nested fields
2022-04-07 16:58:46 +02:00
ManyTheFish
3bb1e35ada
Fix match count
2022-04-05 17:48:45 +02:00
ManyTheFish
b3f0f39106
Make some cleaning
2022-04-05 17:41:32 +02:00
ManyTheFish
734d0899d3
Publish Matcher
2022-04-05 17:41:32 +02:00
ManyTheFish
d96e72e5dc
Create formater with some tests
2022-04-05 17:41:32 +02:00
ad hoc
9fe40df960
add word derivations tests
2022-04-01 11:05:18 +02:00
ad hoc
d5ddc6b080
fix 2 typos word derivation bug
2022-04-01 10:51:22 +02:00
ad hoc
6ef3bb9d83
fmt
2022-03-31 14:06:23 +02:00
ad hoc
f782fe2062
add authorize_typo_test
2022-03-31 10:08:39 +02:00
ad hoc
c4653347fd
add authorize typo setting
2022-03-31 10:05:44 +02:00
ad hoc
3f24555c3d
custom fst automatons
2022-03-15 17:38:35 +01:00
ad hoc
628c835a22
fix tests
2022-03-15 17:38:34 +01:00
mpostma
7541ab99cd
review changes
2022-02-02 12:59:01 +01:00
mpostma
d0aabde502
optimize 2 typos case
2022-02-02 12:56:09 +01:00
mpostma
55e6cb9c7b
typos on first letter counts as 2
2022-02-02 12:56:09 +01:00
Tamo
6831c23449
merge with main
2021-11-06 16:34:30 +01:00
Tamo
a58bc5bebb
update milli with the new parser_filter
2021-11-04 15:02:36 +01:00
many
ed6db19681
Fix PR comments
2021-10-28 11:18:32 +02:00
Clémentine Urquizar
208903ddde
Revert "Replacing pest with nom "
2021-10-25 11:58:00 +02:00
Tamo
e25ca9776f
start updating the exposed function to make other modules happy
2021-10-22 17:23:22 +02:00
Tamo
c27870e765
integrate a first version without any error handling
2021-10-22 14:33:18 +02:00
Tamo
01dedde1c9
update some names and move some parser out of the lib.rs
2021-10-22 01:59:38 +02:00
刘瀚骋
7a90a101ee
reorganize parser logic
2021-10-12 13:30:40 +08:00
刘瀚骋
f7796edc7e
remove everything about pest
2021-10-12 13:30:40 +08:00
Tamo
47ee93b0bd
return an error when _geoPoint is used but _geo is not sortable
2021-09-22 16:37:41 +02:00
Tamo
257e621d40
create an asc_desc module
2021-09-22 16:37:41 +02:00
Tamo
13c78e5aa2
Implement the _geoPoint in the sortable
2021-09-08 18:24:09 +02:00
Kerollmops
fd3daa4423
Throw a query time error when a sort param is used but sort ranking rule is missing
2021-09-07 11:02:00 +02:00
Alexey Shekhirin
0e379558a1
fix(search): get sortable_fields only if criteria present
2021-08-31 21:35:41 +03:00
Clément Renault
89d0758713
Revert "Revert "Sort at query time""
2021-08-24 11:55:16 +02:00
Clémentine Urquizar
922f9fd4d5
Revert "Sort at query time"
2021-08-20 18:09:17 +02:00
Kerollmops
1b7f6ea1e7
Return a new error when the sort criteria is not sortable
2021-08-18 15:04:07 +02:00
Kerollmops
407f53872a
Add a sort_criteria method to the Search builder struct
2021-08-18 15:04:07 +02:00
Kerollmops
687cd2e205
Introduce the new Sort criterion and AscDesc enum
2021-08-18 15:04:07 +02:00
Kerollmops
7aa6cc9b04
Do not insert fields in the map when changing the settings
2021-07-22 18:40:12 +02:00
Kerollmops
f858f64b1f
Move the facet number iterators into their own module
2021-07-21 16:59:37 +02:00
Kerollmops
32b7bd366f
Remove the roaring operation functions warnings
2021-06-30 14:12:56 +02:00
Tamo
3d90b03d7b
fix the limit
There was no check on the limit, so if a user specified a very large number, this line could cause a panic.
2021-06-22 14:52:13 +02:00
Tamo
9716fb3b36
format the whole project
2021-06-16 18:33:33 +02:00
Kerollmops
7ac441e473
Fix small typos
2021-06-16 11:03:37 +02:00
Kerollmops
a7d6930905
Replace the panicking expect by tracked Errors
2021-06-15 11:51:32 +02:00
Kerollmops
312c2d1d8e
Use the Error enum everywhere in the project
2021-06-14 16:58:38 +02:00
Kerollmops
3c304c89d4
Make sure that we generate the faceted database when required
2021-06-02 16:24:58 +02:00
Kerollmops
3b1cd4c4b4
Rename the FacetCondition into FilterCondition
2021-06-02 16:24:58 +02:00
Marin Postma
1e366dae3e
remove useless lifetime on Distinct Trait
2021-06-02 16:24:58 +02:00
Kerollmops
187c713de5
Remove the MapDistinct struct as now distinct attributes are faceted
2021-06-02 16:24:57 +02:00
Kerollmops
2a3f9b32ff
Rename the faceted fields into filterable fields
2021-06-02 16:24:57 +02:00
many
1df68d342a
Make the MatchingWords return the number of matching bytes
2021-05-31 18:22:29 +02:00
Clément Renault
02c655ff1a
Refine the facet distribution to use both databases
2021-05-25 11:30:00 +02:00
Clément Renault
f7efde11d9
Refine the facet condition to use both facet databases
2021-05-25 11:30:00 +02:00
Clément Renault
bd7b285bae
Split the update side to use the number and the strings facet databases
2021-05-25 11:30:00 +02:00
many
a3f8686fbf
Introduce exactness criterion
2021-05-06 14:28:30 +02:00
many
ee09e50e7f
Remove excluded document in criteria iterations
- pass excluded documents to criteria to remove them in higher levels of the bucket-sort
- merge already returned documents with excluded documents to avoid duplicates
Related to #125 and #112
Fix #170
2021-04-29 12:09:38 +02:00
Clément Renault
658f316511
Introduce the Initial Criterion
2021-04-27 14:35:43 +02:00
Alexey Shekhirin
6fa00c61d2
feat(search): support words_limit
2021-04-20 12:22:04 +03:00
Marin Postma
75464a1baa
review fixes
2021-04-15 16:25:56 +02:00
Marin Postma
45c45e11dd
implement distinct attribute
distinct can return error
facet distinct on numbers
return distinct error
review fixes
make get_facet_value more generic
fixes
2021-04-15 16:25:55 +02:00
tamo
dcb00b2e54
test a new implementation of the stop_words
2021-04-12 18:35:33 +02:00
tamo
a2f46029c7
implement a first version of the stop_words
The front must provide a BTreeSet containing the stop words
The stop_words are set to None if an empty Set is provided
add the stop-words in the http-ui interface
Use maplit in the test
and remove all the useless drop(rtxn) at the end of all tests
2021-04-01 13:57:55 +02:00
mpostma
9c27183876
fix broken offset
2021-03-15 20:23:50 +01:00
Kerollmops
d48008339e
Introduce two new optional_words and authorize_typos Search options
2021-03-10 11:16:30 +01:00
many
62a70c300d
Optimize words criterion
2021-03-10 10:42:53 +01:00
Clément Renault
5fcaedb880
Introduce a WordDerivationsCache struct
2021-03-08 16:00:53 +01:00
Kerollmops
9b6b35d9b7
Clean up some comments
2021-03-03 18:19:10 +01:00
Kerollmops
f118d7e067
build criteria from settings
2021-03-03 15:45:03 +01:00
Kerollmops
daf126a638
Introduce the final Fetcher criterion
2021-03-03 15:45:03 +01:00
Kerollmops
5af63c74e0
Speed-up the MatchingWords highlighting struct
2021-03-03 15:45:03 +01:00
Kerollmops
4510bbccca
Add a lot of debug
2021-03-03 15:43:44 +01:00
Kerollmops
9bc9b36645
Introduce the Proximity criterion
2021-03-03 15:43:44 +01:00
Kerollmops
22b84fe543
Use the words criterion in the search module
2021-03-03 15:43:44 +01:00
Clément Renault
14f9f85c4b
Introduce the AscDesc criterion
2021-03-03 15:43:44 +01:00
Kerollmops
e174ccbd8e
Use the words criterion in the search module
2021-03-03 15:43:43 +01:00
many
d92ad5640a
remove option on bucket_candidates
2021-03-03 15:43:43 +01:00
many
64688b3786
fix query tree builder
2021-03-03 15:43:43 +01:00
many
a273c46559
clean warnings
2021-03-03 15:43:42 +01:00
Clément Renault
fea9ffc46a
Use the bucket candidates in the search module
2021-03-03 15:43:42 +01:00
Clément Renault
5344abc008
Introduce the CriterionResult return type
2021-03-03 15:43:41 +01:00
many
98e69e63d2
implement Context trait for criteria
2021-03-03 15:43:41 +01:00
Clément Renault
f091f370d0
Use the Typo criteria in the search module
2021-03-03 15:43:41 +01:00
Kerollmops
79a143b32f
Introduce the query tree data structure
2021-03-03 13:40:18 +01:00
Clément Renault
e8639517da
Change the project to become a workspace with milli as a default-member
2021-02-12 16:15:09 +01:00