bors[bot]
74d1914a64
Merge #535
...
535: Reintroduce the max values by facet limit r=ManyTheFish a=Kerollmops
This PR reintroduces the max values by facet limit this is related to https://github.com/meilisearch/meilisearch/issues/2349 .
~I would like some help in deciding on whether I keep the default 100 max values in milli and set up the `FacetDistribution` settings in Meilisearch to use 1000 as the new value, I expose the `max_values_by_facet` for this purpose.~
I changed the default value to 1000 and the max to 10000, thank you `@ManyTheFish` for the help!
Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-06-01 14:30:50 +00:00
ad hoc
25fc576696
review changes
2022-05-24 14:15:33 +02:00
ad hoc
69dc4de80f
change &Option<Set> to Option<&Set>
2022-05-24 12:14:55 +02:00
ad hoc
ac975cc747
cache context's exact words
2022-05-24 09:43:17 +02:00
ad hoc
8993fec8a3
return optional exact words
2022-05-24 09:15:49 +02:00
Kerollmops
cd7c6e19ed
Reintroduce the max values by facet limit
2022-05-18 15:57:57 +02:00
ManyTheFish
137434a1c8
Add some implementation on MatchBounds
2022-05-17 15:57:09 +02:00
bors[bot]
9db86aac51
Merge #518
...
518: Return facets even when there is no value associated to it r=Kerollmops a=Kerollmops
This PR is related to https://github.com/meilisearch/meilisearch/issues/2352 and should fix the issue when Meilisearch is up-to-date with this PR.
Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-04-28 09:04:36 +00:00
Kerollmops
7d1c2d97bf
Return facets even when there is no values associated to it
2022-04-26 17:59:53 +02:00
ad hoc
5c29258e8e
fix cargo warnings
2022-04-26 17:33:11 +02:00
bors[bot]
ea4bb9402f
Merge #483
...
483: Enhance matching words r=Kerollmops a=ManyTheFish
# Summary
Enhance milli word-matcher making it handle match computing and cropping.
# Implementation
## Computing best matches for cropping
Before we were considering that the first match of the attribute was the best one, this was accurate when only one word was searched but was missing the target when more than one word was searched.
Now we are searching for the best matches interval to crop around, the chosen interval is the one:
1) that have the highest count of unique matches
> for example, if we have a query `split the world`, then the interval `the split the split the` has 5 matches but only 2 unique matches (1 for `split` and 1 for `the`) where the interval `split of the world` has 3 matches and 3 unique matches. So the interval `split of the world` is considered better.
2) that have the minimum distance between matches
> for example, if we have a query `split the world`, then the interval `split of the world` has a distance of 3 (2 between `split` and `the`, and 1 between `the` and `world`) where the interval `split the world` has a distance of 2. So the interval `split the world` is considered better.
3) that have the highest count of ordered matches
> for example, if we have a query `split the world`, then the interval `the world split` has 2 ordered words where the interval `split the world` has 3. So the interval `split the world` is considered better.
## Cropping around the best matches interval
Before we were cropping around the interval without checking the context.
Now we are cropping around words in the same context as matching words.
This means that we will keep words that are farther from the matching words but are in the same phrase, than words that are nearer but separated by a dot.
> For instance, for the matching word `Split` the text:
`Natalie risk her future. Split The World is a book written by Emily Henry. I never read it.`
will be cropped like:
`…. Split The World is a book written by Emily Henry. …`
and not like:
`Natalie risk her future. Split The World is a book …`
Co-authored-by: ManyTheFish <many@meilisearch.com>
2022-04-19 11:42:32 +00:00
ManyTheFish
f1115e274f
Use Copy impl of FormatOption instead of clonning
2022-04-19 10:35:50 +02:00
ad hoc
dda28d7415
exclude excluded canditates from search result candidates
2022-04-13 12:10:35 +02:00
ad hoc
bbb6728d2f
add distinct attributes to cli
2022-04-13 12:10:35 +02:00
ManyTheFish
5809d3ae0d
Add first benchmarks on formatting
2022-04-12 16:31:58 +02:00
ManyTheFish
827cedcd15
Add format option structure
2022-04-12 13:42:14 +02:00
ManyTheFish
011f8210ed
Make compute_matches more rust idiomatic
2022-04-12 10:19:02 +02:00
ManyTheFish
a16de5de84
Symplify format and remove intermediate function
2022-04-08 11:20:41 +02:00
ManyTheFish
a769e09dfa
Make token_crop_bounds more rust idiomatic
2022-04-07 20:15:14 +02:00
ManyTheFish
c8ed1675a7
Add some documentation
2022-04-07 17:32:13 +02:00
ManyTheFish
b1905dfa24
Make split_best_frequency returns references instead of owned data
2022-04-07 17:05:44 +02:00
Irevoire
4f3ce6d9cd
nested fields
2022-04-07 16:58:46 +02:00
ManyTheFish
fa7d3a37c0
Make some cleaning and add comments
2022-04-05 17:48:56 +02:00
ManyTheFish
3bb1e35ada
Fix match count
2022-04-05 17:48:45 +02:00
ManyTheFish
56e0edd621
Put crop markers direclty around words
2022-04-05 17:41:32 +02:00
ManyTheFish
a93cd8c61c
Fix prefix highlight with special chars
2022-04-05 17:41:32 +02:00
ManyTheFish
b3f0f39106
Make some cleaning
2022-04-05 17:41:32 +02:00
ManyTheFish
6dc345bc53
Test and Fix prefix highlight
2022-04-05 17:41:32 +02:00
ManyTheFish
bd30ee97b8
Keep separators at start of the croped string
2022-04-05 17:41:32 +02:00
ManyTheFish
29c5f76d7f
Use new matcher in http-ui
2022-04-05 17:41:32 +02:00
ManyTheFish
734d0899d3
Publish Matcher
2022-04-05 17:41:32 +02:00
ManyTheFish
4428cb5909
Add some tests and fix some corner cases
2022-04-05 17:41:32 +02:00
ManyTheFish
844f546a8b
Add matches algorithm V1
2022-04-05 17:41:32 +02:00
ManyTheFish
3be1790803
Add crop algorithm with naive match algorithm
2022-04-05 17:41:32 +02:00
ManyTheFish
d96e72e5dc
Create formater with some tests
2022-04-05 17:41:32 +02:00
ad hoc
6b2c2509b2
fix bug in exact search
2022-04-04 20:54:03 +02:00
ad hoc
56b4f5dce2
add exact prefix to query_docids
2022-04-04 20:54:03 +02:00
ad hoc
21ae4143b1
add exact_word_prefix to Context
2022-04-04 20:54:03 +02:00
ad hoc
c4c6e35352
query exact_word_docids in resolve_query_tree
2022-04-04 20:54:02 +02:00
ad hoc
c882d8daf0
add test for exact words
2022-04-04 20:54:01 +02:00
ad hoc
7e9d56a9e7
disable typos on exact words
2022-04-04 20:54:01 +02:00
ad hoc
0fd55db21c
fmt
2022-04-04 20:10:55 +02:00
ad hoc
559e46be5e
fix bad rebase bug
2022-04-04 20:10:55 +02:00
ad hoc
8b1e5d9c6d
add test for exact words
2022-04-04 20:10:55 +02:00
ad hoc
774fa8f065
disable typos on exact words
2022-04-04 20:10:55 +02:00
ad hoc
853b4a520f
fmt
2022-04-04 10:41:46 +02:00
ad hoc
fdaf45aab2
replace hardcoded value with constant in TestContext
2022-04-04 10:41:46 +02:00
ad hoc
950a740bd4
refactor typos for readability
2022-04-04 10:41:46 +02:00
ad hoc
66020cd923
rename min_word_len* to use plain letter numbers
2022-04-04 10:41:46 +02:00
ad hoc
286dd7b2e4
rename min_word_len_2_typo
2022-04-01 11:17:03 +02:00
ad hoc
55af85db3c
add tests for min_word_len_for_typo
2022-04-01 11:17:02 +02:00
ad hoc
a1a3a49bc9
dynamic minimum word len for typos in query tree builder
2022-04-01 11:17:02 +02:00
ad hoc
9fe40df960
add word derivations tests
2022-04-01 11:05:18 +02:00
ad hoc
d5ddc6b080
fix 2 typos word derivation bug
2022-04-01 10:51:22 +02:00
ad hoc
6ef3bb9d83
fmt
2022-03-31 14:06:23 +02:00
ad hoc
f782fe2062
add authorize_typo_test
2022-03-31 10:08:39 +02:00
ad hoc
c4653347fd
add authorize typo setting
2022-03-31 10:05:44 +02:00
bors[bot]
90276d9a2d
Merge #472
...
472: Remove useless variables in proximity r=Kerollmops a=ManyTheFish
Was passing by plane sweep algorithm to find some inspiration, and I discover that we have useless variables that were not detected because of the recursive function.
Co-authored-by: ManyTheFish <many@meilisearch.com>
2022-03-16 15:33:11 +00:00
ManyTheFish
49d59d88c2
Remove useless variables in proximity
2022-03-16 16:12:52 +01:00
Bruno Casali
adc71742c8
Move string concat to the struct instead of in the calling
2022-03-16 10:26:12 -03:00
Bruno Casali
4822fe1beb
Add a better error message when the filterable attrs are empty
...
Fixes https://github.com/meilisearch/meilisearch/issues/2140
2022-03-15 18:13:59 -03:00
bors[bot]
ad4c982c68
Merge #439
...
439: Optimize typo criterion r=Kerollmops a=MarinPostma
This pr implements a couple of optimization for the typo criterion:
- clamp max typo on concatenated query words to 1: By considering that a concatenated query word is a typo, we clamp the max number of typos allowed o it to 1. This is useful because we noticed that concatenated query words often introduced words with 2 typos in queries that otherwise didn't allow for 2 typo words.
- Make typos on the first letter count for 2. This change is a big performance gain: by considering the typos on the first letter to count as 2 typos, we drastically restrict the search space for 1 typo, and if we reach 2 typos, the search space is reduced as well, as we only consider: (2 typos ∩ correct first letter) ∪ (wrong first letter ∩ 1 typo) instead of 2 typos anywhere in the word.
## benches
```
group main typo
----- ---- ----
smol-songs.csv: asc + default/Notstandskomitee 2.51 5.8±0.01ms ? ?/sec 1.00 2.3±0.01ms ? ?/sec
smol-songs.csv: asc + default/charles 2.48 3.0±0.01ms ? ?/sec 1.00 1190.9±1.29µs ? ?/sec
smol-songs.csv: asc + default/charles mingus 5.56 10.8±0.01ms ? ?/sec 1.00 1935.3±1.00µs ? ?/sec
smol-songs.csv: asc + default/david 1.65 3.9±0.00ms ? ?/sec 1.00 2.4±0.01ms ? ?/sec
smol-songs.csv: asc + default/david bowie 3.34 12.5±0.02ms ? ?/sec 1.00 3.7±0.00ms ? ?/sec
smol-songs.csv: asc + default/john 1.00 1849.7±3.74µs ? ?/sec 1.01 1875.1±4.65µs ? ?/sec
smol-songs.csv: asc + default/marcus miller 4.32 15.7±0.01ms ? ?/sec 1.00 3.6±0.01ms ? ?/sec
smol-songs.csv: asc + default/michael jackson 3.31 12.5±0.01ms ? ?/sec 1.00 3.8±0.00ms ? ?/sec
smol-songs.csv: asc + default/tamo 1.05 565.4±0.86µs ? ?/sec 1.00 539.3±1.22µs ? ?/sec
smol-songs.csv: asc + default/thelonious monk 3.49 11.5±0.01ms ? ?/sec 1.00 3.3±0.00ms ? ?/sec
smol-songs.csv: asc/Notstandskomitee 2.59 5.6±0.02ms ? ?/sec 1.00 2.2±0.01ms ? ?/sec
smol-songs.csv: asc/charles 6.05 2.1±0.00ms ? ?/sec 1.00 347.8±0.60µs ? ?/sec
smol-songs.csv: asc/charles mingus 14.46 9.4±0.01ms ? ?/sec 1.00 649.2±0.97µs ? ?/sec
smol-songs.csv: asc/david 3.87 2.4±0.00ms ? ?/sec 1.00 618.2±0.69µs ? ?/sec
smol-songs.csv: asc/david bowie 10.14 9.8±0.01ms ? ?/sec 1.00 970.8±1.55µs ? ?/sec
smol-songs.csv: asc/john 1.00 546.5±1.10µs ? ?/sec 1.00 547.1±2.11µs ? ?/sec
smol-songs.csv: asc/marcus miller 11.45 10.4±0.06ms ? ?/sec 1.00 907.9±1.37µs ? ?/sec
smol-songs.csv: asc/michael jackson 10.56 9.7±0.01ms ? ?/sec 1.00 919.6±1.03µs ? ?/sec
smol-songs.csv: asc/tamo 1.03 43.3±0.18µs ? ?/sec 1.00 42.2±0.23µs ? ?/sec
smol-songs.csv: asc/thelonious monk 4.16 10.7±0.02ms ? ?/sec 1.00 2.6±0.00ms ? ?/sec
smol-songs.csv: basic filter: <=/Notstandskomitee 1.00 95.7±0.20µs ? ?/sec 1.15 109.6±10.40µs ? ?/sec
smol-songs.csv: basic filter: <=/charles 1.00 27.8±0.15µs ? ?/sec 1.01 27.9±0.18µs ? ?/sec
smol-songs.csv: basic filter: <=/charles mingus 1.72 119.2±0.67µs ? ?/sec 1.00 69.1±0.13µs ? ?/sec
smol-songs.csv: basic filter: <=/david 1.00 22.3±0.33µs ? ?/sec 1.05 23.4±0.19µs ? ?/sec
smol-songs.csv: basic filter: <=/david bowie 1.59 86.9±0.79µs ? ?/sec 1.00 54.5±0.31µs ? ?/sec
smol-songs.csv: basic filter: <=/john 1.00 17.9±0.06µs ? ?/sec 1.06 18.9±0.15µs ? ?/sec
smol-songs.csv: basic filter: <=/marcus miller 1.65 102.7±1.63µs ? ?/sec 1.00 62.3±0.18µs ? ?/sec
smol-songs.csv: basic filter: <=/michael jackson 1.76 128.2±1.85µs ? ?/sec 1.00 72.9±0.19µs ? ?/sec
smol-songs.csv: basic filter: <=/tamo 1.00 17.9±0.13µs ? ?/sec 1.05 18.7±0.20µs ? ?/sec
smol-songs.csv: basic filter: <=/thelonious monk 1.53 157.5±2.38µs ? ?/sec 1.00 102.8±0.88µs ? ?/sec
smol-songs.csv: basic filter: TO/Notstandskomitee 1.00 100.9±4.36µs ? ?/sec 1.04 105.0±8.25µs ? ?/sec
smol-songs.csv: basic filter: TO/charles 1.00 28.4±0.36µs ? ?/sec 1.03 29.4±0.33µs ? ?/sec
smol-songs.csv: basic filter: TO/charles mingus 1.71 118.1±1.08µs ? ?/sec 1.00 68.9±0.26µs ? ?/sec
smol-songs.csv: basic filter: TO/david 1.00 24.0±0.26µs ? ?/sec 1.03 24.6±0.43µs ? ?/sec
smol-songs.csv: basic filter: TO/david bowie 1.72 95.2±0.30µs ? ?/sec 1.00 55.2±0.14µs ? ?/sec
smol-songs.csv: basic filter: TO/john 1.00 18.8±0.09µs ? ?/sec 1.06 19.8±0.17µs ? ?/sec
smol-songs.csv: basic filter: TO/marcus miller 1.61 102.4±1.65µs ? ?/sec 1.00 63.4±0.24µs ? ?/sec
smol-songs.csv: basic filter: TO/michael jackson 1.77 132.1±1.41µs ? ?/sec 1.00 74.5±0.59µs ? ?/sec
smol-songs.csv: basic filter: TO/tamo 1.00 18.2±0.14µs ? ?/sec 1.05 19.2±0.46µs ? ?/sec
smol-songs.csv: basic filter: TO/thelonious monk 1.49 150.8±1.92µs ? ?/sec 1.00 101.3±0.44µs ? ?/sec
smol-songs.csv: basic placeholder/ 1.00 27.3±0.07µs ? ?/sec 1.03 28.0±0.05µs ? ?/sec
smol-songs.csv: basic with quote/"Notstandskomitee" 1.00 122.4±0.17µs ? ?/sec 1.03 125.6±0.16µs ? ?/sec
smol-songs.csv: basic with quote/"charles" 1.00 88.8±0.30µs ? ?/sec 1.00 88.4±0.15µs ? ?/sec
smol-songs.csv: basic with quote/"charles" "mingus" 1.00 685.2±0.74µs ? ?/sec 1.01 689.4±6.07µs ? ?/sec
smol-songs.csv: basic with quote/"david" 1.00 161.6±0.42µs ? ?/sec 1.01 162.6±0.17µs ? ?/sec
smol-songs.csv: basic with quote/"david" "bowie" 1.00 731.7±0.73µs ? ?/sec 1.02 743.1±0.77µs ? ?/sec
smol-songs.csv: basic with quote/"john" 1.00 267.1±0.33µs ? ?/sec 1.01 270.9±0.33µs ? ?/sec
smol-songs.csv: basic with quote/"marcus" "miller" 1.00 138.7±0.31µs ? ?/sec 1.02 140.9±0.13µs ? ?/sec
smol-songs.csv: basic with quote/"michael" "jackson" 1.01 841.4±0.72µs ? ?/sec 1.00 833.8±0.92µs ? ?/sec
smol-songs.csv: basic with quote/"tamo" 1.01 189.2±0.26µs ? ?/sec 1.00 188.2±0.71µs ? ?/sec
smol-songs.csv: basic with quote/"thelonious" "monk" 1.00 1100.5±1.36µs ? ?/sec 1.01 1111.7±2.17µs ? ?/sec
smol-songs.csv: basic without quote/Notstandskomitee 3.40 7.9±0.02ms ? ?/sec 1.00 2.3±0.02ms ? ?/sec
smol-songs.csv: basic without quote/charles 2.57 494.4±0.89µs ? ?/sec 1.00 192.5±0.18µs ? ?/sec
smol-songs.csv: basic without quote/charles mingus 1.29 2.8±0.02ms ? ?/sec 1.00 2.1±0.01ms ? ?/sec
smol-songs.csv: basic without quote/david 1.95 623.8±0.90µs ? ?/sec 1.00 319.2±1.22µs ? ?/sec
smol-songs.csv: basic without quote/david bowie 1.12 5.9±0.00ms ? ?/sec 1.00 5.2±0.00ms ? ?/sec
smol-songs.csv: basic without quote/john 1.24 1340.9±2.25µs ? ?/sec 1.00 1084.7±7.76µs ? ?/sec
smol-songs.csv: basic without quote/marcus miller 7.97 14.6±0.01ms ? ?/sec 1.00 1826.0±6.84µs ? ?/sec
smol-songs.csv: basic without quote/michael jackson 1.19 3.9±0.00ms ? ?/sec 1.00 3.3±0.00ms ? ?/sec
smol-songs.csv: basic without quote/tamo 1.65 737.7±3.58µs ? ?/sec 1.00 446.7±0.51µs ? ?/sec
smol-songs.csv: basic without quote/thelonious monk 1.16 4.5±0.02ms ? ?/sec 1.00 3.9±0.04ms ? ?/sec
smol-songs.csv: big filter/Notstandskomitee 3.27 7.6±0.02ms ? ?/sec 1.00 2.3±0.01ms ? ?/sec
smol-songs.csv: big filter/charles 8.26 1957.5±1.37µs ? ?/sec 1.00 236.8±0.34µs ? ?/sec
smol-songs.csv: big filter/charles mingus 18.49 11.2±0.06ms ? ?/sec 1.00 607.7±3.03µs ? ?/sec
smol-songs.csv: big filter/david 3.78 2.4±0.00ms ? ?/sec 1.00 622.8±0.80µs ? ?/sec
smol-songs.csv: big filter/david bowie 9.00 12.0±0.01ms ? ?/sec 1.00 1336.0±3.17µs ? ?/sec
smol-songs.csv: big filter/john 1.00 554.2±0.95µs ? ?/sec 1.01 560.4±0.79µs ? ?/sec
smol-songs.csv: big filter/marcus miller 18.09 12.0±0.01ms ? ?/sec 1.00 664.7±0.60µs ? ?/sec
smol-songs.csv: big filter/michael jackson 8.43 12.0±0.01ms ? ?/sec 1.00 1421.6±1.37µs ? ?/sec
smol-songs.csv: big filter/tamo 1.00 86.3±0.14µs ? ?/sec 1.01 87.3±0.21µs ? ?/sec
smol-songs.csv: big filter/thelonious monk 5.55 14.3±0.02ms ? ?/sec 1.00 2.6±0.01ms ? ?/sec
smol-songs.csv: desc + default/Notstandskomitee 2.52 5.8±0.01ms ? ?/sec 1.00 2.3±0.01ms ? ?/sec
smol-songs.csv: desc + default/charles 3.04 2.7±0.01ms ? ?/sec 1.00 893.4±1.08µs ? ?/sec
smol-songs.csv: desc + default/charles mingus 6.77 10.3±0.01ms ? ?/sec 1.00 1520.8±1.90µs ? ?/sec
smol-songs.csv: desc + default/david 1.39 5.7±0.00ms ? ?/sec 1.00 4.1±0.00ms ? ?/sec
smol-songs.csv: desc + default/david bowie 2.34 15.8±0.02ms ? ?/sec 1.00 6.7±0.01ms ? ?/sec
smol-songs.csv: desc + default/john 1.00 2.5±0.00ms ? ?/sec 1.02 2.6±0.01ms ? ?/sec
smol-songs.csv: desc + default/marcus miller 5.06 14.5±0.02ms ? ?/sec 1.00 2.9±0.01ms ? ?/sec
smol-songs.csv: desc + default/michael jackson 2.64 14.1±0.05ms ? ?/sec 1.00 5.4±0.00ms ? ?/sec
smol-songs.csv: desc + default/tamo 1.00 567.0±0.65µs ? ?/sec 1.00 565.7±0.97µs ? ?/sec
smol-songs.csv: desc + default/thelonious monk 3.55 11.6±0.02ms ? ?/sec 1.00 3.3±0.00ms ? ?/sec
smol-songs.csv: desc/Notstandskomitee 2.58 5.6±0.02ms ? ?/sec 1.00 2.2±0.02ms ? ?/sec
smol-songs.csv: desc/charles 6.04 2.1±0.00ms ? ?/sec 1.00 348.1±0.57µs ? ?/sec
smol-songs.csv: desc/charles mingus 14.51 9.4±0.01ms ? ?/sec 1.00 646.7±0.99µs ? ?/sec
smol-songs.csv: desc/david 3.86 2.4±0.00ms ? ?/sec 1.00 620.7±2.46µs ? ?/sec
smol-songs.csv: desc/david bowie 10.10 9.8±0.01ms ? ?/sec 1.00 973.9±3.31µs ? ?/sec
smol-songs.csv: desc/john 1.00 545.5±0.78µs ? ?/sec 1.00 547.2±0.48µs ? ?/sec
smol-songs.csv: desc/marcus miller 11.39 10.3±0.01ms ? ?/sec 1.00 903.7±0.95µs ? ?/sec
smol-songs.csv: desc/michael jackson 10.51 9.7±0.01ms ? ?/sec 1.00 924.7±2.02µs ? ?/sec
smol-songs.csv: desc/tamo 1.01 43.2±0.33µs ? ?/sec 1.00 42.6±0.35µs ? ?/sec
smol-songs.csv: desc/thelonious monk 4.19 10.8±0.03ms ? ?/sec 1.00 2.6±0.00ms ? ?/sec
smol-songs.csv: prefix search/a 1.00 1008.7±1.00µs ? ?/sec 1.00 1005.5±0.91µs ? ?/sec
smol-songs.csv: prefix search/b 1.00 885.0±0.70µs ? ?/sec 1.01 890.6±1.11µs ? ?/sec
smol-songs.csv: prefix search/i 1.00 1051.8±1.25µs ? ?/sec 1.00 1056.6±4.12µs ? ?/sec
smol-songs.csv: prefix search/s 1.00 724.7±1.77µs ? ?/sec 1.00 721.6±0.59µs ? ?/sec
smol-songs.csv: prefix search/x 1.01 212.4±0.21µs ? ?/sec 1.00 210.9±0.38µs ? ?/sec
smol-songs.csv: proximity/7000 Danses Un Jour Dans Notre Vie 18.55 48.5±0.09ms ? ?/sec 1.00 2.6±0.03ms ? ?/sec
smol-songs.csv: proximity/The Disneyland Sing-Along Chorus 8.41 56.7±0.45ms ? ?/sec 1.00 6.7±0.05ms ? ?/sec
smol-songs.csv: proximity/Under Great Northern Lights 15.74 38.9±0.14ms ? ?/sec 1.00 2.5±0.00ms ? ?/sec
smol-songs.csv: proximity/black saint sinner lady 11.82 40.1±0.13ms ? ?/sec 1.00 3.4±0.02ms ? ?/sec
smol-songs.csv: proximity/les dangeureuses 1960 6.90 26.1±0.13ms ? ?/sec 1.00 3.8±0.04ms ? ?/sec
smol-songs.csv: typo/Arethla Franklin 14.93 5.8±0.01ms ? ?/sec 1.00 390.1±1.89µs ? ?/sec
smol-songs.csv: typo/Disnaylande 3.18 7.3±0.01ms ? ?/sec 1.00 2.3±0.00ms ? ?/sec
smol-songs.csv: typo/dire straights 5.55 15.2±0.02ms ? ?/sec 1.00 2.7±0.00ms ? ?/sec
smol-songs.csv: typo/fear of the duck 28.03 20.0±0.03ms ? ?/sec 1.00 713.3±1.54µs ? ?/sec
smol-songs.csv: typo/indochie 19.25 1851.4±2.38µs ? ?/sec 1.00 96.2±0.13µs ? ?/sec
smol-songs.csv: typo/indochien 14.66 1887.7±3.18µs ? ?/sec 1.00 128.8±0.18µs ? ?/sec
smol-songs.csv: typo/klub des loopers 37.73 18.0±0.02ms ? ?/sec 1.00 476.7±0.73µs ? ?/sec
smol-songs.csv: typo/michel depech 10.17 5.8±0.01ms ? ?/sec 1.00 565.8±1.16µs ? ?/sec
smol-songs.csv: typo/mongus 15.33 1897.4±3.44µs ? ?/sec 1.00 123.8±0.13µs ? ?/sec
smol-songs.csv: typo/stromal 14.63 1859.3±2.40µs ? ?/sec 1.00 127.1±0.29µs ? ?/sec
smol-songs.csv: typo/the white striper 10.83 9.4±0.01ms ? ?/sec 1.00 866.0±0.98µs ? ?/sec
smol-songs.csv: typo/thelonius monk 14.40 3.8±0.00ms ? ?/sec 1.00 261.5±1.30µs ? ?/sec
smol-songs.csv: words/7000 Danses / Le Baiser / je me trompe de mots 5.54 70.8±0.09ms ? ?/sec 1.00 12.8±0.03ms ? ?/sec
smol-songs.csv: words/Bring Your Daughter To The Slaughter but now this is not part of the title 3.48 119.8±0.14ms ? ?/sec 1.00 34.4±0.04ms ? ?/sec
smol-songs.csv: words/The Disneyland Children's Sing-Alone song 8.98 71.9±0.12ms ? ?/sec 1.00 8.0±0.01ms ? ?/sec
smol-songs.csv: words/les liaisons dangeureuses 1793 11.88 37.4±0.07ms ? ?/sec 1.00 3.1±0.01ms ? ?/sec
smol-songs.csv: words/seven nation mummy 22.86 23.4±0.04ms ? ?/sec 1.00 1024.8±1.57µs ? ?/sec
smol-songs.csv: words/the black saint and the sinner lady and the good doggo 2.76 124.4±0.15ms ? ?/sec 1.00 45.1±0.09ms ? ?/sec
smol-songs.csv: words/whathavenotnsuchforth and a good amount of words to pop to match the first one 2.52 107.0±0.23ms ? ?/sec 1.00 42.4±0.66ms ? ?/sec
group main-wiki typo-wiki
----- --------- ---------
smol-wiki-articles.csv: basic placeholder/ 1.02 13.7±0.02µs ? ?/sec 1.00 13.4±0.03µs ? ?/sec
smol-wiki-articles.csv: basic with quote/"film" 1.02 409.8±0.67µs ? ?/sec 1.00 402.6±0.48µs ? ?/sec
smol-wiki-articles.csv: basic with quote/"france" 1.00 325.9±0.91µs ? ?/sec 1.00 326.4±0.49µs ? ?/sec
smol-wiki-articles.csv: basic with quote/"japan" 1.00 218.4±0.26µs ? ?/sec 1.01 220.5±0.20µs ? ?/sec
smol-wiki-articles.csv: basic with quote/"machine" 1.00 143.0±0.12µs ? ?/sec 1.04 148.8±0.21µs ? ?/sec
smol-wiki-articles.csv: basic with quote/"miles" "davis" 1.00 11.7±0.06ms ? ?/sec 1.00 11.8±0.01ms ? ?/sec
smol-wiki-articles.csv: basic with quote/"mingus" 1.00 4.4±0.03ms ? ?/sec 1.00 4.4±0.00ms ? ?/sec
smol-wiki-articles.csv: basic with quote/"rock" "and" "roll" 1.00 43.5±0.08ms ? ?/sec 1.01 43.8±0.06ms ? ?/sec
smol-wiki-articles.csv: basic with quote/"spain" 1.00 137.3±0.35µs ? ?/sec 1.05 144.4±0.23µs ? ?/sec
smol-wiki-articles.csv: basic without quote/film 1.00 125.3±0.30µs ? ?/sec 1.06 133.1±0.37µs ? ?/sec
smol-wiki-articles.csv: basic without quote/france 1.21 1782.6±1.65µs ? ?/sec 1.00 1477.0±1.39µs ? ?/sec
smol-wiki-articles.csv: basic without quote/japan 1.28 1363.9±0.80µs ? ?/sec 1.00 1064.3±1.79µs ? ?/sec
smol-wiki-articles.csv: basic without quote/machine 1.73 760.3±0.81µs ? ?/sec 1.00 439.6±0.75µs ? ?/sec
smol-wiki-articles.csv: basic without quote/miles davis 1.03 17.0±0.03ms ? ?/sec 1.00 16.5±0.02ms ? ?/sec
smol-wiki-articles.csv: basic without quote/mingus 1.07 5.3±0.01ms ? ?/sec 1.00 5.0±0.00ms ? ?/sec
smol-wiki-articles.csv: basic without quote/rock and roll 1.01 63.9±0.18ms ? ?/sec 1.00 63.0±0.07ms ? ?/sec
smol-wiki-articles.csv: basic without quote/spain 2.07 667.4±0.93µs ? ?/sec 1.00 322.8±0.29µs ? ?/sec
smol-wiki-articles.csv: prefix search/c 1.00 343.1±0.47µs ? ?/sec 1.00 344.0±0.34µs ? ?/sec
smol-wiki-articles.csv: prefix search/g 1.00 374.4±3.42µs ? ?/sec 1.00 374.1±0.44µs ? ?/sec
smol-wiki-articles.csv: prefix search/j 1.00 359.9±0.31µs ? ?/sec 1.00 361.2±0.79µs ? ?/sec
smol-wiki-articles.csv: prefix search/q 1.01 102.0±0.12µs ? ?/sec 1.00 101.4±0.32µs ? ?/sec
smol-wiki-articles.csv: prefix search/t 1.00 536.7±1.39µs ? ?/sec 1.00 534.3±0.84µs ? ?/sec
smol-wiki-articles.csv: prefix search/x 1.00 400.9±1.00µs ? ?/sec 1.00 399.5±0.45µs ? ?/sec
smol-wiki-articles.csv: proximity/april paris 3.86 14.4±0.01ms ? ?/sec 1.00 3.7±0.01ms ? ?/sec
smol-wiki-articles.csv: proximity/diesel engine 12.98 10.4±0.01ms ? ?/sec 1.00 803.5±1.13µs ? ?/sec
smol-wiki-articles.csv: proximity/herald sings 1.00 12.7±0.06ms ? ?/sec 5.29 67.1±0.09ms ? ?/sec
smol-wiki-articles.csv: proximity/tea two 6.48 1452.1±2.78µs ? ?/sec 1.00 224.1±0.38µs ? ?/sec
smol-wiki-articles.csv: typo/Disnaylande 3.89 8.5±0.01ms ? ?/sec 1.00 2.2±0.01ms ? ?/sec
smol-wiki-articles.csv: typo/aritmetric 3.78 10.3±0.01ms ? ?/sec 1.00 2.7±0.00ms ? ?/sec
smol-wiki-articles.csv: typo/linax 8.91 1426.7±0.97µs ? ?/sec 1.00 160.1±0.18µs ? ?/sec
smol-wiki-articles.csv: typo/migrosoft 7.48 1417.3±5.84µs ? ?/sec 1.00 189.5±0.88µs ? ?/sec
smol-wiki-articles.csv: typo/nympalidea 3.96 7.2±0.01ms ? ?/sec 1.00 1810.1±2.03µs ? ?/sec
smol-wiki-articles.csv: typo/phytogropher 3.71 7.2±0.01ms ? ?/sec 1.00 1934.3±6.51µs ? ?/sec
smol-wiki-articles.csv: typo/sisan 6.44 1497.2±1.38µs ? ?/sec 1.00 232.7±0.94µs ? ?/sec
smol-wiki-articles.csv: typo/the fronce 6.92 2.9±0.00ms ? ?/sec 1.00 418.0±1.76µs ? ?/sec
smol-wiki-articles.csv: words/Abraham machin 16.63 10.8±0.01ms ? ?/sec 1.00 649.7±1.08µs ? ?/sec
smol-wiki-articles.csv: words/Idaho Bellevue pizza 27.15 25.6±0.03ms ? ?/sec 1.00 944.2±5.07µs ? ?/sec
smol-wiki-articles.csv: words/Kameya Tokujirō mingus monk 26.87 40.7±0.05ms ? ?/sec 1.00 1515.3±2.73µs ? ?/sec
smol-wiki-articles.csv: words/Ulrich Hensel meilisearch milli 11.99 48.8±0.10ms ? ?/sec 1.00 4.1±0.02ms ? ?/sec
smol-wiki-articles.csv: words/the black saint and the sinner lady and the good doggo 4.90 110.0±0.15ms ? ?/sec 1.00 22.4±0.03ms ? ?/sec
```
Co-authored-by: mpostma <postma.marin@protonmail.com>
Co-authored-by: ad hoc <postma.marin@protonmail.com>
2022-03-15 16:43:36 +00:00
ad hoc
3f24555c3d
custom fst automatons
2022-03-15 17:38:35 +01:00
ad hoc
628c835a22
fix tests
2022-03-15 17:38:34 +01:00
Kerollmops
21ec334dcc
Fix the compilation error of the dependency versions
2022-03-15 11:17:45 +01:00
ad hoc
13de251047
rewrite word pair distance gathering
2022-02-03 15:57:20 +01:00
mpostma
7541ab99cd
review changes
2022-02-02 12:59:01 +01:00
mpostma
d0aabde502
optimize 2 typos case
2022-02-02 12:56:09 +01:00
mpostma
55e6cb9c7b
typos on first letter counts as 2
2022-02-02 12:56:09 +01:00
mpostma
642c01d0dc
set max typos on ngram to 1
2022-02-02 12:56:08 +01:00
ad hoc
d852dc0d2b
fix phrase search
2022-02-01 20:21:33 +01:00
Marin Postma
0c84a40298
document batch support
...
reusable transform
rework update api
add indexer config
fix tests
review changes
Co-authored-by: Clément Renault <clement@meilisearch.com>
fmt
2022-01-19 12:40:20 +01:00
Tamo
01968d7ca7
ensure we get no documents and no error when filtering on an empty db
2022-01-18 11:40:30 +01:00
bors[bot]
8f4499090b
Merge #433
...
433: fix(filter): Fix two bugs. r=Kerollmops a=irevoire
- Stop lowercasing the field when looking in the field id map
- When a field id does not exist it means there is currently zero
documents containing this field thus we return an empty RoaringBitmap
instead of throwing an internal error
Will fix https://github.com/meilisearch/MeiliSearch/issues/2082 once meilisearch is released
Co-authored-by: Tamo <tamo@meilisearch.com>
2022-01-17 14:06:53 +00:00
Tamo
d1ac40ea14
fix(filter): Fix two bugs.
...
- Stop lowercasing the field when looking in the field id map
- When a field id does not exist it means there is currently zero
documents containing this field thus we returns an empty RoaringBitmap
instead of throwing an internal error
2022-01-17 13:51:46 +01:00
Samyak S Sarnayak
2d7607734e
Run cargo fmt on matching_words.rs
2022-01-17 13:04:33 +05:30
Samyak S Sarnayak
5ab505be33
Fix highlight by replacing num_graphemes_from_bytes
...
num_graphemes_from_bytes has been renamed in the tokenizer to
num_chars_from_bytes.
Highlight now works correctly!
2022-01-17 13:02:55 +05:30
Samyak S Sarnayak
e752bd06f7
Fix matching_words tests to compile successfully
...
The tests still fail due to a bug in https://github.com/meilisearch/tokenizer/pull/59
2022-01-17 11:37:45 +05:30
Samyak S Sarnayak
30247d70cd
Fix search highlight for non-unicode chars
...
The `matching_bytes` function takes a `&Token` now and:
- gets the number of bytes to highlight (unchanged).
- uses `Token.num_graphemes_from_bytes` to get the number of grapheme
clusters to highlight.
In essence, the `matching_bytes` function returns the number of matching
grapheme clusters instead of bytes. Should this function be renamed
then?
Added proper highlighting in the HTTP UI:
- requires dependency on `unicode-segmentation` to extract grapheme
clusters from tokens
- `<mark>` tag is put around only the matched part
- before this change, the entire word was highlighted even if only a
part of it matched
2022-01-17 11:37:44 +05:30
Tamo
98a365aaae
store the geopoint in three dimensions
2021-12-14 12:21:24 +01:00
Clément Renault
25faef67d0
Remove the database setup in the filter_depth test
2021-12-09 11:57:53 +01:00
Clément Renault
65519bc04b
Test that empty filters return a None
2021-12-09 11:57:53 +01:00
Clément Renault
ef59762d8e
Prefer returning None instead of the Empty Filter state
2021-12-09 11:57:52 +01:00
Clément Renault
ee856a7a46
Limit the max filter depth to 2000
2021-12-07 17:36:45 +01:00
Clément Renault
32bd9f091f
Detect the filters that are too deep and return an error
2021-12-07 17:20:11 +01:00
Clément Renault
90f49eab6d
Check the filter max depth limit and reject the invalid ones
2021-12-07 16:32:48 +01:00
Marin Postma
6eb47ab792
remove update_id in UpdateBuilder
2021-11-16 13:07:04 +01:00
Irevoire
0ea0146e04
implement deref &str on the tokens
2021-11-09 11:34:10 +01:00
Tamo
7483c7513a
fix the filterable fields
2021-11-07 01:52:19 +01:00
Tamo
e5af3ac65c
rename the filter_condition.rs to filter.rs
2021-11-06 16:37:55 +01:00
Tamo
6831c23449
merge with main
2021-11-06 16:34:30 +01:00
Tamo
b249989bef
fix most of the tests
2021-11-06 01:32:12 +01:00
Tamo
27a6a26b4b
makes the parse function part of the filter_parser
2021-11-05 10:46:54 +01:00
Tamo
76d961cc77
implements the last errors
2021-11-04 17:42:06 +01:00
Tamo
8234f9fdf3
recreate most filter error except for the geosearch
2021-11-04 17:24:55 +01:00
Tamo
07a5ffb04c
update http-ui
2021-11-04 15:52:22 +01:00
Tamo
a58bc5bebb
update milli with the new parser_filter
2021-11-04 15:02:36 +01:00
Tamo
76a2adb7c3
re-enable the tests in the parser and start the creation of an error type
2021-11-02 17:35:17 +01:00
many
ed6db19681
Fix PR comments
2021-10-28 11:18:32 +02:00
many
2be755ce75
Lower error check, already check in meilisearch
2021-10-27 19:50:41 +02:00
many
3599df77f0
Change some error messages
2021-10-27 19:33:01 +02:00
bors[bot]
d7943fe225
Merge #402
...
402: Optimize document transform r=MarinPostma a=MarinPostma
This pr optimizes the transform of documents additions in the obkv format. Instead on accepting any serializable objects, we instead treat json and CSV specifically:
- For json, we build a serde `Visitor`, that transform the json straight into obkv without intermediate representation.
- For csv, we directly write the lines in the obkv, applying other optimization as well.
Co-authored-by: marin postma <postma.marin@protonmail.com>
2021-10-26 09:55:28 +00:00
Clémentine Urquizar
208903ddde
Revert "Replacing pest with nom "
2021-10-25 11:58:00 +02:00
marin postma
2e62925a6e
fix tests
2021-10-25 10:26:42 +02:00
marin postma
8d70b01714
optimize document deserialization
2021-10-25 10:26:42 +02:00
Tamo
1327807caa
add some error messages
2021-10-22 19:00:33 +02:00
Tamo
c8d03046bf
add a check on the fid in the geosearch
2021-10-22 18:08:18 +02:00
Tamo
3942b3732f
re-implement the geosearch
2021-10-22 18:03:39 +02:00
Tamo
7cd9109e2f
lowercase value extracted from Token
2021-10-22 17:50:15 +02:00
Tamo
e25ca9776f
start updating the exposed function to makes other modules happy
2021-10-22 17:23:22 +02:00
Tamo
6c9165b6a8
provide a helper to parse the token but to not handle the errors
2021-10-22 16:52:13 +02:00
Tamo
efb2f8b325
convert the errors
2021-10-22 16:38:35 +02:00
Tamo
c27870e765
integrate a first version without any error handling
2021-10-22 14:33:18 +02:00
Tamo
01dedde1c9
update some names and move some parser out of the lib.rs
2021-10-22 01:59:38 +02:00
Tamo
c634d43ac5
add a simple test on the filters with an integer
2021-10-21 17:10:27 +02:00
Tamo
6c15f50899
rewrite the parser logic
2021-10-21 16:45:42 +02:00
Tamo
e1d81342cf
add test on the or and and operator
2021-10-21 13:01:25 +02:00
Tamo
423baac08b
fix the tests
2021-10-21 12:45:40 +02:00
Tamo
36281a653f
write all the simple tests
2021-10-21 12:40:11 +02:00
Tamo
661bc21af5
Fix the filter parser
...
And add a bunch of tests on the filter::from_array
2021-10-21 11:45:03 +02:00
bors[bot]
59cc59e93e
Merge #358
...
358: Replacing pest with nom r=Kerollmops a=CNLHC
Co-authored-by: 刘瀚骋 <cn_lhc@qq.com>
2021-10-16 20:44:38 +00:00
刘瀚骋
7666e4f34a
follow the suggestions
2021-10-14 21:37:59 +08:00
刘瀚骋
2ea2f7570c
use nightly cargo to format the code
2021-10-14 16:46:13 +08:00
刘瀚骋
e750465e15
check logic for geolocation.
2021-10-14 16:12:00 +08:00
刘瀚骋
cd359cd96e
WIP: extract the error trait bound to new trait.
2021-10-13 18:04:15 +08:00
刘瀚骋
5de5dd80a3
WIP: remove '_nom' suffix/redundant error enum/...
2021-10-13 11:06:15 +08:00
刘瀚骋
2c65781d91
format
2021-10-12 22:20:22 +08:00
many
360c5ff3df
Remove limit of 1000 position per attribute
...
Instead of using an arbitrary limit we encode the absolute position in a u32
using one strong u16 for the field id and a weak u16 for the relative position in the attribute.
2021-10-12 10:10:50 +02:00
刘瀚骋
d323e35001
add a test case
2021-10-12 13:30:40 +08:00
刘瀚骋
70f576d5d3
error handling
2021-10-12 13:30:40 +08:00
刘瀚骋
28f9be8d7c
support syntax
2021-10-12 13:30:40 +08:00
刘瀚骋
469d92c569
tweak error handling
2021-10-12 13:30:40 +08:00
刘瀚骋
7a90a101ee
reorganize parser logic
2021-10-12 13:30:40 +08:00
刘瀚骋
f7796edc7e
remove everything about pest
2021-10-12 13:30:40 +08:00
刘瀚骋
ac1df9d9d7
fix typo and remove pest
2021-10-12 13:30:40 +08:00
刘瀚骋
50ad750ec1
enhance error handling
2021-10-12 13:30:40 +08:00
刘瀚骋
8748df2ca4
draft without error handling
2021-10-12 13:30:40 +08:00
Tamo
11dfe38761
Update the check on the latitude and longitude
...
Latitude are not supposed to go beyound 90 degrees or below -90.
The same goes for longitude with 180 or -180.
This was badly implemented in the filters, and was not implemented for the AscDesc rules.
2021-10-07 16:10:43 +02:00
many
085bc6440c
Apply PR comments
2021-10-06 11:12:26 +02:00
many
1bd15d849b
Reduce candidates threshold
2021-10-05 18:52:14 +02:00
many
ea4bd29d14
Apply PR comments
2021-10-05 17:35:07 +02:00
many
3296bb243c
Simplify word level position DB into a word position DB
2021-10-05 12:15:02 +02:00
many
75d341d928
Re-implement set based algorithm for attribute criterion
2021-10-05 12:14:50 +02:00
Tamo
0ee67bb7d1
improve the reserved keyword error message for the filters
2021-09-30 14:38:27 +02:00
Many
2e49230ca2
Update milli/src/search/criteria/attribute.rs
...
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-09-29 14:49:45 +02:00
Many
7ad0214089
Update milli/src/search/criteria/attribute.rs
...
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-09-29 14:49:41 +02:00
many
1df5b8712b
Hotfix meilisearch#1707
2021-09-29 14:41:56 +02:00
many
8046ae4bd5
Count the number of char instead of counting bytes to assign the typo tolerance
2021-09-28 12:10:43 +02:00
Tamo
47ee93b0bd
return an error when _geoPoint is used but _geo is not sortable
2021-09-22 16:37:41 +02:00
Tamo
257e621d40
create an asc_desc module
2021-09-22 16:37:41 +02:00
mpostma
aa6c5df0bc
Implement documents format
...
document reader transform
remove update format
support document sequences
fix document transform
clean transform
improve error handling
add documents! macro
fix transform bug
fix tests
remove csv dependency
Add comments on the transform process
replace search cli
fmt
review edits
fix http ui
fix clippy warnings
Revert "fix clippy warnings"
This reverts commit a1ce3cd96e603633dbf43e9e0b12b2453c9c5620.
fix review comments
remove smallvec in transform loop
review edits
2021-09-21 16:58:33 +02:00
Tamo
c695a1ffd2
add the possibility to sort by descending order on geoPoint
2021-09-15 11:49:58 +02:00
Tamo
91ce4d1721
Stop iterating through the whole list of points
...
We stop when there is no possible candidates left
2021-09-15 11:49:58 +02:00
Tamo
3fc145c254
if we have no rtree we return all other provided documents
2021-09-09 17:44:09 +02:00
Irevoire
a84f3a8b31
Apply suggestions from code review
...
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-09-09 15:09:35 +02:00
Tamo
b15c77ebc4
return an error in case a user try to sort with :desc
2021-09-08 18:24:09 +02:00
Tamo
e5ef0cad9a
use meters in the filters
2021-09-08 18:24:09 +02:00
Tamo
4f69b190bc
remove the distance from the search, the computation of the distance will be made on meilisearch side
2021-09-08 18:24:09 +02:00
Tamo
7ae2a7341c
introduce the reserved keywords in the filters
2021-09-08 18:24:09 +02:00
Tamo
6d5762a6c8
handle the case where you forgot entirely the parenthesis
2021-09-08 18:24:09 +02:00
Tamo
ebf82ac28c
improve the error messages and add tests for the filters
2021-09-08 18:24:09 +02:00
Tamo
e8c093c1d0
fix the error handling in the filters
2021-09-08 18:24:09 +02:00
Tamo
b1bf7d4f40
reformat
2021-09-08 18:24:09 +02:00
Tamo
aca707413c
remove the memory leak
2021-09-08 18:24:09 +02:00
Tamo
a8a1f5bd55
move the geosearch criteria out of asc_desc.rs
2021-09-08 18:24:09 +02:00
Tamo
13c78e5aa2
Implement the _geoPoint in the sortable
2021-09-08 18:24:09 +02:00
Tamo
5bb175fc90
only index _geo if it's set as sortable OR filterable
...
and only allow the filters if geo was set to filterable
2021-09-08 17:51:08 +02:00
Irevoire
4b459768a0
create the _geoRadius filter
2021-09-08 17:51:07 +02:00
Irevoire
6d70978edc
update the facet filter grammar
2021-09-08 17:51:07 +02:00
Kerollmops
fd3daa4423
Throw a query time error when a sort param is used but sort ranking rule is missing
2021-09-07 11:02:00 +02:00
Alexey Shekhirin
c2517e7d5f
fix(facet): string fields sorting
2021-09-03 11:58:26 +03:00
bors[bot]
5cbe879325
Merge #308
...
308: Implement a better parallel indexer r=Kerollmops a=ManyTheFish
Rewrite the indexer:
- enhance memory consumption control
- optimize parallelism using rayon and crossbeam channel
- factorize the different parts and make new DB implementation easier
- optimize and fix prefix databases
Co-authored-by: many <maxime@meilisearch.com>
2021-09-02 15:03:52 +00:00
many
5c962c03dd
Fix and optimize word_prefix_pair_proximity_docids database
2021-09-01 16:48:40 +02:00
many
1d314328f0
Plug new indexer
2021-09-01 16:48:36 +02:00
Alexey Shekhirin
0e379558a1
fix(search): get sortable_fields only if criteria present
2021-08-31 21:35:41 +03:00
Clément Renault
89d0758713
Revert "Revert "Sort at query time""
2021-08-24 11:55:16 +02:00
Clémentine Urquizar
922f9fd4d5
Revert "Sort at query time"
2021-08-20 18:09:17 +02:00
Kerollmops
1b7f6ea1e7
Return a new error when the sort criteria is not sortable
2021-08-18 15:04:07 +02:00
Kerollmops
407f53872a
Add a sort_criteria method to the Search builder struct
2021-08-18 15:04:07 +02:00
Kerollmops
687cd2e205
Introduce the new Sort criterion and AscDesc enum
2021-08-18 15:04:07 +02:00
Kerollmops
e9ada44509
AscDesc criterion returns documents ordered by numbers then by strings
2021-08-17 13:21:31 +02:00
Kerollmops
110bf6b778
Make the FacetStringIter work in both, ascending and descending orders
2021-08-17 11:18:40 +02:00
Kerollmops
22ebd2658f
Introduce the EitherString/RevRange private aliases
2021-08-17 10:47:15 +02:00
Kerollmops
7a5889bc5a
Introduce the highest_reverse_iter private method
2021-08-17 10:45:26 +02:00
Kerollmops
ad0d311f8a
Introduce the FacetStringLevelZeroRevRange struct
2021-08-17 10:44:43 +02:00
Kerollmops
6214c38da9
Introduce the FacetStringGroupRevRange struct
2021-08-17 10:44:27 +02:00
Kerollmops
1c604de158
Introduce the highest_iter private method on the FacetStringIter struct
2021-08-17 10:41:11 +02:00
Kerollmops
64df159057
Introduce the new_reducing constructor on the FacetStringIter struct
2021-08-17 10:35:06 +02:00
Kerollmops
01a4052828
Move the FacetStringIter creation logic into a private new method
2021-08-17 10:29:43 +02:00
many
7dbefae1e3
Make facet string iterator non reducing
2021-08-12 17:23:39 +02:00
many
8fdf860c17
Remove max values by facet limit for facet distribution
2021-08-12 11:29:20 +02:00
Kerollmops
dc2b63abdf
Introduce an empty FilterCondition variant to support unknown fields
2021-07-27 16:34:04 +02:00
Kerollmops
7aa6cc9b04
Do not insert fields in the map when changing the settings
2021-07-22 18:40:12 +02:00
Clément Renault
0227254a65
Return the original string values for the inverted facet index database
2021-07-21 16:59:39 +02:00
Kerollmops
03a01166ba
Display the original facet string value from the linear facet database
2021-07-21 16:59:39 +02:00
Clément Renault
d23c250ad5
Fix a bound error in the facet string range construction
2021-07-21 16:59:39 +02:00
Clément Renault
081278dfd6
Use the facet string levels when computing the facet distribution
2021-07-21 16:59:39 +02:00
Kerollmops
8c86348119
Indexing the facet strings levels
2021-07-21 16:59:38 +02:00
Kerollmops
a7ae552ba7
Fix the FacetStringLevelZeroRange range when unbounded
2021-07-21 16:59:38 +02:00
Kerollmops
757b2b502a
Remove the FacetValueStringCodec
2021-07-21 16:59:38 +02:00
Kerollmops
adfd4da24c
Introduce the FacetStringIter iterator
2021-07-21 16:59:38 +02:00
Kerollmops
a79661c6dc
Introduce a lot of facet string helper iterators
2021-07-21 16:59:38 +02:00
Kerollmops
851f979039
Describe the way we want to group the facet strings
2021-07-21 16:59:38 +02:00
Kerollmops
f858f64b1f
Move the facet number iterators into their own module
2021-07-21 16:59:37 +02:00
Kerollmops
838ed1cd32
Use an u16 field id instead of one byte
2021-07-06 11:58:03 +02:00
many
9f62149b94
Fix matching lenghth in matching_words
2021-07-01 19:03:28 +02:00
Kerollmops
32b7bd366f
Remove the roaring operation functions warnings
2021-06-30 14:12:56 +02:00
Irevoire
6044b80362
Update milli/src/search/matching_words.rs
...
Co-authored-by: Clément Renault <renault.cle@gmail.com>
2021-06-30 00:35:26 +02:00
Tamo
be75e738b1
add more tests
2021-06-29 16:24:58 +02:00
Tamo
56fceb1928
re-implement the Damerau-Levenshtein used for the highlighting
2021-06-29 15:36:03 +02:00
Kerollmops
a6218a20ae
Introduce a new InvalidFacetsDistribution user error
2021-06-23 13:56:19 +02:00
Kerollmops
2364777838
Return an error for when a field distribution cannot be done
2021-06-23 11:50:49 +02:00
Kerollmops
aeaac743ff
Replace an if let some by a match
2021-06-23 11:33:30 +02:00
Tamo
3d90b03d7b
fix the limit
...
There was no check on the limit and thus, if a user especified a very large number this line could causes a panic
2021-06-22 14:52:13 +02:00
Tamo
9716fb3b36
format the whole project
2021-06-16 18:33:33 +02:00
Kerollmops
7ac441e473
Fix small typos
2021-06-16 11:03:37 +02:00
Kerollmops
adf0c389c5
Rename FilterParsing into InvalidFilter
2021-06-16 11:03:36 +02:00
Kerollmops
8cfe3e1ec0
Rename DatabaseSizeReached into MaxDatabaseSizeReached
2021-06-16 11:03:36 +02:00
Kerollmops
a7d6930905
Replace the panicking expect by tracked Errors
2021-06-15 11:51:32 +02:00
Kerollmops
f0e804afd5
Rename the FieldIdMapMissingEntry from_db_name field into process
2021-06-15 11:13:04 +02:00
Kerollmops
312c2d1d8e
Use the Error enum everywhere in the project
2021-06-14 16:58:38 +02:00
Many
f4cab080a6
Update milli/src/search/query_tree.rs
...
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-06-10 11:30:51 +02:00
Many
36715f571c
Update milli/src/search/criteria/proximity.rs
...
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-06-10 11:30:33 +02:00
many
e923a3ed6a
Replace Consecutive by Phrase in query tree
...
Replace Consecutive by Phrase in query tree in order to remove theorical bugs,
due of the Consecutive enum type.
2021-06-10 11:16:16 +02:00
Many
faf148d297
Update milli/src/search/query_tree.rs
...
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-06-08 17:52:37 +02:00
many
b489d699ce
Make hard separators split phrase query
...
hard separators will now split a phrase query as double double-quotes
Fix #208
2021-06-08 17:29:38 +02:00
many
26a9974667
Make asc/desc criterion return resting documents
...
Fix #161.2
2021-06-02 17:41:48 +02:00
Kerollmops
3c304c89d4
Make sure that we generate the faceted database when required
2021-06-02 16:24:58 +02:00
Kerollmops
3b1cd4c4b4
Rename the FacetCondition into FilterCondition
2021-06-02 16:24:58 +02:00
Kerollmops
c2afdbb1fb
Move and comment some internal facet_condition helper functions
2021-06-02 16:24:58 +02:00
Marin Postma
1e366dae3e
remove useless lifetime on Distinct Trait
2021-06-02 16:24:58 +02:00
Kerollmops
187c713de5
Remove the MapDistinct struct as now distinct attributes are faceted
2021-06-02 16:24:57 +02:00
Kerollmops
2a3f9b32ff
Rename the faceted fields into filterable fields
2021-06-02 16:24:57 +02:00
bors[bot]
270da98c46
Merge #202
...
202: Add field id word count docids database r=Kerollmops a=LegendreM
This PR introduces a new database, `field_id_word_count_docids`, that maps the number of words in an attribute with a list of document ids. This relation is limited to attributes that contain less than 11 words.
This database is used by the exactness criterion to know if a document has an attribute that contains exactly the query without any additional word.
Fix #165
Fix #196
Related to [specifications:#36](https://github.com/meilisearch/specifications/pull/36 )
Co-authored-by: many <maxime@meilisearch.com>
Co-authored-by: Many <legendre.maxime.isn@gmail.com>
2021-06-01 16:09:48 +00:00
many
e857ca4d7d
Fix PR comments
2021-06-01 18:06:46 +02:00
many
225ae6fd25
Resolve PR comments
2021-06-01 11:53:09 +02:00
many
1df68d342a
Make the MatchingWords return the number of matching bytes
2021-05-31 18:22:29 +02:00
many
c701f8bf36
Use field id word count database in exactness criterion
2021-05-31 16:27:28 +02:00
bors[bot]
2f5e61bacb
Merge #184
...
184: Transfer numbers and strings facets into the appropriate facet databases r=Kerollmops a=Kerollmops
This pull request is related to https://github.com/meilisearch/milli/issues/152 and changes the layout of the facets values, numbers and strings are now in dedicated databases and the user no more needs to define the type of the fields. No more conversion between the two types is done, numbers (floats and integers converted to f64) go to the facet float database and strings go to the strings facet database.
There is one related issue that I found regarding CSVs, the values in a CSV are always considered to be strings, [meilisearch/specifications#28 ](d916b57d74/text/0028-indexing-csv.md
) fixes this issue by allowing the user to define the fields types using `:` in the "CSV Formatting Rules" section.
All previous tests on facets have been modified to pass again and I have also done hand-driven tests with the 115m songs dataset. Everything seems to be good!
Fixes #192 .
Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
2021-05-31 13:32:58 +00:00
Kerollmops
1c0a5cd136
Resolve code modification suggestions
2021-05-31 15:22:50 +02:00
many
a5e98cf46d
Fix plane sweep algorithm
2021-05-25 18:21:55 +02:00
Clément Renault
3a4a150ef0
Fix the tests and remaining warnings
2021-05-25 11:31:06 +02:00
Clément Renault
02c655ff1a
Refine the facet distribution to use both databases
2021-05-25 11:30:00 +02:00
Clément Renault
79efded841
Refine the FacetCondition from_array constructor
2021-05-25 11:30:00 +02:00
Clément Renault
f7efde11d9
Refine the facet condition to use both facet databases
2021-05-25 11:30:00 +02:00
Clément Renault
e62b89a2ed
Make the facet distinct work with the new split facets
2021-05-25 11:30:00 +02:00
Clément Renault
bd7b285bae
Split the update side to use the number and the strings facet databases
2021-05-25 11:30:00 +02:00
Clément Renault
038e03a4e4
Use both facet databases in the FacetIter type
2021-05-25 11:30:00 +02:00
Clément Renault
597144b0b9
Use both number and string facet databases in the distinct system
2021-05-25 11:29:59 +02:00
many
a3944a7083
Introduce a filtered_candidates field
2021-05-11 11:37:40 +02:00
many
efba662ca6
Fix clippy warnings in cirteria
2021-05-10 10:27:18 +02:00
many
e923d51b8f
Make bucket candidates optionals
2021-05-10 10:27:04 +02:00
Many
44b6843de7
Fix pull request reviews
...
Update milli/src/fields_ids_map.rs
Update milli/src/search/criteria/exactness.rs
Update milli/src/search/criteria/mod.rs
2021-05-06 14:31:03 +02:00
many
c1ce4e4ca9
Introduce mocked ExactAttribute step in exactness criterion
2021-05-06 14:28:31 +02:00
many
a3f8686fbf
Introduce exactness criterion
2021-05-06 14:28:30 +02:00
many
ee09e50e7f
Remove excluded document in criteria iterations
...
- pass excluded document to criteria to remove them in higher levels of the bucket-sort
- merge already returned document with excluded documents to avoid duplicas
Related to #125 and #112
Fix #170
2021-04-29 12:09:38 +02:00
many
31607bf9cd
Add a threshold on proximity when choosing between linear/set algorithm
2021-04-28 14:57:22 +02:00
many
3b7e6afb55
Make some refacto and add documentation
2021-04-28 13:53:27 +02:00
Many
0add4d735c
Update milli/src/search/criteria/attribute.rs
...
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-04-27 17:40:34 +02:00
Many
3794ffc952
Update milli/src/search/criteria/attribute.rs
...
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-04-27 17:39:23 +02:00
Many
329bd4a1bb
Update milli/src/search/criteria/attribute.rs
...
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-04-27 17:39:03 +02:00
Many
3b1358b62f
Update milli/src/search/criteria/attribute.rs
...
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-04-27 17:32:19 +02:00
Many
c862b1bc6b
Update milli/src/search/criteria/attribute.rs
...
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-04-27 17:32:10 +02:00
Many
e92d137676
Update milli/src/search/criteria/attribute.rs
...
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-04-27 17:31:42 +02:00
Many
b3d6c6a9a0
Update milli/src/search/criteria/attribute.rs
...
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-04-27 17:31:13 +02:00
Many
498c2b298c
Update milli/src/search/criteria/attribute.rs
...
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-04-27 17:30:02 +02:00
Many
0e4e6dfada
Update milli/src/search/criteria/proximity.rs
...
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-04-27 17:29:52 +02:00
Many
47d780b8ce
Update milli/src/search/criteria/mod.rs
...
Co-authored-by: Irevoire <tamo@meilisearch.com>
2021-04-27 14:39:53 +02:00
Many
0daa0e170a
Fix PR comments
...
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-04-27 14:39:53 +02:00
many
71740805a7
Fix forgotten typo tests
2021-04-27 14:39:53 +02:00
many
e77291a6f3
Optimize Atrribute criterion on big requests
2021-04-27 14:39:53 +02:00
many
716c8e22b0
Add style and comments
2021-04-27 14:39:52 +02:00
many
f853790016
Use the LCM of 10 first numbers to compute attribute rank
2021-04-27 14:39:52 +02:00
many
2b036449be
Fix the return of equal candidates in different pages
2021-04-27 14:39:52 +02:00
many
0efa011e09
Make a small code clean-up
2021-04-27 14:39:52 +02:00
many
17c8c6f945
Make set algorithm return None when nothing can be returned
2021-04-27 14:39:52 +02:00
many
b3e2280bb9
Debug attribute criterion
...
* debug folding when initializing iterators
2021-04-27 14:39:52 +02:00
many
1eee0029a8
Make attribute criterion typo/prefix tolerant
2021-04-27 14:39:52 +02:00
many
59f58c15f7
Implement attribute criterion
...
* Implement WordLevelIterator
* Implement QueryLevelIterator
* Implement set algorithm based on iterators
Not tested + Some TODO to fix
2021-04-27 14:39:52 +02:00
Clément Renault
361193099f
Reduce the amount of branches when query tree flattened
2021-04-27 14:39:52 +02:00
many
ab92c814c3
Fix attributes score
2021-04-27 14:35:43 +02:00
Clément Renault
658f316511
Introduce the Initial Criterion
2021-04-27 14:35:43 +02:00
many
75e7b1e3da
Implement test Context methods
2021-04-27 14:25:34 +02:00
many
4ff67ec2ee
Implement attribute criterion for small amounts of candidates
2021-04-27 14:25:34 +02:00
Kerollmops
0f4c0beffd
Introduce the Attribute criterion
2021-04-27 14:25:34 +02:00
tamo
f8dee1b402
[makes clippy happy] search/criteria/proximity.rs
2021-04-21 12:36:45 +02:00
Alexey Shekhirin
6fa00c61d2
feat(search): support words_limit
2021-04-20 12:22:04 +03:00
Kerollmops
2aeef09316
Remove debug logs while iterating through the facet levels
2021-04-20 10:23:31 +02:00
Kerollmops
51767725b2
Simplify integer and float functions trait bounds
2021-04-20 10:23:31 +02:00
Alexey Shekhirin
33860bc3b7
test(update, settings): set & reset synonyms
...
fixes after review
more fixes after review
2021-04-18 11:24:17 +03:00
Alexey Shekhirin
e39aabbfe6
feat(search, update): synonyms
2021-04-18 11:24:17 +03:00
Marin Postma
9c4660d3d6
add tests
2021-04-15 16:25:56 +02:00
Marin Postma
75464a1baa
review fixes
2021-04-15 16:25:56 +02:00
Marin Postma
2f73fa55ae
add documentation
2021-04-15 16:25:55 +02:00
Marin Postma
45c45e11dd
implement distinct attribute
...
distinct can return error
facet distinct on numbers
return distinct error
review fixes
make get_facet_value more generic
fixes
2021-04-15 16:25:55 +02:00
tamo
dcb00b2e54
test a new implementation of the stop_words
2021-04-12 18:35:33 +02:00
tamo
da036dcc3e
Revert "Integrate the stop_words in the querytree"
...
This reverts commit 12fb509d84
.
We revert this commit because it's causing the bug #150 .
The initial algorithm we implemented for the stop_words was:
1. remove the stop_words from the dataset
2. keep the stop_words in the query to see if we can generate new words by
integrating typos or if the word was a prefix
=> This was causing the bug since, in the case of “The hobbit”, we were
**always** looking for something starting with “t he” or “th e”
instead of ignoring the word completely.
For now we are going to fix the bug by completely ignoring the
stop_words in the query.
This could cause another problem were someone mistyped a normal word and
ended up typing a stop_word.
For example imagine someone searching for the music “Won't he do it”.
If that person misplace one space and write “Won' the do it” then we
will loose a part of the request.
One fix would be to update our query tree to something like that:
---------------------
OR
OR
TOLERANT hobbit # the first option is to ignore the stop_word
AND
CONSECUTIVE # the second option is to do as we are doing
EXACT t # currently
EXACT he
TOLERANT hobbit
---------------------
This would increase drastically the size of our query tree on request
with a lot of stop_words. For example think of “The Lord Of The Rings”.
For now whatsoever we decided we were going to ignore this problem and consider
that it doesn't reduce too much the relevancy of the search to do that
while it improves the performances.
2021-04-12 18:35:33 +02:00
tamo
12fb509d84
Integrate the stop_words in the querytree
...
remove the stop_words from the querytree except if it was a prefix or a typo
2021-04-01 13:57:55 +02:00
tamo
a2f46029c7
implement a first version of the stop_words
...
The front must provide a BTreeSet containing the stop words
The stop_words are set at None if an empty Set is provided
add the stop-words in the http-ui interface
Use maplit in the test
and remove all the useless drop(rtxn) at the end of all tests
2021-04-01 13:57:55 +02:00
Alexey Shekhirin
1e3f05db8f
use fixed number of candidates as a threshold
2021-03-30 11:57:10 +03:00