Commit Graph

334 Commits

Author SHA1 Message Date
Louis Dureuil
928e6e4c05
Breaking change: remove vector for score details 2024-04-04 15:57:29 +02:00
Clément Renault
877f4b1045
Support negative phrases 2024-03-28 15:51:43 +01:00
Clément Renault
69f8b2730d
Fix the tests 2024-03-28 10:47:04 +01:00
Clément Renault
34262c7a0d
Add analytics for the negative operator 2024-03-26 18:01:27 +01:00
Clément Renault
1da9e0f246
Better support space around the negative operator (-) 2024-03-26 17:47:13 +01:00
Clément Renault
e4a3e603b3
Expose a first working version of the negative keyword 2024-03-26 17:47:13 +01:00
Tamo
6079141ea6 snapshot the scores side by side with the score details 2024-03-19 18:30:14 +01:00
Tamo
2c3af8e513 query the detailed score detail in the test 2024-03-19 18:09:02 +01:00
Tamo
b8cda6c300 fix the search cutoff and add a test 2024-03-19 10:35:47 +01:00
Tamo
4a467739cd implements a first version of the cutoff without settings 2024-03-19 10:28:21 +01:00
shuangcui
5c95b5c933 chore: remove repetitive words
Signed-off-by: shuangcui <fliter@qq.com>
2024-03-14 21:28:55 +08:00
Louis Dureuil
25f64ce7df
Replace logging timer by spans 2024-03-05 11:05:42 +01:00
Louis Dureuil
452a343a2b
Fix imports 2024-02-28 18:09:40 +01:00
Tamo
e773dfa9ba
get rids of log in milli and add logs for the bucket sort 2024-02-08 15:04:05 +01:00
ManyTheFish
5f5a486895 Reduce formatting time 2024-01-11 11:36:41 +01:00
ManyTheFish
5f4fc6c955 Add timer logs 2024-01-11 09:44:16 +01:00
Many the fish
9e1b458010
Merge branch 'main' into change-proximity-precision-settings 2023-12-18 09:08:47 +01:00
ManyTheFish
6425996e36 Change the naming of attributeScale and wordScale into byAttribute and byWord 2023-12-14 16:31:00 +01:00
Louis Dureuil
806e5b6899
Tests pass 2023-12-14 16:08:41 +01:00
Louis Dureuil
e0cc775dc4
Various changes
- DistributionShift in Search object (to be set from model in embed?)
- Fix issue where embedder index wasn't computed at search time
- Accept as default embedder either the "default" one, or the only embedder when there is only one
2023-12-14 16:08:41 +01:00
Louis Dureuil
922a640188
WIP multi embedders
fixed template bugs
2023-12-14 16:08:41 +01:00
Louis Dureuil
d4715e0c4d
Fix same vector sort bug 2023-12-14 16:08:41 +01:00
Louis Dureuil
11e2a2c1aa
Fix geosort bug 2023-12-14 16:08:41 +01:00
Louis Dureuil
65e49b7092
Remove stuff, add distribution shift (WIP) 2023-12-14 16:08:38 +01:00
Louis Dureuil
cb4ebe163e
WIP 2023-12-14 16:07:49 +01:00
Louis Dureuil
dde3a04679
WIP arroy integration 2023-12-14 16:07:49 +01:00
Louis Dureuil
13c2c6c16b
Small commit to add hybrid search and autoembedding 2023-12-14 16:07:48 +01:00
Clément Renault
56571f762a
Merge remote-tracking branch 'origin/main' into tmp-release-v1.5.1 2023-12-13 11:57:01 +01:00
ManyTheFish
467b49153d Implement proximityPrecision setting on milli side 2023-12-06 15:49:02 +01:00
ManyTheFish
bddc168d83 List TODOs 2023-12-06 14:59:23 +01:00
ManyTheFish
3b3fa38f27 Put the restrict list in a sub-struct 2023-11-28 18:37:57 +01:00
ManyTheFish
d6c2ee15a9 Filter on attributes before computing the docids when attribute restriction is on 2023-11-28 14:55:29 +01:00
Clément Renault
d32eb11329
Move to the v0.20.0-alpha.9 of heed 2023-11-27 11:52:22 +01:00
Clément Renault
58dac8af42
Remove the panics and unwraps 2023-11-23 15:00:48 +01:00
Clément Renault
0dbf1a16ff
Make clippy happy 2023-11-23 14:11:38 +01:00
Clément Renault
0d4482625a
Make the changes to use heed v0.20-alpha.6 2023-11-23 11:43:58 +01:00
Clément Renault
7cb7e37ba8
Merge branch 'main' into tmp-release-v1.5.0 2023-11-21 16:30:46 +01:00
ManyTheFish
1f36410541 Update tests 2023-11-13 13:36:39 +01:00
Louis Dureuil
8c649d8061
Throw error when the vector search is sent with the wrong size 2023-11-13 09:57:42 +01:00
ManyTheFish
688266c83e Remove word pair proximity prefix cache and compute it at search time 2023-11-08 14:16:01 +01:00
ManyTheFish
94206b0055 Update tests 2023-10-31 13:48:47 +01:00
ManyTheFish
1c5705c164
clean PR warnings 2023-10-30 11:22:05 +01:00
ManyTheFish
df9e5c8651
Generalize usage of CboRoaringBitmap codec to ease the use 2023-10-30 11:15:02 +01:00
ManyTheFish
17b647dfe5
Wip 2023-10-30 11:13:08 +01:00
Tamo
e7244aa485 fix warnings 2023-10-30 11:00:46 +01:00
Louis Dureuil
2bae9550c8
Add explanatory comment 2023-10-23 12:06:28 +02:00
Vivek Kumar
5fe7c4545a
compute all candidates correctly when skipping 2023-10-23 12:02:45 +02:00
meili-bors[bot]
5e0485d8dd
Merge #4131
4131: Reduce proximity range from 7 to 3 r=Kerollmops a=ManyTheFish

## Summary
This PR aims to reduce the impact of the proximity databases on the indexing time and on the database size by reducing the maximum distance between two words to be indexed in the proximity database.

## Stats

### Impact on database size and indexing time
![Impact on datasets](https://github.com/meilisearch/meilisearch/assets/6482087/28ed3d96-bdde-41c1-bdac-e90c1b1dbb23)

### Impact on search relevancy

<details>

| dataset_name | host_name        | Relevancy rate (Precision) | completion_rate  25.00% | completion_rate 50.00% | completion_rate 75.00% | completion_rate 100.00% |
|--------------|------------------|------------------------------------|-----------------|-----------------|-----------------|-----------------|
| FBIS         | 1_4_0            | percentile-10 |           0.00% |           0.00% |           0.00% |           0.00% |
| FBIS         | 1_4_0            | percentile-25 |           0.00% |           0.00% |           0.00% |           0.00% |
| FBIS         | 1_4_0            | percentile-50 |           0.00% |           0.00% |           5.00% |           5.56% |
| FBIS         | 1_4_0            | percentile-75 |           0.00% |          12.50% |          35.00% |          45.00% |
| FBIS         | 1_4_0            | percentile-90 |          20.00% |          40.00% |                 |         100.00% |
| FBIS         | 1_4_0            | average       |           5.78% |          11.16% |          21.90% |          26.29% |
| FBIS         | reduce_proximity | percentile-10 |           0.00% |           0.00% |           0.00% |           0.00% |
| FBIS         | reduce_proximity | percentile-25 |           0.00% |           0.00% |           0.00% |           0.00% |
| FBIS         | reduce_proximity | percentile-50 |           0.00% |           0.00% |           5.00% |           5.56% |
| FBIS         | reduce_proximity | percentile-75 |           0.00% |          15.00% |          35.00% |          40.00% |
| FBIS         | reduce_proximity | percentile-90 |          20.00% |          40.00% |          85.00% |         100.00% |
| FBIS         | reduce_proximity | average       |           5.55% |          11.34% |          21.75% |          26.14% |
| FR94         | 1_4_0            | percentile-10 |           0.00% |           0.00% |           0.00% |           0.00% |
| FR94         | 1_4_0            | percentile-25 |           0.00% |           0.00% |           0.00% |           0.00% |
| FR94         | 1_4_0            | percentile-50 |           0.00% |           0.00% |           0.00% |           0.00% |
| FR94         | 1_4_0            | percentile-75 |           0.00% |           5.00% |          15.00% |          42.11% |
| FR94         | 1_4_0            | percentile-90 |          15.00% |          54.55% |         100.00% |         100.00% |
| FR94         | 1_4_0            | average       |           5.95% |          12.07% |          18.70% |          25.57% |
| FR94         | reduce_proximity | percentile-10 |           0.00% |           0.00% |           0.00% |           0.00% |
| FR94         | reduce_proximity | percentile-25 |           0.00% |           0.00% |           0.00% |           0.00% |
| FR94         | reduce_proximity | percentile-50 |           0.00% |           0.00% |           0.00% |           0.00% |
| FR94         | reduce_proximity | percentile-75 |           0.00% |           5.00% |          15.00% |          42.11% |
| FR94         | reduce_proximity | percentile-90 |          15.00% |          54.55% |         100.00% |         100.00% |
| FR94         | reduce_proximity | average       |           5.79% |          12.00% |          18.70% |          25.53% |
| FT           | 1_4_0            | percentile-10 |           0.00% |           0.00% |           0.00% |           0.00% |
| FT           | 1_4_0            | percentile-25 |           0.00% |           0.00% |           0.00% |           0.00% |
| FT           | 1_4_0            | percentile-50 |           0.00% |           0.00% |           5.00% |          10.00% |
| FT           | 1_4_0            | percentile-75 |           0.00% |          15.00% |          30.00% |          40.00% |
| FT           | 1_4_0            | percentile-90 |          20.00% |          50.00% |          65.00% |         100.00% |
| FT           | 1_4_0            | average       |           5.08% |          12.58% |          20.00% |          25.49% |
| FT           | reduce_proximity | percentile-10 |           0.00% |           0.00% |           0.00% |           0.00% |
| FT           | reduce_proximity | percentile-25 |           0.00% |           0.00% |           0.00% |           0.00% |
| FT           | reduce_proximity | percentile-50 |           0.00% |           0.00% |           5.00% |          10.00% |
| FT           | reduce_proximity | percentile-75 |           0.00% |          15.00% |          30.00% |          40.00% |
| FT           | reduce_proximity | percentile-90 |          10.00% |          45.00% |          60.00% |         100.00% |
| FT           | reduce_proximity | average       |           5.01% |          12.64% |          20.10% |          25.53% |
| LAT          | 1_4_0            | percentile-10 |           0.00% |           0.00% |           0.00% |           0.00% |
| LAT          | 1_4_0            | percentile-25 |           0.00% |           0.00% |           0.00% |           0.00% |
| LAT          | 1_4_0            | percentile-50 |           0.00% |           0.00% |           5.00% |           5.00% |
| LAT          | 1_4_0            | percentile-75 |           5.00% |          15.00% |          30.00% |          30.00% |
| LAT          | 1_4_0            | percentile-90 |          15.00% |          45.00% |          60.00% |          80.00% |
| LAT          | 1_4_0            | average       |           4.80% |          11.80% |          17.88% |          21.62% |
| LAT          | reduce_proximity | percentile-10 |           0.00% |           0.00% |           0.00% |           0.00% |
| LAT          | reduce_proximity | percentile-25 |           0.00% |           0.00% |           0.00% |           0.00% |
| LAT          | reduce_proximity | percentile-50 |           0.00% |           0.00% |           5.00% |           5.00% |
| LAT          | reduce_proximity | percentile-75 |           0.00% |          11.11% |          25.00% |          35.00% |
| LAT          | reduce_proximity | percentile-90 |          15.00% |          45.00% |          55.00% |          80.00% |
| LAT          | reduce_proximity | average       |           4.43% |          11.23% |          17.32% |          21.45% |

</details>

### Impact on Search time

| dataset_name | host_name        |      25.00% |      50.00% |      75.00% |     100.00% | Average     |
|--------------|------------------|------------:|------------:|------------:|------------:|-------------|
| FBIS         | 1_4_0            |        3.45 | 7.446666667 | 9.773489933 | 9.620300752 | 7.572614338 |
| FBIS         | reduce_proximity | 2.983333333 | 5.316666667 | 6.911073826 | 7.637218045 | 5.712072968 |
| FR94         | 1_4_0            | 2.236666667 |        4.45 | 5.523489933 | 4.560150376 | 4.192576744 |
| FR94         | reduce_proximity |        2.09 | 3.991666667 | 4.981543624 | 4.266917293 | 3.832531896 |
| FT           | 1_4_0            | 5.956666667 | 9.656666667 | 13.86912752 | 10.83270677 |  10.0787919 |
| FT           | reduce_proximity |        4.51 | 5.981666667 | 7.701342282 | 6.766917293 |  6.23998156 |
| LAT          | 1_4_0            | 5.856666667 | 9.233333333 | 12.98322148 | 10.78759398 | 9.715203865 |
| LAT          | reduce_proximity |        6.91 | 6.706666667 | 8.463087248 | 8.265037594 | 7.586197877 |

## Technical approach

- Ensure the MAX_DISTANCE constant is used everywhere needed
- Reduce the MAX_DISTANCE from 8 to 4

## Related

TBD

Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-10-18 14:56:08 +00:00
ManyTheFish
27eec21415 Fix tests 2023-10-18 16:03:22 +02:00
Vivek Kumar
d4da06ff47
fix bug where distinct search with no ranking returns offset+limit hits 2023-10-11 19:02:16 +05:30