Commit Graph

107 Commits

Author SHA1 Message Date
Louis Dureuil
48409c9183
Add missing exactness.matchingWords, exactness.maxMatchingWords 2023-07-04 16:31:01 +02:00
Louis Dureuil
324d448236
Format let-else ❤️ 🎉 2023-07-03 10:20:28 +02:00
Louis Dureuil
8939e85f60
Add rank_to_score for graph based ranking rules 2023-06-22 12:39:14 +02:00
Louis Dureuil
f050634b1e
add virtual conditions to fid and position to always have the max cost 2023-06-20 10:07:18 +02:00
Louis Dureuil
becf1f066a
Change how the cost of removing words is computed 2023-06-20 09:45:43 +02:00
Louis Dureuil
a20e4d447c
Position now takes into account the distance to the position of the word in the query
it used to be based on the distance to the position 0
2023-06-20 09:45:42 +02:00
Louis Dureuil
af57c3c577
Proximity costs 0 for documents that are perfectly matching 2023-06-20 09:45:42 +02:00
Loïc Lecrenier
2da86b31a6 Remove comments and add documentation 2023-06-14 12:39:42 +02:00
meili-bors[bot]
2e49d6aec1
Merge #3768
3768: Fix bugs in graph-based ranking rules + make `words` a graph-based ranking rule r=dureuill a=loiclec

This PR contains three changes:

## 1. Don't call the `words` ranking rule if the term matching strategy is `All`

The only purpose of `words` is to remove nodes from the query graph, so it never does any useful work when the matching strategy is `All`. Recall that the universe has already been computed from the docids of the "maximally reduced" query graph, which, in the case of `All`, is equal to the original graph.

## 2. Replace the `words` ranking rule with a graph-based ranking rule

This is for three reasons:

1. **performance**: graph-based ranking rules benefit from a lot of optimisations by default, which ensures that they are never too slow. The previous implementation of `words` could call `compute_query_graph_docids` many times if some words had to be removed from the query, which would be quite expensive. I was especially worried about its performance in cases where it is placed right after the `sort` ranking rule. Furthermore, `compute_query_graph_docids` would clone a lot of bitmaps many times unnecessarily.

2. **consistency**: every other ranking rule (except `sort`) is graph-based. It makes sense to implement `words` like that as well. It will automatically benefit from all the features, optimisations, and bug fixes that all the other ranking rules get.

3. **surfacing bugs**: since `words` is (most of the time) the first ranking rule to be called, I'd like it to behave the same as the other ranking rules so that we can quickly detect bugs in our graph algorithms. This actually already happened, which is why this PR also contains a bug fix.

## 3. Fix the `update_all_costs_before_nodes` function

It is a bit difficult to explain what was wrong, but I'll try. The bug happened when we had graphs like:
<img width="730" alt="Screenshot 2023-05-16 at 10 58 57" src="https://github.com/meilisearch/meilisearch/assets/6040237/40db1a68-d852-4e89-99d5-0d65757242a7">
and we gave the node `is` as argument.

Then, we'd walk backwards from the node breadth-first. We'd update the costs of:
1. `sun`
2. `thesun`
3. `start`
4. `the`

which is an incorrect order. The correct order is:

1. `sun`
2. `thesun`
3. `the`
4. `start`

That is, we can only update the cost of a node when all of its successors have either already been visited or were not affected by the update to the node passed as argument. To solve this bug, I factored out the graph-traversal logic into a `traverse_breadth_first_backward` function.
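
To make the ordering constraint concrete, here is a minimal sketch of one way to get such an order. It is not the actual `traverse_breadth_first_backward` implementation. The function name `backward_update_order`, the adjacency-list representation, and the toy graph (`start -> the -> sun -> is`, `start -> thesun -> is`, `is -> end`) are assumptions made for illustration; the graph in the screenshot may differ. The idea is to count, for every node affected by the update, how many of its affected successors are still unvisited, and to visit a node only once that count reaches zero.

```rust
use std::collections::VecDeque;

/// Returns the ancestors of `from` in an order where each node appears only
/// after all of its affected successors. `successors[n]` lists the nodes that
/// node `n` points to. The graph is assumed to be acyclic.
/// (Hypothetical helper, not the function from the PR.)
fn backward_update_order(successors: &[Vec<usize>], from: usize) -> Vec<usize> {
    let n = successors.len();

    // Invert the edges so that we can walk backwards.
    let mut predecessors: Vec<Vec<usize>> = vec![Vec::new(); n];
    for (node, succs) in successors.iter().enumerate() {
        for &succ in succs {
            predecessors[succ].push(node);
        }
    }

    // A node is affected if it is `from` or one of its ancestors.
    let mut affected = vec![false; n];
    affected[from] = true;
    let mut stack = vec![from];
    while let Some(node) = stack.pop() {
        for &pred in &predecessors[node] {
            if !affected[pred] {
                affected[pred] = true;
                stack.push(pred);
            }
        }
    }

    // For each node, the number of affected successors not visited yet.
    // Unaffected successors are ignored: their costs did not change.
    let mut pending: Vec<usize> = (0..n)
        .map(|node| successors[node].iter().filter(|&&s| affected[s]).count())
        .collect();

    // Walk backwards, releasing a node only when nothing is pending for it.
    let mut order = Vec::new();
    let mut queue = VecDeque::from([from]);
    while let Some(node) = queue.pop_front() {
        if node != from {
            order.push(node);
        }
        for &pred in &predecessors[node] {
            pending[pred] -= 1;
            if pending[pred] == 0 {
                queue.push_back(pred);
            }
        }
    }
    order
}

/// Runs the traversal on the toy graph, starting from `is`.
fn example_order() -> Vec<&'static str> {
    let names = ["start", "the", "sun", "thesun", "is", "end"];
    let successors: Vec<Vec<usize>> =
        vec![vec![1, 3], vec![2], vec![4], vec![4], vec![5], vec![]];
    backward_update_order(&successors, 4)
        .into_iter()
        .map(|i| names[i])
        .collect()
}
```

On this toy graph, `example_order()` returns `sun`, `thesun`, `the`, `start`: `start` is released only after both `the` and `thesun` have been visited, and `end` is never visited because it is not an ancestor of `is`.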


Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-05-23 13:28:08 +00:00
Louis Dureuil
51043f78f0
Remove trailing whitespace 2023-05-23 15:27:25 +02:00
Louis Dureuil
a490a11325
Add explanatory comment on the way we're recomputing costs 2023-05-23 15:24:24 +02:00
Loïc Lecrenier
ec8f685d84 Fix bug in cheapest path algorithm 2023-05-16 17:01:30 +02:00
Loïc Lecrenier
5758268866 Don't compute split_words for phrases 2023-05-16 17:01:18 +02:00
Loïc Lecrenier
f6524a6858 Adjust costs of edges in position ranking rule
To ensure good performance
2023-05-16 11:28:56 +02:00
Loïc Lecrenier
a37da36766 Implement words as a graph-based ranking rule and fix some bugs 2023-05-16 10:42:11 +02:00
Louis Dureuil
1aaf24ccbf
Cargo fmt 2023-05-03 12:21:58 +02:00
Loïc Lecrenier
1b514517f5 Fix bug in computation of query term at a position 2023-05-02 10:48:32 +02:00
Loïc Lecrenier
11f814821d Minor cleanup 2023-05-02 10:48:32 +02:00
Loïc Lecrenier
30fb1153cc Speed up graph based ranking rule when a lot of different costs exist 2023-05-02 09:59:42 +02:00
Loïc Lecrenier
3b2c8b9f25 Improve performance of position rr 2023-05-02 09:59:42 +02:00
Loïc Lecrenier
608ceea440 Fix bug in position rr 2023-05-02 09:59:42 +02:00
Loïc Lecrenier
79001b9c97 Improve performance of the cheapest path finder algorithm 2023-05-02 09:59:42 +02:00
Loïc Lecrenier
bc4efca611 Add more tests for the attribute ranking rule 2023-04-29 10:56:48 +02:00
Loïc Lecrenier
3421125a55 Prevent the exactness ranking rule from removing random words
Make it strictly follow the term matching strategy
2023-04-26 09:09:19 +02:00
Loïc Lecrenier
d3a94e8b25 Fix bugs and add tests to exactness ranking rule 2023-04-25 16:49:08 +02:00
Loïc Lecrenier
d1fdbb63da Make all search tests pass, fix distinctAttribute bug 2023-04-24 12:12:08 +02:00
Loïc Lecrenier
bd9aba4d77 Add "position" part of the attribute ranking rule 2023-04-13 10:46:09 +02:00
Kerollmops
d9cebff61c Add a simple test to check that attributes are ranking correctly 2023-04-13 08:27:09 +02:00
Loïc Lecrenier
30f7bd03f6 Fix compiler warning/errors caused by previous merge 2023-04-13 08:27:09 +02:00
Kerollmops
df0d9bb878 Introduce the attribute ranking rule in the list of ranking rules 2023-04-13 08:27:09 +02:00
Kerollmops
5230ddb3ea Resolve the attribute ranking rule conditions 2023-04-13 08:27:09 +02:00
Kerollmops
d6a7c28e4d Implement the attribute ranking rule edge computation 2023-04-13 08:27:09 +02:00
Kerollmops
e55efc419e Introduce a new cache for the words fids 2023-04-13 08:27:09 +02:00
Louis Dureuil
7a01f20df7 Use word_prefix_docids, make get_word_prefix_docids private 2023-04-12 16:45:38 +02:00
Louis Dureuil
5ab46324c4 Everyone uses the SearchContext::word_docids instead of get_db_word_docids
make get_db_word_docids private
2023-04-12 16:44:43 +02:00
Louis Dureuil
e7ff987c46 Update call sites 2023-04-12 16:36:38 +02:00
Loïc Lecrenier
1f813a6f3b Simplify implementation of the detailed (=visual) logger 2023-04-12 16:32:53 +02:00
Loïc Lecrenier
96183e804a Simplify the logger 2023-04-12 16:32:53 +02:00
Loïc Lecrenier
f7d90ad19f Merge remote-tracking branch 'origin/search-refactor-tests-doc' into search-refactor 2023-04-07 10:13:18 +02:00
Louis Dureuil
31630c85d0 exactness graph rr: Add important TODO/FIXME after review 2023-04-06 17:50:39 +02:00
Louis Dureuil
90a6c01495 Use correct codec in proximity 2023-04-06 17:50:39 +02:00
Louis Dureuil
e58426109a Fix panics and issues in exactness graph ranking rule 2023-04-06 17:50:39 +02:00
Louis Dureuil
8a13ed7e3f Add exactness ranking rules 2023-04-06 17:50:39 +02:00
Loïc Lecrenier
7ca91ebb71 Merge branch 'search-refactor-exactness' into search-refactor-tests-doc 2023-04-06 15:16:35 +02:00
Louis Dureuil
d1ddaa223d
Use correct codec in proximity 2023-04-05 18:14:00 +02:00
Louis Dureuil
f7ecea142e
Fix panics and issues in exactness graph ranking rule 2023-04-05 18:13:46 +02:00
Louis Dureuil
4b4ffb8ec9
Add exactness ranking rules 2023-04-04 17:12:07 +02:00
Loïc Lecrenier
b439d36807 Split query_term module into multiple submodules 2023-04-04 15:38:30 +02:00
Loïc Lecrenier
aa9592455c Refactor the paths_of_cost algorithm
Support conditions that require certain nodes to be skipped
2023-03-30 12:11:11 +02:00
Loïc Lecrenier
01e24dd630 Rewrite proximity ranking rule 2023-03-30 11:59:06 +02:00