Loïc Lecrenier
|
d6585eb10b
|
Avoid splitting ngrams into their original component words
|
2023-04-07 10:13:49 +02:00 |
|
Loïc Lecrenier
|
f7d90ad19f
|
Merge remote-tracking branch 'origin/search-refactor-tests-doc' into search-refactor
|
2023-04-07 10:13:18 +02:00 |
|
Louis Dureuil
|
31630c85d0
|
exactness graph rr: Add important TODO/FIXME after review
|
2023-04-06 17:50:39 +02:00 |
|
Louis Dureuil
|
ab09dc0167
|
exact_attributes: Add TODOs and additional check after review
|
2023-04-06 17:50:39 +02:00 |
|
Louis Dureuil
|
618c54915d
|
exact_attribute: dedup nodes after sorting them
|
2023-04-06 17:50:39 +02:00 |
|
Louis Dureuil
|
90a6c01495
|
Use correct codec in proximity
|
2023-04-06 17:50:39 +02:00 |
|
Louis Dureuil
|
e58426109a
|
Fix panics and issues in exactness graph ranking rule
|
2023-04-06 17:50:39 +02:00 |
|
Louis Dureuil
|
f513cf930a
|
Exact attribute with state
|
2023-04-06 17:50:39 +02:00 |
|
Louis Dureuil
|
8a13ed7e3f
|
Add exactness ranking rules
|
2023-04-06 17:50:39 +02:00 |
|
Louis Dureuil
|
1b8e4d0301
|
Add ExactTerm and helper method
|
2023-04-06 17:50:39 +02:00 |
|
Louis Dureuil
|
996619b22a
|
Increase position by 8 on hard separator when building query terms
|
2023-04-06 17:50:39 +02:00 |
|
Louis Dureuil
|
2c9822a337
|
Rename is_multiple_words to is_ngram and zero_typo to exact
|
2023-04-06 17:50:39 +02:00 |
|
Louis Dureuil
|
7276deee0a
|
Add new db caches
|
2023-04-06 17:50:39 +02:00 |
|
ManyTheFish
|
f7e7f438f8
|
Patch prefix match
|
2023-04-06 17:22:31 +02:00 |
|
ManyTheFish
|
ba8dcc2d78
|
Fix clippy
|
2023-04-06 15:50:47 +02:00 |
|
Loïc Lecrenier
|
7ca91ebb71
|
Merge branch 'search-refactor-exactness' into search-refactor-tests-doc
|
2023-04-06 15:16:35 +02:00 |
|
ManyTheFish
|
47f6a3ad3d
|
Take into account that a logger needs the search context
|
2023-04-06 15:02:23 +02:00 |
|
ManyTheFish
|
ae17c62e24
|
Remove warnings
|
2023-04-06 14:07:18 +02:00 |
|
ManyTheFish
|
9c5f64769a
|
Integrate the new Highlighter in the search
|
2023-04-06 13:58:56 +02:00 |
|
ManyTheFish
|
ebe23b04c9
|
Make the matcher consume the search context
|
2023-04-06 12:28:28 +02:00 |
|
ManyTheFish
|
13b7c826c1
|
Add new highlighter
|
2023-04-06 12:15:37 +02:00 |
|
Louis Dureuil
|
d1ddaa223d
|
Use correct codec in proximity
|
2023-04-05 18:14:00 +02:00 |
|
Louis Dureuil
|
f7ecea142e
|
Fix panics and issues in exactness graph ranking rule
|
2023-04-05 18:13:46 +02:00 |
|
Louis Dureuil
|
337e75b0e4
|
Exact attribute with state
|
2023-04-05 18:12:46 +02:00 |
|
Loïc Lecrenier
|
b5691802a3
|
Add new tests and fix construction of query graph from paths
|
2023-04-05 16:31:10 +02:00 |
|
Loïc Lecrenier
|
6e50f23896
|
Add more search tests
|
2023-04-05 13:33:23 +02:00 |
|
Loïc Lecrenier
|
4c8a0179ba
|
Add more search tests
|
2023-04-05 11:30:49 +02:00 |
|
Loïc Lecrenier
|
c69cbec64a
|
Add more search tests
|
2023-04-05 11:20:04 +02:00 |
|
Loïc Lecrenier
|
ce328c329d
|
Move bucket sort function to its own module and fix a bug
|
2023-04-04 18:03:08 +02:00 |
|
Loïc Lecrenier
|
959e4607bb
|
Add more search tests
|
2023-04-04 18:02:46 +02:00 |
|
Louis Dureuil
|
4b4ffb8ec9
|
Add exactness ranking rules
|
2023-04-04 17:12:07 +02:00 |
|
Louis Dureuil
|
3951fe22ab
|
Add ExactTerm and helper method
|
2023-04-04 17:09:32 +02:00 |
|
Louis Dureuil
|
4d5bc9df4c
|
Increase position by 8 on hard separator when building query terms
|
2023-04-04 17:07:26 +02:00 |
|
Louis Dureuil
|
ec2f8e8040
|
Rename is_multiple_words to is_ngram and zero_typo to exact
|
2023-04-04 17:06:07 +02:00 |
|
Louis Dureuil
|
406b8bd248
|
Add new db caches
|
2023-04-04 17:04:46 +02:00 |
|
Loïc Lecrenier
|
62b9c6fbee
|
Add search tests
|
2023-04-04 16:18:22 +02:00 |
|
Loïc Lecrenier
|
b439d36807
|
Split query_term module into multiple submodules
|
2023-04-04 15:38:30 +02:00 |
|
Loïc Lecrenier
|
faceb661e3
|
Add note that a part of the code needs fixing
|
2023-04-04 15:02:01 +02:00 |
|
Loïc Lecrenier
|
4129d657e2
|
Simplify query_term module a bit
|
2023-04-04 15:01:42 +02:00 |
|
Loïc Lecrenier
|
3f13608002
|
Fix computation of ngram derivations
|
2023-04-03 15:27:49 +02:00 |
|
Loïc Lecrenier
|
4708d9b016
|
Fix compiler warnings/errors
|
2023-04-03 10:09:27 +02:00 |
|
Clément Renault
|
0d2e7bcc13
|
Implement the previous way for the exhaustive distinct candidates
|
2023-04-03 10:08:10 +02:00 |
|
Loïc Lecrenier
|
55fbfb6124
|
Merge branch 'search-refactor-located-query-terms' into search-refactor
|
2023-04-03 10:04:36 +02:00 |
|
Loïc Lecrenier
|
58fe260c72
|
Allow removing all the terms from a query if it contains a phrase
|
2023-04-03 09:18:02 +02:00 |
|
Loïc Lecrenier
|
24e5f6f7a9
|
Don't remove phrases with "last" term matching strategy
|
2023-04-03 09:17:33 +02:00 |
|
Louis Dureuil
|
9b87c36200
|
Limit the number of derivations for a single word.
|
2023-03-31 09:19:18 +02:00 |
|
Loïc Lecrenier
|
12b26cd54e
|
Don't remove phrases from the query with term matching strategy Last
|
2023-03-30 14:54:08 +02:00 |
|
Loïc Lecrenier
|
061b1e6d7c
|
Tiny refactor of query graph remove_nodes method
|
2023-03-30 14:49:25 +02:00 |
|
Loïc Lecrenier
|
0d6e8b5c31
|
Fix phrase search bug when the phrase has only one word
|
2023-03-30 14:48:12 +02:00 |
|
Loïc Lecrenier
|
d48cdc67a0
|
Fix term matching strategy bugs
|
2023-03-30 14:01:52 +02:00 |
|
Loïc Lecrenier
|
35c16ad047
|
Use new term matching strategy logic in words ranking rule
|
2023-03-30 13:15:43 +02:00 |
|
Loïc Lecrenier
|
2997d1f186
|
Use new term matching strategy logic in resolve_maximally_reduced_...
|
2023-03-30 13:12:51 +02:00 |
|
Loïc Lecrenier
|
2a5997fb20
|
Avoid expensive assert! in bucket sort function
|
2023-03-30 13:07:17 +02:00 |
|
Loïc Lecrenier
|
ee8a9e0bad
|
Remove outdated sentence in documentation
|
2023-03-30 12:22:24 +02:00 |
|
Loïc Lecrenier
|
3b0737a092
|
Fix detailed logger
|
2023-03-30 12:20:44 +02:00 |
|
Loïc Lecrenier
|
fdd02105ac
|
Graph-based ranking rule + term matching strategy support
|
2023-03-30 12:19:21 +02:00 |
|
Loïc Lecrenier
|
aa9592455c
|
Refactor the paths_of_cost algorithm
Support conditions that require certain nodes to be skipped
|
2023-03-30 12:11:11 +02:00 |
|
Loïc Lecrenier
|
01e24dd630
|
Rewrite proximity ranking rule
|
2023-03-30 11:59:06 +02:00 |
|
Loïc Lecrenier
|
ae6bb1ce17
|
Update the ConditionDocidsCache after change to RankingRuleGraphTrait
|
2023-03-30 11:41:20 +02:00 |
|
Loïc Lecrenier
|
5fd28620cd
|
Build ranking rule graph correctly after changes to trait definition
|
2023-03-30 11:32:55 +02:00 |
|
Loïc Lecrenier
|
728710d63a
|
Update typo ranking rule to use new query term structure
|
2023-03-30 11:32:19 +02:00 |
|
Loïc Lecrenier
|
fa81381865
|
Update the trait requirements of ranking-rule graphs
|
2023-03-30 11:19:45 +02:00 |
|
Loïc Lecrenier
|
b96a682f16
|
Update resolve_graph module to work with lazy query terms
|
2023-03-30 11:10:38 +02:00 |
|
Loïc Lecrenier
|
d0f048c068
|
Simplify the API of the DatabaseCache
|
2023-03-30 11:08:17 +02:00 |
|
Loïc Lecrenier
|
223e82a10d
|
Update QueryGraph to use new lazy query terms + build from paths
|
2023-03-30 11:06:02 +02:00 |
|
Loïc Lecrenier
|
9507ff5e31
|
Update query term structure to allow for laziness
|
2023-03-30 11:06:02 +02:00 |
|
Louis Dureuil
|
c2b025946a
|
located_query_terms_from_string: use u16 for positions, hard-limit the number of iterated tokens
- Refactor phrase logic to reduce the number of possible states
|
2023-03-30 11:04:14 +02:00 |
|
Loïc Lecrenier
|
3a818c5e87
|
Add more functionality to interners
|
2023-03-30 09:56:23 +02:00 |
|
Louis Dureuil
|
d74134ce3a
|
Check sort criteria
|
2023-03-29 15:21:54 +02:00 |
|
Louis Dureuil
|
5ac129bfa1
|
Mark geosearch as currently unimplemented for sort rule
|
2023-03-29 15:20:42 +02:00 |
|
Louis Dureuil
|
abb4522f76
|
Small comment on ignored rules for placeholder search
|
2023-03-29 09:11:06 +02:00 |
|
Louis Dureuil
|
ef084ef042
|
SmallBitmap: Consistently panic on incoherent universe lengths
|
2023-03-29 08:45:38 +02:00 |
|
Louis Dureuil
|
3524bd1257
|
SmallBitmap: Add documentation
|
2023-03-29 08:44:11 +02:00 |
|
Louis Dureuil
|
d4f6216966
|
Resolve rule time sort criteria
|
2023-03-28 16:42:02 +02:00 |
|
Louis Dureuil
|
77acafe534
|
Resolve search time sort criteria for placeholder search
|
2023-03-28 16:41:03 +02:00 |
|
Louis Dureuil
|
abb19d368d
|
Initialize query time ranking rule for query search
|
2023-03-28 12:40:52 +02:00 |
|
Louis Dureuil
|
b4a52a622e
|
BoxRankingRule
|
2023-03-28 12:39:42 +02:00 |
|
Louis Dureuil
|
e9eb271499
|
Remove empty attribute_rule mod
|
2023-03-27 11:08:03 +02:00 |
|
Louis Dureuil
|
3281a88d08
|
SmallBitmap: don't expose internal items
|
2023-03-27 11:04:43 +02:00 |
|
Louis Dureuil
|
5a644054ab
|
Remove unused search impl
|
2023-03-27 11:04:27 +02:00 |
|
Louis Dureuil
|
16fefd364e
|
Add TODO notes
|
2023-03-27 11:04:04 +02:00 |
|
Loïc Lecrenier
|
00bad8c716
|
Add comments suggesting performance improvements
|
2023-03-23 10:18:24 +01:00 |
|
Loïc Lecrenier
|
7169d85115
|
Remove old query_tree code and make clippy happy
|
2023-03-23 09:39:16 +01:00 |
|
Loïc Lecrenier
|
f5f5f03ec0
|
Remove old criteria code
|
2023-03-23 09:35:53 +01:00 |
|
Loïc Lecrenier
|
56b7209f26
|
Make clippy happy
|
2023-03-23 09:16:17 +01:00 |
|
Loïc Lecrenier
|
9b1f439a91
|
WIP
|
2023-03-23 09:12:35 +01:00 |
|
Loïc Lecrenier
|
a86aeba411
|
WIP
|
2023-03-22 14:43:08 +01:00 |
|
Loïc Lecrenier
|
384fdc2df4
|
Fix two bugs in proximity ranking rule
|
2023-03-21 11:43:25 +01:00 |
|
Loïc Lecrenier
|
83e5b4ed0d
|
Compute edges of proximity graph lazily
|
2023-03-21 10:44:40 +01:00 |
|
Loïc Lecrenier
|
272cd7ebbd
|
Small cleanup
|
2023-03-20 13:39:19 +01:00 |
|
Loïc Lecrenier
|
c63c7377e6
|
Switch order of MappedInterner generic params
|
2023-03-20 09:41:56 +01:00 |
|
Loïc Lecrenier
|
5b50e49522
|
cargo fmt
|
2023-03-20 09:41:56 +01:00 |
|
Loïc Lecrenier
|
65474c8de5
|
Update new sort ranking rule after rebasing
|
2023-03-20 09:41:56 +01:00 |
|
Loïc Lecrenier
|
fbb1ba3de0
|
Cargo fmt
|
2023-03-20 09:41:56 +01:00 |
|
Loïc Lecrenier
|
a59ca28e2c
|
Add forgotten file
|
2023-03-20 09:41:56 +01:00 |
|
Loïc Lecrenier
|
825f742000
|
Simplify graph-based ranking rule impl
|
2023-03-20 09:41:56 +01:00 |
|
Loïc Lecrenier
|
dd491320e5
|
Simplify graph-based ranking rule impl
|
2023-03-20 09:41:56 +01:00 |
|
Loïc Lecrenier
|
c6ff97a220
|
Rewrite the dead-ends cache to detect more dead-ends
|
2023-03-20 09:41:56 +01:00 |
|
Loïc Lecrenier
|
49240c367a
|
Fix bug in cost of typo conditions
|
2023-03-20 09:41:56 +01:00 |
|
Loïc Lecrenier
|
1e6e624078
|
Fix bug in SmallBitmap
|
2023-03-20 09:41:56 +01:00 |
|
Loïc Lecrenier
|
8b4e07e1a3
|
WIP
|
2023-03-20 09:41:56 +01:00 |
|
Loïc Lecrenier
|
2853009987
|
Renaming Edge -> Condition
|
2023-03-20 09:41:56 +01:00 |
|
Loïc Lecrenier
|
aa59c3bc2c
|
Replace EdgeCondition with an Option<..> + other code cleanup
|
2023-03-20 09:41:56 +01:00 |
|
Loïc Lecrenier
|
7b1d8f4c6d
|
Make PathSet strongly typed
|
2023-03-20 09:41:56 +01:00 |
|
Loïc Lecrenier
|
a49ddec9df
|
Prune the query graph after executing a ranking rule
|
2023-03-20 09:41:56 +01:00 |
|
Loïc Lecrenier
|
05fe856e6e
|
Merge forward and backward proximity conditions in proximity graph
|
2023-03-20 09:41:56 +01:00 |
|
Loïc Lecrenier
|
c0cdaf9f53
|
Fix bug in the proximity ranking rule for queries with ngrams
|
2023-03-20 09:41:56 +01:00 |
|
Loïc Lecrenier
|
e9cf58d584
|
Refactor of the Interner
|
2023-03-20 09:41:56 +01:00 |
|
Loïc Lecrenier
|
31628c5cd4
|
Merge Phrase and WordDerivations into one structure
|
2023-03-20 09:41:56 +01:00 |
|
Loïc Lecrenier
|
3004e281d7
|
Support ngram typos + splitwords and splitwords+synonyms in proximity
|
2023-03-20 09:41:56 +01:00 |
|
Loïc Lecrenier
|
14e8d0aaa2
|
Rename lifetime
|
2023-03-20 09:41:56 +01:00 |
|
Loïc Lecrenier
|
1c58cf8426
|
Intern ranking rule graph edge conditions as well
|
2023-03-20 09:41:56 +01:00 |
|
Loïc Lecrenier
|
5155fd2bf1
|
Reorganise initialisation of ranking rules + rename PathsMap -> PathSet
|
2023-03-20 09:41:56 +01:00 |
|
Loïc Lecrenier
|
9ec9c204d3
|
Small code cleanup
|
2023-03-20 09:41:56 +01:00 |
|
Loïc Lecrenier
|
78b9304d52
|
Implement distinct attribute
|
2023-03-20 09:41:56 +01:00 |
|
Loïc Lecrenier
|
0465ba4a05
|
Intern more values
|
2023-03-20 09:41:56 +01:00 |
|
Loïc Lecrenier
|
2099991dd1
|
Continue documenting and cleaning up the code
|
2023-03-20 09:41:56 +01:00 |
|
Loïc Lecrenier
|
c232cdabf5
|
Add documentation
|
2023-03-20 09:41:56 +01:00 |
|
Loïc Lecrenier
|
4e266211bf
|
Small code reorganisation
|
2023-03-20 09:41:56 +01:00 |
|
Loïc Lecrenier
|
57fa689131
|
Cargo fmt
|
2023-03-20 09:41:56 +01:00 |
|
Loïc Lecrenier
|
10626dddfc
|
Add a few more optimisations to new search algorithms
|
2023-03-20 09:41:56 +01:00 |
|
Loïc Lecrenier
|
9051065c22
|
Apply a few optimisations for graph-based ranking rules
|
2023-03-20 09:41:56 +01:00 |
|
Loïc Lecrenier
|
e8c76cf7bf
|
Intern all strings and phrases in the search logic
|
2023-03-20 09:41:56 +01:00 |
|
Loïc Lecrenier
|
3f1729a17f
|
Update new search test
|
2023-03-20 09:41:56 +01:00 |
|
Loïc Lecrenier
|
cab2b6bcda
|
Fix: computation of initial universe, code organisation
|
2023-03-20 09:41:56 +01:00 |
|
Loïc Lecrenier
|
c4979a2fda
|
Fix code visibility issue + unimplemented detail in proximity rule
|
2023-03-20 09:41:56 +01:00 |
|
Loïc Lecrenier
|
23931f8a4f
|
Fix small bug in visual logger of search algo
|
2023-03-20 09:41:56 +01:00 |
|
Loïc Lecrenier
|
aa414565bb
|
Fix proximity graph edge builder to include all proximities
|
2023-03-20 09:41:56 +01:00 |
|
Loïc Lecrenier
|
1db152046e
|
WIP on split words and synonyms support
|
2023-03-20 09:41:56 +01:00 |
|
Loïc Lecrenier
|
c27ea2677f
|
Rewrite cheapest path algorithm and empty path cache
It is now much simpler and has much better performance.
|
2023-03-20 09:41:56 +01:00 |
|
Loïc Lecrenier
|
caa1e1b923
|
Add typo ranking rule to new search impl
|
2023-03-20 09:41:56 +01:00 |
|
Loïc Lecrenier
|
71f18e4379
|
Add sort ranking rule to new search impl
|
2023-03-20 09:41:56 +01:00 |
|
Loïc Lecrenier
|
600e3dd1c5
|
Remove warnings
|
2023-03-20 09:41:56 +01:00 |
|
Loïc Lecrenier
|
362eb0de86
|
Add support for filters
|
2023-03-20 09:41:56 +01:00 |
|
Loïc Lecrenier
|
998d46ac10
|
Add support for search offset and limit
|
2023-03-20 09:41:56 +01:00 |
|
Loïc Lecrenier
|
6c85c0d95e
|
Fix more bugs + visual empty path cache logging
|
2023-03-20 09:41:56 +01:00 |
|
Loïc Lecrenier
|
0e1fbbf7c6
|
Fix bugs in query graph's "remove word" and "cheapest paths" algos
|
2023-03-20 09:41:56 +01:00 |
|
Loïc Lecrenier
|
6806640ef0
|
Fix d2 description of paths map
|
2023-03-20 09:41:56 +01:00 |
|
Loïc Lecrenier
|
173e37584c
|
Improve the visual/detailed search logger
|
2023-03-20 09:41:55 +01:00 |
|
Loïc Lecrenier
|
6ba4d5e987
|
Add a search logger
|
2023-03-20 09:41:55 +01:00 |
|
Loïc Lecrenier
|
dd12d44134
|
Support swapped word pairs in new proximity ranking rule impl
|
2023-03-20 09:41:55 +01:00 |
|
Loïc Lecrenier
|
c8e251bf24
|
Remove noise in codebase
|
2023-03-20 09:41:55 +01:00 |
|
Loïc Lecrenier
|
a938fbde4a
|
Use a cache when resolving the query graph
|
2023-03-20 09:41:55 +01:00 |
|
Loïc Lecrenier
|
dcf3f1d18a
|
Remove EdgeIndex and NodeIndex types, prefer u32 instead
|
2023-03-20 09:41:55 +01:00 |
|
Loïc Lecrenier
|
66d0c63694
|
Add some documentation and use bitmaps instead of hashmaps when possible
|
2023-03-20 09:41:55 +01:00 |
|
Loïc Lecrenier
|
132191360b
|
Introduce the sort ranking rule working with the new search structures
|
2023-03-20 09:41:55 +01:00 |
|
Loïc Lecrenier
|
345c99d5bd
|
Introduce the words ranking rule working with the new search structures
|
2023-03-20 09:41:55 +01:00 |
|
Loïc Lecrenier
|
89d696c1e3
|
Introduce the proximity ranking rule as a graph-based ranking rule
|
2023-03-20 09:41:55 +01:00 |
|
Loïc Lecrenier
|
c645853529
|
Introduce a generic graph-based ranking rule
|
2023-03-20 09:41:55 +01:00 |
|
Loïc Lecrenier
|
a70ab8b072
|
Introduce a function to find the K shortest paths in a graph
|
2023-03-20 09:41:55 +01:00 |
|
Loïc Lecrenier
|
48aae76b15
|
Introduce a function to find the docids of a set of paths in a graph
|
2023-03-20 09:41:55 +01:00 |
|
Loïc Lecrenier
|
23bf572dea
|
Introduce cache structures used with ranking rule graphs
|
2023-03-20 09:41:55 +01:00 |
|
Loïc Lecrenier
|
864f6410ed
|
Introduce a structure to represent a set of graph paths efficiently
|
2023-03-20 09:41:55 +01:00 |
|
Loïc Lecrenier
|
c9bf6bb2fa
|
Introduce a structure to implement ranking rules with graph algorithms
|
2023-03-20 09:41:55 +01:00 |
|
Loïc Lecrenier
|
46249ea901
|
Implement a function to find a QueryGraph's docids
|
2023-03-20 09:41:55 +01:00 |
|
Loïc Lecrenier
|
ce0d1e0e13
|
Introduce a common way to manage the coordination between ranking rules
|
2023-03-20 09:41:55 +01:00 |
|
Loïc Lecrenier
|
5065d8b0c1
|
Introduce a DatabaseCache to memorize the addresses of LMDB values
|
2023-03-20 09:41:55 +01:00 |
|
Loïc Lecrenier
|
a83007c013
|
Introduce structure to represent search queries as graphs
|
2023-03-20 09:41:55 +01:00 |
|
Loïc Lecrenier
|
79e0a6dd4e
|
Introduce a new search module, eventually meant to replace the old one
The code here does not compile because I am merely splitting one giant
commit into smaller ones, where each commit explains a single file.
|
2023-03-20 09:41:55 +01:00 |
|