Louis Dureuil
|
d4715e0c4d
|
Fix same vector sort bug
|
2023-12-14 16:08:41 +01:00 |
|
Louis Dureuil
|
11e2a2c1aa
|
Fix geosort bug
|
2023-12-14 16:08:41 +01:00 |
|
Louis Dureuil
|
65e49b7092
|
Remove stuff, add distribution shift (WIP)
|
2023-12-14 16:08:38 +01:00 |
|
Louis Dureuil
|
e56f160032
|
Actually pass embedders on reindex
|
2023-12-14 16:07:49 +01:00 |
|
Louis Dureuil
|
687d92f217
|
prompt bifluor+
|
2023-12-14 16:07:49 +01:00 |
|
Louis Dureuil
|
fb539f61fe
|
WIP
|
2023-12-14 16:07:49 +01:00 |
|
Louis Dureuil
|
cb4ebe163e
|
WIP
|
2023-12-14 16:07:49 +01:00 |
|
Louis Dureuil
|
dde3a04679
|
WIP arroy integration
|
2023-12-14 16:07:49 +01:00 |
|
Louis Dureuil
|
13c2c6c16b
|
Small commit to add hybrid search and autoembedding
|
2023-12-14 16:07:48 +01:00 |
|
Louis Dureuil
|
21bcf32109
|
Add candle and hg_hub, updating a lot of deps in the process
|
2023-12-14 16:07:48 +01:00 |
|
Clément Renault
|
56571f762a
|
Merge remote-tracking branch 'origin/main' into tmp-release-v1.5.1
|
2023-12-13 11:57:01 +01:00 |
|
ManyTheFish
|
467b49153d
|
Implement proximityPrecision setting on milli side
|
2023-12-06 15:49:02 +01:00 |
|
ManyTheFish
|
bddc168d83
|
List TODOs
|
2023-12-06 14:59:23 +01:00 |
|
ManyTheFish
|
3b3fa38f27
|
Put the restrict list in a sub-struct
|
2023-11-28 18:37:57 +01:00 |
|
Clément Renault
|
170e063b80
|
Remove the actix-web dependency from milli
|
2023-11-28 17:19:57 +01:00 |
|
ManyTheFish
|
d6c2ee15a9
|
Filter on attributes before computing the docids when attribute restriction is on
|
2023-11-28 14:55:29 +01:00 |
|
Clément Renault
|
ec9b52d608
|
Rename copy_to_path to copy_to_file
|
2023-11-28 14:32:30 +01:00 |
|
Clément Renault
|
34c67ac389
|
Remove the possibility to fail fetching the env info
|
2023-11-28 14:31:23 +01:00 |
|
Clément Renault
|
d050c9b4ae
|
Only remap the main database once
|
2023-11-28 14:27:30 +01:00 |
|
Clément Renault
|
7dd1226faf
|
Clarify an unreachable unwrap
|
2023-11-28 14:26:31 +01:00 |
|
Clément Renault
|
548c8247c2
|
Create and use real error types in the codecs
|
2023-11-28 10:11:17 +01:00 |
|
Clément Renault
|
d32eb11329
|
Move to the v0.20.0-alpha.9 of heed
|
2023-11-27 11:52:22 +01:00 |
|
Clément Renault
|
58dac8af42
|
Remove the panics and unwraps
|
2023-11-23 15:00:48 +01:00 |
|
Clément Renault
|
0dbf1a16ff
|
Make clippy happy
|
2023-11-23 14:11:38 +01:00 |
|
Clément Renault
|
462b4c0080
|
Fix the tests
|
2023-11-23 12:07:35 +01:00 |
|
Clément Renault
|
0d4482625a
|
Make the changes to use heed v0.20-alpha.6
|
2023-11-23 11:43:58 +01:00 |
|
Clément Renault
|
56a0d91ecd
|
Update the heed dependency and lock file
|
2023-11-22 15:11:09 +01:00 |
|
Clément Renault
|
7cb7e37ba8
|
Merge branch 'main' into tmp-release-v1.5.0
|
2023-11-21 16:30:46 +01:00 |
|
ManyTheFish
|
d3575fb028
|
Make into_del_add_obkv parameters more human readable
|
2023-11-20 16:10:39 +01:00 |
|
ManyTheFish
|
39cbb499c2
|
Small fixes
|
2023-11-20 10:20:39 +01:00 |
|
ManyTheFish
|
ebef6bc24d
|
Simplify documents database writing
|
2023-11-20 10:14:57 +01:00 |
|
ManyTheFish
|
d59b7db8d0
|
remove unused code
|
2023-11-20 10:10:45 +01:00 |
|
ManyTheFish
|
263e825619
|
Fix typos in comments
|
2023-11-20 10:06:29 +01:00 |
|
Many the fish
|
b0adc73ce6
|
Merge pull request #4207 from meilisearch/diff-indexing-prefix-databases
Diff indexing prefix databases
|
2023-11-14 16:04:05 +01:00 |
|
Louis Dureuil
|
772964125d
|
Factor removal of document from DB
|
2023-11-13 13:51:22 +01:00 |
|
Louis Dureuil
|
378deb0bef
|
Rename trait
|
2023-11-13 13:38:36 +01:00 |
|
ManyTheFish
|
1f36410541
|
Update tests
|
2023-11-13 13:36:39 +01:00 |
|
Louis Dureuil
|
8c649d8061
|
Throw error when the vector search is sent with the wrong size
|
2023-11-13 09:57:42 +01:00 |
|
Louis Dureuil
|
264b10ec20
|
Fixup documentation
|
2023-11-09 16:23:20 +01:00 |
|
Louis Dureuil
|
3053e01c05
|
Batch::remove_documents_from_db_no_batch
|
2023-11-09 14:23:02 +01:00 |
|
Louis Dureuil
|
b11c2afac0
|
Index::external_id_of
|
2023-11-09 14:22:43 +01:00 |
|
Louis Dureuil
|
9cef800b2a
|
Enrich uses the new type
|
2023-11-09 14:22:05 +01:00 |
|
Louis Dureuil
|
db2fb86b8b
|
Extract PrimaryKey logic to a type
|
2023-11-09 14:19:16 +01:00 |
|
ManyTheFish
|
882ab9cc85
|
remove warnings
|
2023-11-09 11:35:33 +01:00 |
|
ManyTheFish
|
5a9c96e1db
|
Compute word integer prefix cache
|
2023-11-09 11:34:26 +01:00 |
|
ManyTheFish
|
70ce40828c
|
Compute word docids prefix cache
|
2023-11-08 17:01:00 +01:00 |
|
ManyTheFish
|
688266c83e
|
Remove word pair proximity prefix cache and compute it at search time
|
2023-11-08 14:16:01 +01:00 |
|
ManyTheFish
|
6dab826908
|
Reactivate prefix databases
|
2023-11-08 13:58:01 +01:00 |
|
ManyTheFish
|
1e2fbc6a42
|
revert "REVERT ME: ignore prefix pair databases tests"
This reverts commit 1b2ea6cf19.
|
2023-11-08 11:50:52 +01:00 |
|
Louis Dureuil
|
cbaa54cafd
|
Fix clippy issues
|
2023-11-06 11:19:31 +01:00 |
|
Louis Dureuil
|
1bccf2079e
|
Correctly mark non-tests as non-tests
|
2023-11-06 11:03:56 +01:00 |
|
ManyTheFish
|
1b2ea6cf19
|
REVERT ME: ignore prefix pair databases tests
|
2023-11-06 10:46:22 +01:00 |
|
Louis Dureuil
|
1ad1fcc8c8
|
Remove all warnings
|
2023-11-06 10:31:14 +01:00 |
|
ManyTheFish
|
87610a5f98
|
Don't try to delete a document that is not in the database
|
2023-11-02 16:49:03 +01:00 |
|
Clément Renault
|
ff522c919d
|
Fix the vector extractions for the diff indexing
|
2023-11-02 15:58:08 +01:00 |
|
ManyTheFish
|
bf0651f23c
|
Implement iter method on ExternalDocumentsIds
|
2023-11-02 15:38:00 +01:00 |
|
ManyTheFish
|
5b20e625f3
|
fix merge
|
2023-11-02 15:31:37 +01:00 |
|
ManyTheFish
|
bc51d6157a
|
Fix transform reindexing path
|
2023-11-02 15:26:20 +01:00 |
|
ManyTheFish
|
1b4ff991c0
|
update typed chunks
|
2023-11-02 15:26:20 +01:00 |
|
ManyTheFish
|
4b64c33aa2
|
update vector extractor
|
2023-11-02 15:26:20 +01:00 |
|
ManyTheFish
|
12323d610e
|
Change the original document sorter key from the internal docid to a concatenation of the internal and the external docid
|
2023-11-02 15:26:20 +01:00 |
|
Clément Renault
|
4d864f0702
|
Always sort internal Sorter entries in parallel
|
2023-11-02 14:47:43 +01:00 |
|
Clément Renault
|
b10c060bf7
|
Cleanup TOML
|
2023-11-01 14:03:04 +01:00 |
|
Clément Renault
|
c71b1d33ae
|
Sort entries using rayon in the transform sorters
|
2023-11-01 11:07:16 +01:00 |
|
Clément Renault
|
0fc446c62f
|
Add more timing logs to the Transform
|
2023-11-01 11:07:16 +01:00 |
|
Louis Dureuil
|
0fb6acefc3
|
Add snapshots for facets
|
2023-10-31 17:11:08 +01:00 |
|
Louis Dureuil
|
b1d1355b69
|
remove tests on soft-deleted
|
2023-10-31 16:36:27 +01:00 |
|
Louis Dureuil
|
f19332466e
|
Extract field value as values instead of Option<Value>
|
2023-10-31 16:36:27 +01:00 |
|
Louis Dureuil
|
03ddb4f310
|
use deladd in facet update tests
|
2023-10-31 16:36:27 +01:00 |
|
Louis Dureuil
|
c855cc2721
|
Remove unused test
|
2023-10-31 16:36:27 +01:00 |
|
Louis Dureuil
|
da0503ef80
|
Fix document count
|
2023-10-31 16:36:27 +01:00 |
|
ManyTheFish
|
94206b0055
|
Update tests
|
2023-10-31 13:48:47 +01:00 |
|
Louis Dureuil
|
b40253bf18
|
update snapshots
|
2023-10-31 10:30:48 +01:00 |
|
Louis Dureuil
|
d8bf3f3fc2
|
Remove unused snapshots
|
2023-10-31 10:12:49 +01:00 |
|
Louis Dureuil
|
9d59e8011a
|
fix some tests
|
2023-10-31 10:08:36 +01:00 |
|
Louis Dureuil
|
dad78cbf8d
|
Bulk facet remove deletes keys from DB when value empty
|
2023-10-31 09:53:55 +01:00 |
|
Louis Dureuil
|
4e91707a06
|
Rename test
|
2023-10-31 09:41:17 +01:00 |
|
Louis Dureuil
|
de10f20732
|
Fix field distribution again
|
2023-10-30 17:47:22 +01:00 |
|
Louis Dureuil
|
be395c7944
|
Change order of arguments to tokenizer_builder
|
2023-10-30 16:26:29 +01:00 |
|
Louis Dureuil
|
9fedd8101a
|
Fix tests
|
2023-10-30 15:11:07 +01:00 |
|
Louis Dureuil
|
54d07a8da3
|
Update field distribution taking into account both deletions and additions
|
2023-10-30 14:47:51 +01:00 |
|
Louis Dureuil
|
58690dfb19
|
Fix tests compilation after changes to ExternalDocumentsIds API
|
2023-10-30 13:34:07 +01:00 |
|
Louis Dureuil
|
abf424ebfc
|
Remove unused FromIterator
|
2023-10-30 11:41:56 +01:00 |
|
Clément Renault
|
dfab6293c9
|
Use an LMDB database to store the external documents ids
|
2023-10-30 11:41:23 +01:00 |
|
Louis Dureuil
|
fdf3f7f627
|
Fix facet distribution test
|
2023-10-30 11:41:23 +01:00 |
|
Louis Dureuil
|
6260cff65f
|
Actually delete documents from DB when the merge function says so
|
2023-10-30 11:41:22 +01:00 |
|
Louis Dureuil
|
8e0d9c9a5e
|
Recover delete_documents tests that were too eagerly deleted
|
2023-10-30 11:41:22 +01:00 |
|
Louis Dureuil
|
ae4ec8ea55
|
Add delete_document_using_wtxn to TempIndex
|
2023-10-30 11:41:22 +01:00 |
|
Louis Dureuil
|
9a2dccc3bc
|
Add iterator to find external ids of a bitmap of internal ids
|
2023-10-30 11:41:22 +01:00 |
|
Louis Dureuil
|
a35988550c
|
Fix some snapshots
|
2023-10-30 11:41:22 +01:00 |
|
Louis Dureuil
|
e78281785c
|
Actually execute the transform even if there are only documents to delete
|
2023-10-30 11:41:22 +01:00 |
|
Louis Dureuil
|
3c15881818
|
Add simple delete test
|
2023-10-30 11:41:22 +01:00 |
|
Louis Dureuil
|
73c06d31d9
|
snapshot always display stuff in consistent order
|
2023-10-30 11:41:22 +01:00 |
|
Louis Dureuil
|
290e773d23
|
remove more warnings and fix some tests
|
2023-10-30 11:41:22 +01:00 |
|
Louis Dureuil
|
fa6c7f65ca
|
Add TmpIndex::delete_documents
|
2023-10-30 11:41:22 +01:00 |
|
Louis Dureuil
|
113527f466
|
Remove soft-deleted related methods from Index
|
2023-10-30 11:41:22 +01:00 |
|
Louis Dureuil
|
c534a1b687
|
Stop using delete documents pipeline in batch runner
|
2023-10-30 11:41:22 +01:00 |
|
Louis Dureuil
|
2263dff02b
|
Stop using removed delete pipelines almost everywhere
|
2023-10-30 11:41:22 +01:00 |
|
Louis Dureuil
|
d651b3ef01
|
Remove delete documents files
|
2023-10-30 11:41:20 +01:00 |
|
ManyTheFish
|
762b0b47e6
|
Use deladd merging function in chunks mergers
|
2023-10-30 11:40:20 +01:00 |
|
Louis Dureuil
|
01d5eedf2f
|
Remove some warnings
|
2023-10-30 11:40:20 +01:00 |
|
Louis Dureuil
|
073f89db79
|
Fix facet tests
|
2023-10-30 11:40:20 +01:00 |
|
Louis Dureuil
|
8370fbc92b
|
Fix snaps
|
2023-10-30 11:40:20 +01:00 |
|
Louis Dureuil
|
85f42fbc03
|
Handle external to internal id mapping from TypedChunk::Documents
|
2023-10-30 11:40:20 +01:00 |
|
Louis Dureuil
|
c6b3c18c85
|
WIP: Comment out document deletion in other pipelines than update
TODO: fix calls to DELETE route
|
2023-10-30 11:40:20 +01:00 |
|
Louis Dureuil
|
bafeb892a7
|
Modify Index after changes to ExternalDocumentsIds
|
2023-10-30 11:40:20 +01:00 |
|
Louis Dureuil
|
8fb221dae3
|
Refactor ExternalDocumentsIds
- Remove soft deleted
- Add apply method that takes a list of operations to encapsulate modifications to the external -> internal mapping
|
2023-10-30 11:40:20 +01:00 |
|
Louis Dureuil
|
946c762d28
|
WIP: reset documents in TypedChunk::Documents
|
2023-10-30 11:40:20 +01:00 |
|
Louis Dureuil
|
cda6ca1ee6
|
Remove TypedChunk::NewDocumentIds
|
2023-10-30 11:40:18 +01:00 |
|
Louis Dureuil
|
696fcf4d18
|
Fix document insertion into LMDB
|
2023-10-30 11:39:31 +01:00 |
|
ManyTheFish
|
476e4d3dbe
|
Use value buffer instead of the initial value when writing the final result in the sorter
|
2023-10-30 11:39:31 +01:00 |
|
Clément Renault
|
576fa9c6da
|
Remove useless comment
|
2023-10-30 11:39:31 +01:00 |
|
Kerollmops
|
77dcbff6b2
|
Remove and Insert the DelAdd geo points
|
2023-10-30 11:39:31 +01:00 |
|
Kerollmops
|
544440c363
|
Ignore geo fields when the Del and Add content is the same
|
2023-10-30 11:39:31 +01:00 |
|
Clément Renault
|
a3dae4db9b
|
Extract the geo fields DelAdd and generate a new DelAdd obkv with it
|
2023-10-30 11:39:31 +01:00 |
|
ManyTheFish
|
ba90a5ec0e
|
update extract fid word count docids
|
2023-10-30 11:39:31 +01:00 |
|
Louis Dureuil
|
b26dc9aabe
|
Explanatory code comment
|
2023-10-30 11:39:31 +01:00 |
|
Louis Dureuil
|
66abac9364
|
Use specialized KvReaderDelAdd type
Co-authored-by: Clément Renault <clement@meilisearch.com>
|
2023-10-30 11:39:31 +01:00 |
|
Louis Dureuil
|
59f88c14b3
|
Simplify facet update after removing Index::faceted_documents_ids
|
2023-10-30 11:39:29 +01:00 |
|
Louis Dureuil
|
14832cb324
|
Remove Index::faceted_documents_ids
|
2023-10-30 11:37:32 +01:00 |
|
Louis Dureuil
|
04ec293024
|
Facet Incremental update
|
2023-10-30 11:37:30 +01:00 |
|
Louis Dureuil
|
f67ff3a738
|
Facets Bulk update
|
2023-10-30 11:36:40 +01:00 |
|
Clément Renault
|
560e8f5613
|
Introduce the CboRoaringBitmapCodec merge_deladd_into and use it
|
2023-10-30 11:34:55 +01:00 |
|
Clément Renault
|
2d3f15f82c
|
Introduce a function to only serialize the Add side of a DelAdd obkv
|
2023-10-30 11:34:55 +01:00 |
|
Clément Renault
|
40186bf403
|
Rename FieldIdWordCountDocids correctly
|
2023-10-30 11:34:50 +01:00 |
|
ManyTheFish
|
87e3d27878
|
update extract word pair proximity to support deladd obkvs
|
2023-10-30 11:34:02 +01:00 |
|
ManyTheFish
|
6bcf8b4f8c
|
update extract word position docids
|
2023-10-30 11:34:02 +01:00 |
|
ManyTheFish
|
46aa75abdb
|
update extract word docids
|
2023-10-30 11:34:02 +01:00 |
|
ManyTheFish
|
2597bbd107
|
Make script language docids map taking a tuple of roaring bitmaps expressing the deletions and the additions
|
2023-10-30 11:34:00 +01:00 |
|
Clément Renault
|
e2bc054604
|
Update extract_facet_string_docids to support deladd obkvs
|
2023-10-30 11:32:36 +01:00 |
|
Clément Renault
|
fcd3a1434d
|
Update extract_facet_number_docids to support deladd obkvs
|
2023-10-30 11:31:04 +01:00 |
|
Clément Renault
|
a82dee21e0
|
Rename docid_fid into fid_docid
|
2023-10-30 11:31:02 +01:00 |
|
Clément Renault
|
bc45c1206d
|
Implement all the facet extraction paths and simplify them
|
2023-10-30 11:29:08 +01:00 |
|
Clément Renault
|
6ae4100f07
|
Generate the DelAdd for is_null, is_empty, and exists
|
2023-10-30 11:29:08 +01:00 |
|
Clément Renault
|
0c47defeee
|
Work on fid docid facet values rewrite
|
2023-10-30 11:29:06 +01:00 |
|
ManyTheFish
|
313b16bec2
|
Support diff indexing on extract_docid_word_positions
|
2023-10-30 11:24:19 +01:00 |
|
ManyTheFish
|
1dd97578a8
|
Make the transform struct return diff-based documents obkvs
|
2023-10-30 11:22:07 +01:00 |
|
ManyTheFish
|
f5ef69293b
|
deactivate prefix dbs
|
2023-10-30 11:22:07 +01:00 |
|
ManyTheFish
|
1c5705c164
|
clean PR warnings
|
2023-10-30 11:22:05 +01:00 |
|
ManyTheFish
|
66c2c82a18
|
Split wpp in several sorters
|
2023-10-30 11:15:02 +01:00 |
|
ManyTheFish
|
28a8d0ccda
|
Fix word pair proximity
|
2023-10-30 11:15:02 +01:00 |
|
ManyTheFish
|
96be85396d
|
Use a VecDeque in wpp database
|
2023-10-30 11:15:02 +01:00 |
|
ManyTheFish
|
df9e5c8651
|
Generalize usage of CboRoaringBitmap codec to ease the use
|
2023-10-30 11:15:02 +01:00 |
|
ManyTheFish
|
b541d48847
|
Add buffer to the obkv writer
|
2023-10-30 11:15:02 +01:00 |
|
ManyTheFish
|
8ccf32d1a0
|
Compute word_fid_docids before word_docids and exact_word_docids
|
2023-10-30 11:15:02 +01:00 |
|
ManyTheFish
|
db1ca21231
|
add puffin in sorter into reader function
|
2023-10-30 11:15:00 +01:00 |
|
ManyTheFish
|
11ea5acff9
|
Fix
|
2023-10-30 11:13:10 +01:00 |
|
ManyTheFish
|
8d77736a67
|
Fix fid_word_docids
|
2023-10-30 11:13:10 +01:00 |
|
ManyTheFish
|
748b333161
|
Add useful debug assert before key insertion in database
|
2023-10-30 11:13:10 +01:00 |
|
ManyTheFish
|
17b647dfe5
|
Wip
|
2023-10-30 11:13:08 +01:00 |
|