Clément Renault
|
ff5d3b59f5
|
Move the document id extraction to the primary key code
|
2024-09-12 12:01:42 +02:00 |
|
ManyTheFish
|
aa69308e45
|
Use a bufWriter to build word FSTs
|
2024-09-12 11:48:00 +02:00 |
|
ManyTheFish
|
eb9a20ff0b
|
Fix fid_word_docids extraction
|
2024-09-12 11:08:18 +02:00 |
|
Clément Renault
|
3e9198ebaa
|
Support guessing primary key again
|
2024-09-11 17:25:40 +02:00 |
|
Clément Renault
|
2a0ad0982f
|
Fix the document counter
|
2024-09-11 15:59:36 +02:00 |
|
ManyTheFish
|
2b317c681b
|
Build mergers in parallel
|
2024-09-11 11:49:26 +02:00 |
|
ManyTheFish
|
39b5990f64
|
Mutualize tokenization
|
2024-09-11 10:22:38 +02:00 |
|
Clément Renault
|
8287c2644f
|
Support CSV again
|
2024-09-10 21:10:28 +01:00 |
|
Clément Renault
|
c1c44a0b81
|
Impl serialize on TopLevelMap
|
2024-09-10 19:32:03 +01:00 |
|
Clément Renault
|
04596f3616
|
Move the TopLevelMap into a dedicated module
|
2024-09-10 18:01:17 +01:00 |
|
Clément Renault
|
24cb5839ad
|
Move the document changes sorting logic to a new trait
|
2024-09-10 17:37:52 +01:00 |
|
ManyTheFish
|
f69688e8f7
|
Fix several warnings in extractors and remove unreachable macros
|
2024-09-09 14:52:50 +02:00 |
|
Clément Renault
|
8fd0afaaaa
|
Make sure we iterate over the payload documents in order
|
2024-09-06 08:09:08 +02:00 |
|
Clément Renault
|
72c6a21a30
|
Use raw JSON to read the payloads
|
2024-09-05 20:08:23 +02:00 |
|
Clément Renault
|
8412be4a7d
|
Cleanup CowStr and TopLevelMap struct
|
2024-09-05 18:32:55 +02:00 |
|
Louis Dureuil
|
10f09c531f
|
add some commented code to read from json with raw values
|
2024-09-05 18:22:16 +02:00 |
|
ManyTheFish
|
8fd99b111b
|
Add tracing timers logs
|
2024-09-05 18:00:22 +02:00 |
|
Clément Renault
|
f6b3d1f9a5
|
Increase some channel sizes
|
2024-09-05 15:12:07 +02:00 |
|
Clément Renault
|
73ce67862d
|
Use the word pair proximity and fid word count docids extractors
Co-authored-by: ManyTheFish <many@meilisearch.com>
|
2024-09-05 10:56:22 +02:00 |
|
Clément Renault
|
0fc02f7351
|
Move the facet extraction to dedicated modules
|
2024-09-05 10:32:27 +02:00 |
|
ManyTheFish
|
34f11e3380
|
Implement word count and word pair proximity extractors
|
2024-09-05 10:30:39 +02:00 |
|
Clément Renault
|
27308eaab1
|
Import the facet extractors
|
2024-09-04 17:58:15 +02:00 |
|
Clément Renault
|
b33ec9ba3f
|
Introduce the FieldIdFacetIsNullDocidsExtractor
|
2024-09-04 17:50:08 +02:00 |
|
Clément Renault
|
9c0a1cd9fd
|
Introduce the FieldIdFacetExistsDocidsExtractor
|
2024-09-04 17:48:49 +02:00 |
|
Clément Renault
|
0b061f1e70
|
Introduce the FieldIdFacetIsEmptyDocidsExtractor
|
2024-09-04 17:40:24 +02:00 |
|
Clément Renault
|
19d937ab21
|
Introduce the facet extractors
|
2024-09-04 17:03:54 +02:00 |
|
Clément Renault
|
1d59c19cd2
|
Send the WordsFst by using an Mmap
|
2024-09-04 14:30:09 +02:00 |
|
Clément Renault
|
98e48371c3
|
Factorize some stuff
|
2024-09-04 12:17:13 +02:00 |
|
Clément Renault
|
6d74fb0229
|
Introduce the WordFidWordDocids database
|
2024-09-04 11:40:55 +02:00 |
|
ManyTheFish
|
1eb75a1040
|
remove milli/src/update/new/extract/tokenize_document.rs
|
2024-09-04 11:40:26 +02:00 |
|
Clément Renault
|
3b82d8b5b9
|
Fix the cache to serialize entries correctly
|
2024-09-04 10:55:36 +02:00 |
|
ManyTheFish
|
781a186f75
|
remove milli/src/update/new/extract/extract_word_docids.rs
|
2024-09-04 10:28:31 +02:00 |
|
ManyTheFish
|
6a399556b5
|
Implement more searchable extractor
|
2024-09-04 10:20:18 +02:00 |
|
Clément Renault
|
27b4cab857
|
Extract and write the documents and words fst in the database
|
2024-09-04 09:59:19 +02:00 |
|
Clément Renault
|
52d32b4ee9
|
Move the channel sender in the closure to stop the merger thread
|
2024-09-03 16:08:33 +02:00 |
|
ManyTheFish
|
da61408e52
|
Remove unimplemented from document changes
|
2024-09-03 15:14:16 +02:00 |
|
ManyTheFish
|
fe69385bd7
|
Fix tokenizer test
|
2024-09-03 14:24:37 +02:00 |
|
Clément Renault
|
c1557734dc
|
Use the GlobalFieldsIdsMap everywhere and write it to disk
Co-authored-by: Dureuill <louis@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
|
2024-09-03 12:01:01 +02:00 |
|
ManyTheFish
|
c50d3edc4a
|
Integrate first searchable extractor
|
2024-09-03 11:02:39 +02:00 |
|
Clément Renault
|
5369bf4a62
|
Change some lifetimes
|
2024-09-02 19:51:22 +02:00 |
|
Clément Renault
|
bcb1aa3d22
|
Find a temporary solution to par into iter on a HashMap
Spoiler: Do not use a HashMap but drain it into a Vec
|
2024-09-02 19:39:48 +02:00 |
|
Clément Renault
|
9b7858fb90
|
Expose the new indexer
|
2024-09-02 15:21:59 +02:00 |
|
Clément Renault
|
ab01679a8f
|
Remove the useless option from the document changes
|
2024-09-02 15:21:00 +02:00 |
|
Clément Renault
|
521775f788
|
I push for Many
|
2024-09-02 15:10:21 +02:00 |
|
Clément Renault
|
72e7b7846e
|
Renaming the indexers
|
2024-09-02 14:42:27 +02:00 |
|
Clément Renault
|
6526ce1208
|
Fix the merging of documents
|
2024-09-02 14:41:20 +02:00 |
|
Clément Renault
|
e639ec79d1
|
Move the indexers into their own modules
|
2024-09-02 10:42:19 +02:00 |
|
Clément Renault
|
bb885a5810
|
Fix the merge for roaring bitmap
|
2024-09-01 23:20:19 +02:00 |
|
Clément Renault
|
b625d31c7d
|
Introduce the PartialDumpIndexer indexer that generates document ids in parallel
|
2024-08-30 15:07:21 +02:00 |
|
Clément Renault
|
6487a67f2b
|
Introduce the ConcurrentAvailableIds struct and rename the other to AvailableIds
|
2024-08-30 15:06:50 +02:00 |
|