Commit Graph

259 Commits

Author SHA1 Message Date
Tamo
fa885e75b4
rename the send_progress in progress 2024-12-11 18:13:12 +01:00
Tamo
f1beb60204
make the progress use payload instead of documents 2024-12-11 18:07:45 +01:00
Tamo
c5536c37b5
rename the atomic::name to unit_name 2024-12-11 18:03:06 +01:00
Tamo
9245c89cfe
move the macros to milli 2024-12-11 18:00:46 +01:00
meili-bors[bot]
eaabc1af2f
Merge #5144
Some checks failed
Test suite / Tests on ${{ matrix.os }} (macos-13) (push) Waiting to run
Test suite / Tests almost all features (push) Has been skipped
Test suite / Test disabled tokenization (push) Has been skipped
Test suite / Tests on ubuntu-20.04 (push) Failing after 11s
Test suite / Run tests in debug (push) Failing after 10s
Test suite / Tests on ${{ matrix.os }} (windows-2022) (push) Failing after 47s
Test suite / Run Rustfmt (push) Successful in 1m17s
Test suite / Run Clippy (push) Successful in 5m30s
5144: Exactly 512 bytes docid fails r=Kerollmops a=dureuill

# Pull Request

## Related issue
Fixes #5050 

## What does this PR do?
- Return a user error rather than an internal one for docids of exactly 512 bytes
- Fix up error message to indicate that exactly 512 bytes long docids are not supported.
- Fix up error message to reflect that index uids are actually limited to 400 bytes in length

## Impact

- Impacts docs: 
    - update [this paragraph](https://www.meilisearch.com/docs/learn/resources/known_limitations#length-of-primary-key-values) to say 511 bytes instead of 512 

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-12-11 15:41:05 +00:00
Tamo
867e6a8f1d
rename the send_progress field to progress since it s not sending anything 2024-12-11 16:25:01 +01:00
Tamo
6f4823fc97
make the number of document in the document tasks more incremental 2024-12-11 16:25:01 +01:00
Tamo
df9b68f8ed
inital implementation of the progress 2024-12-11 16:25:01 +01:00
Louis Dureuil
bfca54cc2c
Return docid in case of errors while rendering the document template 2024-12-11 15:26:18 +01:00
Kerollmops
aeb6b74725
Make sure we use an FxHashBuilder on the Value 2024-12-10 15:52:22 +01:00
Kerollmops
a751972c57
Prefer using a stable than a random hash builder 2024-12-10 14:25:53 +01:00
Kerollmops
6b269795d2
Update bumparaw-collections to 0.1.2 2024-12-10 14:25:13 +01:00
Kerollmops
89637bcaaf
Use bumparaw-collections in Meilisearch/milli 2024-12-10 11:52:20 +01:00
Louis Dureuil
866ac91be3
Fix error messages 2024-12-10 11:06:58 +01:00
Louis Dureuil
e610af36aa
User failure for documents with docid of ==512 bytes 2024-12-10 11:06:24 +01:00
ManyTheFish
07f42e8057 Do not index a filed count when no word is counted 2024-12-09 15:45:12 +01:00
ManyTheFish
71f59749dc Reduce union impact in merging 2024-12-09 15:44:06 +01:00
Kerollmops
f5dd8dfc3e
Rollback max memory usage changes 2024-12-09 10:26:30 +01:00
James Hiew
54e34beac6
Check attributes are filterable before evaluating search query 2024-12-07 21:13:13 +00:00
Louis Dureuil
bd5110a2fe
Fix clippy warnings 2024-12-05 16:13:07 +01:00
Louis Dureuil
fa8b9acdf6
Ignore documents that didn't change in facets 2024-12-05 16:12:52 +01:00
Louis Dureuil
2b74d1824b
Ignore documents that didn't change any field in word pair proximity 2024-12-05 15:56:22 +01:00
Louis Dureuil
c77b00d3ac
Don't extract word docids when no searchable changed 2024-12-05 15:51:58 +01:00
Louis Dureuil
c77073efcc
Update::has_changed_for_fields 2024-12-05 15:50:12 +01:00
meili-bors[bot]
cac355bfa7
Merge #5124
5124: Optimize Prefixes and Merges r=ManyTheFish a=Kerollmops

In this PR, we plan to optimize the read of LMDB to use read the entries in lexicographic order and better use the memory-mapping OS cache:

 - Optimize the prefix generation for word position docids (`@manythefish)`
 - Optimize the parallel merging of the caches to sort entries before merging the caches (`@kerollmops)`
 
## Benchmarks on 1cpu 2gb gpo3 (5k IOps)
 
Before on the tag meilisearch-v1.12.0-rc.3.

```
word_position_docids:merge_and_send_docids: 988s
compute_word_fst: 23.3s
word_pair_proximity_docids:merge_and_send_docids: 428s
compute_word_prefix_fid_docids:recompute_modified_prefixes: 76.3s
compute_word_prefix_position_docids:recompute_modified_prefixes:from_prefixes: 429s
```

After sorting the whole `HashMap`s in a `Vec` on this branch.

```
word_position_docids:merge_and_send_docids: 202s
compute_word_fst: 20.4s
word_pair_proximity_docids:merge_and_send_docids: 427s
compute_word_prefix_fid_docids:recompute_modified_prefixes: 65.5s
compute_word_prefix_position_docids:recompute_modified_prefixes:from_prefixes: 62.5s
```

Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
2024-12-05 09:35:52 +00:00
Kerollmops
52843123d4
Clean up and remove the non-sorted merge_caches function 2024-12-05 10:03:05 +01:00
meili-bors[bot]
6298db5bea
Merge #5113
5113: Fix the Minimum BBQueue channel threshold r=Kerollmops a=Kerollmops



Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-12-05 09:01:02 +00:00
Louis Dureuil
3a11e39c01
Force max_memory to a min of 100MiB 2024-12-04 17:53:30 +01:00
Louis Dureuil
5f896b1050
Fix geo when spilling 2024-12-04 17:51:12 +01:00
Kerollmops
2e32d0474c
Lexicographically sort all the map to merge 2024-12-04 17:05:11 +01:00
Kerollmops
cb99ac6f7e
Consume vec instead of draining 2024-12-04 17:00:22 +01:00
Kerollmops
be411435f5
Use the merge_caches_alt function in the docids merging 2024-12-04 16:37:29 +01:00
Kerollmops
29ef164530
Introduce a new semi ordered merge function 2024-12-04 16:33:35 +01:00
ManyTheFish
739c52a3cd Replace HashSets by BTreeSets for the prefixes 2024-12-04 16:16:48 +01:00
Kerollmops
261d2ceb06
Yield the BBQueue writer instead of spin looping 2024-12-04 14:16:40 +01:00
Kerollmops
96831ed9bb
Send the WakeUp message if necessary in the reserve function 2024-12-04 11:03:01 +01:00
Kerollmops
0459b1a242
Change the reserve and grant function to accept a closure 2024-12-04 10:32:25 +01:00
Kerollmops
8ecb726683
Fix the minimun BBQueue channel threshold 2024-12-03 15:49:11 +01:00
Clément Renault
0ad2f57a92
Update bbqueue repo to point to the meilisearch org 2024-12-03 12:00:04 +01:00
Louis Dureuil
e905a72d73 remove mimalloc on Windows 2024-12-02 18:13:56 +01:00
Louis Dureuil
d040aff101 Stop allocating 1GiB for documents 2024-12-02 16:30:14 +01:00
Clément Renault
767259be7e
Prefer returning a abort indexation rather than throwing a panic 2024-12-02 11:53:42 +01:00
Clément Renault
e9f34fb4b1
Make the frame consumer pulling fair 2024-12-02 11:49:01 +01:00
Clément Renault
d5c07ef7b3
Manage key length conversion error correctly 2024-12-02 11:03:00 +01:00
Clément Renault
5e218f3f4d
Remove a sync_all (mark my words) 2024-12-02 11:03:00 +01:00
Clément Renault
bcab61ab1d
Do spurious wake ups on the receiver side 2024-12-02 11:03:00 +01:00
Clément Renault
263c5a348e
Move the spin looping for BBQueue frames into a dedicated function 2024-12-02 10:33:49 +01:00
Clément Renault
be7d2fbe63
Move the EntryHeader up in the file and document the safety related to the size 2024-12-02 10:19:11 +01:00
Clément Renault
f7f9a131e4
Improve copying bytes into aligned memory area 2024-12-02 10:15:58 +01:00
Clément Renault
5df5eb2db2
Clarify a method name 2024-12-02 10:10:48 +01:00
Clément Renault
30eb0e5b5b
Rename recv and read methods to recv_action and recv_frame 2024-12-02 10:08:01 +01:00
Clément Renault
5b860cb989
Fix english in the doc 2024-12-02 10:06:35 +01:00
Clément Renault
76d0623b11
Reduce the number of unwraps 2024-12-02 10:05:06 +01:00
Clément Renault
db4eaf4d2d
Rename serialize_into into serialize_into_writer 2024-12-02 10:03:27 +01:00
Clément Renault
13f21206a6
Call the serialize_into_writer method from the serialize_into one 2024-12-02 10:03:01 +01:00
Clément Renault
14ee7aa84c
Make sure the BBQueue is at least 50 MiB 2024-11-28 18:02:48 +01:00
Clément Renault
8a35cd1743
Adjust the BBQueue buffers to use 2% instead of 10% 2024-11-28 16:00:15 +01:00
Clément Renault
3c7ac093d3
Take the BBQueue capacity into account in the max memory 2024-11-28 15:43:14 +01:00
Clément Renault
b57dd5c58e
Remove the Vector variant and use the Vectors 2024-11-28 15:20:43 +01:00
Clément Renault
096a28656e
Fix a bug around deleting all the vectors of a doc 2024-11-28 15:15:06 +01:00
Clément Renault
cc4bd54669
Correctly construct the Embeddings struct 2024-11-28 13:53:25 +01:00
Clément Renault
58eab9a018
Send large payload through crossbeam 2024-11-28 12:01:06 +01:00
Clément Renault
5c488e20cc
Send the geo rtree through crossbeam channel 2024-11-27 18:03:45 +01:00
Clément Renault
da650f834e
Plug the NoPanicThreadPool in the tests and benchmarks 2024-11-27 17:04:49 +01:00
Clément Renault
e83534a430
Fix the indexer::index to correctly use the rayon::ThreadPool 2024-11-27 16:27:43 +01:00
Clément Renault
98d4a2909e
Fix the way we spawn the rayon threadpool 2024-11-27 16:05:44 +01:00
Clément Renault
a514ce472a
Make clippy happy 2024-11-27 14:59:04 +01:00
Clément Renault
cc63802115
Modify and return the IndexEmbeddings to write them later 2024-11-27 14:58:03 +01:00
Clément Renault
acec45ad7c
Send a WakeUp when writing data in the BBQueue buffers 2024-11-27 14:33:23 +01:00
Clément Renault
08d6413365
Fix result types 2024-11-27 14:32:42 +01:00
Clément Renault
70802eb7c7
Fix most issues with the lifetimes 2024-11-27 14:32:42 +01:00
Clément Renault
6ac5b3b136
Finish most of the channels types 2024-11-27 14:32:26 +01:00
Clément Renault
e1e76f39d0
Clean up dependencies 2024-11-27 14:30:34 +01:00
Clément Renault
2094ce8a9a
Move the arroy building after the writing loop 2024-11-27 14:30:33 +01:00
Clément Renault
8442db8101
Implement mostly all senders 2024-11-27 14:16:35 +01:00
Clément Renault
79671c9faa
Implement a first version of the bbqueue channels 2024-11-27 14:15:00 +01:00
meili-bors[bot]
a2f64f6552
Merge #5095
Some checks failed
Test suite / Tests on ${{ matrix.os }} (macos-13) (push) Waiting to run
Test suite / Tests almost all features (push) Has been skipped
Test suite / Test disabled tokenization (push) Has been skipped
Test suite / Tests on ubuntu-20.04 (push) Failing after 13s
Test suite / Run tests in debug (push) Failing after 12s
Test suite / Tests on ${{ matrix.os }} (windows-2022) (push) Failing after 40s
Test suite / Run Rustfmt (push) Successful in 1m46s
Test suite / Run Clippy (push) Successful in 9m55s
5095: Span to measure the part of db writes that is after the merge/extraction r=curquiza a=dureuill



Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-11-27 11:10:00 +00:00
ManyTheFish
18a9af353c Update Charabia version to v0.9.2 2024-11-27 11:12:08 +01:00
meili-bors[bot]
aae0dc715d
Merge #5063
5063: Fix pagination when embedding fails r=Kerollmops a=dureuill

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/5045

## What does this PR do?
- Use `return_keyword_results` function when embedding fails


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-11-27 09:13:28 +00:00
meili-bors[bot]
d0b2c0a523
Merge #5091
Some checks failed
Test suite / Tests on ${{ matrix.os }} (macos-13) (push) Waiting to run
Test suite / Tests almost all features (push) Has been skipped
Test suite / Test disabled tokenization (push) Has been skipped
Test suite / Tests on ubuntu-20.04 (push) Failing after 11s
Test suite / Run tests in debug (push) Failing after 10s
Test suite / Tests on ${{ matrix.os }} (windows-2022) (push) Failing after 39s
Test suite / Run Rustfmt (push) Successful in 1m38s
Test suite / Run Clippy (push) Successful in 23m11s
5091: Settings opt out r=Kerollmops a=ManyTheFish

# Pull Request

Related PRD: https://www.notion.so/meilisearch/API-usage-Settings-to-opt-out-indexing-features-fff4b06b651f8108ade3f858aeb16b14?pvs=4

## Related issue
Fixes #4979 

- [x] Add setting opt-out
- [x] Add analytics
- [x] Add tests


Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Many the fish <many@meilisearch.com>
2024-11-26 15:50:28 +00:00
ManyTheFish
2e896f30a5 Fix PR comments 2024-11-26 16:06:33 +01:00
Louis Dureuil
8f57b4fdf4
Span to measure the part of db writes that is after the merge/extraction 2024-11-26 14:46:36 +01:00
Many the fish
f014e78684
Update crates/milli/src/index.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-11-26 14:46:01 +01:00
ManyTheFish
d7bcfb2d19 fix clippy 2024-11-26 14:04:16 +01:00
meili-bors[bot]
fb66fec398
Merge #5092
Some checks failed
Test suite / Tests on ${{ matrix.os }} (macos-13) (push) Waiting to run
Test suite / Tests almost all features (push) Has been skipped
Test suite / Test disabled tokenization (push) Has been skipped
Test suite / Tests on ubuntu-20.04 (push) Failing after 12s
Test suite / Run tests in debug (push) Failing after 11s
Test suite / Tests on ${{ matrix.os }} (windows-2022) (push) Failing after 23s
Test suite / Run Rustfmt (push) Successful in 1m41s
Test suite / Run Clippy (push) Successful in 5m36s
5092: Precise spans for new indexer r=dureuill a=dureuill

- Separate extract and merge spans
- Add span around commit

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-11-26 10:59:40 +00:00
Louis Dureuil
aa460819a7
Add more precise spans 2024-11-26 09:45:36 +01:00
meili-bors[bot]
e241f91285
Merge #5062
5062: Fix bugs for v1.12 r=Kerollmops a=ManyTheFish

# Pull Request

## Related issue
Fixes #4984
Fixes https://github.com/meilisearch/meilisearch/issues/4974
Fixes [SDK test](https://github.com/meilisearch/meilisearch/actions/runs/11886701996/job/33118278794)
## What does this PR do?
- add 3 tests
- fix bugs

Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-11-26 08:10:50 +00:00
ManyTheFish
d66dc363ed Test and implement settings opt-out 2024-11-25 18:23:22 +01:00
meili-bors[bot]
5560452ef9
Merge #5089
5089: Improve error handling when writing into LMDB r=dureuill a=Kerollmops

This PR exposes two new internal error variants: `StoreDelete` and `StorePut`. So that the error messages are better when we fail at writing into LMDB.

Related to #5078

Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-11-25 16:19:41 +00:00
Clément Renault
b4fb2dabd4
Use the grenad rayon feature 2024-11-25 16:31:21 +01:00
Clément Renault
5606679c53
Use the obkv and grenad crates.io versions 2024-11-25 16:24:59 +01:00
Clément Renault
a3103f347e
Fix the facet f64 database name 2024-11-25 16:05:31 +01:00
Clément Renault
25aac45fc7
Expose better error messages 2024-11-25 15:54:43 +01:00
Louis Dureuil
323ecbb885
Add span on document operation 2024-11-21 17:01:10 +01:00
Louis Dureuil
dcc3caef0d
Remove TopLevelMap 2024-11-21 16:56:46 +01:00
Louis Dureuil
221e547e86
Slight changes 2024-11-21 16:47:44 +01:00
Clément Renault
61d0615253
Document the geo point extractor 2024-11-21 16:47:08 +01:00
Clément Renault
5727e00374
Remove useless geo skipped 2024-11-21 16:47:08 +01:00
Clément Renault
9b60843831
Remove commented lines 2024-11-21 16:47:07 +01:00
ManyTheFish
36962b943b First batch of PR comment 2024-11-21 16:38:11 +01:00