Compare commits

...

29 Commits

Author SHA1 Message Date
Alexander Zaitsev
1e1b0314be
Merge f8b7a48b1a57df51b48eee6a6fea798952c00df2 into c63c25a9a28d32ba74de7c00747dbbd1b17008ec 2025-02-28 17:45:36 +01:00
meili-bors[bot]
c63c25a9a2
Merge #5355
Some checks failed
Look for flaky tests / flaky (push) Failing after 1s
Indexing bench (push) / Run and upload benchmarks (push) Has been cancelled
Benchmarks of indexing (push) / Run and upload benchmarks (push) Has been cancelled
Benchmarks of search for geo (push) / Run and upload benchmarks (push) Has been cancelled
Benchmarks of search for songs (push) / Run and upload benchmarks (push) Has been cancelled
Benchmarks of search for Wikipedia articles (push) / Run and upload benchmarks (push) Has been cancelled
Publish binaries to GitHub release / Check the version validity (push) Failing after 5s
Test suite / Tests almost all features (push) Failing after 13s
Test suite / Tests on ubuntu-22.04 (push) Failing after 19s
Test suite / Test with Ollama (push) Failing after 7s
Test suite / Test disabled tokenization (push) Failing after 10s
Test suite / Run tests in debug (push) Failing after 15s
Test suite / Run Rustfmt (push) Failing after 16s
Test suite / Run Clippy (push) Successful in 9m39s
SDKs tests / define-docker-image (push) Failing after 5s
SDKs tests / .NET SDK tests (push) Has been skipped
SDKs tests / Dart SDK tests (push) Has been skipped
SDKs tests / Go SDK tests (push) Has been skipped
SDKs tests / Java SDK tests (push) Has been skipped
SDKs tests / JS SDK tests (push) Has been skipped
SDKs tests / PHP SDK tests (push) Has been skipped
SDKs tests / Python SDK tests (push) Has been skipped
SDKs tests / Ruby SDK tests (push) Has been skipped
SDKs tests / Rust SDK tests (push) Has been skipped
SDKs tests / Swift SDK tests (push) Has been skipped
SDKs tests / meilisearch-js-plugins tests (push) Has been skipped
SDKs tests / meilisearch-rails tests (push) Has been skipped
SDKs tests / meilisearch-symfony tests (push) Has been skipped
Test suite / Tests on macos-13 (push) Has been cancelled
Test suite / Tests on windows-2022 (push) Has been cancelled
5355: Support fetching the pooling method from the model configuration r=Kerollmops a=dureuill

# Pull Request

## Related issue
Fixes #5354 

## What does this PR do?
- Fetches the pooling configuration from the model repository
- Use a pooling method that depends on the pooling configuration of that model.
- Allow overriding the pooling method with a new huggingFace embedder parameter `pooling`
  - for backward-compatibility with Meilisearch v1.13
  - for compatibility with embedders that exhibit the same behavior as Meilisearch v1.13
- Handle the default value of that new parameter
   - for compatibility, when importing a db/a dump, it should be set to `forceMean`
   - when (re)set from the settings for an embedder, it should be set to `useModel`


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2025-02-27 14:55:13 +00:00
meili-bors[bot]
80adbb1bdc
Merge #5338
Some checks are pending
Indexing bench (push) / Run and upload benchmarks (push) Waiting to run
Benchmarks of indexing (push) / Run and upload benchmarks (push) Waiting to run
Benchmarks of search for geo (push) / Run and upload benchmarks (push) Waiting to run
Benchmarks of search for songs (push) / Run and upload benchmarks (push) Waiting to run
Benchmarks of search for Wikipedia articles (push) / Run and upload benchmarks (push) Waiting to run
Run the indexing fuzzer / Setup the action (push) Successful in 1h5m23s
5338: Bump Ubuntu in the CI from 20.04 to 22.04 r=dureuill a=Kerollmops

This PR bumps the Ubuntu version we use in the CI from version 20.04 to version 22.04. This also means we are [using GLIBC version 2.35 and not version 2.28](https://gist.github.com/zchrissirhcz/ee13f604996bbbe312ba1d105954d2ed).

Note, the indentation fix is done by my IDE (Zed), sorry about that 🤦 

Fixes https://github.com/meilisearch/meilisearch/issues/5374

Co-authored-by: Kerollmops <clement@meilisearch.com>
2025-02-27 08:14:12 +00:00
meili-bors[bot]
4b6fa1cf41
Merge #5372
Some checks failed
Indexing bench (push) / Run and upload benchmarks (push) Waiting to run
Benchmarks of indexing (push) / Run and upload benchmarks (push) Waiting to run
Benchmarks of search for geo (push) / Run and upload benchmarks (push) Waiting to run
Benchmarks of search for songs (push) / Run and upload benchmarks (push) Waiting to run
Benchmarks of search for Wikipedia articles (push) / Run and upload benchmarks (push) Waiting to run
Test suite / Tests on ubuntu-20.04 (push) Failing after 1s
Test suite / Tests almost all features (push) Has been skipped
Test suite / Test disabled tokenization (push) Has been skipped
Run the indexing fuzzer / Setup the action (push) Failing after 11s
Test suite / Test with Ollama (push) Failing after 10s
Test suite / Run tests in debug (push) Failing after 13s
Test suite / Run Clippy (push) Failing after 19s
Test suite / Run Rustfmt (push) Failing after 32s
Test suite / Tests on macos-13 (push) Has been cancelled
Test suite / Tests on windows-2022 (push) Has been cancelled
5372: Bring back changes from v1.13.1 to main r=irevoire a=Kerollmops



Co-authored-by: Kerollmops <Kerollmops@users.noreply.github.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Strift <lau.cazanove@gmail.com>
Co-authored-by: Many the fish <many@meilisearch.com>
2025-02-26 17:24:51 +00:00
Kerollmops
dc78d8e9c4
Fix the dumpless upgrade log 2025-02-26 17:02:46 +01:00
ManyTheFish
d4063c9dcd
Fix fmt 2025-02-26 17:02:45 +01:00
Many the fish
abebc574f6
Update crates/milli/src/index.rs
Co-authored-by: Tamo <tamo@meilisearch.com>
2025-02-26 17:02:45 +01:00
Many the fish
f32ab67819
Update crates/milli/src/index.rs
Co-authored-by: Tamo <tamo@meilisearch.com>
2025-02-26 17:02:44 +01:00
ManyTheFish
d25953f322
fix clippy 2025-02-26 17:02:43 +01:00
ManyTheFish
405bbd04c1
Dumpless upgrade 2025-02-26 17:01:38 +01:00
ManyTheFish
5d421abdc4
Update Snapshots 2025-02-26 17:01:37 +01:00
ManyTheFish
9f3663e768
Implement Incremental document database stats computing 2025-02-26 17:01:35 +01:00
ManyTheFish
d9642ec916
Use checked_div in average computation 2025-02-26 17:01:34 +01:00
ManyTheFish
818e8b0237
Fix zero division 2025-02-26 17:01:31 +01:00
ManyTheFish
4f77a7fba5
fix clippy 2025-02-26 17:01:29 +01:00
ManyTheFish
058f08dff5
fix snapshots 2025-02-26 17:01:26 +01:00
ManyTheFish
9a6c1730aa
Add document database stats 2025-02-26 17:01:25 +01:00
Strift
91a8a97045
Bump 2025-02-26 17:01:24 +01:00
ManyTheFish
15788773af
Check the exact_word database when computing zero typo query 2025-02-26 17:01:22 +01:00
Kerollmops
025b9b79bb
Update the snapshots 2025-02-26 17:01:21 +01:00
Kerollmops
1c60b17a37
Update version for the next release (v1.13.1) in Cargo.toml 2025-02-26 17:01:19 +01:00
Louis Dureuil
589bf30ec6
make clippy happy 2025-02-19 11:38:07 +01:00
Louis Dureuil
b367c71ad2
fixup test 2025-02-19 11:31:17 +01:00
Louis Dureuil
1005a60fb8
Fixup dump settings 2025-02-19 11:03:48 +01:00
Louis Dureuil
cd0dfa3f1b
Fix test cases 2025-02-18 17:21:52 +01:00
Louis Dureuil
7b4ce468a6
Allow overriding pooling method 2025-02-18 17:12:23 +01:00
Louis Dureuil
11759c4be4
Support pooling 2025-02-18 16:10:51 +01:00
Kerollmops
a21c440274
Bump Ubuntu from 20.04 to 22.04 2025-02-12 09:49:50 +01:00
Alexander Zaitsev
f8b7a48b1a
feat: enable LTO 2024-12-02 12:55:50 +03:00
62 changed files with 761 additions and 206 deletions

View File

@ -9,22 +9,22 @@ jobs:
flaky: flaky:
runs-on: ubuntu-latest runs-on: ubuntu-latest
container: container:
# Use ubuntu-20.04 to compile with glibc 2.28 # Use ubuntu-22.04 to compile with glibc 2.35
image: ubuntu:20.04 image: ubuntu:22.04
steps: steps:
- uses: actions/checkout@v3 - uses: actions/checkout@v3
- name: Install needed dependencies - name: Install needed dependencies
run: | run: |
apt-get update && apt-get install -y curl apt-get update && apt-get install -y curl
apt-get install build-essential -y apt-get install build-essential -y
- uses: dtolnay/rust-toolchain@1.81 - uses: dtolnay/rust-toolchain@1.81
- name: Install cargo-flaky - name: Install cargo-flaky
run: cargo install cargo-flaky run: cargo install cargo-flaky
- name: Run cargo flaky in the dumps - name: Run cargo flaky in the dumps
run: cd crates/dump; cargo flaky -i 100 --release run: cd crates/dump; cargo flaky -i 100 --release
- name: Run cargo flaky in the index-scheduler - name: Run cargo flaky in the index-scheduler
run: cd crates/index-scheduler; cargo flaky -i 100 --release run: cd crates/index-scheduler; cargo flaky -i 100 --release
- name: Run cargo flaky in the auth - name: Run cargo flaky in the auth
run: cd crates/meilisearch-auth; cargo flaky -i 100 --release run: cd crates/meilisearch-auth; cargo flaky -i 100 --release
- name: Run cargo flaky in meilisearch - name: Run cargo flaky in meilisearch
run: cd crates/meilisearch; cargo flaky -i 100 --release run: cd crates/meilisearch; cargo flaky -i 100 --release

View File

@ -18,28 +18,28 @@ jobs:
runs-on: ubuntu-latest runs-on: ubuntu-latest
needs: check-version needs: check-version
container: container:
# Use ubuntu-20.04 to compile with glibc 2.28 # Use ubuntu-22.04 to compile with glibc 2.35
image: ubuntu:20.04 image: ubuntu:22.04
steps: steps:
- name: Install needed dependencies - name: Install needed dependencies
run: | run: |
apt-get update && apt-get install -y curl apt-get update && apt-get install -y curl
apt-get install build-essential -y apt-get install build-essential -y
- uses: dtolnay/rust-toolchain@1.81 - uses: dtolnay/rust-toolchain@1.81
- name: Install cargo-deb - name: Install cargo-deb
run: cargo install cargo-deb run: cargo install cargo-deb
- uses: actions/checkout@v3 - uses: actions/checkout@v3
- name: Build deb package - name: Build deb package
run: cargo deb -p meilisearch -o target/debian/meilisearch.deb run: cargo deb -p meilisearch -o target/debian/meilisearch.deb
- name: Upload debian pkg to release - name: Upload debian pkg to release
uses: svenstaro/upload-release-action@2.7.0 uses: svenstaro/upload-release-action@2.7.0
with: with:
repo_token: ${{ secrets.MEILI_BOT_GH_PAT }} repo_token: ${{ secrets.MEILI_BOT_GH_PAT }}
file: target/debian/meilisearch.deb file: target/debian/meilisearch.deb
asset_name: meilisearch.deb asset_name: meilisearch.deb
tag: ${{ github.ref }} tag: ${{ github.ref }}
- name: Upload debian pkg to apt repository - name: Upload debian pkg to apt repository
run: curl -F package=@target/debian/meilisearch.deb https://${{ secrets.GEMFURY_PUSH_TOKEN }}@push.fury.io/meilisearch/ run: curl -F package=@target/debian/meilisearch.deb https://${{ secrets.GEMFURY_PUSH_TOKEN }}@push.fury.io/meilisearch/
homebrew: homebrew:
name: Bump Homebrew formula name: Bump Homebrew formula

View File

@ -3,7 +3,7 @@ name: Publish binaries to GitHub release
on: on:
workflow_dispatch: workflow_dispatch:
schedule: schedule:
- cron: '0 2 * * *' # Every day at 2:00am - cron: "0 2 * * *" # Every day at 2:00am
release: release:
types: [published] types: [published]
@ -37,26 +37,26 @@ jobs:
runs-on: ubuntu-latest runs-on: ubuntu-latest
needs: check-version needs: check-version
container: container:
# Use ubuntu-20.04 to compile with glibc 2.28 # Use ubuntu-22.04 to compile with glibc 2.35
image: ubuntu:20.04 image: ubuntu:22.04
steps: steps:
- uses: actions/checkout@v3 - uses: actions/checkout@v3
- name: Install needed dependencies - name: Install needed dependencies
run: | run: |
apt-get update && apt-get install -y curl apt-get update && apt-get install -y curl
apt-get install build-essential -y apt-get install build-essential -y
- uses: dtolnay/rust-toolchain@1.81 - uses: dtolnay/rust-toolchain@1.81
- name: Build - name: Build
run: cargo build --release --locked run: cargo build --release --locked
# No need to upload binaries for dry run (cron) # No need to upload binaries for dry run (cron)
- name: Upload binaries to release - name: Upload binaries to release
if: github.event_name == 'release' if: github.event_name == 'release'
uses: svenstaro/upload-release-action@2.7.0 uses: svenstaro/upload-release-action@2.7.0
with: with:
repo_token: ${{ secrets.MEILI_BOT_GH_PAT }} repo_token: ${{ secrets.MEILI_BOT_GH_PAT }}
file: target/release/meilisearch file: target/release/meilisearch
asset_name: meilisearch-linux-amd64 asset_name: meilisearch-linux-amd64
tag: ${{ github.ref }} tag: ${{ github.ref }}
publish-macos-windows: publish-macos-windows:
name: Publish binary for ${{ matrix.os }} name: Publish binary for ${{ matrix.os }}
@ -74,19 +74,19 @@ jobs:
artifact_name: meilisearch.exe artifact_name: meilisearch.exe
asset_name: meilisearch-windows-amd64.exe asset_name: meilisearch-windows-amd64.exe
steps: steps:
- uses: actions/checkout@v3 - uses: actions/checkout@v3
- uses: dtolnay/rust-toolchain@1.81 - uses: dtolnay/rust-toolchain@1.81
- name: Build - name: Build
run: cargo build --release --locked run: cargo build --release --locked
# No need to upload binaries for dry run (cron) # No need to upload binaries for dry run (cron)
- name: Upload binaries to release - name: Upload binaries to release
if: github.event_name == 'release' if: github.event_name == 'release'
uses: svenstaro/upload-release-action@2.7.0 uses: svenstaro/upload-release-action@2.7.0
with: with:
repo_token: ${{ secrets.MEILI_BOT_GH_PAT }} repo_token: ${{ secrets.MEILI_BOT_GH_PAT }}
file: target/release/${{ matrix.artifact_name }} file: target/release/${{ matrix.artifact_name }}
asset_name: ${{ matrix.asset_name }} asset_name: ${{ matrix.asset_name }}
tag: ${{ github.ref }} tag: ${{ github.ref }}
publish-macos-apple-silicon: publish-macos-apple-silicon:
name: Publish binary for macOS silicon name: Publish binary for macOS silicon
@ -127,8 +127,8 @@ jobs:
env: env:
DEBIAN_FRONTEND: noninteractive DEBIAN_FRONTEND: noninteractive
container: container:
# Use ubuntu-20.04 to compile with glibc 2.28 # Use ubuntu-22.04 to compile with glibc 2.35
image: ubuntu:20.04 image: ubuntu:22.04
strategy: strategy:
matrix: matrix:
include: include:

View File

@ -19,11 +19,11 @@ env:
jobs: jobs:
test-linux: test-linux:
name: Tests on ubuntu-20.04 name: Tests on ubuntu-22.04
runs-on: ubuntu-latest runs-on: ubuntu-latest
container: container:
# Use ubuntu-20.04 to compile with glibc 2.28 # Use ubuntu-22.04 to compile with glibc 2.35
image: ubuntu:20.04 image: ubuntu:22.04
steps: steps:
- uses: actions/checkout@v3 - uses: actions/checkout@v3
- name: Install needed dependencies - name: Install needed dependencies
@ -72,8 +72,8 @@ jobs:
name: Tests almost all features name: Tests almost all features
runs-on: ubuntu-latest runs-on: ubuntu-latest
container: container:
# Use ubuntu-20.04 to compile with glibc 2.28 # Use ubuntu-22.04 to compile with glibc 2.35
image: ubuntu:20.04 image: ubuntu:22.04
if: github.event_name == 'schedule' || github.event_name == 'workflow_dispatch' if: github.event_name == 'schedule' || github.event_name == 'workflow_dispatch'
steps: steps:
- uses: actions/checkout@v3 - uses: actions/checkout@v3
@ -125,7 +125,7 @@ jobs:
name: Test disabled tokenization name: Test disabled tokenization
runs-on: ubuntu-latest runs-on: ubuntu-latest
container: container:
image: ubuntu:20.04 image: ubuntu:22.04
if: github.event_name == 'schedule' || github.event_name == 'workflow_dispatch' if: github.event_name == 'schedule' || github.event_name == 'workflow_dispatch'
steps: steps:
- uses: actions/checkout@v3 - uses: actions/checkout@v3
@ -149,8 +149,8 @@ jobs:
name: Run tests in debug name: Run tests in debug
runs-on: ubuntu-latest runs-on: ubuntu-latest
container: container:
# Use ubuntu-20.04 to compile with glibc 2.28 # Use ubuntu-22.04 to compile with glibc 2.35
image: ubuntu:20.04 image: ubuntu:22.04
steps: steps:
- uses: actions/checkout@v3 - uses: actions/checkout@v3
- name: Install needed dependencies - name: Install needed dependencies

34
Cargo.lock generated
View File

@ -503,7 +503,7 @@ source = "git+https://github.com/meilisearch/bbqueue#cbb87cc707b5af415ef203bdaf2
[[package]] [[package]]
name = "benchmarks" name = "benchmarks"
version = "1.13.0" version = "1.13.1"
dependencies = [ dependencies = [
"anyhow", "anyhow",
"bumpalo", "bumpalo",
@ -694,7 +694,7 @@ dependencies = [
[[package]] [[package]]
name = "build-info" name = "build-info"
version = "1.13.0" version = "1.13.1"
dependencies = [ dependencies = [
"anyhow", "anyhow",
"time", "time",
@ -1671,7 +1671,7 @@ dependencies = [
[[package]] [[package]]
name = "dump" name = "dump"
version = "1.13.0" version = "1.13.1"
dependencies = [ dependencies = [
"anyhow", "anyhow",
"big_s", "big_s",
@ -1873,7 +1873,7 @@ checksum = "37909eebbb50d72f9059c3b6d82c0463f2ff062c9e95845c43a6c9c0355411be"
[[package]] [[package]]
name = "file-store" name = "file-store"
version = "1.13.0" version = "1.13.1"
dependencies = [ dependencies = [
"tempfile", "tempfile",
"thiserror 2.0.9", "thiserror 2.0.9",
@ -1895,7 +1895,7 @@ dependencies = [
[[package]] [[package]]
name = "filter-parser" name = "filter-parser"
version = "1.13.0" version = "1.13.1"
dependencies = [ dependencies = [
"insta", "insta",
"nom", "nom",
@ -1915,7 +1915,7 @@ dependencies = [
[[package]] [[package]]
name = "flatten-serde-json" name = "flatten-serde-json"
version = "1.13.0" version = "1.13.1"
dependencies = [ dependencies = [
"criterion", "criterion",
"serde_json", "serde_json",
@ -2054,7 +2054,7 @@ dependencies = [
[[package]] [[package]]
name = "fuzzers" name = "fuzzers"
version = "1.13.0" version = "1.13.1"
dependencies = [ dependencies = [
"arbitrary", "arbitrary",
"bumpalo", "bumpalo",
@ -2743,7 +2743,7 @@ checksum = "206ca75c9c03ba3d4ace2460e57b189f39f43de612c2f85836e65c929701bb2d"
[[package]] [[package]]
name = "index-scheduler" name = "index-scheduler"
version = "1.13.0" version = "1.13.1"
dependencies = [ dependencies = [
"anyhow", "anyhow",
"arroy 0.5.0 (registry+https://github.com/rust-lang/crates.io-index)", "arroy 0.5.0 (registry+https://github.com/rust-lang/crates.io-index)",
@ -2950,7 +2950,7 @@ dependencies = [
[[package]] [[package]]
name = "json-depth-checker" name = "json-depth-checker"
version = "1.13.0" version = "1.13.1"
dependencies = [ dependencies = [
"criterion", "criterion",
"serde_json", "serde_json",
@ -3569,7 +3569,7 @@ checksum = "490cc448043f947bae3cbee9c203358d62dbee0db12107a74be5c30ccfd09771"
[[package]] [[package]]
name = "meili-snap" name = "meili-snap"
version = "1.13.0" version = "1.13.1"
dependencies = [ dependencies = [
"insta", "insta",
"md5", "md5",
@ -3578,7 +3578,7 @@ dependencies = [
[[package]] [[package]]
name = "meilisearch" name = "meilisearch"
version = "1.13.0" version = "1.13.1"
dependencies = [ dependencies = [
"actix-cors", "actix-cors",
"actix-http", "actix-http",
@ -3670,7 +3670,7 @@ dependencies = [
[[package]] [[package]]
name = "meilisearch-auth" name = "meilisearch-auth"
version = "1.13.0" version = "1.13.1"
dependencies = [ dependencies = [
"base64 0.22.1", "base64 0.22.1",
"enum-iterator", "enum-iterator",
@ -3689,7 +3689,7 @@ dependencies = [
[[package]] [[package]]
name = "meilisearch-types" name = "meilisearch-types"
version = "1.13.0" version = "1.13.1"
dependencies = [ dependencies = [
"actix-web", "actix-web",
"anyhow", "anyhow",
@ -3723,7 +3723,7 @@ dependencies = [
[[package]] [[package]]
name = "meilitool" name = "meilitool"
version = "1.13.0" version = "1.13.1"
dependencies = [ dependencies = [
"anyhow", "anyhow",
"arroy 0.5.0 (git+https://github.com/meilisearch/arroy/?tag=DO-NOT-DELETE-upgrade-v04-to-v05)", "arroy 0.5.0 (git+https://github.com/meilisearch/arroy/?tag=DO-NOT-DELETE-upgrade-v04-to-v05)",
@ -3758,7 +3758,7 @@ dependencies = [
[[package]] [[package]]
name = "milli" name = "milli"
version = "1.13.0" version = "1.13.1"
dependencies = [ dependencies = [
"allocator-api2", "allocator-api2",
"arroy 0.5.0 (registry+https://github.com/rust-lang/crates.io-index)", "arroy 0.5.0 (registry+https://github.com/rust-lang/crates.io-index)",
@ -4270,7 +4270,7 @@ checksum = "e3148f5046208a5d56bcfc03053e3ca6334e51da8dfb19b6cdc8b306fae3283e"
[[package]] [[package]]
name = "permissive-json-pointer" name = "permissive-json-pointer"
version = "1.13.0" version = "1.13.1"
dependencies = [ dependencies = [
"big_s", "big_s",
"serde_json", "serde_json",
@ -6847,7 +6847,7 @@ dependencies = [
[[package]] [[package]]
name = "xtask" name = "xtask"
version = "1.13.0" version = "1.13.1"
dependencies = [ dependencies = [
"anyhow", "anyhow",
"build-info", "build-info",

View File

@ -22,7 +22,7 @@ members = [
] ]
[workspace.package] [workspace.package]
version = "1.13.0" version = "1.13.1"
authors = [ authors = [
"Quentin de Quelen <quentin@dequelen.me>", "Quentin de Quelen <quentin@dequelen.me>",
"Clément Renault <clement@meilisearch.com>", "Clément Renault <clement@meilisearch.com>",
@ -35,6 +35,7 @@ license = "MIT"
[profile.release] [profile.release]
codegen-units = 1 codegen-units = 1
lto = true
[profile.dev.package.flate2] [profile.dev.package.flate2]
opt-level = 3 opt-level = 3

View File

@ -1,5 +1,5 @@
status = [ status = [
'Tests on ubuntu-20.04', 'Tests on ubuntu-22.04',
'Tests on macos-13', 'Tests on macos-13',
'Tests on windows-2022', 'Tests on windows-2022',
'Run Clippy', 'Run Clippy',

View File

@ -1,5 +1,5 @@
--- ---
source: dump/src/reader/mod.rs source: crates/dump/src/reader/mod.rs
expression: vector_index.settings().unwrap() expression: vector_index.settings().unwrap()
--- ---
{ {
@ -49,6 +49,7 @@ expression: vector_index.settings().unwrap()
"source": "huggingFace", "source": "huggingFace",
"model": "BAAI/bge-base-en-v1.5", "model": "BAAI/bge-base-en-v1.5",
"revision": "617ca489d9e86b49b8167676d8220688b99db36e", "revision": "617ca489d9e86b49b8167676d8220688b99db36e",
"pooling": "forceMean",
"documentTemplate": "{% for field in fields %} {{ field.name }}: {{ field.value }}\n{% endfor %}" "documentTemplate": "{% for field in fields %} {{ field.name }}: {{ field.value }}\n{% endfor %}"
} }
}, },

View File

@ -3,6 +3,7 @@ use std::io::{BufRead, BufReader, ErrorKind};
use std::path::Path; use std::path::Path;
pub use meilisearch_types::milli; pub use meilisearch_types::milli;
use meilisearch_types::milli::vector::hf::OverridePooling;
use tempfile::TempDir; use tempfile::TempDir;
use time::OffsetDateTime; use time::OffsetDateTime;
use tracing::debug; use tracing::debug;
@ -252,7 +253,29 @@ impl V6IndexReader {
} }
pub fn settings(&mut self) -> Result<Settings<Checked>> { pub fn settings(&mut self) -> Result<Settings<Checked>> {
let settings: Settings<Unchecked> = serde_json::from_reader(&mut self.settings)?; let mut settings: Settings<Unchecked> = serde_json::from_reader(&mut self.settings)?;
patch_embedders(&mut settings);
Ok(settings.check()) Ok(settings.check())
} }
} }
fn patch_embedders(settings: &mut Settings<Unchecked>) {
if let Setting::Set(embedders) = &mut settings.embedders {
for settings in embedders.values_mut() {
let Setting::Set(settings) = &mut settings.inner else {
continue;
};
if settings.source != Setting::Set(milli::vector::settings::EmbedderSource::HuggingFace)
{
continue;
}
settings.pooling = match settings.pooling {
Setting::Set(pooling) => Setting::Set(pooling),
// if the pooling for a hugging face embedder is not set, force it to `forceMean`
// for backward compatibility with v1.13
// dumps created in v1.14 and up will have the setting set for hugging face embedders
Setting::Reset | Setting::NotSet => Setting::Set(OverridePooling::ForceMean),
};
}
}
}

View File

@ -6,6 +6,7 @@ use std::{fs, thread};
use meilisearch_types::heed::types::{SerdeJson, Str}; use meilisearch_types::heed::types::{SerdeJson, Str};
use meilisearch_types::heed::{Database, Env, RoTxn, RwTxn}; use meilisearch_types::heed::{Database, Env, RoTxn, RwTxn};
use meilisearch_types::milli; use meilisearch_types::milli;
use meilisearch_types::milli::database_stats::DatabaseStats;
use meilisearch_types::milli::update::IndexerConfig; use meilisearch_types::milli::update::IndexerConfig;
use meilisearch_types::milli::{FieldDistribution, Index}; use meilisearch_types::milli::{FieldDistribution, Index};
use serde::{Deserialize, Serialize}; use serde::{Deserialize, Serialize};
@ -98,8 +99,9 @@ pub enum IndexStatus {
/// The statistics that can be computed from an `Index` object. /// The statistics that can be computed from an `Index` object.
#[derive(Serialize, Deserialize, Debug)] #[derive(Serialize, Deserialize, Debug)]
pub struct IndexStats { pub struct IndexStats {
/// Number of documents in the index. /// Stats of the documents database.
pub number_of_documents: u64, #[serde(default)]
pub documents_database_stats: DatabaseStats,
/// Size taken up by the index' DB, in bytes. /// Size taken up by the index' DB, in bytes.
/// ///
/// This includes the size taken by both the used and free pages of the DB, and as the free pages /// This includes the size taken by both the used and free pages of the DB, and as the free pages
@ -138,9 +140,9 @@ impl IndexStats {
pub fn new(index: &Index, rtxn: &RoTxn) -> milli::Result<Self> { pub fn new(index: &Index, rtxn: &RoTxn) -> milli::Result<Self> {
let arroy_stats = index.arroy_stats(rtxn)?; let arroy_stats = index.arroy_stats(rtxn)?;
Ok(IndexStats { Ok(IndexStats {
number_of_documents: index.number_of_documents(rtxn)?,
number_of_embeddings: Some(arroy_stats.number_of_embeddings), number_of_embeddings: Some(arroy_stats.number_of_embeddings),
number_of_embedded_documents: Some(arroy_stats.documents.len()), number_of_embedded_documents: Some(arroy_stats.documents.len()),
documents_database_stats: index.documents_stats(rtxn)?.unwrap_or_default(),
database_size: index.on_disk_size()?, database_size: index.on_disk_size()?,
used_database_size: index.used_size()?, used_database_size: index.used_size()?,
primary_key: index.primary_key(rtxn)?.map(|s| s.to_string()), primary_key: index.primary_key(rtxn)?.map(|s| s.to_string()),

View File

@ -370,7 +370,8 @@ pub fn snapshot_index_mapper(rtxn: &RoTxn, mapper: &IndexMapper) -> String {
let stats = mapper.stats_of(rtxn, &name).unwrap(); let stats = mapper.stats_of(rtxn, &name).unwrap();
s.push_str(&format!( s.push_str(&format!(
"{name}: {{ number_of_documents: {}, field_distribution: {:?} }}\n", "{name}: {{ number_of_documents: {}, field_distribution: {:?} }}\n",
stats.number_of_documents, stats.field_distribution stats.documents_database_stats.number_of_entries(),
stats.field_distribution
)); ));
} }

View File

@ -1,12 +1,12 @@
--- ---
source: crates/index-scheduler/src/scheduler/test_embedders.rs source: crates/index-scheduler/src/scheduler/test_embedders.rs
expression: simple_hf_config.embedder_options expression: simple_hf_config.embedder_options
snapshot_kind: text
--- ---
{ {
"HuggingFace": { "HuggingFace": {
"model": "sentence-transformers/all-MiniLM-L6-v2", "model": "sentence-transformers/all-MiniLM-L6-v2",
"revision": "e4ce9877abf3edfe10b0d82785e83bdcb973e22e", "revision": "e4ce9877abf3edfe10b0d82785e83bdcb973e22e",
"distribution": null "distribution": null,
"pooling": "useModel"
} }
} }

View File

@ -1,13 +1,12 @@
--- ---
source: crates/index-scheduler/src/scheduler/test.rs source: crates/index-scheduler/src/scheduler/test.rs
snapshot_kind: text
--- ---
### Autobatching Enabled = true ### Autobatching Enabled = true
### Processing batch None: ### Processing batch None:
[] []
---------------------------------------------------------------------- ----------------------------------------------------------------------
### All Tasks: ### All Tasks:
0 {uid: 0, status: enqueued, details: { settings: Settings { displayed_attributes: WildcardSetting(NotSet), searchable_attributes: WildcardSetting(NotSet), filterable_attributes: NotSet, sortable_attributes: NotSet, ranking_rules: NotSet, stop_words: NotSet, non_separator_tokens: NotSet, separator_tokens: NotSet, dictionary: NotSet, synonyms: NotSet, distinct_attribute: NotSet, proximity_precision: NotSet, typo_tolerance: NotSet, faceting: NotSet, pagination: NotSet, embedders: Set({"default": Set(EmbeddingSettings { source: Set(Rest), model: NotSet, revision: NotSet, api_key: Set("My super secret"), dimensions: Set(4), binary_quantized: NotSet, document_template: NotSet, document_template_max_bytes: NotSet, url: Set("http://localhost:7777"), request: Set(String("{{text}}")), response: Set(String("{{embedding}}")), headers: NotSet, distribution: NotSet })}), search_cutoff_ms: NotSet, localized_attributes: NotSet, facet_search: NotSet, prefix_search: NotSet, _kind: PhantomData<meilisearch_types::settings::Unchecked> } }, kind: SettingsUpdate { index_uid: "doggos", new_settings: Settings { displayed_attributes: WildcardSetting(NotSet), searchable_attributes: WildcardSetting(NotSet), filterable_attributes: NotSet, sortable_attributes: NotSet, ranking_rules: NotSet, stop_words: NotSet, non_separator_tokens: NotSet, separator_tokens: NotSet, dictionary: NotSet, synonyms: NotSet, distinct_attribute: NotSet, proximity_precision: NotSet, typo_tolerance: NotSet, faceting: NotSet, pagination: NotSet, embedders: Set({"default": Set(EmbeddingSettings { source: Set(Rest), model: NotSet, revision: NotSet, api_key: Set("My super secret"), dimensions: Set(4), binary_quantized: NotSet, document_template: NotSet, document_template_max_bytes: NotSet, url: Set("http://localhost:7777"), request: Set(String("{{text}}")), response: Set(String("{{embedding}}")), headers: NotSet, distribution: NotSet })}), search_cutoff_ms: NotSet, localized_attributes: NotSet, facet_search: NotSet, prefix_search: NotSet, _kind: PhantomData<meilisearch_types::settings::Unchecked> }, is_deletion: false, allow_index_creation: true }} 0 {uid: 0, status: enqueued, details: { settings: Settings { displayed_attributes: WildcardSetting(NotSet), searchable_attributes: WildcardSetting(NotSet), filterable_attributes: NotSet, sortable_attributes: NotSet, ranking_rules: NotSet, stop_words: NotSet, non_separator_tokens: NotSet, separator_tokens: NotSet, dictionary: NotSet, synonyms: NotSet, distinct_attribute: NotSet, proximity_precision: NotSet, typo_tolerance: NotSet, faceting: NotSet, pagination: NotSet, embedders: Set({"default": Set(EmbeddingSettings { source: Set(Rest), model: NotSet, revision: NotSet, pooling: NotSet, api_key: Set("My super secret"), dimensions: Set(4), binary_quantized: NotSet, document_template: NotSet, document_template_max_bytes: NotSet, url: Set("http://localhost:7777"), request: Set(String("{{text}}")), response: Set(String("{{embedding}}")), headers: NotSet, distribution: NotSet })}), search_cutoff_ms: NotSet, localized_attributes: NotSet, facet_search: NotSet, prefix_search: NotSet, _kind: PhantomData<meilisearch_types::settings::Unchecked> } }, kind: SettingsUpdate { index_uid: "doggos", new_settings: Settings { displayed_attributes: WildcardSetting(NotSet), searchable_attributes: WildcardSetting(NotSet), filterable_attributes: NotSet, sortable_attributes: NotSet, ranking_rules: NotSet, stop_words: NotSet, non_separator_tokens: NotSet, separator_tokens: NotSet, dictionary: NotSet, synonyms: NotSet, distinct_attribute: NotSet, proximity_precision: NotSet, typo_tolerance: NotSet, faceting: NotSet, pagination: NotSet, embedders: Set({"default": Set(EmbeddingSettings { source: Set(Rest), model: NotSet, revision: NotSet, pooling: NotSet, api_key: Set("My super secret"), dimensions: Set(4), binary_quantized: NotSet, document_template: NotSet, document_template_max_bytes: NotSet, url: Set("http://localhost:7777"), request: Set(String("{{text}}")), response: Set(String("{{embedding}}")), headers: NotSet, distribution: NotSet })}), search_cutoff_ms: NotSet, localized_attributes: NotSet, facet_search: NotSet, prefix_search: NotSet, _kind: PhantomData<meilisearch_types::settings::Unchecked> }, is_deletion: false, allow_index_creation: true }}
---------------------------------------------------------------------- ----------------------------------------------------------------------
### Status: ### Status:
enqueued [0,] enqueued [0,]

View File

@ -1,13 +1,12 @@
--- ---
source: crates/index-scheduler/src/scheduler/test.rs source: crates/index-scheduler/src/scheduler/test.rs
snapshot_kind: text
--- ---
### Autobatching Enabled = true ### Autobatching Enabled = true
### Processing batch None: ### Processing batch None:
[] []
---------------------------------------------------------------------- ----------------------------------------------------------------------
### All Tasks: ### All Tasks:
0 {uid: 0, batch_uid: 0, status: succeeded, details: { settings: Settings { displayed_attributes: WildcardSetting(NotSet), searchable_attributes: WildcardSetting(NotSet), filterable_attributes: NotSet, sortable_attributes: NotSet, ranking_rules: NotSet, stop_words: NotSet, non_separator_tokens: NotSet, separator_tokens: NotSet, dictionary: NotSet, synonyms: NotSet, distinct_attribute: NotSet, proximity_precision: NotSet, typo_tolerance: NotSet, faceting: NotSet, pagination: NotSet, embedders: Set({"default": Set(EmbeddingSettings { source: Set(Rest), model: NotSet, revision: NotSet, api_key: Set("My super secret"), dimensions: Set(4), binary_quantized: NotSet, document_template: NotSet, document_template_max_bytes: NotSet, url: Set("http://localhost:7777"), request: Set(String("{{text}}")), response: Set(String("{{embedding}}")), headers: NotSet, distribution: NotSet })}), search_cutoff_ms: NotSet, localized_attributes: NotSet, facet_search: NotSet, prefix_search: NotSet, _kind: PhantomData<meilisearch_types::settings::Unchecked> } }, kind: SettingsUpdate { index_uid: "doggos", new_settings: Settings { displayed_attributes: WildcardSetting(NotSet), searchable_attributes: WildcardSetting(NotSet), filterable_attributes: NotSet, sortable_attributes: NotSet, ranking_rules: NotSet, stop_words: NotSet, non_separator_tokens: NotSet, separator_tokens: NotSet, dictionary: NotSet, synonyms: NotSet, distinct_attribute: NotSet, proximity_precision: NotSet, typo_tolerance: NotSet, faceting: NotSet, pagination: NotSet, embedders: Set({"default": Set(EmbeddingSettings { source: Set(Rest), model: NotSet, revision: NotSet, api_key: Set("My super secret"), dimensions: Set(4), binary_quantized: NotSet, document_template: NotSet, document_template_max_bytes: NotSet, url: Set("http://localhost:7777"), request: Set(String("{{text}}")), response: Set(String("{{embedding}}")), headers: NotSet, distribution: NotSet })}), search_cutoff_ms: NotSet, localized_attributes: NotSet, facet_search: NotSet, prefix_search: NotSet, _kind: PhantomData<meilisearch_types::settings::Unchecked> }, is_deletion: false, allow_index_creation: true }} 0 {uid: 0, batch_uid: 0, status: succeeded, details: { settings: Settings { displayed_attributes: WildcardSetting(NotSet), searchable_attributes: WildcardSetting(NotSet), filterable_attributes: NotSet, sortable_attributes: NotSet, ranking_rules: NotSet, stop_words: NotSet, non_separator_tokens: NotSet, separator_tokens: NotSet, dictionary: NotSet, synonyms: NotSet, distinct_attribute: NotSet, proximity_precision: NotSet, typo_tolerance: NotSet, faceting: NotSet, pagination: NotSet, embedders: Set({"default": Set(EmbeddingSettings { source: Set(Rest), model: NotSet, revision: NotSet, pooling: NotSet, api_key: Set("My super secret"), dimensions: Set(4), binary_quantized: NotSet, document_template: NotSet, document_template_max_bytes: NotSet, url: Set("http://localhost:7777"), request: Set(String("{{text}}")), response: Set(String("{{embedding}}")), headers: NotSet, distribution: NotSet })}), search_cutoff_ms: NotSet, localized_attributes: NotSet, facet_search: NotSet, prefix_search: NotSet, _kind: PhantomData<meilisearch_types::settings::Unchecked> } }, kind: SettingsUpdate { index_uid: "doggos", new_settings: Settings { displayed_attributes: WildcardSetting(NotSet), searchable_attributes: WildcardSetting(NotSet), filterable_attributes: NotSet, sortable_attributes: NotSet, ranking_rules: NotSet, stop_words: NotSet, non_separator_tokens: NotSet, separator_tokens: NotSet, dictionary: NotSet, synonyms: NotSet, distinct_attribute: NotSet, proximity_precision: NotSet, typo_tolerance: NotSet, faceting: NotSet, pagination: NotSet, embedders: Set({"default": Set(EmbeddingSettings { source: Set(Rest), model: NotSet, revision: NotSet, pooling: NotSet, api_key: Set("My super secret"), dimensions: Set(4), binary_quantized: NotSet, document_template: NotSet, document_template_max_bytes: NotSet, url: Set("http://localhost:7777"), request: Set(String("{{text}}")), response: Set(String("{{embedding}}")), headers: NotSet, distribution: NotSet })}), search_cutoff_ms: NotSet, localized_attributes: NotSet, facet_search: NotSet, prefix_search: NotSet, _kind: PhantomData<meilisearch_types::settings::Unchecked> }, is_deletion: false, allow_index_creation: true }}
---------------------------------------------------------------------- ----------------------------------------------------------------------
### Status: ### Status:
enqueued [] enqueued []

View File

@ -1,13 +1,12 @@
--- ---
source: crates/index-scheduler/src/scheduler/test_embedders.rs source: crates/index-scheduler/src/scheduler/test_embedders.rs
snapshot_kind: text
--- ---
### Autobatching Enabled = true ### Autobatching Enabled = true
### Processing batch None: ### Processing batch None:
[] []
---------------------------------------------------------------------- ----------------------------------------------------------------------
### All Tasks: ### All Tasks:
0 {uid: 0, batch_uid: 0, status: succeeded, details: { settings: Settings { displayed_attributes: WildcardSetting(NotSet), searchable_attributes: WildcardSetting(NotSet), filterable_attributes: NotSet, sortable_attributes: NotSet, ranking_rules: NotSet, stop_words: NotSet, non_separator_tokens: NotSet, separator_tokens: NotSet, dictionary: NotSet, synonyms: NotSet, distinct_attribute: NotSet, proximity_precision: NotSet, typo_tolerance: NotSet, faceting: NotSet, pagination: NotSet, embedders: Set({"A_fakerest": Set(EmbeddingSettings { source: Set(Rest), model: NotSet, revision: NotSet, api_key: Set("My super secret"), dimensions: Set(384), binary_quantized: NotSet, document_template: NotSet, document_template_max_bytes: NotSet, url: Set("http://localhost:7777"), request: Set(String("{{text}}")), response: Set(String("{{embedding}}")), headers: NotSet, distribution: NotSet }), "B_small_hf": Set(EmbeddingSettings { source: Set(HuggingFace), model: Set("sentence-transformers/all-MiniLM-L6-v2"), revision: Set("e4ce9877abf3edfe10b0d82785e83bdcb973e22e"), api_key: NotSet, dimensions: NotSet, binary_quantized: NotSet, document_template: Set("{{doc.doggo}} the {{doc.breed}} best doggo"), document_template_max_bytes: NotSet, url: NotSet, request: NotSet, response: NotSet, headers: NotSet, distribution: NotSet })}), search_cutoff_ms: NotSet, localized_attributes: NotSet, facet_search: NotSet, prefix_search: NotSet, _kind: PhantomData<meilisearch_types::settings::Unchecked> } }, kind: SettingsUpdate { index_uid: "doggos", new_settings: Settings { displayed_attributes: WildcardSetting(NotSet), searchable_attributes: WildcardSetting(NotSet), filterable_attributes: NotSet, sortable_attributes: NotSet, ranking_rules: NotSet, stop_words: NotSet, non_separator_tokens: NotSet, separator_tokens: NotSet, dictionary: NotSet, synonyms: NotSet, distinct_attribute: NotSet, proximity_precision: NotSet, typo_tolerance: NotSet, faceting: NotSet, pagination: NotSet, embedders: Set({"A_fakerest": Set(EmbeddingSettings { source: Set(Rest), model: NotSet, revision: NotSet, api_key: Set("My super secret"), dimensions: Set(384), binary_quantized: NotSet, document_template: NotSet, document_template_max_bytes: NotSet, url: Set("http://localhost:7777"), request: Set(String("{{text}}")), response: Set(String("{{embedding}}")), headers: NotSet, distribution: NotSet }), "B_small_hf": Set(EmbeddingSettings { source: Set(HuggingFace), model: Set("sentence-transformers/all-MiniLM-L6-v2"), revision: Set("e4ce9877abf3edfe10b0d82785e83bdcb973e22e"), api_key: NotSet, dimensions: NotSet, binary_quantized: NotSet, document_template: Set("{{doc.doggo}} the {{doc.breed}} best doggo"), document_template_max_bytes: NotSet, url: NotSet, request: NotSet, response: NotSet, headers: NotSet, distribution: NotSet })}), search_cutoff_ms: NotSet, localized_attributes: NotSet, facet_search: NotSet, prefix_search: NotSet, _kind: PhantomData<meilisearch_types::settings::Unchecked> }, is_deletion: false, allow_index_creation: true }} 0 {uid: 0, batch_uid: 0, status: succeeded, details: { settings: Settings { displayed_attributes: WildcardSetting(NotSet), searchable_attributes: WildcardSetting(NotSet), filterable_attributes: NotSet, sortable_attributes: NotSet, ranking_rules: NotSet, stop_words: NotSet, non_separator_tokens: NotSet, separator_tokens: NotSet, dictionary: NotSet, synonyms: NotSet, distinct_attribute: NotSet, proximity_precision: NotSet, typo_tolerance: NotSet, faceting: NotSet, pagination: NotSet, embedders: Set({"A_fakerest": Set(EmbeddingSettings { source: Set(Rest), model: NotSet, revision: NotSet, pooling: NotSet, api_key: Set("My super secret"), dimensions: Set(384), binary_quantized: NotSet, document_template: NotSet, document_template_max_bytes: NotSet, url: Set("http://localhost:7777"), request: Set(String("{{text}}")), response: Set(String("{{embedding}}")), headers: NotSet, distribution: NotSet }), "B_small_hf": Set(EmbeddingSettings { source: Set(HuggingFace), model: Set("sentence-transformers/all-MiniLM-L6-v2"), revision: Set("e4ce9877abf3edfe10b0d82785e83bdcb973e22e"), pooling: NotSet, api_key: NotSet, dimensions: NotSet, binary_quantized: NotSet, document_template: Set("{{doc.doggo}} the {{doc.breed}} best doggo"), document_template_max_bytes: NotSet, url: NotSet, request: NotSet, response: NotSet, headers: NotSet, distribution: NotSet })}), search_cutoff_ms: NotSet, localized_attributes: NotSet, facet_search: NotSet, prefix_search: NotSet, _kind: PhantomData<meilisearch_types::settings::Unchecked> } }, kind: SettingsUpdate { index_uid: "doggos", new_settings: Settings { displayed_attributes: WildcardSetting(NotSet), searchable_attributes: WildcardSetting(NotSet), filterable_attributes: NotSet, sortable_attributes: NotSet, ranking_rules: NotSet, stop_words: NotSet, non_separator_tokens: NotSet, separator_tokens: NotSet, dictionary: NotSet, synonyms: NotSet, distinct_attribute: NotSet, proximity_precision: NotSet, typo_tolerance: NotSet, faceting: NotSet, pagination: NotSet, embedders: Set({"A_fakerest": Set(EmbeddingSettings { source: Set(Rest), model: NotSet, revision: NotSet, pooling: NotSet, api_key: Set("My super secret"), dimensions: Set(384), binary_quantized: NotSet, document_template: NotSet, document_template_max_bytes: NotSet, url: Set("http://localhost:7777"), request: Set(String("{{text}}")), response: Set(String("{{embedding}}")), headers: NotSet, distribution: NotSet }), "B_small_hf": Set(EmbeddingSettings { source: Set(HuggingFace), model: Set("sentence-transformers/all-MiniLM-L6-v2"), revision: Set("e4ce9877abf3edfe10b0d82785e83bdcb973e22e"), pooling: NotSet, api_key: NotSet, dimensions: NotSet, binary_quantized: NotSet, document_template: Set("{{doc.doggo}} the {{doc.breed}} best doggo"), document_template_max_bytes: NotSet, url: NotSet, request: NotSet, response: NotSet, headers: NotSet, distribution: NotSet })}), search_cutoff_ms: NotSet, localized_attributes: NotSet, facet_search: NotSet, prefix_search: NotSet, _kind: PhantomData<meilisearch_types::settings::Unchecked> }, is_deletion: false, allow_index_creation: true }}
1 {uid: 1, batch_uid: 1, status: succeeded, details: { received_documents: 1, indexed_documents: Some(1) }, kind: DocumentAdditionOrUpdate { index_uid: "doggos", primary_key: Some("id"), method: UpdateDocuments, content_file: 00000000-0000-0000-0000-000000000000, documents_count: 1, allow_index_creation: true }} 1 {uid: 1, batch_uid: 1, status: succeeded, details: { received_documents: 1, indexed_documents: Some(1) }, kind: DocumentAdditionOrUpdate { index_uid: "doggos", primary_key: Some("id"), method: UpdateDocuments, content_file: 00000000-0000-0000-0000-000000000000, documents_count: 1, allow_index_creation: true }}
2 {uid: 2, batch_uid: 2, status: succeeded, details: { received_documents: 1, indexed_documents: Some(1) }, kind: DocumentAdditionOrUpdate { index_uid: "doggos", primary_key: None, method: UpdateDocuments, content_file: 00000000-0000-0000-0000-000000000001, documents_count: 1, allow_index_creation: true }} 2 {uid: 2, batch_uid: 2, status: succeeded, details: { received_documents: 1, indexed_documents: Some(1) }, kind: DocumentAdditionOrUpdate { index_uid: "doggos", primary_key: None, method: UpdateDocuments, content_file: 00000000-0000-0000-0000-000000000001, documents_count: 1, allow_index_creation: true }}
---------------------------------------------------------------------- ----------------------------------------------------------------------

View File

@ -1,13 +1,12 @@
--- ---
source: crates/index-scheduler/src/scheduler/test_embedders.rs source: crates/index-scheduler/src/scheduler/test_embedders.rs
snapshot_kind: text
--- ---
### Autobatching Enabled = true ### Autobatching Enabled = true
### Processing batch None: ### Processing batch None:
[] []
---------------------------------------------------------------------- ----------------------------------------------------------------------
### All Tasks: ### All Tasks:
0 {uid: 0, batch_uid: 0, status: succeeded, details: { settings: Settings { displayed_attributes: WildcardSetting(NotSet), searchable_attributes: WildcardSetting(NotSet), filterable_attributes: NotSet, sortable_attributes: NotSet, ranking_rules: NotSet, stop_words: NotSet, non_separator_tokens: NotSet, separator_tokens: NotSet, dictionary: NotSet, synonyms: NotSet, distinct_attribute: NotSet, proximity_precision: NotSet, typo_tolerance: NotSet, faceting: NotSet, pagination: NotSet, embedders: Set({"A_fakerest": Set(EmbeddingSettings { source: Set(Rest), model: NotSet, revision: NotSet, api_key: Set("My super secret"), dimensions: Set(384), binary_quantized: NotSet, document_template: NotSet, document_template_max_bytes: NotSet, url: Set("http://localhost:7777"), request: Set(String("{{text}}")), response: Set(String("{{embedding}}")), headers: NotSet, distribution: NotSet }), "B_small_hf": Set(EmbeddingSettings { source: Set(HuggingFace), model: Set("sentence-transformers/all-MiniLM-L6-v2"), revision: Set("e4ce9877abf3edfe10b0d82785e83bdcb973e22e"), api_key: NotSet, dimensions: NotSet, binary_quantized: NotSet, document_template: Set("{{doc.doggo}} the {{doc.breed}} best doggo"), document_template_max_bytes: NotSet, url: NotSet, request: NotSet, response: NotSet, headers: NotSet, distribution: NotSet })}), search_cutoff_ms: NotSet, localized_attributes: NotSet, facet_search: NotSet, prefix_search: NotSet, _kind: PhantomData<meilisearch_types::settings::Unchecked> } }, kind: SettingsUpdate { index_uid: "doggos", new_settings: Settings { displayed_attributes: WildcardSetting(NotSet), searchable_attributes: WildcardSetting(NotSet), filterable_attributes: NotSet, sortable_attributes: NotSet, ranking_rules: NotSet, stop_words: NotSet, non_separator_tokens: NotSet, separator_tokens: NotSet, dictionary: NotSet, synonyms: NotSet, distinct_attribute: NotSet, proximity_precision: NotSet, typo_tolerance: NotSet, faceting: NotSet, pagination: NotSet, embedders: Set({"A_fakerest": Set(EmbeddingSettings { source: Set(Rest), model: NotSet, revision: NotSet, api_key: Set("My super secret"), dimensions: Set(384), binary_quantized: NotSet, document_template: NotSet, document_template_max_bytes: NotSet, url: Set("http://localhost:7777"), request: Set(String("{{text}}")), response: Set(String("{{embedding}}")), headers: NotSet, distribution: NotSet }), "B_small_hf": Set(EmbeddingSettings { source: Set(HuggingFace), model: Set("sentence-transformers/all-MiniLM-L6-v2"), revision: Set("e4ce9877abf3edfe10b0d82785e83bdcb973e22e"), api_key: NotSet, dimensions: NotSet, binary_quantized: NotSet, document_template: Set("{{doc.doggo}} the {{doc.breed}} best doggo"), document_template_max_bytes: NotSet, url: NotSet, request: NotSet, response: NotSet, headers: NotSet, distribution: NotSet })}), search_cutoff_ms: NotSet, localized_attributes: NotSet, facet_search: NotSet, prefix_search: NotSet, _kind: PhantomData<meilisearch_types::settings::Unchecked> }, is_deletion: false, allow_index_creation: true }} 0 {uid: 0, batch_uid: 0, status: succeeded, details: { settings: Settings { displayed_attributes: WildcardSetting(NotSet), searchable_attributes: WildcardSetting(NotSet), filterable_attributes: NotSet, sortable_attributes: NotSet, ranking_rules: NotSet, stop_words: NotSet, non_separator_tokens: NotSet, separator_tokens: NotSet, dictionary: NotSet, synonyms: NotSet, distinct_attribute: NotSet, proximity_precision: NotSet, typo_tolerance: NotSet, faceting: NotSet, pagination: NotSet, embedders: Set({"A_fakerest": Set(EmbeddingSettings { source: Set(Rest), model: NotSet, revision: NotSet, pooling: NotSet, api_key: Set("My super secret"), dimensions: Set(384), binary_quantized: NotSet, document_template: NotSet, document_template_max_bytes: NotSet, url: Set("http://localhost:7777"), request: Set(String("{{text}}")), response: Set(String("{{embedding}}")), headers: NotSet, distribution: NotSet }), "B_small_hf": Set(EmbeddingSettings { source: Set(HuggingFace), model: Set("sentence-transformers/all-MiniLM-L6-v2"), revision: Set("e4ce9877abf3edfe10b0d82785e83bdcb973e22e"), pooling: NotSet, api_key: NotSet, dimensions: NotSet, binary_quantized: NotSet, document_template: Set("{{doc.doggo}} the {{doc.breed}} best doggo"), document_template_max_bytes: NotSet, url: NotSet, request: NotSet, response: NotSet, headers: NotSet, distribution: NotSet })}), search_cutoff_ms: NotSet, localized_attributes: NotSet, facet_search: NotSet, prefix_search: NotSet, _kind: PhantomData<meilisearch_types::settings::Unchecked> } }, kind: SettingsUpdate { index_uid: "doggos", new_settings: Settings { displayed_attributes: WildcardSetting(NotSet), searchable_attributes: WildcardSetting(NotSet), filterable_attributes: NotSet, sortable_attributes: NotSet, ranking_rules: NotSet, stop_words: NotSet, non_separator_tokens: NotSet, separator_tokens: NotSet, dictionary: NotSet, synonyms: NotSet, distinct_attribute: NotSet, proximity_precision: NotSet, typo_tolerance: NotSet, faceting: NotSet, pagination: NotSet, embedders: Set({"A_fakerest": Set(EmbeddingSettings { source: Set(Rest), model: NotSet, revision: NotSet, pooling: NotSet, api_key: Set("My super secret"), dimensions: Set(384), binary_quantized: NotSet, document_template: NotSet, document_template_max_bytes: NotSet, url: Set("http://localhost:7777"), request: Set(String("{{text}}")), response: Set(String("{{embedding}}")), headers: NotSet, distribution: NotSet }), "B_small_hf": Set(EmbeddingSettings { source: Set(HuggingFace), model: Set("sentence-transformers/all-MiniLM-L6-v2"), revision: Set("e4ce9877abf3edfe10b0d82785e83bdcb973e22e"), pooling: NotSet, api_key: NotSet, dimensions: NotSet, binary_quantized: NotSet, document_template: Set("{{doc.doggo}} the {{doc.breed}} best doggo"), document_template_max_bytes: NotSet, url: NotSet, request: NotSet, response: NotSet, headers: NotSet, distribution: NotSet })}), search_cutoff_ms: NotSet, localized_attributes: NotSet, facet_search: NotSet, prefix_search: NotSet, _kind: PhantomData<meilisearch_types::settings::Unchecked> }, is_deletion: false, allow_index_creation: true }}
1 {uid: 1, batch_uid: 1, status: succeeded, details: { received_documents: 1, indexed_documents: Some(1) }, kind: DocumentAdditionOrUpdate { index_uid: "doggos", primary_key: Some("id"), method: UpdateDocuments, content_file: 00000000-0000-0000-0000-000000000000, documents_count: 1, allow_index_creation: true }} 1 {uid: 1, batch_uid: 1, status: succeeded, details: { received_documents: 1, indexed_documents: Some(1) }, kind: DocumentAdditionOrUpdate { index_uid: "doggos", primary_key: Some("id"), method: UpdateDocuments, content_file: 00000000-0000-0000-0000-000000000000, documents_count: 1, allow_index_creation: true }}
2 {uid: 2, status: enqueued, details: { received_documents: 1, indexed_documents: None }, kind: DocumentAdditionOrUpdate { index_uid: "doggos", primary_key: None, method: UpdateDocuments, content_file: 00000000-0000-0000-0000-000000000001, documents_count: 1, allow_index_creation: true }} 2 {uid: 2, status: enqueued, details: { received_documents: 1, indexed_documents: None }, kind: DocumentAdditionOrUpdate { index_uid: "doggos", primary_key: None, method: UpdateDocuments, content_file: 00000000-0000-0000-0000-000000000001, documents_count: 1, allow_index_creation: true }}
---------------------------------------------------------------------- ----------------------------------------------------------------------

View File

@ -1,13 +1,12 @@
--- ---
source: crates/index-scheduler/src/scheduler/test_embedders.rs source: crates/index-scheduler/src/scheduler/test_embedders.rs
snapshot_kind: text
--- ---
### Autobatching Enabled = true ### Autobatching Enabled = true
### Processing batch None: ### Processing batch None:
[] []
---------------------------------------------------------------------- ----------------------------------------------------------------------
### All Tasks: ### All Tasks:
0 {uid: 0, batch_uid: 0, status: succeeded, details: { settings: Settings { displayed_attributes: WildcardSetting(NotSet), searchable_attributes: WildcardSetting(NotSet), filterable_attributes: NotSet, sortable_attributes: NotSet, ranking_rules: NotSet, stop_words: NotSet, non_separator_tokens: NotSet, separator_tokens: NotSet, dictionary: NotSet, synonyms: NotSet, distinct_attribute: NotSet, proximity_precision: NotSet, typo_tolerance: NotSet, faceting: NotSet, pagination: NotSet, embedders: Set({"A_fakerest": Set(EmbeddingSettings { source: Set(Rest), model: NotSet, revision: NotSet, api_key: Set("My super secret"), dimensions: Set(384), binary_quantized: NotSet, document_template: NotSet, document_template_max_bytes: NotSet, url: Set("http://localhost:7777"), request: Set(String("{{text}}")), response: Set(String("{{embedding}}")), headers: NotSet, distribution: NotSet }), "B_small_hf": Set(EmbeddingSettings { source: Set(HuggingFace), model: Set("sentence-transformers/all-MiniLM-L6-v2"), revision: Set("e4ce9877abf3edfe10b0d82785e83bdcb973e22e"), api_key: NotSet, dimensions: NotSet, binary_quantized: NotSet, document_template: Set("{{doc.doggo}} the {{doc.breed}} best doggo"), document_template_max_bytes: NotSet, url: NotSet, request: NotSet, response: NotSet, headers: NotSet, distribution: NotSet })}), search_cutoff_ms: NotSet, localized_attributes: NotSet, facet_search: NotSet, prefix_search: NotSet, _kind: PhantomData<meilisearch_types::settings::Unchecked> } }, kind: SettingsUpdate { index_uid: "doggos", new_settings: Settings { displayed_attributes: WildcardSetting(NotSet), searchable_attributes: WildcardSetting(NotSet), filterable_attributes: NotSet, sortable_attributes: NotSet, ranking_rules: NotSet, stop_words: NotSet, non_separator_tokens: NotSet, separator_tokens: NotSet, dictionary: NotSet, synonyms: NotSet, distinct_attribute: NotSet, proximity_precision: NotSet, typo_tolerance: NotSet, faceting: NotSet, pagination: NotSet, embedders: Set({"A_fakerest": Set(EmbeddingSettings { source: Set(Rest), model: NotSet, revision: NotSet, api_key: Set("My super secret"), dimensions: Set(384), binary_quantized: NotSet, document_template: NotSet, document_template_max_bytes: NotSet, url: Set("http://localhost:7777"), request: Set(String("{{text}}")), response: Set(String("{{embedding}}")), headers: NotSet, distribution: NotSet }), "B_small_hf": Set(EmbeddingSettings { source: Set(HuggingFace), model: Set("sentence-transformers/all-MiniLM-L6-v2"), revision: Set("e4ce9877abf3edfe10b0d82785e83bdcb973e22e"), api_key: NotSet, dimensions: NotSet, binary_quantized: NotSet, document_template: Set("{{doc.doggo}} the {{doc.breed}} best doggo"), document_template_max_bytes: NotSet, url: NotSet, request: NotSet, response: NotSet, headers: NotSet, distribution: NotSet })}), search_cutoff_ms: NotSet, localized_attributes: NotSet, facet_search: NotSet, prefix_search: NotSet, _kind: PhantomData<meilisearch_types::settings::Unchecked> }, is_deletion: false, allow_index_creation: true }} 0 {uid: 0, batch_uid: 0, status: succeeded, details: { settings: Settings { displayed_attributes: WildcardSetting(NotSet), searchable_attributes: WildcardSetting(NotSet), filterable_attributes: NotSet, sortable_attributes: NotSet, ranking_rules: NotSet, stop_words: NotSet, non_separator_tokens: NotSet, separator_tokens: NotSet, dictionary: NotSet, synonyms: NotSet, distinct_attribute: NotSet, proximity_precision: NotSet, typo_tolerance: NotSet, faceting: NotSet, pagination: NotSet, embedders: Set({"A_fakerest": Set(EmbeddingSettings { source: Set(Rest), model: NotSet, revision: NotSet, pooling: NotSet, api_key: Set("My super secret"), dimensions: Set(384), binary_quantized: NotSet, document_template: NotSet, document_template_max_bytes: NotSet, url: Set("http://localhost:7777"), request: Set(String("{{text}}")), response: Set(String("{{embedding}}")), headers: NotSet, distribution: NotSet }), "B_small_hf": Set(EmbeddingSettings { source: Set(HuggingFace), model: Set("sentence-transformers/all-MiniLM-L6-v2"), revision: Set("e4ce9877abf3edfe10b0d82785e83bdcb973e22e"), pooling: NotSet, api_key: NotSet, dimensions: NotSet, binary_quantized: NotSet, document_template: Set("{{doc.doggo}} the {{doc.breed}} best doggo"), document_template_max_bytes: NotSet, url: NotSet, request: NotSet, response: NotSet, headers: NotSet, distribution: NotSet })}), search_cutoff_ms: NotSet, localized_attributes: NotSet, facet_search: NotSet, prefix_search: NotSet, _kind: PhantomData<meilisearch_types::settings::Unchecked> } }, kind: SettingsUpdate { index_uid: "doggos", new_settings: Settings { displayed_attributes: WildcardSetting(NotSet), searchable_attributes: WildcardSetting(NotSet), filterable_attributes: NotSet, sortable_attributes: NotSet, ranking_rules: NotSet, stop_words: NotSet, non_separator_tokens: NotSet, separator_tokens: NotSet, dictionary: NotSet, synonyms: NotSet, distinct_attribute: NotSet, proximity_precision: NotSet, typo_tolerance: NotSet, faceting: NotSet, pagination: NotSet, embedders: Set({"A_fakerest": Set(EmbeddingSettings { source: Set(Rest), model: NotSet, revision: NotSet, pooling: NotSet, api_key: Set("My super secret"), dimensions: Set(384), binary_quantized: NotSet, document_template: NotSet, document_template_max_bytes: NotSet, url: Set("http://localhost:7777"), request: Set(String("{{text}}")), response: Set(String("{{embedding}}")), headers: NotSet, distribution: NotSet }), "B_small_hf": Set(EmbeddingSettings { source: Set(HuggingFace), model: Set("sentence-transformers/all-MiniLM-L6-v2"), revision: Set("e4ce9877abf3edfe10b0d82785e83bdcb973e22e"), pooling: NotSet, api_key: NotSet, dimensions: NotSet, binary_quantized: NotSet, document_template: Set("{{doc.doggo}} the {{doc.breed}} best doggo"), document_template_max_bytes: NotSet, url: NotSet, request: NotSet, response: NotSet, headers: NotSet, distribution: NotSet })}), search_cutoff_ms: NotSet, localized_attributes: NotSet, facet_search: NotSet, prefix_search: NotSet, _kind: PhantomData<meilisearch_types::settings::Unchecked> }, is_deletion: false, allow_index_creation: true }}
1 {uid: 1, batch_uid: 1, status: succeeded, details: { received_documents: 1, indexed_documents: Some(1) }, kind: DocumentAdditionOrUpdate { index_uid: "doggos", primary_key: Some("id"), method: UpdateDocuments, content_file: 00000000-0000-0000-0000-000000000000, documents_count: 1, allow_index_creation: true }} 1 {uid: 1, batch_uid: 1, status: succeeded, details: { received_documents: 1, indexed_documents: Some(1) }, kind: DocumentAdditionOrUpdate { index_uid: "doggos", primary_key: Some("id"), method: UpdateDocuments, content_file: 00000000-0000-0000-0000-000000000000, documents_count: 1, allow_index_creation: true }}
---------------------------------------------------------------------- ----------------------------------------------------------------------
### Status: ### Status:

View File

@ -1,13 +1,12 @@
--- ---
source: crates/index-scheduler/src/scheduler/test_embedders.rs source: crates/index-scheduler/src/scheduler/test_embedders.rs
snapshot_kind: text
--- ---
### Autobatching Enabled = true ### Autobatching Enabled = true
### Processing batch None: ### Processing batch None:
[] []
---------------------------------------------------------------------- ----------------------------------------------------------------------
### All Tasks: ### All Tasks:
0 {uid: 0, batch_uid: 0, status: succeeded, details: { settings: Settings { displayed_attributes: WildcardSetting(NotSet), searchable_attributes: WildcardSetting(NotSet), filterable_attributes: NotSet, sortable_attributes: NotSet, ranking_rules: NotSet, stop_words: NotSet, non_separator_tokens: NotSet, separator_tokens: NotSet, dictionary: NotSet, synonyms: NotSet, distinct_attribute: NotSet, proximity_precision: NotSet, typo_tolerance: NotSet, faceting: NotSet, pagination: NotSet, embedders: Set({"A_fakerest": Set(EmbeddingSettings { source: Set(Rest), model: NotSet, revision: NotSet, api_key: Set("My super secret"), dimensions: Set(384), binary_quantized: NotSet, document_template: NotSet, document_template_max_bytes: NotSet, url: Set("http://localhost:7777"), request: Set(String("{{text}}")), response: Set(String("{{embedding}}")), headers: NotSet, distribution: NotSet }), "B_small_hf": Set(EmbeddingSettings { source: Set(HuggingFace), model: Set("sentence-transformers/all-MiniLM-L6-v2"), revision: Set("e4ce9877abf3edfe10b0d82785e83bdcb973e22e"), api_key: NotSet, dimensions: NotSet, binary_quantized: NotSet, document_template: Set("{{doc.doggo}} the {{doc.breed}} best doggo"), document_template_max_bytes: NotSet, url: NotSet, request: NotSet, response: NotSet, headers: NotSet, distribution: NotSet })}), search_cutoff_ms: NotSet, localized_attributes: NotSet, facet_search: NotSet, prefix_search: NotSet, _kind: PhantomData<meilisearch_types::settings::Unchecked> } }, kind: SettingsUpdate { index_uid: "doggos", new_settings: Settings { displayed_attributes: WildcardSetting(NotSet), searchable_attributes: WildcardSetting(NotSet), filterable_attributes: NotSet, sortable_attributes: NotSet, ranking_rules: NotSet, stop_words: NotSet, non_separator_tokens: NotSet, separator_tokens: NotSet, dictionary: NotSet, synonyms: NotSet, distinct_attribute: NotSet, proximity_precision: NotSet, typo_tolerance: NotSet, faceting: NotSet, pagination: NotSet, embedders: Set({"A_fakerest": Set(EmbeddingSettings { source: Set(Rest), model: NotSet, revision: NotSet, api_key: Set("My super secret"), dimensions: Set(384), binary_quantized: NotSet, document_template: NotSet, document_template_max_bytes: NotSet, url: Set("http://localhost:7777"), request: Set(String("{{text}}")), response: Set(String("{{embedding}}")), headers: NotSet, distribution: NotSet }), "B_small_hf": Set(EmbeddingSettings { source: Set(HuggingFace), model: Set("sentence-transformers/all-MiniLM-L6-v2"), revision: Set("e4ce9877abf3edfe10b0d82785e83bdcb973e22e"), api_key: NotSet, dimensions: NotSet, binary_quantized: NotSet, document_template: Set("{{doc.doggo}} the {{doc.breed}} best doggo"), document_template_max_bytes: NotSet, url: NotSet, request: NotSet, response: NotSet, headers: NotSet, distribution: NotSet })}), search_cutoff_ms: NotSet, localized_attributes: NotSet, facet_search: NotSet, prefix_search: NotSet, _kind: PhantomData<meilisearch_types::settings::Unchecked> }, is_deletion: false, allow_index_creation: true }} 0 {uid: 0, batch_uid: 0, status: succeeded, details: { settings: Settings { displayed_attributes: WildcardSetting(NotSet), searchable_attributes: WildcardSetting(NotSet), filterable_attributes: NotSet, sortable_attributes: NotSet, ranking_rules: NotSet, stop_words: NotSet, non_separator_tokens: NotSet, separator_tokens: NotSet, dictionary: NotSet, synonyms: NotSet, distinct_attribute: NotSet, proximity_precision: NotSet, typo_tolerance: NotSet, faceting: NotSet, pagination: NotSet, embedders: Set({"A_fakerest": Set(EmbeddingSettings { source: Set(Rest), model: NotSet, revision: NotSet, pooling: NotSet, api_key: Set("My super secret"), dimensions: Set(384), binary_quantized: NotSet, document_template: NotSet, document_template_max_bytes: NotSet, url: Set("http://localhost:7777"), request: Set(String("{{text}}")), response: Set(String("{{embedding}}")), headers: NotSet, distribution: NotSet }), "B_small_hf": Set(EmbeddingSettings { source: Set(HuggingFace), model: Set("sentence-transformers/all-MiniLM-L6-v2"), revision: Set("e4ce9877abf3edfe10b0d82785e83bdcb973e22e"), pooling: NotSet, api_key: NotSet, dimensions: NotSet, binary_quantized: NotSet, document_template: Set("{{doc.doggo}} the {{doc.breed}} best doggo"), document_template_max_bytes: NotSet, url: NotSet, request: NotSet, response: NotSet, headers: NotSet, distribution: NotSet })}), search_cutoff_ms: NotSet, localized_attributes: NotSet, facet_search: NotSet, prefix_search: NotSet, _kind: PhantomData<meilisearch_types::settings::Unchecked> } }, kind: SettingsUpdate { index_uid: "doggos", new_settings: Settings { displayed_attributes: WildcardSetting(NotSet), searchable_attributes: WildcardSetting(NotSet), filterable_attributes: NotSet, sortable_attributes: NotSet, ranking_rules: NotSet, stop_words: NotSet, non_separator_tokens: NotSet, separator_tokens: NotSet, dictionary: NotSet, synonyms: NotSet, distinct_attribute: NotSet, proximity_precision: NotSet, typo_tolerance: NotSet, faceting: NotSet, pagination: NotSet, embedders: Set({"A_fakerest": Set(EmbeddingSettings { source: Set(Rest), model: NotSet, revision: NotSet, pooling: NotSet, api_key: Set("My super secret"), dimensions: Set(384), binary_quantized: NotSet, document_template: NotSet, document_template_max_bytes: NotSet, url: Set("http://localhost:7777"), request: Set(String("{{text}}")), response: Set(String("{{embedding}}")), headers: NotSet, distribution: NotSet }), "B_small_hf": Set(EmbeddingSettings { source: Set(HuggingFace), model: Set("sentence-transformers/all-MiniLM-L6-v2"), revision: Set("e4ce9877abf3edfe10b0d82785e83bdcb973e22e"), pooling: NotSet, api_key: NotSet, dimensions: NotSet, binary_quantized: NotSet, document_template: Set("{{doc.doggo}} the {{doc.breed}} best doggo"), document_template_max_bytes: NotSet, url: NotSet, request: NotSet, response: NotSet, headers: NotSet, distribution: NotSet })}), search_cutoff_ms: NotSet, localized_attributes: NotSet, facet_search: NotSet, prefix_search: NotSet, _kind: PhantomData<meilisearch_types::settings::Unchecked> }, is_deletion: false, allow_index_creation: true }}
1 {uid: 1, status: enqueued, details: { received_documents: 1, indexed_documents: None }, kind: DocumentAdditionOrUpdate { index_uid: "doggos", primary_key: Some("id"), method: UpdateDocuments, content_file: 00000000-0000-0000-0000-000000000000, documents_count: 1, allow_index_creation: true }} 1 {uid: 1, status: enqueued, details: { received_documents: 1, indexed_documents: None }, kind: DocumentAdditionOrUpdate { index_uid: "doggos", primary_key: Some("id"), method: UpdateDocuments, content_file: 00000000-0000-0000-0000-000000000000, documents_count: 1, allow_index_creation: true }}
---------------------------------------------------------------------- ----------------------------------------------------------------------
### Status: ### Status:

View File

@ -1,13 +1,12 @@
--- ---
source: crates/index-scheduler/src/scheduler/test_embedders.rs source: crates/index-scheduler/src/scheduler/test_embedders.rs
snapshot_kind: text
--- ---
### Autobatching Enabled = true ### Autobatching Enabled = true
### Processing batch None: ### Processing batch None:
[] []
---------------------------------------------------------------------- ----------------------------------------------------------------------
### All Tasks: ### All Tasks:
0 {uid: 0, status: enqueued, details: { settings: Settings { displayed_attributes: WildcardSetting(NotSet), searchable_attributes: WildcardSetting(NotSet), filterable_attributes: NotSet, sortable_attributes: NotSet, ranking_rules: NotSet, stop_words: NotSet, non_separator_tokens: NotSet, separator_tokens: NotSet, dictionary: NotSet, synonyms: NotSet, distinct_attribute: NotSet, proximity_precision: NotSet, typo_tolerance: NotSet, faceting: NotSet, pagination: NotSet, embedders: Set({"A_fakerest": Set(EmbeddingSettings { source: Set(Rest), model: NotSet, revision: NotSet, api_key: Set("My super secret"), dimensions: Set(384), binary_quantized: NotSet, document_template: NotSet, document_template_max_bytes: NotSet, url: Set("http://localhost:7777"), request: Set(String("{{text}}")), response: Set(String("{{embedding}}")), headers: NotSet, distribution: NotSet }), "B_small_hf": Set(EmbeddingSettings { source: Set(HuggingFace), model: Set("sentence-transformers/all-MiniLM-L6-v2"), revision: Set("e4ce9877abf3edfe10b0d82785e83bdcb973e22e"), api_key: NotSet, dimensions: NotSet, binary_quantized: NotSet, document_template: Set("{{doc.doggo}} the {{doc.breed}} best doggo"), document_template_max_bytes: NotSet, url: NotSet, request: NotSet, response: NotSet, headers: NotSet, distribution: NotSet })}), search_cutoff_ms: NotSet, localized_attributes: NotSet, facet_search: NotSet, prefix_search: NotSet, _kind: PhantomData<meilisearch_types::settings::Unchecked> } }, kind: SettingsUpdate { index_uid: "doggos", new_settings: Settings { displayed_attributes: WildcardSetting(NotSet), searchable_attributes: WildcardSetting(NotSet), filterable_attributes: NotSet, sortable_attributes: NotSet, ranking_rules: NotSet, stop_words: NotSet, non_separator_tokens: NotSet, separator_tokens: NotSet, dictionary: NotSet, synonyms: NotSet, distinct_attribute: NotSet, proximity_precision: NotSet, typo_tolerance: NotSet, faceting: NotSet, pagination: NotSet, embedders: Set({"A_fakerest": Set(EmbeddingSettings { source: Set(Rest), model: NotSet, revision: NotSet, api_key: Set("My super secret"), dimensions: Set(384), binary_quantized: NotSet, document_template: NotSet, document_template_max_bytes: NotSet, url: Set("http://localhost:7777"), request: Set(String("{{text}}")), response: Set(String("{{embedding}}")), headers: NotSet, distribution: NotSet }), "B_small_hf": Set(EmbeddingSettings { source: Set(HuggingFace), model: Set("sentence-transformers/all-MiniLM-L6-v2"), revision: Set("e4ce9877abf3edfe10b0d82785e83bdcb973e22e"), api_key: NotSet, dimensions: NotSet, binary_quantized: NotSet, document_template: Set("{{doc.doggo}} the {{doc.breed}} best doggo"), document_template_max_bytes: NotSet, url: NotSet, request: NotSet, response: NotSet, headers: NotSet, distribution: NotSet })}), search_cutoff_ms: NotSet, localized_attributes: NotSet, facet_search: NotSet, prefix_search: NotSet, _kind: PhantomData<meilisearch_types::settings::Unchecked> }, is_deletion: false, allow_index_creation: true }} 0 {uid: 0, status: enqueued, details: { settings: Settings { displayed_attributes: WildcardSetting(NotSet), searchable_attributes: WildcardSetting(NotSet), filterable_attributes: NotSet, sortable_attributes: NotSet, ranking_rules: NotSet, stop_words: NotSet, non_separator_tokens: NotSet, separator_tokens: NotSet, dictionary: NotSet, synonyms: NotSet, distinct_attribute: NotSet, proximity_precision: NotSet, typo_tolerance: NotSet, faceting: NotSet, pagination: NotSet, embedders: Set({"A_fakerest": Set(EmbeddingSettings { source: Set(Rest), model: NotSet, revision: NotSet, pooling: NotSet, api_key: Set("My super secret"), dimensions: Set(384), binary_quantized: NotSet, document_template: NotSet, document_template_max_bytes: NotSet, url: Set("http://localhost:7777"), request: Set(String("{{text}}")), response: Set(String("{{embedding}}")), headers: NotSet, distribution: NotSet }), "B_small_hf": Set(EmbeddingSettings { source: Set(HuggingFace), model: Set("sentence-transformers/all-MiniLM-L6-v2"), revision: Set("e4ce9877abf3edfe10b0d82785e83bdcb973e22e"), pooling: NotSet, api_key: NotSet, dimensions: NotSet, binary_quantized: NotSet, document_template: Set("{{doc.doggo}} the {{doc.breed}} best doggo"), document_template_max_bytes: NotSet, url: NotSet, request: NotSet, response: NotSet, headers: NotSet, distribution: NotSet })}), search_cutoff_ms: NotSet, localized_attributes: NotSet, facet_search: NotSet, prefix_search: NotSet, _kind: PhantomData<meilisearch_types::settings::Unchecked> } }, kind: SettingsUpdate { index_uid: "doggos", new_settings: Settings { displayed_attributes: WildcardSetting(NotSet), searchable_attributes: WildcardSetting(NotSet), filterable_attributes: NotSet, sortable_attributes: NotSet, ranking_rules: NotSet, stop_words: NotSet, non_separator_tokens: NotSet, separator_tokens: NotSet, dictionary: NotSet, synonyms: NotSet, distinct_attribute: NotSet, proximity_precision: NotSet, typo_tolerance: NotSet, faceting: NotSet, pagination: NotSet, embedders: Set({"A_fakerest": Set(EmbeddingSettings { source: Set(Rest), model: NotSet, revision: NotSet, pooling: NotSet, api_key: Set("My super secret"), dimensions: Set(384), binary_quantized: NotSet, document_template: NotSet, document_template_max_bytes: NotSet, url: Set("http://localhost:7777"), request: Set(String("{{text}}")), response: Set(String("{{embedding}}")), headers: NotSet, distribution: NotSet }), "B_small_hf": Set(EmbeddingSettings { source: Set(HuggingFace), model: Set("sentence-transformers/all-MiniLM-L6-v2"), revision: Set("e4ce9877abf3edfe10b0d82785e83bdcb973e22e"), pooling: NotSet, api_key: NotSet, dimensions: NotSet, binary_quantized: NotSet, document_template: Set("{{doc.doggo}} the {{doc.breed}} best doggo"), document_template_max_bytes: NotSet, url: NotSet, request: NotSet, response: NotSet, headers: NotSet, distribution: NotSet })}), search_cutoff_ms: NotSet, localized_attributes: NotSet, facet_search: NotSet, prefix_search: NotSet, _kind: PhantomData<meilisearch_types::settings::Unchecked> }, is_deletion: false, allow_index_creation: true }}
---------------------------------------------------------------------- ----------------------------------------------------------------------
### Status: ### Status:
enqueued [0,] enqueued [0,]

View File

@ -1,13 +1,12 @@
--- ---
source: crates/index-scheduler/src/scheduler/test_embedders.rs source: crates/index-scheduler/src/scheduler/test_embedders.rs
snapshot_kind: text
--- ---
### Autobatching Enabled = true ### Autobatching Enabled = true
### Processing batch None: ### Processing batch None:
[] []
---------------------------------------------------------------------- ----------------------------------------------------------------------
### All Tasks: ### All Tasks:
0 {uid: 0, batch_uid: 0, status: succeeded, details: { settings: Settings { displayed_attributes: WildcardSetting(NotSet), searchable_attributes: WildcardSetting(NotSet), filterable_attributes: NotSet, sortable_attributes: NotSet, ranking_rules: NotSet, stop_words: NotSet, non_separator_tokens: NotSet, separator_tokens: NotSet, dictionary: NotSet, synonyms: NotSet, distinct_attribute: NotSet, proximity_precision: NotSet, typo_tolerance: NotSet, faceting: NotSet, pagination: NotSet, embedders: Set({"A_fakerest": Set(EmbeddingSettings { source: Set(Rest), model: NotSet, revision: NotSet, api_key: Set("My super secret"), dimensions: Set(384), binary_quantized: NotSet, document_template: NotSet, document_template_max_bytes: NotSet, url: Set("http://localhost:7777"), request: Set(String("{{text}}")), response: Set(String("{{embedding}}")), headers: NotSet, distribution: NotSet }), "B_small_hf": Set(EmbeddingSettings { source: Set(HuggingFace), model: Set("sentence-transformers/all-MiniLM-L6-v2"), revision: Set("e4ce9877abf3edfe10b0d82785e83bdcb973e22e"), api_key: NotSet, dimensions: NotSet, binary_quantized: NotSet, document_template: Set("{{doc.doggo}} the {{doc.breed}} best doggo"), document_template_max_bytes: NotSet, url: NotSet, request: NotSet, response: NotSet, headers: NotSet, distribution: NotSet })}), search_cutoff_ms: NotSet, localized_attributes: NotSet, facet_search: NotSet, prefix_search: NotSet, _kind: PhantomData<meilisearch_types::settings::Unchecked> } }, kind: SettingsUpdate { index_uid: "doggos", new_settings: Settings { displayed_attributes: WildcardSetting(NotSet), searchable_attributes: WildcardSetting(NotSet), filterable_attributes: NotSet, sortable_attributes: NotSet, ranking_rules: NotSet, stop_words: NotSet, non_separator_tokens: NotSet, separator_tokens: NotSet, dictionary: NotSet, synonyms: NotSet, distinct_attribute: NotSet, proximity_precision: NotSet, typo_tolerance: NotSet, faceting: NotSet, pagination: NotSet, embedders: Set({"A_fakerest": Set(EmbeddingSettings { source: Set(Rest), model: NotSet, revision: NotSet, api_key: Set("My super secret"), dimensions: Set(384), binary_quantized: NotSet, document_template: NotSet, document_template_max_bytes: NotSet, url: Set("http://localhost:7777"), request: Set(String("{{text}}")), response: Set(String("{{embedding}}")), headers: NotSet, distribution: NotSet }), "B_small_hf": Set(EmbeddingSettings { source: Set(HuggingFace), model: Set("sentence-transformers/all-MiniLM-L6-v2"), revision: Set("e4ce9877abf3edfe10b0d82785e83bdcb973e22e"), api_key: NotSet, dimensions: NotSet, binary_quantized: NotSet, document_template: Set("{{doc.doggo}} the {{doc.breed}} best doggo"), document_template_max_bytes: NotSet, url: NotSet, request: NotSet, response: NotSet, headers: NotSet, distribution: NotSet })}), search_cutoff_ms: NotSet, localized_attributes: NotSet, facet_search: NotSet, prefix_search: NotSet, _kind: PhantomData<meilisearch_types::settings::Unchecked> }, is_deletion: false, allow_index_creation: true }} 0 {uid: 0, batch_uid: 0, status: succeeded, details: { settings: Settings { displayed_attributes: WildcardSetting(NotSet), searchable_attributes: WildcardSetting(NotSet), filterable_attributes: NotSet, sortable_attributes: NotSet, ranking_rules: NotSet, stop_words: NotSet, non_separator_tokens: NotSet, separator_tokens: NotSet, dictionary: NotSet, synonyms: NotSet, distinct_attribute: NotSet, proximity_precision: NotSet, typo_tolerance: NotSet, faceting: NotSet, pagination: NotSet, embedders: Set({"A_fakerest": Set(EmbeddingSettings { source: Set(Rest), model: NotSet, revision: NotSet, pooling: NotSet, api_key: Set("My super secret"), dimensions: Set(384), binary_quantized: NotSet, document_template: NotSet, document_template_max_bytes: NotSet, url: Set("http://localhost:7777"), request: Set(String("{{text}}")), response: Set(String("{{embedding}}")), headers: NotSet, distribution: NotSet }), "B_small_hf": Set(EmbeddingSettings { source: Set(HuggingFace), model: Set("sentence-transformers/all-MiniLM-L6-v2"), revision: Set("e4ce9877abf3edfe10b0d82785e83bdcb973e22e"), pooling: NotSet, api_key: NotSet, dimensions: NotSet, binary_quantized: NotSet, document_template: Set("{{doc.doggo}} the {{doc.breed}} best doggo"), document_template_max_bytes: NotSet, url: NotSet, request: NotSet, response: NotSet, headers: NotSet, distribution: NotSet })}), search_cutoff_ms: NotSet, localized_attributes: NotSet, facet_search: NotSet, prefix_search: NotSet, _kind: PhantomData<meilisearch_types::settings::Unchecked> } }, kind: SettingsUpdate { index_uid: "doggos", new_settings: Settings { displayed_attributes: WildcardSetting(NotSet), searchable_attributes: WildcardSetting(NotSet), filterable_attributes: NotSet, sortable_attributes: NotSet, ranking_rules: NotSet, stop_words: NotSet, non_separator_tokens: NotSet, separator_tokens: NotSet, dictionary: NotSet, synonyms: NotSet, distinct_attribute: NotSet, proximity_precision: NotSet, typo_tolerance: NotSet, faceting: NotSet, pagination: NotSet, embedders: Set({"A_fakerest": Set(EmbeddingSettings { source: Set(Rest), model: NotSet, revision: NotSet, pooling: NotSet, api_key: Set("My super secret"), dimensions: Set(384), binary_quantized: NotSet, document_template: NotSet, document_template_max_bytes: NotSet, url: Set("http://localhost:7777"), request: Set(String("{{text}}")), response: Set(String("{{embedding}}")), headers: NotSet, distribution: NotSet }), "B_small_hf": Set(EmbeddingSettings { source: Set(HuggingFace), model: Set("sentence-transformers/all-MiniLM-L6-v2"), revision: Set("e4ce9877abf3edfe10b0d82785e83bdcb973e22e"), pooling: NotSet, api_key: NotSet, dimensions: NotSet, binary_quantized: NotSet, document_template: Set("{{doc.doggo}} the {{doc.breed}} best doggo"), document_template_max_bytes: NotSet, url: NotSet, request: NotSet, response: NotSet, headers: NotSet, distribution: NotSet })}), search_cutoff_ms: NotSet, localized_attributes: NotSet, facet_search: NotSet, prefix_search: NotSet, _kind: PhantomData<meilisearch_types::settings::Unchecked> }, is_deletion: false, allow_index_creation: true }}
---------------------------------------------------------------------- ----------------------------------------------------------------------
### Status: ### Status:
enqueued [] enqueued []

View File

@ -1,13 +1,12 @@
--- ---
source: crates/index-scheduler/src/scheduler/test_failure.rs source: crates/index-scheduler/src/scheduler/test_failure.rs
snapshot_kind: text
--- ---
### Autobatching Enabled = true ### Autobatching Enabled = true
### Processing batch None: ### Processing batch None:
[] []
---------------------------------------------------------------------- ----------------------------------------------------------------------
### All Tasks: ### All Tasks:
0 {uid: 0, batch_uid: 0, status: succeeded, details: { from: (1, 12, 0), to: (1, 13, 0) }, kind: UpgradeDatabase { from: (1, 12, 0) }} 0 {uid: 0, batch_uid: 0, status: succeeded, details: { from: (1, 12, 0), to: (1, 13, 1) }, kind: UpgradeDatabase { from: (1, 12, 0) }}
1 {uid: 1, batch_uid: 1, status: succeeded, details: { primary_key: Some("mouse") }, kind: IndexCreation { index_uid: "catto", primary_key: Some("mouse") }} 1 {uid: 1, batch_uid: 1, status: succeeded, details: { primary_key: Some("mouse") }, kind: IndexCreation { index_uid: "catto", primary_key: Some("mouse") }}
2 {uid: 2, batch_uid: 2, status: succeeded, details: { primary_key: Some("bone") }, kind: IndexCreation { index_uid: "doggo", primary_key: Some("bone") }} 2 {uid: 2, batch_uid: 2, status: succeeded, details: { primary_key: Some("bone") }, kind: IndexCreation { index_uid: "doggo", primary_key: Some("bone") }}
3 {uid: 3, batch_uid: 3, status: failed, error: ResponseError { code: 200, message: "Index `doggo` already exists.", error_code: "index_already_exists", error_type: "invalid_request", error_link: "https://docs.meilisearch.com/errors#index_already_exists" }, details: { primary_key: Some("bone") }, kind: IndexCreation { index_uid: "doggo", primary_key: Some("bone") }} 3 {uid: 3, batch_uid: 3, status: failed, error: ResponseError { code: 200, message: "Index `doggo` already exists.", error_code: "index_already_exists", error_type: "invalid_request", error_link: "https://docs.meilisearch.com/errors#index_already_exists" }, details: { primary_key: Some("bone") }, kind: IndexCreation { index_uid: "doggo", primary_key: Some("bone") }}
@ -58,7 +57,7 @@ girafo: { number_of_documents: 0, field_distribution: {} }
[timestamp] [4,] [timestamp] [4,]
---------------------------------------------------------------------- ----------------------------------------------------------------------
### All Batches: ### All Batches:
0 {uid: 0, details: {"upgradeFrom":"v1.12.0","upgradeTo":"v1.13.0"}, stats: {"totalNbTasks":1,"status":{"succeeded":1},"types":{"upgradeDatabase":1},"indexUids":{}}, } 0 {uid: 0, details: {"upgradeFrom":"v1.12.0","upgradeTo":"v1.13.1"}, stats: {"totalNbTasks":1,"status":{"succeeded":1},"types":{"upgradeDatabase":1},"indexUids":{}}, }
1 {uid: 1, details: {"primaryKey":"mouse"}, stats: {"totalNbTasks":1,"status":{"succeeded":1},"types":{"indexCreation":1},"indexUids":{"catto":1}}, } 1 {uid: 1, details: {"primaryKey":"mouse"}, stats: {"totalNbTasks":1,"status":{"succeeded":1},"types":{"indexCreation":1},"indexUids":{"catto":1}}, }
2 {uid: 2, details: {"primaryKey":"bone"}, stats: {"totalNbTasks":1,"status":{"succeeded":1},"types":{"indexCreation":1},"indexUids":{"doggo":1}}, } 2 {uid: 2, details: {"primaryKey":"bone"}, stats: {"totalNbTasks":1,"status":{"succeeded":1},"types":{"indexCreation":1},"indexUids":{"doggo":1}}, }
3 {uid: 3, details: {"primaryKey":"bone"}, stats: {"totalNbTasks":1,"status":{"failed":1},"types":{"indexCreation":1},"indexUids":{"doggo":1}}, } 3 {uid: 3, details: {"primaryKey":"bone"}, stats: {"totalNbTasks":1,"status":{"failed":1},"types":{"indexCreation":1},"indexUids":{"doggo":1}}, }

View File

@ -7,7 +7,7 @@ snapshot_kind: text
[] []
---------------------------------------------------------------------- ----------------------------------------------------------------------
### All Tasks: ### All Tasks:
0 {uid: 0, status: enqueued, details: { from: (1, 12, 0), to: (1, 13, 0) }, kind: UpgradeDatabase { from: (1, 12, 0) }} 0 {uid: 0, status: enqueued, details: { from: (1, 12, 0), to: (1, 13, 1) }, kind: UpgradeDatabase { from: (1, 12, 0) }}
---------------------------------------------------------------------- ----------------------------------------------------------------------
### Status: ### Status:
enqueued [0,] enqueued [0,]

View File

@ -7,7 +7,7 @@ snapshot_kind: text
[] []
---------------------------------------------------------------------- ----------------------------------------------------------------------
### All Tasks: ### All Tasks:
0 {uid: 0, status: enqueued, details: { from: (1, 12, 0), to: (1, 13, 0) }, kind: UpgradeDatabase { from: (1, 12, 0) }} 0 {uid: 0, status: enqueued, details: { from: (1, 12, 0), to: (1, 13, 1) }, kind: UpgradeDatabase { from: (1, 12, 0) }}
1 {uid: 1, status: enqueued, details: { primary_key: Some("mouse") }, kind: IndexCreation { index_uid: "catto", primary_key: Some("mouse") }} 1 {uid: 1, status: enqueued, details: { primary_key: Some("mouse") }, kind: IndexCreation { index_uid: "catto", primary_key: Some("mouse") }}
---------------------------------------------------------------------- ----------------------------------------------------------------------
### Status: ### Status:

View File

@ -1,13 +1,12 @@
--- ---
source: crates/index-scheduler/src/scheduler/test_failure.rs source: crates/index-scheduler/src/scheduler/test_failure.rs
snapshot_kind: text
--- ---
### Autobatching Enabled = true ### Autobatching Enabled = true
### Processing batch None: ### Processing batch None:
[] []
---------------------------------------------------------------------- ----------------------------------------------------------------------
### All Tasks: ### All Tasks:
0 {uid: 0, batch_uid: 0, status: failed, error: ResponseError { code: 200, message: "Planned failure for tests.", error_code: "internal", error_type: "internal", error_link: "https://docs.meilisearch.com/errors#internal" }, details: { from: (1, 12, 0), to: (1, 13, 0) }, kind: UpgradeDatabase { from: (1, 12, 0) }} 0 {uid: 0, batch_uid: 0, status: failed, error: ResponseError { code: 200, message: "Planned failure for tests.", error_code: "internal", error_type: "internal", error_link: "https://docs.meilisearch.com/errors#internal" }, details: { from: (1, 12, 0), to: (1, 13, 1) }, kind: UpgradeDatabase { from: (1, 12, 0) }}
1 {uid: 1, status: enqueued, details: { primary_key: Some("mouse") }, kind: IndexCreation { index_uid: "catto", primary_key: Some("mouse") }} 1 {uid: 1, status: enqueued, details: { primary_key: Some("mouse") }, kind: IndexCreation { index_uid: "catto", primary_key: Some("mouse") }}
---------------------------------------------------------------------- ----------------------------------------------------------------------
### Status: ### Status:
@ -38,7 +37,7 @@ catto [1,]
[timestamp] [0,] [timestamp] [0,]
---------------------------------------------------------------------- ----------------------------------------------------------------------
### All Batches: ### All Batches:
0 {uid: 0, details: {"upgradeFrom":"v1.12.0","upgradeTo":"v1.13.0"}, stats: {"totalNbTasks":1,"status":{"failed":1},"types":{"upgradeDatabase":1},"indexUids":{}}, } 0 {uid: 0, details: {"upgradeFrom":"v1.12.0","upgradeTo":"v1.13.1"}, stats: {"totalNbTasks":1,"status":{"failed":1},"types":{"upgradeDatabase":1},"indexUids":{}}, }
---------------------------------------------------------------------- ----------------------------------------------------------------------
### Batch to tasks mapping: ### Batch to tasks mapping:
0 [0,] 0 [0,]

View File

@ -1,13 +1,12 @@
--- ---
source: crates/index-scheduler/src/scheduler/test_failure.rs source: crates/index-scheduler/src/scheduler/test_failure.rs
snapshot_kind: text
--- ---
### Autobatching Enabled = true ### Autobatching Enabled = true
### Processing batch None: ### Processing batch None:
[] []
---------------------------------------------------------------------- ----------------------------------------------------------------------
### All Tasks: ### All Tasks:
0 {uid: 0, batch_uid: 0, status: failed, error: ResponseError { code: 200, message: "Planned failure for tests.", error_code: "internal", error_type: "internal", error_link: "https://docs.meilisearch.com/errors#internal" }, details: { from: (1, 12, 0), to: (1, 13, 0) }, kind: UpgradeDatabase { from: (1, 12, 0) }} 0 {uid: 0, batch_uid: 0, status: failed, error: ResponseError { code: 200, message: "Planned failure for tests.", error_code: "internal", error_type: "internal", error_link: "https://docs.meilisearch.com/errors#internal" }, details: { from: (1, 12, 0), to: (1, 13, 1) }, kind: UpgradeDatabase { from: (1, 12, 0) }}
1 {uid: 1, status: enqueued, details: { primary_key: Some("mouse") }, kind: IndexCreation { index_uid: "catto", primary_key: Some("mouse") }} 1 {uid: 1, status: enqueued, details: { primary_key: Some("mouse") }, kind: IndexCreation { index_uid: "catto", primary_key: Some("mouse") }}
2 {uid: 2, status: enqueued, details: { primary_key: Some("bone") }, kind: IndexCreation { index_uid: "doggo", primary_key: Some("bone") }} 2 {uid: 2, status: enqueued, details: { primary_key: Some("bone") }, kind: IndexCreation { index_uid: "doggo", primary_key: Some("bone") }}
---------------------------------------------------------------------- ----------------------------------------------------------------------
@ -41,7 +40,7 @@ doggo [2,]
[timestamp] [0,] [timestamp] [0,]
---------------------------------------------------------------------- ----------------------------------------------------------------------
### All Batches: ### All Batches:
0 {uid: 0, details: {"upgradeFrom":"v1.12.0","upgradeTo":"v1.13.0"}, stats: {"totalNbTasks":1,"status":{"failed":1},"types":{"upgradeDatabase":1},"indexUids":{}}, } 0 {uid: 0, details: {"upgradeFrom":"v1.12.0","upgradeTo":"v1.13.1"}, stats: {"totalNbTasks":1,"status":{"failed":1},"types":{"upgradeDatabase":1},"indexUids":{}}, }
---------------------------------------------------------------------- ----------------------------------------------------------------------
### Batch to tasks mapping: ### Batch to tasks mapping:
0 [0,] 0 [0,]

View File

@ -1,13 +1,12 @@
--- ---
source: crates/index-scheduler/src/scheduler/test_failure.rs source: crates/index-scheduler/src/scheduler/test_failure.rs
snapshot_kind: text
--- ---
### Autobatching Enabled = true ### Autobatching Enabled = true
### Processing batch None: ### Processing batch None:
[] []
---------------------------------------------------------------------- ----------------------------------------------------------------------
### All Tasks: ### All Tasks:
0 {uid: 0, batch_uid: 0, status: succeeded, details: { from: (1, 12, 0), to: (1, 13, 0) }, kind: UpgradeDatabase { from: (1, 12, 0) }} 0 {uid: 0, batch_uid: 0, status: succeeded, details: { from: (1, 12, 0), to: (1, 13, 1) }, kind: UpgradeDatabase { from: (1, 12, 0) }}
1 {uid: 1, status: enqueued, details: { primary_key: Some("mouse") }, kind: IndexCreation { index_uid: "catto", primary_key: Some("mouse") }} 1 {uid: 1, status: enqueued, details: { primary_key: Some("mouse") }, kind: IndexCreation { index_uid: "catto", primary_key: Some("mouse") }}
2 {uid: 2, status: enqueued, details: { primary_key: Some("bone") }, kind: IndexCreation { index_uid: "doggo", primary_key: Some("bone") }} 2 {uid: 2, status: enqueued, details: { primary_key: Some("bone") }, kind: IndexCreation { index_uid: "doggo", primary_key: Some("bone") }}
3 {uid: 3, status: enqueued, details: { primary_key: Some("bone") }, kind: IndexCreation { index_uid: "doggo", primary_key: Some("bone") }} 3 {uid: 3, status: enqueued, details: { primary_key: Some("bone") }, kind: IndexCreation { index_uid: "doggo", primary_key: Some("bone") }}
@ -44,7 +43,7 @@ doggo [2,3,]
[timestamp] [0,] [timestamp] [0,]
---------------------------------------------------------------------- ----------------------------------------------------------------------
### All Batches: ### All Batches:
0 {uid: 0, details: {"upgradeFrom":"v1.12.0","upgradeTo":"v1.13.0"}, stats: {"totalNbTasks":1,"status":{"succeeded":1},"types":{"upgradeDatabase":1},"indexUids":{}}, } 0 {uid: 0, details: {"upgradeFrom":"v1.12.0","upgradeTo":"v1.13.1"}, stats: {"totalNbTasks":1,"status":{"succeeded":1},"types":{"upgradeDatabase":1},"indexUids":{}}, }
---------------------------------------------------------------------- ----------------------------------------------------------------------
### Batch to tasks mapping: ### Batch to tasks mapping:
0 [0,] 0 [0,]

View File

@ -910,7 +910,11 @@ fn create_and_list_index() {
[ [
"kefir", "kefir",
{ {
"number_of_documents": 0, "documents_database_stats": {
"numberOfEntries": 0,
"totalKeySize": 0,
"totalValueSize": 0
},
"database_size": "[bytes]", "database_size": "[bytes]",
"number_of_embeddings": 0, "number_of_embeddings": 0,
"number_of_embedded_documents": 0, "number_of_embedded_documents": 0,

View File

@ -404,31 +404,32 @@ fn import_vectors_first_and_embedder_later() {
// even though we specified the vector for the ID 3, it shouldn't be marked // even though we specified the vector for the ID 3, it shouldn't be marked
// as user provided since we explicitely marked it as NOT user provided. // as user provided since we explicitely marked it as NOT user provided.
snapshot!(format!("{conf:#?}"), @r###" snapshot!(format!("{conf:#?}"), @r###"
[ [
IndexEmbeddingConfig { IndexEmbeddingConfig {
name: "my_doggo_embedder", name: "my_doggo_embedder",
config: EmbeddingConfig { config: EmbeddingConfig {
embedder_options: HuggingFace( embedder_options: HuggingFace(
EmbedderOptions { EmbedderOptions {
model: "sentence-transformers/all-MiniLM-L6-v2", model: "sentence-transformers/all-MiniLM-L6-v2",
revision: Some( revision: Some(
"e4ce9877abf3edfe10b0d82785e83bdcb973e22e", "e4ce9877abf3edfe10b0d82785e83bdcb973e22e",
),
distribution: None,
},
),
prompt: PromptData {
template: "{{doc.doggo}}",
max_bytes: Some(
400,
), ),
distribution: None,
pooling: UseModel,
}, },
quantized: None, ),
prompt: PromptData {
template: "{{doc.doggo}}",
max_bytes: Some(
400,
),
}, },
user_provided: RoaringBitmap<[1, 2]>, quantized: None,
}, },
] user_provided: RoaringBitmap<[1, 2]>,
"###); },
]
"###);
let docid = index.external_documents_ids.get(&rtxn, "0").unwrap().unwrap(); let docid = index.external_documents_ids.get(&rtxn, "0").unwrap().unwrap();
let embeddings = index.embeddings(&rtxn, docid).unwrap(); let embeddings = index.embeddings(&rtxn, docid).unwrap();
let embedding = &embeddings["my_doggo_embedder"]; let embedding = &embeddings["my_doggo_embedder"];

View File

@ -46,20 +46,19 @@ pub fn upgrade_index_scheduler(
} }
}; };
let mut current_version = from;
info!("Upgrading the task queue"); info!("Upgrading the task queue");
let mut local_from = from;
for upgrade in upgrade_functions[start..].iter() { for upgrade in upgrade_functions[start..].iter() {
let target = upgrade.target_version(); let target = upgrade.target_version();
info!( info!(
"Upgrading from v{}.{}.{} to v{}.{}.{}", "Upgrading from v{}.{}.{} to v{}.{}.{}",
from.0, from.1, from.2, current_version.0, current_version.1, current_version.2 local_from.0, local_from.1, local_from.2, target.0, target.1, target.2
); );
let mut wtxn = env.write_txn()?; let mut wtxn = env.write_txn()?;
upgrade.upgrade(env, &mut wtxn, from)?; upgrade.upgrade(env, &mut wtxn, local_from)?;
versioning.set_version(&mut wtxn, target)?; versioning.set_version(&mut wtxn, target)?;
wtxn.commit()?; wtxn.commit()?;
current_version = target; local_from = target;
} }
let mut wtxn = env.write_txn()?; let mut wtxn = env.write_txn()?;

View File

@ -170,5 +170,5 @@ german = ["meilisearch-types/german"]
turkish = ["meilisearch-types/turkish"] turkish = ["meilisearch-types/turkish"]
[package.metadata.mini-dashboard] [package.metadata.mini-dashboard]
assets-url = "https://github.com/meilisearch/mini-dashboard/releases/download/v0.2.16/build.zip" assets-url = "https://github.com/meilisearch/mini-dashboard/releases/download/v0.2.17/build.zip"
sha1 = "68f83438a114aabbe76bc9fe480071e741996662" sha1 = "29e92ce25f306208a9c86f013279c736bdc1e034"

View File

@ -494,6 +494,10 @@ pub async fn delete_index(
pub struct IndexStats { pub struct IndexStats {
/// Number of documents in the index /// Number of documents in the index
pub number_of_documents: u64, pub number_of_documents: u64,
/// Size of the documents database, in bytes.
pub raw_document_db_size: u64,
/// Average size of a document in the documents database.
pub avg_document_size: u64,
/// Whether or not the index is currently ingesting document /// Whether or not the index is currently ingesting document
pub is_indexing: bool, pub is_indexing: bool,
/// Number of embeddings in the index /// Number of embeddings in the index
@ -510,7 +514,9 @@ pub struct IndexStats {
impl From<index_scheduler::IndexStats> for IndexStats { impl From<index_scheduler::IndexStats> for IndexStats {
fn from(stats: index_scheduler::IndexStats) -> Self { fn from(stats: index_scheduler::IndexStats) -> Self {
IndexStats { IndexStats {
number_of_documents: stats.inner_stats.number_of_documents, number_of_documents: stats.inner_stats.documents_database_stats.number_of_entries(),
raw_document_db_size: stats.inner_stats.documents_database_stats.total_value_size(),
avg_document_size: stats.inner_stats.documents_database_stats.average_value_size(),
is_indexing: stats.is_indexing, is_indexing: stats.is_indexing,
number_of_embeddings: stats.inner_stats.number_of_embeddings, number_of_embeddings: stats.inner_stats.number_of_embeddings,
number_of_embedded_documents: stats.inner_stats.number_of_embedded_documents, number_of_embedded_documents: stats.inner_stats.number_of_embedded_documents,
@ -532,6 +538,8 @@ impl From<index_scheduler::IndexStats> for IndexStats {
(status = OK, description = "The stats of the index", body = IndexStats, content_type = "application/json", example = json!( (status = OK, description = "The stats of the index", body = IndexStats, content_type = "application/json", example = json!(
{ {
"numberOfDocuments": 10, "numberOfDocuments": 10,
"rawDocumentDbSize": 10,
"avgDocumentSize": 10,
"numberOfEmbeddings": 10, "numberOfEmbeddings": 10,
"numberOfEmbeddedDocuments": 10, "numberOfEmbeddedDocuments": 10,
"isIndexing": true, "isIndexing": true,

View File

@ -392,6 +392,9 @@ pub struct Stats {
"indexes": { "indexes": {
"movies": { "movies": {
"numberOfDocuments": 10, "numberOfDocuments": 10,
"rawDocumentDbSize": 100,
"maxDocumentSize": 16,
"avgDocumentSize": 10,
"isIndexing": true, "isIndexing": true,
"fieldDistribution": { "fieldDistribution": {
"genre": 10, "genre": 10,

View File

@ -160,6 +160,8 @@ async fn delete_document_by_filter() {
snapshot!(json_string!(stats), @r###" snapshot!(json_string!(stats), @r###"
{ {
"numberOfDocuments": 4, "numberOfDocuments": 4,
"rawDocumentDbSize": 42,
"avgDocumentSize": 10,
"isIndexing": false, "isIndexing": false,
"numberOfEmbeddings": 0, "numberOfEmbeddings": 0,
"numberOfEmbeddedDocuments": 0, "numberOfEmbeddedDocuments": 0,
@ -209,6 +211,8 @@ async fn delete_document_by_filter() {
snapshot!(json_string!(stats), @r###" snapshot!(json_string!(stats), @r###"
{ {
"numberOfDocuments": 2, "numberOfDocuments": 2,
"rawDocumentDbSize": 16,
"avgDocumentSize": 8,
"isIndexing": false, "isIndexing": false,
"numberOfEmbeddings": 0, "numberOfEmbeddings": 0,
"numberOfEmbeddedDocuments": 0, "numberOfEmbeddedDocuments": 0,
@ -277,6 +281,8 @@ async fn delete_document_by_filter() {
snapshot!(json_string!(stats), @r###" snapshot!(json_string!(stats), @r###"
{ {
"numberOfDocuments": 1, "numberOfDocuments": 1,
"rawDocumentDbSize": 12,
"avgDocumentSize": 12,
"isIndexing": false, "isIndexing": false,
"numberOfEmbeddings": 0, "numberOfEmbeddings": 0,
"numberOfEmbeddedDocuments": 0, "numberOfEmbeddedDocuments": 0,

View File

@ -32,6 +32,8 @@ async fn import_dump_v1_movie_raw() {
@r###" @r###"
{ {
"numberOfDocuments": 53, "numberOfDocuments": 53,
"rawDocumentDbSize": 21965,
"avgDocumentSize": 414,
"isIndexing": false, "isIndexing": false,
"numberOfEmbeddings": 0, "numberOfEmbeddings": 0,
"numberOfEmbeddedDocuments": 0, "numberOfEmbeddedDocuments": 0,
@ -187,6 +189,8 @@ async fn import_dump_v1_movie_with_settings() {
@r###" @r###"
{ {
"numberOfDocuments": 53, "numberOfDocuments": 53,
"rawDocumentDbSize": 21965,
"avgDocumentSize": 414,
"isIndexing": false, "isIndexing": false,
"numberOfEmbeddings": 0, "numberOfEmbeddings": 0,
"numberOfEmbeddedDocuments": 0, "numberOfEmbeddedDocuments": 0,
@ -355,6 +359,8 @@ async fn import_dump_v1_rubygems_with_settings() {
@r###" @r###"
{ {
"numberOfDocuments": 53, "numberOfDocuments": 53,
"rawDocumentDbSize": 8606,
"avgDocumentSize": 162,
"isIndexing": false, "isIndexing": false,
"numberOfEmbeddings": 0, "numberOfEmbeddings": 0,
"numberOfEmbeddedDocuments": 0, "numberOfEmbeddedDocuments": 0,
@ -520,6 +526,8 @@ async fn import_dump_v2_movie_raw() {
@r###" @r###"
{ {
"numberOfDocuments": 53, "numberOfDocuments": 53,
"rawDocumentDbSize": 21965,
"avgDocumentSize": 414,
"isIndexing": false, "isIndexing": false,
"numberOfEmbeddings": 0, "numberOfEmbeddings": 0,
"numberOfEmbeddedDocuments": 0, "numberOfEmbeddedDocuments": 0,
@ -675,6 +683,8 @@ async fn import_dump_v2_movie_with_settings() {
@r###" @r###"
{ {
"numberOfDocuments": 53, "numberOfDocuments": 53,
"rawDocumentDbSize": 21965,
"avgDocumentSize": 414,
"isIndexing": false, "isIndexing": false,
"numberOfEmbeddings": 0, "numberOfEmbeddings": 0,
"numberOfEmbeddedDocuments": 0, "numberOfEmbeddedDocuments": 0,
@ -840,6 +850,8 @@ async fn import_dump_v2_rubygems_with_settings() {
@r###" @r###"
{ {
"numberOfDocuments": 53, "numberOfDocuments": 53,
"rawDocumentDbSize": 8606,
"avgDocumentSize": 162,
"isIndexing": false, "isIndexing": false,
"numberOfEmbeddings": 0, "numberOfEmbeddings": 0,
"numberOfEmbeddedDocuments": 0, "numberOfEmbeddedDocuments": 0,
@ -1002,6 +1014,8 @@ async fn import_dump_v3_movie_raw() {
@r###" @r###"
{ {
"numberOfDocuments": 53, "numberOfDocuments": 53,
"rawDocumentDbSize": 21965,
"avgDocumentSize": 414,
"isIndexing": false, "isIndexing": false,
"numberOfEmbeddings": 0, "numberOfEmbeddings": 0,
"numberOfEmbeddedDocuments": 0, "numberOfEmbeddedDocuments": 0,
@ -1157,6 +1171,8 @@ async fn import_dump_v3_movie_with_settings() {
@r###" @r###"
{ {
"numberOfDocuments": 53, "numberOfDocuments": 53,
"rawDocumentDbSize": 21965,
"avgDocumentSize": 414,
"isIndexing": false, "isIndexing": false,
"numberOfEmbeddings": 0, "numberOfEmbeddings": 0,
"numberOfEmbeddedDocuments": 0, "numberOfEmbeddedDocuments": 0,
@ -1322,6 +1338,8 @@ async fn import_dump_v3_rubygems_with_settings() {
@r###" @r###"
{ {
"numberOfDocuments": 53, "numberOfDocuments": 53,
"rawDocumentDbSize": 8606,
"avgDocumentSize": 162,
"isIndexing": false, "isIndexing": false,
"numberOfEmbeddings": 0, "numberOfEmbeddings": 0,
"numberOfEmbeddedDocuments": 0, "numberOfEmbeddedDocuments": 0,
@ -1484,6 +1502,8 @@ async fn import_dump_v4_movie_raw() {
@r###" @r###"
{ {
"numberOfDocuments": 53, "numberOfDocuments": 53,
"rawDocumentDbSize": 21965,
"avgDocumentSize": 414,
"isIndexing": false, "isIndexing": false,
"numberOfEmbeddings": 0, "numberOfEmbeddings": 0,
"numberOfEmbeddedDocuments": 0, "numberOfEmbeddedDocuments": 0,
@ -1639,6 +1659,8 @@ async fn import_dump_v4_movie_with_settings() {
@r###" @r###"
{ {
"numberOfDocuments": 53, "numberOfDocuments": 53,
"rawDocumentDbSize": 21965,
"avgDocumentSize": 414,
"isIndexing": false, "isIndexing": false,
"numberOfEmbeddings": 0, "numberOfEmbeddings": 0,
"numberOfEmbeddedDocuments": 0, "numberOfEmbeddedDocuments": 0,
@ -1804,6 +1826,8 @@ async fn import_dump_v4_rubygems_with_settings() {
@r###" @r###"
{ {
"numberOfDocuments": 53, "numberOfDocuments": 53,
"rawDocumentDbSize": 8606,
"avgDocumentSize": 162,
"isIndexing": false, "isIndexing": false,
"numberOfEmbeddings": 0, "numberOfEmbeddings": 0,
"numberOfEmbeddedDocuments": 0, "numberOfEmbeddedDocuments": 0,
@ -1973,6 +1997,8 @@ async fn import_dump_v5() {
snapshot!(json_string!(stats), @r###" snapshot!(json_string!(stats), @r###"
{ {
"numberOfDocuments": 10, "numberOfDocuments": 10,
"rawDocumentDbSize": 6782,
"avgDocumentSize": 678,
"isIndexing": false, "isIndexing": false,
"numberOfEmbeddings": 0, "numberOfEmbeddings": 0,
"numberOfEmbeddedDocuments": 0, "numberOfEmbeddedDocuments": 0,
@ -2009,6 +2035,8 @@ async fn import_dump_v5() {
@r###" @r###"
{ {
"numberOfDocuments": 10, "numberOfDocuments": 10,
"rawDocumentDbSize": 6782,
"avgDocumentSize": 678,
"isIndexing": false, "isIndexing": false,
"numberOfEmbeddings": 0, "numberOfEmbeddings": 0,
"numberOfEmbeddedDocuments": 0, "numberOfEmbeddedDocuments": 0,
@ -2386,6 +2414,7 @@ async fn generate_and_import_dump_containing_vectors() {
"source": "huggingFace", "source": "huggingFace",
"model": "sentence-transformers/all-MiniLM-L6-v2", "model": "sentence-transformers/all-MiniLM-L6-v2",
"revision": "e4ce9877abf3edfe10b0d82785e83bdcb973e22e", "revision": "e4ce9877abf3edfe10b0d82785e83bdcb973e22e",
"pooling": "useModel",
"documentTemplate": "{{doc.doggo}}", "documentTemplate": "{{doc.doggo}}",
"documentTemplateMaxBytes": 400 "documentTemplateMaxBytes": 400
} }

View File

@ -128,6 +128,40 @@ async fn search_with_stop_word() {
.await; .await;
} }
#[actix_rt::test]
async fn search_with_typo_settings() {
// related to https://github.com/meilisearch/meilisearch/issues/5240
let server = Server::new().await;
let index = server.index("test");
let (_, code) = index
.update_settings(json!({"typoTolerance": { "disableOnAttributes": ["title", "id"]}}))
.await;
meili_snap::snapshot!(code, @"202 Accepted");
let documents = DOCUMENTS.clone();
let (task, _status_code) = index.add_documents(documents, None).await;
index.wait_task(task.uid()).await.succeeded();
index
.search(json!({"q": "287947" }), |response, code| {
assert_eq!(code, 200, "{}", response);
snapshot!(json_string!(response["hits"]), @r###"
[
{
"title": "Shazam!",
"id": "287947",
"color": [
"green",
"blue"
]
}
]
"###);
})
.await;
}
#[actix_rt::test] #[actix_rt::test]
async fn phrase_search_with_stop_word() { async fn phrase_search_with_stop_word() {
// related to https://github.com/meilisearch/meilisearch/issues/3521 // related to https://github.com/meilisearch/meilisearch/issues/3521

View File

@ -113,6 +113,8 @@ async fn add_remove_embeddings() {
snapshot!(json_string!(stats), @r###" snapshot!(json_string!(stats), @r###"
{ {
"numberOfDocuments": 2, "numberOfDocuments": 2,
"rawDocumentDbSize": 27,
"avgDocumentSize": 13,
"isIndexing": false, "isIndexing": false,
"numberOfEmbeddings": 5, "numberOfEmbeddings": 5,
"numberOfEmbeddedDocuments": 2, "numberOfEmbeddedDocuments": 2,
@ -136,6 +138,8 @@ async fn add_remove_embeddings() {
snapshot!(json_string!(stats), @r###" snapshot!(json_string!(stats), @r###"
{ {
"numberOfDocuments": 2, "numberOfDocuments": 2,
"rawDocumentDbSize": 27,
"avgDocumentSize": 13,
"isIndexing": false, "isIndexing": false,
"numberOfEmbeddings": 3, "numberOfEmbeddings": 3,
"numberOfEmbeddedDocuments": 2, "numberOfEmbeddedDocuments": 2,
@ -159,6 +163,8 @@ async fn add_remove_embeddings() {
snapshot!(json_string!(stats), @r###" snapshot!(json_string!(stats), @r###"
{ {
"numberOfDocuments": 2, "numberOfDocuments": 2,
"rawDocumentDbSize": 27,
"avgDocumentSize": 13,
"isIndexing": false, "isIndexing": false,
"numberOfEmbeddings": 2, "numberOfEmbeddings": 2,
"numberOfEmbeddedDocuments": 2, "numberOfEmbeddedDocuments": 2,
@ -183,6 +189,8 @@ async fn add_remove_embeddings() {
snapshot!(json_string!(stats), @r###" snapshot!(json_string!(stats), @r###"
{ {
"numberOfDocuments": 2, "numberOfDocuments": 2,
"rawDocumentDbSize": 27,
"avgDocumentSize": 13,
"isIndexing": false, "isIndexing": false,
"numberOfEmbeddings": 2, "numberOfEmbeddings": 2,
"numberOfEmbeddedDocuments": 1, "numberOfEmbeddedDocuments": 1,
@ -231,6 +239,8 @@ async fn add_remove_embedded_documents() {
snapshot!(json_string!(stats), @r###" snapshot!(json_string!(stats), @r###"
{ {
"numberOfDocuments": 2, "numberOfDocuments": 2,
"rawDocumentDbSize": 27,
"avgDocumentSize": 13,
"isIndexing": false, "isIndexing": false,
"numberOfEmbeddings": 5, "numberOfEmbeddings": 5,
"numberOfEmbeddedDocuments": 2, "numberOfEmbeddedDocuments": 2,
@ -250,6 +260,8 @@ async fn add_remove_embedded_documents() {
snapshot!(json_string!(stats), @r###" snapshot!(json_string!(stats), @r###"
{ {
"numberOfDocuments": 1, "numberOfDocuments": 1,
"rawDocumentDbSize": 13,
"avgDocumentSize": 13,
"isIndexing": false, "isIndexing": false,
"numberOfEmbeddings": 3, "numberOfEmbeddings": 3,
"numberOfEmbeddedDocuments": 1, "numberOfEmbeddedDocuments": 1,
@ -281,6 +293,8 @@ async fn update_embedder_settings() {
snapshot!(json_string!(stats), @r###" snapshot!(json_string!(stats), @r###"
{ {
"numberOfDocuments": 2, "numberOfDocuments": 2,
"rawDocumentDbSize": 108,
"avgDocumentSize": 54,
"isIndexing": false, "isIndexing": false,
"numberOfEmbeddings": 0, "numberOfEmbeddings": 0,
"numberOfEmbeddedDocuments": 0, "numberOfEmbeddedDocuments": 0,
@ -315,6 +329,8 @@ async fn update_embedder_settings() {
snapshot!(json_string!(stats), @r###" snapshot!(json_string!(stats), @r###"
{ {
"numberOfDocuments": 2, "numberOfDocuments": 2,
"rawDocumentDbSize": 108,
"avgDocumentSize": 54,
"isIndexing": false, "isIndexing": false,
"numberOfEmbeddings": 3, "numberOfEmbeddings": 3,
"numberOfEmbeddedDocuments": 2, "numberOfEmbeddedDocuments": 2,

View File

@ -43,7 +43,7 @@ async fn version_too_old() {
std::fs::write(db_path.join("VERSION"), "1.11.9999").unwrap(); std::fs::write(db_path.join("VERSION"), "1.11.9999").unwrap();
let options = Opt { experimental_dumpless_upgrade: true, ..default_settings }; let options = Opt { experimental_dumpless_upgrade: true, ..default_settings };
let err = Server::new_with_options(options).await.map(|_| ()).unwrap_err(); let err = Server::new_with_options(options).await.map(|_| ()).unwrap_err();
snapshot!(err, @"Database version 1.11.9999 is too old for the experimental dumpless upgrade feature. Please generate a dump using the v1.11.9999 and import it in the v1.13.0"); snapshot!(err, @"Database version 1.11.9999 is too old for the experimental dumpless upgrade feature. Please generate a dump using the v1.11.9999 and import it in the v1.13.1");
} }
#[actix_rt::test] #[actix_rt::test]
@ -58,7 +58,7 @@ async fn version_requires_downgrade() {
std::fs::write(db_path.join("VERSION"), format!("{major}.{minor}.{patch}")).unwrap(); std::fs::write(db_path.join("VERSION"), format!("{major}.{minor}.{patch}")).unwrap();
let options = Opt { experimental_dumpless_upgrade: true, ..default_settings }; let options = Opt { experimental_dumpless_upgrade: true, ..default_settings };
let err = Server::new_with_options(options).await.map(|_| ()).unwrap_err(); let err = Server::new_with_options(options).await.map(|_| ()).unwrap_err();
snapshot!(err, @"Database version 1.13.1 is higher than the Meilisearch version 1.13.0. Downgrade is not supported"); snapshot!(err, @"Database version 1.13.2 is higher than the Meilisearch version 1.13.1. Downgrade is not supported");
} }
#[actix_rt::test] #[actix_rt::test]

View File

@ -8,7 +8,7 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs
"progress": null, "progress": null,
"details": { "details": {
"upgradeFrom": "v1.12.0", "upgradeFrom": "v1.12.0",
"upgradeTo": "v1.13.0" "upgradeTo": "v1.13.1"
}, },
"stats": { "stats": {
"totalNbTasks": 1, "totalNbTasks": 1,

View File

@ -8,7 +8,7 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs
"progress": null, "progress": null,
"details": { "details": {
"upgradeFrom": "v1.12.0", "upgradeFrom": "v1.12.0",
"upgradeTo": "v1.13.0" "upgradeTo": "v1.13.1"
}, },
"stats": { "stats": {
"totalNbTasks": 1, "totalNbTasks": 1,

View File

@ -8,7 +8,7 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs
"progress": null, "progress": null,
"details": { "details": {
"upgradeFrom": "v1.12.0", "upgradeFrom": "v1.12.0",
"upgradeTo": "v1.13.0" "upgradeTo": "v1.13.1"
}, },
"stats": { "stats": {
"totalNbTasks": 1, "totalNbTasks": 1,

View File

@ -12,7 +12,7 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs
"canceledBy": null, "canceledBy": null,
"details": { "details": {
"upgradeFrom": "v1.12.0", "upgradeFrom": "v1.12.0",
"upgradeTo": "v1.13.0" "upgradeTo": "v1.13.1"
}, },
"error": null, "error": null,
"duration": "[duration]", "duration": "[duration]",

View File

@ -12,7 +12,7 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs
"canceledBy": null, "canceledBy": null,
"details": { "details": {
"upgradeFrom": "v1.12.0", "upgradeFrom": "v1.12.0",
"upgradeTo": "v1.13.0" "upgradeTo": "v1.13.1"
}, },
"error": null, "error": null,
"duration": "[duration]", "duration": "[duration]",

View File

@ -12,7 +12,7 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs
"canceledBy": null, "canceledBy": null,
"details": { "details": {
"upgradeFrom": "v1.12.0", "upgradeFrom": "v1.12.0",
"upgradeTo": "v1.13.0" "upgradeTo": "v1.13.1"
}, },
"error": null, "error": null,
"duration": "[duration]", "duration": "[duration]",

View File

@ -8,7 +8,7 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs
"progress": null, "progress": null,
"details": { "details": {
"upgradeFrom": "v1.12.0", "upgradeFrom": "v1.12.0",
"upgradeTo": "v1.13.0" "upgradeTo": "v1.13.1"
}, },
"stats": { "stats": {
"totalNbTasks": 1, "totalNbTasks": 1,

View File

@ -1,6 +1,5 @@
--- ---
source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs
snapshot_kind: text
--- ---
{ {
"results": [ "results": [
@ -13,7 +12,7 @@ snapshot_kind: text
"canceledBy": null, "canceledBy": null,
"details": { "details": {
"upgradeFrom": "v1.12.0", "upgradeFrom": "v1.12.0",
"upgradeTo": "v1.13.0" "upgradeTo": "v1.13.1"
}, },
"error": null, "error": null,
"duration": "[duration]", "duration": "[duration]",

View File

@ -139,6 +139,8 @@ async fn check_the_index_scheduler(server: &Server) {
"indexes": { "indexes": {
"kefir": { "kefir": {
"numberOfDocuments": 1, "numberOfDocuments": 1,
"rawDocumentDbSize": 109,
"avgDocumentSize": 109,
"isIndexing": false, "isIndexing": false,
"numberOfEmbeddings": 0, "numberOfEmbeddings": 0,
"numberOfEmbeddedDocuments": 0, "numberOfEmbeddedDocuments": 0,
@ -225,6 +227,8 @@ async fn check_the_index_scheduler(server: &Server) {
"indexes": { "indexes": {
"kefir": { "kefir": {
"numberOfDocuments": 1, "numberOfDocuments": 1,
"rawDocumentDbSize": 109,
"avgDocumentSize": 109,
"isIndexing": false, "isIndexing": false,
"numberOfEmbeddings": 0, "numberOfEmbeddings": 0,
"numberOfEmbeddedDocuments": 0, "numberOfEmbeddedDocuments": 0,
@ -244,6 +248,8 @@ async fn check_the_index_scheduler(server: &Server) {
snapshot!(stats, @r###" snapshot!(stats, @r###"
{ {
"numberOfDocuments": 1, "numberOfDocuments": 1,
"rawDocumentDbSize": 109,
"avgDocumentSize": 109,
"isIndexing": false, "isIndexing": false,
"numberOfEmbeddings": 0, "numberOfEmbeddings": 0,
"numberOfEmbeddedDocuments": 0, "numberOfEmbeddedDocuments": 0,

View File

@ -0,0 +1,96 @@
use heed::types::Bytes;
use heed::Database;
use heed::RoTxn;
use serde::{Deserialize, Serialize};
#[derive(Debug, Clone, Copy, Serialize, Deserialize, PartialEq, Eq, Default)]
#[serde(rename_all = "camelCase")]
/// The stats of a database.
pub struct DatabaseStats {
/// The number of entries in the database.
number_of_entries: u64,
/// The total size of the keys in the database.
total_key_size: u64,
/// The total size of the values in the database.
total_value_size: u64,
}
impl DatabaseStats {
/// Returns the stats of the database.
///
/// This function iterates over the whole database and computes the stats.
/// It is not efficient and should be cached somewhere.
pub(crate) fn new(database: Database<Bytes, Bytes>, rtxn: &RoTxn<'_>) -> heed::Result<Self> {
let mut database_stats =
Self { number_of_entries: 0, total_key_size: 0, total_value_size: 0 };
let mut iter = database.iter(rtxn)?;
while let Some((key, value)) = iter.next().transpose()? {
let key_size = key.len() as u64;
let value_size = value.len() as u64;
database_stats.total_key_size += key_size;
database_stats.total_value_size += value_size;
}
database_stats.number_of_entries = database.len(rtxn)?;
Ok(database_stats)
}
/// Recomputes the stats of the database and returns the new stats.
///
/// This function is used to update the stats of the database when some keys are modified.
/// It is more efficient than the `new` function because it does not iterate over the whole database but only the modified keys comparing the before and after states.
pub(crate) fn recompute<I, K>(
mut stats: Self,
database: Database<Bytes, Bytes>,
before_rtxn: &RoTxn<'_>,
after_rtxn: &RoTxn<'_>,
modified_keys: I,
) -> heed::Result<Self>
where
I: IntoIterator<Item = K>,
K: AsRef<[u8]>,
{
for key in modified_keys {
let key = key.as_ref();
if let Some(value) = database.get(after_rtxn, key)? {
let key_size = key.len() as u64;
let value_size = value.len() as u64;
stats.total_key_size = stats.total_key_size.saturating_add(key_size);
stats.total_value_size = stats.total_value_size.saturating_add(value_size);
}
if let Some(value) = database.get(before_rtxn, key)? {
let key_size = key.len() as u64;
let value_size = value.len() as u64;
stats.total_key_size = stats.total_key_size.saturating_sub(key_size);
stats.total_value_size = stats.total_value_size.saturating_sub(value_size);
}
}
stats.number_of_entries = database.len(after_rtxn)?;
Ok(stats)
}
pub fn average_key_size(&self) -> u64 {
self.total_key_size.checked_div(self.number_of_entries).unwrap_or(0)
}
pub fn average_value_size(&self) -> u64 {
self.total_value_size.checked_div(self.number_of_entries).unwrap_or(0)
}
pub fn number_of_entries(&self) -> u64 {
self.number_of_entries
}
pub fn total_key_size(&self) -> u64 {
self.total_key_size
}
pub fn total_value_size(&self) -> u64 {
self.total_value_size
}
}

View File

@ -11,6 +11,7 @@ use rstar::RTree;
use serde::{Deserialize, Serialize}; use serde::{Deserialize, Serialize};
use crate::constants::{self, RESERVED_VECTORS_FIELD_NAME}; use crate::constants::{self, RESERVED_VECTORS_FIELD_NAME};
use crate::database_stats::DatabaseStats;
use crate::documents::PrimaryKey; use crate::documents::PrimaryKey;
use crate::error::{InternalError, UserError}; use crate::error::{InternalError, UserError};
use crate::fields_ids_map::FieldsIdsMap; use crate::fields_ids_map::FieldsIdsMap;
@ -74,6 +75,7 @@ pub mod main_key {
pub const LOCALIZED_ATTRIBUTES_RULES: &str = "localized_attributes_rules"; pub const LOCALIZED_ATTRIBUTES_RULES: &str = "localized_attributes_rules";
pub const FACET_SEARCH: &str = "facet_search"; pub const FACET_SEARCH: &str = "facet_search";
pub const PREFIX_SEARCH: &str = "prefix_search"; pub const PREFIX_SEARCH: &str = "prefix_search";
pub const DOCUMENTS_STATS: &str = "documents_stats";
} }
pub mod db_name { pub mod db_name {
@ -403,6 +405,58 @@ impl Index {
Ok(count.unwrap_or_default()) Ok(count.unwrap_or_default())
} }
/// Updates the stats of the documents database based on the previous stats and the modified docids.
pub fn update_documents_stats(
&self,
wtxn: &mut RwTxn<'_>,
modified_docids: roaring::RoaringBitmap,
) -> Result<()> {
let before_rtxn = self.read_txn()?;
let document_stats = match self.documents_stats(&before_rtxn)? {
Some(before_stats) => DatabaseStats::recompute(
before_stats,
self.documents.remap_types(),
&before_rtxn,
wtxn,
modified_docids.iter().map(|docid| docid.to_be_bytes()),
)?,
None => {
// This should never happen when there are already documents in the index, the documents stats should be present.
// If it happens, it means that the index was not properly initialized/upgraded.
debug_assert_eq!(
self.documents.len(&before_rtxn)?,
0,
"The documents stats should be present when there are documents in the index"
);
tracing::warn!("No documents stats found, creating new ones");
DatabaseStats::new(self.documents.remap_types(), &*wtxn)?
}
};
self.put_documents_stats(wtxn, document_stats)?;
Ok(())
}
/// Writes the stats of the documents database.
pub fn put_documents_stats(
&self,
wtxn: &mut RwTxn<'_>,
stats: DatabaseStats,
) -> heed::Result<()> {
self.main.remap_types::<Str, SerdeJson<DatabaseStats>>().put(
wtxn,
main_key::DOCUMENTS_STATS,
&stats,
)
}
/// Returns the stats of the documents database.
pub fn documents_stats(&self, rtxn: &RoTxn<'_>) -> heed::Result<Option<DatabaseStats>> {
self.main
.remap_types::<Str, SerdeJson<DatabaseStats>>()
.get(rtxn, main_key::DOCUMENTS_STATS)
}
/* primary key */ /* primary key */
/// Writes the documents primary key, this is the field name that is used to store the id. /// Writes the documents primary key, this is the field name that is used to store the id.

View File

@ -10,6 +10,7 @@ pub mod documents;
mod asc_desc; mod asc_desc;
mod criterion; mod criterion;
pub mod database_stats;
mod error; mod error;
mod external_documents_ids; mod external_documents_ids;
pub mod facet; pub mod facet;

View File

@ -215,7 +215,7 @@ pub fn partially_initialized_term_from_word(
let mut zero_typo = None; let mut zero_typo = None;
let mut prefix_of = BTreeSet::new(); let mut prefix_of = BTreeSet::new();
if fst.contains(word) { if fst.contains(word) || ctx.index.exact_word_docids.get(ctx.txn, word)?.is_some() {
zero_typo = Some(word_interned); zero_typo = Some(word_interned);
} }

View File

@ -307,6 +307,7 @@ where
let current_span = tracing::Span::current(); let current_span = tracing::Span::current();
// Run extraction pipeline in parallel. // Run extraction pipeline in parallel.
let mut modified_docids = RoaringBitmap::new();
pool.install(|| { pool.install(|| {
let settings_diff_cloned = settings_diff.clone(); let settings_diff_cloned = settings_diff.clone();
rayon::spawn(move || { rayon::spawn(move || {
@ -367,7 +368,7 @@ where
Err(status) => { Err(status) => {
if let Some(typed_chunks) = chunk_accumulator.pop_longest() { if let Some(typed_chunks) = chunk_accumulator.pop_longest() {
let (docids, is_merged_database) = let (docids, is_merged_database) =
write_typed_chunk_into_index(self.wtxn, self.index, &settings_diff, typed_chunks)?; write_typed_chunk_into_index(self.wtxn, self.index, &settings_diff, typed_chunks, &mut modified_docids)?;
if !docids.is_empty() { if !docids.is_empty() {
final_documents_ids |= docids; final_documents_ids |= docids;
let documents_seen_count = final_documents_ids.len(); let documents_seen_count = final_documents_ids.len();
@ -467,6 +468,10 @@ where
Ok(()) Ok(())
}).map_err(InternalError::from)??; }).map_err(InternalError::from)??;
if !settings_diff.settings_update_only {
// Update the stats of the documents database when there is a document update.
self.index.update_documents_stats(self.wtxn, modified_docids)?;
}
// We write the field distribution into the main database // We write the field distribution into the main database
self.index.put_field_distribution(self.wtxn, &field_distribution)?; self.index.put_field_distribution(self.wtxn, &field_distribution)?;
@ -2763,6 +2768,7 @@ mod tests {
source: Setting::Set(crate::vector::settings::EmbedderSource::UserProvided), source: Setting::Set(crate::vector::settings::EmbedderSource::UserProvided),
model: Setting::NotSet, model: Setting::NotSet,
revision: Setting::NotSet, revision: Setting::NotSet,
pooling: Setting::NotSet,
api_key: Setting::NotSet, api_key: Setting::NotSet,
dimensions: Setting::Set(3), dimensions: Setting::Set(3),
document_template: Setting::NotSet, document_template: Setting::NotSet,

View File

@ -129,6 +129,7 @@ pub(crate) fn write_typed_chunk_into_index(
index: &Index, index: &Index,
settings_diff: &InnerIndexSettingsDiff, settings_diff: &InnerIndexSettingsDiff,
typed_chunks: Vec<TypedChunk>, typed_chunks: Vec<TypedChunk>,
modified_docids: &mut RoaringBitmap,
) -> Result<(RoaringBitmap, bool)> { ) -> Result<(RoaringBitmap, bool)> {
let mut is_merged_database = false; let mut is_merged_database = false;
match typed_chunks[0] { match typed_chunks[0] {
@ -214,6 +215,7 @@ pub(crate) fn write_typed_chunk_into_index(
kind: DocumentOperationKind::Create, kind: DocumentOperationKind::Create,
}); });
docids.insert(docid); docids.insert(docid);
modified_docids.insert(docid);
} else { } else {
db.delete(wtxn, &docid)?; db.delete(wtxn, &docid)?;
operations.push(DocumentOperation { operations.push(DocumentOperation {
@ -222,6 +224,7 @@ pub(crate) fn write_typed_chunk_into_index(
kind: DocumentOperationKind::Delete, kind: DocumentOperationKind::Delete,
}); });
docids.remove(docid); docids.remove(docid);
modified_docids.insert(docid);
} }
} }
let external_documents_docids = index.external_documents_ids(); let external_documents_docids = index.external_documents_ids();

View File

@ -711,15 +711,17 @@ impl DelAddRoaringBitmap {
DelAddRoaringBitmap { del, add } DelAddRoaringBitmap { del, add }
} }
pub fn apply_to(&self, documents_ids: &mut RoaringBitmap) { pub fn apply_to(&self, documents_ids: &mut RoaringBitmap, modified_docids: &mut RoaringBitmap) {
let DelAddRoaringBitmap { del, add } = self; let DelAddRoaringBitmap { del, add } = self;
if let Some(del) = del { if let Some(del) = del {
*documents_ids -= del; *documents_ids -= del;
*modified_docids |= del;
} }
if let Some(add) = add { if let Some(add) = add {
*documents_ids |= add; *documents_ids |= add;
*modified_docids |= add;
} }
} }
} }

View File

@ -32,6 +32,7 @@ pub(super) fn extract_all<'pl, 'extractor, DC, MSP>(
field_distribution: &mut BTreeMap<String, u64>, field_distribution: &mut BTreeMap<String, u64>,
mut index_embeddings: Vec<IndexEmbeddingConfig>, mut index_embeddings: Vec<IndexEmbeddingConfig>,
document_ids: &mut RoaringBitmap, document_ids: &mut RoaringBitmap,
modified_docids: &mut RoaringBitmap,
) -> Result<(FacetFieldIdsDelta, Vec<IndexEmbeddingConfig>)> ) -> Result<(FacetFieldIdsDelta, Vec<IndexEmbeddingConfig>)>
where where
DC: DocumentChanges<'pl>, DC: DocumentChanges<'pl>,
@ -70,7 +71,7 @@ where
// adding the delta should never cause a negative result, as we are removing fields that previously existed. // adding the delta should never cause a negative result, as we are removing fields that previously existed.
*current = current.saturating_add_signed(delta); *current = current.saturating_add_signed(delta);
} }
document_extractor_data.docids_delta.apply_to(document_ids); document_extractor_data.docids_delta.apply_to(document_ids, modified_docids);
} }
field_distribution.retain(|_, v| *v != 0); field_distribution.retain(|_, v| *v != 0);
@ -256,7 +257,7 @@ where
let Some(deladd) = data.remove(&config.name) else { let Some(deladd) = data.remove(&config.name) else {
continue 'data; continue 'data;
}; };
deladd.apply_to(&mut config.user_provided); deladd.apply_to(&mut config.user_provided, modified_docids);
} }
} }
} }

View File

@ -130,6 +130,7 @@ where
let index_embeddings = index.embedding_configs(wtxn)?; let index_embeddings = index.embedding_configs(wtxn)?;
let mut field_distribution = index.field_distribution(wtxn)?; let mut field_distribution = index.field_distribution(wtxn)?;
let mut document_ids = index.documents_ids(wtxn)?; let mut document_ids = index.documents_ids(wtxn)?;
let mut modified_docids = roaring::RoaringBitmap::new();
let congestion = thread::scope(|s| -> Result<ChannelCongestion> { let congestion = thread::scope(|s| -> Result<ChannelCongestion> {
let indexer_span = tracing::Span::current(); let indexer_span = tracing::Span::current();
@ -138,6 +139,7 @@ where
// prevent moving the field_distribution and document_ids in the inner closure... // prevent moving the field_distribution and document_ids in the inner closure...
let field_distribution = &mut field_distribution; let field_distribution = &mut field_distribution;
let document_ids = &mut document_ids; let document_ids = &mut document_ids;
let modified_docids = &mut modified_docids;
let extractor_handle = let extractor_handle =
Builder::new().name(S("indexer-extractors")).spawn_scoped(s, move || { Builder::new().name(S("indexer-extractors")).spawn_scoped(s, move || {
pool.install(move || { pool.install(move || {
@ -152,6 +154,7 @@ where
field_distribution, field_distribution,
index_embeddings, index_embeddings,
document_ids, document_ids,
modified_docids,
) )
}) })
.unwrap() .unwrap()
@ -227,6 +230,7 @@ where
embedders, embedders,
field_distribution, field_distribution,
document_ids, document_ids,
modified_docids,
)?; )?;
Ok(congestion) Ok(congestion)

View File

@ -121,7 +121,8 @@ where
Ok(()) Ok(())
} }
pub fn update_index( #[allow(clippy::too_many_arguments)]
pub(super) fn update_index(
index: &Index, index: &Index,
wtxn: &mut RwTxn<'_>, wtxn: &mut RwTxn<'_>,
new_fields_ids_map: FieldIdMapWithMetadata, new_fields_ids_map: FieldIdMapWithMetadata,
@ -129,6 +130,7 @@ pub fn update_index(
embedders: EmbeddingConfigs, embedders: EmbeddingConfigs,
field_distribution: std::collections::BTreeMap<String, u64>, field_distribution: std::collections::BTreeMap<String, u64>,
document_ids: roaring::RoaringBitmap, document_ids: roaring::RoaringBitmap,
modified_docids: roaring::RoaringBitmap,
) -> Result<()> { ) -> Result<()> {
index.put_fields_ids_map(wtxn, new_fields_ids_map.as_fields_ids_map())?; index.put_fields_ids_map(wtxn, new_fields_ids_map.as_fields_ids_map())?;
if let Some(new_primary_key) = new_primary_key { if let Some(new_primary_key) = new_primary_key {
@ -140,6 +142,7 @@ pub fn update_index(
index.put_field_distribution(wtxn, &field_distribution)?; index.put_field_distribution(wtxn, &field_distribution)?;
index.put_documents_ids(wtxn, &document_ids)?; index.put_documents_ids(wtxn, &document_ids)?;
index.set_updated_at(wtxn, &OffsetDateTime::now_utc())?; index.set_updated_at(wtxn, &OffsetDateTime::now_utc())?;
index.update_documents_stats(wtxn, modified_docids)?;
Ok(()) Ok(())
} }

View File

@ -1676,6 +1676,7 @@ fn validate_prompt(
source, source,
model, model,
revision, revision,
pooling,
api_key, api_key,
dimensions, dimensions,
document_template: Setting::Set(template), document_template: Setting::Set(template),
@ -1709,6 +1710,7 @@ fn validate_prompt(
source, source,
model, model,
revision, revision,
pooling,
api_key, api_key,
dimensions, dimensions,
document_template: Setting::Set(template), document_template: Setting::Set(template),
@ -1735,6 +1737,7 @@ pub fn validate_embedding_settings(
source, source,
model, model,
revision, revision,
pooling,
api_key, api_key,
dimensions, dimensions,
document_template, document_template,
@ -1776,6 +1779,7 @@ pub fn validate_embedding_settings(
source, source,
model, model,
revision, revision,
pooling,
api_key, api_key,
dimensions, dimensions,
document_template, document_template,
@ -1791,6 +1795,7 @@ pub fn validate_embedding_settings(
match inferred_source { match inferred_source {
EmbedderSource::OpenAi => { EmbedderSource::OpenAi => {
check_unset(&revision, EmbeddingSettings::REVISION, inferred_source, name)?; check_unset(&revision, EmbeddingSettings::REVISION, inferred_source, name)?;
check_unset(&pooling, EmbeddingSettings::POOLING, inferred_source, name)?;
check_unset(&request, EmbeddingSettings::REQUEST, inferred_source, name)?; check_unset(&request, EmbeddingSettings::REQUEST, inferred_source, name)?;
check_unset(&response, EmbeddingSettings::RESPONSE, inferred_source, name)?; check_unset(&response, EmbeddingSettings::RESPONSE, inferred_source, name)?;
@ -1829,6 +1834,7 @@ pub fn validate_embedding_settings(
EmbedderSource::Ollama => { EmbedderSource::Ollama => {
check_set(&model, EmbeddingSettings::MODEL, inferred_source, name)?; check_set(&model, EmbeddingSettings::MODEL, inferred_source, name)?;
check_unset(&revision, EmbeddingSettings::REVISION, inferred_source, name)?; check_unset(&revision, EmbeddingSettings::REVISION, inferred_source, name)?;
check_unset(&pooling, EmbeddingSettings::POOLING, inferred_source, name)?;
check_unset(&request, EmbeddingSettings::REQUEST, inferred_source, name)?; check_unset(&request, EmbeddingSettings::REQUEST, inferred_source, name)?;
check_unset(&response, EmbeddingSettings::RESPONSE, inferred_source, name)?; check_unset(&response, EmbeddingSettings::RESPONSE, inferred_source, name)?;
@ -1846,6 +1852,7 @@ pub fn validate_embedding_settings(
EmbedderSource::UserProvided => { EmbedderSource::UserProvided => {
check_unset(&model, EmbeddingSettings::MODEL, inferred_source, name)?; check_unset(&model, EmbeddingSettings::MODEL, inferred_source, name)?;
check_unset(&revision, EmbeddingSettings::REVISION, inferred_source, name)?; check_unset(&revision, EmbeddingSettings::REVISION, inferred_source, name)?;
check_unset(&pooling, EmbeddingSettings::POOLING, inferred_source, name)?;
check_unset(&api_key, EmbeddingSettings::API_KEY, inferred_source, name)?; check_unset(&api_key, EmbeddingSettings::API_KEY, inferred_source, name)?;
check_unset( check_unset(
&document_template, &document_template,
@ -1869,6 +1876,7 @@ pub fn validate_embedding_settings(
EmbedderSource::Rest => { EmbedderSource::Rest => {
check_unset(&model, EmbeddingSettings::MODEL, inferred_source, name)?; check_unset(&model, EmbeddingSettings::MODEL, inferred_source, name)?;
check_unset(&revision, EmbeddingSettings::REVISION, inferred_source, name)?; check_unset(&revision, EmbeddingSettings::REVISION, inferred_source, name)?;
check_unset(&pooling, EmbeddingSettings::POOLING, inferred_source, name)?;
check_set(&url, EmbeddingSettings::URL, inferred_source, name)?; check_set(&url, EmbeddingSettings::URL, inferred_source, name)?;
check_set(&request, EmbeddingSettings::REQUEST, inferred_source, name)?; check_set(&request, EmbeddingSettings::REQUEST, inferred_source, name)?;
check_set(&response, EmbeddingSettings::RESPONSE, inferred_source, name)?; check_set(&response, EmbeddingSettings::RESPONSE, inferred_source, name)?;
@ -1878,6 +1886,7 @@ pub fn validate_embedding_settings(
source, source,
model, model,
revision, revision,
pooling,
api_key, api_key,
dimensions, dimensions,
document_template, document_template,

View File

@ -3,7 +3,7 @@ mod v1_13;
use heed::RwTxn; use heed::RwTxn;
use v1_12::{V1_12_3_To_V1_13_0, V1_12_To_V1_12_3}; use v1_12::{V1_12_3_To_V1_13_0, V1_12_To_V1_12_3};
use v1_13::V1_13_0_To_Current; use v1_13::{V1_13_0_To_V1_13_1, V1_13_1_To_Current};
use crate::progress::{Progress, VariableNameStep}; use crate::progress::{Progress, VariableNameStep};
use crate::{Index, InternalError, Result}; use crate::{Index, InternalError, Result};
@ -28,13 +28,18 @@ pub fn upgrade(
progress: Progress, progress: Progress,
) -> Result<bool> { ) -> Result<bool> {
let from = index.get_version(wtxn)?.unwrap_or(db_version); let from = index.get_version(wtxn)?.unwrap_or(db_version);
let upgrade_functions: &[&dyn UpgradeIndex] = let upgrade_functions: &[&dyn UpgradeIndex] = &[
&[&V1_12_To_V1_12_3 {}, &V1_12_3_To_V1_13_0 {}, &V1_13_0_To_Current()]; &V1_12_To_V1_12_3 {},
&V1_12_3_To_V1_13_0 {},
&V1_13_0_To_V1_13_1 {},
&V1_13_1_To_Current {},
];
let start = match from { let start = match from {
(1, 12, 0..=2) => 0, (1, 12, 0..=2) => 0,
(1, 12, 3..) => 1, (1, 12, 3..) => 1,
(1, 13, 0) => 2, (1, 13, 0) => 2,
(1, 13, 1) => 3,
// We must handle the current version in the match because in case of a failure some index may have been upgraded but not other. // We must handle the current version in the match because in case of a failure some index may have been upgraded but not other.
(1, 13, _) => return Ok(false), (1, 13, _) => return Ok(false),
(major, minor, patch) => { (major, minor, patch) => {

View File

@ -2,13 +2,44 @@ use heed::RwTxn;
use super::UpgradeIndex; use super::UpgradeIndex;
use crate::constants::{VERSION_MAJOR, VERSION_MINOR, VERSION_PATCH}; use crate::constants::{VERSION_MAJOR, VERSION_MINOR, VERSION_PATCH};
use crate::database_stats::DatabaseStats;
use crate::progress::Progress; use crate::progress::Progress;
use crate::{Index, Result}; use crate::{make_enum_progress, Index, Result};
#[allow(non_camel_case_types)] #[allow(non_camel_case_types)]
pub(super) struct V1_13_0_To_Current(); pub(super) struct V1_13_0_To_V1_13_1();
impl UpgradeIndex for V1_13_0_To_Current { impl UpgradeIndex for V1_13_0_To_V1_13_1 {
fn upgrade(
&self,
wtxn: &mut RwTxn,
index: &Index,
_original: (u32, u32, u32),
progress: Progress,
) -> Result<bool> {
make_enum_progress! {
enum DocumentsStats {
CreatingDocumentsStats,
}
};
// Create the new documents stats.
progress.update_progress(DocumentsStats::CreatingDocumentsStats);
let stats = DatabaseStats::new(index.documents.remap_types(), wtxn)?;
index.put_documents_stats(wtxn, stats)?;
Ok(true)
}
fn target_version(&self) -> (u32, u32, u32) {
(1, 13, 1)
}
}
#[allow(non_camel_case_types)]
pub(super) struct V1_13_1_To_Current();
impl UpgradeIndex for V1_13_1_To_Current {
fn upgrade( fn upgrade(
&self, &self,
_wtxn: &mut RwTxn, _wtxn: &mut RwTxn,

View File

@ -262,6 +262,31 @@ impl NewEmbedderError {
} }
} }
pub fn open_pooling_config(
pooling_config_filename: PathBuf,
inner: std::io::Error,
) -> NewEmbedderError {
let open_config = OpenPoolingConfig { filename: pooling_config_filename, inner };
Self {
kind: NewEmbedderErrorKind::OpenPoolingConfig(open_config),
fault: FaultSource::Runtime,
}
}
pub fn deserialize_pooling_config(
model_name: String,
pooling_config_filename: PathBuf,
inner: serde_json::Error,
) -> NewEmbedderError {
let deserialize_pooling_config =
DeserializePoolingConfig { model_name, filename: pooling_config_filename, inner };
Self {
kind: NewEmbedderErrorKind::DeserializePoolingConfig(deserialize_pooling_config),
fault: FaultSource::Runtime,
}
}
pub fn open_tokenizer( pub fn open_tokenizer(
tokenizer_filename: PathBuf, tokenizer_filename: PathBuf,
inner: Box<dyn std::error::Error + Send + Sync>, inner: Box<dyn std::error::Error + Send + Sync>,
@ -319,6 +344,13 @@ pub struct OpenConfig {
pub inner: std::io::Error, pub inner: std::io::Error,
} }
#[derive(Debug, thiserror::Error)]
#[error("could not open pooling config at {filename}: {inner}")]
pub struct OpenPoolingConfig {
pub filename: PathBuf,
pub inner: std::io::Error,
}
#[derive(Debug, thiserror::Error)] #[derive(Debug, thiserror::Error)]
#[error("for model '{model_name}', could not deserialize config at {filename} as JSON: {inner}")] #[error("for model '{model_name}', could not deserialize config at {filename} as JSON: {inner}")]
pub struct DeserializeConfig { pub struct DeserializeConfig {
@ -327,6 +359,14 @@ pub struct DeserializeConfig {
pub inner: serde_json::Error, pub inner: serde_json::Error,
} }
#[derive(Debug, thiserror::Error)]
#[error("for model '{model_name}', could not deserialize file at `{filename}` as a pooling config: {inner}")]
pub struct DeserializePoolingConfig {
pub model_name: String,
pub filename: PathBuf,
pub inner: serde_json::Error,
}
#[derive(Debug, thiserror::Error)] #[derive(Debug, thiserror::Error)]
#[error("model `{model_name}` appears to be unsupported{}\n - inner error: {inner}", #[error("model `{model_name}` appears to be unsupported{}\n - inner error: {inner}",
if architectures.is_empty() { if architectures.is_empty() {
@ -354,8 +394,12 @@ pub enum NewEmbedderErrorKind {
#[error(transparent)] #[error(transparent)]
OpenConfig(OpenConfig), OpenConfig(OpenConfig),
#[error(transparent)] #[error(transparent)]
OpenPoolingConfig(OpenPoolingConfig),
#[error(transparent)]
DeserializeConfig(DeserializeConfig), DeserializeConfig(DeserializeConfig),
#[error(transparent)] #[error(transparent)]
DeserializePoolingConfig(DeserializePoolingConfig),
#[error(transparent)]
UnsupportedModel(UnsupportedModel), UnsupportedModel(UnsupportedModel),
#[error(transparent)] #[error(transparent)]
OpenTokenizer(OpenTokenizer), OpenTokenizer(OpenTokenizer),

View File

@ -34,6 +34,30 @@ pub struct EmbedderOptions {
pub model: String, pub model: String,
pub revision: Option<String>, pub revision: Option<String>,
pub distribution: Option<DistributionShift>, pub distribution: Option<DistributionShift>,
#[serde(default)]
pub pooling: OverridePooling,
}
#[derive(
Debug,
Clone,
Copy,
Default,
Hash,
PartialEq,
Eq,
serde::Deserialize,
serde::Serialize,
utoipa::ToSchema,
deserr::Deserr,
)]
#[deserr(rename_all = camelCase, deny_unknown_fields)]
#[serde(rename_all = "camelCase")]
pub enum OverridePooling {
UseModel,
ForceCls,
#[default]
ForceMean,
} }
impl EmbedderOptions { impl EmbedderOptions {
@ -42,6 +66,7 @@ impl EmbedderOptions {
model: "BAAI/bge-base-en-v1.5".to_string(), model: "BAAI/bge-base-en-v1.5".to_string(),
revision: Some("617ca489d9e86b49b8167676d8220688b99db36e".into()), revision: Some("617ca489d9e86b49b8167676d8220688b99db36e".into()),
distribution: None, distribution: None,
pooling: OverridePooling::UseModel,
} }
} }
} }
@ -58,6 +83,7 @@ pub struct Embedder {
tokenizer: Tokenizer, tokenizer: Tokenizer,
options: EmbedderOptions, options: EmbedderOptions,
dimensions: usize, dimensions: usize,
pooling: Pooling,
} }
impl std::fmt::Debug for Embedder { impl std::fmt::Debug for Embedder {
@ -66,10 +92,62 @@ impl std::fmt::Debug for Embedder {
.field("model", &self.options.model) .field("model", &self.options.model)
.field("tokenizer", &self.tokenizer) .field("tokenizer", &self.tokenizer)
.field("options", &self.options) .field("options", &self.options)
.field("pooling", &self.pooling)
.finish() .finish()
} }
} }
#[derive(Clone, Copy, serde::Deserialize)]
struct PoolingConfig {
#[serde(default)]
pub pooling_mode_cls_token: bool,
#[serde(default)]
pub pooling_mode_mean_tokens: bool,
#[serde(default)]
pub pooling_mode_max_tokens: bool,
#[serde(default)]
pub pooling_mode_mean_sqrt_len_tokens: bool,
#[serde(default)]
pub pooling_mode_lasttoken: bool,
}
#[derive(Debug, Clone, Copy, Default)]
pub enum Pooling {
#[default]
Mean,
Cls,
Max,
MeanSqrtLen,
LastToken,
}
impl Pooling {
fn override_with(&mut self, pooling: OverridePooling) {
match pooling {
OverridePooling::UseModel => {}
OverridePooling::ForceCls => *self = Pooling::Cls,
OverridePooling::ForceMean => *self = Pooling::Mean,
}
}
}
impl From<PoolingConfig> for Pooling {
fn from(value: PoolingConfig) -> Self {
if value.pooling_mode_cls_token {
Self::Cls
} else if value.pooling_mode_mean_tokens {
Self::Mean
} else if value.pooling_mode_lasttoken {
Self::LastToken
} else if value.pooling_mode_mean_sqrt_len_tokens {
Self::MeanSqrtLen
} else if value.pooling_mode_max_tokens {
Self::Max
} else {
Self::default()
}
}
}
impl Embedder { impl Embedder {
pub fn new(options: EmbedderOptions) -> std::result::Result<Self, NewEmbedderError> { pub fn new(options: EmbedderOptions) -> std::result::Result<Self, NewEmbedderError> {
let device = match candle_core::Device::cuda_if_available(0) { let device = match candle_core::Device::cuda_if_available(0) {
@ -83,7 +161,7 @@ impl Embedder {
Some(revision) => Repo::with_revision(options.model.clone(), RepoType::Model, revision), Some(revision) => Repo::with_revision(options.model.clone(), RepoType::Model, revision),
None => Repo::model(options.model.clone()), None => Repo::model(options.model.clone()),
}; };
let (config_filename, tokenizer_filename, weights_filename, weight_source) = { let (config_filename, tokenizer_filename, weights_filename, weight_source, pooling) = {
let api = Api::new().map_err(NewEmbedderError::new_api_fail)?; let api = Api::new().map_err(NewEmbedderError::new_api_fail)?;
let api = api.repo(repo); let api = api.repo(repo);
let config = api.get("config.json").map_err(NewEmbedderError::api_get)?; let config = api.get("config.json").map_err(NewEmbedderError::api_get)?;
@ -97,7 +175,38 @@ impl Embedder {
}) })
.map_err(NewEmbedderError::api_get)? .map_err(NewEmbedderError::api_get)?
}; };
(config, tokenizer, weights, source) let pooling = match api.get("1_Pooling/config.json") {
Ok(pooling) => Some(pooling),
Err(hf_hub::api::sync::ApiError::RequestError(error))
if matches!(*error, ureq::Error::Status(404, _,)) =>
{
// ignore the error if the file simply doesn't exist
None
}
Err(error) => return Err(NewEmbedderError::api_get(error)),
};
let mut pooling: Pooling = match pooling {
Some(pooling_filename) => {
let pooling = std::fs::read_to_string(&pooling_filename).map_err(|inner| {
NewEmbedderError::open_pooling_config(pooling_filename.clone(), inner)
})?;
let pooling: PoolingConfig =
serde_json::from_str(&pooling).map_err(|inner| {
NewEmbedderError::deserialize_pooling_config(
options.model.clone(),
pooling_filename,
inner,
)
})?;
pooling.into()
}
None => Pooling::default(),
};
pooling.override_with(options.pooling);
(config, tokenizer, weights, source, pooling)
}; };
let config = std::fs::read_to_string(&config_filename) let config = std::fs::read_to_string(&config_filename)
@ -122,6 +231,8 @@ impl Embedder {
}, },
}; };
tracing::debug!(model = options.model, weight=?weight_source, pooling=?pooling, "model config");
let model = BertModel::load(vb, &config).map_err(NewEmbedderError::load_model)?; let model = BertModel::load(vb, &config).map_err(NewEmbedderError::load_model)?;
if let Some(pp) = tokenizer.get_padding_mut() { if let Some(pp) = tokenizer.get_padding_mut() {
@ -134,7 +245,7 @@ impl Embedder {
tokenizer.with_padding(Some(pp)); tokenizer.with_padding(Some(pp));
} }
let mut this = Self { model, tokenizer, options, dimensions: 0 }; let mut this = Self { model, tokenizer, options, dimensions: 0, pooling };
let embeddings = this let embeddings = this
.embed(vec!["test".into()]) .embed(vec!["test".into()])
@ -168,17 +279,53 @@ impl Embedder {
.forward(&token_ids, &token_type_ids, None) .forward(&token_ids, &token_type_ids, None)
.map_err(EmbedError::model_forward)?; .map_err(EmbedError::model_forward)?;
// Apply some avg-pooling by taking the mean embedding value for all tokens (including padding) let embeddings = Self::pooling(embeddings, self.pooling)?;
let (_n_sentence, n_tokens, _hidden_size) =
embeddings.dims3().map_err(EmbedError::tensor_shape)?;
let embeddings = (embeddings.sum(1).map_err(EmbedError::tensor_value)? / (n_tokens as f64))
.map_err(EmbedError::tensor_shape)?;
let embeddings: Vec<Embedding> = embeddings.to_vec2().map_err(EmbedError::tensor_shape)?; let embeddings: Vec<Embedding> = embeddings.to_vec2().map_err(EmbedError::tensor_shape)?;
Ok(embeddings) Ok(embeddings)
} }
fn pooling(embeddings: Tensor, pooling: Pooling) -> Result<Tensor, EmbedError> {
match pooling {
Pooling::Mean => Self::mean_pooling(embeddings),
Pooling::Cls => Self::cls_pooling(embeddings),
Pooling::Max => Self::max_pooling(embeddings),
Pooling::MeanSqrtLen => Self::mean_sqrt_pooling(embeddings),
Pooling::LastToken => Self::last_token_pooling(embeddings),
}
}
fn cls_pooling(embeddings: Tensor) -> Result<Tensor, EmbedError> {
embeddings.get_on_dim(1, 0).map_err(EmbedError::tensor_value)
}
fn mean_sqrt_pooling(embeddings: Tensor) -> Result<Tensor, EmbedError> {
let (_n_sentence, n_tokens, _hidden_size) =
embeddings.dims3().map_err(EmbedError::tensor_shape)?;
(embeddings.sum(1).map_err(EmbedError::tensor_value)? / (n_tokens as f64).sqrt())
.map_err(EmbedError::tensor_shape)
}
fn mean_pooling(embeddings: Tensor) -> Result<Tensor, EmbedError> {
let (_n_sentence, n_tokens, _hidden_size) =
embeddings.dims3().map_err(EmbedError::tensor_shape)?;
(embeddings.sum(1).map_err(EmbedError::tensor_value)? / (n_tokens as f64))
.map_err(EmbedError::tensor_shape)
}
fn max_pooling(embeddings: Tensor) -> Result<Tensor, EmbedError> {
embeddings.max(1).map_err(EmbedError::tensor_shape)
}
fn last_token_pooling(embeddings: Tensor) -> Result<Tensor, EmbedError> {
let (_n_sentence, n_tokens, _hidden_size) =
embeddings.dims3().map_err(EmbedError::tensor_shape)?;
embeddings.get_on_dim(1, n_tokens - 1).map_err(EmbedError::tensor_value)
}
pub fn embed_one(&self, text: &str) -> std::result::Result<Embedding, EmbedError> { pub fn embed_one(&self, text: &str) -> std::result::Result<Embedding, EmbedError> {
let tokens = self.tokenizer.encode(text, true).map_err(EmbedError::tokenize)?; let tokens = self.tokenizer.encode(text, true).map_err(EmbedError::tokenize)?;
let token_ids = tokens.get_ids(); let token_ids = tokens.get_ids();
@ -192,11 +339,8 @@ impl Embedder {
.forward(&token_ids, &token_type_ids, None) .forward(&token_ids, &token_type_ids, None)
.map_err(EmbedError::model_forward)?; .map_err(EmbedError::model_forward)?;
// Apply some avg-pooling by taking the mean embedding value for all tokens (including padding) let embedding = Self::pooling(embeddings, self.pooling)?;
let (_n_sentence, n_tokens, _hidden_size) =
embeddings.dims3().map_err(EmbedError::tensor_shape)?;
let embedding = (embeddings.sum(1).map_err(EmbedError::tensor_value)? / (n_tokens as f64))
.map_err(EmbedError::tensor_shape)?;
let embedding = embedding.squeeze(0).map_err(EmbedError::tensor_shape)?; let embedding = embedding.squeeze(0).map_err(EmbedError::tensor_shape)?;
let embedding: Embedding = embedding.to_vec1().map_err(EmbedError::tensor_shape)?; let embedding: Embedding = embedding.to_vec1().map_err(EmbedError::tensor_shape)?;
Ok(embedding) Ok(embedding)

View File

@ -6,6 +6,7 @@ use roaring::RoaringBitmap;
use serde::{Deserialize, Serialize}; use serde::{Deserialize, Serialize};
use utoipa::ToSchema; use utoipa::ToSchema;
use super::hf::OverridePooling;
use super::{ollama, openai, DistributionShift}; use super::{ollama, openai, DistributionShift};
use crate::prompt::{default_max_bytes, PromptData}; use crate::prompt::{default_max_bytes, PromptData};
use crate::update::Setting; use crate::update::Setting;
@ -30,6 +31,10 @@ pub struct EmbeddingSettings {
pub revision: Setting<String>, pub revision: Setting<String>,
#[serde(default, skip_serializing_if = "Setting::is_not_set")] #[serde(default, skip_serializing_if = "Setting::is_not_set")]
#[deserr(default)] #[deserr(default)]
#[schema(value_type = Option<OverridePooling>)]
pub pooling: Setting<OverridePooling>,
#[serde(default, skip_serializing_if = "Setting::is_not_set")]
#[deserr(default)]
#[schema(value_type = Option<String>)] #[schema(value_type = Option<String>)]
pub api_key: Setting<String>, pub api_key: Setting<String>,
#[serde(default, skip_serializing_if = "Setting::is_not_set")] #[serde(default, skip_serializing_if = "Setting::is_not_set")]
@ -164,6 +169,7 @@ impl SettingsDiff {
mut source, mut source,
mut model, mut model,
mut revision, mut revision,
mut pooling,
mut api_key, mut api_key,
mut dimensions, mut dimensions,
mut document_template, mut document_template,
@ -180,6 +186,7 @@ impl SettingsDiff {
source: new_source, source: new_source,
model: new_model, model: new_model,
revision: new_revision, revision: new_revision,
pooling: new_pooling,
api_key: new_api_key, api_key: new_api_key,
dimensions: new_dimensions, dimensions: new_dimensions,
document_template: new_document_template, document_template: new_document_template,
@ -210,6 +217,7 @@ impl SettingsDiff {
&source, &source,
&mut model, &mut model,
&mut revision, &mut revision,
&mut pooling,
&mut dimensions, &mut dimensions,
&mut url, &mut url,
&mut request, &mut request,
@ -225,6 +233,9 @@ impl SettingsDiff {
if revision.apply(new_revision) { if revision.apply(new_revision) {
ReindexAction::push_action(&mut reindex_action, ReindexAction::FullReindex); ReindexAction::push_action(&mut reindex_action, ReindexAction::FullReindex);
} }
if pooling.apply(new_pooling) {
ReindexAction::push_action(&mut reindex_action, ReindexAction::FullReindex);
}
if dimensions.apply(new_dimensions) { if dimensions.apply(new_dimensions) {
match source { match source {
// regenerate on dimensions change in OpenAI since truncation is supported // regenerate on dimensions change in OpenAI since truncation is supported
@ -290,6 +301,7 @@ impl SettingsDiff {
source, source,
model, model,
revision, revision,
pooling,
api_key, api_key,
dimensions, dimensions,
document_template, document_template,
@ -338,6 +350,7 @@ fn apply_default_for_source(
source: &Setting<EmbedderSource>, source: &Setting<EmbedderSource>,
model: &mut Setting<String>, model: &mut Setting<String>,
revision: &mut Setting<String>, revision: &mut Setting<String>,
pooling: &mut Setting<OverridePooling>,
dimensions: &mut Setting<usize>, dimensions: &mut Setting<usize>,
url: &mut Setting<String>, url: &mut Setting<String>,
request: &mut Setting<serde_json::Value>, request: &mut Setting<serde_json::Value>,
@ -350,6 +363,7 @@ fn apply_default_for_source(
Setting::Set(EmbedderSource::HuggingFace) => { Setting::Set(EmbedderSource::HuggingFace) => {
*model = Setting::Reset; *model = Setting::Reset;
*revision = Setting::Reset; *revision = Setting::Reset;
*pooling = Setting::Reset;
*dimensions = Setting::NotSet; *dimensions = Setting::NotSet;
*url = Setting::NotSet; *url = Setting::NotSet;
*request = Setting::NotSet; *request = Setting::NotSet;
@ -359,6 +373,7 @@ fn apply_default_for_source(
Setting::Set(EmbedderSource::Ollama) => { Setting::Set(EmbedderSource::Ollama) => {
*model = Setting::Reset; *model = Setting::Reset;
*revision = Setting::NotSet; *revision = Setting::NotSet;
*pooling = Setting::NotSet;
*dimensions = Setting::Reset; *dimensions = Setting::Reset;
*url = Setting::NotSet; *url = Setting::NotSet;
*request = Setting::NotSet; *request = Setting::NotSet;
@ -368,6 +383,7 @@ fn apply_default_for_source(
Setting::Set(EmbedderSource::OpenAi) | Setting::Reset => { Setting::Set(EmbedderSource::OpenAi) | Setting::Reset => {
*model = Setting::Reset; *model = Setting::Reset;
*revision = Setting::NotSet; *revision = Setting::NotSet;
*pooling = Setting::NotSet;
*dimensions = Setting::NotSet; *dimensions = Setting::NotSet;
*url = Setting::Reset; *url = Setting::Reset;
*request = Setting::NotSet; *request = Setting::NotSet;
@ -377,6 +393,7 @@ fn apply_default_for_source(
Setting::Set(EmbedderSource::Rest) => { Setting::Set(EmbedderSource::Rest) => {
*model = Setting::NotSet; *model = Setting::NotSet;
*revision = Setting::NotSet; *revision = Setting::NotSet;
*pooling = Setting::NotSet;
*dimensions = Setting::Reset; *dimensions = Setting::Reset;
*url = Setting::Reset; *url = Setting::Reset;
*request = Setting::Reset; *request = Setting::Reset;
@ -386,6 +403,7 @@ fn apply_default_for_source(
Setting::Set(EmbedderSource::UserProvided) => { Setting::Set(EmbedderSource::UserProvided) => {
*model = Setting::NotSet; *model = Setting::NotSet;
*revision = Setting::NotSet; *revision = Setting::NotSet;
*pooling = Setting::NotSet;
*dimensions = Setting::Reset; *dimensions = Setting::Reset;
*url = Setting::NotSet; *url = Setting::NotSet;
*request = Setting::NotSet; *request = Setting::NotSet;
@ -419,6 +437,7 @@ impl EmbeddingSettings {
pub const SOURCE: &'static str = "source"; pub const SOURCE: &'static str = "source";
pub const MODEL: &'static str = "model"; pub const MODEL: &'static str = "model";
pub const REVISION: &'static str = "revision"; pub const REVISION: &'static str = "revision";
pub const POOLING: &'static str = "pooling";
pub const API_KEY: &'static str = "apiKey"; pub const API_KEY: &'static str = "apiKey";
pub const DIMENSIONS: &'static str = "dimensions"; pub const DIMENSIONS: &'static str = "dimensions";
pub const DOCUMENT_TEMPLATE: &'static str = "documentTemplate"; pub const DOCUMENT_TEMPLATE: &'static str = "documentTemplate";
@ -446,6 +465,7 @@ impl EmbeddingSettings {
&[EmbedderSource::HuggingFace, EmbedderSource::OpenAi, EmbedderSource::Ollama] &[EmbedderSource::HuggingFace, EmbedderSource::OpenAi, EmbedderSource::Ollama]
} }
Self::REVISION => &[EmbedderSource::HuggingFace], Self::REVISION => &[EmbedderSource::HuggingFace],
Self::POOLING => &[EmbedderSource::HuggingFace],
Self::API_KEY => { Self::API_KEY => {
&[EmbedderSource::OpenAi, EmbedderSource::Ollama, EmbedderSource::Rest] &[EmbedderSource::OpenAi, EmbedderSource::Ollama, EmbedderSource::Rest]
} }
@ -500,6 +520,7 @@ impl EmbeddingSettings {
Self::SOURCE, Self::SOURCE,
Self::MODEL, Self::MODEL,
Self::REVISION, Self::REVISION,
Self::POOLING,
Self::DOCUMENT_TEMPLATE, Self::DOCUMENT_TEMPLATE,
Self::DOCUMENT_TEMPLATE_MAX_BYTES, Self::DOCUMENT_TEMPLATE_MAX_BYTES,
Self::DISTRIBUTION, Self::DISTRIBUTION,
@ -592,10 +613,12 @@ impl From<EmbeddingConfig> for EmbeddingSettings {
model, model,
revision, revision,
distribution, distribution,
pooling,
}) => Self { }) => Self {
source: Setting::Set(EmbedderSource::HuggingFace), source: Setting::Set(EmbedderSource::HuggingFace),
model: Setting::Set(model), model: Setting::Set(model),
revision: Setting::some_or_not_set(revision), revision: Setting::some_or_not_set(revision),
pooling: Setting::Set(pooling),
api_key: Setting::NotSet, api_key: Setting::NotSet,
dimensions: Setting::NotSet, dimensions: Setting::NotSet,
document_template: Setting::Set(prompt.template), document_template: Setting::Set(prompt.template),
@ -617,6 +640,7 @@ impl From<EmbeddingConfig> for EmbeddingSettings {
source: Setting::Set(EmbedderSource::OpenAi), source: Setting::Set(EmbedderSource::OpenAi),
model: Setting::Set(embedding_model.name().to_owned()), model: Setting::Set(embedding_model.name().to_owned()),
revision: Setting::NotSet, revision: Setting::NotSet,
pooling: Setting::NotSet,
api_key: Setting::some_or_not_set(api_key), api_key: Setting::some_or_not_set(api_key),
dimensions: Setting::some_or_not_set(dimensions), dimensions: Setting::some_or_not_set(dimensions),
document_template: Setting::Set(prompt.template), document_template: Setting::Set(prompt.template),
@ -638,6 +662,7 @@ impl From<EmbeddingConfig> for EmbeddingSettings {
source: Setting::Set(EmbedderSource::Ollama), source: Setting::Set(EmbedderSource::Ollama),
model: Setting::Set(embedding_model), model: Setting::Set(embedding_model),
revision: Setting::NotSet, revision: Setting::NotSet,
pooling: Setting::NotSet,
api_key: Setting::some_or_not_set(api_key), api_key: Setting::some_or_not_set(api_key),
dimensions: Setting::some_or_not_set(dimensions), dimensions: Setting::some_or_not_set(dimensions),
document_template: Setting::Set(prompt.template), document_template: Setting::Set(prompt.template),
@ -656,6 +681,7 @@ impl From<EmbeddingConfig> for EmbeddingSettings {
source: Setting::Set(EmbedderSource::UserProvided), source: Setting::Set(EmbedderSource::UserProvided),
model: Setting::NotSet, model: Setting::NotSet,
revision: Setting::NotSet, revision: Setting::NotSet,
pooling: Setting::NotSet,
api_key: Setting::NotSet, api_key: Setting::NotSet,
dimensions: Setting::Set(dimensions), dimensions: Setting::Set(dimensions),
document_template: Setting::NotSet, document_template: Setting::NotSet,
@ -679,6 +705,7 @@ impl From<EmbeddingConfig> for EmbeddingSettings {
source: Setting::Set(EmbedderSource::Rest), source: Setting::Set(EmbedderSource::Rest),
model: Setting::NotSet, model: Setting::NotSet,
revision: Setting::NotSet, revision: Setting::NotSet,
pooling: Setting::NotSet,
api_key: Setting::some_or_not_set(api_key), api_key: Setting::some_or_not_set(api_key),
dimensions: Setting::some_or_not_set(dimensions), dimensions: Setting::some_or_not_set(dimensions),
document_template: Setting::Set(prompt.template), document_template: Setting::Set(prompt.template),
@ -701,6 +728,7 @@ impl From<EmbeddingSettings> for EmbeddingConfig {
source, source,
model, model,
revision, revision,
pooling,
api_key, api_key,
dimensions, dimensions,
document_template, document_template,
@ -764,6 +792,9 @@ impl From<EmbeddingSettings> for EmbeddingConfig {
if let Some(revision) = revision.set() { if let Some(revision) = revision.set() {
options.revision = Some(revision); options.revision = Some(revision);
} }
if let Some(pooling) = pooling.set() {
options.pooling = pooling;
}
options.distribution = distribution.set(); options.distribution = distribution.set();
this.embedder_options = super::EmbedderOptions::HuggingFace(options); this.embedder_options = super::EmbedderOptions::HuggingFace(options);
} }