<p align="center">
<img alt="the milli logo" src="http-ui/public/logo-black.svg">
</p>

<p align="center">a concurrent indexer combined with fast and relevant search algorithms</p>

## Introduction

This repository contains the core engine used in [MeiliSearch].

It contains a library that can manage one and only one index; MeiliSearch
manages multiple indexes itself. Milli cannot persist pending updates in a store:
that is the job of the layer above it, which is why it is only able
to process one update at a time.

This repository also contains crates to quickly debug the engine:
- There are benchmarks located in the `benchmarks` crate.
- The `http-ui` crate is a simple HTTP dashboard to test the features as you would for real.
- The `infos` crate is used to dump the internal data structures and ensure correctness.
- The `search` crate is a simple command-line tool that helps run [flamegraph] on top of the engine.
- The `helpers` crate is only used, occasionally, to modify the database in place.

### Compile and run the HTTP debug server

You can specify the number of threads used to index documents, as well as many other settings.

```bash
cd http-ui
cargo run --release -- --db my-database.mdb -vvv --indexing-jobs 8
```

### Index your documents

The engine can index a large number of documents quickly. I have already managed to index:
- 115m songs (song and artist name) in ~1h, taking 107GB on disk.
- 12m cities (name, timezone and country ID) in 15min, taking 10GB on disk.

All of that on a $39/month machine with 4 cores.

You can feed the engine with your CSV (comma-separated, yes) data like this:

```bash
printf "id,name,age\n1,hello,32\n2,kiki,24\n" | http POST 127.0.0.1:9700/documents content-type:text/csv
```

Don't forget to specify the `id` of the documents. Note that the engine also supports JSON and
JSON streaming (NDJSON): you can send them to the engine by using the `content-type:application/json`
and `content-type:application/x-ndjson` headers respectively.
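
As a rough sketch of the JSON variants, the snippet below reuses the same HTTPie call as the CSV example and only changes the `content-type` header. The sample documents and the `send_documents` helper are hypothetical, and the assumption that plain JSON is sent as a single array of documents is not confirmed by this README:

```bash
# Hypothetical sample documents; every document carries an `id` field.
# NDJSON (JSON streaming): one JSON object per line.
ndjson_payload='{"id":1,"name":"hello","age":32}
{"id":2,"name":"kiki","age":24}'

# Plain JSON: assumed here to be a single array of documents.
json_payload='[{"id":1,"name":"hello","age":32},{"id":2,"name":"kiki","age":24}]'

# Hypothetical helper: POST a payload to the debug server with HTTPie.
# $1 is the content type, $2 is the request body.
send_documents() {
  printf '%s\n' "$2" | http POST 127.0.0.1:9700/documents "content-type:$1"
}

# Usage, once the http-ui server is running:
#   send_documents application/x-ndjson "$ndjson_payload"
#   send_documents application/json "$json_payload"
```

Keeping the payload separate from the request makes it easy to swap the format without touching the endpoint or the rest of the command.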
### Querying the engine via the website

You can query the engine by going to [the HTML page itself](http://127.0.0.1:9700).

## Contributing

You can set up a `git-hook` to stop you from making a commit too fast. It'll stop you if:
- Any of the workspaces does not build
- Your code is not well-formatted

These two things are also checked in the CI, so ignoring the hook won't help you merge your code.
But if you need to, you can still add `--no-verify` when creating your commit to ignore the hook.

To enable the hook, run the following command from the root of the project:
```
cp script/pre-commit .git/hooks/pre-commit
```
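
The two checks can also be run by hand before committing. The commands below are only an assumption about what such checks typically look like in a Cargo workspace; they are not copied from `script/pre-commit`, and `run_pre_commit_checks` is a hypothetical helper name:

```bash
# Assumed manual equivalents of the hook's two checks
# (not taken from script/pre-commit).
run_pre_commit_checks() {
  # Fail if any crate in the workspace does not compile.
  cargo check --workspace --all-targets || return 1
  # Fail if any file is not formatted according to rustfmt.
  cargo fmt --all -- --check || return 1
}

# Usage, from the root of the project:
#   run_pre_commit_checks && git commit
```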

[MeiliSearch]: https://github.com/MeiliSearch/MeiliSearch
[flamegraph]: https://github.com/flamegraph-rs/flamegraph