Clean up the README

Kerollmops 2020-07-06 17:38:22 +02:00
parent adb1038b26
commit 45d0d7c3d4


@@ -23,19 +23,10 @@ All of that on a $39/month machine with 4 cores.
### Index your documents
You first need to split your CSV yourself; the engine is currently not able to split it for you.
The bigger each part is, the faster the engine will index your documents, but the higher the RAM usage will be.
Here we use [the awesome xsv tool](https://github.com/BurntSushi/xsv) to split our big dataset:
```bash
# Split my-data.csv into parts of 2,000,000 records each, written to my-data-split/
cat my-data.csv | xsv split -s 2000000 my-data-split/
```
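
Before indexing, you can sanity-check the split by counting the records in one part. This is just an optional illustration; it assumes xsv's default output naming, where each part is named after the index of its first record (`0.csv`, `2000000.csv`, and so on):
```bash
# Optional sanity check (assumes xsv's default {first-record-index}.csv naming)
ls my-data-split/
xsv count my-data-split/0.csv   # prints 2000000 for every full part
```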
Once your data is ready, you can feed the engine with your CSV data; it will spawn one indexing thread per CSV part, up to one thread per core.
```bash
# Index every part of the split dataset (one thread per part):
./target/release/indexer --db my-data.mmdb ../my-data-split/*
# Or feed a single CSV file directly:
./target/release/indexer --db my-data.mmdb ../my-data.csv
```
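
Because indexing parallelism is capped at one thread per CSV part and one per core, a rough sizing heuristic is to produce about as many parts as you have cores. This is only a sketch, not a recommendation from the project; it assumes `nproc` and `xsv count` are available on your machine:
```bash
# Hypothetical sizing heuristic: aim for roughly one CSV part per core.
total=$(xsv count my-data.csv)            # number of records in the dataset
cores=$(nproc)                            # number of CPU cores
size=$(( (total + cores - 1) / cores ))   # records per part, rounded up
cat my-data.csv | xsv split -s "$size" my-data-split/
```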
## Querying