meilisearch

mirror of https://github.com/meilisearch/meilisearch.git synced 2024-12-19 01:45:54 +08:00

Author	SHA1	Message	Date
many	b664a46e91	Update milli version	2021-11-03 16:11:20 +01:00
many	06e6eaa7b4	Remove useless Facet variant	2021-11-03 16:11:09 +01:00
many	30a094cbb2	Change lacking errors	2021-11-03 14:33:33 +01:00
Tamo	904bae98f8	send the analytics even when the search fails	2021-11-02 12:38:01 +01:00
bors[bot]	c32f13a909	Merge #1800	2021-10-29 16:11:03 +00:00

1800: Analytics r=irevoire a=irevoire

Closes #1784

Implements [this spec](https://github.com/meilisearch/specifications/blob/update-analytics-specs/text/0034-telemetry-policies.md)

# Anonymous Analytics Policy

## 1. Functional Specification

### I. Summary

This specification describes an exhaustive list of anonymous metrics collected by the MeiliSearch binary. It also describes the tools we use for this collection and how we identify a MeiliSearch instance.

### II. Motivation

At MeiliSearch, our vision is to provide an easy-to-use search solution that meets the essential needs of our users. At all times, we strive to understand our users better and meet their expectations in the best possible way.

Although we can gather needs and understand our users through several channels such as GitHub, Slack, surveys, interviews or roadmap votes, we realize that this is not enough to have a complete view of MeiliSearch usage and feature adoption. By cross-referencing our product discovery phases with aggregated quantitative data, we want to make the product much better than it is today. Our decision-making will be taken a step further to make a product that users love.

### III. Explanation

#### General Data Protection Regulation (GDPR)

The metrics collected are non-sensitive, non-personal and do not identify an individual or a group of individuals using MeiliSearch. The data collected is secured and anonymized. We do not collect any data from the values stored in the documents.

We, the MeiliSearch team, provide an email address so that users can request the removal of their data: privacy@meilisearch.com.
Thanks to the unique identifier generated for their MeiliSearch installation (`Instance uuid` when launching MeiliSearch), we can remove the corresponding data from all the tools we describe below. Any questions regarding the management of the collected data can be sent to the same email address.

#### Tools

##### Segment

The collected data is sent to [Segment](https://segment.com/). Segment is a platform for data collection and provides data management tools.

##### Amplitude

[Amplitude](https://amplitude.com/) is a tool for graphing and highlighting collected data. Segment feeds Amplitude so that we can build visualizations according to our needs.

-----------

# The `identify` call we send every hour:

## System Configuration `system`

This property allows us to gather essential information to better understand on which type of machine MeiliSearch is used. This allows us to better advise users on the machines to choose according to their data volume and their use-cases.

- [x] `system` => Never changes but is still sent every hour
  - [x] distribution | On which distribution MeiliSearch is launched, eg: Arch Linux
  - [x] kernel_version | On which kernel version MeiliSearch is launched, eg: 5.14.10-arch1-1
  - [x] cores | How many cores the machine has, eg: 24
  - [x] ram_size | Total capacity of the machine's RAM. Expressed in `Kb`, eg: 33604210
  - [x] disk_size | Total capacity of the biggest disk. Expressed in `Kb`, eg: 336042103
  - [x] server_provider | Users can tell us on which provider MeiliSearch is hosted by filling the `MEILI_SERVER_PROVIDER` env var. This is also filled by our providers' deploy scripts, e.g. GCP [cloud-config.yaml](`56a7c2630c/scripts/providers/gcp/cloud-config.yaml (L33)`), eg: gcp

## MeiliSearch Configuration

- [x] `context.app.version`: MeiliSearch version, eg: 0.23.0
- [x] `env`: `production` / `development`, eg: `production`
- [x] `has_snapshot`: Whether the MeiliSearch instance has snapshots activated, eg: `true`

## MeiliSearch Statistics `stats`

- [x] `stats`
  - [x] `database_size`: Size of indexed data. Expressed in `Kb`, eg: 180230
  - [x] `indexes_number`: Number of indexes, eg: 2
  - [x] `documents_number`: Number of indexed documents, eg: 165847
  - [x] `start_since_days`: How many days ago the instance was launched, eg: 328

---------

- [x] Launched | This is the first event sent, to mark that MeiliSearch is launched for the first time

---------

- [x] `Documents Searched POST`: The Documents Searched event is sent once an hour. The event's properties are averaged over all search operations during that time so as not to track everything and generate unnecessary noise.
  - [x] `user-agent`: Represents all the user-agents encountered on this endpoint during one hour, eg: `["MeiliSearch Ruby (2.1)", "Ruby (3.0)"]`
  - [x] `requests`
    - [x] `99th_response_time`: The maximum latency, in ms, for the fastest 99% of requests, eg: `57ms`
    - [x] `total_suceeded`: The total number of succeeded search requests, eg: `3456`
    - [x] `total_failed`: The total number of failed search requests, eg: `24`
    - [x] `total_received`: The total number of received search requests, eg: `3480`
  - [x] `sort`
    - [x] `with_geoPoint`: Whether the built-in sort rule `_geoPoint` has been used, eg: `true` / `false`
    - [x] `avg_criteria_number`: The average number of sort criteria among all the requests containing the sort parameter. `"sort": []` counts as 0, while not sending sort does not influence the average, eg: `2`
  - [x] `filter`
    - [x] `with_geoRadius`: Whether the built-in filter rule `_geoRadius` has been used, eg: `true` / `false`
    - [x] `avg_criteria_number`: The average number of filter criteria among all the requests containing the filter parameter. `"filter": []` counts as 0, while not sending filter does not influence the average, eg: `4`
    - [x] `most_used_syntax`: The most used filter syntax among all the requests containing the filter parameter. `string` / `array` / `mixed`, eg: `mixed`
  - [x] `q`
    - [x] `avg_terms_number`: The average number of terms for the `q` parameter among all requests, eg: `5`
  - [x] `pagination`:
    - [x] `max_limit`: The maximum limit encountered among all requests, eg: `20`
    - [x] `max_offset`: The maximum offset encountered among all requests, eg: `1000`

---

- [x] `Documents Searched GET`: The Documents Searched event is sent once an hour. The event's properties are averaged over all search operations during that time so as not to track everything and generate unnecessary noise.
  - [x] `user-agent`: Represents all the user-agents encountered on this endpoint during one hour, eg: `["MeiliSearch Ruby (2.1)", "Ruby (3.0)"]`
  - [x] `requests`
    - [x] `99th_response_time`: The maximum latency, in ms, for the fastest 99% of requests, eg: `57ms`
    - [x] `total_suceeded`: The total number of succeeded search requests, eg: `3456`
    - [x] `total_failed`: The total number of failed search requests, eg: `24`
    - [x] `total_received`: The total number of received search requests, eg: `3480`
  - [x] `sort`
    - [x] `with_geoPoint`: Whether the built-in sort rule `_geoPoint` has been used, eg: `true` / `false`
    - [x] `avg_criteria_number`: The average number of sort criteria among all the requests containing the sort parameter. `"sort": []` counts as 0, while not sending sort does not influence the average, eg: `2`
  - [x] `filter`
    - [x] `with_geoRadius`: Whether the built-in filter rule `_geoRadius` has been used, eg: `true` / `false`
    - [x] `avg_criteria_number`: The average number of filter criteria among all the requests containing the filter parameter. `"filter": []` counts as 0, while not sending filter does not influence the average, eg: `4`
    - [x] `most_used_syntax`: The most used filter syntax among all the requests containing the filter parameter. `string` / `array` / `mixed`, eg: `mixed`
  - [x] `q`
    - [x] `avg_terms_number`: The average number of terms for the `q` parameter among all requests, eg: `5`
  - [x] `pagination`:
    - [x] `max_limit`: The maximum limit encountered among all requests, eg: `20`
    - [x] `max_offset`: The maximum offset encountered among all requests, eg: `1000`

---

- [x] `Index Created`
  - [x] `user-agent`: Represents the user-agent encountered for this API call, eg: ["MeiliSearch Ruby (2.1)", "Ruby (3.0)"]
  - [x] `primary_key`: The name of the field used as primary key if set, otherwise `null`, eg: `id`

---

- [x] `Index Updated`
  - [x] `user-agent`: Represents the user-agent encountered for this API call, eg: ["MeiliSearch Ruby (2.1)", "Ruby (3.0)"]
  - [x] `primary_key`: The name of the field used as primary key if set, otherwise `null`, eg: `id`

---

- [x] `Documents Added`: The Documents Added event is sent once an hour. The event's properties are averaged over all POST /documents addition operations during that time so as not to track everything and generate unnecessary noise.
  - [x] `user-agent`: Represents the user-agent encountered for this API call, eg: ["MeiliSearch Ruby (2.1)", "Ruby (3.0)"]
  - [x] `payload_type`: Represents all the `payload_type` encountered on this endpoint during one hour, eg: [`text/csv`]
  - [x] `primary_key`: The name of the field used as primary key if set, otherwise `null`, eg: `id`
  - [x] `index_creation`: Whether an index creation happened, eg: `false`

---

- [x] `Documents Updated`: The Documents Updated event is sent once an hour. The event's properties are averaged over all PUT /documents addition operations during that time so as not to track everything and generate unnecessary noise.
  - [x] `user-agent`: Represents the user-agent encountered for this API call, eg: ["MeiliSearch Ruby (2.1)", "Ruby (3.0)"]
  - [x] `payload_type`: Represents all the `payload_type` encountered on this endpoint during one hour, eg: [`application/json`]
  - [x] `primary_key`: The name of the field used as primary key if set, otherwise `null`, eg: `id`
  - [x] `index_creation`: Whether an index creation happened, eg: `false`

---

- [x] Settings Updated
  - [x] `user-agent`: Represents the user-agent encountered for this API call, eg: ["MeiliSearch Ruby (2.1)", "Ruby (3.0)"]
  - [x] `ranking_rules`
    - [x] `sort_position`: Position of the `sort` ranking rule if any, otherwise `null`, eg: `5`
  - [x] `sortable_attributes`
    - [x] `total`: Number of sortable attributes, eg: `3`
    - [x] `has_geo`: Indicates whether `_geo` is set as a sortable attribute, eg: `false`
  - [x] `filterable_attributes`
    - [x] `total`: Number of filterable attributes, eg: `3`
    - [x] `has_geo`: Indicates whether `_geo` is set as a filterable attribute, eg: `false`

---

- [x] `RankingRules Updated`
  - [x] `user-agent`: Represents the user-agent encountered for this API call, eg: ["MeiliSearch Ruby (2.1)", "Ruby (3.0)"]
  - [x] `sort_position`: Position of the `sort` ranking rule if any, otherwise `null`, eg: `5`

---

- [x] `SortableAttributes Updated`
  - [x] `user-agent`: Represents the user-agent encountered for this API call, eg: ["MeiliSearch Ruby (2.1)", "Ruby (3.0)"]
  - [x] `total`: Number of sortable attributes, eg: `3`
  - [x] `has_geo`: Indicates whether `_geo` is set as a sortable attribute, eg: `false`

---

- [x] `FilterableAttributes Updated`
  - [x] `user-agent`: Represents the user-agent encountered for this API call, eg: ["MeiliSearch Ruby (2.1)", "Ruby (3.0)"]
  - [x] `total`: Number of filterable attributes, eg: `3`
  - [x] `has_geo`: Indicates whether `_geo` is set as a filterable attribute, eg: `false`

---

- [x] Dump Created
  - [x] `user-agent`: Represents the user-agent encountered for this API call, eg: ["MeiliSearch Ruby (2.1)", "Ruby (3.0)"]

---

Ensure the user-id file is correctly saved and loaded with:
- [x] the dumps
- [x] the snapshots
- [x] Ensure the CLI uuid is only shown if analytics are activated at launch or if it already exists (i.e. even if meilisearch was launched without analytics)

Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Irevoire <tamo@meilisearch.com>

marin postma	519093ea65	fix bad rebase	2021-10-29 17:32:49 +02:00
Tamo	bd49d1c4b5	fix one small bug	2021-10-29 17:25:56 +02:00
marin postma	2665c0099d	clippy + fmt	2021-10-29 17:25:56 +02:00
marin postma	d65f055030	pass analytics into Arc instead of static ref	2021-10-29 17:25:55 +02:00
Tamo	66d87761b7	align the parameters in the launch resume	2021-10-29 17:25:55 +02:00
Tamo	ba69ad672a	fix the timing issue	2021-10-29 17:25:55 +02:00
Tamo	7934e3956b	replace all mutexes by channel	2021-10-29 17:25:55 +02:00
Guillaume Mourier	68fe93b7db	add ranking_rules marker before sort_position	2021-10-29 17:25:55 +02:00
Tamo	efd0ea9e1e	makes clippy happier	2021-10-29 17:25:55 +02:00
Tamo	6ef73eb226	fix all the single settings route and add the searchable attributes Updated event	2021-10-29 17:25:55 +02:00
Tamo	fc2f23d36c	move the start_since_days to the root of the identify	2021-10-29 17:25:54 +02:00
Tamo	7c39fab453	move the user-agent out of the context in every request	2021-10-29 17:25:54 +02:00
Tamo	c5164c01c0	set the total of sortable attributes and filterable-attributes to 0 when not set	2021-10-29 17:25:54 +02:00
Tamo	351ad32d77	fix the index_creation boolean	2021-10-29 17:25:54 +02:00
Tamo	3ad8311bdd	split the analytics in a module	2021-10-29 17:25:54 +02:00
Tamo	ea5ae2bae5	sort the imports	2021-10-29 17:25:54 +02:00
Tamo	72e3adc55e	display an instance-id instead of a user-id	2021-10-29 17:25:54 +02:00
Tamo	b250392e8d	remove the first - in the path to the db instance in the instance-id	2021-10-29 17:25:53 +02:00
Tamo	d8b0d68840	use a regex to count the number of filters instead of split + flatten	2021-10-29 17:25:53 +02:00
Tamo	c4737749ab	bump segment to be able to display a user	2021-10-29 17:25:53 +02:00
Tamo	a1ab02f9fb	remove some commented code	2021-10-29 17:25:53 +02:00
Tamo	bba64b32ca	async_traits is not needed anymore	2021-10-29 17:25:53 +02:00
Tamo	9abd2aa9d7	make the analytics interval a const	2021-10-29 17:25:53 +02:00
Tamo	de35a9a605	use an official release of segment	2021-10-29 17:25:53 +02:00
Tamo	ed750e8792	fix start_since_day	2021-10-29 17:25:53 +02:00
Tamo	37ca50832c	fix the sort position	2021-10-29 17:25:52 +02:00
Tamo	31c7a0105b	fix a bug on the batch documents function	2021-10-29 17:25:52 +02:00
Tamo	ddab9eafa1	fix a typo	2021-10-29 17:25:52 +02:00
Tamo	76a4f86e0c	rename user-id to instance-uid	2021-10-29 17:25:52 +02:00
Tamo	6b34318274	makes clippy happy	2021-10-29 17:25:52 +02:00
Tamo	5508c6c154	a bit of styling	2021-10-29 17:25:52 +02:00
Tamo	9a62ac0c94	send the analytics only once every hour	2021-10-29 17:25:52 +02:00
Tamo	01737ef847	remove all the debug prints	2021-10-29 17:25:51 +02:00
Tamo	3144b572c4	remove the debug mode in release	2021-10-29 17:25:51 +02:00
Tamo	10de92987a	compile write_user_id only when the analytics are enabled	2021-10-29 17:25:51 +02:00
Tamo	c752c14c46	refactorize the dump and snapshot	2021-10-29 17:25:51 +02:00
Tamo	87a8bf5e96	write and load the user-id in the dumps	2021-10-29 17:25:51 +02:00
Tamo	ba14ea1243	plug the new batchers into the documents route	2021-10-29 17:25:51 +02:00
Tamo	9be90011c6	save the user-id in the config dir of the OS	2021-10-29 17:25:51 +02:00
Tamo	f9b14ca149	simplify the search batcher	2021-10-29 17:25:50 +02:00
Tamo	6591acfdfa	rename the documents batchers	2021-10-29 17:25:50 +02:00
Tamo	e64ba122e1	factorize the code between the two documents batcher	2021-10-29 17:25:50 +02:00
Tamo	a9523146a3	simplify the into_events methods	2021-10-29 17:25:50 +02:00
Tamo	392ee86714	implement the documents batcher	2021-10-29 17:25:50 +02:00
Tamo	1d73f484f0	update the primary key when creating a new index	2021-10-29 17:25:50 +02:00

4111 Commits
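
The Merge #1800 message above describes `Documents Searched` as an hourly event whose properties are aggregated over all search requests in that window. The following is a minimal Rust sketch of that idea, not the code from the commits listed here: `SearchRequest`, `SearchAggregator` and all of their fields and methods are hypothetical names chosen for illustration. It assumes that the 99th-percentile latency is taken over every received request using the nearest-rank method, and that requests without a `sort` parameter are excluded from the sort average while `"sort": []` counts as 0.

```rust
/// Hypothetical record of one search request (illustrative only).
#[derive(Debug, Clone)]
pub struct SearchRequest {
    pub succeeded: bool,
    pub response_time_ms: u64,
    /// `None` when `sort` was not sent; `Some(0)` for `"sort": []`.
    pub sort_criteria: Option<usize>,
}

/// Hypothetical in-memory aggregator, meant to be flushed once per hour.
#[derive(Debug, Default)]
pub struct SearchAggregator {
    total_received: usize,
    total_succeeded: usize,
    response_times_ms: Vec<u64>,
    sort_requests: usize,
    sort_criteria_sum: usize,
}

impl SearchAggregator {
    /// Folds one request into the running totals.
    pub fn record(&mut self, req: &SearchRequest) {
        self.total_received += 1;
        if req.succeeded {
            self.total_succeeded += 1;
        }
        self.response_times_ms.push(req.response_time_ms);
        // Only requests that actually sent `sort` influence the average.
        if let Some(n) = req.sort_criteria {
            self.sort_requests += 1;
            self.sort_criteria_sum += n;
        }
    }

    pub fn total_received(&self) -> usize {
        self.total_received
    }

    pub fn total_succeeded(&self) -> usize {
        self.total_succeeded
    }

    pub fn total_failed(&self) -> usize {
        self.total_received - self.total_succeeded
    }

    /// Maximum latency among the fastest 99% of requests (nearest-rank method).
    pub fn p99_response_time_ms(&self) -> Option<u64> {
        if self.response_times_ms.is_empty() {
            return None;
        }
        let mut sorted = self.response_times_ms.clone();
        sorted.sort_unstable();
        // 1-based rank = ceil(0.99 * n), computed with integer arithmetic.
        let rank = (sorted.len() * 99 + 99) / 100;
        Some(sorted[rank - 1])
    }

    /// Average number of sort criteria among requests that sent `sort`.
    pub fn avg_sort_criteria(&self) -> Option<f64> {
        if self.sort_requests == 0 {
            None
        } else {
            Some(self.sort_criteria_sum as f64 / self.sort_requests as f64)
        }
    }
}
```

In a sketch like this, a background task would call `record` for each request and, once per interval, turn the accumulated values into a single event before resetting the aggregator with `SearchAggregator::default()`. Keeping every latency in a `Vec` is the simplest way to get an exact percentile; a real implementation might prefer a bounded or approximate structure.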