diff --git a/deploy-manage/deploy/cloud-enterprise/add-custom-bundles-plugins.md b/deploy-manage/deploy/cloud-enterprise/add-custom-bundles-plugins.md index 36df3185c..d8ff85c74 100644 --- a/deploy-manage/deploy/cloud-enterprise/add-custom-bundles-plugins.md +++ b/deploy-manage/deploy/cloud-enterprise/add-custom-bundles-plugins.md @@ -314,7 +314,7 @@ To import a JVM trust store: } ``` -4. To use this bundle, you can refer it in the [GeoIP processor](elasticsearch://reference/ingestion-tools/enrich-processor/geoip-processor.md) of an ingest pipeline as `MyGeoLite2-City.mmdb` under `database_file` such as: +4. To use this bundle, you can refer to it in the [GeoIP processor](elasticsearch://reference/enrich-processor/geoip-processor.md) of an ingest pipeline as `MyGeoLite2-City.mmdb` under `database_file` such as: ```sh ... diff --git a/deploy-manage/deploy/cloud-on-k8s/custom-configuration-files-plugins.md b/deploy-manage/deploy/cloud-on-k8s/custom-configuration-files-plugins.md index d0480c52d..90a283c82 100644 --- a/deploy-manage/deploy/cloud-on-k8s/custom-configuration-files-plugins.md +++ b/deploy-manage/deploy/cloud-on-k8s/custom-configuration-files-plugins.md @@ -107,7 +107,7 @@ To install custom configuration files you can: 1. Add the configuration data into a ConfigMap or Secret. 2. Use volumes and volume mounts in your manifest to mount the contents of the ConfigMap or Secret as files in your {{es}} nodes. -The next example shows how to add a synonyms file for the [synonym token filter](elasticsearch://reference/data-analysis/text-analysis/analysis-synonym-tokenfilter.md) in Elasticsearch. But you can **use the same approach for any kind of file you want to mount into the configuration directory of Elasticsearch**, like adding CA certificates of external systems. +The next example shows how to add a synonyms file for the [synonym token filter](elasticsearch://reference/text-analysis/analysis-synonym-tokenfilter.md) in Elasticsearch. But you can **use the same approach for any kind of file you want to mount into the configuration directory of Elasticsearch**, like adding CA certificates of external systems. 1. Create the ConfigMap or Secret with the data: diff --git a/deploy-manage/deploy/elastic-cloud/differences-from-other-elasticsearch-offerings.md b/deploy-manage/deploy/elastic-cloud/differences-from-other-elasticsearch-offerings.md index 9ec7081d2..875b39f66 100644 --- a/deploy-manage/deploy/elastic-cloud/differences-from-other-elasticsearch-offerings.md +++ b/deploy-manage/deploy/elastic-cloud/differences-from-other-elasticsearch-offerings.md @@ -154,6 +154,6 @@ The following features are not available in {{es-serverless}} and are not planne * [Custom plugins and bundles](/deploy-manage/deploy/elastic-cloud/upload-custom-plugins-bundles.md) * [{{es}} for Apache Hadoop](elasticsearch-hadoop://reference/index.md) -* [Scripted metric aggregations](elasticsearch://reference/data-analysis/aggregations/search-aggregations-metrics-scripted-metric-aggregation.md) +* [Scripted metric aggregations](elasticsearch://reference/aggregations/search-aggregations-metrics-scripted-metric-aggregation.md) * Managed web crawler: You can use the [self-managed web crawler](https://github.com/elastic/crawler) instead. -* Managed Search connectors: You can use [self-managed Search connectors](elasticsearch://reference/ingestion-tools/search-connectors/self-managed-connectors.md) instead. 
\ No newline at end of file +* Managed Search connectors: You can use [self-managed Search connectors](elasticsearch://reference/search-connectors/self-managed-connectors.md) instead. \ No newline at end of file diff --git a/deploy-manage/deploy/self-managed/air-gapped-install.md b/deploy-manage/deploy/self-managed/air-gapped-install.md index ac497f900..541fa7560 100644 --- a/deploy-manage/deploy/self-managed/air-gapped-install.md +++ b/deploy-manage/deploy/self-managed/air-gapped-install.md @@ -19,7 +19,7 @@ Air-gapped install of {{es}} may require additional steps in order to access som Specifically: -* To be able to use the GeoIP processor, refer to [the GeoIP processor documentation](elasticsearch://reference/ingestion-tools/enrich-processor/geoip-processor.md#manually-update-geoip-databases) for instructions on downloading and deploying the required databases. +* To be able to use the GeoIP processor, refer to [the GeoIP processor documentation](elasticsearch://reference/enrich-processor/geoip-processor.md#manually-update-geoip-databases) for instructions on downloading and deploying the required databases. * Refer to [{{ml-cap}}](/deploy-manage/deploy/self-managed/air-gapped-install.md#air-gapped-machine-learning) for instructions on deploying the Elastic Learned Sparse EncodeR (ELSER) natural language processing (NLP) model and other trained {{ml}} models. diff --git a/deploy-manage/manage-connectors.md b/deploy-manage/manage-connectors.md index 059a87357..030aac864 100644 --- a/deploy-manage/manage-connectors.md +++ b/deploy-manage/manage-connectors.md @@ -12,7 +12,7 @@ applies_to: Connectors serve as a central place to store connection information for both Elastic and third-party systems. They enable the linking of actions to rules, which execute as background tasks on the {{kib}} server when rule conditions are met. This allows rules to route actions to various destinations such as log files, ticketing systems, and messaging tools. Different {{kib}} apps may have their own rule types, but they typically share connectors. The **{{stack-manage-app}} > {{connectors-ui}}** provides a central location to view and manage all connectors in the current space. ::::{note} -This page is about {{kib}} connectors that integrate with services like generative AI model providers. If you’re looking for Search connectors that synchronize third-party data into {{es}}, refer to [Connector clients](elasticsearch://reference/ingestion-tools/search-connectors/index.md). +This page is about {{kib}} connectors that integrate with services like generative AI model providers. If you’re looking for Search connectors that synchronize third-party data into {{es}}, refer to [Connector clients](elasticsearch://reference/search-connectors/index.md). :::: diff --git a/deploy-manage/production-guidance/optimize-performance/search-speed.md b/deploy-manage/production-guidance/optimize-performance/search-speed.md index 2c3ee5579..53524dd40 100644 --- a/deploy-manage/production-guidance/optimize-performance/search-speed.md +++ b/deploy-manage/production-guidance/optimize-performance/search-speed.md @@ -14,7 +14,7 @@ applies_to: This page provides guidance on tuning {{es}} for faster search performance. While hardware and system-level settings play an important role, the structure of your documents and the design of your queries often have the biggest impact. Use these recommendations to optimize field mappings, caching behavior, and query design for high-throughput, low-latency search at scale. 
::::{note} -Search performance in {{es}} depends on a combination of factors, including how expensive individual queries are, how many searches run in parallel, the number of indices and shards involved, and the overall sharding strategy and shard size. +Search performance in {{es}} depends on a combination of factors, including how expensive individual queries are, how many searches run in parallel, the number of indices and shards involved, and the overall sharding strategy and shard size. These variables influence how the system should be tuned. For example, optimizing for a small number of complex queries differs significantly from optimizing for many lightweight, concurrent searches. @@ -34,7 +34,7 @@ deployment: By default, {{es}} automatically sets its [JVM heap size](/deploy-manage/deploy/self-managed/important-settings-configuration.md#heap-size-settings) to follow this best practice. However, in self-managed or {{eck}} deployments, you have the flexibility to allocate even more memory to the filesystem cache, which can lead to performance improvements depending on your workload. ::::{note} -On Linux, the filesystem cache uses any memory not actively used by applications. To allocate memory to the cache, ensure that enough system memory remains available and is not consumed by {{es}} or other processes. +On Linux, the filesystem cache uses any memory not actively used by applications. To allocate memory to the cache, ensure that enough system memory remains available and is not consumed by {{es}} or other processes. :::: ## Avoid page cache thrashing by using modest readahead values on Linux [_avoid_page_cache_thrashing_by_using_modest_readahead_values_on_linux] @@ -117,7 +117,7 @@ PUT movies ## Pre-index data [_pre_index_data] -You should leverage patterns in your queries to optimize the way data is indexed. For instance, if all your documents have a `price` field and most queries run [`range`](elasticsearch://reference/data-analysis/aggregations/search-aggregations-bucket-range-aggregation.md) aggregations on a fixed list of ranges, you could make this aggregation faster by pre-indexing the ranges into the index and using a [`terms`](elasticsearch://reference/data-analysis/aggregations/search-aggregations-bucket-terms-aggregation.md) aggregations. +You should leverage patterns in your queries to optimize the way data is indexed. For instance, if all your documents have a `price` field and most queries run [`range`](elasticsearch://reference/aggregations/search-aggregations-bucket-range-aggregation.md) aggregations on a fixed list of ranges, you could make this aggregation faster by pre-indexing the ranges into the index and using a [`terms`](elasticsearch://reference/aggregations/search-aggregations-bucket-terms-aggregation.md) aggregation. For instance, if documents look like: diff --git a/deploy-manage/users-roles/cluster-or-deployment-auth/controlling-access-at-document-field-level.md b/deploy-manage/users-roles/cluster-or-deployment-auth/controlling-access-at-document-field-level.md index 5c52dbfa1..20fe71fe9 100644 --- a/deploy-manage/users-roles/cluster-or-deployment-auth/controlling-access-at-document-field-level.md +++ b/deploy-manage/users-roles/cluster-or-deployment-auth/controlling-access-at-document-field-level.md @@ -188,15 +188,15 @@ POST /_security/role/example3 ### Pre-processing documents to add security details [set-security-user-processor] -To guarantee that a user reads only their own documents, it makes sense to set up document level security. 
In this scenario, each document must have the username or role name associated with it, so that this information can be used by the role query for document level security. This is a situation where the [set security user processor](elasticsearch://reference/ingestion-tools/enrich-processor/ingest-node-set-security-user-processor.md) ingest processor can help. +To guarantee that a user reads only their own documents, it makes sense to set up document level security. In this scenario, each document must have the username or role name associated with it, so that this information can be used by the role query for document level security. This is a situation where the [set security user processor](elasticsearch://reference/enrich-processor/ingest-node-set-security-user-processor.md) ingest processor can help. ::::{note} Document level security doesn’t apply to write APIs. You must use unique ids for each user that uses the same data stream or index, otherwise they might overwrite other users' documents. The ingest processor just adds properties for the current authenticated user to the documents that are being indexed. :::: -The [set security user processor](elasticsearch://reference/ingestion-tools/enrich-processor/ingest-node-set-security-user-processor.md) attaches user-related details (such as `username`, `roles`, `email`, `full_name` and `metadata` ) from the current authenticated user to the current document by pre-processing the ingest. When you index data with an ingest pipeline, user details are automatically attached to the document. If the authenticating credential is an API key, the API key `id`, `name` and `metadata` (if it exists and is non-empty) are also attached to the document. +The [set security user processor](elasticsearch://reference/enrich-processor/ingest-node-set-security-user-processor.md) attaches user-related details (such as `username`, `roles`, `email`, `full_name` and `metadata` ) from the current authenticated user to the current document by pre-processing the ingest. When you index data with an ingest pipeline, user details are automatically attached to the document. If the authenticating credential is an API key, the API key `id`, `name` and `metadata` (if it exists and is non-empty) are also attached to the document. -For more information, see [Ingest pipelines](/manage-data/ingest/transform-enrich/ingest-pipelines.md) and [Set security user](elasticsearch://reference/ingestion-tools/enrich-processor/ingest-node-set-security-user-processor.md). +For more information, see [Ingest pipelines](/manage-data/ingest/transform-enrich/ingest-pipelines.md) and [Set security user](elasticsearch://reference/enrich-processor/ingest-node-set-security-user-processor.md). ## Field level security [field-level-security] diff --git a/explore-analyze/alerts-cases/alerts/rule-type-es-query.md b/explore-analyze/alerts-cases/alerts/rule-type-es-query.md index 50a6ae59d..2a2db3343 100644 --- a/explore-analyze/alerts-cases/alerts/rule-type-es-query.md +++ b/explore-analyze/alerts-cases/alerts/rule-type-es-query.md @@ -52,7 +52,7 @@ When you create an {{es}} query rule, your choice of query type affects the info : Specify how to calculate the value that is compared to the threshold. The value is calculated by aggregating a numeric field within the time window. The aggregation options are: `count`, `average`, `sum`, `min`, and `max`. When using `count` the document count is used and an aggregation field is not necessary. 
Over or Grouped Over - : Specify whether the aggregation is applied over all documents or split into groups using up to four grouping fields. If you choose to use grouping, it’s a [terms](elasticsearch://reference/data-analysis/aggregations/search-aggregations-bucket-terms-aggregation.md) or [multi terms aggregation](elasticsearch://reference/data-analysis/aggregations/search-aggregations-bucket-multi-terms-aggregation.md); an alert will be created for each unique set of values when it meets the condition. To limit the number of alerts on high cardinality fields, you must specify the number of groups to check against the threshold. Only the top groups are checked. + : Specify whether the aggregation is applied over all documents or split into groups using up to four grouping fields. If you choose to use grouping, it’s a [terms](elasticsearch://reference/aggregations/search-aggregations-bucket-terms-aggregation.md) or [multi terms aggregation](elasticsearch://reference/aggregations/search-aggregations-bucket-multi-terms-aggregation.md); an alert will be created for each unique set of values when it meets the condition. To limit the number of alerts on high cardinality fields, you must specify the number of groups to check against the threshold. Only the top groups are checked. Threshold : Defines a threshold value and a comparison operator (`is above`, `is above or equals`, `is below`, `is below or equals`, or `is between`). The value calculated by the aggregation is compared to this threshold. diff --git a/explore-analyze/elastic-inference/inference-api.md b/explore-analyze/elastic-inference/inference-api.md index 700327b37..b0ca89105 100644 --- a/explore-analyze/elastic-inference/inference-api.md +++ b/explore-analyze/elastic-inference/inference-api.md @@ -61,7 +61,7 @@ Your {{es}} deployment contains preconfigured {{infer}} endpoints which makes th * `.elser-2-elasticsearch`: uses the [ELSER](../../explore-analyze/machine-learning/nlp/ml-nlp-elser.md) built-in trained model for `sparse_embedding` tasks (recommended for English language tex). The `model_id` is `.elser_model_2_linux-x86_64`. * `.multilingual-e5-small-elasticsearch`: uses the [E5](../../explore-analyze/machine-learning/nlp/ml-nlp-e5.md) built-in trained model for `text_embedding` tasks (recommended for non-English language texts). The `model_id` is `.e5_model_2_linux-x86_64`. -Use the `inference_id` of the endpoint in a [`semantic_text`](elasticsearch://reference/elasticsearch/mapping-reference/semantic-text.md) field definition or when creating an [{{infer}} processor](elasticsearch://reference/ingestion-tools/enrich-processor/inference-processor.md). The API call will automatically download and deploy the model which might take a couple of minutes. Default {{infer}} enpoints have adaptive allocations enabled. For these models, the minimum number of allocations is `0`. If there is no {{infer}} activity that uses the endpoint, the number of allocations will scale down to `0` automatically after 15 minutes. +Use the `inference_id` of the endpoint in a [`semantic_text`](elasticsearch://reference/elasticsearch/mapping-reference/semantic-text.md) field definition or when creating an [{{infer}} processor](elasticsearch://reference/enrich-processor/inference-processor.md). The API call will automatically download and deploy the model which might take a couple of minutes. Default {{infer}} endpoints have adaptive allocations enabled. For these models, the minimum number of allocations is `0`. 
If there is no {{infer}} activity that uses the endpoint, the number of allocations will scale down to `0` automatically after 15 minutes. ## Configuring chunking [infer-chunking-config] diff --git a/explore-analyze/geospatial-analysis.md b/explore-analyze/geospatial-analysis.md index 41505476b..c75063839 100644 --- a/explore-analyze/geospatial-analysis.md +++ b/explore-analyze/geospatial-analysis.md @@ -24,9 +24,9 @@ Have an index with lat/lon pairs but no geo_point mapping? Use [runtime fields]( Data is often messy and incomplete. [Ingest pipelines](../manage-data/ingest/transform-enrich/ingest-pipelines.md) lets you clean, transform, and augment your data before indexing. -* Use [CSV](elasticsearch://reference/ingestion-tools/enrich-processor/csv-processor.md) together with [explicit mapping](../manage-data/data-store/mapping/explicit-mapping.md) to index CSV files with geo data. Kibana’s [Import CSV](visualize/maps/import-geospatial-data.md) feature can help with this. -* Use [GeoIP](elasticsearch://reference/ingestion-tools/enrich-processor/geoip-processor.md) to add geographical location of an IPv4 or IPv6 address. -* Use [geo-grid processor](elasticsearch://reference/ingestion-tools/enrich-processor/ingest-geo-grid-processor.md) to convert grid tiles or hexagonal cell ids to bounding boxes or polygons which describe their shape. +* Use [CSV](elasticsearch://reference/enrich-processor/csv-processor.md) together with [explicit mapping](../manage-data/data-store/mapping/explicit-mapping.md) to index CSV files with geo data. Kibana’s [Import CSV](visualize/maps/import-geospatial-data.md) feature can help with this. +* Use [GeoIP](elasticsearch://reference/enrich-processor/geoip-processor.md) to add the geographical location of an IPv4 or IPv6 address. +* Use [geo-grid processor](elasticsearch://reference/enrich-processor/ingest-geo-grid-processor.md) to convert grid tiles or hexagonal cell ids to bounding boxes or polygons which describe their shape. * Use [geo_match enrich policy](../manage-data/ingest/transform-enrich/example-enrich-data-based-on-geolocation.md) for reverse geocoding. For example, use [reverse geocoding](visualize/maps/reverse-geocoding-tutorial.md) to visualize metropolitan areas by web traffic. @@ -48,22 +48,22 @@ Data is often messy and incomplete. [Ingest pipelines](../manage-data/ingest/tra ## Aggregate [geospatial-aggregate] -[Aggregations](query-filter/aggregations.md) summarizes your data as metrics, statistics, or other analytics. Use [bucket aggregations](elasticsearch://reference/data-analysis/aggregations/bucket.md) to group documents into buckets, also called bins, based on field values, ranges, or other criteria. Then, use [metric aggregations](elasticsearch://reference/data-analysis/aggregations/metrics.md) to calculate metrics, such as a sum or average, from field values in each bucket. Compare metrics across buckets to gain insights from your data. +[Aggregations](query-filter/aggregations.md) summarize your data as metrics, statistics, or other analytics. Use [bucket aggregations](elasticsearch://reference/aggregations/bucket.md) to group documents into buckets, also called bins, based on field values, ranges, or other criteria. Then, use [metric aggregations](elasticsearch://reference/aggregations/metrics.md) to calculate metrics, such as a sum or average, from field values in each bucket. Compare metrics across buckets to gain insights from your data. 
Geospatial bucket aggregations: -* [Geo-distance aggregation](elasticsearch://reference/data-analysis/aggregations/search-aggregations-bucket-geodistance-aggregation.md) evaluates the distance of each geo_point location from an origin point and determines the buckets it belongs to based on the ranges (a document belongs to a bucket if the distance between the document and the origin falls within the distance range of the bucket). -* [Geohash grid aggregation](elasticsearch://reference/data-analysis/aggregations/search-aggregations-bucket-geohashgrid-aggregation.md) groups geo_point and geo_shape values into buckets that represent a grid. -* [Geohex grid aggregation](elasticsearch://reference/data-analysis/aggregations/search-aggregations-bucket-geohexgrid-aggregation.md) groups geo_point and geo_shape values into buckets that represent an H3 hexagonal cell. -* [Geotile grid aggregation](elasticsearch://reference/data-analysis/aggregations/search-aggregations-bucket-geotilegrid-aggregation.md) groups geo_point and geo_shape values into buckets that represent a grid. Each cell corresponds to a [map tile](https://en.wikipedia.org/wiki/Tiled_web_map) as used by many online map sites. +* [Geo-distance aggregation](elasticsearch://reference/aggregations/search-aggregations-bucket-geodistance-aggregation.md) evaluates the distance of each geo_point location from an origin point and determines the buckets it belongs to based on the ranges (a document belongs to a bucket if the distance between the document and the origin falls within the distance range of the bucket). +* [Geohash grid aggregation](elasticsearch://reference/aggregations/search-aggregations-bucket-geohashgrid-aggregation.md) groups geo_point and geo_shape values into buckets that represent a grid. +* [Geohex grid aggregation](elasticsearch://reference/aggregations/search-aggregations-bucket-geohexgrid-aggregation.md) groups geo_point and geo_shape values into buckets that represent an H3 hexagonal cell. +* [Geotile grid aggregation](elasticsearch://reference/aggregations/search-aggregations-bucket-geotilegrid-aggregation.md) groups geo_point and geo_shape values into buckets that represent a grid. Each cell corresponds to a [map tile](https://en.wikipedia.org/wiki/Tiled_web_map) as used by many online map sites. Geospatial metric aggregations: -* [Geo-bounds aggregation](elasticsearch://reference/data-analysis/aggregations/search-aggregations-metrics-geobounds-aggregation.md) computes the geographic bounding box containing all values for a Geopoint or Geoshape field. -* [Geo-centroid aggregation](elasticsearch://reference/data-analysis/aggregations/search-aggregations-metrics-geocentroid-aggregation.md) computes the weighted centroid from all coordinate values for geo fields. -* [Geo-line aggregation](elasticsearch://reference/data-analysis/aggregations/search-aggregations-metrics-geo-line.md) aggregates all geo_point values within a bucket into a LineString ordered by the chosen sort field. Use geo_line aggregation to create [vehicle tracks](visualize/maps/asset-tracking-tutorial.md). +* [Geo-bounds aggregation](elasticsearch://reference/aggregations/search-aggregations-metrics-geobounds-aggregation.md) computes the geographic bounding box containing all values for a Geopoint or Geoshape field. +* [Geo-centroid aggregation](elasticsearch://reference/aggregations/search-aggregations-metrics-geocentroid-aggregation.md) computes the weighted centroid from all coordinate values for geo fields. 
+* [Geo-line aggregation](elasticsearch://reference/aggregations/search-aggregations-metrics-geo-line.md) aggregates all geo_point values within a bucket into a LineString ordered by the chosen sort field. Use geo_line aggregation to create [vehicle tracks](visualize/maps/asset-tracking-tutorial.md). -Combine aggregations to perform complex geospatial analysis. For example, to calculate the most recent GPS tracks per flight, use a [terms aggregation](elasticsearch://reference/data-analysis/aggregations/search-aggregations-bucket-terms-aggregation.md) to group documents into buckets per aircraft. Then use geo-line aggregation to compute a track for each aircraft. In another example, use geotile grid aggregation to group documents into a grid. Then use geo-centroid aggregation to find the weighted centroid of each grid cell. +Combine aggregations to perform complex geospatial analysis. For example, to calculate the most recent GPS tracks per flight, use a [terms aggregation](elasticsearch://reference/aggregations/search-aggregations-bucket-terms-aggregation.md) to group documents into buckets per aircraft. Then use geo-line aggregation to compute a track for each aircraft. In another example, use geotile grid aggregation to group documents into a grid. Then use geo-centroid aggregation to find the weighted centroid of each grid cell. ## Integrate [geospatial-integrate] diff --git a/explore-analyze/machine-learning/anomaly-detection/geographic-anomalies.md b/explore-analyze/machine-learning/anomaly-detection/geographic-anomalies.md index c31b2a58e..bad0afb6e 100644 --- a/explore-analyze/machine-learning/anomaly-detection/geographic-anomalies.md +++ b/explore-analyze/machine-learning/anomaly-detection/geographic-anomalies.md @@ -17,7 +17,7 @@ To run this type of {{anomaly-job}}, you must have [{{ml-features}} set up](../s * two comma-separated numbers of the form `latitude,longitude`, * a [`geo_point`](elasticsearch://reference/elasticsearch/mapping-reference/geo-point.md) field, * a [`geo_shape`](elasticsearch://reference/elasticsearch/mapping-reference/geo-shape.md) field that contains point values, or -* a [`geo_centroid`](elasticsearch://reference/data-analysis/aggregations/search-aggregations-metrics-geocentroid-aggregation.md) aggregation +* a [`geo_centroid`](elasticsearch://reference/aggregations/search-aggregations-metrics-geocentroid-aggregation.md) aggregation The latitude and longitude must be in the range -180 to 180 and represent a point on the surface of the Earth. diff --git a/explore-analyze/machine-learning/anomaly-detection/ml-configuring-aggregation.md b/explore-analyze/machine-learning/anomaly-detection/ml-configuring-aggregation.md index 6cca858c1..1e559739c 100644 --- a/explore-analyze/machine-learning/anomaly-detection/ml-configuring-aggregation.md +++ b/explore-analyze/machine-learning/anomaly-detection/ml-configuring-aggregation.md @@ -34,13 +34,13 @@ There are a number of requirements for using aggregations in {{dfeeds}}. * If your [{{dfeed}} uses aggregations with nested `terms` aggs](#aggs-dfeeds) and model plot is not enabled for the {{anomaly-job}}, neither the **Single Metric Viewer** nor the **Anomaly Explorer** can plot and display an anomaly chart. In these cases, an explanatory message is shown instead of the chart. * Your {{dfeed}} can contain multiple aggregations, but only the ones with names that match values in the job configuration are fed to the job. 
-* Using [scripted metric](elasticsearch://reference/data-analysis/aggregations/search-aggregations-metrics-scripted-metric-aggregation.md) aggregations is not supported in {{dfeeds}}. +* Using [scripted metric](elasticsearch://reference/aggregations/search-aggregations-metrics-scripted-metric-aggregation.md) aggregations is not supported in {{dfeeds}}. ## Recommendations [aggs-recommendations-dfeeds] * When your detectors use [metric](/reference/data-analysis/machine-learning/ml-metric-functions.md) or [sum](/reference/data-analysis/machine-learning/ml-sum-functions.md) analytical functions, it’s recommended to set the `date_histogram` or `composite` aggregation interval to a tenth of the bucket span. This creates finer, more granular time buckets, which are ideal for this type of analysis. * When your detectors use [count](/reference/data-analysis/machine-learning/ml-count-functions.md) or [rare](/reference/data-analysis/machine-learning/ml-rare-functions.md) functions, set the interval to the same value as the bucket span. -* If you have multiple influencers or partition fields or if your field cardinality is more than 1000, use [composite aggregations](elasticsearch://reference/data-analysis/aggregations/search-aggregations-bucket-composite-aggregation.md). +* If you have multiple influencers or partition fields or if your field cardinality is more than 1000, use [composite aggregations](elasticsearch://reference/aggregations/search-aggregations-bucket-composite-aggregation.md). To determine the cardinality of your data, you can run searches such as: @@ -254,10 +254,10 @@ Use the following format to define a composite aggregation in your {{dfeed}}: You can also use complex nested aggregations in {{dfeeds}}. -The next example uses the [`derivative` pipeline aggregation](elasticsearch://reference/data-analysis/aggregations/search-aggregations-pipeline-derivative-aggregation.md) to find the first order derivative of the counter `system.network.out.bytes` for each value of the field `beat.name`. +The next example uses the [`derivative` pipeline aggregation](elasticsearch://reference/aggregations/search-aggregations-pipeline-derivative-aggregation.md) to find the first order derivative of the counter `system.network.out.bytes` for each value of the field `beat.name`. ::::{note} -`derivative` or other pipeline aggregations may not work within `composite` aggregations. See [composite aggregations and pipeline aggregations](elasticsearch://reference/data-analysis/aggregations/search-aggregations-bucket-composite-aggregation.md#search-aggregations-bucket-composite-aggregation-pipeline-aggregations). +`derivative` or other pipeline aggregations may not work within `composite` aggregations. See [composite aggregations and pipeline aggregations](elasticsearch://reference/aggregations/search-aggregations-bucket-composite-aggregation.md#search-aggregations-bucket-composite-aggregation-pipeline-aggregations). 
:::: ```js diff --git a/explore-analyze/machine-learning/anomaly-detection/ml-configuring-categories.md b/explore-analyze/machine-learning/anomaly-detection/ml-configuring-categories.md index 7b00286ae..3ed9286e8 100644 --- a/explore-analyze/machine-learning/anomaly-detection/ml-configuring-categories.md +++ b/explore-analyze/machine-learning/anomaly-detection/ml-configuring-categories.md @@ -101,7 +101,7 @@ If you use the categorization wizard in {{kib}}, you can see which categorizatio :screenshot: ::: -The categorization analyzer can refer to a built-in {{es}} analyzer or a combination of zero or more character filters, a tokenizer, and zero or more token filters. In this example, adding a [`pattern_replace` character filter](elasticsearch://reference/data-analysis/text-analysis/analysis-pattern-replace-charfilter.md) achieves the same behavior as the `categorization_filters` job configuration option described earlier. For more details about these properties, refer to the [`categorization_analyzer` API object](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ml-put-job#ml-put-job-request-body). +The categorization analyzer can refer to a built-in {{es}} analyzer or a combination of zero or more character filters, a tokenizer, and zero or more token filters. In this example, adding a [`pattern_replace` character filter](elasticsearch://reference/text-analysis/analysis-pattern-replace-charfilter.md) achieves the same behavior as the `categorization_filters` job configuration option described earlier. For more details about these properties, refer to the [`categorization_analyzer` API object](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ml-put-job#ml-put-job-request-body). If you use the default categorization analyzer in {{kib}} or omit the `categorization_analyzer` property from the API, the following default values are used: @@ -137,7 +137,7 @@ POST _ml/anomaly_detectors/_validate If you specify any part of the `categorization_analyzer`, however, any omitted sub-properties are *not* set to default values. -The `ml_standard` tokenizer and the day and month stopword filter are almost equivalent to the following analyzer, which is defined using only built-in {{es}} [tokenizers](elasticsearch://reference/data-analysis/text-analysis/tokenizer-reference.md) and [token filters](elasticsearch://reference/data-analysis/text-analysis/token-filter-reference.md): +The `ml_standard` tokenizer and the day and month stopword filter are almost equivalent to the following analyzer, which is defined using only built-in {{es}} [tokenizers](elasticsearch://reference/text-analysis/tokenizer-reference.md) and [token filters](elasticsearch://reference/text-analysis/token-filter-reference.md): ```console PUT _ml/anomaly_detectors/it_ops_new_logs diff --git a/explore-analyze/machine-learning/anomaly-detection/ml-limitations.md b/explore-analyze/machine-learning/anomaly-detection/ml-limitations.md index 8bb60daa0..4c41dfdfe 100644 --- a/explore-analyze/machine-learning/anomaly-detection/ml-limitations.md +++ b/explore-analyze/machine-learning/anomaly-detection/ml-limitations.md @@ -40,7 +40,7 @@ If you send pre-aggregated data to a job for analysis, you must ensure that the ### Scripted metric aggregations are not supported [_scripted_metric_aggregations_are_not_supported] -Using [scripted metric aggregations](elasticsearch://reference/data-analysis/aggregations/search-aggregations-metrics-scripted-metric-aggregation.md) in {{dfeeds}} is not supported. 
Refer to the [Aggregating data for faster performance](ml-configuring-aggregation.md) page to learn more about aggregations in {{dfeeds}}. +Using [scripted metric aggregations](elasticsearch://reference/aggregations/search-aggregations-metrics-scripted-metric-aggregation.md) in {{dfeeds}} is not supported. Refer to the [Aggregating data for faster performance](ml-configuring-aggregation.md) page to learn more about aggregations in {{dfeeds}}. ### Fields named "by", "count", or "over" cannot be used to split data [_fields_named_by_count_or_over_cannot_be_used_to_split_data] @@ -124,7 +124,7 @@ In {{kib}}, **Anomaly Explorer** and **Single Metric Viewer** charts are not dis * for anomalies that were due to categorization (if model plot is not enabled), * if the {{dfeed}} uses scripted fields and model plot is not enabled (except for scripts that define metric fields), -* if the {{dfeed}} uses [composite aggregations](elasticsearch://reference/data-analysis/aggregations/search-aggregations-bucket-composite-aggregation.md) that have composite sources other than `terms` and `date_histogram`, +* if the {{dfeed}} uses [composite aggregations](elasticsearch://reference/aggregations/search-aggregations-bucket-composite-aggregation.md) that have composite sources other than `terms` and `date_histogram`, * if your [{{dfeed}} uses aggregations with nested `terms` aggs](ml-configuring-aggregation.md#aggs-dfeeds) and model plot is not enabled, * `freq_rare` functions, * `info_content`, `high_info_content`, `low_info_content` functions, diff --git a/explore-analyze/machine-learning/data-frame-analytics/ml-dfa-classification.md b/explore-analyze/machine-learning/data-frame-analytics/ml-dfa-classification.md index ece68f955..faad97da9 100644 --- a/explore-analyze/machine-learning/data-frame-analytics/ml-dfa-classification.md +++ b/explore-analyze/machine-learning/data-frame-analytics/ml-dfa-classification.md @@ -193,13 +193,13 @@ For instance, suppose you have an online service and you would like to predict w {{infer-cap}} can be used as a processor specified in an [ingest pipeline](../../../manage-data/ingest/transform-enrich/ingest-pipelines.md). It uses a trained model to infer against the data that is being ingested in the pipeline. The model is used on the ingest node. {{infer-cap}} pre-processes the data by using the model and provides a prediction. After the process, the pipeline continues executing (if there is any other processor in the pipeline), finally the new data together with the results are indexed into the destination index. -Check the [{{infer}} processor](elasticsearch://reference/ingestion-tools/enrich-processor/inference-processor.md) and [the {{ml}} {{dfanalytics}} API documentation](https://www.elastic.co/docs/api/doc/elasticsearch/group/endpoint-ml-data-frame) to learn more. +Check the [{{infer}} processor](elasticsearch://reference/enrich-processor/inference-processor.md) and [the {{ml}} {{dfanalytics}} API documentation](https://www.elastic.co/docs/api/doc/elasticsearch/group/endpoint-ml-data-frame) to learn more. #### {{infer-cap}} aggregation [ml-inference-aggregation-class] {{infer-cap}} can also be used as a pipeline aggregation. You can reference a trained model in the aggregation to infer on the result field of the parent bucket aggregation. The {{infer}} aggregation uses the model on the results to provide a prediction. This aggregation enables you to run {{classification}} or {{reganalysis}} at search time. 
If you want to perform the analysis on a small set of data, this aggregation enables you to generate predictions without the need to set up a processor in the ingest pipeline. -Check the [{{infer}} bucket aggregation](elasticsearch://reference/data-analysis/aggregations/search-aggregations-pipeline-inference-bucket-aggregation.md) and [the {{ml}} {{dfanalytics}} API documentation](https://www.elastic.co/docs/api/doc/elasticsearch/group/endpoint-ml-data-frame) to learn more. +Check the [{{infer}} bucket aggregation](elasticsearch://reference/aggregations/search-aggregations-pipeline-inference-bucket-aggregation.md) and [the {{ml}} {{dfanalytics}} API documentation](https://www.elastic.co/docs/api/doc/elasticsearch/group/endpoint-ml-data-frame) to learn more. ::::{note} If you use trained model aliases to reference your trained model in an {{infer}} processor or {{infer}} aggregation, you can replace your trained model with a new one without the need of updating the processor or the aggregation. Reassign the alias you used to a new trained model ID by using the [Create or update trained model aliases API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ml-put-trained-model-alias). The new trained model needs to use the same type of {{dfanalytics}} as the old one. diff --git a/explore-analyze/machine-learning/data-frame-analytics/ml-dfa-regression.md b/explore-analyze/machine-learning/data-frame-analytics/ml-dfa-regression.md index d90a5e880..69149af5f 100644 --- a/explore-analyze/machine-learning/data-frame-analytics/ml-dfa-regression.md +++ b/explore-analyze/machine-learning/data-frame-analytics/ml-dfa-regression.md @@ -139,13 +139,13 @@ For instance, suppose you have an online service and you would like to predict w {{infer-cap}} can be used as a processor specified in an [ingest pipeline](../../../manage-data/ingest/transform-enrich/ingest-pipelines.md). It uses a trained model to infer against the data that is being ingested in the pipeline. The model is used on the ingest node. {{infer-cap}} pre-processes the data by using the model and provides a prediction. After the process, the pipeline continues executing (if there is any other processor in the pipeline), finally the new data together with the results are indexed into the destination index. -Check the [{{infer}} processor](elasticsearch://reference/ingestion-tools/enrich-processor/inference-processor.md) and [the {{ml}} {{dfanalytics}} API documentation](https://www.elastic.co/docs/api/doc/elasticsearch/group/endpoint-ml-data-frame) to learn more. +Check the [{{infer}} processor](elasticsearch://reference/enrich-processor/inference-processor.md) and [the {{ml}} {{dfanalytics}} API documentation](https://www.elastic.co/docs/api/doc/elasticsearch/group/endpoint-ml-data-frame) to learn more. #### {{infer-cap}} aggregation [ml-inference-aggregation-reg] {{infer-cap}} can also be used as a pipeline aggregation. You can reference a trained model in the aggregation to infer on the result field of the parent bucket aggregation. The {{infer}} aggregation uses the model on the results to provide a prediction. This aggregation enables you to run {{classification}} or {{reganalysis}} at search time. If you want to perform the analysis on a small set of data, this aggregation enables you to generate predictions without the need to set up a processor in the ingest pipeline. 
-Check the [{{infer}} bucket aggregation](elasticsearch://reference/data-analysis/aggregations/search-aggregations-pipeline-inference-bucket-aggregation.md) and [the {{ml}} {{dfanalytics}} API documentation](https://www.elastic.co/docs/api/doc/elasticsearch/group/endpoint-ml-data-frame) to learn more. +Check the [{{infer}} bucket aggregation](elasticsearch://reference/aggregations/search-aggregations-pipeline-inference-bucket-aggregation.md) and [the {{ml}} {{dfanalytics}} API documentation](https://www.elastic.co/docs/api/doc/elasticsearch/group/endpoint-ml-data-frame) to learn more. ::::{note} If you use trained model aliases to reference your trained model in an {{infer}} processor or {{infer}} aggregation, you can replace your trained model with a new one without the need of updating the processor or the aggregation. Reassign the alias you used to a new trained model ID by using the [Create or update trained model aliases API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ml-put-trained-model-alias). The new trained model needs to use the same type of {{dfanalytics}} as the old one. diff --git a/explore-analyze/machine-learning/machine-learning-in-kibana/xpack-ml-aiops.md b/explore-analyze/machine-learning/machine-learning-in-kibana/xpack-ml-aiops.md index 4b6d647e3..cf7e17809 100644 --- a/explore-analyze/machine-learning/machine-learning-in-kibana/xpack-ml-aiops.md +++ b/explore-analyze/machine-learning/machine-learning-in-kibana/xpack-ml-aiops.md @@ -52,7 +52,7 @@ Select a field for categorization and optionally apply any filters that you want This functionality is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features. :::: -Change point detection uses the [change point aggregation](elasticsearch://reference/data-analysis/aggregations/search-aggregations-change-point-aggregation.md) to detect distribution changes, trend changes, and other statistically significant change points in a metric of your time series data. +Change point detection uses the [change point aggregation](elasticsearch://reference/aggregations/search-aggregations-change-point-aggregation.md) to detect distribution changes, trend changes, and other statistically significant change points in a metric of your time series data. You can find change point detection under **{{ml-app}}** > **AIOps Labs** or by using the [global search field](/explore-analyze/find-and-organize/find-apps-and-objects.md). Here, you can select the {{data-source}} or saved Discover session that you want to analyze. diff --git a/explore-analyze/machine-learning/nlp/ml-nlp-elser.md b/explore-analyze/machine-learning/nlp/ml-nlp-elser.md index 5c9d004db..a0e3755c4 100644 --- a/explore-analyze/machine-learning/nlp/ml-nlp-elser.md +++ b/explore-analyze/machine-learning/nlp/ml-nlp-elser.md @@ -284,7 +284,7 @@ To learn more about ELSER performance, refer to the [Benchmark information](#els ## Pre-cleaning input text [pre-cleaning] -The quality of the input text significantly affects the quality of the embeddings. To achieve the best results, it’s recommended to clean the input text before generating embeddings. The exact preprocessing you may need to do heavily depends on your text. 
For example, if your text contains HTML tags, use the [HTML strip processor](elasticsearch://reference/ingestion-tools/enrich-processor/htmlstrip-processor.md) in an ingest pipeline to remove unnecessary elements. Always review and clean your input text before ingestion to eliminate any irrelevant entities that might affect the results. +The quality of the input text significantly affects the quality of the embeddings. To achieve the best results, it’s recommended to clean the input text before generating embeddings. The exact preprocessing you may need to do heavily depends on your text. For example, if your text contains HTML tags, use the [HTML strip processor](elasticsearch://reference/enrich-processor/htmlstrip-processor.md) in an ingest pipeline to remove unnecessary elements. Always review and clean your input text before ingestion to eliminate any irrelevant entities that might affect the results. ## Recommendations for using ELSER [elser-recommendations] diff --git a/explore-analyze/machine-learning/nlp/ml-nlp-inference.md b/explore-analyze/machine-learning/nlp/ml-nlp-inference.md index 8b8dd08f6..48fc52e27 100644 --- a/explore-analyze/machine-learning/nlp/ml-nlp-inference.md +++ b/explore-analyze/machine-learning/nlp/ml-nlp-inference.md @@ -25,7 +25,7 @@ In {{kib}}, you can create and edit pipelines in **{{stack-manage-app}}** > **In ::: 1. Click **Create pipeline** or edit an existing pipeline. -2. Add an [{{infer}} processor](elasticsearch://reference/ingestion-tools/enrich-processor/inference-processor.md) to your pipeline: +2. Add an [{{infer}} processor](elasticsearch://reference/enrich-processor/inference-processor.md) to your pipeline: 1. Click **Add a processor** and select the **{{infer-cap}}** processor type. 2. Set **Model ID** to the name of your trained model, for example `elastic__distilbert-base-cased-finetuned-conll03-english` or `lang_ident_model_1`. @@ -39,7 +39,7 @@ In {{kib}}, you can create and edit pipelines in **{{stack-manage-app}}** > **In } ``` - 2. You can also optionally add [classification configuration options](elasticsearch://reference/ingestion-tools/enrich-processor/inference-processor.md#inference-processor-classification-opt) in the **{{infer-cap}} configuration** section. For example, to include the top five language predictions: + 2. You can also optionally add [classification configuration options](elasticsearch://reference/enrich-processor/inference-processor.md#inference-processor-classification-opt) in the **{{infer-cap}} configuration** section. For example, to include the top five language predictions: ```js { @@ -51,7 +51,7 @@ In {{kib}}, you can create and edit pipelines in **{{stack-manage-app}}** > **In 4. Click **Add** to save the processor. -3. Optional: Add a [set processor](elasticsearch://reference/ingestion-tools/enrich-processor/set-processor.md) to index the ingest timestamp. +3. Optional: Add a [set processor](elasticsearch://reference/enrich-processor/set-processor.md) to index the ingest timestamp. 1. Click **Add a processor** and select the **Set** processor type. 2. Choose a name for the field (such as `event.ingested`) and set its value to `{{{_ingest.timestamp}}}`. For more details, refer to [Access ingest metadata in a processor](../../../manage-data/ingest/transform-enrich/ingest-pipelines.md#access-ingest-metadata). 
diff --git a/explore-analyze/machine-learning/nlp/ml-nlp-ner-example.md b/explore-analyze/machine-learning/nlp/ml-nlp-ner-example.md index 472bb06a7..09441c35b 100644 --- a/explore-analyze/machine-learning/nlp/ml-nlp-ner-example.md +++ b/explore-analyze/machine-learning/nlp/ml-nlp-ner-example.md @@ -113,7 +113,7 @@ Using the example text "Elastic is headquartered in Mountain View, California.", ## Add the NER model to an {{infer}} ingest pipeline [ex-ner-ingest] -You can perform bulk {{infer}} on documents as they are ingested by using an [{{infer}} processor](elasticsearch://reference/ingestion-tools/enrich-processor/inference-processor.md) in your ingest pipeline. The novel *Les Misérables* by Victor Hugo is used as an example for {{infer}} in the following example. [Download](https://github.com/elastic/stack-docs/blob/8.5/docs/en/stack/ml/nlp/data/les-miserables-nd.json) the novel text split by paragraph as a JSON file, then upload it by using the [Data Visualizer](../../../manage-data/ingest/upload-data-files.md). Give the new index the name `les-miserables` when uploading the file. +You can perform bulk {{infer}} on documents as they are ingested by using an [{{infer}} processor](elasticsearch://reference/enrich-processor/inference-processor.md) in your ingest pipeline. The novel *Les Misérables* by Victor Hugo is used as an example for {{infer}} in the following example. [Download](https://github.com/elastic/stack-docs/blob/8.5/docs/en/stack/ml/nlp/data/les-miserables-nd.json) the novel text split by paragraph as a JSON file, then upload it by using the [Data Visualizer](../../../manage-data/ingest/upload-data-files.md). Give the new index the name `les-miserables` when uploading the file. Now create an ingest pipeline either in the [Stack management UI](ml-nlp-inference.md#ml-nlp-inference-processor) or by using the API: diff --git a/explore-analyze/machine-learning/nlp/ml-nlp-text-emb-vector-search-example.md b/explore-analyze/machine-learning/nlp/ml-nlp-text-emb-vector-search-example.md index 16bf2f74b..84522f9b2 100644 --- a/explore-analyze/machine-learning/nlp/ml-nlp-text-emb-vector-search-example.md +++ b/explore-analyze/machine-learning/nlp/ml-nlp-text-emb-vector-search-example.md @@ -112,7 +112,7 @@ Upload the file by using the [Data Visualizer](../../../manage-data/ingest/uploa ## Add the text embedding model to an {{infer}} ingest pipeline [ex-text-emb-ingest] -Process the initial data with an [{{infer}} processor](elasticsearch://reference/ingestion-tools/enrich-processor/inference-processor.md). It adds an embedding for each passage. For this, create a text embedding ingest pipeline and then reindex the initial data with this pipeline. +Process the initial data with an [{{infer}} processor](elasticsearch://reference/enrich-processor/inference-processor.md). It adds an embedding for each passage. For this, create a text embedding ingest pipeline and then reindex the initial data with this pipeline. Now create an ingest pipeline either in the [{{stack-manage-app}} UI](ml-nlp-inference.md#ml-nlp-inference-processor) or by using the API: diff --git a/explore-analyze/query-filter/aggregations.md b/explore-analyze/query-filter/aggregations.md index 9fa5985b7..bd0142de1 100644 --- a/explore-analyze/query-filter/aggregations.md +++ b/explore-analyze/query-filter/aggregations.md @@ -17,13 +17,13 @@ An aggregation summarizes your data as metrics, statistics, or other analytics. 
{{es}} organizes aggregations into three categories: -* [Metric](elasticsearch://reference/data-analysis/aggregations/metrics.md) aggregations that calculate metrics, such as a sum or average, from field values. -* [Bucket](elasticsearch://reference/data-analysis/aggregations/bucket.md) aggregations that group documents into buckets, also called bins, based on field values, ranges, or other criteria. -* [Pipeline](elasticsearch://reference/data-analysis/aggregations/pipeline.md) aggregations that take input from other aggregations instead of documents or fields. +* [Metric](elasticsearch://reference/aggregations/metrics.md) aggregations that calculate metrics, such as a sum or average, from field values. +* [Bucket](elasticsearch://reference/aggregations/bucket.md) aggregations that group documents into buckets, also called bins, based on field values, ranges, or other criteria. +* [Pipeline](elasticsearch://reference/aggregations/pipeline.md) aggregations that take input from other aggregations instead of documents or fields. ## Run an aggregation [run-an-agg] -You can run aggregations as part of a [search](../../solutions/search/querying-for-search.md) by specifying the [search API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-search)'s `aggs` parameter. The following search runs a [terms aggregation](elasticsearch://reference/data-analysis/aggregations/search-aggregations-bucket-terms-aggregation.md) on `my-field`: +You can run aggregations as part of a [search](../../solutions/search/querying-for-search.md) by specifying the [search API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-search)'s `aggs` parameter. The following search runs a [terms aggregation](elasticsearch://reference/aggregations/search-aggregations-bucket-terms-aggregation.md) on `my-field`: ```console GET /my-index-000001/_search @@ -137,7 +137,7 @@ GET /my-index-000001/_search ## Run sub-aggregations [run-sub-aggs] -Bucket aggregations support bucket or metric sub-aggregations. For example, a terms aggregation with an [avg](elasticsearch://reference/data-analysis/aggregations/search-aggregations-metrics-avg-aggregation.md) sub-aggregation calculates an average value for each bucket of documents. There is no level or depth limit for nesting sub-aggregations. +Bucket aggregations support bucket or metric sub-aggregations. For example, a terms aggregation with an [avg](elasticsearch://reference/aggregations/search-aggregations-metrics-avg-aggregation.md) sub-aggregation calculates an average value for each bucket of documents. There is no level or depth limit for nesting sub-aggregations. ```console GET /my-index-000001/_search @@ -244,7 +244,7 @@ GET /my-index-000001/_search?typed_keys The response returns the aggregation type as a prefix to the aggregation’s name. ::::{important} -Some aggregations return a different aggregation type from the type in the request. For example, the terms, [significant terms](elasticsearch://reference/data-analysis/aggregations/search-aggregations-bucket-significantterms-aggregation.md), and [percentiles](elasticsearch://reference/data-analysis/aggregations/search-aggregations-metrics-percentile-aggregation.md) aggregations return different aggregations types depending on the data type of the aggregated field. +Some aggregations return a different aggregation type from the type in the request. 
For example, the terms, [significant terms](elasticsearch://reference/aggregations/search-aggregations-bucket-significantterms-aggregation.md), and [percentiles](elasticsearch://reference/aggregations/search-aggregations-metrics-percentile-aggregation.md) aggregations return different aggregation types depending on the data type of the aggregated field. :::: ```console-result @@ -284,7 +284,7 @@ GET /my-index-000001/_search?size=0 } ``` -Scripts calculate field values dynamically, which adds a little overhead to the aggregation. In addition to the time spent calculating, some aggregations like [`terms`](elasticsearch://reference/data-analysis/aggregations/search-aggregations-bucket-terms-aggregation.md) and [`filters`](elasticsearch://reference/data-analysis/aggregations/search-aggregations-bucket-filters-aggregation.md) can’t use some of their optimizations with runtime fields. In total, performance costs for using a runtime field varies from aggregation to aggregation. +Scripts calculate field values dynamically, which adds a little overhead to the aggregation. In addition to the time spent calculating, some aggregations like [`terms`](elasticsearch://reference/aggregations/search-aggregations-bucket-terms-aggregation.md) and [`filters`](elasticsearch://reference/aggregations/search-aggregations-bucket-filters-aggregation.md) can’t use some of their optimizations with runtime fields. In total, performance costs for using a runtime field vary from aggregation to aggregation. ## Aggregation caches [agg-caches] diff --git a/explore-analyze/query-filter/aggregations/tutorial-analyze-ecommerce-data-with-aggregations-using-query-dsl.md b/explore-analyze/query-filter/aggregations/tutorial-analyze-ecommerce-data-with-aggregations-using-query-dsl.md index def3d4f77..2559d0c46 100644 --- a/explore-analyze/query-filter/aggregations/tutorial-analyze-ecommerce-data-with-aggregations-using-query-dsl.md +++ b/explore-analyze/query-filter/aggregations/tutorial-analyze-ecommerce-data-with-aggregations-using-query-dsl.md @@ -290,7 +290,7 @@ Let’s start by calculating important metrics about orders and customers. ### Get average order size [aggregations-tutorial-order-value] -Calculate the average order value across all orders in the dataset using the [`avg`](elasticsearch://reference/data-analysis/aggregations/search-aggregations-metrics-avg-aggregation.md) aggregation. +Calculate the average order value across all orders in the dataset using the [`avg`](elasticsearch://reference/aggregations/search-aggregations-metrics-avg-aggregation.md) aggregation. ```console GET kibana_sample_data_ecommerce/_search @@ -347,7 +347,7 @@ GET kibana_sample_data_ecommerce/_search ### Get multiple order statistics at once [aggregations-tutorial-order-stats] -Calculate multiple statistics about orders in one request using the [`stats`](elasticsearch://reference/data-analysis/aggregations/search-aggregations-metrics-stats-aggregation.md) aggregation. +Calculate multiple statistics about orders in one request using the [`stats`](elasticsearch://reference/aggregations/search-aggregations-metrics-stats-aggregation.md) aggregation. ```console GET kibana_sample_data_ecommerce/_search @@ -391,7 +391,7 @@ GET kibana_sample_data_ecommerce/_search :::: ::::{tip} -The [stats aggregation](elasticsearch://reference/data-analysis/aggregations/search-aggregations-metrics-stats-aggregation.md) is more efficient than running individual min, max, avg, and sum aggregations. 
+The [stats aggregation](elasticsearch://reference/aggregations/search-aggregations-metrics-stats-aggregation.md) is more efficient than running individual min, max, avg, and sum aggregations. :::: @@ -401,7 +401,7 @@ Let’s group orders in different ways to understand sales patterns. ### Break down sales by category [aggregations-tutorial-category-breakdown] -Group orders by category to see which product categories are most popular, using the [`terms`](elasticsearch://reference/data-analysis/aggregations/search-aggregations-bucket-terms-aggregation.md) aggregation. +Group orders by category to see which product categories are most popular, using the [`terms`](elasticsearch://reference/aggregations/search-aggregations-bucket-terms-aggregation.md) aggregation. ```console GET kibana_sample_data_ecommerce/_search @@ -476,7 +476,7 @@ GET kibana_sample_data_ecommerce/_search } ``` -1. Due to Elasticsearch’s distributed architecture, when [terms aggregations](elasticsearch://reference/data-analysis/aggregations/search-aggregations-bucket-terms-aggregation.md) run across multiple shards, the doc counts may have a small margin of error. This value indicates the maximum possible error in the counts. +1. Due to Elasticsearch’s distributed architecture, when [terms aggregations](elasticsearch://reference/aggregations/search-aggregations-bucket-terms-aggregation.md) run across multiple shards, the doc counts may have a small margin of error. This value indicates the maximum possible error in the counts. 2. Count of documents in categories beyond the requested size. 3. Array of category buckets, ordered by count. 4. Category name. @@ -486,7 +486,7 @@ GET kibana_sample_data_ecommerce/_search ### Track daily sales patterns [aggregations-tutorial-daily-sales] -Group orders by day to track daily sales patterns using the [`date_histogram`](elasticsearch://reference/data-analysis/aggregations/search-aggregations-bucket-datehistogram-aggregation.md) aggregation. +Group orders by day to track daily sales patterns using the [`date_histogram`](elasticsearch://reference/aggregations/search-aggregations-bucket-datehistogram-aggregation.md) aggregation. ```console GET kibana_sample_data_ecommerce/_search @@ -507,7 +507,7 @@ GET kibana_sample_data_ecommerce/_search 1. Descriptive name for the time-series aggregation results. 2. The `date_histogram` aggregation groups documents into time-based buckets, similar to terms aggregation but for dates. -3. Uses [calendar and fixed time intervals](elasticsearch://reference/data-analysis/aggregations/search-aggregations-bucket-datehistogram-aggregation.md#calendar_and_fixed_intervals) to handle months with different lengths. `"day"` ensures consistent daily grouping regardless of timezone. +3. Uses [calendar and fixed time intervals](elasticsearch://reference/aggregations/search-aggregations-bucket-datehistogram-aggregation.md#calendar_and_fixed_intervals) to handle months with different lengths. `"day"` ensures consistent daily grouping regardless of timezone. 4. Formats dates in response using [date patterns](elasticsearch://reference/elasticsearch/mapping-reference/mapping-date-format.md) (e.g. "yyyy-MM-dd"). Refer to [date math expressions](elasticsearch://reference/elasticsearch/rest-apis/common-options.md#date-math) for additional options. 5. When `min_doc_count` is 0, returns buckets for days with no orders, useful for continuous time series visualization. 
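For reference, a request that combines the options called out above might look like the following sketch. The aggregation name `daily_orders` is illustrative, and the tutorial's full request body is elided from this hunk, so treat this as an approximation rather than the exact example on the page:

```console
GET kibana_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "daily_orders": {
      "date_histogram": {
        "field": "order_date",
        "calendar_interval": "day",
        "format": "yyyy-MM-dd",
        "min_doc_count": 0
      }
    }
  }
}
```

Setting `size` to 0 omits individual hits, so the response contains only the daily buckets.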
@@ -705,7 +705,7 @@ GET kibana_sample_data_ecommerce/_search ## Combine metrics with groupings [aggregations-tutorial-combined-analysis] -Now let’s calculate [metrics](elasticsearch://reference/data-analysis/aggregations/metrics.md) within each group to get deeper insights. +Now let’s calculate [metrics](elasticsearch://reference/aggregations/metrics.md) within each group to get deeper insights. ### Compare category performance [aggregations-tutorial-category-metrics] @@ -827,7 +827,7 @@ GET kibana_sample_data_ecommerce/_search ``` 1. Daily revenue -2. Uses the [`cardinality`](elasticsearch://reference/data-analysis/aggregations/search-aggregations-metrics-cardinality-aggregation.md) aggregation to count unique customers per day +2. Uses the [`cardinality`](elasticsearch://reference/aggregations/search-aggregations-metrics-cardinality-aggregation.md) aggregation to count unique customers per day 3. Average number of items per order ::::{dropdown} Example response @@ -1297,11 +1297,11 @@ GET kibana_sample_data_ecommerce/_search ## Track trends and patterns [aggregations-tutorial-trends] -You can use [pipeline aggregations](elasticsearch://reference/data-analysis/aggregations/pipeline.md) on the results of other aggregations. Let’s analyze how metrics change over time. +You can use [pipeline aggregations](elasticsearch://reference/aggregations/pipeline.md) on the results of other aggregations. Let’s analyze how metrics change over time. ### Smooth out daily fluctuations [aggregations-tutorial-moving-average] -Moving averages help identify trends by reducing day-to-day noise in the data. Let’s observe sales trends more clearly by smoothing daily revenue variations, using the [Moving Function](elasticsearch://reference/data-analysis/aggregations/search-aggregations-pipeline-movfn-aggregation.md) aggregation. +Moving averages help identify trends by reducing day-to-day noise in the data. Let’s observe sales trends more clearly by smoothing daily revenue variations, using the [Moving Function](elasticsearch://reference/aggregations/search-aggregations-pipeline-movfn-aggregation.md) aggregation. ```console GET kibana_sample_data_ecommerce/_search @@ -1724,7 +1724,7 @@ Notice how the smoothed values lag behind the actual values - this is because th ### Track running totals [aggregations-tutorial-cumulative] -Track running totals over time using the [`cumulative_sum`](elasticsearch://reference/data-analysis/aggregations/search-aggregations-pipeline-cumulative-sum-aggregation.md) aggregation. +Track running totals over time using the [`cumulative_sum`](elasticsearch://reference/aggregations/search-aggregations-pipeline-cumulative-sum-aggregation.md) aggregation. ```console GET kibana_sample_data_ecommerce/_search diff --git a/explore-analyze/query-filter/languages/querydsl.md b/explore-analyze/query-filter/languages/querydsl.md index f8372b050..adb1375a0 100644 --- a/explore-analyze/query-filter/languages/querydsl.md +++ b/explore-analyze/query-filter/languages/querydsl.md @@ -42,9 +42,9 @@ Because aggregations leverage the same data structures used for search, they are The following aggregation types are available: -* [Metric](elasticsearch://reference/data-analysis/aggregations/metrics.md): Calculate metrics, such as a sum or average, from field values. -* [Bucket](elasticsearch://reference/data-analysis/aggregations/bucket.md): Group documents into buckets based on field values, ranges, or other criteria. 
-* [Pipeline](elasticsearch://reference/data-analysis/aggregations/pipeline.md): Run aggregations on the results of other aggregations. +* [Metric](elasticsearch://reference/aggregations/metrics.md): Calculate metrics, such as a sum or average, from field values. +* [Bucket](elasticsearch://reference/aggregations/bucket.md): Group documents into buckets based on field values, ranges, or other criteria. +* [Pipeline](elasticsearch://reference/aggregations/pipeline.md): Run aggregations on the results of other aggregations. Run aggregations by specifying the [search API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-search)'s `aggs` parameter. Learn more in [Run an aggregation](/explore-analyze/query-filter/aggregations.md#run-an-agg). @@ -132,7 +132,7 @@ Filter context applies when a query clause is passed to a `filter` parameter, su * `filter` or `must_not` parameters in [`bool`](elasticsearch://reference/query-languages/query-dsl/query-dsl-bool-query.md) queries * `filter` parameter in [`constant_score`](elasticsearch://reference/query-languages/query-dsl/query-dsl-constant-score-query.md) queries -* [`filter`](elasticsearch://reference/data-analysis/aggregations/search-aggregations-bucket-filter-aggregation.md) aggregations +* [`filter`](elasticsearch://reference/aggregations/search-aggregations-bucket-filter-aggregation.md) aggregations Filters optimize query performance and efficiency, especially for structured data queries and when combined with full-text searches. diff --git a/explore-analyze/query-filter/languages/sql-functions-aggs.md b/explore-analyze/query-filter/languages/sql-functions-aggs.md index 6ca568aaf..52b005d1a 100644 --- a/explore-analyze/query-filter/languages/sql-functions-aggs.md +++ b/explore-analyze/query-filter/languages/sql-functions-aggs.md @@ -556,8 +556,8 @@ PERCENTILE( 1. a numeric field. If this field contains only `null` values, the function returns `null`. Otherwise, the function ignores `null` values in this field. 2. a numeric expression (must be a constant and not based on a field). If `null`, the function returns `null`. -3. optional string literal for the [percentile algorithm](elasticsearch://reference/data-analysis/aggregations/search-aggregations-metrics-percentile-aggregation.md#search-aggregations-metrics-percentile-aggregation-approximation). Possible values: `tdigest` or `hdr`. Defaults to `tdigest`. -4. optional numeric literal that configures the [percentile algorithm](elasticsearch://reference/data-analysis/aggregations/search-aggregations-metrics-percentile-aggregation.md#search-aggregations-metrics-percentile-aggregation-approximation). Configures `compression` for `tdigest` or `number_of_significant_value_digits` for `hdr`. The default is the same as that of the backing algorithm. +3. optional string literal for the [percentile algorithm](elasticsearch://reference/aggregations/search-aggregations-metrics-percentile-aggregation.md#search-aggregations-metrics-percentile-aggregation-approximation). Possible values: `tdigest` or `hdr`. Defaults to `tdigest`. +4. optional numeric literal that configures the [percentile algorithm](elasticsearch://reference/aggregations/search-aggregations-metrics-percentile-aggregation.md#search-aggregations-metrics-percentile-aggregation-approximation). Configures `compression` for `tdigest` or `number_of_significant_value_digits` for `hdr`. The default is the same as that of the backing algorithm. **Output**: `double` numeric value @@ -627,8 +627,8 @@ PERCENTILE_RANK( 1. 
a numeric field. If this field contains only `null` values, the function returns `null`. Otherwise, the function ignores `null` values in this field. 2. a numeric expression (must be a constant and not based on a field). If `null`, the function returns `null`. -3. optional string literal for the [percentile algorithm](elasticsearch://reference/data-analysis/aggregations/search-aggregations-metrics-percentile-aggregation.md#search-aggregations-metrics-percentile-aggregation-approximation). Possible values: `tdigest` or `hdr`. Defaults to `tdigest`. -4. optional numeric literal that configures the [percentile algorithm](elasticsearch://reference/data-analysis/aggregations/search-aggregations-metrics-percentile-aggregation.md#search-aggregations-metrics-percentile-aggregation-approximation). Configures `compression` for `tdigest` or `number_of_significant_value_digits` for `hdr`. The default is the same as that of the backing algorithm. +3. optional string literal for the [percentile algorithm](elasticsearch://reference/aggregations/search-aggregations-metrics-percentile-aggregation.md#search-aggregations-metrics-percentile-aggregation-approximation). Possible values: `tdigest` or `hdr`. Defaults to `tdigest`. +4. optional numeric literal that configures the [percentile algorithm](elasticsearch://reference/aggregations/search-aggregations-metrics-percentile-aggregation.md#search-aggregations-metrics-percentile-aggregation-approximation). Configures `compression` for `tdigest` or `number_of_significant_value_digits` for `hdr`. The default is the same as that of the backing algorithm. **Output**: `double` numeric value diff --git a/explore-analyze/query-filter/languages/sql-functions-grouping.md b/explore-analyze/query-filter/languages/sql-functions-grouping.md index 3a8723a6d..94d64d38d 100644 --- a/explore-analyze/query-filter/languages/sql-functions-grouping.md +++ b/explore-analyze/query-filter/languages/sql-functions-grouping.md @@ -39,7 +39,7 @@ bucket_key = Math.floor(value / interval) * interval ``` ::::{note} -The histogram in SQL does **NOT** return empty buckets for missing intervals as the traditional [histogram](elasticsearch://reference/data-analysis/aggregations/search-aggregations-bucket-histogram-aggregation.md) and [date histogram](elasticsearch://reference/data-analysis/aggregations/search-aggregations-bucket-datehistogram-aggregation.md). Such behavior does not fit conceptually in SQL which treats all missing values as `null`; as such the histogram places all missing values in the `null` group. +The histogram in SQL does **NOT** return empty buckets for missing intervals as the traditional [histogram](elasticsearch://reference/aggregations/search-aggregations-bucket-histogram-aggregation.md) and [date histogram](elasticsearch://reference/aggregations/search-aggregations-bucket-datehistogram-aggregation.md). Such behavior does not fit conceptually in SQL which treats all missing values as `null`; as such the histogram places all missing values in the `null` group. :::: @@ -137,7 +137,7 @@ When the histogram in SQL is applied on **DATE** type instead of **DATETIME**, t ::::{important} -All intervals specified for a date/time HISTOGRAM will use a [fixed interval](elasticsearch://reference/data-analysis/aggregations/search-aggregations-bucket-datehistogram-aggregation.md) in their `date_histogram` aggregation definition, with the notable exceptions of `INTERVAL '1' YEAR`, `INTERVAL '1' MONTH` and `INTERVAL '1' DAY` where a calendar interval is used. 
The choice for a calendar interval was made for having a more intuitive result for YEAR, MONTH and DAY groupings. In the case of YEAR, for example, the calendar intervals consider a one year bucket as the one starting on January 1st that specific year, whereas a fixed interval one-year-bucket considers one year as a number of milliseconds (for example, `31536000000ms` corresponding to 365 days, 24 hours per day, 60 minutes per hour etc.). With fixed intervals, the day of February 5th, 2019 for example, belongs to a bucket that starts on December 20th, 2018 and {{es}} (and implicitly Elasticsearch SQL) would have returned the year 2018 for a date that’s actually in 2019. With calendar interval this behavior is more intuitive, having the day of February 5th, 2019 actually belonging to the 2019 year bucket. +All intervals specified for a date/time HISTOGRAM will use a [fixed interval](elasticsearch://reference/aggregations/search-aggregations-bucket-datehistogram-aggregation.md) in their `date_histogram` aggregation definition, with the notable exceptions of `INTERVAL '1' YEAR`, `INTERVAL '1' MONTH` and `INTERVAL '1' DAY` where a calendar interval is used. The choice for a calendar interval was made for having a more intuitive result for YEAR, MONTH and DAY groupings. In the case of YEAR, for example, the calendar intervals consider a one year bucket as the one starting on January 1st that specific year, whereas a fixed interval one-year-bucket considers one year as a number of milliseconds (for example, `31536000000ms` corresponding to 365 days, 24 hours per day, 60 minutes per hour etc.). With fixed intervals, the day of February 5th, 2019 for example, belongs to a bucket that starts on December 20th, 2018 and {{es}} (and implicitly Elasticsearch SQL) would have returned the year 2018 for a date that’s actually in 2019. With calendar interval this behavior is more intuitive, having the day of February 5th, 2019 actually belonging to the 2019 year bucket. :::: diff --git a/explore-analyze/scripting/modules-scripting-security.md b/explore-analyze/scripting/modules-scripting-security.md index 79ef2fd30..7135bef5e 100644 --- a/explore-analyze/scripting/modules-scripting-security.md +++ b/explore-analyze/scripting/modules-scripting-security.md @@ -16,7 +16,7 @@ The second layer of security is the [Java Security Manager](https://www.oracle.c {{es}} uses [seccomp](https://en.wikipedia.org/wiki/Seccomp) in Linux, [Seatbelt](https://www.chromium.org/developers/design-documents/sandbox/osx-sandboxing-design) in macOS, and [ActiveProcessLimit](https://msdn.microsoft.com/en-us/library/windows/desktop/ms684147) on Windows as additional security layers to prevent {{es}} from forking or running other processes. -Finally, scripts used in [scripted metrics aggregations](elasticsearch://reference/data-analysis/aggregations/search-aggregations-metrics-scripted-metric-aggregation.md) can be restricted to a defined list of scripts, or forbidden altogether. This can prevent users from running particularly slow or resource intensive aggregation queries. +Finally, scripts used in [scripted metrics aggregations](elasticsearch://reference/aggregations/search-aggregations-metrics-scripted-metric-aggregation.md) can be restricted to a defined list of scripts, or forbidden altogether. This can prevent users from running particularly slow or resource intensive aggregation queries. 
You can modify the following script settings to restrict the type of scripts that are allowed to run, and control the available [contexts](elasticsearch://reference/scripting-languages/painless/painless-contexts.md) that scripts can run in. To implement additional layers in your defense in depth strategy, follow the [{{es}} security principles](../../deploy-manage/security.md). @@ -50,7 +50,7 @@ script.allowed_contexts: score, update ## Allowed scripts in scripted metrics aggregations [allowed-script-in-aggs-settings] -By default, all scripts are permitted in [scripted metrics aggregations](elasticsearch://reference/data-analysis/aggregations/search-aggregations-metrics-scripted-metric-aggregation.md). To restrict the set of allowed scripts, set [`search.aggs.only_allowed_metric_scripts`](elasticsearch://reference/elasticsearch/configuration-reference/search-settings.md#search-settings-only-allowed-scripts) to `true` and provide the allowed scripts using [`search.aggs.allowed_inline_metric_scripts`](elasticsearch://reference/elasticsearch/configuration-reference/search-settings.md#search-settings-allowed-inline-scripts) and/or [`search.aggs.allowed_stored_metric_scripts`](elasticsearch://reference/elasticsearch/configuration-reference/search-settings.md#search-settings-allowed-stored-scripts). +By default, all scripts are permitted in [scripted metrics aggregations](elasticsearch://reference/aggregations/search-aggregations-metrics-scripted-metric-aggregation.md). To restrict the set of allowed scripts, set [`search.aggs.only_allowed_metric_scripts`](elasticsearch://reference/elasticsearch/configuration-reference/search-settings.md#search-settings-only-allowed-scripts) to `true` and provide the allowed scripts using [`search.aggs.allowed_inline_metric_scripts`](elasticsearch://reference/elasticsearch/configuration-reference/search-settings.md#search-settings-allowed-inline-scripts) and/or [`search.aggs.allowed_stored_metric_scripts`](elasticsearch://reference/elasticsearch/configuration-reference/search-settings.md#search-settings-allowed-stored-scripts). To disallow certain script types, omit the corresponding script list (`search.aggs.allowed_inline_metric_scripts` or `search.aggs.allowed_stored_metric_scripts`) or set it to an empty array. When both script lists are not empty, the given stored scripts and the given inline scripts will be allowed. diff --git a/explore-analyze/scripting/scripts-search-speed.md b/explore-analyze/scripting/scripts-search-speed.md index ba2bbd18f..f9fbdf679 100644 --- a/explore-analyze/scripting/scripts-search-speed.md +++ b/explore-analyze/scripting/scripts-search-speed.md @@ -72,7 +72,7 @@ PUT /my_test_scores/_mapping } ``` -Next, use an [ingest pipeline](../../manage-data/ingest/transform-enrich/ingest-pipelines.md) containing the [script processor](elasticsearch://reference/ingestion-tools/enrich-processor/script-processor.md) to calculate the sum of `math_score` and `verbal_score` and index it in the `total_score` field. +Next, use an [ingest pipeline](../../manage-data/ingest/transform-enrich/ingest-pipelines.md) containing the [script processor](elasticsearch://reference/enrich-processor/script-processor.md) to calculate the sum of `math_score` and `verbal_score` and index it in the `total_score` field. 
```console PUT _ingest/pipeline/my_test_scores_pipeline diff --git a/explore-analyze/transforms/ecommerce-transforms.md b/explore-analyze/transforms/ecommerce-transforms.md index ea9bf3e0b..c06a2bd2f 100644 --- a/explore-analyze/transforms/ecommerce-transforms.md +++ b/explore-analyze/transforms/ecommerce-transforms.md @@ -27,7 +27,7 @@ mapped_pages: :screenshot: ::: - Group the data by customer ID and add one or more aggregations to learn more about each customer’s orders. For example, let’s calculate the sum of products they purchased, the total price of their purchases, the maximum number of products that they purchased in a single order, and their total number of orders. We’ll accomplish this by using the [`sum` aggregation](elasticsearch://reference/data-analysis/aggregations/search-aggregations-metrics-sum-aggregation.md) on the `total_quantity` and `taxless_total_price` fields, the [`max` aggregation](elasticsearch://reference/data-analysis/aggregations/search-aggregations-metrics-max-aggregation.md) on the `total_quantity` field, and the [`cardinality` aggregation](elasticsearch://reference/data-analysis/aggregations/search-aggregations-metrics-cardinality-aggregation.md) on the `order_id` field: + Group the data by customer ID and add one or more aggregations to learn more about each customer’s orders. For example, let’s calculate the sum of products they purchased, the total price of their purchases, the maximum number of products that they purchased in a single order, and their total number of orders. We’ll accomplish this by using the [`sum` aggregation](elasticsearch://reference/aggregations/search-aggregations-metrics-sum-aggregation.md) on the `total_quantity` and `taxless_total_price` fields, the [`max` aggregation](elasticsearch://reference/aggregations/search-aggregations-metrics-max-aggregation.md) on the `total_quantity` field, and the [`cardinality` aggregation](elasticsearch://reference/aggregations/search-aggregations-metrics-cardinality-aggregation.md) on the `order_id` field: :::{image} /explore-analyze/images/elasticsearch-reference-ecommerce-pivot2.png :alt: Adding multiple aggregations to a {{transform}} in {{kib}} @@ -171,7 +171,7 @@ mapped_pages: :::: 5. Optional: Create the destination index. - If the destination index does not exist, it is created the first time you start your {{transform}}. A pivot transform deduces the mappings for the destination index from the source indices and the transform aggregations. If there are fields in the destination index that are derived from scripts (for example, if you use [`scripted_metrics`](elasticsearch://reference/data-analysis/aggregations/search-aggregations-metrics-scripted-metric-aggregation.md) or [`bucket_scripts`](elasticsearch://reference/data-analysis/aggregations/search-aggregations-pipeline-bucket-script-aggregation.md) aggregations), they’re created with [dynamic mappings](../../manage-data/data-store/mapping/dynamic-mapping.md). You can use the preview {{transform}} API to preview the mappings it will use for the destination index. In {{kib}}, if you copied the API request to your clipboard, paste it into the console, then refer to the `generated_dest_index` object in the API response. + If the destination index does not exist, it is created the first time you start your {{transform}}. A pivot transform deduces the mappings for the destination index from the source indices and the transform aggregations. 
If there are fields in the destination index that are derived from scripts (for example, if you use [`scripted_metrics`](elasticsearch://reference/aggregations/search-aggregations-metrics-scripted-metric-aggregation.md) or [`bucket_scripts`](elasticsearch://reference/aggregations/search-aggregations-pipeline-bucket-script-aggregation.md) aggregations), they’re created with [dynamic mappings](../../manage-data/data-store/mapping/dynamic-mapping.md). You can use the preview {{transform}} API to preview the mappings it will use for the destination index. In {{kib}}, if you copied the API request to your clipboard, paste it into the console, then refer to the `generated_dest_index` object in the API response. ::::{note} {{transforms-cap}} might have more configuration options provided by the APIs than the options available in {{kib}}. For example, you can set an ingest pipeline for `dest` by calling the [Create {{transform}}](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-transform-put-transform). For all the {{transform}} configuration options, refer to the [documentation](https://www.elastic.co/docs/api/doc/elasticsearch/group/endpoint-transform). :::: diff --git a/explore-analyze/transforms/transform-checkpoints.md b/explore-analyze/transforms/transform-checkpoints.md index 479f28b63..cc479b870 100644 --- a/explore-analyze/transforms/transform-checkpoints.md +++ b/explore-analyze/transforms/transform-checkpoints.md @@ -43,7 +43,7 @@ If the cluster experiences unsuitable performance degradation due to the {{trans In most cases, it is strongly recommended to use the ingest timestamp of the source indices for syncing the {{transform}}. This is the most optimal way for {{transforms}} to be able to identify new changes. If your data source follows the [ECS standard](ecs://reference/index.md), you might already have an [`event.ingested`](ecs://reference/ecs-event.md#field-event-ingested) field. In this case, use `event.ingested` as the `sync`.`time`.`field` property of your {{transform}}. -If you don’t have a `event.ingested` field or it isn’t populated, you can set it by using an ingest pipeline. Create an ingest pipeline either using the [ingest pipeline API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ingest-put-pipeline) (like the example below) or via {{kib}} under **Stack Management > Ingest Pipelines**. Use a [`set` processor](elasticsearch://reference/ingestion-tools/enrich-processor/set-processor.md) to set the field and associate it with the value of the ingest timestamp. +If you don’t have a `event.ingested` field or it isn’t populated, you can set it by using an ingest pipeline. Create an ingest pipeline either using the [ingest pipeline API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ingest-put-pipeline) (like the example below) or via {{kib}} under **Stack Management > Ingest Pipelines**. Use a [`set` processor](elasticsearch://reference/enrich-processor/set-processor.md) to set the field and associate it with the value of the ingest timestamp. 
```console PUT _ingest/pipeline/set_ingest_time diff --git a/explore-analyze/transforms/transform-examples.md b/explore-analyze/transforms/transform-examples.md index 6ddbd5f15..6977f04c7 100644 --- a/explore-analyze/transforms/transform-examples.md +++ b/explore-analyze/transforms/transform-examples.md @@ -94,7 +94,7 @@ It’s possible to answer these questions using aggregations alone, however {{tr ## Finding air carriers with the most delays [example-airline] -This example uses the Flights sample data set to find out which air carrier had the most delays. First, filter the source data such that it excludes all the cancelled flights by using a query filter. Then transform the data to contain the distinct number of flights, the sum of delayed minutes, and the sum of the flight minutes by air carrier. Finally, use a [`bucket_script`](elasticsearch://reference/data-analysis/aggregations/search-aggregations-pipeline-bucket-script-aggregation.md) to determine what percentage of the flight time was actually delay. +This example uses the Flights sample data set to find out which air carrier had the most delays. First, filter the source data such that it excludes all the cancelled flights by using a query filter. Then transform the data to contain the distinct number of flights, the sum of delayed minutes, and the sum of the flight minutes by air carrier. Finally, use a [`bucket_script`](elasticsearch://reference/aggregations/search-aggregations-pipeline-bucket-script-aggregation.md) to determine what percentage of the flight time was actually delay. ```console POST _transform/_preview @@ -415,9 +415,9 @@ This {{transform}} makes it easier to answer questions such as: ## Finding client IPs that sent the most bytes to the server [example-bytes] -This example uses the web log sample data set to find the client IP that sent the most bytes to the server in every hour. The example uses a `pivot` {{transform}} with a [`top_metrics`](elasticsearch://reference/data-analysis/aggregations/search-aggregations-metrics-top-metrics.md) aggregation. +This example uses the web log sample data set to find the client IP that sent the most bytes to the server in every hour. The example uses a `pivot` {{transform}} with a [`top_metrics`](elasticsearch://reference/aggregations/search-aggregations-metrics-top-metrics.md) aggregation. -Group the data by a [date histogram](elasticsearch://reference/data-analysis/aggregations/search-aggregations-bucket-composite-aggregation.md#_date_histogram) on the time field with an interval of one hour. Use a [max aggregation](elasticsearch://reference/data-analysis/aggregations/search-aggregations-metrics-max-aggregation.md) on the `bytes` field to get the maximum amount of data that is sent to the server. Without the `max` aggregation, the API call still returns the client IP that sent the most bytes, however, the amount of bytes that it sent is not returned. In the `top_metrics` property, specify `clientip` and `geo.src`, then sort them by the `bytes` field in descending order. The {{transform}} returns the client IP that sent the biggest amount of data and the 2-letter ISO code of the corresponding location. +Group the data by a [date histogram](elasticsearch://reference/aggregations/search-aggregations-bucket-composite-aggregation.md#_date_histogram) on the time field with an interval of one hour. 
Use a [max aggregation](elasticsearch://reference/aggregations/search-aggregations-metrics-max-aggregation.md) on the `bytes` field to get the maximum amount of data that is sent to the server. Without the `max` aggregation, the API call still returns the client IP that sent the most bytes, however, the amount of bytes that it sent is not returned. In the `top_metrics` property, specify `clientip` and `geo.src`, then sort them by the `bytes` field in descending order. The {{transform}} returns the client IP that sent the biggest amount of data and the 2-letter ISO code of the corresponding location. ```console POST _transform/_preview diff --git a/explore-analyze/transforms/transform-limitations.md b/explore-analyze/transforms/transform-limitations.md index c1e8cc680..4ff84ca8e 100644 --- a/explore-analyze/transforms/transform-limitations.md +++ b/explore-analyze/transforms/transform-limitations.md @@ -49,7 +49,7 @@ A {{ctransform}} periodically checks for changes to source data. The functionali ### Aggregation responses may be incompatible with destination index mappings [transform-aggresponse-limitations] -When a pivot {{transform}} is first started, it will deduce the mappings required for the destination index. This process is based on the field types of the source index and the aggregations used. If the fields are derived from [`scripted_metrics`](elasticsearch://reference/data-analysis/aggregations/search-aggregations-metrics-scripted-metric-aggregation.md) or [`bucket_scripts`](elasticsearch://reference/data-analysis/aggregations/search-aggregations-pipeline-bucket-script-aggregation.md), [dynamic mappings](../../manage-data/data-store/mapping/dynamic-mapping.md) will be used. In some instances the deduced mappings may be incompatible with the actual data. For example, numeric overflows might occur or dynamically mapped fields might contain both numbers and strings. Please check {{es}} logs if you think this may have occurred. +When a pivot {{transform}} is first started, it will deduce the mappings required for the destination index. This process is based on the field types of the source index and the aggregations used. If the fields are derived from [`scripted_metrics`](elasticsearch://reference/aggregations/search-aggregations-metrics-scripted-metric-aggregation.md) or [`bucket_scripts`](elasticsearch://reference/aggregations/search-aggregations-pipeline-bucket-script-aggregation.md), [dynamic mappings](../../manage-data/data-store/mapping/dynamic-mapping.md) will be used. In some instances the deduced mappings may be incompatible with the actual data. For example, numeric overflows might occur or dynamically mapped fields might contain both numbers and strings. Please check {{es}} logs if you think this may have occurred. You can view the deduced mappings by using the [preview transform API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-transform-preview-transform). See the `generated_dest_index` object in the API response. @@ -57,7 +57,7 @@ If it’s required, you may define custom mappings prior to starting the {{trans ### Batch {{transforms}} may not account for changed documents [transform-batch-limitations] -A batch {{transform}} uses a [composite aggregation](elasticsearch://reference/data-analysis/aggregations/search-aggregations-bucket-composite-aggregation.md) which allows efficient pagination through all buckets. 
Composite aggregations do not yet support a search context, therefore if the source data is changed (deleted, updated, added) while the batch {{dataframe}} is in progress, then the results may not include these changes. +A batch {{transform}} uses a [composite aggregation](elasticsearch://reference/aggregations/search-aggregations-bucket-composite-aggregation.md) which allows efficient pagination through all buckets. Composite aggregations do not yet support a search context, therefore if the source data is changed (deleted, updated, added) while the batch {{dataframe}} is in progress, then the results may not include these changes. ### {{ctransform-cap}} consistency does not account for deleted or updated documents [transform-consistency-limitations] @@ -119,7 +119,7 @@ If your data uses the [date nanosecond data type](elasticsearch://reference/elas [ILM](../../manage-data/lifecycle/index-lifecycle-management.md) is not recommended to use as a {{transform}} destination index. {{transforms-cap}} update documents in the current destination, and cannot delete documents in the indices previously used by ILM. This may lead to duplicated documents when you use {{transforms}} combined with ILM in case of a rollover. -If you use ILM to have time-based indices, please consider using the [Date index name](elasticsearch://reference/ingestion-tools/enrich-processor/date-index-name-processor.md) instead. The processor works without duplicated documents if your {{transform}} contains a `group_by` based on `date_histogram`. +If you use ILM to have time-based indices, please consider using the [Date index name](elasticsearch://reference/enrich-processor/date-index-name-processor.md) instead. The processor works without duplicated documents if your {{transform}} contains a `group_by` based on `date_histogram`. ## Limitations in {{kib}} [transform-ui-limitations] diff --git a/explore-analyze/transforms/transform-painless-examples.md b/explore-analyze/transforms/transform-painless-examples.md index 962bff1a5..4ed0ecf08 100644 --- a/explore-analyze/transforms/transform-painless-examples.md +++ b/explore-analyze/transforms/transform-painless-examples.md @@ -31,7 +31,7 @@ These examples demonstrate how to use Painless in {{transforms}}. You can learn ## Getting top hits by using scripted metric aggregation [painless-top-hits] -This snippet shows how to find the latest document, in other words the document with the latest timestamp. From a technical perspective, it helps to achieve the function of a [Top hits](elasticsearch://reference/data-analysis/aggregations/search-aggregations-metrics-top-hits-aggregation.md) by using scripted metric aggregation in a {{transform}}, which provides a metric output. +This snippet shows how to find the latest document, in other words the document with the latest timestamp. From a technical perspective, it helps to achieve the function of a [Top hits](elasticsearch://reference/aggregations/search-aggregations-metrics-top-hits-aggregation.md) by using scripted metric aggregation in a {{transform}}, which provides a metric output. ::::{important} This example uses a `scripted_metric` aggregation which is not supported on {{es}} Serverless. @@ -66,7 +66,7 @@ This example uses a `scripted_metric` aggregation which is not supported on {{es 3. The `combine_script` returns `state` from each shard. 4. The `reduce_script` iterates through the value of `s.timestamp_latest` returned by each shard and returns the document with the latest timestamp (`last_doc`). 
In the response, the top hit (in other words, the `latest_doc`) is nested below the `latest_doc` field. -Check the [scope of scripts](elasticsearch://reference/data-analysis/aggregations/search-aggregations-metrics-scripted-metric-aggregation.md#scripted-metric-aggregation-scope) for detailed explanation on the respective scripts. +Check the [scope of scripts](elasticsearch://reference/aggregations/search-aggregations-metrics-scripted-metric-aggregation.md#scripted-metric-aggregation-scope) for detailed explanation on the respective scripts. You can retrieve the last value in a similar way: @@ -215,7 +215,7 @@ This snippet shows how to extract time based features by using Painless in a {{t ## Getting duration by using bucket script [painless-bucket-script] -This example shows you how to get the duration of a session by client IP from a data log by using [bucket script](elasticsearch://reference/data-analysis/aggregations/search-aggregations-pipeline-bucket-script-aggregation.md). The example uses the {{kib}} sample web logs dataset. +This example shows you how to get the duration of a session by client IP from a data log by using [bucket script](elasticsearch://reference/aggregations/search-aggregations-pipeline-bucket-script-aggregation.md). The example uses the {{kib}} sample web logs dataset. ```console PUT _transform/data_log diff --git a/explore-analyze/transforms/transform-usage.md b/explore-analyze/transforms/transform-usage.md index f7d173064..789d0d55f 100644 --- a/explore-analyze/transforms/transform-usage.md +++ b/explore-analyze/transforms/transform-usage.md @@ -18,11 +18,11 @@ You might want to consider using {{transforms}} instead of aggregations when: In {{ml}}, you often need a complete set of behavioral features rather just the top-N. For example, if you are predicting customer churn, you might look at features such as the number of website visits in the last week, the total number of sales, or the number of emails sent. The {{stack}} {{ml-features}} create models based on this multi-dimensional feature space, so they benefit from the full feature indices that are created by {{transforms}}. - This scenario also applies when you are trying to search across the results of an aggregation or multiple aggregations. Aggregation results can be ordered or filtered, but there are [limitations to ordering](elasticsearch://reference/data-analysis/aggregations/search-aggregations-bucket-terms-aggregation.md#search-aggregations-bucket-terms-aggregation-order) and [filtering by bucket selector](elasticsearch://reference/data-analysis/aggregations/search-aggregations-pipeline-bucket-selector-aggregation.md) is constrained by the maximum number of buckets returned. If you want to search all aggregation results, you need to create the complete {{dataframe}}. If you need to sort or filter the aggregation results by multiple fields, {{transforms}} are particularly useful. + This scenario also applies when you are trying to search across the results of an aggregation or multiple aggregations. Aggregation results can be ordered or filtered, but there are [limitations to ordering](elasticsearch://reference/aggregations/search-aggregations-bucket-terms-aggregation.md#search-aggregations-bucket-terms-aggregation-order) and [filtering by bucket selector](elasticsearch://reference/aggregations/search-aggregations-pipeline-bucket-selector-aggregation.md) is constrained by the maximum number of buckets returned. 
If you want to search all aggregation results, you need to create the complete {{dataframe}}. If you need to sort or filter the aggregation results by multiple fields, {{transforms}} are particularly useful. * You need to sort aggregation results by a pipeline aggregation. - [Pipeline aggregations](elasticsearch://reference/data-analysis/aggregations/pipeline.md) cannot be used for sorting. Technically, this is because pipeline aggregations are run during the reduce phase after all other aggregations have already completed. If you create a {{transform}}, you can effectively perform multiple passes over the data. + [Pipeline aggregations](elasticsearch://reference/aggregations/pipeline.md) cannot be used for sorting. Technically, this is because pipeline aggregations are run during the reduce phase after all other aggregations have already completed. If you create a {{transform}}, you can effectively perform multiple passes over the data. * You want to create summary tables to optimize queries. @@ -30,6 +30,6 @@ You might want to consider using {{transforms}} instead of aggregations when: * You need to account for late-arriving data. - In some cases, data might not be immediately available when a {{transform}} runs, leading to missing records in the destination index. This can happen due to ingestion delays, where documents take a few seconds or minutes to become searchable after being indexed. To handle this, the `delay` parameter in the {{transform}}’s sync configuration allows you to postpone processing new data. Instead of always querying the most recent records, the {{transform}} will skip a short period of time (for example, 60 seconds) to ensure all relevant data has arrived before processing. - + In some cases, data might not be immediately available when a {{transform}} runs, leading to missing records in the destination index. This can happen due to ingestion delays, where documents take a few seconds or minutes to become searchable after being indexed. To handle this, the `delay` parameter in the {{transform}}’s sync configuration allows you to postpone processing new data. Instead of always querying the most recent records, the {{transform}} will skip a short period of time (for example, 60 seconds) to ensure all relevant data has arrived before processing. + For example, if a {{transform}} runs every 5 minutes, it usually processes data from 5 minutes ago up to the current time. However, if you set `delay` to 60 seconds, the {{transform}} will instead process data from 6 minutes ago up to 1 minute ago, making sure that any documents that arrived late are included. By adjusting the `delay` parameter, you can improve the accuracy of transformed data while still maintaining near real-time results. 
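For illustration, a minimal pivot {{transform}} that combines a 5 minute `frequency` with a 60 second `delay`, as described above, might look like the following sketch. The transform, index, field, and aggregation names are examples only:

```console
PUT _transform/ecommerce_customer_orders
{
  "source": {
    "index": "kibana_sample_data_ecommerce"
  },
  "dest": {
    "index": "ecommerce_customer_orders"
  },
  "frequency": "5m",
  "sync": {
    "time": {
      "field": "order_date",
      "delay": "60s"
    }
  },
  "pivot": {
    "group_by": {
      "customer_id": {
        "terms": { "field": "customer_id" }
      }
    },
    "aggregations": {
      "order_count": {
        "cardinality": { "field": "order_id" }
      }
    }
  }
}
```

With this configuration, each checkpoint queries from the previous checkpoint up to 60 seconds before the current time, so documents that arrive a little late are still included in the next run.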
diff --git a/explore-analyze/visualize/custom-visualizations-with-vega.md b/explore-analyze/visualize/custom-visualizations-with-vega.md index eb44b3607..abf87a85a 100644 --- a/explore-analyze/visualize/custom-visualizations-with-vega.md +++ b/explore-analyze/visualize/custom-visualizations-with-vega.md @@ -116,7 +116,7 @@ POST kibana_sample_data_ecommerce/_search } ``` -Add the [terms aggregation](elasticsearch://reference/data-analysis/aggregations/search-aggregations-bucket-terms-aggregation.md), then click **Click to send request**: +Add the [terms aggregation](elasticsearch://reference/aggregations/search-aggregations-bucket-terms-aggregation.md), then click **Click to send request**: ```js POST kibana_sample_data_ecommerce/_search diff --git a/explore-analyze/visualize/maps/import-geospatial-data.md b/explore-analyze/visualize/maps/import-geospatial-data.md index da64573aa..4601a1c95 100644 --- a/explore-analyze/visualize/maps/import-geospatial-data.md +++ b/explore-analyze/visualize/maps/import-geospatial-data.md @@ -114,7 +114,7 @@ To draw features: ## Upload data with IP addresses [_upload_data_with_ip_addresses] -The GeoIP processor adds information about the geographical location of IP addresses. See [GeoIP processor](elasticsearch://reference/ingestion-tools/enrich-processor/geoip-processor.md) for details. For private IP addresses, see [Enriching data with GeoIPs from internal, private IP addresses](https://www.elastic.co/blog/enriching-elasticsearch-data-geo-ips-internal-private-ip-addresses). +The GeoIP processor adds information about the geographical location of IP addresses. See [GeoIP processor](elasticsearch://reference/enrich-processor/geoip-processor.md) for details. For private IP addresses, see [Enriching data with GeoIPs from internal, private IP addresses](https://www.elastic.co/blog/enriching-elasticsearch-data-geo-ips-internal-private-ip-addresses). ## Upload data with GDAL [_upload_data_with_gdal] diff --git a/explore-analyze/visualize/maps/maps-grid-aggregation.md b/explore-analyze/visualize/maps/maps-grid-aggregation.md index 35683a847..7f5f1cba2 100644 --- a/explore-analyze/visualize/maps/maps-grid-aggregation.md +++ b/explore-analyze/visualize/maps/maps-grid-aggregation.md @@ -8,21 +8,21 @@ mapped_pages: # Clusters [maps-grid-aggregation] -Clusters use [Geotile grid aggregation](elasticsearch://reference/data-analysis/aggregations/search-aggregations-bucket-geotilegrid-aggregation.md) or [Geohex grid aggregation](elasticsearch://reference/data-analysis/aggregations/search-aggregations-bucket-geohexgrid-aggregation.md) to group your documents into grids. You can calculate metrics for each gridded cell. +Clusters use [Geotile grid aggregation](elasticsearch://reference/aggregations/search-aggregations-bucket-geotilegrid-aggregation.md) or [Geohex grid aggregation](elasticsearch://reference/aggregations/search-aggregations-bucket-geohexgrid-aggregation.md) to group your documents into grids. You can calculate metrics for each gridded cell. Symbolize cluster metrics as: **Clusters** -: Uses [Geotile grid aggregation](elasticsearch://reference/data-analysis/aggregations/search-aggregations-bucket-geotilegrid-aggregation.md) to group your documents into grids. Creates a [vector layer](vector-layer.md) with a cluster symbol for each gridded cell. The cluster location is the weighted centroid for all documents in the gridded cell. 
+: Uses [Geotile grid aggregation](elasticsearch://reference/aggregations/search-aggregations-bucket-geotilegrid-aggregation.md) to group your documents into grids. Creates a [vector layer](vector-layer.md) with a cluster symbol for each gridded cell. The cluster location is the weighted centroid for all documents in the gridded cell. **Grids** -: Uses [Geotile grid aggregation](elasticsearch://reference/data-analysis/aggregations/search-aggregations-bucket-geotilegrid-aggregation.md) to group your documents into grids. Creates a [vector layer](vector-layer.md) with a bounding box polygon for each gridded cell. +: Uses [Geotile grid aggregation](elasticsearch://reference/aggregations/search-aggregations-bucket-geotilegrid-aggregation.md) to group your documents into grids. Creates a [vector layer](vector-layer.md) with a bounding box polygon for each gridded cell. **Heat map** -: Uses [Geotile grid aggregation](elasticsearch://reference/data-analysis/aggregations/search-aggregations-bucket-geotilegrid-aggregation.md) to group your documents into grids. Creates a [heat map layer](heatmap-layer.md) that clusters the weighted centroids for each gridded cell. +: Uses [Geotile grid aggregation](elasticsearch://reference/aggregations/search-aggregations-bucket-geotilegrid-aggregation.md) to group your documents into grids. Creates a [heat map layer](heatmap-layer.md) that clusters the weighted centroids for each gridded cell. **Hexbins** -: Uses [Geohex grid aggregation](elasticsearch://reference/data-analysis/aggregations/search-aggregations-bucket-geohexgrid-aggregation.md) to group your documents into H3 hexagon grids. Creates a [vector layer](vector-layer.md) with a hexagon polygon for each gridded cell. +: Uses [Geohex grid aggregation](elasticsearch://reference/aggregations/search-aggregations-bucket-geohexgrid-aggregation.md) to group your documents into H3 hexagon grids. Creates a [vector layer](vector-layer.md) with a hexagon polygon for each gridded cell. To enable a clusters layer: diff --git a/explore-analyze/visualize/maps/maps-top-hits-aggregation.md b/explore-analyze/visualize/maps/maps-top-hits-aggregation.md index 05ba4f80e..7aed3c53b 100644 --- a/explore-analyze/visualize/maps/maps-top-hits-aggregation.md +++ b/explore-analyze/visualize/maps/maps-top-hits-aggregation.md @@ -8,7 +8,7 @@ mapped_pages: # Display the most relevant documents per entity [maps-top-hits-aggregation] -Use **Top hits per entity** to display the most relevant documents per entity, for example, the most recent GPS tracks per flight route. To get this data, {{es}} first groups your data using a [terms aggregation](elasticsearch://reference/data-analysis/aggregations/search-aggregations-bucket-terms-aggregation.md), then accumulates the most relevant documents based on sort order for each entry using a [top hits metric aggregation](elasticsearch://reference/data-analysis/aggregations/search-aggregations-metrics-top-hits-aggregation.md). +Use **Top hits per entity** to display the most relevant documents per entity, for example, the most recent GPS tracks per flight route. To get this data, {{es}} first groups your data using a [terms aggregation](elasticsearch://reference/aggregations/search-aggregations-bucket-terms-aggregation.md), then accumulates the most relevant documents based on sort order for each entry using a [top hits metric aggregation](elasticsearch://reference/aggregations/search-aggregations-metrics-top-hits-aggregation.md). 
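A request along these lines combines the two aggregations; the sketch uses the sample web logs data set, and the entity field, sort field, and aggregation names are illustrative rather than what Maps generates internally:

```console
GET kibana_sample_data_logs/_search
{
  "size": 0,
  "aggs": {
    "entities": {
      "terms": {
        "field": "clientip",
        "size": 10
      },
      "aggs": {
        "most_recent": {
          "top_hits": {
            "size": 1,
            "sort": [
              { "timestamp": { "order": "desc" } }
            ]
          }
        }
      }
    }
  }
}
```

Each `entities` bucket then carries only its most recent document, which is the behavior that **Top hits per entity** exposes per entity.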
To enable top hits: diff --git a/explore-analyze/visualize/maps/point-to-point.md b/explore-analyze/visualize/maps/point-to-point.md index e5a505a49..f84dfb59a 100644 --- a/explore-analyze/visualize/maps/point-to-point.md +++ b/explore-analyze/visualize/maps/point-to-point.md @@ -10,7 +10,7 @@ mapped_pages: A point-to-point connection plots aggregated data paths between the source and the destination. Thicker, darker lines symbolize more connections between a source and destination, and thinner, lighter lines symbolize less connections. -Point to point uses an {{es}} [terms aggregation](elasticsearch://reference/data-analysis/aggregations/search-aggregations-bucket-terms-aggregation.md) to group your documents by destination. Then, a nested [GeoTile grid aggregation](elasticsearch://reference/data-analysis/aggregations/search-aggregations-bucket-geotilegrid-aggregation.md) groups sources for each destination into grids. A line connects each source grid centroid to each destination. +Point to point uses an {{es}} [terms aggregation](elasticsearch://reference/aggregations/search-aggregations-bucket-terms-aggregation.md) to group your documents by destination. Then, a nested [GeoTile grid aggregation](elasticsearch://reference/aggregations/search-aggregations-bucket-geotilegrid-aggregation.md) groups sources for each destination into grids. A line connects each source grid centroid to each destination. Point-to-point layers are used in several common use cases: diff --git a/explore-analyze/visualize/maps/reverse-geocoding-tutorial.md b/explore-analyze/visualize/maps/reverse-geocoding-tutorial.md index 9b099f8e1..dff637ee4 100644 --- a/explore-analyze/visualize/maps/reverse-geocoding-tutorial.md +++ b/explore-analyze/visualize/maps/reverse-geocoding-tutorial.md @@ -17,7 +17,7 @@ In this tutorial, you’ll use reverse geocoding to visualize United States Cens You’ll learn to: * Upload custom regions. -* Reverse geocode with the {{es}} [enrich processor](elasticsearch://reference/ingestion-tools/enrich-processor/enrich-processor.md). +* Reverse geocode with the {{es}} [enrich processor](elasticsearch://reference/enrich-processor/enrich-processor.md). * Create a map and visualize CSA regions by web traffic. When you complete this tutorial, you’ll have a map that looks like this: @@ -32,7 +32,7 @@ When you complete this tutorial, you’ll have a map that looks like this: GeoIP is a common way of transforming an IP address to a longitude and latitude. GeoIP is roughly accurate on the city level globally and neighborhood level in selected countries. It’s not as good as an actual GPS location from your phone, but it’s much more precise than just a country, state, or province. -You’ll use the [web logs sample data set](../../index.md#gs-get-data-into-kibana) that comes with Kibana for this tutorial. Web logs sample data set has longitude and latitude. If your web log data does not contain longitude and latitude, use [GeoIP processor](elasticsearch://reference/ingestion-tools/enrich-processor/geoip-processor.md) to transform an IP address into a [geo_point](elasticsearch://reference/elasticsearch/mapping-reference/geo-point.md) field. +You’ll use the [web logs sample data set](../../index.md#gs-get-data-into-kibana) that comes with Kibana for this tutorial. Web logs sample data set has longitude and latitude. 
If your web log data does not contain longitude and latitude, use [GeoIP processor](elasticsearch://reference/enrich-processor/geoip-processor.md) to transform an IP address into a [geo_point](elasticsearch://reference/elasticsearch/mapping-reference/geo-point.md) field. ## Step 2: Index Combined Statistical Area (CSA) regions [_step_2_index_combined_statistical_area_csa_regions] @@ -75,7 +75,7 @@ Looking at the map, you get a sense of what constitutes a metro area in the eyes ## Step 3: Reverse geocoding [_step_3_reverse_geocoding] -To visualize CSA regions by web log traffic, the web log traffic must contain a CSA region identifier. You’ll use {{es}} [enrich processor](elasticsearch://reference/ingestion-tools/enrich-processor/enrich-processor.md) to add CSA region identifiers to the web logs sample data set. You can skip this step if your source data already contains region identifiers. +To visualize CSA regions by web log traffic, the web log traffic must contain a CSA region identifier. You’ll use {{es}} [enrich processor](elasticsearch://reference/enrich-processor/enrich-processor.md) to add CSA region identifiers to the web logs sample data set. You can skip this step if your source data already contains region identifiers. 1. Go to **Developer tools** using the navigation menu or the [global search field](/explore-analyze/find-and-organize/find-apps-and-objects.md). 2. In **Console**, create a [geo_match enrichment policy](../../../manage-data/ingest/transform-enrich/example-enrich-data-based-on-geolocation.md): diff --git a/explore-analyze/visualize/maps/terms-join.md b/explore-analyze/visualize/maps/terms-join.md index 188f5ac56..61105eacc 100644 --- a/explore-analyze/visualize/maps/terms-join.md +++ b/explore-analyze/visualize/maps/terms-join.md @@ -62,7 +62,7 @@ In the following example, **iso2** property defines the shared key for the left The right source uses the Kibana sample data set "Sample web logs". In this data set, the **geo.src** field contains the ISO 3166-1 alpha-2 code of the country of origin. -A [terms aggregation](elasticsearch://reference/data-analysis/aggregations/search-aggregations-bucket-terms-aggregation.md) groups the sample web log documents by **geo.src** and calculates metrics for each term. +A [terms aggregation](elasticsearch://reference/aggregations/search-aggregations-bucket-terms-aggregation.md) groups the sample web log documents by **geo.src** and calculates metrics for each term. The METRICS configuration defines two metric aggregations: diff --git a/explore-analyze/visualize/maps/vector-layer.md b/explore-analyze/visualize/maps/vector-layer.md index ab504b17a..171ec72c2 100644 --- a/explore-analyze/visualize/maps/vector-layer.md +++ b/explore-analyze/visualize/maps/vector-layer.md @@ -32,7 +32,7 @@ To add a vector layer to your map, click **Add layer**, then select one of the f Results are limited to the `index.max_result_window` index setting, which defaults to 10000. Select the appropriate **Scaling** option for your use case. * **Limit results to 10,000** The layer displays features from the first `index.max_result_window` documents. Results exceeding `index.max_result_window` are not displayed. - * **Show clusters when results exceed 10,000** When results exceed `index.max_result_window`, the layer uses [GeoTile grid aggregation](elasticsearch://reference/data-analysis/aggregations/search-aggregations-bucket-geotilegrid-aggregation.md) to group your documents into clusters and displays metrics for each cluster. 
When results are less then `index.max_result_window`, the layer displays features from individual documents. + * **Show clusters when results exceed 10,000** When results exceed `index.max_result_window`, the layer uses [GeoTile grid aggregation](elasticsearch://reference/aggregations/search-aggregations-bucket-geotilegrid-aggregation.md) to group your documents into clusters and displays metrics for each cluster. When results are less then `index.max_result_window`, the layer displays features from individual documents. * **Use vector tiles.** Vector tiles partition your map into tiles. Each tile request is limited to the `index.max_result_window` index setting. When a tile exceeds `index.max_result_window`, results exceeding `index.max_result_window` are not contained in the tile and a dashed rectangle outlining the bounding box containing all geo values within the tile is displayed. diff --git a/explore-analyze/visualize/supported-chart-types.md b/explore-analyze/visualize/supported-chart-types.md index 2b76bc5e0..6e41cde22 100644 --- a/explore-analyze/visualize/supported-chart-types.md +++ b/explore-analyze/visualize/supported-chart-types.md @@ -89,7 +89,7 @@ Metric aggregations are calculated from the values in the aggregated documents. | Value count | ✓ | | ✓ | ✓ | | Variance | ✓ | ✓ | | ✓ | -For information about {{es}} metrics aggregations, refer to [Metrics aggregations](elasticsearch://reference/data-analysis/aggregations/metrics.md). +For information about {{es}} metrics aggregations, refer to [Metrics aggregations](elasticsearch://reference/aggregations/metrics.md). ## Bucket aggregations [bucket-aggregations] @@ -110,7 +110,7 @@ Bucket aggregations group, or bucket, documents based on the aggregation type. T | Terms | ✓ | ✓ | ✓ | ✓ | | Significant terms | ✓ | | ✓ | ✓ | -For information about {{es}} bucket aggregations, refer to [Bucket aggregations](elasticsearch://reference/data-analysis/aggregations/bucket.md). +For information about {{es}} bucket aggregations, refer to [Bucket aggregations](elasticsearch://reference/aggregations/bucket.md). ## Pipeline aggregations [pipeline-aggregations] @@ -130,5 +130,5 @@ Pipeline aggregations are dependent on the outputs calculated from other aggrega | Bucket selector | | | | ✓ | | Serial differencing | | ✓ | ✓ | ✓ | -For information about {{es}} pipeline aggregations, refer to [Pipeline aggregations](elasticsearch://reference/data-analysis/aggregations/pipeline.md). +For information about {{es}} pipeline aggregations, refer to [Pipeline aggregations](elasticsearch://reference/aggregations/pipeline.md). diff --git a/manage-data/data-store/data-streams/downsampling-time-series-data-stream.md b/manage-data/data-store/data-streams/downsampling-time-series-data-stream.md index 1017b0745..6f0b72100 100644 --- a/manage-data/data-store/data-streams/downsampling-time-series-data-stream.md +++ b/manage-data/data-store/data-streams/downsampling-time-series-data-stream.md @@ -118,12 +118,12 @@ The result of a time based histogram aggregation is in a uniform bucket size and There are a few things to note about querying downsampled indices: * When you run queries in {{kib}} and through Elastic solutions, a normal response is returned without notification that some of the queried indices are downsampled. -* For [date histogram aggregations](elasticsearch://reference/data-analysis/aggregations/search-aggregations-bucket-datehistogram-aggregation.md), only `fixed_intervals` (and not calendar-aware intervals) are supported. 
+* For [date histogram aggregations](elasticsearch://reference/aggregations/search-aggregations-bucket-datehistogram-aggregation.md), only `fixed_intervals` (and not calendar-aware intervals) are supported. * Timezone support comes with caveats: * Date histograms at intervals that are multiples of an hour are based on values generated at UTC. This works well for timezones that are on the hour, e.g. +5:00 or -3:00, but requires offsetting the reported time buckets, e.g. `2020-01-01T10:30:00.000` instead of `2020-03-07T10:00:00.000` for timezone +5:30 (India), if downsampling aggregates values per hour. In this case, the results include the field `downsampled_results_offset: true`, to indicate that the time buckets are shifted. This can be avoided if a downsampling interval of 15 minutes is used, as it allows properly calculating hourly values for the shifted buckets. * Date histograms at intervals that are multiples of a day are similarly affected, in case downsampling aggregates values per day. In this case, the beginning of each day is always calculated at UTC when generated the downsampled values, so the time buckets need to be shifted, e.g. reported as `2020-03-07T19:00:00.000` instead of `2020-03-07T00:00:00.000` for timezone `America/New_York`. The field `downsampled_results_offset: true` is added in this case too. - * Daylight savings and similar peculiarities around timezones affect reported results, as [documented](elasticsearch://reference/data-analysis/aggregations/search-aggregations-bucket-datehistogram-aggregation.md#datehistogram-aggregation-time-zone) for date histogram aggregation. Besides, downsampling at daily interval hinders tracking any information related to daylight savings changes. + * Daylight savings and similar peculiarities around timezones affect reported results, as [documented](elasticsearch://reference/aggregations/search-aggregations-bucket-datehistogram-aggregation.md#datehistogram-aggregation-time-zone) for date histogram aggregation. Besides, downsampling at daily interval hinders tracking any information related to daylight savings changes. diff --git a/manage-data/data-store/mapping/define-runtime-fields-in-search-request.md b/manage-data/data-store/mapping/define-runtime-fields-in-search-request.md index d2c13c490..b45558c0e 100644 --- a/manage-data/data-store/mapping/define-runtime-fields-in-search-request.md +++ b/manage-data/data-store/mapping/define-runtime-fields-in-search-request.md @@ -74,7 +74,7 @@ PUT my-index-000001/_mapping Runtime fields take precedence over fields defined with the same name in the index mappings. This flexibility allows you to shadow existing fields and calculate a different value, without modifying the field itself. If you made a mistake in your index mapping, you can use runtime fields to calculate values that [override values](override-field-values-at-query-time.md) in the mapping during the search request. 
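As a minimal sketch of that kind of override (assuming, purely for illustration, that the stored `measures.start` values need to be rescaled by a factor of 1000), a search request can redefine the mapped field under `runtime_mappings` and read the original value through `params._source`:

```console
GET my-index-000001/_search
{
  "runtime_mappings": {
    "measures.start": {
      "type": "long",
      "script": {
        "source": "emit(1000 * (long) params._source['measures.start'])"
      }
    }
  },
  "fields": ["measures.start"]
}
```

Every query, aggregation, and `fields` lookup in that request then sees the rescaled value, while the indexed data is left untouched.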
-Now, you can easily run an [average aggregation](elasticsearch://reference/data-analysis/aggregations/search-aggregations-metrics-avg-aggregation.md) on the `measures.start` and `measures.end` fields: +Now, you can easily run an [average aggregation](elasticsearch://reference/aggregations/search-aggregations-metrics-avg-aggregation.md) on the `measures.start` and `measures.end` fields: ```console GET my-index-000001/_search @@ -109,7 +109,7 @@ The response includes the aggregation results without changing the values for th } ``` -Further, you can define a runtime field as part of a search query that calculates a value, and then run a [stats aggregation](elasticsearch://reference/data-analysis/aggregations/search-aggregations-metrics-stats-aggregation.md) on that field *in the same query*. +Further, you can define a runtime field as part of a search query that calculates a value, and then run a [stats aggregation](elasticsearch://reference/aggregations/search-aggregations-metrics-stats-aggregation.md) on that field *in the same query*. The `duration` runtime field doesn’t exist in the index mapping, but we can still search and aggregate on that field. The following query returns the calculated value for the `duration` field and runs a stats aggregation to compute statistics over numeric values extracted from the aggregated documents. diff --git a/manage-data/data-store/mapping/explore-data-with-runtime-fields.md b/manage-data/data-store/mapping/explore-data-with-runtime-fields.md index dc3ed068d..200573967 100644 --- a/manage-data/data-store/mapping/explore-data-with-runtime-fields.md +++ b/manage-data/data-store/mapping/explore-data-with-runtime-fields.md @@ -292,7 +292,7 @@ The response includes the document where the log format doesn’t match, but the ## Define a runtime field with a dissect pattern [runtime-examples-dissect] -If you don’t need the power of regular expressions, you can use [dissect patterns](elasticsearch://reference/ingestion-tools/enrich-processor/dissect-processor.md) instead of grok patterns. Dissect patterns match on fixed delimiters but are typically faster than grok. +If you don’t need the power of regular expressions, you can use [dissect patterns](elasticsearch://reference/enrich-processor/dissect-processor.md) instead of grok patterns. Dissect patterns match on fixed delimiters but are typically faster than grok. You can use dissect to achieve the same results as parsing the Apache logs with a [grok pattern](#runtime-examples-grok). Instead of matching on a log pattern, you include the parts of the string that you want to discard. Paying special attention to the parts of the string you want to discard will help build successful dissect patterns. diff --git a/manage-data/data-store/text-analysis.md b/manage-data/data-store/text-analysis.md index d93b3a986..a82bfe252 100644 --- a/manage-data/data-store/text-analysis.md +++ b/manage-data/data-store/text-analysis.md @@ -47,9 +47,9 @@ To ensure search terms match these words as intended, you can apply the same tok Text analysis is performed by an [*analyzer*](/manage-data/data-store/text-analysis/anatomy-of-an-analyzer.md), a set of rules that govern the entire process. -{{es}} includes a default analyzer, called the [standard analyzer](elasticsearch://reference/data-analysis/text-analysis/analysis-standard-analyzer.md), which works well for most use cases right out of the box. 
+{{es}} includes a default analyzer, called the [standard analyzer](elasticsearch://reference/text-analysis/analysis-standard-analyzer.md), which works well for most use cases right out of the box. -If you want to tailor your search experience, you can choose a different [built-in analyzer](elasticsearch://reference/data-analysis/text-analysis/analyzer-reference.md) or even [configure a custom one](/manage-data/data-store/text-analysis/create-custom-analyzer.md). A custom analyzer gives you control over each step of the analysis process, including: +If you want to tailor your search experience, you can choose a different [built-in analyzer](elasticsearch://reference/text-analysis/analyzer-reference.md) or even [configure a custom one](/manage-data/data-store/text-analysis/create-custom-analyzer.md). A custom analyzer gives you control over each step of the analysis process, including: * Changes to the text *before* tokenization * How text is converted to tokens diff --git a/manage-data/data-store/text-analysis/anatomy-of-an-analyzer.md b/manage-data/data-store/text-analysis/anatomy-of-an-analyzer.md index 968175199..0230f5454 100644 --- a/manage-data/data-store/text-analysis/anatomy-of-an-analyzer.md +++ b/manage-data/data-store/text-analysis/anatomy-of-an-analyzer.md @@ -10,30 +10,30 @@ applies_to: An *analyzer*  — whether built-in or custom — is just a package which contains three lower-level building blocks: *character filters*, *tokenizers*, and *token filters*. -The built-in [analyzers](elasticsearch://reference/data-analysis/text-analysis/analyzer-reference.md) pre-package these building blocks into analyzers suitable for different languages and types of text. Elasticsearch also exposes the individual building blocks so that they can be combined to define new [`custom`](create-custom-analyzer.md) analyzers. +The built-in [analyzers](elasticsearch://reference/text-analysis/analyzer-reference.md) pre-package these building blocks into analyzers suitable for different languages and types of text. Elasticsearch also exposes the individual building blocks so that they can be combined to define new [`custom`](create-custom-analyzer.md) analyzers. ## Character filters [analyzer-anatomy-character-filters] A *character filter* receives the original text as a stream of characters and can transform the stream by adding, removing, or changing characters. For instance, a character filter could be used to convert Hindu-Arabic numerals (٠‎١٢٣٤٥٦٧٨‎٩‎) into their Arabic-Latin equivalents (0123456789), or to strip HTML elements like `` from the stream. -An analyzer may have **zero or more** [character filters](elasticsearch://reference/data-analysis/text-analysis/character-filter-reference.md), which are applied in order. +An analyzer may have **zero or more** [character filters](elasticsearch://reference/text-analysis/character-filter-reference.md), which are applied in order. ## Tokenizer [analyzer-anatomy-tokenizer] -A *tokenizer* receives a stream of characters, breaks it up into individual *tokens* (usually individual words), and outputs a stream of *tokens*. For instance, a [`whitespace`](elasticsearch://reference/data-analysis/text-analysis/analysis-whitespace-tokenizer.md) tokenizer breaks text into tokens whenever it sees any whitespace. It would convert the text `"Quick brown fox!"` into the terms `[Quick, brown, fox!]`. +A *tokenizer* receives a stream of characters, breaks it up into individual *tokens* (usually individual words), and outputs a stream of *tokens*. 
For instance, a [`whitespace`](elasticsearch://reference/text-analysis/analysis-whitespace-tokenizer.md) tokenizer breaks text into tokens whenever it sees any whitespace. It would convert the text `"Quick brown fox!"` into the terms `[Quick, brown, fox!]`. The tokenizer is also responsible for recording the order or *position* of each term and the start and end *character offsets* of the original word which the term represents. -An analyzer must have **exactly one** [tokenizer](elasticsearch://reference/data-analysis/text-analysis/tokenizer-reference.md). +An analyzer must have **exactly one** [tokenizer](elasticsearch://reference/text-analysis/tokenizer-reference.md). ## Token filters [analyzer-anatomy-token-filters] -A *token filter* receives the token stream and may add, remove, or change tokens. For example, a [`lowercase`](elasticsearch://reference/data-analysis/text-analysis/analysis-lowercase-tokenfilter.md) token filter converts all tokens to lowercase, a [`stop`](elasticsearch://reference/data-analysis/text-analysis/analysis-stop-tokenfilter.md) token filter removes common words (*stop words*) like `the` from the token stream, and a [`synonym`](elasticsearch://reference/data-analysis/text-analysis/analysis-synonym-tokenfilter.md) token filter introduces synonyms into the token stream. +A *token filter* receives the token stream and may add, remove, or change tokens. For example, a [`lowercase`](elasticsearch://reference/text-analysis/analysis-lowercase-tokenfilter.md) token filter converts all tokens to lowercase, a [`stop`](elasticsearch://reference/text-analysis/analysis-stop-tokenfilter.md) token filter removes common words (*stop words*) like `the` from the token stream, and a [`synonym`](elasticsearch://reference/text-analysis/analysis-synonym-tokenfilter.md) token filter introduces synonyms into the token stream. Token filters are not allowed to change the position or character offsets of each token. -An analyzer may have **zero or more** [token filters](elasticsearch://reference/data-analysis/text-analysis/token-filter-reference.md), which are applied in order. +An analyzer may have **zero or more** [token filters](elasticsearch://reference/text-analysis/token-filter-reference.md), which are applied in order. diff --git a/manage-data/data-store/text-analysis/configure-text-analysis.md b/manage-data/data-store/text-analysis/configure-text-analysis.md index c2672633c..2bf1dfe87 100644 --- a/manage-data/data-store/text-analysis/configure-text-analysis.md +++ b/manage-data/data-store/text-analysis/configure-text-analysis.md @@ -8,9 +8,9 @@ applies_to: # Configure text analysis [configure-text-analysis] -By default, {{es}} uses the [`standard` analyzer](elasticsearch://reference/data-analysis/text-analysis/analysis-standard-analyzer.md) for all text analysis. The `standard` analyzer gives you out-of-the-box support for most natural languages and use cases. If you chose to use the `standard` analyzer as-is, no further configuration is needed. +By default, {{es}} uses the [`standard` analyzer](elasticsearch://reference/text-analysis/analysis-standard-analyzer.md) for all text analysis. The `standard` analyzer gives you out-of-the-box support for most natural languages and use cases. If you chose to use the `standard` analyzer as-is, no further configuration is needed. -If the standard analyzer does not fit your needs, review and test {{es}}'s other built-in [built-in analyzers](elasticsearch://reference/data-analysis/text-analysis/analyzer-reference.md). 
Built-in analyzers don’t require configuration, but some support options that can be used to adjust their behavior. For example, you can configure the `standard` analyzer with a list of custom stop words to remove. +If the standard analyzer does not fit your needs, review and test {{es}}'s other built-in [built-in analyzers](elasticsearch://reference/text-analysis/analyzer-reference.md). Built-in analyzers don’t require configuration, but some support options that can be used to adjust their behavior. For example, you can configure the `standard` analyzer with a list of custom stop words to remove. If no built-in analyzer fits your needs, you can test and create a custom analyzer. Custom analyzers involve selecting and combining different [analyzer components](anatomy-of-an-analyzer.md), giving you greater control over the process. diff --git a/manage-data/data-store/text-analysis/configuring-built-in-analyzers.md b/manage-data/data-store/text-analysis/configuring-built-in-analyzers.md index a93a9177b..a0f26acdc 100644 --- a/manage-data/data-store/text-analysis/configuring-built-in-analyzers.md +++ b/manage-data/data-store/text-analysis/configuring-built-in-analyzers.md @@ -8,7 +8,7 @@ applies_to: # Configuring built-in analyzers [configuring-analyzers] -The built-in analyzers can be used directly without any configuration. Some of them, however, support configuration options to alter their behaviour. For instance, the [`standard` analyzer](elasticsearch://reference/data-analysis/text-analysis/analysis-standard-analyzer.md) can be configured to support a list of stop words: +The built-in analyzers can be used directly without any configuration. Some of them, however, support configuration options to alter their behaviour. For instance, the [`standard` analyzer](elasticsearch://reference/text-analysis/analysis-standard-analyzer.md) can be configured to support a list of stop words: ```console PUT my-index-000001 diff --git a/manage-data/data-store/text-analysis/create-custom-analyzer.md b/manage-data/data-store/text-analysis/create-custom-analyzer.md index 0fd42f661..977e66308 100644 --- a/manage-data/data-store/text-analysis/create-custom-analyzer.md +++ b/manage-data/data-store/text-analysis/create-custom-analyzer.md @@ -10,9 +10,9 @@ applies_to: When the built-in analyzers do not fulfill your needs, you can create a `custom` analyzer which uses the appropriate combination of: -* zero or more [character filters](elasticsearch://reference/data-analysis/text-analysis/character-filter-reference.md) -* a [tokenizer](elasticsearch://reference/data-analysis/text-analysis/tokenizer-reference.md) -* zero or more [token filters](elasticsearch://reference/data-analysis/text-analysis/token-filter-reference.md). +* zero or more [character filters](elasticsearch://reference/text-analysis/character-filter-reference.md) +* a [tokenizer](elasticsearch://reference/text-analysis/tokenizer-reference.md) +* zero or more [token filters](elasticsearch://reference/text-analysis/token-filter-reference.md). ## Configuration [_configuration] @@ -20,16 +20,16 @@ When the built-in analyzers do not fulfill your needs, you can create a `custom` The `custom` analyzer accepts the following parameters: `type` -: Analyzer type. Accepts [built-in analyzer types](elasticsearch://reference/data-analysis/text-analysis/analyzer-reference.md). For custom analyzers, use `custom` or omit this parameter. +: Analyzer type. Accepts [built-in analyzer types](elasticsearch://reference/text-analysis/analyzer-reference.md). 
For custom analyzers, use `custom` or omit this parameter. `tokenizer` -: A built-in or customised [tokenizer](elasticsearch://reference/data-analysis/text-analysis/tokenizer-reference.md). (Required) +: A built-in or customised [tokenizer](elasticsearch://reference/text-analysis/tokenizer-reference.md). (Required) `char_filter` -: An optional array of built-in or customised [character filters](elasticsearch://reference/data-analysis/text-analysis/character-filter-reference.md). +: An optional array of built-in or customised [character filters](elasticsearch://reference/text-analysis/character-filter-reference.md). `filter` -: An optional array of built-in or customised [token filters](elasticsearch://reference/data-analysis/text-analysis/token-filter-reference.md). +: An optional array of built-in or customised [token filters](elasticsearch://reference/text-analysis/token-filter-reference.md). `position_increment_gap` : When indexing an array of text values, Elasticsearch inserts a fake "gap" between the last term of one value and the first term of the next value to ensure that a phrase query doesn’t match two terms from different array elements. Defaults to `100`. See [`position_increment_gap`](elasticsearch://reference/elasticsearch/mapping-reference/position-increment-gap.md) for more. @@ -40,16 +40,16 @@ The `custom` analyzer accepts the following parameters: Here is an example that combines the following: Character Filter -: * [HTML Strip Character Filter](elasticsearch://reference/data-analysis/text-analysis/analysis-htmlstrip-charfilter.md) +: * [HTML Strip Character Filter](elasticsearch://reference/text-analysis/analysis-htmlstrip-charfilter.md) Tokenizer -: * [Standard Tokenizer](elasticsearch://reference/data-analysis/text-analysis/analysis-standard-tokenizer.md) +: * [Standard Tokenizer](elasticsearch://reference/text-analysis/analysis-standard-tokenizer.md) Token Filters -: * [Lowercase Token Filter](elasticsearch://reference/data-analysis/text-analysis/analysis-lowercase-tokenfilter.md) -* [ASCII-Folding Token Filter](elasticsearch://reference/data-analysis/text-analysis/analysis-asciifolding-tokenfilter.md) +: * [Lowercase Token Filter](elasticsearch://reference/text-analysis/analysis-lowercase-tokenfilter.md) +* [ASCII-Folding Token Filter](elasticsearch://reference/text-analysis/analysis-asciifolding-tokenfilter.md) ```console @@ -95,16 +95,16 @@ The previous example used tokenizer, token filters, and character filters with t Here is a more complicated example that combines the following: Character Filter -: * [Mapping Character Filter](elasticsearch://reference/data-analysis/text-analysis/analysis-mapping-charfilter.md), configured to replace `:)` with `_happy_` and `:(` with `_sad_` +: * [Mapping Character Filter](elasticsearch://reference/text-analysis/analysis-mapping-charfilter.md), configured to replace `:)` with `_happy_` and `:(` with `_sad_` Tokenizer -: * [Pattern Tokenizer](elasticsearch://reference/data-analysis/text-analysis/analysis-pattern-tokenizer.md), configured to split on punctuation characters +: * [Pattern Tokenizer](elasticsearch://reference/text-analysis/analysis-pattern-tokenizer.md), configured to split on punctuation characters Token Filters -: * [Lowercase Token Filter](elasticsearch://reference/data-analysis/text-analysis/analysis-lowercase-tokenfilter.md) -* [Stop Token Filter](elasticsearch://reference/data-analysis/text-analysis/analysis-stop-tokenfilter.md), configured to use the pre-defined list of English stop words +: * [Lowercase Token 
Filter](elasticsearch://reference/text-analysis/analysis-lowercase-tokenfilter.md) +* [Stop Token Filter](elasticsearch://reference/text-analysis/analysis-stop-tokenfilter.md), configured to use the pre-defined list of English stop words Here is an example: diff --git a/manage-data/data-store/text-analysis/specify-an-analyzer.md b/manage-data/data-store/text-analysis/specify-an-analyzer.md index 69f56074d..00b8502db 100644 --- a/manage-data/data-store/text-analysis/specify-an-analyzer.md +++ b/manage-data/data-store/text-analysis/specify-an-analyzer.md @@ -34,7 +34,7 @@ If you don’t typically create mappings for your indices, you can use [index te 1. The [`analyzer`](elasticsearch://reference/elasticsearch/mapping-reference/analyzer.md) mapping parameter for the field. See [Specify the analyzer for a field](#specify-index-field-analyzer). 2. The `analysis.analyzer.default` index setting. See [Specify the default analyzer for an index](#specify-index-time-default-analyzer). -If none of these parameters are specified, the [`standard` analyzer](elasticsearch://reference/data-analysis/text-analysis/analysis-standard-analyzer.md) is used. +If none of these parameters are specified, the [`standard` analyzer](elasticsearch://reference/text-analysis/analysis-standard-analyzer.md) is used. ## Specify the analyzer for a field [specify-index-field-analyzer] @@ -97,7 +97,7 @@ At search time, {{es}} determines which analyzer to use by checking the followin 3. The `analysis.analyzer.default_search` index setting. See [Specify the default search analyzer for an index](#specify-search-default-analyzer). 4. The [`analyzer`](elasticsearch://reference/elasticsearch/mapping-reference/analyzer.md) mapping parameter for the field. See [Specify the analyzer for a field](#specify-index-field-analyzer). -If none of these parameters are specified, the [`standard` analyzer](elasticsearch://reference/data-analysis/text-analysis/analysis-standard-analyzer.md) is used. +If none of these parameters are specified, the [`standard` analyzer](elasticsearch://reference/text-analysis/analysis-standard-analyzer.md) is used. ## Specify the search analyzer for a query [specify-search-query-analyzer] diff --git a/manage-data/data-store/text-analysis/stemming.md b/manage-data/data-store/text-analysis/stemming.md index d00720aed..e28f4db47 100644 --- a/manage-data/data-store/text-analysis/stemming.md +++ b/manage-data/data-store/text-analysis/stemming.md @@ -44,10 +44,10 @@ However, most algorithmic stemmers only alter the existing text of a word. This The following token filters use algorithmic stemming: -* [`stemmer`](elasticsearch://reference/data-analysis/text-analysis/analysis-stemmer-tokenfilter.md), which provides algorithmic stemming for several languages, some with additional variants. -* [`kstem`](elasticsearch://reference/data-analysis/text-analysis/analysis-kstem-tokenfilter.md), a stemmer for English that combines algorithmic stemming with a built-in dictionary. -* [`porter_stem`](elasticsearch://reference/data-analysis/text-analysis/analysis-porterstem-tokenfilter.md), our recommended algorithmic stemmer for English. -* [`snowball`](elasticsearch://reference/data-analysis/text-analysis/analysis-snowball-tokenfilter.md), which uses [Snowball](https://snowballstem.org/)-based stemming rules for several languages. +* [`stemmer`](elasticsearch://reference/text-analysis/analysis-stemmer-tokenfilter.md), which provides algorithmic stemming for several languages, some with additional variants. 
+* [`kstem`](elasticsearch://reference/text-analysis/analysis-kstem-tokenfilter.md), a stemmer for English that combines algorithmic stemming with a built-in dictionary. +* [`porter_stem`](elasticsearch://reference/text-analysis/analysis-porterstem-tokenfilter.md), our recommended algorithmic stemmer for English. +* [`snowball`](elasticsearch://reference/text-analysis/analysis-snowball-tokenfilter.md), which uses [Snowball](https://snowballstem.org/)-based stemming rules for several languages. ## Dictionary stemmers [dictionary-stemmers] @@ -68,10 +68,10 @@ In practice, algorithmic stemmers typically outperform dictionary stemmers. This * **Dictionary quality**
A dictionary stemmer is only as good as its dictionary. To work well, these dictionaries must include a significant number of words, be updated regularly, and change with language trends. Often, by the time a dictionary has been made available, it’s incomplete and some of its entries are already outdated. * **Size and performance**
Dictionary stemmers must load all words, prefixes, and suffixes from its dictionary into memory. This can use a significant amount of RAM. Low-quality dictionaries may also be less efficient with prefix and suffix removal, which can slow the stemming process significantly. -You can use the [`hunspell`](elasticsearch://reference/data-analysis/text-analysis/analysis-hunspell-tokenfilter.md) token filter to perform dictionary stemming. +You can use the [`hunspell`](elasticsearch://reference/text-analysis/analysis-hunspell-tokenfilter.md) token filter to perform dictionary stemming. ::::{tip} -If available, we recommend trying an algorithmic stemmer for your language before using the [`hunspell`](elasticsearch://reference/data-analysis/text-analysis/analysis-hunspell-tokenfilter.md) token filter. +If available, we recommend trying an algorithmic stemmer for your language before using the [`hunspell`](elasticsearch://reference/text-analysis/analysis-hunspell-tokenfilter.md) token filter. :::: @@ -83,10 +83,10 @@ Sometimes stemming can produce shared root words that are spelled similarly but To prevent this and better control stemming, you can use the following token filters: -* [`stemmer_override`](elasticsearch://reference/data-analysis/text-analysis/analysis-stemmer-override-tokenfilter.md), which lets you define rules for stemming specific tokens. -* [`keyword_marker`](elasticsearch://reference/data-analysis/text-analysis/analysis-keyword-marker-tokenfilter.md), which marks specified tokens as keywords. Keyword tokens are not stemmed by subsequent stemmer token filters. -* [`conditional`](elasticsearch://reference/data-analysis/text-analysis/analysis-condition-tokenfilter.md), which can be used to mark tokens as keywords, similar to the `keyword_marker` filter. +* [`stemmer_override`](elasticsearch://reference/text-analysis/analysis-stemmer-override-tokenfilter.md), which lets you define rules for stemming specific tokens. +* [`keyword_marker`](elasticsearch://reference/text-analysis/analysis-keyword-marker-tokenfilter.md), which marks specified tokens as keywords. Keyword tokens are not stemmed by subsequent stemmer token filters. +* [`conditional`](elasticsearch://reference/text-analysis/analysis-condition-tokenfilter.md), which can be used to mark tokens as keywords, similar to the `keyword_marker` filter. -For built-in [language analyzers](elasticsearch://reference/data-analysis/text-analysis/analysis-lang-analyzer.md), you also can use the [`stem_exclusion`](elasticsearch://reference/data-analysis/text-analysis/analysis-lang-analyzer.md#_excluding_words_from_stemming) parameter to specify a list of words that won’t be stemmed. +For built-in [language analyzers](elasticsearch://reference/text-analysis/analysis-lang-analyzer.md), you also can use the [`stem_exclusion`](elasticsearch://reference/text-analysis/analysis-lang-analyzer.md#_excluding_words_from_stemming) parameter to specify a list of words that won’t be stemmed. diff --git a/manage-data/data-store/text-analysis/token-graphs.md b/manage-data/data-store/text-analysis/token-graphs.md index 9be1ab772..b81c3c628 100644 --- a/manage-data/data-store/text-analysis/token-graphs.md +++ b/manage-data/data-store/text-analysis/token-graphs.md @@ -36,8 +36,8 @@ Some token filters can add tokens that span multiple positions. These can includ However, only some token filters, known as *graph token filters*, accurately record the `positionLength` for multi-position tokens. 
These filters include: -* [`synonym_graph`](elasticsearch://reference/data-analysis/text-analysis/analysis-synonym-graph-tokenfilter.md) -* [`word_delimiter_graph`](elasticsearch://reference/data-analysis/text-analysis/analysis-word-delimiter-graph-tokenfilter.md) +* [`synonym_graph`](elasticsearch://reference/text-analysis/analysis-synonym-graph-tokenfilter.md) +* [`word_delimiter_graph`](elasticsearch://reference/text-analysis/analysis-word-delimiter-graph-tokenfilter.md) Some tokenizers, such as the [`nori_tokenizer`](elasticsearch://reference/elasticsearch-plugins/analysis-nori-tokenizer.md), also accurately decompose compound tokens into multi-position tokens. @@ -81,8 +81,8 @@ This means the query matches documents containing either `dns is fragile` *or* ` The following token filters can add tokens that span multiple positions but only record a default `positionLength` of `1`: -* [`synonym`](elasticsearch://reference/data-analysis/text-analysis/analysis-synonym-tokenfilter.md) -* [`word_delimiter`](elasticsearch://reference/data-analysis/text-analysis/analysis-word-delimiter-tokenfilter.md) +* [`synonym`](elasticsearch://reference/text-analysis/analysis-synonym-tokenfilter.md) +* [`word_delimiter`](elasticsearch://reference/text-analysis/analysis-word-delimiter-tokenfilter.md) This means these filters will produce invalid token graphs for streams containing such tokens. diff --git a/manage-data/ingest.md b/manage-data/ingest.md index 28e2408f8..2a35a0d54 100644 --- a/manage-data/ingest.md +++ b/manage-data/ingest.md @@ -30,7 +30,7 @@ Elastic offer tools designed to ingest specific types of general content. The co * To index **documents** directly into {{es}}, use the {{es}} [document APIs](https://www.elastic.co/docs/api/doc/elasticsearch/group/endpoint-document). * To send **application data** directly to {{es}}, use an [{{es}} language client](https://www.elastic.co/guide/en/elasticsearch/client/index.html). * To index **web page content**, use the Elastic [web crawler](https://www.elastic.co/web-crawler). -* To sync **data from third-party sources**, use [connectors](elasticsearch://reference/ingestion-tools/search-connectors/index.md). A connector syncs content from an original data source to an {{es}} index. Using connectors you can create *searchable*, read-only replicas of your data sources. +* To sync **data from third-party sources**, use [connectors](elasticsearch://reference/search-connectors/index.md). A connector syncs content from an original data source to an {{es}} index. Using connectors you can create *searchable*, read-only replicas of your data sources. * To index **single files** for testing in a non-production environment, use the {{kib}} [file uploader](ingest/upload-data-files.md). If you would like to try things out before you add your own data, try using our [sample data](ingest/sample-data.md). 
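As a concrete sketch of the first option in the list above (indexing documents directly with the {{es}} document APIs), a single request is enough to create a document; the index name and fields here are made up:

```console
POST my-index/_doc
{
  "title": "Example document",
  "ingested_at": "2025-01-01T00:00:00Z"
}
```

{{es}} creates the `my-index` index on first use (if automatic index creation is enabled), assigns the document an ID, and returns that ID in the response.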
diff --git a/manage-data/ingest/ingesting-data-for-elastic-solutions.md b/manage-data/ingest/ingesting-data-for-elastic-solutions.md index 9db4b3fa0..0fa6b3417 100644 --- a/manage-data/ingest/ingesting-data-for-elastic-solutions.md +++ b/manage-data/ingest/ingesting-data-for-elastic-solutions.md @@ -41,7 +41,7 @@ To use [Elastic Agent](https://www.elastic.co/guide/en/fleet/current) and [Elast * [{{es}} document APIs](https://www.elastic.co/docs/api/doc/elasticsearch/group/endpoint-document) * [{{es}} language clients](https://www.elastic.co/guide/en/elasticsearch/client/index.html) * [Elastic web crawler](https://www.elastic.co/web-crawler) - * [Elastic connectors](elasticsearch://reference/ingestion-tools/search-connectors/index.md) + * [Elastic connectors](elasticsearch://reference/search-connectors/index.md) @@ -101,6 +101,6 @@ Bring your ideas and use {{es}} and the {{stack}} to store, search, and visualiz * [{{es}} document APIs](https://www.elastic.co/docs/api/doc/elasticsearch/group/endpoint-document) * [{{es}} language clients](https://www.elastic.co/guide/en/elasticsearch/client/index.html) * [Elastic web crawler](https://www.elastic.co/web-crawler) - * [Elastic connectors](elasticsearch://reference/ingestion-tools/search-connectors/index.md) + * [Elastic connectors](elasticsearch://reference/search-connectors/index.md) * [Tutorial: Get started with vector search and generative AI](https://www.elastic.co/guide/en/starting-with-the-elasticsearch-platform-and-its-solutions/current/getting-started-general-purpose.html) diff --git a/manage-data/ingest/tools.md b/manage-data/ingest/tools.md index 9459e99c5..e751d6570 100644 --- a/manage-data/ingest/tools.md +++ b/manage-data/ingest/tools.md @@ -53,5 +53,5 @@ Depending on the type of data you want to ingest, you have a number of methods a | Application logs | Ingest application logs using Filebeat, {{agent}}, or the APM agent, or reformat application logs into Elastic Common Schema (ECS) logs and then ingest them using Filebeat or {{agent}}. | [Stream application logs](/solutions/observability/logs/stream-application-logs.md)
[ECS formatted application logs](/solutions/observability/logs/ecs-formatted-application-logs.md) | | Elastic Serverless forwarder for AWS | Ship logs from your AWS environment to cloud-hosted, self-managed Elastic environments, or {{ls}}. | [Elastic Serverless Forwarder](elastic-serverless-forwarder://reference/index.md) | | Connectors | Use connectors to extract data from an original data source and sync it to an {{es}} index. | [Ingest content with Elastic connectors -](elasticsearch://reference/ingestion-tools/search-connectors/index.md)
[Connector clients](elasticsearch://reference/ingestion-tools/search-connectors/index.md) | +](elasticsearch://reference/search-connectors/index.md)
[Connector clients](elasticsearch://reference/search-connectors/index.md) | | Web crawler | Discover, extract, and index searchable content from websites and knowledge bases using the web crawler. | [Elastic Open Web Crawler](https://github.com/elastic/crawler#readme) | \ No newline at end of file diff --git a/manage-data/ingest/transform-enrich/data-enrichment.md b/manage-data/ingest/transform-enrich/data-enrichment.md index a6711e288..f3af3eefa 100644 --- a/manage-data/ingest/transform-enrich/data-enrichment.md +++ b/manage-data/ingest/transform-enrich/data-enrichment.md @@ -9,7 +9,7 @@ applies_to: # Data enrichment -You can use the [enrich processor](elasticsearch://reference/ingestion-tools/enrich-processor/enrich-processor.md) to add data from your existing indices to incoming documents during ingest. +You can use the [enrich processor](elasticsearch://reference/enrich-processor/enrich-processor.md) to add data from your existing indices to incoming documents during ingest. For example, you can use the enrich processor to: diff --git a/manage-data/ingest/transform-enrich/example-enrich-data-based-on-exact-values.md b/manage-data/ingest/transform-enrich/example-enrich-data-based-on-exact-values.md index 81c96332e..ddf7938b8 100644 --- a/manage-data/ingest/transform-enrich/example-enrich-data-based-on-exact-values.md +++ b/manage-data/ingest/transform-enrich/example-enrich-data-based-on-exact-values.md @@ -53,7 +53,7 @@ Use the [execute enrich policy API](https://www.elastic.co/docs/api/doc/elastics POST /_enrich/policy/users-policy/_execute?wait_for_completion=false ``` -Use the [create or update pipeline API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ingest-put-pipeline) to create an ingest pipeline. In the pipeline, add an [enrich processor](elasticsearch://reference/ingestion-tools/enrich-processor/enrich-processor.md) that includes: +Use the [create or update pipeline API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ingest-put-pipeline) to create an ingest pipeline. In the pipeline, add an [enrich processor](elasticsearch://reference/enrich-processor/enrich-processor.md) that includes: * Your enrich policy. * The `field` of incoming documents used to match documents from the enrich index. diff --git a/manage-data/ingest/transform-enrich/example-enrich-data-based-on-geolocation.md b/manage-data/ingest/transform-enrich/example-enrich-data-based-on-geolocation.md index 286eb2b07..082a1c53f 100644 --- a/manage-data/ingest/transform-enrich/example-enrich-data-based-on-geolocation.md +++ b/manage-data/ingest/transform-enrich/example-enrich-data-based-on-geolocation.md @@ -66,7 +66,7 @@ Use the [execute enrich policy API](https://www.elastic.co/docs/api/doc/elastics POST /_enrich/policy/postal_policy/_execute?wait_for_completion=false ``` -Use the [create or update pipeline API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ingest-put-pipeline) to create an ingest pipeline. In the pipeline, add an [enrich processor](elasticsearch://reference/ingestion-tools/enrich-processor/enrich-processor.md) that includes: +Use the [create or update pipeline API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ingest-put-pipeline) to create an ingest pipeline. In the pipeline, add an [enrich processor](elasticsearch://reference/enrich-processor/enrich-processor.md) that includes: * Your enrich policy. * The `field` of incoming documents used to match the geoshape of documents from the enrich index. 
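A sketch of that pipeline could look like the following, where `postal_policy` is the policy executed above and the pipeline, `field`, and `target_field` names are illustrative placeholders:

```console
PUT _ingest/pipeline/postal_lookup
{
  "processors": [
    {
      "enrich": {
        "policy_name": "postal_policy",
        "field": "geo_location",
        "target_field": "geo_data",
        "shape_relation": "INTERSECTS"
      }
    }
  ]
}
```

Documents indexed through this pipeline whose incoming `geo_location` point falls within a shape in the enrich index receive the matching enrich data under `geo_data`.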
diff --git a/manage-data/ingest/transform-enrich/example-enrich-data-by-matching-value-to-range.md b/manage-data/ingest/transform-enrich/example-enrich-data-by-matching-value-to-range.md index 218c450a1..9f95321fd 100644 --- a/manage-data/ingest/transform-enrich/example-enrich-data-by-matching-value-to-range.md +++ b/manage-data/ingest/transform-enrich/example-enrich-data-by-matching-value-to-range.md @@ -63,7 +63,7 @@ Use the [execute enrich policy API](https://www.elastic.co/docs/api/doc/elastics POST /_enrich/policy/networks-policy/_execute?wait_for_completion=false ``` -Use the [create or update pipeline API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ingest-put-pipeline) to create an ingest pipeline. In the pipeline, add an [enrich processor](elasticsearch://reference/ingestion-tools/enrich-processor/enrich-processor.md) that includes: +Use the [create or update pipeline API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ingest-put-pipeline) to create an ingest pipeline. In the pipeline, add an [enrich processor](elasticsearch://reference/enrich-processor/enrich-processor.md) that includes: * Your enrich policy. * The `field` of incoming documents used to match documents from the enrich index. diff --git a/manage-data/ingest/transform-enrich/example-parse-logs.md b/manage-data/ingest/transform-enrich/example-parse-logs.md index 797d1d9ed..3a75a9032 100644 --- a/manage-data/ingest/transform-enrich/example-parse-logs.md +++ b/manage-data/ingest/transform-enrich/example-parse-logs.md @@ -31,7 +31,7 @@ These logs contain a timestamp, IP address, and user agent. You want to give the 2. Click **Create pipeline > New pipeline**. 3. Set **Name** to `my-pipeline` and optionally add a description for the pipeline. -4. Add a [grok processor](elasticsearch://reference/ingestion-tools/enrich-processor/grok-processor.md) to parse the log message: +4. Add a [grok processor](elasticsearch://reference/enrich-processor/grok-processor.md) to parse the log message: 1. Click **Add a processor** and select the **Grok** processor type. 2. Set **Field** to `message` and **Patterns** to the following [grok pattern](../../../explore-analyze/scripting/grok.md): @@ -47,9 +47,9 @@ These logs contain a timestamp, IP address, and user agent. 
You want to give the | Processor type | Field | Additional options | Description | | --- | --- | --- | --- | - | [**Date**](elasticsearch://reference/ingestion-tools/enrich-processor/date-processor.md) | `@timestamp` | **Formats**: `dd/MMM/yyyy:HH:mm:ss Z` | `Format '@timestamp' as 'dd/MMM/yyyy:HH:mm:ss Z'` | - | [**GeoIP**](elasticsearch://reference/ingestion-tools/enrich-processor/geoip-processor.md) | `source.ip` | **Target field**: `source.geo` | `Add 'source.geo' GeoIP data for 'source.ip'` | - | [**User agent**](elasticsearch://reference/ingestion-tools/enrich-processor/user-agent-processor.md) | `user_agent` | | `Extract fields from 'user_agent'` | + | [**Date**](elasticsearch://reference/enrich-processor/date-processor.md) | `@timestamp` | **Formats**: `dd/MMM/yyyy:HH:mm:ss Z` | `Format '@timestamp' as 'dd/MMM/yyyy:HH:mm:ss Z'` | + | [**GeoIP**](elasticsearch://reference/enrich-processor/geoip-processor.md) | `source.ip` | **Target field**: `source.geo` | `Add 'source.geo' GeoIP data for 'source.ip'` | + | [**User agent**](elasticsearch://reference/enrich-processor/user-agent-processor.md) | `user_agent` | | `Extract fields from 'user_agent'` | Your form should look similar to this: diff --git a/manage-data/ingest/transform-enrich/ingest-pipelines.md b/manage-data/ingest/transform-enrich/ingest-pipelines.md index 02a321917..26a00cd9b 100644 --- a/manage-data/ingest/transform-enrich/ingest-pipelines.md +++ b/manage-data/ingest/transform-enrich/ingest-pipelines.md @@ -10,7 +10,7 @@ applies_to: {{es}} ingest pipelines let you perform common transformations on your data before indexing. For example, you can use pipelines to remove fields, extract values from text, and enrich your data. -A pipeline consists of a series of configurable tasks called [processors](elasticsearch://reference/ingestion-tools/enrich-processor/index.md). Each processor runs sequentially, making specific changes to incoming documents. After the processors have run, {{es}} adds the transformed documents to your data stream or index. +A pipeline consists of a series of configurable tasks called [processors](elasticsearch://reference/enrich-processor/index.md). Each processor runs sequentially, making specific changes to incoming documents. After the processors have run, {{es}} adds the transformed documents to your data stream or index. :::{image} /manage-data/images/elasticsearch-reference-ingest-process.svg :alt: Ingest pipeline diagram @@ -49,7 +49,7 @@ The **New pipeline from CSV** option lets you use a CSV to create an ingest pipe :::: -You can also use the [ingest APIs](https://www.elastic.co/docs/api/doc/elasticsearch/group/endpoint-ingest) to create and manage pipelines. The following [create pipeline API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ingest-put-pipeline) request creates a pipeline containing two [`set`](elasticsearch://reference/ingestion-tools/enrich-processor/set-processor.md) processors followed by a [`lowercase`](elasticsearch://reference/ingestion-tools/enrich-processor/lowercase-processor.md) processor. The processors run sequentially in the order specified. +You can also use the [ingest APIs](https://www.elastic.co/docs/api/doc/elasticsearch/group/endpoint-ingest) to create and manage pipelines. 
The following [create pipeline API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ingest-put-pipeline) request creates a pipeline containing two [`set`](elasticsearch://reference/enrich-processor/set-processor.md) processors followed by a [`lowercase`](elasticsearch://reference/enrich-processor/lowercase-processor.md) processor. The processors run sequentially in the order specified. ```console PUT _ingest/pipeline/my-pipeline @@ -350,7 +350,7 @@ If you run {{agent}} standalone, you can apply pipelines using an [index templat ## Pipelines for search indices [pipelines-in-enterprise-search] -When you create Elasticsearch indices for search use cases, for example, using the [web crawler^](https://www.elastic.co/guide/en/enterprise-search/current/crawler.html) or [connectors](elasticsearch://reference/ingestion-tools/search-connectors/index.md), these indices are automatically set up with specific ingest pipelines. These processors help optimize your content for search. See [*Ingest pipelines in Search*](../../../solutions/search/ingest-for-search.md) for more information. +When you create Elasticsearch indices for search use cases, for example, using the [web crawler^](https://www.elastic.co/guide/en/enterprise-search/current/crawler.html) or [connectors](elasticsearch://reference/search-connectors/index.md), these indices are automatically set up with specific ingest pipelines. These processors help optimize your content for search. See [*Ingest pipelines in Search*](../../../solutions/search/ingest-for-search.md) for more information. ## Access source fields in a processor [access-source-fields] @@ -390,7 +390,7 @@ PUT _ingest/pipeline/my-pipeline Use dot notation to access object fields. ::::{important} -If your document contains flattened objects, use the [`dot_expander`](elasticsearch://reference/ingestion-tools/enrich-processor/dot-expand-processor.md) processor to expand them first. Other ingest processors cannot access flattened objects. +If your document contains flattened objects, use the [`dot_expander`](elasticsearch://reference/enrich-processor/dot-expand-processor.md) processor to expand them first. Other ingest processors cannot access flattened objects. :::: @@ -768,7 +768,7 @@ PUT _ingest/pipeline/my-pipeline ## Conditionally apply pipelines [conditionally-apply-pipelines] -Combine an `if` condition with the [`pipeline`](elasticsearch://reference/ingestion-tools/enrich-processor/pipeline-processor.md) processor to apply other pipelines to documents based on your criteria. You can use this pipeline as the [default pipeline](ingest-pipelines.md#set-default-pipeline) in an [index template](../../data-store/templates.md) used to configure multiple data streams or indices. +Combine an `if` condition with the [`pipeline`](elasticsearch://reference/enrich-processor/pipeline-processor.md) processor to apply other pipelines to documents based on your criteria. You can use this pipeline as the [default pipeline](ingest-pipelines.md#set-default-pipeline) in an [index template](../../data-store/templates.md) used to configure multiple data streams or indices. 
```console PUT _ingest/pipeline/one-pipeline-to-rule-them-all diff --git a/manage-data/ingest/transform-enrich/set-up-an-enrich-processor.md b/manage-data/ingest/transform-enrich/set-up-an-enrich-processor.md index 158f821ce..6909fe742 100644 --- a/manage-data/ingest/transform-enrich/set-up-an-enrich-processor.md +++ b/manage-data/ingest/transform-enrich/set-up-an-enrich-processor.md @@ -68,7 +68,7 @@ Once the enrich policy is created, you need to execute it using the [execute enr The *enrich index* contains documents from the policy’s source indices. Enrich indices always begin with `.enrich-*`, are read-only, and are [force merged](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-forcemerge). ::::{warning} -Enrich indices should only be used by the [enrich processor](elasticsearch://reference/ingestion-tools/enrich-processor/enrich-processor.md) or the [{{esql}} `ENRICH` command](elasticsearch://reference/query-languages/esql/esql-commands.md#esql-enrich). Avoid using enrich indices for other purposes. +Enrich indices should only be used by the [enrich processor](elasticsearch://reference/enrich-processor/enrich-processor.md) or the [{{esql}} `ENRICH` command](elasticsearch://reference/query-languages/esql/esql-commands.md#esql-enrich). Avoid using enrich indices for other purposes. :::: @@ -82,7 +82,7 @@ Once you have source indices, an enrich policy, and the related enrich index in :alt: enrich processor ::: -Define an [enrich processor](elasticsearch://reference/ingestion-tools/enrich-processor/enrich-processor.md) and add it to an ingest pipeline using the [create or update pipeline API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ingest-put-pipeline). +Define an [enrich processor](elasticsearch://reference/enrich-processor/enrich-processor.md) and add it to an ingest pipeline using the [create or update pipeline API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ingest-put-pipeline). When defining the enrich processor, you must include at least the following: @@ -92,9 +92,9 @@ When defining the enrich processor, you must include at least the following: You also can use the `max_matches` option to set the number of enrich documents an incoming document can match. If set to the default of `1`, data is added to an incoming document’s target field as a JSON object. Otherwise, the data is added as an array. -See [Enrich](elasticsearch://reference/ingestion-tools/enrich-processor/enrich-processor.md) for a full list of configuration options. +See [Enrich](elasticsearch://reference/enrich-processor/enrich-processor.md) for a full list of configuration options. -You also can add other [processors](elasticsearch://reference/ingestion-tools/enrich-processor/index.md) to your ingest pipeline. +You also can add other [processors](elasticsearch://reference/enrich-processor/index.md) to your ingest pipeline. ## Ingest and enrich documents [ingest-enrich-docs] diff --git a/manage-data/lifecycle/rollup/understanding-groups.md b/manage-data/lifecycle/rollup/understanding-groups.md index 51dbf639d..af9aedca4 100644 --- a/manage-data/lifecycle/rollup/understanding-groups.md +++ b/manage-data/lifecycle/rollup/understanding-groups.md @@ -111,7 +111,7 @@ Ultimately, when configuring `groups` for a job, think in terms of how you might ## Calendar vs fixed time intervals [rollup-understanding-group-intervals] -Each rollup-job must have a date histogram group with a defined interval. 
{{es}} understands both [calendar and fixed time intervals](elasticsearch://reference/data-analysis/aggregations/search-aggregations-bucket-datehistogram-aggregation.md#calendar_and_fixed_intervals). Fixed time intervals are fairly easy to understand; `60s` means sixty seconds. But what does `1M` mean? One month of time depends on which month we are talking about, some months are longer or shorter than others. This is an example of calendar time and the duration of that unit depends on context. Calendar units are also affected by leap-seconds, leap-years, etc. +Each rollup-job must have a date histogram group with a defined interval. {{es}} understands both [calendar and fixed time intervals](elasticsearch://reference/aggregations/search-aggregations-bucket-datehistogram-aggregation.md#calendar_and_fixed_intervals). Fixed time intervals are fairly easy to understand; `60s` means sixty seconds. But what does `1M` mean? One month of time depends on which month we are talking about, some months are longer or shorter than others. This is an example of calendar time and the duration of that unit depends on context. Calendar units are also affected by leap-seconds, leap-years, etc. This is important because the buckets generated by rollup are in either calendar or fixed intervals and this limits how you can query them later. See [Requests must be multiples of the config](rollup-search-limitations.md#rollup-search-limitations-intervals). diff --git a/raw-migrated-files/docs-content/serverless/observability-plaintext-application-logs.md b/raw-migrated-files/docs-content/serverless/observability-plaintext-application-logs.md index 157be9faf..89c40c6cd 100644 --- a/raw-migrated-files/docs-content/serverless/observability-plaintext-application-logs.md +++ b/raw-migrated-files/docs-content/serverless/observability-plaintext-application-logs.md @@ -259,7 +259,7 @@ Also, refer to [{{filebeat}} and systemd](beats://reference/filebeat/running-wit Use an ingest pipeline to parse the contents of your logs into structured, [Elastic Common Schema (ECS)](ecs://reference/index.md)-compatible fields. -Create an ingest pipeline with a [dissect processor](elasticsearch://reference/ingestion-tools/enrich-processor/dissect-processor.md) to extract structured ECS fields from your log messages. In your project, go to **Developer Tools** and use a command similar to the following example: +Create an ingest pipeline with a [dissect processor](elasticsearch://reference/enrich-processor/dissect-processor.md) to extract structured ECS fields from your log messages. In your project, go to **Developer Tools** and use a command similar to the following example: ```shell PUT _ingest/pipeline/filebeat* <1> @@ -277,7 +277,7 @@ PUT _ingest/pipeline/filebeat* <1> ``` 1. `_ingest/pipeline/filebeat*`: The name of the pipeline. Update the pipeline name to match the name of your data stream. For more information, refer to [Data stream naming scheme](/reference/fleet/data-streams.md#data-streams-naming-scheme). -2. `processors.dissect`: Adds a [dissect processor](elasticsearch://reference/ingestion-tools/enrich-processor/dissect-processor.md) to extract structured fields from your log message. +2. `processors.dissect`: Adds a [dissect processor](elasticsearch://reference/enrich-processor/dissect-processor.md) to extract structured fields from your log message. 3. `field`: The field you’re extracting data from, `message` in this case. 4. `pattern`: The pattern of the elements in your log data. 
The pattern varies depending on your log format. `%{@timestamp}`, `%{log.level}`, `%{host.ip}`, and `%{{message}}` are common [ECS](ecs://reference/index.md) fields. This pattern would match a log file in this format: `2023-11-07T09:39:01.012Z ERROR 192.168.1.110 Server hardware failure detected.` @@ -344,7 +344,7 @@ To aggregate or search for information in plaintext logs, use an ingest pipeline 2. Select the integration policy you created in the previous section. 3. Click **Change defaults** → **Advanced options**. 4. Under **Ingest pipelines**, click **Add custom pipeline**. -5. Create an ingest pipeline with a [dissect processor](elasticsearch://reference/ingestion-tools/enrich-processor/dissect-processor.md) to extract structured fields from your log messages. +5. Create an ingest pipeline with a [dissect processor](elasticsearch://reference/enrich-processor/dissect-processor.md) to extract structured fields from your log messages. Click **Import processors** and add a similar JSON to the following example: @@ -362,7 +362,7 @@ To aggregate or search for information in plaintext logs, use an ingest pipeline } ``` - 1. `processors.dissect`: Adds a [dissect processor](elasticsearch://reference/ingestion-tools/enrich-processor/dissect-processor.md) to extract structured fields from your log message. + 1. `processors.dissect`: Adds a [dissect processor](elasticsearch://reference/enrich-processor/dissect-processor.md) to extract structured fields from your log message. 2. `field`: The field you’re extracting data from, `message` in this case. 3. `pattern`: The pattern of the elements in your log data. The pattern varies depending on your log format. `%{@timestamp}`, `%{log.level}`, `%{host.ip}`, and `%{{message}}` are common [ECS](ecs://reference/index.md) fields. This pattern would match a log file in this format: `2023-11-07T09:39:01.012Z ERROR 192.168.1.110 Server hardware failure detected.` diff --git a/reference/data-analysis/index.md b/reference/data-analysis/index.md index 5b9dff399..2326d928a 100644 --- a/reference/data-analysis/index.md +++ b/reference/data-analysis/index.md @@ -4,7 +4,7 @@ This section contains reference information for data analysis features, including: -* [Text analysis components](elasticsearch://reference/data-analysis/text-analysis/index.md) -* [Aggregations](elasticsearch://reference/data-analysis/aggregations/index.md) +* [Text analysis components](elasticsearch://reference/text-analysis/index.md) +* [Aggregations](elasticsearch://reference/aggregations/index.md) * [Machine learning functions](/reference/data-analysis/machine-learning/machine-learning-functions.md) * [Canvas functions](/reference/data-analysis/kibana/canvas-functions.md) diff --git a/reference/fleet/data-streams-pipeline-tutorial.md b/reference/fleet/data-streams-pipeline-tutorial.md index 046f6db45..34196ceaf 100644 --- a/reference/fleet/data-streams-pipeline-tutorial.md +++ b/reference/fleet/data-streams-pipeline-tutorial.md @@ -24,7 +24,7 @@ Create a custom ingest pipeline that will be called by the default integration p * Field: `test` * Value: `true` - The [Set processor](elasticsearch://reference/ingestion-tools/enrich-processor/set-processor.md) sets a document field and associates it with the specified value. + The [Set processor](elasticsearch://reference/enrich-processor/set-processor.md) sets a document field and associates it with the specified value. 4. Click **Add**. 5. Click **Create pipeline**. 
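
For readers who want to create the tutorial's custom pipeline outside the Fleet UI, the same result can be achieved with the create-pipeline API. The sketch below is illustrative only: the pipeline name `logs-generic-default@custom` is a hypothetical placeholder, and you would substitute the `@custom` pipeline name that matches your own integration's data stream. The Set processor simply adds the `test: true` field described in the tutorial steps above.

```console
PUT _ingest/pipeline/logs-generic-default@custom
{
  "description": "Custom pipeline called by the default integration pipeline",
  "processors": [
    {
      "set": {
        "field": "test",
        "value": true
      }
    }
  ]
}
```

The UI steps shown in the hunk above produce an equivalent pipeline; the API form is included only as a reference sketch.
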
diff --git a/solutions/observability/apps/built-in-data-filters.md b/solutions/observability/apps/built-in-data-filters.md index 8a9e71ab6..863494b65 100644 --- a/solutions/observability/apps/built-in-data-filters.md +++ b/solutions/observability/apps/built-in-data-filters.md @@ -59,7 +59,7 @@ This setting supports [Central configuration](apm-agent-central-configuration.md By default, the APM Server captures some personal data associated with trace events: -* `client.ip`: The client’s IP address. Typically derived from the HTTP headers of incoming requests. `client.ip` is also used in conjunction with the [`geoip` processor](elasticsearch://reference/ingestion-tools/enrich-processor/geoip-processor.md) to assign geographical information to trace events. To learn more about how `client.ip` is derived, see [Deriving an incoming request’s `client.ip` address](anonymous-authentication.md#apm-derive-client-ip). +* `client.ip`: The client’s IP address. Typically derived from the HTTP headers of incoming requests. `client.ip` is also used in conjunction with the [`geoip` processor](elasticsearch://reference/enrich-processor/geoip-processor.md) to assign geographical information to trace events. To learn more about how `client.ip` is derived, see [Deriving an incoming request’s `client.ip` address](anonymous-authentication.md#apm-derive-client-ip). * `user_agent`: User agent data, including the client operating system, device name, vendor, and version. The capturing of this data can be turned off by setting **Capture personal data** to `false`. diff --git a/solutions/observability/apps/custom-filters.md b/solutions/observability/apps/custom-filters.md index 054086262..56d0165ca 100644 --- a/solutions/observability/apps/custom-filters.md +++ b/solutions/observability/apps/custom-filters.md @@ -51,7 +51,7 @@ Say you decide to [capture HTTP request bodies](built-in-data-filters.md#apm-fil } ``` -To obfuscate the passwords stored in the request body, you can use a series of [ingest processors](elasticsearch://reference/ingestion-tools/enrich-processor/index.md). +To obfuscate the passwords stored in the request body, you can use a series of [ingest processors](elasticsearch://reference/enrich-processor/index.md). ### Create a pipeline [_create_a_pipeline] @@ -78,7 +78,7 @@ To start, create a pipeline with a simple description and an empty array of proc #### Add a JSON processor [_add_a_json_processor] -Add your first processor to the processors array. Because the agent captures the request body as a string, use the [JSON processor](elasticsearch://reference/ingestion-tools/enrich-processor/json-processor.md) to convert the original field value into a structured JSON object. Save this JSON object in a new field: +Add your first processor to the processors array. Because the agent captures the request body as a string, use the [JSON processor](elasticsearch://reference/enrich-processor/json-processor.md) to convert the original field value into a structured JSON object. Save this JSON object in a new field: ```json { @@ -93,7 +93,7 @@ Add your first processor to the processors array. 
Because the agent captures the #### Add a set processor [_add_a_set_processor] -If `body.original_json` is not `null`, i.e., it exists, we’ll redact the `password` with the [set processor](elasticsearch://reference/ingestion-tools/enrich-processor/set-processor.md), by setting the value of `body.original_json.password` to `"redacted"`: +If `body.original_json` is not `null`, i.e., it exists, we’ll redact the `password` with the [set processor](elasticsearch://reference/enrich-processor/set-processor.md), by setting the value of `body.original_json.password` to `"redacted"`: ```json { @@ -108,7 +108,7 @@ If `body.original_json` is not `null`, i.e., it exists, we’ll redact the `pass #### Add a convert processor [_add_a_convert_processor] -Use the [convert processor](elasticsearch://reference/ingestion-tools/enrich-processor/convert-processor.md) to convert the JSON value of `body.original_json` to a string and set it as the `body.original` value: +Use the [convert processor](elasticsearch://reference/enrich-processor/convert-processor.md) to convert the JSON value of `body.original_json` to a string and set it as the `body.original` value: ```json { @@ -125,7 +125,7 @@ Use the [convert processor](elasticsearch://reference/ingestion-tools/enrich-pro #### Add a remove processor [_add_a_remove_processor] -Finally, use the [remove processor](elasticsearch://reference/ingestion-tools/enrich-processor/remove-processor.md) to remove the `body.original_json` field: +Finally, use the [remove processor](elasticsearch://reference/enrich-processor/remove-processor.md) to remove the `body.original_json` field: ```json { diff --git a/solutions/observability/apps/data-streams.md b/solutions/observability/apps/data-streams.md index 19c28490d..5189f42c3 100644 --- a/solutions/observability/apps/data-streams.md +++ b/solutions/observability/apps/data-streams.md @@ -75,7 +75,7 @@ Logs ## APM data stream rerouting [apm-data-stream-rerouting] -APM supports rerouting APM data to user-defined APM data stream names other than the defaults. This can be achieved by using a [`reroute` processor](elasticsearch://reference/ingestion-tools/enrich-processor/reroute-processor.md) in ingest pipelines to set the data stream dataset or namespace. The benefit of separating APM data streams is that custom retention and security policies can be used. +APM supports rerouting APM data to user-defined APM data stream names other than the defaults. This can be achieved by using a [`reroute` processor](elasticsearch://reference/enrich-processor/reroute-processor.md) in ingest pipelines to set the data stream dataset or namespace. The benefit of separating APM data streams is that custom retention and security policies can be used. For example, consider traces that would originally be indexed to `traces-apm-default`. To set the data stream namespace from the trace’s `service.environment` and fallback to a static string `"default"`, create an ingest pipeline named `traces-apm@custom` which will be used automatically: diff --git a/solutions/observability/apps/tutorial-monitor-java-application.md b/solutions/observability/apps/tutorial-monitor-java-application.md index 37d7cc859..cf91028b8 100644 --- a/solutions/observability/apps/tutorial-monitor-java-application.md +++ b/solutions/observability/apps/tutorial-monitor-java-application.md @@ -1342,9 +1342,9 @@ Visualize the number of log messages over time, split by the log level. Since th 1. Log into {{kib}} and select **Visualize** → **Create Visualization**. 2. 
Create a line chart and select `metricbeat-*` as the source. - The basic idea is to have a [max aggregation](elasticsearch://reference/data-analysis/aggregations/search-aggregations-metrics-max-aggregation.md) on the y-axis on the `prometheus.log4j2_events_total.rate` field, whereas the x-axis, is split by date using a [date_histogram aggregation](elasticsearch://reference/data-analysis/aggregations/search-aggregations-bucket-datehistogram-aggregation.md) on the `@timestamp` field. + The basic idea is to have a [max aggregation](elasticsearch://reference/aggregations/search-aggregations-metrics-max-aggregation.md) on the y-axis on the `prometheus.log4j2_events_total.rate` field, whereas the x-axis, is split by date using a [date_histogram aggregation](elasticsearch://reference/aggregations/search-aggregations-bucket-datehistogram-aggregation.md) on the `@timestamp` field. - There is one more split within each date histogram bucket, split by log level, using a [terms aggregation](elasticsearch://reference/data-analysis/aggregations/search-aggregations-bucket-terms-aggregation.md) on the `prometheus.labels.level`, which contains the log level. Also, increase the size of the log level to six to display every log level. + There is one more split within each date histogram bucket, split by log level, using a [terms aggregation](elasticsearch://reference/aggregations/search-aggregations-bucket-terms-aggregation.md) on the `prometheus.labels.level`, which contains the log level. Also, increase the size of the log level to six to display every log level. The final result looks like this. diff --git a/solutions/observability/incident-management/create-an-elasticsearch-query-rule.md b/solutions/observability/incident-management/create-an-elasticsearch-query-rule.md index 956106ec4..aec742e18 100644 --- a/solutions/observability/incident-management/create-an-elasticsearch-query-rule.md +++ b/solutions/observability/incident-management/create-an-elasticsearch-query-rule.md @@ -64,7 +64,7 @@ When you create an {{es}} query rule, your choice of query type affects the info : Specify how to calculate the value that is compared to the threshold. The value is calculated by aggregating a numeric field within the time window. The aggregation options are: `count`, `average`, `sum`, `min`, and `max`. When using `count` the document count is used and an aggregation field is not necessary. Over or Grouped Over - : Specify whether the aggregation is applied over all documents or split into groups using up to four grouping fields. If you choose to use grouping, it’s a [terms](elasticsearch://reference/data-analysis/aggregations/search-aggregations-bucket-terms-aggregation.md) or [multi terms aggregation](elasticsearch://reference/data-analysis/aggregations/search-aggregations-bucket-multi-terms-aggregation.md); an alert will be created for each unique set of values when it meets the condition. To limit the number of alerts on high cardinality fields, you must specify the number of groups to check against the threshold. Only the top groups are checked. + : Specify whether the aggregation is applied over all documents or split into groups using up to four grouping fields. If you choose to use grouping, it’s a [terms](elasticsearch://reference/aggregations/search-aggregations-bucket-terms-aggregation.md) or [multi terms aggregation](elasticsearch://reference/aggregations/search-aggregations-bucket-multi-terms-aggregation.md); an alert will be created for each unique set of values when it meets the condition. 
To limit the number of alerts on high cardinality fields, you must specify the number of groups to check against the threshold. Only the top groups are checked. Threshold : Defines a threshold value and a comparison operator (`is above`, `is above or equals`, `is below`, `is below or equals`, or `is between`). The value calculated by the aggregation is compared to this threshold. diff --git a/solutions/observability/logs/filter-aggregate-logs.md b/solutions/observability/logs/filter-aggregate-logs.md index 273dcb7ee..9b5e8448a 100644 --- a/solutions/observability/logs/filter-aggregate-logs.md +++ b/solutions/observability/logs/filter-aggregate-logs.md @@ -214,7 +214,7 @@ The filtered results should show `WARN` and `ERROR` logs that occurred within th ## Aggregate logs [logs-aggregate] -Use aggregation to analyze and summarize your log data to find patterns and gain insight. [Bucket aggregations](elasticsearch://reference/data-analysis/aggregations/bucket.md) organize log data into meaningful groups making it easier to identify patterns, trends, and anomalies within your logs. +Use aggregation to analyze and summarize your log data to find patterns and gain insight. [Bucket aggregations](elasticsearch://reference/aggregations/bucket.md) organize log data into meaningful groups making it easier to identify patterns, trends, and anomalies within your logs. For example, you might want to understand error distribution by analyzing the count of logs per log level. diff --git a/solutions/observability/logs/parse-route-logs.md b/solutions/observability/logs/parse-route-logs.md index 01901a634..ecef1e597 100644 --- a/solutions/observability/logs/parse-route-logs.md +++ b/solutions/observability/logs/parse-route-logs.md @@ -132,9 +132,9 @@ When looking into issues, you want to filter for logs by when the issue occurred #### Use an ingest pipeline to extract the `@timestamp` field [observability-parse-log-data-use-an-ingest-pipeline-to-extract-the-timestamp-field] -Ingest pipelines consist of a series of processors that perform common transformations on incoming documents before they are indexed. To extract the `@timestamp` field from the example log, use an ingest pipeline with a [dissect processor](elasticsearch://reference/ingestion-tools/enrich-processor/dissect-processor.md). The dissect processor extracts structured fields from unstructured log messages based on a pattern you set. +Ingest pipelines consist of a series of processors that perform common transformations on incoming documents before they are indexed. To extract the `@timestamp` field from the example log, use an ingest pipeline with a [dissect processor](elasticsearch://reference/enrich-processor/dissect-processor.md). The dissect processor extracts structured fields from unstructured log messages based on a pattern you set. -Elastic can parse string timestamps that are in `yyyy-MM-dd'T'HH:mm:ss.SSSZ` and `yyyy-MM-dd` formats into date fields. Since the log example’s timestamp is in one of these formats, you don’t need additional processors. More complex or nonstandard timestamps require a [date processor](elasticsearch://reference/ingestion-tools/enrich-processor/date-processor.md) to parse the timestamp into a date field. +Elastic can parse string timestamps that are in `yyyy-MM-dd'T'HH:mm:ss.SSSZ` and `yyyy-MM-dd` formats into date fields. Since the log example’s timestamp is in one of these formats, you don’t need additional processors. 
More complex or nonstandard timestamps require a [date processor](elasticsearch://reference/enrich-processor/date-processor.md) to parse the timestamp into a date field. Use the following command to extract the timestamp from the `message` field into the `@timestamp` field: @@ -306,7 +306,7 @@ You can now use the `@timestamp` field to sort your logs by the date and time th Check the following common issues and solutions with timestamps: * **Timestamp failure:** If your data has inconsistent date formats, set `ignore_failure` to `true` for your date processor. This processes logs with correctly formatted dates and ignores those with issues. -* **Incorrect timezone:** Set your timezone using the `timezone` option on the [date processor](elasticsearch://reference/ingestion-tools/enrich-processor/date-processor.md). +* **Incorrect timezone:** Set your timezone using the `timezone` option on the [date processor](elasticsearch://reference/enrich-processor/date-processor.md). * **Incorrect timestamp format:** Your timestamp can be a Java time pattern or one of the following formats: ISO8601, UNIX, UNIX_MS, or TAI64N. For more information on timestamp formats, refer to the [mapping date format](elasticsearch://reference/elasticsearch/mapping-reference/mapping-date-format.md). @@ -746,7 +746,7 @@ You’ll get the following results only showing logs in the range you’ve set: ## Reroute log data to specific data streams [observability-parse-log-data-reroute-log-data-to-specific-data-streams] -By default, an ingest pipeline sends your log data to a single data stream. To simplify log data management, use a [reroute processor](elasticsearch://reference/ingestion-tools/enrich-processor/reroute-processor.md) to route data from the generic data stream to a target data stream. For example, you might want to send high-severity logs to a specific data stream to help with categorization. +By default, an ingest pipeline sends your log data to a single data stream. To simplify log data management, use a [reroute processor](elasticsearch://reference/enrich-processor/reroute-processor.md) to route data from the generic data stream to a target data stream. For example, you might want to send high-severity logs to a specific data stream to help with categorization. This section shows you how to use a reroute processor to send the high-severity logs (`WARN` or `ERROR`) from the following example logs to a specific data stream and keep the regular logs (`DEBUG` and `INFO`) in the default data stream: diff --git a/solutions/observability/logs/plaintext-application-logs.md b/solutions/observability/logs/plaintext-application-logs.md index 536482c1e..5c6bb0a38 100644 --- a/solutions/observability/logs/plaintext-application-logs.md +++ b/solutions/observability/logs/plaintext-application-logs.md @@ -234,7 +234,7 @@ By default, Windows log files are stored in `C:\ProgramData\filebeat\Logs`. Use an ingest pipeline to parse the contents of your logs into structured, [Elastic Common Schema (ECS)](ecs://reference/index.md)-compatible fields. -Create an ingest pipeline that defines a [dissect processor](elasticsearch://reference/ingestion-tools/enrich-processor/dissect-processor.md) to extract structured ECS fields from your log messages. In your project, navigate to **Developer Tools** and using a command similar to the following example: +Create an ingest pipeline that defines a [dissect processor](elasticsearch://reference/enrich-processor/dissect-processor.md) to extract structured ECS fields from your log messages. 
In your project, navigate to **Developer Tools** and using a command similar to the following example: ```console PUT _ingest/pipeline/filebeat* <1> @@ -252,7 +252,7 @@ PUT _ingest/pipeline/filebeat* <1> ``` 1. `_ingest/pipeline/filebeat*`: The name of the pipeline. Update the pipeline name to match the name of your data stream. For more information, refer to [Data stream naming scheme](/reference/fleet/data-streams.md#data-streams-naming-scheme). -2. `processors.dissect`: Adds a [dissect processor](elasticsearch://reference/ingestion-tools/enrich-processor/dissect-processor.md) to extract structured fields from your log message. +2. `processors.dissect`: Adds a [dissect processor](elasticsearch://reference/enrich-processor/dissect-processor.md) to extract structured fields from your log message. 3. `field`: The field you’re extracting data from, `message` in this case. 4. `pattern`: The pattern of the elements in your log data. The pattern varies depending on your log format. `%{@timestamp}` is required. `%{log.level}`, `%{host.ip}`, and `%{{message}}` are common [ECS](ecs://reference/index.md) fields. This pattern would match a log file in this format: `2023-11-07T09:39:01.012Z ERROR 192.168.1.110 Server hardware failure detected.` @@ -300,7 +300,7 @@ To aggregate or search for information in plaintext logs, use an ingest pipeline 2. Select the integration policy you created in the previous section. 3. Click **Change defaults → Advanced options**. 4. Under **Ingest pipelines**, click **Add custom pipeline**. -5. Create an ingest pipeline with a [dissect processor](elasticsearch://reference/ingestion-tools/enrich-processor/dissect-processor.md) to extract structured fields from your log messages. +5. Create an ingest pipeline with a [dissect processor](elasticsearch://reference/enrich-processor/dissect-processor.md) to extract structured fields from your log messages. Click **Import processors** and add a similar JSON to the following example: @@ -318,7 +318,7 @@ To aggregate or search for information in plaintext logs, use an ingest pipeline } ``` - 1. `processors.dissect`: Adds a [dissect processor](elasticsearch://reference/ingestion-tools/enrich-processor/dissect-processor.md) to extract structured fields from your log message. + 1. `processors.dissect`: Adds a [dissect processor](elasticsearch://reference/enrich-processor/dissect-processor.md) to extract structured fields from your log message. 2. `field`: The field you’re extracting data from, `message` in this case. 3. `pattern`: The pattern of the elements in your log data. The pattern varies depending on your log format. `%{@timestamp}`, `%{log.level}`, `%{host.ip}`, and `%{{message}}` are common [ECS](ecs://reference/index.md) fields. This pattern would match a log file in this format: `2023-11-07T09:39:01.012Z ERROR 192.168.1.110 Server hardware failure detected.` diff --git a/solutions/observability/observability-ai-assistant.md b/solutions/observability/observability-ai-assistant.md index d1e8735b2..ae2415795 100644 --- a/solutions/observability/observability-ai-assistant.md +++ b/solutions/observability/observability-ai-assistant.md @@ -44,7 +44,7 @@ Also, the data you provide to the Observability AI assistant is *not* anonymized The AI assistant requires the following: * {{stack}} version 8.9 and later. -* A self-deployed connector service if [search connectors](elasticsearch://reference/ingestion-tools/search-connectors/self-managed-connectors.md) are used to populate external data into the knowledge base. 
+* A self-deployed connector service if [search connectors](elasticsearch://reference/search-connectors/self-managed-connectors.md) are used to populate external data into the knowledge base. * An account with a third-party generative AI provider that preferably supports function calling. If your AI provider does not support function calling, you can configure AI Assistant settings under **Stack Management** to simulate function calling, but this might affect performance. Refer to the [connector documentation](../../deploy-manage/manage-connectors.md) for your provider to learn about supported and default models. @@ -147,16 +147,16 @@ To add external data to the knowledge base in {{kib}}: ### Use search connectors [obs-ai-search-connectors] ::::{tip} -The [search connectors](elasticsearch://reference/ingestion-tools/search-connectors/index.md) described in this section differ from the [Stack management → Connectors](../../deploy-manage/manage-connectors.md) configured during the [AI Assistant setup](#obs-ai-set-up). Search connectors are only needed when importing external data into the Knowledge base of the AI Assistant, while the stack connector to the LLM is required for the AI Assistant to work. +The [search connectors](elasticsearch://reference/search-connectors/index.md) described in this section differ from the [Stack management → Connectors](../../deploy-manage/manage-connectors.md) configured during the [AI Assistant setup](#obs-ai-set-up). Search connectors are only needed when importing external data into the Knowledge base of the AI Assistant, while the stack connector to the LLM is required for the AI Assistant to work. :::: -[Connectors](elasticsearch://reference/ingestion-tools/search-connectors/index.md) allow you to index content from external sources thereby making it available for the AI Assistant. This can greatly improve the relevance of the AI Assistant’s responses. Data can be integrated from sources such as GitHub, Confluence, Google Drive, Jira, AWS S3, Microsoft Teams, Slack, and more. +[Connectors](elasticsearch://reference/search-connectors/index.md) allow you to index content from external sources thereby making it available for the AI Assistant. This can greatly improve the relevance of the AI Assistant’s responses. Data can be integrated from sources such as GitHub, Confluence, Google Drive, Jira, AWS S3, Microsoft Teams, Slack, and more. UI affordances for creating and managing search connectors are available in the Search Solution in {{kib}}. You can also use the {{es}} [Connector APIs](https://www.elastic.co/docs/api/doc/elasticsearch/group/endpoint-connector) to create and manage search connectors. -The infrastructure for deploying connectors must be [self-managed](elasticsearch://reference/ingestion-tools/search-connectors/self-managed-connectors.md). +The infrastructure for deploying connectors must be [self-managed](elasticsearch://reference/search-connectors/self-managed-connectors.md). By default, the AI Assistant queries all search connector indices. To override this behavior and customize which indices are queried, adjust the **Search connector index pattern** setting on the [AI Assistant Settings](#obs-ai-settings) page. This allows precise control over which data sources are included in AI Assistant knowledge base. @@ -171,9 +171,9 @@ To create a connector in the {{kib}} UI and make its content available to the AI 2. Follow the instructions to create a new connector. 
- For example, if you create a [GitHub connector](elasticsearch://reference/ingestion-tools/search-connectors/es-connectors-github.md) you have to set a `name`, attach it to a new or existing `index`, add your `personal access token` and include the `list of repositories` to synchronize. + For example, if you create a [GitHub connector](elasticsearch://reference/search-connectors/es-connectors-github.md) you have to set a `name`, attach it to a new or existing `index`, add your `personal access token` and include the `list of repositories` to synchronize. - Learn more about configuring and [using connectors](elasticsearch://reference/ingestion-tools/search-connectors/connectors-ui-in-kibana.md) in the Elasticsearch documentation. + Learn more about configuring and [using connectors](elasticsearch://reference/search-connectors/connectors-ui-in-kibana.md) in the Elasticsearch documentation. After creating your connector, create the embeddings needed by the AI Assistant. You can do this using either: diff --git a/solutions/search/full-text.md b/solutions/search/full-text.md index 849563bb8..08892297b 100644 --- a/solutions/search/full-text.md +++ b/solutions/search/full-text.md @@ -36,8 +36,8 @@ Learn about the core components of full-text search: * [Text fields](elasticsearch://reference/elasticsearch/mapping-reference/text.md) * [Text analysis](full-text/text-analysis-during-search.md) - * [Tokenizers](elasticsearch://reference/data-analysis/text-analysis/tokenizer-reference.md) - * [Analyzers](elasticsearch://reference/data-analysis/text-analysis/analyzer-reference.md) + * [Tokenizers](elasticsearch://reference/text-analysis/tokenizer-reference.md) + * [Analyzers](elasticsearch://reference/text-analysis/analyzer-reference.md) **{{es}} query languages** diff --git a/solutions/search/full-text/how-full-text-works.md b/solutions/search/full-text/how-full-text-works.md index a80bf60fe..fce499d5a 100644 --- a/solutions/search/full-text/how-full-text-works.md +++ b/solutions/search/full-text/how-full-text-works.md @@ -14,7 +14,7 @@ The following diagram illustrates the components of full-text search. At a high level, full-text search involves the following: -* [**Text analysis**](../../../manage-data/data-store/text-analysis.md): Analysis consists of a pipeline of sequential transformations. Text is transformed into a format optimized for searching using techniques such as stemming, lowercasing, and stop word elimination. {{es}} contains a number of built-in [analyzers](elasticsearch://reference/data-analysis/text-analysis/analyzer-reference.md) and tokenizers, including options to analyze specific language text. You can also create custom analyzers. +* [**Text analysis**](../../../manage-data/data-store/text-analysis.md): Analysis consists of a pipeline of sequential transformations. Text is transformed into a format optimized for searching using techniques such as stemming, lowercasing, and stop word elimination. {{es}} contains a number of built-in [analyzers](elasticsearch://reference/text-analysis/analyzer-reference.md) and tokenizers, including options to analyze specific language text. You can also create custom analyzers. ::::{tip} Refer to [Test an analyzer](../../../manage-data/data-store/text-analysis/test-an-analyzer.md) to learn how to test an analyzer and inspect the tokens and metadata it generates. 
:::: diff --git a/solutions/search/full-text/search-with-synonyms.md b/solutions/search/full-text/search-with-synonyms.md index cd58dc543..9a96233fb 100644 --- a/solutions/search/full-text/search-with-synonyms.md +++ b/solutions/search/full-text/search-with-synonyms.md @@ -136,10 +136,10 @@ An index with invalid synonym rules cannot be reopened, making it inoperable whe :::: -{{es}} uses synonyms as part of the [analysis process](../../../manage-data/data-store/text-analysis.md). You can use two types of [token filter](elasticsearch://reference/data-analysis/text-analysis/token-filter-reference.md) to include synonyms: +{{es}} uses synonyms as part of the [analysis process](../../../manage-data/data-store/text-analysis.md). You can use two types of [token filter](elasticsearch://reference/text-analysis/token-filter-reference.md) to include synonyms: -* [Synonym graph](elasticsearch://reference/data-analysis/text-analysis/analysis-synonym-graph-tokenfilter.md): It is recommended to use it, as it can correctly handle multi-word synonyms ("hurriedly", "in a hurry"). -* [Synonym](elasticsearch://reference/data-analysis/text-analysis/analysis-synonym-tokenfilter.md): Not recommended if you need to use multi-word synonyms. +* [Synonym graph](elasticsearch://reference/text-analysis/analysis-synonym-graph-tokenfilter.md): It is recommended to use it, as it can correctly handle multi-word synonyms ("hurriedly", "in a hurry"). +* [Synonym](elasticsearch://reference/text-analysis/analysis-synonym-tokenfilter.md): Not recommended if you need to use multi-word synonyms. Check each synonym token filter documentation for configuration details and instructions on adding it to an analyzer. diff --git a/solutions/search/full-text/text-analysis-during-search.md b/solutions/search/full-text/text-analysis-during-search.md index b3d88aab8..7bb34715c 100644 --- a/solutions/search/full-text/text-analysis-during-search.md +++ b/solutions/search/full-text/text-analysis-during-search.md @@ -32,9 +32,9 @@ Learn more about text analysis in the **Manage Data** section of the documentati * [Overview](../../../manage-data/data-store/text-analysis.md) * [Concepts](../../../manage-data/data-store/text-analysis/concepts.md) * [*Configure text analysis*](../../../manage-data/data-store/text-analysis/configure-text-analysis.md) -* [*Built-in analyzer reference*](elasticsearch://reference/data-analysis/text-analysis/analyzer-reference.md) -* [*Tokenizer reference*](elasticsearch://reference/data-analysis/text-analysis/tokenizer-reference.md) -* [*Token filter reference*](elasticsearch://reference/data-analysis/text-analysis/token-filter-reference.md) -* [*Character filters reference*](elasticsearch://reference/data-analysis/text-analysis/character-filter-reference.md) -* [*Normalizers*](elasticsearch://reference/data-analysis/text-analysis/normalizers.md) +* [*Built-in analyzer reference*](elasticsearch://reference/text-analysis/analyzer-reference.md) +* [*Tokenizer reference*](elasticsearch://reference/text-analysis/tokenizer-reference.md) +* [*Token filter reference*](elasticsearch://reference/text-analysis/token-filter-reference.md) +* [*Character filters reference*](elasticsearch://reference/text-analysis/character-filter-reference.md) +* [*Normalizers*](elasticsearch://reference/text-analysis/normalizers.md) diff --git a/solutions/search/rag/playground-troubleshooting.md b/solutions/search/rag/playground-troubleshooting.md index 2d4765cb4..ff61c2a8a 100644 --- a/solutions/search/rag/playground-troubleshooting.md +++ 
b/solutions/search/rag/playground-troubleshooting.md @@ -14,7 +14,7 @@ This functionality is in technical preview and may be changed or removed in a fu Dense vectors are not searchable -: Embeddings must be generated using the [inference processor](elasticsearch://reference/ingestion-tools/enrich-processor/inference-processor.md) with an ML node. +: Embeddings must be generated using the [inference processor](elasticsearch://reference/enrich-processor/inference-processor.md) with an ML node. Context length error : You’ll need to adjust the size of the context you’re sending to the model. Refer to [Optimize model context](playground-context.md). diff --git a/solutions/search/rag/playground.md b/solutions/search/rag/playground.md index e180ee2de..3f6bf77a8 100644 --- a/solutions/search/rag/playground.md +++ b/solutions/search/rag/playground.md @@ -146,7 +146,7 @@ If you need to update a connector, or add a new one, click the 🔧 **Manage** b There are many options for ingesting data into {{es}}, including: * The [Elastic crawler](https://www.elastic.co/guide/en/enterprise-search/current/crawler.html) for web content (**NOTE**: Not yet available in *Serverless*) -* [Elastic connectors](elasticsearch://reference/ingestion-tools/search-connectors/index.md) for data synced from third-party sources +* [Elastic connectors](elasticsearch://reference/search-connectors/index.md) for data synced from third-party sources * The {{es}} [Bulk API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-bulk) for JSON documents ::::{dropdown} **Expand** for example diff --git a/solutions/search/search-pipelines.md b/solutions/search/search-pipelines.md index 6582c2519..d6f3bd1d3 100644 --- a/solutions/search/search-pipelines.md +++ b/solutions/search/search-pipelines.md @@ -41,7 +41,7 @@ To this end, when you create indices for search use cases, (including web crawle This pipeline is called `search-default-ingestion`. While it is a "managed" pipeline (meaning it should not be tampered with), you can view its details via the Kibana UI or the Elasticsearch API. You can also [read more about its contents below](#ingest-pipeline-search-details-generic-reference). -You can control whether you run some of these processors. While all features are enabled by default, they are eligible for opt-out. For [Elastic crawler](https://www.elastic.co/guide/en/enterprise-search/current/crawler.html) and [connectors](elasticsearch://reference/ingestion-tools/search-connectors/index.md). , you can opt out (or back in) per index, and your choices are saved. For API indices, you can opt out (or back in) by including specific fields in your documents. [See below for details](#ingest-pipeline-search-pipeline-settings-using-the-api). +You can control whether you run some of these processors. While all features are enabled by default, they are eligible for opt-out. For [Elastic crawler](https://www.elastic.co/guide/en/enterprise-search/current/crawler.html) and [connectors](elasticsearch://reference/search-connectors/index.md). , you can opt out (or back in) per index, and your choices are saved. For API indices, you can opt out (or back in) by including specific fields in your documents. [See below for details](#ingest-pipeline-search-pipeline-settings-using-the-api). At the deployment level, you can change the default settings for all new indices. This will not effect existing indices. @@ -111,12 +111,12 @@ This pipeline is a "managed" pipeline. 
That means that it is not intended to be #### Processors [ingest-pipeline-search-details-generic-reference-processors] -1. `attachment` - this uses the [Attachment](elasticsearch://reference/ingestion-tools/enrich-processor/attachment.md) processor to convert any binary data stored in a document’s `_attachment` field to a nested object of plain text and metadata. -2. `set_body` - this uses the [Set](elasticsearch://reference/ingestion-tools/enrich-processor/set-processor.md) processor to copy any plain text extracted from the previous step and persist it on the document in the `body` field. -3. `remove_replacement_chars` - this uses the [Gsub](elasticsearch://reference/ingestion-tools/enrich-processor/gsub-processor.md) processor to remove characters like "�" from the `body` field. -4. `remove_extra_whitespace` - this uses the [Gsub](elasticsearch://reference/ingestion-tools/enrich-processor/gsub-processor.md) processor to replace consecutive whitespace characters with single spaces in the `body` field. While not perfect for every use case (see below for how to disable), this can ensure that search experiences display more content and highlighting and less empty space for your search results. -5. `trim` - this uses the [Trim](elasticsearch://reference/ingestion-tools/enrich-processor/trim-processor.md) processor to remove any remaining leading or trailing whitespace from the `body` field. -6. `remove_meta_fields` - this final step of the pipeline uses the [Remove](elasticsearch://reference/ingestion-tools/enrich-processor/remove-processor.md) processor to remove special fields that may have been used elsewhere in the pipeline, whether as temporary storage or as control flow parameters. +1. `attachment` - this uses the [Attachment](elasticsearch://reference/enrich-processor/attachment.md) processor to convert any binary data stored in a document’s `_attachment` field to a nested object of plain text and metadata. +2. `set_body` - this uses the [Set](elasticsearch://reference/enrich-processor/set-processor.md) processor to copy any plain text extracted from the previous step and persist it on the document in the `body` field. +3. `remove_replacement_chars` - this uses the [Gsub](elasticsearch://reference/enrich-processor/gsub-processor.md) processor to remove characters like "�" from the `body` field. +4. `remove_extra_whitespace` - this uses the [Gsub](elasticsearch://reference/enrich-processor/gsub-processor.md) processor to replace consecutive whitespace characters with single spaces in the `body` field. While not perfect for every use case (see below for how to disable), this can ensure that search experiences display more content and highlighting and less empty space for your search results. +5. `trim` - this uses the [Trim](elasticsearch://reference/enrich-processor/trim-processor.md) processor to remove any remaining leading or trailing whitespace from the `body` field. +6. `remove_meta_fields` - this final step of the pipeline uses the [Remove](elasticsearch://reference/enrich-processor/remove-processor.md) processor to remove special fields that may have been used elsewhere in the pipeline, whether as temporary storage or as control flow parameters. #### Control flow parameters [ingest-pipeline-search-details-generic-reference-params] @@ -161,8 +161,8 @@ This pipeline is a "managed" pipeline. 
That means that it is not intended to be In addition to the processors inherited from the [`search-default-ingestion` pipeline](#ingest-pipeline-search-details-generic-reference), the index-specific pipeline also defines: -* `index_ml_inference_pipeline` - this uses the [Pipeline](elasticsearch://reference/ingestion-tools/enrich-processor/pipeline-processor.md) processor to run the `@ml-inference` pipeline. This processor will only be run if the source document includes a `_run_ml_inference` field with the value `true`. -* `index_custom_pipeline` - this uses the [Pipeline](elasticsearch://reference/ingestion-tools/enrich-processor/pipeline-processor.md) processor to run the `@custom` pipeline. +* `index_ml_inference_pipeline` - this uses the [Pipeline](elasticsearch://reference/enrich-processor/pipeline-processor.md) processor to run the `@ml-inference` pipeline. This processor will only be run if the source document includes a `_run_ml_inference` field with the value `true`. +* `index_custom_pipeline` - this uses the [Pipeline](elasticsearch://reference/enrich-processor/pipeline-processor.md) processor to run the `@custom` pipeline. ##### Control flow parameters [ingest-pipeline-search-details-specific-reference-params] diff --git a/solutions/search/semantic-search/cohere-es.md b/solutions/search/semantic-search/cohere-es.md index 7fe25a4cc..0c118ff94 100644 --- a/solutions/search/semantic-search/cohere-es.md +++ b/solutions/search/semantic-search/cohere-es.md @@ -127,7 +127,7 @@ client.indices.create( ## Create the {{infer}} pipeline [cohere-es-infer-pipeline] -Now you have an {{infer}} endpoint and an index ready to store embeddings. The next step is to create an [ingest pipeline](../../../manage-data/ingest/transform-enrich/ingest-pipelines.md) with an [{{infer}} processor](elasticsearch://reference/ingestion-tools/enrich-processor/inference-processor.md) that will create the embeddings using the {{infer}} endpoint and stores them in the index. +Now you have an {{infer}} endpoint and an index ready to store embeddings. The next step is to create an [ingest pipeline](../../../manage-data/ingest/transform-enrich/ingest-pipelines.md) with an [{{infer}} processor](elasticsearch://reference/enrich-processor/inference-processor.md) that will create the embeddings using the {{infer}} endpoint and stores them in the index. ```py client.ingest.put_pipeline( diff --git a/solutions/search/semantic-search/semantic-search-elser-ingest-pipelines.md b/solutions/search/semantic-search/semantic-search-elser-ingest-pipelines.md index d0cb25ec5..d95b9a3a1 100644 --- a/solutions/search/semantic-search/semantic-search-elser-ingest-pipelines.md +++ b/solutions/search/semantic-search/semantic-search-elser-ingest-pipelines.md @@ -66,7 +66,7 @@ PUT my-index 4. The field type which is text in this example. -To learn how to optimize space, refer to the [Saving disk space by excluding the ELSER tokens from document source](/manage-data/ingest/transform-enrich/ingest-pipelines.md) with an [{{infer}} processor](elasticsearch://reference/ingestion-tools/enrich-processor/inference-processor.md) to use ELSER to infer against the data that is being ingested in the pipeline. +To learn how to optimize space, refer to the [Saving disk space by excluding the ELSER tokens from document source](/manage-data/ingest/transform-enrich/ingest-pipelines.md) with an [{{infer}} processor](elasticsearch://reference/enrich-processor/inference-processor.md) to use ELSER to infer against the data that is being ingested in the pipeline. 
```console PUT _ingest/pipeline/elser-v2-test diff --git a/solutions/search/semantic-search/semantic-search-inference.md b/solutions/search/semantic-search/semantic-search-inference.md index d882c7a94..5e435893f 100644 --- a/solutions/search/semantic-search/semantic-search-inference.md +++ b/solutions/search/semantic-search/semantic-search-inference.md @@ -598,7 +598,7 @@ PUT alibabacloud-ai-search-embeddings ## Create an ingest pipeline with an inference processor [infer-service-inference-ingest-pipeline] -Create an [ingest pipeline](../../../manage-data/ingest/transform-enrich/ingest-pipelines.md) with an [{{infer}} processor](elasticsearch://reference/ingestion-tools/enrich-processor/inference-processor.md) and use the model you created above to infer against the data that is being ingested in the pipeline. +Create an [ingest pipeline](../../../manage-data/ingest/transform-enrich/ingest-pipelines.md) with an [{{infer}} processor](elasticsearch://reference/enrich-processor/inference-processor.md) and use the model you created above to infer against the data that is being ingested in the pipeline. :::::::{tab-set} diff --git a/solutions/search/serverless-elasticsearch-get-started.md b/solutions/search/serverless-elasticsearch-get-started.md index 67d593a38..9be9b1bec 100644 --- a/solutions/search/serverless-elasticsearch-get-started.md +++ b/solutions/search/serverless-elasticsearch-get-started.md @@ -108,7 +108,7 @@ If you’re already familiar with Elasticsearch, you can jump right into setting 2. Ingest your data. Elasticsearch provides several methods for ingesting data: * [{{es}} API](ingest-for-search.md) - * [Connector clients](elasticsearch://reference/ingestion-tools/search-connectors/index.md) + * [Connector clients](elasticsearch://reference/search-connectors/index.md) * [File Uploader](/manage-data/ingest/upload-data-files.md) * [{{beats}}](beats://reference/index.md) * [{{ls}}](logstash://reference/index.md) diff --git a/solutions/search/site-or-app/clients.md b/solutions/search/site-or-app/clients.md index d146863ad..f3ec9e23c 100644 --- a/solutions/search/site-or-app/clients.md +++ b/solutions/search/site-or-app/clients.md @@ -27,7 +27,7 @@ applies_to: In addition to official clients, the Elastic community has contributed libraries for other programming languages. -- [Community-contributed clients](elasticsearch://reference/community-contributed.md) +- [Community-contributed clients](elasticsearch://reference/community-contributed/index.md) ::::{tip} Learn how to [connect to your {{es}} endpoint](/solutions/search/search-connection-details.md). diff --git a/solutions/search/the-search-api.md b/solutions/search/the-search-api.md index 3f4c77f98..99e0b0189 100644 --- a/solutions/search/the-search-api.md +++ b/solutions/search/the-search-api.md @@ -119,7 +119,7 @@ Instead of indexing your data and then searching it, you can define [runtime fie For example, the following query defines a runtime field called `day_of_week`. The included script calculates the day of the week based on the value of the `@timestamp` field, and uses `emit` to return the calculated value. -The query also includes a [terms aggregation](elasticsearch://reference/data-analysis/aggregations/search-aggregations-bucket-terms-aggregation.md) that operates on `day_of_week`. +The query also includes a [terms aggregation](elasticsearch://reference/aggregations/search-aggregations-bucket-terms-aggregation.md) that operates on `day_of_week`. 
```console GET /my-index-000001/_search diff --git a/solutions/search/vector/dense-versus-sparse-ingest-pipelines.md b/solutions/search/vector/dense-versus-sparse-ingest-pipelines.md index e6ca8a356..dfec1642f 100644 --- a/solutions/search/vector/dense-versus-sparse-ingest-pipelines.md +++ b/solutions/search/vector/dense-versus-sparse-ingest-pipelines.md @@ -113,7 +113,7 @@ PUT my-index ## Generate text embeddings [deployed-generate-embeddings] -Once you have created the mappings for the index, you can generate text embeddings from your input text. This can be done by using an [ingest pipeline](../../../manage-data/ingest/transform-enrich/ingest-pipelines.md) with an [inference processor](elasticsearch://reference/ingestion-tools/enrich-processor/inference-processor.md). The ingest pipeline processes the input data and indexes it into the destination index. At index time, the inference ingest processor uses the trained model to infer against the data ingested through the pipeline. After you created the ingest pipeline with the inference processor, you can ingest your data through it to generate the model output. +Once you have created the mappings for the index, you can generate text embeddings from your input text. This can be done by using an [ingest pipeline](../../../manage-data/ingest/transform-enrich/ingest-pipelines.md) with an [inference processor](elasticsearch://reference/enrich-processor/inference-processor.md). The ingest pipeline processes the input data and indexes it into the destination index. At index time, the inference ingest processor uses the trained model to infer against the data ingested through the pipeline. After you created the ingest pipeline with the inference processor, you can ingest your data through it to generate the model output. :::::::{tab-set} diff --git a/solutions/security/ai/ai-assistant-knowledge-base.md b/solutions/security/ai/ai-assistant-knowledge-base.md index 8fdfbb975..eb4b729b8 100644 --- a/solutions/security/ai/ai-assistant-knowledge-base.md +++ b/solutions/security/ai/ai-assistant-knowledge-base.md @@ -18,7 +18,7 @@ AI Assistant’s Knowledge Base feature enables AI Assistant to recall specific ::::{admonition} Requirements -* To use Knowledge Base, the `Elastic AI Assistant: All` privilege. +* To use Knowledge Base, the `Elastic AI Assistant: All` privilege. * To edit global Knowledge Base entries (information that will affect the AI Assistant experience for other users in the {{kib}} space), the `Allow Changes to Global Entries` privilege. * You must [enable machine learning](/solutions/security/advanced-entity-analytics/machine-learning-job-rule-requirements.md) with a minimum ML node size of 4 GB. @@ -146,7 +146,7 @@ Refer to the following video for an example of adding an index to Knowledge Base You can use an {{es}} connector or web crawler to create an index that contains data you want to add to Knowledge Base. -This section provides an example of adding a threat intelligence feed to Knowledge Base using a web crawler. For more information on adding data to {{es}} using a connector, refer to [Ingest data with Elastic connectors](elasticsearch://reference/ingestion-tools/search-connectors/index.md). For more information on web crawlers, refer to [Elastic web crawler](https://www.elastic.co/guide/en/enterprise-search/current/crawler.html). +This section provides an example of adding a threat intelligence feed to Knowledge Base using a web crawler. 
For more information on adding data to {{es}} using a connector, refer to [Ingest data with Elastic connectors](elasticsearch://reference/search-connectors/index.md). For more information on web crawlers, refer to [Elastic web crawler](https://www.elastic.co/guide/en/enterprise-search/current/crawler.html). #### Use a web crawler to add threat intelligence to Knowledge Base [_use_a_web_crawler_to_add_threat_intelligence_to_knowledge_base] diff --git a/troubleshoot/elasticsearch/troubleshooting-searches.md b/troubleshoot/elasticsearch/troubleshooting-searches.md index 83a4479c3..4fc6d1543 100644 --- a/troubleshoot/elasticsearch/troubleshooting-searches.md +++ b/troubleshoot/elasticsearch/troubleshooting-searches.md @@ -126,7 +126,7 @@ GET /my-index-000001/_count } ``` -If the field is aggregatable, you can use [aggregations](../../explore-analyze/query-filter/aggregations.md) to check the field’s values. For `keyword` fields, you can use a [terms aggregation](elasticsearch://reference/data-analysis/aggregations/search-aggregations-bucket-terms-aggregation.md) to retrieve the field’s most common values: +If the field is aggregatable, you can use [aggregations](../../explore-analyze/query-filter/aggregations.md) to check the field’s values. For `keyword` fields, you can use a [terms aggregation](elasticsearch://reference/aggregations/search-aggregations-bucket-terms-aggregation.md) to retrieve the field’s most common values: ```console GET /my-index-000001/_search?filter_path=aggregations @@ -143,7 +143,7 @@ GET /my-index-000001/_search?filter_path=aggregations } ``` -For numeric fields, you can use the [stats aggregation](elasticsearch://reference/data-analysis/aggregations/search-aggregations-metrics-stats-aggregation.md) to get an idea of the field’s value distribution: +For numeric fields, you can use the [stats aggregation](elasticsearch://reference/aggregations/search-aggregations-metrics-stats-aggregation.md) to get an idea of the field’s value distribution: ```console GET my-index-000001/_search?filter_path=aggregations diff --git a/troubleshoot/observability/troubleshoot-logs.md b/troubleshoot/observability/troubleshoot-logs.md index 0ff9770e8..883c7af66 100644 --- a/troubleshoot/observability/troubleshoot-logs.md +++ b/troubleshoot/observability/troubleshoot-logs.md @@ -243,7 +243,7 @@ Provided Grok patterns do not match field value... #### Solution [logs-mapping-troubleshooting-grok-solution] -Make sure your [grok](elasticsearch://reference/ingestion-tools/enrich-processor/grok-processor.md) or [dissect](elasticsearch://reference/ingestion-tools/enrich-processor/dissect-processor.md) processor pattern matches your log document format. +Make sure your [grok](elasticsearch://reference/enrich-processor/grok-processor.md) or [dissect](elasticsearch://reference/enrich-processor/dissect-processor.md) processor pattern matches your log document format. You can build and debug grok patterns in {{kib}} using the [Grok Debugger](../../explore-analyze/query-filter/tools/grok-debugger.md). Find the **Grok Debugger** by navigating to the **Developer tools** page using the navigation menu or the global search field.
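
If the Grok Debugger is not convenient, a pattern can also be checked against a sample document with the ingest simulate API. The following sketch is illustrative: the grok pattern and the sample message mirror the example log format used earlier in these pages (`2023-11-07T09:39:01.012Z ERROR 192.168.1.110 Server hardware failure detected.`) and would need to be adapted to your own log format and field names.

```console
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "grok": {
          "field": "message",
          "patterns": ["%{TIMESTAMP_ISO8601:@timestamp} %{LOGLEVEL:log.level} %{IP:host.ip} %{GREEDYDATA:message}"]
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": "2023-11-07T09:39:01.012Z ERROR 192.168.1.110 Server hardware failure detected."
      }
    }
  ]
}
```

A matching pattern returns the parsed fields in the simulated document; a non-matching pattern returns an error instead, which makes this a quick way to iterate on the pattern before updating the real pipeline.
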