XML documentation for Time Series #3444

Merged
merged 5 commits into from
Apr 21, 2019
7 changes: 7 additions & 0 deletions docs/api-reference/io-time-series-change-point.md
@@ -0,0 +1,7 @@
### Input and Output Columns
There is only one input column and its type is <xref:System.Single>.
This estimator adds the following output columns:

| Output Column Name | Column Type | Description|
| -- | -- | -- |
| `Prediction` | 4-element vector of <xref:System.Double> | It sequentially contains alert level (non-zero value means a change point), score, p-value, and martingale value. |
7 changes: 7 additions & 0 deletions docs/api-reference/io-time-series-spike.md
@@ -0,0 +1,7 @@
### Input and Output Columns
There is only one input column and its type is <xref:System.Single>.
This estimator adds the following output columns:

| Output Column Name | Column Type | Description|
| -- | -- | -- |
| `Prediction` | 3-element vector of <xref:System.Double> | It sequentially contains alert level (non-zero value means a spike), score, and p-value. |
4 changes: 4 additions & 0 deletions docs/api-reference/time-series-iid.md
@@ -0,0 +1,4 @@
### Training Algorithm Details
This trainer assumes that data points collected in the time series are independently sampled from the same distribution (independent identically distributed).
Thus, the value at the current timestamp can be viewed as the value at the next timestamp in expectation.
If the observed value at timestamp $t-1$ is $p$, the predicted value at timestamp $t$ would be $p$ as well.
7 changes: 7 additions & 0 deletions docs/api-reference/time-series-props.md
@@ -0,0 +1,7 @@
### Estimator Characteristics
| | |
| -- | -- |
| Machine learning task | Anomaly detection |
| Is normalization required? | No |
| Is caching required? | No |
| Required NuGet in addition to Microsoft.ML | Microsoft.ML.TimeSeries |
26 changes: 26 additions & 0 deletions docs/api-reference/time-series-scorer.md
@@ -0,0 +1,26 @@
### Anomaly Scorer
Once the raw score at a timestamp is computed, it is fed to the anomaly scorer component to calculate the final anomaly score at that timestamp.
Two statistics are involved in this scorer: the p-value and the martingale score.

#### Spike detection based on p-value
The p-value score indicates the p-value of the current computed raw score according to a distribution of raw scores.
Here, the distribution is estimated based on the most recent raw score values, up to a certain depth back in the history.
More specifically, this distribution is estimated using [kernel density estimation](https://en.wikipedia.org/wiki/Kernel_density_estimation)
with the Gaussian [kernels](https://en.wikipedia.org/wiki/Kernel_(statistics)#In_non-parametric_statistics) of adaptive bandwidth.
The p-value score is always in $[0, 1]$, and the lower its value, the more likely the current point is an outlier (also known as a spike).
If the p-value score falls below $1 - \frac{\text{confidence}}{100}$, the associated timestamp may get a non-zero alert value in spike detection, which means a spike point is detected.
Note that $\text{confidence}$ is defined in the signatures of [DetectChangePointBySsa](xref:Microsoft.ML.TimeSeriesCatalog.DetectChangePointBySsa(Microsoft.ML.TransformsCatalog,System.String,System.String,System.Int32,System.Int32,System.Int32,System.Int32,Microsoft.ML.Transforms.TimeSeries.ErrorFunction,Microsoft.ML.Transforms.TimeSeries.MartingaleType,System.Double))
and [DetectIidChangePoint](xref:Microsoft.ML.TimeSeriesCatalog.DetectIidChangePoint(Microsoft.ML.TransformsCatalog,System.String,System.String,System.Int32,System.Int32,Microsoft.ML.Transforms.TimeSeries.MartingaleType,System.Double)).
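
As a rough illustration of the p-value step, the sketch below (in Python, for brevity) estimates an upper-tail p-value for a raw score from a window of past raw scores using a Gaussian kernel density estimate. It is not the ML.NET implementation: the function names `kde_p_value` and `spike_alert`, the fixed `bandwidth`, and the one-sided tail are assumptions made for illustration, whereas the scorer described above uses kernels of adaptive bandwidth. The alert rule treats a low p-value as the outlier signal, in line with the statement that lower values indicate spikes.

```python
import math


def kde_p_value(raw_score, history, bandwidth=1.0):
    """Upper-tail p-value of raw_score under a Gaussian KDE built from history.

    Each past score x contributes the tail mass P(X >= raw_score) for
    X ~ N(x, bandwidth**2); the p-value is the average of those tail masses.
    The fixed bandwidth is an assumption of this sketch.
    """
    if not history:
        raise ValueError("history must contain at least one raw score")
    scale = bandwidth * math.sqrt(2.0)
    tails = [0.5 * math.erfc((raw_score - x) / scale) for x in history]
    return sum(tails) / len(tails)


def spike_alert(p_value, confidence):
    """Flag a spike when the p-value is below 1 - confidence / 100."""
    return p_value < 1.0 - confidence / 100.0


# Hypothetical window of recent raw scores and a clearly unusual new score.
history = [0.0, 0.1, -0.1, 0.05]
p = kde_p_value(10.0, history)
alert = spike_alert(p, confidence=95)
```

With a longer history the estimate becomes smoother, and a smaller `bandwidth` makes the p-value more sensitive to individual past scores.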


#### Change point detection based on martingale score
The martingale score is an extra level of scoring that is built upon the p-value scores.
The idea is based on the [Exchangeability Martingales](https://arxiv.org/pdf/1204.3251.pdf) that detect a change of distribution over a stream of i.i.d. values.
In short, the value of the martingale score starts increasing significantly when a sequence of small p-values is detected in a row; this indicates a change in the distribution of the underlying data generation process.
Thus, the martingale score is used for change point detection.
Given a sequence of the most recently observed p-values, $p_1, \dots, p_n$, the martingale score is computed as: $s(p_1, \dots, p_n) = \prod_{i=1}^n \beta(p_i)$.
There are two choices of $\beta$: $\beta(p) = \epsilon p^{\epsilon - 1}$ for $0 < \epsilon < 1$ or $\beta(p) = \int_{0}^1 \epsilon p^{\epsilon - 1} d\epsilon$.

If the martingale score exceeds $s(q_1, \dots, q_n)$ where $q_i=1 - \frac{\text{confidence}}{100}$, the associated timestamp may get a non-zero alert value for change point detection.
Note that $\text{confidence}$ is defined in the signatures of [DetectChangePointBySsa](xref:Microsoft.ML.TimeSeriesCatalog.DetectChangePointBySsa(Microsoft.ML.TransformsCatalog,System.String,System.String,System.Int32,System.Int32,System.Int32,System.Int32,Microsoft.ML.Transforms.TimeSeries.ErrorFunction,Microsoft.ML.Transforms.TimeSeries.MartingaleType,System.Double)) or
[DetectIidChangePoint](xref:Microsoft.ML.TimeSeriesCatalog.DetectIidChangePoint(Microsoft.ML.TransformsCatalog,System.String,System.String,System.Int32,System.Int32,Microsoft.ML.Transforms.TimeSeries.MartingaleType,System.Double)).
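
The martingale rule can be sketched in a few lines of Python. This is an illustration of the product formula and the threshold comparison only, not the ML.NET implementation: the function names, the default `eps=0.1`, and the midpoint-rule approximation of the mixture integral are assumptions of this sketch, and p-values are assumed to lie strictly between 0 and 1 (so `confidence` must be below 100).

```python
def power_beta(p, eps=0.1):
    """Power betting function: eps * p**(eps - 1), for 0 < eps < 1."""
    return eps * p ** (eps - 1.0)


def mixture_beta(p, steps=1000):
    """Mixture betting function: integral over eps in [0, 1] of
    eps * p**(eps - 1), approximated here with the midpoint rule."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        eps = (k + 0.5) * h
        total += eps * p ** (eps - 1.0)
    return total * h


def martingale_score(p_values, beta=power_beta):
    """Product of beta(p_i) over the window of recent p-values."""
    score = 1.0
    for p in p_values:
        score *= beta(p)
    return score


def change_point_alert(p_values, confidence, beta=power_beta):
    """Alert when the score exceeds s(q, ..., q) with q = 1 - confidence / 100."""
    q = 1.0 - confidence / 100.0
    threshold = martingale_score([q] * len(p_values), beta)
    return martingale_score(p_values, beta) > threshold


# Hypothetical windows: a run of small p-values versus unremarkable ones.
after_change = change_point_alert([0.01] * 5, confidence=95)
no_change = change_point_alert([0.5] * 5, confidence=95)
```

Because both betting functions decrease as $p$ grows, a run of p-values smaller than $q$ pushes the product above the threshold, while a run of moderate p-values keeps it below.
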
5 changes: 5 additions & 0 deletions docs/api-reference/time-series-ssa.md
@@ -0,0 +1,5 @@
### Training Algorithm Details
This class implements the general anomaly detection transform based on [Singular Spectrum Analysis (SSA)](https://en.wikipedia.org/wiki/Singular_spectrum_analysis).
SSA is a powerful framework for decomposing the time-series into trend, seasonality and noise components as well as forecasting the future values of the time-series.
In principle, SSA performs spectral analysis on the input time-series where each component in the spectrum corresponds to a trend, seasonal or noise component in the time-series.
For details of the Singular Spectrum Analysis (SSA), refer to [this document](http://arxiv.org/pdf/1206.6910.pdf).
38 changes: 21 additions & 17 deletions src/Microsoft.ML.TimeSeries/ExtensionsCatalog.cs
@@ -10,14 +10,14 @@ namespace Microsoft.ML
public static class TimeSeriesCatalog
{
/// <summary>
- /// Create a new instance of <see cref="IidChangePointEstimator"/> that detects a change of in an
- /// <a href="https://en.wikipedia.org/wiki/Independent_and_identically_distributed_random_variables"> independent identically distributed (i.i.d.)</a> time series.
- /// Detection is based on adaptive kernel density estimations and martingale scores.
+ /// Create <see cref="IidChangePointEstimator"/>, which predicts change points in an
+ /// <a href="https://en.wikipedia.org/wiki/Independent_and_identically_distributed_random_variables">independent identically distributed (i.i.d.)</a>
+ /// time series based on adaptive kernel density estimations and martingale scores.
/// </summary>
/// <param name="catalog">The transform's catalog.</param>
/// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnName"/>.
- /// Column is a vector of type double and size 4. The vector contains Alert, Raw Score, P-Value and Martingale score as first four values.</param>
- /// <param name="inputColumnName">Name of column to transform. If set to <see langword="null"/>, the value of the <paramref name="outputColumnName"/> will be used as source.</param>
+ /// The column data is a vector of <see cref="System.Double"/>. The vector contains 4 elements: alert (non-zero value means a change point), raw score, p-Value and martingale score.</param>
+ /// <param name="inputColumnName">Name of column to transform. The column data must be <see cref="System.Single"/>. If set to <see langword="null"/>, the value of the <paramref name="outputColumnName"/> will be used as source.</param>
/// <param name="confidence">The confidence for change point detection in the range [0, 100].</param>
/// <param name="changeHistoryLength">The length of the sliding window on p-values for computing the martingale score.</param>
/// <param name="martingale">The martingale used for scoring.</param>
@@ -34,13 +34,15 @@ public static IidChangePointEstimator DetectIidChangePoint(this TransformsCatalo
=> new IidChangePointEstimator(CatalogUtils.GetEnvironment(catalog), outputColumnName, confidence, changeHistoryLength, inputColumnName, martingale, eps);

/// <summary>
- /// Create a new instance of <see cref="IidSpikeEstimator"/> that detects a spike in an
- /// <a href="https://en.wikipedia.org/wiki/Independent_and_identically_distributed_random_variables">independent identically distributed (i.i.d.)</a> time series.
- /// Detection is based on adaptive kernel density estimations and martingale scores.
+ /// Create <see cref="IidSpikeEstimator"/>, which predicts spikes in
+ /// <a href="https://en.wikipedia.org/wiki/Independent_and_identically_distributed_random_variables">independent identically distributed (i.i.d.)</a>
+ /// time series based on adaptive kernel density estimations and martingale scores.
/// </summary>
/// <param name="catalog">The transform's catalog.</param>
- /// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnName"/></param>.
- /// <param name="inputColumnName">Name of column to transform. If set to <see langword="null"/>, the value of the <paramref name="outputColumnName"/> will be used as source.</param>
+ /// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnName"/>.
+ /// The column data is a vector of <see cref="System.Double"/>. The vector contains 3 elements: alert (non-zero value means a spike), raw score, and p-value.</param>
+ /// <param name="inputColumnName">Name of column to transform. The column data must be <see cref="System.Single"/>.
+ /// If set to <see langword="null"/>, the value of the <paramref name="outputColumnName"/> will be used as source.</param>
/// <param name="confidence">The confidence for spike detection in the range [0, 100].</param>
/// <param name="pvalueHistoryLength">The size of the sliding window for computing the p-value.</param>
/// <param name="side">The argument that determines whether to detect positive or negative anomalies, or both.</param>
@@ -56,13 +58,14 @@ public static IidSpikeEstimator DetectIidSpike(this TransformsCatalog catalog, s
=> new IidSpikeEstimator(CatalogUtils.GetEnvironment(catalog), outputColumnName, confidence, pvalueHistoryLength, inputColumnName, side);

/// <summary>
- /// Create a new instance of <see cref="SsaChangePointEstimator"/> for detecting a change in a time series signal
+ /// Create <see cref="SsaChangePointEstimator"/>, which predicts change points in time series
/// using <a href="https://en.wikipedia.org/wiki/Singular_spectrum_analysis">Singular Spectrum Analysis (SSA)</a>.
/// </summary>
/// <param name="catalog">The transform's catalog.</param>
/// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnName"/>.
- /// Column is a vector of type double and size 4. The vector contains Alert, Raw Score, P-Value and Martingale score as first four values.</param>
- /// <param name="inputColumnName">Name of column to transform. If set to <see langword="null"/>, the value of the <paramref name="outputColumnName"/> will be used as source.</param>
+ /// The column data is a vector of <see cref="System.Double"/>. The vector contains 4 elements: alert (non-zero value means a change point), raw score, p-Value and martingale score.</param>
+ /// <param name="inputColumnName">Name of column to transform. The column data must be <see cref="System.Single"/>.
+ /// If set to <see langword="null"/>, the value of the <paramref name="outputColumnName"/> will be used as source.</param>
/// <param name="confidence">The confidence for change point detection in the range [0, 100].</param>
/// <param name="trainingWindowSize">The number of points from the beginning of the sequence used for training.</param>
/// <param name="changeHistoryLength">The size of the sliding window for computing the p-value.</param>
@@ -94,17 +97,18 @@ public static SsaChangePointEstimator DetectChangePointBySsa(this TransformsCata
});

/// <summary>
- /// Create a new instance of <see cref="SsaSpikeEstimator"/> for detecting a spike in a time series signal
+ /// Create <see cref="SsaSpikeEstimator"/>, which predicts spikes in time series
/// using <a href="https://en.wikipedia.org/wiki/Singular_spectrum_analysis">Singular Spectrum Analysis (SSA)</a>.
/// </summary>
/// <param name="catalog">The transform's catalog.</param>
- /// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnName"/>.</param>
- /// <param name="inputColumnName">Name of column to transform. If set to <see langword="null"/>, the value of the <paramref name="outputColumnName"/> will be used as source.
+ /// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnName"/>.
+ /// The column data is a vector of <see cref="System.Double"/>. The vector contains 3 elements: alert (non-zero value means a spike), raw score, and p-value.</param>
+ /// <param name="inputColumnName">Name of column to transform. The column data must be <see cref="System.Single"/>.
+ /// If set to <see langword="null"/>, the value of the <paramref name="outputColumnName"/> will be used as source.</param>
/// <param name="confidence">The confidence for spike detection in the range [0, 100].</param>
/// <param name="pvalueHistoryLength">The size of the sliding window for computing the p-value.</param>
/// <param name="trainingWindowSize">The number of points from the beginning of the sequence used for training.</param>
/// <param name="seasonalityWindowSize">An upper bound on the largest relevant seasonality in the input time-series.</param>
- /// The vector contains Alert, Raw Score, P-Value as first three values.</param>
/// <param name="side">The argument that determines whether to detect positive or negative anomalies, or both.</param>
/// <param name="errorFunction">The function used to compute the error between the expected and the observed value.</param>
/// <example>
2 changes: 1 addition & 1 deletion src/Microsoft.ML.TimeSeries/IidAnomalyDetectionBase.cs
@@ -23,7 +23,7 @@ public class IidAnomalyDetectionBaseWrapper : IStatefulTransformer, ICanSaveMode
bool ITransformer.IsRowToRowMapper => ((ITransformer)InternalTransform).IsRowToRowMapper;

/// <summary>
- /// Creates a clone of the transfomer. Used for taking the snapshot of the state.
+ /// Create a clone of the transformer. Used for taking the snapshot of the state.
/// </summary>
/// <returns></returns>
IStatefulTransformer IStatefulTransformer.Clone() => InternalTransform.Clone();
21 changes: 18 additions & 3 deletions src/Microsoft.ML.TimeSeries/IidChangePointDetector.cs
@@ -191,10 +191,25 @@ private static IRowMapper Create(IHostEnvironment env, ModelLoadContext ctx, Dat
}

/// <summary>
- /// The <see cref="IEstimator{ITransformer}"/> for detecting a signal change on an
- /// <a href="https://en.wikipedia.org/wiki/Independent_and_identically_distributed_random_variables"> independent identically distributed (i.i.d.)</a> time series.
- /// Detection is based on adaptive kernel density estimation and martingales.
+ /// The <see cref="IEstimator{TTransformer}"/> to detect a signal change on an
+ /// <a href="https://en.wikipedia.org/wiki/Independent_and_identically_distributed_random_variables"> independent identically distributed (i.i.d.)</a>
+ /// time series based on adaptive kernel density estimation and martingales.
/// </summary>
+ /// <remarks>
+ /// <format type="text/markdown"><![CDATA[
+ /// To create this estimator, use [DetectIidChangePoint](xref:Microsoft.ML.TimeSeriesCatalog.DetectIidChangePoint(Microsoft.ML.TransformsCatalog,System.String,System.String,System.Int32,System.Int32,Microsoft.ML.Transforms.TimeSeries.MartingaleType,System.Double)).
+ ///
+ /// [!include[io](~/../docs/samples/docs/api-reference/io-time-series-change-point.md)]
+ ///
+ /// [!include[io](~/../docs/samples/docs/api-reference/time-series-props.md)]
+ ///
+ /// [!include[io](~/../docs/samples/docs/api-reference/time-series-iid.md)]
+ ///
+ /// [!include[io](~/../docs/samples/docs/api-reference/time-series-scorer.md)]
+ /// ]]>
+ /// </format>
+ /// </remarks>
+ /// <seealso cref="Microsoft.ML.TimeSeriesCatalog.DetectIidChangePoint(Microsoft.ML.TransformsCatalog,System.String,System.String,System.Int32,System.Int32,Microsoft.ML.Transforms.TimeSeries.MartingaleType,System.Double)" />
public sealed class IidChangePointEstimator : TrivialEstimator<IidChangePointDetector>
{
/// <summary>
21 changes: 18 additions & 3 deletions src/Microsoft.ML.TimeSeries/IidSpikeDetector.cs
@@ -171,10 +171,25 @@ private static IRowMapper Create(IHostEnvironment env, ModelLoadContext ctx, Dat
}

/// <summary>
- /// The <see cref="IEstimator{ITransformer}"/> for detecting a signal spike on an
- /// <a href="https://en.wikipedia.org/wiki/Independent_and_identically_distributed_random_variables"> independent identically distributed (i.i.d.)</a> time series.
- /// Detection is based on adaptive kernel density estimation.
+ /// The <see cref="IEstimator{TTransformer}"/> to detect a signal spike on an
+ /// <a href="https://en.wikipedia.org/wiki/Independent_and_identically_distributed_random_variables"> independent identically distributed (i.i.d.)</a>
+ /// time series based on adaptive kernel density estimation.
/// </summary>
+ /// <remarks>
+ /// <format type="text/markdown"><![CDATA[
+ /// To create this estimator, use [DetectIidSpike](xref:Microsoft.ML.TimeSeriesCatalog.DetectIidSpike(Microsoft.ML.TransformsCatalog,System.String,System.String,System.Int32,System.Int32,Microsoft.ML.Transforms.TimeSeries.AnomalySide)).
+ ///
+ /// [!include[io](~/../docs/samples/docs/api-reference/io-time-series-spike.md)]
+ ///
+ /// [!include[io](~/../docs/samples/docs/api-reference/time-series-props.md)]
+ ///
+ /// [!include[io](~/../docs/samples/docs/api-reference/time-series-iid.md)]
+ ///
+ /// [!include[io](~/../docs/samples/docs/api-reference/time-series-scorer.md)]
+ /// ]]>
+ /// </format>
+ /// </remarks>
+ /// <seealso cref="Microsoft.ML.TimeSeriesCatalog.DetectIidSpike(Microsoft.ML.TransformsCatalog,System.String,System.String,System.Int32,System.Int32,Microsoft.ML.Transforms.TimeSeries.AnomalySide)" />
public sealed class IidSpikeEstimator : TrivialEstimator<IidSpikeDetector>
{
/// <summary>
4 changes: 2 additions & 2 deletions src/Microsoft.ML.TimeSeries/SsaAnomalyDetectionBase.cs
@@ -92,7 +92,7 @@ public class SsaAnomalyDetectionBaseWrapper : IStatefulTransformer, ICanSaveMode
bool ITransformer.IsRowToRowMapper => ((ITransformer)InternalTransform).IsRowToRowMapper;

/// <summary>
- /// Creates a clone of the transfomer. Used for taking the snapshot of the state.
+ /// Creates a clone of the transformer. Used for taking the snapshot of the state.
/// </summary>
/// <returns></returns>
IStatefulTransformer IStatefulTransformer.Clone() => InternalTransform.Clone();
@@ -340,7 +340,7 @@ private protected override void InitializeAnomalyDetector()

private protected override double ComputeRawAnomalyScore(ref Single input, FixedSizeQueue<Single> windowedBuffer, long iteration)
{
- // Get the prediction for the next point opn the series
+ // Get the prediction for the next point in the series
Single expectedValue = 0;
_model.PredictNext(ref expectedValue);
