Anomaly detection notebook #348

karynzv · 2024-03-05T12:18:55Z

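Here is a short notebook on anomaly detection using PyCaret and CrateDB. It is based on this blog post by Christian: https://cratedb.com/blog/introduction-to-time-series-modeling-with-cratedb-machine-learning-time-series-data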

Summary of the changes / Why this is an improvement

Checklist

Link to issue this PR refers to (if applicable):
CLA is signed

marijaselakovic

Update the README.md file with a description of the new notebook and a link to Google Colab
Add a paragraph at the beginning explaining what anomaly detection is and the key learning points of this notebook
It would be good to execute step 3 with SQLAlchemy: first create a table, then run a COPY FROM statement
Show the rendered image with anomalies in step 7
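
For the SQLAlchemy suggestion, a minimal sketch could look like the following. The table name, column definitions, and source URL are hypothetical placeholders and are not taken from the notebook. The `crate://` connection URL assumes the CrateDB SQLAlchemy dialect is installed and a server is reachable on localhost. SQLAlchemy is imported inside `load_data` only so that the statement builder can be inspected without a database driver.

```python
def build_statements(table: str, source_url: str) -> list[str]:
    """Return the SQL statements that create the table and bulk-load a CSV file.

    The schema (a timestamp and a numeric value) is an assumption for
    illustration; adjust it to the actual dataset. `table` and `source_url`
    are interpolated as-is, so pass trusted values only.
    """
    return [
        f"CREATE TABLE IF NOT EXISTS {table} ("
        '"timestamp" TIMESTAMP WITH TIME ZONE, '
        '"value" DOUBLE PRECISION)',
        f"COPY {table} FROM '{source_url}' WITH (format = 'csv')",
        # Make the imported rows visible to subsequent queries.
        f"REFRESH TABLE {table}",
    ]


def load_data(engine_url: str, table: str, source_url: str) -> None:
    """Run the statements in order on a single connection."""
    import sqlalchemy as sa

    engine = sa.create_engine(engine_url)
    with engine.connect() as connection:
        for statement in build_statements(table, source_url):
            connection.execute(sa.text(statement))


# Hypothetical usage against a local CrateDB instance:
# load_data("crate://localhost:4200", "machine_data", "https://example.org/machine_data.csv")
```

Keeping the statements in a plain list makes it easy to print them in a notebook cell before executing them.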

topic/timeseries/timeseries-anomaly-detection.ipynb

ckurze

Thanks for your work on this notebook!

I would suggest structuring it the same way as the EDA and Timeseries Decomposition notebooks, so they are almost the same. Nitpicky here, but it would make it easier for users to recognize the same pattern, imho.

The original blog post also visualises the known anomalies first and then compares them with the detected anomalies. Could this be included as well?

Not sure, but usually the huge amount of lines comes from the plots. Other notebooks (like topic/timeseries/exploratory_data_analysis.ipynb) set the renderer to PNG to avoid it, maybe it solves the large size of the notebook/many lines?

import plotly.io

# Plotly plots will be rendered as PNG images; comment out the following line to get interactive charts
plotly.io.renderers.default = 'png'

Thank you, great work!

amotl · 2024-03-06T20:56:00Z

> Not sure, but usually the huge amount of lines comes from the plots. Other notebooks (like topic/timeseries/exploratory_data_analysis.ipynb) set the renderer to PNG to avoid it, maybe it solves the large size of the notebook/many lines?

Yes, that makes them smaller, at least in terms of "lines of code". Most often, they don't get much larger in terms of file size, depending on how well the individual images compress, I guess. We will need to check whether the outcome improves after applying @ckurze's suggestion, thanks!

Current size: 49183 loc · 1.08 MB
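
To reproduce figures like these locally, a small helper along the following lines could be used. This is only a sketch: the function name is made up here, and it is not known how the numbers above were actually measured.

```python
from pathlib import Path


def notebook_size(path: str) -> tuple[int, int]:
    """Return (line count, size in bytes) for the file at `path`."""
    data = Path(path).read_bytes()
    return len(data.splitlines()), len(data)


# Hypothetical usage, run from the repository root:
# loc, size = notebook_size("topic/timeseries/timeseries-anomaly-detection.ipynb")
# print(f"{loc} loc · {size / 1000:.0f} KB")
```

Running it before and after switching the Plotly renderer shows whether the change reduces the line count, the byte size, or both.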

karynzv · 2024-03-12T15:35:41Z

Hey, thanks for everyone's comments, I'll start working on this again now.

This is not done yet, so no need to review! I'm just sharing the latest changes after your comments. Thank you all again for the input.

I added the initial plot for the anomalies and also changed the text and structure of the notebook

amotl · 2024-03-19T20:47:46Z

Previous size: 49183 loc · 1.08 MB

New size: 732 loc · 222 KB

Excellent, thanks! ¹

¹ Can't repeat that too often: please make sure to amend your commit, or squash before merging; otherwise the issue will not be remedied completely.

amotl

Wonderful. Just providing a few more suggestions to use at your own discretion. Thanks again!

topic/timeseries/timeseries-anomaly-detection.ipynb

amotl · 2024-03-19T21:01:50Z

topic/timeseries/timeseries-anomaly-detection.ipynb

+    "with engine.connect() as conn:\n",
+    "    result = conn.execute(sa.text(query))\n",
+    "    columns = result.keys() # Extract column names\n",
+    "    df = pd.DataFrame(result.fetchall(), columns=columns)\n",


Just spotted a little code smell here.

Is it possible to use the improved variant offered by modern SA, using .mappings() already, to request and process the record as a map directly?

with engine.connect() as connection:
    with connection.execute(sa.text(query)) as result:
        pp(result.mappings().fetchall())

-- https://cratedb.com/docs/python/en/latest/#sqlalchemy

On the other hand, why not use pandas' native .read_sql(), providing the most compact representation to load database table data into a data frame in Python?

with engine.connect() as connection:
    df = pd.read_sql(sql=sa.text(query), con=connection)

-- https://cratedb.com/docs/python/en/latest/#sqlalchemy
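
To compare the two variants side by side, here is a self-contained sketch. It uses an in-memory SQLite database as a stand-in for CrateDB so that it runs without a server; the `readings` table and its rows are invented for illustration. With CrateDB, only the engine URL would differ, assuming the CrateDB SQLAlchemy dialect is installed.

```python
import pandas as pd
import sqlalchemy as sa

# Stand-in database: in-memory SQLite instead of CrateDB.
engine = sa.create_engine("sqlite://")

# Create and populate a small, made-up table.
with engine.begin() as connection:
    connection.execute(sa.text("CREATE TABLE readings (ts INTEGER, value REAL)"))
    connection.execute(
        sa.text("INSERT INTO readings (ts, value) VALUES (:ts, :value)"),
        [
            {"ts": 1, "value": 20.5},
            {"ts": 2, "value": 21.0},
            {"ts": 3, "value": 95.0},
        ],
    )

query = "SELECT ts, value FROM readings ORDER BY ts"

# Variant 1: .mappings() yields dict-like rows keyed by column name,
# so column names do not need to be extracted separately.
with engine.connect() as connection:
    records = [dict(row) for row in connection.execute(sa.text(query)).mappings()]

# Variant 2: pandas reads the result set straight into a data frame.
with engine.connect() as connection:
    df = pd.read_sql(sql=sa.text(query), con=connection)

print(records)
print(df)
```

For a notebook that continues with a data frame anyway, the second variant is the shorter path; the first is handy when plain Python records are enough.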

topic/timeseries/timeseries-anomaly-detection.ipynb

I've already addressed this

amotl · 2024-03-25T18:56:37Z

Thanks again for your efforts, and for merging! 💯

karynzv added 2 commits March 5, 2024 12:17

Anomaly detection notebook

f2cb812

Here is a short notebook on anomaly detection using PyCaret and CrateDB. This was based on the blog post by Christian https://cratedb.com/blog/introduction-to-time-series-modeling-with-cratedb-machine-learning-time-series-data

Merge branch 'main' into timeseries-anomaly-detection

50dc94e

karynzv self-assigned this Mar 5, 2024

karynzv requested review from amotl, ckurze and marijaselakovic March 5, 2024 12:19

karynzv added 2 commits March 5, 2024 12:35

Comment pip install

8e6785f

Comment out pip install to remove long output from cell

Merge branch 'timeseries-anomaly-detection' of https://github.com/cra…

719ffd6

…te/cratedb-examples into timeseries-anomaly-detection

marijaselakovic suggested changes Mar 5, 2024

topic/timeseries/timeseries-anomaly-detection.ipynb

amotl mentioned this pull request Mar 5, 2024

TSML Primer: Article series about "Machine Learning for Time Series Data" crate/cratedb-guide#54

Draft

amotl reviewed Mar 5, 2024

topic/timeseries/timeseries-anomaly-detection.ipynb

ckurze previously requested changes Mar 6, 2024

karynzv added 3 commits March 14, 2024 12:37

Latest updates to the notebook

38f7bbe

This is not done yet, so no need to review! I'm just sharing the latest changes after your comments. Thank you all again for the input.

add initial plot and further details

3ecd9ce

I added the initial plot for the anomalies and also changed the text and structure of the notebook

Merge branch 'main' into timeseries-anomaly-detection

86740b5

karynzv requested review from amotl and marijaselakovic March 19, 2024 19:36

amotl approved these changes Mar 19, 2024

karynzv and others added 5 commits March 20, 2024 11:29

Update topic/timeseries/timeseries-anomaly-detection.ipynb

531ebf8

typo correction Co-authored-by: Andreas Motl <[email protected]>

Apply suggestions from code review

e801bf5

Apply suggestions by Motl Co-authored-by: Andreas Motl <[email protected]>

Update connection and variable names

c8e308d

Changed variable names to match the chosen model Changed the connection approach as suggested by @amotl

Merge branch 'timeseries-anomaly-detection' of https://github.com/cra…

a3c65ad

…te/cratedb-examples into timeseries-anomaly-detection

Merge branch 'main' into timeseries-anomaly-detection

b1a9b34

karynzv requested a review from ckurze March 20, 2024 17:15

marijaselakovic approved these changes Mar 21, 2024

amotl reviewed Mar 21, 2024

topic/timeseries/timeseries-anomaly-detection.ipynb

karynzv merged commit d92d514 into main Mar 25, 2024

karynzv deleted the timeseries-anomaly-detection branch March 25, 2024 17:26
