Anomaly detection notebook #348
Here is a short notebook on anomaly detection using PyCaret and CrateDB. It is based on the blog post by Christian: https://cratedb.com/blog/introduction-to-time-series-modeling-with-cratedb-machine-learning-time-series-data
Comment out pip install to remove long output from cell
…te/cratedb-examples into timeseries-anomaly-detection
- Update README.md file with the description of the new notebook and link to Google Colab
- Add a paragraph at the beginning explaining what anomaly detection is and the key learning points of this notebook
- It would be good to execute step 3 with sqlalchemy: first create a table and then run a COPY FROM statement
- Show rendered image in step 7 with anomalies
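A minimal sketch of the "create a table, then COPY FROM" suggestion, not the notebook's actual code. The table name, the two-column layout, the CSV URL, and the connection URI are all hypothetical placeholders; the SQLAlchemy import is deferred so that the statement builders can be inspected without a database or driver at hand.

```python
# Sketch only: table name, column layout, CSV URL, and DB URI are hypothetical.
TABLE_NAME = "machine_data"
CSV_URL = "https://example.org/machine_temperature.csv"
DBURI = "crate://localhost:4200"


def create_table_sql(table: str) -> str:
    """Build the DDL for a simple two-column time series table."""
    return (
        f'CREATE TABLE IF NOT EXISTS "{table}" '
        '("timestamp" TIMESTAMP WITH TIME ZONE, "value" REAL)'
    )


def copy_from_sql(table: str, url: str) -> str:
    """Build a COPY FROM statement that imports a CSV file from a URL."""
    return f"COPY \"{table}\" FROM '{url}' WITH (format = 'csv')"


def load_data(dburi: str = DBURI) -> None:
    """Create the table, import the CSV, and refresh so the rows can be queried."""
    import sqlalchemy as sa  # deferred import: only needed when talking to the database

    engine = sa.create_engine(dburi)
    with engine.connect() as connection:
        connection.execute(sa.text(create_table_sql(TABLE_NAME)))
        connection.execute(sa.text(copy_from_sql(TABLE_NAME, CSV_URL)))
        # Newly written rows become visible to queries after a refresh.
        connection.execute(sa.text(f'REFRESH TABLE "{TABLE_NAME}"'))
```

Calling `load_data()` against a running CrateDB instance (with the CrateDB SQLAlchemy dialect installed) would run the three statements in order; the real notebook may well use a different schema and data source.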
Thanks for your work on this notebook!
I would suggest that we structure it the same way as EDA and Timeseries Decomposition, so they are almost the same. Nitpicky here, but it would make it easier for users to recognize the same pattern, imho.
The original blog post also visualises the known anomalies first and then compares them with the detected anomalies. Could this be included as well?
Not sure, but usually the huge amount of lines comes from the plots. Other notebooks (like topic/timeseries/exploratory_data_analysis.ipynb) set the renderer to PNG to avoid it, maybe it solves the large size of the notebook/many lines?
import plotly.io
# Plotly plots will be rendered as PNG images; comment out the following line to create an interactive chart
plotly.io.renderers.default = 'png'
Thank you, great work!
Yes, that makes them smaller, at least in terms of "lines of code". Most often, they don't get much larger in terms of size, depending on individual image compression capacities I guess. We will need to check whether the outcome improves after applying @ckurze's suggestion, thanks! Current size: 49183 loc · 1.08 MB
Hey, thanks for everyone's comments, I'll start working on this again now.
This is not done yet, so no need to review! I'm just sharing the latest changes after your comments. Thank you all again for the input.
I added the initial plot for the anomalies and also changed the text and structure of the notebook
Wonderful. Just providing a few more suggestions to use at your own disposal. Thanks again!
with engine.connect() as conn:
    result = conn.execute(sa.text(query))
    columns = result.keys()  # Extract column names
    df = pd.DataFrame(result.fetchall(), columns=columns)
Just spotted a little code smell here.
Is it possible to use the improved variant offered by modern SA, using .mappings() already, to request and process the record as a map directly?
with engine.connect() as connection:
    with connection.execute(sa.text(query)) as result:
        pp(result.mappings().fetchall())
-- https://cratedb.com/docs/python/en/latest/#sqlalchemy
On the other hand, why not use pandas' native .read_sql(), providing the most compact representation to load database table data into a data frame in Python?
with engine.connect() as connection:
    df = pd.read_sql(sql=sa.text(query), con=connection)
typo correction (Co-authored-by: Andreas Motl)
Apply suggestions by Moll (Co-authored-by: Andreas Motl)
Changed variable names to match the chosen model. Changed the connection approach as suggested by @amotl.
…te/cratedb-examples into timeseries-anomaly-detection
Thanks again for your efforts, and for merging! 💯