Commit 5dc96d1

Merge pull request #1 from DataScienceSpecialization/master
Updating 20180908
2 parents 2716dec + aa0b465 commit 5dc96d1

16 files changed, with 293 additions and 10 deletions.

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -1,2 +1,4 @@
 _site
 .DS_Store
+.Rhistory
+.Rproj.user

README.md

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ Since the beginning of the Data Science Specialization we've noticed the unbelie
 
 ## Contributing
 
-If you've created a web page, video, sideshow, or any other kind of media you think should be shared through this directory you should:
+If you've created a web page, video, slideshow, or any other kind of media you think should be shared through this directory you should:
 
 1. Fork this repository.
 2. Add a link to your content on the appropriate course page.

about.md

Lines changed: 5 additions & 1 deletion
@@ -19,6 +19,10 @@ The [Data Science Specialization](https://www.coursera.org/specialization/jhudat
 - [Kevin Markham](http://www.dataschool.io/)
 - Derek Franks
 - David Hood
+- [Leonard Greski](https://github.com/lgreski)
 - Michael Sachs
 - Allan Inocêncio de Souza Costa
-- [stepds](https://github.com/stepds)
+- [stepds](https://github.com/stepds)
+- Bastiaan Quast
+- [Xing Su](http://sux13.github.io/DataScienceSpCourseNotes/)
+- [Edmund julian Ofilada](https://github.com/DocOfi)

capstone.md

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+---
+title: "Capstone"
+permalink: /capstone/
+layout: page
+---
+## Reference Material
+
+- [Speech and Language Processing, 3rd Edition](https://web.stanford.edu/~jurafsky/slp3/) Working version of Jurafsky, et. al. book on natural language processing whose content on n-grams is helpful for the capstone.
+
+## Course Project
+
+- [n-gram Computations and Computer Capacity](http://bit.ly/2couvxh) Explains the amount of memory required to convert the text files for the course project into n-grams, using the <strong>quanteda</strong> package.
+- [Capstone Strategy](http://bit.ly/2rGcgc6) Describes a general strategy to get through the Capstone: use the simplest approaches possible.
+- [Choosing a Text Analysis Package](http://bit.ly/2qagsPa) Reviews pros and cons of various R packages used for natural language processing, in the context of requirements for the Capstone project.

curated.md

Lines changed: 80 additions & 1 deletion
@@ -1,6 +1,85 @@
 ---
 layout: page
-title: Curated Knowledge
+title: Curated Pages
 permalink: /curated/
 ---
 
+### Analytics
+
+- [Huge Trello Board Collection of Data Science Resources](https://trello.com/b/rbpEfMld/data-science)
+- [Diving Into Data Science Flipboard](https://flipboard.com/@thiakx/diving-into-data-science-5823ectuy)
+- [OLAP Operation in R](http://architects.dzone.com/articles/olap-operation-r)
+- [Journal of Statistical Software: Tidy data](http://www.jstatsoft.org/v59/i10/paper)
+- [Verzani: simpleR – Using R for Introductory Statistics](http://cran.r-project.org/doc/contrib/Verzani-SimpleR.pdf)
+- [Data Visualization packages](http://www.datavis.ca/R/)
+- [Visualization hints: plotting numeric data by groups](http://www.r-bloggers.com/visualization-series-insight-from-cleveland-and-tufte-on-plotting-numeric-data-by-groups/)
+- [Matrix rotation for image and contour plots in R](http://blog.snap.uaf.edu/2012/06/08/matrix-rotation-for-image-and-contour-plots-in-r/)
+- [Fig Data: 11 Tips on How to Handle Big Data in R (and 1 Bad Pun)](http://theodi.org/blog/fig-data-11-tips-how-handle-big-data-r-and-1-bad-pun)
+- [Data from 538](https://github.com/fivethirtyeight/data)
+
+### Command Line
+
+- [explainshell.com - match command-line arguments to their help text](http://explainshell.com/)
+- [The Command Line Crash Course - Quick course in using the command line](http://cli.learncodethehardway.org/book/)
+- [Mastering the command line, in one page](https://github.com/jlevy/the-art-of-command-line/blob/master/README.md)
+
+### R
+
+- [Try R](http://tryr.codeschool.com/)
+- [The R Book by Michael J. Crawley](https://archive.org/details/TheRBook/)
+- [Univ. of Calif. Riverside R Programming](http://manuals.bioinformatics.ucr.edu/home/programming-in-r#TOC-R-Basics)
+- [G. Sanchez - Strings in R](http://gastonsanchez.com/Handling_and_Processing_Strings_in_R.pdf)
+- [The Lubridate Package](http://www.jstatsoft.org/v40/i03/paper)
+- [Google Developers R Programming Video Lectures](http://www.r-bloggers.com/google-developers-r-programming-video-lectures/)
+- [awesome R](https://github.com/qinwf/awesome-R) - A curated list of awesome R frameworks, packages and software.
+- [awesome machine learning](https://github.com/josephmisiti/awesome-machine-learning#r) - A curated list of awesome Machine Learning frameworks, libraries and software.
+- [Google's R Style Guide](https://google-styleguide.googlecode.com/svn/trunk/Rguide.xml)
+- [Tufte-style HTML in rmarkdown](http://sachsmc.github.io/tufterhandout/)
+- [Creating an R Package](http://hilaryparker.com/2014/04/29/writing-an-r-package-from-scratch/)
+- [R Packages (Hadley online book)](http://r-pkgs.had.co.nz/) - How to write your own R packages.
+- [Beautiful ggplot2 Cheatsheet](http://zevross.com/blog/2014/08/04/beautiful-plotting-in-r-a-ggplot2-cheatsheet-3/)
+- [Intro to Graphics](http://bcb.dfci.harvard.edu/~aedin/courses/Bioconductor/2.Plotting.pdf)
+- [data.table cheat sheet](https://s3.amazonaws.com/assets.datacamp.com/img/blog/data+table+cheat+sheet.pdf)
+- [Exploratory Data Analysis with data.table](http://varianceexplained.org/RData/lessons/lesson4/)
+- [Fast summary statistics in R with data.table](http://blog.yhathq.com/posts/fast-summary-statistics-with-data-dot-table.html)
+- [R online in r-fiddle.org](http://www.r-fiddle.org/)
+
+### Probability and Statistics
+
+- [Probability and Statistics Cookbook](http://matthias.vallentin.net/probability-and-statistics-cookbook/)
+
+### GitHub
+
+- [Official Git Tutorial](http://git-scm.com/docs/gittutorial)
+- [Git - Simple Guide](http://rogerdudler.github.io/git-guide/)
+- [Git Immersion - A guided tour through the fundamentals of Git](http://gitimmersion.com/)
+- [GitHub - Dealing with Multiple Accounts](http://hmkcode.com/git-tutorial/how-to-deal-with-multiple-github-accounts-on-one-computer/)
+- [Try Git](https://try.github.io/levels/1/challenges/1)
+- [Learn Git Branching: Interactive Game](http://pcottle.github.com/learnGitBranching/)
+- [Atlassian Git Tutorials - Branches](https://www.atlassian.com/git/tutorials/using-branches/)
+
+### Reproducible Research
+- [Markdown live demo](http://markdown-here.com/livedemo.html)
+- [Boosting Slides by Ron Meir](https://github.com/Aratinga/Misc/blob/master/BoostingTutorial.pdf)
+- [Reproducible Research website](http://reproducibleresearch.net/)
+
+### Machine Learning
+- [UC Irvine Machine Learning Data Repository](http://archive.ics.uci.edu/ml/)
+
+### Textbooks
+- [OpenIntro textbook](https://www.openintro.org/stat/textbook.php)
+- [Statlect - The digital textbook on probability and statistics](http://www.statlect.com/)
+- [An Introduction to Statistical Learning with Applications in R](http://www-bcf.usc.edu/~gareth/ISL/) [[PDF, 4th printing]](http://www-bcf.usc.edu/~gareth/ISL/ISLR%20Fourth%20Printing.pdf)
+- [The Elements of Statistical Learning: Data Mining, Inference, and Prediction](http://statweb.stanford.edu/~tibs/ElemStatLearn/) [[PDF, 10th ed]](http://statweb.stanford.edu/~tibs/ElemStatLearn/printings/ESLII_print10.pdf)
+
+### Further Reading
+
+- [Data Elixir - Free weekly newsletter of the best data-related resources and inspirations from around the web.](http://dataelixir.com/?referred=true)
+- [Linkedin - Top 10 Big Data and Analytics References](https://www.linkedin.com/pulse/article/20140810194033-111366377-top-10-big-data-and-analytics-references)
+- [Linkedin - Let's Get Nerdy: Data Analytics for Business Leaders Explained](https://www.linkedin.com/pulse/article/20140918162814-111366377-let-s-get-nerdy-data-analytics-for-business-leaders-explained)
+- [Data Science Central : a great repository of news and resources for data science practitioners.](http://www.datasciencecentral.com)
+- [Data Science Ontology - A visualized overview of Data Science concepts and tools](http://datascienceontology.com/)
+
+### Data Science Groups, Meetups, and Networking
+
+- [LinkedIn Data Science Specialisation Group](https://www.linkedin.com/groups/Coursera-Specialization-Data-Science-7495000?home=&gid=7495000&trk=anet_ug_hm&goback=%2Egmp_7495000)

ddp.md

Lines changed: 18 additions & 0 deletions
@@ -5,3 +5,21 @@ permalink: /ddp/
 ---
 
 - [Slidify to Github walkthrough](http://rpubs.com/thoughtfulbloke/25103)
+- [ggvis and rmarkdown slides with interactive plots](http://qua.st/ggvis-shiny-html5-slides)
+
+## Shiny
+- Choropleth of PBS WARN Distribution of Wireless Emergency Alerts
+- [Code for Shiny App](https://github.com/amsilvr/shiny_choropleth)
+- [App running on shinyapps.ip](https://silverman.shinyapps.io/warn_wea/)
+- [Shiny app to simulate 401K growth with interactive plots](http://www.mephistosoftware.com/shiny/401k_simulator/)
+- [Shiny Video Tutorials Playlist on Youtube](http://www.youtube.com/playlist?list=PL6wLL_RojB5xNOhe2OTSd-DPkMLVY9DfB)
+- [Tutorial on writing Shiny simulation apps](https://github.com/homerhanumat/shinyTutorials)
+- [Dockerize a Shiny App](http://www.rmining.net/2015/04/30/dockerizing-a-shiny-app/)
+- [Git pushing Shiny Apps with Docker/Dokku](http://www.rmining.net/2015/05/11/git-pushing-shiny-apps-with-docker-dokku/)
+- [Share your Shiny Apps with Docker and Kitematic](http://www.rmining.net/2015/08/10/share-your-shiny-apps-with-docker-and-kitematic/)
+- [Shinyapps.io: Configuring Application Timeout](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/dataProd-shinyTimeoutConfig.md)
+- [Plotting Natural Disasters](http://www.rpubs.com/DocOfi/367052)
+
+## Comprehensive Notes
+
+- Complete notes for [Developing Data Products](http://sux13.github.io/DataScienceSpCourseNotes/)

eda.md

Lines changed: 10 additions & 0 deletions
@@ -4,3 +4,13 @@ title: Exploratory Data Analysis
 permalink: /eda/
 ---
 
+- [Creating a Kite Graph](http://rpubs.com/thoughtfulbloke/kitegraph)
+- [Analyzing Top/Green500 Supercomputer Technology Trends](http://github.com/ww44ss/Exascalar-Analysis-)
+- [Emissions Choropleth Maps](https://github.com/BillSeliger/ExData_Plotting2)
+- [Data Analysis using Twitter API and Python](http://blog.impiyush.com/2015/03/data-analysis-using-twitter-api-and.html)
+- [Exploratory Data Analysis using Flexdashboard](http://rpubs.com/DocOfi/350830)
+- [Plotting using Metricsgraphics](http://www.rpubs.com/DocOfi/352947)
+
+## Comprehensive Notes
+
+- Complete notes for [Exploratory Data Analysis](http://sux13.github.io/DataScienceSpCourseNotes/)

getclean.md

Lines changed: 19 additions & 0 deletions
@@ -6,3 +6,22 @@ permalink: /getclean/
 
 - [Subsetting example walkthrough](http://rpubs.com/thoughtfulbloke/subset)
 - [Apples to Oranges Data Organisation Challenge](https://github.com/thoughtfulbloke/faoexample)
+- [dplyr introductory tutorial](https://www.youtube.com/watch?v=jWjqLW-u3hc) and [R Markdown document](http://rpubs.com/justmarkham/dplyr-tutorial): A 39-minute video tutorial that covers the five basic dplyr "verbs" and a dozen other dplyr functions. dplyr is an [update](http://blog.rstudio.org/2014/01/17/introducing-dplyr/) to the plyr package, useful for subsetting, sorting, summarizing, and merging data using a more intuitive syntax than plyr or base R.
+- [dplyr "going deeper" tutorial](https://www.youtube.com/watch?v=2mh1PqfsXVI) and [R Markdown document](http://rpubs.com/justmarkham/dplyr-tutorial-part-2): A 37-minute video tutorial that covers the new functionality in dplyr versions 0.3 and 0.4.
+- [Downloading files general advice](http://rpubs.com/thoughtfulbloke/downloadtips)
+- [Codebook sample](https://gist.github.com/kirstenfrank/218c36a1938055d0f4e4)
+- [Second Codebook sample](https://gist.github.com/kirstenfrank/699abe3e16fd1dc36e5d)
+- [Query string (and other fields-within-fields) unrolling](http://rpubs.com/schnee/32988)
+- [Pre-processing Excel files before loading them into R](https://github.com/alkashef/cleaningexceldata)
+- [Codebook template that can be used in the Getting and Cleaning Data project](https://gist.github.com/JorisSchut/dbc1fc0402f28cad9b41)
+- ["Real world" example - reading American Community Survey 2000 PUMS Data:](https://github.com/lgreski/acsexample) Demonstrates how to extract records of a given type from a data file containing multiple record types, and how to use an Excel-based code book to specify arguments for reading a fixed-width file.
+- [18 Months of CTA advice](https://thoughtfulbloke.wordpress.com/2015/08/31/hello-world)
+- [Common Problems: Quiz 1 - Missing Java Runtime](http://bit.ly/2jjtyXM) Explains how to solve the problem of a missing Java Runtime for the question that requires students to process a Microsoft Excel spreadsheet.
+- [Strategy for Reading Files & APIs / Quiz 2](http://bit.ly/2e4L5oF)
+- [Common Problems: Quiz 2 - sqldf() driver fails to connect](http://bit.ly/2kD2KTY)
+- [Tutorial: Downloading Files](http://bit.ly/2iP2suj) Illustrates various ways of downloading files, including binary and text files.
+- [Creating dataframes from xml data](https://www.dropbox.com/s/7bbzzp4bwsmfl5y/CreatingDataframesfrom%20XmlFiles.odt?dl=0)
+
+## Comprehensive Notes
+
+- Complete notes for [Getting and Cleaning Data](http://sux13.github.io/DataScienceSpCourseNotes/)

index.md

Lines changed: 3 additions & 2 deletions
@@ -4,7 +4,7 @@ layout: page
 
 ## Table of Contents
 
-This is site is meant to serve as a directory for the amazing content the
+This site is meant to serve as a directory for the amazing content the
 community has created around the Data Science Specialization. If you are
 interested in contributing [click here](https://github.com/DataScienceSpecialization/DataScienceSpecialization.github.io#contributing).
 
@@ -17,6 +17,7 @@ interested in contributing [click here](https://github.com/DataScienceSpecializa
 7. [Regression Models](/regmod/)
 8. [Practical Machine Learning](/pml/)
 9. [Developing Data Products](/ddp/)
+10. [Capstone](/capstone/)
 
 - [Other Resources](/other/)
-- [Curated Knowledge](/curated/)
+- [Curated Pages](/curated/)

other.md

Lines changed: 24 additions & 1 deletion
@@ -4,7 +4,30 @@ title: Other Resources
 permalink: /other/
 ---
 
-## Troubleshooting
+## Configuring R and RStudio (Linux)
 
 - [Installing xlsx and XML packages on Debian Wheezy](http://allanino.me/blog/programming/installing-some-r-packages/)
+- [Rscript to customize R environment](http://bit.ly/r-customize-script) - Installs packages used in the specialization.
+- [Installing Some Basic R Packages in Ubuntu; Ibrahim El Merehbi](http://elmerehbi.wordpress.com/2014/09/09/installing-some-basic-r-packages-in-ubuntu)
+- [Using Projects in RStudio](https://support.rstudio.com/hc/en-us/articles/200526207-Using-Projects)
+- [Using Version Control with RStudio](https://support.rstudio.com/hc/en-us/articles/200532077-Version-Control-with-Git-and-SVN)
+- [Using R behind HTTP/HTTPS Proxy](https://support.rstudio.com/hc/en-us/articles/200488488-Configuring-R-to-Use-an-HTTP-or-HTTPS-Proxy)
+
+### Ignoring R & RStudio files
+- [gitignore template for R](https://github.com/github/gitignore/blob/master/R.gitignore) (source:[gitignore](https://github.com/github/gitignore))
+- [Github Help - Using Git / Ignoring files](https://help.github.com/articles/ignoring-files/)
+
+## Troubleshooting
 - [Windows batch file to work around RStudio startup issues](https://github.com/stepds/contrib-DataScienceSpecialization/blob/master/README.md)
+
+## Pre-built virtual machines for R development.
+- [Here's a pre-built lightweight Linux machine with R and RStudio already installed](https://github.com/queirozfcom/r-box). You just need to install [vagrant](https://www.vagrantup.com/downloads.html), download (or clone) the github repository and you'll get a clean ubuntu machine with the tools you'll need for the assignments.
+
+- [Data Science Toolbox](http://datasciencetoolbox.org/) - A virtual environment that allows you to start doing data science in a matter of minutes.
+
+- [Virtual machine with RStudio server and github setup](https://github.com/tboloo/vagrant-rstudio) - A VirtualBox, Vagrant & chef-solo managed virtual machine which provides RStudio server with git & github setup
+
+## Deploying and sharing Shiny Apps with Docker
+- [Dockerize a Shiny App](http://www.rmining.net/2015/04/30/dockerizing-a-shiny-app/)
+- [Git pushing Shiny Apps with Docker/Dokku](http://www.rmining.net/2015/05/11/git-pushing-shiny-apps-with-docker-dokku/)
+- [Share your Shiny Apps with Docker and Kitematic](http://www.rmining.net/2015/08/10/share-your-shiny-apps-with-docker-and-kitematic/)
