@jjn13
Collaborator

jjn13 commented Sep 17, 2018

Created a raw-data directory with code for creating the presidential_speech dataset.

@michaelweylandt
Member

We don't need the pyc files, and I don't think we need the HTML scrape files either. (I don't think this needs to be fully bit-for-bit reproducible, just illustrative.) Thoughts?

@jjn13
Collaborator Author

jjn13 commented Sep 17, 2018

That sounds fine. I'll drop the pyc files and the raw speeches, and resubmit.

@michaelweylandt
Member

We also don't need the rds file (that's not the "raw" data).

wrd.var <- apply(dtm.mat.log,2,var)
top.wrd.var <- names(sort(wrd.var,decreasing = TRUE)[1:75])
dtm.mat.log <- dtm.mat.log[,colnames(dtm.mat.log) %in% top.wrd.var]
saveRDS(dtm.mat.log,"presidential_speech.rds")
No newline at end of file
Member

Missing line ending

library(SnowballC)
library(parallel)
library(Matrix)
library(tidyverse)
Member

Are all of these dependencies used? I only see tm, stringr, and tidyverse used below.

@@ -0,0 +1,11 @@
#!/bin/bash
Member

This file appears to be identical to move_inaug_results.sh. Is that intentional?

class InaugTextSpider(scrapy.Spider):
    name = "inaug_text"
    allowed_domains = ["http://www.presidency.ucsb.edu/"]
    start_urls = (
Member

Add a note explaining what these URLs are and how to keep this list up to date (if possible).

    name = "sou_text"
    allowed_domains = ["http://www.presidency.ucsb.edu"]
    start_urls = (
        'http://www.presidency.ucsb.edu/ws/index.php?pid=123408',
Member

Explanatory note needed.
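
One way the requested note could look, as a minimal sketch and not the PR's actual code: keep the page IDs in a documented list and build `start_urls` from them. The names `BASE_URL`, `SPEECH_PIDS`, `ALLOWED_DOMAINS`, and `build_start_urls` are hypothetical; only the URL pattern `index.php?pid=...` and the pid `123408` come from the diff above. What each pid corresponds to, and where new ones are looked up, is left as a placeholder for the author.

```python
from urllib.parse import urlparse

# Base address of the speech pages, taken from the URL quoted in the diff.
BASE_URL = "http://www.presidency.ucsb.edu/ws/index.php"

# Hypothetical list of page IDs, one per speech to scrape.
# NOTE (placeholder for the author): state here what each pid points to and
# where to find the pid of a newly delivered speech, so that the list can be
# extended later. Only 123408 appears in the diff.
SPEECH_PIDS = [123408]


def build_start_urls(pids):
    """Return one start URL per page ID, in the order given."""
    return tuple(f"{BASE_URL}?pid={pid}" for pid in pids)


# Scrapy's allowed_domains is meant to hold bare domain names rather than
# full URLs, so this sketch derives the host name from BASE_URL.
ALLOWED_DOMAINS = [urlparse(BASE_URL).netloc]

start_urls = build_start_urls(SPEECH_PIDS)
```

Inside a spider, these values would be assigned to the `start_urls` and `allowed_domains` attributes; adding a speech then means appending one pid to the list.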

@michaelweylandt
Member

@jjn13 Any updates on this PR?

@michaelweylandt added the Documentation label (Documentation-related issues) on Oct 11, 2018
