Skip to content

helldrum/pastebin_dynamic_scraping

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

16 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Introduction: This projet as been initiate in order to scrap pastebin with the google research feature inplemented in the site. The pastebin API don't allow research and flexible wait to scrape all the website data. I choose to use selenium in order to simulate user trafic on google and a large temporisation in order to not get blocked by google capcha that's why it's running low but you have the possibility to change the default temporisation values.

requirement:

you need to use a virtualenv called venv in order to run the script (shebang harcoded on the top of the script) you may need to install pip and virtualenv

virtualenv venv

python selenium

pip install selenium

geckodriver

get https://github.com/mozilla/geckodriver/releases/download/v0.13.0/geckodriver-v0.13.0-linux64.tar.gz          
tar -xvzf geckodriver-v0.13.0-linux64.tar.gz
chmod +x geckodriver
put geckodriver in your PATH variable or cp into /usr/bin

BeautifulSoup

pip install beautifulsoup4

usage:

Usage: selenium_scrap.py [options]

Options:
  -h, --help            show this help message and exit
  -t SEARCH_KEYWORD, --search_term=SEARCH_KEYWORD
                        search keyword mandatory parameter
  -f OUTPUT_FILE, --file=OUTPUT_FILE
                        output file, if this arg is not provide, result will
                        be print in stdout
  -g GOOGLE_TEMPO, --google_tempo=GOOGLE_TEMPO
                        google tempo during scrap, default value 30
  -p PASTEBIN_TEMPO, --pastebin_tempo=PASTEBIN_TEMPO
                        pastebin tempo during scrap, default value 10
  -s STARTER, --page-start=STARTER
                        google scrap starter page, default value 0 (first
                        page)

parameter --search_term is required

About

scrap pastebin dynamically with selenium

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages