Google ngram downloader

The Google Books Ngram Viewer dataset is a freely available resource, released under a Creative Commons Attribution 3.0 Unported License, that provides ngram counts over books scanned by Google.

The data is so big that storing it locally is almost impossible. However, sometimes you only need aggregate data over the dataset, for example to build a co-occurrence matrix.
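
As an illustration of such an aggregate, here is a minimal sketch of co-occurrence counting over a stream of records. It assumes records carry the `ngram`, `year`, `match_count` and `volume_count` fields shown in the example below; the `Record` namedtuple and the `cooccurrence_counts` helper are hypothetical stand-ins, not part of this package. Every unordered pair of words inside an ngram is counted as co-occurring, weighted by the ngram's `match_count`.

```python
from collections import Counter, namedtuple
from itertools import combinations
from typing import Iterable

# Hypothetical stand-in mirroring the fields of the Record objects in the example.
Record = namedtuple("Record", ["ngram", "year", "match_count", "volume_count"])


def cooccurrence_counts(records: Iterable[Record]) -> Counter:
    """Count unordered word pairs that appear in the same ngram,
    weighted by how often that ngram was matched."""
    counts: Counter = Counter()
    for record in records:
        words = record.ngram.split()
        for a, b in combinations(words, 2):
            # Store each pair in sorted order so (a, b) and (b, a) share a key.
            pair = (a, b) if a <= b else (b, a)
            counts[pair] += record.match_count
    return counts
```

Because the dataset lists each ngram once per year, this sums pair counts across all years; filter on `record.year` first if you need a single period. Since only the counter is kept in memory, the records themselves can be streamed and discarded.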

This package provides an iterator over the dataset stored at Google. It decompresses the data on the fly and gives you access to the underlying records without saving the files to disk.
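
The following sketch shows one way on-the-fly decompression can work; it is not necessarily how this package implements it. It feeds gzip-compressed chunks (for example, blocks read from an HTTP response) through `zlib.decompressobj` and yields a record per complete line. The tab-separated layout `ngram TAB year TAB match_count TAB volume_count` is an assumption consistent with the `Record` fields in the example, and `Record`, `parse_line` and `iter_records` are hypothetical names.

```python
import zlib
from collections import namedtuple
from typing import Iterable, Iterator

# Hypothetical stand-in with the same fields as the Record shown in the example.
Record = namedtuple("Record", ["ngram", "year", "match_count", "volume_count"])


def parse_line(line: bytes) -> Record:
    # Assumed layout: ngram TAB year TAB match_count TAB volume_count
    ngram, year, match_count, volume_count = line.decode("utf-8").split("\t")
    return Record(ngram, int(year), int(match_count), int(volume_count))


def iter_records(chunks: Iterable[bytes]) -> Iterator[Record]:
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    pending = b""
    for chunk in chunks:
        pending += decompressor.decompress(chunk)
        # Emit every complete line; keep the trailing partial line for later.
        *lines, pending = pending.split(b"\n")
        for line in lines:
            if line:
                yield parse_line(line)
    # Flush whatever the decompressor still holds, then emit the remaining lines.
    pending += decompressor.flush()
    for line in pending.split(b"\n"):
        if line:
            yield parse_line(line)
```

To stream from the network, the chunks could come from `urllib.request.urlopen(url)` via `iter(lambda: response.read(65536), b"")`, so only one block of compressed data is held in memory at a time.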

Example use

>>> from google_ngram_downloader import readline_google_store
>>>
>>> fname, url, records = next(readline_google_store(ngram_len=5))
>>> fname
'googlebooks-eng-all-5gram-20120701-0.gz'
>>> url
'http://storage.googleapis.com/books/ngrams/books/googlebooks-eng-all-5gram-20120701-0.gz'
>>> next(records)
Record(ngram=u'0 " A most useful', year=1860, match_count=1, volume_count=1)
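
The file name and URL in the example follow a visible pattern. The sketch below rebuilds them from the ngram length and a file index; the hypothetical helper `ngram_file_url` and its defaults (`lang="eng"`, `version="20120701"`) are read off that single example only, and the set of valid indices other than `"0"` is not covered here.

```python
from typing import Tuple

# Base URL taken from the example above.
BASE_URL = "http://storage.googleapis.com/books/ngrams/books/"


def ngram_file_url(ngram_len: int, index: str,
                   lang: str = "eng", version: str = "20120701") -> Tuple[str, str]:
    """Return (file name, URL) for one ngram file, following the example's pattern."""
    fname = f"googlebooks-{lang}-all-{ngram_len}gram-{version}-{index}.gz"
    return fname, BASE_URL + fname
```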

Installation

pip install google-ngram-downloader

The command line tool

The package also provides a simple command line tool, called google-ngram-downloader, for downloading the ngram files.
