git2rdata, a companion package for git2r #263

@ThierryO

Summary

  • What does this package do? (explain in 50 words or less):
    Store dataframes as plain text files along with metadata to ensure that attributes like factor levels are maintained. The dataframes are optimized to minimize both file size and diffs, which makes the package useful in combination with version control.

  • Paste the full DESCRIPTION file inside a code block below:

Package: git2rdata
Title: Store and Retrieve Data.frames in a Git Repository
Version: 0.0.1.9000
Date: 2018-11-12
Authors@R: c(
  person(
    "Thierry", "Onkelinx", role = c("aut", "cre"), 
    email = "[email protected]", 
    comment = c(ORCID = "0000-0001-8804-4216")),
  person(
    "Research Institute for Nature and Forest",
    role = c("cph", "fnd"), email = "[email protected]"))
Description: Make versioning of data.frame easy and efficient using git repositories.
Depends: R (>= 3.4.0)
Imports:
  assertthat,
  git2r (>= 0.23.0),
  methods,
  readr
Suggests:
  knitr,
  rmarkdown,
  testthat
License: GPL-3
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 6.1.1
URL: https://github.com/inbo/git2rdata
BugReports: https://github.com/inbo/git2rdata/issues
Collate:
    'write_vc.R'
    'auto_commit.R'
    'clean_data_path.R'
    'git2rdata-package.R'
    'meta.R'
    'read_vc.R'
    'recent_commit.R'
    'reexport.R'
    'rm_data.R'
VignetteBuilder: knitr
  • URL for the package (the development repository, not a stylized html page):

https://github.com/inbo/git2rdata

  • Please indicate which category or categories from our package fit policies this package falls under and why? (e.g., data retrieval, reproducibility. If you are unsure, we suggest you make a pre-submission inquiry.):
    [e.g., "data extraction, because the package parses a scientific data file format"]

Reproducibility, because it helps to store dataframes as plain text files without loss of information, while minimizing file size and diffs.

  • Who is the target audience and what are scientific applications of this package?

Anyone who wants to work with medium-sized dataframes and keep them under version control. This is useful for recurrent analyses on growing datasets. For example, each year new data are added to the dataset and a new report is created. A snapshot of the raw data can be stored as plain text files under version control.

The initial idea was to add this functionality to git2r. After a discussion with the maintainer, we decided to create a separate package. We use functions from readr and improve on them by storing the metadata.
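To make the idea of "plain text plus metadata" concrete, here is a minimal conceptual sketch in base R. It is not the git2rdata implementation: the helpers `store_df()` and `retrieve_df()` and the `.tsv`/`.meta` file layout are hypothetical names chosen for illustration. It assumes that sorting rows gives a stable order (hence small diffs) and that storing factors as integer codes reduces file size.

```r
# Conceptual sketch only; `store_df()` and `retrieve_df()` are hypothetical
# helpers, not functions from git2rdata.

store_df <- function(x, path) {
  # Remember the factor levels of every column (NULL for non-factors).
  # Note: this simple sketch does not record whether a factor is ordered.
  meta <- lapply(x, levels)
  # Sort the rows on all columns so the row order is stable between
  # versions; factors are sorted by their integer codes.
  x <- x[do.call(order, unname(as.list(x))), , drop = FALSE]
  # Store factors as their integer codes to keep the text file small.
  for (col in names(x)[vapply(x, is.factor, logical(1))]) {
    x[[col]] <- as.integer(x[[col]])
  }
  write.table(
    x, paste0(path, ".tsv"), sep = "\t", row.names = FALSE, quote = FALSE
  )
  # Store the metadata next to the data as plain text.
  dput(meta, file = paste0(path, ".meta"))
  invisible(path)
}

retrieve_df <- function(path) {
  meta <- dget(paste0(path, ".meta"))
  x <- read.table(
    paste0(path, ".tsv"), sep = "\t", header = TRUE, quote = "",
    comment.char = "", check.names = FALSE, stringsAsFactors = FALSE
  )
  # Turn the integer codes back into factors with the original levels.
  for (col in names(meta)) {
    lev <- meta[[col]]
    if (!is.null(lev)) {
      x[[col]] <- factor(x[[col]], levels = seq_along(lev), labels = lev)
    }
  }
  x
}

# Usage: the level order ("b", "a", "c") and the unused level "c" survive
# the round trip, which a plain CSV would not guarantee.
df <- data.frame(
  site = factor(c("b", "a", "b"), levels = c("b", "a", "c")),
  count = c(3L, 1L, 2L)
)
path <- file.path(tempdir(), "example")
store_df(df, path)
restored <- retrieve_df(path)
levels(restored$site)
```

The sketch ignores edge cases such as tabs inside character values and ordered factors; it only illustrates why keeping the metadata alongside the plain text data avoids loss of information.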

  • If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.

Requirements

Confirm each of the following by checking the box. This package:

  • does not violate the Terms of Service of any service it interacts with.
  • has a CRAN and OSI accepted license.
  • contains a README with instructions for installing the development version.
  • includes documentation with examples for all functions. Examples are available in the vignette.
  • contains a vignette with examples of its essential functions and uses.
  • has a test suite.
  • has continuous integration, including reporting of test coverage, using services such as Travis CI, Coveralls and/or CodeCov.
  • I agree to abide by rOpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

Publication options

  • Do you intend for this package to go on CRAN?
  • Do you wish to automatically submit to the Journal of Open Source Software? If so:
    • The package has an obvious research application according to JOSS's definition.
    • The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
    • The package is deposited in a long-term repository with the DOI:
    • (Do not submit your package separately to JOSS)
  • Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:
    • The package is novel and will be of interest to the broad readership of the journal.
    • The manuscript describing the package is no longer than 3000 words.
    • You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
    • (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no guarantee that your manuscript will be within MEE scope.)
    • (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
    • (Please do not submit your package separately to Methods in Ecology and Evolution)

Detail

  • Does R CMD check (or devtools::check()) succeed? Paste and describe any errors or warnings:

  • Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:

We use read_vc() rather than vc_read() because it matches the naming of read.table(), read_csv(), ...

  • If this is a resubmission following rejection, please explain the change in circumstances:

  • If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:

@jennybc @stewid
