VARGRAM submission #243
Editor in Chief checks
Hi there! Thank you for submitting your package to pyOpenSci. Please check our Python packaging guide for more information on the elements below.
Editor comments
Hi, thanks for submitting the package. Regarding the examples and quick start, please include a quick command, e.g. the following, which addresses this first step discussed under Example.

Sample implementation for `setup_test_data()`:

```python
import requests
import zipfile
import os
import tempfile
import shutil
from pathlib import Path
from typing import Optional, Union


def setup_test_data(
    repo: str = "pgcbioinfo/vargram",
    source_path: str = "tests/test_data",
    target_dir: Optional[Union[str, Path]] = None
) -> Path:
    """
    Download the latest release of a GitHub repository and extract a specific
    directory's contents to a target directory.

    This function is useful for setting up test data for a package. It fetches
    the latest release from GitHub, extracts the specified directory's contents,
    and places them directly in the target directory.

    Parameters
    ----------
    repo : str, default "pgcbioinfo/vargram"
        The GitHub repository in "owner/repo" format.
    source_path : str, default "tests/test_data"
        Path within the repository to extract. This directory's contents will be
        placed in the target directory.
    target_dir : str or Path, optional
        Directory where the contents should be extracted. If None, the current
        working directory is used.

    Returns
    -------
    Path
        A Path object pointing to the target directory where files were extracted.

    Raises
    ------
    requests.HTTPError
        If the API request to GitHub fails.
    FileNotFoundError
        If the specified source_path doesn't exist in the repository.
    ValueError
        If the repository format is invalid.

    Examples
    --------
    >>> # Extract to current directory
    >>> data_path = setup_test_data()
    >>>
    >>> # Extract to a specific directory
    >>> data_path = setup_test_data(target_dir="./my_test_data")
    >>>
    >>> # Extract a different path from a different repo
    >>> data_path = setup_test_data(
    ...     repo="username/repo",
    ...     source_path="data/examples",
    ...     target_dir="./examples"
    ... )
    """
    # Validate repo format
    if not repo or "/" not in repo:
        raise ValueError(f"Invalid repository format: {repo}. Expected format: 'owner/repo'")

    # Convert target_dir to Path if specified, otherwise use current directory
    if target_dir is None:
        target_dir = Path.cwd()
    else:
        target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)

    # Step 1: Get the latest release information
    print(f"Getting latest release for {repo}...")
    response = requests.get(f"https://github.com/api/repos/{repo}/releases/latest")
    response.raise_for_status()  # Raise an exception for HTTP errors
    release_data = response.json()
    zipball_url = release_data["zipball_url"]
    release_tag = release_data["tag_name"]
    print(f"Found release: {release_tag}")

    # Step 2: Download the release zip file
    print(f"Downloading release from {zipball_url}...")
    zip_response = requests.get(zipball_url, stream=True)
    zip_response.raise_for_status()

    # Create temporary files
    with tempfile.NamedTemporaryFile(delete=False, suffix='.zip') as temp_zip:
        # Write the downloaded zip to the temporary file
        for chunk in zip_response.iter_content(chunk_size=8192):
            temp_zip.write(chunk)
        temp_zip_path = temp_zip.name

    temp_dir = tempfile.mkdtemp()
    try:
        # Step 3: Extract the archive
        print("Extracting the release...")
        with zipfile.ZipFile(temp_zip_path, 'r') as zip_ref:
            zip_ref.extractall(temp_dir)

        # Step 4: Find the extracted directory
        extracted_dir = next(Path(temp_dir).iterdir())  # Get the first (and only) directory

        # Step 5: Check that the source path directory exists
        source_data_path = extracted_dir / source_path
        if not source_data_path.exists():
            raise FileNotFoundError(f"{source_path} directory not found in the release")

        # Step 6: Copy each item from the source directory directly to the target directory
        print(f"Copying contents of {source_path} to {target_dir}...")
        for item in source_data_path.iterdir():
            dest_path = target_dir / item.name
            # If it's a directory, copy the entire directory tree
            if item.is_dir():
                if dest_path.exists():
                    shutil.rmtree(dest_path)
                shutil.copytree(item, dest_path)
            # If it's a file, just copy the file
            else:
                if dest_path.exists():
                    os.remove(dest_path)
                shutil.copy2(item, dest_path)

        print(f"Successfully extracted {source_path} contents to {target_dir}")
        return target_dir
    finally:
        # Step 7: Clean up temporary files
        os.unlink(temp_zip_path)
        shutil.rmtree(temp_dir)


setup_test_data()
```

An alternative approach would be to create a tar of all the data files and then provide a series of shell commands that download and extract the data into the working directory.

With this in mind, there are two different data locations with different data files. Some data is used, some is not, and some is missing. For the later example with mutation profiles, there is no discussion of where the data comes from.

I'll kick-start the review process on the package; the reviewers can use the above script in the interim to explore the initial example.
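The tar-based alternative mentioned above could be sketched as follows. This is only an illustration: the directory contents are stand-ins, and the download URL is an assumption that presumes a `test_data.tar.gz` asset is attached to a GitHub release (none currently is).

```shell
# Stand-in for the repository's tests/test_data directory (hypothetical contents).
mkdir -p tests/test_data
echo "sample" > tests/test_data/example.txt

# Maintainer side: bundle the test data into a single tarball.
tar -czf test_data.tar.gz -C tests test_data

# User side: fetch and extract into the working directory.
# The URL below is hypothetical; it assumes the tarball is uploaded as a
# release asset, e.g.:
#   curl -LO https://github.com/pgcbioinfo/vargram/releases/latest/download/test_data.tar.gz
tar -xzf test_data.tar.gz
ls test_data
```

Compared to the Python helper, this keeps the user-facing instructions to two commands, at the cost of the maintainer having to re-upload the tarball whenever the test data changes.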
Thanks for the initial comments, @coatless. I have an important deadline to meet this coming week, but I'll work on your suggestions as soon as I can. Looking forward to the review.
Submitting Author: (@cjpalpallatoc)
All current maintainers: (@cjpalpallatoc)
Package Name: VARGRAM
One-Line Description of Package: A Python visualization tool for genomic surveillance
Repository Link: https://github.com/pgcbioinfo/vargram
Version submitted: 0.3.0
EiC: @coatless
Editor: TBD
Reviewer 1: TBD
Reviewer 2: TBD
Archive: TBD
JOSS DOI: TBD
Version accepted: TBD
Date accepted (month/day/year): TBD
Code of Conduct & Commitment to Maintain Package
Description
During a viral outbreak, the diversity of sampled sequences often needs to be quickly determined to understand the evolution of a pathogen. VARGRAM (Visual ARrays for GRaphical Analysis of Mutations) empowers researchers to quickly generate a mutation profile to compare batches of sequences against each other and against a reference set of mutations. A publication-ready profile can be generated in a couple of lines of code by providing sequence files (FASTA, GFF3) or tabular data (CSV, TSV, Pandas DataFrame). When sequence files are provided, VARGRAM leverages Nextclade CLI to perform mutation calling. We have user-friendly installation instructions and tutorials on our documentation website.
Scope
Please indicate which category or categories.
Check out our package scope page to learn more about our
scope. (If you are unsure of which category you fit, we suggest you make a pre-submission inquiry):
Domain Specific
Community Partnerships
If your package is associated with an existing community, please check below:
For all submissions, explain how and why the package falls under the categories you indicated above. In your explanation, please address the following points (briefly, 1-2 sentences for each):
Who is the target audience and what are scientific applications of this package?
We hope that VARGRAM will be useful for researchers, analysts, and students in the field of molecular epidemiology/genomic surveillance. During the pandemic, we used an early mutation profile script to characterize emergent variants and potential recombinants.
Are there other Python packages that accomplish the same thing? If so, how does yours differ?
The closest we're aware of is snipit. The main difference is that VARGRAM provides a visual comparison of mutation profiles between groups or within a population of samples. There are also additional features such as grouping mutations per gene, adding multiple sets of reference mutations, and other customizations. We also plan to expand the package to provide other types of visualization relevant to genomic surveillance.
We're also aware of packages like Marsilea that can in principle be used to make a profile, but these are more general in scope and would require more work from the user than VARGRAM does. Outside Python, we've seen researchers create mutation profiles with custom scripts (in R), and there are also web tools available like Nextclade. VARGRAM differs by making the generation and customization of the figure substantially more convenient.
If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or
@tag
the editor you contacted: VARGRAM #225
Technical checks
For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:
Publication Options
JOSS Checks
paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
Note: JOSS accepts our review as theirs. You will NOT need to go through another full review. JOSS will only review your paper.md file. Be sure to link to this pyOpenSci issue when a JOSS issue is opened for your package. Also be sure to tell the JOSS editor that this is a pyOpenSci-reviewed package once you reach this step.
Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?
This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.
Confirm each of the following by checking the box.
Please fill out our survey to help us learn from your submission and improve our peer review process. We will also ask our reviewers
and editors to fill this out.
P.S. Have feedback/comments about our review process? Leave a comment here
Editor and Review Templates
The editor template can be found here.
The review template can be found here.
Footnotes
Please fill out a pre-submission inquiry before submitting a data visualization package.