VARGRAM submission #243

Open · 15 of 32 tasks

cjpalpallatoc opened this issue May 9, 2025 · 2 comments

cjpalpallatoc commented May 9, 2025

Submitting Author: (@cjpalpallatoc)
All current maintainers: (@cjpalpallatoc)
Package Name: VARGRAM
One-Line Description of Package: A Python visualization tool for genomic surveillance
Repository Link: https://github.com/pgcbioinfo/vargram
Version submitted: 0.3.0
EiC: @coatless
Editor: TBD
Reviewer 1: TBD
Reviewer 2: TBD
Archive: TBD
JOSS DOI: TBD
Version accepted: TBD
Date accepted (month/day/year): TBD


Code of Conduct & Commitment to Maintain Package

Description

  • Include a brief paragraph describing what your package does:

During a viral outbreak, the diversity of sampled sequences often needs to be determined quickly to understand the evolution of a pathogen. VARGRAM (Visual ARrays for GRaphical Analysis of Mutations) empowers researchers to quickly generate a mutation profile to compare batches of sequences against each other and against a reference set of mutations. A publication-ready profile can be generated in a couple of lines of code by providing sequence files (FASTA, GFF3) or tabular data (CSV, TSV, Pandas DataFrame). When sequence files are provided, VARGRAM leverages Nextclade CLI to perform mutation calling. We have user-friendly installation instructions and tutorials on our documentation website.
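
To illustrate, a minimal sketch of the intended workflow (the file paths below are placeholders; see the quick start on our documentation site for a runnable version):

from vargram import vargram

# Point VARGRAM at a batch of sequences plus a reference genome and gene
# annotation; with sequence files, mutation calling is delegated to Nextclade CLI.
vg = vargram(seq='covid_samples/',        # directory of sample FASTA files (placeholder)
             ref='sc2_wuhan_2019.fasta',  # reference genome (placeholder)
             gene='sc2.gff')              # GFF3 gene annotation (placeholder)
vg.profile()  # build the mutation profile
vg.show()     # render the publication-ready figure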

Scope

  • Please indicate which category or categories.
    Check out our package scope page to learn more about our
    scope. (If you are unsure of which category you fit, we suggest you make a pre-submission inquiry):

    • Data retrieval
    • Data extraction
    • Data processing/munging
    • Data deposition
    • Data validation and testing
    • Data visualization¹
    • Workflow automation
    • Citation management and bibliometrics
    • Scientific software wrappers
    • Database interoperability

Domain Specific

  • Geospatial
  • Education

Community Partnerships

If your package is associated with an existing community, please check below:

  • For all submissions, explain how and why the package falls under the categories you indicated above. In your explanation, please address the following points (briefly, 1-2 sentences for each):

    • Who is the target audience and what are scientific applications of this package?
      We hope that VARGRAM will be useful for researchers, analysts, and students in the field of molecular epidemiology/genomic surveillance. During the pandemic, we used an early mutation profile script to characterize emergent variants and potential recombinants.

    • Are there other Python packages that accomplish the same thing? If so, how does yours differ?
      The closest we're aware of is snipit. The main difference is that VARGRAM provides a visual comparison of mutation profiles between groups or within a population of samples. There are also additional features such as grouping mutations per gene, adding multiple sets of reference mutations, and other customizations. We also plan to expand the package to provide other types of visualization relevant to genomic surveillance.
      We're also aware of packages like Marsilea that can in principle be used to make a profile, but these are more general in scope and would require more work from the user than VARGRAM does. Outside Python, we've seen researchers create mutation profiles with custom scripts (in R), and there are also web tools available like Nextclade. VARGRAM differs by making figure generation and customization substantially more convenient.

    • If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted:
      VARGRAM #225

Technical checks

For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:

  • does not violate the Terms of Service of any service it interacts with.
  • uses an OSI approved license.
  • contains a README with instructions for installing the development version.
  • includes documentation with examples for all functions.
  • contains a tutorial with examples of its essential functions and uses.
  • has a test suite.
  • has continuous integration setup, such as GitHub Actions, CircleCI, and/or others.

Publication Options

JOSS Checks
  • The package has an obvious research application according to JOSS's definition in their submission requirements. Be aware that completing the pyOpenSci review process does not guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS.
  • The package is not a "minor utility" as defined by JOSS's submission requirements: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria.
  • The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
  • The package is deposited in a long-term repository with the DOI:

Note: JOSS accepts our review as theirs. You will NOT need to go through another full review. JOSS will only review your paper.md file. Be sure to link to this pyOpenSci issue when a JOSS issue is opened for your package. Also be sure to tell the JOSS editor that this is a pyOpenSci reviewed package once you reach this step.

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

This option will allow reviewers to open smaller issues that can then be linked to PRs rather than submitting a denser, text-based review. It will also allow you to demonstrate addressing the issues via PR links.

  • Yes I am OK with reviewers submitting requested changes as issues to my repo. Reviewers will then link to the issues in their submitted review.

Confirm each of the following by checking the box.

  • I have read the author guide.
  • I expect to maintain this package for at least 2 years and can help find a replacement for the maintainer (team) if needed.

Please fill out our survey

P.S. Have feedback/comments about our review process? Leave a comment here

Editor and Review Templates

The editor template can be found here.

The review template can be found here.

Footnotes

  1. Please fill out a pre-submission inquiry before submitting a data visualization package.

coatless commented May 14, 2025

Editor in Chief checks

Hi there! Thank you for submitting your package for pyOpenSci
review. Below are the basic checks that your package needs to pass
to begin our review. If some of these are missing, we will ask you
to work on them before the review process begins.

Please check our Python packaging guide for more information on the elements
below.

  • Installation The package can be installed from a community repository such as PyPI (preferred), and/or a community channel on conda (e.g. conda-forge, bioconda).
    • The package imports properly into a standard Python environment (`import package`).
  • Fit The package meets criteria for fit and overlap.
  • Documentation The package has sufficient online documentation to allow us to evaluate package function and scope without installing the package. This includes:
    • User-facing documentation that overviews how to install and start using the package.
    • Short tutorials that help a user understand how to use the package and what it can do for them.
    • API documentation (documentation for your code's functions, classes, methods and attributes): this includes clearly written docstrings with variables defined using a standard docstring format.
  • Core GitHub repository Files
    • README The package has a README.md file with clear explanation of what the package does, instructions on how to install it, and a link to development instructions.
    • Contributing File The package has a CONTRIBUTING.md file that details how to install and contribute to the package.
    • Code of Conduct The package has a CODE_OF_CONDUCT.md file.
    • License The package has an OSI approved license.
      NOTE: We prefer that you have development instructions in your documentation too.
  • Issue Submission Documentation All of the information is filled out in the YAML header of the issue (located at the top of the issue template).
  • Automated tests Package has a testing suite and is tested via a Continuous Integration service.
  • Repository The repository link resolves correctly.
  • Package overlap The package doesn't entirely overlap with the functionality of other packages that have already been submitted to pyOpenSci.
  • Archive (JOSS only, may be post-review): The repository DOI resolves correctly.
  • Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

  • Initial onboarding survey was filled out
    We appreciate each maintainer of the package filling out this survey individually. 🙌
    Thank you authors in advance for setting aside five to ten minutes to do this. It truly helps our organization. 🙌


Editor comments

Hi, thanks for submitting the package.

Regarding the examples and quick start, please include a quick command, e.g. `setup_test_data()`, that can be used to obtain the required data from the repository so that the package's example can be run.

The following addresses the first step discussed under Example.

Sample implementation for `setup_test_data()`:
import requests
import zipfile
import os
import tempfile
import shutil
from pathlib import Path
from typing import Optional, Union

def setup_test_data(
    repo: str = "pgcbioinfo/vargram",
    source_path: str = "tests/test_data",
    target_dir: Optional[Union[str, Path]] = None
) -> Path:
    """
    Download the latest release of a GitHub repository and extract a specific directory's
    contents to a target directory.
    
    This function is useful for setting up test data for a package. It fetches the latest
    release from GitHub, extracts the specified directory's contents, and places them
    directly in the target directory.
    
    Parameters
    ----------
    repo : str, default "pgcbioinfo/vargram"
        The GitHub repository in format "owner/repo".
    source_path : str, default "tests/test_data"
        Path within the repository to extract. This directory's contents will be
        placed in the target directory.
    target_dir : str or Path, optional
        Directory where the contents should be extracted. If None, the current
        working directory is used.
        
    Returns
    -------
    Path
        A Path object pointing to the target directory where files were extracted.
        
    Raises
    ------
    requests.HTTPError
        If the API request to GitHub fails.
    FileNotFoundError
        If the specified source_path doesn't exist in the repository.
    ValueError
        If the repository format is invalid.
        
    Examples
    --------
    >>> # Extract to current directory
    >>> data_path = setup_test_data()
    >>> 
    >>> # Extract to a specific directory
    >>> data_path = setup_test_data(target_dir="./my_test_data")
    >>> 
    >>> # Extract a different path from a different repo
    >>> data_path = setup_test_data(
    ...     repo="username/repo",
    ...     source_path="data/examples",
    ...     target_dir="./examples"
    ... )
    """
    # Validate repo format
    if not repo or "/" not in repo:
        raise ValueError(f"Invalid repository format: {repo}. Expected format: 'owner/repo'")
    
    # Convert target_dir to Path if specified, otherwise use current directory
    if target_dir is None:
        target_dir = Path.cwd()
    else:
        target_dir = Path(target_dir)
        target_dir.mkdir(parents=True, exist_ok=True)
    
    # Step 1: Get the latest release information
    print(f"Getting latest release for {repo}...")
    response = requests.get(f"https://api.github.com/repos/{repo}/releases/latest")
    response.raise_for_status()  # Raise an exception for HTTP errors
    release_data = response.json()
    zipball_url = release_data["zipball_url"]
    release_tag = release_data["tag_name"]
    print(f"Found release: {release_tag}")
    
    # Step 2: Download the release zip file
    print(f"Downloading release from {zipball_url}...")
    zip_response = requests.get(zipball_url, stream=True)
    zip_response.raise_for_status()
    
    # Create temporary files
    with tempfile.NamedTemporaryFile(delete=False, suffix='.zip') as temp_zip:
        # Write the downloaded zip to the temporary file
        for chunk in zip_response.iter_content(chunk_size=8192):
            temp_zip.write(chunk)
        temp_zip_path = temp_zip.name
    
    temp_dir = tempfile.mkdtemp()
    
    try:
        # Step 3: Extract the archive
        print("Extracting the release...")
        with zipfile.ZipFile(temp_zip_path, 'r') as zip_ref:
            zip_ref.extractall(temp_dir)
        
        # Step 4: Find the extracted directory
        extracted_dir = next(Path(temp_dir).iterdir())  # Get the first (and only) directory
        
        # Step 5: Check if the source path directory exists
        source_data_path = extracted_dir / source_path
        if not source_data_path.exists():
            raise FileNotFoundError(f"{source_path} directory not found in the release")
        
        # Step 6: Copy each item from source directory directly to the target directory
        print(f"Copying contents of {source_path} to {target_dir}...")
        for item in source_data_path.iterdir():
            dest_path = target_dir / item.name
            
            # If it's a directory, copy the entire directory tree
            if item.is_dir():
                if dest_path.exists():
                    shutil.rmtree(dest_path)
                shutil.copytree(item, dest_path)
            # If it's a file, just copy the file
            else:
                if dest_path.exists():
                    os.remove(dest_path)
                shutil.copy2(item, dest_path)
                
        print(f"Successfully extracted {source_path} contents to {target_dir}")
        
        return target_dir
    
    finally:
        # Step 7: Clean up temporary files
        os.unlink(temp_zip_path)
        shutil.rmtree(temp_dir)

setup_test_data()

An alternative approach would be to create a tar of all the data files and then provide a series of shell commands that download and extract the data into the working directory.
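
A hypothetical sketch of that alternative in Python (this assumes a release asset named test_data.tar.gz is published with each release; no such asset exists today):

import tarfile
import urllib.request

# Download the hypothetical data archive attached to the latest release...
url = "https://github.com/pgcbioinfo/vargram/releases/latest/download/test_data.tar.gz"
urllib.request.urlretrieve(url, "test_data.tar.gz")

# ...and extract its contents into the working directory.
with tarfile.open("test_data.tar.gz") as tar:
    tar.extractall(".")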

With this in mind, there are two different data locations with different data files. Some data is used, some is not, and some is missing:

  1. tests/test_data
  2. docs/assets/data

For the later example with mutation profiles, there is no discussion of where the nextclade_analysis.csv or covid_samples/* data can be obtained once Nextclade CLI is installed.
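
For reference, one way such a CSV could be produced once Nextclade CLI is installed (a sketch only; the dataset name and file paths are assumptions, not documented VARGRAM inputs):

import subprocess

# Fetch the official SARS-CoV-2 Nextclade dataset (reference, annotation, etc.).
subprocess.run(["nextclade", "dataset", "get",
                "--name", "sars-cov-2",
                "--output-dir", "data/sars-cov-2"], check=True)

# Call mutations on the sample sequences and write the analysis table as CSV.
subprocess.run(["nextclade", "run",
                "--input-dataset", "data/sars-cov-2",
                "--output-csv", "nextclade_analysis.csv",
                "covid_samples/sequences.fasta"], check=True)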

I'll kick-start the review process on the package, as reviewers can use the above script in the interim to explore the initial example.

lwasser moved this from pre-review-checks to seeking-editor in peer-review-status on May 14, 2025
cjpalpallatoc (Author) commented

Thanks for the initial comments @coatless . I just have an important deadline to meet this coming week, but I'll work on your suggestions as soon as I can. Looking forward to the review.
