Skip to content

ReproNim/reproschema

Repository files navigation

Python package DOI

ReproSchema: Enhancing Research Reproducibility through Standardized Survey Data Collection

Table of Contents

Introduction

ReproSchema is a standardized framework for creating, sharing, and reusing cognitive and clinical assessments. It addresses the lack of consistency in assessment data acquisition across studies by providing a common schema that captures relationships between questionnaire elements from the start.

Key Benefits:

  • 📊 Rich Context: JSON-LD format provides semantic relationships rather than flat CSV files
  • 🔄 Version Control: Track different versions of questionnaires (e.g., PHQ-9, PHQ-8)
  • 🌍 Internationalization: Built-in support for multiple languages
  • 🔗 Persistent Identifiers: Unique IDs for items, activities, and protocols
  • Validation: Schema validation using SHACL ensures data quality
  • 🚀 Implementation Agnostic: Use with any software platform

Quick Start

Get started with ReproSchema in minutes:

# Install the ReproSchema Python package
pip install reproschema

# Validate an example schema
reproschema validate examples/protocols/protocol1.jsonld

# Create a new protocol from template (requires cookiecutter)
pip install cookiecutter
cookiecutter https://github.com/ReproNim/reproschema-protocol-cookiecutter

Prerequisites

Before using ReproSchema, ensure you have:

  • Python 3.8+: Required for the reproschema-py tools
  • Git: For version control and cloning repositories
  • Text Editor: Preferably with JSON/JSON-LD support (e.g., VS Code)
  • Basic JSON Knowledge: Understanding of JSON syntax

Optional but recommended:

  • GitHub Account: For hosting and sharing your schemas
  • Node.js: If using the reproschema-ui interface

Installation

Installing the Python Package

The easiest way to work with ReproSchema is through the Python package:

# Using pip
pip install reproschema

# Using pip with specific version
pip install reproschema==1.0.0

# For development (with latest changes)
pip install git+https://github.com/ReproNim/reproschema-py.git

Cloning the Repository

To access examples and contribute to the schema:

git clone https://github.com/ReproNim/reproschema.git
cd reproschema

Usage Examples

Creating a Simple Item

An item represents a single question in a questionnaire. Here's a basic example:

{
  "@context": "https://github.com/raw/ReproNim/reproschema/1.0.0/contexts/reproschema",
  "@type": "reproschema:Field",
  "@id": "age_item",
  "prefLabel": "Age",
  "description": "Participant's age in years",
  "schemaVersion": "1.0.0",
  "version": "1.0.0",
  "ui": {
    "inputType": "number"
  },
  "responseOptions": {
    "valueType": "xsd:integer",
    "minValue": 0,
    "maxValue": 120,
    "unitCode": "years"
  }
}

Creating an Activity

An activity groups related items (like a complete questionnaire):

{
  "@context": "https://github.com/raw/ReproNim/reproschema/1.0.0/contexts/reproschema",
  "@type": "reproschema:Activity",
  "@id": "demographics_activity",
  "prefLabel": "Demographics",
  "description": "Basic demographic information",
  "schemaVersion": "1.0.0",
  "version": "1.0.0",
  "ui": {
    "order": ["age_item", "gender_item"],
    "shuffle": false,
    "addProperties": [
      {
        "variableName": "age",
        "isAbout": "age_item",
        "isVis": true
      },
      {
        "variableName": "gender",
        "isAbout": "gender_item",
        "isVis": true
      }
    ]
  }
}

Creating a Protocol

A protocol combines multiple activities for a complete study:

{
  "@context": "https://github.com/raw/ReproNim/reproschema/1.0.0/contexts/reproschema",
  "@type": "reproschema:Protocol",
  "@id": "my_study_protocol",
  "prefLabel": "My Research Study",
  "description": "Protocol for my research study",
  "schemaVersion": "1.0.0",
  "version": "1.0.0",
  "ui": {
    "order": ["demographics_activity", "phq9_activity"],
    "shuffle": false,
    "addProperties": [
      {
        "variableName": "demographics",
        "isAbout": "demographics_activity",
        "prefLabel": "Demographics",
        "isVis": true
      },
      {
        "variableName": "phq9",
        "isAbout": "phq9_activity",
        "prefLabel": "PHQ-9 Depression Scale",
        "isVis": true
      }
    ]
  }
}

Validating Schemas

Always validate your schemas to ensure they're correctly formatted:

# Validate a single file
reproschema validate my_protocol.jsonld

# Validate all files in a directory
reproschema validate protocols/

# Validate with detailed output
reproschema --log-level DEBUG validate my_schema.jsonld

Schema Structure

File Formats

ReproSchema uses several file formats:

  • JSON-LD (.jsonld): Primary format combining JSON with Linked Data
  • Turtle (.ttl): RDF serialization format
  • N-Triples (.nt): Line-based RDF format
  • YAML: Used for LinkML schema definitions

Schema Components

The ReproSchema consists of three hierarchical levels:

  1. Items (Fields): Individual questions or data points

    • Question text and descriptions
    • Input types (text, number, select, etc.)
    • Response options and constraints
    • Visibility conditions
  2. Activities: Collections of related items

    • Groups items into logical assessments
    • Defines item order and display logic
    • Can include scoring computations
    • Supports branching logic
  3. Protocols: Complete study designs

    • Combines multiple activities
    • Defines activity order and scheduling
    • Manages participant flow
    • Includes study-level metadata

ReproSchema Ecosystem

The ReproSchema project integrates five key components designed to standardize research protocols and enhance consistency across various stages of data collection:

1. Foundational Schema (reproschema)

This core schema delineates the content and relationships of protocols, assessments, and items to ensure consistency and facilitate data harmonization across studies.

2. Assessment Library (reproschema-library)

A comprehensive collection of standardized questionnaires, supporting the application of uniform assessments across time and different studies.

3. Python CLI Tool (reproschema-py)

Command-line interface tool that facilitates schema development and validation, aiding researchers in efficiently creating and refining data collection frameworks.

4. User Interface (reproschema-ui)

An intuitive web interface that simplifies the visualization and interaction with data, enhancing the manageability of the data collection process for researchers.

5. Protocol Template (reproschema-protocol-cookiecutter)

A customizable template that supports the design and implementation of research protocols tailored to specific study requirements.

Repository Structure

This repository contains:

reproschema/
├── terms/              # ReproSchema vocabulary terms
├── contexts/           # JSON-LD context files
├── examples/           # Example protocols, activities, and items
│   ├── activities/     # Sample activities
│   ├── protocols/      # Sample protocols
│   └── responses/      # Sample response data
├── linkml-schema/      # LinkML schema definitions
├── releases/           # Official release versions
├── docs/               # Documentation
│   ├── tutorials/      # Step-by-step guides
│   ├── how-to/         # Task-specific instructions
│   └── user-guide/     # Comprehensive user documentation
└── scripts/            # Utility scripts

Developing ReproSchema

Updating the schema

As of release 1.0.0, a linked data modeling language, LinkML, is used to create a YAML file with the schema.

The context file was automatically generated using LinkML, and then manually curated in order to support all the reproschema feature.

Style

This repo uses pre-commit to check styling.

  • Install pre-commit with pip: pip install pre-commit
  • In order to use it with the repository, you have to run run pre-commit install in the root directory the first time you use it.

Release

Upon release, there are additional formats, jsonsld, turtle, n-triples and pydantic that are created using LinkML tools, reproschema-py, and reproschema-specific script to "fix" the pydantic format. The entire process is automated in the GitHub Action Workflow: Validate and Release. This workflow must be manually triggered by the core developers once a new release is ready. All the releases can be found in releases directory.

Updating model in reproschema-py

Another GitHub Action Workflow: Create Pull Request to reproschema-py is responsible for creating pull request to the reproschema-py Python library with the new version of pydantic model and context. The workflow is currently also triggered manually by the core developers.

Licenses

Code

The content of this repository is distributed under the Apache 2.0 license.

Documentation

Creative Commons License
The corresponding documentation is licensed under a Creative Commons Attribution 4.0 International License.

Citation

If you use ReproSchema in your research, please cite our paper:

Chen Y, Jarecka D, Abraham S, Gau R, Ng E, Low D, Bevers I, Johnson A, Keshavan A, Klein A, Clucas J, Rosli Z, Hodge S, Linkersdörfer J, Bartsch H, Das S, Fair D, Kennedy D, Ghosh S. Standardizing Survey Data Collection to Enhance Reproducibility: Development and Comparative Evaluation of the ReproSchema Ecosystem. J Med Internet Res 2025;27:e63343. DOI: 10.2196/63343

Contributors

https://github.com/ReproNim/reproschema/graphs/contributors

About

A standardized form generation and data collection schema to harmonize results by design across projects.

Topics

Resources

License

Contributing

Stars

Watchers

Forks

Contributors 12

Languages