Welcome to the HDF5 development community! This comprehensive guide covers everything you need to know about contributing to HDF5, from getting started to submitting your changes.
Important
No contribution can be accepted unless the contributor agrees to the HDF Group's software license terms, which can be found in the LICENSE file located in the top source directory of every branch.
- Getting Started
- Prerequisites
- Getting the Source Code
- Building for Development
- Source Code Overview
- Development Conventions
- Contributing Changes
- Testing
- Documentation
- Command-Line Tools
- Checklist for Contributors
- Getting Help
The HDF Group welcomes contributions of all kinds, from fixing typos to adding significant features. We are dedicated to making the contribution process enjoyable and straightforward.
Note
This guide offers a brief introduction to the HDF5 library and its development procedures. For a deeper look at the library's internals, see An Overview of the HDF5 Library Architecture, which explains the fundamental principles and structure behind the library's operation and how to use the library efficiently.
Before you begin, ensure your development machine has:
- A C11-compatible C compiler (MSVC on Windows is supported).
- A build system: CMake is required.
- Perl: Needed to run build and test scripts, even on Windows.
- Git: For version control.
- If you are new to Git and GitHub, we encourage you to check out the GitHub tutorial, which takes about 10 minutes to complete.
- clang-format: For code formatting. The CI system will automatically format pull requests if needed.
- codespell: For identifying spelling issues before submission.
- Doxygen: For compiling the documentation.
Depending on which features you want to build or enable:
- A C++11-compatible compiler for the C++ wrappers.
- A Fortran 2003-compatible compiler for the Fortran wrappers.
- A Java 8-compatible compiler for the Java wrappers.
- flex/lex and bison/yacc if you want to modify the high-level parsers.
- Development versions of zlib and szip for compression support.
- An MPI-3 compatible MPI library for parallel HDF5 development.
- curl and other components for the read-only S3 VFD.
The HDF5 source code is hosted on GitHub:
git clone https://github.com/HDFGroup/hdf5.git
cd hdf5
CMake is the required build system for all platforms:
- Create a build directory:
  mkdir build && cd build
- Configure the build:
  cmake -G "Unix Makefiles" -DHDF5_ENABLE_DEVELOPER_MODE=ON ..
  The HDF5_ENABLE_DEVELOPER_MODE option enables debug symbols, warnings as errors, and other developer-friendly settings.
- Build the library:
  make
- Memory Checking: Use HDF5_ENABLE_USING_MEMCHECKER:BOOL=ON when using tools like Valgrind. This disables internal memory pools that can hide memory issues.
- Developer Warnings: Enable extra warnings with HDF5_ENABLE_DEV_WARNINGS:BOOL=ON (generates significant output but can be useful).
- Warnings as Errors: The CI system builds with -Werror, so fix all compiler warnings before submitting pull requests.
Here's where to find things in the source tree:
- src/: Main C library source code
- test/: C library test code
- testpar/: Parallel C library test code
- tools/: Command-line tools (h5dump, h5repack, etc.)
- HDF5Examples/: Library examples
- hl/: High-level library source, tests, and examples
- c++/: C++ language wrapper
- fortran/: Fortran language wrapper
- java/: JNI/Java language wrapper
- bin/: Build scripts and miscellaneous tools
- config/: Configuration files for CMake
- doxygen/: Doxygen build files and documentation
- release_docs/: Install instructions and release notes
- utils/: Small utility programs
HDF5 code is organized into packages that encapsulate related functionality (e.g., H5D for datasets).
Functions have three visibility levels:
- Public: User-facing API functions
  - Format: H5Xfoo() (e.g., H5Dcreate)
  - Headers: H5Xpublic.h
- Private: Internal library API, usable across packages
  - Format: H5X_foo() (one underscore, e.g., H5D_create)
- Package: Used only within the defining package
  - Format: H5X__foo() (two underscores, e.g., H5D__create)
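The naming rule can be checked mechanically. The following is a minimal sketch, assuming the package prefix is "H5" followed by one or more uppercase letters; `classify_h5_name` and the `SK_NAME_*` constants are hypothetical names invented for this illustration and are not part of HDF5.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical enum for this sketch: the three visibility levels */
typedef enum {
    SK_NAME_UNKNOWN, /* does not follow the convention */
    SK_NAME_PUBLIC,  /* H5Xfoo()   */
    SK_NAME_PRIVATE, /* H5X_foo()  */
    SK_NAME_PACKAGE  /* H5X__foo() */
} sk_name_kind_t;

/* Hypothetical helper: classify a function name by the number of
 * underscores that follow the package prefix (e.g. "H5D"). */
sk_name_kind_t
classify_h5_name(const char *name)
{
    size_t i           = 2; /* index just past "H5" */
    size_t underscores = 0;

    assert(name != NULL);

    if (strncmp(name, "H5", 2) != 0)
        return SK_NAME_UNKNOWN;

    /* Skip the package letters, e.g. the "D" in "H5D" */
    while (name[i] >= 'A' && name[i] <= 'Z')
        i++;
    if (i == 2)
        return SK_NAME_UNKNOWN; /* no package letters */

    /* Count the underscores between the prefix and the function name */
    while (name[i] == '_') {
        underscores++;
        i++;
    }
    if (name[i] == '\0')
        return SK_NAME_UNKNOWN; /* nothing after the prefix */

    switch (underscores) {
        case 0:
            return SK_NAME_PUBLIC;
        case 1:
            return SK_NAME_PRIVATE;
        case 2:
            return SK_NAME_PACKAGE;
        default:
            return SK_NAME_UNKNOWN;
    }
}
```

With this sketch, `classify_h5_name("H5D__create")` yields `SK_NAME_PACKAGE`, while a name such as `malloc` is reported as `SK_NAME_UNKNOWN`. Package prefixes that contain digits are not handled, which is a simplification of the sketch.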
HDF5 functions follow a consistent structure for entry/exit and error handling:
    /*
     * Function description
     */
    herr_t
    H5X_do_stuff(/*parameters*/)
    {
        /* 1. Variables declared at top */
        void  *foo       = NULL;
        herr_t ret_value = SUCCEED; /* 2. Return value variable */

        FUNC_ENTER_NOAPI(FAIL) /* 3. Function entry macro */

        HDassert(/*parameter check*/);

        /* 4. Check for errors and goto done */
        if (H5X_other_call() < 0)
            HGOTO_ERROR(H5E_MAJ, H5E_MIN, FAIL, "An error occurred");

    done: /* 5. Target for error jumps */
        if (ret_value < 0) {
            /* error cleanup */
        }
        /* regular cleanup */

        FUNC_LEAVE_NOAPI(ret_value); /* 6. Function leave macro */
    }
Public functions use FUNC_ENTER_API, include H5TRACE macros for API tracing, and perform more rigorous parameter checking.
- Almost all functions return herr_t or hid_t error codes
- FUNC_ENTER_* macros set up the error handling stack
- HGOTO_ERROR pushes an error onto the stack and jumps to the cleanup code
- FUNC_LEAVE_* macros return the result
- Always check return values of functions that can fail
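The control flow behind these macros can be reproduced in plain C. The sketch below is a simplified stand-in, not HDF5's implementation: `SKETCH_GOTO_ERROR`, `sketch_herr_t`, and `sketch_copy_name` are hypothetical names, the "error stack" is reduced to a message on stderr, and `malloc`/`free` are used only because the example has to compile outside the library.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins for illustration only; HDF5's real types and macros live in
 * its own headers and do considerably more. */
typedef int sketch_herr_t;
#define SKETCH_SUCCEED 0
#define SKETCH_FAIL    (-1)

/* Report an error, set the return value, and jump to the cleanup label */
#define SKETCH_GOTO_ERROR(ret, msg)                                          \
    do {                                                                     \
        fprintf(stderr, "%s: %s\n", __func__, (msg));                        \
        ret_value = (ret);                                                   \
        goto done;                                                           \
    } while (0)

/* Hypothetical routine: duplicate `src` into a newly allocated buffer */
sketch_herr_t
sketch_copy_name(const char *src, char **dst_out)
{
    char         *buf       = NULL;
    sketch_herr_t ret_value = SKETCH_SUCCEED;

    assert(dst_out != NULL); /* programming errors are asserted */
    *dst_out = NULL;

    if (src == NULL)
        SKETCH_GOTO_ERROR(SKETCH_FAIL, "no source string");

    /* Library code would use H5MM here rather than malloc */
    if (NULL == (buf = malloc(strlen(src) + 1)))
        SKETCH_GOTO_ERROR(SKETCH_FAIL, "allocation failed");
    strcpy(buf, src);

    *dst_out = buf; /* ownership passes to the caller on success */

done:
    if (ret_value < 0) {
        free(buf); /* error cleanup: release partial results */
    }
    return ret_value;
}
```

Every failure path funnels through the single `done` label, so cleanup is written once and the caller only has to test the return value for a negative result.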
HDF5 uses a compatibility layer for platform differences:
- Standard C and POSIX calls are prefixed with HD (e.g., HDmalloc, HDopen)
- H5private.h and H5win32defs.h map these to platform-specific functions
- This layer is being modernized as C99 and POSIX become universal
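As a rough illustration of how such a layer works, the sketch below maps one prefixed name to a different function on each platform. It shows the general idea only and does not reproduce HDF5's definitions: the `SK` prefix and `sketch_names_match` are hypothetical, chosen so they cannot be mistaken for the library's own `HD` names.

```c
#include <assert.h>
#include <string.h>

#ifdef _WIN32
/* MSVC provides the case-insensitive compare under a different name */
#define SKstrcasecmp(a, b) _stricmp((a), (b))
#else
#include <strings.h> /* POSIX header that declares strcasecmp */
#define SKstrcasecmp(a, b) strcasecmp((a), (b))
#endif

/* Hypothetical caller: uses only the prefixed name, so the same source
 * compiles unchanged on both kinds of platform. */
int
sketch_names_match(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return SKstrcasecmp(a, b) == 0;
}
```

The platform test lives in one header-like block, and the rest of the code never needs an `#ifdef`.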
Use HDF5's internal memory management instead of direct malloc/free:
- H5MM: General-purpose memory management (recommended for most uses)
- H5FL: Memory pools for fixed-size, frequently allocated objects (use only when performance testing shows clear benefits)
- Open a GitHub issue (HDF5 Issues)
  - Required unless the change is minor (e.g., typo fix).
  - Describe the problem or feature request clearly.
- Fork the repository and create your branch
  - Target the develop branch for new features and bug fixes.
  - Use descriptive branch names.
- Make your changes
  - Follow HDF5 coding conventions.
  - Add tests for new functionality or bug fixes.
  - Update documentation as needed.
- Build and test thoroughly
  - Follow build instructions in release_docs/INSTALL* files.
  - Ensure all tests pass.
- Submit a pull request (PR)
  - Address any formatting or testing issues reported by CI.
  - Reference the issue that the PR addresses in the PR description.
  - Work with HDF Group developers to meet acceptance criteria.
For a pull request to be accepted, it must satisfy:
- Clear purpose: What does it address? How does it benefit the HDF5 community?
- Proper documentation: Code must be documented for maintainability.
- Testing: Must pass HDF5 regression testing and include appropriate tests.
  - We do not expect you to perform comprehensive testing across multiple platforms before we accept the pull request.
- Compatibility: Must not compromise HDF5's core principles:
  - 100% backward compatibility (any HDF5 file must remain readable).
    - If your patch's purpose is to modify the HDF5 data model or file format, please discuss this with us first. File format changes and features required by those changes can be introduced only in a new major release.
  - Machine independence (data readable across all platforms).
  - Binary compatibility for maintenance releases (no changes to public APIs/structures).
- Documentation: New features must be properly documented. This includes using Doxygen and providing information in release documents such as RELEASE.txt.
- Small features: Develop in forks of the main repository.
- Large collaborative work: Use feature branches named feature/<feature> in the main repository.
- For feature branches, add a BRANCH.md file explaining the branch's purpose and giving contact information.
HDF5 uses custom testing macros rather than standard frameworks. There are two systems. The first uses the macros from h5test.h:
    #include "h5test.h"

    static int
    test_feature(void)
    {
        TESTING("some feature");

        /* test code */
        if (error_condition)
            TEST_ERROR;

        PASSED();
        return SUCCEED;

    error:
        return FAIL;
    }
The second system is used only by the large testhdf5 program. It relies on global variables and should be avoided.
All new functionality and bug fixes must include tests.
- Add tests to existing test files when appropriate.
- Create new test programs using h5test.h macros.
- Avoid adding to the testhdf5 program.
- Update CMakeLists.txt in the test/ directory.
- Ensure tests run and pass under CMake.
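For readers who have not seen this style of test before, the sketch below imitates the control flow of an h5test.h-style test without depending on the HDF5 headers. The `SK_*` macros are simplified stand-ins written for this example (the real macros are defined in h5test.h and differ in detail), and `sk_test_roundtrip` is a hypothetical test whose `corrupt` flag exists only to exercise the failure path.

```c
#include <assert.h>
#include <stdio.h>

/* Simplified stand-ins for the h5test.h macros, for illustration only */
#define SK_TESTING(what) printf("Testing %-40s", (what))
#define SK_PASSED()      puts(" PASSED")
#define SK_TEST_ERROR                                                        \
    do {                                                                     \
        printf("*FAILED* at %s:%d\n", __FILE__, __LINE__);                   \
        goto error;                                                          \
    } while (0)

/* Hypothetical test: checks that a small buffer round-trips unchanged */
int
sk_test_roundtrip(int corrupt)
{
    int wdata[4] = {1, 2, 3, 4};
    int rdata[4];
    int i;

    SK_TESTING("buffer round trip");

    /* Stand-in for a write followed by a read */
    for (i = 0; i < 4; i++)
        rdata[i] = wdata[i];
    if (corrupt)
        rdata[2] = -1; /* simulate a failure */

    /* Verify the data; any mismatch jumps to the error label */
    for (i = 0; i < 4; i++)
        if (rdata[i] != wdata[i])
            SK_TEST_ERROR;

    SK_PASSED();
    return 0;

error:
    return -1;
}
```

Each test announces itself, jumps to `error` on the first failed check, and returns a status that the test program's driver can tally.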
Write release notes for changes that affect users:
- Required: User-visible changes in functionality or behavior.
- Required: Known problems and user-reported issue fixes.
- Not required: Internal code changes, comments, or build process changes.
- Title/Problem
Problem description paragraph explaining the issue and conditions
where it occurs.
Solution paragraph describing what was done to resolve the issue
and any functional impact or workarounds.
- Title: Categories to help readers identify relevance.
- Problem: Clear description of the issue and conditions.
- Solution: What was done, functional impact, and any workarounds.
- Public functions: Must have Doxygen markup in H5Xpublic.h headers.
- New features: Document in user guide content in H5Xmodule.h files.
- Developer docs: Provided by well-documented source code.
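As an indication of what Doxygen markup on a public prototype can look like, here is a small sketch using only standard Doxygen commands. `H5Xcount_nonzero` is a hypothetical function invented for this example, and the HDF5 headers may rely on project-specific Doxygen aliases that are not shown here, so existing H5Xpublic.h files remain the reference for the exact style.

```c
#include <assert.h>
#include <stddef.h>

/**
 * \brief Counts the non-zero elements in a buffer
 *
 * \param[in]  buf    Buffer to scan
 * \param[in]  nelmts Number of elements in \p buf
 * \param[out] count  Receives the number of non-zero elements
 *
 * \return Returns a non-negative value if successful; otherwise returns a
 *         negative value.
 *
 * \details H5Xcount_nonzero() is a hypothetical function used only to
 *          illustrate the comment layout. It is not part of HDF5.
 */
int H5Xcount_nonzero(const int *buf, size_t nelmts, size_t *count);

/* Definition, included so the sketch compiles on its own */
int
H5Xcount_nonzero(const int *buf, size_t nelmts, size_t *count)
{
    size_t i;
    size_t n = 0;

    assert(count != NULL); /* output pointer is required */

    if (buf == NULL)
        return -1;

    for (i = 0; i < nelmts; i++)
        if (buf[i] != 0)
            n++;

    *count = n;
    return 0;
}
```

The comment sits on the prototype, as it would in a public header, so the generated reference describes parameters and return values without the reader opening the source file.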
Tools in the tools/ directory:
- Written in C using only the public HDF5 API.
- Organized with a central tools library (tools/lib) and individual tool directories.
- Use simplified error handling compared to the main library.
- Examples: h5dump, h5diff, h5repack.
Before submitting your pull request, verify:
- Corresponding GitHub issue exists (unless minor change).
- Follows HDF5 conventions (naming, portability, structure).
- Applicable to other branches? (document in GitHub issue).
- Sufficiently documented for maintenance.
- API changes follow compatibility guidelines.
- Change described in release_docs/RELEASE.txt.
- New functions documented with Doxygen in public headers.
- New features documented for HDF5 community.
- Pull request includes tests.
- Consider performance impact.
- HDF Forum: Best place for questions about HDF5 usage and development (on HDF Group website).
- GitHub Issues: For bug reports and feature requests.
- Documentation: Check existing docs on the HDF Group website.
The HDF5 community is here to help. Don't hesitate to reach out with questions or for guidance on contributions.
Thank you for contributing to HDF5! Your efforts help maintain and improve one of the most widely used data formats today.