Commit 1308840

docs: update readme and move contributing docs to CONTRIBUTING.md
1 parent f640841 commit 1308840

2 files changed

+241
-373
lines changed

CONTRIBUTING.md

Lines changed: 238 additions & 124 deletions
@@ -1,128 +1,242 @@
1-
# Contributing
1+
# Contributing to duckdb-python
22

3-
## Code of Conduct
3+
Start by <a href="https://github.com/duckdb/duckdb-python/fork"><svg height="16" viewBox="0 0 16 16" version="1.1" width="16">
4+
<path fill-rule="evenodd" d="M5 3.25a.75.75 0 11-1.5 0 .75.75 0 011.5 0zm0 2.122a2.25 2.25 0 10-1.5 0v.878A2.25 2.25 0 005.75 8.5h1.5v2.128a2.251 2.251 0 101.5 0V8.5h1.5a2.25 2.25 0 002.25-2.25v-.878a2.25 2.25 0 10-1.5 0v.878a.75.75 0 01-.75.75h-4.5A.75.75 0 015 6.25v-.878z"></path>
5+
</svg>forking duckdb-python</a>.
46

5-
This project and everyone participating in it is governed by a [Code of Conduct](CODE_OF_CONDUCT.md). By participating, you are expected to uphold this code. Please report unacceptable behavior to [[email protected]](mailto:[email protected]).
7+
### Cloning
68

9+
After forking the duckdb-python repo we recommend you clone your fork as follows:
10+
```shell
11+
git clone --recurse-submodules $REPO_URL
12+
git remote add upstream https://github.com/duckdb/duckdb-python.git
13+
git fetch --all
14+
```
15+
16+
... or, if you have already cloned your fork:
17+
```shell
18+
git submodule update --init --recursive
19+
git remote add upstream https://github.com/duckdb/duckdb-python.git
20+
git fetch --all
21+
```
22+
23+
### Submodule update hook
24+
25+
If you'll be switching between branches that have the submodule set to different refs, then make your life
26+
easier and add the git hooks in the .githooks directory to your local config:
27+
```shell
28+
git config --local core.hooksPath .githooks/
29+
```
30+
31+
32+
### Editable installs (general)
33+
34+
It's good to be aware of the following when performing an editable install:
35+
- `uv sync` or `uv run [tool]` perform an editable install by default. We have
36+
configured the project so that scikit-build-core will use a persistent build-dir, but since the build itself
37+
happens in an isolated, ephemeral environment, cmake's paths will point to non-existing directories. CMake itself
38+
will be missing.
39+
- You should install all development dependencies, and then build the project without build isolation, in two separate
40+
steps. After this you can happily keep building and running, as long as you don't forget to pass in the
41+
`--no-build-isolation` flag.
42+
43+
```bash
44+
# install all dev dependencies without building the project (needed once)
45+
uv sync -p 3.11 --no-install-project
46+
# build and install without build isolation
47+
uv sync --no-build-isolation
48+
```
49+
50+
### Editable installs (IDEs)
51+
52+
If you're using an IDE then life is a little simpler. You install build dependencies and the project in the two
53+
steps outlined above, and from that point on you can rely on e.g. CLion's cmake capabilities to do incremental
54+
compilation and editable rebuilds. This will skip scikit-build-core's build backend and all of uv's dependency
55+
management, so for "real" builds you should go back to the CLI. However, this should work fine for coding and debugging.
56+
57+
58+
### Cleaning
59+
60+
```shell
61+
uv cache clean
62+
rm -rf build .venv uv.lock
63+
```
64+
65+
66+
### Building wheels and sdists
67+
68+
To build a wheel and sdist for your system and the default Python version:
69+
```bash
70+
uv build
71+
```
72+
73+
To build a wheel for a different Python version:
74+
```bash
75+
# E.g. for Python 3.9
76+
uv build -p 3.9
77+
```
78+
79+
### Running tests
80+
81+
Run all pytests:
82+
```bash
83+
uv run --no-build-isolation pytest ./tests --verbose
84+
```
85+
86+
Exclude the `tests/slow` directory:
87+
```bash
88+
uv run --no-build-isolation pytest ./tests --verbose --ignore=./tests/slow
89+
```
90+
91+
### Test coverage
92+
93+
Run with coverage (during development you probably want to specify which tests to run):
94+
```bash
95+
COVERAGE=1 uv run --no-build-isolation coverage run -m pytest ./tests --verbose
96+
```
97+
98+
The `COVERAGE` env var will compile the extension with `--coverage`, allowing us to collect coverage stats of C++
99+
code as well as Python code.
100+
101+
Check coverage for Python code:
102+
```bash
103+
uvx coverage html -d htmlcov-python
104+
uvx coverage report --format=markdown
105+
```
106+
107+
Check coverage for C++ code (note: this will clutter your project dir with html files, consider saving them in some
108+
other place):
109+
```bash
110+
uvx gcovr \
111+
--gcov-ignore-errors all \
112+
--root "$PWD" \
113+
--filter "${PWD}/src/duckdb_py" \
114+
--exclude '.*/\.cache/.*' \
115+
--gcov-exclude '.*/\.cache/.*' \
116+
--gcov-exclude '.*/external/.*' \
117+
--gcov-exclude '.*/site-packages/.*' \
118+
--exclude-unreachable-branches \
119+
--exclude-throw-branches \
120+
--html --html-details -o coverage-cpp.html \
121+
build/coverage/src/duckdb_py \
122+
--print-summary
123+
```
124+
125+
### Typechecking and linting
126+
127+
- We're not running any mypy typechecking tests at the moment
128+
- We're not running any Ruff / linting / formatting at the moment
129+
130+
### Cibuildwheel
131+
132+
You can run cibuildwheel locally for Linux. E.g. limited to Python 3.9:
133+
```bash
134+
CIBW_BUILD='cp39-*' uvx cibuildwheel --platform linux .
135+
```
136+
137+
### Code conventions
138+
139+
* Follow the [Google Python styleguide](https://google.github.io/styleguide/pyguide.html)
140+
* See the section on [Comments and Docstrings](https://google.github.io/styleguide/pyguide.html#s3.8-comments-and-docstrings)
141+
142+
### Tooling
143+
144+
This codebase is developed with the following tools:
145+
- [Astral uv](https://docs.astral.sh/uv/) - for dependency management across all platforms we provide wheels for,
146+
and for Python environment management. It will be hard to work on this codebase without having uv installed.
147+
- [Scikit-build-core](https://scikit-build-core.readthedocs.io/en/latest/index.html) - the build backend for
148+
building the extension. In the background, scikit-build-core uses CMake and Ninja for compilation.
149+
- [pybind11](https://pybind11.readthedocs.io/en/stable/index.html) - a bridge between C++ and Python.
150+
- [CMake](https://cmake.org/) - the build system for both DuckDB itself and the DuckDB Python module.
151+
- Cibuildwheel
152+
153+
### Merging changes to pythonpkg from duckdb main
154+
155+
1. Checkout main
156+
2. Identify the merge commits that brought in tags to main:
157+
```bash
158+
git log --graph --oneline --decorate main --simplify-by-decoration
159+
```
160+
161+
3. Get the log of commits
162+
```bash
163+
git log --oneline 71c5c07cdd..c9254ecff2 -- tools/pythonpkg/
164+
```
165+
166+
4. Checkout v1.3-ossivalis
167+
5. Get the log of commits
168+
```bash
169+
git log --oneline v1.3.0..v1.3.1 -- tools/pythonpkg/
170+
```
171+
git diff --name-status 71c5c07cdd c9254ecff2 -- tools/pythonpkg/
172+
173+
```bash
174+
git log --oneline 71c5c07cdd..c9254ecff2 -- tools/pythonpkg/
175+
git diff --name-status <HASH_A> <HASH_B> -- tools/pythonpkg/
176+
```
177+
178+
179+
## Versioning and Releases
180+
181+
The DuckDB Python package versioning and release scheme follows that of DuckDB itself. This means that an
182+
`X.Y.Z[.postN]` release of the Python package ships the DuckDB stable release `X.Y.Z`. The optional `.postN` releases ship the same stable release of DuckDB as their predecessors plus Python package-specific fixes and / or features.
183+
184+
| Types | DuckDB Version | Resulting Python Extension Version |
185+
|------------------------------------------------------------------------|----------------|------------------------------------|
186+
| Stable release: DuckDB stable release | `1.3.1` | `1.3.1` |
187+
| Stable post release: DuckDB stable release + Python fixes and features | `1.3.1` | `1.3.1.postX` |
188+
| Nightly micro: DuckDB next micro nightly + Python next micro nightly | `1.3.2.devM` | `1.3.2.devN` |
189+
| Nightly minor: DuckDB next minor nightly + Python next minor nightly | `1.4.0.devM` | `1.4.0.devN` |
190+
191+
Note that we do not ship nightly post releases (e.g. we don't ship `1.3.1.post2.dev3`).
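
The version shapes in the table above can be told apart mechanically. The following is a minimal sketch, not the package's own code: `classify_release` is a hypothetical helper, and it assumes that only the shapes listed in the table occur (it does not cover every PEP 440 form).

```python
import re

# Assumption: only the version shapes from the table are modelled.
_VERSION_RE = re.compile(
    r"^(?P<major>\d+)\.(?P<minor>\d+)\.(?P<micro>\d+)"
    r"(?:\.post(?P<post>\d+))?"
    r"(?:\.dev(?P<dev>\d+))?$"
)


def classify_release(version: str) -> str:
    """Return the release type for a version string (hypothetical helper)."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unrecognised version: {version!r}")
    is_post = match["post"] is not None
    is_dev = match["dev"] is not None
    if is_post and is_dev:
        # Nightly post releases are not shipped, so reject them.
        raise ValueError(f"nightly post releases are not shipped: {version!r}")
    if is_post:
        return "stable post"
    if is_dev:
        # A dev build of X.Y.0 leads up to a minor release, otherwise a micro one.
        return "nightly minor" if match["micro"] == "0" else "nightly micro"
    return "stable"
```

With this sketch, `classify_release("1.3.1.post2.dev3")` raises a `ValueError`, which mirrors the rule that nightly post releases are not shipped.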
192+
193+
### Branch and Tag Strategy
194+
195+
We cut releases as follows:
196+
197+
| Type | Tag | How |
198+
|----------------------|--------------|---------------------------------------------------------------------------------|
199+
| Stable minor release | vX.Y.0 | Adding a tag on `main` |
200+
| Stable micro release | vX.Y.Z | Adding a tag on a minor release branch (e.g. `v1.3-ossivalis`) |
201+
| Stable post release | vX.Y.Z-postN | Adding a tag on a post release branch (e.g. `v1.3.1-post`) |
202+
| Nightly micro | _not tagged_ | Combining HEAD of the _micro_ release branches of DuckDB and the Python package |
203+
| Nightly minor | _not tagged_ | Combining HEAD of the _minor_ release branches of DuckDB and the Python package |
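
Note that the tag for a post release uses `-postN`, while the package version uses `.postN`. As a rough illustration of the table above, the sketch below maps a stable version to its tag and to the kind of branch the tag is added on. `release_tag` is a hypothetical helper written for this example; it is not part of the release workflow.

```python
import re


def release_tag(version: str) -> tuple[str, str]:
    """Return (tag, branch kind) for a stable version (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)(?:\.post(\d+))?", version)
    if match is None:
        # Nightly (dev) versions are not tagged, so they are rejected here.
        raise ValueError(f"not a stable version: {version!r}")
    major, minor, micro, post = match.groups()
    base = f"v{major}.{minor}.{micro}"
    if post is not None:
        return f"{base}-post{post}", "post release branch"
    if micro == "0":
        return base, "main"
    return base, "minor release branch"
```

For example, `release_tag("1.3.1.post2")` yields the tag `v1.3.1-post2` on a post release branch.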
204+
205+
### Release Runbooks
206+
207+
We cut a new **stable minor release** with the following steps:
208+
1. Create a PR on `main` to pin the DuckDB submodule to the tag of its current release.
209+
1. Iff all tests pass in CI, merge the PR.
210+
1. Manually start the release workflow with the hash of this commit, and the tag name.
211+
1. Iff all goes well, create a new PR to let the submodule track DuckDB main.
212+
213+
We cut a new **stable micro release** with the following steps:
214+
1. Create a PR on the minor release branch to pin the DuckDB submodule to the tag of its current release.
215+
1. Iff all tests pass in CI, merge the PR.
216+
1. Manually start the release workflow with the hash of this commit, and the tag name.
217+
1. Iff all goes well, create a new PR to let the submodule track DuckDB's minor release branch.
218+
219+
We cut a new **stable post release** with the following steps:
220+
1. Create a PR on the post release branch to pin the DuckDB submodule to the tag of its current release.
221+
1. Iff all tests pass in CI, merge the PR.
222+
1. Manually start the release workflow with the hash of this commit, and the tag name.
223+
1. Iff all goes well, create a new PR to let the submodule track DuckDB's minor release branch.
7224
8-
## **Did you find a bug?**
9-
10-
* **Ensure the bug was not already reported** by searching on GitHub under [Issues](https://github.com/duckdb/duckdb/issues).
11-
* If you're unable to find an open issue addressing the problem, [open a new one](https://github.com/duckdb/duckdb/issues/new/choose). Be sure to include a **title and clear description**, as much relevant information as possible, and a **code sample** or an **executable test case** demonstrating the expected behavior that is not occurring.
12-
13-
## **Did you write a patch that fixes a bug?**
14-
15-
* Great!
16-
* If possible, add a unit test case to make sure the issue does not occur again.
17-
* Make sure you run the code formatter (`make format-fix`).
18-
* Open a new GitHub pull request with the patch.
19-
* Ensure the PR description clearly describes the problem and solution. Include the relevant issue number if applicable.
20-
21-
## Outside Contributors
22-
23-
* Discuss your intended changes with the core team on Github
24-
* Announce that you are working or want to work on a specific issue
25-
* Avoid large pull requests - they are much less likely to be merged as they are incredibly hard to review
26-
27-
## Pull Requests
28-
29-
* Do not commit/push directly to the main branch. Instead, create a fork and file a pull request.
30-
* When maintaining a branch, merge frequently with the main.
31-
* When maintaining a branch, submit pull requests to the main frequently.
32-
* If you are working on a bigger issue try to split it up into several smaller issues.
33-
* Please do not open "Draft" pull requests. Rather, use issues or discussion topics to discuss whatever needs discussing.
34-
* We reserve full and final discretion over whether or not we will merge a pull request. Adhering to these guidelines is not a complete guarantee that your pull request will be merged.
35-
36-
## CI for pull requests
37-
38-
* Pull requests will need to pass all continuous integration checks before merging.
39-
* For faster iteration and more control, consider running CI on your own fork or when possible directly locally.
40-
* Submitting changes to an open pull request will move it to 'draft' state.
41-
* Pull requests will get a complete run on the main repo CI only when marked as 'ready for review' (via Web UI, button on bottom right).
42-
43-
## Nightly CI
44-
45-
* Packages creation and long running tests will be performed during a nightly run
46-
* On your fork you can trigger long running tests (NightlyTests.yml) for any branch following information from https://docs.github.com/en/actions/using-workflows/manually-running-a-workflow#running-a-workflow
47-
48-
## Building
49-
50-
* To build the project, run `make`.
51-
* To build the project for debugging, run `make debug`.
52-
* For parallel builds, you can use the [Ninja](https://ninja-build.org/) build system: `GEN=ninja make`.
53-
* The default number of parallel processes can lock up the system depending on the CPU-to-memory ratio. If this happens, restrict the maximum number of build processes: `CMAKE_BUILD_PARALLEL_LEVEL=4 GEN=ninja make`.
54-
* Without using Ninja, build times can still be reduced by setting `CMAKE_BUILD_PARALLEL_LEVEL=$(nproc)`.
55-
56-
## Testing
57-
58-
* Unit tests can be written either using the sqllogictest framework (`.test` files) or in C++ directly. We **strongly** prefer tests to be written using the sqllogictest framework. Only write tests in C++ if you absolutely need to (e.g. when testing concurrent connections or other exotic behavior).
59-
* Documentation for the testing framework can be found [here](https://duckdb.org/dev/testing).
60-
* Write many tests.
61-
* Test with different types, especially numerics, strings and complex nested types.
62-
* Try to test unexpected/incorrect usage as well, instead of only the happy path.
63-
* `make unit` runs the **fast** unit tests (~one minute), `make allunit` runs **all** unit tests (~one hour).
64-
* Make sure **all** unit tests pass before sending a PR.
65-
* Slower tests should be added to the **all** unit tests. You can do this by naming the test file `.test_slow` in the sqllogictests, or by adding `[.]` after the test group in the C++ tests.
66-
* Look at the code coverage report of your branch and attempt to cover all code paths in the fast unit tests. Attempt to trigger exceptions as well. It is acceptable to have some exceptions not triggered (e.g. out of memory exceptions or type switch exceptions), but large branches of code should always be either covered or removed.
67-
* DuckDB uses GitHub Actions as its continuous integration (CI) tool. You also have the option to run GitHub Actions on your forked repository. For detailed instructions, you can refer to the [GitHub documentation](https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/enabling-features-for-your-repository/managing-github-actions-settings-for-a-repository). Before running GitHub Actions, please ensure that you have all the Git tags from the duckdb/duckdb repository. To accomplish this, execute the following commands `git fetch <your-duckdb/duckdb-repo-remote-name> --tags` and then
68-
`git push --tags` These commands will fetch all the git tags from the duckdb/duckdb repository and push them to your forked repository. This ensures that you have all the necessary tags available for your GitHub Actions workflow.
69-
70-
## Formatting
71-
72-
* Use tabs for indentation, spaces for alignment.
73-
* Lines should not exceed 120 columns.
74-
* To make sure the formatting is consistent, please use version 11.0.1, installable through `python3 -m pip install clang-format==11.0.1` or `pipx install clang-format==11.0.1`.
75-
* `clang_format` and `black` enforce these rules automatically, use `make format-fix` to run the formatter.
76-
* The project also comes with an [`.editorconfig` file](https://editorconfig.org/) that corresponds to these rules.
77-
78-
## C++ Guidelines
79-
80-
* Do not use `malloc`, prefer the use of smart pointers. Keywords `new` and `delete` are a code smell.
81-
* Strongly prefer the use of `unique_ptr` over `shared_ptr`, only use `shared_ptr` if you **absolutely** have to.
82-
* Use `const` whenever possible.
83-
* Do **not** import namespaces (e.g. `using std`).
84-
* All functions in source files in the core (`src` directory) should be part of the `duckdb` namespace.
85-
* When overriding a virtual method, avoid repeating virtual and always use `override` or `final`.
86-
* Use `[u]int(8|16|32|64)_t` instead of `int`, `long`, `uint` etc. Use `idx_t` instead of `size_t` for offsets/indices/counts of any kind.
87-
* Prefer using references over pointers as arguments.
88-
* Use `const` references for arguments of non-trivial objects (e.g. `std::vector`, ...).
89-
* Use C++11 for loops when possible: `for (const auto& item : items) {...}`
90-
* Use braces for indenting `if` statements and loops. Avoid single-line if statements and loops, especially nested ones.
91-
* **Class Layout:** Start out with a `public` block containing the constructor and public variables, followed by a `public` block containing public methods of the class. After that follow any private functions and private variables. For example:
92-
```cpp
93-
class MyClass {
94-
public:
95-
MyClass();
96-
97-
int my_public_variable;
98-
99-
public:
100-
void MyFunction();
101-
102-
private:
103-
void MyPrivateFunction();
104-
105-
private:
106-
int my_private_variable;
107-
};
108-
```
109-
* Avoid [unnamed magic numbers](https://en.wikipedia.org/wiki/Magic_number_(programming)). Instead, use named variables that are stored in a `constexpr`.
110-
* [Return early](https://medium.com/swlh/return-early-pattern-3d18a41bba8). Avoid deep nested branches.
111-
* Do not include commented out code blocks in pull requests.
112-
113-
## Error Handling
114-
115-
* Use exceptions **only** when an error is encountered that terminates a query (e.g. parser error, table not found). Exceptions should only be used for **exceptional** situations. For regular errors that do not break the execution flow (e.g. errors you **expect** might occur) use a return value instead.
116-
* Try to add test cases that trigger exceptions. If an exception cannot be easily triggered using a test case then it should probably be an assertion. This is not always true (e.g. out of memory errors are exceptions, but are very hard to trigger).
117-
* Use `D_ASSERT` to assert. Use **assert** only when failing the assert means a programmer error. Assert should never be triggered by user input. Avoid code like `D_ASSERT(a > b + 3);` without comments or context.
118-
* Assert liberally, but make it clear with comments next to the assert what went wrong when the assert is triggered.
119-
120-
## Naming Conventions
121-
122-
* Choose descriptive names. Avoid single-letter variable names.
123-
* Files: lowercase separated by underscores, e.g., abstract_operator.cpp
124-
* Types (classes, structs, enums, typedefs, using): CamelCase starting with uppercase letter, e.g., BaseColumn
125-
* Variables: lowercase separated by underscores, e.g., chunk_size
126-
* Functions: CamelCase starting with uppercase letter, e.g., GetChunk
127-
* Avoid `i`, `j`, etc. in **nested** loops. Prefer to use e.g. **column_idx**, **check_idx**. In a **non-nested** loop it is permissible to use **i** as iterator index.
128-
* These rules are partially enforced by `clang-tidy`.
225+
### Dynamic Versioning Integration
226+
227+
The package uses `setuptools_scm` with `scikit-build-core` for automatic version determination, and implements a custom
228+
versioning scheme.
229+
230+
- **pyproject.toml configuration**:
231+
```toml
232+
[tool.scikit-build]
233+
metadata.version.provider = "scikit_build_core.metadata.setuptools_scm"
234+
235+
[tool.setuptools_scm]
236+
version_scheme = "duckdb_packaging._setuptools_scm_version:version_scheme"
237+
```
238+
239+
- **Environment variables**:
240+
- `MAIN_BRANCH_VERSIONING=0`: Use release branch versioning (patch increments)
241+
- `MAIN_BRANCH_VERSIONING=1`: Use main branch versioning (minor increments)
242+
- `OVERRIDE_GIT_DESCRIBE`: Override version detection
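
To make the effect of `MAIN_BRANCH_VERSIONING` concrete, here is a minimal sketch of how a version scheme could choose between patch and minor increments. It is an illustration only, not the code in `duckdb_packaging._setuptools_scm_version`: `next_dev_version` and its parameters are hypothetical, it assumes that any value other than `"1"` means release branch versioning, and it does not model `OVERRIDE_GIT_DESCRIBE`.

```python
import os
import re
from typing import Mapping


def next_dev_version(
    last_tag: str, distance: int, env: Mapping[str, str] = os.environ
) -> str:
    """Guess the next nightly version after `last_tag` (hypothetical helper).

    `distance` stands in for the number of commits since the tag.
    """
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", last_tag)
    if match is None:
        raise ValueError(f"unexpected tag: {last_tag!r}")
    major, minor, micro = (int(part) for part in match.groups())
    # Assumption: "1" selects main branch versioning, anything else release branch.
    if env.get("MAIN_BRANCH_VERSIONING") == "1":
        minor, micro = minor + 1, 0  # minor increment
    else:
        micro += 1  # patch increment
    return f"{major}.{minor}.{micro}.dev{distance}"
```

Starting from the tag `v1.3.1`, this gives a `1.3.2.devN` version with `MAIN_BRANCH_VERSIONING=0` and a `1.4.0.devN` version with `MAIN_BRANCH_VERSIONING=1`, in line with the nightly rows of the versioning table.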
