Zink #236

deepanwadhwa · 2025-03-26T23:29:30Z

Submitting Author: Deepan Wadhwa (@deepanwadhwa)
Package Name: Zink
One-Line Description of Package: Anonymize any type of entities in text data.
Repository Link (if existing): https://github.com/deepanwadhwa/zink
EiC: @coatless

Code of Conduct & Commitment to Maintain Package

I agree to abide by pyOpenSci's Code of Conduct during the review process and in maintaining my package afterward, should it be accepted.
I have read and will commit to package maintenance after the review as per the pyOpenSci Policies Guidelines.

Description

Valuable research, particularly in sensitive fields like healthcare, often faces delays or cancellations due to challenges in anonymizing private data. Current tools can lack the necessary capabilities to handle diverse information securely. Zink addresses that need by effectively anonymizing any type of sensitive detail within text, enabling important studies to proceed while protecting privacy.

Community Partnerships

We partner with communities to support peer review with an additional layer of
checks that satisfy community requirements. If your package fits into an
existing community please check below:

Astropy: My package adheres to Astropy community standards
Pangeo: My package adheres to the Pangeo standards listed in the pyOpenSci peer review guidebook

Scope

Please indicate which category or categories this package falls under:
- Data retrieval
- Data extraction
- Data processing/munging
- Data deposition
- Data validation and testing
- Data visualization
- Workflow automation
- Citation management and bibliometrics
- Scientific software wrappers
- Database interoperability

Domain Specific

Geospatial
Education

Explain how and why the package falls under these categories (briefly, 1-2 sentences).
This anonymization tool falls squarely under data processing and munging because its core function is to transform text data—a common format in scientific workflows—into a state suitable for further analysis. By altering or removing private information, it 'munges' the raw input, enabling researchers to ethically and effectively work with otherwise restricted datasets.
For community partnerships, check also their specific guidelines as documented in the links above. Please note any areas you are unsure of:
Who is the target audience and what are the scientific applications of this package?
Any researchers who are working with unstructured text data which contains any type of sensitive information.
Are there other Python packages that accomplish similar things? If so, how does yours differ?
This is an optimal tool for anonymization because it runs locally (even on CPUs) and can anonymize essentially any type of entity in a zero-shot manner. It is extremely fast because it uses an ONNX model. I have not come across any Python package that does what Zink does.
Any other questions or issues we should be aware of:

P.S. Have feedback/comments about our review process? Leave a comment here
Hoping to hear from your team soon.

lwasser · 2025-05-07T21:04:47Z

Hey there @deepanwadhwa, I'm just dropping in to let you know that this presubmission is next on our list to review for scope. 🚀

lwasser · 2025-05-07T21:20:58Z

Hi again 👋🏻. @deepanwadhwa, after looking at your package, I have two questions/requests:

When I search, I find other packages that handle redacting information from text and tools that do similar things (e.g., https://github.com/brootware/PyRedactKit). Can you help us understand how Zink fits into the landscape as a unique tool?

It looks like some of the core pyOpenSci review requirements are missing. Please read through our author guide for more details (also linked below) and let us know if you have any questions.

Specifically, please have a look at our basic requirements here

A few things that are currently missing:

- CI/CD pipelines for both docs and tests should be set up.
- Your docs, once linked and built, should have tutorials and API docs.
- API documentation is light and needs further development.

Please work through the items above (with an emphasis on package overlap as that will help us determine if Zink is in scope) and let us know when you have addressed them so we can have another look at your package.

deepanwadhwa · 2025-05-07T21:35:09Z

Hi @lwasser - First of all, thank you so much for your feedback and the questions.

PyRedactKit (https://github.com/brootware/PyRedactKit), although a great toolkit, is limited in scope. It can redact the following types of information:
sg nric 🆔
credit cards 🏧
domain names 🌐
emails ✉️
ipv4 📟
ipv6 📟
base64 🅱️

These seven categories can contain sensitive information depending on context, but the toolkit is limited to them.

If a researcher's requirement is to redact information that doesn't belong to these categories, for example medical conditions, then that package would not help, whereas Zink lets a user redact any type of information in a zero-shot manner. The emphasis is on zero-shot because it allows users to use Zink out of the box to redact or replace any type of information. I hope that helps.
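To make the distinction concrete, here is a minimal sketch of a fixed-category redactor. The patterns and the function name `redact_fixed_categories` are hypothetical and deliberately simplified; they are not PyRedactKit's actual rules, and the sketch does not show Zink's API. It only illustrates how a tool limited to predefined patterns leaves other sensitive details, such as a medical condition, untouched.

```python
import re

# Hypothetical, simplified patterns for two fixed categories.
# These are illustrative assumptions, not any real package's rules.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "IPV4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def redact_fixed_categories(text: str) -> str:
    """Replace every match of each predefined pattern with its label."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


sample = "Contact jane@example.org from 10.0.0.7 about her diabetes diagnosis."
redacted = redact_fixed_categories(sample)
print(redacted)
# The email and IP address are replaced, but "diabetes" is not,
# because no predefined pattern covers medical conditions.
```

A zero-shot approach would instead accept the entity labels to redact at call time, so categories outside a fixed list can be handled without writing new patterns.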

For the rest of the feedback, I will start working on it right away. Thank you again for the feedback :)

Best regards,
Deepan

lwasser · 2025-05-07T22:44:36Z

Thank you Deepan, I am still looking into the scope issue and will reply back here soon. Please note that if we move forward with a full submission (pending our scope decision), we will need you to be clear about other packages in the ecosystem that perform similar tasks and how Zink differs from them! More from me soon on the scope check!

deepanwadhwa added the presubmission label Mar 26, 2025

github-project-automation bot added this to presubmission-inquiries Mar 26, 2025

lwasser added this to peer-review-status Mar 26, 2025

lwasser moved this to pre-submission in peer-review-status Mar 26, 2025

lwasser added the presubmission-needs-improvements label May 7, 2025
