Trustworthy-ML-Lab

Popular repositories

  1. Label-free-CBM Public

    [ICLR 23] A new framework to transform any neural network into an interpretable concept-bottleneck-model (CBM) without needing labeled concept data

    Jupyter Notebook · 112 stars · 25 forks

  2. CLIP-dissect Public

    [ICLR 23 spotlight] An automatic and efficient tool to describe functionalities of individual neurons in DNNs

    Jupyter Notebook · 55 stars · 16 forks

  3. CB-LLMs Public

    [ICLR 25] A novel framework for building intrinsically interpretable LLMs with human-understandable concepts to ensure safety, reliability, transparency, and trustworthiness.

    Python · 24 stars · 5 forks

  4. VLG-CBM Public

    [NeurIPS 24] A new training and evaluation framework for learning interpretable deep vision models and benchmarking different interpretable concept-bottleneck-models (CBMs)

    Jupyter Notebook · 21 stars · 2 forks

  5. posthoc-generative-cbm Public

    [CVPR 2025] Concept Bottleneck Autoencoder (CB-AE) -- efficiently transform any pretrained (black-box) image generative model into an interpretable generative concept bottleneck model (CBM) with mi…

    Jupyter Notebook · 14 stars · 1 fork

  6. ThinkEdit Public

    An effective weight-editing method for mitigating overly short reasoning in LLMs, and a mechanistic study uncovering how reasoning length is encoded in the model’s representation space.

    Python · 14 stars · 1 fork

Repositories

Showing 10 of 23 repositories
  • ThinkEdit Public

    An effective weight-editing method for mitigating overly short reasoning in LLMs, and a mechanistic study uncovering how reasoning length is encoded in the model’s representation space.

    Python · 14 stars · 1 fork · 0 issues · 0 pull requests · Updated Aug 20, 2025
  • Concept-Bottleneck-LLM
    Python · 5 stars · 0 forks · 0 issues · 0 pull requests · Updated Aug 15, 2025
  • Robust_HighUtil_Smoothed_DRL Public

    [ICML 24] S-DQN and S-PPO: Robust smoothed deep RL agents without sacrificing performance

    Python · 5 stars · 0 forks · 0 issues · 0 pull requests · Updated Aug 15, 2025
  • CB-LLMs Public

    [ICLR 25] A novel framework for building intrinsically interpretable LLMs with human-understandable concepts to ensure safety, reliability, transparency, and trustworthiness.

    Python · 24 stars · 5 forks · 0 issues · 0 pull requests · Updated Aug 15, 2025
  • Neuron_Eval Public

    [ICML 25] A unified mathematical framework to evaluate neuron explanations of deep learning models with sanity tests

    Jupyter Notebook · 6 stars · 0 forks · 0 issues · 0 pull requests · Updated Jul 1, 2025
  • efficient_neuron_eval
    1 star · 0 forks · 0 issues · 0 pull requests · Updated Jun 10, 2025
  • VLG-CBM Public

    [NeurIPS 24] A new training and evaluation framework for learning interpretable deep vision models and benchmarking different interpretable concept-bottleneck-models (CBMs)

    Jupyter Notebook · 21 stars · 2 forks · 1 issue · 0 pull requests · Updated Jun 5, 2025
  • posthoc-generative-cbm Public

    [CVPR 2025] Concept Bottleneck Autoencoder (CB-AE) -- efficiently transform any pretrained (black-box) image generative model into an interpretable generative concept bottleneck model (CBM) with minimal concept supervision, while preserving image quality

    Jupyter Notebook · 14 stars · 1 fork · 1 issue · 0 pull requests · Updated Jun 4, 2025
  • Linear-Explanations Public

    [ICML 24] A novel automated neuron explanation framework that can accurately describe polysemantic concepts in deep neural networks

    Jupyter Notebook · 13 stars · 0 forks · 0 issues · 0 pull requests · Updated May 2, 2025
  • effective_skill_unlearning Public

    [NAACL 25] Two novel, lightweight, and training-free skill unlearning methods for LLMs

    Python · 4 stars · 0 forks · 0 issues · 0 pull requests · Updated Mar 27, 2025