krishnaura45/tbp-skin-detect

tbp-skin-detect

Skin Cancer Detection from 3D Total Body Photos

Project Duration: Aug 15, 2024 - Sep 7, 2024


🧠 Objective

The goal was to build binary classifiers that predict whether a skin lesion is malignant, using single-lesion crops extracted from 3D total body photos (TBP). This project was part of the ISIC 2024 - Skin Cancer Detection with 3D-TBP Kaggle competition. Submissions were evaluated on partial AUC (pAUC) above an 80% true positive rate (TPR).
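To make the metric concrete, here is a minimal, dependency-free sketch of pAUC above a TPR floor. It is an illustration, not the competition's official scoring code or the notebook's implementation. The function names `roc_points` and `pauc_above_tpr` are hypothetical. The idea is to measure only the part of the ROC curve that lies above TPR = 0.80, so scores fall between 0 and 0.2.

```python
def roc_points(y_true, y_score):
    """Return ROC points (fpr, tpr) from the strictest to the loosest threshold."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    # Sort by descending score so each step lowers the decision threshold.
    pairs = sorted(zip(y_score, y_true), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(pairs):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only when the score changes, so tied scores share one point.
        if i == len(pairs) - 1 or pairs[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points


def pauc_above_tpr(y_true, y_score, min_tpr=0.80):
    """Area between the ROC curve and the line TPR = min_tpr (0 to 1 - min_tpr)."""
    pts = roc_points(y_true, y_score)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if y1 <= min_tpr:
            continue  # segment lies entirely at or below the TPR floor
        if y0 < min_tpr:
            # Clip the segment at the point where it crosses TPR = min_tpr.
            t = (min_tpr - y0) / (y1 - y0)
            x0, y0 = x0 + t * (x1 - x0), min_tpr
        # Trapezoid rule on the part of the segment above the floor.
        area += (x1 - x0) * ((y0 - min_tpr) + (y1 - min_tpr)) / 2.0
    return area


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    print(pauc_above_tpr(labels, [0.1, 0.2, 0.8, 0.9]))  # perfectly separated scores
    print(pauc_above_tpr(labels, [0.5, 0.5, 0.5, 0.5]))  # uninformative, all tied
```

A perfectly separating model reaches the maximum of about 0.2, while an uninformative model that gives every lesion the same score lands at about 0.02.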


🧩 Approach

You can explore the complete methodology in this notebook: 🔗 ISIC24 - Heavy Feature Eng with Polars + Boosting + CV + Ensemble Blend

Key steps followed:

  • ✏️ Feature Engineering: spatial insights

    • Extracted 3D landmark distances using pairwise Euclidean computations across TBP points.
    • Constructed derived geometric features reflecting anatomical symmetry and patient-level spatial variation.
  • ⚖️ Patient-Level Normalization: consistency modeling

    • Applied normalization of features at the patient level to control for inter-subject variability.
    • Included feature columns for image count per patient.
  • 📊 Categorical Handling: info retention

    • Employed OneHotEncoder for categorical variables.
    • Converted categorical columns to the category dtype for memory efficiency.
  • 🧰 Ensemble Learning: reducing model variance

    • Trained LightGBM, XGBoost, and CatBoost models independently.
    • Combined predictions using a weighted ensemble method for improved pAUC.
  • 📊 Custom Evaluation Metric: tailored pAUC

    • Implemented a custom scoring function that reproduces the competition's pAUC above 80% TPR.
    • This metric guided model selection and cross-validation.
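Two of the steps above, patient-level normalization and weighted blending, can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the notebook's actual code (which uses Polars and gradient-boosting libraries). The function names, the `patient_id` and `area` keys, the output column names, and the blend weights are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_per_patient(rows, feature, patient_key="patient_id", eps=1e-9):
    """Add a within-patient z-score of `feature` and a per-patient lesion count."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[patient_key]].append(row[feature])
    # Per-patient mean, population standard deviation, and lesion count.
    stats = {pid: (mean(v), pstdev(v), len(v)) for pid, v in groups.items()}
    out = []
    for row in rows:
        mu, sigma, count = stats[row[patient_key]]
        out.append({
            **row,
            # eps avoids division by zero for patients with a single lesion.
            f"{feature}_patient_norm": (row[feature] - mu) / (sigma + eps),
            "patient_lesion_count": count,
        })
    return out


def weighted_blend(predictions, weights):
    """Weighted average of per-model predictions; weights are rescaled to sum to 1."""
    total = sum(weights[name] for name in predictions)
    n = len(next(iter(predictions.values())))
    return [
        sum(weights[name] * preds[i] for name, preds in predictions.items()) / total
        for i in range(n)
    ]


if __name__ == "__main__":
    rows = [
        {"patient_id": "A", "area": 1.0},
        {"patient_id": "A", "area": 3.0},
        {"patient_id": "B", "area": 5.0},
    ]
    for row in normalize_per_patient(rows, "area"):
        print(row)

    # Hypothetical model outputs and weights, for illustration only.
    preds = {"lgb": [0.2, 0.8], "xgb": [0.4, 0.6]}
    print(weighted_blend(preds, {"lgb": 3.0, "xgb": 1.0}))
```

Normalizing within each patient expresses a lesion relative to that patient's other lesions, which is what lets the model pick out outliers for a given person. In practice the blend weights would be chosen against the cross-validated pAUC rather than fixed by hand.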

🏆 Results / Outcomes

  • ✅ Public Leaderboard Scores:

    • 0.18368, 0.18412, 0.18519
  • 🏁 Private Leaderboard Scores:

    • 0.16733, 0.16753, 0.16930 (final best)
  • 🥇 Rank Achieved:

    • Placed 184th out of 3410 participants and 2739 teams as a solo competitor

🛠️ Tech Stack

  • Language: Python 🐍
  • Libraries:
    • polars for dataframe operations
    • pandas, numpy for numerical tasks
    • StratifiedKFold (from sklearn) for cross-validation
    • sklearn, lightgbm, xgboost, catboost for modeling
    • matplotlib, seaborn for visualization
    • optuna for hyperparameter tuning
  • Tools:
    • Jupyter Notebook / Kaggle Notebooks 📓 for experimentation and code
    • Custom Python metric functions for pAUC evaluation