Skin Cancer Detection from 3D Total Body Photos
The goal was to build binary classifiers that predict malignant skin lesions from single-lesion crops extracted from 3D total body photos (TBP). This project was built for the ISIC 2024 - Skin Cancer Detection with 3D-TBP
Kaggle competition, where submissions were evaluated on partial AUC (pAUC) at true positive rates (TPR) above 80%.
You can explore the complete methodology in this notebook: 🔗 ISIC24 - Heavy Feature Eng with Polars + Boosting + CV + Ensemble Blend
Key steps followed:
✏️ Feature Engineering: spatial insights
- Extracted 3D landmark distances using pairwise Euclidean computations across TBP points.
- Constructed derived geometric features reflecting anatomical symmetry and patient-level spatial variation.
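The pairwise-distance idea can be sketched as follows; this is a minimal NumPy version, and the actual landmark columns and derived features in the notebook may differ:

```python
import numpy as np

def pairwise_landmark_distances(points: np.ndarray) -> np.ndarray:
    """Euclidean distance matrix between 3D TBP landmark points of shape (n, 3)."""
    pts = np.asarray(points, dtype=float)
    # Broadcast (n, 1, 3) - (1, n, 3) -> (n, n, 3) difference vectors
    diff = pts[:, None, :] - pts[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Example: two lesion locations 5 units apart in 3D space
d = pairwise_landmark_distances(np.array([[0.0, 0.0, 0.0], [3.0, 4.0, 0.0]]))
```

Aggregates of this matrix (per-patient min, mean, max distances) can then serve as tabular features.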
⚖️ Patient-Level Normalization: consistency modeling
- Applied normalization of features at the patient level to control for inter-subject variability.
- Included feature columns for image count per patient.
📊 Categorical Handling: info retention
- Employed scikit-learn's OneHotEncoder for categorical variables where models required numeric inputs.
- Otherwise converted them to the category dtype for memory efficiency.
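The dtype conversion can be sketched with pandas; the column name below is hypothetical:

```python
import pandas as pd

def encode_categoricals(df: pd.DataFrame, cols: list[str]) -> pd.DataFrame:
    """Cast the given columns to pandas 'category' dtype.

    LightGBM can consume category columns natively, which is far more
    memory-efficient than materializing dense one-hot columns.
    """
    out = df.copy()
    for c in cols:
        out[c] = out[c].astype("category")
    return out
```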
🧰 Ensemble Learning: reducing model variance
- Trained LightGBM, XGBoost, and CatBoost models independently.
- Combined predictions using a weighted ensemble method for improved pAUC.
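A weighted blend of the three models' out-of-fold probabilities can be sketched as below (the exact weights used in the notebook are not reproduced here):

```python
import numpy as np

def blend_predictions(preds: list[np.ndarray], weights: list[float]) -> np.ndarray:
    """Weighted average of per-model probability vectors; weights are normalized to sum to 1."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return np.average(np.column_stack(preds), axis=1, weights=w)
```

In practice, the weights can be tuned on cross-validation pAUC rather than set by hand.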
📊 Custom Evaluation Metric: tailored pAUC
- Implemented a custom scoring function to simulate competition-specific pAUC above 80% TPR.
- This metric guided model selection and cross-validation.
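One common way to implement this metric (used widely in the competition) maps the TPR-restricted region onto scikit-learn's FPR-restricted pAUC by flipping labels and scores, then undoes sklearn's McClish standardization to recover the raw area; a sketch:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def pauc_above_tpr(y_true, y_score, min_tpr: float = 0.80) -> float:
    """Raw partial AUC over the region TPR >= min_tpr.

    Flipping labels and negating scores turns the TPR >= min_tpr region
    into an FPR <= (1 - min_tpr) region, which roc_auc_score supports
    via max_fpr; the final line converts its standardized value back to
    the raw area, so a perfect classifier scores (1 - min_tpr) = 0.2.
    """
    v_true = np.abs(np.asarray(y_true) - 1)      # flip labels
    v_score = -np.asarray(y_score, dtype=float)  # flip scores
    max_fpr = 1.0 - min_tpr
    scaled = roc_auc_score(v_true, v_score, max_fpr=max_fpr)
    return 0.5 * max_fpr**2 + (max_fpr - 0.5 * max_fpr**2) * (2 * scaled - 1)
```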
✅ Public Leaderboard Scores:
- 0.18368, 0.18412, 0.18519
🏁 Private Leaderboard Scores:
- 0.16733, 0.16753, 0.16930 (final best)
🥇 Rank Achieved:
- Placed 184th out of 3410 participants and 2739 teams as a solo competitor
- 🏆 Kaggle Competition: ISIC 2024 - Skin Cancer Detection with 3D-TBP
- Language: Python 🐍
- Libraries:
  - polars for dataframe operations
  - pandas, numpy for numerical tasks
  - sklearn (StratifiedKFold) for cross-validation
  - lightgbm, xgboost, catboost for modeling
  - matplotlib, seaborn for visualization
  - optuna for hyperparameter tuning
- Tools:
- Jupyter Notebook / Kaggle Notebooks 📓 for experimentation and code
- Custom Python metric functions for pAUC evaluation