DermoLens TIPSv2 + MIL Infectious Screening Deployment Package
This folder packages the latest Training-C production candidate for Hugging Face or container-based deployment.
The realistic deployment model is a two-tier pipeline:
- Raw dermatology images are passed through google/tipsv2-l14.
- The resulting TIPSv2 CLS embeddings are passed into the DermoLens MIL classifier.
- The MIL probability is converted to a final class using the production threshold.
Caution
This package is TIPSv2-only. Do not use Derm Foundation embeddings, Derm Foundation .npz archives, or 6144-d feature files.
Access Requests
This repository is intended to be published as a gated Hugging Face model card.
By default, Hugging Face already collects the requester's email and username for gated models. The gated request form for this model adds:
- a short free-text justification
- intended use
- a research-only acknowledgment checkbox
If the repository remains private, the request form will not be visible. To use the request workflow, the model should be published as a public gated repo.
What Is Included
```
deploy-hf/
  README.md
  README.production-bundle.md
  Dockerfile
  requirements.txt
  deployment_config.json
  model/
    binary_tipsv2_screening_model.keras
  metadata/
    thresholds.json
    production_config.json
    production_validation_metrics.json
    production_training_history.csv
    production_validation_predictions.npz
    revised_binary_label_summary.json
    best_hyperparameters.json
  figures/
    production_learning_curves.png
  src/
    inference.py
    tipsv2_common_training_reference.py
  scripts/
    download_tipsv2.py
  tipsv2-local-reference/
    configuration and remote-code reference files from the local TIPSv2 checkout
```
Model Formats
There are two model components:
| Component | Model | Framework / Format | Role |
|---|---|---|---|
| Feature extractor | google/tipsv2-l14 | PyTorch via Hugging Face Transformers remote code, safetensors weights | Converts raw images to 1024-d CLS embeddings |
| MIL classifier | binary_tipsv2_screening_model.keras | TensorFlow / Keras .keras | Converts a (3, 1024) casebag to P(Infectious) |
The current system is therefore mixed-framework: PyTorch for TIPSv2 and TensorFlow/Keras for the MIL head.
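For orientation, here is a minimal loading sketch for both components. The TIPSv2 loading call is an assumption (Transformers remote code is typically loaded with AutoModel.from_pretrained and trust_remote_code=True); src/inference.py remains the canonical implementation.

```python
# Minimal mixed-framework loading sketch -- see src/inference.py for the canonical version.
import tensorflow as tf
from transformers import AutoModel

# Feature extractor: PyTorch TIPSv2 via Transformers remote code (assumed loading pattern).
tipsv2_model = AutoModel.from_pretrained("google/tipsv2-l14", trust_remote_code=True)
tipsv2_model.eval()

# MIL classifier: the TensorFlow/Keras head shipped in this package.
mil_model = tf.keras.models.load_model("model/binary_tipsv2_screening_model.keras")
```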
Comprehensive Benchmarks
The full benchmark ledger is also copied into this package as FINAL_BENCHMARKS.md. The same content is mirrored below so the Hugging Face model card is self-contained.
Binary Screening
Evaluation performed on the full 3,061 case validation set.
Operating Threshold: P(Infectious) >= 0.35
| Model Architecture / Format | Model Size | AUC | Accuracy | Precision | Recall (Sensitivity) | F1 Score | Notes |
|---|---|---|---|---|---|---|---|
| Original Keras (Training-C) | 1.15 GB+ | 0.755784 | 0.661875 | 0.544653 | 0.780960 | 0.641745 | The original fragmented FP32 pipeline. |
| PyTorch Unified (FP32) | 1,865 MB | 0.755784 | 0.661875 | 0.544653 | 0.780960 | 0.641745 | The final production monolith. Mathematically identical to Keras. |
| PyTorch Unified (FP16) | 932 MB | 0.755789 | 0.661875 | 0.544653 | 0.780960 | 0.641745 | Halves RAM usage with essentially no accuracy loss. |
| LiteRT Edge (FP32) | 1,163 MB | 0.755784 | 0.661875 | 0.544653 | 0.780960 | 0.641745 | Mathematically identical to PyTorch FP32. |
| LiteRT Edge (INT8 PTQ) | 297 MB | 0.736973 | 0.669716 | 0.561798 | 0.673968 | 0.612792 | The quantization tradeoff. Lower sensitivity. |
10-Disease Classification
Evaluation performed on the preliminary 2,336 case dataset.
Representative class-level agreement is shown below; equivalence holds across the 10 classes.
Class 0 (Eczema) - Threshold: 0.4747
| Model Architecture / Format | AUC | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|---|
| Original Keras (Training-A) | 0.739529 | 0.656678 | 0.598756 | 0.729167 | 0.657558 |
| PyTorch Unified (FP32) | 0.739529 | 0.656678 | 0.598756 | 0.729167 | 0.657558 |
| PyTorch Unified (FP16) | 0.739538 | 0.656678 | 0.598756 | 0.729167 | 0.657558 |
Class 1 (Allergic Contact Dermatitis) - Threshold: 0.3838
| Model Architecture / Format | AUC | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|---|
| Original Keras (Training-A) | 0.739767 | 0.684932 | 0.572334 | 0.620848 | 0.595604 |
| PyTorch Unified (FP32) | 0.739767 | 0.684932 | 0.572334 | 0.620848 | 0.595604 |
| PyTorch Unified (FP16) | 0.739774 | 0.685360 | 0.572785 | 0.621993 | 0.596376 |
Technical Conclusions
- Mathematical Equivalence: The manual PyTorch port of the gated attention pooling and global average pooling layers matches the original Keras model numerically across the supported benchmark runs.
- The Power of FP16: Converting the unified PyTorch engine to FP16 roughly halves the Docker container memory footprint while preserving the FP32 clinical ROC-AUC and sensitivity.
- LiteRT Limitations: The LiteRT FP32 export is mathematically sound, but FP16 conversion of large vision transformers can fail in the Google AI Edge toolchain; INT8 post-training quantization succeeds but reduces clinical sensitivity.
Final Deployment Target: unified_engine_fp16_weights.pt running on CPU via FastAPI.
Production Decision Rule
The MIL model outputs one probability:
P(Infectious)
The production threshold is:
0.35
Final classification:
```python
if p_infectious >= 0.35:
    prediction = "Infectious"
else:
    prediction = "Non Infectious"
```
Do not silently use 0.5 for production inference.
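A minimal sketch of applying this rule, assuming the operating point can be read from metadata/thresholds.json (the key name used below is an assumption; inspect the file for its actual structure):

```python
import json

# Read the production operating point; "binary_screening" is an assumed key name.
with open("metadata/thresholds.json") as f:
    threshold = float(json.load(f).get("binary_screening", 0.35))

p_infectious = 0.47  # example MIL output for one casebag
prediction = "Infectious" if p_infectious >= threshold else "Non Infectious"
print({"prediction": prediction, "p_infectious": p_infectious, "threshold": threshold})
```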
Input Contract
Input to the full deployment pipeline:
1 to 3 RGB images from the same patient case
Input to the MIL classifier after TIPSv2:
casebag.shape == (3, 1024)
Rules:
- Each submitted image is converted to RGB.
- Each image is resized to 448 x 448, matching the Training-C extraction process.
- Each image is passed through google/tipsv2-l14 using model.encode_image(pixel_values).
- Each row is the final-layer TIPSv2 CLS token: out.cls_token[0, 0].
- Each CLS embedding must be 1024-d.
- Cases with fewer than 3 images are automatically zero-padded to 3 MIL slots.
- Do not mix images from different patient cases.
- Do not flatten this into image-level classification unless explicitly doing a different experiment.
Exact Casebag Behavior
The MIL model always receives exactly 3 slots:
(3, 1024)
If 1 image is submitted:
slot 1 = TIPSv2(image_1)
slot 2 = zeros(1024)
slot 3 = zeros(1024)
If 2 images are submitted:
slot 1 = TIPSv2(image_1)
slot 2 = TIPSv2(image_2)
slot 3 = zeros(1024)
If 3 images are submitted:
slot 1 = TIPSv2(image_1)
slot 2 = TIPSv2(image_2)
slot 3 = TIPSv2(image_3)
The padding is handled automatically by src/inference.py, and the submitted image order is preserved. If a case has more than 3 images, do not pass them all blindly; select or split the case intentionally, because the model was trained with at most 3 images per case.
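As a reference for other integrations, a minimal NumPy sketch of this casebag construction (src/inference.py already implements it; the function name here is illustrative only):

```python
import numpy as np

def build_casebag(embeddings: list) -> np.ndarray:
    """Stack 1-3 TIPSv2 CLS embeddings into the fixed (3, 1024) MIL casebag.

    Submitted order is preserved; missing slots are zero-padded.
    """
    if not 1 <= len(embeddings) <= 3:
        raise ValueError("A case must contain 1 to 3 images.")
    casebag = np.zeros((3, 1024), dtype=np.float32)
    for i, emb in enumerate(embeddings):
        casebag[i] = np.asarray(emb, dtype=np.float32)
    return casebag
```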
Encoding Contract From Training-C
The deployment encoder must match Training-C:
```python
from PIL import Image
from torchvision.transforms import Resize, ToTensor

# Match the Training-C extraction exactly: RGB, 448 x 448, final-layer CLS token.
# image_path is one case image; tipsv2_model is the loaded google/tipsv2-l14 model.
image = Image.open(image_path).convert("RGB")
pixel_values = Resize((448, 448))(image)
pixel_values = ToTensor()(pixel_values).unsqueeze(0)
out = tipsv2_model.encode_image(pixel_values)
embedding = out.cls_token[0, 0].float().cpu().numpy()
```
This is the same logic used in APR26/data_extraction/extract_all_cases_tipsv2.py.
Do not use:
- patch-token averages,
- register tokens,
- normalized text/image similarity vectors,
- Derm Foundation embeddings,
- image-level logits from another model.
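A small guard that reflects this contract can catch the wrong embedding source early (illustrative helper, not part of the package):

```python
import numpy as np

def validate_cls_embedding(embedding: np.ndarray) -> None:
    """Reject anything that is not a single final-layer TIPSv2 CLS vector."""
    if embedding.shape != (1024,):
        raise ValueError(
            f"Expected a 1024-d CLS embedding (out.cls_token[0, 0]); got {embedding.shape}. "
            "Patch-token averages, register tokens, similarity vectors, and Derm Foundation "
            "features are not valid inputs to this MIL head."
        )
```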
Production Metrics
Training-C production validation metrics at threshold 0.35:
| Metric | Value |
|---|---|
| ROC AUC | 0.7194 |
| PR-AUC | 0.5868 |
| Sensitivity / Recall | 0.7697 |
| Specificity | 0.5851 |
| Accuracy | 0.6565 |
| Precision | 0.5394 |
| F1 | 0.6343 |
| Youden J | 0.3548 |
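As a consistency check, Youden's J at this threshold equals sensitivity + specificity - 1 = 0.7697 + 0.5851 - 1 = 0.3548, matching the table.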
Dataset state:
| Item | Value |
|---|---|
| Cases | 3,061 |
| Images / embeddings | 6,517 |
| Infectious cases | 1,187 |
| Non-infectious cases | 1,874 |
| Feature dimension | 1024 |
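The counts are internally consistent: 1,187 infectious + 1,874 non-infectious = 3,061 cases, and 6,517 images across 3,061 cases is roughly 2.1 images per case on average.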
Running Inference Locally
From this folder:
pip install -r requirements.txt
python src/inference.py case_image_1.png case_image_2.png
The script accepts 1 to 3 images from the same patient case.
If TIPSv2 is already cached locally:
python src/inference.py case_image_1.png --local-files-only
If using a vendored/local TIPSv2 folder:
python src/inference.py case_image_1.png --tipsv2-model /path/to/google/tipsv2-l14/snapshot
Output example:
{
"prediction": "Infectious",
"p_infectious": 0.47,
"threshold": 0.35,
"image_count": 2,
"rule": "Infectious if P(Infectious) >= threshold else Non Infectious"
}
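If another Python process drives this CLI, a minimal sketch for capturing the result could look like the following; it assumes inference.py prints exactly one JSON object like the example above to stdout.

```python
import json
import subprocess
import sys

# Run the packaged CLI and parse its JSON output.
# Assumes src/inference.py writes a single JSON object (as above) to stdout.
result = subprocess.run(
    [sys.executable, "src/inference.py", "case_image_1.png", "case_image_2.png"],
    capture_output=True, text=True, check=True,
)
report = json.loads(result.stdout)
if report["p_infectious"] >= report["threshold"]:
    print("Flagged as Infectious:", report)
```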
Docker Usage
Build:
docker build -t dermolens-tipsv2-mil .
Run:
docker run --rm -v "$PWD/examples:/data" dermolens-tipsv2-mil /data/case_image_1.png /data/case_image_2.png
The default Dockerfile does not bake the 1.8 GB TIPSv2 weights into the image. This keeps the image smaller and lets the runtime download or mount the Hugging Face cache.
For a self-contained container, uncomment this line in the Dockerfile:
# RUN python scripts/download_tipsv2.py
That will pre-cache google/tipsv2-l14 inside the image.
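The packaged scripts/download_tipsv2.py may differ in detail, but a pre-cache step along these lines is typical: huggingface_hub's snapshot_download pulls the weights into the standard cache that Transformers reads at runtime.

```python
# Illustrative pre-cache sketch; see scripts/download_tipsv2.py for the packaged version.
from huggingface_hub import snapshot_download

# Download google/tipsv2-l14 into the default Hugging Face cache inside the image,
# so runtime inference can work offline / with --local-files-only.
snapshot_download(repo_id="google/tipsv2-l14")
```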
Hugging Face Push Strategy
Recommended setup:
- Push this deploy-hf/ folder as the DermoLens model repository.
- Reference google/tipsv2-l14 as the upstream feature extractor instead of duplicating the full TIPSv2 weights.
- Include src/inference.py as the canonical end-to-end raw-image inference code.
- Put the raw image dataset in a separate Hugging Face dataset repository.
- Keep case-level metadata in the dataset repository so MIL grouping is preserved.
This is better than copying TIPSv2 weights into our repo because:
- TIPSv2 is already a Hugging Face model with its own versioning.
- The real weights are about 1.8 GB.
- Duplicating them creates storage, sync, and licensing ambiguity.
- The deployment container can still be fully self-contained by pre-caching TIPSv2 at Docker build time.
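A minimal push sketch using huggingface_hub is shown below; the repository id is a placeholder, and gating itself is enabled via the model card metadata or the repo settings, not by this upload.

```python
from huggingface_hub import HfApi

api = HfApi()
repo_id = "your-org/dermolens-tipsv2-mil"  # placeholder; substitute the real model repo

# Create the model repo if needed, then upload the whole deploy-hf/ folder.
api.create_repo(repo_id, repo_type="model", exist_ok=True)
api.upload_folder(folder_path="deploy-hf", repo_id=repo_id, repo_type="model")
```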
Should We Convert Models?
Current recommendation: do not convert yet.
Reason:
- TIPSv2 uses PyTorch / Hugging Face remote code.
- The MIL head is small and already saved as TensorFlow/Keras.
- Mixed-framework inference is acceptable inside Docker.
- Conversion adds risk unless we have a specific deployment target that requires ONNX/TFLite/TensorRT.
Future conversion options:
| Option | When useful | Risk |
|---|---|---|
| Convert MIL Keras model to ONNX | If we want one ONNX runtime for the MIL head | Low to moderate |
| Convert TIPSv2 to ONNX | If deploying to a strict ONNX/TensorRT environment | Higher, because custom remote code and image encoder outputs must be validated |
| Retrain/rebuild MIL head in PyTorch | If we want a single PyTorch-only pipeline | Moderate, requires reproducing Training-C weights or retraining |
| Keep mixed PyTorch + TensorFlow | Best current path for Hugging Face/Cloud Run/GCE | Larger dependency footprint |
For Hugging Face, GCloud, Firebase-backed services, or generic Docker deployment, the current mixed-framework package is the pragmatic choice.
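If the MIL-head-to-ONNX option is ever pursued, a conversion sketch with tf2onnx could look like the following; tf2onnx is an assumption here (it is not part of requirements.txt), and any export must be validated against the Keras outputs before use.

```python
# Possible future path only -- not part of the current deployment package.
import tensorflow as tf
import tf2onnx

mil_model = tf.keras.models.load_model("model/binary_tipsv2_screening_model.keras")
# The MIL head consumes a (batch, 3, 1024) casebag.
spec = (tf.TensorSpec((None, 3, 1024), tf.float32, name="casebag"),)
tf2onnx.convert.from_keras(mil_model, input_signature=spec, output_path="mil_head.onnx")
```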
Deployment Interpretation
This is a research production candidate, not a standalone clinical diagnostic device. It is suitable for controlled research inference, screening-threshold experiments, and deployment engineering validation.