DermoLens TIPSv2 + MIL Infectious Screening Deployment Package

This folder packages the latest Training-C production candidate for Hugging Face or container-based deployment.

The production deployment is a two-tier pipeline:

  1. Raw dermatology images are passed through google/tipsv2-l14.
  2. The resulting TIPSv2 CLS embeddings are passed into the DermoLens MIL classifier.
  3. The MIL probability is converted to a final class using the production threshold.
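A minimal sketch of this orchestration, with `encode_fn` and `mil_fn` as stand-ins for the TIPSv2 encoder and the Keras MIL head (illustrative names and signatures; the canonical implementation is src/inference.py):

```python
import numpy as np

def run_pipeline(images, encode_fn, mil_fn, threshold=0.35):
    """Two-tier pipeline sketch: encode each image, pad to 3 MIL slots, apply threshold."""
    casebag = np.zeros((3, 1024), dtype=np.float32)
    for i, image in enumerate(images[:3]):
        casebag[i] = encode_fn(image)      # 1024-d TIPSv2 CLS embedding
    p_infectious = float(mil_fn(casebag))  # MIL head: (3, 1024) -> P(Infectious)
    label = "Infectious" if p_infectious >= threshold else "Non Infectious"
    return label, p_infectious
```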

Caution

This package is TIPSv2-only. Do not use Derm Foundation embeddings, Derm Foundation .npz archives, or 6144-d feature files.

Access Requests

This repository is intended to be published as a gated Hugging Face model card.

By default, Hugging Face already collects the requester email and username for gated models. The gate prompt additionally collects:

  • a short free-text justification
  • intended use
  • a research-only acknowledgment checkbox

If the repository remains private, the request form will not be visible. To use the request workflow, the model should be published as a public gated repo.

What Is Included

deploy-hf/
  README.md
  README.production-bundle.md
  Dockerfile
  requirements.txt
  deployment_config.json
  model/
    binary_tipsv2_screening_model.keras
  metadata/
    thresholds.json
    production_config.json
    production_validation_metrics.json
    production_training_history.csv
    production_validation_predictions.npz
    revised_binary_label_summary.json
    best_hyperparameters.json
  figures/
    production_learning_curves.png
  src/
    inference.py
    tipsv2_common_training_reference.py
  scripts/
    download_tipsv2.py
  tipsv2-local-reference/
    configuration and remote-code reference files from the local TIPSv2 checkout

Model Formats

There are two model components:

| Component | Model | Framework / Format | Role |
| --- | --- | --- | --- |
| Feature extractor | google/tipsv2-l14 | PyTorch via Hugging Face Transformers remote code, safetensors weights | Converts raw images to 1024-d CLS embeddings |
| MIL classifier | binary_tipsv2_screening_model.keras | TensorFlow / Keras .keras | Converts a (3, 1024) casebag to P(Infectious) |

The current system is therefore mixed-framework: PyTorch for TIPSv2 and TensorFlow/Keras for the MIL head.

Comprehensive Benchmarks

The full benchmark ledger is copied into this package as FINAL_BENCHMARKS.md and mirrored below so the Hugging Face model card is self-contained.

Binary Screening

Evaluation performed on the full 3,061 case validation set.
Operating Threshold: P(Infectious) >= 0.35

| Model Architecture / Format | Model Size | AUC | Accuracy | Precision | Recall (Sensitivity) | F1 Score | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Original Keras (Training-C) | 1.15 GB+ | 0.755784 | 0.661875 | 0.544653 | 0.780960 | 0.641745 | The original fragmented FP32 pipeline. |
| PyTorch Unified (FP32) | 1,865 MB | 0.755784 | 0.661875 | 0.544653 | 0.780960 | 0.641745 | The final production monolith. Mathematically identical to Keras. |
| PyTorch Unified (FP16) | 932 MB | 0.755789 | 0.661875 | 0.544653 | 0.780960 | 0.641745 | Halves RAM usage with essentially no accuracy loss. |
| LiteRT Edge (FP32) | 1,163 MB | 0.755784 | 0.661875 | 0.544653 | 0.780960 | 0.641745 | Mathematically identical to PyTorch FP32. |
| LiteRT Edge (INT8 PTQ) | 297 MB | 0.736973 | 0.669716 | 0.561798 | 0.673968 | 0.612792 | The quantization tradeoff: lower sensitivity. |

10-Disease Classification

Evaluation performed on the preliminary 2,336 case dataset.
Representative class-level agreement is shown below; equivalence holds across the 10 classes.

Class 0 (Eczema) - Threshold: 0.4747

| Model Architecture / Format | AUC | Accuracy | Precision | Recall | F1 Score |
| --- | --- | --- | --- | --- | --- |
| Original Keras (Training-A) | 0.739529 | 0.656678 | 0.598756 | 0.729167 | 0.657558 |
| PyTorch Unified (FP32) | 0.739529 | 0.656678 | 0.598756 | 0.729167 | 0.657558 |
| PyTorch Unified (FP16) | 0.739538 | 0.656678 | 0.598756 | 0.729167 | 0.657558 |

Class 1 (Allergic Contact Dermatitis) - Threshold: 0.3838

| Model Architecture / Format | AUC | Accuracy | Precision | Recall | F1 Score |
| --- | --- | --- | --- | --- | --- |
| Original Keras (Training-A) | 0.739767 | 0.684932 | 0.572334 | 0.620848 | 0.595604 |
| PyTorch Unified (FP32) | 0.739767 | 0.684932 | 0.572334 | 0.620848 | 0.595604 |
| PyTorch Unified (FP16) | 0.739774 | 0.685360 | 0.572785 | 0.621993 | 0.596376 |

Technical Conclusions

  1. Mathematical Equivalence: The manual port of the complex gated attention pooling and global average pooling layers from Keras to PyTorch is numerically aligned for the supported benchmark runs.
  2. The Power of FP16: Converting the PyTorch unified engine to FP16 reduces the Docker container memory footprint while preserving the clinical ROC-AUC and sensitivity from the FP32 runs.
  3. LiteRT Limitations: The LiteRT FP32 export is mathematically sound, but FP16 conversion for large vision transformers can fail in the Google AI Edge toolchain. INT8 PTQ succeeds but reduces clinical sensitivity.

Final Deployment Target: unified_engine_fp16_weights.pt running on CPU via FastAPI.

Production Decision Rule

The MIL model outputs one probability:

P(Infectious)

The production threshold is:

0.35

Final classification:

if p_infectious >= 0.35:
    prediction = "Infectious"
else:
    prediction = "Non Infectious"

Do not silently use 0.5 for production inference.

Input Contract

Input to the full deployment pipeline:

1 to 3 RGB images from the same patient case

Input to the MIL classifier after TIPSv2:

casebag.shape == (3, 1024)

Rules:

  • Each submitted image is converted to RGB.
  • Each image is resized to 448 x 448, matching the Training-C extraction process.
  • Each image is passed through google/tipsv2-l14 using model.encode_image(pixel_values).
  • Each row is the final-layer TIPSv2 CLS token: out.cls_token[0, 0].
  • Each CLS embedding must be 1024-d.
  • Cases with fewer than 3 images are automatically zero-padded to 3 MIL slots.
  • Do not mix images from different patient cases.
  • Do not flatten this into image-level classification unless explicitly doing a different experiment.

Exact Casebag Behavior

The MIL model always receives exactly 3 slots:

(3, 1024)

If 1 image is submitted:

slot 1 = TIPSv2(image_1)
slot 2 = zeros(1024)
slot 3 = zeros(1024)

If 2 images are submitted:

slot 1 = TIPSv2(image_1)
slot 2 = TIPSv2(image_2)
slot 3 = zeros(1024)

If 3 images are submitted:

slot 1 = TIPSv2(image_1)
slot 2 = TIPSv2(image_2)
slot 3 = TIPSv2(image_3)

The padding is handled automatically by src/inference.py, and the submitted image order is preserved. If a case has more than 3 images, do not pass them all blindly: select or split them intentionally, because this model was trained with at most 3 images per case.
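A minimal sketch of the slot-filling behavior described above (the production implementation lives in src/inference.py; `build_casebag` is an illustrative name):

```python
import numpy as np

def build_casebag(embeddings):
    """Pad 1-3 TIPSv2 CLS embeddings to the fixed (3, 1024) MIL input."""
    if not 1 <= len(embeddings) <= 3:
        raise ValueError("A case must contribute 1 to 3 images")
    casebag = np.zeros((3, 1024), dtype=np.float32)
    for i, emb in enumerate(embeddings):
        emb = np.asarray(emb, dtype=np.float32)
        if emb.shape != (1024,):
            raise ValueError(f"Expected a 1024-d embedding, got {emb.shape}")
        casebag[i] = emb  # submitted order is preserved; remaining slots stay zero
    return casebag
```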

Encoding Contract From Training-C

The deployment encoder must match Training-C:

# PIL / torchvision imports; tipsv2_model is the loaded google/tipsv2-l14 model
from PIL import Image
from torchvision.transforms import Resize, ToTensor

image = Image.open(image_path).convert("RGB")
pixel_values = Resize((448, 448))(image)
pixel_values = ToTensor()(pixel_values).unsqueeze(0)
out = tipsv2_model.encode_image(pixel_values)
embedding = out.cls_token[0, 0].float().cpu().numpy()

This is the same logic used during APR26/data_extraction/extract_all_cases_tipsv2.py.

Do not use:

  • patch-token averages,
  • register tokens,
  • normalized text/image similarity vectors,
  • Derm Foundation embeddings,
  • image-level logits from another model.

Production Metrics

Training-C production validation metrics at threshold 0.35:

| Metric | Value |
| --- | --- |
| ROC AUC | 0.7194 |
| PR-AUC | 0.5868 |
| Sensitivity / Recall | 0.7697 |
| Specificity | 0.5851 |
| Accuracy | 0.6565 |
| Precision | 0.5394 |
| F1 | 0.6343 |
| Youden J | 0.3548 |
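As a consistency check on the reported metrics, Youden's J is sensitivity + specificity - 1:

```python
sensitivity = 0.7697
specificity = 0.5851
youden_j = sensitivity + specificity - 1
assert abs(youden_j - 0.3548) < 1e-6  # matches the reported Youden J
```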

Dataset state:

| Item | Value |
| --- | --- |
| Cases | 3,061 |
| Images / embeddings | 6,517 |
| Infectious cases | 1,187 |
| Non-infectious cases | 1,874 |
| Feature dimension | 1024 |

Running Inference Locally

From this folder:

pip install -r requirements.txt
python src/inference.py case_image_1.png case_image_2.png

The script accepts 1 to 3 images from the same patient case.

If TIPSv2 is already cached locally:

python src/inference.py case_image_1.png --local-files-only

If using a vendored/local TIPSv2 folder:

python src/inference.py case_image_1.png --tipsv2-model /path/to/google/tipsv2-l14/snapshot

Output example:

{
  "prediction": "Infectious",
  "p_infectious": 0.47,
  "threshold": 0.35,
  "image_count": 2,
  "rule": "Infectious if P(Infectious) >= threshold else Non Infectious"
}
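Downstream services can rely on these keys. A minimal consumer-side check of the output contract, using the example payload above (field names taken from that example):

```python
import json

raw = '''{
  "prediction": "Infectious",
  "p_infectious": 0.47,
  "threshold": 0.35,
  "image_count": 2,
  "rule": "Infectious if P(Infectious) >= threshold else Non Infectious"
}'''
result = json.loads(raw)

# the reported prediction should agree with the stated threshold rule
expected = "Infectious" if result["p_infectious"] >= result["threshold"] else "Non Infectious"
assert result["prediction"] == expected
assert result["image_count"] in (1, 2, 3)
```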

Docker Usage

Build:

docker build -t dermolens-tipsv2-mil .

Run:

docker run --rm -v "$PWD/examples:/data" dermolens-tipsv2-mil /data/case_image_1.png /data/case_image_2.png

The default Dockerfile does not bake the 1.8 GB TIPSv2 weights into the image. This keeps the image smaller and lets the runtime download or mount the Hugging Face cache.

For a self-contained container, uncomment this line in the Dockerfile:

# RUN python scripts/download_tipsv2.py

That will pre-cache google/tipsv2-l14 inside the image.

Hugging Face Push Strategy

Recommended setup:

  1. Push this deploy-hf/ folder as the DermoLens model repository.
  2. Reference google/tipsv2-l14 as the upstream feature extractor instead of duplicating the full TIPSv2 weights.
  3. Include src/inference.py as the canonical end-to-end raw-image inference code.
  4. Put the raw image dataset in a separate Hugging Face dataset repository.
  5. Keep case-level metadata in the dataset repository so MIL grouping is preserved.

This is better than copying TIPSv2 weights into our repo because:

  • TIPSv2 is already a Hugging Face model with its own versioning.
  • The real weights are about 1.8 GB.
  • Duplicating them creates storage, sync, and licensing ambiguity.
  • The deployment container can still be fully self-contained by pre-caching TIPSv2 at Docker build time.

Should We Convert Models?

Current recommendation: do not convert yet.

Reason:

  • TIPSv2 uses PyTorch / Hugging Face remote code.
  • The MIL head is small and already saved as TensorFlow/Keras.
  • Mixed-framework inference is acceptable inside Docker.
  • Conversion adds risk unless we have a specific deployment target that requires ONNX/TFLite/TensorRT.

Future conversion options:

| Option | When useful | Risk |
| --- | --- | --- |
| Convert MIL Keras model to ONNX | If we want one ONNX runtime for the MIL head | Low to moderate |
| Convert TIPSv2 to ONNX | If deploying to a strict ONNX/TensorRT environment | Higher, because custom remote code and image encoder outputs must be validated |
| Retrain/rebuild MIL head in PyTorch | If we want a single PyTorch-only pipeline | Moderate; requires reproducing Training-C weights or retraining |
| Keep mixed PyTorch + TensorFlow | Best current path for Hugging Face/Cloud Run/GCE | Larger dependency footprint |

For Hugging Face, GCloud, Firebase-backed services, or generic Docker deployment, the current mixed-framework package is the pragmatic choice.

Deployment Interpretation

This is a research production candidate, not a standalone clinical diagnostic device. It is suitable for controlled research inference, screening-threshold experiments, and deployment engineering validation.
