AI & ML interests
PII detection, data anonymization, privacy-preserving AI, LLM security, information extraction, multilingual NLP
DomainShield
Privacy protection for LLM pipelines
DomainShield is a research project focused on preventing sensitive data leakage when using external large language model APIs.
Overview
The system acts as a middleware firewall:
- Masks sensitive information before sending data to external LLMs
- Handles both general PII and domain-specific sensitive entities
- Restores the masked values in the LLM response, reconstructing the original content
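The following is a minimal sketch of the mask, send, restore round trip in Python. It assumes simple regex detectors for emails and phone numbers only. The patterns, the `[LABEL_n]` placeholder format, and the `mask`/`unmask` names are illustrative assumptions, not DomainShield's implementation. Names and other context-dependent PII would need a learned detector such as NER.

```python
import re

# Hypothetical, deliberately narrow patterns used for illustration only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d -]{7,}\d")


def mask(text: str) -> tuple[str, dict[str, str]]:
    """Replace detected values with placeholders like [EMAIL_1].

    Returns the masked text and a placeholder -> original value mapping
    that stays local and is never sent to the external LLM.
    """
    mapping: dict[str, str] = {}   # placeholder -> original value
    seen: dict[str, str] = {}      # original value -> placeholder (reuse)
    counters: dict[str, int] = {}

    def substitute(label: str, pattern: re.Pattern, s: str) -> str:
        def repl(m: re.Match) -> str:
            value = m.group(0)
            if value in seen:  # same value always gets the same placeholder
                return seen[value]
            counters[label] = counters.get(label, 0) + 1
            placeholder = f"[{label}_{counters[label]}]"
            mapping[placeholder] = value
            seen[value] = placeholder
            return placeholder
        return pattern.sub(repl, s)

    masked = substitute("EMAIL", EMAIL_RE, text)
    masked = substitute("PHONE", PHONE_RE, masked)
    return masked, mapping


def unmask(text: str, mapping: dict[str, str]) -> str:
    """Restore original values in the LLM response using the local mapping."""
    for placeholder, value in mapping.items():
        text = text.replace(placeholder, value)
    return text
```

Only the masked text leaves the machine; the mapping is kept locally and applied to whatever the LLM returns.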
Key Focus
- PII masking (names, emails, identifiers)
- Domain-specific entity protection (internal terms, codes, private vocabularies)
- Multilingual robustness under noisy conditions
- Comparison of adaptation methods (prompting, RAG, fine-tuning, NER)
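For domain-specific entities, one simple baseline is exact matching against a private vocabulary. The sketch below is an assumption for illustration, not the project's method. It uses a hypothetical term list (`Project Falcon`, `Project Falcon X`, `ACCT-7731`) and matches whole words case-insensitively. Longer terms are tried first, so the longest entry wins at a given position.

```python
import re


def build_matcher(terms: list[str]) -> re.Pattern:
    """Compile a case-insensitive, whole-word matcher for a private vocabulary.

    Python's regex alternation takes the first alternative that matches, so
    sorting longer terms first makes 'Project Falcon X' win over 'Project Falcon'.
    """
    ordered = sorted(set(terms), key=len, reverse=True)
    if not ordered:
        raise ValueError("vocabulary must contain at least one term")
    alternation = "|".join(re.escape(t) for t in ordered)
    # (?<!\w) / (?!\w) act as word boundaries that also work for terms like 'ACCT-7731'.
    return re.compile(rf"(?<!\w)(?:{alternation})(?!\w)", re.IGNORECASE)


def find_domain_entities(text: str, matcher: re.Pattern) -> list[tuple[int, int, str]]:
    """Return (start, end, surface form) for every vocabulary hit."""
    return [(m.start(), m.end(), m.group(0)) for m in matcher.finditer(text)]


# Hypothetical internal vocabulary.
VOCAB = ["Project Falcon", "Project Falcon X", "ACCT-7731"]
```

Exact matching is precise but misses misspellings and inflected forms. That gap is one reason to compare it against learned methods under noisy, multilingual input.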
Approach
We evaluate multiple strategies for detecting and masking sensitive data:
- Prompt-based methods
- Retrieval-augmented approaches (RAG)
- Supervised fine-tuning (LoRA)
- Token classification (NER)
- Hybrid and ensemble methods
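When several detectors each return character spans, their outputs have to be combined. Examples include a regex layer, an NER model, and a prompted LLM. The sketch assumes a hypothetical `Span` type and shows two simple combination rules. The first is a recall-oriented union that merges overlapping spans. The second is an exact-match majority vote. These are generic ensembling choices, not necessarily the ones DomainShield evaluates.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical detector output: a labeled character span."""
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str


def merge_union(*detector_outputs: list[Span]) -> list[Span]:
    """Keep any span flagged by any detector, merging overlapping spans.

    The merged span covers all overlapping inputs; the label of the
    earliest-starting (and, on ties, longest) span is kept.
    """
    spans = sorted(
        (s for out in detector_outputs for s in out),
        key=lambda s: (s.start, -s.end),
    )
    merged: list[Span] = []
    for span in spans:
        if merged and span.start < merged[-1].end:
            prev = merged[-1]
            merged[-1] = Span(prev.start, max(prev.end, span.end), prev.label)
        else:
            merged.append(span)
    return merged


def majority_vote(*detector_outputs: list[Span], min_votes: int = 2) -> list[Span]:
    """Keep only spans proposed identically by at least `min_votes` detectors."""
    counts = Counter(s for out in detector_outputs for s in set(out))
    return sorted(
        (s for s, c in counts.items() if c >= min_votes),
        key=lambda s: (s.start, s.end),
    )
```

The union favors recall, which suits a firewall where a missed entity means a leak. Voting trades some recall for fewer unnecessary masks, which can matter when over-masking degrades the LLM's answer.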
Status
Active research project. Models, benchmarks, and demos coming soon.