LocalLaws/LOCUS-Substantive

A ModernBERT classifier for the Substantive (binary) axis of the LOCUS (Local Ordinances Corpus, United States) dataset.

Fine-tuned from answerdotai/ModernBERT-base on LocalLaws/LOCUS-v1.0.

Labels

not_substantive
substantive

Training


Base model	`answerdotai/ModernBERT-base`
Max length	1024
Classifier pooling	`mean`
Train / val / test	79106 / 10447 / 10447

Evaluation


Metric	binary-F1
Validation binary-F1	0.9402
Test binary-F1	0.9422
Test accuracy	0.9328

Per-class classification report on the test set (10447 examples). The F1 of class 1 (0.9422) matches the test binary-F1 reported above, so class 1 is the positive class.

              precision    recall  f1-score   support

           0     0.9517    0.8898    0.9197      4519
           1     0.9200    0.9656    0.9422      5928

    accuracy                         0.9328     10447
   macro avg     0.9358    0.9277    0.9310     10447
weighted avg     0.9337    0.9328    0.9325     10447
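
As a sanity check on the report above, the per-class F1, macro F1, and accuracy can be recomputed from the printed precision, recall, and support values. This is a minimal sketch: the `report` dictionary and the helper name `f1` are illustrative and not part of the model repository, and because the inputs are rounded to four decimals the results agree with the table only to about three decimals.

```python
# Precision, recall, and support copied from the classification report above.
report = {
    0: {"precision": 0.9517, "recall": 0.8898, "support": 4519},
    1: {"precision": 0.9200, "recall": 0.9656, "support": 5928},
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Per-class F1, then the unweighted (macro) average over both classes.
f1_by_class = {c: f1(r["precision"], r["recall"]) for c, r in report.items()}
macro_f1 = sum(f1_by_class.values()) / len(f1_by_class)

# Accuracy equals the support-weighted average of per-class recall.
total = sum(r["support"] for r in report.values())
accuracy = sum(r["recall"] * r["support"] for r in report.values()) / total

print(f1_by_class, macro_f1, accuracy)
```

The binary-F1 metric is the F1 of class 1 alone, which is why it is higher than the macro average that also includes the weaker class 0.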

Usage

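from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the tokenizer and the fine-tuned classifier from the Hugging Face Hub.
tok = AutoTokenizer.from_pretrained("LocalLaws/LOCUS-Substantive")
model = AutoModelForSequenceClassification.from_pretrained("LocalLaws/LOCUS-Substantive")
model.eval()  # inference mode: disables dropout

text = "No person shall keep any swine within the city limits."
# Truncate to the 1024-token maximum length used in training.
enc = tok(text, return_tensors="pt", truncation=True, max_length=1024)
with torch.no_grad():  # gradients are not needed for inference
    logits = model(**enc).logits
# Pick the higher-scoring class and map its index to a label name.
pred = logits.argmax(-1).item()
print(model.config.id2label[pred])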
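
The snippet above returns only the top label. If a confidence score is also wanted, the two logits can be passed through a softmax. The sketch below does this in plain Python so it runs without downloading the model; the `example_logits` values are made up for illustration, and the `id2label` mapping is an assumption based on the label order listed in this card (in real use, read it from `model.config.id2label`). With PyTorch tensors, `torch.softmax(logits, dim=-1)` performs the same computation.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def label_with_confidence(logits, id2label):
    """Return the top label and its softmax probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


# Assumed mapping (hypothetical here; use model.config.id2label in practice).
id2label = {0: "not_substantive", 1: "substantive"}
# Made-up logits standing in for model(**enc).logits[0].tolist().
example_logits = [-1.2, 2.3]

label, confidence = label_with_confidence(example_logits, id2label)
print(label, round(confidence, 4))
```

The softmax probability is a raw model score, not a calibrated estimate, so any decision threshold built on it should be checked against held-out data.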