On Eliciting Syntax from Language Models via Hashing
Model Details
This repository contains the implementation of Parserker v2, a hashing-based unsupervised parser trained on the Penn Treebank dataset using only the raw text (no syntactic annotations or tree labels).
Usage
Requirements
pip install transformers torch nltk torchrua
Demo
from nltk import TreePrettyPrinter
from transformers import AutoModel, AutoTokenizer

# trust_remote_code=True loads the custom model and tokenizer code shipped in the model repository.
model = AutoModel.from_pretrained("yehzw/parserker", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("yehzw/parserker", trust_remote_code=True)
model.eval()

# The tokenizer takes a batch of raw sentences and returns three values.
words, input_ids, duration = tokenizer([
    "The quick brown fox jumps over the lazy dog",
    "The man who you met yesterday is my teacher",
    "The boy saw the girl with a telescope",
    "The dog on the hill barked at the man who laughed",
])

# Parse the whole batch, then build and print one tree per sentence.
for w, s in zip(words, model.parse(input_ids, duration).tolist()):
    t = model.to_tree(w, s)
    t = TreePrettyPrinter(t).text()
    print(t)
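The trees printed by the demo can also be read as sets of labeled word spans, which is the form usually compared against gold constituents. The sketch below is a stand-alone illustration, not part of the Parserker API: it hard-codes the tree printed for "The boy saw the girl with a telescope" as nested tuples (a hypothetical stand-in for the object returned by `model.to_tree`) and walks it with a hypothetical helper, `labeled_spans`.

```python
# Hypothetical stand-in for a parsed tree, copied by hand from the printed output.
# Internal node: (label, left, right). Leaf: (label, word).
TREE = (
    "C14C",
    ("165F", ("103B", "The"), ("C20D", "boy")),
    ("DF41",
     ("D100", "saw"),
     ("C54D",
      ("D657", ("D0B7", "the"), ("C60D", "girl")),
      ("9D59",
       ("3991", "with"),
       ("9E55", ("9817", "a"), ("CE8D", "telescope"))))),
)


def labeled_spans(node, start=0):
    """Return (spans, end); spans holds (label, start, end) for each multi-word node."""
    label, *children = node
    if len(children) == 1 and isinstance(children[0], str):
        return [], start + 1  # leaf: covers one word, no span recorded
    spans, pos = [], start
    for child in children:
        child_spans, pos = labeled_spans(child, pos)
        spans.extend(child_spans)
    spans.append((label, start, pos))  # half-open word interval [start, pos)
    return spans, pos


spans, n_words = labeled_spans(TREE)
for label, i, j in spans:
    print(label, i, j)
```

Here the span `("C54D", 3, 8)` covers "the girl with a telescope", so in this output the prepositional phrase is grouped with the noun phrase rather than with the verb.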
Output Examples
The quick brown fox jumps over the lazy dog
                    854D
        ______________|______________
        |                         DF41
        |                     ______|_______
      955F                    |          DD59
  ______|______               |      ______|______
  |         DC45              |      |         D457
  |      _____|_____          |      |      _____|_____
  |      |       DE45         |      |      |       DECD
  |      |      ___|____      |      |      |      ___|____
103B   C404   DC05   D60D   9300   C995   D0B7   DC8D   DE8D
  |      |      |      |      |      |      |      |      |
 The   quick  brown   fox   jumps  over    the   lazy    dog
The man who you met yesterday is my teacher
                  C50D
     _______________|_______________
     |                           D558
     |                _____________|______________
     |              9718                         |
     |          ______|______                    |
     |          |         C718                 5D52
     |          |      _____|_____          _____|_____
   965F         |      |       DF00         |       DF5D
  ___|____      |      |      ___|____      |      ___|____
103B   C60D   1799   4719   D300   47BC   7192   5895   CE0D
  |      |      |      |      |      |      |      |      |
 The    man    who    you    met yesterday is     my   teacher
The boy saw the girl with a telescope
             C14C
     __________|__________
     |                 DF41
     |          _________|_________
     |          |               C54D
     |          |         ________|________
     |          |         |             9D59
     |          |         |          _____|_____
   165F         |       D657         |       9E55
  ___|____      |      ___|____      |      ___|____
103B   C20D   D100   D0B7   C60D   3991   9817   CE8D
  |      |      |      |      |      |      |      |
 The    boy    saw    the   girl   with     a  telescope
The dog on the hill barked at the man who laughed
                          C50D
             _______________|________________
             |                            DF40
             |                       _______|________
           C54D                      |            DD19
     ________|________               |      ________|_________
     |             9D09              |      |              C5CD
     |          _____|_____          |      |         _______|_______
   965F         |       D65F         |      |       D657          97D8
  ___|____      |      ___|____      |      |      ___|____      ___|____
103B   C20D   2F99   D0B3   C60D   D300   6395   D0B7   C60D   1799   CF89
  |      |      |      |      |      |      |      |      |      |      |
 The    dog    on     the   hill  barked   at     the    man    who  laughed
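Every node label in the trees above is four hexadecimal digits. Assuming each label is a 16-bit hash code printed in hex (an interpretation of the output, not something stated here), labels can be compared bit by bit. The helpers `label_to_bits` and `hamming` below are hypothetical names; the labels are copied from the examples.

```python
def label_to_bits(label: str) -> str:
    """Expand a hex label such as '103B' into a 16-character bit string."""
    return format(int(label, 16), "016b")


def hamming(a: str, b: str) -> int:
    """Count the bit positions in which two hex labels differ."""
    return bin(int(a, 16) ^ int(b, 16)).count("1")


# Labels taken from the example trees.
print(label_to_bits("103B"))    # label on every sentence-initial "The"
print(hamming("D0B7", "D0B3"))  # the two labels seen on non-initial "the"
print(hamming("C60D", "C20D"))  # labels seen on "man"/"girl"/"hill" vs "boy"/"dog"
```

Under this assumption, `D0B7` and `D0B3` differ in a single bit, as do `C60D` and `C20D`, so words in similar syntactic roles receive nearby codes in these examples.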
Citation
@inproceedings{wang-utiyama-2024-eliciting,
title = "On Eliciting Syntax from Language Models via Hashing",
author = "Wang, Yiran and
Utiyama, Masao",
editor = "Al-Onaizan, Yaser and
Bansal, Mohit and
Chen, Yun-Nung",
booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2024",
address = "Miami, Florida, USA",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.emnlp-main.479/",
doi = "10.18653/v1/2024.emnlp-main.479",
pages = "8412--8427"
}
Base model: FacebookAI/roberta-base