On Eliciting Syntax from Language Models via Hashing

Model Details

This repository contains the implementation of Parserker v2, a hashing-based unsupervised parser trained on the Penn Treebank dataset using only the raw text (no syntactic annotations or tree labels).
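The node labels in the output examples print as four hexadecimal digits, which suggests binary hash codes. The sketch below is a minimal, hypothetical illustration of hashing a real-valued vector into such a code. It sets one bit per dimension from the sign of that component and formats the result as hex. The sign threshold, the 16-bit width, and the helper name `to_hash_code` are assumptions made for illustration. They do not describe how Parserker computes its labels.

```python
from typing import Sequence


def to_hash_code(vector: Sequence[float]) -> str:
    """Hypothetical helper: sign-binarize a vector into a hex hash code.

    Each component contributes one bit (1 if positive, else 0), with the
    first component as the most significant bit. A 16-dimensional input
    therefore yields a 16-bit code printed as four hex digits.
    """
    code = 0
    for value in vector:
        code = (code << 1) | (1 if value > 0 else 0)
    # One hex digit per 4 bits, zero-padded so leading zero bits are kept.
    width = (len(vector) + 3) // 4
    return format(code, f"0{width}X")


# A made-up 16-dimensional vector standing in for a span representation.
example = [0.9, -0.2, 0.4, -1.3,   # bits 1010 -> A
           -0.5, 0.7, 0.1, 0.3,    # bits 0111 -> 7
           -0.8, -0.1, -0.6, 0.2,  # bits 0001 -> 1
           0.5, 0.5, -0.4, -0.9]   # bits 1100 -> C
print(to_hash_code(example))  # A71C
```

Because only the sign survives, nearby vectors tend to share most bits. This is the usual motivation for comparing hash codes by the number of differing bits.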

Usage

Requirements

pip install transformers torch nltk torchrua

Demo

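from nltk import TreePrettyPrinter
from transformers import AutoModel, AutoTokenizer

# The model and tokenizer ship custom code on the Hub, so trust_remote_code=True is required.
model = AutoModel.from_pretrained("yehzw/parserker", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("yehzw/parserker", trust_remote_code=True)

# Switch to inference mode (disables dropout).
model.eval()

# The custom tokenizer takes a batch of raw sentences and returns three values:
# the word lists, the token ids, and the `duration` input that model.parse expects.
words, input_ids, duration = tokenizer([
    "The quick brown fox jumps over the lazy dog",
    "The man who you met yesterday is my teacher",
    "The boy saw the girl with a telescope",
    "The dog on the hill barked at the man who laughed",
])

# Parse the batch, then convert each result into a tree and pretty-print it.
for w, s in zip(words, model.parse(input_ids, duration).tolist()):
    t = model.to_tree(w, s)
    t = TreePrettyPrinter(t).text()
    print(t)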
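To work with the parses programmatically (for example, to compare them against reference trees), it helps to flatten a binary tree into the word spans its constituents cover. The sketch below does this in plain Python for the parse of "The boy saw the girl with a telescope" shown in the output examples, transcribed by hand. The tuple encoding and the helper name `constituent_spans` are assumptions. With the demo above, you would instead walk the object returned by `model.to_tree`. That object is presumably an `nltk.Tree`, since it is passed to `TreePrettyPrinter`.

```python
from typing import List, Tuple

# Hypothetical encoding of the printed tree:
#   preterminal   = (code, word)
#   internal node = (code, left_child, right_child)
tree = ("C14C",
        ("165F", ("103B", "The"), ("C20D", "boy")),
        ("DF41",
         ("D100", "saw"),
         ("C54D",
          ("D657", ("D0B7", "the"), ("C60D", "girl")),
          ("9D59",
           ("3991", "with"),
           ("9E55", ("9817", "a"), ("CE8D", "telescope"))))))


def constituent_spans(root: tuple) -> List[Tuple[str, int, int]]:
    """Return (code, start, end) for every internal node, end exclusive, in post-order."""
    spans: List[Tuple[str, int, int]] = []

    def visit(node: tuple, start: int) -> int:
        if len(node) == 2:  # preterminal: covers exactly one word
            return start + 1
        label, left, right = node
        mid = visit(left, start)
        end = visit(right, mid)
        spans.append((label, start, end))
        return end

    visit(root, 0)
    return spans


words = "The boy saw the girl with a telescope".split()
spans = constituent_spans(tree)
for label, start, end in spans:
    print(label, " ".join(words[start:end]))
```

Among the printed spans is C54D over "the girl with a telescope". In this example, the model therefore attaches the prepositional phrase to "the girl" rather than to "saw".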

Output Examples

The quick brown fox jumps over the lazy dog

                                 854D                              
             _____________________|_________                        
            |                              DF41                    
            |                      _________|____                   
           955F                   |             DD59               
  __________|_____                |     _________|____              
 |               DC45             |    |             D457          
 |      __________|____           |    |     _________|____         
 |     |              DE45        |    |    |             DECD     
 |     |           ____|____      |    |    |          ____|____    
103B  C404       DC05      D60D  9300 C995 D0B7      DC8D      DE8D
 |     |          |         |     |    |    |         |         |   
The  quick      brown      fox  jumps over the       lazy      dog

The man who you met yesterday is my teacher

                              C50D                                      
       ________________________|____________                             
      |                                    D558                         
      |                    _________________|___________                 
      |                  9718                           |               
      |          _________|____                         |                
      |         |             C718                     5D52             
      |         |     _________|____                ____|____            
     965F       |    |             DF00            |        DF5D        
  ____|____     |    |          ____|_______       |     ____|______     
103B      C60D 1799 4719      D300         47BC   7192 5895        CE0D 
 |         |    |    |         |            |      |    |           |    
The       man  who  you       met       yesterday  is   my       teacher

The boy saw the girl with a telescope

                    C14C                                   
       ______________|_________                             
      |                       DF41                         
      |          ______________|____                        
      |         |                  C54D                    
      |         |          _________|____                   
      |         |         |             9D59               
      |         |         |          ____|____              
     165F       |        D657       |        9E55          
  ____|____     |     ____|____     |     ____|_______      
103B      C20D D100 D0B7      C60D 3991 9817         CE8D  
 |         |    |    |         |    |    |            |     
The       boy  saw  the       girl with  a        telescope

The dog on the hill barked at the man who laughed

                                    C50D                                            
                 ____________________|__________                                     
                |                              DF40                                 
                |                     __________|_________                           
               C54D                  |                   DD19                       
       _________|____                |      ______________|____                      
      |             9D09             |     |                  C5CD                  
      |          ____|____           |     |          _________|_________            
     965F       |        D65F        |     |        D657                97D8        
  ____|____     |     ____|____      |     |     ____|____           ____|______     
103B      C20D 2F99 D0B3      C60D  D300  6395 D0B7      C60D      1799        CF89 
 |         |    |    |         |     |     |    |         |         |           |    
The       dog   on  the       hill barked  at  the       man       who       laughed
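Several codes recur across these parses. Sentence-initial "The" is always 103B, "who" is 1799 in both sentences where it occurs, and "man", "girl" and "hill" all receive C60D. The sketch below reads each four-hex-digit label as a 16-bit integer (an assumption based on how the labels print) and counts the bits in which two codes differ. The helper name `hamming` is hypothetical, and the code pairs are copied from the trees above. This card makes no claim that bit distance carries linguistic meaning; the sketch is only an inspection aid.

```python
def hamming(a: str, b: str) -> int:
    """Hypothetical helper: number of differing bits between two hex-formatted codes."""
    return bin(int(a, 16) ^ int(b, 16)).count("1")


# Preterminal codes copied from the example trees.
pairs = [
    ("D0B7", "D0B3"),  # "the" (before lazy/girl/man) vs "the" (before hill)
    ("C20D", "C60D"),  # "boy"/"dog" vs "man"/"girl"/"hill"
    ("103B", "D0B7"),  # sentence-initial "The" vs lowercase "the"
    ("1799", "C60D"),  # "who" vs "man"
]
for a, b in pairs:
    print(a, b, hamming(a, b))
```

The two determiner codes differ in a single bit, as do the codes for "boy"/"dog" and "man"/"girl"/"hill".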

Citation

@inproceedings{wang-utiyama-2024-eliciting,
    title = "On Eliciting Syntax from Language Models via Hashing",
    author = "Wang, Yiran  and
      Utiyama, Masao",
    editor = "Al-Onaizan, Yaser  and
      Bansal, Mohit  and
      Chen, Yun-Nung",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.emnlp-main.479/",
    doi = "10.18653/v1/2024.emnlp-main.479",
    pages = "8412--8427"
}
Model size: 0.1B params (Safetensors, F32 tensors)