Clarifications about the config properties wrt the paper

by TomSchelsen - opened Dec 20, 2024

Dec 20, 2024

In https://huggingface.co/answerdotai/ModernBERT-base/blob/main/config.json, we see "hidden_activation": "gelu" and "position_embedding_type": "absolute" (even though RoPE-related settings appear in the config as well), whereas the paper says that GeGLU and RoPE are used, respectively. Is this expected, i.e. a quirk of the transformers library itself, or is it a misconfiguration or export issue? Thanks

bwarner

Dec 21, 2024

As we mention in the paper, GeGLU is a GLU that uses GeLU instead of sigmoid as the gate activation, so "hidden_activation": "gelu" is correct.

We adopt GeGLU (Shazeer, 2020), a Gated-Linear Units (GLU)-based (Dauphin et al., 2017) activation function built on top of the original BERT’s GeLU.
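
To make the relationship concrete, here is a minimal sketch of GLU versus GeGLU following the definitions in Shazeer (2020), with biases omitted. It is not the ModernBERT or transformers implementation: the function names (`gated_linear_unit`, `glu`, `geglu`, `matvec`) and the plain-list weights are hypothetical and chosen only for illustration. The point is that the two variants differ only in the activation applied to the gate, which is why the config stores just "gelu".

```python
import math


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(x, w):
    """Multiply a vector x (length d_in) by a d_in x d_out matrix w."""
    return [sum(xi * row[j] for xi, row in zip(x, w)) for j in range(len(w[0]))]


def gated_linear_unit(x, w_gate, w_value, activation):
    """Generic gated unit: activation(x @ w_gate) * (x @ w_value), elementwise.

    Hypothetical helper for illustration; biases are left out.
    """
    gate = [activation(g) for g in matvec(x, w_gate)]
    value = matvec(x, w_value)
    return [g * v for g, v in zip(gate, value)]


def glu(x, w_gate, w_value):
    """Original GLU (Dauphin et al., 2017): the gate uses sigmoid."""
    return gated_linear_unit(x, w_gate, w_value, sigmoid)


def geglu(x, w_gate, w_value):
    """GeGLU (Shazeer, 2020): same structure, but the gate uses GELU."""
    return gated_linear_unit(x, w_gate, w_value, gelu)


if __name__ == "__main__":
    x = [1.0, -0.5]
    w_gate = [[1.0, 0.0], [0.0, 1.0]]   # toy weights, not from any real model
    w_value = [[2.0, 0.0], [0.0, 2.0]]
    print("GLU:  ", glu(x, w_gate, w_value))
    print("GeGLU:", geglu(x, w_gate, w_value))
```

Swapping `sigmoid` for `gelu` is the only change between the two functions; the gating structure itself is part of the layer, not of the activation name.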

I believe position_embedding_type is a default config argument in transformers. ModernBERT doesn't use it; I'll have to check whether we can remove it from the config.
