Description
Longformer is a transformer model for long documents. legal_longformer_base is a BERT-like model initialized from the RoBERTa checkpoint and pretrained for masked language modeling (MLM) on long documents. It supports sequences of up to 4,096 tokens and is specifically trained on legal documents.
Longformer uses a combination of sliding-window (local) attention and global attention. Global attention is user-configured based on the task, allowing the model to learn task-specific representations.
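For illustration only, the sketch below shows how this global attention might be configured outside Spark NLP, using the Hugging Face transformers API and the upstream saibo/legal-longformer-base-4096 checkpoint listed under References (both assumptions, not part of the Spark NLP usage shown later): every token keeps local sliding-window attention, and tokens flagged in global_attention_mask additionally attend to, and are attended by, the whole sequence.

# Illustration only: configuring Longformer's global attention with the
# Hugging Face transformers API, not with the Spark NLP annotator shown below.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("saibo/legal-longformer-base-4096")
model = AutoModel.from_pretrained("saibo/legal-longformer-base-4096")

text = "This Agreement is entered into by and between the parties."
inputs = tokenizer(text, return_tensors="pt", max_length=4096, truncation=True)

# All tokens get sliding-window (local) attention; tokens flagged with 1 here
# additionally attend globally. Which tokens to flag is a task-specific choice.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1  # e.g. the <s> token for classification-style tasks

outputs = model(**inputs, global_attention_mask=global_attention_mask)
embeddings = outputs.last_hidden_state  # shape: (1, seq_len, hidden_size)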
If you use Longformer in your research, please cite Longformer: The Long-Document Transformer.
@article{Beltagy2020Longformer,
  title={Longformer: The Long-Document Transformer},
  author={Iz Beltagy and Matthew E. Peters and Arman Cohan},
  journal={arXiv:2004.05150},
  year={2020},
}
Longformer is an open-source project developed by the Allen Institute for Artificial Intelligence (AI2). AI2 is a non-profit institute with the mission to contribute to humanity through high-impact AI research and engineering.
Predicted Entities
How to use
from sparknlp.annotator import LongformerEmbeddings

embeddings = LongformerEmbeddings \
    .pretrained("legal_longformer_base", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings") \
    .setCaseSensitive(True) \
    .setMaxSentenceLength(4096)
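The snippet above only declares the embeddings annotator; a minimal end-to-end sketch, assuming the standard Spark NLP DocumentAssembler and Tokenizer stages and an illustrative input sentence, would look like this:

import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer
from pyspark.ml import Pipeline

spark = sparknlp.start()

# Turn raw text into DOCUMENT annotations, then split into TOKEN annotations,
# which are the input columns the embeddings annotator expects.
document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

pipeline = Pipeline(stages=[document_assembler, tokenizer, embeddings])

data = spark.createDataFrame(
    [["This Agreement shall be governed by the laws of the State of New York."]]
).toDF("text")
result = pipeline.fit(data).transform(data)
result.select("embeddings.embeddings").show(truncate=False)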
import com.johnsnowlabs.nlp.embeddings.LongformerEmbeddings

val embeddings = LongformerEmbeddings.pretrained("legal_longformer_base", "en")
  .setInputCols("document", "token")
  .setOutputCol("embeddings")
  .setCaseSensitive(true)
  .setMaxSentenceLength(4096)
import nlu
nlu.load("en.embed.longformer.base_legal").predict("""Put your text here.""")
Model Information
Model Name: legal_longformer_base
Compatibility: Spark NLP 4.2.1+
License: Open Source
Edition: Official
Input Labels: [sentence, token]
Output Labels: [embeddings]
Language: en
Size: 531.1 MB
Case sensitive: true
Max sentence length: 4096
References
https://huggingface.co/saibo/legal-longformer-base-4096