Legal Longformer (Base, 4096)

Description

Longformer is a transformer model for long documents.

legal_longformer_base is a BERT-like model initialized from the RoBERTa checkpoint and pretrained with masked language modeling (MLM) on long documents. It supports sequences of up to 4,096 tokens and is trained specifically on legal documents.

Longformer combines sliding-window (local) attention with global attention. Global attention is configured by the user for each task, which allows the model to learn task-specific representations.
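The attention pattern described above can be illustrated with a small, framework-free sketch. It is not the model's actual implementation: the function name `longformer_attention_mask` and its parameters are hypothetical, and it assumes a window of size `window` lets each token see `window // 2` neighbours on each side, while positions marked as global attend to, and are attended by, every token.

```python
def longformer_attention_mask(seq_len, window, global_positions):
    """Return a seq_len x seq_len matrix of booleans.

    mask[i][j] is True when token i is allowed to attend to token j.
    All names here are illustrative, not part of any library API.
    """
    half = window // 2
    # Local attention: each token sees neighbours within half a window.
    mask = [[abs(i - j) <= half for j in range(seq_len)]
            for i in range(seq_len)]
    # Global attention: chosen tokens see everything and are seen by everything.
    for g in set(global_positions):
        for k in range(seq_len):
            mask[g][k] = True
            mask[k][g] = True
    return mask


# Example: 8 tokens, window of 2, first token (e.g. a classification token) global.
mask = longformer_attention_mask(seq_len=8, window=2, global_positions=[0])
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

In this sketch the number of allowed pairs grows roughly linearly with sequence length for a fixed window and a fixed number of global positions, rather than quadratically as in full self-attention.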

If you use Longformer in your research, please cite Longformer: The Long-Document Transformer.

@article{Beltagy2020Longformer,
title={Longformer: The Long-Document Transformer},
author={Iz Beltagy and Matthew E. Peters and Arman Cohan},
journal={arXiv:2004.05150},
year={2020},
}

Longformer is an open-source project developed by the Allen Institute for Artificial Intelligence (AI2). AI2 is a non-profit institute with the mission to contribute to humanity through high-impact AI research and engineering.

How to use

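Python:

from sparknlp.annotator import LongformerEmbeddings

# Load the pretrained legal Longformer embeddings stage
embeddings = LongformerEmbeddings\
  .pretrained("legal_longformer_base", "en")\
  .setInputCols(["document", "token"])\
  .setOutputCol("embeddings")\
  .setCaseSensitive(True)\
  .setMaxSentenceLength(4096)

Scala:

import com.johnsnowlabs.nlp.embeddings.LongformerEmbeddings

// Load the pretrained legal Longformer embeddings stage
val embeddings = LongformerEmbeddings.pretrained("legal_longformer_base", "en")
  .setInputCols("document", "token")
  .setOutputCol("embeddings")
  .setCaseSensitive(true)
  .setMaxSentenceLength(4096)

NLU:

import nlu
nlu.load("en.embed.longformer.base_legal").predict("""Put your text here.""")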
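The Spark NLP snippets above configure only the embeddings stage, which expects "document" and "token" columns to exist already. A minimal sketch of a full pipeline that produces those columns might look like the following. The helper name `build_pipeline` and the input column name "text" are assumptions made for illustration, and the code presumes that the spark-nlp and pyspark packages are installed.

```python
def build_pipeline():
    """Assemble a Spark ML pipeline: raw text -> document -> tokens -> embeddings.

    Hypothetical helper for illustration; assumes a "text" input column.
    """
    # Imports are kept inside the function so the definition itself can be
    # loaded without Spark NLP installed.
    from pyspark.ml import Pipeline
    from sparknlp.base import DocumentAssembler
    from sparknlp.annotator import Tokenizer, LongformerEmbeddings

    # Wrap the raw text column into Spark NLP's document annotation.
    document_assembler = DocumentAssembler() \
        .setInputCol("text") \
        .setOutputCol("document")

    # Split each document into tokens.
    tokenizer = Tokenizer() \
        .setInputCols(["document"]) \
        .setOutputCol("token")

    # Same configuration as the embeddings stage shown above.
    embeddings = LongformerEmbeddings.pretrained("legal_longformer_base", "en") \
        .setInputCols(["document", "token"]) \
        .setOutputCol("embeddings") \
        .setCaseSensitive(True) \
        .setMaxSentenceLength(4096)

    return Pipeline(stages=[document_assembler, tokenizer, embeddings])
```

With a Spark session started through `sparknlp.start()` and a DataFrame `df` that has a "text" column, the pipeline would typically be applied as `build_pipeline().fit(df).transform(df)`.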

Model Information

Model Name: legal_longformer_base
Compatibility: Spark NLP 4.2.1+
License: Open Source
Edition: Official
Input Labels: [sentence, token]
Output Labels: [embeddings]
Language: en
Size: 531.1 MB
Case sensitive: true
Max sentence length: 4096

References

https://huggingface.co/saibo/legal-longformer-base-4096