NER DL uses Char CNNs - BiLSTM - CRF Neural Network architecture. Spark NLP defines this architecture through a Tensorflow graph, which requires the following parameters:
- Tags
- Embeddings Dimension
- Number of Chars
Spark NLP infers these values from the training dataset used in NerDLApproach annotator and tries to load the graph embedded on spark-nlp package. Currently, Spark NLP has graphs for the most common combination of tags, embeddings, and number of chars values:
Tags | Embeddings Dimension |
---|---|
10 | 100 |
10 | 200 |
10 | 300 |
10 | 768 |
10 | 1024 |
25 | 300 |
All of these graphs use an LSTM of size 128 and number of chars 100
In case, your train dataset has a different number of tags, embeddings dimension, number of chars and LSTM size combinations shown in the table above, NerDLApproach
will raise an IllegalArgumentException exception during runtime with the message below:
Graph [parameter] should be [value]: Could not find a suitable tensorflow graph for embeddings dim: [value] tags: [value] nChars: [value]. Check https://sparknlp.org/docs/en/graph for instructions to generate the required graph.
To overcome this exception message we have to follow these steps:
-
Clone spark-nlp github repo
-
Run python file
create_models
with number of tags, embeddings dimension and number of char values mentioned on your exception message error.cd spark-nlp/python/tensorflow export PYTHONPATH=lib/ner python ner/create_models.py [number_of_tags] [embeddings_dimension] [number_of_chars] [output_path]
-
This will generate a graph on the directory defined on `output_path argument.
-
Retry training with
NerDLApproach
annotator but this time use the parametersetGraphFolder
with the path of your graph.
Note: Make sure that you have Python 3 and Tensorflow 1.15.0 installed on your system since create_models
requires those versions to generate the graph successfully.
Note: We also have a notebook in the same directory if you prefer Jupyter notebook to cerate your custom graph (create_models.ipynb
).