package btm
Type Members
- class BigTextMatcher extends AnnotatorApproach[BigTextMatcherModel] with HasStorage
Annotator to match exact phrases (by token) provided in a file against a Document.
A text file of predefined phrases must be provided with setStoragePath. The text file can also be set directly as an ExternalResource.
In contrast to the normal TextMatcher, the BigTextMatcher is designed for large corpora.
For extended examples of usage, see the BigTextMatcherTestSpec.
Example
In this example, the entities file is of the form
...
dolore magna aliqua
lorem ipsum dolor. sit
laborum
...
where each line represents an entity phrase to be extracted.
import spark.implicits._
import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotator.Tokenizer
import com.johnsnowlabs.nlp.annotator.BigTextMatcher
import com.johnsnowlabs.nlp.util.io.ReadAs
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val data = Seq("Hello dolore magna aliqua. Lorem ipsum dolor. sit in laborum").toDF("text")

val entityExtractor = new BigTextMatcher()
  .setInputCols("document", "token")
  .setStoragePath("src/test/resources/entity-extractor/test-phrases.txt", ReadAs.TEXT)
  .setOutputCol("entity")
  .setCaseSensitive(false)

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, entityExtractor))
val results = pipeline.fit(data).transform(data)

results.selectExpr("explode(entity)").show(false)
+--------------------------------------------------------------------+
|col                                                                 |
+--------------------------------------------------------------------+
|[chunk, 6, 24, dolore magna aliqua, [sentence -> 0, chunk -> 0], []]|
|[chunk, 53, 59, laborum, [sentence -> 0, chunk -> 1], []]           |
+--------------------------------------------------------------------+
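The two numbers after "chunk" in each output row read as begin and end character offsets into the input text, with the end offset inclusive. The following plain-Scala sketch checks that reading against the values printed above. It does not use Spark NLP, and the helper name chunkText is hypothetical.

```scala
// Input sentence copied from the example above.
val text = "Hello dolore magna aliqua. Lorem ipsum dolor. sit in laborum"

// (begin, end) pairs copied from the two output rows above.
val offsets = Seq((6, 24), (53, 59))

// Hypothetical helper: treats `end` as inclusive, hence the + 1 for substring.
def chunkText(source: String, begin: Int, end: Int): String =
  source.substring(begin, end + 1)

// Slicing the input with each offset pair should reproduce the chunk text.
val recovered = offsets.map { case (begin, end) => chunkText(text, begin, end) }
```

Under this reading, recovered holds the same two phrases that appear in the result column of the output.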
- class BigTextMatcherModel extends AnnotatorModel[BigTextMatcherModel] with HasSimpleAnnotate[BigTextMatcherModel] with HasStorageModel
Instantiated model of the BigTextMatcher. For usage and examples see the documentation of the main class.
- trait ReadablePretrainedBigTextMatcher extends StorageReadable[BigTextMatcherModel] with HasPretrained[BigTextMatcherModel]
- class TMEdgesReadWriter extends TMEdgesReader with StorageReadWriter[Int]
- class TMEdgesReader extends StorageReader[Int]
- class TMNodesReader extends StorageReader[TrieNode]
- class TMNodesWriter extends StorageBatchWriter[TrieNode]
- class TMVocabReadWriter extends TMVocabReader with StorageReadWriter[Int]
- class TMVocabReader extends StorageReader[Int]
- case class TrieNode(pi: Int, isLeaf: Boolean, length: Int, lastLeaf: Int) extends Product with Serializable
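The reader and writer classes listed above suggest that phrases are stored as a trie over tokens. As a rough illustration of that general technique only, here is a minimal in-memory sketch of token-level phrase matching. It is an assumption-laden simplification: TokenTrie, buildTrie and findAll are hypothetical names, and the node layout does not mirror the library's TrieNode fields or its storage format.

```scala
import scala.collection.mutable

// Hypothetical, simplified token trie; not the library's TrieNode layout.
final class TokenTrie(caseSensitive: Boolean) {
  private final class Node {
    val children = mutable.Map.empty[String, Node]
    var isLeaf = false // true when a registered phrase ends at this node
  }

  private val root = new Node

  private def norm(token: String): String =
    if (caseSensitive) token else token.toLowerCase

  /** Registers one phrase, given as its sequence of tokens. */
  def add(phrase: Seq[String]): Unit = {
    var node = root
    phrase.foreach { t => node = node.children.getOrElseUpdate(norm(t), new Node) }
    node.isLeaf = true
  }

  /** Returns (firstTokenIndex, lastTokenIndex) pairs, taking the longest match at each start. */
  def findAll(tokens: IndexedSeq[String]): Seq[(Int, Int)] = {
    val matches = Seq.newBuilder[(Int, Int)]
    var i = 0
    while (i < tokens.length) {
      var node = root
      var j = i
      var lastEnd = -1
      var walking = true
      while (walking && j < tokens.length) {
        node.children.get(norm(tokens(j))) match {
          case Some(next) =>
            node = next
            if (node.isLeaf) lastEnd = j
            j += 1
          case None =>
            walking = false
        }
      }
      if (lastEnd >= 0) { matches += ((i, lastEnd)); i = lastEnd + 1 }
      else i += 1
    }
    matches.result()
  }
}

// Hypothetical helper: splits each phrase on single spaces and loads it into a trie.
def buildTrie(phrases: Seq[String], caseSensitive: Boolean): TokenTrie = {
  val trie = new TokenTrie(caseSensitive)
  phrases.foreach(p => trie.add(p.split(" ").toSeq))
  trie
}

val trie = buildTrie(Seq("dolore magna aliqua", "laborum"), caseSensitive = false)
// Whitespace-separated stand-in for tokenizer output.
val tokens = "Hello dolore magna aliqua . Lorem ipsum dolor . sit in laborum".split(" ").toIndexedSeq
val found = trie.findAll(tokens)
```

Here found contains token index ranges, not character offsets. Converting them to the begin and end values that an annotator reports would require the character positions of each token, which this sketch leaves out.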
Value Members
- object BigTextMatcher extends DefaultParamsReadable[BigTextMatcher] with Serializable
This is the companion object of BigTextMatcher. Please refer to that class for the documentation.
- object BigTextMatcherModel extends ReadablePretrainedBigTextMatcher with Serializable
This is the companion object of BigTextMatcherModel. Please refer to that class for the documentation.