Spark NLP is the central hub for all your state-of-the-art natural language processing needs. Whether you’re looking for demos, use cases, models, or datasets, you’ll find the resources you need to begin any NLP task right here!
Natural Language Processing
Text Classification
Text classification is the process of automatically categorizing text into predefined labels or categories based on its content.
Token Classification
Token classification is the process of assigning labels to individual tokens (words or subwords) in a text, commonly used for tasks like named entity recognition or part-of-speech tagging.
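As a rough illustration of what token-level labels look like in practice, the sketch below groups per-token tags into entity spans. It assumes the common BIO tagging convention (`B-` begins an entity, `I-` continues it, `O` is outside any entity); the function name `bio_to_spans` and the sample tags are hypothetical, and this is plain Python rather than Spark NLP's own API.

```python
from typing import List, Tuple

def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_text, label) pairs."""
    spans = []
    current_tokens: List[str] = []
    current_label = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts: close any entity still open.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_label))
            current_tokens = [token]
            current_label = tag[2:]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # Continuation of the entity that is currently open.
            current_tokens.append(token)
        else:
            # "O" tag or an inconsistent "I-" tag: close the open entity.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_label))
            current_tokens = []
            current_label = None
    if current_tokens:
        spans.append((" ".join(current_tokens), current_label))
    return spans

# Hypothetical model output for one sentence.
tokens = ["John", "Snow", "works", "in", "New", "York"]
tags = ["B-PER", "I-PER", "O", "O", "B-LOC", "I-LOC"]
print(bio_to_spans(tokens, tags))
```

The same per-token layout applies to part-of-speech tagging, except that every token receives a tag and no grouping step is needed.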
Zero-Shot Classification
Zero-shot classification is the process of assigning text to labels that the model saw no examples of during training, relying instead on its general knowledge and the context of the input.
Text Generation
Text generation is the process of automatically creating coherent and contextually relevant text based on a given input or prompt using machine learning models.
Question Answering
Question answering models can retrieve answers from a given text, making them useful for searching documents. Some models can even generate answers independently, without needing any context!
Table Question Answering
Table question answering models can extract answers from structured data in tables, making it easy to query and retrieve specific information.
Summarization
Summarization models condense long texts into shorter versions, capturing the main ideas and key points while maintaining the overall meaning of the original content.
Translation
Translation models automatically convert text from one language to another while preserving the meaning and context of the original content.
Text Preprocessing
Text preprocessing is the task of cleaning and transforming raw text into a format suitable for NLP tasks. This includes steps like tokenization, lowercasing, removing stop words, and stemming or lemmatization to prepare text for analysis.
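The steps named above can be sketched in a few lines of plain Python. This is a minimal illustration, not Spark NLP's pipeline: the stop-word list is a tiny hand-picked sample, and `naive_stem` is a hypothetical suffix-stripping helper that is far cruder than a real stemmer or lemmatizer.

```python
import re

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"the", "a", "an", "is", "are", "and", "of", "to", "in", "over"}

def naive_stem(word: str) -> str:
    """Strip one common suffix, keeping at least three characters."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text: str) -> list:
    # Lowercase, then tokenize on runs of letters and digits.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Drop stop words and stem what remains.
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]

print(preprocess("The cats are jumping over the fences"))
# prints ['cat', 'jump', 'fence']
```

Order matters here: lowercasing comes before stop-word removal so that "The" matches the lowercase entry in the list.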
Dependency Parsing
Dependency parsing is a syntactic analysis method that examines the grammatical structure of a sentence by identifying the dependencies between its words. It illustrates how words relate to each other through a dependency tree or graph, where some words act as “parents” and others as “children.”
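To make the parent/child structure concrete, the sketch below stores a parse as a list of head indices and prints it as an indented tree. The sentence, head indices, and relation labels are hand-written for illustration (the labels follow Universal Dependencies naming); no parser is run, and the helper names `children_of` and `render` are hypothetical.

```python
from collections import defaultdict

words = ["She", "reads", "long", "books"]
heads = [1, -1, 3, 1]  # index of each word's parent; -1 marks the root
labels = ["nsubj", "root", "amod", "obj"]

def children_of(heads):
    """Invert the head list into a parent -> [children] mapping."""
    children = defaultdict(list)
    for child, parent in enumerate(heads):
        if parent >= 0:
            children[parent].append(child)
    return children

def render(node, children, depth=0):
    """Return the subtree under `node` as indented lines, parents first."""
    lines = [f"{'  ' * depth}{words[node]} ({labels[node]})"]
    for child in children[node]:
        lines.extend(render(child, children, depth + 1))
    return lines

children = children_of(heads)
root = heads.index(-1)
print("\n".join(render(root, children)))
```

In this example "reads" is the root, "She" and "books" are its children, and "long" is a child of "books".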
Computer Vision
Image Classification
Image classification models automatically categorize images into predefined labels or classes based on their visual content.
Image Captioning
Image captioning models generate descriptive text for images, providing context and details about the visual content they depict.
Zero-Shot Image Classification
Zero-shot image classification is the process of assigning images to labels that the model saw no examples of during training, relying instead on its general knowledge and the context of the input.
Audio
Automatic Speech Recognition
Automatic speech recognition (ASR) is the process of converting spoken language into written text.