sparknlp.annotator.seq2seq.llama3_transformer
Contains classes for the LLAMA3Transformer.
Module Contents
Classes
LLAMA3Transformer
    Llama 3: Cutting-Edge Foundation and Fine-Tuned Chat Models
- class LLAMA3Transformer(classname='com.johnsnowlabs.nlp.annotators.seq2seq.LLAMA3Transformer', java_model=None)[source]
Llama 3: Cutting-Edge Foundation and Fine-Tuned Chat Models
The Llama 3 release introduces a new family of pretrained and fine-tuned LLMs, ranging in scale from 8B to 70B parameters. Llama 3 models are designed with enhanced efficiency, performance, and safety, making them more capable than previous versions. These models are trained on a more diverse and expansive dataset, offering improved contextual understanding and generation quality.
The fine-tuned models, known as Llama 3-instruct, are optimized for dialogue applications using an advanced version of Reinforcement Learning from Human Feedback (RLHF). Llama 3-instruct models demonstrate superior performance across multiple benchmarks, outperforming Llama 2 and even matching or exceeding the capabilities of some closed-source models.
Pretrained models can be loaded with pretrained() of the companion object:

>>> llama3 = LLAMA3Transformer.pretrained() \
...     .setInputCols(["document"]) \
...     .setOutputCol("generation")

The default model is "llama_3_7b_chat_hf_int4", if no name is provided. For available pretrained models please see the Models Hub.

Input Annotation types: DOCUMENT
Output Annotation type: DOCUMENT
- Parameters:
- configProtoBytes
ConfigProto from tensorflow, serialized into byte array.
- minOutputLength
Minimum length of the sequence to be generated, by default 0
- maxOutputLength
Maximum length of output text, by default 60
- doSample
Whether or not to use sampling; use greedy decoding otherwise, by default False
- temperature
The value used to modulate the next token probabilities, by default 1.0
- topK
The number of highest probability vocabulary tokens to keep for top-k-filtering, by default 40
- topP
Top cumulative probability for vocabulary tokens, by default 1.0
If set to float < 1, only the most probable tokens with probabilities that add up to topP or higher are kept for generation.
- repetitionPenalty
The parameter for repetition penalty; 1.0 means no penalty, by default 1.0
- noRepeatNgramSize
If set to int > 0, all ngrams of that size can only occur once, by default 0
- ignoreTokenIds
A list of token ids which are ignored in the decoder’s output, by default []
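For illustration, here is a hedged sketch that wires several of these parameters together; the values are arbitrary and only meant to show the corresponding setters (documented below):

>>> llama3 = LLAMA3Transformer.pretrained() \
...     .setInputCols(["document"]) \
...     .setOutputCol("generation") \
...     .setMinOutputLength(10) \
...     .setMaxOutputLength(100) \
...     .setDoSample(True) \
...     .setTemperature(0.7) \
...     .setTopK(40) \
...     .setTopP(0.9) \
...     .setRepetitionPenalty(1.1) \
...     .setNoRepeatNgramSize(3)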
Notes
This is a very computationally expensive module, especially on larger sequences. The use of an accelerator such as GPU is recommended.
References
Paper Abstract:
Llama 3 is the latest iteration of large language models from Meta, offering a range of models from 8 billion to 70 billion parameters. The fine-tuned versions, known as Llama 3-Chat, are specifically designed for dialogue applications and have been optimized using advanced techniques such as RLHF. Llama 3 models show remarkable improvements in both safety and performance, making them a leading choice in both open-source and commercial settings. Our comprehensive approach to training and fine-tuning these models is aimed at encouraging responsible AI development and fostering community collaboration.
Examples
>>> import sparknlp
>>> from sparknlp.base import *
>>> from sparknlp.annotator import *
>>> from pyspark.ml import Pipeline
>>> documentAssembler = DocumentAssembler() \
...     .setInputCol("text") \
...     .setOutputCol("documents")
>>> llama3 = LLAMA3Transformer.pretrained("llama_3_7b_chat_hf_int8") \
...     .setInputCols(["documents"]) \
...     .setMaxOutputLength(60) \
...     .setOutputCol("generation")
>>> pipeline = Pipeline().setStages([documentAssembler, llama3])
>>> data = spark.createDataFrame([
...     (
...         1,
...         "<|start_header_id|>system<|end_header_id|> \n" +
...         "You are a minion chatbot who always responds in minion speak! \n" +
...         "<|start_header_id|>user<|end_header_id|> \n" +
...         "Who are you? \n" +
...         "<|start_header_id|>assistant<|end_header_id|> \n"
...     )
... ]).toDF("id", "text")
>>> result = pipeline.fit(data).transform(data)
>>> result.select("generation.result").show(truncate=False)
+------------------------------------------------------------------------------------------+
|result                                                                                    |
+------------------------------------------------------------------------------------------+
|[Oooh, me am Minion! Me help you with things! Me speak Minion language, yeah! Bana-na-na!]|
+------------------------------------------------------------------------------------------+
- setIgnoreTokenIds(value)[source]
A list of token ids which are ignored in the decoder’s output.
- Parameters:
- value : List[int]
The token ids to be filtered out
- setConfigProtoBytes(b)[source]
Sets configProto from tensorflow, serialized into byte array.
- Parameters:
- b : List[int]
ConfigProto from tensorflow, serialized into byte array
- setMinOutputLength(value)[source]
Sets minimum length of the sequence to be generated.
- Parameters:
- value : int
Minimum length of the sequence to be generated
- setMaxOutputLength(value)[source]
Sets maximum length of output text.
- Parameters:
- value : int
Maximum length of output text
- setDoSample(value)[source]
Sets whether or not to use sampling; greedy decoding is used otherwise.
- Parameters:
- value : bool
Whether or not to use sampling; use greedy decoding otherwise
- setTemperature(value)[source]
Sets the value used to modulate the next token probabilities.
- Parameters:
- value : float
The value used to modulate the next token probabilities
- setTopK(value)[source]
Sets the number of highest probability vocabulary tokens to keep for top-k-filtering.
- Parameters:
- value : int
Number of highest probability vocabulary tokens to keep
- setTopP(value)[source]
Sets the top cumulative probability for vocabulary tokens.
If set to float < 1, only the most probable tokens with probabilities that add up to topP or higher are kept for generation.
- Parameters:
- value : float
Cumulative probability for vocabulary tokens
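To make the topP rule concrete, here is a hedged, framework-independent sketch of nucleus (top-p) filtering; it is illustrative only and not Spark NLP's internal implementation:

>>> def top_p_filter(probs, top_p=0.9):
...     # Sort tokens by probability, highest first
...     ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
...     kept, cumulative = [], 0.0
...     for token, p in ranked:
...         kept.append(token)
...         cumulative += p
...         if cumulative >= top_p:  # stop once the probability mass threshold is reached
...             break
...     return kept
>>> top_p_filter({"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}, 0.9)
['a', 'b', 'c']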
- setRepetitionPenalty(value)[source]
Sets the parameter for repetition penalty. 1.0 means no penalty.
- Parameters:
- value : float
The repetition penalty
References
See CTRL: A Conditional Transformer Language Model for Controllable Generation for more details.
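As a rough sketch of how a CTRL-style penalty works (illustrative only; the actual decoding logic lives in the Spark NLP backend), logits of previously generated tokens are rescaled so those tokens become less likely to be repeated:

>>> def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
...     # Rescale logits of already-generated tokens (CTRL-style; illustrative)
...     for token_id in set(generated_ids):
...         if logits[token_id] > 0:
...             logits[token_id] /= penalty  # shrink positive logits
...         else:
...             logits[token_id] *= penalty  # push negative logits further down
...     return logits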
- setNoRepeatNgramSize(value)[source]
Sets size of n-grams that can only occur once.
If set to int > 0, all ngrams of that size can only occur once.
- Parameters:
- value : int
N-grams of this size can only occur once
- setBeamSize(value)[source]
Sets the number of beams to use for beam search.
- Parameters:
- value : int
The number of beams to use for beam search
- setStopTokenIds(value)[source]
Sets a list of token ids which are considered as stop tokens in the decoder’s output.
- Parameters:
- value : List[int]
The token ids to be considered as stop tokens
- static loadSavedModel(folder, spark_session, use_openvino=False)[source]
Loads a locally saved model.
- Parameters:
- folder : str
Folder of the saved model
- spark_session : pyspark.sql.SparkSession
The current SparkSession
- use_openvino : bool, optional
Whether to load the model with the OpenVINO engine, by default False
- Returns:
- LLAMA3Transformer
The restored model
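A hedged usage sketch for loading a locally exported model; the folder path is a placeholder and assumes the model was already exported in a Spark NLP-compatible format:

>>> from sparknlp.annotator import LLAMA3Transformer
>>> llama3 = LLAMA3Transformer.loadSavedModel("/path/to/exported/llama3", spark) \
...     .setInputCols(["document"]) \
...     .setOutputCol("generation")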
- static pretrained(name='llama_3_7b_chat_hf_int4', lang='en', remote_loc=None)[source]
Downloads and loads a pretrained model.
- Parameters:
- name : str, optional
Name of the pretrained model, by default "llama_3_7b_chat_hf_int4"
- lang : str, optional
Language of the pretrained model, by default "en"
- remote_loc : str, optional
Optional remote address of the resource, by default None. Will use Spark NLP's repositories otherwise.
- Returns:
- LLAMA3Transformer
The restored model
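For reference, a hedged sketch of requesting a specific pretrained model by name and language (here the default model name from the signature above):

>>> llama3 = LLAMA3Transformer.pretrained("llama_3_7b_chat_hf_int4", lang="en") \
...     .setInputCols(["document"]) \
...     .setOutputCol("generation")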