sparknlp.logging.comet#
Package that contains classes for integration with Comet.
Module Contents#
Classes#
CometLogger | Logger class for Comet integration
- class CometLogger(workspace=None, project_name=None, comet_mode=None, experiment_id=None, tags=None, **experiment_kwargs)[source]#
Logger class for Comet integration
Comet is a meta machine learning platform designed to help AI practitioners and teams build reliable machine learning models for real-world applications by streamlining the machine learning model lifecycle. By leveraging Comet, users can track, compare, explain and reproduce their machine learning experiments.
To log a Spark NLP annotator, it will need output logs enabled and a logs path set (via setEnableOutputLogs and setOutputLogsPath), as the CometLogger reads the log file generated during the training process.
For more examples, see the Examples section below.
- Parameters:
- workspace : str, optional
Name of the workspace in Comet, by default None
- project_name : str, optional
Name of the project in Comet, by default None
- comet_mode : str, optional
Mode of logging, by default None. If set to "offline" then offline mode will be used, otherwise online.
- experiment_id : str, optional
Id of the experiment, if it is reused, by default None
- tags : List[str], optional
List of tags for the experiment, by default None
- Raises:
- ImportError
If the package comet-ml is not installed
Examples
Metrics produced while training an annotator can be logged, for example, like so:
>>> import sparknlp
>>> from sparknlp.base import *
>>> from sparknlp.annotator import *
>>> from sparknlp.logging.comet import CometLogger
>>> spark = sparknlp.start()
To run an online experiment, the logger is defined like so:
>>> OUTPUT_LOG_PATH = "./run"
>>> logger = CometLogger()
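To log offline instead, set comet_mode accordingly. A minimal sketch, assuming the extra keyword arguments are forwarded to comet_ml's OfflineExperiment:
>>> offline_logger = CometLogger(comet_mode="offline", offline_directory="./comet_offline")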
Then the experiment can start like so:
>>> document = DocumentAssembler() \
...     .setInputCol("text") \
...     .setOutputCol("document")
>>> embds = UniversalSentenceEncoder.pretrained() \
...     .setInputCols("document") \
...     .setOutputCol("sentence_embeddings")
>>> multiClassifier = MultiClassifierDLApproach() \
...     .setInputCols("sentence_embeddings") \
...     .setOutputCol("category") \
...     .setLabelColumn("labels") \
...     .setBatchSize(128) \
...     .setLr(1e-3) \
...     .setThreshold(0.5) \
...     .setShufflePerEpoch(False) \
...     .setEnableOutputLogs(True) \
...     .setOutputLogsPath(OUTPUT_LOG_PATH) \
...     .setMaxEpochs(1)
>>> logger.monitor(logdir=OUTPUT_LOG_PATH, model=multiClassifier)
>>> trainDataset = spark.createDataFrame(
...     [("Nice.", ["positive"]), ("That's bad.", ["negative"])],
...     schema=["text", "labels"],
... )
>>> pipeline = Pipeline(stages=[document, embds, multiClassifier])
>>> pipeline.fit(trainDataset)
>>> logger.end()
If you are using a Jupyter notebook, it is possible to display the live web interface with
>>> logger.experiment.display(tab='charts')
- Attributes:
- experiment : comet_ml.Experiment
Object representing the Comet experiment
- log_pipeline_parameters(pipeline, stages=None)[source]#
Iterates over the different stages in a pyspark PipelineModel object and logs the parameters to Comet.
- Parameters:
- pipeline : pyspark.ml.PipelineModel
PipelineModel object
- stages : List[str], optional
Names of the stages of the pipeline to include, by default None (logs all)
Examples
The pipeline model contains the Spark NLP annotators that were fitted to a DataFrame.
>>> logger.log_pipeline_parameters(pipeline_model)
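To log only a subset of the pipeline, stage names can be passed explicitly. A minimal sketch; the stage name below is hypothetical and depends on the fitted pipeline:
>>> logger.log_pipeline_parameters(pipeline_model, stages=["MultiClassifierDLModel"])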
- log_visualization(html, name='viz.html')[source]#
Uploads a NER visualization from Spark NLP Display to Comet.
- Parameters:
- html : str
HTML of the Spark NLP Display visualization
- name : str, optional
Name for the visualization in Comet, by default "viz.html"
Examples
This example has NER chunks extracted in the column "ner_chunk" (NER extracted by e.g. a NerDLModel and converted by a NerConverter).
>>> from sparknlp_display import NerVisualizer
>>> logger = CometLogger()
>>> for idx, result in enumerate(results.collect()):
...     viz = NerVisualizer().display(
...         result=result,
...         label_col='ner_chunk',
...         document_col='document',
...         return_html=True
...     )
...     logger.log_visualization(viz, name=f'viz-{idx}.html')
- log_metrics(metrics, step=None, epoch=None, prefix=None)[source]#
Submits logs of evaluation metrics.
- Parameters:
- metrics : dict
Dictionary with key-value pairs corresponding to the measured metrics and their values
- step : int, optional
Used to associate a specific step, by default None
- epoch : int, optional
Used to associate a specific epoch, by default None
- prefix : str, optional
Name prefix for this metric, by default None. This can be used, for example, to identify different features by name.
Examples
In this example, scikit-learn is used to compute the metrics.
>>> from sklearn.preprocessing import MultiLabelBinarizer
>>> from sklearn.metrics import classification_report
>>> prediction = model.transform(testDataset)
>>> preds_df = prediction.select('labels', 'category.result').toPandas()
>>> mlb = MultiLabelBinarizer()
>>> y_true = mlb.fit_transform(preds_df['labels'])
>>> y_pred = mlb.transform(preds_df['result'])
>>> report = classification_report(y_true, y_pred, output_dict=True)
Iterate over the report and log the metrics:
>>> for key, value in report.items():
...     logger.log_metrics(value, prefix=key)
>>> logger.end()
If you are using Spark NLP in a notebook, then you can display the metrics directly with
>>> logger.experiment.display(tab='metrics')
- log_parameters(parameters, step=None)[source]#
Logs a dictionary (or dictionary-like object) of multiple parameters.
- Parameters:
- parameters : dict
Parameters in a key : value form
- step : int, optional
Used to associate a specific step, by default None
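Examples
A minimal sketch; the parameter names and values below are illustrative only:
>>> logger.log_parameters({"batch_size": 128, "lr": 1e-3})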
- log_completed_run(log_file_path)[source]#
Submit logs of training metrics after a run has completed.
- Parameters:
- log_file_path : str
Path to the log file containing training metrics
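Examples
A minimal sketch; the log file name is hypothetical and depends on the annotator and run:
>>> logger.log_completed_run("./run/annotator_training.log")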
- log_asset(asset_path, metadata=None, step=None)[source]#
Uploads an asset to Comet.
- Parameters:
- asset_path : str
Path to the asset
- metadata : dict, optional
Some additional data to attach to the asset. Must be a JSON-encodable dict, by default None
- step : int, optional
Used to associate a specific step, by default None
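Examples
A minimal sketch; the file path and metadata are hypothetical:
>>> logger.log_asset("./run/training.log", metadata={"run": 1})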
- log_asset_data(asset, name, overwrite=False, metadata=None, step=None)[source]#
Uploads the given data (str, binary, or JSON) to Comet.
- Parameters:
- asset : str or bytes or dict
Data to be saved as an asset
- name : str
A custom file name to be displayed
- overwrite : bool, optional
If True, will overwrite all existing assets with the same name, by default False
- metadata : dict, optional
Some additional data to attach to the asset data. Must be a JSON-encodable dict, by default None
- step : int, optional
Used to associate a specific step, by default None
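Examples
A minimal sketch; the asset content and name are illustrative only:
>>> logger.log_asset_data({"train_size": 2, "epochs": 1}, name="run-info.json")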
- monitor(logdir, model, interval=10)[source]#
Monitors the training of the model and submits logs to Comet at the given interval.
To log a Spark NLP annotator, it will need output logs enabled and a logs path set (via setEnableOutputLogs and setOutputLogsPath), as the CometLogger reads the log file generated during the training process.
If you are not able to monitor the live training, you can still log the training at the end with log_completed_run().
- Parameters:
- logdir : str
Path to the output of the logs
- model : AnnotatorApproach
Annotator to monitor
- interval : int, optional
Interval for refreshing, by default 10
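Examples
A condensed sketch reusing the names from the class-level example above; monitor() is attached before the pipeline is fitted, so the log file can be read while training runs:
>>> logger.monitor(logdir=OUTPUT_LOG_PATH, model=multiClassifier)
>>> pipeline.fit(trainDataset)
>>> logger.end()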