Library Reference

ML2P Core

ML2P core utilities.

Models

class ml2p.core.Model

A holder for a dataset generator, a trainer and a predictor.

Sub-classes should:

Set the attribute DATASET_GENERATOR to a ModelDatasetGenerator sub-class.
Set the attribute TRAINER to a ModelTrainer sub-class.
Set the attribute PREDICTOR to a ModelPredictor sub-class.
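
The wiring described above can be sketched as follows. This is only an illustration: the base classes are local stand-ins so the snippet runs without ML2P installed (a real project would import them from ml2p.core), and the names MyModel, MyDatasetGenerator, MyTrainer and MyPredictor are hypothetical.

```python
# Stand-ins for the ml2p.core base classes. Assumption: they exist here only
# so the sketch is self-contained; a real project imports them from ml2p.core.
class ModelDatasetGenerator:
    def __init__(self, env):
        self.env = env


class ModelTrainer:
    def __init__(self, env):
        self.env = env


class ModelPredictor:
    def __init__(self, env):
        self.env = env


class Model:
    DATASET_GENERATOR = None
    TRAINER = None
    PREDICTOR = None


# Hypothetical project-specific sub-classes.
class MyDatasetGenerator(ModelDatasetGenerator):
    def generate(self):
        pass  # read, process and store the dataset here


class MyTrainer(ModelTrainer):
    def train(self):
        pass  # read data, train and write the model here


class MyPredictor(ModelPredictor):
    def result(self, data):
        return {"prediction": None}  # placeholder prediction


class MyModel(Model):
    # The three attributes the Model documentation asks sub-classes to set.
    DATASET_GENERATOR = MyDatasetGenerator
    TRAINER = MyTrainer
    PREDICTOR = MyPredictor
```

The model class only holds references to the three classes; it does not instantiate them itself in this sketch.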

class ml2p.core.ModelTrainer(env)

An interface that allows ml2p-docker to train models within SageMaker.

train()

Train the model.

This method should:

Read training data (using self.env to determine where to read data from).
Train the model.
Write the model out (using self.env to determine where to write the model to).
Write out any validation or model analysis alongside the model.
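
The four steps above might look roughly like the sketch below for a toy "predict the mean" model. FakeEnv is a hypothetical stand-in for self.env, and its data_dir and model_dir attributes are assumptions made for illustration; they are not part of the documented environment interface.

```python
import json
import tempfile
from pathlib import Path


class FakeEnv:
    """Hypothetical stand-in for self.env; attribute names are assumptions."""

    def __init__(self, root):
        self.data_dir = Path(root) / "data"
        self.model_dir = Path(root) / "model"


class MeanTrainer:
    """Toy trainer that follows the four steps listed for train()."""

    def __init__(self, env):
        self.env = env

    def train(self):
        # 1. Read training data from the location the environment points to.
        values = json.loads((self.env.data_dir / "train.json").read_text())
        # 2. Train the model (here: compute the mean of the values).
        model = {"mean": sum(values) / len(values)}
        # 3. Write the model out to the location the environment points to.
        self.env.model_dir.mkdir(parents=True, exist_ok=True)
        (self.env.model_dir / "model.json").write_text(json.dumps(model))
        # 4. Write a small validation report alongside the model.
        report = {
            "n_samples": len(values),
            "max_abs_error": max(abs(v - model["mean"]) for v in values),
        }
        (self.env.model_dir / "validation.json").write_text(json.dumps(report))


with tempfile.TemporaryDirectory() as root:
    env = FakeEnv(root)
    env.data_dir.mkdir(parents=True)
    (env.data_dir / "train.json").write_text(json.dumps([1.0, 2.0, 3.0]))
    MeanTrainer(env).train()
    saved_model = json.loads((env.model_dir / "model.json").read_text())
    saved_report = json.loads((env.model_dir / "validation.json").read_text())
```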

class ml2p.core.ModelPredictor(env)

An interface that allows ml2p-docker to make predictions from a model within SageMaker.

batch_invoke(data)

Invokes the model on a batch of input data and returns the full result for each instance.

Parameters: data (dict) – The batch of input data the model is being invoked with.
Return type: list
Returns: The result as a list of dictionaries.

By default this method returns a list of dictionaries, each containing:

metadata: The result of calling .metadata().

result: The result of calling .batch_result(data).

batch_result(data)

Make a batch prediction given a batch of input data.

Parameters: data (dict) – The batch of input data to make a prediction from.
Return type: list
Returns: The list of predictions, one for each instance of the input data.

Sub-classes can override this method to improve the performance of batch predictions.
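
As an illustration of why overriding batch_result can help, the sketch below compares a per-instance loop with a version that handles the whole batch in one pass. The classes are stand-ins rather than the ml2p.core.ModelPredictor implementation, and the "instances" key used for the batch layout is an assumption.

```python
class DoublingPredictor:
    """Stand-in predictor; not the ml2p.core.ModelPredictor implementation."""

    def result(self, data):
        # Per-instance prediction.
        return {"doubled": data["x"] * 2}

    def batch_result(self, data):
        # Naive batch prediction: one result() call per instance.
        # Assumption: the batch stores its instances under "instances".
        return [self.result(datum) for datum in data["instances"]]


class BatchedDoublingPredictor(DoublingPredictor):
    def batch_result(self, data):
        # Overridden version: gather all inputs first, then compute them
        # together (a real model would make a single vectorised call here).
        xs = [datum["x"] for datum in data["instances"]]
        doubled = [x * 2 for x in xs]
        return [{"doubled": value} for value in doubled]


batch = {"instances": [{"x": 1}, {"x": 2}, {"x": 3}]}
naive = DoublingPredictor().batch_result(batch)
batched = BatchedDoublingPredictor().batch_result(batch)
```

Both versions return one prediction per instance, in input order; only the way the work is grouped differs.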

invoke(data)

Invokes the model and returns the full result.

Parameters: data (dict) – The input data the model is being invoked with.
Return type: dict
Returns: The result as a dictionary.

By default this method returns a dictionary containing:

metadata: The result of calling .metadata().

result: The result of calling .result(data).
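
The default shape of the invoke result described above can be sketched with a small stand-in class. SketchPredictor is hypothetical and only mirrors the documented composition of metadata() and result(data); it is not the library's code.

```python
import time


class SketchPredictor:
    """Hypothetical predictor showing how invoke() combines its two parts."""

    def metadata(self):
        # Placeholder version string; time.time() is a POSIX timestamp.
        return {"model_version": "example-version", "timestamp": time.time()}

    def result(self, data):
        # Toy prediction: the sum of the input values.
        return {"total": sum(data["values"])}

    def invoke(self, data):
        # The full result pairs the metadata with the prediction itself.
        return {"metadata": self.metadata(), "result": self.result(data)}


response = SketchPredictor().invoke({"values": [1, 2, 3]})
```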

metadata()

Return metadata for a prediction that is about to be made.

Return type: dict
Returns: The metadata as a dictionary.

By default this method returns a dictionary containing:

model_version: The ML2P_MODEL_VERSION (str).

timestamp: The UTC POSIX timestamp in seconds (float).
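
A minimal sketch of a metadata dictionary with these two keys is shown below. The function name sketch_metadata and the version string are hypothetical; the real method reads the model version from the environment.

```python
import datetime


def sketch_metadata(model_version):
    """Build a metadata dictionary with the two default keys."""
    now = datetime.datetime.now(datetime.timezone.utc)
    return {
        "model_version": model_version,
        # .timestamp() on an aware datetime gives seconds since the POSIX epoch.
        "timestamp": now.timestamp(),
    }


meta = sketch_metadata("example-model-version")

# The float can be converted back into a UTC datetime when needed.
as_datetime = datetime.datetime.fromtimestamp(
    meta["timestamp"], tz=datetime.timezone.utc
)
```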

record_invoke(datum, prediction)

Store an invocation of the endpoint in the ML2P project S3 bucket.

Parameters

datum (dict) – The dictionary of input values passed when invoking the endpoint.
prediction (dict) – The prediction returned for datum by this predictor.

record_invoke_id(datum, prediction)

Return an id for an invocation record.

Parameters

datum (dict) – The dictionary of input values passed when invoking the endpoint.
prediction (dict) – The prediction returned for datum by this predictor.

Return type: dict
Returns: An ordered dictionary of key-value pairs that make up the unique identifier for the invocation request.

By default this method returns a dictionary containing the following:

“ts”: an ISO8601 formatted UTC timestamp.

“uuid”: a UUID4 unique identifier.

Sub-classes may override this method to return their own identifiers, but including these default identifiers is recommended.

The name of the record in S3 is determined by joining each key and its value with a dash (“-”) and then joining the resulting pairs with a double dash (“--”).
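
The naming rule can be illustrated with the sketch below. The helper names sketch_record_invoke_id and sketch_record_name are hypothetical, and the code only shows how the key-value pairs are joined; any prefix or file extension the library adds to the S3 key is not shown.

```python
import collections
import datetime
import uuid


def sketch_record_invoke_id():
    """Return the two default identifiers as an ordered dictionary."""
    now = datetime.datetime.now(datetime.timezone.utc)
    return collections.OrderedDict(
        [
            ("ts", now.isoformat()),      # ISO 8601 formatted UTC timestamp
            ("uuid", str(uuid.uuid4())),  # random UUID4
        ]
    )


def sketch_record_name(invoke_id):
    """Join key and value with "-" and the resulting pairs with "--"."""
    return "--".join(
        "{}-{}".format(key, value) for key, value in invoke_id.items()
    )


# A fixed identifier makes the resulting name easy to inspect.
fixed_id = collections.OrderedDict(
    [("ts", "2020-01-02T03:04:05"), ("uuid", "1234")]
)
name = sketch_record_name(fixed_id)  # "ts-2020-01-02T03:04:05--uuid-1234"

generated_name = sketch_record_name(sketch_record_invoke_id())
```

A sub-class that adds its own identifiers would extend the ordered dictionary, and the extra pairs would be appended to the name in insertion order.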

result(data)

Make a prediction given the input data.

Parameters: data (dict) – The input data to make a prediction from.
Return type: dict
Returns: The prediction result as a dictionary.

setup()

Called once before any calls to .predict(…) are made.

This method should:

Load the model (using self.env to determine where to read the model from).
Allocate any other resources needed in order to make predictions.

teardown()

Called once after all calls to .predict(…) have ended.

This method should:

Clean up any resources acquired in .setup().
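
The setup and teardown lifecycle can be sketched as below. LifecyclePredictor is a hypothetical stand-in, and the in-memory "model" replaces what would normally be loaded from a location given by self.env.

```python
class LifecyclePredictor:
    """Hypothetical predictor illustrating the setup/teardown lifecycle."""

    def __init__(self, env):
        self.env = env
        self.model = None

    def setup(self):
        # Load the model once, before any predictions are made. A real
        # predictor would read it from a location given by self.env.
        self.model = {"offset": 10}

    def result(self, data):
        return {"value": data["x"] + self.model["offset"]}

    def teardown(self):
        # Release whatever setup() acquired.
        self.model = None


predictor = LifecyclePredictor(env=None)
predictor.setup()
try:
    outputs = [predictor.result({"x": x}) for x in (1, 2)]
finally:
    # Runs even if a prediction raises, so resources are always released.
    predictor.teardown()
```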

class ml2p.core.ModelDatasetGenerator(env)

An interface that allows ml2p-docker to generate a dataset within SageMaker.

generate()

Generates and stores a dataset to S3.

This method should:

Read data from source (e.g. S3, Redshift, …).
Process the dataset.
Write the dataset to S3 (using self.env to determine where to write the data to).

upload_to_s3(file_path)

Uploads the file to the S3 dataset folder.

Parameters: file_path (str) – The path of the file to upload to S3.
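
A rough sketch of how generate() and upload_to_s3() might fit together is shown below. SketchDatasetGenerator is hypothetical: its upload_to_s3 only records the file name instead of contacting S3, and the hard-coded rows stand in for data read from a real source.

```python
import csv
import tempfile
from pathlib import Path


class SketchDatasetGenerator:
    """Hypothetical generator; upload_to_s3 here is a recording stand-in."""

    def __init__(self, env):
        self.env = env
        self.uploaded = []

    def upload_to_s3(self, file_path):
        # Stand-in: remember the file name rather than uploading to S3.
        self.uploaded.append(Path(file_path).name)

    def generate(self):
        # 1. Read data from a source (hard-coded rows in this sketch).
        rows = [{"x": 1, "y": 2}, {"x": 3, "y": 4}]
        # 2. Process the dataset (add a derived column).
        processed = [dict(row, total=row["x"] + row["y"]) for row in rows]
        # 3. Write the dataset to a local file and hand it to upload_to_s3.
        with tempfile.TemporaryDirectory() as tmp:
            path = Path(tmp) / "dataset.csv"
            with path.open("w", newline="") as handle:
                writer = csv.DictWriter(handle, fieldnames=["x", "y", "total"])
                writer.writeheader()
                writer.writerows(processed)
            self.upload_to_s3(str(path))


generator = SketchDatasetGenerator(env=None)
generator.generate()
```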

SageMakerEnv

class ml2p.core.SageMakerEnv(ml_folder, environ=None)

An interface to the SageMaker docker environment.

Attributes that are expected to be available in both training and serving environments:

env_type - Whether this is a training, serving or local environment (type: ml2p.core.SageMakerEnvType).
project - The ML2P project name (type: str).
model_cls - The fully dotted Python name of the ml2p.core.Model class to be used for training and prediction (type: str). This may be None if the docker image itself specifies the name with ml2p-docker --model ….
s3 - The URL of the project S3 bucket (type: ml2p.core.S3URL).

Attributes that are only expected to be available while training (and that will be None when serving the model):

training_job_name - The full job name of the training job (type: str).

Attributes that are only expected to be available while serving the model (and that will be None when training the model):

model_version - The full job name of the deployed model, or None during training (type: str).
record_invokes - Whether to store a record of each invocation of the endpoint in S3 (type: bool).

In the training environment settings are loaded from hyperparameters stored by ML2P when the training job is created.

In the serving environment settings are loaded from environment variables stored by ML2P when the model is created.
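
As a loose illustration of reading a serving setting from environment variables, the sketch below looks up ML2P_MODEL_VERSION, the variable named in the metadata() documentation. The function name is hypothetical, and defaulting environ to os.environ is an assumption about what the environ=None argument means; the real class reads more settings than this.

```python
import os


def sketch_load_serving_settings(environ=None):
    """Read the model version from a mapping of environment variables."""
    if environ is None:
        # Assumption: fall back to the process environment by default.
        environ = os.environ
    # .get() yields None when the variable is absent, matching the
    # documented behaviour of attributes that are unavailable.
    return {"model_version": environ.get("ML2P_MODEL_VERSION")}


serving = sketch_load_serving_settings({"ML2P_MODEL_VERSION": "example-version"})
missing = sketch_load_serving_settings({})
```

Passing an explicit mapping, as done here, keeps the lookup easy to test without modifying the real process environment.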

class ml2p.core.SageMakerEnvType

The type of SageMaker environment.

DATASET = 'dataset'

TRAIN = 'train'

SERVE = 'serve'

LOCAL = 'local'
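
The four values above can be mirrored in a small stand-in enumeration to show how code might branch on the environment type. SketchEnvType and describe are hypothetical names, and modelling the type as a Python enum.Enum is an assumption.

```python
import enum


class SketchEnvType(enum.Enum):
    """Stand-in mirroring the four documented environment types."""

    DATASET = "dataset"
    TRAIN = "train"
    SERVE = "serve"
    LOCAL = "local"


def describe(env_type):
    """Return a short description of what runs in the given environment."""
    descriptions = {
        SketchEnvType.DATASET: "generating a dataset",
        SketchEnvType.TRAIN: "training a model",
        SketchEnvType.SERVE: "serving predictions",
        SketchEnvType.LOCAL: "running locally",
    }
    return descriptions[env_type]


# Enum members can be looked up from their string values.
train_type = SketchEnvType("train")
```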

LocalEnv

class ml2p.core.LocalEnv(ml_folder, cfg, session=None)

An interface to a local dummy of the SageMaker environment.

Parameters

ml_folder (str) – The directory the environment's files are stored in. An error is raised if this directory does not exist. Files and folders are created within this directory as needed.
cfg (str) – The path to an ml2p.yml configuration file.
session (boto3.session.Session) – A boto3 session object. May be None if downloading files from S3 is not required.

Attributes that are expected to be available in the local environment:

env_type - Whether this is a training, serving or local environment (type: ml2p.core.SageMakerEnvType).
project - The ML2P project name (type: str).
s3 - The URL of the project S3 bucket (type: ml2p.core.S3URL).
model_version - The fixed value “local” (type: str).

In the local environment settings are loaded directly from the ML2P configuration file.

clean_model_folder()

Remove and recreate the model folder.

This is useful to run before training a model if one wants to ensure that the model folder is empty beforehand.

download_dataset(dataset)

Download the given dataset from S3 into the local environment.

Parameters: dataset (str) – The name of the dataset in S3 to download.

download_model(training_job)

Download the given trained model from S3 and unpack it into the local environment.

Parameters: training_job (str) – The name of the training job whose model should be downloaded.

S3URL

class ml2p.core.S3URL(s3folder)

A friendly interface to an S3 URL.

bucket()

Return the bucket of the S3 URL.

Return type: str
Returns: The bucket of the S3 URL.

path(suffix)

Return the base path of the S3 URL followed by a ‘/’ and the given suffix.

Parameters: suffix (str) – The suffix to append.
Return type: str
Returns: The path with the suffix appended.

url(suffix='')

Return the S3 URL followed by a ‘/’ and the given suffix.

Parameters: suffix (str) – The suffix to append. Default: “”.
Return type: str
Returns: The URL with the suffix appended.
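
The behaviour of bucket(), path() and url() can be approximated with the sketch below. SketchS3URL is a hypothetical re-implementation written for illustration, not the library's code; in particular, the handling of leading and trailing slashes is an assumption.

```python
import urllib.parse


class SketchS3URL:
    """Hypothetical stand-in approximating the documented S3URL methods."""

    def __init__(self, s3folder):
        self._parsed = urllib.parse.urlparse(s3folder)
        # Assumption: surrounding slashes on the base path are dropped.
        self._root = self._parsed.path.strip("/")

    def bucket(self):
        # For "s3://bucket/path" the bucket is the network location.
        return self._parsed.netloc

    def path(self, suffix):
        # Base path, then "/", then the suffix.
        return (self._root + "/" + suffix.lstrip("/")).lstrip("/")

    def url(self, suffix=""):
        return "s3://{}/{}".format(self.bucket(), self.path(suffix))


s3 = SketchS3URL("s3://my-bucket/my-project/")
bucket = s3.bucket()                      # "my-bucket"
model_path = s3.path("models/a.tar.gz")   # "my-project/models/a.tar.gz"
dataset_url = s3.url("datasets")          # "s3://my-bucket/my-project/datasets"
```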