API

Datafile

class octue.resources.datafile.Datafile(path, local_path=None, cloud_path=None, timestamp=None, mode='r', update_metadata=True, ignore_stored_metadata=False, id=None, tags=None, labels=None, **kwargs)

A representation of a data file with metadata.

Metadata consists of id, timestamp, tags, and labels, available as attributes on the instance. On instantiation, metadata for the file is obtained from its stored location (the corresponding cloud object metadata or a local .octue metadata file) if present. Metadata values can alternatively be passed as arguments at instantiation but will only be used if stored metadata cannot be found - i.e. stored metadata always takes precedence (use the ignore_stored_metadata parameter to override this behaviour). Stored metadata can be updated after instantiation using the update_metadata method.

Parameters
  • path (str|None) – The path of this file locally or in the cloud, which may include folders or subfolders, within the dataset

  • local_path (str|None) – If a cloud path is given as the path parameter, this is the path to an existing local file that is known to be in sync with the cloud object

  • cloud_path (str|None) – If a local path is given for the path parameter, this is a cloud path to keep in sync with the local file

  • timestamp (datetime.datetime|int|float|None) – A posix timestamp associated with the file, in seconds since epoch, typically when it was created but could relate to a relevant time point for the data

  • mode (str) – if using as a context manager, open the datafile for reading/editing in this mode (the mode options are the same as for the builtin open function)

  • update_metadata (bool) – if using as a context manager and this is True, update the stored metadata of the datafile when the context is exited

  • ignore_stored_metadata (bool) – if True, ignore any metadata stored for this datafile locally or in the cloud and use whatever is given at instantiation

  • id (str) – The Universally Unique ID of this file (checked to be valid if not None, generated if None)

  • tags (dict|octue.resources.tag.TagDict|None) – key-value pairs with string keys conforming to the Octue tag format (see TagDict)

  • labels (iter(str)|octue.resources.label.LabelSet|None) – Space-separated string of labels relevant to this file

Return None
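The precedence rule described above can be sketched with plain dictionaries. This is an illustration only, not the library's implementation: resolve_metadata is a hypothetical helper, and the sketch assumes stored metadata replaces the given metadata wholesale rather than field by field.

```python
def resolve_metadata(stored, given, ignore_stored_metadata=False):
    """Choose which metadata to use for a file.

    `stored` is the metadata found at the file's stored location (None if
    nothing was found); `given` is the metadata passed at instantiation.
    Hypothetical helper, not part of octue.
    """
    if stored is not None and not ignore_stored_metadata:
        # Stored metadata takes precedence over anything passed in.
        return dict(stored)

    # No stored metadata (or it is being ignored): use the given values.
    return dict(given)


stored = {"tags": {"site": "north"}, "labels": ["raw"]}
given = {"tags": {"site": "south"}, "labels": ["clean"]}

default_choice = resolve_metadata(stored, given)
overridden_choice = resolve_metadata(stored, given, ignore_stored_metadata=True)
no_stored_choice = resolve_metadata(None, given)
```

With these inputs, default_choice equals the stored metadata, while the other two equal the given metadata.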

classmethod deserialise(serialised_datafile, from_string=False)

Deserialise a Datafile from a dictionary or JSON string.

Parameters
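  • serialised_datafile (dict|str) – the serialised datafile, as a dictionary or a JSON string

  • from_string (bool) – if True, deserialise from a JSON string; otherwise, deserialise from a dictionary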

Return Datafile

property name

Get the name of the datafile.

Return str

property extension

Get the extension of the datafile.

Return str

property cloud_path

Get the cloud path of the datafile.

Return str|None

property cloud_hash_value

Get the hash value of the datafile according to its cloud file.

Return str|None

None if no cloud metadata is available

property timestamp

Get the timestamp of the datafile.

Return float

property posix_timestamp

Get the timestamp of the datafile in posix format.

Return float
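Since the timestamp parameter accepts a datetime, an int, or a float, a conversion to a posix float is implied. A minimal sketch of such a conversion follows; to_posix_timestamp is a hypothetical helper and the library may handle edge cases (such as naive datetimes) differently.

```python
import datetime


def to_posix_timestamp(timestamp):
    """Convert a datetime, int, or float into seconds since the epoch as a float.

    Hypothetical helper, not part of octue.
    """
    if timestamp is None:
        return None

    if isinstance(timestamp, datetime.datetime):
        # Note: a naive datetime is interpreted in local time by .timestamp().
        return timestamp.timestamp()

    return float(timestamp)


one_day = to_posix_timestamp(datetime.datetime(1970, 1, 2, tzinfo=datetime.timezone.utc))
```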

property size_bytes

Get the size of the datafile in bytes.

Return float|None

property exists_locally

Return True if the file exists locally.

Return bool

property path

Alias to the local_path property.

Return str

property local_path

Get the local path for the datafile, downloading it from the cloud to a temporary file if necessary.

Return str

The local path of the datafile.

property open

Open the datafile for reading/writing. Usage is the same as for Python’s built-in open function, except that it can only be used as a context manager, e.g.

with datafile.open("w") as f:
    f.write("some data")

upload(cloud_path=None, update_cloud_metadata=True)

Upload a datafile to Google Cloud Storage.

Parameters
  • cloud_path (str|None) – full path to cloud storage location to store datafile at (e.g. gs://bucket_name/path/to/file.csv)

  • update_cloud_metadata (bool) – if True, update the metadata of the datafile in the cloud at upload time

Return str

gs:// path for datafile

download(local_path=None)

Download the file from the cloud to the given local path or a temporary path if none is given.

Parameters

local_path (str|None) – The local path to download the datafile to. A temporary path is used if none is given.

Raises

octue.exceptions.CloudLocationNotSpecified – If the datafile does not exist in the cloud

Return str

The path to the local file

metadata(include_id=True, include_sdk_version=True, use_octue_namespace=True)

Get the datafile’s metadata in a serialised form (i.e. the attributes id, timestamp, labels, tags, and sdk_version).

Parameters
  • include_id (bool) – if True, include the ID of the datafile

  • include_sdk_version (bool) – if True, include the octue version that instantiated the datafile in the metadata

  • use_octue_namespace (bool) – if True, prefix metadata names with “octue__”

Return dict
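The effect of use_octue_namespace can be illustrated as follows. This is a sketch of the prefixing described above, with namespace_metadata as a hypothetical helper; the values in the real metadata dictionary may be encoded differently.

```python
def namespace_metadata(metadata, use_octue_namespace=True):
    """Return a copy of the metadata, optionally prefixing each name with "octue__".

    Hypothetical helper, not part of octue.
    """
    if not use_octue_namespace:
        return dict(metadata)

    return {f"octue__{name}": value for name, value in metadata.items()}


namespaced = namespace_metadata({"id": "abc", "labels": ["raw"]})
plain = namespace_metadata({"id": "abc"}, use_octue_namespace=False)
```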

update_metadata()

Using the datafile instance’s in-memory metadata, update its cloud metadata (if the datafile is cloud-based) or its local metadata file (if the datafile is local).

Return None

update_cloud_metadata()

Update the cloud metadata for the datafile.

Return None

update_local_metadata()

Create or update the local octue metadata file with the datafile’s metadata.

Return None

generate_signed_url(expiration=datetime.timedelta(days=7))

Generate a signed URL for the datafile.

Parameters

expiration (datetime.datetime|datetime.timedelta) – the amount of time or date after which the URL should expire

Return str

the signed URL for the datafile
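Because expiration may be either a duration or an absolute date, it has to be normalised at some point. A minimal sketch of one way to do that is given below; resolve_expiration and its now parameter are hypothetical and only illustrate the two accepted forms.

```python
import datetime


def resolve_expiration(expiration=datetime.timedelta(days=7), now=None):
    """Turn a timedelta or datetime into the absolute datetime at which a URL expires.

    Hypothetical helper, not part of octue. `now` exists only to make the
    sketch deterministic.
    """
    now = now or datetime.datetime.now(datetime.timezone.utc)

    if isinstance(expiration, datetime.timedelta):
        # A duration is measured from the current time.
        return now + expiration

    # An absolute datetime is used as given.
    return expiration


start = datetime.datetime(2024, 1, 1, tzinfo=datetime.timezone.utc)
default_expiry = resolve_expiration(now=start)
```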

add_labels(*args)

Add one or more new labels to the object. New labels will be cleaned and validated.

add_tags(tags=None, **kwargs)

Add one or more new tags to the object. New tags will be cleaned and validated.

property hash_value

Get the hash of the instance.

Return str

property id

Get the ID of the identifiable instance.

Return str

property labels

Get the labels of the labelled object.

Return iter

reset_hash()

Reset the hash value to the calculated hash (rather than whatever value has been set).

Return None

serialise(**kwargs)

Serialise the instance to a JSON string of primitives. See the Serialisable constructor for more information.

Return str

a JSON string containing the instance as a serialised python primitive

property tags

Get the tags of the taggable instance.

Return iter

to_file(filename, **kwargs)

Write the instance to a JSON file.

Parameters

filename (str) – path of file to write to, including relative or absolute path and .json extension

Return None

to_primitive()

Convert the instance into a JSON-compatible python dictionary of its attributes as primitives. See the Serialisable constructor for more information.

Return dict
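The relationship between to_primitive, serialise, to_file, and deserialise can be sketched with a small stand-in class. Record below is hypothetical and is not how Datafile is implemented; it only shows the round trip between an instance, a dictionary of primitives, and a JSON string.

```python
import json


class Record:
    """Hypothetical stand-in for a serialisable resource (not part of octue)."""

    def __init__(self, id, labels=None):
        self.id = id
        self.labels = sorted(labels or [])

    def to_primitive(self):
        # A JSON-compatible dictionary of the instance's attributes.
        return {"id": self.id, "labels": self.labels}

    def serialise(self, **kwargs):
        # A JSON string built from the primitive form.
        return json.dumps(self.to_primitive(), **kwargs)

    def to_file(self, filename, **kwargs):
        # Write the JSON string to the given path.
        with open(filename, "w") as f:
            f.write(self.serialise(**kwargs))

    @classmethod
    def deserialise(cls, serialised, from_string=False):
        # Accept either a JSON string or an already-parsed dictionary.
        if from_string:
            serialised = json.loads(serialised)
        return cls(**serialised)


record = Record(id="abc", labels=["b", "a"])
restored = Record.deserialise(record.serialise(), from_string=True)
```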

Dataset

class octue.resources.dataset.Dataset(path=None, files=None, recursive=False, ignore_stored_metadata=False, id=None, name=None, tags=None, labels=None)

A representation of a dataset with metadata.

The default usage is to provide the path to a local or cloud directory and create the dataset from the files it contains. Alternatively, the files parameter can be provided and only those files are included. Either way, the path parameter should be explicitly set to something meaningful.

Metadata consists of id, name, tags, and labels, available as attributes on the instance. On instantiation, metadata for the dataset is obtained from its stored location (the corresponding cloud object metadata or a local .octue metadata file) if present. Metadata values can alternatively be passed as arguments at instantiation but will only be used if stored metadata cannot be found - i.e. stored metadata always takes precedence (use the ignore_stored_metadata parameter to override this behaviour). Stored metadata can be updated after instantiation using the update_metadata method.

Parameters
  • path (str|None) – the path to the dataset (defaults to the current working directory if none is given)

  • files (iter(str|dict|octue.resources.datafile.Datafile)|None) – the files belonging to the dataset

  • recursive (bool) – if True, include in the dataset all files in the subdirectories recursively contained within the dataset directory

  • ignore_stored_metadata (bool) – if True, ignore any metadata stored for this dataset locally or in the cloud and use whatever is given at instantiation

  • id (str|None) – an optional UUID to assign to the dataset (defaults to a random UUID if none is given)

  • name (str|None) – an optional name to give to the dataset (defaults to the dataset directory name)

  • tags (dict|octue.resources.tag.TagDict|None) – key-value pairs with string keys conforming to the Octue tag format (see TagDict)

  • labels (iter(str)|octue.resources.label.LabelSet|None) – space-separated string of labels relevant to the dataset

Return None

property name

Get the name of the dataset

Return str

property exists_locally

Return True if the dataset exists locally.

Return bool

property all_files_are_in_cloud

Do all the files of the dataset exist in the cloud?

Return bool

upload(cloud_path=None, update_cloud_metadata=True)

Upload a dataset to the given cloud path.

Parameters
  • cloud_path (str|None) – cloud path to store dataset at (e.g. gs://bucket_name/path/to/dataset)

  • update_cloud_metadata (bool) – if True, update the metadata of the dataset in the cloud at upload time

Return str

cloud path for dataset

update_metadata()

Using the dataset instance’s in-memory metadata, update its cloud metadata (if the dataset is cloud-based) or its local metadata file (if the dataset is local).

Return None

update_cloud_metadata()

Create or update the cloud metadata file for the dataset.

Return None

update_local_metadata()

Create or update the local octue metadata file with the dataset’s metadata.

Return None

generate_signed_url(expiration=datetime.timedelta(days=7))

Generate a signed URL for the dataset. This is done by uploading a uniquely named metadata file containing signed URLs to the dataset’s files and returning a signed URL to that metadata file.

Parameters

expiration (datetime.datetime|datetime.timedelta) – the amount of time or date after which the URL should expire

Return str

the signed URL for the dataset

add(datafile, path_in_dataset=None)

Add a datafile to the dataset. If the datafile’s location is outside the dataset, it is copied to the dataset root or to the path_in_dataset if provided.

Parameters
  • datafile (octue.resources.datafile.Datafile) – the datafile to add to the dataset

  • path_in_dataset (str|None) – if provided, set the datafile’s local path to this path within the dataset

Raises

octue.exceptions.InvalidInputException – if the datafile is not a Datafile instance

Return None
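The copying behaviour described for add can be sketched for local files using only the standard library. add_file_to_dataset is a hypothetical helper working on plain paths; it makes no claim about how the library treats cloud datafiles or metadata.

```python
import os
import shutil


def add_file_to_dataset(file_path, dataset_path, path_in_dataset=None):
    """Copy a local file into a dataset directory if it is not already inside it.

    Returns the path of the file within the dataset. Hypothetical helper,
    not part of octue.
    """
    file_path = os.path.abspath(file_path)
    dataset_path = os.path.abspath(dataset_path)

    # A file already located inside the dataset is left where it is.
    if os.path.commonpath([file_path, dataset_path]) == dataset_path:
        return file_path

    # Otherwise copy it to the dataset root, or to path_in_dataset if provided.
    destination = os.path.join(dataset_path, path_in_dataset or os.path.basename(file_path))
    os.makedirs(os.path.dirname(destination), exist_ok=True)
    shutil.copy(file_path, destination)
    return destination
```

Calling add_file_to_dataset("/tmp/data.csv", "/tmp/my-dataset", path_in_dataset="raw/data.csv") would, under these assumptions, copy the file to /tmp/my-dataset/raw/data.csv.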

get_file_by_label(label)

Get a single datafile from a dataset by filtering for files with the provided label.

Parameters

label (str) – the label to filter for

Raises

octue.exceptions.UnexpectedNumberOfResultsException – if zero or more than one results satisfy the filters

Return octue.resources.datafile.Datafile

download(local_directory=None)

Download all files in the dataset into the given local directory. If no path to a local directory is given, the files will be downloaded to temporary locations.

Parameters

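local_directory (str|None) – the local directory to download the dataset’s files into; temporary locations are used if none is given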

Return None

to_primitive(include_files=True)

Convert the dataset to a dictionary of primitives, converting its files into their paths for a lightweight serialisation.

Parameters

include_files (bool) – if True, include the files parameter in the dictionary

Return dict

Manifest

class octue.resources.manifest.Manifest(datasets=None, id=None, name=None)

A representation of a manifest, which can contain multiple datasets. This is used to manage all files coming into (or leaving) a data service for an analysis at the configuration, input, or output stage.

Parameters
  • datasets (dict(str, octue.resources.dataset.Dataset|dict|str)|None) – a mapping of dataset names to Dataset instances, serialised datasets, or paths to datasets

  • id (str|None) – the UUID of the manifest (a UUID is generated if one isn’t given)

  • name (str|None) – an optional name to give to the manifest

Return None

classmethod from_cloud(cloud_path)

Instantiate a Manifest from Google Cloud storage.

Parameters

cloud_path (str) – full path to manifest in cloud storage (e.g. gs://bucket_name/path/to/manifest.json)

Return Manifest

property all_datasets_are_in_cloud

Do all the files of all the datasets of the manifest exist in the cloud?

Return bool

use_signed_urls_for_datasets()

Generate signed URLs for any cloud datasets in the manifest and use these as their paths instead of regular cloud paths. URLs will not be generated for any local datasets in the manifest.

Return None

to_cloud(cloud_path)

Upload a manifest to a cloud location, optionally uploading its datasets into the same directory.

Parameters

cloud_path (str) – full path to cloud storage location to store manifest at (e.g. gs://bucket_name/path/to/manifest.json)

Return None

get_dataset(key)

Get a dataset by its key (as defined in the twine).

Parameters

key (str) –

Return octue.resources.dataset.Dataset

prepare(data)

Prepare new manifest from a manifest_spec.

Parameters

data (dict) –

Return Manifest

to_primitive()

Convert the manifest to a dictionary of primitives, converting its datasets into their paths for a lightweight serialisation.

Return dict

classmethod deserialise(serialised_object, from_string=False)

Deserialise the given JSON-serialised object into an instance of the class.

Parameters
  • serialised_object (str|dict) – the string or dictionary of python primitives to deserialise into an instance

  • from_string (bool) – if True, deserialise from a JSON string; otherwise, deserialise from a dictionary

Return any

classmethod hash_non_class_object(object_)

Use the Hashable class to hash an arbitrary object that isn’t an attribute of a class instance.

Parameters

object_ (any) –

Return str

property hash_value

Get the hash of the instance.

Return str

property id

Get the ID of the identifiable instance.

Return str

metadata(include_id=True, include_sdk_version=True, **kwargs)

Get the instance’s metadata in primitive form. The metadata is the set of attributes included in the class variable self._METADATA_ATTRIBUTES.

Parameters
  • include_id (bool) – if True, include the ID of the instance if it is included in self._METADATA_ATTRIBUTES

  • include_sdk_version (bool) – if True, include the octue version that instantiated the instance

  • kwargs – any kwargs to use in an overridden self.metadata method

Return dict

property metadata_hash_value

Get the hash of the instance’s metadata, not including its ID.

Return str

property name

Get the name of the identifiable instance.

Return str

reset_hash()

Reset the hash value to the calculated hash (rather than whatever value has been set).

Return None

serialise(**kwargs)

Serialise the instance to a JSON string of primitives. See the Serialisable constructor for more information.

Return str

a JSON string containing the instance as a serialised python primitive

to_file(filename, **kwargs)

Write the instance to a JSON file.

Parameters

filename (str) – path of file to write to, including relative or absolute path and .json extension

Return None

Analysis

class octue.resources.analysis.Analysis(twine, handle_monitor_message=None, **kwargs)

A class representing a scientific or computational analysis. It holds references to all configuration, input, and output data, logs, connections to child services, credentials, etc. It’s essentially the “Internal API” for your service - a single point of contact where you can get or update anything you need.

An Analysis instance is automatically provided to the app in an Octue service when a question is received. Its attributes include every strand that can be added to a Twine, although only the strands specified in the service’s twine will be non-None. Incoming data is validated before it’s added to the analysis.

All input and configuration attributes are hashed using a BLAKE3 hash so the inputs and configuration that produced a given output in your app can always be verified. These hashes exist on the following attributes:

  • input_values_hash

  • input_manifest_hash

  • configuration_values_hash

  • configuration_manifest_hash

If a strand is None, its corresponding hash attribute is also None. The hash of a datafile is the hash of its file, while the hash of a manifest or dataset is the cumulative hash of the files it refers to.
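The idea of a cumulative hash over files can be sketched as below. Two assumptions are made loudly here: BLAKE2b from the standard library is swapped in because BLAKE3 is not available in hashlib, and the way the per-file hashes are combined (sorted, then hashed together) is illustrative rather than the library's actual scheme.

```python
import hashlib


def hash_file_contents(contents):
    """Hash the bytes of a single file (BLAKE2b as a stand-in for BLAKE3)."""
    return hashlib.blake2b(contents).hexdigest()


def cumulative_hash(file_hashes):
    """Combine per-file hashes into one hash; a missing strand (None) has no hash.

    Hypothetical helper, not part of octue.
    """
    if file_hashes is None:
        return None

    combined = hashlib.blake2b()

    # Sorting makes the result independent of the order the files are listed in.
    for file_hash in sorted(file_hashes):
        combined.update(file_hash.encode())

    return combined.hexdigest()


first = hash_file_contents(b"one")
second = hash_file_contents(b"two")
dataset_hash = cumulative_hash([first, second])
```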

Parameters
  • twine (twined.Twine|dict|str) – the twine, dictionary defining a twine, or path to “twine.json” file defining the service’s data interface

  • handle_monitor_message (callable|None) – an optional function for sending monitor messages to the parent that requested the analysis

  • configuration_values (any) – the configuration values for the analysis - this can be expressed as a python primitive (e.g. dict), a path to a JSON file, or a JSON string.

  • configuration_manifest (octue.resources.manifest.Manifest) – a manifest of configuration datasets for the analysis if required

  • input_values (any) – the input values for the analysis - this can be expressed as a python primitive (e.g. dict), a path to a JSON file, or a JSON string.

  • input_manifest (octue.resources.manifest.Manifest) – a manifest of input datasets for the analysis if required

  • output_values (any) – any output values the analysis produces

  • output_manifest (octue.resources.manifest.Manifest) – a manifest of output datasets from the analysis if it produces any

  • children (dict) – a mapping of string key to Child instance for all the children used by the service

  • id (str) – Optional UUID for the analysis

Return None

property finalised

Check whether the analysis has been finalised (i.e. whether its outputs have been validated and, if an output manifest is produced, its datasets uploaded).

Return bool

send_monitor_message(data)

Send a monitor message to the parent that requested the analysis.

Parameters

data (any) – any JSON-compatible data structure

Return None

finalise(upload_output_datasets_to=None)

Validate the output values and output manifest and, if the analysis produced an output manifest, upload its output datasets to a unique subdirectory within the analysis’s output location. This output location can be overridden by providing a different cloud path via the upload_output_datasets_to parameter. Either way, the dataset paths in the output manifest are replaced with signed URLs for easier, expiring access.

Parameters

upload_output_datasets_to (str|None) – If not provided but an output location was provided at instantiation, upload any output datasets into a unique subdirectory within this output location; if provided, upload into this location instead. The output manifest is updated with the upload locations.

Return None

add_labels(*args)

Add one or more new labels to the object. New labels will be cleaned and validated.

add_tags(tags=None, **kwargs)

Add one or more new tags to the object. New tags will be cleaned and validated.

classmethod deserialise(serialised_object, from_string=False)

Deserialise the given JSON-serialised object into an instance of the class.

Parameters
  • serialised_object (str|dict) – the string or dictionary of python primitives to deserialise into an instance

  • from_string (bool) – if True, deserialise from a JSON string; otherwise, deserialise from a dictionary

Return any

property id

Get the ID of the identifiable instance.

Return str

property labels

Get the labels of the labelled object.

Return iter

property name

Get the name of the identifiable instance.

Return str

serialise(**kwargs)

Serialise the instance to a JSON string of primitives. See the Serialisable constructor for more information.

Return str

a JSON string containing the instance as a serialised python primitive

property tags

Get the tags of the taggable instance.

Return iter

to_file(filename, **kwargs)

Write the instance to a JSON file.

Parameters

filename (str) – path of file to write to, including relative or absolute path and .json extension

Return None

to_primitive()

Convert the instance into a JSON-compatible python dictionary of its attributes as primitives. See the Serialisable constructor for more information.

Return dict

Child

class octue.resources.child.Child(id, backend, internal_service_name=None)

A class representing an Octue child service that can be asked questions. It is a convenience wrapper for Service that makes question asking more intuitive and allows easier selection of backends.

Parameters
  • id (str) – the ID of the child

  • backend (dict) – must include the key “name” with a value of the name of the type of backend e.g. “GCPPubSubBackend” and key-value pairs for any other parameters the chosen backend expects

  • internal_service_name (str|None) – the name to give to the internal service used to ask questions to the child

Return None

ask(input_values=None, input_manifest=None, subscribe_to_logs=True, allow_local_files=False, handle_monitor_message=None, record_messages_to=None, allow_save_diagnostics_data_on_crash=True, question_uuid=None, timeout=86400)

Ask the child a question and wait for its answer - i.e. send it input values and/or an input manifest and wait for it to analyse them and return output values and/or an output manifest. The input values and manifest must conform to the schema in the child’s twine.

Parameters
  • input_values (any|None) – any input values for the question

  • input_manifest (octue.resources.manifest.Manifest|None) – an input manifest of any datasets needed for the question

  • subscribe_to_logs (bool) – if True, subscribe to logs from the child and handle them with the local log handlers

  • allow_local_files (bool) – if True, allow the input manifest to contain references to local files - this should only be set to True if the child will have access to these local files

  • handle_monitor_message (callable|None) – a function to handle monitor messages (e.g. send them to an endpoint for plotting or displaying) - this function should take a single JSON-compatible python primitive as an argument (note that this could be an array or object)

  • record_messages_to (str|None) – if given a path to a JSON file, messages received in response to the question are saved to it

  • allow_save_diagnostics_data_on_crash (bool) – if True, allow the input values and manifest (and its datasets) to be saved by the child if it fails while processing them

  • question_uuid (str|None) – the UUID to use for the question if a specific one is needed; a UUID is generated if not

  • timeout (float) – time in seconds to wait for an answer before raising a timeout error

Raises

TimeoutError – if the timeout is exceeded while waiting for an answer

Return dict

a dictionary containing the keys “output_values” and “output_manifest”

ask_multiple(*questions)

Ask the child multiple questions in parallel and wait for the answers. Each question should be provided as a dictionary of Child.ask keyword arguments.

Parameters

questions – any number of questions provided as dictionaries of arguments to the Child.ask method

Return list

the answers to the questions in the same order as the questions
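The pattern of asking several questions in parallel while keeping the answers in question order can be sketched with a thread pool. Both fake_ask and the use of ThreadPoolExecutor are assumptions made for illustration; fake_ask stands in for Child.ask so the sketch runs without a real child service.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_ask(input_values=None, **kwargs):
    """Hypothetical stand-in for Child.ask that doubles a number locally."""
    return {"output_values": {"doubled": input_values["n"] * 2}, "output_manifest": None}


def ask_multiple(ask, *questions):
    """Ask each question (a dictionary of keyword arguments for `ask`) in parallel."""
    if not questions:
        return []

    with ThreadPoolExecutor(max_workers=len(questions)) as executor:
        futures = [executor.submit(ask, **question) for question in questions]

        # Collect results by iterating the futures in submission order, so the
        # answers line up with the questions regardless of completion order.
        return [future.result() for future in futures]


answers = ask_multiple(
    fake_ask,
    {"input_values": {"n": 1}},
    {"input_values": {"n": 2}},
)
```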

Child emulator

class octue.cloud.emulators.child.ChildEmulator(id=None, backend=None, internal_service_name=None, messages=None)

An emulator for the octue.resources.child.Child class that sends the given messages to the parent for handling without contacting the real child or using Pub/Sub. Any messages a real child could produce are supported. Child instances can be replaced/mocked like-for-like by ChildEmulator without the parent knowing.

Parameters
  • id (str|None) – the ID of the child; a UUID is generated if none is provided

  • backend (dict|None) – a dictionary including the key “name” with a value of the name of the type of backend (e.g. “GCPPubSubBackend”) and key-value pairs for any other parameters the chosen backend expects; a mock backend is used if none is provided

  • internal_service_name (str|None) – the name to give to the internal service used to ask questions to the child; defaults to “<id>-parent”

  • messages (list(dict)|None) – the list of messages to send to the parent

Return None

classmethod from_file(path)

Instantiate a child emulator from a JSON file at the given path. All/any/none of the instantiation arguments can be given in the file.

Parameters

path (str) – the path to a JSON file representing a child emulator

Return ChildEmulator

ask(input_values=None, input_manifest=None, subscribe_to_logs=True, allow_local_files=False, handle_monitor_message=None, record_messages_to=None, question_uuid=None, timeout=86400)

Ask the child emulator a question and receive its emulated response messages. Unlike a real child, the input values and manifest are not validated against the schema in the child’s twine because the twine is only available to the real child. Hence, the input values and manifest do not affect the messages returned by the emulator.

Parameters
  • input_values (any|None) – any input values for the question

  • input_manifest (octue.resources.manifest.Manifest|None) – an input manifest of any datasets needed for the question

  • subscribe_to_logs (bool) – if True, subscribe to logs from the child and handle them with the local log handlers

  • allow_local_files (bool) – if True, allow the input manifest to contain references to local files - this should only be set to True if the child will have access to these local files

  • handle_monitor_message (callable|None) – a function to handle monitor messages (e.g. send them to an endpoint for plotting or displaying) - this function should take a single JSON-compatible python primitive as an argument (note that this could be an array or object)

  • record_messages_to (str|None) – if given a path to a JSON file, messages received in response to the question are saved to it

  • question_uuid (str|None) – the UUID to use for the question if a specific one is needed; a UUID is generated if not

  • timeout (float) – time in seconds to wait for an answer before raising a timeout error

Raises

TimeoutError – if the timeout is exceeded while waiting for an answer

Return dict

a dictionary containing the keys “output_values” and “output_manifest”

Filter containers

FilterSet

class octue.resources.filter_containers.FilterSet

filter(ignore_items_without_attribute=True, **kwargs)

Return a new instance containing only the Filterables for which the given filter criteria are True.

Parameters
  • ignore_items_without_attribute (bool) – if True, just ignore any members of the container without a filtered-for attribute rather than raising an error

  • kwargs ({str: any}) – keyword arguments whose keys are the names of the filters and whose values are the values to filter for

Return octue.resources.filter_containers.FilterContainer

one(**kwargs)

If a single result exists for the given filters, return it. Otherwise, raise an error.

Parameters

kwargs ({str: any}) – keyword arguments whose keys are the names of the filters and whose values are the values to filter for

Raises

octue.exceptions.UnexpectedNumberOfResultsException – if zero or more than one results satisfy the filters

Return octue.resources.mixins.filterable.Filterable
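The "exactly one result" contract of one can be sketched as follows. The sketch treats each keyword as an attribute name compared for equality, which is a simplification and an assumption; the library's filter names may follow a different convention. UnexpectedNumberOfResults is a hypothetical stand-in for octue.exceptions.UnexpectedNumberOfResultsException.

```python
from types import SimpleNamespace

_MISSING = object()


class UnexpectedNumberOfResults(Exception):
    """Hypothetical stand-in for octue.exceptions.UnexpectedNumberOfResultsException."""


def one(items, **filters):
    """Return the single item whose attributes equal all the given filter values."""
    matches = [
        item
        for item in items
        if all(getattr(item, name, _MISSING) == value for name, value in filters.items())
    ]

    if len(matches) != 1:
        raise UnexpectedNumberOfResults(f"Expected one result but found {len(matches)}.")

    return matches[0]


files = [
    SimpleNamespace(name="a.csv", extension="csv"),
    SimpleNamespace(name="b.csv", extension="csv"),
]
single = one(files, name="a.csv")
```

Filtering the same list on extension="csv" matches two items and therefore raises the error, as does a filter that matches nothing.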

order_by(attribute_name, check_start_value=None, check_constant_increment=None, reverse=False)

Order the Filterables in the container by an attribute with the given name, returning them as a new FilterList regardless of the type of filter container begun with (FilterSets and FilterDicts are inherently orderless).

Parameters
  • attribute_name (str) – name of attribute (optionally nested) to order by e.g. “a”, “a.b”, “a.b.c”

  • check_start_value (any) – if provided, check that the first item in the ordered container has the given start value for the attribute ordered by

  • check_constant_increment (int|float|None) – if given, check that the ordered-by attribute of each of the items in the ordered container increases by the given value when progressing along the sequence

  • reverse (bool) – if True, reverse the ordering

Raises

octue.exceptions.InvalidInputException – if an attribute with the given name doesn’t exist on any of the container’s members

Return FilterList
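Ordering by a nested attribute name such as "a.b.c", together with the two optional checks, can be sketched like this. The helper names are hypothetical, ValueError stands in for octue.exceptions.InvalidInputException, and the sketch raises if any single item lacks the attribute, which is one reading of the description above.

```python
from functools import reduce
from types import SimpleNamespace


def get_nested_attribute(item, attribute_name):
    """Resolve a dotted attribute name such as "a.b.c" against an object."""
    return reduce(getattr, attribute_name.split("."), item)


def order_by(items, attribute_name, check_start_value=None, check_constant_increment=None, reverse=False):
    """Return the items as a list ordered by the (optionally nested) attribute."""
    try:
        ordered = sorted(items, key=lambda item: get_nested_attribute(item, attribute_name), reverse=reverse)
    except AttributeError as error:
        raise ValueError(f"No attribute named {attribute_name!r} on at least one item.") from error

    values = [get_nested_attribute(item, attribute_name) for item in ordered]

    # Optionally check the value the ordered sequence starts at.
    if check_start_value is not None and values and values[0] != check_start_value:
        raise ValueError(f"Expected the first value to be {check_start_value!r}, got {values[0]!r}.")

    # Optionally check that consecutive values differ by a constant amount.
    if check_constant_increment is not None:
        for previous, current in zip(values, values[1:]):
            if current - previous != check_constant_increment:
                raise ValueError(f"Values {previous!r} and {current!r} do not differ by {check_constant_increment!r}.")

    return ordered


frames = [SimpleNamespace(meta=SimpleNamespace(sequence=n)) for n in (2, 0, 1)]
ordered_frames = order_by(frames, "meta.sequence", check_start_value=0, check_constant_increment=1)
```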

FilterList

class octue.resources.filter_containers.FilterList(iterable=(), /)

filter(ignore_items_without_attribute=True, **kwargs)

Return a new instance containing only the Filterables for which the given filter criteria are True.

Parameters
  • ignore_items_without_attribute (bool) – if True, just ignore any members of the container without a filtered-for attribute rather than raising an error

  • kwargs ({str: any}) – keyword arguments whose keys are the names of the filters and whose values are the values to filter for

Return octue.resources.filter_containers.FilterContainer

one(**kwargs)

If a single result exists for the given filters, return it. Otherwise, raise an error.

Parameters

kwargs ({str: any}) – keyword arguments whose keys are the names of the filters and whose values are the values to filter for

Raises

octue.exceptions.UnexpectedNumberOfResultsException – if zero or more than one results satisfy the filters

Return octue.resources.mixins.filterable.Filterable

order_by(attribute_name, check_start_value=None, check_constant_increment=None, reverse=False)

Order the Filterables in the container by an attribute with the given name, returning them as a new FilterList regardless of the type of filter container begun with (FilterSets and FilterDicts are inherently orderless).

Parameters
  • attribute_name (str) – name of attribute (optionally nested) to order by e.g. “a”, “a.b”, “a.b.c”

  • check_start_value (any) – if provided, check that the first item in the ordered container has the given start value for the attribute ordered by

  • check_constant_increment (int|float|None) – if given, check that the ordered-by attribute of each of the items in the ordered container increases by the given value when progressing along the sequence

  • reverse (bool) – if True, reverse the ordering

Raises

octue.exceptions.InvalidInputException – if an attribute with the given name doesn’t exist on any of the container’s members

Return FilterList

FilterDict

class octue.resources.filter_containers.FilterDict(**kwargs)

A dictionary that is filterable by its values’ attributes. Each key can be anything, but each value must be an octue.mixins.filterable.Filterable instance.

filter(ignore_items_without_attribute=True, **kwargs)

Return a new instance containing only the Filterables for which the given filter criteria are satisfied.

Parameters
  • ignore_items_without_attribute (bool) – if True, just ignore any members of the container without a filtered-for attribute rather than raising an error

  • kwargs ({str: any}) – keyword arguments whose keys are the names of the filters and whose values are the values to filter for

Return FilterDict

order_by(attribute_name, reverse=False)

Order the instance by the given attribute_name, returning the instance’s elements as a new FilterList.

Parameters
  • attribute_name (str) – name of attribute (optionally nested) to order by e.g. “a”, “a.b”, “a.b.c”

  • reverse (bool) – if True, reverse the ordering

Raises

octue.exceptions.InvalidInputException – if an attribute with the given name doesn’t exist on any of the FilterDict’s values

Return FilterList

one(**kwargs)

If a single item exists for the given filters, return it. Otherwise, raise an error.

Parameters

kwargs ({str: any}) – keyword arguments whose keys are the names of the filters and whose values are the values to filter for

Raises

octue.exceptions.UnexpectedNumberOfResultsException – if zero or more than one results satisfy the filters

Return (any, octue.mixins.filterable.Filterable)
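The "exactly one result" contract can be sketched as follows. Everything here is a stand-in: the function one, the exception class UnexpectedNumberOfResults, and the equality-only matching are hypothetical, and reading the documented return type as a (key, value) pair is an assumption.

```python
from types import SimpleNamespace


class UnexpectedNumberOfResults(Exception):
    """Hypothetical stand-in for octue.exceptions.UnexpectedNumberOfResultsException."""


def one(mapping, **criteria):
    """Return the single (key, value) pair whose value matches all criteria."""
    matches = [
        (key, value)
        for key, value in mapping.items()
        if all(
            hasattr(value, name) and getattr(value, name) == expected
            for name, expected in criteria.items()
        )
    ]
    # Zero matches and several matches are both treated as errors.
    if len(matches) != 1:
        raise UnexpectedNumberOfResults(f"Expected one result but found {len(matches)}.")
    return matches[0]


items = {
    "a": SimpleNamespace(colour="red"),
    "b": SimpleNamespace(colour="blue"),
    "c": SimpleNamespace(colour="blue"),
}

key, value = one(items, colour="red")
```

Calling one(items, colour="blue") or one(items, colour="green") raises in this sketch, because two items and zero items match respectively.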

Configuration

octue.configuration.load_service_and_app_configuration(service_configuration_path)

Load the service configuration from the given YAML file and the app configuration referenced in it. If no app configuration is referenced, an empty one is returned.

Parameters

service_configuration_path (str) – path to service configuration file

Return (octue.configuration.ServiceConfiguration, octue.configuration.AppConfiguration)

Service configuration

class octue.configuration.ServiceConfiguration(name, organisation=None, app_source_path='.', twine_path='twine.json', app_configuration_path=None, crash_diagnostics_cloud_path=None, repository_name=None, repository_owner=None, project_name=None, region=None, dockerfile_path=None, cloud_build_configuration_path=None, maximum_instances=10, branch_pattern='^main$', environment_variables=None, secrets=None, concurrency=10, memory='128Mi', cpus=1, minimum_instances=0, temporary_files_location=None, setup_file_path=None, service_account_email=None, machine_type=None, **kwargs)

A class containing the details needed to configure a service.

Parameters
  • name (str) – the name to give the service

  • organisation (str|None) – the name of the organisation providing the service

  • app_source_path (str) – the path to the directory containing the app’s source code

  • twine_path (str) – the path to the twine file defining the schema for input, output, and configuration data for the service

  • app_configuration_path (str|None) – the path to the app configuration file containing configuration data for the service; if this is None, the default application configuration is used

  • crash_diagnostics_cloud_path (str|None) – the path to a cloud directory to store crash diagnostics in the event that the service fails while processing a question (this includes the configuration, input values and manifest, and logs)

Return None

classmethod from_file(path)

Load a service configuration from a file.

Parameters

path (str) – the path to the service configuration file

Return ServiceConfiguration

App configuration

class octue.configuration.AppConfiguration(configuration_values=None, configuration_manifest=None, children=None, output_location=None, **kwargs)

A class containing the configuration data needed to start an app as a service. The configuration data should conform to the service’s twine schema.

Parameters
  • configuration_values (str|dict|list|None) – values to configure the app

  • configuration_manifest (str|dict|octue.resources.Manifest|None) – a manifest of datasets to configure the app

  • children (str|None|list) – details of the children the app requires

  • output_location (str|None) – the path to a cloud directory to save output datasets at

Return None

classmethod from_file(path)

Load an app configuration from a file.

Parameters

path (str) – the path to the app configuration file

Return AppConfiguration

Runner

class octue.runner.Runner(app_src, twine='twine.json', configuration_values=None, configuration_manifest=None, children=None, output_location=None, crash_diagnostics_cloud_path=None, project_name=None, service_id=None)

A runner of analyses for a given service.

The Runner class provides a set of configuration parameters for use by your application, together with a range of methods for managing input and output file parsing as well as controlling logging.

Parameters
  • app_src (callable|type|module|str) – either a function that accepts an Octue analysis, a class with a run method that accepts an Octue analysis, or a path to a directory containing an app.py file containing one of these

  • twine (str|dict|twined.Twine) – path to the twine file, a string containing valid twine json, or a Twine instance

  • configuration_values (str|dict|None) – The configuration_values strand data. Can be expressed as a string path of a *.json file (relative or absolute), as an open file-like object (containing json data), as a string of json data or as an already-parsed dict.

  • configuration_manifest (str|dict|None) – The configuration_manifest strand data. Can be expressed as a string path of a *.json file (relative or absolute), as an open file-like object (containing json data), as a string of json data or as an already-parsed dict.

  • children (str|dict|None) – The children strand data. Can be expressed as a string path of a *.json file (relative or absolute), as an open file-like object (containing json data), as a string of json data or as an already-parsed dict.

  • output_location (str|None) – the path to a cloud directory to save output datasets at

  • crash_diagnostics_cloud_path (str|None) – the path to a cloud directory to store crash diagnostics in the event that the service fails while processing a question (this includes the configuration, input values and manifest, and logs)

  • project_name (str|None) – name of Google Cloud project to get credentials from

  • service_id (str|None) – the ID of the service being run

Return None

run(analysis_id=None, input_values=None, input_manifest=None, analysis_log_level=20, analysis_log_handler=None, handle_monitor_message=None, allow_save_diagnostics_data_on_crash=True, sent_messages=None)

Run an analysis.

Parameters
  • analysis_id (str|None) – UUID of analysis

  • input_values (str|dict|None) – the input_values strand data. Can be expressed as a string path of a *.json file (relative or absolute), as an open file-like object (containing json data), as a string of json data or as an already-parsed dict.

  • input_manifest (str|dict|octue.resources.manifest.Manifest|None) – The input_manifest strand data. Can be expressed as a string path of a *.json file (relative or absolute), as an open file-like object (containing json data), as a string of json data or as an already-parsed dict.

  • analysis_log_level (int|str) – the level below which to ignore log messages

  • analysis_log_handler (logging.Handler|None) – the logging.Handler instance which will be used to handle logs for this analysis run. Handlers can be created as per the logging cookbook (https://docs.python.org/3/howto/logging-cookbook.html) but should use the format defined above in LOG_FORMAT.

  • handle_monitor_message (callable|None) – a function that sends monitor messages to the parent that requested the analysis

  • allow_save_diagnostics_data_on_crash (bool) – if True, allow the input values and manifest (and its datasets) to be saved if the analysis fails

  • sent_messages (list|None) – the list of messages sent by the service running this runner (this should update in real time) to save if crash diagnostics are enabled

Return octue.resources.analysis.Analysis
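Several parameters above accept strand data as a path to a *.json file, an open file-like object, a string of json data, or an already-parsed dict. A minimal sketch of how such inputs could be normalised is shown below. The helper load_strand_data and the key n_iterations are hypothetical and are not part of octue; the sketch only illustrates the accepted forms.

```python
import json
import os
import tempfile


def load_strand_data(source):
    """Hypothetical helper: normalise the accepted input forms to parsed data."""
    if source is None or isinstance(source, (dict, list)):
        # Already parsed (or absent): pass through unchanged.
        return source
    if hasattr(source, "read"):
        # An open file-like object containing json data.
        return json.load(source)
    if source.endswith(".json") and os.path.isfile(source):
        # A relative or absolute path to a json file.
        with open(source) as f:
            return json.load(f)
    # Otherwise, treat the string as json data.
    return json.loads(source)


with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "values.json")
    with open(path, "w") as f:
        json.dump({"n_iterations": 5}, f)

    from_path = load_strand_data(path)
    with open(path) as f:
        from_file_object = load_strand_data(f)

from_json_string = load_strand_data('{"n_iterations": 5}')
from_dict = load_strand_data({"n_iterations": 5})
```

All four calls produce the same parsed dictionary, which is the property that lets callers pass whichever form is most convenient.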

Octue essential monitor messages

A module containing helper functions for sending monitor messages that conform to the Octue essential monitor message schema (https://refs.schema.octue.com/octue/essential-monitors/0.0.2.json).

octue.essentials.monitor_messages.send_status_text(analysis, text, service_name)

Send a status-type monitor message and additionally log it to the info level.

Parameters
  • analysis (octue.resources.analysis.Analysis) – the analysis from which to send the status text

  • text (str) – the text of the status message

  • service_name (str) – the name of the service/child running the analysis

Return None

octue.essentials.monitor_messages.send_estimated_seconds_remaining(analysis, estimated_seconds_remaining, service_name)

Send an estimated-seconds-remaining monitor message.

Parameters
  • analysis (octue.resources.analysis.Analysis) – the analysis from which to send the estimate

  • estimated_seconds_remaining (float) – the estimated number of seconds remaining before the analysis completes

  • service_name (str) – the name of the service/child running the analysis

Octue log handler

octue.log_handlers.apply_log_handler(logger_name=None, logger=None, handler=None, log_level=20, formatter=None, include_line_number=False, include_process_name=False, include_thread_name=False)

Apply a log handler with the given formatter to the logger with the given name. By default, the default Octue log handler is used on the root logger.

Parameters
  • logger_name (str|None) – the name of the logger to apply the handler to; if this and logger are None, the root logger is used

  • logger (logging.Logger|None) – the logger instance to apply the handler to (takes precedence over a logger name)

  • handler (logging.Handler|None) – The handler to use. If None, the default StreamHandler will be attached.

  • log_level (int|str) – ignore log messages below this level

  • formatter (logging.Formatter|None) – if provided, this formatter is used and the other formatting options are ignored

  • include_line_number (bool) – if True, include the line number in the log context

  • include_process_name (bool) – if True, include the process name in the log context

  • include_thread_name (bool) – if True, include the thread name in the log context

Return logging.Handler
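The behaviour described above (attach a handler, set a formatter, ignore messages below a level, return the handler) can be approximated with the standard logging module. The function apply_handler_sketch, the logger name my_app_sketch, and both format strings are assumptions made for illustration; octue's default format is not reproduced here.

```python
import io
import logging


def apply_handler_sketch(logger_name=None, handler=None, log_level=logging.INFO, formatter=None):
    """Attach a handler to the named logger (root if None) and return the handler."""
    logger = logging.getLogger(logger_name)
    handler = handler or logging.StreamHandler()
    # The fallback format string is an assumption, not octue's default format.
    handler.setFormatter(formatter or logging.Formatter("[%(asctime)s | %(levelname)s | %(name)s] %(message)s"))
    handler.setLevel(log_level)
    logger.addHandler(handler)
    logger.setLevel(log_level)
    return handler


# Write to an in-memory stream so the result can be inspected.
stream = io.StringIO()
handler = apply_handler_sketch(
    logger_name="my_app_sketch",
    handler=logging.StreamHandler(stream),
    log_level=logging.INFO,
    formatter=logging.Formatter("%(levelname)s %(message)s"),
)

logger = logging.getLogger("my_app_sketch")
logger.debug("not recorded")  # below the configured level, so ignored
logger.info("recorded")
output = stream.getvalue()
```

Returning the handler, as the documented function does, lets the caller remove or reconfigure it later without searching the logger's handler list.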