Datafile
A Datafile is an Octue type that corresponds to a file, which may exist on your computer or in a cloud store. It has the following main attributes:
- path: the path of this file within the dataset, which may include folders or subfolders
- cluster: the integer cluster of files, within a dataset, to which this file belongs (default 0)
- sequence: a sequence number of this file within its cluster (if sequences are appropriate)
- tags: key-value pairs of metadata relevant to this file
- labels: a space-separated string or iterable of labels relevant to this file
- timestamp: a POSIX timestamp associated with the file, in seconds since epoch; typically the time it was created, but it could relate to any time point relevant to the data
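The attribute list above can be modelled as a small data container. The sketch below is a hypothetical illustration, not Octue's class: the names DatafileAttributes and normalise_labels are invented here. It only shows how a labels value given either as a space-separated string or as an iterable might be normalised into a set, with cluster defaulting to 0 and the timestamp held as POSIX seconds.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional, Union


def normalise_labels(labels: Union[str, Iterable[str], None]) -> set:
    """Hypothetical helper: turn a space-separated string or an iterable of labels into a set."""
    if labels is None:
        return set()
    if isinstance(labels, str):
        # str.split() with no argument splits on any run of whitespace.
        return set(labels.split())
    return set(labels)


@dataclass
class DatafileAttributes:
    """Hypothetical container mirroring the documented attributes (not Octue's Datafile)."""

    path: str
    cluster: int = 0  # documented default cluster
    sequence: Optional[int] = None  # only meaningful if sequences are appropriate
    tags: dict = field(default_factory=dict)
    labels: Union[str, Iterable[str]] = field(default_factory=set)  # normalised to a set below
    timestamp: Optional[float] = None  # POSIX seconds since epoch

    def __post_init__(self):
        self.labels = normalise_labels(self.labels)


# Usage: labels supplied as a space-separated string end up as a set.
attrs = DatafileAttributes(path="path/to/data.csv", labels="raw cleaned")
```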
Usage
A Datafile can be used as a plain object (as in Examples B and C) or as a context manager (as in Examples A and D). As a context manager, it is analogous to the context manager returned by the built-in open function: on exiting the context (the with block), it closes the datafile locally and, if it is a cloud datafile, updates the cloud object with any data or metadata changes.
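To make the context-manager behaviour concrete, here is a minimal sketch of the same pattern using only the standard library. SketchDatafile is a hypothetical stand-in, not Octue's implementation: entering the with block opens the local file and yields a (datafile, file) pair, as in Example A; leaving it closes the file and, for a cloud datafile, calls a placeholder in place of the real cloud update. Skipping that update when the block raises is a choice made for this sketch only; how Datafile behaves in that case is not described here.

```python
import os
import tempfile


class SketchDatafile:
    """Hypothetical stand-in showing the (datafile, file) context-manager pattern."""

    def __init__(self, path, mode="r", is_cloud=False):
        self.path = path
        self.mode = mode
        self.is_cloud = is_cloud
        self.synced_to_cloud = False
        self._file = None

    def __enter__(self):
        # Like the built-in open(), open the local file in the requested mode.
        self._file = open(self.path, self.mode)
        return self, self._file

    def __exit__(self, exc_type, exc_value, traceback):
        # Always close the local file, even if the with block raised.
        self._file.close()
        # Sketch-only assumption: sync to the cloud only if the block succeeded.
        if self.is_cloud and exc_type is None:
            self._update_cloud_object()
        # Returning False lets any exception propagate, as open() does.
        return False

    def _update_cloud_object(self):
        # Placeholder for "update the cloud object with data or metadata changes".
        self.synced_to_cloud = True


# Usage: write through the sketch, then read the file back to check it was written.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "data.txt")
    with SketchDatafile(demo_path, mode="w", is_cloud=True) as (datafile, f):
        f.write("some data")
    with open(demo_path) as check:
        demo_contents = check.read()
```

After the with block, f is closed and the placeholder update has run, mirroring the two steps described above.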
Example A
Scenario: Download a cloud object, calculate Octue metadata from its contents, and add the new metadata to the cloud object
Starting point: Object in cloud with or without Octue metadata
Goal: Object in cloud with updated metadata
```python
from octue.resources import Datafile

project_name = "my-project"
bucket_name = "my-bucket"
datafile_path = "path/to/data.csv"

# Open the cloud object for reading; on exiting the with block, the metadata
# changes made below are written back to the cloud object.
with Datafile.from_cloud(project_name, bucket_name, datafile_path, mode="r") as (datafile, f):
    data = f.read()

    # metadata_calculating_function is a placeholder for your own function.
    new_metadata = metadata_calculating_function(data)

    datafile.timestamp = new_metadata["timestamp"]
    datafile.cluster = new_metadata["cluster"]
    datafile.sequence = new_metadata["sequence"]
    datafile.tags = new_metadata["tags"]
    datafile.labels = new_metadata["labels"]
```
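Example A leaves metadata_calculating_function up to you. As a hypothetical illustration only, the function below derives metadata from CSV text and returns a dictionary with the keys the example reads; the tag names and values it computes are invented for this sketch.

```python
import csv
import io
import time


def metadata_calculating_function(data):
    """Hypothetical example: derive Octue-style metadata from CSV text."""
    rows = list(csv.reader(io.StringIO(data)))
    header, body = (rows[0], rows[1:]) if rows else ([], [])
    return {
        "timestamp": time.time(),  # POSIX seconds since epoch
        "cluster": 0,  # documented default cluster
        "sequence": None,  # no meaningful sequence for a single file
        "tags": {"n_rows": len(body), "n_columns": len(header)},
        "labels": {"csv"},
    }


metadata = metadata_calculating_function("a,b\n1,2\n3,4\n")
```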
Example B
Scenario: Add or update Octue metadata on an existing cloud object without downloading its content
Starting point: A cloud object with or without Octue metadata
Goal: Object in cloud with updated metadata
```python
from datetime import datetime

from octue.resources import Datafile

project_name = "my-project"
bucket_name = "my-bucket"
datafile_path = "path/to/data.csv"

datafile = Datafile.from_cloud(project_name, bucket_name, datafile_path)

datafile.timestamp = datetime.now()
datafile.cluster = 0
datafile.sequence = 3
datafile.tags = {"manufacturer": "Vestas", "output": "1MW"}
datafile.labels = {"new"}

# Send the metadata changes to the cloud object.
datafile.to_cloud()  # Or, datafile.update_cloud_metadata()
```
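The attribute list describes timestamp as a POSIX timestamp in seconds since epoch, while this example assigns a datetime. If you need the seconds form, the standard library converts between the two; a timezone-aware datetime avoids any dependence on the machine's local time zone. This page does not state which forms Datafile accepts, so treat this as a sketch of the conversion only.

```python
from datetime import datetime, timezone

# Timezone-aware "now", so the conversion does not depend on the local zone.
now = datetime.now(timezone.utc)

# datetime -> POSIX seconds since epoch (a float).
posix_seconds = now.timestamp()

# POSIX seconds -> datetime, again in UTC.
round_trip = datetime.fromtimestamp(posix_seconds, tz=timezone.utc)
```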
Example C
Scenario: Read in the contents and Octue metadata of an existing cloud object without intent to update it in the cloud
Starting point: A cloud object with Octue metadata
Goal: Cloud object data (contents) and metadata held in local variables
```python
from octue.resources import Datafile

project_name = "my-project"
bucket_name = "my-bucket"
datafile_path = "path/to/data.csv"

datafile = Datafile.from_cloud(project_name, bucket_name, datafile_path)

# Read the contents; the metadata is then read separately from the datafile.
with datafile.open("r") as f:
    data = f.read()

metadata = datafile.metadata()
```
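Here, data holds the file's contents as a string because the file was opened in text mode ("r"). For a CSV file such as data.csv, the standard library's csv module can turn that string into rows. The sample string and its column names below are hypothetical stand-ins for real file contents.

```python
import csv
import io

# Stand-in for the string returned by f.read(); the columns are hypothetical.
data = "turbine,power_kw\nT1,950\nT2,1010\n"

# DictReader uses the first row as field names and yields one dict per data row.
rows = list(csv.DictReader(io.StringIO(data)))

# Values come back as strings, so convert numeric columns explicitly.
total_power_kw = sum(float(row["power_kw"]) for row in rows)
```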
Example D
Scenario: Create a new cloud object from local data, adding Octue metadata
Starting point: A local file (or content data in a local variable), with Octue metadata stored in local variables
Goal: A new object in the cloud with data and Octue metadata
For creating new data in a new local file:
```python
from octue.resources import Datafile

sequence = 2
tags = {"cleaned": True, "type": "linear"}
labels = {"Vestas"}

# Write new data to a local file; the file is closed when the with block exits.
with Datafile(path="path/to/local/file.dat", sequence=sequence, tags=tags, labels=labels, mode="w") as (datafile, f):
    f.write("This is some cleaned data.")

# Upload the file and its metadata as a new cloud object.
datafile.to_cloud(project_name="my-project", bucket_name="my-bucket", path_in_bucket="path/to/data.dat")
```
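Writing to path/to/local/file.dat assumes the folders path/to/local already exist. If write mode behaves like the built-in open (an assumption; this page does not say whether Datafile creates missing folders), a missing parent folder would raise FileNotFoundError. A minimal precaution with pathlib is sketched below, using a temporary directory and the built-in open in place of Datafile.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    local_path = Path(tmp_dir) / "path" / "to" / "local" / "file.dat"

    # Create any missing parent folders; exist_ok avoids an error if they already exist.
    local_path.parent.mkdir(parents=True, exist_ok=True)

    # The built-in open stands in for Datafile(..., mode="w") in this sketch.
    with open(local_path, "w") as f:
        f.write("This is some cleaned data.")

    written = local_path.read_text()
```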
For existing data in an existing local file:
```python
from octue.resources import Datafile

sequence = 2
tags = {"cleaned": True, "type": "linear"}
labels = {"Vestas"}

# Wrap the existing local file with its metadata, then upload both.
datafile = Datafile(path="path/to/local/file.dat", sequence=sequence, tags=tags, labels=labels)
datafile.to_cloud(project_name="my-project", bucket_name="my-bucket", path_in_bucket="path/to/data.dat")
```