Dataset
A Dataset contains any number of Datafiles along with the following metadata:
name
tags
labels
The files are stored in a FilterSet, meaning they can be easily filtered according to any attribute of the Datafile instances it contains.
Filtering files in a Dataset
You can filter a Dataset's files as follows:
dataset = Dataset(
    files=[
        Datafile(path="path-within-dataset/my_file.csv", labels=["one", "a", "b", "all"]),
        Datafile(path="path-within-dataset/your_file.txt", labels=["two", "a", "b", "all"]),
        Datafile(path="path-within-dataset/another_file.csv", labels=["three", "all"]),
    ]
)
dataset.files.filter(name__ends_with=".csv")
>>> <FilterSet({<Datafile('my_file.csv')>, <Datafile('another_file.csv')>})>
dataset.files.filter(labels__contains="a")
>>> <FilterSet({<Datafile('my_file.csv')>, <Datafile('your_file.txt')>})>
You can also chain filters indefinitely, or specify them all at the same time:
dataset.files.filter(name__ends_with=".csv").filter(labels__contains="a")
>>> <FilterSet({<Datafile('my_file.csv')>})>
dataset.files.filter(name__ends_with=".csv", labels__contains="a")
>>> <FilterSet({<Datafile('my_file.csv')>})>
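To illustrate why chaining filters and passing them all at once give the same result, here is a minimal, self-contained sketch of how `attribute__operator` keyword filters could be applied. It is not the library's implementation: `SketchFile`, `SketchFilterSet`, and the `OPERATORS` table are hypothetical stand-ins written for this example, and only the two operators used above are modelled.

```python
from dataclasses import dataclass


# Hypothetical stand-in for a Datafile; not the library's class.
@dataclass(frozen=True)
class SketchFile:
    name: str
    labels: frozenset


# Map each operator suffix to a predicate taking (attribute_value, filter_value).
OPERATORS = {
    "ends_with": lambda value, arg: value.endswith(arg),
    "contains": lambda value, arg: arg in value,
}


# Hypothetical stand-in for a FilterSet; assumed here to behave like a set.
class SketchFilterSet(set):
    def filter(self, **kwargs):
        """Return the members satisfying every `<attribute>__<operator>=<value>` condition."""
        result = SketchFilterSet(self)
        for lookup, arg in kwargs.items():
            # Split "name__ends_with" into the attribute name and the operator name.
            attribute, operator = lookup.rsplit("__", 1)
            test = OPERATORS[operator]
            result = SketchFilterSet(f for f in result if test(getattr(f, attribute), arg))
        return result


files = SketchFilterSet(
    {
        SketchFile("my_file.csv", frozenset({"one", "a", "b", "all"})),
        SketchFile("your_file.txt", frozenset({"two", "a", "b", "all"})),
        SketchFile("another_file.csv", frozenset({"three", "all"})),
    }
)

chained = files.filter(name__ends_with=".csv").filter(labels__contains="a")
combined = files.filter(name__ends_with=".csv", labels__contains="a")

print(chained == combined)  # True
print(sorted(f.name for f in chained))  # ['my_file.csv']
```

Each condition narrows the previous result, so applying the conditions one call at a time or in a single call yields the same members. Because the sketch subclasses `set`, the result can be passed directly to `sorted`, `list`, or `set`.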
Find out more about FilterSets here, including all the possible filters available for each type of object stored on an attribute of a FilterSet member, and how to convert them to primitive types such as set or list.