Dataset¶
A Dataset
contains any number of Datafiles
along with the following metadata:
name
tags
The files are stored in a FilterSet
, meaning they can be easily filtered according to any attribute of the
Datafile instances it contains.
Filtering files in a Dataset
¶
You can filter a Dataset
’s files as follows:
dataset = Dataset(
files=[
Datafile(timestamp=time.time(), path="path-within-dataset/my_file.csv", tags="one a:2 b:3 all"),
Datafile(timestamp=time.time(), path="path-within-dataset/your_file.txt", tags="two a:2 b:3 all"),
Datafile(timestamp=time.time(), path="path-within-dataset/another_file.csv", tags="three all"),
]
)
dataset.files.filter(filter_name="name__ends_with", filter_value=".csv")
>>> <FilterSet({<Datafile('my_file.csv')>, <Datafile('another_file.csv')>})>
dataset.files.filter("tags__contains", filter_value="a:2")
>>> <FilterSet({<Datafile('my_file.csv')>, <Datafile('your_file.txt')>})>
You can also chain filters indefinitely:
dataset.files.filter(filter_name="name__ends_with", filter_value=".csv").filter("tags__contains", filter_value="a:2")
>>> <FilterSet({<Datafile('my_file.csv')>})>
Find out more about FilterSets
here, including all the possible filters available for each type of object stored on
an attribute of a FilterSet
member, and how to convert them to primitive types such as set
or list
.