base datasets
- load_dataset(dataset, unpack_dataset_columns=False, **kwargs)
Load a dataset as an np.ndarray of shape (nr_of_samples, 2).
The result is a 2D array in which each row represents one point in the time series. The first column is the x-variable and the second column is the y-variable.
If unpack_dataset_columns=True is passed, the dataset is unpacked into two separate arrays, x and y.
The list of available datasets is in the traffic_weaver.datasets.data_description module.
- Parameters:
dataset (str) – Name of the dataset to load.
unpack_dataset_columns (bool, default=False) – If True, the dataset is unpacked to two separate arrays x and y.
- Returns:
dataset – 2D array with each row representing one point in time series. The first column is the x-variable and the second column is the y-variable.
- Return type:
np.ndarray of shape (nr_of_samples, 2)
Examples
>>> data = load_dataset('sandvine_audio')
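The column layout described above can be illustrated without the library. The following is a minimal sketch that uses a plain list of rows as a stand-in for the returned array; the helper name `unpack_columns` and the sample values are made up for illustration and are not part of traffic_weaver.

```python
def unpack_columns(rows):
    """Split rows of (x, y) pairs into two separate sequences.

    Hypothetical helper: it mimics what unpack_dataset_columns=True is
    described as doing, using plain Python lists instead of NumPy arrays.
    """
    xs = [row[0] for row in rows]
    ys = [row[1] for row in rows]
    return xs, ys


# Made-up sample: three points of a time series, one (x, y) pair per row.
rows = [(0, 1.5), (1, 2.0), (2, 1.0)]
x, y = unpack_columns(rows)
```

With a NumPy array `data` of shape (nr_of_samples, 2), the same split can be written as `data[:, 0]` and `data[:, 1]`.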
- get_data_home(data_home: str = None) → str
Return the path of the data directory.
Datasets are stored in the ‘.traffic-weaver-data’ directory in the user's home directory.
This directory can be changed by setting the TRAFFIC_WEAVER_DATA environment variable.
- Parameters:
data_home (str, default=None) – The path to the data directory. If None, the default directory is .traffic-weaver-data.
- Returns:
data_home – The path to the data directory.
- Return type:
str
Examples
>>> import os
>>> from traffic_weaver.datasets import get_data_home
>>> data_home = get_data_home()
>>> os.path.exists(data_home)
True
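The lookup described for get_data_home can be sketched roughly as follows. This is not the library's implementation: `resolve_data_home` is a hypothetical helper, and both the precedence (explicit argument, then TRAFFIC_WEAVER_DATA, then the default directory) and the creation of the directory are assumptions made for the sketch.

```python
import os
import tempfile
from pathlib import Path


def resolve_data_home(data_home=None):
    """Return a data directory path, creating the directory if needed.

    Hypothetical helper. Assumed precedence: explicit argument, then the
    TRAFFIC_WEAVER_DATA environment variable, then ~/.traffic-weaver-data.
    """
    if data_home is None:
        data_home = os.environ.get(
            "TRAFFIC_WEAVER_DATA",
            os.path.join("~", ".traffic-weaver-data"),
        )
    path = Path(data_home).expanduser()
    path.mkdir(parents=True, exist_ok=True)
    return str(path)


# Demo against a throwaway directory so nothing is written to the real home.
demo_root = tempfile.mkdtemp()
explicit = resolve_data_home(os.path.join(demo_root, "explicit"))

# With no argument, the environment variable decides the location.
os.environ["TRAFFIC_WEAVER_DATA"] = os.path.join(demo_root, "from-env")
from_env = resolve_data_home()
```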
- clear_data_home(data_home: str = None)
Remove all files in the data directory.
- Parameters:
data_home (str, default=None) – The path to the data directory. If None, the default directory is .traffic-weaver-data.
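Emptying a data directory can be sketched with the standard library alone. The helper `clear_directory` below is hypothetical, and it assumes the directory itself is kept while its contents are deleted; the library may instead remove the directory as well.

```python
import os
import shutil
import tempfile


def clear_directory(path):
    """Delete every file and subdirectory inside path, keeping path itself.

    Hypothetical helper, shown only to illustrate the idea.
    """
    for name in os.listdir(path):
        full = os.path.join(path, name)
        if os.path.isdir(full) and not os.path.islink(full):
            shutil.rmtree(full)  # remove a whole subdirectory tree
        else:
            os.remove(full)  # remove a regular file or a symlink


# Demo on a temporary directory holding one file and one nested folder.
demo_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_dir, "cache"))
with open(os.path.join(demo_dir, "cache", "a.pkl"), "wb") as handle:
    handle.write(b"x")
with open(os.path.join(demo_dir, "b.csv"), "w") as handle:
    handle.write("0,1\n")

clear_directory(demo_dir)
remaining = os.listdir(demo_dir)
```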
- load_csv_dataset_from_resources(file_name, resources_module='traffic_weaver.datasets.data', unpack_dataset_columns=False)
Load dataset from resources.
- Parameters:
file_name (str) – Name of the file to load.
resources_module (str, default='traffic_weaver.datasets.data') – The package name where the resources are located.
unpack_dataset_columns (bool, default=False) – If True, the dataset is unpacked to two separate arrays x and y.
- Returns:
dataset – 2D array with each row representing one point in time series. The first column is the x-variable and the second column is the y-variable.
- Return type:
np.ndarray of shape (nr_of_samples, 2)
- load_csv_dataset_from_remote(remote: RemoteFileMetadata, dataset_filename, dataset_folder, data_home=None, download_if_missing: bool = True, download_even_if_available: bool = False, validate_checksum: bool = True, n_retries=3, delay=1.0, gzip=False, unpack_dataset_columns=False)
Load a dataset from a remote location in csv.gz format. After downloading, the dataset is stored in the cache folder in pickle format for later use.
- Parameters:
remote (RemoteFileMetadata) – Named tuple containing remote dataset meta information: url, filename, checksum.
dataset_filename (str) – Name for the dataset file.
dataset_folder (str) – Folder in data_home where the dataset is stored.
data_home (str, default=None) – Download cache folder for the dataset. By default, data is stored in ~/.traffic-weaver-data.
download_if_missing (bool, default=True) – If False, raise an OSError if the data is not locally available instead of trying to download the data from the source.
download_even_if_available (bool, default=False) – If True, download the data even if it is already available locally.
validate_checksum (bool, default=True) – If True, check the SHA256 checksum of the downloaded file.
n_retries (int, default=3) – Number of retries in case of HTTPError or URLError when downloading the data.
delay (float, default=1.0) – Number of seconds between retries.
gzip (bool, default=False) – If True, the file is assumed to be compressed in gzip format in the remote.
unpack_dataset_columns (bool, default=False) – If True, the dataset is unpacked to two separate arrays x and y.
- Returns:
dataset – 2D array with each row representing one point in time series. The first column is the x-variable and the second column is the y-variable.
- Return type:
np.ndarray of shape (nr_of_samples, 2)
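The interplay of n_retries, delay, and validate_checksum can be illustrated with a small standard-library sketch. It is not the library's code: `sha256_of`, `fetch_with_retries`, and `flaky_fetch` are hypothetical names, the download is simulated by a callable, and raising OSError on a checksum mismatch is an assumption.

```python
import hashlib
import time
from urllib.error import URLError


def sha256_of(data: bytes) -> str:
    """Return the hexadecimal SHA256 digest of data."""
    return hashlib.sha256(data).hexdigest()


def fetch_with_retries(fetch, expected_checksum, n_retries=3, delay=1.0):
    """Call fetch() until it succeeds, then verify the SHA256 checksum.

    Hypothetical helper. HTTPError is a subclass of URLError, so catching
    URLError covers both error types named in the parameter description.
    """
    attempts_left = n_retries
    while True:
        try:
            payload = fetch()
            break
        except URLError:
            if attempts_left == 0:
                raise  # retries exhausted, propagate the last error
            attempts_left -= 1
            time.sleep(delay)
    if sha256_of(payload) != expected_checksum:
        raise OSError("checksum mismatch for downloaded file")
    return payload


# Simulated download that fails twice before returning made-up CSV bytes.
calls = {"count": 0}


def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise URLError("simulated network failure")
    return b"0,1.5\n1,2.0\n"


expected = sha256_of(b"0,1.5\n1,2.0\n")
payload = fetch_with_retries(flaky_fetch, expected, n_retries=3, delay=0.0)
```

Passing the download step as a callable keeps the retry and checksum logic testable without network access.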
- load_dataset_description(datasetsource_filename, resources_module='traffic_weaver.datasets.data_description')
Load the description of a dataset's source from a file in the resources.
- Parameters:
datasetsource_filename (str) – Name of the file to load.
resources_module (str, default='traffic_weaver.datasets.data_description') – The package name where the resources are located.
- Returns:
description – Source of the dataset.
- Return type:
str