plexus.data.HuggingFaceDataCache module

class plexus.data.HuggingFaceDataCache.HuggingFaceDataCache(**parameters)

Bases: DataCache

A class to load and cache datasets from Hugging Face.

Initialize the DataCache instance with the given parameters.

Parameters

**parameters : dict

Arbitrary keyword arguments that are used to initialize the Parameters instance.

Raises

ValidationError

If the provided parameters do not pass validation.
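The construction flow described above (keyword arguments forwarded to a nested Parameters model, with a validation error raised on bad input) can be sketched without any dependencies. This is only an illustration of the pattern, not the library's implementation: `DataCacheSketch`, `ParametersSketch`, the local `ValidationError` stand-in, the `parameters` attribute, and the dataset name are all assumptions made for this example. The real class relies on pydantic for validation.

```python
class ValidationError(ValueError):
    """Stand-in for pydantic's ValidationError (assumption for this sketch)."""


class ParametersSketch:
    """Stand-in for the nested Parameters model: keyword-only, validated on creation."""

    def __init__(self, *, class_name="DataCache", name=None):
        # `name` has no default in the documented signature; None marks it as missing here.
        if not isinstance(name, str):
            raise ValidationError("name: a string value is required")
        if not isinstance(class_name, str):
            raise ValidationError("class_name: must be a string")
        self.class_name = class_name
        self.name = name


class DataCacheSketch:
    """Stand-in for the cache class: forwards **parameters to the Parameters model."""

    Parameters = ParametersSketch

    def __init__(self, **parameters):
        # Validation happens here, so a bad configuration fails at construction time.
        self.parameters = self.Parameters(**parameters)


cache = DataCacheSketch(name="example-dataset")  # hypothetical dataset name
print(cache.parameters.class_name, cache.parameters.name)

try:
    DataCacheSketch()  # `name` is required, so this is rejected
except ValidationError as error:
    print(f"rejected: {error}")
```

Validating in the constructor means a misconfigured cache fails immediately rather than later, when data is first loaded.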

class Parameters(*, class_name: str = 'DataCache', name: str)

Bases: Parameters

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; this should be a dictionary conforming to pydantic.config.ConfigDict.

name: str

__init__(**parameters)

Initialize the DataCache instance with the given parameters.

Parameters

**parameters : dict

Arbitrary keyword arguments that are used to initialize the Parameters instance.

Raises

ValidationError

If the provided parameters do not pass validation.

analyze_dataset(df: DataFrame)

Display basic analysis of the loaded DataFrame.

load_dataframe(*args, **kwargs)

Load a dataframe based on the provided parameters.

Returns

pd.DataFrame

The loaded dataframe.

This method must be implemented by all subclasses.
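One common way to express a "must be implemented by all subclasses" contract is an abstract base class. The sketch below assumes that approach purely for illustration; the real DataCache may enforce the requirement differently. `DataCacheSketch` and `InMemoryCacheSketch` are hypothetical names, and a list of dicts stands in for the pd.DataFrame return value so the example needs no third-party packages.

```python
from abc import ABC, abstractmethod


class DataCacheSketch(ABC):
    """Stand-in base class; the real DataCache may enforce the contract differently."""

    @abstractmethod
    def load_dataframe(self, *args, **kwargs):
        """Return the loaded data; subclasses must override this."""


class InMemoryCacheSketch(DataCacheSketch):
    """Hypothetical subclass that serves rows already held in memory."""

    def __init__(self, rows):
        self.rows = rows

    def load_dataframe(self, *args, **kwargs):
        # A real implementation would return a pd.DataFrame; a list of dicts
        # keeps this sketch dependency-free.
        return list(self.rows)


cache = InMemoryCacheSketch([{"text": "hello", "label": "greeting"}])
rows = cache.load_dataframe()
print(len(rows))

try:
    DataCacheSketch()  # the abstract method is not implemented
except TypeError as error:
    print("cannot instantiate:", type(error).__name__)
```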

verify_dataset(df: DataFrame)

Perform basic verification checks on the loaded DataFrame and return the results both in a human-readable format and as a dictionary for testing.
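The documentation does not list which checks verify_dataset runs or the exact shape of its return value. As a rough illustration of returning both a readable report and a dictionary, the sketch below assumes two simple checks (row count and missing values) and a `(text, dict)` return pair. `verify_rows_sketch`, the check names, and the list-of-dicts input are all hypothetical stand-ins, not the method's actual behavior.

```python
def verify_rows_sketch(rows):
    """Run two illustrative checks and return (report_text, results_dict)."""
    columns = sorted({key for row in rows for key in row})
    # Count rows where each column is absent or None.
    missing = {
        column: sum(1 for row in rows if row.get(column) is None)
        for column in columns
    }
    results = {
        "row_count": len(rows),
        "columns": columns,
        "missing_values": missing,
        "passed": len(rows) > 0 and not any(missing.values()),
    }
    # Build the human-readable form from the same results dictionary.
    lines = [f"rows: {results['row_count']}", f"columns: {', '.join(columns)}"]
    lines += [f"missing in {column}: {count}" for column, count in missing.items()]
    lines.append("status: " + ("OK" if results["passed"] else "FAILED"))
    return "\n".join(lines), results


report, results = verify_rows_sketch(
    [{"text": "hello", "label": "greeting"}, {"text": "bye", "label": None}]
)
print(report)
```

Deriving the report from the dictionary keeps the two outputs consistent: tests can assert on the dictionary while people read the text.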