Input Module¶

This section documents the input components of the Nextmv Python SDK.

input ¶

Module for handling input sources and data.

This module provides classes and functions for loading and handling input data in various formats for decision problems. It supports JSON, plain text, CSV, CSV archive, and multi-file formats, and can load data from standard input or from files.

CLASS	DESCRIPTION
`InputFormat`	Enum defining supported input data formats (JSON, TEXT, CSV, CSV_ARCHIVE, MULTI_FILE).
`Input`	Container for input data with format specification and options.
`InputLoader`	Base class for loading inputs from various sources.
`LocalInputLoader`	Class for loading inputs from local files or stdin.

FUNCTION	DESCRIPTION
`load`	Load input data using a specified loader.

DataFile `dataclass` ¶

DataFile(
    name: str,
    loader: Callable[[str], Any],
    loader_kwargs: Optional[dict[str, Any]] = None,
    loader_args: Optional[list[Any]] = None,
    input_data_key: Optional[str] = None,
)

Represents data to be read from a file.

You can import the DataFile class directly from nextmv:

from nextmv import DataFile

This class defines data that will be read from a file in the filesystem. It holds the name of the file and the loader function that handles loading and deserializing the data from that file. DataFile is typically used with Input when Input.input_format is set to InputFormat.MULTI_FILE. Because it is difficult to handle every edge case of how data is read and deserialized from a file, this class lets you implement the loader callable of your choice and pass it any loader_args and loader_kwargs it needs.

PARAMETER DESCRIPTION

`name` ¶

Name of the data (input) file. The file extension should be included in the name.

TYPE: str

`loader` ¶

Callable that reads the data from the file. This should be a function implemented by the user. There are also convenience functions that you can use as a loader. The loader must receive, at the very minimum, the following argument:

file_path: a str argument with the location the data is read from, including the directory and the name of the file. The name parameter of this class is joined with the directory where the file lives, and the result is passed to the loader function.

The loader can also receive additional positional and keyword arguments. Use the loader_args and loader_kwargs parameters of this class to provide them.

The loader function should return the data that will be used in the model.

TYPE: Callable[[str], Any]

input_data_key `class-attribute` `instance-attribute` ¶

input_data_key: Optional[str] = None

Use this parameter to set a custom key to represent your file.

When using InputFormat.MULTI_FILE as the input_format of the Input, the data from the file is loaded to the .data parameter of the Input. In that case, the type of .data is dict[str, Any], where each key represents the file name (with extension) and the value is the data that is actually loaded from the file using the loader function. You can set a custom key to represent your file by using this attribute.

loader `instance-attribute` ¶

loader: Callable[[str], Any]

Callable that reads (loads) the data from the file. This should be a function implemented by the user. There are convenience functions that you can use as a loader as well. The loader must receive, at the very minimum, the following arguments:

file_path: a str argument which is the location where this data will be read from. This includes the dir and name of the file. As such, the name parameter of this class is going to be passed to the loader function, joined with the directory where the file will be read from.

The loader can also receive additional arguments, and keyword arguments. The loader_args and loader_kwargs parameters of this class can be used to provide those additional arguments.

The loader function should return the data that will be used in the model.

loader_args `class-attribute` `instance-attribute` ¶

loader_args: Optional[list[Any]] = None

Optional positional arguments to pass to the loader function. This can be used to customize the behavior of the loader.

loader_kwargs `class-attribute` `instance-attribute` ¶

loader_kwargs: Optional[dict[str, Any]] = None

Optional keyword arguments to pass to the loader function. This can be used to customize the behavior of the loader.

name `instance-attribute` ¶

name: str

Name of the data (input) file. The file extension should be included in the name.
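
The loader contract described above (file path first, then `loader_args` and `loader_kwargs`) can be sketched with the standard library alone. `SketchDataFile`, `read_data_file`, and `load_selected_keys` are hypothetical names used for illustration; they mirror the documented fields but are not the SDK's implementation.

```python
import json
import os
import tempfile
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class SketchDataFile:
    """Hypothetical stand-in mirroring the documented DataFile fields."""

    name: str
    loader: Callable[..., Any]
    loader_args: list[Any] = field(default_factory=list)
    loader_kwargs: dict[str, Any] = field(default_factory=dict)


def load_selected_keys(file_path: str, *keys: str, encoding: str = "utf-8") -> dict[str, Any]:
    """Custom loader: read a JSON object and keep only the requested keys."""
    with open(file_path, encoding=encoding) as f:
        content = json.load(f)
    return {key: content[key] for key in keys if key in content}


def read_data_file(data_file: SketchDataFile, directory: str) -> Any:
    # Assumed calling convention, following the description above: the file
    # name is joined with the directory and passed first, followed by the
    # extra positional and keyword arguments.
    file_path = os.path.join(directory, data_file.name)
    return data_file.loader(file_path, *data_file.loader_args, **data_file.loader_kwargs)


with tempfile.TemporaryDirectory() as directory:
    with open(os.path.join(directory, "stops.json"), "w", encoding="utf-8") as f:
        json.dump({"stops": [1, 2], "vehicles": [7], "notes": "ignored"}, f)

    spec = SketchDataFile(
        name="stops.json",
        loader=load_selected_keys,
        loader_args=["stops", "vehicles"],
        loader_kwargs={"encoding": "utf-8"},
    )
    loaded = read_data_file(spec, directory)

print(loaded)
```

With the SDK installed, the same `load_selected_keys` callable could presumably be passed as the `loader` of a real `DataFile`, together with the same `loader_args` and `loader_kwargs`.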

Input `dataclass` ¶

Input(
    data: Union[
        Union[dict[str, Any], Any],
        str,
        list[dict[str, Any]],
        dict[str, list[dict[str, Any]]],
        dict[str, Any],
    ],
    input_format: Optional[InputFormat] = JSON,
    options: Optional[Options] = None,
)

Input for a decision problem.

You can import the Input class directly from nextmv:

from nextmv import Input

The data's type must match the input_format:

InputFormat.JSON: the data is Union[dict[str, Any], Any]. This just means that the data must be JSON-deserializable, which includes dicts and lists.
InputFormat.TEXT: the data is str, and it must be utf-8 encoded.
InputFormat.CSV: the data is list[dict[str, Any]], where each dict represents a row in the CSV.
InputFormat.CSV_ARCHIVE: the data is dict[str, list[dict[str, Any]]], where each key is the name of a CSV file and the value is a list of dicts representing the rows in that CSV file.
InputFormat.MULTI_FILE: the data is dict[str, Any], where for each item the key is the file name (with the extension) and the value is the data read from that file. With multi-file input, data is loaded from one or more files in a specific directory. Because each file can be of a different type (JSON, CSV, Excel, etc.), the data captured from each may vary, so the data is loaded as a dict of items. To use a custom key instead of the file name, set the input_data_key parameter of the DataFile class.
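
To make the shapes listed above concrete, here is a small sketch that builds one value of each shape using only the standard library. The sample contents and variable names are made up for illustration; in practice each value would be passed as `data` to `Input` along with the matching `input_format`.

```python
import csv
import io
import json

# JSON: any JSON-deserializable value, commonly a dict.
json_data = json.loads('{"vehicles": 2, "stops": ["a", "b"]}')

# TEXT: a plain str.
text_data = "minimize total distance"

# CSV: one dict per row. csv.DictReader yields every value as a str.
csv_data = list(csv.DictReader(io.StringIO("id,demand\ns1,4\ns2,6\n")))

# CSV_ARCHIVE: one list of rows per CSV file, keyed by the file name.
csv_archive_data = {
    "stops": csv_data,
    "vehicles": [{"id": "v1", "capacity": "10"}],
}

# MULTI_FILE: one entry per file, keyed by file name (with extension) and
# holding whatever each file's loader returned.
multi_file_data = {"stops.csv": csv_data, "config.json": json_data}

print(csv_data)
```

Note that the CSV rows carry strings (`"4"`, not `4`), so numeric columns need an explicit conversion before they are used in a model.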

PARAMETER	DESCRIPTION
`data` ¶	The actual data. TYPE: `Union[Union[dict[str, Any], Any], str, list[dict[str, Any]], dict[str, list[dict[str, Any]]], dict[str, Any]]`
`input_format` ¶	Format of the input data. Default is `InputFormat.JSON`. TYPE: `InputFormat` DEFAULT: `JSON`
`options` ¶	Options that the input was created with. TYPE: `Options` DEFAULT: `None`

RAISES	DESCRIPTION
`ValueError`	If the data type doesn't match the expected type for the given format.
`ValueError`	If the `input_format` is not one of the supported formats.

data `instance-attribute` ¶

data: Union[
    Union[dict[str, Any], Any],
    str,
    list[dict[str, Any]],
    dict[str, list[dict[str, Any]]],
    dict[str, Any],
]

The actual data.

The data can be of various types, depending on the input format:

For JSON: Union[dict[str, Any], Any]
For TEXT: str
For CSV: list[dict[str, Any]]
For CSV_ARCHIVE: dict[str, list[dict[str, Any]]]
For MULTI_FILE: dict[str, Any]

input_format `class-attribute` `instance-attribute` ¶

input_format: Optional[InputFormat] = JSON

Format of the input data.

Default is InputFormat.JSON.

options `class-attribute` `instance-attribute` ¶

options: Optional[Options] = None

Options that the Input was created with.

A copy of the options is made during initialization, ensuring the original options remain unchanged even if modified later.

to_dict ¶

to_dict() -> dict[str, Any]

Convert the input to a dictionary.

This method serializes the Input object to a dictionary format that can be easily converted to JSON or other serialization formats. When the input_format is set to InputFormat.MULTI_FILE, the returned dictionary does not include the data field, as it is uncertain how data is deserialized from the file.

RETURNS	DESCRIPTION
`dict[str, Any]`	A dictionary containing the input data, format, and options. The structure is: `{ "data": <the input data>, "input_format": <the input format as a string>, "options": <the options as a dictionary or None> }`

Examples:

>>> from nextmv.input import Input, InputFormat
>>> input_obj = Input(data={"key": "value"}, input_format=InputFormat.JSON)
>>> input_dict = input_obj.to_dict()
>>> print(input_dict)
{'data': {'key': 'value'}, 'input_format': 'json', 'options': None}

Source code in nextmv/nextmv/input.py

def to_dict(self) -> dict[str, Any]:
    """
    Convert the input to a dictionary.

    This method serializes the Input object to a dictionary format that can
    be easily converted to JSON or other serialization formats. When the
    `input_format` is set to `InputFormat.MULTI_FILE`, it will not include
    the `data` field, as it is uncertain how data is deserialized from the file.

    Returns
    -------
    dict[str, Any]
        A dictionary containing the input data, format, and options.

        The structure is:
        ```python
        {
            "data": <the input data>,
            "input_format": <the input format as a string>,
            "options": <the options as a dictionary or None>
        }
        ```

    Examples
    --------
    >>> from nextmv.input import Input, InputFormat
    >>> input_obj = Input(data={"key": "value"}, input_format=InputFormat.JSON)
    >>> input_dict = input_obj.to_dict()
    >>> print(input_dict)
    {'data': {'key': 'value'}, 'input_format': 'json', 'options': None}
    """

    input_dict = {
        "input_format": self.input_format.value,
        "options": self.options.to_dict() if self.options is not None else None,
    }

    if self.input_format == InputFormat.MULTI_FILE:
        return input_dict

    input_dict["data"] = self.data

    return input_dict

InputFormat ¶

Bases: str, Enum

Format of an Input.

You can import the InputFormat class directly from nextmv:

from nextmv import InputFormat

This enum specifies the supported formats for input data.

ATTRIBUTE	DESCRIPTION
`JSON`	JSON format, utf-8 encoded. TYPE: `str`
`TEXT`	Text format, utf-8 encoded. TYPE: `str`
`CSV`	CSV format, utf-8 encoded. TYPE: `str`
`CSV_ARCHIVE`	CSV archive format: multiple CSV files. TYPE: `str`
`MULTI_FILE`	Multi-file format, used for loading multiple files in a single input. TYPE: `str`

CSV `class-attribute` `instance-attribute` ¶

CSV = 'csv'

CSV format, utf-8 encoded.

CSV_ARCHIVE `class-attribute` `instance-attribute` ¶

CSV_ARCHIVE = 'csv-archive'

CSV archive format: multiple CSV files.

JSON `class-attribute` `instance-attribute` ¶

JSON = 'json'

JSON format, utf-8 encoded.

MULTI_FILE `class-attribute` `instance-attribute` ¶

MULTI_FILE = 'multi-file'

Multi-file format, used for loading multiple files in a single input.

TEXT `class-attribute` `instance-attribute` ¶

TEXT = 'text'

Text format, utf-8 encoded.
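
Because the enum derives from both `str` and `Enum`, its members can be looked up by value and compared with plain strings. The sketch below shows that behavior on a stand-in enum, `SketchInputFormat`, which copies the values listed above; it is a hypothetical mirror, not the SDK class.

```python
import json
from enum import Enum


class SketchInputFormat(str, Enum):
    """Hypothetical mirror of the documented InputFormat values."""

    JSON = "json"
    TEXT = "text"
    CSV = "csv"
    CSV_ARCHIVE = "csv-archive"
    MULTI_FILE = "multi-file"


# Look a member up from its string value, e.g. one read from a config file.
fmt = SketchInputFormat("csv-archive")

# Members compare equal to their plain string values.
same_as_string = fmt == "csv-archive"

# Members serialize as ordinary JSON strings.
serialized = json.dumps({"input_format": fmt})

print(fmt.value, same_as_string, serialized)
```

An unknown value such as `SketchInputFormat("xml")` raises `ValueError`, which is a convenient way to validate user-supplied format names.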

InputLoader ¶

Base class for loading inputs.

You can import the InputLoader class directly from nextmv:

from nextmv import InputLoader

This abstract class defines the interface for input loaders. Subclasses must implement the load method to provide concrete input loading functionality.

load ¶

load(
    input_format: InputFormat = JSON,
    options: Optional[Options] = None,
    *args,
    **kwargs,
) -> Input

Read the input data. This method should be implemented by subclasses.

PARAMETER	DESCRIPTION
`input_format` ¶	Format of the input data. Default is `InputFormat.JSON`. TYPE: `InputFormat` DEFAULT: `JSON`
`options` ¶	Options for loading the input data. TYPE: `Options` DEFAULT: `None`
`*args` ¶	Additional positional arguments. DEFAULT: `()`
`**kwargs` ¶	Additional keyword arguments. DEFAULT: `{}`

RETURNS	DESCRIPTION
`Input`	The input data.

RAISES	DESCRIPTION
`NotImplementedError`	If the method is not implemented.
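
A custom loader follows the usual abstract-base pattern: subclass the base class and override `load`. The sketch below uses small stand-ins (`SketchInput`, `SketchInputLoader`) so that it runs without the SDK installed, and `InMemoryLoader` is a hypothetical example class. With the SDK, you would presumably subclass `InputLoader` and return an `Input` instead.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class SketchInput:
    """Stand-in for Input, holding only the documented fields."""

    data: Any
    input_format: str = "json"
    options: Optional[Any] = None


class SketchInputLoader:
    """Stand-in for the InputLoader base class."""

    def load(self, input_format: str = "json", options: Optional[Any] = None, *args, **kwargs) -> SketchInput:
        raise NotImplementedError


class InMemoryLoader(SketchInputLoader):
    """Hypothetical loader that reads from a string held in memory."""

    def __init__(self, payload: str) -> None:
        self.payload = payload

    def load(self, input_format: str = "json", options: Optional[Any] = None, *args, **kwargs) -> SketchInput:
        if input_format == "json":
            data = json.loads(self.payload)
        elif input_format == "text":
            data = self.payload
        else:
            raise ValueError(f"unsupported input_format: {input_format}")
        return SketchInput(data=data, input_format=input_format, options=options)


loaded = InMemoryLoader('{"demand": [3, 5]}').load()
print(loaded.data)
```

Keeping the `load` signature identical to the base class means the subclass can be swapped in wherever a loader is expected, for instance as the `loader` argument of the module-level `load` function.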

Source code in nextmv/nextmv/input.py

def load(
    self,
    input_format: InputFormat = InputFormat.JSON,
    options: Optional[Options] = None,
    *args,
    **kwargs,
) -> Input:
    """
    Read the input data. This method should be implemented by
    subclasses.

    Parameters
    ----------
    input_format : InputFormat, optional
        Format of the input data. Default is `InputFormat.JSON`.
    options : Options, optional
        Options for loading the input data.
    *args
        Additional positional arguments.
    **kwargs
        Additional keyword arguments.

    Returns
    -------
    Input
        The input data.

    Raises
    ------
    NotImplementedError
        If the method is not implemented.
    """

    raise NotImplementedError

LocalInputLoader ¶

Bases: InputLoader

Class for loading local inputs.

You can import the LocalInputLoader class directly from nextmv:

from nextmv import LocalInputLoader

This class can load input data from the local filesystem, by using stdin, a file, or a directory, where applicable. It supports various input formats like JSON, TEXT, CSV, and CSV archive.

Call the load method to read the input data.

Examples:

>>> from nextmv.input import LocalInputLoader, InputFormat
>>> loader = LocalInputLoader()
>>> # Load JSON from stdin or file
>>> input_obj = loader.load(input_format=InputFormat.JSON, path="data.json")
>>> # Load CSV from a file
>>> input_obj = loader.load(input_format=InputFormat.CSV, path="data.csv")

FILE_READERS `class-attribute` `instance-attribute` ¶

FILE_READERS = {
    JSON: _read_json,
    TEXT: _read_text,
    CSV: _read_csv,
}

Dictionary of functions to read from files.

Each key is an InputFormat, and each value is a function that reads from a file in that format.

STDIN_READERS `class-attribute` `instance-attribute` ¶

STDIN_READERS = {
    JSON: lambda _: load(stdin),
    TEXT: lambda _: stdin.read().rstrip("\n"),
    CSV: lambda csv_configurations: list(
        DictReader(stdin, **csv_configurations)
    ),
}

Dictionary of functions to read from standard input.

Each key is an InputFormat, and each value is a function that reads from standard input in that format.
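
The dispatch-by-format idea can be tried out without touching the real standard input by substituting an in-memory stream. In this sketch, `make_readers` is a hypothetical helper and the plain string keys stand in for the `InputFormat` members; each reader takes the CSV configurations as its only argument, as in the mapping above.

```python
import csv
import io
import json
from typing import Any, Callable, TextIO


def make_readers(stream: TextIO) -> dict[str, Callable[[dict[str, Any]], Any]]:
    """Build one reader per format, all reading from the given stream."""
    return {
        "json": lambda _: json.load(stream),
        "text": lambda _: stream.read().rstrip("\n"),
        "csv": lambda csv_configurations: list(csv.DictReader(stream, **csv_configurations)),
    }


# io.StringIO plays the role of sys.stdin in each call.
json_result = make_readers(io.StringIO('{"k": 1}'))["json"]({})
text_result = make_readers(io.StringIO("hello\nworld\n"))["text"]({})
csv_result = make_readers(io.StringIO("a,b\n1,2\n"))["csv"]({})

print(json_result, repr(text_result), csv_result)
```

Only trailing newlines are stripped from text input; newlines inside the text are preserved.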

load ¶

load(
    input_format: Optional[InputFormat] = JSON,
    options: Optional[Options] = None,
    path: Optional[str] = None,
    csv_configurations: Optional[dict[str, Any]] = None,
    data_files: Optional[list[DataFile]] = None,
) -> Input

Load the input data. The input data can be in various formats. For InputFormat.JSON, InputFormat.TEXT, and InputFormat.CSV, the data can be streamed from stdin or read from a file: when the path argument is provided (and valid), the data is read from the file at path; otherwise, it is streamed from stdin. For InputFormat.CSV_ARCHIVE, the data is read from the directory specified by path; if path is not provided, the default directory named "input" is used. The directory should contain one or more files, each of which is a CSV file.

The Input that is returned contains the data attribute. This data can be of different types, depending on the provided input_format:

InputFormat.JSON: the data is a dict[str, Any].
InputFormat.TEXT: the data is a str.
InputFormat.CSV: the data is a list[dict[str, Any]].
InputFormat.CSV_ARCHIVE: the data is a dict[str, list[dict[str, Any]]]. Each key is the name of the CSV file, minus the .csv extension.
InputFormat.MULTI_FILE: the data is a dict[str, Any], where each key is the file name (with extension) and the value is the data read from the file. The data can be of any type, depending on the file type and the loader function provided in the DataFile instances.

PARAMETER	DESCRIPTION
`input_format` ¶	Format of the input data. Default is `InputFormat.JSON`. TYPE: `InputFormat` DEFAULT: `JSON`
`options` ¶	Options for loading the input data. TYPE: `Options` DEFAULT: `None`
`path` ¶	Path to the input data. TYPE: `str` DEFAULT: `None`
`csv_configurations` ¶	Configurations for loading CSV files. The default `DictReader` is used when loading a CSV file, so you have the option to pass in a dictionary with custom kwargs for the `DictReader`. TYPE: `dict[str, Any]` DEFAULT: `None`
`data_files` ¶	List of `DataFile` instances to read from. This is used when the `input_format` is set to `InputFormat.MULTI_FILE`. Each `DataFile` instance should have a `name` (the file name with extension) and a `loader` function that reads the data from the file. The `loader` function should accept the file path as its first argument and return the data read from the file. The `loader` can also accept additional positional and keyword arguments, which can be provided through the `loader_args` and `loader_kwargs` attributes of the `DataFile` instance. TYPE: `list[DataFile]` DEFAULT: `None`

RETURNS	DESCRIPTION
`Input`	The input data.

RAISES	DESCRIPTION
`ValueError`	If the path is not a directory when working with CSV_ARCHIVE.
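
The CSV archive behavior described above (a directory of CSV files becomes a dict keyed by file name without the `.csv` extension) can be approximated as follows. `load_csv_archive` is a hypothetical helper, not the SDK's internal function, and skipping non-CSV files is an assumption made for this sketch.

```python
import csv
import os
import tempfile
from typing import Any, Optional


def load_csv_archive(
    path: str = "input",
    csv_configurations: Optional[dict[str, Any]] = None,
) -> dict[str, list[dict[str, Any]]]:
    """Read every CSV file in a directory into a dict of row lists."""
    if not os.path.isdir(path):
        raise ValueError(f"path {path} is not a directory")

    csv_configurations = csv_configurations or {}
    data: dict[str, list[dict[str, Any]]] = {}
    for file_name in sorted(os.listdir(path)):
        if not file_name.endswith(".csv"):
            continue  # Assumption: files without a .csv extension are skipped.
        with open(os.path.join(path, file_name), encoding="utf-8", newline="") as f:
            key = file_name[: -len(".csv")]
            data[key] = list(csv.DictReader(f, **csv_configurations))
    return data


with tempfile.TemporaryDirectory() as directory:
    with open(os.path.join(directory, "stops.csv"), "w", encoding="utf-8") as f:
        f.write("id,demand\ns1,4\n")
    with open(os.path.join(directory, "vehicles.csv"), "w", encoding="utf-8") as f:
        f.write("id,capacity\nv1,10\n")
    archive = load_csv_archive(directory)

print(archive)
```

Passing a path that is not a directory raises `ValueError`, matching the error documented for `CSV_ARCHIVE`.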

Source code in nextmv/nextmv/input.py

def load(
    self,
    input_format: Optional[InputFormat] = InputFormat.JSON,
    options: Optional[Options] = None,
    path: Optional[str] = None,
    csv_configurations: Optional[dict[str, Any]] = None,
    data_files: Optional[list[DataFile]] = None,
) -> Input:
    """
    Load the input data. The input data can be in various formats. For
    `InputFormat.JSON`, `InputFormat.TEXT`, and `InputFormat.CSV`, the data
    can be streamed from stdin or read from a file. When the `path`
    argument is provided (and valid), the input data is read from the file
    specified by `path`, otherwise, it is streamed from stdin. For
    `InputFormat.CSV_ARCHIVE`, the input data is read from the directory
    specified by `path`. If the `path` is not provided, the default
    location `input` is used. The directory should contain one or more
    files, where each file in the directory is a CSV file.

    The `Input` that is returned contains the `data` attribute. This data
    can be of different types, depending on the provided `input_format`:

    - `InputFormat.JSON`: the data is a `dict[str, Any]`.
    - `InputFormat.TEXT`: the data is a `str`.
    - `InputFormat.CSV`: the data is a `list[dict[str, Any]]`.
    - `InputFormat.CSV_ARCHIVE`: the data is a `dict[str, list[dict[str, Any]]]`.
      Each key is the name of the CSV file, minus the `.csv` extension.
    - `InputFormat.MULTI_FILE`: the data is a `dict[str, Any]`, where each
      key is the file name (with extension) and the value is the data read
      from the file. The data can be of any type, depending on the file
      type and the reader function provided in the `DataFile` instances.

    Parameters
    ----------
    input_format : InputFormat, optional
        Format of the input data. Default is `InputFormat.JSON`.
    options : Options, optional
        Options for loading the input data.
    path : str, optional
        Path to the input data.
    csv_configurations : dict[str, Any], optional
        Configurations for loading CSV files. The default `DictReader` is
        used when loading a CSV file, so you have the option to pass in a
        dictionary with custom kwargs for the `DictReader`.
    data_files : list[DataFile], optional
        List of `DataFile` instances to read from. This is used when the
        `input_format` is set to `InputFormat.MULTI_FILE`. Each `DataFile`
        instance should have a `name` (the file name with extension) and a
        `loader` function that reads the data from the file. The `loader`
        function should accept the file path as its first argument and return
        the data read from the file. The `loader` can also accept additional
        positional and keyword arguments, which can be provided through the
        `loader_args` and `loader_kwargs` attributes of the `DataFile`
        instance.

    Returns
    -------
    Input
        The input data.

    Raises
    ------
    ValueError
        If the path is not a directory when working with CSV_ARCHIVE.
    """

    data: Any = None
    if csv_configurations is None:
        csv_configurations = {}

    if input_format in [InputFormat.JSON, InputFormat.TEXT, InputFormat.CSV]:
        data = self._load_utf8_encoded(path=path, input_format=input_format, csv_configurations=csv_configurations)
    elif input_format == InputFormat.CSV_ARCHIVE:
        data = self._load_archive(path=path, csv_configurations=csv_configurations)
    elif input_format == InputFormat.MULTI_FILE:
        if data_files is None:
            raise ValueError("data_files must be provided when input_format is InputFormat.MULTI_FILE")

        if not isinstance(data_files, list):
            raise ValueError("data_files must be a list of DataFile instances")

        data = self._load_multi_file(data_files=data_files, path=path)

    return Input(data=data, input_format=input_format, options=options)

csv_data_file ¶

csv_data_file(
    name: str,
    csv_configurations: Optional[dict[str, Any]] = None,
    input_data_key: Optional[str] = None,
) -> DataFile

This is a convenience function to create a DataFile that reads CSV data.

You can import the csv_data_file function directly from nextmv:

from nextmv import csv_data_file

PARAMETER	DESCRIPTION
`name` ¶	Name of the data file. You don't need to include the `.csv` extension. TYPE: `str`
`csv_configurations` ¶	CSV-specific configurations for reading the data. TYPE: `dict[str, Any]` DEFAULT: `None`
`input_data_key` ¶	A custom key to represent the data from this file. When using `InputFormat.MULTI_FILE` as the `input_format` of the `Input`, the data from the file is loaded to the `.data` parameter of the `Input`. In that case, the type of `.data` is `dict[str, Any]`, where each key represents the file name (with extension) and the value is the data that is actually loaded from the file using the `loader` function. You can set a custom key to represent your file by using this attribute. TYPE: `str` DEFAULT: `None`

RETURNS	DESCRIPTION
`DataFile`	A `DataFile` instance that reads CSV data from a file with the given name.

Examples:

>>> from nextmv import csv_data_file
>>> data_file = csv_data_file("my_data")
>>> data = data_file.read()
>>> print(data)
[
    {"column1": "value1", "column2": "value2"},
    {"column1": "value3", "column2": "value4"}
]
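
Since `csv_configurations` is forwarded as keyword arguments to `csv.DictReader` (see the source below), any option that `DictReader` accepts can be used. The sketch shows the effect of `delimiter` on a semicolon-separated sample, using an in-memory string in place of a file.

```python
import csv
import io

raw = "id;demand\ns1;4\n"

# With the default comma delimiter, the whole line is treated as one column.
default_rows = list(csv.DictReader(io.StringIO(raw)))

# The same dict you would pass as csv_configurations.
csv_configurations = {"delimiter": ";"}
custom_rows = list(csv.DictReader(io.StringIO(raw), **csv_configurations))

print(default_rows)
print(custom_rows)
```

The same dictionary would be passed as `csv_data_file("my_data", csv_configurations={"delimiter": ";"})`.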

Source code in nextmv/nextmv/input.py

def csv_data_file(
    name: str,
    csv_configurations: Optional[dict[str, Any]] = None,
    input_data_key: Optional[str] = None,
) -> DataFile:
    """
    This is a convenience function to create a `DataFile` that reads CSV data.

    You can import the `csv_data_file` function directly from `nextmv`:

    ```python
    from nextmv import csv_data_file
    ```

    Parameters
    ----------
    name : str
        Name of the data file. You don't need to include the `.csv` extension.
    csv_configurations : dict[str, Any], optional
        CSV-specific configurations for reading the data.
    input_data_key : str, optional
        A custom key to represent the data from this file.

        When using `InputFormat.MULTI_FILE` as the `input_format` of the `Input`,
        the data from the file is loaded to the `.data` parameter of the `Input`.
        In that case, the type of `.data` is `dict[str, Any]`, where each key
        represents the file name (with extension) and the value is the data that is
        actually loaded from the file using the `loader` function. You can set a
        custom key to represent your file by using this attribute.

    Returns
    -------
    DataFile
        A `DataFile` instance that reads CSV data from a file with the given
        name.

    Examples
    --------
    >>> from nextmv import csv_data_file
    >>> data_file = csv_data_file("my_data")
    >>> data = data_file.read()
    >>> print(data)
    [
        {"column1": "value1", "column2": "value2"},
        {"column1": "value3", "column2": "value4"}
    ]
    """

    if not name.endswith(".csv"):
        name += ".csv"

    csv_configurations = csv_configurations or {}

    def loader(file_path: str) -> list[dict[str, Any]]:
        with open(file_path, encoding="utf-8") as f:
            return list(csv.DictReader(f, **csv_configurations))

    return DataFile(
        name=name,
        loader=loader,
        input_data_key=input_data_key,
    )

json_data_file ¶

json_data_file(
    name: str,
    json_configurations: Optional[dict[str, Any]] = None,
    input_data_key: Optional[str] = None,
) -> DataFile

This is a convenience function to create a DataFile that reads JSON data.

You can import the json_data_file function directly from nextmv:

from nextmv import json_data_file

PARAMETER	DESCRIPTION
`name` ¶	Name of the data file. You don't need to include the `.json` extension. TYPE: `str`
`json_configurations` ¶	JSON-specific configurations for reading the data. TYPE: `dict[str, Any]` DEFAULT: `None`
`input_data_key` ¶	A custom key to represent the data from this file. When using `InputFormat.MULTI_FILE` as the `input_format` of the `Input`, the data from the file is loaded to the `.data` parameter of the `Input`. In that case, the type of `.data` is `dict[str, Any]`, where each key represents the file name (with extension) and the value is the data that is actually loaded from the file using the `loader` function. You can set a custom key to represent your file by using this attribute. TYPE: `str` DEFAULT: `None`

RETURNS	DESCRIPTION
`DataFile`	A `DataFile` instance that reads JSON data from a file with the given name.

Examples:

>>> from nextmv import json_data_file
>>> data_file = json_data_file("my_data")
>>> data = data_file.read()
>>> print(data)
{
    "key": "value",
    "another_key": [1, 2, 3]
}
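
`json_configurations` is forwarded as keyword arguments to `json.load` (see the source below), so standard hooks such as `parse_float` apply. The sketch parses floats as `Decimal` to avoid binary rounding, using an in-memory string in place of a file.

```python
import io
import json
from decimal import Decimal

raw = '{"cost": 1.10, "count": 3}'

# The same dict you would pass as json_configurations.
json_configurations = {"parse_float": Decimal}
data = json.load(io.StringIO(raw), **json_configurations)

print(data)
```

Only JSON floats are affected; the integer `count` is still parsed as an `int`.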

Source code in nextmv/nextmv/input.py

def json_data_file(
    name: str,
    json_configurations: Optional[dict[str, Any]] = None,
    input_data_key: Optional[str] = None,
) -> DataFile:
    """
    This is a convenience function to create a `DataFile` that reads JSON data.

    You can import the `json_data_file` function directly from `nextmv`:

    ```python
    from nextmv import json_data_file
    ```

    Parameters
    ----------
    name : str
        Name of the data file. You don't need to include the `.json` extension.
    json_configurations : dict[str, Any], optional
        JSON-specific configurations for reading the data.
    input_data_key : str, optional
        A custom key to represent the data from this file.

        When using `InputFormat.MULTI_FILE` as the `input_format` of the `Input`,
        the data from the file is loaded to the `.data` parameter of the `Input`.
        In that case, the type of `.data` is `dict[str, Any]`, where each key
        represents the file name (with extension) and the value is the data that is
        actually loaded from the file using the `loader` function. You can set a
        custom key to represent your file by using this attribute.

    Returns
    -------
    DataFile
        A `DataFile` instance that reads JSON data from a file with the given
        name.

    Examples
    --------
    >>> from nextmv import json_data_file
    >>> data_file = json_data_file("my_data")
    >>> data = data_file.read()
    >>> print(data)
    {
        "key": "value",
        "another_key": [1, 2, 3]
    }
    """

    if not name.endswith(".json"):
        name += ".json"

    json_configurations = json_configurations or {}

    def loader(file_path: str) -> Union[dict[str, Any], Any]:
        with open(file_path, encoding="utf-8") as f:
            return json.load(f, **json_configurations)

    return DataFile(
        name=name,
        loader=loader,
        input_data_key=input_data_key,
    )

load ¶

load(
    input_format: Optional[InputFormat] = JSON,
    options: Optional[Options] = None,
    path: Optional[str] = None,
    csv_configurations: Optional[dict[str, Any]] = None,
    loader: Optional[InputLoader] = _LOCAL_INPUT_LOADER,
    data_files: Optional[list[DataFile]] = None,
) -> Input

Load input data using the specified loader.

You can import the load function directly from nextmv:

from nextmv import load

This is a convenience function for loading an Input object. By default, it uses the LocalInputLoader to load data from local sources.

The input data can be in various formats and can be loaded from different sources depending on the loader:

InputFormat.JSON: the data is a dict[str, Any]
InputFormat.TEXT: the data is a str
InputFormat.CSV: the data is a list[dict[str, Any]]
InputFormat.CSV_ARCHIVE: the data is a dict[str, list[dict[str, Any]]]. Each key is the name of the CSV file, minus the .csv extension.
InputFormat.MULTI_FILE: the data is a dict[str, Any] where each key is the file name (with extension) and the value is the data read from the file. This is used for loading multiple files in a single input, where each file can be of different types (JSON, CSV, Excel, etc.). The data is loaded as a dict of items, where each item corresponds to a file and its content.

When specifying input_format as InputFormat.MULTI_FILE, the data_files argument must be provided. This argument is a list of DataFile instances, each representing a file to be read. Each DataFile instance should have a name (the file name with extension) and a loader function that reads the data from the file. The loader function should accept the file path as its first argument and return the data read from the file. The loader can also accept additional positional and keyword arguments, which can be provided through the loader_args and loader_kwargs attributes of the DataFile instance.

There are convenience functions that can be used to create DataFile instances, such as:

json_data_file: Creates a DataFile that reads JSON data.
csv_data_file: Creates a DataFile that reads CSV data.
text_data_file: Creates a DataFile that reads utf-8 encoded text data.

When working with data in other formats, such as Excel files, you are encouraged to create your own DataFile objects with your own implementation of the loader function. This allows you to read data from files in a way that suits your needs, while still adhering to the DataFile interface.

PARAMETER	DESCRIPTION
`input_format` ¶	Format of the input data. Default is `InputFormat.JSON`. TYPE: `InputFormat` DEFAULT: `JSON`
`options` ¶	Options for loading the input data. TYPE: `Options` DEFAULT: `None`
`path` ¶	Path to the input data. For file-based loaders: if provided, reads from the specified file or directory; if `None`, typically reads from stdin (for JSON, TEXT, CSV) or uses a default directory (for CSV_ARCHIVE). TYPE: `str` DEFAULT: `None`
`csv_configurations` ¶	Configurations for loading CSV files. Custom kwargs for Python's `csv.DictReader`. TYPE: `dict[str, Any]` DEFAULT: `None`
`loader` ¶	The loader to use for loading the input data. Default is an instance of `LocalInputLoader`. TYPE: `InputLoader` DEFAULT: `_LOCAL_INPUT_LOADER`
`data_files` ¶	List of `DataFile` instances to read from. This is used when the `input_format` is set to `InputFormat.MULTI_FILE`. Each `DataFile` instance should have a `name` (the file name with extension) and a `loader` function that reads the data from the file. The `loader` function should accept the file path as its first argument and return the data read from the file. The `loader` can also accept additional positional and keyword arguments, which can be provided through the `loader_args` and `loader_kwargs` attributes of the `DataFile` instance. There are convenience functions that can be used to create `DataFile` instances, such as `json_data_file`, `csv_data_file`, and `text_data_file`. When working with data in other formats, such as Excel files, you are encouraged to create your own `DataFile` objects with your own implementation of the `loader` function. This allows you to read data from files in a way that suits your needs, while still adhering to the `DataFile` interface. TYPE: `list[DataFile]` DEFAULT: `None`

RETURNS	DESCRIPTION
`Input`	The loaded input data in an Input object.

RAISES	DESCRIPTION
`ValueError`	If the path is invalid or data format is incorrect.

Examples:

>>> from nextmv.input import load, InputFormat
>>> # Load JSON from stdin
>>> input_obj = load(input_format=InputFormat.JSON)
>>> # Load CSV from a file
>>> input_obj = load(input_format=InputFormat.CSV, path="data.csv")
>>> # Load CSV archive from a directory
>>> input_obj = load(input_format=InputFormat.CSV_ARCHIVE, path="input_dir")
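For formats without a convenience function, the `loader` contract described for `data_files` can be sketched with the standard library alone. This is a minimal sketch, not the SDK's implementation: `load_tsv`, `orders.tsv`, and the `delimiter` keyword are hypothetical names chosen for illustration. The function takes the file path as its first argument and returns the parsed rows, which is the shape a `DataFile` loader is documented to have.

```python
import csv
import os
import tempfile
from typing import Any


def load_tsv(file_path: str, delimiter: str = "\t") -> list[dict[str, Any]]:
    """Hypothetical custom loader: file path first, extra options as kwargs."""
    with open(file_path, encoding="utf-8", newline="") as f:
        return list(csv.DictReader(f, delimiter=delimiter))


# Write a small sample file so the loader can be exercised on its own.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "orders.tsv")
with open(sample_path, "w", encoding="utf-8", newline="") as f:
    f.write("id\tquantity\nA\t3\nB\t5\n")

rows = load_tsv(sample_path, delimiter="\t")

# Assumed wiring into the SDK, based on the DataFile signature documented
# above (not executed here):
#   DataFile(name="orders.tsv", loader=load_tsv, loader_kwargs={"delimiter": "\t"})
```

Keeping the loader a plain function makes it easy to test in isolation before passing it to `load` through `data_files`.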

Source code in nextmv/nextmv/input.py

def load(
    input_format: Optional[InputFormat] = InputFormat.JSON,
    options: Optional[Options] = None,
    path: Optional[str] = None,
    csv_configurations: Optional[dict[str, Any]] = None,
    loader: Optional[InputLoader] = _LOCAL_INPUT_LOADER,
    data_files: Optional[list[DataFile]] = None,
) -> Input:
    """
    Load input data using the specified loader.

    You can import the `load` function directly from `nextmv`:

    ```python
    from nextmv import load
    ```

    This is a convenience function for loading an `Input` object. By default,
    it uses the `LocalInputLoader` to load data from local sources.

    The input data can be in various formats and can be loaded from different
    sources depending on the loader:

    - `InputFormat.JSON`: the data is a `dict[str, Any]`
    - `InputFormat.TEXT`: the data is a `str`
    - `InputFormat.CSV`: the data is a `list[dict[str, Any]]`
    - `InputFormat.CSV_ARCHIVE`: the data is a `dict[str, list[dict[str, Any]]]`
        Each key is the name of the CSV file, minus the `.csv` extension.
    - `InputFormat.MULTI_FILE`: the data is a `dict[str, Any]`
        where each key is the file name (with extension) and the value is the
        data read from the file. This is used for loading multiple files in a
        single input, where each file can be of different types (JSON, CSV,
        Excel, etc.). The data is loaded as a dict of items, where each item
        corresponds to a file and its content.

    When specifying `input_format` as `InputFormat.MULTI_FILE`, the
    `data_files` argument must be provided. This argument is a list of
    `DataFile` instances, each representing a file to be read. Each `DataFile`
    instance should have a `name` (the file name with extension) and a `loader`
    function that reads the data from the file. The `loader` function should
    accept the file path as its first argument and return the data read from
    the file. The `loader` can also accept additional positional and keyword
    arguments, which can be provided through the `loader_args` and
    `loader_kwargs` attributes of the `DataFile` instance.

    There are convenience functions that can be used to create `DataFile`
    instances, such as:

    - `json_data_file`: Creates a `DataFile` that reads JSON data.
    - `csv_data_file`: Creates a `DataFile` that reads CSV data.
    - `text_data_file`: Creates a `DataFile` that reads utf-8 encoded text
      data.

    When working with data in other formats, such as Excel files, you are
    encouraged to create your own `DataFile` objects with your own
    implementation of the `loader` function. This allows you to read data
    from files in a way that suits your needs, while still adhering to the
    `DataFile` interface.

    Parameters
    ----------
    input_format : InputFormat, optional
        Format of the input data. Default is `InputFormat.JSON`.
    options : Options, optional
        Options for loading the input data.
    path : str, optional
        Path to the input data. For file-based loaders:
        - If provided, reads from the specified file or directory
        - If None, typically reads from stdin (for JSON, TEXT, CSV)
          or uses a default directory (for CSV_ARCHIVE)
    csv_configurations : dict[str, Any], optional
        Configurations for loading CSV files. Custom kwargs for
        Python's `csv.DictReader`.
    loader : InputLoader, optional
        The loader to use for loading the input data.
        Default is an instance of `LocalInputLoader`.
    data_files : list[DataFile], optional
        List of `DataFile` instances to read from. This is used when the
        `input_format` is set to `InputFormat.MULTI_FILE`. Each `DataFile`
        instance should have a `name` (the file name with extension) and a
        `loader` function that reads the data from the file. The `loader`
        function should accept the file path as its first argument and return
        the data read from the file. The `loader` can also accept additional
        positional and keyword arguments, which can be provided through the
        `loader_args` and `loader_kwargs` attributes of the `DataFile`
        instance.

        There are convenience functions that can be used to create `DataFile`
        instances, such as `json_data_file`, `csv_data_file`, and
        `text_data_file`. When working with data in other formats, such as
        Excel files, you are encouraged to create your own `DataFile` objects
        with your own implementation of the `loader` function. This allows you
        to read data from files in a way that suits your needs, while still
        adhering to the `DataFile` interface.

    Returns
    -------
    Input
        The loaded input data in an Input object.

    Raises
    ------
    ValueError
        If the path is invalid or data format is incorrect.

    Examples
    --------
    >>> from nextmv.input import load, InputFormat
    >>> # Load JSON from stdin
    >>> input_obj = load(input_format=InputFormat.JSON)
    >>> # Load CSV from a file
    >>> input_obj = load(input_format=InputFormat.CSV, path="data.csv")
    >>> # Load CSV archive from a directory
    >>> input_obj = load(input_format=InputFormat.CSV_ARCHIVE, path="input_dir")
    """

    return loader.load(input_format, options, path, csv_configurations, data_files)

load_local ¶

load_local(
    input_format: Optional[InputFormat] = JSON,
    options: Optional[Options] = None,
    path: Optional[str] = None,
    csv_configurations: Optional[dict[str, Any]] = None,
) -> Input

Warning

load_local is deprecated; use load instead.

Load input data from local sources.

This is a convenience function for instantiating a LocalInputLoader and calling its load method.

PARAMETER	DESCRIPTION
`input_format` ¶	Format of the input data. Default is `InputFormat.JSON`. TYPE: `InputFormat` DEFAULT: `JSON`
`options` ¶	Options for loading the input data. TYPE: `Options` DEFAULT: `None`
`path` ¶	Path to the input data. TYPE: `str` DEFAULT: `None`
`csv_configurations` ¶	Configurations for loading CSV files. Custom kwargs for Python's `csv.DictReader`. TYPE: `dict[str, Any]` DEFAULT: `None`

RETURNS	DESCRIPTION
`Input`	The loaded input data in an Input object.

RAISES	DESCRIPTION
`ValueError`	If the path is invalid or data format is incorrect.

text_data_file ¶

text_data_file(
    name: str, input_data_key: Optional[str] = None
) -> DataFile

This is a convenience function to create a DataFile that reads utf-8 encoded text data.

You can import the text_data_file function directly from nextmv:

from nextmv import text_data_file

You must provide the extension as part of the name parameter.

PARAMETER DESCRIPTION

`name` ¶

Name of the data file. The file extension must be provided in the name.

TYPE: str

`input_data_key` ¶

A custom key to represent the data from this file.

When using InputFormat.MULTI_FILE as the input_format of the Input, the data from the file is loaded into the .data attribute of the Input. In that case, the type of .data is dict[str, Any], where each key represents the file name (with extension) and the value is the data that is actually loaded from the file using the loader function. You can set a custom key to represent your file by using this attribute.

TYPE: str DEFAULT: None

RETURNS	DESCRIPTION
`DataFile`	A `DataFile` instance that reads text data from a file with the given name.

Examples:

>>> from nextmv import text_data_file
>>> data_file = text_data_file("my_data.txt")
>>> data = data_file.read()
>>> print(data)
This is some text data.
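The effect of `input_data_key` on a `MULTI_FILE` input can be pictured with a small stand-alone sketch. It does not call the SDK: `build_data` and the sample file names are hypothetical, and the key-selection rule (custom key if set, file name otherwise) is an assumption drawn from the parameter description above.

```python
from typing import Any, Optional


def build_data(files: list[tuple[str, Optional[str], Any]]) -> dict[str, Any]:
    """Hypothetical helper: key each file's content by its custom key, if any."""
    data: dict[str, Any] = {}
    for name, input_data_key, content in files:
        # Assumed rule: fall back to the file name when no custom key is set.
        key = input_data_key if input_data_key is not None else name
        data[key] = content
    return data


data = build_data(
    [
        ("notes.txt", None, "This is some text data."),
        ("readme.txt", "description", "A short description."),
    ]
)
```

Here the first file is reachable as `data["notes.txt"]` and the second as `data["description"]`.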

Source code in nextmv/nextmv/input.py

def text_data_file(name: str, input_data_key: Optional[str] = None) -> DataFile:
    """
    This is a convenience function to create a `DataFile` that reads utf-8
    encoded text data.

    You can import the `text_data_file` function directly from `nextmv`:

    ```python
    from nextmv import text_data_file
    ```

    You must provide the extension as part of the `name` parameter.

    Parameters
    ----------
    name : str
        Name of the data file. The file extension must be provided in the name.
    input_data_key : str, optional
        A custom key to represent the data from this file.

        When using `InputFormat.MULTI_FILE` as the `input_format` of the `Input`,
        the data from the file is loaded into the `.data` attribute of the `Input`.
        In that case, the type of `.data` is `dict[str, Any]`, where each key
        represents the file name (with extension) and the value is the data that is
        actually loaded from the file using the `loader` function. You can set a
        custom key to represent your file by using this attribute.

    Returns
    -------
    DataFile
        A `DataFile` instance that reads text data from a file with the given
        name.

    Examples
    --------
    >>> from nextmv import text_data_file
    >>> data_file = text_data_file("my_data.txt")
    >>> data = data_file.read()
    >>> print(data)
    This is some text data.
    """

    def loader(file_path: str) -> str:
        with open(file_path, encoding="utf-8") as f:
            return f.read().rstrip("\n")

    return DataFile(
        name=name,
        loader=loader,
        input_data_key=input_data_key,
    )
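
One detail of the `loader` above is easy to miss: `rstrip("\n")` removes every trailing newline, not just one, while leaving other trailing whitespace intact. The stand-alone sketch below repeats the same read logic outside the SDK to show this. `read_text` and the sample file are hypothetical names used only for illustration.

```python
import os
import tempfile


def read_text(file_path: str) -> str:
    # Same read logic as the nested loader shown above.
    with open(file_path, encoding="utf-8") as f:
        return f.read().rstrip("\n")


# Sample file ending in a space followed by two newlines.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "my_data.txt")
with open(sample_path, "w", encoding="utf-8") as f:
    f.write("This is some text data. \n\n")

text = read_text(sample_path)
```

Both newlines are dropped, but the trailing space before them is preserved, so callers that need fully trimmed text would still have to strip it themselves.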
