
Input Module

This section documents the input components of the Nextmv Python SDK.

input

Module for handling input sources and data.

This module provides classes and functions for loading and handling input data in various formats for decision problems. It supports JSON, plain text, CSV, CSV archive, and multi-file formats, and can load data from standard input, files, or directories.

CLASS DESCRIPTION
InputFormat

Enum defining supported input data formats (JSON, TEXT, CSV, CSV_ARCHIVE, MULTI_FILE).

Input

Container for input data with format specification and options.

InputLoader

Base class for loading inputs from various sources.

LocalInputLoader

Class for loading inputs from local files or stdin.

FUNCTION DESCRIPTION
load

Load input data using a specified loader.

DataFile dataclass

DataFile(
    name: str,
    loader: Callable[[str], Any],
    loader_kwargs: Optional[dict[str, Any]] = None,
    loader_args: Optional[list[Any]] = None,
    input_data_key: Optional[str] = None,
)

Represents data to be read from a file.

You can import the DataFile class directly from nextmv:

from nextmv import DataFile

This class defines data that will be read from a file in the filesystem. It includes the name of the file and the loader function that handles loading and deserializing the data from that file. DataFile is typically used with Input when Input.input_format is set to InputFormat.MULTI_FILE. Because it is difficult to handle every edge case of how data is read from a file and deserialized, this class lets you implement the loader callable of your choice and supply any loader_args and loader_kwargs it needs.
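As a minimal sketch of the loader contract, the function below takes the full file path as its first argument and accepts one extra keyword argument that would come from loader_kwargs. The function name load_key_value_file and the "key=value" file layout are hypothetical, chosen only for illustration; the code uses the standard library alone.

```python
import os
import tempfile
from typing import Any


def load_key_value_file(file_path: str, separator: str = "=") -> dict[str, Any]:
    """Hypothetical loader: parse lines like 'name=value' into a dict.

    The first argument is the full file path, as DataFile requires.
    Extra options (here, `separator`) would be supplied via loader_kwargs.
    """
    result: dict[str, Any] = {}
    with open(file_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            key, _, value = line.partition(separator)
            result[key.strip()] = value.strip()
    return result


if __name__ == "__main__":
    # Write a small sample file to a temporary directory and load it.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "settings.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("# solver settings\nduration: 30\nthreads: 4\n")
        print(load_key_value_file(path, separator=":"))  # {'duration': '30', 'threads': '4'}
```

Following the constructor signature shown above, this would be registered as DataFile(name="settings.txt", loader=load_key_value_file, loader_kwargs={"separator": ":"}).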

PARAMETER DESCRIPTION

name

Name of the data (input) file. The file extension should be included in the name.

TYPE: str

loader

Callable that reads (loads) the data from the file. This should be a function implemented by the user; alternatively, convenience functions such as json_data_file and csv_data_file create a DataFile with a ready-made loader. The loader must receive, at the very minimum, the following argument:

  • file_path: a str with the location the data will be read from, including the directory and the file name. The name parameter of this class is joined with the directory the file is read from, and the result is passed to the loader.

The loader can also receive additional positional and keyword arguments, which are provided through the loader_args and loader_kwargs parameters of this class.

The loader function should return the data that will be used in the model.

TYPE: Callable[[str], Any]

input_data_key class-attribute instance-attribute

input_data_key: Optional[str] = None

Use this parameter to set a custom key to represent your file.

When using InputFormat.MULTI_FILE as the input_format of the Input, the data from the file is loaded to the .data parameter of the Input. In that case, the type of .data is dict[str, Any], where each key represents the file name (with extension) and the value is the data that is actually loaded from the file using the loader function. You can set a custom key to represent your file by using this attribute.
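The keying rule can be sketched with the standard library. DataFileSketch below is a hypothetical stand-in that mirrors the DataFile fields listed on this page, and load_files is a hypothetical helper. It assumes the loader is called as loader(os.path.join(path, name), *loader_args, **loader_kwargs), which is consistent with the description but is not the SDK's own code.

```python
import json
import os
import tempfile
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class DataFileSketch:
    """Hypothetical stand-in mirroring the fields of nextmv.DataFile."""

    name: str
    loader: Callable[..., Any]
    loader_kwargs: Optional[dict[str, Any]] = None
    loader_args: Optional[list[Any]] = None
    input_data_key: Optional[str] = None


def load_files(data_files: list[DataFileSketch], path: str) -> dict[str, Any]:
    """Build the dict that would end up in Input.data for MULTI_FILE."""
    data: dict[str, Any] = {}
    for data_file in data_files:
        file_path = os.path.join(path, data_file.name)  # directory + file name
        value = data_file.loader(
            file_path,
            *(data_file.loader_args or []),
            **(data_file.loader_kwargs or {}),
        )
        # The custom key wins; otherwise the file name (with extension) is used.
        key = data_file.input_data_key if data_file.input_data_key is not None else data_file.name
        data[key] = value
    return data


def read_json(file_path: str) -> Any:
    with open(file_path, encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for fname, content in [("orders.json", [1, 2]), ("config.json", {"k": 3})]:
            with open(os.path.join(tmp, fname), "w", encoding="utf-8") as f:
                json.dump(content, f)
        files = [
            DataFileSketch(name="orders.json", loader=read_json),
            DataFileSketch(name="config.json", loader=read_json, input_data_key="settings"),
        ]
        print(load_files(files, tmp))  # {'orders.json': [1, 2], 'settings': {'k': 3}}
```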

loader instance-attribute

loader: Callable[[str], Any]

Callable that reads (loads) the data from the file. This should be a function implemented by the user; alternatively, convenience functions such as json_data_file and csv_data_file create a DataFile with a ready-made loader. The loader must receive, at the very minimum, the following argument:

  • file_path: a str argument which is the location where this data will be read from. This includes the dir and name of the file. As such, the name parameter of this class is going to be passed to the loader function, joined with the directory where the file will be read from.

The loader can also receive additional arguments, and keyword arguments. The loader_args and loader_kwargs parameters of this class can be used to provide those additional arguments.

The loader function should return the data that will be used in the model.

loader_args class-attribute instance-attribute

loader_args: Optional[list[Any]] = None

Optional positional arguments to pass to the loader function. This can be used to customize the behavior of the loader.

loader_kwargs class-attribute instance-attribute

loader_kwargs: Optional[dict[str, Any]] = None

Optional keyword arguments to pass to the loader function. This can be used to customize the behavior of the loader.

name instance-attribute

name: str

Name of the data (input) file. The file extension should be included in the name.

Input dataclass

Input(
    data: Union[
        Union[dict[str, Any], Any],
        str,
        list[dict[str, Any]],
        dict[str, list[dict[str, Any]]],
        dict[str, Any],
    ],
    input_format: Optional[InputFormat] = JSON,
    options: Optional[Options] = None,
)

Input for a decision problem.

You can import the Input class directly from nextmv:

from nextmv import Input

The data's type must match the input_format:

  • InputFormat.JSON: the data is Union[dict[str, Any], Any]. This just means that the data must be JSON-deserializable, which includes dicts and lists.
  • InputFormat.TEXT: the data is str, and it must be utf-8 encoded.
  • InputFormat.CSV: the data is list[dict[str, Any]], where each dict represents a row in the CSV.
  • InputFormat.CSV_ARCHIVE: the data is dict[str, list[dict[str, Any]]], where each key is the name of a CSV file and the value is a list of dicts representing the rows in that CSV file.
  • InputFormat.MULTI_FILE: the data is dict[str, Any], where for each item the key is the file name (with extension) and the value is the actual data from the file. With multi-file input, data is loaded from one or more files in a specific directory. Because each file can be of a different type (JSON, CSV, Excel, etc.), the data captured from each one might vary, so the data is loaded as a dict of items. To use a custom key instead of the file name, set the input_data_key parameter of the DataFile class.
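The sketch below pairs one plausible data value with each format and a hypothetical matches_format helper that checks the shapes listed above. It keys on the lowercase strings used as InputFormat values ("json", "text", and so on). It only illustrates the documented shapes and is not the SDK's own validation logic.

```python
import json
from typing import Any


def _is_rows(value: Any) -> bool:
    """True if value looks like CSV rows: a list of dicts with str keys."""
    return isinstance(value, list) and all(
        isinstance(row, dict) and all(isinstance(k, str) for k in row) for row in value
    )


def matches_format(input_format: str, data: Any) -> bool:
    """Hypothetical check of the documented data shape for each format."""
    if input_format == "json":
        try:
            json.dumps(data)  # any JSON-compatible value: dict, list, str, number...
            return True
        except (TypeError, ValueError):
            return False
    if input_format == "text":
        return isinstance(data, str)
    if input_format == "csv":
        return _is_rows(data)
    if input_format == "csv-archive":
        return isinstance(data, dict) and all(_is_rows(v) for v in data.values())
    if input_format == "multi-file":
        return isinstance(data, dict) and all(isinstance(k, str) for k in data)
    raise ValueError(f"unsupported input format: {input_format}")


# One example value per format, following the list above.
EXAMPLES = {
    "json": {"vehicles": [{"id": "v1"}]},
    "text": "line 1\nline 2",
    "csv": [{"id": "1", "demand": "3"}],
    "csv-archive": {"stops": [{"id": "s1"}], "vehicles": [{"id": "v1"}]},
    "multi-file": {"stops.csv": [{"id": "s1"}], "settings.json": {"duration": 30}},
}

if __name__ == "__main__":
    for fmt, value in EXAMPLES.items():
        print(fmt, matches_format(fmt, value))
```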
PARAMETER DESCRIPTION

data

The actual data.

TYPE: Union[Union[dict[str, Any], Any], str, list[dict[str, Any]], dict[str, list[dict[str, Any]]], dict[str, Any]]

input_format

Format of the input data. Default is InputFormat.JSON.

TYPE: InputFormat DEFAULT: JSON

options

Options that the input was created with.

TYPE: Options DEFAULT: None

RAISES DESCRIPTION
ValueError

If the data type doesn't match the expected type for the given format.

ValueError

If the input_format is not one of the supported formats.

data instance-attribute

data: Union[
    Union[dict[str, Any], Any],
    str,
    list[dict[str, Any]],
    dict[str, list[dict[str, Any]]],
    dict[str, Any],
]

The actual data.

The data can be of various types, depending on the input format:

  • For JSON: Union[dict[str, Any], Any]
  • For TEXT: str
  • For CSV: list[dict[str, Any]]
  • For CSV_ARCHIVE: dict[str, list[dict[str, Any]]]
  • For MULTI_FILE: dict[str, Any]

input_format class-attribute instance-attribute

input_format: Optional[InputFormat] = JSON

Format of the input data.

Default is InputFormat.JSON.

options class-attribute instance-attribute

options: Optional[Options] = None

Options that the Input was created with.

A copy of the options is made during initialization, ensuring the original options remain unchanged even if modified later.

to_dict

to_dict() -> dict[str, Any]

Convert the input to a dictionary.

This method serializes the Input object to a dictionary that can easily be converted to JSON or other serialization formats. When input_format is set to InputFormat.MULTI_FILE, the data field is omitted, because it is uncertain how the data was deserialized from each file.

RETURNS DESCRIPTION
dict[str, Any]

A dictionary containing the input data, format, and options.

The structure is:

{
    "data": <the input data>,
    "input_format": <the input format as a string>,
    "options": <the options as a dictionary or None>
}

Examples:

>>> from nextmv.input import Input, InputFormat
>>> input_obj = Input(data={"key": "value"}, input_format=InputFormat.JSON)
>>> input_dict = input_obj.to_dict()
>>> print(input_dict)
{'input_format': 'json', 'options': None, 'data': {'key': 'value'}}
Source code in nextmv/nextmv/input.py
def to_dict(self) -> dict[str, Any]:
    """
    Convert the input to a dictionary.

    This method serializes the Input object to a dictionary format that can
    be easily converted to JSON or other serialization formats. When
    `input_format` is set to `InputFormat.MULTI_FILE`, the `data` field is
    omitted, because it is uncertain how data was deserialized from the file.

    Returns
    -------
    dict[str, Any]
        A dictionary containing the input data, format, and options.

        The structure is:
        ```python
        {
            "data": <the input data>,
            "input_format": <the input format as a string>,
            "options": <the options as a dictionary or None>
        }
        ```

    Examples
    --------
    >>> from nextmv.input import Input, InputFormat
    >>> input_obj = Input(data={"key": "value"}, input_format=InputFormat.JSON)
    >>> input_dict = input_obj.to_dict()
    >>> print(input_dict)
    {'input_format': 'json', 'options': None, 'data': {'key': 'value'}}
    """

    input_dict = {
        "input_format": self.input_format.value,
        "options": self.options.to_dict() if self.options is not None else None,
    }

    if self.input_format == InputFormat.MULTI_FILE:
        return input_dict

    input_dict["data"] = self.data

    return input_dict
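To make the serialization rule concrete without installing the SDK, the standalone function below mirrors the to_dict logic shown above: input_format and options are always present, and data is appended only for formats other than multi-file. The name input_to_dict is hypothetical, and it takes plain values instead of an Input. Because data is inserted last, it is also the last key when the dict is printed.

```python
import json
from typing import Any, Optional


def input_to_dict(data: Any, input_format: str, options: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """Hypothetical mirror of Input.to_dict using plain values."""
    result: dict[str, Any] = {
        "input_format": input_format,
        "options": options,
    }
    if input_format == "multi-file":
        # Data loaded by arbitrary loaders may not be serializable, so it is omitted.
        return result
    result["data"] = data
    return result


if __name__ == "__main__":
    d = input_to_dict({"key": "value"}, "json")
    print(d)  # {'input_format': 'json', 'options': None, 'data': {'key': 'value'}}
    print(json.dumps(d))  # {"input_format": "json", "options": null, "data": {"key": "value"}}
```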

InputFormat

Bases: str, Enum

Format of an Input.

You can import the InputFormat class directly from nextmv:

from nextmv import InputFormat

This enum specifies the supported formats for input data.

ATTRIBUTE DESCRIPTION
JSON

JSON format, utf-8 encoded.

TYPE: str

TEXT

Text format, utf-8 encoded.

TYPE: str

CSV

CSV format, utf-8 encoded.

TYPE: str

CSV_ARCHIVE

CSV archive format: multiple CSV files.

TYPE: str

MULTI_FILE

Multi-file format, used for loading multiple files in a single input.

TYPE: str

CSV class-attribute instance-attribute

CSV = 'csv'

CSV format, utf-8 encoded.

CSV_ARCHIVE class-attribute instance-attribute

CSV_ARCHIVE = 'csv-archive'

CSV archive format: multiple CSV files.

JSON class-attribute instance-attribute

JSON = 'json'

JSON format, utf-8 encoded.

MULTI_FILE class-attribute instance-attribute

MULTI_FILE = 'multi-file'

Multi-file format, used for loading multiple files in a single input.

TEXT class-attribute instance-attribute

TEXT = 'text'

Text format, utf-8 encoded.
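Because InputFormat subclasses both str and Enum, its members compare equal to their string values and can be looked up from them. The local replica below copies only the values listed above to demonstrate this; in an application you would import the real enum from nextmv instead.

```python
from enum import Enum


class InputFormat(str, Enum):
    """Local replica of nextmv.InputFormat, for illustration only."""

    JSON = "json"
    TEXT = "text"
    CSV = "csv"
    CSV_ARCHIVE = "csv-archive"
    MULTI_FILE = "multi-file"


if __name__ == "__main__":
    # Members compare equal to plain strings because of the str base class.
    print(InputFormat.JSON == "json")  # True
    # A value read from, e.g., a CLI flag can be converted back to a member.
    fmt = InputFormat("csv-archive")
    print(fmt is InputFormat.CSV_ARCHIVE)  # True
    print(fmt.value)  # csv-archive
```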

InputLoader

Base class for loading inputs.

You can import the InputLoader class directly from nextmv:

from nextmv import InputLoader

This abstract class defines the interface for input loaders. Subclasses must implement the load method to provide concrete input loading functionality.

load

load(
    input_format: InputFormat = JSON,
    options: Optional[Options] = None,
    *args,
    **kwargs,
) -> Input

Read the input data. This method should be implemented by subclasses.

PARAMETER DESCRIPTION
input_format

Format of the input data. Default is InputFormat.JSON.

TYPE: InputFormat DEFAULT: JSON

options

Options for loading the input data.

TYPE: Options DEFAULT: None

*args

Additional positional arguments.

DEFAULT: ()

**kwargs

Additional keyword arguments.

DEFAULT: {}

RETURNS DESCRIPTION
Input

The input data.

RAISES DESCRIPTION
NotImplementedError

If the method is not implemented.

Source code in nextmv/nextmv/input.py
def load(
    self,
    input_format: InputFormat = InputFormat.JSON,
    options: Optional[Options] = None,
    *args,
    **kwargs,
) -> Input:
    """
    Read the input data. This method should be implemented by
    subclasses.

    Parameters
    ----------
    input_format : InputFormat, optional
        Format of the input data. Default is `InputFormat.JSON`.
    options : Options, optional
        Options for loading the input data.
    *args
        Additional positional arguments.
    **kwargs
        Additional keyword arguments.

    Returns
    -------
    Input
        The input data.

    Raises
    ------
    NotImplementedError
        If the method is not implemented.
    """

    raise NotImplementedError
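A subclass only has to override load and return an Input. The sketch below assumes a hypothetical StringInputLoader that parses data from an in-memory string (for example, one taken from an environment variable). To stay runnable without the SDK, it defines minimal stand-ins named Input and InputLoader that mimic the signatures shown above; in an application you would import the real classes from nextmv and drop the stand-ins.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Input:
    """Minimal stand-in for nextmv.Input."""

    data: Any
    input_format: str = "json"
    options: Optional[Any] = None


class InputLoader:
    """Minimal stand-in for nextmv.InputLoader."""

    def load(self, input_format: str = "json", options: Optional[Any] = None, *args, **kwargs) -> Input:
        raise NotImplementedError


class StringInputLoader(InputLoader):
    """Hypothetical loader that reads JSON or text from a string it holds."""

    def __init__(self, raw: str) -> None:
        self.raw = raw

    def load(self, input_format: str = "json", options: Optional[Any] = None, *args, **kwargs) -> Input:
        if input_format == "json":
            data: Any = json.loads(self.raw)
        elif input_format == "text":
            data = self.raw
        else:
            raise ValueError(f"StringInputLoader does not support {input_format!r}")
        return Input(data=data, input_format=input_format, options=options)


if __name__ == "__main__":
    loader = StringInputLoader('{"stops": 3}')
    print(loader.load().data)  # {'stops': 3}
```

With the SDK, an instance of such a subclass could be passed through the loader parameter of the module-level load function, which accepts any InputLoader.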

LocalInputLoader

Bases: InputLoader

Class for loading local inputs.

You can import the LocalInputLoader class directly from nextmv:

from nextmv import LocalInputLoader

This class can load input data from the local filesystem, by using stdin, a file, or a directory, where applicable. It supports the JSON, TEXT, CSV, CSV archive, and multi-file formats.

Call the load method to read the input data.

Examples:

>>> from nextmv.input import LocalInputLoader, InputFormat
>>> loader = LocalInputLoader()
>>> # Load JSON from a file (omit path to read from stdin)
>>> input_obj = loader.load(input_format=InputFormat.JSON, path="data.json")
>>> # Load CSV from a file
>>> input_obj = loader.load(input_format=InputFormat.CSV, path="data.csv")

FILE_READERS class-attribute instance-attribute

FILE_READERS = {
    JSON: _read_json,
    TEXT: _read_text,
    CSV: _read_csv,
}

Dictionary of functions to read from files.

Each key is an InputFormat, and each value is a function that reads from a file in that format.

STDIN_READERS class-attribute instance-attribute

STDIN_READERS = {
    JSON: lambda _: load(stdin),
    TEXT: lambda _: stdin.read().rstrip("\n"),
    CSV: lambda csv_configurations: list(
        DictReader(stdin, **csv_configurations)
    ),
}

Dictionary of functions to read from standard input.

Each key is an InputFormat, and each value is a function that reads from standard input in that format.
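Both dictionaries implement a dispatch table keyed by format. The sketch below reproduces that pattern with the standard library; here each reader takes an open text stream plus the CSV configuration, so one table can serve either stdin or a file. Passing the stream explicitly, and the read_input helper that picks a file or sys.stdin based on path, are assumptions for illustration, not the SDK's internal functions.

```python
import csv
import io
import json
import sys
from typing import Any, Callable, Optional, TextIO

# Format string -> function(stream, csv_configurations) -> data
READERS: dict[str, Callable[[TextIO, dict[str, Any]], Any]] = {
    "json": lambda stream, _: json.load(stream),
    "text": lambda stream, _: stream.read().rstrip("\n"),
    "csv": lambda stream, cfg: list(csv.DictReader(stream, **cfg)),
}


def read_input(
    input_format: str,
    path: Optional[str] = None,
    csv_configurations: Optional[dict[str, Any]] = None,
) -> Any:
    """Read from `path` when given, otherwise from standard input (hypothetical helper)."""
    reader = READERS[input_format]
    cfg = csv_configurations or {}
    if path:
        with open(path, encoding="utf-8") as f:
            return reader(f, cfg)
    return reader(sys.stdin, cfg)


if __name__ == "__main__":
    # io.StringIO stands in for stdin or an open file.
    print(READERS["csv"](io.StringIO("id;qty\na;2\n"), {"delimiter": ";"}))  # [{'id': 'a', 'qty': '2'}]
    print(READERS["text"](io.StringIO("hello\n"), {}))  # hello
```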

load

load(
    input_format: Optional[InputFormat] = JSON,
    options: Optional[Options] = None,
    path: Optional[str] = None,
    csv_configurations: Optional[dict[str, Any]] = None,
    data_files: Optional[list[DataFile]] = None,
) -> Input

Load the input data. The input data can be in various formats. For InputFormat.JSON, InputFormat.TEXT, and InputFormat.CSV, the data can be streamed from stdin or read from a file: when the path argument is provided (and valid), the data is read from the file specified by path; otherwise, it is streamed from stdin. For InputFormat.CSV_ARCHIVE, the data is read from the directory specified by path, or from the default directory input if path is not provided. That directory should contain one or more CSV files. For InputFormat.MULTI_FILE, the files described by data_files are read, and data_files must be provided.

The Input that is returned contains the data attribute. This data can be of different types, depending on the provided input_format:

  • InputFormat.JSON: the data is a dict[str, Any].
  • InputFormat.TEXT: the data is a str.
  • InputFormat.CSV: the data is a list[dict[str, Any]].
  • InputFormat.CSV_ARCHIVE: the data is a dict[str, list[dict[str, Any]]]. Each key is the name of the CSV file, minus the .csv extension.
  • InputFormat.MULTI_FILE: the data is a dict[str, Any], where each key is the file name (with extension), or the DataFile's input_data_key when it is set, and the value is the data read from the file. The data can be of any type, depending on the file type and the loader function provided in the DataFile instances.
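For InputFormat.CSV_ARCHIVE, the documented result maps each CSV file name, minus the .csv extension, to its rows. A standard-library sketch of that mapping follows. read_csv_archive is a hypothetical helper; its default directory input and the ValueError for a non-directory path follow the description on this page, while skipping non-CSV files is an assumption, since the text only says the directory should contain CSV files.

```python
import csv
import os
import tempfile
from typing import Any, Optional


def read_csv_archive(
    path: str = "input",
    csv_configurations: Optional[dict[str, Any]] = None,
) -> dict[str, list[dict[str, Any]]]:
    """Hypothetical sketch: load every CSV file in `path` into a dict keyed by file stem."""
    if not os.path.isdir(path):
        raise ValueError(f"path {path!r} is not a directory")
    cfg = csv_configurations or {}
    data: dict[str, list[dict[str, Any]]] = {}
    for file_name in sorted(os.listdir(path)):
        stem, ext = os.path.splitext(file_name)
        if ext.lower() != ".csv":
            continue  # assumption: ignore anything that is not a CSV file
        with open(os.path.join(path, file_name), encoding="utf-8", newline="") as f:
            data[stem] = list(csv.DictReader(f, **cfg))
    return data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "stops.csv"), "w", encoding="utf-8") as f:
            f.write("id,lat\ns1,1.5\n")
        print(read_csv_archive(tmp))  # {'stops': [{'id': 's1', 'lat': '1.5'}]}
```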
PARAMETER DESCRIPTION
input_format

Format of the input data. Default is InputFormat.JSON.

TYPE: InputFormat DEFAULT: JSON

options

Options for loading the input data.

TYPE: Options DEFAULT: None

path

Path to the input data.

TYPE: str DEFAULT: None

csv_configurations

Configurations for loading CSV files. The default DictReader is used when loading a CSV file, so you have the option to pass in a dictionary with custom kwargs for the DictReader.

TYPE: dict[str, Any] DEFAULT: None

data_files

List of DataFile instances to read from. This is used when the input_format is set to InputFormat.MULTI_FILE. Each DataFile instance should have a name (the file name with extension) and a loader function that reads the data from the file. The loader function should accept the file path as its first argument and return the data read from the file. The loader can also accept additional positional and keyword arguments, which can be provided through the loader_args and loader_kwargs attributes of the DataFile instance.

TYPE: list[DataFile] DEFAULT: None

RETURNS DESCRIPTION
Input

The input data.

RAISES DESCRIPTION
ValueError

If the path is not a directory when working with CSV_ARCHIVE.

ValueError

If data_files is not provided, or is not a list, when working with MULTI_FILE.

Source code in nextmv/nextmv/input.py
def load(
    self,
    input_format: Optional[InputFormat] = InputFormat.JSON,
    options: Optional[Options] = None,
    path: Optional[str] = None,
    csv_configurations: Optional[dict[str, Any]] = None,
    data_files: Optional[list[DataFile]] = None,
) -> Input:
    """
    Load the input data. The input data can be in various formats. For
    `InputFormat.JSON`, `InputFormat.TEXT`, and `InputFormat.CSV`, the data
    can be streamed from stdin or read from a file. When the `path`
    argument is provided (and valid), the input data is read from the file
    specified by `path`, otherwise, it is streamed from stdin. For
    `InputFormat.CSV_ARCHIVE`, the input data is read from the directory
    specified by `path`. If the `path` is not provided, the default
    location `input` is used. The directory should contain one or more
    files, where each file in the directory is a CSV file.

    The `Input` that is returned contains the `data` attribute. This data
    can be of different types, depending on the provided `input_format`:

    - `InputFormat.JSON`: the data is a `dict[str, Any]`.
    - `InputFormat.TEXT`: the data is a `str`.
    - `InputFormat.CSV`: the data is a `list[dict[str, Any]]`.
    - `InputFormat.CSV_ARCHIVE`: the data is a `dict[str, list[dict[str, Any]]]`.
      Each key is the name of the CSV file, minus the `.csv` extension.
    - `InputFormat.MULTI_FILE`: the data is a `dict[str, Any]`, where each
      key is the file name (with extension) and the value is the data read
      from the file. The data can be of any type, depending on the file
      type and the `loader` function provided in the `DataFile` instances.

    Parameters
    ----------
    input_format : InputFormat, optional
        Format of the input data. Default is `InputFormat.JSON`.
    options : Options, optional
        Options for loading the input data.
    path : str, optional
        Path to the input data.
    csv_configurations : dict[str, Any], optional
        Configurations for loading CSV files. The default `DictReader` is
        used when loading a CSV file, so you have the option to pass in a
        dictionary with custom kwargs for the `DictReader`.
    data_files : list[DataFile], optional
        List of `DataFile` instances to read from. This is used when the
        `input_format` is set to `InputFormat.MULTI_FILE`. Each `DataFile`
        instance should have a `name` (the file name with extension) and a
        `loader` function that reads the data from the file. The `loader`
        function should accept the file path as its first argument and return
        the data read from the file. The `loader` can also accept additional
        positional and keyword arguments, which can be provided through the
        `loader_args` and `loader_kwargs` attributes of the `DataFile`
        instance.

    Returns
    -------
    Input
        The input data.

    Raises
    ------
    ValueError
        If the path is not a directory when working with CSV_ARCHIVE, or if
        `data_files` is missing or not a list when working with MULTI_FILE.
    """

    data: Any = None
    if csv_configurations is None:
        csv_configurations = {}

    if input_format in [InputFormat.JSON, InputFormat.TEXT, InputFormat.CSV]:
        data = self._load_utf8_encoded(path=path, input_format=input_format, csv_configurations=csv_configurations)
    elif input_format == InputFormat.CSV_ARCHIVE:
        data = self._load_archive(path=path, csv_configurations=csv_configurations)
    elif input_format == InputFormat.MULTI_FILE:
        if data_files is None:
            raise ValueError("data_files must be provided when input_format is InputFormat.MULTI_FILE")

        if not isinstance(data_files, list):
            raise ValueError("data_files must be a list of DataFile instances")

        data = self._load_multi_file(data_files=data_files, path=path)

    return Input(data=data, input_format=input_format, options=options)

csv_data_file

csv_data_file(
    name: str,
    csv_configurations: Optional[dict[str, Any]] = None,
    input_data_key: Optional[str] = None,
) -> DataFile

This is a convenience function to create a DataFile that reads CSV data.

You can import the csv_data_file function directly from nextmv:

from nextmv import csv_data_file
PARAMETER DESCRIPTION

name

Name of the data file. You don't need to include the .csv extension.

TYPE: str

csv_configurations

CSV-specific configurations for reading the data. These are passed as keyword arguments to Python's csv.DictReader.

TYPE: dict[str, Any] DEFAULT: None

input_data_key

A custom key to represent the data from this file.

When using InputFormat.MULTI_FILE as the input_format of the Input, the data from the file is loaded to the .data parameter of the Input. In that case, the type of .data is dict[str, Any], where each key represents the file name (with extension) and the value is the data that is actually loaded from the file using the loader function. You can set a custom key to represent your file by using this attribute.

TYPE: str DEFAULT: None

RETURNS DESCRIPTION
DataFile

A DataFile instance that reads CSV data from a file with the given name.

Examples:

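>>> from nextmv import csv_data_file
>>> data_file = csv_data_file("my_data")
>>> data_file.name
'my_data.csv'
>>> # Call the loader directly; assumes my_data.csv exists in the working directory
>>> data = data_file.loader(data_file.name)
>>> print(data)
[{'column1': 'value1', 'column2': 'value2'}, {'column1': 'value3', 'column2': 'value4'}]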
Source code in nextmv/nextmv/input.py
def csv_data_file(
    name: str,
    csv_configurations: Optional[dict[str, Any]] = None,
    input_data_key: Optional[str] = None,
) -> DataFile:
    """
    This is a convenience function to create a `DataFile` that reads CSV data.

    You can import the `csv_data_file` function directly from `nextmv`:

    ```python
    from nextmv import csv_data_file
    ```

    Parameters
    ----------
    name : str
        Name of the data file. You don't need to include the `.csv` extension.
    csv_configurations : dict[str, Any], optional
        CSV-specific configurations for reading the data.
    input_data_key : str, optional
        A custom key to represent the data from this file.

        When using `InputFormat.MULTI_FILE` as the `input_format` of the `Input`,
        the data from the file is loaded to the `.data` parameter of the `Input`.
        In that case, the type of `.data` is `dict[str, Any]`, where each key
        represents the file name (with extension) and the value is the data that is
        actually loaded from the file using the `loader` function. You can set a
        custom key to represent your file by using this attribute.

    Returns
    -------
    DataFile
        A `DataFile` instance that reads CSV data from a file with the given
        name.

    Examples
    --------
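    >>> from nextmv import csv_data_file
    >>> data_file = csv_data_file("my_data")
    >>> data_file.name
    'my_data.csv'
    >>> # Call the loader directly; assumes my_data.csv exists in the working directory
    >>> data = data_file.loader(data_file.name)
    >>> print(data)
    [{'column1': 'value1', 'column2': 'value2'}, {'column1': 'value3', 'column2': 'value4'}]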
    """

    if not name.endswith(".csv"):
        name += ".csv"

    csv_configurations = csv_configurations or {}

    def loader(file_path: str) -> list[dict[str, Any]]:
        with open(file_path, encoding="utf-8") as f:
            return list(csv.DictReader(f, **csv_configurations))

    return DataFile(
        name=name,
        loader=loader,
        input_data_key=input_data_key,
    )
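Because csv_configurations is unpacked into csv.DictReader, any of its keyword arguments, such as delimiter or fieldnames, can be used. The sketch below rebuilds the same closure pattern with the standard library so it runs on its own; make_csv_loader is a hypothetical name, not part of nextmv.

```python
import csv
import os
import tempfile
from typing import Any, Callable, Optional


def make_csv_loader(
    csv_configurations: Optional[dict[str, Any]] = None,
) -> Callable[[str], list[dict[str, Any]]]:
    """Hypothetical mirror of the loader built inside csv_data_file."""
    cfg = csv_configurations or {}

    def loader(file_path: str) -> list[dict[str, Any]]:
        with open(file_path, encoding="utf-8") as f:
            # cfg is captured by the closure and forwarded to DictReader
            return list(csv.DictReader(f, **cfg))

    return loader


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demand.csv")
        with open(path, "w", encoding="utf-8") as f:
            f.write("s1;4\ns2;7\n")  # semicolon-delimited, no header row
        loader = make_csv_loader({"delimiter": ";", "fieldnames": ["stop", "qty"]})
        print(loader(path))  # [{'stop': 's1', 'qty': '4'}, {'stop': 's2', 'qty': '7'}]
```

With the SDK, the equivalent would be csv_data_file("demand", csv_configurations={"delimiter": ";", "fieldnames": ["stop", "qty"]}).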

json_data_file

json_data_file(
    name: str,
    json_configurations: Optional[dict[str, Any]] = None,
    input_data_key: Optional[str] = None,
) -> DataFile

This is a convenience function to create a DataFile that reads JSON data.

You can import the json_data_file function directly from nextmv:

from nextmv import json_data_file
PARAMETER DESCRIPTION

name

Name of the data file. You don't need to include the .json extension.

TYPE: str

json_configurations

JSON-specific configurations for reading the data. These are passed as keyword arguments to Python's json.load.

TYPE: dict[str, Any] DEFAULT: None

input_data_key

A custom key to represent the data from this file.

When using InputFormat.MULTI_FILE as the input_format of the Input, the data from the file is loaded to the .data parameter of the Input. In that case, the type of .data is dict[str, Any], where each key represents the file name (with extension) and the value is the data that is actually loaded from the file using the loader function. You can set a custom key to represent your file by using this attribute.

TYPE: str DEFAULT: None

RETURNS DESCRIPTION
DataFile

A DataFile instance that reads JSON data from a file with the given name.

Examples:

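>>> from nextmv import json_data_file
>>> data_file = json_data_file("my_data")
>>> data_file.name
'my_data.json'
>>> # Call the loader directly; assumes my_data.json exists in the working directory
>>> data = data_file.loader(data_file.name)
>>> print(data)
{'key': 'value', 'another_key': [1, 2, 3]}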
Source code in nextmv/nextmv/input.py
def json_data_file(
    name: str,
    json_configurations: Optional[dict[str, Any]] = None,
    input_data_key: Optional[str] = None,
) -> DataFile:
    """
    This is a convenience function to create a `DataFile` that reads JSON data.

    You can import the `json_data_file` function directly from `nextmv`:

    ```python
    from nextmv import json_data_file
    ```

    Parameters
    ----------
    name : str
        Name of the data file. You don't need to include the `.json` extension.
    json_configurations : dict[str, Any], optional
        JSON-specific configurations for reading the data.
    input_data_key : str, optional
        A custom key to represent the data from this file.

        When using `InputFormat.MULTI_FILE` as the `input_format` of the `Input`,
        the data from the file is loaded to the `.data` parameter of the `Input`.
        In that case, the type of `.data` is `dict[str, Any]`, where each key
        represents the file name (with extension) and the value is the data that is
        actually loaded from the file using the `loader` function. You can set a
        custom key to represent your file by using this attribute.

    Returns
    -------
    DataFile
        A `DataFile` instance that reads JSON data from a file with the given
        name.

    Examples
    --------
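    >>> from nextmv import json_data_file
    >>> data_file = json_data_file("my_data")
    >>> data_file.name
    'my_data.json'
    >>> # Call the loader directly; assumes my_data.json exists in the working directory
    >>> data = data_file.loader(data_file.name)
    >>> print(data)
    {'key': 'value', 'another_key': [1, 2, 3]}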
    """

    if not name.endswith(".json"):
        name += ".json"

    json_configurations = json_configurations or {}

    def loader(file_path: str) -> Union[dict[str, Any], Any]:
        with open(file_path, encoding="utf-8") as f:
            return json.load(f, **json_configurations)

    return DataFile(
        name=name,
        loader=loader,
        input_data_key=input_data_key,
    )
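Likewise, json_configurations is forwarded to json.load, so options such as parse_float apply. The sketch below shows the effect with the standard library alone; loading prices as Decimal is an illustrative choice, not an SDK default.

```python
import json
import os
import tempfile
from decimal import Decimal

# The same dict you would pass as json_configurations to json_data_file.
json_configurations = {"parse_float": Decimal}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "prices.json")
    with open(path, "w", encoding="utf-8") as f:
        f.write('{"price": 0.1}')
    with open(path, encoding="utf-8") as f:
        data = json.load(f, **json_configurations)

print(data)  # {'price': Decimal('0.1')}
```

With the SDK, the equivalent would be json_data_file("prices", json_configurations={"parse_float": Decimal}).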

load

load(
    input_format: Optional[InputFormat] = JSON,
    options: Optional[Options] = None,
    path: Optional[str] = None,
    csv_configurations: Optional[dict[str, Any]] = None,
    loader: Optional[InputLoader] = _LOCAL_INPUT_LOADER,
    data_files: Optional[list[DataFile]] = None,
) -> Input

Load input data using the specified loader.

You can import the load function directly from nextmv:

from nextmv import load

This is a convenience function for loading an Input object. By default, it uses the LocalInputLoader to load data from local sources.

The input data can be in various formats and can be loaded from different sources depending on the loader:

  • InputFormat.JSON: the data is a dict[str, Any]
  • InputFormat.TEXT: the data is a str
  • InputFormat.CSV: the data is a list[dict[str, Any]]
  • InputFormat.CSV_ARCHIVE: the data is a dict[str, list[dict[str, Any]]]. Each key is the name of the CSV file, minus the .csv extension.
  • InputFormat.MULTI_FILE: the data is a dict[str, Any] where each key is the file name (with extension) and the value is the data read from the file. This is used for loading multiple files in a single input, where each file can be of different types (JSON, CSV, Excel, etc.). The data is loaded as a dict of items, where each item corresponds to a file and its content.

When specifying input_format as InputFormat.MULTI_FILE, the data_files argument must be provided. This argument is a list of DataFile instances, each representing a file to be read. Each DataFile instance should have a name (the file name with extension) and a loader function that reads the data from the file. The loader function should accept the file path as its first argument and return the data read from the file. The loader can also accept additional positional and keyword arguments, which can be provided through the loader_args and loader_kwargs attributes of the DataFile instance.

There are convenience functions that can be used to create DataFile classes, such as:

  • json_data_file: Creates a DataFile that reads JSON data.
  • csv_data_file: Creates a DataFile that reads CSV data.
  • text_data_file: Creates a DataFile that reads utf-8 encoded text data.

When working with data in other formats, such as Excel files, you are encouraged to create your own DataFile objects with your own implementation of the loader function. This allows you to read data from files in a way that suits your needs, while still adhering to the DataFile interface.
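As an example of that pattern, the loader below reads an INI file with the standard library's configparser, a format not covered by the convenience functions listed above. The function name ini_loader is hypothetical; an Excel loader would follow the same shape but needs a third-party package.

```python
import configparser
import os
import tempfile
from typing import Any


def ini_loader(file_path: str) -> dict[str, dict[str, Any]]:
    """Hypothetical DataFile loader: INI sections -> nested dicts of strings."""
    parser = configparser.ConfigParser()
    with open(file_path, encoding="utf-8") as f:
        parser.read_file(f)
    return {section: dict(parser[section]) for section in parser.sections()}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "solver.ini")
        with open(path, "w", encoding="utf-8") as f:
            f.write("[limits]\nduration = 30\n")
        print(ini_loader(path))  # {'limits': {'duration': '30'}}
```

It would be registered as DataFile(name="solver.ini", loader=ini_loader) and passed through data_files together with input_format=InputFormat.MULTI_FILE.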

PARAMETER DESCRIPTION

input_format

Format of the input data. Default is InputFormat.JSON.

TYPE: InputFormat DEFAULT: JSON

options

Options for loading the input data.

TYPE: Options DEFAULT: None

path

Path to the input data. For file-based loaders:

  • If provided, the data is read from the specified file or directory.
  • If None, the data is typically read from stdin (for JSON, TEXT, and CSV), or a default directory is used (for CSV_ARCHIVE).

TYPE: str DEFAULT: None

csv_configurations

Configurations for loading CSV files. Custom kwargs for Python's csv.DictReader.

TYPE: dict[str, Any] DEFAULT: None

loader

The loader to use for loading the input data. Default is an instance of LocalInputLoader.

TYPE: InputLoader DEFAULT: _LOCAL_INPUT_LOADER

data_files

List of DataFile instances to read from. This is used when the input_format is set to InputFormat.MULTI_FILE. Each DataFile instance should have a name (the file name with extension) and a loader function that reads the data from the file. The loader function should accept the file path as its first argument and return the data read from the file. The loader can also accept additional positional and keyword arguments, which can be provided through the loader_args and loader_kwargs attributes of the DataFile instance.

There are convenience functions that can be used to create DataFile classes, such as json_data_file, csv_data_file, and text_data_file. When working with data in other formats, such as Excel files, you are encouraged to create your own DataFile objects with your own implementation of the loader function. This allows you to read data from files in a way that suits your needs, while still adhering to the DataFile interface.

TYPE: list[DataFile] DEFAULT: None

RETURNS DESCRIPTION
Input

The loaded input data in an Input object.

RAISES DESCRIPTION
ValueError

If the path is invalid or data format is incorrect.

Examples:

>>> from nextmv.input import load, InputFormat
>>> # Load JSON from stdin
>>> input_obj = load(input_format=InputFormat.JSON)
>>> # Load CSV from a file
>>> input_obj = load(input_format=InputFormat.CSV, path="data.csv")
>>> # Load CSV archive from a directory
>>> input_obj = load(input_format=InputFormat.CSV_ARCHIVE, path="input_dir")
Source code in nextmv/nextmv/input.py
def load(
    input_format: Optional[InputFormat] = InputFormat.JSON,
    options: Optional[Options] = None,
    path: Optional[str] = None,
    csv_configurations: Optional[dict[str, Any]] = None,
    loader: Optional[InputLoader] = _LOCAL_INPUT_LOADER,
    data_files: Optional[list[DataFile]] = None,
) -> Input:
    """
    Load input data using the specified loader.

    You can import the `load` function directly from `nextmv`:

    ```python
    from nextmv import load
    ```

    This is a convenience function for loading an `Input` object. By default,
    it uses the `LocalInputLoader` to load data from local sources.

    The input data can be in various formats and can be loaded from different
    sources depending on the loader:

    - `InputFormat.JSON`: the data is a `dict[str, Any]`
    - `InputFormat.TEXT`: the data is a `str`
    - `InputFormat.CSV`: the data is a `list[dict[str, Any]]`
    - `InputFormat.CSV_ARCHIVE`: the data is a `dict[str, list[dict[str, Any]]]`
        Each key is the name of the CSV file, minus the `.csv` extension.
    - `InputFormat.MULTI_FILE`: the data is a `dict[str, Any]`
        where each key is the file name (with extension) and the value is the
        data read from the file. This is used for loading multiple files in a
        single input, where each file can be of different types (JSON, CSV,
        Excel, etc.). The data is loaded as a dict of items, where each item
        corresponds to a file and its content.

    When specifying `input_format` as `InputFormat.MULTI_FILE`, the
    `data_files` argument must be provided. This argument is a list of
    `DataFile` instances, each representing a file to be read. Each `DataFile`
    instance should have a `name` (the file name with extension) and a `loader`
    function that reads the data from the file. The `loader` function should
    accept the file path as its first argument and return the data read from
    the file. The `loader` can also accept additional positional and keyword
    arguments, which can be provided through the `loader_args` and
    `loader_kwargs` attributes of the `DataFile` instance.

    There are convenience functions that can be used to create `DataFile`
    instances, such as:

    - `json_data_file`: Creates a `DataFile` that reads JSON data.
    - `csv_data_file`: Creates a `DataFile` that reads CSV data.
    - `text_data_file`: Creates a `DataFile` that reads utf-8 encoded text
      data.

    When working with data in other formats, such as Excel files, you are
    encouraged to create your own `DataFile` objects with your own
    implementation of the `loader` function. This allows you to read data
    from files in a way that suits your needs, while still adhering to the
    `DataFile` interface.

    Parameters
    ----------
    input_format : InputFormat, optional
        Format of the input data. Default is `InputFormat.JSON`.
    options : Options, optional
        Options for loading the input data.
    path : str, optional
        Path to the input data. For file-based loaders:
        - If provided, reads from the specified file or directory
        - If None, typically reads from stdin (for JSON, TEXT, CSV)
          or uses a default directory (for CSV_ARCHIVE)
    csv_configurations : dict[str, Any], optional
        Configurations for loading CSV files. Custom kwargs for
        Python's `csv.DictReader`.
    loader : InputLoader, optional
        The loader to use for loading the input data.
        Default is an instance of `LocalInputLoader`.
    data_files : list[DataFile], optional
        List of `DataFile` instances to read from. This is used when the
        `input_format` is set to `InputFormat.MULTI_FILE`. Each `DataFile`
        instance should have a `name` (the file name with extension) and a
        `loader` function that reads the data from the file. The `loader`
        function should accept the file path as its first argument and return
        the data read from the file. The `loader` can also accept additional
        positional and keyword arguments, which can be provided through the
        `loader_args` and `loader_kwargs` attributes of the `DataFile`
        instance.

        There are convenience functions that can be used to create `DataFile`
        instances, such as `json_data_file`, `csv_data_file`, and
        `text_data_file`. When working with data in other formats, such as
        Excel files, you are encouraged to create your own `DataFile` objects
        with your own implementation of the `loader` function. This allows you
        to read data from files in a way that suits your needs, while still
        adhering to the `DataFile` interface.

    Returns
    -------
    Input
        The loaded input data in an Input object.

    Raises
    ------
    ValueError
        If the path is invalid or data format is incorrect.

    Examples
    --------
    >>> from nextmv.input import load, InputFormat
    >>> # Load JSON from stdin
    >>> input_obj = load(input_format=InputFormat.JSON)
    >>> # Load CSV from a file
    >>> input_obj = load(input_format=InputFormat.CSV, path="data.csv")
    >>> # Load CSV archive from a directory
    >>> input_obj = load(input_format=InputFormat.CSV_ARCHIVE, path="input_dir")
    """

    return loader.load(input_format, options, path, csv_configurations, data_files)
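The `CSV_ARCHIVE` shape listed in the docstring (one key per CSV file, named after the file without its `.csv` extension, each holding a list of row dicts) is sketched below with the standard library only. `read_csv_archive` is hypothetical; it assumes a flat directory of `.csv` files and does not reproduce any other behavior of `LocalInputLoader`.

```python
import csv
import tempfile
from pathlib import Path
from typing import Any


def read_csv_archive(directory: str) -> dict[str, list[dict[str, Any]]]:
    """Hypothetical sketch of the CSV_ARCHIVE shape: file stem -> list of row dicts."""
    data: dict[str, list[dict[str, Any]]] = {}
    for csv_path in sorted(Path(directory).glob("*.csv")):
        with open(csv_path, encoding="utf-8", newline="") as f:
            # The key is the file name minus the ".csv" extension.
            data[csv_path.stem] = list(csv.DictReader(f))
    return data


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "stops.csv").write_text("id,demand\ns1,4\n", encoding="utf-8")
    Path(tmp, "vehicles.csv").write_text("id,capacity\nv1,10\n", encoding="utf-8")
    archive = read_csv_archive(tmp)

print(archive)
# {'stops': [{'id': 's1', 'demand': '4'}], 'vehicles': [{'id': 'v1', 'capacity': '10'}]}
```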

load_local

load_local(
    input_format: Optional[InputFormat] = JSON,
    options: Optional[Options] = None,
    path: Optional[str] = None,
    csv_configurations: Optional[dict[str, Any]] = None,
) -> Input

Warning

load_local is deprecated, use load instead.

Load input data from local sources.

This is a convenience function for instantiating a LocalInputLoader and calling its load method.

PARAMETER DESCRIPTION

input_format

Format of the input data. Default is InputFormat.JSON.

TYPE: InputFormat DEFAULT: JSON

options

Options for loading the input data.

TYPE: Options DEFAULT: None

path

Path to the input data.

TYPE: str DEFAULT: None

csv_configurations

Configurations for loading CSV files. Custom kwargs for Python's csv.DictReader.

TYPE: dict[str, Any] DEFAULT: None

RETURNS DESCRIPTION
Input

The loaded input data in an Input object.

RAISES DESCRIPTION
ValueError

If the path is invalid or data format is incorrect.

See Also

load : The recommended function to use instead.

Source code in nextmv/nextmv/input.py
def load_local(
    input_format: Optional[InputFormat] = InputFormat.JSON,
    options: Optional[Options] = None,
    path: Optional[str] = None,
    csv_configurations: Optional[dict[str, Any]] = None,
) -> Input:
    """
    !!! warning
        `load_local` is deprecated, use `load` instead.

    Load input data from local sources.

    This is a convenience function for instantiating a `LocalInputLoader`
    and calling its `load` method.

    Parameters
    ----------
    input_format : InputFormat, optional
        Format of the input data. Default is `InputFormat.JSON`.
    options : Options, optional
        Options for loading the input data.
    path : str, optional
        Path to the input data.
    csv_configurations : dict[str, Any], optional
        Configurations for loading CSV files. Custom kwargs for
        Python's `csv.DictReader`.

    Returns
    -------
    Input
        The loaded input data in an Input object.

    Raises
    ------
    ValueError
        If the path is invalid or data format is incorrect.

    See Also
    --------
    load : The recommended function to use instead.
    """

    deprecated(
        name="load_local",
        reason="`load_local` is deprecated, use `load` instead.",
    )

    loader = LocalInputLoader()
    return loader.load(input_format, options, path, csv_configurations)

text_data_file

text_data_file(
    name: str, input_data_key: Optional[str] = None
) -> DataFile

This is a convenience function to create a DataFile that reads utf-8 encoded text data.

You can import the text_data_file function directly from nextmv:

from nextmv import text_data_file

You must provide the extension as part of the name parameter.

PARAMETER DESCRIPTION

name

Name of the data file. The file extension must be provided in the name.

TYPE: str

input_data_key

A custom key to represent the data from this file.

When using InputFormat.MULTI_FILE as the input_format of the Input, the data from the file is loaded into the .data attribute of the Input. In that case, the type of .data is dict[str, Any], where each key represents the file name (with extension) and the value is the data loaded from the file by the loader function. You can set a custom key to represent your file by setting input_data_key.

TYPE: str DEFAULT: None
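The keying rule for `MULTI_FILE` data (file name with extension by default, `input_data_key` when set) is sketched below. `FileSpec` and `assemble_multi_file` are hypothetical stand-ins, not SDK classes, and the sketch assumes every file sits directly in one directory.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class FileSpec:
    """Hypothetical stand-in holding only the DataFile fields that matter here."""

    name: str
    loader: Callable[[str], Any]
    input_data_key: Optional[str] = None


def read_text(file_path: str) -> str:
    with open(file_path, encoding="utf-8") as f:
        return f.read().rstrip("\n")


def assemble_multi_file(specs: list[FileSpec], directory: str) -> dict[str, Any]:
    """Key each file's data by input_data_key if set, else by its file name."""
    data: dict[str, Any] = {}
    for spec in specs:
        key = spec.input_data_key if spec.input_data_key is not None else spec.name
        data[key] = spec.loader(os.path.join(directory, spec.name))
    return data


with tempfile.TemporaryDirectory() as tmp:
    for fname, text in [("notes.txt", "hello\n"), ("readme.md", "# Title\n")]:
        with open(os.path.join(tmp, fname), "w", encoding="utf-8") as f:
            f.write(text)
    specs = [
        FileSpec(name="notes.txt", loader=read_text),
        FileSpec(name="readme.md", loader=read_text, input_data_key="readme"),
    ]
    multi = assemble_multi_file(specs, tmp)

print(multi)  # {'notes.txt': 'hello', 'readme': '# Title'}
```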

RETURNS DESCRIPTION
DataFile

A DataFile instance that reads text data from a file with the given name.

Examples:

>>> from nextmv import text_data_file
>>> data_file = text_data_file("my_data.txt")
>>> data = data_file.loader("my_data.txt")
>>> print(data)
This is some text data.
Source code in nextmv/nextmv/input.py
def text_data_file(name: str, input_data_key: Optional[str] = None) -> DataFile:
    """
    This is a convenience function to create a `DataFile` that reads utf-8
    encoded text data.

    You can import the `text_data_file` function directly from `nextmv`:

    ```python
    from nextmv import text_data_file
    ```

    You must provide the extension as part of the `name` parameter.

    Parameters
    ----------
    name : str
        Name of the data file. The file extension must be provided in the name.
    input_data_key : str, optional
        A custom key to represent the data from this file.

        When using `InputFormat.MULTI_FILE` as the `input_format` of the `Input`,
        the data from the file is loaded into the `.data` attribute of the `Input`.
        In that case, the type of `.data` is `dict[str, Any]`, where each key
        represents the file name (with extension) and the value is the data that is
        actually loaded from the file using the `loader` function. You can set a
        custom key to represent your file by using this attribute.

    Returns
    -------
    DataFile
        A `DataFile` instance that reads text data from a file with the given
        name.

    Examples
    --------
    >>> from nextmv import text_data_file
    >>> data_file = text_data_file("my_data.txt")
    >>> data = data_file.loader("my_data.txt")
    >>> print(data)
    This is some text data.
    """

    def loader(file_path: str) -> str:
        with open(file_path, encoding="utf-8") as f:
            return f.read().rstrip("\n")

    return DataFile(
        name=name,
        loader=loader,
        input_data_key=input_data_key,
    )
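The loader built by `text_data_file` strips only trailing newline characters. The stand-alone copy below (`text_loader`, a hypothetical name) repeats the same two lines so the effect can be checked on a temporary file: leading spaces, and the space just before the final newlines, survive.

```python
import os
import tempfile


def text_loader(file_path: str) -> str:
    # Same logic as the loader defined inside text_data_file above.
    with open(file_path, encoding="utf-8") as f:
        return f.read().rstrip("\n")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "notes.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("  first line\nsecond line \n\n")
    result = text_loader(path)

print(repr(result))  # '  first line\nsecond line '
```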