Retrievers registry

Register and instantiate ChromaDB-backed retriever configurations.

This module defines the registry of supported embedding models used to build ChromaDB retrievers. Each registry entry stores the information required to construct a retriever, including the model identifier, embedding function class, collection name, and optional rate limit metadata.

The module also provides a factory function for loading a configured `ChromaDBRetriever` instance from a registered model name.

`ChromaRetrieverModelSpec` `dataclass`

Store the configuration required for a ChromaDB-backed retriever.

Instances of this dataclass describe a retriever model specification, including the embedding model identifier, the embedding function class used to encode queries, the target ChromaDB collection name, and optional rate limit metadata.

Source code in aatm\registries\retrievers.py

```python
@dataclass(slots=True, frozen=True)
class ChromaRetrieverModelSpec:
    """Store the configuration required for a ChromaDB-backed retriever.

    Instances of this dataclass describe a retriever model specification,
    including the embedding model identifier, the embedding function class used
    to encode queries, the target ChromaDB collection name, and optional rate
    limit metadata.
    """

    name: str
    """Unique registry name used to identify the retriever model."""

    model_id: str
    """Identifier of the embedding model used by the retriever."""

    embedding_function_cls: type[chromadb.EmbeddingFunction]
    """Embedding function class used to encode queries for retrieval."""

    collection_name: str = "expressions"
    """Name of the ChromaDB collection queried by the retriever."""

    rate_limit: int | None = None
    """Optional maximum throughput in items per minute for embedding generation."""

    @property
    def chromadb_path(self) -> str:
        """Return the filesystem path for this retriever's ChromaDB database.

        The path is derived from the retriever name and points to the local
        directory where the persistent ChromaDB collection is stored.

        Returns:
            The string path to the retriever's ChromaDB persistence directory.
        """
        return str(Path(".aatm/chroma_vector_dbs") / self.name)

    @property
    def output_path(self) -> str:
        """Return the default output directory associated with this retriever.

        This path can be used by downstream workflows to store outputs related
        to the retriever configuration.

        Returns:
            The string path to the default output directory for this retriever.
        """
        return str(Path("output") / self.name)

    def to_dict(self) -> dict[str, Any]:
        """Convert the retriever specification to a dictionary.

        This helper provides a dictionary representation of the retriever
        configuration for compatibility with code paths that expect
        mapping-based access instead of attribute-based access.

        Returns:
            A dictionary containing the retriever model identifier, embedding
                function class, collection name, ChromaDB path, output path, and
                rate limit.
        """
        return {
            "model_id": self.model_id,
            "embedding_function": self.embedding_function_cls,
            "collection_name": self.collection_name,
            "chromadb_path": self.chromadb_path,
            "output_path": self.output_path,
            "rate_limit": self.rate_limit,
        }
```

`name` `instance-attribute`

Unique registry name used to identify the retriever model.

`model_id` `instance-attribute`

Identifier of the embedding model used by the retriever.

`embedding_function_cls` `instance-attribute`

Embedding function class used to encode queries for retrieval.

`collection_name = 'expressions'` `class-attribute` `instance-attribute`

Name of the ChromaDB collection queried by the retriever.

`rate_limit = None` `class-attribute` `instance-attribute`

Optional maximum throughput in items per minute for embedding generation.

`chromadb_path` `property`

Return the filesystem path for this retriever's ChromaDB database.

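The path is `.aatm/chroma_vector_dbs/<name>`, derived from the retriever name, and points to the local directory where the persistent ChromaDB collection is stored. Because the path is relative, it resolves against the current working directory.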

Returns:

| Type | Description |
| --- | --- |
| `str` | The string path to the retriever's ChromaDB persistence directory. |

`output_path` `property`

Return the default output directory associated with this retriever.

This path, `output/<name>`, can be used by downstream workflows to store outputs related to the retriever configuration. Like `chromadb_path`, it is relative to the current working directory.

Returns:

| Type | Description |
| --- | --- |
| `str` | The string path to the default output directory for this retriever. |
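As a rough illustration of how both properties derive their values, the sketch below reproduces the two path expressions as standalone functions. The function names and the retriever name `demo-embedder` are hypothetical; only the path expressions are taken from the source.

```python
from pathlib import Path


def chromadb_path(name: str) -> str:
    # Same expression as the chromadb_path property.
    return str(Path(".aatm/chroma_vector_dbs") / name)


def output_path(name: str) -> str:
    # Same expression as the output_path property.
    return str(Path("output") / name)


name = "demo-embedder"  # hypothetical registry name
db_dir = chromadb_path(name)
out_dir = output_path(name)

print(db_dir)
print(out_dir)

# Both paths are relative, so they resolve against the working directory
# of the process that uses them.
print(Path(db_dir).resolve())
```

The separator in the printed strings depends on the platform, so code that compares these values is safer comparing `Path` objects than raw strings.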

`to_dict()`

Convert the retriever specification to a dictionary.

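This helper provides a dictionary representation of the retriever configuration for compatibility with code paths that expect mapping-based access instead of attribute-based access. The embedding function class is stored under the key `embedding_function` (not `embedding_function_cls`), and the registry `name` is not included in the result.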

Returns:

| Type | Description |
| --- | --- |
| `dict[str, Any]` | A dictionary containing the retriever model identifier, embedding function class, collection name, ChromaDB path, output path, and rate limit. |

Source code in aatm\registries\retrievers.py

```python
def to_dict(self) -> dict[str, Any]:
    """Convert the retriever specification to a dictionary.

    This helper provides a dictionary representation of the retriever
    configuration for compatibility with code paths that expect
    mapping-based access instead of attribute-based access.

    Returns:
        A dictionary containing the retriever model identifier, embedding
            function class, collection name, ChromaDB path, output path, and
            rate limit.
    """
    return {
        "model_id": self.model_id,
        "embedding_function": self.embedding_function_cls,
        "collection_name": self.collection_name,
        "chromadb_path": self.chromadb_path,
        "output_path": self.output_path,
        "rate_limit": self.rate_limit,
    }
```

`load_retriever(model_name)`

Instantiate a registered ChromaDB retriever by model name.

This function looks up a retriever specification in the model registry, creates a persistent ChromaDB client for the configured storage path, instantiates the corresponding embedding function, and returns a configured `ChromaDBRetriever`.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `model_name` | `str` | Registry name of the retriever model to instantiate. | required |

Returns:

| Type | Description |
| --- | --- |
| `ChromaDBRetriever` | A configured `ChromaDBRetriever` instance for the requested model. |

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If the provided model name is not present in the retriever registry. |

Source code in aatm\registries\retrievers.py

```python
def load_retriever(model_name: str) -> "ChromaDBRetriever":
    """Instantiate a registered ChromaDB retriever by model name.

    This function looks up a retriever specification in the model registry,
    creates a persistent ChromaDB client for the configured storage path,
    instantiates the corresponding embedding function, and returns a configured
    ``ChromaDBRetriever``.

    Args:
        model_name: Registry name of the retriever model to instantiate.

    Returns:
        A configured ``ChromaDBRetriever`` instance for the requested model.

    Raises:
        ValueError: If the provided model name is not present in the retriever
            registry.
    """
    try:
        spec = CHROMADB_RETRIEVER_MODEL_REGISTRY[model_name]
    except KeyError as e:
        available = ", ".join(sorted(CHROMADB_RETRIEVER_MODEL_REGISTRY))
        raise ValueError(
            f"Unknown retriever model '{model_name}'. Available models: {available}"
        ) from e

    client = chromadb.PersistentClient(spec.chromadb_path)

    retriever = ChromaDBRetriever(
        client=client,
        collection_name=spec.collection_name,
        embedding_function=spec.embedding_function_cls(model=spec.model_id),
    )
    return retriever
```
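The lookup and error-handling step can be tried in isolation. The sketch below assumes a toy registry: `DEMO_REGISTRY`, `lookup_spec`, and the model names are hypothetical, and the values are plain strings rather than real specifications. Only the `try`/`except` pattern and the message format are taken from the source.

```python
# Hypothetical registry: the real one maps names to ChromaRetrieverModelSpec.
DEMO_REGISTRY: dict[str, str] = {
    "beta-model": "spec for beta",
    "alpha-model": "spec for alpha",
}


def lookup_spec(model_name: str) -> str:
    """Return the registered entry or raise ValueError listing valid names."""
    try:
        return DEMO_REGISTRY[model_name]
    except KeyError as e:
        # Sorting makes the message stable regardless of insertion order.
        available = ", ".join(sorted(DEMO_REGISTRY))
        raise ValueError(
            f"Unknown retriever model '{model_name}'. Available models: {available}"
        ) from e


try:
    lookup_spec("missing-model")
except ValueError as err:
    message = str(err)
    cause = err.__cause__  # the original KeyError, kept by "raise ... from e"
    print(message)
```

Callers that accept model names from user input can catch `ValueError` and show the message directly, since it already lists the valid names.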
