Retrievers
Define retriever abstractions and ChromaDB-based retrieval implementations.
This module provides the base interface for retrieval components used in the
pipeline, along with a concrete retriever backed by ChromaDB. Retrievers are
responsible for accepting one or more queries, fetching candidate expressions,
and returning them in a standardized RetrieverResults structure.
The module is designed to support pipeline-style composition, allowing retrievers to be chained with other components such as rerankers and selectors.
BaseRetriever
Bases: PipelineBaseClass, ABC
Define the abstract interface for retrieval pipeline components.
This base class establishes the contract for retrievers that accept one or
more queries and return structured retrieval results. It also provides a
flexible __call__() implementation that normalizes several supported
query input types before delegating to retrieve().
Source code in aatm\retrievers.py
retrieve(queries)
abstractmethod
Retrieve candidate results for one or more query strings.
Subclasses must implement this method to perform the actual retrieval
operation and return results in a RetrieverResults object.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `queries` | `List[str]` | A list of query strings to retrieve candidates for. | required |
Returns:
| Type | Description |
|---|---|
| `RetrieverResults` | A `RetrieverResults` object containing the retrieval results. |
Raises:
| Type | Description |
|---|---|
| `NotImplementedError` | If the subclass does not override this method. |
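As a rough illustration of this contract, the sketch below implements `retrieve()` in a toy subclass. `BaseRetriever` and `RetrieverResults` are simplified stand-ins defined inline (the real classes live in `aatm`, and their fields are not shown on this page), and `KeywordRetriever` is a hypothetical name used only for this example.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import List


@dataclass
class RetrieverResults:
    """Stand-in container: one list of candidates per query (assumed shape)."""

    results: List[List[str]] = field(default_factory=list)


class BaseRetriever(ABC):
    """Stand-in for the abstract interface described above."""

    @abstractmethod
    def retrieve(self, queries: List[str]) -> RetrieverResults:
        raise NotImplementedError


class KeywordRetriever(BaseRetriever):
    """Hypothetical retriever that matches queries by case-insensitive substring."""

    def __init__(self, corpus: List[str]) -> None:
        self.corpus = corpus

    def retrieve(self, queries: List[str]) -> RetrieverResults:
        # Build one candidate list per query, in the same order as the queries.
        grouped = [
            [doc for doc in self.corpus if query.lower() in doc.lower()]
            for query in queries
        ]
        return RetrieverResults(results=grouped)


retriever = KeywordRetriever(["break the ice", "ice cream", "hit the road"])
found = retriever.retrieve(["ice", "road"])
print(found.results)
```

Because `retrieve()` is abstract, instantiating the stand-in `BaseRetriever` directly fails with a `TypeError`; only subclasses that override the method can be created.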
__call__(queries, *args, **kwargs)
Normalize query inputs and perform retrieval.
This method allows retriever instances to be called directly with a
single string, a single Translation object, a list of strings, or a
list of Translation objects. Supported inputs are converted into a
list of query strings before being passed to retrieve().
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `queries` | `str \| Translation \| List[str] \| List[Translation]` | Query input to retrieve against. Supported values are a single string, a single `Translation` object, a list of strings, or a list of `Translation` objects. | required |
Returns:
| Type | Description |
|---|---|
| `RetrieverResults` | A `RetrieverResults` object, as returned by `retrieve()`. |
Raises:
| Type | Description |
|---|---|
| `AssertionError` | If the input is not one of the supported query formats or does not contain valid string queries. |
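The normalization step can be pictured with a minimal sketch like the one below. It is not the library's implementation: `Translation` is a stand-in dataclass, its `text` field is an assumption about where the query string lives, and `normalize_queries` is a hypothetical helper name.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Translation:
    """Stand-in for the library's Translation type; `text` is an assumed field."""

    text: str


QueryInput = Union[str, Translation, List[str], List[Translation]]


def normalize_queries(queries: QueryInput) -> List[str]:
    """Convert any supported query input into a list of query strings."""
    # Wrap a single item so both cases share the list-handling path.
    if isinstance(queries, (str, Translation)):
        queries = [queries]
    assert isinstance(queries, list), "queries must be a str, Translation, or list"

    # Unwrap Translation objects; plain strings pass through unchanged.
    normalized = [q.text if isinstance(q, Translation) else q for q in queries]
    assert all(isinstance(q, str) for q in normalized), "each query must be a string"
    return normalized


single = normalize_queries("hello")
batch = normalize_queries([Translation("bonjour"), Translation("salut")])
print(single, batch)
```

Using assertions mirrors the documented `AssertionError` behavior. Note that this sketch is slightly more permissive than the documented contract, since it would also accept a list that mixes strings and `Translation` objects.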
ChromaDBRetriever
Bases: BaseRetriever
Retrieve candidate expressions from a ChromaDB collection.
This retriever uses a ChromaDB client and collection to perform vector-based
retrieval over stored expressions. Retrieved metadata and distances are
converted into RetrievedExpressionMetadata objects and returned in a
standardized RetrieverResults container.
The retriever supports default filtering through a where clause and a
configurable default number of results per query.
__init__(client, collection_name, embedding_function, top_k=10, where=None, *args, **kwargs)
Initialize the ChromaDB retriever.
This constructor stores the ChromaDB client configuration, creates or retrieves the target collection, and sets default retrieval parameters such as the number of results to return and any metadata filter to apply.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `client` | `ClientAPI` | ChromaDB client used to access the vector database. | required |
| `collection_name` | `str` | Name of the ChromaDB collection to query. | required |
| `embedding_function` | `EmbeddingFunction` | Embedding function used by the collection for query encoding. | required |
| `top_k` | `int` | Default number of candidate results to return per query. | `10` |
| `where` | `dict[str, Any] \| None` | Optional metadata filter applied to all queries unless overridden at retrieval time. | `None` |
| `*args` | `Any` | Additional positional arguments reserved for compatibility. | `()` |
| `**kwargs` | `Any` | Additional keyword arguments reserved for compatibility. | `{}` |
Returns:
| Type | Description |
|---|---|
| `Any` | None. |
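A minimal sketch of what such a constructor might look like is shown below. `SketchChromaRetriever` is an illustrative stand-in, not the library's `ChromaDBRetriever`, and it assumes the collection is obtained through the client's `get_or_create_collection` method. `FakeClient` is a hypothetical test double so the example runs without a database; with ChromaDB installed, a real client such as `chromadb.PersistentClient(path=...)` would take its place.

```python
from typing import Any, Dict, List, Optional, Tuple


class SketchChromaRetriever:
    """Illustrative stand-in, not the library's ChromaDBRetriever."""

    def __init__(
        self,
        client: Any,
        collection_name: str,
        embedding_function: Any,
        top_k: int = 10,
        where: Optional[Dict[str, Any]] = None,
    ) -> None:
        self.client = client
        self.collection_name = collection_name
        # Assumption: the collection is created or fetched in a single call.
        self.collection = client.get_or_create_collection(
            name=collection_name,
            embedding_function=embedding_function,
        )
        # Defaults used whenever a retrieval call does not override them.
        self.top_k = top_k
        self.where = where


class FakeClient:
    """Hypothetical test double that records collection requests."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Any]] = []

    def get_or_create_collection(self, name: str, embedding_function: Any = None):
        self.calls.append((name, embedding_function))
        return {"name": name}


client = FakeClient()
retriever = SketchChromaRetriever(
    client,
    "expressions",
    embedding_function=None,
    where={"language": "en"},
)
print(retriever.top_k, retriever.where, client.calls)
```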
retrieve(queries, where=None, top_k=None, *args, **kwargs)
Retrieve nearest candidates for the given queries from ChromaDB.
This method submits the provided query strings to the configured ChromaDB
collection, optionally overriding the default metadata filter and number
of returned results. It then converts the raw ChromaDB response into
RetrievedExpressionMetadata objects grouped by query and wraps them
in a RetrieverResults instance.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `queries` | `List[str]` | List of query strings to search for in the collection. | required |
| `where` | `dict[str, Any] \| None` | Optional metadata filter to apply to this retrieval call. If not provided, the retriever's default filter is used. | `None` |
| `top_k` | `int \| None` | Optional number of results to return per query. If not provided, the retriever's default `top_k` is used. | `None` |
Returns:
| Type | Description |
|---|---|
| `RetrieverResults` | A `RetrieverResults` instance containing `RetrievedExpressionMetadata` objects grouped by query. |
Raises:
| Type | Description |
|---|---|
| `Exception` | Propagates errors raised by the underlying ChromaDB client or collection query operation. |
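The two steps described for this method, falling back to the retriever's defaults and regrouping the raw response per query, might be sketched as follows. `RetrievedItem` is a stand-in for `RetrievedExpressionMetadata` with assumed fields, `resolve` and `group_by_query` are hypothetical helpers, and the `raw` dictionary is hand-written sample data in the list-of-lists shape that ChromaDB's `Collection.query` returns (one inner list per query).

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class RetrievedItem:
    """Stand-in for RetrievedExpressionMetadata; both fields are assumed."""

    metadata: Dict[str, Any]
    distance: float


def resolve(override: Any, default: Any) -> Any:
    """Use the per-call value when one is given, otherwise the retriever default."""
    return default if override is None else override


def group_by_query(raw: Dict[str, Any]) -> List[List[RetrievedItem]]:
    """Pair each hit's metadata with its distance, keeping per-query grouping."""
    return [
        [RetrievedItem(metadata=m, distance=d) for m, d in zip(metas, dists)]
        for metas, dists in zip(raw["metadatas"], raw["distances"])
    ]


# Per-call arguments fall back to the defaults set in the constructor.
top_k = resolve(None, 10)
where = resolve({"language": "en"}, None)

# With a real collection, the response would come from something like:
#   raw = collection.query(query_texts=queries, n_results=top_k, where=where)
raw = {
    "ids": [["a1", "a2"], ["b1"]],
    "metadatas": [
        [{"expr": "break the ice"}, {"expr": "ice cream"}],
        [{"expr": "hit the road"}],
    ],
    "distances": [[0.12, 0.48], [0.05]],
}

grouped = group_by_query(raw)
for query_hits in grouped:
    print([(hit.metadata["expr"], hit.distance) for hit in query_hits])
```

Comparing against `None` rather than relying on truthiness means an explicit override such as `top_k=0` or an empty filter is still respected in this sketch.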