Embedding functions
Embedding function implementations and model enums for vectorization.
This module defines embedding function adapters compatible with
ChromaDB's EmbeddingFunction interface. It provides implementations
for multiple embedding backends, including Google embeddings, Qwen3
embedding models served through SentenceTransformers, Gemma embedding
models served through SentenceTransformers, and OpenAI embedding
models.
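To make the adapter contract concrete, the following is a minimal sketch of a callable that takes a list of documents and returns one vector per document. It deliberately avoids importing ChromaDB. ToyHashEmbeddingFunction and its hashing scheme are hypothetical illustrations, not part of this module, and the vectors carry no semantic meaning. A real adapter would subclass ChromaDB's EmbeddingFunction and call an actual model.

```python
import hashlib
from typing import List, Sequence


class ToyHashEmbeddingFunction:
    """Hypothetical stand-in showing the call shape: documents in, vectors out."""

    def __init__(self, dimensions: int = 8) -> None:
        # A SHA-256 digest has 32 bytes, which bounds the toy vector size.
        if not 1 <= dimensions <= 32:
            raise ValueError("dimensions must be between 1 and 32")
        self.dimensions = dimensions

    def __call__(self, input: Sequence[str]) -> List[List[float]]:
        vectors = []
        for document in input:
            digest = hashlib.sha256(document.encode("utf-8")).digest()
            # Scale the first `dimensions` bytes into [0, 1] to form a fake vector.
            vectors.append([byte / 255.0 for byte in digest[: self.dimensions]])
        return vectors


embed = ToyHashEmbeddingFunction(dimensions=4)
vectors = embed(["first document", "second document"])
print(len(vectors), len(vectors[0]))
```

The point of the sketch is only the shape of the call: one list of floats per input document, in input order.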
The module also defines enum classes that centralize the supported model identifiers for each provider-specific embedding family. These enums are used to map user-facing or internal model names to the actual model identifiers required by the corresponding client libraries.
Environment variables are loaded at import time using dotenv so that
provider credentials such as API keys can be resolved before embedding
clients are instantiated.
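As a rough illustration of import-time credential loading, the sketch below calls load_dotenv from python-dotenv when the package is available and then reads a variable from the environment. The require_env helper is hypothetical and is not claimed to exist in this module. The guarded import is there only so the sketch runs without the third-party package.

```python
import os

try:
    # python-dotenv is a third-party package; the guard keeps this sketch
    # runnable on machines where it is not installed.
    from dotenv import load_dotenv
except ImportError:
    load_dotenv = None

if load_dotenv is not None:
    # Reads a .env file if one is found. By default, variables that are
    # already set in the environment are not overridden.
    load_dotenv()


def require_env(name: str) -> str:
    """Hypothetical helper: return the variable's value or fail clearly."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to the environment or to a .env file"
        )
    return value
```

A provider client could then be created with, for example, require_env("GOOGLE_API_KEY"), so that a missing key fails early with a readable message.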
Classes:
| Name | Description |
|---|---|
| GoogleEmbeddingFunction | ChromaDB embedding function implementation backed by the Google GenAI embeddings API. |
| Qwen3EmbeddingModels | Enumeration of supported Qwen3 embedding model identifiers. |
| Qwen3EmbeddingFunction | ChromaDB embedding function implementation backed by a Qwen3 embedding model loaded with SentenceTransformers. |
| GemmaEmbeddingModels | Enumeration of supported Gemma embedding model identifiers. |
| GemmaEmbeddingFunction | ChromaDB embedding function implementation backed by a Gemma embedding model loaded with SentenceTransformers. |
| OpenAIEmbeddingModels | Enumeration of supported OpenAI embedding model identifiers. |
| OpenAIEmbeddingFunction | ChromaDB embedding function implementation backed by the OpenAI embeddings API. |
Notes
Qwen3 and Gemma embedding models are loaded onto CUDA when a GPU is available and otherwise fall back to CPU. SentenceTransformers-based models are initialized with left-padding tokenizer behavior in this module.
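The device fallback and padding behavior described in this note could look roughly like the sketch below. The function names pick_device and load_sentence_transformer are hypothetical, and passing the padding side through tokenizer_kwargs is an assumption about how the module configures the tokenizer. The loader is defined but not called here, because calling it would download model weights.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is no GPU path, so CPU is the only option.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def load_sentence_transformer(model_id: str):
    """Load a model on the selected device with a left-padding tokenizer."""
    # Imported lazily so the device helper above works without this dependency.
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(
        model_id,
        device=pick_device(),
        # Assumption: left padding is requested through the tokenizer kwargs.
        tokenizer_kwargs={"padding_side": "left"},
    )


print(pick_device())
```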
GoogleEmbeddingFunction
Bases: EmbeddingFunction
Embed documents using the Google GenAI embeddings API.
This embedding function adapts Google's embedding endpoint to the
ChromaDB EmbeddingFunction interface. It creates a Google GenAI
client using the GOOGLE_API_KEY environment variable and uses the
configured model to embed incoming documents.
Attributes:
| Name | Type | Description |
|---|---|---|
| client | | Google GenAI client used to request embeddings. |
| model | | Name of the embedding model to use with the Google API. |
Source code in aatm\embedding_functions.py
__init__(model, *args, **kwargs)
Initialize the Google embedding function.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | str | Name of the Google embedding model to use. | required |
| *args | Any | Additional positional arguments accepted for interface compatibility. | () |
| **kwargs | Any | Additional keyword arguments accepted for interface compatibility. | {} |
Returns:
| Type | Description |
|---|---|
| None | None |
__call__(input)
Generate embeddings for the provided documents.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| input | Documents | Documents to be embedded. | required |
Returns:
| Type | Description |
|---|---|
| Embeddings | Embeddings returned by the Google embedding API, converted to the format expected by ChromaDB. |
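One way the conversion to ChromaDB's format might work is sketched below. The helper embed_with_google is hypothetical. It assumes a google-genai client whose models.embed_content call returns a response with an embeddings list, each entry exposing its vector as values. The actual implementation in this module may differ.

```python
from typing import List, Sequence


def embed_with_google(client, model: str, documents: Sequence[str]) -> List[List[float]]:
    """Request embeddings and unwrap them into plain lists of floats."""
    response = client.models.embed_content(model=model, contents=list(documents))
    # Each returned embedding carries its vector in the `values` attribute.
    return [list(embedding.values) for embedding in response.embeddings]


# With the real SDK (assumes google-genai is installed and GOOGLE_API_KEY is set):
#   import os
#   from google import genai
#   client = genai.Client(api_key=os.environ["GOOGLE_API_KEY"])
#   vectors = embed_with_google(client, "<model name>", ["hello world"])
```

Taking the client as a parameter keeps the helper easy to exercise with a stub, without network access.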
Qwen3EmbeddingModels
Bases: Enum
Enumerate the supported Qwen3 embedding models.
This enum defines the available Qwen3 embedding model identifiers
that can be used by Qwen3EmbeddingFunction.
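For illustration, an enum of this kind and the key lookup that goes with it might look like the sketch below. The class name, member names, and the resolve_model_id helper are hypothetical. The values are public Hugging Face identifiers for Qwen3 embedding models, but whether the real enum uses these exact keys or values is an assumption.

```python
from enum import Enum


class Qwen3EmbeddingModelsSketch(Enum):
    """Hypothetical members; the real enum's keys and values may differ."""

    QWEN3_0_6B = "Qwen/Qwen3-Embedding-0.6B"
    QWEN3_4B = "Qwen/Qwen3-Embedding-4B"
    QWEN3_8B = "Qwen/Qwen3-Embedding-8B"


def resolve_model_id(key: str) -> str:
    """Map an enum key to its model identifier; raise ValueError if unknown."""
    try:
        return Qwen3EmbeddingModelsSketch[key].value
    except KeyError:
        supported = ", ".join(member.name for member in Qwen3EmbeddingModelsSketch)
        raise ValueError(
            f"Unsupported model {key!r}; expected one of: {supported}"
        ) from None


print(resolve_model_id("QWEN3_0_6B"))
```

Looking up by name keeps user-facing keys short while the fully qualified identifier stays in one place.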
Qwen3EmbeddingFunction
Bases: EmbeddingFunction
Embed documents using a Qwen3 SentenceTransformer model.
This embedding function adapts Qwen3 embedding models loaded through
SentenceTransformers to the ChromaDB EmbeddingFunction interface.
The selected model is loaded onto CUDA when available and otherwise
onto CPU.
Attributes:
| Name | Type | Description |
|---|---|---|
| model_id | | Fully qualified identifier of the selected Qwen3 embedding model. |
| model | | SentenceTransformer instance used to encode documents. |
__init__(model, *args, **kwargs)
Initialize the Qwen3 embedding function.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | str | Enum key corresponding to a supported Qwen3 embedding model. | required |
| *args | Any | Additional positional arguments accepted for interface compatibility. | () |
| **kwargs | Any | Additional keyword arguments accepted for interface compatibility. | {} |
Returns:
| Type | Description |
|---|---|
| None | None |
Raises:
| Type | Description |
|---|---|
| ValueError | If |
__call__(input)
Generate embeddings for the provided documents.
This method encodes the input documents using the SentenceTransformer
model with the "query" prompt configuration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| input | Documents | Documents to be embedded. | required |
Returns:
| Type | Description |
|---|---|
| Embeddings | Embeddings for the provided documents as a list of vectors. |
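Encoding with the "query" prompt could be sketched as follows. The helper encode_as_queries is hypothetical. It assumes a SentenceTransformer instance whose configuration defines a prompt named "query", as the Qwen3 embedding models do, and it converts the result to nested Python lists.

```python
from typing import List, Sequence


def encode_as_queries(model, documents: Sequence[str]) -> List[List[float]]:
    """Encode documents with the model's "query" prompt and return plain lists."""
    # prompt_name selects a prompt defined in the model's configuration.
    embeddings = model.encode(list(documents), prompt_name="query")
    # encode returns a NumPy array by default; tolist() gives nested lists.
    return embeddings.tolist()


# With a real model (assumes sentence-transformers is installed):
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")
#   vectors = encode_as_queries(model, ["what is a vector database?"])
```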
GemmaEmbeddingModels
Bases: Enum
Enumerate the supported Gemma embedding models.
This enum defines the available Gemma embedding model identifiers
that can be used by GemmaEmbeddingFunction.
GemmaEmbeddingFunction
Bases: EmbeddingFunction
Embed documents using a Gemma SentenceTransformer model.
This embedding function adapts Gemma embedding models loaded through
SentenceTransformers to the ChromaDB EmbeddingFunction interface.
The selected model is loaded onto CUDA when available and otherwise
onto CPU.
Attributes:
| Name | Type | Description |
|---|---|---|
| model_id | | Fully qualified identifier of the selected Gemma embedding model. |
| model | | SentenceTransformer instance used to encode documents. |
__init__(model, *args, **kwargs)
Initialize the Gemma embedding function.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | str | Enum key corresponding to a supported Gemma embedding model. | required |
| *args | Any | Additional positional arguments accepted for interface compatibility. | () |
| **kwargs | Any | Additional keyword arguments accepted for interface compatibility. | {} |
Returns:
| Type | Description |
|---|---|
| None | None |
Raises:
| Type | Description |
|---|---|
| ValueError | If |
__call__(input)
Generate embeddings for the provided documents.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| input | Documents | Documents to be embedded. | required |
Returns:
| Type | Description |
|---|---|
| Embeddings | Embeddings for the provided documents as a list of vectors. |
OpenAIEmbeddingModels
Bases: Enum
Enumerate the supported OpenAI embedding models.
This enum defines the available OpenAI embedding model identifiers
that can be used by OpenAIEmbeddingFunction.
OpenAIEmbeddingFunction
Bases: EmbeddingFunction
Embed documents using the OpenAI embeddings API.
This embedding function adapts OpenAI's embeddings endpoint to the
ChromaDB EmbeddingFunction interface. It resolves the configured
model identifier from OpenAIEmbeddingModels and uses an OpenAI
client to generate embeddings for input documents.
Attributes:
| Name | Type | Description |
|---|---|---|
| model_id | | Identifier of the selected OpenAI embedding model. |
| client | | OpenAI client used to request embeddings. |
__init__(model, *args, **kwargs)
Initialize the OpenAI embedding function.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | str | Enum key corresponding to a supported OpenAI embedding model. | required |
| *args | Any | Additional positional arguments accepted for interface compatibility. | () |
| **kwargs | Any | Additional keyword arguments accepted for interface compatibility. | {} |
Returns:
| Type | Description |
|---|---|
| None | None |
Raises:
| Type | Description |
|---|---|
| ValueError | If |
__call__(input)
Generate embeddings for the provided documents.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| input | Documents | Documents to be embedded. | required |
Returns:
| Type | Description |
|---|---|
| Embeddings | Embeddings returned by the OpenAI embeddings API in the format expected by ChromaDB. |
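A possible shape for the OpenAI request and conversion is sketched below. The helper embed_with_openai is hypothetical. It assumes an OpenAI client whose embeddings.create call returns a response with a data list, each item carrying index and embedding fields. Sorting by index is a defensive choice in this sketch, not something the module is known to do.

```python
from typing import List, Sequence


def embed_with_openai(client, model_id: str, documents: Sequence[str]) -> List[List[float]]:
    """Request embeddings and return one plain list of floats per document."""
    response = client.embeddings.create(model=model_id, input=list(documents))
    # Sort by index so the vectors line up with the input order.
    ordered = sorted(response.data, key=lambda item: item.index)
    return [list(item.embedding) for item in ordered]


# With the real SDK (assumes openai is installed and OPENAI_API_KEY is set):
#   from openai import OpenAI
#   client = OpenAI()
#   vectors = embed_with_openai(client, "text-embedding-3-small", ["hello world"])
```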