pe.embedding package
- class pe.embedding.CLIP(res=None, device='cuda', batch_size=2000)
Bases: Embedding
Compute the CLIP embedding of images.
- __init__(res=None, device='cuda', batch_size=2000)
Constructor.
- Parameters:
res (int, optional) – The resolution of the images. The images will be resized to (res, res) before computing the embedding. If None, the images will not be resized. Defaults to None
device (str, optional) – The device to use for computing the embedding, defaults to “cuda”
batch_size (int, optional) – The batch size to use for computing the embedding, defaults to 2000
- compute_embedding(data)
Compute the CLIP embedding of images.
- Parameters:
data (pe.data.Data) – The data object containing the images
- Returns:
The data object with the computed embedding
- Return type:
pe.data.Data
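The batch_size parameter limits how many samples are passed to the model at once. The sketch below illustrates that batching pattern in isolation; it is not the library's code. `iter_batches` and `embed_in_batches` are hypothetical helpers, and `embed_fn` stands in for the actual model call.

```python
from typing import Callable, Iterator, List, Sequence


def iter_batches(items: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of `items` holding at most `batch_size` elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_in_batches(
    items: Sequence,
    embed_fn: Callable[[Sequence], List],
    batch_size: int = 2000,
) -> List:
    """Apply `embed_fn` (a stand-in for the model) batch by batch and concatenate the results."""
    embeddings: List = []
    for batch in iter_batches(items, batch_size):
        embeddings.extend(embed_fn(batch))
    return embeddings
```

A smaller batch_size lowers peak memory use on the device at the cost of more model calls.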
- class pe.embedding.Embedding
Bases: ABC
The abstract class that computes the embedding of samples.
- property column_name
The column name to be used in the data frame.
- abstract compute_embedding(data)
Compute the embedding of samples.
- Parameters:
data (pe.data.Data) – The data to compute the embedding for
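As an illustration of the interface above, the following sketch defines a stand-in abstract base class and a toy subclass. It does not import pe: `EmbeddingSketch`, `LengthEmbedding`, the `embedding.<ClassName>` naming scheme, and the use of a plain dict in place of pe.data.Data are all assumptions made for the example.

```python
from abc import ABC, abstractmethod


class EmbeddingSketch(ABC):
    """Hypothetical stand-in mirroring the documented interface of pe.embedding.Embedding."""

    @property
    def column_name(self) -> str:
        # Assumption: the column name is derived from the class name.
        return f"embedding.{type(self).__name__}"

    @abstractmethod
    def compute_embedding(self, data):
        """Return `data` with an embedding column added."""


class LengthEmbedding(EmbeddingSketch):
    """Toy embedding: maps each text sample to a one-dimensional vector holding its length."""

    def compute_embedding(self, data: dict) -> dict:
        # `data` is a plain dict of columns here, standing in for pe.data.Data.
        result = dict(data)
        result[self.column_name] = [[float(len(text))] for text in data["text"]]
        return result
```

Because compute_embedding is abstract, the base class cannot be instantiated directly; only subclasses that implement it can.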
- class pe.embedding.FLDInception(res=None)
Bases: Embedding
Compute the Inception embedding of images using the FLD library.
- __init__(res=None)
Constructor.
- Parameters:
res (int, optional) – The resolution of the images. The images will be resized to (res, res) before computing the embedding. If None, the images will not be resized. Defaults to None
- compute_embedding(data)
Compute the Inception embedding of images.
- Parameters:
data (pe.data.Data) – The data object containing the images
- Returns:
The data object with the computed embedding
- Return type:
pe.data.Data
- class pe.embedding.Inception(res, device='cuda', batch_size=2000)
Bases: Embedding
Compute the Inception embedding of images.
- __init__(res, device='cuda', batch_size=2000)
Constructor.
- Parameters:
res (int) – The resolution of the images. The images will be resized to (res, res) before computing the embedding
device (str, optional) – The device to use for computing the embedding, defaults to “cuda”
batch_size (int, optional) – The batch size to use for computing the embedding, defaults to 2000
- compute_embedding(data)
Compute the Inception embedding of images.
- Parameters:
data (pe.data.Data) – The data object containing the images
- Returns:
The data object with the computed embedding
- Return type:
pe.data.Data
- class pe.embedding.RawPixel
Bases: Embedding
Use the raw pixels of images as the embedding.
- compute_embedding(data)
Extract the raw pixels of images.
- Parameters:
data (pe.data.Data) – The data object containing the images
- Returns:
The data object with the computed embedding
- Return type:
pe.data.Data
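Using raw pixels as an embedding amounts to turning each image into a single flat vector. The sketch below shows that idea on nested Python lists. The function name `flatten_image` and the height x width x channels layout are assumptions; the library's actual array layout, dtype, and any normalization are not stated here.

```python
from typing import List


def flatten_image(image: List[List[List[int]]]) -> List[float]:
    """Flatten an image given as nested lists (height x width x channels) into one vector."""
    return [float(value) for row in image for pixel in row for value in pixel]
```

With this layout, an image of height H, width W, and C channels yields a vector of length H * W * C.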
- class pe.embedding.SentenceTransformer(model, batch_size=2000)
Bases: Embedding
Compute the Sentence Transformers embedding of text.
- __init__(model, batch_size=2000)
Constructor.
- Parameters:
model (str) – The Sentence Transformers model to use
batch_size (int, optional) – The batch size to use for computing the embedding, defaults to 2000
- property column_name
The column name to be used in the data frame.
- compute_embedding(data)
Compute the Sentence Transformers embedding of text.
- Parameters:
data (pe.data.Data) – The data object containing the text
- Returns:
The data object with the computed embedding
- Return type:
pe.data.Data
- class pe.embedding.TabularEmbedding(info, cat_weight=0.3333333333333333, num_weight=1)
Bases: Embedding
Compute the tabular embedding.
- __init__(info, cat_weight=0.3333333333333333, num_weight=1)
Constructor.
- Parameters:
info (dict) – The information (categories and numerical bounds) of the private data
cat_weight (float, optional) – The weight for the categorical columns, defaults to 1/3
num_weight (float, optional) – The weight for the numerical columns, defaults to 1
- compute_embedding(data)
Compute the tabular embedding. The embedding is computed from the features only, not the labels. Vectorization is applied per column to improve performance.
- Parameters:
data (pe.data.Data) – The data object containing the tabular data
- Returns:
The data object with the computed embedding
- Return type:
pe.data.Data
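One plausible way to combine cat_weight and num_weight is to scale a one-hot encoding of each categorical column by cat_weight and a min-max scaling of each numerical column by num_weight. The sketch below shows that scheme for a single row. It is an assumption, not the library's documented algorithm: `embed_row` is a hypothetical function, and the `info` layout used here (a "categories" list or a "bounds" pair per column) is invented for illustration.

```python
from typing import Dict, List


def embed_row(
    row: Dict[str, object],
    info: Dict[str, dict],
    cat_weight: float = 1 / 3,
    num_weight: float = 1.0,
) -> List[float]:
    """Embed one feature row: weighted one-hot for categorical columns,
    weighted min-max scaling for numerical columns."""
    vector: List[float] = []
    for column, spec in info.items():
        if "categories" in spec:
            # One-hot encode the category, scaled by cat_weight.
            vector.extend(
                cat_weight if row[column] == category else 0.0
                for category in spec["categories"]
            )
        else:
            low, high = spec["bounds"]
            # Scale to [0, 1] using the known bounds, then apply num_weight.
            scaled = (float(row[column]) - low) / (high - low) if high > low else 0.0
            vector.append(num_weight * scaled)
    return vector
```

Under this scheme, lowering cat_weight relative to num_weight reduces how much a category mismatch contributes to the distance between two rows compared with a numerical difference.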