pe.api.image.nearest_image_api module

class pe.api.image.nearest_image_api.NearestImage(data, embedding, nearest_neighbor_mode, variation_degrees, nearest_neighbor_backend='auto')

Bases: API

An API that builds synthetic images directly from the given dataset: the RANDOM_API draws images at random from the dataset, and the VARIATION_API looks up their nearest images in the same dataset.

__init__(data, embedding, nearest_neighbor_mode, variation_degrees, nearest_neighbor_backend='auto')

Constructor.

Parameters:
  • data (pe.data.Data) – The data object that contains the images

  • embedding (pe.embedding.Embedding) – The embedding object that computes the embeddings of the images

  • nearest_neighbor_mode (str) – The metric used to find the nearest neighbors. It should be one of the following: “l2” (L2 distance), “cos_sim” (cosine similarity), “ip” (inner product). Not all backends support all modes.

  • variation_degrees (int or list[int]) – The variation degrees used at each PE iteration. If a single value is provided, the same variation degree is used for all iterations. The value is the number of nearest neighbors to consider for the VARIATION_API.

  • nearest_neighbor_backend (str, optional) – The backend to use for finding the nearest neighbors. It should be one of the following: “faiss” (FAISS), “sklearn” (scikit-learn), “auto” (FAISS if available, otherwise scikit-learn). Defaults to “auto”. FAISS supports GPU and is much faster when the number of samples is large. It requires the faiss-gpu or faiss-cpu package to be installed. See https://faiss.ai/

Raises:

ValueError – If the nearest_neighbor_backend is unknown
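The constructor arguments described above can be illustrated with a minimal sketch. It is not the library's code: resolve_backend and degree_for_iteration are hypothetical helper names, and the 0-based iteration index is an assumption. The sketch only shows one plausible way that “auto” could fall back from FAISS to scikit-learn, and how an int or a list of variation degrees could map to a per-iteration value.

```python
import importlib.util

SUPPORTED_BACKENDS = ("faiss", "sklearn", "auto")


def resolve_backend(nearest_neighbor_backend="auto"):
    """Return a concrete backend name, "faiss" or "sklearn" (hypothetical helper)."""
    if nearest_neighbor_backend not in SUPPORTED_BACKENDS:
        raise ValueError(f"Unknown backend: {nearest_neighbor_backend}")
    if nearest_neighbor_backend == "auto":
        # Prefer FAISS when the package can be found, otherwise scikit-learn.
        has_faiss = importlib.util.find_spec("faiss") is not None
        return "faiss" if has_faiss else "sklearn"
    return nearest_neighbor_backend


def degree_for_iteration(variation_degrees, iteration):
    """Return the number of neighbors for one PE iteration (hypothetical helper).

    Assumption: iterations are indexed from 0 when a list is given.
    """
    if isinstance(variation_degrees, int):
        # A single value applies to every iteration.
        return variation_degrees
    return variation_degrees[iteration]
```

For example, degree_for_iteration(3, 5) returns 3, while degree_for_iteration([4, 2, 1], 1) returns 2.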

_build_nearest_neighbor_graph()

Finding the nearest neighbors of each sample in the given dataset.
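As a rough illustration of what a nearest-neighbor graph is, the following brute-force sketch computes one in pure Python for the three modes named in the constructor. It is not how FAISS or scikit-learn perform the search, and it is not taken from the library: nearness and build_graph are hypothetical names, and excluding each sample from its own neighbor list is an assumption.

```python
import math


def nearness(a, b, mode):
    """Score two vectors so that a larger value always means 'nearer'."""
    dot = sum(x * y for x, y in zip(a, b))
    if mode == "ip":
        return dot
    if mode == "cos_sim":
        # Assumes neither vector has zero norm.
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm
    if mode == "l2":
        # Negate the distance so that smaller distances score higher.
        return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    raise ValueError(f"Unknown nearest_neighbor_mode: {mode}")


def build_graph(embeddings, k, mode="l2"):
    """For each row, list the indices of its k nearest other rows."""
    graph = []
    for i, query in enumerate(embeddings):
        # Assumption: a sample is not counted as its own neighbor.
        others = [j for j in range(len(embeddings)) if j != i]
        others.sort(key=lambda j: nearness(query, embeddings[j], mode), reverse=True)
        graph.append(others[:k])
    return graph
```

This loop costs O(n²) comparisons, which is the kind of workload where an indexed backend such as FAISS pays off on large datasets.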

random_api(label_info, num_samples)

Generating random synthetic data by randomly drawing images from the given dataset.

Parameters:
  • label_info (omegaconf.dictconfig.DictConfig) – The info of the label, not utilized in this API

  • num_samples (int) – The number of random samples to generate

Returns:

The data object of the generated synthetic data

Return type:

pe.data.Data
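The random draw can be pictured as sampling image indices from the dataset. The sketch below is only an illustration under stated assumptions: draw_random_indices is a hypothetical helper, the seed argument is added for reproducibility, and sampling with replacement is an assumption, since the description above does not say whether the same image can be drawn twice.

```python
import random


def draw_random_indices(num_images, num_samples, seed=None):
    """Pick num_samples dataset indices uniformly at random (hypothetical helper)."""
    rng = random.Random(seed)
    # Assumption: draws are independent, so an index may repeat.
    return [rng.randrange(num_images) for _ in range(num_samples)]
```

The returned indices would then select the corresponding images from the dataset to form the initial synthetic samples.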

variation_api(syn_data)

Generating variations of the synthetic data by finding the nearest images in the given dataset.

Parameters:

syn_data (pe.data.Data) – The data object of the synthetic data

Returns:

The data object of the variation of the input synthetic data

Return type:

pe.data.Data
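One way to picture the variation step is as a lookup in a precomputed neighbor graph: each synthetic sample is replaced by one of its nearest dataset images. The sketch below is an assumption-laden illustration, not the library's implementation. vary_indices, neighbor_graph, and degree are hypothetical names, and choosing uniformly at random among the first degree neighbors is an assumption.

```python
import random


def vary_indices(current_indices, neighbor_graph, degree, seed=None):
    """Replace each index with one of its `degree` nearest neighbors (hypothetical helper)."""
    rng = random.Random(seed)
    varied = []
    for index in current_indices:
        # neighbor_graph[index] is assumed to be sorted from nearest to farthest.
        candidates = neighbor_graph[index][:degree]
        varied.append(rng.choice(candidates))
    return varied
```

With degree set to 1, every sample moves to its single nearest neighbor, so the result is deterministic. Larger degrees allow more varied results.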