Pgvector

Pgvector is an open-source PostgreSQL® extension that adds support for storing vectors, indexing them, and running vector similarity searches.

Vectors are useful in many fields, including semantic search, similarity search, machine learning, multimedia data handling, and Natural Language Processing (NLP).

The extension supports the following features:

  • Storing vectors of multiple sizes
  • Indexing vectors for efficient comparison using multiple algorithms
  • Exact and approximate nearest-neighbor search
  • Single-precision, half-precision, binary, and sparse vectors
  • L2 distance, inner product, cosine distance, L1 distance, Hamming distance, and Jaccard distance
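
As a rough illustration of the distance functions listed above, the following sketch compares literal vectors directly. It assumes the extension is already enabled and that the installed pgvector version is recent enough to provide all of these operators (0.7.0 or later); the vector values are arbitrary.

```sql
-- Assumption: CREATE EXTENSION vector; has already been run,
-- and pgvector is version 0.7.0 or later.
SELECT
  '[1,2,3]'::vector <-> '[3,2,1]'::vector AS l2_distance,
  '[1,2,3]'::vector <#> '[3,2,1]'::vector AS negative_inner_product,
  '[1,2,3]'::vector <=> '[3,2,1]'::vector AS cosine_distance,
  '[1,2,3]'::vector <+> '[3,2,1]'::vector AS l1_distance;

-- Hamming and Jaccard distances apply to binary vectors (bit strings).
SELECT
  B'1010' <~> B'0110' AS hamming_distance,
  B'1010' <%> B'0110' AS jaccard_distance;
```

Note that `<#>` returns the negative inner product, so that smaller values still mean "closer" when used in an ORDER BY clause.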

More information is available in the official documentation.

Usage Example of Pgvector: Storing Vector Embeddings

A growing number of companies working on generative Artificial Intelligence offer embedding models that compute vector embeddings for any type of content.

These vectors make it possible to compare arbitrary content using vector arithmetic. A common use of Pgvector is therefore to add a vector column to your data table and fill it with generated embeddings.

-- The size of the vector should be modified according to the model you are using
ALTER TABLE items ADD COLUMN embedding vector(3);
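
If you are starting from an empty database rather than altering an existing table, a minimal sketch could look like the following. The table name `items_demo`, its columns, and the vector values are hypothetical and only serve the illustration; real embeddings typically have hundreds or thousands of dimensions.

```sql
-- Hypothetical demo table; 3 dimensions only to keep the literals short.
CREATE TABLE items_demo (
  id bigserial PRIMARY KEY,
  content text,
  embedding vector(3)
);

-- Vectors are written as bracketed, comma-separated text literals.
INSERT INTO items_demo (content, embedding) VALUES
  ('first document', '[1,2,3]'),
  ('second document', '[4,5,6]');

-- Fill or refresh the embedding of an existing row.
UPDATE items_demo SET embedding = '[1,2,4]' WHERE id = 1;
```

Inserting a vector whose dimension does not match the declared column size is rejected, which is why the column size has to follow the embedding model you use.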

You can then query for the rows most similar to a given input: generate the embedding of the input and compare it with your stored data. The following example returns the 5 “closest” rows, using the <-> operator (L2 distance):

-- vector_data has to be set, either by the embedding of another row, or by an
-- externally computed embedding
SELECT * FROM items ORDER BY embedding <-> '[vector_data]' LIMIT 5;
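
As a sketch of the "embedding of another row" case, the query below looks for the neighbors of an existing row. It assumes the `items` table has an `id` column, which the original example does not show, and uses `id = 1` as an arbitrary reference row.

```sql
-- Assumption: items has an id column in addition to the embedding column.
SELECT *
FROM items
WHERE id != 1
ORDER BY embedding <-> (SELECT embedding FROM items WHERE id = 1)
LIMIT 5;

-- Optional: an approximate (HNSW) index for L2 distance,
-- available in recent pgvector versions.
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops);
```

Without an index, the query performs an exact search over all rows. With an approximate index such as HNSW, queries are usually faster on large tables, but the results may differ slightly from the exact nearest neighbors.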

For an example of an embeddings provider, have a look at the OpenAI Embeddings documentation.

Enabling Pgvector

To enable Pgvector:

  1. Provision a new PostgreSQL® database
  2. Enable the vector extension
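
The second step usually comes down to a single SQL statement. The sketch below assumes you are connected to the database with a user that is allowed to create extensions; how you open that SQL console depends on your setup.

```sql
-- Enable pgvector in the current database (no-op if already enabled).
CREATE EXTENSION IF NOT EXISTS vector;

-- Check that the extension is installed and which version is active.
SELECT extname, extversion FROM pg_extension WHERE extname = 'vector';
```

The extension is enabled per database, so the statement has to be run in each database where you want to use the vector type.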


©2024 Scalingo