Pgvector
Pgvector is an open-source extension to PostgreSQL® designed to add support for storing, indexing vectors and searching vector similarity.
Vectors are really useful in multiple fields, among them semantic search, similarity search, machine learning, multimedia data handling and Natural Language Processing (NLP).
The extension supports the following features:
- Storing vectors of multiple sizes
- Indexing vectors for efficient comparaison using multiple algorithms
- Searching exact and approximate nearest neighbor
- Single-precision, half-precision, binary, and sparse vectors
- L2 distance, inner product, cosine distance, L1 distance, Hamming distance, and Jaccard distance
More info on the official documentation
Usage Example of Pgvector: Storing Vector Embeddings
More and more company around generative Artificial Intelligence are proposing vector embeddings models to compute vector embeddings for any type of content.
These vectors allow the comparaison of arbitrary contents based on vector
arithmetics. Thus, a common usage of Pgvector is to add a vector
column in
your data table, and fill it with generated embeddings.
-- The size of the vector should be modified according to the model you are using
ALTER TABLE items ADD COLUMN embedding vector(3);
Then you can easily make a request in order to find rows which are similar to a given input. You’ll need to generate the embedding of the input and compare it with your data. In the following example, get the 5 “closest” rows:
-- vector_data has to be set, either by the embedding of another row, or by an
-- externally computed embedding
SELECT * FROM items ORDER BY embedding <-> '[vector_data]' LIMIT 5;
Example: have a look at OpenAI Embeddings Documentation
Enabling Pgvector
To enable Pgvector: