Setting up logical replication

Logical replication is a method of replicating data objects and their changes, based upon their replication identity (usually a primary key). It is often used in migration, Change Data Capture(CDC), and fine-grained database integration and access control.

In this article, we will introduce how to set logical replication for pgvecto.rs index. Even if the source index instance goes down, you can still query the target database using the index.

Deploying PostgreSQL Clusters

In this tutorial, we will use docker compose to deploy two PostgreSQL clusters.

shell

$ echo 'version: "3.7"
services:
  source_db:
    image: tensorchord/pgvecto-rs:pg15-v0.1.13 
    ports:
      - "5432:5432"
    environment:
      POSTGRES_PASSWORD: password
    command: >
      -c wal_level=logical
      -c shared_preload_libraries=vectors.so
    networks:
      localnet:

  target_db:
    image: tensorchord/pgvecto-rs:pg15-v0.1.13 
    ports:
      - "5433:5432"
    environment:
      POSTGRES_PASSWORD: password
    command: >
      -c wal_level=logical
      -c shared_preload_libraries=vectors.so
    networks:
      localnet:

networks:
  localnet:
' > docker-compose.yml

$ docker compose up -d

Create extension vectors in source database and target database.

shell

# create extension vectors
$ psql -h <source_db_ip> -U postgres -p 5432 
DROP EXTENSION IF EXISTS "vectors";
CREATE EXTENSION "vectors";

$ psql -h <target_db_ip> -U postgres -p 5432
DROP EXTENSION IF EXISTS "vectors";
CREATE EXTENSION "vectors";

Set Logical Replication

Now, we can set logical replication between source database and target database.

Prepare Data

We need to create a table named test with a column named embedding of type vector(10) in source database and target database. Then we create an index on the embedding column of the test table in source database and target database. Finally, we insert data into the source database.

Create test table

sql

DROP TABLE IF EXISTS test;
CREATE TABLE test (id integer PRIMARY KEY, embedding vector(10) NOT NULL);

Create index

We create an index on the embedding column of the test table in source database and target database. The index type is flat, it is a brute force algorithm. We choose vector_l2_ops squared Euclidean distance to measure the distance between vectors. Another index type and distance function can be found in here.

sql

CREATE INDEX ON test USING vectors (embedding vector_l2_ops) WITH (options = "[indexing.flat]");

Insert data

We only need to insert data into the source database. The data will be synchronized to the target database.

sql

INSERT INTO test
SELECT i, ARRAY[random(), random(),random(),random(),random(),random(),random(),random(),random()]::real[] 
FROM generate_series(1, 100) i;

In source database, now you can search for the nearest neighbor of a vector in the embedding column of the test table.

sql

postgres=# SELECT id FROM test ORDER BY embedding <-> '[0.40671515, 0.24202824, 0.37059402, 0.50316447, 0.10779921, 0.80774295, 0.8879849, 0.31292745, 0.05584943, 0.8738258]' LIMIT 10;
 id 
----
 82
 69
 80
 22
 31
 20
 85
 97
 72
(10 rows)

If you query the target database, you will get the noting:

sql

SELECT id FROM test ORDER BY embedding <-> '[0.40671515, 0.24202824, 0.37059402, 0.50316447, 0.10779921, 0.80774295, 0.8879849, 0.31292745, 0.05584943, 0.8738258]' LIMIT 10;
 id 
----
(0 rows)

Config Logical Replication

To simplify the process of setting up logical replication between two PostgreSQL databases, we will use pg-easy-replicate.

Config Check

shell

# get the network name
$ docker network ls | grep  pg
8631dd77129b   pg_regress_localnet   bridge    local

# get source and target container ip
$ docker inspect <container_id> | grep IP

$ docker run -e SOURCE_DB_URL="postgres://postgres:password@<source_db_ip>:5432/postgres"    -e TARGET_DB_URL="postgres://postgres:password@<target_db_ip>:5432/postgres"   -it --rm --network=<network> shayonj/pg_easy_replicate:latest   pg_easy_replicate config_check
/usr/bin/pg_dump
✅ Config is looking good.

Bootstrap

Every sync will need to be bootstrapped before you can set up the sync between two databases. Bootstrap creates a new super user to perform the orchestration required during the rest of the process. It also creates some internal metadata tables for record keeping.

shell

$ docker run -e SOURCE_DB_URL="postgres://postgres:password@<source_db_ip>:5432/postgres"    -e TARGET_DB_URL="postgres://postgres:password@<target_db_ip>:5432/postgres"   -it --rm --network=pg_regress_localnet shayonj/pg_easy_replicate:latest pg_easy_replicate bootstrap --group-name database-cluster-1 --copy-schema

Start Sync

Once the bootstrap is complete, you can start the sync. Starting the sync sets up the publication, subscription and performs other minor housekeeping things.

shell

$ docker run -e SOURCE_DB_URL="postgres://postgres:password@192.168.64.2:5432/postgres"    -e TARGET_DB_URL="postgres://postgres:password@192.168.64.3:5432/postgres"   -it --rm --network=pg_regress_localnet shayonj/pg_easy_replicate:latest pg_easy_replicate start_sync --group-name database-cluster-1

Test Logical Replication

Query In Target Database

Now we can query the target database to get the nearest neighbor of a vector in the embedding column of the test table. The result is the same as the source database.

sql

postgres=# SELECT id FROM test ORDER BY embedding <-> '[0.40671515, 0.24202824, 0.37059402, 0.50316447, 0.10779921, 0.80774295, 0.8879849, 0.31292745, 0.05584943, 0.8738258]' LIMIT 10;
 id 
----
 82
 69
 80
 22
 31
 20
 85
 97
 72
(10 rows)

Update Index

If you insert data into the source database, the data will be synchronized to the target database. Insert data into the source database:

sql

INSERT INTO test
SELECT i, ARRAY[random(), random(),random(),random(),random(),random(),random(),random(),random()]::real[] 
FROM generate_series(101, 200) i;

Query in the source database:

sql

postgres=# SELECT id FROM test ORDER BY embedding <-> '[0.40671515, 0.24202824, 0.37059402, 0.50316447, 0.10779921, 0.80774295, 0.8879849, 0.31292745, 0.05584943, 0.8738258]' LIMIT 10;
id  
-----
  82
 130
 145
  69
  80
 200
  22
  31
 182
 (10 rows)

Query in the target database, it will return the same result:

sql

postgres=# SELECT id FROM test ORDER BY embedding <-> '[0.40671515, 0.24202824, 0.37059402, 0.50316447, 0.10779921, 0.80774295, 0.8879849, 0.31292745, 0.05584943, 0.8738258]' LIMIT 10;
id  
-----
  82
 130
 145
  69
  80
 200
  22
  31
 182
 (10 rows)

Setting up logical replication ​

Deploying PostgreSQL Clusters ​

Set Logical Replication ​

Prepare Data ​

Create test table ​

Create index ​

Insert data ​

Config Logical Replication ​

Config Check ​

Bootstrap ​

Start Sync ​

Test Logical Replication ​

Query In Target Database ​

Update Index ​

Setting up logical replication

Deploying PostgreSQL Clusters

Set Logical Replication

Prepare Data

Create test table

Create index

Insert data

Config Logical Replication

Config Check

Bootstrap

Start Sync

Test Logical Replication

Query In Target Database

Update Index