loke.dev

Stop Building 'Data Sync' Services (The Engineering Case for Postgres Foreign Data Wrappers)

How to replace brittle, glue-code synchronization with a federated database approach using Postgres Foreign Data Wrappers.

· 5 min read

Most of us have spent a frantic weekend debugging a data sync service that stopped working because someone changed a column name in a source database three levels upstream. We build these complex, brittle ETL pipelines just to get a "read-only" view of data that already exists somewhere else, and honestly, it’s a massive waste of engineering cycles.

If you find yourself writing yet another Node.js worker or Python script whose only job is to query Database A and INSERT ... ON CONFLICT into Database B, stop. You're building a "glue code" monument to technical debt. Postgres already has a built-in feature for this called Foreign Data Wrappers (FDW), and it’s time we started using it for more than just migration scripts.
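For concreteness, the upsert half of that glue-code pattern usually looks something like this (table and column names here are illustrative, not from any real schema):

```sql
-- Illustrative upsert the sync worker runs against Database B
-- for every batch of rows it pulled from Database A.
INSERT INTO subscriptions_copy (id, user_id, plan_name, status, updated_at)
VALUES ($1, $2, $3, $4, $5)
ON CONFLICT (id) DO UPDATE
SET plan_name  = EXCLUDED.plan_name,
    status     = EXCLUDED.status,
    updated_at = EXCLUDED.updated_at;
```

Every column in that statement is a coupling point: rename one upstream and the worker breaks.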

The "Sync Service" Tax

When we build custom synchronization services, we aren't just writing code. We're signing up for a lifetime of maintenance:
- State management: Did the last sync finish? Did it fail halfway?
- Schema drift: The source DB changed a type, and now your sync worker is throwing 500s.
- Latency: Your data is always "eventually" consistent, which is usually a polite way of saying "out of date."

FDW changes the model from pushing data (syncing) to pulling data on demand. It lets your Postgres instance treat an external table—whether it's in another Postgres DB, a CSV file, or even a Redis instance—as if it were a local table.

Let’s Connect Two Databases (The "No-Code" Sync)

Imagine you have a Users database and a separate Billing database. You need to join them to see which premium users are active. Instead of syncing the subscriptions table every five minutes, let's just mount it.

First, enable the extension:

CREATE EXTENSION IF NOT EXISTS postgres_fdw;

Now, we define the "Foreign Server." This is basically telling Postgres where the other database lives:

CREATE SERVER billing_db_server
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host 'billing-db.internal.example.com', port '5432', dbname 'billing_prod');

Next, we need to map our local user to a user on the remote database. Pro-tip: Use a read-only user on the remote side to keep things safe.

CREATE USER MAPPING FOR current_user
SERVER billing_db_server
OPTIONS (user 'readonly_sync_user', password 'super-secret-password');
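On the billing database itself, that read-only user is just a role with SELECT granted on the tables you plan to expose (a sketch, with role and table names matching the example above):

```sql
-- Run on the remote (billing) database, not locally.
CREATE ROLE readonly_sync_user LOGIN PASSWORD 'super-secret-password';
GRANT USAGE ON SCHEMA public TO readonly_sync_user;
GRANT SELECT ON subscriptions, invoices TO readonly_sync_user;
```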

Finally, we import the schema. You can import everything, but I prefer to be surgical and only grab the tables I actually need:

IMPORT FOREIGN SCHEMA public
LIMIT TO (subscriptions, invoices)
FROM SERVER billing_db_server
INTO public;

Boom. You now have a foreign table called subscriptions in your local database. You can run SELECT * FROM subscriptions and Postgres will reach out to the billing DB in real time. No cron jobs, no Kafka, no tears.

Can we talk about Joins?

The real magic of FDW isn't just fetching data; it's the ability to perform cross-database joins as if the data were local.

SELECT 
    u.email, 
    s.plan_name, 
    s.status 
FROM users u
JOIN subscriptions s ON u.external_id = s.user_id
WHERE s.status = 'active';

When you run this, Postgres handles "predicate pushdown." It doesn't download the entire subscriptions table; it sends the WHERE s.status = 'active' filter to the remote server (as long as the filter uses operators and functions it knows are safe to ship), gets back only the relevant rows, and then performs the join locally.
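You can see this yourself with EXPLAIN (VERBOSE): the plan shows the exact SQL shipped to the remote server. The output below is abbreviated and will vary with your schema and planner settings:

```sql
EXPLAIN (VERBOSE)
SELECT u.email, s.plan_name, s.status
FROM users u
JOIN subscriptions s ON u.external_id = s.user_id
WHERE s.status = 'active';

-- Look for the Foreign Scan node in the output, roughly:
--   Foreign Scan on public.subscriptions s
--     Remote SQL: SELECT user_id, plan_name, status
--                 FROM public.subscriptions WHERE ((status = 'active'))
```

If the filter shows up in the Remote SQL line, pushdown is working; if it doesn't, you're pulling the whole table over the wire.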

The "Gotchas" (Because nothing is free)

I’m not suggesting you should connect every database in your VPC into a giant, tangled web. FDW has limits, and ignoring them is how you end up with 30-second query times.

1. Network Latency is Real: If your FDW server is in us-east-1 and your source is in eu-west-1, your joins will be slow. Keep your federated databases in the same region (and ideally the same VPC).
2. Connection Pooling: postgres_fdw caches one remote connection per local session (per user mapping). If you have 100 concurrent sessions hitting an FDW-backed table, you’re hitting the remote DB with 100 connections. Use a connection pooler like PgBouncer on the remote side.
3. The "N+1" Risk: Postgres is smart, but it's not omniscient. Complex queries with multiple FDW joins can sometimes result in suboptimal execution plans. Always EXPLAIN ANALYZE your FDW queries to see what’s actually being sent over the wire.
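Two postgres_fdw knobs worth knowing when you hit these limits (shown here applied to the server from the earlier example): use_remote_estimate asks the remote planner for row estimates instead of guessing locally, and fetch_size controls how many rows come back per round trip (the default is 100, which is painful over a slow link):

```sql
ALTER SERVER billing_db_server
OPTIONS (ADD use_remote_estimate 'true', ADD fetch_size '1000');
```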

Beyond Postgres-to-Postgres

One of the coolest (and weirdest) things about FDW is that it’s an open interface. There are wrappers for almost everything.

Need to query a legacy MySQL database? mysql_fdw exists. Want to query a CSV file sitting on your server as a table?

CREATE EXTENSION file_fdw;

CREATE SERVER csv_server FOREIGN DATA WRAPPER file_fdw;

CREATE FOREIGN TABLE external_logs (
    log_level text,
    message text,
    created_at timestamp
) SERVER csv_server
OPTIONS ( filename '/var/log/app/events.csv', format 'csv' );
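From there it queries like any other table (assuming events.csv actually contains those three columns):

```sql
SELECT log_level, count(*)
FROM external_logs
WHERE created_at > now() - interval '1 day'
GROUP BY log_level;
```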

You can even use things like Multicorn to write your own foreign data wrappers in Python if you need to fetch data from a JSON API.

Stop Building Glue

We spend so much time building systems to move data around. We build "Synchronizers," "Aggregators," and "Data Lake Ingestors." Sometimes, you really do need a full ETL pipeline or a data warehouse like Snowflake.

But a lot of the time? You just need to see the data.

Before you spin up a new microservice or a series of Lambda functions to copy rows from point A to point B, try mounting it as a foreign table. It’s faster to build, easier to maintain, and uses the database for what it was meant for: managing data access.

Don't be a plumber if you don't have to be. Let Postgres handle the pipes.