Aurora Postgresql Table Recovery

Runbook: Aurora PostgreSQL Table-Level Data Recovery #

Purpose #

This runbook describes the standard procedure for recovering specific tables or rows from an Amazon Aurora PostgreSQL database when a full cluster restore is required (Aurora does not support table-level restores).

Use this runbook when:

  • Data was accidentally deleted or corrupted
  • You need to recover a subset of rows from a previous point in time
  • Logical replication / PITR is not available for the specific data

Scope & Assumptions #

  • Database engine: Aurora PostgreSQL

  • Recovery method: Restore cluster → copy data → reinsert into prod

  • Access to:

    • AWS CLI with MFA
    • Production AWS account
    • psql access to prod and restored clusters
  • You know:

    • Affected tables
    • Filtering conditions (e.g. workspace_id, date range)

Roles & Responsibilities #

Role Responsibility
Incident Owner Executes runbook, validates data
Reviewer Confirms correctness before commit
Scribe (optional) Records timestamps, commands

Step 1: Authenticate to AWS #

1.1 Obtain MFA session credentials #

aws sts get-session-token \
  --serial-number arn:aws:iam::<ACCOUNT_ID>:mfa/<USERNAME> \
  --token-code <MFA_CODE>

Export returned credentials.


1.2 Assume Admin role in target account #

aws sts assume-role \
  --role-arn arn:aws:iam::<TARGET_ACCOUNT_ID>:role/Administrator \
  --role-session-name data-recovery \
  --duration-seconds 3600

Export the assumed-role credentials.


Step 2: Restore Aurora Cluster from Backup #

⚠️ Aurora restores are cluster-level only

2.1 Identify snapshot or PITR time #

  • Snapshot name: ____________________
  • Restore timestamp (if PITR): ____________________

2.2 Restore cluster (if not already restored) #

(Performed via AWS Console or CLI depending on situation.)


2.3 Create a temporary DB instance in restored cluster #

aws rds create-db-instance \
  --db-instance-identifier restore-<date> \
  --db-cluster-identifier <restored-cluster-id> \
  --engine aurora-postgresql \
  --db-instance-class db.t4g.large

Wait until instance status = available.


Step 3: Connect to Restored Database #

psql -h <restore-endpoint> -U <user> -d <database>

Confirm expected data exists.


Step 4: Identify Rows to Recover #

Document exact recovery criteria:

  • Tables:

    • ____________________
    • ____________________
  • Filters:

    -- Example
    workspace_id = '<UUID>'
    AND assigned_to IS NOT NULL
    

Sanity check #

SELECT count(*) FROM <table> WHERE <conditions>;

Step 5: Stage Data in Temporary Tables #

Purpose: isolate and safely export only required rows

5.1 Create staging table #

SET ROLE dev;

CREATE TABLE back AS
SELECT *
FROM <table>
WHERE <conditions>;

5.2 Grant access (if needed) #

GRANT SELECT ON public.back TO dev;
ALTER TABLE public.back OWNER TO dev;

5.3 Repeat for dependent tables #

Example with foreign keys:

CREATE TABLE back AS
SELECT *
FROM <child_table>
WHERE parent_id IN (
  SELECT id FROM <parent_table> WHERE <conditions>
);

Step 6: Export or Prepare Restore Script #

Choose one approach:

Option A: SQL script (INSERT / UPSERT) #

  • Preferred for small–medium datasets
  • Checked into incident artifacts

Option B: COPY to CSV #

  • Use for large datasets
  • Requires careful ordering

Document chosen method:

Method used: SQL script / COPY
Artifact location: ____________________

Step 7: Restore Data into Production #

7.1 Connect to production DB #

psql -h <prod-endpoint> -U <user> -d <database>

7.2 Dry-run transaction #

SET ROLE dev;

BEGIN;
\i /path/to/restore.sql
ROLLBACK;

Confirm no errors.


7.3 Final commit #

BEGIN;
\i /path/to/restore.sql
COMMIT;

Step 8: Validation #

8.1 Row counts #

SELECT count(*) FROM <table> WHERE <conditions>;

8.2 Spot checks #

SELECT * FROM <table> WHERE <conditions> LIMIT 5;

8.3 Write validation #

UPDATE <table>
SET updated_at = now()
WHERE <conditions>;
ROLLBACK;

Step 9: Cleanup #

9.1 Drop staging tables #

DROP TABLE IF EXISTS back;

9.2 Delete restored DB instance #

aws rds delete-db-instance \
  --db-instance-identifier restore-<date> \
  --skip-final-snapshot

(Optional: delete restored cluster if no longer needed.)


Safety Checklist (Must Complete) #

  • Scoped queries reviewed
  • Dry-run rollback succeeded
  • Production commit reviewed
  • Validation queries passed
  • Temporary resources removed

Incident Record #

Field Value
Incident Date
Root Cause
Affected Tables
Rows Restored
Executed By
Reviewed By

Notes & Gotchas #

  • Aurora does not support table-level restore
  • Always restore parent tables before child tables
  • Avoid running recovery as rds_superuser
  • Use explicit BEGIN / COMMIT blocks only