Runbook: Aurora PostgreSQL Table-Level Data Recovery #

Purpose #

This runbook describes the standard procedure for recovering specific tables or rows from an Amazon Aurora PostgreSQL database when a full cluster restore is required (Aurora does not support table-level restores).

Use this runbook when:

Data was accidentally deleted or corrupted
You need to recover a subset of rows from a previous point in time
Logical replication / PITR is not available for the specific data

Scope & Assumptions #

Database engine: Aurora PostgreSQL
Recovery method: Restore cluster → copy data → reinsert into prod
Access to:
- AWS CLI with MFA
- Production AWS account
- psql access to prod and restored clusters
You know:
- Affected tables
- Filtering conditions (e.g. workspace_id, date range)

Roles & Responsibilities #

Role	Responsibility
Incident Owner	Executes runbook, validates data
Reviewer	Confirms correctness before commit
Scribe (optional)	Records timestamps, commands

Step 1: Authenticate to AWS #

1.1 Obtain MFA session credentials #

aws sts get-session-token \
  --serial-number arn:aws:iam::<ACCOUNT_ID>:mfa/<USERNAME> \
  --token-code <MFA_CODE>

Export returned credentials.

1.2 Assume Admin role in target account #

aws sts assume-role \
  --role-arn arn:aws:iam::<TARGET_ACCOUNT_ID>:role/Administrator \
  --role-session-name data-recovery \
  --duration-seconds 3600

Export the assumed-role credentials.

Step 2: Restore Aurora Cluster from Backup #

⚠️ Aurora restores are cluster-level only

2.1 Identify snapshot or PITR time #

Snapshot name: ____________________
Restore timestamp (if PITR): ____________________

2.2 Restore cluster (if not already restored) #

(Performed via AWS Console or CLI depending on situation.)

2.3 Create a temporary DB instance in restored cluster #

aws rds create-db-instance \
  --db-instance-identifier restore-<date> \
  --db-cluster-identifier <restored-cluster-id> \
  --engine aurora-postgresql \
  --db-instance-class db.t4g.large

Wait until instance status = available.

Step 3: Connect to Restored Database #

psql -h <restore-endpoint> -U <user> -d <database>

Confirm expected data exists.

Step 4: Identify Rows to Recover #

Document exact recovery criteria:

Tables:
- ____________________
- ____________________

Filters:

-- Example
workspace_id = '<UUID>'
AND assigned_to IS NOT NULL

Sanity check #

SELECT count(*) FROM <table> WHERE <conditions>;

Step 5: Stage Data in Temporary Tables #

Purpose: isolate and safely export only required rows

5.1 Create staging table #

SET ROLE dev;

CREATE TABLE back AS
SELECT *
FROM <table>
WHERE <conditions>;

5.2 Grant access (if needed) #

GRANT SELECT ON public.back TO dev;
ALTER TABLE public.back OWNER TO dev;

5.3 Repeat for dependent tables #

Example with foreign keys:

CREATE TABLE back AS
SELECT *
FROM <child_table>
WHERE parent_id IN (
  SELECT id FROM <parent_table> WHERE <conditions>
);

Step 6: Export or Prepare Restore Script #

Choose one approach:

Option A: SQL script (`INSERT / UPSERT`) #

Preferred for small–medium datasets
Checked into incident artifacts

Option B: `COPY` to CSV #

Use for large datasets
Requires careful ordering

Document chosen method:

Method used: SQL script / COPY
Artifact location: ____________________

Step 7: Restore Data into Production #

7.1 Connect to production DB #

psql -h <prod-endpoint> -U <user> -d <database>

7.2 Dry-run transaction #

SET ROLE dev;

BEGIN;
\i /path/to/restore.sql
ROLLBACK;

Confirm no errors.

7.3 Final commit #

BEGIN;
\i /path/to/restore.sql
COMMIT;

Step 8: Validation #

8.1 Row counts #

SELECT count(*) FROM <table> WHERE <conditions>;

8.2 Spot checks #

SELECT * FROM <table> WHERE <conditions> LIMIT 5;

8.3 Write validation #

UPDATE <table>
SET updated_at = now()
WHERE <conditions>;
ROLLBACK;

Step 9: Cleanup #

9.1 Drop staging tables #

DROP TABLE IF EXISTS back;

9.2 Delete restored DB instance #

aws rds delete-db-instance \
  --db-instance-identifier restore-<date> \
  --skip-final-snapshot

(Optional: delete restored cluster if no longer needed.)

Safety Checklist (Must Complete) #

Scoped queries reviewed
Dry-run rollback succeeded
Production commit reviewed
Validation queries passed
Temporary resources removed

Incident Record #

Field	Value
Incident Date
Root Cause
Affected Tables
Rows Restored
Executed By
Reviewed By

Notes & Gotchas #

Aurora does not support table-level restore
Always restore parent tables before child tables
Avoid running recovery as rds_superuser
Use explicit BEGIN / COMMIT blocks only