Runbook: Aurora PostgreSQL Table-Level Data Recovery #
Purpose #
This runbook describes the standard procedure for recovering specific tables or rows from an Amazon Aurora PostgreSQL database when a full cluster restore is required (Aurora does not support table-level restores).
Use this runbook when:
- Data was accidentally deleted or corrupted
- You need to recover a subset of rows from a previous point in time
- Logical replication / PITR is not available for the specific data
Scope & Assumptions #
-
Database engine: Aurora PostgreSQL
-
Recovery method: Restore cluster → copy data → reinsert into prod
-
Access to:
- AWS CLI with MFA
- Production AWS account
psqlaccess to prod and restored clusters
-
You know:
- Affected tables
- Filtering conditions (e.g.
workspace_id, date range)
Roles & Responsibilities #
| Role | Responsibility |
|---|---|
| Incident Owner | Executes runbook, validates data |
| Reviewer | Confirms correctness before commit |
| Scribe (optional) | Records timestamps, commands |
Step 1: Authenticate to AWS #
1.1 Obtain MFA session credentials #
aws sts get-session-token \
--serial-number arn:aws:iam::<ACCOUNT_ID>:mfa/<USERNAME> \
--token-code <MFA_CODE>
Export returned credentials.
1.2 Assume Admin role in target account #
aws sts assume-role \
--role-arn arn:aws:iam::<TARGET_ACCOUNT_ID>:role/Administrator \
--role-session-name data-recovery \
--duration-seconds 3600
Export the assumed-role credentials.
Step 2: Restore Aurora Cluster from Backup #
⚠️ Aurora restores are cluster-level only
2.1 Identify snapshot or PITR time #
- Snapshot name:
____________________ - Restore timestamp (if PITR):
____________________
2.2 Restore cluster (if not already restored) #
(Performed via AWS Console or CLI depending on situation.)
2.3 Create a temporary DB instance in restored cluster #
aws rds create-db-instance \
--db-instance-identifier restore-<date> \
--db-cluster-identifier <restored-cluster-id> \
--engine aurora-postgresql \
--db-instance-class db.t4g.large
Wait until instance status = available.
Step 3: Connect to Restored Database #
psql -h <restore-endpoint> -U <user> -d <database>
Confirm expected data exists.
Step 4: Identify Rows to Recover #
Document exact recovery criteria:
-
Tables:
________________________________________
-
Filters:
-- Example workspace_id = '<UUID>' AND assigned_to IS NOT NULL
Sanity check #
SELECT count(*) FROM <table> WHERE <conditions>;
Step 5: Stage Data in Temporary Tables #
Purpose: isolate and safely export only required rows
5.1 Create staging table #
SET ROLE dev;
CREATE TABLE back AS
SELECT *
FROM <table>
WHERE <conditions>;
5.2 Grant access (if needed) #
GRANT SELECT ON public.back TO dev;
ALTER TABLE public.back OWNER TO dev;
5.3 Repeat for dependent tables #
Example with foreign keys:
CREATE TABLE back AS
SELECT *
FROM <child_table>
WHERE parent_id IN (
SELECT id FROM <parent_table> WHERE <conditions>
);
Step 6: Export or Prepare Restore Script #
Choose one approach:
Option A: SQL script (INSERT / UPSERT)
#
- Preferred for small–medium datasets
- Checked into incident artifacts
Option B: COPY to CSV
#
- Use for large datasets
- Requires careful ordering
Document chosen method:
Method used: SQL script / COPY
Artifact location: ____________________
Step 7: Restore Data into Production #
7.1 Connect to production DB #
psql -h <prod-endpoint> -U <user> -d <database>
7.2 Dry-run transaction #
SET ROLE dev;
BEGIN;
\i /path/to/restore.sql
ROLLBACK;
Confirm no errors.
7.3 Final commit #
BEGIN;
\i /path/to/restore.sql
COMMIT;
Step 8: Validation #
8.1 Row counts #
SELECT count(*) FROM <table> WHERE <conditions>;
8.2 Spot checks #
SELECT * FROM <table> WHERE <conditions> LIMIT 5;
8.3 Write validation #
UPDATE <table>
SET updated_at = now()
WHERE <conditions>;
ROLLBACK;
Step 9: Cleanup #
9.1 Drop staging tables #
DROP TABLE IF EXISTS back;
9.2 Delete restored DB instance #
aws rds delete-db-instance \
--db-instance-identifier restore-<date> \
--skip-final-snapshot
(Optional: delete restored cluster if no longer needed.)
Safety Checklist (Must Complete) #
- Scoped queries reviewed
- Dry-run rollback succeeded
- Production commit reviewed
- Validation queries passed
- Temporary resources removed
Incident Record #
| Field | Value |
|---|---|
| Incident Date | |
| Root Cause | |
| Affected Tables | |
| Rows Restored | |
| Executed By | |
| Reviewed By |
Notes & Gotchas #
- Aurora does not support table-level restore
- Always restore parent tables before child tables
- Avoid running recovery as
rds_superuser - Use explicit
BEGIN / COMMITblocks only