hive-engineering/hive-optimize-data

Redshift to Postgres Sync Lambda

Syncs checkout optimization tables from Redshift to Postgres staging database.

Tables Synced

  • events.aggregated_performance_metrics_daily → checkout_optimization_raw.aggregated_performance_metrics_daily
  • events.lever_traffic_daily → checkout_optimization_raw.lever_traffic_daily

Schedule

Daily at 07:00 UTC (after dbt models refresh)
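
For reference, a minimal sketch of what the daily 07:00 UTC schedule means in practice. The `SCHEDULE_EXPRESSION` value is an assumption (an EventBridge-style cron string); the actual trigger is defined in Terraform and may be written differently. `next_run` is a hypothetical helper for checking when the next sync should fire.

```python
from datetime import datetime, time, timedelta, timezone

# Assumption: EventBridge cron syntax for "every day at 07:00 UTC".
# The real expression lives in the Terraform config and may differ.
SCHEDULE_EXPRESSION = "cron(0 7 * * ? *)"


def next_run(now: datetime) -> datetime:
    """Return the next 07:00 UTC run strictly after `now` (must be timezone-aware)."""
    now_utc = now.astimezone(timezone.utc)
    candidate = datetime.combine(now_utc.date(), time(7, 0), tzinfo=timezone.utc)
    if candidate <= now_utc:
        # Today's run has already happened, so the next one is tomorrow.
        candidate += timedelta(days=1)
    return candidate
```

Calling `next_run(datetime.now(timezone.utc))` gives the next expected run, which can help when checking whether a missing row count is simply due to the sync not having run yet.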

Deployment

1. Build Lambda Package

cd src/
mkdir -p package
pip install psycopg2-binary redshift-connector boto3 -t package/
cp redshift_to_postgres_sync.py package/
cd package && zip -r ../lambda_package.zip . && cd ..
mv lambda_package.zip ../

2. Deploy with Terraform

terraform init
terraform plan
terraform apply

3. Manual Invocation (Testing)

aws lambda invoke \
  --function-name redshift-to-postgres-sync \
  --profile production \
  response.json

cat response.json
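
The same manual invocation can be sketched in Python with boto3. This is an illustrative helper, not part of the repository: `invoke_sync` is a hypothetical name, and the shape of the returned payload depends on what the Lambda handler returns, which this README does not specify.

```python
import json


def invoke_sync(lambda_client, function_name="redshift-to-postgres-sync"):
    """Invoke the sync Lambda synchronously and return (ok, payload).

    `lambda_client` is expected to be a boto3 Lambda client.
    """
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # wait for the function to finish
    )
    raw = response["Payload"].read()
    payload = json.loads(raw) if raw else None
    # Lambda sets "FunctionError" in the response when the handler raised.
    ok = response.get("StatusCode") == 200 and "FunctionError" not in response
    return ok, payload
```

A client for the `production` profile can be created with `boto3.Session(profile_name="production").client("lambda")` and passed in as `lambda_client`.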

Configuration

Environment variables (set in Terraform):

  • REDSHIFT_HOST - Redshift cluster endpoint
  • REDSHIFT_PORT - Redshift port (5439)
  • REDSHIFT_DATABASE - Database name (main)
  • REDSHIFT_USER - Redshift database user for IAM-based authentication (dbt)
  • REDSHIFT_CLUSTER_ID - Cluster identifier (hive)
  • POSTGRES_HOST - Postgres RDS endpoint
  • POSTGRES_PORT - Postgres port (5432)
  • POSTGRES_DATABASE - Database name
  • POSTGRES_USER - Database user
  • TARGET_SCHEMA - Target schema in Postgres
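
As an illustration of how the Lambda might read these variables, here is a minimal sketch that loads them into a typed config object and fails fast when one is missing. `SyncConfig` and `load_config` are hypothetical names; the actual handler in `redshift_to_postgres_sync.py` may read the environment differently.

```python
import os
from dataclasses import dataclass

# Variable names as listed in this README.
REQUIRED_VARS = (
    "REDSHIFT_HOST", "REDSHIFT_PORT", "REDSHIFT_DATABASE", "REDSHIFT_USER",
    "REDSHIFT_CLUSTER_ID", "POSTGRES_HOST", "POSTGRES_PORT",
    "POSTGRES_DATABASE", "POSTGRES_USER", "TARGET_SCHEMA",
)


@dataclass(frozen=True)
class SyncConfig:
    redshift_host: str
    redshift_port: int
    redshift_database: str
    redshift_user: str
    redshift_cluster_id: str
    postgres_host: str
    postgres_port: int
    postgres_database: str
    postgres_user: str
    target_schema: str


def load_config(env=None) -> SyncConfig:
    """Build a SyncConfig from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if name not in env]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    values = {name.lower(): env[name] for name in REQUIRED_VARS}
    # Ports arrive as strings in the environment; convert them to integers.
    values["redshift_port"] = int(values["redshift_port"])
    values["postgres_port"] = int(values["postgres_port"])
    return SyncConfig(**values)
```

Validating everything up front means a misconfigured deployment fails at cold start with a clear message rather than partway through a sync.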

IAM Permissions Required

  • CloudWatch Logs (create/write)
  • EC2 ENI management (for VPC)
  • Redshift GetClusterCredentials
  • Secrets Manager (for Postgres password)
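
The Redshift and Secrets Manager permissions above correspond to two credential lookups, sketched below with boto3 clients passed in. Both function names are hypothetical, and the assumption that a JSON secret stores the password under a `password` key follows the common RDS secret layout rather than anything stated in this README.

```python
import json


def redshift_temp_credentials(redshift_client, cluster_id, db_user, db_name):
    """Fetch short-lived Redshift credentials via GetClusterCredentials."""
    response = redshift_client.get_cluster_credentials(
        ClusterIdentifier=cluster_id,
        DbUser=db_user,
        DbName=db_name,
        DurationSeconds=900,  # the minimum lifetime the API allows
        AutoCreate=False,     # assume the database user already exists
    )
    # The returned DbUser is prefixed (e.g. "IAM:dbt") and must be used as-is.
    return response["DbUser"], response["DbPassword"]


def postgres_password(secrets_client, secret_id):
    """Read the Postgres password from Secrets Manager.

    Assumption: the secret is either a plain string or a JSON object
    with a "password" key.
    """
    secret = secrets_client.get_secret_value(SecretId=secret_id)["SecretString"]
    try:
        parsed = json.loads(secret)
    except json.JSONDecodeError:
        return secret
    return parsed["password"] if isinstance(parsed, dict) else secret
```

The clients would typically be created once per invocation with `boto3.client("redshift")` and `boto3.client("secretsmanager")`; the secret identifier is not listed among the environment variables above, so where it comes from is left open here.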
