
Dev#7

Open
Marvelgenius wants to merge 13 commits into omkarcloud:master from Marvelgenius:dev

Conversation

@Marvelgenius

No description provided.

GurryShark added 13 commits March 20, 2026 18:31
…RDS support

- Replace old amazon-product-scraper8 with real-time-amazon-data RapidAPI client
- Rewrite database layer from PostgreSQL (psycopg2) to MySQL (pymysql)
- Add SSH tunnel support for VPC-internal RDS access
- Add config module with etlprocessor-compatible env var naming
- Add Airflow-callable task entry points (tasks.py)
- Add RDS SSH tunnel helper scripts
- Store raw API data in gurysk_src.amazon_product_raw table

Made-with: Cursor
- AMAZON_QUERIES: comma-separated search keywords
- AMAZON_ASINS: comma-separated product ASINs for detail fetching
- AMAZON_COUNTRIES: comma-separated country codes or "ALL" for all 24 markets
- AMAZON_PAGES: number of pages to fetch per query per country
- Graceful error handling per request with continue-on-failure

Made-with: Cursor
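
A minimal sketch of how the four environment variables above might be parsed and how a continue-on-failure loop could look. The defaults, the short `ALL_COUNTRIES` list, and the `fetch_page` callable are assumptions for illustration; the PR's actual config module is not shown here.

```python
import os

# Hypothetical stand-in for the 24 marketplaces; the real list is not shown in this PR.
ALL_COUNTRIES = ["US", "GB", "DE", "JP"]


def split_csv(value):
    """Split a comma-separated string, dropping blanks and surrounding spaces."""
    return [part.strip() for part in value.split(",") if part.strip()]


def load_settings(env=os.environ):
    """Read the AMAZON_* variables from a mapping (defaults are assumed)."""
    countries = split_csv(env.get("AMAZON_COUNTRIES", "US"))
    if [c.upper() for c in countries] == ["ALL"]:
        countries = list(ALL_COUNTRIES)
    return {
        "queries": split_csv(env.get("AMAZON_QUERIES", "")),
        "asins": split_csv(env.get("AMAZON_ASINS", "")),
        "countries": countries,
        "pages": int(env.get("AMAZON_PAGES", "1")),
    }


def fetch_all(settings, fetch_page):
    """Call fetch_page for every query/country/page; record failures and continue."""
    results, errors = [], []
    for query in settings["queries"]:
        for country in settings["countries"]:
            for page in range(1, settings["pages"] + 1):
                try:
                    results.append(fetch_page(query, country, page))
                except Exception as exc:  # one bad request must not stop the run
                    errors.append((query, country, page, str(exc)))
    return results, errors
```

Passing the environment as a plain mapping keeps the parsing testable without touching `os.environ`.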
Extract 'data' field from API response before appending to results,
so _extract_asin can find the asin at the correct level.

Made-with: Cursor
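
The fix described above can be sketched as follows, assuming the API wraps each product in an envelope with a `data` key. The envelope shape and the `extract_asin` helper are hypothetical stand-ins; the PR's `_extract_asin` is not shown.

```python
def unwrap_payload(response):
    """Return the inner 'data' object if present, else the response unchanged."""
    if isinstance(response, dict) and isinstance(response.get("data"), dict):
        return response["data"]
    return response


def extract_asin(item):
    """Hypothetical stand-in for _extract_asin: reads a top-level 'asin' key."""
    return item.get("asin") if isinstance(item, dict) else None


# Assumed envelope shape, for illustration only.
envelope = {"status": "OK", "data": {"asin": "B000TEST01", "title": "Example"}}
results = [unwrap_payload(envelope)]
```

Without the unwrap step, `extract_asin(envelope)` would look for `asin` one level too high and return `None`.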
- Add src/etl.py: DDL definitions, JSON extraction, incremental SRC->DWH ETL,
  monthly aggregation for sales/reviews/market-share (DWH->DMT)
- Add src/queries.py: dashboard query functions reading from APP layer views
- Add dashboard/: Streamlit app with Plotly charts for sales trend, review
  changes, and coffee machine market share
- Update src/tasks.py: add Airflow ETL tasks (extract_to_dwh, build_outin_sales,
  build_review_trend, build_market_share)
- Add Dockerfile.dashboard for ECS Fargate deployment
- Add pandas, streamlit, plotly to requirements.txt

Made-with: Cursor
The python:3.11-slim image doesn't include curl, causing healthcheck failures.

Made-with: Cursor
When only one month of data exists, select_slider throws RangeError
because min equals max. Show an info message instead.

Made-with: Cursor
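
One way to express the guard described above is a small helper that decides whether a range slider makes sense at all. The function name and return convention are assumptions; the Streamlit calls are shown only in comments so the sketch runs without Streamlit installed.

```python
def month_range_or_none(months):
    """Return (first, last, options) when at least two distinct months exist, else None."""
    options = sorted(set(months))
    if len(options) < 2:
        return None
    return options[0], options[-1], options


# In the dashboard this might be used roughly like so (not executed here):
#
#   picked = month_range_or_none(df["month"])
#   if picked is None:
#       st.info("Only one month of data is available.")
#   else:
#       lo, hi, options = picked
#       start, end = st.select_slider("Months", options=options, value=(lo, hi))
```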
…tion

- New DMT table dmt_outin_daily_sales with day-over-day sales changes
- Daily/Monthly view toggle on the dashboard
- Cascading marketplace -> ASIN filter (one ASIN belongs to one market)
- Filter out zero-sales products from charts to reduce legend clutter
- Fix null handling in ETL aggregations and KPI rendering
- New Airflow task for daily sales ETL

Made-with: Cursor
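
A rough sketch of the day-over-day calculation with null handling, written in plain Python rather than SQL or pandas. The row shape `(asin, date, sales)` is an assumption; the real `dmt_outin_daily_sales` columns are not shown in this PR.

```python
def day_over_day(rows):
    """Add a delta column: sales minus the previous recorded sales for the same ASIN.

    rows: iterable of (asin, iso_date, sales); sales may be None.
    The delta is None when there is no earlier non-null value or sales is None.
    "Previous" means the previous recorded day, not necessarily the previous calendar day.
    """
    out = []
    last_seen = {}  # asin -> most recent non-null sales value
    # ISO dates (YYYY-MM-DD) sort correctly as strings.
    for asin, day, sales in sorted(rows, key=lambda r: (r[0], r[1])):
        previous = last_seen.get(asin)
        delta = None if previous is None or sales is None else sales - previous
        out.append((asin, day, sales, delta))
        if sales is not None:
            last_seen[asin] = sales
    return out
```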
- Add clean_text() and clean_brand() functions to decode HTML entities
  (e.g. De&#39;Longhi -> De'Longhi) and strip marketplace prefixes
  (e.g. 【Amazon.co.jp限定】)
- Apply cleaning in SRC->DWH extraction and all DMT build steps
- Add clean_dwh_data() for one-time fix of existing dirty DWH rows
- Add brand alias mapping for known variants (De'Longhi, NESCAFÉ, etc.)

Made-with: Cursor
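
A minimal sketch of what `clean_text()` and `clean_brand()` could look like using only the standard library. The prefix pattern (any leading 【...】 tag) and the alias table are assumptions; the PR's actual rules and mapping are not shown.

```python
import html
import re

# Strips any leading bracketed tag such as 【Amazon.co.jp限定】 (assumed rule).
_PREFIX_RE = re.compile(r"^\s*【[^】]*】\s*")

# Hypothetical alias table keyed by casefolded brand name.
_BRAND_ALIASES = {
    "delonghi": "De'Longhi",
    "de'longhi": "De'Longhi",
    "nescafe": "NESCAFÉ",
    "nescafé": "NESCAFÉ",
}


def clean_text(value):
    """Decode HTML entities and drop a leading marketplace tag."""
    if value is None:
        return None
    text = html.unescape(value)
    text = _PREFIX_RE.sub("", text)
    return text.strip()


def clean_brand(value):
    """Clean the text, then map known brand variants to one spelling."""
    text = clean_text(value)
    if not text:
        return text
    return _BRAND_ALIASES.get(text.casefold(), text)
```

Keeping the alias lookup case-insensitive means new capitalisation variants do not each need their own entry.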
…atching

- Normalize curly quotes (U+2018/2019) to straight quotes in text/brand
- Add regex prefix matching for De'Longhi variants (DeLonghi, DELONGHI, etc.)
- Expand clean_dwh_data query to catch curly-quoted and DeLonghi variants

Made-with: Cursor
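
The quote normalization and prefix matching above might be sketched like this. The regular expression and the `canonical_brand` name are assumptions chosen to cover the variants named in the commit message, not the PR's actual pattern.

```python
import re

# Map curly single quotes (U+2018, U+2019) to a straight apostrophe.
_QUOTES = str.maketrans({"\u2018": "'", "\u2019": "'"})

# Matches DeLonghi, DELONGHI, De'Longhi, De Longhi at the start of the string.
_DELONGHI_RE = re.compile(r"^de\s*'?\s*longhi\b", re.IGNORECASE)


def normalize_quotes(text):
    """Replace curly single quotes with straight ones."""
    return text.translate(_QUOTES)


def canonical_brand(brand):
    """Collapse any brand starting with a De'Longhi variant to one spelling."""
    brand = normalize_quotes(brand).strip()
    if _DELONGHI_RE.match(brand):
        return "De'Longhi"
    return brand
```

Normalizing quotes before matching lets one pattern handle both the curly and straight apostrophe forms.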
Dynamically calculate bottom margin based on number of ASIN traces
so legends never cover the chart content.

Made-with: Cursor
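
A sketch of how a bottom margin could scale with the number of traces. All constants (entries per legend row, pixels per row, base margin) are assumed values, not the ones used in the PR.

```python
import math


def legend_bottom_margin(n_traces, per_row=4, row_px=22, base_px=60):
    """Return a bottom margin in pixels that grows with the number of legend rows."""
    rows = math.ceil(n_traces / per_row) if n_traces > 0 else 0
    return base_px + rows * row_px
```

With a Plotly figure, the result could be passed as `fig.update_layout(margin=dict(b=legend_bottom_margin(len(fig.data))))`.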
Use ping(reconnect=True) to detect stale connections and clear the
st.cache_resource to establish a fresh connection, preventing the
"unable to rollback" error after DB connections are externally killed.

Made-with: Cursor
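
The stale-connection check above can be sketched with a small helper that works on any object exposing `ping(reconnect=True)`, as PyMySQL connections do. The helper name and the `connect` and `on_reset` parameters are assumptions for illustration.

```python
def ensure_connection(conn, connect, on_reset=None):
    """Return a usable connection, building a fresh one if ping fails.

    conn:     existing connection with a ping(reconnect=True) method
    connect:  zero-argument callable that opens a new connection
    on_reset: optional callable run before reconnecting, e.g. the .clear()
              method of a function decorated with st.cache_resource
    """
    try:
        conn.ping(reconnect=True)
        return conn
    except Exception:
        # The cached object is unusable; drop it and open a new one.
        if on_reset is not None:
            on_reset()
        return connect()
```

Clearing the cache before reconnecting matters because a cached dead connection would otherwise be handed out again on the next rerun.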
@Marvelgenius
Author

Adjusted the scraper to use RapidAPI and created one dashboard.

