
Humble Scrape is a Python FastAPI + Vue pipeline that scrapes Humble Bundle Books, normalizes bundles and tiers, stores them in SQLite, and ships a responsive SPA with JWT-protected ETL triggers and featured bundle insights.
Humble Scrape is a small but complete data pipeline that keeps tabs on Humble Bundle’s books section. It scrapes the live landing page, normalizes every bundle, stores it in SQLite, exposes the data through a FastAPI service, and ships a Vue 3 interface that mirrors the original site’s look and feel.
Humble Bundle rotates limited-time book bundles, but there’s no simple way to track historical offers, surface featured bundles, or query details like tiers and MSRP totals. Humble Scrape automates that: it fetches the catalog, enriches each bundle with tier pricing and individual book info, and keeps an auditable record you can query or display.
https://www.humblebundle.com/books (HTML + embedded JSON).
HumbleSpider (Requests + BeautifulSoup) extracts the page payload, including webpack-bundle-page-data.
python -m spider.cli.run_spider (or make etl) triggers the end-to-end fetch/normalize/store flow and stamps a verification_date for auditing.

GET /bundles: all bundles ordered by closing date.
GET /bundles/{bundle_id}: detail view with tiers, books, and MSRP totals.
GET /bundles/featured: the “most valuable” bundle by MSRP and sales.
POST /etl/run: re-scrape on demand (JWT-protected).
POST /auth/login + GET /auth/me: lightweight auth; optional auto-seeded admin user via .env (DB_ADMIN_*).

A single-page app in frontend/ consumes the API and recreates the Humble Bundle-style cards with light/dark theming and the Monofur typography. Responsive hooks (useResponsiveQueryEvent) switch layouts for mobile/desktop, and featured content surfaces the top bundle instantly.
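The extraction step can be illustrated with a minimal sketch. It is not HumbleSpider's actual code: it uses the standard-library HTMLParser as a dependency-free stand-in for BeautifulSoup, and it assumes the payload lives in a script tag whose id is webpack-bundle-page-data. The sample HTML and the bundleData key are hypothetical.

```python
import json
from html.parser import HTMLParser


class ScriptJSONExtractor(HTMLParser):
    """Collect the text content of the <script> tag with a given id."""

    def __init__(self, script_id):
        super().__init__()
        self.script_id = script_id
        self._capturing = False
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        # Start capturing only when the script tag carries the wanted id.
        if tag == "script" and dict(attrs).get("id") == self.script_id:
            self._capturing = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._capturing = False

    def handle_data(self, data):
        # Script content may arrive in several chunks, so accumulate them.
        if self._capturing:
            self._chunks.append(data)

    @property
    def payload(self):
        text = "".join(self._chunks).strip()
        return json.loads(text) if text else None


def extract_embedded_json(html, script_id="webpack-bundle-page-data"):
    """Return the parsed JSON payload, or None if the tag is absent."""
    parser = ScriptJSONExtractor(script_id)
    parser.feed(html)
    parser.close()
    return parser.payload


# Hypothetical page fragment; the real payload shape may differ.
SAMPLE_HTML = """
<html><body>
<script id="webpack-bundle-page-data" type="application/json">
{"bundleData": {"machine_name": "example_bookbundle", "tile_name": "Example Book Bundle"}}
</script>
</body></html>
"""

payload = extract_embedded_json(SAMPLE_HTML)
print(payload["bundleData"]["machine_name"])
```

Returning None for a missing tag lets the caller decide whether a page without the payload is an error or simply a layout change worth logging.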
Backend
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
make db-init
uvicorn api.main:app --reload --host 0.0.0.0 --port 5002
ETL on demand
make etl
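The same ETL run can also be requested over HTTP through POST /etl/run once you hold a JWT. The sketch below uses only the standard library and assumes the API is listening on the port from the Backend steps. The login body (JSON with username and password) and the access_token response field are assumptions, not confirmed by the project; adjust them to match the actual /auth/login contract.

```python
import json
import urllib.request

API_BASE = "http://127.0.0.1:5002"  # matches the uvicorn command above


def build_login_request(username, password, base_url=API_BASE):
    """Build the POST /auth/login request (payload shape is an assumption)."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/auth/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_etl_request(token, base_url=API_BASE):
    """Build the JWT-protected POST /etl/run request."""
    return urllib.request.Request(
        f"{base_url}/etl/run",
        data=b"",
        headers={"Authorization": f"Bearer {token}"},
        method="POST",
    )


def run_etl(username, password, base_url=API_BASE):
    """Log in, then trigger a re-scrape; requires the API to be running."""
    with urllib.request.urlopen(build_login_request(username, password, base_url)) as resp:
        token = json.load(resp)["access_token"]  # assumed response field
    with urllib.request.urlopen(build_etl_request(token, base_url)) as resp:
        return json.load(resp)
```

With the backend running, run_etl("admin", "secret") would perform both calls; the credentials here are placeholders for whatever DB_ADMIN_* values you configured.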
Frontend
cd frontend
npm install
VITE_API_BASE_URL=http://127.0.0.1:5002 npm run dev  # http://localhost:3002
Data profiling shows 32 normalized columns with key fields always present (machine_name, tile_name, dates, sales). Optional fields like marketing_blurb or tile_logo are handled defensively, and lists (highlights, tiers) are serialized consistently to keep the API predictable. Each run timestamps its scrape so you can track changes over time.
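As a rough illustration of that defensive handling, the sketch below normalizes one raw bundle and stores it in SQLite. It covers only a handful of the 32 columns, and the table layout, the composite key, and the function names are assumptions for the example rather than the project's actual schema.

```python
import json
import sqlite3
from datetime import datetime, timezone

REQUIRED_FIELDS = ("machine_name", "tile_name")


def normalize_bundle(raw, verified_at=None):
    """Map a raw bundle dict to a flat row, tolerating missing optional fields."""
    missing = [field for field in REQUIRED_FIELDS if not raw.get(field)]
    if missing:
        raise ValueError(f"bundle is missing required fields: {missing}")
    return {
        "machine_name": raw["machine_name"],
        "tile_name": raw["tile_name"],
        # Optional fields become NULL instead of raising KeyError.
        "marketing_blurb": raw.get("marketing_blurb"),
        "tile_logo": raw.get("tile_logo"),
        # Lists are always stored as JSON text, even when absent.
        "highlights": json.dumps(raw.get("highlights") or []),
        "verification_date": verified_at or datetime.now(timezone.utc).isoformat(),
    }


def store_bundle(conn, row):
    """Insert one normalized row; one row per bundle per scrape timestamp."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS bundles (
            machine_name TEXT NOT NULL,
            tile_name TEXT NOT NULL,
            marketing_blurb TEXT,
            tile_logo TEXT,
            highlights TEXT NOT NULL,
            verification_date TEXT NOT NULL,
            PRIMARY KEY (machine_name, verification_date)
        )
        """
    )
    conn.execute(
        "INSERT OR REPLACE INTO bundles VALUES (:machine_name, :tile_name, "
        ":marketing_blurb, :tile_logo, :highlights, :verification_date)",
        row,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
row = normalize_bundle(
    {"machine_name": "example_bookbundle", "tile_name": "Example Book Bundle"},
    verified_at="2024-01-01T00:00:00+00:00",  # fixed timestamp for the example
)
store_bundle(conn, row)
stored = conn.execute(
    "SELECT marketing_blurb, highlights, verification_date FROM bundles"
).fetchone()
print(stored)
```

Keying rows on both machine_name and verification_date is one way to keep each scrape as a separate record, so earlier snapshots stay available for comparison.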
Humble Scrape turns a fleeting storefront into a queryable dataset and a polished UI. Clone it, run the ETL, and you’ll have a live mirror of Humble’s book bundles—ready for exploration, alerting, or your own curated recommendations.