Humble Spider

Humble Scrape is a Python FastAPI + Vue pipeline that scrapes Humble Bundle Books, normalizes bundles and tiers, stores them in SQLite, and ships a responsive SPA with JWT-protected ETL triggers and featured bundle insights.

Documentation

Humble Scrape: An ETL, API, and Vue Frontend for Humble Bundle Books

Humble Scrape is a small but complete data pipeline that keeps tabs on Humble Bundle’s books section. It scrapes the live landing page, normalizes every bundle, stores it in SQLite, exposes the data through a FastAPI service, and ships a Vue 3 interface that mirrors the original site’s look and feel.

What Problem It Solves

Humble Bundle rotates limited-time book bundles, but there’s no simple way to track historical offers, surface featured bundles, or query details like tiers and MSRP totals. Humble Scrape automates that: it fetches the catalog, enriches each bundle with tier pricing and individual book info, and keeps an auditable record you can query or display.

How the ETL Works

  • Source: https://www.humblebundle.com/books (HTML + embedded JSON).
  • Scraper: HumbleSpider (Requests + BeautifulSoup) extracts the page payload, including webpack-bundle-page-data.
  • Normalization: Pandas flattens and cleans the data; utilities build absolute URLs, compute metrics, and serialize list fields so the records pass Pydantic validation.
  • Validation & Persistence: Pydantic models ensure schema integrity; SQLAlchemy upserts into SQLite, pruning expired bundles on each run.
  • CLI: python -m spider.cli.run_spider (or make etl) triggers the end-to-end fetch/normalize/store flow and stamps a verification_date for auditing.
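The extract and persist steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: the real pipeline uses Requests, BeautifulSoup, and SQLAlchemy, while this sketch swaps in the standard library (html.parser and sqlite3) so it runs without dependencies. It assumes webpack-bundle-page-data is the id of a script element holding JSON; the function names, the table layout, and the end_date column are hypothetical.

```python
import json
import sqlite3
from datetime import datetime, timezone
from html.parser import HTMLParser


class ScriptJSONExtractor(HTMLParser):
    """Collect the text inside the <script> element with a given id."""

    def __init__(self, script_id):
        super().__init__()
        self.script_id = script_id
        self._capturing = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == self.script_id:
            self._capturing = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._capturing = False

    def handle_data(self, data):
        if self._capturing:
            self.chunks.append(data)


def extract_payload(html, script_id="webpack-bundle-page-data"):
    """Return the embedded JSON payload as Python data (script id is an assumption)."""
    parser = ScriptJSONExtractor(script_id)
    parser.feed(html)
    parser.close()
    raw = "".join(parser.chunks).strip()
    if not raw:
        raise ValueError(f"no <script id={script_id!r}> payload found")
    return json.loads(raw)


def store_bundles(conn, bundles, now):
    """Upsert bundles, stamp verification_date, and prune expired rows.

    Hypothetical schema. Dates are assumed to be UTC ISO-8601 strings in one
    consistent format, so plain string comparison orders them correctly.
    """
    conn.execute(
        """CREATE TABLE IF NOT EXISTS bundles (
               machine_name TEXT PRIMARY KEY,
               tile_name TEXT NOT NULL,
               end_date TEXT NOT NULL,
               verification_date TEXT NOT NULL)"""
    )
    stamp = now.isoformat()
    conn.executemany(
        """INSERT INTO bundles (machine_name, tile_name, end_date, verification_date)
           VALUES (:machine_name, :tile_name, :end_date, :verification_date)
           ON CONFLICT(machine_name) DO UPDATE SET
               tile_name = excluded.tile_name,
               end_date = excluded.end_date,
               verification_date = excluded.verification_date""",
        [{**bundle, "verification_date": stamp} for bundle in bundles],
    )
    # Prune bundles whose closing date is already in the past.
    conn.execute("DELETE FROM bundles WHERE end_date < ?", (stamp,))
    conn.commit()
```

Calling store_bundles(sqlite3.connect("bundles.db"), records, datetime.now(timezone.utc)) on every run keeps one row per machine_name, which is what makes repeated ETL runs safe to trigger.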

API Highlights (FastAPI v1.0)

  • GET /bundles — all bundles ordered by closing date.
  • GET /bundles/{bundle_id} — detail view with tiers, books, and MSRP totals.
  • GET /bundles/featured — the “most valuable” bundle by MSRP and sales.
  • POST /etl/run — re-scrape on demand (JWT-protected).
  • POST /auth/login + GET /auth/me — lightweight auth; optional auto-seeded admin user via .env (DB_ADMIN_*).
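The ordering and "featured" logic behind the first three endpoints could look something like the sketch below. It is only a guess at the heuristic: the text says the featured bundle is chosen "by MSRP and sales", and this version reads that as highest MSRP total first, with sales as the tie-breaker. The field names end_date, msrp_total, and sales are assumed for illustration.

```python
from datetime import datetime


def order_by_closing(bundles):
    """Return bundles sorted by closing date, soonest first."""
    return sorted(bundles, key=lambda b: datetime.fromisoformat(b["end_date"]))


def pick_featured(bundles):
    """Pick the bundle with the highest MSRP total, breaking ties by sales.

    Assumed heuristic; returns None when there are no bundles.
    """
    if not bundles:
        return None
    return max(bundles, key=lambda b: (b["msrp_total"], b["sales"]))
```

A weighted score that mixes both fields would be an equally plausible reading; the tuple key simply keeps the ranking easy to explain.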

Frontend (Vue + Vite)

A single-page app in frontend/ consumes the API and recreates Humble Bundle-style cards with light/dark theming and the Monofur typeface. Responsive hooks (useResponsiveQueryEvent) switch between mobile and desktop layouts, and the featured section surfaces the top bundle immediately.

Running Locally

  1. Backend

    python3 -m venv .venv && source .venv/bin/activate
    pip install -r requirements.txt
    make db-init
    uvicorn api.main:app --reload --host 0.0.0.0 --port 5002
    
    
  2. ETL on demand

    make etl

  3. Frontend

    cd frontend
    npm install
    VITE_API_BASE_URL=http://127.0.0.1:5002 npm run dev  # http://localhost:3002

What’s Inside

  • spider/: scraping core, enrichment helpers, Pydantic schemas, SQLAlchemy persistence, CLI entrypoints.
  • api/: FastAPI app with sync/async deps, JWT auth, and bundle-focused endpoints.
  • frontend/: Vue 3 + Vite SPA with themed components, responsive header, featured bundle section, and bundles list.
  • docs/: data profiling notes and style stack references.
  • Makefile: shortcuts for ETL, API, DB init/reset, and frontend dev/build.

Notes on Data Quality

Data profiling shows 32 normalized columns with key fields always present (machine_name, tile_name, dates, sales). Optional fields like marketing_blurb or tile_logo are handled defensively, and lists (highlights, tiers) are serialized consistently to keep the API predictable. Each run timestamps its scrape so you can track changes over time.
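As a rough illustration of that defensive handling, the sketch below normalizes one raw record in plain Python. The project does this step with Pandas and Pydantic, so treat this as a simplified stand-in: the normalize_record helper and the choice of JSON strings for list fields are assumptions, while the field names come from the profiling notes above.

```python
import json

OPTIONAL_TEXT_FIELDS = ("marketing_blurb", "tile_logo")
LIST_FIELDS = ("highlights", "tiers")


def normalize_record(raw):
    """Return a copy of raw with optional and list fields in a predictable shape."""
    record = dict(raw)

    # Missing or blank optional text becomes None instead of "" or a KeyError.
    for field in OPTIONAL_TEXT_FIELDS:
        value = record.get(field)
        if isinstance(value, str) and value.strip():
            record[field] = value.strip()
        else:
            record[field] = None

    # List fields are always stored as a JSON array string, even when absent.
    for field in LIST_FIELDS:
        value = record.get(field)
        if value is None:
            value = []
        elif not isinstance(value, (list, tuple)):
            value = [value]
        record[field] = json.dumps(list(value), ensure_ascii=False, sort_keys=True)

    return record
```

Serializing with sort_keys=True means two scrapes of the same tier data produce identical strings, which makes changes between runs easy to spot with a plain comparison.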

Roadmap Ideas

  • Add unit/integration tests for the spider, persistence layer, and API.
  • Schedule periodic ETL runs (cron, Celery, or CI) to maintain a time series of bundles.
  • Expand frontend coverage with component tests and better featured-bundle heuristics.
  • Export historical snapshots for analysis or dashboards.

Humble Scrape turns a fleeting storefront into a queryable dataset and a polished UI. Clone it, run the ETL, and you’ll have a local mirror of Humble’s current book bundles, ready for exploration, alerting, or your own curated recommendations.
