Case Study · Backend Engineering
An end-to-end backend data pipeline that scrapes structured data from a public website, stores it in a relational database, and serves it through a clean REST API — demonstrating real-world backend engineering fundamentals.
// SYSTEM ARCHITECTURE — DATA FLOW
01 — Overview
This project implements a simple but complete end-to-end backend data pipeline using Python. The system collects structured data from a public website, stores it in a relational database, and exposes it through a REST API with pagination and search filtering.
The goal was to build a clean, modular backend workflow demonstrating key backend engineering concepts: web scraping, data persistence, API development, pagination, and query filtering — all in a single cohesive project.
02 — Architecture
The scraper sends an HTTP GET request to books.toscrape.com
using Requests. The HTML response is parsed with BeautifulSoup,
which traverses the DOM to extract each book's title and price.
Extracted records are then passed to the database layer for storage.
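The fetch-and-parse step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual scraper: the function names `fetch_html` and `parse_books`, the assumed markup (an `article` with class `product_pod`, a link carrying a `title` attribute, and a `p` with class `price_color`), and the choice to convert the price to a float are all assumptions.

```python
import re

from bs4 import BeautifulSoup

BASE_URL = "https://books.toscrape.com/"


def fetch_html(url=BASE_URL, timeout=10):
    """Download one page and return the raw response body as bytes."""
    # Imported lazily so parse_books can be exercised offline without Requests.
    import requests

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    # Returning bytes lets BeautifulSoup detect the page encoding itself.
    return response.content


def parse_books(html):
    """Extract title and price from every book article in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    books = []
    # Assumed markup: one <article class="product_pod"> per book.
    for article in soup.find_all("article", class_="product_pod"):
        link = article.find("a", title=True)
        price_tag = article.find("p", class_="price_color")
        if link is None or price_tag is None:
            continue  # skip cards that do not match the assumed layout
        match = re.search(r"\d+(?:\.\d+)?", price_tag.get_text())
        if match is None:
            continue  # no numeric price found
        books.append({"title": link["title"], "price": float(match.group())})
    return books
```

Under these assumptions, `parse_books(fetch_html())` would yield a list of dictionaries ready to hand to the database layer.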
A lightweight SQLite database stores the scraped records.
SQLite was chosen for its zero-configuration setup, making it ideal for a self-contained backend prototype.
The db.py module handles connections, table creation, and insert operations.
// DATABASE SCHEMA
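A storage module with the three responsibilities listed above (connections, table creation, inserts) might look like the sketch below. The schema diagram is not reproduced in this text, so the column layout (`id`, `title`, `price`), the `books.db` file name, and the function names are assumptions made for illustration only.

```python
import sqlite3


def get_connection(db_path="books.db"):
    """Open a connection to the SQLite file (created on first use)."""
    return sqlite3.connect(db_path)


def create_table(conn):
    """Create the books table if it does not exist yet (assumed schema)."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS books (
            id    INTEGER PRIMARY KEY AUTOINCREMENT,
            title TEXT NOT NULL,
            price REAL NOT NULL
        )
        """
    )
    conn.commit()


def insert_books(conn, books):
    """Insert scraped records and return how many rows were written."""
    rows = [(book["title"], book["price"]) for book in books]
    # The connection context manager commits on success, rolls back on error.
    with conn:
        conn.executemany(
            "INSERT INTO books (title, price) VALUES (?, ?)", rows
        )
    return len(rows)
```

Parameter placeholders (`?`) keep scraped titles from being interpreted as SQL, which matters because the input comes from an external page.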
A FastAPI application exposes two endpoints: a paginated book list and a title search.
FastAPI auto-generates interactive docs at /docs
and uses Python type hints to validate request parameters.
/books: returns a paginated list of all books.
Example: /books?page=1&limit=10
/books/search: SQL LIKE pattern match on title.
Example: /books/search?title=travel
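Independent of the web framework, the two queries behind these endpoints can be sketched with plain `sqlite3`. The helper names `fetch_page` and `search_by_title`, the column names, and the response keys (`search`, `count`, `results`) are hypothetical; only the use of LIMIT/OFFSET and LIKE comes from the description above.

```python
import sqlite3

COLUMNS = ("id", "title", "price")  # assumed column layout


def fetch_page(conn, page=1, limit=10):
    """Return one page of books; page numbering starts at 1."""
    if page < 1 or limit < 1:
        raise ValueError("page and limit must be positive")
    offset = (page - 1) * limit  # rows to skip before this page
    rows = conn.execute(
        "SELECT id, title, price FROM books ORDER BY id LIMIT ? OFFSET ?",
        (limit, offset),
    ).fetchall()
    return [dict(zip(COLUMNS, row)) for row in rows]


def search_by_title(conn, title):
    """Return books whose title contains the keyword, plus metadata."""
    # The % wildcards are added around the bound value, not inside the SQL.
    # Any % or _ typed by the user still acts as a LIKE wildcard.
    rows = conn.execute(
        "SELECT id, title, price FROM books WHERE title LIKE ? ORDER BY id",
        (f"%{title}%",),
    ).fetchall()
    results = [dict(zip(COLUMNS, row)) for row in rows]
    return {"search": title, "count": len(results), "results": results}
```

For `?page=3&limit=10` the offset is 20, so rows 21 to 30 come back. SQLite's LIKE is case-insensitive for ASCII letters by default, so `?title=travel` would also match a title containing "Travel".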
03 — Structure
The project is split into three core modules — each with a single clear responsibility — plus the database file and config.
04 — Stack
Data Collection: Requests, BeautifulSoup
Data Storage: SQLite
API Server: FastAPI
05 — Features
01
Sends HTTP GET requests to books.toscrape.com and uses BeautifulSoup to parse the HTML DOM. Extracts book title and price from every article element on the page.
02
A dedicated db.py module handles all database concerns: opening connections, creating the books table, and inserting records. Clean separation from the API and scraper logic.
03
The /books endpoint accepts page and limit query params. SQL LIMIT and OFFSET return only the requested page slice, so response size stays bounded as the table grows.
04
The /books/search?title= endpoint uses SQL LIKE pattern matching to filter books by title keyword. The response includes the search term, the match count, and the matched records.
05
FastAPI automatically generates interactive Swagger UI at /docs — making every endpoint explorable in the browser without any extra tooling.
06
Database exceptions are caught and returned as structured HTTP error responses, so a database failure produces a meaningful error for the client instead of an unhandled exception.
06 — Results
End-to-end backend ownership: from raw HTML to structured JSON API response, every layer built and connected independently.
Clean modular architecture: scraper, database, and API are decoupled; each file has a single responsibility.
Practical API design: pagination, filtering, structured error responses, and auto-generated Swagger documentation.
Real data pipeline experience: ingestion → parsing → storage → delivery, the same pattern used in production data systems at scale.
Client-agnostic API: no frontend is bundled; any frontend framework or HTTP client can consume the JSON responses.
07 — Roadmap