
BookImport Module Overview

BookImport contains the format-specific logic for extracting metadata and covers from incoming files. It supports EPUB and PDF sources out of the box and returns a lightweight ImportResult that the rest of the pipeline uses to build Book instances.

Responsibilities

  • Ensure the covers cache directory exists (~/.local/share/bibliotheca/covers).
  • Extract title/author metadata from EPUB and PDF files.
  • Render (PDF) or extract (EPUB) cover art, saving a PNG or, for EPUB, the original image asset into the covers directory.
  • Expose a single import_book_assets(path, bookIdHex) function that dispatches to the correct handler based on file extension.
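
The directory helper from the first bullet could look roughly like the sketch below. It assumes C++17 std::filesystem and takes the home directory as a parameter so it can be exercised against a temporary location; the parameter is an illustration only, since the module's ensure_covers_dir() takes no arguments.

```cpp
#include <filesystem>

namespace fs = std::filesystem;

// Builds <home>/.local/share/bibliotheca/covers and creates it if missing.
// Passing `home` in (instead of reading $HOME) is an assumption made for
// this sketch so the function is easy to test.
fs::path ensure_covers_dir(const fs::path& home) {
    fs::path dir = home / ".local" / "share" / "bibliotheca" / "covers";
    // create_directories() does nothing if the directory already exists and
    // throws std::filesystem::filesystem_error if creation fails.
    fs::create_directories(dir);
    return dir;
}
```

Calling the helper twice is harmless, which fits a function that runs at the start of every import.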

Structure

BookImport.cpp
 ├── ensure_covers_dir()          // filesystem helper
 ├── import_epub(...)             // libzip + tinyxml2
 ├── import_pdf(...)              // poppler-glib + cairo
 └── import_book_assets(...)      // public dispatcher
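
As an illustration of the dispatch step, the sketch below maps a file extension to a format tag. The BookFormat enum and detect_format() name are hypothetical, and both the case-insensitive comparison and the exception for unknown extensions are assumptions rather than confirmed behaviour of import_book_assets().

```cpp
#include <algorithm>
#include <cctype>
#include <filesystem>
#include <stdexcept>
#include <string>

// Hypothetical format tag; the real dispatcher may call the handlers directly.
enum class BookFormat { Epub, Pdf };

// Chooses a handler from the file extension, ignoring case.
BookFormat detect_format(const std::filesystem::path& file) {
    std::string ext = file.extension().string();
    std::transform(ext.begin(), ext.end(), ext.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    if (ext == ".epub") return BookFormat::Epub;
    if (ext == ".pdf") return BookFormat::Pdf;
    // Assumption: unsupported files are reported like other import errors.
    throw std::runtime_error("unsupported book format: " + file.string());
}
```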

ImportResult

struct ImportResult {
  std::string title;
  std::string author;
  std::string coverPngPath; // empty if no cover extracted; EPUB covers keep their original extension, so this is not always a PNG
};

If a format fails to provide metadata or a cover, the corresponding fields are left empty; the caller (usually BibliothecaWindow) merges these with defaults (e.g., falls back to the filename for the title).
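
A caller-side merge might be sketched as follows. The with_fallbacks() helper is hypothetical, and using the filename without its extension is an assumption; the text only states that the filename serves as the fallback title.

```cpp
#include <filesystem>
#include <string>

struct ImportResult {
    std::string title;
    std::string author;
    std::string coverPngPath; // empty if no cover extracted
};

// Fills an empty title from the file name; other fields are left untouched.
// Assumption: the extension is stripped ("Dune.epub" becomes "Dune").
ImportResult with_fallbacks(ImportResult result, const std::filesystem::path& file) {
    if (result.title.empty()) {
        result.title = file.stem().string();
    }
    return result;
}
```

Metadata that the importer did extract is never overwritten, so the fallback only applies to fields the format left empty.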

EPUB pipeline

  1. Open the .epub as a ZIP archive via libzip.
  2. Parse META-INF/container.xml to locate the OPF package.
  3. Read the OPF document with TinyXML2, extracting <dc:title>, <dc:creator>, and the cover manifest entry (cover-image property or meta name="cover").
  4. If a cover asset exists, copy it into the covers directory (preserving the original extension); otherwise leave coverPngPath empty.
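
One detail in steps 3 and 4 is easy to miss: manifest href values are relative to the OPF file, not to the archive root, so they must be resolved before the entry is looked up in the ZIP. The sketch below shows that resolution and one way to build the destination path. Both helper names are hypothetical, naming the copied file after the book id is an assumption, and percent-decoding of the href is left out.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Resolves a manifest href against the OPF's directory and returns the
// archive entry name with forward slashes, as ZIP entry names use.
std::string cover_entry_name(const std::string& opfPath, const std::string& href) {
    fs::path resolved = fs::path(opfPath).parent_path() / href;
    // lexically_normal() collapses "." and ".." without touching the disk.
    return resolved.lexically_normal().generic_string();
}

// Destination in the covers directory, keeping the asset's own extension.
// Assumption: the file is named after the book id, as in the PDF pipeline.
fs::path cover_destination(const fs::path& coversDir,
                           const std::string& bookIdHex,
                           const std::string& href) {
    std::string ext = fs::path(href).extension().string();
    return coversDir / fs::path(bookIdHex + ext);
}
```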

PDF pipeline

  1. Open the document with Poppler (poppler_document_new_from_file, which expects a file:// URI rather than a plain path).
  2. Pull title/author metadata using Poppler's getters (in poppler-glib, poppler_document_get_title and poppler_document_get_author).
  3. Render the first page to an ARGB32 Cairo surface scaled to ~1000px tall.
  4. Write the rendered surface to ${coversDir}/${bookId}.png.
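
The scaling in step 3 comes down to a single ratio. Poppler reports page sizes in points, so a sketch of the arithmetic, with a hypothetical fit_to_height() helper and RenderSize struct, could look like this. The resulting scale would typically be passed to cairo_scale() before rendering the page; the rounding choice here is an assumption.

```cpp
#include <cmath>
#include <stdexcept>

// Pixel dimensions of the target surface plus the factor to apply to the page.
struct RenderSize {
    double scale;
    int widthPx;
    int heightPx;
};

// Scales a page (measured in points) so that it ends up targetHeightPx tall,
// preserving the aspect ratio.
RenderSize fit_to_height(double pageWidthPt, double pageHeightPt,
                         double targetHeightPx = 1000.0) {
    if (pageWidthPt <= 0.0 || pageHeightPt <= 0.0) {
        throw std::invalid_argument("page dimensions must be positive");
    }
    const double scale = targetHeightPx / pageHeightPt;
    return RenderSize{
        scale,
        static_cast<int>(std::lround(pageWidthPt * scale)),
        static_cast<int>(std::lround(pageHeightPt * scale)),
    };
}
```

Lowering targetHeightPx is the only change needed for smaller thumbnails, because the width follows from the aspect ratio.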

Errors (e.g., corrupt files) are reported by throwing std::runtime_error. Callers typically catch these during batch imports, log the failure, and skip the problematic file.

Extension points

  • Additional formats: add new is_xyz() helpers and import_xyz() functions, then extend import_book_assets() to dispatch accordingly.
  • Metadata enrichment: augment the result with series information, tags, or the table of contents if formats expose them.
  • Cover sizing: adjust the Cairo scale if you want smaller thumbnails.

Expected usage

BibliothecaWindow computes a SHA-256 id for each selected file, calls import_book_assets() in a worker thread, and combines the ImportResult with fallback metadata before enqueuing BookList::upsertMany().

Because cover files live in a shared directory and are addressed by book id, re-importing a book overwrites its previous cover instead of creating a duplicate, so each book keeps a single cover file across sessions.