coryd.dev-eleventy/src/posts/2023/onward-to-the-storygraph.md at 6256527f6c755930c9d44849747180a85e429356

Archived

This repository has been archived on 2025-03-28. You can view files and clone it, but cannot push or open issues or pull requests.

Cory Dransfeldt 6256527f6c

chore: page descriptions

2023-12-11 15:12:36 -08:00

4.1 KiB

| date | title | description | draft | tags |
| --- | --- | --- | --- | --- |
| 2023-10-23 | Onward, to The Storygraph | | false | Eleventy, development |

Recently, I've been begrudgingly using Goodreads to track my reading activity. I had been using Oku but wanted to hedge against its lack of updates since 2022 or so. Looking around for an alternative, I found and read many good things about The Storygraph. It fits my needs, but doesn't (yet) have an API or expose RSS/Atom feeds for your reading activity. With this in mind, I went ahead and imported my Goodreads activity and set about thinking of a way to preserve the reading activity I expose as an RSS feed and on the now page of my site.

The solution I've arrived at is, well, web scraping. It looks like this:

const jsdom = require('jsdom')
const { AssetCache } = require('@11ty/eleventy-fetch')
const { JSDOM } = jsdom

module.exports = async function () {
  const url = 'https://app.thestorygraph.com/currently-reading/coryd'
  const asset = new AssetCache('books_data')
  if (asset.isCacheValid('1h')) return await asset.getCachedValue()
  const data = []
  await fetch(url)
    .then((res) => res.text())
    .then((html) => {
      const DOM = new JSDOM(html)
      const doc = DOM.window.document
      doc
        .querySelectorAll('.md\\:block .book-title-author-and-series h3 > a')
        .forEach((title, index) => {
          if (!data[index]) data.push({ title: title.textContent })
          if (data[index]) data[index]['title'] = title.textContent
        })
      doc
        .querySelectorAll('.md\\:block .book-title-author-and-series h3 p:last-of-type > a')
        .forEach((author, index) => {
          if (!data[index]) data.push({ author: author.textContent })
          if (data[index]) data[index]['author'] = author.textContent
        })
      doc.querySelectorAll('.md\\:block .book-cover img').forEach((image, index) => {
        const img = image.src.replace('https://cdn.thestorygraph.com', 'https://cd-books.b-cdn.net')
        if (!data[index]) data.push({ image: img })
        if (data[index]) data[index]['image'] = img
      })
      doc.querySelectorAll('.md\\:block .book-cover a').forEach((url, index) => {
        if (!data[index]) data.push({ url: `https://app.thestorygraph.com${url.href}` })
        if (data[index]) data[index]['url'] = `https://app.thestorygraph.com${url.href}`
      })
    })
  const books = data
    .filter((book) => book.title)
    .map((book) => {
      book.type = 'book'
      book.dateAdded = new Date()
      return book
    })
  await asset.save(books, 'json')
  return books
}

First, we fetch https://app.thestorygraph.com/currently-reading/coryd, which is the view of books I'm actively reading, and read the response body as text. Once we have the page text, we use jsdom¹ to parse it and query for the selectors enclosing the information we need.

The Storygraph DOM includes two different layouts for the books you're reading: one wrapped in an element with the .md:block class and a mobile-friendly version shown at smaller viewports. To avoid collecting duplicate data from the DOM, we scope our selectors using .md\\:block². We then iterate through each NodeList returned by querySelectorAll, adding or updating objects in a data array as needed. The final data array exposed to our templates looks like this:

{
  title: string,
  author: string,
  image: string,
  url: string,
  type: string,
  dateAdded: Date, // an ISO string when read back from the JSON cache
}[]
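The add-or-update step that each `forEach` performs can be factored into one small function. What follows is a minimal sketch rather than the code this site runs: `mergeByIndex` is a hypothetical helper name, and the plain arrays stand in for the values each `querySelectorAll` pass would produce.

```javascript
// Hypothetical helper (not part of the original script): set `key` on the
// object at each index, creating that object if an earlier pass has not.
function mergeByIndex(data, key, values) {
  values.forEach((value, index) => {
    if (!data[index]) data[index] = {}
    data[index][key] = value
  })
  return data
}

// Stand-in values; in the real script these come from the DOM queries.
const titles = ['Book A', 'Book B']
const authors = ['Author A', 'Author B']

const data = []
mergeByIndex(data, 'title', titles)
mergeByIndex(data, 'author', authors)
// data: [{ title: 'Book A', author: 'Author A' }, { title: 'Book B', author: 'Author B' }]
```

Like the original, this pairs values purely by position, so it assumes every selector returns its matches in the same order and that no book is missing a field.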

With that in place, I have the same data displayed and syndicated but without the stopgap dependence on a platform owned by Amazon.

1. We're not fetching this from the browser, so we can't lean on native DOM APIs to parse the HTML.
2. The \\ escapes the colon in the md:block class name. Unescaped, querySelectorAll would read :block as a pseudo-class and reject the selector as invalid.
