chore: charts post

Cory Dransfeldt 2023-07-21 18:43:36 -07:00
parent dd7430a46d
commit 07d1d31367


@ -0,0 +1,142 @@
---
date: '2023-07-21'
title: 'Road to madness: charting Apple Music listening data'
draft: false
tags: ['development', 'music', 'Eleventy', 'Apple', 'JavaScript', 'API']
image: https://cdn.coryd.dev/blog/albums-artists.jpg
---
I've written before about [displaying my listening data from Apple Music](/posts/2023/displaying-listening-data-from-apple-music-using-musickit/) but, recently, I've attempted to take things a bit further.<!-- excerpt -->
The Apple Music API is cool because it gives you data about your music. It's not cool because, well, it's missing some things. It sends back a whole host of handy-dandy track metadata that you'd expect from a music service and that's great. But it doesn't provide data you'd normally expect like, well, a timestamp of when the recently played track was recently played.
I want an API that can act as a source of truth. What I've got is an API that returns tracks in play order, but with no concrete representation of when they were actually played.
Where does that leave us? Well, if we're smart, that solution might look like what I ran with during my first go around. I call Apple's API and iteratively page through it to aggregate a 200 track sample. That's about 6-7 calls and a moving window.
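The paging step can be sketched roughly like this. It's a minimal illustration, not the code running on the site: `collectTracks` and `fetchPage` are hypothetical names, and the page size of 30 is an assumption, picked because it lands on seven calls for a 200 track sample.

```javascript
// Hypothetical pager: `fetchPage(offset, limit)` is assumed to resolve to an
// array of tracks, and to an empty array once the history runs out.
const collectTracks = async (fetchPage, { target = 200, limit = 30 } = {}) => {
  const tracks = []
  for (let offset = 0; tracks.length < target; offset += limit) {
    const page = await fetchPage(offset, limit)
    if (!page.length) break // nothing left to page through
    tracks.push(...page)
  }
  return tracks.slice(0, target) // trim any overshoot from the last page
}

// Stand-in for a real API call so the sketch runs on its own
const fakeFetchPage = async (offset, limit) =>
  Array.from({ length: limit }, (_, i) => ({ id: `track-${offset + i}` }))

collectTracks(fakeFetchPage).then((tracks) => console.log(tracks.length)) // 200
```

Passing the fetcher in keeps the loop testable without credentials. In a real build it would wrap the authenticated request.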
What we can achieve though, dear listener, through some inference and external storage, is a cache and (wait for it) a more slowly moving, less capricious window.
What we've got:
- The current time
- A duration for each track
What we can do:
- Calculate how many tracks from Apple's response approximate an hour of listening
- Infer time stamps by moving backwards iteratively through an hour of listening
This isn't canonical, it's not definitive, but it's what we've got.
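As a rough sketch of that inference (not necessarily how the site does it), you can walk the list from newest to oldest and subtract each track's duration from a running clock. `inferPlayTimes` is a hypothetical helper, and it assumes the tracks arrive most recent first with a `duration` in milliseconds.

```javascript
// Hypothetical helper: assigns an inferred `playTime` to each track.
// Assumes `tracks` is ordered most recent first and `duration` is in ms.
const inferPlayTimes = (tracks, now = Date.now()) => {
  let cursor = now
  return tracks.map((track) => {
    // Step back by this track's length to estimate when it started playing
    cursor -= parseInt(track.duration, 10)
    return { ...track, playTime: cursor }
  })
}

// Made-up sample: two tracks, five and four minutes long
const sample = [
  { id: 'a', duration: 300000 },
  { id: 'b', duration: 240000 },
]
const inferred = inferPlayTimes(sample, 1689970612847)
console.log(inferred.map((track) => track.playTime))
```

The estimate treats listening as gapless, so any pause between tracks pushes the older timestamps later than they really were.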
So, we're dealing with JSON and a static site generator. We want to persist our data as a cache, read it in and write out an update. For this I've elected to use Wasabi, who offer a 1:1 S3-compatible API. The data structure we want to store for each track looks like this[^1]:
```json
{
  "i.rXXXdmUa6Nme-1689970612847": { // that's an id + a timestamp, not a leaked key
    "name": "Sacrificial Blood Oath In The Temple Of K'zadu",
    "artist": "Gateway",
    "album": "Galgendood",
    "art": "https://store-033.blobstore.apple.com/sq-mq-us-033-000002/18/f1/a3/18f1a37a-8c9a-169a-5458-464aea20ce05/image?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20230721T202228Z&X-Amz-SignedHeaders=host&X-Amz-Expires=86400&X-Amz-Credential=MKIAU0HKO2RBEAT0UMZS%2F20230721%2Fstore-033%2Fs3%2Faws4_request&X-Amz-Signature=85790600221880597074559ed3674564f17ca3df6634d6fa15496baf7aca5d56",
    "url": "https://rateyourmusic.com/search?searchtype=l&searchterm=Galgendood%20Gateway",
    "id": "i.rXXXdmUa6Nme",
    "playTime": 1689970612847,
    "duration": 338808
  }
}
```
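The key format is what lets the same track show up more than once in the cache. A small sketch of building entries that way, using the hypothetical helpers `buildCacheKey` and `addToCache` (neither is from the site's code):

```javascript
// Hypothetical helpers illustrating the `id-playTime` key format shown above
const buildCacheKey = (track) => `${track.id}-${track.playTime}`

// Returns a new cache object with the track stored under its composite key
const addToCache = (cache, track) => ({
  ...cache,
  [buildCacheKey(track)]: track,
})

const track = { id: 'i.rXXXdmUa6Nme', playTime: 1689970612847, duration: 338808 }
const cache = addToCache({}, track)
console.log(Object.keys(cache)) // [ 'i.rXXXdmUa6Nme-1689970612847' ]
```

Because the timestamp is part of the key, a second play of the same track at a different time gets its own entry rather than overwriting the first.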
When I deploy a production build of my site[^2] we'll read in our cache from Wasabi, call Apple's flawed[^3] but persistent API, align the two and suss out the difference:
```javascript
const _ = require('lodash')

// Collect tracks from the front of the list until roughly an hour of audio is covered
const getTracksOneHour = (tracks) => {
  const TIMER_CEILING = 3600000 // 1 hour in milliseconds
  const tracksOneHour = []
  let trackIndex = 0
  let trackTimer = 0

  while (trackTimer < TIMER_CEILING) {
    if (!tracks[trackIndex]) return tracksOneHour
    trackTimer += parseInt(tracks[trackIndex].duration, 10)
    tracksOneHour.push(tracks[trackIndex])
    trackIndex++
  }

  return tracksOneHour
}

// Return the tracks from Apple's response that are missing from the latest hour of the cache
const diffTracks = (cache, tracks) => {
  const trackCompareSet = Object.values(tracks)
  // Sort on `playTime`, the timestamp field stored with each cached track
  const cacheCompareSet = _.orderBy(Object.values(cache), ['playTime'], ['desc'])
  const diffedTracks = {}
  const cacheCompareOneHour = getTracksOneHour(cacheCompareSet)
  const comparedTracks = _.differenceWith(trackCompareSet, cacheCompareOneHour, (a, b) =>
    _.isEqual(a.id, b.id)
  )

  // Key each new track as `id-playTime`, matching the cache structure
  for (const track of comparedTracks) {
    diffedTracks[`${track.id}-${track.playTime}`] = track
  }

  return diffedTracks
}
```
Still with me? Next, we're going to derive some chart data, excluding anything not within a week prior to build time (this is where that slower moving window comes in).
```javascript
// `getKeyByValue` and `artistGenres` are not shown here; both need to be in scope
const deriveCharts = (tracks) => {
  const charts = {
    artists: {},
    albums: {},
  }

  // Work out the one-week window once instead of once per track
  const currentDateTime = Date.now()
  const lastWeek = new Date(currentDateTime)
  lastWeek.setDate(lastWeek.getDate() - 7)
  const lastWeekDateTime = lastWeek.getTime()

  const tracksForLastWeek = Object.values(tracks).filter((track) => {
    const trackDateTime = new Date(track.playTime).getTime()
    return trackDateTime <= currentDateTime && trackDateTime > lastWeekDateTime
  })

  tracksForLastWeek.forEach((track) => {
    if (!charts.artists[track.artist]) {
      charts.artists[track.artist] = {
        artist: track.artist,
        genre: getKeyByValue(artistGenres, track.artist.replace(/\s+/g, '-').toLowerCase()),
        // encodeURIComponent also escapes characters like `&` that would break the query string
        url: `https://rateyourmusic.com/search?searchterm=${encodeURIComponent(track.artist)}`,
        plays: 1,
      }
    } else {
      charts.artists[track.artist].plays++
    }

    if (!charts.albums[track.album]) {
      charts.albums[track.album] = {
        name: track.album,
        artist: track.artist,
        art: track.art,
        url: track.url,
        plays: 1,
      }
    } else {
      charts.albums[track.album].plays++
    }
  })

  return charts
}
```
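The chart objects are keyed by name, so they need flattening and sorting before they're any use on a page. One way to do that, sketched with a hypothetical `rankChart` helper and made-up play counts:

```javascript
// Hypothetical helper: turns a keyed chart object into a ranked array
const rankChart = (chart, limit = 10) =>
  Object.values(chart)
    .sort((a, b) => b.plays - a.plays) // most played first
    .slice(0, limit)

// Made-up sample in the shape of `charts.artists`
const artists = {
  Gateway: { artist: 'Gateway', plays: 3 },
  Runemagick: { artist: 'Runemagick', plays: 7 },
  'Outer Heaven': { artist: 'Outer Heaven', plays: 5 },
}
console.log(rankChart(artists, 2).map((entry) => entry.artist))
// [ 'Runemagick', 'Outer Heaven' ]
```

The same helper works for `charts.albums`, since both shapes carry a `plays` count.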
_Cool_[^4]. GitHub triggers a rebuild of the site every hour, Netlify builds it, Eleventy optimizes images that are stored at bunny.net, Apple provides the listening data, Wasabi provides persistence.
There are some significant issues with this approach: it doesn't capture listens to an album in a loop (like me playing the new Outer Heaven record today, hails 🤘). It can also get wonky when my diff function hits a track order that elicits a false positive return value.
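The loop problem falls straight out of comparing on `id` alone. Here's a simplified, lodash-free stand-in for that kind of diff (`diffById` is a hypothetical name and the ids are made up):

```javascript
// Simplified stand-in for an id-only diff, not the site's exact code
const diffById = (incoming, cached) => {
  const cachedIds = new Set(cached.map((track) => track.id))
  return incoming.filter((track) => !cachedIds.has(track.id))
}

// The cache already holds one pass through a three-track album
const cached = [{ id: 'track-1' }, { id: 'track-2' }, { id: 'track-3' }]
// Apple's response after looping the album: same ids, played again
const incoming = [{ id: 'track-1' }, { id: 'track-2' }, { id: 'track-3' }]

console.log(diffById(incoming, cached).length) // 0, the second pass is invisible
```

Without a real timestamp from the API there's nothing to tell the second pass apart from the first.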
{% image 'https://cdn.coryd.dev/blog/charlie.jpg', 'Charlie Day standing in front of "charts"', 'w-full', '600px' %}
"But Cory, there's last.fm." I hear this, I love last.fm, but I've got concerns about its age, ownership and maintenance. I don't want to be on the wrong end of a scream test when the wrong (right?) server rack gets decommissioned.
So, would I recommend pursuing this? Probably not, pretty definitely, probably not. It's, I think, as close as it can be to being an accurate but imperfect representation of what I listen to regularly. With that imperfect accuracy in mind I've replaced play counts on [my now page](https://coryd.dev/now) where this is all displayed with the genres I've associated with each artist[^5]. I _like_ where this is at. I'd **love** it if Apple would take away my crazy wall and give me a timestamp though.
[^1]: Yes this is a real song — see [Death Metal English (2013)](https://www.invisibleoranges.com/death-metal-english/)
[^2]: A technical term — by no means a measure of importance over here.
[^3]: A statement of fact, not a pejorative descriptor.
[^4]: Said as witheringly as John Oliver can muster.
[^5]: As exported from Music.app and programmatically transformed into JSON, naturally — feel free to email me and argue my choices. Are Runemagick slow enough to warrant being tagged as death doom metal rather than death metal? Is the granularity more valuable than broad, bucketed categories? Is the vocalist's delivery more black than death metal? The world may never know.