Random Musings on Technology

Online Music Services Should Provide Play/Like History in a Standard Format


An old man's rant

Back in the day (20 years ago?), when online music services started (after the RIAA lost against MP3 piracy), one of the main features of online music players was that you could not play something specific, but could get a mix based on artists that you liked. One of the services that offered this was LastFM (way back in 2002). You could also "like" the songs you enjoyed and "block" the ones you didn't, and the service logged every song you played and every song you skipped.

The problem is that, since then, a plethora of new services has appeared (Rhapsody, Rdio, Deezer, Pandora, Tidal, Spotify, Google Play Music/YouTube Music, among several more obscure others), and each of them saves, in one way or another, your list of played, skipped, loved and hated tracks, over and over. Every one of them thinks that the way it uses your play history to suggest new songs is the best one; and most likely all of them suck.

The way I see it, that data should be available for download in an open and standard format, so that you could a) move it between music services if you like and b) use any "neutral" third-party application (maybe open source?) to analyze your play history and give you insights.

Coming back to LastFM, they tried to do something like that: you can connect a LastFM plugin to some of the services, and it will send the songs you listen to over to LastFM. The problems are that a) it is not open (you cannot download your history from LastFM in a suitable format), b) not all services support it, and c) once your data is in LastFM, you can only listen to it there.

Proposal for a Free, Open Music Streaming History Data Format

What I am proposing is simple. Come up with a simple JSON (or YAML if we feel brave) standard that allows one to store the behavioural data as you play your music in any service, including things like: "loved this song", "skipped this song after N seconds", "hated this song", etc.

This will give everyone the ability to really own their data, and will open the opportunity to improve recommendation algorithms by third parties (open source or commercial).

The format could be something as simple as this:

listening_history.json 
-------
{
   "exportDate": "2022-01-01T20:35Z",
   "listenedTracks": [
    {
       "artistName": "Metallica",
       "albumName":  "Ride the Lightning",
       "trackName":  "For Whom the Bell Tolls",
       "durationSeconds": 309,
       "playData": {
          "playTimeSeconds": 150,
          "playScore": 1
       }
    },
    {
       "artistName": "Megadeth",
       "albumName":  "Youthanasia",
       "trackName":  "Victory",
       "durationSeconds": 327,
       "playData": {
          "playTimeSeconds": 327,
          "playScore": -1
       }
    }
    ...
   ]
}

All fields are straightforward except maybe playScore, which would capture the "like" and "dislike" buttons across different services (i.e. when someone clicks on a like button, the playScore would be +1; clicking on a "skip/block" button would set the playScore to -1; finally, 0 would be the default value for when there's no reaction to the song).
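To illustrate the kind of "neutral" third-party analysis such a file would allow, here is a minimal JavaScript sketch that adds up the playScore and the listening time per artist. It assumes the field names from the proposed format above; summarizeByArtist is a hypothetical helper, not part of any existing service or library.

```javascript
// Sketch: summarize a listening_history.json export per artist.
// summarizeByArtist is a hypothetical name; the fields follow the proposed format.
function summarizeByArtist(history) {
  const summary = {};
  for (const track of history.listenedTracks) {
    if (!summary[track.artistName]) {
      summary[track.artistName] = { plays: 0, score: 0, secondsPlayed: 0 };
    }
    const entry = summary[track.artistName];
    entry.plays += 1;
    // +1 for a like, -1 for a skip/block, 0 when there was no reaction
    entry.score += track.playData.playScore;
    entry.secondsPlayed += track.playData.playTimeSeconds;
  }
  return summary;
}

// Same two tracks as in the example file
const history = {
  exportDate: '2022-01-01T20:35Z',
  listenedTracks: [
    {
      artistName: 'Metallica',
      albumName: 'Ride the Lightning',
      trackName: 'For Whom the Bell Tolls',
      durationSeconds: 309,
      playData: { playTimeSeconds: 150, playScore: 1 },
    },
    {
      artistName: 'Megadeth',
      albumName: 'Youthanasia',
      trackName: 'Victory',
      durationSeconds: 327,
      playData: { playTimeSeconds: 327, playScore: -1 },
    },
  ],
};

console.log(summarizeByArtist(history));
```

With a real export you would read the file with JSON.parse instead of hard-coding the object.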

Checking on my cat's whereabouts ... the geek way


Recently we installed a cat door at home to allow our cat to go in and out as she pleases. After installing it, my wife innocently asked "wouldn't it be cool if we could know when the cat goes out and in of the door?". Of course I took that as a dare, and a weekend project was born.

For this project, I set up a system that senses whenever the cat moves through the door (using an Arduino-like board and a motion sensor), and sends a ping to an AWS Lambda function. The Lambda function registers the date and time of the ping and writes the timestamp into a new row of a Google Spreadsheet.

The Cat (introducing Garrita)

This is Garrita, the one that started everything. She loves going in and out of the house at her own pleasure. For those of you who don't speak Spanish, "Garrita" roughly translates to "little claw".
garrita

And this is the door we want to spy on:
the cat door

The hardware

For this project I used the following hardware:

  • An ESP32 board, which was around $200 MXN ($10 USD)
  • An HC-SR501 PIR motion sensor, which goes for around $161 MXN ($7 USD) for 5 pieces
  • A bunch of jumper cables to connect the components, costing around $90 MXN (~$4 USD) for 120 cables

Connecting the hardware was pretty straightforward. There are plenty of web sites that explain how to connect the PIR sensor to the ESP32 board. This one shows how to do it as part of a nice project. Basically, the PIR sensor has a positive (+), a negative (-) and a data connector. The negative connects to the ESP32 ground, the data connects to any of the ESP32 data pins, and the positive connects to the VIN pin (instead of the standard 3.3-volt pin, because the PIR sensor requires 5 volts, which are available through the VIN pin).

After connecting everything, this is what the hardware looks like. Pretty simple:
all connected

The Software

ESP32 Board Software

For the software, I used the Arduino IDE and connected the ESP32 to my MacBook for flashing. Note that I had to install UART serial port drivers for the ESP32 to be seen by OS X. The drivers can be downloaded from this page.

The software for the ESP32 is pretty simple: we set up an interrupt to listen for events when the PIR sensor detects movement, then we send a ping in the form of an HTTP GET request to the AWS API Gateway endpoint in front of the Lambda function that we will be using. Of course, we must also do all the Wi-Fi initialization and other setup.

#include "WiFi.h"
#include <stdio.h>
#include <HTTPClient.h>
#include <TimeLib.h>

#define timeSeconds 10
#define MY_SSID "yourWiFiSSID"
#define PASSWORD "YourWifiPassword"
#include "soc/rtc_wdt.h"


#define MY_URL String("https://your-AWS-api-gateway-endpoint.aws.amazon.com/dev/event")
#define EVENT_DELAY_TIME 60
const int motionSensor = 27;

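/* We use a flag to indicate when the sensor movement was detected because
 * we have to send the HTTP request outside of the interrupt handler.
 * The flag is volatile because the interrupt handler modifies it.
 */
volatile int sensor = 0;
// Time of the last event sent; starting EVENT_DELAY_TIME in the past lets the first detection through
time_t lastEventTime = now() - EVENT_DELAY_TIME ;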

// Performs a GET request to the endpoint, appending the sensor name as the query string
void do_request(String sensor) {
  String request_string = MY_URL + String("?") + sensor;
  HTTPClient http;
  http.begin(request_string);
  Serial.println(request_string);
  int httpCode = http.GET();
  if (httpCode > 0) { // A positive code means the server answered
    String payload = http.getString();
  }
  Serial.println(httpCode);
  http.end(); // Free the resources
}

// Interrupt handler: runs when the PIR sensor output rises.
// It only raises the flag; the HTTP request is sent from loop().
// Serial printing is left out because it is not safe inside an interrupt handler.
void IRAM_ATTR detectsMovement() {
  time_t timeNow = now();

  // Wait at least 60 seconds between sending events
  if (timeNow - lastEventTime > EVENT_DELAY_TIME) {
    sensor = 1;
    lastEventTime = timeNow;
  }
}

void setup()  
{  
    Serial.begin(115200);
    // Set WiFi to station mode and connect to the access point
    WiFi.mode(WIFI_STA);
    WiFi.begin(MY_SSID, PASSWORD);
    if(WiFi.waitForConnectResult() != WL_CONNECTED) {
         Serial.println("Wifi Connection failed"); 
    }
    WiFi.printDiag(Serial);

    Serial.println("Setup done");
    pinMode (LED_BUILTIN, OUTPUT);
    // Set LED to LOW
    digitalWrite(LED_BUILTIN, LOW);

    // PIR Motion Sensor mode INPUT_PULLUP
    pinMode(motionSensor, INPUT_PULLUP);
    // Set motionSensor pin as interrupt, assign interrupt function and set RISING mode
    attachInterrupt(digitalPinToInterrupt(motionSensor), detectsMovement, RISING);


}

void loop()  
{
    // If the sensor flag is up then we send the request and reset the flag
    if (sensor > 0) {
      do_request( sensor == 1 ? "sensor_1" : "sensor_2");    
      sensor = 0;
    }
    // Wait a bit before looping
    delay(3000);

}

This code is also available on GitHub.

We do some simple logic to prevent sending multiple pings for the same movement detection event: once an event is sent, any further detections in the next 60 seconds are ignored. This is something that I have to improve in the future with a better noise reduction algorithm.
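The cooldown rule is easier to reason about away from the board. Here is a small sketch of the same rule as a pure JavaScript function (JavaScript only to match the Lambda code; the board itself runs the Arduino sketch). filterEvents is a hypothetical name, and the timestamps are made-up seconds since boot.

```javascript
// Sketch of the cooldown rule: an event is kept only if more than
// cooldownSeconds have passed since the last event that was kept.
// This mirrors the `timeNow - lastEventTime > EVENT_DELAY_TIME` check on the board.
function filterEvents(timestamps, cooldownSeconds = 60) {
  const kept = [];
  let last = -Infinity; // nothing sent yet, so the first event always passes
  for (const t of timestamps) {
    if (t - last > cooldownSeconds) {
      kept.push(t);
      last = t;
    }
  }
  return kept;
}

// Hypothetical detections, in seconds
console.log(filterEvents([0, 5, 30, 61, 100, 122])); // [ 0, 61, 122 ]
```

Note that the cooldown restarts only when an event is kept, so a cat sitting in the doorway still produces one ping roughly every minute.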

Serverless AWS/Lambda/HTTP-gateway

To capture the pings from the ESP32 board, I set up a Serverless Framework project using AWS Lambda and HTTP gateway technologies. The full code is also available on GitHub. It mainly consists of a serverless.yml file defining the HTTP gateway, the Lambda function, and some ENV variables to authenticate with Google Spreadsheets:

service: event-service

frameworkVersion: ">=1.1.0 <=2.65.0"

provider:  
  name: aws
  runtime: nodejs14.x
  environment: ${file(.env.yml):}
  stage: dev
  region: us-east-1
  lambdaHashingVersion:  "20201221"
functions:  
  eventSubmission:
    handler: api/event.submit
    memorySize: 128
    description: Submit Event for saving
    events:
      - http: 
          path: event
          method: get

The other main file is the event.js file, which receives the HTTP gateway GET request and appends a new row to a Google Spreadsheet with the current date and time (in my timezone):

'use strict';  
const { google } = require('googleapis');


let sheets;

module.exports.submit = async (event) => {

  const auth = authorize();
  sheets = google.sheets({ version: 'v4', auth });
  // Wait for the row to be appended before returning; otherwise the
  // function may finish before the request to Google completes
  await writeToSheet();

  return {
    statusCode: 200,
    body: JSON.stringify({
      message: 'Event registered successfully',
      input: event,
    }),
  };
};

function authorize() {  
  const oAuth2Client = new google.auth.GoogleAuth({
    credentials: {
      client_email: process.env.GOOGLE_SERVICE_ACCOUNT_EMAIL,
      private_key: process.env.GOOGLE_PRIVATE_KEY.replace(/\\n/gm, '\n')
    },
    scopes: ['https://www.googleapis.com/auth/spreadsheets'],
  });
  return oAuth2Client;
} 


async function writeToSheet() {

  let dateTime = new Date().toLocaleString("es-MX", { timeZone: 'America/Mexico_City'});

  return sheets.spreadsheets.values.append({
    spreadsheetId: process.env.GOOGLE_SPREADSHEET_ID,
    range: 'data',
    resource: {
      values: [dateTime.split(' ')]
    },
    insertDataOption: 'INSERT_ROWS',
    valueInputOption: 'RAW'
  });
}
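One caveat with dateTime.split(' '): the exact text produced by toLocaleString can vary between Node/ICU versions (for example, whether a comma follows the date), so the cells may not always split cleanly. As a hedged alternative, the sketch below builds the date and time cells explicitly with Intl.DateTimeFormat. buildRow is a hypothetical helper, and the ISO-style date is my own choice here, not what the original code writes.

```javascript
// Sketch: build the [date, time] row without depending on the locale's
// string layout. buildRow is a hypothetical helper name.
function buildRow(date, timeZone = 'America/Mexico_City') {
  const formatter = new Intl.DateTimeFormat('en-US', {
    timeZone,
    year: 'numeric',
    month: '2-digit',
    day: '2-digit',
    hour: '2-digit',
    minute: '2-digit',
    second: '2-digit',
    hourCycle: 'h23', // 24-hour clock, midnight is 00
  });

  // Collect the named parts (year, month, day, hour, ...) into one object
  const parts = {};
  for (const { type, value } of formatter.formatToParts(date)) {
    parts[type] = value;
  }

  return [
    `${parts.year}-${parts.month}-${parts.day}`,
    `${parts.hour}:${parts.minute}:${parts.second}`,
  ];
}

console.log(buildRow(new Date('2022-01-01T20:35:00Z')));
```

The returned array could be passed as the single entry of values in the append call, in place of dateTime.split(' ').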

The Result

Once all of that is set up, we get a timestamp added to the spreadsheet every time Garrita passes through the door.
spreadsheet.

This is the ESP32 board with the connected sensor:

Future work

There's a bunch of things that I have to do to improve this, among them:

  • Implement a better noise reduction algorithm in the board so that I get fewer false positive events. Right now, some shadow movements trigger a ping.
  • Remove the Fresnel lens that sits on top of the PIR sensor, to narrow its field of view to the point where the cat moves through the door.
  • Place the whole hardware into a plastic enclosure for better presentation.

Using Spare Disk Space as One Large Disk


This is a neat trick in Linux for when you have a bit of free space (a few GB) left on different disks and want to use it to save large files. I use this for mining XCH (Chia Cryptocoin).

  1. First check the available space on each disk using df -h:
Filesystem      Size  Used Avail Use% Mounted on  
/dev/sdg        4.6T  4.5T   84G  99% /media/omar/SEA_5T_B
/dev/sdp2        30T   30T   68G 100% /media/omar/seagate1

Here we have 2 disks, one with 84G and another with 68G. The size of the file we want to store is more than 100GB. To do that, we will first create "image" files using the remaining space in each disk:

truncate -s67G /media/omar/seagate1/67G.img  
truncate -s83G /media/omar/SEA_5T_B/83G.img  

Then, we create a loopback device with losetup for each image file:

sudo losetup --show --find /media/omar/seagate1/67G.img
sudo losetup --show --find /media/omar/SEA_5T_B/83G.img

We must pay attention to the output of those two commands, as it tells us the name of each loopback device (e.g. /dev/loop1).

As a next step, we are going to use btrfs to create a single filesystem from those 2 devices:

sudo mkfs.btrfs -d single /dev/loop1 /dev/loop2

(Replace /dev/loop1 and /dev/loop2 with whatever you got from the losetup step.)

Finally, we can mount our btrfs filesystem in some directory like:

sudo mount -t btrfs -o ssd,nodatacow,nodatasum,discard,nodiratime,discard=async,noatime /dev/loop1 loops/

In my case I added all those options to make access faster. With btrfs you can mount a multi-device filesystem by referring to any one of its devices (like /dev/loop1 in this case).

After mounting this device, we do a df -h again and should see our new disk mounted in the specified directory with the summed amount of space:

Filesystem      Size  Used Avail Use% Mounted on  
/dev/loop1      150G  3.8M  148G   1% /media/omar/loops

And that's it! We've got 150G to use for large files!
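The image sizes used above leave about 1G of headroom on each disk (84G available became an 83G image, 68G became 67G). As a rough sketch, that step can be derived from the df -h output. planImages is a hypothetical helper, the 1G margin is an assumption read off the numbers above, and the parsing only handles whole "G" values and mount points without spaces.

```javascript
// Sketch: turn `df -h` output into the truncate commands used above.
// planImages and marginG are hypothetical names.
function planImages(dfOutput, marginG = 1) {
  return dfOutput
    .trim()
    .split('\n')
    .slice(1) // skip the header row
    .map((line) => line.trim().split(/\s+/))
    .filter((cols) => /^\d+G$/.test(cols[3])) // the Avail column, e.g. "84G"
    .map((cols) => {
      const sizeG = parseInt(cols[3], 10) - marginG;
      const mountPoint = cols[5];
      return `truncate -s${sizeG}G ${mountPoint}/${sizeG}G.img`;
    });
}

// The df -h output from the first step
const dfOutput = [
  'Filesystem      Size  Used Avail Use% Mounted on',
  '/dev/sdg        4.6T  4.5T   84G  99% /media/omar/SEA_5T_B',
  '/dev/sdp2        30T   30T   68G 100% /media/omar/seagate1',
].join('\n');

console.log(planImages(dfOutput).join('\n'));
```

The function only prints the commands; running them is still a manual step.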


The Path to Kueski 2.0


Back in the day (5 years ago) I wrote a good blog post related to a migration the Engineering team did from a monolithic infrastructure into a "microservice based" architecture.

The original URL is this, but I've seen it go down a couple of times in the last few days, so I saved a copy of it in archive.is:

The Path to Kueski 2.0 (Archive).

Another good blog post that is worth saving is the Better Logging (Archive) post by Jorge del Rio, which has good ideas about logging in production environments.