
Nightly YouTube & Odysee Download Script

Since my internet provider is so terrible, I struggle to watch videos directly on the web. Occasionally it works, but my patience grows thin every time I try. So, of course I wrote a script that downloads subscriptions from my favorite channels in the middle of the night from both YouTube and Odysee. Hopefully someone out there will find it useful. It uses youtube-dl so naturally that'll have to be installed on your system for it to work.

It took me a while to get it just right and work around youtube-dl's quirks. It's tweaked to fit my particular needs: I wanted it to check each channel for a new video, and download only the latest one unless it's already been downloaded. I wanted to be able to set a max number of downloads per night, because my ISP places a separate cap on data used at night. I also wanted the ability to download certain channels from beginning to end, something I called archive mode. Lastly, there are times when I wanted to grab only videos containing a certain phrase in the title, such as "Full Podcast" or the like, so I added the ability to use a different title filter for each channel, specified in the urls.txt config file. And of course I didn't want it eating up all my hard drive space, so I added a cleanup routine that gets rid of anything older than two weeks. Depending on your usage, you may want to tweak this value in the settings (top of the script).

Lastly, I wanted a .m3u playlist to be generated every day, just for the videos that were downloaded that day. And because I share the Videos folder over the network using minidlna, allowing me to watch using the Roku media player app, I also had to make it symlink each video file into the folders where the playlists are stored. (You'll see what I mean if you try the script.)
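For reference, a generated daily playlist is just a plain-text .m3u8 file with one absolute path per line (the path below is made up for illustration):

```
#EXTM3U
/home/user/Videos/Internet-Shows/Uploads_from_8-Bit_Guy/20220929-Example_Video.mp4
```

The symlinks placed next to each playlist point back at these files, so a DLNA server only has to share the playlist folders.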

Honestly, I've found that 'browsing' internet videos this way is very satisfying. No ads, no recommended videos or other distractions, just the content I want to see, all immediately accessible -- and something new every day. Even if we ever get Starlink in our area, I am pretty sure I'll keep using this script, especially since I don't have a YouTube account.

Let's do it

To use the script, first make a folder at $HOME/.config/download-video-subs or else tweak the $CONFIGDIR variable to point wherever you want your config to be stored.
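In other words, assuming the default location:

```shell
# Create the config folder the script expects
mkdir -p "$HOME/.config/download-video-subs"
```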

Next, create a file in the $CONFIGDIR called urls.txt. The syntax should be something like this:

urls.txt

#URL[;Title Filter][;Number of playlist items to check][;Format Override]

#8-bit Guy - regular "just fetch the latest episode, and delete when the video is older than the date specified by $FIRSTDATE"
https://www.youtube.com/channel/UC8uT9cgJorJPWu7ITLGo9Ww/videos

#Audiotree - title must contain "Full Session", check back 10 playlist items each script run rather than the default 5
https://www.youtube.com/channel/UCWjmAUHmajb1-eo5WKk_22A;Full Session;10

#Dave Smith - Part of the Problem
https://www.youtube.com/channel/UCEfe80CP2cs1eLRNQazffZw/videos

###Epic Family Road Trip - with format override (need better video quality for this channel)
https://www.youtube.com/channel/UC1Az_80tfW-1uEQBlUGXnww/videos;;;[height>=720]/best

#3Blue1Brown - Odysee channel example
https://odysee.com/@3Blue1Brown:b

#HexDSL - Bitchute playlists work too
https://www.bitchute.com/channel/hexdsl/
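Each non-comment line is split on semicolons into up to four fields, with defaults filled in for anything left empty. Here's a quick sanity check of how the Audiotree line above parses, done the same way the script does it (pure bash, no downloads involved):

```shell
#!/bin/bash
# Split a urls.txt line into its fields, as the download script does
line='https://www.youtube.com/channel/UCWjmAUHmajb1-eo5WKk_22A;Full Session;10'
IFS=';' read -ra FIELDS <<< "$line"
URL=${FIELDS[0]}
TITLEFILTER=${FIELDS[1]:-.*}               # empty filter means "match any title"
PLAYLISTEND=${FIELDS[2]:-5}                # default: check the last 5 playlist items
FORMAT=${FIELDS[3]:-'[height<=480]/worst'} # default format string
echo "$URL | $TITLEFILTER | $PLAYLISTEND | $FORMAT"
```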

Then, download the script below into ~/bin or wherever you keep your scripts, and of course chmod +x it. After doing some test runs, you can add it to your personal crontab file.
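For example, a crontab entry to run it at 3:00 AM every night might look like this (the path, time, and log location are just examples; adjust to taste):

```
# m h dom mon dow  command
0 3 * * * $HOME/bin/download-video-subs >> /tmp/download-video-subs.log 2>&1
```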

As always, please examine the script to understand what it does before you run it. Enjoy!

download-video-subs

#!/bin/bash

#This script will download one video for each channel in a list. See below for settings.
#After downloading $MAXDOWNLOADS videos, m3u8 playlists as well as folders full of symlinks are generated
#for all of the videos downloaded on that day. This script is meant to be run once per day
#(i.e. in the middle of the night). Playlists older than $CLEANUPDATE are cleaned up.
#You can pass "--skip-downloads" as $1 if you just want to regenerate the playlists without downloading anything.

# #Your urls.txt file should take the following format... one URL for each line (begin comment lines with a #):
#
# #URL[;Title Filter][;Number of playlist items to check][;Format Override]

YTDL=/usr/local/bin/youtube-dl
[[ -x "$YTDL" ]] || exit 1

#Directory where urls.txt and the download archive live
CONFIGDIR="$HOME/.config/download-video-subs"

#File with URLs to check
URLSFILE="$CONFIGDIR/urls.txt"
#Archive file which keeps track of already downloaded videos
ARCHIVEFILE="$CONFIGDIR/archive-$HOSTNAME.txt"
#Temp file, used to keep track of total downloads
TEMPFILE="/tmp/$(basename "$0").$$"
#Where to store regular downloads
SHOWSFOLDER="$HOME/Videos/Internet-Shows"
#where to put generated playlists and symlinks
PLAYLISTFOLDER="$HOME/Videos/Daily-Playlists"
#Don't download videos older than this
FIRSTDATE="$(date --date='-6 months' +%Y%m%d)"
#Don't download anything published after this date
LASTDATE="$(date +%Y%m%d)"
#Delete videos older than this when cleaning up
CLEANUPDATE="$(date --date='-2 weeks' +%Y%m%d)"
#Filename template (see youtube-dl docs)
FILETEMPLATE='%(playlist)s/%(upload_date)s-%(title)s.%(ext)s'
#Downloads per channel per script execution.
URLDOWNLOADS=1
#Total downloads per script execution.
MAXDOWNLOADS=20
#Check back this many videos in the playlist (can be overridden in the urls.txt file)
PLAYLISTEND=5
#see youtube-dl docs for valid format strings
FORMAT="[height<=480]/worst"
#don't download currently live videos
FILTER="!is_live"
#850M is roughly four hours of 480p video
MAXFILESIZE=1024M
#Root URL For serving via HTTP
SHOWSROOTURL="http://$(hostname)/internet-shows"


#Start all the downloadin'
mkdir -p "$SHOWSFOLDER" || exit 1
mkdir -p "$PLAYLISTFOLDER" || exit 1
echo "" > "$TEMPFILE" || exit 1
if [ "$1" != "--skip-downloads" ]; then
grep -vE '^(\s*$|#)' "$URLSFILE" | while IFS=';' read -ra LINE; do
    URL=${LINE[0]}
    TITLEFILTER=${LINE[1]}
    if [ "$TITLEFILTER" = "" ]; then
        TITLEFILTER=".*"
    fi
    URLPLAYLISTEND=${LINE[2]}
    if [ "$URLPLAYLISTEND" = "" ]; then
        URLPLAYLISTEND=$PLAYLISTEND
    fi
    URLFORMAT=${LINE[3]}
    if [ "$URLFORMAT" = "" ]; then
        URLFORMAT="$FORMAT"
    fi

    $YTDL \
        --socket-timeout 30 \
        --download-archive "$ARCHIVEFILE" \
        --dateafter "$FIRSTDATE" \
        --max-downloads $URLDOWNLOADS \
        --max-filesize $MAXFILESIZE \
        --playlist-end $URLPLAYLISTEND \
        --match-filter "$FILTER" \
        --match-title "$TITLEFILTER" \
        --output "$SHOWSFOLDER/$FILETEMPLATE" \
        --restrict-filenames \
        --format "$URLFORMAT" \
        --no-progress \
        --no-mtime \
        --ignore-errors \
        --no-overwrites \
        --continue \
        --force-ipv4 \
        "$(echo "$URL" | tr -d ' ')" | tee -a "$TEMPFILE"
        #(add --simulate --verbose to the options above when testing)

    TOTALDOWNLOADS=$(grep -o 'Download completed' "$TEMPFILE" | wc -l)
    echo "Total downloads so far: $TOTALDOWNLOADS"
    if [ $TOTALDOWNLOADS -ge $MAXDOWNLOADS ]; then
        echo "Max downloads ($MAXDOWNLOADS) reached."
        break
    fi
done
fi

#Clean up playlists - by filename
find "$PLAYLISTFOLDER" \
    -type f \
    -regextype posix-egrep -regex ".*\/[0-9]{8}[^/]*(\.m3u|\.m3u8)" \
    -exec bash -c 'fn=${0##*/}; d=${fn:0:8}; [[ $d -lt $1 ]] && echo Removing "$0" && rm "$0"' {} $CLEANUPDATE \;

#Clean up - by modified date (if no date in filename)
find "$SHOWSFOLDER" \
    -type f \
    -regextype posix-egrep -regex ".*(\.mp4|\.m4v|\.webm|\.part|\.md|\.jpeg|\.jpg|\.ytdl)" \
    ! -newermt "$CLEANUPDATE" \
    -exec bash -c 'echo Removing "$0" && rm "$0"' {} \;

#Remove Empty Directories and broken Symlinks
find "$SHOWSFOLDER" -empty -type d -delete
find -L "$PLAYLISTFOLDER" -type l -delete
find "$PLAYLISTFOLDER" -empty -type d -delete


#Build playlists
echo "Building Playlists..."
d=$CLEANUPDATE
while [ $d -le $LASTDATE ]; do
    mkdir -p "$PLAYLISTFOLDER/$d" || exit 1

    echo '#EXTM3U' > "$PLAYLISTFOLDER/$d/$d.m3u8"

    #Build .m3u8 playlist from download date (file modified time)
    find "$SHOWSFOLDER" \
        -type f \
        -iname "*.mp4" \
        -newermt "$d" ! -newermt "$(date --date="$d +1 day" +%Y%m%d)" >> "$PLAYLISTFOLDER/$d/$d.m3u8"

    #Make a symlink for each playlist entry
    #(useful if you share this folder via DLNA or SMB to your Roku)
    while read -r LINE; do
        [[ "$LINE" != "#EXTM3U" ]] && ln -s -f "$LINE" "$PLAYLISTFOLDER/$d/$(basename "$(dirname "$LINE")")-$(basename "$LINE")"
    done < "$PLAYLISTFOLDER/$d/$d.m3u8"

    #Convert to HTTP-served version of the playlist. Comment out if not accessing via a webserver
    sed -i "s#${SHOWSFOLDER}#${SHOWSROOTURL}#g" "$PLAYLISTFOLDER/$d/$d.m3u8"

    d=$(date --date="$d +1 day" +%Y%m%d)
done

[[ -f "$PLAYLISTFOLDER/$LASTDATE/$LASTDATE.m3u8" ]] && cp "$PLAYLISTFOLDER/$LASTDATE/$LASTDATE.m3u8" "$PLAYLISTFOLDER/today.m3u8"

echo "Done."

[[ -f "$TEMPFILE" ]] && rm -f "$TEMPFILE"

Modified Friday, September 30, 2022