
Nightly YouTube & Odysee Download Script

Since my internet provider is so terrible, I struggle to watch videos directly on the web. Occasionally it works, but my patience wears thin every time I try. So, of course, I wrote a script that downloads the latest videos from my favorite channels on both YouTube and Odysee in the middle of the night. Hopefully someone out there will find it useful. It uses youtube-dl (the version posted below calls the yt-dlp fork), so naturally that'll have to be installed on your system for it to work.

It took me a while to get it just right and work around youtube-dl's quirks. It's tweaked to fit my particular needs: I wanted it to check each channel for a new video, and download only the latest video unless it's already been downloaded. I wanted to be able to set a max number of downloads per night, because my ISP places a separate cap on the data used at night. I also wanted the ability to download certain channels from beginning to end, something I called archive mode. There are also times when I want to grab only videos containing a certain phrase in the title, such as "Full Podcast" or the like, so I added the ability to use a different title filter for each channel, specified in the urls.txt config file. And of course I didn't want it eating up all my hard drive space, so I added a cleanup routine that gets rid of anything older than two weeks. Depending on your usage, you may want to tweak this value in the settings (top of the script).
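
To give a feel for the cleanup idea, here is a minimal sketch that works on a scratch directory instead of the real Videos folder. The directory and file names are made up for the demo, and it assumes GNU date, touch and find; the two-week cutoff mirrors the default mentioned above.

```shell
#!/bin/bash
# Demo only: DEMO_DIR and the file names are made-up stand-ins, not real paths.
DEMO_DIR="$(mktemp -d)"

# Cutoff date in YYYYMMDD form, two weeks back (GNU date syntax).
CUTOFF="$(date --date='-2 weeks' +%Y%m%d)"

# Fake one stale and one fresh download by setting their modification times.
touch -d '3 weeks ago' "$DEMO_DIR/old-video.mp4"
touch "$DEMO_DIR/new-video.mp4"

# Remove every .mp4 whose modification time is not newer than the cutoff.
find "$DEMO_DIR" -type f -name '*.mp4' ! -newermt "$CUTOFF" -print -delete
```

Only the stale file should be printed and removed; the fresh one stays put.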

Lastly, I wanted an .m3u8 playlist to be generated every day, just for the videos that were downloaded that day. And because I share the Videos folder over the network using minidlna, allowing me to watch using the Roku media player app, I also had to make it symlink each video file into the folders where the playlists are stored. (You'll see what I mean if you try the script.)
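
Roughly, the per-day playlist and symlink step could look like the sketch below. It runs against a scratch folder, the channel and file names are invented for the demo, and it assumes GNU date and find.

```shell
#!/bin/bash
# Demo only: DEMO_ROOT, Some_Channel and episode.mp4 are invented stand-ins.
DEMO_ROOT="$(mktemp -d)"
TODAY="$(date +%Y%m%d)"
TOMORROW="$(date --date='+1 day' +%Y%m%d)"

# One fake download, plus the per-day folder that will hold the playlist.
mkdir -p "$DEMO_ROOT/Some_Channel" "$DEMO_ROOT/$TODAY"
touch "$DEMO_ROOT/Some_Channel/episode.mp4"

# Start the playlist, then append every .mp4 that was modified today.
PLAYLIST="$DEMO_ROOT/$TODAY/$TODAY.m3u8"
echo '#EXTM3U' > "$PLAYLIST"
find "$DEMO_ROOT" -type f -iname '*.mp4' \
    -newermt "$TODAY" ! -newermt "$TOMORROW" >> "$PLAYLIST"

# Symlink each listed video next to the playlist, prefixed with its folder name.
grep -v '^#' "$PLAYLIST" | while IFS= read -r FILE; do
    CHANNEL="$(basename "$(dirname "$FILE")")"
    ln -sf "$FILE" "$DEMO_ROOT/$TODAY/$CHANNEL-$(basename "$FILE")"
done
```

The symlinks are what let a DLNA client browse "today's videos" as a plain folder, without having to understand playlists at all.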

Honestly, I've found that 'browsing' internet videos this way is very satisfying. No ads, no recommended videos or other distractions, just the content I want to see, all immediately accessible -- and something new every day. Even if we ever get Starlink in our area, I am pretty sure I'll keep using this script, especially since I don't have a YouTube account.

Let's do it

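To use the script, first make a folder at $HOME/.config/download-video-subs or else tweak the $CONFIGDIR variable to point wherever you want your config to be stored. (The version posted below has no $CONFIGDIR: it keeps its files next to the script itself via $SCRIPTDIR, so adjust $URLSFILE and $ARCHIVEFILE instead. It also expects a yt-dlp executable in that same folder unless you change $YTDL.)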

Next, create a file in the $CONFIGDIR called urls.txt. The syntax should be something like this:

urls.txt

#URL[;Title Filter][;Number of playlist items to check][;Format Override]

#8-bit Guy - regular entry: just fetch the latest episode (nothing published before $FIRSTDATE), and delete it once the file is older than $CLEANUPDATE
https://www.youtube.com/channel/UC8uT9cgJorJPWu7ITLGo9Ww/videos

#Audiotree - title must contain "Full Session", check back 10 playlist items each script run rather than the default 5
https://www.youtube.com/channel/UCWjmAUHmajb1-eo5WKk_22A;Full Session;10

#Dave Smith - Part of the Problem
https://www.youtube.com/channel/UCEfe80CP2cs1eLRNQazffZw/videos

#Epic Family Road Trip - with format override (need better video quality for this channel)
https://www.youtube.com/channel/UC1Az_80tfW-1uEQBlUGXnww/videos;;;[height>=720]/best

#3Blue1Brown - Odysee channel example
https://odysee.com/@3Blue1Brown:b
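
To show how one of these lines breaks down into its fields, here is a small sketch. The helper name parse_line and the DEFAULT_* variables are made up for the illustration; the defaults themselves are copied from the settings in the script further down.

```shell
#!/bin/bash
# Hypothetical defaults, copied from the script's settings for illustration.
DEFAULT_PLAYLISTEND=5
DEFAULT_FORMAT='best[height<=720]/best[height<=1080]'

# parse_line is a made-up helper name, not something the script defines.
# It splits one urls.txt line on ';' and fills in defaults for empty fields.
parse_line() {
    local -a FIELDS
    IFS=';' read -ra FIELDS <<< "$1"
    URL="${FIELDS[0]}"
    TITLEFILTER="${FIELDS[1]:-.*}"
    URLPLAYLISTEND="${FIELDS[2]:-$DEFAULT_PLAYLISTEND}"
    URLFORMAT="${FIELDS[3]:-$DEFAULT_FORMAT}"
}

# Empty fields between semicolons fall back to the defaults.
parse_line 'https://www.youtube.com/channel/UC1Az_80tfW-1uEQBlUGXnww/videos;;;[height>=720]/best'
echo "filter=$TITLEFILTER items=$URLPLAYLISTEND format=$URLFORMAT"
```

Because the semicolons are positional, a format override on its own still needs the two empty fields in front of it, which is why the example line contains ";;;".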

Then, download the script below into ~/bin or wherever you keep your scripts, and of course chmod +x it. After doing some test runs, you can add it to your personal crontab file.
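
For the install and crontab step, something along these lines should do. The paths, the log file and the 2:30 a.m. start time are all assumptions picked for the example, so adjust them to taste.

```shell
#!/bin/bash
# Assumed locations; change these to wherever the script was actually saved.
SCRIPT="$HOME/bin/download-video-subs"
LOGFILE="$HOME/download-video-subs.log"

# Make the script executable, but only if it is already in place.
if [ -f "$SCRIPT" ]; then
    chmod +x "$SCRIPT"
fi

# Build a crontab entry that runs the script at 2:30 every night and logs
# its output. Paste the printed line into the editor opened by `crontab -e`.
CRONLINE="30 2 * * * $SCRIPT >> $LOGFILE 2>&1"
echo "$CRONLINE"
```

The five leading fields are minute, hour, day of month, month and day of week, so "30 2 * * *" means every day at 02:30.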

As always, please examine the script to understand what it does before you run it. Enjoy!

download-video-subs

#!/bin/bash

#This script will download one video for each channel in a list. See below for settings.
#At most $MAXDOWNLOADS videos are downloaded per run. Afterwards, m3u8 playlists as well as folders full of
#symlinks are generated for the videos downloaded on each day. This script is meant to be run once per day
#(e.g. in the middle of the night). Playlists older than $CLEANUPDATE are cleaned up.
#You can pass "--skip-downloads" if you just want to regenerate the playlists without downloading anything,
#and/or "--skip-playlists" to skip building the per-day playlists.

#Your urls.txt file should take the following format, one URL per line (begin comment lines with a #):
#
# #URL[;Title Filter][;Number of playlist items to check][;Format Override]

SCRIPTDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"

#YTDL=/usr/local/bin/yt-dlp
YTDL="$SCRIPTDIR/yt-dlp"
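#Bail out with a message if the downloader is missing or not executable
[[ -x "$YTDL" ]] || { echo "Downloader not found or not executable: $YTDL" >&2; exit 1; }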

SKIPDOWNLOADS=false
SKIPPLAYLISTS=false
DOWNLOADSBYDATE=false

while test $# -gt 0
do
    case "$1" in
        --skip-downloads) SKIPDOWNLOADS=true
            ;;
        --do-downloads) SKIPDOWNLOADS=false
            ;;
        --skip-playlists) SKIPPLAYLISTS=true
            ;;
        --do-playlists) SKIPPLAYLISTS=false
            ;;
        --downloads-by-date) DOWNLOADSBYDATE=true
            ;;
        --downloads-by-playlist) DOWNLOADSBYDATE=false
            ;;
        --*) echo "bad option $1"
            ;;
        *) echo "argument $1"
            ;;
    esac
    shift
done

#File with URLS to check
URLSFILE="$SCRIPTDIR/urls-sean.txt"
#Archive file which keeps track of already downloaded videos
ARCHIVEFILE="$SCRIPTDIR/archive-$HOSTNAME.txt"
#Temp file, used to keep track of total downloads
TEMPFILE="/tmp/`basename "$0"`.$$"
#Where to store regular downloads
SHOWSFOLDER="$HOME/Videos/Internet-Shows"
#where to put generated playlists and symlinks
PLAYLISTFOLDER="$SHOWSFOLDER"
#Don't download videos older than this
FIRSTDATE="`date --date='-1 month' +%Y%m%d`"
#Don't download anything published after this date
LASTDATE="`date +%Y%m%d`"
#Don't keep videos longer than this
CLEANUPDATE="`date --date='-2 weeks' +%Y%m%d`"
#Filename template (see youtube-dl docs)
if [[ $DOWNLOADSBYDATE = false ]]; then
    FILETEMPLATE="%(playlist)s/%(upload_date)s-%(title)s.%(ext)s"
else
    FILETEMPLATE="$LASTDATE/%(playlist)s-%(title)s.%(ext)s"
fi
#Downloads per channel per script execution.
URLDOWNLOADS=1
#Total downloads per script execution.
MAXDOWNLOADS=15
#Check back this many videos in the playlist (can be overridden in the urls.txt file)
PLAYLISTEND=5
#see youtube-dl docs for valid format strings
FORMAT="best[height<=720]/best[height<=1080]"
#don't download currently live videos
FILTER="!is_live"
#Skip videos larger than this (850M = roughly four hours)
MAXFILESIZE=1024M
#Root URL For serving via HTTP
SHOWSROOTURL="http://$(hostname)/internet-shows"
#File types to clean up
DOWNLOADCLEANUPFILETYPES="(\.mp4|\.m4v|\.webm|\.part|\.md|\.jpeg|\.jpg|\.ytdl|\.vtt)"
PLAYLISTCLEANUPFILETYPES="(\.m3u|\.m3u8)"


#Start all the downloadin'
if [[ $SKIPDOWNLOADS = false ]]; then
    echo "Starting Downloads..."
    echo "" > "$TEMPFILE" || exit 1
    mkdir -p "$SHOWSFOLDER" || exit 1   

    grep -vE '^(\s*$|#)' "$URLSFILE" | while IFS=';' read -ra LINE; do
        URL=${LINE[0]}
        TITLEFILTER=${LINE[1]}
        if [[ "$TITLEFILTER" = "" ]]; then
            TITLEFILTER=".*"
        fi
        URLPLAYLISTEND=${LINE[2]}
        if [[ "$URLPLAYLISTEND" = "" ]]; then
            URLPLAYLISTEND=$PLAYLISTEND
        fi
        URLFORMAT=${LINE[3]}
        if [[ "$URLFORMAT" = "" ]]; then
            URLFORMAT="$FORMAT"
        fi

        "$YTDL" \
            --socket-timeout 30 \
            --download-archive "$ARCHIVEFILE" \
            --dateafter "$FIRSTDATE" \
            --max-downloads $URLDOWNLOADS \
            --max-filesize $MAXFILESIZE \
            --playlist-end $URLPLAYLISTEND \
            --match-filter "$FILTER" \
            --match-title "$TITLEFILTER" \
            --output "$SHOWSFOLDER/$FILETEMPLATE" \
            --restrict-filenames \
            --format "$URLFORMAT" \
            --no-progress \
            --no-mtime \
            --ignore-errors \
            --no-overwrites \
            --continue \
            --force-ipv4 \
            "${URL// /}" | tee -a "$TEMPFILE"

    #       --write-sub --write-auto-sub --sub-lang "en.*" \
    #       --simulate --verbose \

        #Count finished downloads logged so far during this run
        TOTALDOWNLOADS=$(grep -c 'Download completed' "$TEMPFILE")
        echo "Total downloads so far: $TOTALDOWNLOADS"
        if [ "$TOTALDOWNLOADS" -ge "$MAXDOWNLOADS" ]; then
            echo "Max downloads ($MAXDOWNLOADS) reached."
            break
        fi
    done

    #Clean up downloads folder
    echo "Cleaning Up Old Downloads..."
    find "$SHOWSFOLDER" \
        -type f \
        -regextype posix-egrep -regex ".*$DOWNLOADCLEANUPFILETYPES" \
        ! -newermt "$CLEANUPDATE" \
        -exec bash -c 'echo Removing "$0" && rm "$0"' {} \;

    #Remove empty directories
    find "$SHOWSFOLDER" -empty -type d -delete
else
    echo "Skipped Downloads"
fi

#Build playlists
if [[ $SKIPPLAYLISTS = false ]]; then
    echo "Building Playlists..."
    mkdir -p "$PLAYLISTFOLDER" || exit 1
    if [[ $DOWNLOADSBYDATE = false ]]; then
        d=$CLEANUPDATE
    else
        d=$LASTDATE
    fi
    while [ $d -le $LASTDATE ]; do
        mkdir -p "$PLAYLISTFOLDER/$d" || exit 1

        echo '#EXTM3U' > "$PLAYLISTFOLDER/$d/$d.m3u8"

        #Build .m3u8 playlist from download date (file modified time)
        find "$SHOWSFOLDER" \
            -type f \
            -iname "*.mp4" \
            -newermt "$d" ! -newermt "`date --date="$d +1 day" +%Y%m%d`" >> "$PLAYLISTFOLDER/$d/$d.m3u8"

        #Make a symlink for each playlist entry
        #(useful if you share this folder via DLNA or SMB to your Roku,
        #and unnecessary if the downloads are grouped by date)
        if [[ $DOWNLOADSBYDATE = false ]]; then
            while IFS= read -r LINE; do
                [[ "$LINE" != "#EXTM3U" ]] && ln -s -f "$LINE" "$PLAYLISTFOLDER/$d/$(basename "$(dirname "$LINE")")-$(basename "$LINE")"
            done < "$PLAYLISTFOLDER/$d/$d.m3u8"
        fi

        #Create an HTTP-served version of the playlist.
        sed "s#${SHOWSFOLDER}#${SHOWSROOTURL}#g" "$PLAYLISTFOLDER/$d/$d.m3u8" > "$PLAYLISTFOLDER/$d/$d-http.m3u8"

        #If the downloads and playlists are in the same folder, use relative paths.
        if [[ "$SHOWSFOLDER" = "$PLAYLISTFOLDER" ]]; then
            sed -i "s#${SHOWSFOLDER}#\.\.#g" "$PLAYLISTFOLDER/$d/$d.m3u8"
        fi

        d=$(date --date="$d +1 day" +%Y%m%d)
    done

    echo "Cleaning Up Old Playlists..."
    #Clean up playlists - by filename
    find "$PLAYLISTFOLDER" \
        -type f \
        -regextype posix-egrep -regex ".*\/[0-9]{8}[^/]*$PLAYLISTCLEANUPFILETYPES" \
        -exec bash -c 'fn=${0##*/}; d=${fn:0:8}; [[ $d -lt $1 ]] && echo Removing "$0" && rm "$0"' {} $CLEANUPDATE \;

    #Clean up - by modified date (if no date in filename)
    find "$PLAYLISTFOLDER" \
        -type f \
        -regextype posix-egrep -regex ".*$PLAYLISTCLEANUPFILETYPES" \
        ! -newermt "$CLEANUPDATE" \
        -exec bash -c 'echo Removing "$0" && rm "$0"' {} \;

    #Remove Empty Directories and broken Symlinks
    find -L "$PLAYLISTFOLDER" -type l -delete
    find "$PLAYLISTFOLDER" -empty -type d -delete
else
    echo "Skipped Building Playlists"
fi

echo "Done."

[[ -f "$TEMPFILE" ]] && rm -f "$TEMPFILE"
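
The nightly cap in the script above works by counting "Download completed" lines in the captured downloader output. Here is a stripped-down sketch of just that idea; the log lines are simulated rather than produced by a real yt-dlp run, and the channel names are placeholders.

```shell
#!/bin/bash
# Demo only: LOG holds simulated downloader output, not a real yt-dlp run.
LOG="$(mktemp)"
MAXDOWNLOADS=2

# Placeholder channel names stand in for the lines of urls.txt.
for CHANNEL in channel-a channel-b channel-c; do
    # Pretend this channel produced exactly one finished download.
    echo "[download] Download completed ($CHANNEL)" >> "$LOG"

    # Count completions so far and stop once the cap is reached.
    TOTAL=$(grep -c 'Download completed' "$LOG")
    echo "Total downloads so far: $TOTAL"
    if [ "$TOTAL" -ge "$MAXDOWNLOADS" ]; then
        echo "Max downloads ($MAXDOWNLOADS) reached."
        break
    fi
done
rm -f "$LOG"
```

With a cap of 2 the loop stops after the second channel, so the third one is never checked that night.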

Modified Friday, September 30, 2022