Sometimes I interact with folks interested in digital projects that entail some form of video analysis. These noble hypothetical folk, whether they know it or not, join a quest to augment Digital Humanities discourse with a format that doesn’t get enough attention. Brave souls.
Just for starters, video data sources can be tough to gain access to. For projects of this type, Internet Archive (IA) is a shining star amidst a sea of negligible possibilities. Its content is plentiful (more than 1.9 million videos) and diverse, and some excellent tutorials make bulk downloading IA data relatively easy. A boon for video research.
Up until a couple of days ago I would not have thought to consider YouTube as a source worth anything but headaches, or perhaps a laugh at a fainting goat, but a neat command line tool called youtube-dl definitely changed that. For research purposes, if you want all video content produced by The White House, video content that matches the search query ‘Dragons’, a single video, or a custom playlist of videos along with any associated text transcripts, then youtube-dl is a game changer. Knowing about it may well make you start spinning in circles right at this very moment.
Game-changing qualities, enumerated:
- Low barrier to use – no programming chops needed to get started
- Easy to scale – download one video, download all videos from a playlist, download one/some/all videos created by a user (e.g. The White House), download video(s) that match a search query
- Manage data – impose file naming conventions on collected data derived from various components of the file (e.g. dateuploaded_user_title.mp4)
- Granular control – specify video format and quality, limit dataset size (e.g. skip any file larger than 1 GB, or stop after a set number of downloads)
- More than video – download one or all available text transcripts, extract audio from video
In what follows I’ll work through how to install youtube-dl and put some of the awesome discussed above to work.
What You Need
Brew (Homebrew) – package management system for macOS, basically makes it easier for you to install software
FFmpeg – lets you manipulate multimedia content, basically the Swiss Army knife of multimedia work
youtube-dl – command line program for downloading YouTube content
Installing FFmpeg and youtube-dl
– Open Terminal
– Enter the following commands
brew install ffmpeg
brew install youtube-dl
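Before moving on, it can help to confirm that both tools actually landed on your PATH. What follows is a minimal sketch, assuming a POSIX-style shell such as the one Terminal gives you; `check_tool` is a hypothetical helper name used here for illustration, not something Homebrew or youtube-dl provides.

```shell
# check_tool is a hypothetical helper (an assumption, not part of any tool).
# It prints whether a command is on the PATH and returns non-zero if it is not.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
    return 1
  fi
}

# Check both tools installed above and suggest the matching brew command if absent.
for tool in ffmpeg youtube-dl; do
  check_tool "$tool" || echo "  try: brew install $tool"
done
```

If either line reports "missing", rerun the corresponding brew command before continuing.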
Use Case – Building a White House dataset
You want to capture video related to the Obama Presidency. Starting with the Inaugural Address is probably as good a place as any. Maybe you want to study characteristics of video composition (video data), perform some audio analysis (audio data), and maybe even consider a text analysis of the inaugural speech (text data). Eventually you might even decide you want video produced by The White House within a certain period of time. You might also want to build a playlist related to White House coverage of Ferguson and download all of it: videos, video descriptions, audio, and subtitle text data. What follows should give you what you need to approach all of the above.
– Create a folder to contain files you capture
– Open Terminal
– In Terminal navigate to the folder you created, e.g. cd ~/Desktop/youtubedl/whitehouse
After making your way to the folder, you have a number of different ways to use youtube-dl:
Single item, default to highest quality video
youtube-dl https://www.youtube.com/watch?v=3PuHGKnboNY
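If the default highest quality is more than you need, the format options give you the granular control mentioned earlier. A rough sketch, assuming the `-F` (list formats) and `-f` (select format) options behave as described in the youtube-dl documentation; `YTDL` and the two function names are hypothetical conveniences introduced here, and the 720-pixel ceiling is only an example value.

```shell
# YTDL is a hypothetical variable (an assumption): it names the program to run.
# Set YTDL=echo to print the arguments instead of downloading anything.
YTDL="${YTDL:-youtube-dl}"

# Print the table of formats available for one video URL.
list_formats() {
  "$YTDL" -F "$1"
}

# Download the best single-file MP4 that is no taller than 720 pixels.
download_720p_mp4() {
  "$YTDL" -f 'best[ext=mp4][height<=720]' "$1"
}

# Example calls (uncomment to run):
# list_formats "https://www.youtube.com/watch?v=3PuHGKnboNY"
# download_720p_mp4 "https://www.youtube.com/watch?v=3PuHGKnboNY"
```

Listing formats first is worthwhile because not every video offers every container and resolution.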
Single item, with file naming conventions imposed
youtube-dl --restrict-filenames -o "%(upload_date)s.%(uploader)s.%(title)s.%(ext)s" https://www.youtube.com/watch?v=3PuHGKnboNY
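When you are experimenting with naming conventions, it can be handy to see what a template produces before committing to a download. The sketch below leans on youtube-dl's `--get-filename` option, which prints the resulting filename rather than fetching the video (it still needs a network connection to read the video's metadata). `YTDL`, `TEMPLATE`, and `preview_filename` are hypothetical names chosen for this example.

```shell
# YTDL is a hypothetical variable (an assumption): it names the program to run.
# Set YTDL=echo to print the arguments instead of contacting YouTube.
YTDL="${YTDL:-youtube-dl}"

# The same naming convention used in the command above.
TEMPLATE='%(upload_date)s.%(uploader)s.%(title)s.%(ext)s'

# Print the filename the template would produce for one video URL.
preview_filename() {
  "$YTDL" --restrict-filenames --get-filename -o "$TEMPLATE" "$1"
}

# Example call (uncomment to run):
# preview_filename "https://www.youtube.com/watch?v=3PuHGKnboNY"
```

Adjust `TEMPLATE` and rerun until the printed name matches the convention you want for the dataset.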
Single item, extract audio
youtube-dl --restrict-filenames --extract-audio --audio-format "mp3" -o "%(upload_date)s.%(uploader)s.%(playlist)s.%(title)s.%(ext)s" https://www.youtube.com/watch?v=3PuHGKnboNY
Single item, extract subtitles
youtube-dl --restrict-filenames --all-subs -o "%(upload_date)s.%(uploader)s.%(playlist)s.%(title)s.%(ext)s" https://www.youtube.com/watch?v=3PuHGKnboNY
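For a text-only pass you may not want the video file at all. Here is a hedged sketch that first lists the subtitle tracks on offer and then saves only the English ones, skipping the video download. It assumes the `--list-subs`, `--skip-download`, `--write-sub`, `--write-auto-sub`, and `--sub-lang` options as documented by youtube-dl; `YTDL` and the function names are hypothetical, and `en` is just an example language code.

```shell
# YTDL is a hypothetical variable (an assumption): it names the program to run.
# Set YTDL=echo to print the arguments instead of contacting YouTube.
YTDL="${YTDL:-youtube-dl}"

# Show which subtitle languages exist for one video URL.
list_subtitles() {
  "$YTDL" --list-subs "$1"
}

# Save English subtitle files only. --skip-download leaves the video itself alone;
# --write-auto-sub additionally requests YouTube's automatically generated captions.
fetch_english_subs() {
  "$YTDL" --skip-download --write-sub --write-auto-sub --sub-lang en \
    --restrict-filenames \
    -o '%(upload_date)s.%(uploader)s.%(title)s.%(ext)s' "$1"
}

# Example calls (uncomment to run):
# list_subtitles "https://www.youtube.com/watch?v=3PuHGKnboNY"
# fetch_english_subs "https://www.youtube.com/watch?v=3PuHGKnboNY"
```

Automatically generated captions vary in quality, so it is worth noting in your project documentation which kind of track each transcript came from.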
Multiple items, download content from user between dates
youtube-dl --dateafter 20150101 --datebefore 20150107 --restrict-filenames -o "%(upload_date)s.%(uploader)s.%(playlist)s.%(title)s.%(ext)s" https://www.youtube.com/user/whitehouse
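Pulling from a whole channel can grow quickly, so it may be worth capping the run and making it resumable. The sketch below adds three youtube-dl options to the date-range command: `--max-downloads` (stop after a number of files), `--max-filesize` (skip files above a size), and `--download-archive` (remember what has already been fetched). The limits of 25 files and 1G, the archive filename `downloaded.txt`, and the names `YTDL` and `fetch_channel_window` are all assumptions made for illustration.

```shell
# YTDL is a hypothetical variable (an assumption): it names the program to run.
# Set YTDL=echo to print the arguments instead of downloading anything.
YTDL="${YTDL:-youtube-dl}"

# Usage: fetch_channel_window URL AFTER BEFORE   (dates written as YYYYMMDD)
# Stops after 25 files, skips any file over 1G, and records finished video IDs
# in downloaded.txt so a rerun does not fetch the same video twice.
fetch_channel_window() {
  "$YTDL" --dateafter "$2" --datebefore "$3" \
    --max-downloads 25 --max-filesize 1G \
    --download-archive downloaded.txt \
    --restrict-filenames \
    -o '%(upload_date)s.%(uploader)s.%(title)s.%(ext)s' "$1"
}

# Example call (uncomment to run):
# fetch_channel_window "https://www.youtube.com/user/whitehouse" 20150101 20150107
```

Note that these limits apply per file and per count of files; this sketch does not enforce a total size for the whole dataset.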
Multiple items, search query ‘obama and ferguson’ – five videos
youtube-dl --restrict-filenames "ytsearch5:obama and ferguson"
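The number after `ytsearch` sets how many results to fetch, so the search can be wrapped in a small function that takes the count and the query as arguments. This is only a sketch; `YTDL` and `search_download` are hypothetical names, not part of youtube-dl.

```shell
# YTDL is a hypothetical variable (an assumption): it names the program to run.
# Set YTDL=echo to print the arguments instead of downloading anything.
YTDL="${YTDL:-youtube-dl}"

# Usage: search_download COUNT QUERY WORDS...
# Builds a "ytsearchCOUNT:query" pseudo-URL and hands it to youtube-dl.
search_download() {
  count="$1"
  shift
  "$YTDL" --restrict-filenames "ytsearch${count}:$*"
}

# Example call (uncomment to run): the five-video search from above.
# search_download 5 obama and ferguson
```

Search results change over time, so a query run today may not return the same five videos next month; keeping a record of the video IDs you collected helps with reproducibility.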
Multiple items, build a playlist – download video, video descriptions, audio, and subtitles
youtube-dl --restrict-filenames --write-description --extract-audio --audio-format "mp3" -k --all-subs -o "%(upload_date)s.%(uploader)s.%(playlist)s.%(title)s.%(ext)s" https://www.youtube.com/playlist?list=PLf7yYLO8w1_lSVBqeZmp17dy7kvql6qTP
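Before downloading an entire playlist, you might want to survey it first by collecting only metadata. The following sketch assumes youtube-dl's `--skip-download`, `--write-info-json`, and `--write-description` options, and groups the output into a folder named after the playlist using the `%(playlist)s` and `%(playlist_index)s` template fields. `YTDL` and `survey_playlist` are hypothetical names introduced for this example.

```shell
# YTDL is a hypothetical variable (an assumption): it names the program to run.
# Set YTDL=echo to print the arguments instead of contacting YouTube.
YTDL="${YTDL:-youtube-dl}"

# Write a .info.json and a .description file for each playlist entry, without
# downloading any media. Files land in a folder named after the playlist.
survey_playlist() {
  "$YTDL" --skip-download --write-info-json --write-description \
    --restrict-filenames \
    -o '%(playlist)s/%(playlist_index)s.%(title)s.%(ext)s' "$1"
}

# Example call (uncomment to run):
# survey_playlist "https://www.youtube.com/playlist?list=PLf7yYLO8w1_lSVBqeZmp17dy7kvql6qTP"
```

The JSON files give you titles, upload dates, and durations to review, which can help you decide what to download in full.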
And there you have it: YouTube data for research. After working through the above, consider some of youtube-dl’s more advanced features.