wohali @wohali

Remember the BBC Sound Archive from a while back?

Here's how to download them all:

GETLIST=`curl -s bbcsfx.acropolis.org.uk/assets | awk -F '"' '{ print $2 }' |
awk -F '.' '{ print $1 }' | grep -v location`
for i in $GETLIST; do
wget bbcsfx.acropolis.org.uk/assets
done
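The GETLIST pipeline above does the real work. It pulls the first (quoted) column out of each CSV row, chops off the file extension, and drops the header row. Here is a minimal sketch of that same pipeline wrapped in a shell function. It assumes each row starts with a quoted filename such as "07076051.wav" and that the header's first column is "location". The function name `extract_ids` and the sample IDs are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read the sound-effects CSV on stdin, print one ID per line.
# Assumes each row starts with a quoted filename like "07076051.wav" and that the
# header row's first column is "location" (same assumptions as the one-liner).
extract_ids() {
  # Splitting on double quotes, field 2 is the text inside the first pair of quotes.
  awk -F '"' '{ print $2 }' |
    # Keep only what comes before the first dot, i.e. strip the extension.
    awk -F '.' '{ print $1 }' |
    # Drop the header row.
    grep -v location
}
```

Usage would be something like `curl -s "$CSV_URL" | extract_ids > ids.txt`, where `CSV_URL` is a placeholder for the full address of the .csv file.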


@wohali
Do you know if there's a mapping between filenames and descriptions available?

@js0000 Yes, that initial .csv file has all the data. Just run:

wget bbcsfx.acropolis.org.uk/assets

and open it in your favourite text editor or spreadsheet program.

Also, the WAV files already have metadata embedded in them, if your browser supports it.
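If you want that filename-to-description mapping on the command line instead of in a spreadsheet, a small awk lookup works for a quick check. This is a minimal sketch that assumes, without confirmation from this thread, that the filename is the first quoted column and the description the second. `describe_id` is a hypothetical name, and a real CSV parser is safer if descriptions can contain quotes or line breaks.

```shell
#!/bin/sh
# Hypothetical helper: describe_id ID < file.csv
# Prints the description for one ID. Assumes the first two columns are the quoted
# filename and the quoted description, with no embedded double quotes
# (this is a quick lookup, not a full CSV parser).
describe_id() {
  awk -F '"' -v id="$1" '{
    # $2 is the filename (e.g. 07076051.wav); compare the part before the first dot.
    split($2, name, ".")
    # $4 is the text inside the second pair of quotes, i.e. the description.
    if (name[1] == id) print $4
  }'
}
```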

@wohali I think if you pass that as a list to wget, you benefit from keepalives and pipelining, which might be handy given that it's a *lot* of small files.

@aschmitz Good point, care to suggest a modification? My brain hurts.

@wohali Eh, it's bad and untested, but:

GETLIST=`curl -s bbcsfx.acropolis.org.uk/assets | awk -F '"' '{ print $2 }' |
awk -F '.' '{ print $1 }' | grep -v location`
rm -f uris.txt
for i in $GETLIST; do
echo bbcsfx.acropolis.org.uk/assets >> uris.txt
done
wget -i uris.txt
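The loop above only exists to write one URL per line, so that a single `wget -i uris.txt` run can reuse its connection. Here is a sketch of that step as a filter. It assumes the files live at a base URL plus the ID plus a suffix such as .wav. `build_urls`, `BASE_URL` and the suffix are placeholders, since the full download URL isn't spelled out here. Redirecting the whole output once with `>` also means there is no stale uris.txt to delete first.

```shell
#!/bin/sh
# Hypothetical helper: build_urls BASE SUFFIX
# Turns IDs on stdin into download URLs, one per line.
# BASE is a placeholder base URL; SUFFIX (e.g. ".wav") is an assumption.
build_urls() {
  base=$1
  suffix=$2
  while IFS= read -r id; do
    # Skip blank lines so wget isn't handed an empty URL.
    if [ -n "$id" ]; then
      printf '%s/%s%s\n' "$base" "$id" "$suffix"
    fi
  done
}
```

Reusing GETLIST from the script: `printf '%s\n' $GETLIST | build_urls "$BASE_URL" .wav > uris.txt`, then `wget -i uris.txt` as before.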

In theory you could run that file through parallel or something too, but I wouldn't want to hammer the server too much.
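If you do want some parallelism without hammering the server, one option is to cap the number of concurrent wget processes and hand each one a batch of URLs so it can still reuse its connection. This swaps in `xargs -P` for parallel. It is a sketch that assumes uris.txt holds one plain URL per line, with no spaces or quotes. `fetch_batched` and the batch and job numbers are illustrative, and `-P` is a GNU/BSD xargs extension rather than POSIX.

```shell
#!/bin/sh
# Hypothetical helper: fetch_batched LIST BATCH JOBS COMMAND [ARGS...]
# Runs COMMAND with up to BATCH URLs from LIST per invocation, at most JOBS at once.
# xargs splits its input on whitespace and treats quotes specially, so URLs must be plain.
fetch_batched() {
  list=$1
  batch=$2
  njobs=$3
  shift 3
  # -n: max URLs per command; -P: max commands running in parallel (GNU/BSD extension).
  xargs -n "$batch" -P "$njobs" "$@" < "$list"
}
```

For example, `fetch_batched uris.txt 50 2 wget -q` would run at most two quiet wget processes at a time, each fetching up to 50 URLs.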

@aschmitz thanks! yeah, you wouldn't want to get auto-blocked, either... :)