Remember the BBC Sound Archive from a while back?
Here's how to download all of the files.
GETLIST=`curl -s http://bbcsfx.acropolis.org.uk/assets/BBCSoundEffects.csv | awk -F '"' '{ print $2 }' |
awk -F '.' '{ print $1 }' | grep -v location`
for i in $GETLIST; do
wget http://bbcsfx.acropolis.org.uk/assets/$i.wav
done
@wohali
do you know if there's any mapping between filename and description available?
@js0000 Yes, that initial .csv file has all the data. just run:
wget http://bbcsfx.acropolis.org.uk/assets/BBCSoundEffects.csv
and open in your favourite text editor or spreadsheet program.
also the WAV files have metadata in them already, if your browser supports it
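For anyone who wants the filename-to-description mapping on the command line rather than in a spreadsheet, here is a rough sketch. It assumes the second quoted column of the CSV holds the description and that descriptions contain no embedded double quotes; neither is confirmed in this thread, and the function name `describe` is made up for illustration.

```shell
#!/bin/sh
# Sketch only: print the description for one file name.
# Reads the CSV on stdin. Assumes field 1 is the file name (as the
# download loop above does) and field 2 is the description (an assumption).
describe() {
    awk -F '"' -v want="$1" '$2 == want { print $4 }'
}

# Example use (hits the network, so left commented out):
# curl -s http://bbcsfx.acropolis.org.uk/assets/BBCSoundEffects.csv | describe 07076051.wav
```

The file name in the commented example is a placeholder; substitute any name from the first column of the CSV.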
@wohali I think if you pass that as a list to wget, you benefit from keepalives (wget reuses the connection for consecutive URLs on the same host), which might be handy given that it's a *lot* of small files.
@aschmitz good point, care to suggest a modification? my brain hurts.
@wohali Eh, it's bad and untested but:
GETLIST=`curl -s http://bbcsfx.acropolis.org.uk/assets/BBCSoundEffects.csv | awk -F '"' '{ print $2 }' |
awk -F '.' '{ print $1 }' | grep -v location`
rm -f uris.txt
for i in $GETLIST; do
echo http://bbcsfx.acropolis.org.uk/assets/$i.wav >> uris.txt
done
wget -i uris.txt
In theory you could run that file through parallel or something too, but I wouldn't want to hammer the server too much.
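A possible variation on the list approach, as an untested-against-the-server sketch: build the URL list in one awk pass instead of a shell loop, then hand it to wget with a pause between requests. The function name `csv_to_uris` and the one-second wait are assumptions chosen for illustration; the URL and the "first quoted field is the file name" convention come from the snippets above.

```shell
#!/bin/sh
# Sketch only: same base URL as in the thread.
BASE=http://bbcsfx.acropolis.org.uk/assets

# Read the CSV on stdin and print one .wav URL per line.
# Takes the first quoted field, strips everything from the first dot,
# and skips the header row (whose first field is "location").
csv_to_uris() {
    awk -F '"' -v base="$BASE" '{
        name = $2
        sub(/\..*/, "", name)
        if (name != "" && name != "location") print base "/" name ".wav"
    }'
}

# Example use (hits the network, so left commented out):
# curl -s "$BASE/BBCSoundEffects.csv" | csv_to_uris > uris.txt
# wget --wait=1 -nc -i uris.txt
```

Here `--wait=1` pauses a second between downloads to go easy on the server, and `-nc` skips files that already exist locally, so an interrupted run can be restarted.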
@aschmitz thanks! yeah, you wouldn't want to get auto-blocked, either... :)
@wohali
running now ...
thank you!