Friday, August 29, 2008

The Music Project

I was talking to my friend Brian Goetz recently, and he reminded me of a blog entry he posted a while back. He's digitizing his entire music collection, and he's done all the research. This is appealing to me because I recently switched over to zero reliance on the shiny plastic disks once I've ripped them. All my music consumption is now electronic, mostly via iPod (the car was the last frontier, and now that's switched over as well). I have a large-ish CD collection (~1500 CDs), so translating them to electronic form is no small feat. And I only have to do it once for the rest of my life.

Based on my conversations with Brian, I realized the following:
  • Ripping and encoding are completely separate activities
  • The most important thing is to get a reliable, error-corrected rip
  • You don't want to get stuck with a proprietary format (as much as I love Apple, I don't want my music collection tied up in one of their formats forever)
  • Lossless is the way to go, so that you don't lose any data in this process
  • Cross-encoding allows you to rip your music in a "core" format and selectively support proprietary formats and reduced file sizes (and lossiness)
His original entry is about his Windows- and Linux-based infrastructure. I replicated his setup (with minor differences) on the Mac.

A Home

First, you need a home for all these files. I recently got a network attached storage device to hold our growing collection of digital photos. I'm my family's tech support, and if anything ever happens to those photos, I'm a dead man! After doing a lot of research on quality and Mac compatibility (my household is 100% Mac now), I ended up with the NETGEAR RND4250 ReadyNAS, which comes with 500 GB of storage (2 mirrored 500 GB drives), with a capacity of 2 TB. I ended up bumping it up to a TB when I realized how big the combination of music + pictures would be, and the Netgear handled it beautifully.

Ripping

I didn't realize this, but most rippers don't take advantage of the error correction bits on the CD, and they are wimps: they give up way too easily. The trick is to find a ripper that is relentless and tries its hardest to get all the bits off the CD. The ripper/encoder I ended up using is Max, which supports Leopard in its latest (allegedly) unstable form, and previous Mac OS X versions in its previous (stable) release. I caveat "unstable" because I used it a lot and it was rock solid for me on Leopard. One of the nice things about Max is its support for different rippers: fast & carefree, or paranoid. The latter is the one I want, based on the cdparanoia project. You can configure this ripper to never give up until it gets a clean read of the disk. Several of my CDs trundled for 6-10 hours before Max finally reported success, including a couple that I'd given up for dead.
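
To make the "never give up" idea concrete, here is a toy sketch in Ruby. It is not cdparanoia's actual algorithm and has nothing to do with how Max is implemented; it only illustrates the general approach of re-reading until two consecutive reads agree. The method name `read_until_stable`, the `max_attempts` option, and the simulated reads are all invented for illustration.

```ruby
# Toy illustration only: keep calling the block (a stand-in for "read this
# chunk of the disc") until two consecutive reads return identical data.
# With max_attempts left as nil it never gives up.
def read_until_stable(max_attempts: nil)
  previous = nil
  attempts = 0
  loop do
    attempts += 1
    data = yield
    return [data, attempts] if data == previous
    previous = data
    raise "gave up after #{attempts} reads" if max_attempts && attempts >= max_attempts
  end
end

# Simulated flaky drive: the first two reads are corrupted, later ones are clean.
reads = ["gar\x00ble", "garb\x01e", "clean", "clean"].each
data, attempts = read_until_stable { reads.next }
puts "stable after #{attempts} reads"
```

A "fast & carefree" ripper corresponds roughly to accepting the first read; the paranoid setting trades time for confidence, which is why a scratched disc can take hours.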

Encoding

OK, so now that I have a good rip, I need to encode it (this post makes it sound like separate steps, but Max handles both for you). As I stated earlier, I don't want to get trapped by a specific format. I ended up (like Brian) choosing FLAC. FLAC is an open standard encoding for music that offers lossless compression, which is what I wanted. The FLAC spec also allows for more aggressive compression without loss of data, depending on your patience. It's designed to take longer to encode but have no impact on playback time. I chose the most aggressive because I have time, hardware (a MacPro with 4 processors), and I want to conserve space if possible. But, iTunes (which is how I play and sync my music) doesn't support FLAC. Max to the rescue: it will let you do parallel encoding. I set Max up to encode the ripped music files to both lossless FLAC and lossless MP4 (Apple's format). The only downside is that it won't allow you to choose different directories for the encoding. The FLAC files I'm placing on a RAID mirrored network-attached storage drive (remember, I never want to do this again!). So, I ended up writing a little Rake file to handle automatically moving the files from one place to another. I rip them all to the RAID drive, then let the script move them (preserving directories) to the other. The script is here, if anyone wants it (no warranty expressed or implied -- you'll have to change all the directories, and if you use this to erase your hard drive I'll shed a tear for you, but might just laugh).
require 'fileutils'

# Placeholder: set DEST to wherever the M4A files should end up.
DEST = "/path/to/m4a/destination"

# Copy every M4A under the current directory to DEST/artist/album,
# skipping files that already exist there.
task :copy do
  count = 0
  skipped = 0
  FileList["**/*.m4a"].each do |f|
    artist, album = recording_info_based_on f
    album_dir = "#{DEST}/#{artist}/#{album}"
    if File.exist? "#{album_dir}/#{File.basename(f)}"
      puts "\tsomething is amiss; I'm skipping: #{f}"
      skipped += 1
    else
      FileUtils.mkdir_p album_dir
      puts "#{artist} - #{album} - #{File.basename(f)}"
      count += 1
      FileUtils.cp f, album_dir
    end
  end
  puts "copied #{count} files\nskipped #{skipped} files"
end

# Artist and album are the last two directories in the file's full path.
def recording_info_based_on filename
  File.expand_path(filename) =~ /.*\/(.*)\/(.*)\/.*/
  return $1, $2
end
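
The helper above keeps the last two directory names of the expanded path as artist and album. A minimal sketch of the same idea without a regular expression, assuming the `.../Artist/Album/track` layout the script relies on; the name `artist_and_album` and the sample path are made up for illustration and are not part of the original Rakefile:

```ruby
# Hypothetical alternative to recording_info_based_on: derive artist and album
# from a path laid out as .../Artist/Album/track.ext
def artist_and_album(filename)
  album_dir = File.dirname(File.expand_path(filename))
  [File.basename(File.dirname(album_dir)), File.basename(album_dir)]
end

# Made-up example path; the artist is the grandparent directory, the album the parent.
artist, album = artist_and_album("/Volumes/music/Miles Davis/Kind of Blue/01 So What.m4a")
puts "#{artist} - #{album}"
```

Either version breaks if a rip lands somewhere other than two directory levels below the collection root, so it is worth spot-checking a few paths before copying in bulk.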

I also made a rake task that reports any FLAC file in the original directories that has no matching M4A file in the destination (just in case something went amiss during a copy, or I screwed up and deleted something by mistake). I want to make sure that the convenience Apple-format files match the canonical source (FLAC) files. So, this is the "missing" rake task:

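# Report every FLAC under the current directory with no matching M4A in DEST.
task :report_missings do
  count = 0
  FileList["**/*.flac"].each do |f|
    artist, album = recording_info_based_on f
    # Swap only the extension, not a "flac" that appears elsewhere in the name.
    dest_file_name = File.basename(f).sub(/\.flac\z/, ".m4a")
    unless File.exist? "#{DEST}/#{artist}/#{album}/#{dest_file_name}"
      puts "missing #{f.sub(/\.flac\z/, '')}"
      count += 1
    end
  end
  puts "found #{count} missing files"
end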
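
The same check can be tried outside Rake on a throwaway directory tree before pointing it at a real collection. This is a sketch under assumptions: `missing_m4a` is a hypothetical stand-alone function, it expects both roots to contain `Artist/Album` directories directly, and the file names are invented.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical stand-alone version of the check: list every .flac under
# src_root that has no .m4a at the same relative path under dest_root.
def missing_m4a(src_root, dest_root)
  Dir.glob("**/*.flac", base: src_root).sort.reject do |rel|
    File.exist?(File.join(dest_root, rel.sub(/\.flac\z/, ".m4a")))
  end
end

# Build a tiny fake collection where one track has no M4A twin.
Dir.mktmpdir do |tmp|
  src  = File.join(tmp, "flac")
  dest = File.join(tmp, "m4a")
  FileUtils.mkdir_p(File.join(src, "Artist", "Album"))
  FileUtils.mkdir_p(File.join(dest, "Artist", "Album"))
  FileUtils.touch(File.join(src, "Artist", "Album", "01 One.flac"))
  FileUtils.touch(File.join(src, "Artist", "Album", "02 Two.flac"))
  FileUtils.touch(File.join(dest, "Artist", "Album", "01 One.m4a"))
  puts missing_m4a(src, dest)   # lists only the track without an M4A twin
end
```

Note that this only compares file names. It says nothing about whether the M4A file is complete or plays correctly.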

Result

It took me about 2 months of ripping whenever I was around my computer, running 2 computers (my laptop and desktop) in parallel. In the end, though, I ended up with 453 GB of music files, the FLAC ones safely tucked away on a mirrored drive and the M4A ones on my desktop, ready to be synced to my iPod (or a subset of them, anyway). Now, when I get a new CD, I rip it using Max to the NAS and either copy the files by hand (if it's just one CD) or use the Rake file to move lots en masse. Storage is now dirt cheap, and I've leveraged almost a terabyte of it keeping the music files in 2 formats. I also recently bought a portable 500 GB drive so that I can keep all my music with me on the road. It's a copy of the desktop M4A files, but it's easy just to mirror the Music directory from the desktop to the portable drive.

I achieved my goal: an open archival format that I hope will be around for a very long time, and a convenience version for the way I happen to consume them today. And the shiny disks? I put them all in binders, so that if I ever need one of them (or its sleeve), I can rummage around in the (mostly) alphabetical CD volumes. I didn't put a huge amount of effort into creating expandable storage that makes it easy to keep them in strict order because that would take lots of effort and it isn't something I expect to have to do often. If it turns out I go back to them all the time, I'll invest the time then.

12 comments:

Paul said...

A portable 500 TB drive? I wish this was a reality :D

Matt said...

This is a great post. I follow a similar process - I don't buy lossy digital music (well, there are some exceptions), I rip/encode to lossless FLAC, and use a BASH script to search through my FLAC collection and re-encode to lossy M4A for my iPod. (This is all on Linux). I don't have NAS, and wouldn't trust it myself anyway (what happens if the house burns down?). Instead, I back up my files to Amazon S3 via JungleDisk. I wasn't aware of FLAC supporting lossy compression, nor M4A supporting lossless compression, so I'm going to have to do some research on that now...

Matt said...

You know, now that I've looked at the FLAC documentation again, I think the statement "FLAC is a open standard encoding for music that offers lots of levels of lossiness (including lossless, which is what I wanted)" is misleading. According to http://flac.sourceforge.net/features.html , "...FLAC is intended for lossless compression only, as there are many good lossy formats already..."

Neal Ford said...

Fixed errors: 500 GB, not TB (I wish that was true as well).

FLAC is in fact entirely lossless.

Thanks for the corrections!

glv said...

This comment has been removed by the author.

glv said...

(Sorry about that last comment ... 1Password betrayed me.)

Thanks, Neal ... great info.

It's really a shame Apple won't open the Apple Lossless spec. There are good reasons for the format to exist -- FLAC isn't great for streaming to underpowered devices like Airport Express, whereas Apple Lossless is less symmetric in its CPU demands. And I can't think of a good reason to keep it proprietary. But your solution is a great practical compromise.

Giles Bowkett said...

Blatant self-promotion: I did something similar with Rails and an iPod last year.

http://gilesbowkett.blogspot.com/2007/12/back-up-your-ipod-with-rails.html

breun said...

Everyone calls MP4/AAC an Apple format, but it's not. It's just the successor to MP3: http://en.wikipedia.org/wiki/Advanced_Audio_Coding It was not created by Apple, Apple just added support for it in iTunes, since it's a better format than MP3. There are also open source encoders and decoders for AAC if that's important to you.

I also don't quite get why you didn't go with Apple Lossless. It's not an open standard, but again: there are open source libraries that can read these files, you can play Apple Lossless in iTunes and you're not trapped as you can always transcode to another lossless or lossy format in the future.

John J. said...

Great article, thanks. But I have one question: How did you handle the metadata of your CDs? How did they survive the several encoding processes?

Neal Ford said...

The magic of cue sheets kept all the meta-data for the entire trip. All the meta-data's intact. In fact, Max downloads track information automatically for lots of stuff, and I added it myself for the really obscure stuff. There is some magic about how FLAC (and others) handle cue sheets, but I never delved into it because it all Just Worked.

Brian said...

I've released my (newly rewritten) scripts for this; see: http://www.briangoetz.com/blog/?p=92

Brian said...

The next challenge -- what's your offsite backup strategy? Cheap services like Mozy Home won't back up NAS drives.