In search of a definitive album rating formula

When it comes to my iTunes library, I’m a regular statistics nut. Sure, my library exists primarily for my own enjoyment, but it contains so much organically compiled data about my habits and tastes that I can’t help wanting to dig into it and find out what the data says about my interests.

But for a while now, I’ve struggled to quantify, tabulate and analyze my library as a whole. Which of my albums are truly the greatest? Which artists, when all their work is taken together, are really my favorites? And by how much? I want numbers.

None of the iTunes stats options available at the moment give me the type of results that I want. The Album Ranking AppleScript provides a simple average that skews toward albums with fewer tracks. SuperAnalyzer provides a top 10 list that is skewed toward albums with more tracks.

Most iTunes stats tools simply provide averages or totals of play counts and/or star ratings. Averages, while somewhat useful, can be misleading. An album could have a handful of awesome songs and a bunch of filler and still rank as well as an album that’s consistently good but lacks breakout material.

And that can be frustrating, because when it comes to judging an album or artist, I tend to value consistent performance.

Take, for example, my recent run-down of Air’s discography, specifically the albums 10000 Hz Legend and The Virgin Suicides. After many years of listening, my artistic impression is that Virgin Suicides is ever so slightly the better of the two. The songs on Legend vary from excellent to clunkers. Suicides is overall pretty good, with only one exceptional track. However, averaging my ratings shows that Suicides is a 3.85 while Legend rates as an even 4.

So, to reward albums that don’t veer wildly around the quality wheel, I’ve developed my own album rating formula that takes into account the consistency of all the star ratings on a given album.

The Formula

album rating = (mean of all song ratings + median of all song ratings) - standard deviation of those ratings

The mean sums up the album as a whole. The median shows the state of the album at its core. The standard deviation (here the sample standard deviation, which is what Excel’s STDEV function calculates) indicates how much the individual ratings vary. The result is a number on a 10-point scale; an album made up entirely of 5-star songs scores exactly 10. (Alternatively, divide that number by 2 to return the result to a 5-star scale.)
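For anyone who would rather compute the score than build a spreadsheet, here is a minimal Python sketch of the formula. It assumes that “standard deviation” means the sample standard deviation, which matches the STDEV values in the tables in this post; `album_score` is just a name chosen for illustration.

```python
from statistics import mean, median, stdev


def album_score(ratings):
    """Return (mean + median) - sample standard deviation for a list of star ratings.

    For 1-5 star ratings the result tops out at 10; halve it for a 5-star scale.
    """
    # stdev() uses the n - 1 (sample) denominator, like Excel's STDEV.
    return mean(ratings) + median(ratings) - stdev(ratings)


# Twelve consistent 4-star songs: mean 4, median 4, no spread.
score = album_score([4] * 12)
print(score, score / 2)  # 8.0 4.0
```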

Let’s take a look at the formula in action. Suppose we have two albums with twelve songs each. The first is generally excellent, but varies in quality. The second is good stuff throughout.

Song           Ex. 1   Ex. 2
1              5       4
2              4       4
3              5       4
4              2       4
5              4       4
6              5       4
7              5       4
8              2       4
9              5       4
10             3       4
11             5       4
12             3       4
Mean           4       4
Median         4.5     4
Mean + Median  8.5     8
STDEV          1.21    0
Score          7.29    8

This table shows the individual star ratings for the two theoretical albums, along with the statistics as calculated by Excel. Both albums have the same average score (4), and Ex. 1 even has a higher median than Ex. 2. But because the quality of Ex. 1’s songs varies a great deal, its standard deviation is substantial, so much so that its album rating drops to 7.29 (or 3.645 on a 5-star scale) when my formula is applied. Ex. 2 has no spread at all, so it suffers no penalty and its score remains 8 (4). In effect, the standard deviation gives Ex. 2 a bonus, relative to Ex. 1, for being of uniform quality.
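The statistics in the Ex. 1 / Ex. 2 table can be reproduced with Python’s standard `statistics` module. This is a sketch, not the spreadsheet itself; `summarize` is a hypothetical helper that returns the mean, median, their total, the sample standard deviation and the score, in the same order as the table rows.

```python
from statistics import mean, median, stdev

# Star ratings for the two theoretical twelve-song albums in the table.
ex1 = [5, 4, 5, 2, 4, 5, 5, 2, 5, 3, 5, 3]
ex2 = [4] * 12


def summarize(ratings):
    """Return (mean, median, mean + median, sample SD, score) for one album."""
    m, med = mean(ratings), median(ratings)
    sd = stdev(ratings)  # sample standard deviation, as in Excel's STDEV
    return m, med, m + med, sd, m + med - sd


for name, ratings in (("Ex. 1", ex1), ("Ex. 2", ex2)):
    print(name, [round(x, 2) for x in summarize(ratings)])
```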

Let’s take a real-world example, the two Air albums I mentioned above.

Song           10000 Hz Legend   The Virgin Suicides
1              4                 4
2              5                 4
3              4                 4
4              5                 3
5              5                 3
6              4                 4
7              3                 5
8              4                 4
9              3                 4
10             3                 4
11             4                 4
12                               4
13                               3
Mean           4                 3.85
Median         4                 4
Mean + Median  8                 7.85
STDEV          0.77              0.55
Score          7.23              7.29

When the formula is applied to my ratings for each, the scores for 10000 Hz Legend and The Virgin Suicides become 7.23 (3.62) and 7.29 (3.65), respectively. So factoring in the standard deviation produces scores that more closely reflect my opinion of those two albums.
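The same arithmetic, applied to the Air ratings from the table, is sketched below in Python. Legend starts with the higher average, but its larger spread (a sample standard deviation of about 0.77 versus 0.55) costs it more than Suicides loses on the mean.

```python
from statistics import mean, median, stdev


def album_score(ratings):
    # (mean + median) - sample standard deviation, on a 10-point scale
    return mean(ratings) + median(ratings) - stdev(ratings)


# Ratings from the Air table, in the order listed.
legend = [4, 5, 4, 5, 5, 4, 3, 4, 3, 3, 4]
suicides = [4, 4, 4, 3, 3, 4, 5, 4, 4, 4, 4, 4, 3]

for title, ratings in (("10000 Hz Legend", legend), ("The Virgin Suicides", suicides)):
    print(f"{title}: average {mean(ratings):.2f}, score {album_score(ratings):.2f}")
```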

So what does this mean? I’m not sure exactly. In practice, I could whip up some listy goodness and see which albums are truly my favorites. A comprehensive analysis would be cool; I’d love to see the distribution of my album ratings. However, that would require more programming skill than I have, though it could be a good project to help me learn.
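A library-wide ranking could start as small as the Python sketch below. It assumes the ratings have already been pulled out as (album, stars) pairs, with `None` for an unrated song; `rank_albums` and the track data are hypothetical. Albums with any unrated song are skipped, as are one-song albums, since the sample standard deviation needs at least two ratings.

```python
from collections import defaultdict
from statistics import mean, median, stdev


def album_score(ratings):
    # (mean + median) - sample standard deviation, on a 10-point scale
    return mean(ratings) + median(ratings) - stdev(ratings)


def rank_albums(tracks):
    """Rank albums by score (5-star scale), best first.

    tracks: iterable of (album, stars) pairs, where stars is None if unrated.
    """
    by_album = defaultdict(list)
    for album, stars in tracks:
        by_album[album].append(stars)

    scores = {}
    for album, ratings in by_album.items():
        # Every song must be rated, and stdev() needs at least two ratings.
        if None in ratings or len(ratings) < 2:
            continue
        scores[album] = album_score(ratings) / 2
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical library data, for illustration only.
tracks = [
    ("Album A", 5), ("Album A", 4), ("Album A", 5),
    ("Album B", 4), ("Album B", 4), ("Album B", 4),
    ("Album C", 5), ("Album C", None), ("Album C", 5),
]
for album, score in rank_albums(tracks):
    print(f"{album}: {score:.2f}")
```

Sorting the whole dictionary is fine for a personal library; the same scores could also be bucketed to look at the distribution of album ratings.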

Out of curiosity, though, I picked 11 albums just to see how they rate. One provision, of course, is that every song on an album must have a rating before the album score can be calculated. These ratings are on a 5-star scale.

Album                                 Average   My Score
Radiohead – OK Computer               4.5       4.41
Air [french band] – Moon Safari       4.5       4.39
Nirvana – Nevermind                   4.5       4.24
Mouse on Mars – Radical Connector     4.33      4.23
Ratatat – Ratatat                     4.45      3.97
Nine Inch Nails – With Teeth          4.31      3.77
The Strokes – Is This It              4.09      3.7
LCD Soundsystem – LCD Soundsystem     4         3.68
Basement Jaxx – Remedy                3.73      3.51
Prefuse 73 – One Word Extinguisher    3.82      3.47
Weezer – Make Believe                 3.58      3.21

This is by no means a top 10 list, but it is interesting to see where things ended up. It’s also interesting to see how minor fluctuations in star ratings can change the final score. For instance, if that Ratatat album had one more 5-star song in place of a 4-star song, its median would become 5 and its album score would jump to 4.51. Lower a 5-star song to 4 stars and the score drops only slightly, to 3.93. I don’t know if this is a flaw in the formula or a reward for albums that have a lot of good songs.
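The Ratatat ratings aren’t listed here, so the Python sketch below uses hypothetical ones: eleven songs, five rated 5 stars and six rated 4 stars, which averages 4.45. Those hypothetical ratings happen to produce all three scores mentioned (3.97, 4.51 and 3.93), which makes them handy for seeing where the jump comes from.

```python
from statistics import mean, median, stdev


def five_star_score(ratings):
    # (mean + median - sample standard deviation), halved to the 5-star scale
    return (mean(ratings) + median(ratings) - stdev(ratings)) / 2


base = [5] * 5 + [4] * 6      # hypothetical ratings: average 49/11, about 4.45
one_up = [5] * 6 + [4] * 5    # one 4-star song raised to 5 stars
one_down = [5] * 4 + [4] * 7  # one 5-star song lowered to 4 stars

for label, ratings in (("base", base), ("one up", one_up), ("one down", one_down)):
    print(f"{label}: median={median(ratings)} score={five_star_score(ratings):.2f}")
```

With eleven songs the median is the sixth rating in sorted order, so going from five to six 5-star songs flips it from 4 to 5, a full-star jump, while the mean moves by only 1/11 of a star. Lowering a 5-star song leaves the median at 4, so only the mean and standard deviation shift. With whole-star ratings, the median can only move in half- or whole-star steps, which is where the asymmetry comes from.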

Problems and issues

Small data sets. These are troublesome in any statistical setting, and this formula is no exception. An album with only one song still has a mean and a median (both are simply that song’s rating), but it has no sample standard deviation: Excel’s STDEV divides by n - 1, which is zero for a single value, so the formula fails with a divide-by-zero error. Also, because the formula uses the average rating as a component, albums with only a few songs will tend to skew one way or the other.
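Python’s `statistics.stdev` behaves like Excel’s STDEV here: it uses the n - 1 sample formula and raises `StatisticsError` when given fewer than two values (and `mean` raises the same error for an empty list). A minimal guard, with `album_score_or_none` as a hypothetical name, might look like this:

```python
from statistics import StatisticsError, mean, median, stdev


def album_score_or_none(ratings):
    """Return (mean + median) - sample SD, or None when it can't be computed."""
    try:
        # stdev() divides by n - 1, so it fails for a single rating.
        return mean(ratings) + median(ratings) - stdev(ratings)
    except StatisticsError:
        return None


print(album_score_or_none([5]))     # None: a one-song album can't be scored
print(album_score_or_none([5, 5]))  # 10.0
```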

In my library, Boards of Canada’s EP In A Beautiful Place Out In The Country has four fantastic songs and ranks at 4.63, higher than anything on that list above. As a release, I’d say that’s accurate, but I’m sure it doesn’t surpass OK Computer. I would be interested to see a chart of how the album score changes as the number of tracks on an album increases.

Additionally, I haven’t figured out a way to rank partial albums, i.e. albums where I either don’t own all the songs or albums where I’ve deleted songs I didn’t like. For now, I’m just excluding them altogether.

Still, I’m fairly pleased with the results I’ve been getting as I run various albums through the formula. It’s working for me and my own song rating system, but I’m curious to see how it works with someone else’s.

Fortunately, Webomatica has posted his song-by-song ratings for The Beatles’ Sgt. Pepper’s Lonely Hearts Club Band. Using his numbers, the average for the album is 4.38, while my formula renders a 4.28. I’d say that’s a consistently good album.

::

Here’s a Microsoft Excel file you can download, AlbumScore.zip. Plug in your star ratings to find the album score.

11 thoughts on “In search of a definitive album rating formula”

  1. I find I can leave it to you to analyze things in more depth and detail than I. One question I have, though, is this: when I did the Beatles’ ratings, I did it with the intent of comparison over their whole career but explicitly NOT in comparison with any other artist’s music in the collection.

    It can get very complicated: an early Radiohead song (I’m currently working on their albums) might be very good in the context of that particular album but, in the context of their whole career, not so good.

    And then if I were to compare the Beatles to Radiohead, ah, I just throw my hands up at that point.

    Maybe more than five stars would help…


    tunequest Reply:

    When I rate songs in iTunes, I basically do so in a vacuum. Each song is evaluated on its own worth, independent of its place in an artist’s repertoire or my library at large. So, in my case, there’s a level playing field for comparing artists and albums to each other.

    But I wouldn’t necessarily use an album’s score as the absolute judgement of its value, because an album is more than the sum of its ratings. The ratings, however, do provide a great starting point for discussing an album’s merits or shortcomings.

    Figuring out why Moon Safari ranks higher than Nevermind: that’s where the real insight is.


    tunequest Reply:

    Also, I just remembered that iTunes actually saves its star ratings on a scale of 100, with each star representing 20 points. If you ever export your library or a playlist to an XML or text file, you’ll see the ratings represented as 20, 40, 60, etc. iTunes even has the ability to display half stars. iPods don’t, however.

    I’ve never had a need for that fine a grain of control, but here’s a method for inputting half-star ratings and an AppleScript for setting a value between 0 and 100.
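For anyone who wants to feed an exported library into the formula instead of typing ratings in by hand, here is a minimal Python sketch using the standard `plistlib` module. It assumes the export’s layout: a top-level "Tracks" dictionary whose entries carry an "Album" key and, for rated tracks only, a "Rating" key on the 0-100 scale described above. Treat those key names as assumptions to check against your own file; `ratings_by_album` and `load_library` are hypothetical helper names.

```python
import plistlib
from collections import defaultdict


def stars_from_itunes(rating):
    # 0-100 scale, 20 points per star (so a half star is 10 points).
    return rating / 20


def ratings_by_album(library):
    """Group star ratings by album from a parsed iTunes library plist.

    Assumes a "Tracks" dict of track dicts with "Album" and, if rated, "Rating".
    Unrated tracks are recorded as None.
    """
    albums = defaultdict(list)
    for track in library.get("Tracks", {}).values():
        album = track.get("Album")
        if album is None:
            continue  # skip tracks with no album tag
        rating = track.get("Rating")
        albums[album].append(None if rating is None else stars_from_itunes(rating))
    return dict(albums)


def load_library(path):
    # path: an exported library XML file (an Apple property list).
    with open(path, "rb") as f:
        return plistlib.load(f)
```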


  2. I also rate tracks only relative to others by the same artist (or composer for classical). While I can’t really judge Radiohead versus say Philip Glass, I find it satisfying and useful to compare all Radiohead songs versus each other. Knives Out and several others from Amnesiac are currently ***** and multiple songs from The Bends are *.


    tunequest Reply:

    Yeah, I wouldn’t care to compare Radiohead and Philip Glass either, or just about anything across different genres. The criteria for judgment vary too much to make any comparison meaningful. Though the attempt might be fun, in an offbeat, tongue-in-cheek way:

    Cage Match: Radiohead vs Philip Glass. Settled once and for all!

    Seriously, I do think there is some worth to comparing artists and albums working in the same realm of music. But as I said above, the numbers are just the springboard for deeper exploration.

    Of course, the same system also works for comparing one artist to itself. In my library The Bends scores a 4.41, while Amnesiac scores a 4.10.

    Given those numbers, I’m now in a position to tell myself that as much as I appreciate Amnesiac’s artistic boundary-pushing, that album has some rough spots, whereas I think The Bends’ traditional songcraft is expertly executed.


  3. Wow, I found this thread when it was ice cold, but maybe I can still post. An algorithmic formula to qualify albums is something I have been thinking about for a looong time, with varied results, so I am curious to know where this idea might lead some people. I am a stat-hound too in regards to my music. One thing I might add is that when I have albums that are short in number of tracks, it is often because they are loooong tracks. What I like to do is take a song and adjust it by its playing time. Say you think an average song is 4 minutes: a 20-minute track would count as 5 songs in my total, grading it a piece at a time. If three ‘sections’ are great and two are slow and boring (for example), this could be a set of 5, 5, 4, 2, 2 in the totals. Just an idea. What to do with partial albums is something I would love to figure out. I have considered weighting them by percentage of total tracks owned, or percent of total play time owned, but I can never satisfy myself. Thanks for letting me squawk! Peace….


    tunequest Reply:

    Hi Mike. I’m always up for discussing album ratings, so welcome!

    For partial albums, lately I’ve been thinking that an album would be eligible for ranking if I have a certain percentage of it, say 66% or 75%. If I have 8 out of 10 songs from an album, it would get ranked normally, but with a penalty applied for incompleteness, since I obviously didn’t like the missing songs enough to keep them.

    Like you, I’ve often thought about rating songs by playing time: set a threshold for inclusion (a minimum album time of 25 minutes), then rate each song as (star rating * number of seconds) / 60. I haven’t explored that avenue much because that’s a lot of math to do by hand. I’m working to improve my programming abilities and am keeping this in mind for when I know what I’m doing.
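The playing-time idea is easy to sketch in Python, assuming each album’s tracks arrive as (stars, seconds) pairs. The sketch applies the 25-minute inclusion threshold and the (star rating * seconds) / 60 weighting from the reply; dividing the total by the album’s length in minutes, to get back to a star-like scale, is an extra step added here and not part of the proposal. `star_minutes` is a hypothetical name.

```python
def star_minutes(tracks, min_album_seconds=25 * 60):
    """Duration-weighted ratings for one album, or None if it's too short.

    tracks: list of (stars, seconds) pairs.
    Returns (per-track star-minutes, album total, duration-weighted average).
    """
    total_seconds = sum(seconds for _, seconds in tracks)
    if total_seconds < min_album_seconds:
        return None  # below the inclusion threshold
    # Each track contributes (stars * seconds) / 60, i.e. star-minutes.
    per_track = [stars * seconds / 60 for stars, seconds in tracks]
    total = sum(per_track)
    # Extra normalization (an assumption): total star-minutes per album minute.
    return per_track, total, total / (total_seconds / 60)


# Hypothetical 30-minute album: a 4-minute, a 20-minute and a 6-minute track.
print(star_minutes([(5, 240), (4, 1200), (3, 360)]))
print(star_minutes([(5, 600)]))  # None: only 10 minutes long
```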


  4. Hey, have you noticed that the Smart Playlist creation options include one called Album Rating? I have been playing with it to see how it works.

    Apparently it averages, per album, only the songs that have been rated. Does that make sense?

    Example: I rated only 1 song on Blitzen Trapper’s Furr with 5 stars and hadn’t rated the 13 others yet. When I pick the criteria “Album Rating -is greater than- ****” Furr appeared in the list. Rating just 1 other song *** caused it to disappear.

    You can choose Album Rating from the View menu but it only shows the stars. Does the XML show the actual, calculated figure? I wonder how iTunes weights the calculations….
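The behavior described in this comment can be modeled in a few lines of Python: average only the rated songs, then compare the result against 4 stars, which is 80 on iTunes’ 0-100 scale. This is a sketch of the observed behavior, not necessarily how iTunes actually calculates Album Rating; `observed_album_rating` is a hypothetical name.

```python
def observed_album_rating(ratings):
    """Average of rated songs only, on the 0-100 scale.

    ratings: 0-100 values, with None for unrated songs.
    Returns None if no song is rated.
    """
    rated = [r for r in ratings if r is not None]
    if not rated:
        return None
    return sum(rated) / len(rated)


one_rated = [100] + [None] * 13      # one 5-star song, 13 unrated
two_rated = [100, 60] + [None] * 12  # plus one 3-star song
print(observed_album_rating(one_rated) > 80)  # True: above 4 stars
print(observed_album_rating(two_rated) > 80)  # False: exactly 4 stars
```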


  5. I know this comment is coming three and a half years after the original posting, but I just wanted to say thanks for posting it! I definitely agree this favors really solid EPs, but this is still by far the best rating formula I’ve ever seen.

