
Kobra's Corner - Rants, Editorials, and Other Bullshit


Weighted Bayesian Rating System

Overview

While the Bayesian formula is preferable to a simple average when calculating the value of user-rated content on a website, combining it with an algorithm that weights each user's votes by a ratio (the user's number of votes divided by the average number of votes per user) yields a more accurate measure of the importance or quality of the content, as judged by the users.

The Bayesian Rating

The Bayesian rating is a formula used by statisticians and web developers to obtain a more accurate rating from votes provided by the users. The formula is:
W = (v / (v+m) ) * R + (m / (v + m)) * C

Where:
W = Weighted Rating
v = Number of votes
m = Minimum number of votes (typically the minimum required to appear in a Top 100 list)
R = The average score of the item being rated
C = The average vote across the entire dataset.
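The formula translates directly into a small PHP function, written in the same language as the example script later in this article. This is a minimal sketch: the name bayesian_rating() and its parameter order are illustrative assumptions, not part of any library.

```php
<?php
// Hypothetical helper: the Bayesian rating W for one item.
// $R = the item's average score, $v = its number of votes,
// $m = the minimum-votes constant, $C = the average vote across the dataset.
function bayesian_rating(float $R, int $v, int $m, float $C): float
{
    if ($v + $m === 0) {
        // Both weights would divide by zero; there is nothing to rate.
        throw new InvalidArgumentException("v + m must be greater than zero");
    }
    $v_weight = $v / ($v + $m); // v / (v + m)
    $m_weight = $m / ($v + $m); // m / (v + m)
    return $v_weight * $R + $m_weight * $C;
}

// An item averaging 9.5 over 4 votes, with m = 3 and C = 8.07:
echo round(bayesian_rating(9.5, 4, 3, 8.07), 2), "\n"; // prints 8.89
```

The two weights always sum to 1, so W is a blend of R and C that leans toward R as v grows.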

The Bayesian rating is superior to a simple average because it pulls each item's score toward C, the dataset-wide average, and pulls hardest on items with few votes. Here's an example of a dataset with a total of 15 votes:
Dataset
Title | Score (R) | Number of Votes (v) | Final Bayesian Score*
I am better than your kids. | 9.5 | 4 | 8.89
Hundreds of Proofs of God's existence. | 7.54 | 11 | 7.65
* Assume m = 3
(In the example, I gave the first article two ratings of 9 and two ratings of 10. The second received eight 10s and three 1s.)
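As a check, the sketch below recomputes the example from the individual votes described above. It assumes, as the example's numbers imply, that C is the plain mean of all 15 votes (121 / 15 ≈ 8.07), and it reuses a hypothetical bayesian_rating() helper. Carrying full precision gives about 8.89 for the first title and about 7.66 for the second; the table's 7.65 comes from truncating R to 7.54 (the exact mean is 83 / 11 ≈ 7.545).

```php
<?php
// Hypothetical helper: the Bayesian rating W (same formula as above).
function bayesian_rating(float $R, int $v, int $m, float $C): float
{
    return ($v / ($v + $m)) * $R + ($m / ($v + $m)) * $C;
}

// Individual votes: two 9s and two 10s; eight 10s and three 1s.
$items = [
    "I am better than your kids." => [9, 9, 10, 10],
    "Hundreds of Proofs of God's existence." => [10, 10, 10, 10, 10, 10, 10, 10, 1, 1, 1],
];
$m = 3;

// C: the mean of every vote in the dataset.
$all_votes = array_merge(...array_values($items));
$C = array_sum($all_votes) / count($all_votes);
printf("C = %.2f\n", $C);

$scores = [];
foreach ($items as $title => $votes) {
    $R = array_sum($votes) / count($votes); // simple average for this item
    $v = count($votes);
    $scores[$title] = bayesian_rating($R, $v, $m, $C);
    printf("%s\n  R = %.2f, v = %d, W = %.2f\n", $title, $R, $v, $scores[$title]);
}
```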

The two items are adjusted by different amounts because they received different numbers of votes: the more votes a piece of content receives, the less the second term of the equation contributes. C (which works out to 8.07 here) drags high scores down and low scores up. (With m > 0 and v = 0, W always equals C. With m = 0 and v > 0, W always equals R. If both m and v = 0, the formula divides by zero.)
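To see the limiting behaviour described above, the sketch below holds R and C fixed and increases v. The helper name is again illustrative, and the v = m = 0 case is rejected with an exception instead of dividing by zero.

```php
<?php
// Hypothetical helper: the Bayesian rating W, guarding the v = m = 0 case.
function bayesian_rating(float $R, int $v, int $m, float $C): float
{
    if ($v + $m === 0) {
        throw new InvalidArgumentException("v + m must be greater than zero");
    }
    return ($v / ($v + $m)) * $R + ($m / ($v + $m)) * $C;
}

$R = 10.0; // a perfect average score
$C = 8.07; // dataset average from the example above
$m = 3;

// With v = 0, W equals C; as v grows, W approaches R.
foreach ([0, 1, 3, 10, 100, 1000] as $v) {
    printf("v = %4d  W = %.3f\n", $v, bayesian_rating($R, $v, $m, $C));
}

// With m = 0 there is no correction at all: W equals R.
printf("m = 0     W = %.3f\n", bayesian_rating($R, 5, 0, $C));
```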

The Weighted Bayesian

The weakness of the Bayesian rating lies in the variable R. R is a simple unweighted average of all the users' votes. In theory, someone attempting to push a piece of content into the #1 spot on the Top 100 list of a website that uses a Bayesian rating system needs only to fabricate a few user accounts and rate it 10 out of 10. My algorithm, called the Weighted Bayesian, combines the Bayesian rating system with an algorithm that weights each user's votes by a ratio: the number of votes that user has cast divided by the number of votes cast by the average user.

For example, user PZ has 10 votes, while the average number of votes is 5. Therefore, his vote is weighted as 2 votes. Another user, Behe, has only 1 vote, so his vote is weighted as 0.2 votes.
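The weighting step can be sketched in a few lines of PHP. The function names vote_ratio() and weighted_mean() are hypothetical, and the second half uses made-up numbers: two established users (10 votes each) rate an item 6, while three new accounts (1 vote each) rate it 10, with an average of 5 votes per user. The plain mean is 8.4; the weighted mean is about 6.52, because the new accounts count for only 0.2 votes each.

```php
<?php
// Hypothetical helper: vote ratio = the user's vote count / average votes per user.
function vote_ratio(int $user_votes, float $avg_votes_per_user): float
{
    if ($avg_votes_per_user <= 0.0) {
        throw new InvalidArgumentException("average votes per user must be positive");
    }
    return $user_votes / $avg_votes_per_user;
}

// Hypothetical helper: weighted average score, used in place of R.
// Each element of $votes is [score, ratio].
function weighted_mean(array $votes): float
{
    $acc_score = 0.0; // sum of score * ratio
    $acc_ratio = 0.0; // sum of ratios
    foreach ($votes as [$score, $ratio]) {
        $acc_score += $score * $ratio;
        $acc_ratio += $ratio;
    }
    if ($acc_ratio <= 0.0) {
        throw new InvalidArgumentException("no weighted votes to average");
    }
    return $acc_score / $acc_ratio;
}

$avg = 5.0;
echo vote_ratio(10, $avg), "\n"; // PZ: 10 votes
echo vote_ratio(1, $avg), "\n";  // Behe: 1 vote

// Made-up example: two established users rate 6, three new accounts rate 10.
$votes = [
    [6, vote_ratio(10, $avg)],
    [6, vote_ratio(10, $avg)],
    [10, vote_ratio(1, $avg)],
    [10, vote_ratio(1, $avg)],
    [10, vote_ratio(1, $avg)],
];
printf("plain mean: %.2f\n", array_sum(array_column($votes, 0)) / count($votes));
printf("weighted mean: %.2f\n", weighted_mean($votes));
```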

An example of this system in PHP is as follows:

<?php
/*
This part below calculates the vote ratio of each user, and stores it in a field in the members SQL table (vote_ratio).
It assumes $pdo is an open PDO connection to the site's MySQL database, e.g. (placeholder credentials):
$pdo = new PDO("mysql:host=localhost;dbname=site", $db_user, $db_pass, [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
*/

$total_num_votes = (int) $pdo->query("SELECT COUNT(id) FROM votes")->fetchColumn();
if ($total_num_votes < 1)
{
    $total_num_votes = 1; // Prevent division by zero.
}
$num_members = (int) $pdo->query("SELECT COUNT(id) FROM members")->fetchColumn();
$avg_num_votes = $total_num_votes / max($num_members, 1); // Average number of votes per member.

// Prepared statements keep member IDs out of the SQL string.
$count_votes = $pdo->prepare("SELECT COUNT(id) FROM votes WHERE member = ?");
$update_ratio = $pdo->prepare("UPDATE members SET vote_ratio = ? WHERE id = ?");
foreach ($pdo->query("SELECT id FROM members")->fetchAll(PDO::FETCH_COLUMN) as $member_id)
{
    $count_votes->execute([$member_id]);
    $member_vote_ratio = (int) $count_votes->fetchColumn() / $avg_num_votes;
    $count_votes->closeCursor();
    $update_ratio->execute([$member_vote_ratio, $member_id]);
}

/*
This part calculates the raw score (used in place of the average in variable R) for the content item whose ID is in $id.
It also counts the item's votes, which the formula below uses as variable v.
*/

$sql = $pdo->prepare("SELECT v.score AS score, m.vote_ratio AS ratio FROM votes v JOIN members m ON m.id = v.member WHERE v.content = ?");
$sql->execute([$id]);
$acc_score = 0.0; // Score accumulation.
$acc_ratio = 0.0; // Vote ratio accumulation.
$votes = 0;       // Number of votes (variable v).
foreach ($sql as $s)
{
    $acc_score += ($s["score"] * $s["ratio"]);
    $acc_ratio += $s["ratio"];
    $votes++;
}
$rawscore = ($acc_ratio > 0) ? $acc_score / $acc_ratio : 0.0; // Raw Score (0 if nobody has voted yet).

/*
With the weighted score, the Bayesian formula is easy.
*/

$score = (round($rawscore, 2) * ($votes / ($votes + TOPXMIN))) + (AVERAGE_VOTE * (TOPXMIN / ($votes + TOPXMIN)));
// Note: TOPXMIN is the variable m. AVERAGE_VOTE is the variable C, and these are both easy enough to calculate so I'm not going to include them in this example.
?>

Note to programmers: I threw that script together as an example. My goal was functionality, not optimization.

If you're not a computer programmer, perhaps another dataset is in order. But this time, we're going to need three tables. (You'll see why.)
Dataset: "members"
ID | Username | Vote Ratio
1 | voodooKobra | 1.3333
2 | Omen | 0.8888
3 | B.Nott | 1.3333
4 | PZ Myers | 0.8888
5 | BillDonohue | 0.4444
6 | Owlmirror | 1.3333
7 | Rev.BigDumbChimp | 0.8888
8 | iLostTheGame | 0.8888

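Dataset: "votes"
ID | Content | Member | Score
1 | 1 | 1 | 10
2 | 1 | 2 | 10
3 | 1 | 3 | 10
4 | 1 | 4 | 10
5 | 1 | 5 | 1
6 | 1 | 6 | 10
7 | 1 | 7 | 10
8 | 1 | 8 | 10
9 | 2 | 1 | 10
10 | 2 | 2 | 9
11 | 2 | 3 | 10
12 | 2 | 6 | 8
13 | 2 | 7 | 8
14 | 2 | 8 | 10
15 | 3 | 1 | 10
16 | 3 | 3 | 10
17 | 3 | 4 | 10
18 | 3 | 6 | 10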

Dataset: "content"
ID | Title | Pre-Bayesian Score (weighted R) | Final Score
1 | The Great Desecration | 9.50 | 9.42
2 | People Who are Holding Back the Evolution of our Species. | 9.20 | 9.21
3 | Life-Saving Testing Banned - Mad Cow Screening "Inconsistent" with U.S.D.A. Agenda | 10.00 | 9.67

As in the previous example, variable m is equal to 3. Variable C (the plain average of all 18 votes) worked out to about 9.22. It's worth mentioning that the second article's score went up by 0.01, while the other two decreased: its weighted score (9.20) is below C, so the formula pulls it up, while the other two are above C and are pulled down. With more votes in the database, the user BillDonohue's 1/10 rating would be insignificant.
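The whole pipeline can be checked without a database. The sketch below loads the "votes" table into a plain PHP array (an assumption made so it runs standalone; the script above reads from MySQL), computes each member's vote ratio, the weighted score R for each article, C as the plain mean of all 18 votes, and the final score with m = 3, using the raw vote count as v. It reproduces the final scores 9.42, 9.21, and 9.67 from the table.

```php
<?php
// The "votes" table from the example, as [content, member, score] rows.
$votes = [
    [1, 1, 10], [1, 2, 10], [1, 3, 10], [1, 4, 10], [1, 5, 1], [1, 6, 10],
    [1, 7, 10], [1, 8, 10], [2, 1, 10], [2, 2, 9], [2, 3, 10], [2, 6, 8],
    [2, 7, 8], [2, 8, 10], [3, 1, 10], [3, 3, 10], [3, 4, 10], [3, 6, 10],
];
$num_members = 8; // rows in the "members" table
$m = 3;           // minimum-votes constant (TOPXMIN)

// Step 1: vote ratio per member = member's vote count / average votes per member.
$avg_num_votes = count($votes) / $num_members; // 18 / 8 = 2.25
$votes_per_member = array_count_values(array_column($votes, 1));
$ratio = [];
foreach ($votes_per_member as $member => $n) {
    $ratio[$member] = $n / $avg_num_votes;
}

// Step 2: C = plain mean of every vote in the dataset.
$C = array_sum(array_column($votes, 2)) / count($votes);

// Step 3: accumulate weighted scores, ratios, and raw vote counts per content item.
$acc_score = $acc_ratio = $count = [];
foreach ($votes as [$content, $member, $score]) {
    $acc_score[$content] = ($acc_score[$content] ?? 0.0) + $score * $ratio[$member];
    $acc_ratio[$content] = ($acc_ratio[$content] ?? 0.0) + $ratio[$member];
    $count[$content] = ($count[$content] ?? 0) + 1;
}

// Step 4: weighted R, then the Bayesian formula.
$final = [];
foreach ($acc_score as $content => $sum) {
    $R = $sum / $acc_ratio[$content]; // weighted average score
    $v = $count[$content];            // raw number of votes
    $final[$content] = ($v / ($v + $m)) * $R + ($m / ($v + $m)) * $C;
    printf("content %d: R = %.2f, v = %d, final = %.2f\n", $content, $R, $v, $final[$content]);
}
printf("C = %.2f\n", $C);
```

Note that the weighting changes only R: the first article's plain average is 8.875, but BillDonohue's 1/10 carries a ratio of only 0.4444, which lifts its weighted R to 9.50.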

Further Considerations and Tips

  • In order to prevent the vote ratio from plummeting due to excessive user accounts, construct your SQL queries to only count votes from users who have logged in or voted within the past 7 days.
  • I recommend you don't use my example script in a functional website. The script is not optimized. Code it yourself or ask for help from an experienced programmer.
  • The examples assume there is a user registration system in place. It is possible to work around this assumption.
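The first tip can be sketched as a filter applied before computing the average number of votes per user, so that dormant or throwaway accounts do not enter the average. This is a minimal in-memory sketch: the last_active timestamp and the helper active_average_votes() are hypothetical, since the example tables above have no such column. In a real site the same filter would go into the SQL WHERE clause.

```php
<?php
// Hypothetical helper: average votes per member, counting only members
// active within the last $window_days days.
// $members maps id => [vote count, last_active Unix timestamp].
function active_average_votes(array $members, int $now, int $window_days = 7): float
{
    $cutoff = $now - $window_days * 86400; // 86400 seconds per day
    $total_votes = 0;
    $active = 0;
    foreach ($members as [$vote_count, $last_active]) {
        if ($last_active >= $cutoff) {
            $total_votes += $vote_count;
            $active++;
        }
    }
    // Same idea as the original script's guard: never return 0, which would
    // make the later vote-ratio division fail.
    return $active > 0 ? max($total_votes, 1) / $active : 1.0;
}

$now = 1000000000; // fixed timestamp so the example is repeatable
$members = [
    1 => [12, $now - 2 * 86400],  // active two days ago
    2 => [8,  $now - 5 * 86400],  // active five days ago
    3 => [0,  $now - 40 * 86400], // dormant account, ignored
    4 => [0,  $now - 90 * 86400], // dormant account, ignored
];
printf("average votes per active member: %.2f\n", active_average_votes($members, $now));
```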

Why It Matters

Some people will always try to "game the system." If a webmaster wishes for their website to deliver content that the community truly recommends, it is important to design a system that makes vote fraud more difficult without frustrating the end user. I consider this algorithm a step in the right direction.
Got some feedback, comments, suggestions, or want to call me an asshole? Send it to kobrasrealm@gmail.com.

Copyright © 2005-2010 Kobra's Corner. Published under the Attribution-Noncommercial 3.0 Unported License.

The contents of this website are the opinions of the author. If you disagree with my opinions, quit reading my fucking website!