How the Price Guide Works, Part I (Standard Scores)

December 30th, 2008 by Mays
Categories: Price Guide

I’ve stated that I believe the key to making the Price Guide the best possible tool for fantasy player valuation is to be transparent with how it works. By disclosing exactly what is going on, I hope that the community will be able to point out any flaws I have overlooked and provide insight into ways to improve.

With that in mind, I’m going to take a few posts to explain the methodology the Price Guide is using. A good deal of it is drawn from discussions elsewhere, but there are some unique aspects as well. (I also realize this is probably not terribly interesting stuff, so I figure that it is good to discuss it now while I’m basically just talking to myself.)

All of the systems for valuing players are based on finding the worth of the fantasy stats in relation to each other. All else being equal, is a player who hits 30 HR worth more than a player who steals 30 bases? How do those two relate to a player with 30 saves?

Comparing those players is not as simple as saying 30 = 30. Saves are scarcer than home runs: fantasy teams typically end up with far fewer saves than home runs. Because saves are rarer, each one is more valuable than a home run. What we need to find out is exactly how much more valuable each save is.

To do this, we need to be able to put these various categories onto a single scale. This is actually a common problem in statistics, and the way it is solved is with standard scores. Standard scores will work perfectly for fantasy as well.

To compute the standard scores for each player’s stats, we want to subtract the stats of the average player and divide by the standard deviation for the pool of drafted players. That might sound complicated, but it is easy to do if you have a spreadsheet (and even easier with the Price Guide).

As an example, let’s generate values for a shallow, mixed league that uses the default Yahoo settings. In this league, there are 12 teams that each draft 9 hitters, meaning that a total of 108 hitters will be drafted for starting lineups.

Using the Marcel projections, then, the average stats for those 108 players will be:*

BA: 0.2840
R: 76.8
RBI: 75.7
HR: 19.5
SB: 10.5

Some Extra Work for Rate Stats
However, we need to do some extra work to find the standard scores for BA. If a player has 2 AB all year and gets 1 hit, they’ll have a .500 BA. A player who has 600 AB and manages 200 hits will have a lower batting average (.333), but this player makes a greater positive contribution to a fantasy team’s BA than the first.

What we need to do is convert BA from a rate stat to a counting stat. To do that, we ask, “How many more hits would this player get than the average player, given the same number of AB?” We know that the average player will bat .284, so the formula for each player is:

xH = H – (AB * .284)

Consider the player above with 1 hit in 2 AB. The average player, given 2 AB, will get 0.568 hits (2 * .284). Our sample player got 1 hit, or 0.432 above average (1 – 0.568). The formula gives the same result:

xH = 1 – (2 * .284) = 0.432

For the other player, the formula yields:

xH = 200 – (600 * .284) = 29.6

That fits with what we expect: a player who bats .333 throughout the season is much more valuable than a player who hits .500 in a very small sample.
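The two worked examples above can be reproduced with a few lines of Python. This is only a sketch of the xH formula as stated in the post; the function name `hits_above_average` is made up for illustration and is not taken from the Price Guide itself.

```python
def hits_above_average(hits: float, at_bats: float, league_avg: float = 0.284) -> float:
    """Hits a player collects beyond what an average hitter would get in the same AB.

    league_avg defaults to the .284 used in the article's 12-team example.
    """
    return hits - at_bats * league_avg


small_sample = hits_above_average(1, 2)     # .500 BA in 2 AB
full_season = hits_above_average(200, 600)  # .333 BA in 600 AB

print(f"{small_sample:.3f}")  # 0.432
print(f"{full_season:.1f}")   # 29.6
```

Passing a different `league_avg` covers leagues whose drafted pool bats something other than .284.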

Having computed xH for each player, we will use the average of xH instead of the average BA (.284). Our new averages are:

xH: -7.6E-15
R: 76.8
RBI: 75.7
HR: 19.5
SB: 10.5

The standard deviations for each category:

xH: 8.0
R: 11.6
RBI: 14.4
HR: 6.6
SB: 9.9

Computing Standard Scores
With the averages and standard deviations in hand, it is now possible to compute standard scores. Let’s use David Wright as an example, whom Marcel projects for 96 R, 101 RBI, 26 HR, 19 SB, 169 H, and 549 AB.

xH = 169 – (549 * .284) = 13.1

mH = (13.1 + 7.6E-15) / 8.0 = 1.6
mR = (96 – 76.8) / 11.6 = 1.7
mRBI = (101 – 75.7) / 14.4 = 1.8
mHR = (26 – 19.5) / 6.6 = 1.0
mSB = (19 – 10.5) / 9.9 = 0.9
Total = 1.6 + 1.7 + 1.8 + 1.0 + 0.9 = 7.0
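For anyone who wants to check the arithmetic, here is a minimal Python sketch of the same calculation. The averages and standard deviations are the ones quoted above; the average xH is entered as exactly 0 rather than -7.6E-15, and the names `MEANS`, `SDS`, and `standard_scores` are illustrative only, not the Price Guide's own code.

```python
LEAGUE_BA = 0.284

# Averages and standard deviations quoted in the article for the 108-hitter pool.
# The xH average is treated as exactly 0 here (the article shows -7.6E-15).
MEANS = {"xH": 0.0, "R": 76.8, "RBI": 75.7, "HR": 19.5, "SB": 10.5}
SDS = {"xH": 8.0, "R": 11.6, "RBI": 14.4, "HR": 6.6, "SB": 9.9}


def standard_scores(projection: dict) -> dict:
    """Return the standard score in each category for one hitter's projection."""
    stats = {cat: projection[cat] for cat in ("R", "RBI", "HR", "SB")}
    # Convert batting average to a counting stat: hits above average.
    stats["xH"] = projection["H"] - projection["AB"] * LEAGUE_BA
    return {cat: (stats[cat] - MEANS[cat]) / SDS[cat] for cat in MEANS}


wright = {"R": 96, "RBI": 101, "HR": 26, "SB": 19, "H": 169, "AB": 549}
scores = standard_scores(wright)
total = sum(scores.values())

for cat, score in scores.items():
    print(f"{cat}: {score:.1f}")
print(f"Total: {total:.2f}")
```

Note that the total printed here is summed before rounding, so it comes out near 6.9 rather than the 7.0 obtained by adding the rounded category scores.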

Without any context, it’s not clear if those are good values or not. But if you do the same process for all the players, you’ll find that Wright has a very high value in all five categories. In fact, he ends up being the highest valued player in this league. You can see the full results that the Price Guide gives for this league:

12 team, 5×5 with 9 hitters

Also notice that, if you plug in the stats of the theoretically average player, his value in each category will be 0. So any player with a positive value for a category is above average in that category; any player with a negative value is below average.

We now have a preliminary value for each player, but our work’s not done yet. Before we can generate dollar values, these values need to be adjusted to take into account the replacement level at each position. That will be the subject of Part II.

*You may have noticed that I left out one crucial step here: How did we figure out who the best 108 players were before we generated values? The Price Guide handles this by going through the entire valuation process multiple times until it arrives at the optimal draft pool. This iterative approach is a topic I’ll tackle later on in this series.

33 Responses to “How the Price Guide Works, Part I (Standard Scores)”

  1. Nick says:

    Thank you for being so transparent about what is going on with the price guide. And I think I kind of understand why you were saying adding OPS is a bit more complicated…

  2. Molson says:

    Why .284 for the base average?

    That might be OK in a mixed-league setting, but .284 would have been 2nd place the last few years in my NL-only league, and I imagine the same would be true for a deep AL only league (even with the DH). You could get a more accurate evaluation for deep and/or AL/NL only leagues if you adjusted xH to use a different average, or allowed that to be put into the rater manually.

  3. Mays says:

    I didn’t make this very clear in the article, but the Price Guide does adjust the league average based on the specifics of the settings entered. So for an NL-only league it will use a much lower average than .284.

  4. Molson says:

    Gotcha – that makes sense.

  5. Confused says:

    u wrote, xH: -7.6E-15
    I understood everything in the article ( i actually made a spread sheet on excel sorta the same last year) except that part. What does the E represent and is it negative 7.6 times E minus 15?????

  6. Confused says:

    u also wrote:
    mH = (13.1 + 7.6E-15) / 8.0 = 1.6

    It looks like however you just did 13.1/8.0 so i’m not sure what “7.6E-15” is.

  7. Mays says:

    That -7.6E-15 is scientific notation for -7.6 × 10^-15, or -.0000000000000076.

    xH measures how each player compares to the league average: A player with 5 xH will get 5 more hits than an average player would with the same number of at bats. A player with -10 would get 10 fewer hits than an average player.

    Since xH is calculated relative to the league average, we would expect the average xH to be 0. The Price Guide gets pretty close to 0; it is off a little bit due to rounding errors.

  8. Molson says:

    @Confused:

    When you calculate z-scores (standard scores), the formula is (x-avg)/SD. You really don’t even need the average part of the formula: (x-avg)/SD = x/SD – avg/SD, and the -avg/SD component is the same for every player. So if you’re trying to reproduce something similar on your own, you can simplify the formula on your spreadsheet by using only x/SD. The replacement level scores are going to be different, but they’re all going to be off by the same constant, so it works out to the same thing.

  9. Mays says:

    @Molson: You’re correct that the numbers work out the same if you don’t subtract the league average. The only reason I do it is that it makes things clearer: you can see at a glance how a player compares to the average (0 in a category).

    Consider Jose Reyes:

    0.94 AVG, 2.24 R, -0.51 RBI, -0.50 HR, 4.73 SB

    So despite being a top 5 pick, he’s still below average in two categories!

  10. Molson says:

    Definitely. No argument on that front.

    I was just pointing out that it doesn’t matter whether the average xH is 0 or close to 0 or 500. The price valuation is the same either way and allows you to skip a step if you’re recreating this process on your own.

  11. Dough says:

    Just curious — are the SD’s based upon just the top 108 players as well?

  12. Mays says:

    No, the SDs are based on the total number of hitters/pitchers drafted in your specific league. It’s 108 if you have 9 hitters and 12 teams, as in the example; with 14 hitters per team it would be 168.

  13. Ed says:

    Why is Jon Lester so low in $$ and why is Dana Eveland pitcher from Oakland missing?

  14. Mays says:

    For Lester, see this:

    http://www.lastplayerpicked.com/lima-part-ii-danks-and-lester/

    Eveland is on there, but he won’t be ranked highly in a standard league. With the composite projections, he lands at -$13, somewhere just inside the top 1000 pitchers.

  15. Kevin says:

    Mays, first, thanks for the excellent work and letting us know how it’s done – it can help others contribute, as I will offer an idea here. I tried to dabble in something like this on my own in the past, but I really like your method here, especially with regard to positions. One thing I specifically noticed is how you change the calculation for BA by using xH (a counting stat) in its place. Now, I’m not a mathematician or statistician but bear with me here for a moment… I really don’t think you need to change a rate stat (such as BA) to a counting stat (such as xH) to correctly value the rate stat. In order to get a Standard Score for the Price Guide, we know that a rate stat, in this case BA (the formula H/AB), will need to be modified by the total number of times the player contributed to the league average rate (in this case that number is every AB the player had). Incidentally, the total number of times the player contributed is ALWAYS in the denominator of any rate stat (look at ERA, the IP is in the denominator also). What I think is necessary here is to calculate the Standard Score for the BA for a given player the same way you do for the other categories (using the player’s BA, the AVG BA, and the StDev for BA to get a base BA score). THEN, modify this standard score by a ratio of the difference in AB (the value in the denominator). This ratio is (AB-AVGAB) / (AB+AVGAB). So basically, when you have the base standard score for a player in BA, you modify it like this so the formula looks like:

    BAscore = base BAscore + [base BAscore * (AB-AVGAB) / (AB+AVGAB)]

    As an example, I ran the Price Guide with 2008 stats for a 10-team league with 1 player each at C, 1B, 2B, 3B, and SS, plus 3 OF and 1 UTIL. The link for this is:

    http://www.lastplayerpicked.com/priceguide/index.php?t=10&l=MLB&m=260&b=1&ds=08S&dis=250&AVG=Y&R=Y&RBI=Y&HR=Y&SB=Y&W=Y&S=Y&ERA=Y&WHIP=Y&K=Y&C=1&1B=1&2B=1&3B=1&SS=1&OF=3&LF=0&CF=0&RF=0&CI=0&MI=0&Util=1&mg=20&SP=0&RP=0&P=9&ms=5&mr=5

    Now, what is being measured by the standard score is how many standard deviations above the league average a player is in a given category. The example I will provide is the BA score vs. the xH score for Dustin Pedroia in this league. Pedroia hit .326 in 2008 (213 hits in 653 AB). The xH score for Dustin using the Price Guide method is 2.02. Using the method I used above, this score comes out to 1.75 – a difference of 0.27. Here’s how I got that number (I used 7 decimal places for accuracy):

    League BA = .2906842
    StDev BA = .0219039
    AVG AB = 556

    Pedroia’s base BA score is (BA – AVG BA)/StDev or
    (.3261868 – .2906842)/.0219039 = 1.621

    Now this needs to be adjusted for his AB as described above:

    BAscore = 1.621 + [1.621 * (653 – 556)/(653 + 556)]
    BAscore = 1.751

    Now, to compare the two systems and see if one aligns more closely to what is being done in the other categories, I added the League BA to the StDev to get a BA of .3125881. This BA is exactly one StDev above the League BA. So it follows that anybody who had a batting average of .3125881 in 556 at bats (the league average) should have a value of 1.00 in their BA standard score (this is exactly how it works in the other counting categories). The amount of hits required to get a BA of .3125881 in 556 AB is 173.799 hits. I realize that this number is not possible in real baseball, but I needed to get an average exactly (or to 7 decimal places) equal to 1 SD above the league average. Now using these numbers and performing xH and BAscore calculations again:

    xH = 173.799 – (556 * .2906842) = 12.1785860
    xH score = (12.1785860 – 0)/11.4809838 (the 0 is the AVG xH)
    xH score = 1.06

    baseBA = (.3125881 – .2906842) / .0219039 = 1.00
    BAscore = 1.00 + [1.00 * (556 – 556)/(556 + 556)]
    BAscore = 1.00

    The BAscore showed the result we are looking for (1.00). There is also a difference of .06 between the two as I think that the xH comes in about .06 high when the batting average is exactly 1 SD above the league BA. When the same thing is done for 2 SD above league average – the BAscore hits it right on at 2.00 while the xH method calculates to 2.12 (just use 185.97755 as the hits and perform the above calculations again for 2 SD above league average). So it appears to me that the xH method is off .06 for every 1 SD above or below the league average (and I think it gets worse the further away from league average AB a player gets). Now, for most players this is negligible and I highly doubt it would affect the rankings much at all – Pedroia was an extreme example (.326 in 653 AB) but it outlines how I think that the further away from the league averages in AB or BA a player is, the more skewed the xH number. I bring this up not as a criticism, but more sincerely because if we are looking for the best system, this is a change that might more accurately project standard scores for rate categories.

    Hope I didn’t lose you all and let me know what you think.
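Kevin's comparison can be reproduced with a short Python sketch. All four constants are the figures quoted in his comment; the function names `xh_score` and `adjusted_ba_score` are made up here, and the second function implements his proposed adjustment, not anything the Price Guide is known to do.

```python
# Figures quoted in the comment (10-team league, 2008 stats).
LEAGUE_BA = 0.2906842
SD_BA = 0.0219039
AVG_AB = 556
SD_XH = 11.4809838


def xh_score(hits: float, at_bats: float) -> float:
    """Standard score of hits above average (the article's xH method)."""
    return (hits - at_bats * LEAGUE_BA) / SD_XH


def adjusted_ba_score(hits: float, at_bats: float) -> float:
    """The commenter's alternative: BA z-score scaled by an at-bat ratio."""
    base = (hits / at_bats - LEAGUE_BA) / SD_BA
    return base + base * (at_bats - AVG_AB) / (at_bats + AVG_AB)


# Dustin Pedroia, 2008: 213 hits in 653 AB.
print(round(xh_score(213, 653), 2))           # 2.02
print(round(adjusted_ba_score(213, 653), 2))  # 1.75

# Hypothetical hitter exactly 1 SD above league BA in a league-average 556 AB.
print(round(xh_score(173.799, 556), 2))           # 1.06
print(round(adjusted_ba_score(173.799, 556), 2))  # 1.0
```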

  16. Molson says:

    Interesting point. Still thinking it through, but here’s my initial thoughts:

    >the further away from the league averages in AB or BA a player is, the more skewed the xH number.

    That’s part of the point. Someone who hits .290 in 600 AB is more valuable to your team than someone who hits .350 in 100 AB. If someone hits .350 in 600 AB, they’ll carry your team in BA.

    I believe the breakdown in your method comes when calculating the z-score for batting average. When you calculate the standard deviation of batting average, these averages all have a different number of ABs, so they should all be weighted differently in your SD calculation. Someone who goes 1/1 has a BA of 1.000 and this is going to skew your BA SD. His adjusted standard score will be very small, but his average still goes into the SD with the same weight as that of someone with a league-average BA in 600 AB.

    At least that’s my initial thought.

  17. Chris says:

    I’m not sure if anyone is still following this thread, but I’m not sure I understand how you determine which players to use to compute the standard deviations. In the example above, the averages were calculated based on the top 108 players using Marcel projections. How do you come up with those top 108 players? Is it the top in each category? It doesn’t appear to be done that way because the top 108 HR hitters would probably average somewhere around 25 instead of 19. If I have a spreadsheet of projections, how do I determine which players to use for my sample?

  18. mittal says:

    Simple question: does normalization work when you’re working with stats that are not normally distributed (SB specifically)?

    Your dollar valuation formula is

    $ = (player value / 529) * $2928 + $1

    STDEV for stolen bases is 10 (from what I read, this means every 10 stolen bases above the average provides 1 unit of value, or $5.53 of value). $2928 marginal dollars divided by 10 categories and 12 teams means each team has $24 of marginal value to spend on stolen bases. Since stolen bases are positively skewed, this is overvaluing heavy stolen base players such as Reyes, who is getting around $25 of value in stolen bases alone even though his 60 stolen bases won’t be close to enough to win a league.
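The arithmetic in this comment can be checked with a small Python sketch. The constants 529 and 2928 are taken from the formula as quoted in the comment; reading 529 as the league-wide total of player values is an assumption, and the names below are illustrative only.

```python
TOTAL_VALUE = 529        # from the quoted formula; assumed to be the pool's summed player value
MARGINAL_DOLLARS = 2928  # from the quoted formula
CATEGORIES = 10
TEAMS = 12


def dollar_value(player_value: float) -> float:
    """Dollar price for a player with the given total standard-score value."""
    return player_value / TOTAL_VALUE * MARGINAL_DOLLARS + 1


dollars_per_unit = MARGINAL_DOLLARS / TOTAL_VALUE
budget_per_category = MARGINAL_DOLLARS / CATEGORIES / TEAMS

print(f"{dollars_per_unit:.2f}")     # 5.53
print(f"{budget_per_category:.2f}")  # 24.40
```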

  19. Molson says:

    Yeah, it overvalues steals, but it’s not really that bad. Reyes’s 60 steals certainly is likely to win you the category, if the rest of your players are average.

    Since the $5.53 per unit seems to come from a standard Yahoo! league (in a deep league it’s more like $3.30/unit), let’s go with that.

    You have to remember that the z-score is with regards to average, so the 60 steals certainly is enough to win the league _if all the other 8 guys on your team are average_. If that’s true, then you’ll get 8 guys with 11 steals and 1 guy with 61, giving you 149, which is, on average, going to win you steals in a Yahoo! public league. When you spend $25 on Reyes’s steals, remember you’re also spending $-5 on steals for someone projected to get zero. So if you draft Reyes and Ryan Howard, you’re only spending net $20 on steals, and if you fill your league with 7 11 steal guys, Reyes and Howard you’re probably going to finish 2nd in steals. If you have Reyes and 8 no steal guys, you’re spending net -$15 on steals and won’t do very well.

    Saves is an even more heavily skewed category, so you see the discrepancy there even better, and again it’s less than you’d think. Standard deviation is again about 10 with an average of 6, so drafting four 30-save closers (which would get you 120 saves, about enough to win the average Yahoo! league) is spending $48 on the saves, twice as much as you should. But each pitcher with zero saves is $-3 on saves, so 5 starters is -$15, so you’re really only spending $33, which is a little more reasonable (but still high).

    The only way to really fix this is to have a function that changes the relative worth of the different stats as the projected amounts of each stat change.

    If you can come up with a way to accurately measure the skewness distribution of the stats and then apply a formula which takes that into account, please share, but for the time being, treating the stats as if they’re normally distributed works pretty well, considering the limitations of the projections we’re working with.

  20. mittal says:

    I completely agree that on an aggregate level the dollar values will balance out, but for the stats that are not normally distributed, using normalization creates some weird dollar values on an individual basis from what I saw.

    I am not really too sure how to reconcile this to work better. The thing I can think of is to simply take the total number of $ that can be spent on SB, divide it by the total number of stolen bases in a fantasy season, and attach that specific dollar value to every stolen base above 0. This does create the issue that no player will have negative value in stolen bases, but that is not THAT big of a problem to me because whenever I think of fantasy drafting, I don’t really consider a low stolen base guy to provide negative value, I just consider them to provide no positive value due to the way SBs are skewed (I’m probably going to have 3 or so guys providing the bulk of my stolen bases anyways).

    Thoughts?

  21. mittal says:

    Sorry for double posting. What I meant by weird dollar values on an individual basis is that with the valuation system, you have 33% of players being valued at -$4 or lower in SB, 50% of players being valued at -$2.75 or lower, and 75% of players having negative values in stolen bases.

  22. Molson says:

    The thing is, a low stolen base guy does provide negative value in the same way that a low HR guy provides negative value. But you’re right – 75% of players having negative value in steals isn’t quite right.

    Really though, since the normalization formula is (stat-avg)/SD, this is really stat/SD-avg/SD and the avg/SD portion is constant for all players and can be ignored. So you can use stat/SD and the only negative values you get are when you adjust for replacement level.

    So when you look at it this way, what we’re concerned with isn’t the z-score of a player, it’s the relative value of the different stats: how many HR is a SB worth?

    And just totaling expected steals and expected HRs doesn’t work, since that doesn’t give you a picture of the way the stats are distributed. Say we’ve got a 10 team league where there are 10 hitters per team. If there’s 2000 total steals and 2000 total HRs they’re valued equally, but if you have 100 players with 20 homeruns, you shouldn’t pay anything for the homeruns, so you’re better off by taking your HR money and spending it on steals. If the steals are distributed with 20 guys with 100 steals and 80 guys with 0, then spending $.12 per steal means you spend $12 on a guy with 100 steals. But if you know that steals have a higher relative value than HRs, then you want to spend more per steal than per HR.

    In the price guide system, Reyes gets a value of about $40 or so. In most expert leagues this year, Reyes goes for about $40 or so. Seems like it’s a pretty good system to me. The uncertainty in the projections far outpaces the errors in the projection system.

  23. Josh says:

    Does the Price Guide score the ERA and WHIP categories as runs prevented above average and H+BB prevented above average (similar to hits above average for the BA category)?

  24. Mays says:

    Josh: Yes, ERA and WHIP work the same as BA. I always think of the ERA value as “runs below average,” but “runs prevented above average” amounts to the same thing.

  25. Josh says:

    Hi Mays,

    I appreciate the response.

    Would you be willing to share more about how the iterations are done? As the script is PHP I’m assuming the code is related to that, although I have no idea how to make such an algorithm in PHP.

    I’ll be up front in stating that I’m trying to do something of a replication. I’m working with a friend on his fantasy team, and next draft is the first time I’ll be there with him (I just started helping him during this season because he has been busy with work). I’ve been putting together a PHP/MySQL site that tracks players and projections based upon the methodology that he has been using. However, his is something closer to a SGP model, as points awarded to the players are based upon how much they would accomplish relative to a team which finished 3rd in every category. I prefer the replacement level method myself. However, my only method would be to go through and pick the top players by hand (225 — 25 active x 9 teams).

    Now, I don’t really have a problem with doing that. Although I wouldn’t be 100% efficient in my selections, neither would the league be (so what I chose as the top 225 players would probably have an equal or perhaps slightly better composite projection than the actual 225 starting players). However, I would prefer to have it automated via script and hopefully at least a bit more accurate than my own eye.

    The site is to be set up on an Ubuntu MySQL server on the laptop, so that we will still be able to access it, add players to teams, constantly view the updated player lists and team financial reports, etc. Otherwise, I would simply run the report from the Price Guide if we only needed a list of player values.

    One other question I had – How well does a standard deviation in a category correlate to gains in that category? For instance, in a league where the standard deviations for R and RBI are 11.6 and 14.4 respectively, does this difference in the players themselves generally also lead to 11.6 increased R being worth an equal amount as 14.4 increased RBI (not counting any outside circumstances such as a team punting R, for instance)? Would we say that they are roughly equal, or that a deviation in one category should be weighted more heavily than that in another?

    Once again, thank you. I appreciate any information you are willing to and can share.

  26. Mays says:

    I’ll start with the iterations. Basically, my algorithm works like this.

    1. Take the first 225 players on the list and build standard deviations and positional adjustments.

    2. Use those SDs and adjustments to value all of the players in the list.

    3. Sort the list from high to low values.

    4a. If the players are ordered the same as they were in Step 1, stop. We’ve found the optimal values.

    4b. If the players are ordered the same as they have been on any previous iteration, stop. This means there is more than one set of players that could be “optimal.” All of the “optimal” sets should be virtually identical, so we just display this one.

    4c. Otherwise, save this list so we can compare it with future iterations, and go back to Step 1.
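The steps above can be sketched in Python roughly as follows. This is a simplified illustration and not the Price Guide's actual PHP: it leaves out the positional adjustments entirely, assumes a population standard deviation, and the function name `value_players` and the player dictionaries are made up for the example.

```python
from statistics import mean, pstdev


def value_players(players, pool_size, categories):
    """Re-rank players until the ordering stops changing or repeats.

    players: list of dicts, each with a "name" key and one number per category.
    Returns the players sorted from highest to lowest total standard score.
    """
    ordered = list(players)
    seen = set()  # orderings from earlier iterations (Step 4c)
    while True:
        # Step 1: build averages and SDs from the first pool_size players.
        pool = ordered[:pool_size]
        avg = {c: mean([p[c] for p in pool]) for c in categories}
        # Fall back to 1.0 if a category has no spread, to avoid dividing by zero.
        sd = {c: pstdev([p[c] for p in pool]) or 1.0 for c in categories}

        # Step 2: value every player with those averages and SDs.
        def total(p):
            return sum((p[c] - avg[c]) / sd[c] for c in categories)

        # Step 3: sort from high to low value.
        new_order = sorted(ordered, key=total, reverse=True)

        current = tuple(p["name"] for p in ordered)
        names = tuple(p["name"] for p in new_order)
        # Steps 4a and 4b: stop if the order is unchanged or has been seen before.
        if names == current or names in seen:
            return new_order
        # Step 4c: remember this ordering and iterate again.
        seen.add(current)
        ordered = new_order


hitters = [
    {"name": "C", "HR": 20, "SB": 10},
    {"name": "E", "HR": 5, "SB": 2},
    {"name": "A", "HR": 30, "SB": 20},
    {"name": "D", "HR": 10, "SB": 5},
    {"name": "B", "HR": 25, "SB": 15},
]
ranked = value_players(hitters, pool_size=3, categories=["HR", "SB"])
print([p["name"] for p in ranked])  # ['A', 'B', 'C', 'D', 'E']
```

Because there are only finitely many orderings and the loop stops as soon as one repeats, it always terminates.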

    I’m assuming you’ve already seen this:

    http://www.lastplayerpicked.com/how-the-price-guide-works-part-iv-iterations/

    Feel free to email me if you still have questions.

  27. Mays says:

    Separation in the standings and player-level SD should correlate very well.

    When SGP computes how far teams are separated in the standings, it is basically figuring a team-level standard deviation. If everyone has 14 hitters, then the team-level SD in a stat should approximate the player-level SD.

    The difference, as you point out, is that team-level SDs can be distorted by certain strategies (streaming, punting, etc.). Typically it won’t make a big difference, though.

  28. Josh says:

    Hi Mays,

    I appreciate the answers. Yes, I had read part IV of the how-it-works, however, I wasn’t quite able to catch onto it. Now, I think I do understand somewhat better.

    You take the first 225 (or starting amount) players, and assume for the moment that they are the top players. Then you take the SD and positional adjustments from that, and carry those over to value the entire player population. After that, you sort by value the top 225 players, and then get the new SD / positional adjustments from those 225… continue revaluing and readjusting the SD and positional adjustments, until the top 225 matches any previous iteration. Is this correct?

    I may have other questions about this in the future, and I appreciate the offer! I will need to try out this system while reworking the site during the offseason. Thank you for all the explanations you’ve offered about the process!

  29. Chris Callahan says:

    This is exactly what I was looking for. We use the Sporting News for our prices, but they are for a 5X5 league that does not include Losses or OBP. I am in a 6X6 retention league. Your site does a wonderful job in telling the bargains: pitchers that don’t get many L’s and hitters with high OBP (on-base percentage). Please don’t tell anyone else in my league about this:)

  30. Rokka says:

    Can someone explain to me how he got xERA and xWHIP…For some reason I cannot get that correct because lower ERA and WHIP numbers are better. Any help would be appreciated.

  31. Molson says:

    If it helps, you can think of it as “xER” and “xWH” rather than “xERA” and “xWHIP.”

    xERA is the expected number of earned runs given up below what the number of earned runs given up would be for the average pitcher under the same number of IP.

    If the average ERA is 3.81, and Lincecum is projected to have a 2.85 ERA in 218 IP, his xERA is (avgERA-playerERA)*IP/9=(3.81-2.85)*218/9=23.25.
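Here is a short Python sketch of that calculation. The ERA function follows the formula in the comment above; the WHIP version is an assumption built by analogy (Mays only says WHIP works the same way as BA), and both function names are invented for illustration.

```python
def earned_runs_below_average(player_era: float, innings: float, league_era: float) -> float:
    """Earned runs saved versus an average pitcher over the same innings."""
    return (league_era - player_era) * innings / 9


def baserunners_below_average(player_whip: float, innings: float, league_whip: float) -> float:
    """Assumed WHIP analogue: walks plus hits saved versus an average pitcher."""
    return (league_whip - player_whip) * innings


# Lincecum example from the comment: 2.85 ERA in 218 IP, league average 3.81.
lincecum_xer = earned_runs_below_average(2.85, 218, 3.81)
print(round(lincecum_xer, 2))  # 23.25
```

Because the league figure comes first in the subtraction, a lower ERA or WHIP yields a positive number, so these values can be turned into standard scores the same way as the hitting categories.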

  32. Molson says:

    Mays – I wonder if we couldn’t get a more accurate value on pitchers by using a separate average ERA for starters and relievers.

    Thoughts?

  33. Rokka says:

    Much appreciated, this info is awesome.
