Evaluating the Player Evaluators, Part III (Razzball and RotoTimes)

January 14th, 2009 by Mays
Categories: Price Guide

Last time, I took a look at how the Price Guide stacked up against a couple of other valuation systems, one from ESPN and one from Baseball Monster. Now I want to see how it compares to two others: RotoTimes and Razzball.

The format is the same as before: a 2008 Retro-Draft that gives all of the systems the benefit of perfect hindsight. The starting positions and categories are standard for 5×5 rotisserie. Twelve teams are drafted, four for each of the three rating systems. I put the teams in this order to try to remove any bias from selecting early or late:

RotoTimes A
Razzball A
Last Player Picked A
RotoTimes B
Razzball B
Last Player Picked B
RotoTimes C
Razzball C
Last Player Picked C
RotoTimes D
Razzball D
Last Player Picked D
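
The interleaved order above can be generated mechanically. The sketch below is only an illustration: it assumes a snake draft (the order reverses every round), which the post does not state outright, and `team_order` and `draft_order` are made-up helper names rather than anything provided by the rating sites.

```python
from itertools import product

SYSTEMS = ["RotoTimes", "Razzball", "Last Player Picked"]
LABELS = ["A", "B", "C", "D"]


def team_order(systems=SYSTEMS, labels=LABELS):
    """Interleave the systems so each gets early, middle and late slots."""
    # Labels vary slowest, systems fastest: RotoTimes A, Razzball A, LPP A, ...
    return [f"{system} {label}" for label, system in product(labels, systems)]


def draft_order(teams, rounds, snake=True):
    """Yield (round, pick, team). ASSUMPTION: even rounds run in reverse."""
    for rnd in range(1, rounds + 1):
        order = teams[::-1] if snake and rnd % 2 == 0 else teams
        for pick, team in enumerate(order, start=1):
            yield rnd, pick, team


teams = team_order()
picks = list(draft_order(teams, rounds=2))
```

Under the snake assumption, Last Player Picked A picks third in round 1 and tenth of twelve in round 2.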

Each team can draft a player only at a position he qualifies for (20-game requirement). I did all of the drafting by hand, so there may be mistakes, although I doubt any would affect the outcome.

So how did the draft turn out? Well, the first round was pretty indicative of each team’s “strategy”:

RT A: Roy Halladay
RB A: CC Sabathia
LPP A: Albert Pujols
RT B: Tim Lincecum
RB B: Cliff Lee
LPP B: Hanley Ramirez
RT C: Francisco Rodriguez
RB C: Johan Santana
LPP C: David Wright
RT D: Jose Reyes
RB D: Mariano Rivera
LPP D: Dustin Pedroia

What do we see? All four of the Razzball teams grabbed pitchers in the first round, and three of the four RotoTimes teams did the same. LPP’s choices, however, look more like conventional first-round picks, although all four are infielders.

Those basic trends continued throughout. The Razzball teams were the first to fill up their pitching. The RotoTimes teams also favored pitchers but focused more on closers, and they were the most likely to grab players for SB, especially outfielders. The LPP teams got the best catchers (starting with LPP A grabbing Mauer near the end of the 2nd round) and middle infielders, and filled in their pitching staffs with the leftovers.

Here are the final standings, which ended up much more clear-cut than I expected:

LPP C 78
LPP B 75
LPP A 73
LPP D 72
RB B 69.5
RB D 66.5
RB C 63
RB A 62
RT B 58
RT A 58
RT D 54
RT C 51
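
As a quick sanity check on the table above: in a 12-team league with 10 categories, each category hands out 12 + 11 + ... + 1 = 78 points, so the totals should add up to 780. The sketch below checks that and averages the points by rating system. The function names are illustrative; the numbers are copied from the standings.

```python
from collections import defaultdict
from statistics import mean

# Final standings as listed above (team -> roto points).
standings = {
    "LPP C": 78, "LPP B": 75, "LPP A": 73, "LPP D": 72,
    "RB B": 69.5, "RB D": 66.5, "RB C": 63, "RB A": 62,
    "RT B": 58, "RT A": 58, "RT D": 54, "RT C": 51,
}


def expected_total(n_teams=12, n_categories=10):
    """Points available in a roto league: each category awards 1..n_teams."""
    return n_categories * n_teams * (n_teams + 1) // 2


def mean_by_system(table):
    """Average the points of the teams sharing a prefix (LPP, RB, RT)."""
    groups = defaultdict(list)
    for team, points in table.items():
        groups[team.split()[0]].append(points)
    return {system: mean(pts) for system, pts in groups.items()}


total = sum(standings.values())
averages = mean_by_system(standings)
```

The totals do sum to 780, and the per-system averages work out to 74.5 for LPP, 65.25 for Razzball, and 55.25 for RotoTimes.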

My thoughts, once again:

1. Like last time, the Price Guide takes Gold, Silver, and Bronze (and 4th, whatever that would be). Since the methodology seems to me like the most logical way to value players, it was reassuring to see that the theory holds true in practice.

2. I actually expected RotoTimes to do better, since their site allows you to customize the number of hitters and pitchers. (Both RotoTimes’ and LPP’s picks were based on their rankings for 14 hitters and 9 pitchers.) Razzball’s Point Shares aren’t customized for the number of hitters or pitchers, so Razzball was the one starting with a possible disadvantage.

3. Like last time, I didn’t specify a minimum for IP or AB. With no minimum and with the other teams drafting SP like crazy, the Price Guide decided that 60 good innings from Geoff Geary or Jim Johnson were better than 180 innings from an average starter. The LPP teams ended up last in W, S, and K, but first in ERA and WHIP (in addition to being very good offensively). They also ended up with about 650 IP each.
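
To make the scoring concrete, here is a minimal sketch of standard rotisserie category scoring, in which tied teams split the points (which is how half-point totals such as 69.5 arise). The four-team numbers are made up for illustration and are not from the retro-draft, and `category_points` is a hypothetical helper.

```python
def category_points(values, higher_is_better=True):
    """Roto points for one category; tied teams share the average of their ranks."""
    points = {}
    for team, value in values.items():
        if higher_is_better:
            worse = sum(1 for v in values.values() if v < value)
        else:
            worse = sum(1 for v in values.values() if v > value)
        tied = sum(1 for v in values.values() if v == value)  # includes this team
        # Tied teams occupy point slots worse+1 .. worse+tied; take the mean.
        points[team] = worse + (tied + 1) / 2
    return points


# Hypothetical four-team league: team D punts wins but has the best ERA.
wins = {"A": 90, "B": 85, "C": 85, "D": 60}
era = {"A": 4.10, "B": 3.90, "C": 4.00, "D": 3.20}

win_pts = category_points(wins)
era_pts = category_points(era, higher_is_better=False)
totals = {team: win_pts[team] + era_pts[team] for team in wins}
```

In this toy league, team D is last in wins and first in ERA, the same pattern the LPP teams showed across the pitching categories.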

I realize this isn’t realistic for the many leagues that have a minimum IP, so out of curiosity I went back and replaced some of those middle relievers on the LPP teams with SP who went undrafted (Hiroki Kuroda, Tim Wakefield, Paul Maholm, A.J. Burnett, etc.) to get each team above 800 IP. That hurt the LPP teams in ERA and WHIP, and it wasn’t enough to catch any of the teams in the other pitching categories. The effect on the final standings was minor:

LPP C 73.5
LPP B 73
LPP D 70.5
RB B 70.5
LPP A 68
RB C 68
RB A 68
RB D 67.5
RT B 58
RT A 58
RT D 54
RT C 51

One Razzball team moves into a tie for third, and an LPP team slides into a three-way tie for fifth. Everything tightened between LPP and Razzball, but RotoTimes was completely unaffected.

4. I’m pretty confident that the Price Guide is the best, but can we tell how the others stack up? It’s hard to say for sure without directly comparing Razzball with ESPN or RotoTimes with Baseball Monster. I would guess that RotoTimes is actually the least accurate, judging by their uniformly poor performance this time. Competing against them may have benefited Razzball some, but I don’t think it’s clear that it did.

Someone else is welcome to do their own evaluation to see how they rank.

In the end, though, Last Player Picked’s values look to be more accurate for indicating league standings than any of the others. Since they are customizable for any fantasy league, I’d wager that they perform just as strongly for any other league configuration.


7 Responses to “Evaluating the Player Evaluators, Part III (Razzball and RotoTimes)”

  1. Rudy Gamble says:

    Hey Mays -
    Interesting test. Simulated draft tests are difficult to do w/o introducing bias. When comparing Razzball against ESPN and RotoTimes, I bypassed this in favor of a test to see how well each rater’s ‘points’ per category correlated to reality (http://razzball.com/the-player-rater-rater/). Razzball did admirably given that this type of test ignored player position (which we are the only one of those three to account for).

    For a 12-team league, our Point Shares assume 1 catcher per team (vs. the 2 in your test) and a 108 pitcher universe of about 67/41 or 66/42 in starters/relievers (5.5 SP/3.5 RP per team). This netted an average of 1,265 innings per team.

    I would argue that creating four 800 IP teams is an artificial construct. You’re never going to see a league where 4 teams punt pitcher counting stats. I’d be interested to see how the test would go if you credited each team with a realistic 5 starters and aimed for a 1,200 IP avg per team.

    I’ve done point shares for 2 Catcher leagues vs 1. It does bump up Catcher values. If you want, I can send you Point Shares for a 12-team, 2-catcher league.

    Keep up the good work -
    Rudy

  2. Nick says:

    More regarding using different baselines, as my 4 team NL only league may have been a bit unrealistic…

    Using a standard roto league and 2008 stats (same as in this simulation) these are the positional averages that I came up with.

    C: -3.71
    SS: -0.46
    2B: 0.13
    3B: 0.77
    OF: 0.73
    1B: 1.73

    I did not “double count” players who were eligible for multiple positions when finding the mean. I arbitrarily assigned them to “most difficult to play” position according to the C, SS, 2B, 3B, OF, 1B spectrum. I realize that doing this assumes some things we’d rather not assume, but I had to do something. Also, I didn’t compute averages for CI, MI, and DH cause I was lazy. When computing the averages (I guess I should say “mean”) I used the cutoff given by the price guide for replacement level as the last player included in the calculation.

    Looking at the differences between the replacement baseline and the avg baseline, we get

    C: -2.9
    SS: -3.22
    2B: -3.73
    3B: -3.63
    OF: -3.61
    1B: -4.19

    I would conclude from this that since there is not a constant difference between the baselines, it does in fact matter which one we use. If we adjust the raw scores to be scores above average rather than replacement, we’ll see that the value of top 1B is decreased, while the top SS and C become more valuable.

    Again, I think that deciding which position was “more scarce” to begin with may have significantly influenced these results (and for that matter I’m sure I made a math error somewhere), but it certainly does leave me with a lot of questions.

  3. Nick says:

    Yes, still going with this for some reason…

    If, when computing the average above replacement for each position, we include multi-position players in all eligible positions, the positional averages become

    C: -3.71
    SS: -0.46
    2B: -0.28
    3B: 0.63
    OF: 0.82
    1B: 1.57

    Pretty similar results, but they’re a bit more tightly bunched and the averages more closely follow the defensive spectrum.

    The differences between replacement and average are then

    C: -2.9
    SS: -3.22
    2B: -3.32
    3B: -3.49
    OF: -3.7
    1B: -4.03

    I’m not sure which method is “correct.” Or if it really matters. But I thought it was better to have some numbers when discussing.

  4. Nick says:

    And for reference, the replacement levels for this league are

    C: -6.61
    SS: -3.68
    2B: -3.60
    3B: -2.86
    OF: -2.88
    1B: -2.46
    MI: -3.68
    CI: -2.39
    Util: -2.88
    P: -2.34

  5. Mays says:

    @Nick: Wow! You’ve really been thinking this through! When I said earlier that the baseline doesn’t matter, I wasn’t thinking about comparing positions, just comparing two players at the same position.

    However, you are correct that the baseline will affect how one position relates to another. The basic idea is that the replacement-level baseline gives the biggest boost to the top players at a scarce position. An average-player baseline bumps the top-tier less, but also drops the worst players. (The players closest to average are affected the least.)
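
As a minimal sketch of that effect, the snippet below recomputes the gap between the two baselines from the figures posted earlier in this thread (the replacement levels and the positional averages with each player assigned to a single position). The names `baseline_gap` and `extra_drop_1b_vs_c` are illustrative only.

```python
# Figures copied from the comments in this thread.
replacement = {"C": -6.61, "SS": -3.68, "2B": -3.60,
               "3B": -2.86, "OF": -2.88, "1B": -2.46}
average = {"C": -3.71, "SS": -0.46, "2B": 0.13,
           "3B": 0.77, "OF": 0.73, "1B": 1.73}


def baseline_gap(replacement, average):
    """Replacement level minus positional average, rounded to 2 decimals."""
    return {pos: round(replacement[pos] - average[pos], 2) for pos in replacement}


gap = baseline_gap(replacement, average)

# Switching a score from "above replacement" to "above average" adds the
# (negative) gap for the player's position, so positions with a larger gap
# lose more. Here: how much more a 1B drops than a C.
extra_drop_1b_vs_c = round(gap["1B"] - gap["C"], 2)
```

The gaps match the list quoted in the thread (from -2.9 at C to -4.19 at 1B), so moving to an average baseline costs a first baseman 1.29 more than a catcher.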

  6. Mays says:

    @Rudy: I appreciate you taking a look at this. You are right about bias, not to mention that all of this involves a pretty small sample.

    I liked the tests you did as well, although I think that ignoring positions makes it harder to draw conclusions. For my experiments, I really wanted a scenario that was as much as possible a “real” draft. This seemed like the most fair way to do it.

    I’ll run one more simulation with Point Shares to see if I can address some of your concerns.

  7. Rudy Gamble says:

    Thanks Mays. I think if you enforce minimums of 5 SP per team that you’ll find things get much closer. Pitching is different than hitting in that you’ve got 2 ratio stats to 1 and there’s less correlation b/w the ratio/counting stats for pitchers than hitters b/c of middle/late relievers – i.e., if the Razzball Point Shares ends up ‘punting’ catchers, it’s not like there’s a catcher in the 20th round who is going to be positive in two categories like Geoff Geary (who was probably owned by 0.1% of all teams last year).

    I’m less worried about 2 catchers b/c I think your system values them higher anyway so the Razzball teams wouldn’t pick them. Might have some impact against the other teams.
