Author Topic: Introducing hWAR  (Read 2114 times)

Offline Huckleberry

  • Administrator
  • Hero Member
  • *****
  • Posts: 3300
    • View Profile
Introducing hWAR
« on: April 09, 2018, 07:32:28 PM »
Disclaimer - it's still a work in progress. If you have suggestions or questions, this thread is the place to share them.

As discussed in the Slack channel, OOTP does not produce advanced analytical stats very well. I'll go over a brief list of some of its perceived (by me) weaknesses in this area that led me to want to calculate WAR for the WBA.

  • Inconsistent WAR totals - The total WAR in the league each year is not a consistent number. For each WBA league (ABL and IBL individually), the standard 1,000 MLB WAR should be scaled to just about 321. The annual totals for WAR in the ABL and IBL have bounced around from about 340 to 365 during our existence.
  • Defensive runs saved - the defensive runs saved, reported by OOTP as ZR or zone runs, in each league do not add up to zero during the year for each position as they should. Pitchers in particular are way off. In 2112 the ABL's pitchers totaled -64.2 ZR.
  • Inaccurate and insufficient park adjustments - OOTP uses internal park factors based on the input park factors from the edit ballpark screen (these values are visible on your team settings screen in-game). However, developer statements confirm that weather also affects outcomes in OOTP.
  • Position player/pitcher WAR balance - Throughout the history of the WBA I've felt that position players were overvalued in OOTP WAR compared to pitchers. In actuality I found that batting/fielding WAR has made up between 57% and 60% of total WAR, with pitching WAR as the rest, in every season apart from a couple of random 56% and 61% seasons. This is actually close to the standard Baseball-Reference and Fangraphs calculations, which surprised me. As we'll find out later, something else in the OOTP calculations that I haven't specifically identified was giving the top position players values that were too large compared to others.
  • Handedness park factors - Handedness park factors are not utilized in OOTP value stats and are underutilized in real life as well.
  • General baseball WAR theory - As discussed in Slack, I have automated a standard adjustment algorithm and applied it to many sports, primarily college football, over the years. I had also run the numbers for MLB for about 15 historical seasons at one point, so I knew it could be done with baseball stats. Once the Stats+ author pointed out the players_at_bat_batting_stats table to me, I knew I could do this for OOTP.
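
To make the first point concrete, here is a minimal sketch of one way to force a league's WAR total to a fixed number. It is not necessarily how hWAR does it: the proportional scaling, the function name `rescale_war`, and the three-player league are all assumptions for illustration. Only the target of about 321 comes from the post above.

```python
def rescale_war(war_by_player, target_total=321.0):
    """Scale every player's WAR so the league total equals target_total.

    Proportional scaling is an assumption; it keeps each player's share
    of the league total unchanged.
    """
    current_total = sum(war_by_player.values())
    if current_total == 0:
        raise ValueError("league WAR total is zero; cannot rescale")
    factor = target_total / current_total
    return {player: war * factor for player, war in war_by_player.items()}


# Hypothetical league whose raw WAR sums to 350, inside the 340-365 range seen in OOTP
raw_war = {"Player A": 200.0, "Player B": 100.0, "Player C": 50.0}
scaled_war = rescale_war(raw_war)
```

After scaling, the totals are comparable from season to season while the ranking of players within a season is untouched.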

So with all that in mind and some encouragement from the Stats+ author I went about automating the adjustments for the WBA. What do I mean by adjustments? Here's the basic process:

  • A list of every right-handed plate appearance in the league for the season is made. A connection test is run to make sure that all players are connected. If any are unconnected, their baseline statistics are added to the adjusted stats table. These players will have very few plate appearances or batters faced in any given season, or else they would be connected. The same thing is then done for left-handed plate appearances.
  • After the connection test is completed, every single plate appearance is analyzed based on batter, pitcher, batting side, and park. Each player's overall rates for each event are logged as their actual results and the same is done for each park. Then the "opponent strength" for each event is tabulated by looking at each plate appearance (with the exception of some discarded events such as catcher's interference) and summing the strength of the opponent as well as the park environment.
  • After one iteration each player's rate in each statistic is modified by their opponent strength. The iterations are continued until the values stabilize to a predetermined degree. If you have questions about this aspect, this link is a good introduction to the concept.
  • Once the iterations are complete we have the following values - adjusted batting stats, adjusted pitching stats, and single year park factors. My next step in the future will be to calculate weighted multi-year park factors for each statistic based on up to three seasons. For this first season, of course, we only have the single year numbers.
  • After determining multi-year park factors we then go back through the statistical adjustment iterations until the differences between each iteration stabilize. This essentially means that there are multiple solutions at this point because we have artificially introduced outside information. At this point I average the value in each stat from the last iteration and the second-to-last iteration.
  • I now have my final statistics and park factors, and can proceed to create the advanced stats that we have all heard about. Because these stats are built on adjusted numbers, the park factors do not have to be applied again, as they are already built in.
  • As a separate step I calculate overall runs park factors for each team solely for the calculation of ERA-.
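
The connection test and the iteration described in the list above can be sketched roughly as follows. This is a toy version under stated assumptions, not the actual hWAR code: it handles a single event (1 if it happened in the plate appearance, 0 if not), it assumes a simple additive model (league rate + batter effect + pitcher effect + park effect), and the names `connected_groups`, `adjust_rates`, and the sample data are all hypothetical.

```python
from collections import defaultdict


def connected_groups(plate_appearances):
    """Group batters and pitchers linked, directly or indirectly, by shared PAs."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for batter, pitcher, _park, _outcome in plate_appearances:
        parent[find(("B", batter))] = find(("P", pitcher))  # union the two players

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return list(groups.values())


def adjust_rates(plate_appearances, tol=1e-9, max_iter=1000):
    """Iteratively adjust each batter, pitcher, and park for its opponents.

    Assumed model: outcome ~ league + batter + pitcher + park effects.
    Each entity's effect is repeatedly reset to its average leftover after
    removing the league rate and the other participants' current effects.
    """
    league = sum(outcome for *_, outcome in plate_appearances) / len(plate_appearances)
    effects = defaultdict(float)
    rows_by_entity = defaultdict(list)
    for batter, pitcher, park, outcome in plate_appearances:
        keys = (("B", batter), ("P", pitcher), ("K", park))
        for key in keys:
            rows_by_entity[key].append((keys, outcome))

    for _ in range(max_iter):
        biggest_change = 0.0
        for key, rows in rows_by_entity.items():
            leftover = sum(
                outcome - league - sum(effects[k] for k in keys if k != key)
                for keys, outcome in rows
            )
            new_value = leftover / len(rows)
            biggest_change = max(biggest_change, abs(new_value - effects[key]))
            effects[key] = new_value
        if biggest_change < tol:  # values have stabilized
            break
    return league, dict(effects)


# Hypothetical plate appearances: (batter, pitcher, park, event happened?)
sample_pas = [
    ("b1", "p1", "park1", 1),
    ("b1", "p2", "park1", 0),
    ("b2", "p1", "park2", 1),
    ("b2", "p2", "park2", 1),
    ("b3", "p3", "park3", 0),  # b3 and p3 never face anyone else
]
groups = connected_groups(sample_pas)
league_rate, effects = adjust_rates(sample_pas)
```

In the sample data, b3 and p3 form their own group, which is the situation where the real process falls back to baseline statistics instead of adjusting them.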

Last list for now. Features of and comments on the results:

  • Pitcher batting stats are included during all calculations. Pitchers hit and field, so their entire body of work is included in total WAR numbers.
  • Positional adjustments are calculated for each season based on how many offensive runs each position actually produced that season.
  • Batting runs include stolen bases and caught stealing, so they're really more like offensive runs. Fielding runs are corrected so that they sum to zero at each position.
  • There are some issues with the OOTP data. E.g., retired players are immediately removed from the players_at_bat_batting_stats table so that sucks.

I will release the stats pages as I write them on this thread for a while. Without further ado, here are the pages I have so far:

Single season h.bfWAR (batting & fielding WAR)
Single season h.pWAR (pitching WAR)
Single season hWAR (total WAR)
« Last Edit: April 10, 2018, 10:42:39 AM by Huckleberry »

Offline Huckleberry

Re: Introducing hWAR
« Reply #1 on: April 09, 2018, 08:40:32 PM »
hPark Factors

These are a byproduct of the analysis, but I want to post all the info I have. You'll see that K and BB factors stay near 100, as they should, and that HBP numbers are wild because those events are so infrequent.

Offline Huckleberry

Re: Introducing hWAR
« Reply #2 on: April 09, 2018, 08:41:23 PM »
Also, my plan is to use standard runs and home run park factors to calculate advanced stats for 2100-2111, add them to the table, then leave them there. This method will be used from 2112 onward.
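
For anyone unfamiliar with a basic runs park factor, here is a minimal sketch of the common home/road ratio. The post does not say exactly which "standard" formula will be used for 2100-2111, so treat this version, the function name `basic_park_factor`, and the sample numbers as assumptions. The same shape works for home runs by swapping in home run counts.

```python
def basic_park_factor(home_scored, home_allowed, home_games,
                      road_scored, road_allowed, road_games):
    """Runs per game in a team's home games relative to its road games, scaled to 100."""
    home_rate = (home_scored + home_allowed) / home_games
    road_rate = (road_scored + road_allowed) / road_games
    return 100.0 * home_rate / road_rate


# Hypothetical team: 780 total runs in 81 home games, 720 total runs in 81 road games
park_factor = basic_park_factor(400, 380, 81, 350, 370, 81)
```

A value above 100 means more runs were scored in the team's home park than in its road games.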

Offline Huckleberry

Re: Introducing hWAR
« Reply #3 on: April 10, 2018, 07:53:17 AM »

Offline Huckleberry

Re: Introducing hWAR
« Reply #4 on: April 10, 2018, 07:55:05 AM »
Next up are some fielding pages but I just realized that I have to go back to recreate the advanced fielding tables. I summed everything up before making the table in my code which means I can't break out individual positions. So I'll have to change the table creation code so I can do that (and then sum up all positions for PHP page display if I want to show total defensive runs saved).
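
A rough sketch of the table change described here: store one row per player, season, and position, then sum at query time when a total is wanted. The table name `fielding_drs`, its columns, and the sample rows are hypothetical, and SQLite is used only so the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fielding_drs ("
    " player_id INTEGER, season INTEGER, position TEXT, drs REAL,"
    " PRIMARY KEY (player_id, season, position))"
)

# Hypothetical rows: player 1 split time between SS and 2B
rows = [(1, 2112, "SS", 6.5), (1, 2112, "2B", -1.5), (2, 2112, "CF", 3.0)]
conn.executemany("INSERT INTO fielding_drs VALUES (?, ?, ?, ?)", rows)

# Per-position leaderboard: possible because positions were not summed away
shortstops = conn.execute(
    "SELECT player_id, drs FROM fielding_drs"
    " WHERE season = ? AND position = ? ORDER BY drs DESC",
    (2112, "SS"),
).fetchall()

# Total defensive runs saved: summed only at display time
totals = conn.execute(
    "SELECT player_id, SUM(drs) FROM fielding_drs"
    " WHERE season = ? GROUP BY player_id ORDER BY player_id",
    (2112,),
).fetchall()
```

Keeping the finest grain in the table means both kinds of page can be built from one source, at the cost of a `GROUP BY` on the total page.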

Offline Huckleberry

Re: Introducing hWAR
« Reply #5 on: April 10, 2018, 10:28:08 AM »
DRS - P
DRS - C
DRS - 1B
DRS - 2B
DRS - 3B
DRS - SS
DRS - LF
DRS - CF
DRS - RF
DRS - Total

Keep in mind that these values are based strictly on the OOTP-reported ZR values, adjusted so that each position in each league sums to zero as this is supposed to be an "above average" statistic.
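
As an illustration of that adjustment, here is a minimal sketch that re-centers ZR so each position in each league sums to zero. Spreading the correction in proportion to innings played is an assumption, as are the function name `zero_center`, the record fields, and the sample numbers.

```python
from collections import defaultdict


def zero_center(records):
    """Return copies of records with a 'drs' field that sums to zero
    within each (league, position) group.

    The group's ZR total is removed from each player in proportion to
    innings played (an assumed weighting).
    """
    totals = defaultdict(lambda: [0.0, 0.0])  # (league, position) -> [ZR sum, innings sum]
    for r in records:
        group = totals[(r["league"], r["position"])]
        group[0] += r["zr"]
        group[1] += r["innings"]

    adjusted = []
    for r in records:
        zr_sum, innings_sum = totals[(r["league"], r["position"])]
        share = r["innings"] / innings_sum if innings_sum else 0.0
        adjusted.append({**r, "drs": r["zr"] - zr_sum * share})
    return adjusted


# Hypothetical pitchers whose raw ZR does not sum to zero
raw_zr = [
    {"league": "ABL", "position": "P", "player": "A", "innings": 200.0, "zr": -3.0},
    {"league": "ABL", "position": "P", "player": "B", "innings": 100.0, "zr": -1.0},
    {"league": "ABL", "position": "P", "player": "C", "innings": 100.0, "zr": 0.0},
]
adjusted_zr = zero_center(raw_zr)
```

In this example the group total of -4 is handed back as +2, +1, and +1, so the adjusted values are -1, 0, and +1.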

Offline Huckleberry

Re: Introducing hWAR
« Reply #6 on: April 11, 2018, 01:40:11 PM »
Okay, I have past seasons entered now.

Offline Echo127

  • Team Owner
  • Sr. Member
  • *****
  • Posts: 462
    • View Profile
Re: Introducing hWAR
« Reply #7 on: April 11, 2018, 11:35:10 PM »
Rio only has one player in the top 100 single-season hWAR list (Ze Vargas at # 98). And 0 players since I took over in 2103.

By the time I finish my rebuild that will be 0 players.

Offline Huckleberry

Re: Introducing hWAR
« Reply #8 on: April 12, 2018, 10:30:02 AM »
Craziest leaderboard to me is the pitching one.

http://www.worldbaseballassociation.com/leaderboard_wba_pitching_season_hpwar.php

The top 6 seasons were all by Paris pitchers.

 
