Basic Baseball Simulation

Are we in a simulation? Perhaps. If you don’t believe me, well, I’m the one who is simulating your reality. I specifically programmed you to be skeptical of this premise, therefore, your skepticism is actually proof that I’m right. Ha! Got you! Don’t try to fight it. I already know what’s going to happen. Embrace the determinism!

Okay… moving on to the article itself. One day, three or so years ago, I decided to try to whip up a baseball simulator. I was inspired by Win Expectancy Finder, the wonderful site created by Greg Stoll, which holds a database of every regular-season MLB at-bat since 1957 (currently 130,860 games!) and uses it to calculate the probability of a team winning from a given game state, based on how often teams have historically won from that state. Here are a few examples:

Start of a game: Top of the 1st, no outs, no one on: Home Team wins 53.87% of the time

Bottom of the 9th, 2 outs, 2 men on: Visiting team wins 83.53% of the time (don’t tell that to the TB Rays, though!)

The main flaw with this method is that it doesn’t take individual matchups into account. It gives a good average, but if the best team in the league is hosting the worst team in the league, or vice versa, or if a team is starting its best/worst pitcher, the actual probabilities will look quite different! Thus, it would be nice to be able to plug in specific player probabilities (pitchers & batters), to get a more accurate picture of how lineups would match up against each other.

To start off, we’re going to take the overall MLB average and use those baselines for all players. Here is the breakdown for 2019, the last full season:

Results of 2019 Plate Appearances (186,518)

  • Singles: 25,947 (13.91%)
  • Doubles: 8,531 (4.57%)
  • Triples: 785 (0.42%)
  • Home Runs: 6,776 (3.63%)
  • Walks: 17,879 (9.59%)
  • Outs: 126,600 (67.88%)

We will assign this probability distribution to all players. We’ll treat all outs as strikeouts (i.e. no tagging up or double plays) and assume that every base runner advances two bases on both singles and doubles.

First, we’ll initialize a series of variables to keep track of the game state: inning, outs, score (home & away team run totals), bases (e.g. empty, loaded). This could and should be done via classes, but for the sake of this exercise, we can get the job done via simple functions and global variables:
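As a rough sketch of what that state might look like (not the original code): `gameOn` and `gameLog` are names used later in this article, while `topOfInning`, `resetGame` and the `[first, second, third]` layout of `bases` are my own assumptions for illustration.

```javascript
// Global game state. Only gameOn and gameLog are names from the article;
// everything else here is a hypothetical choice.
let inning = 1;                    // current inning, starting at 1
let topOfInning = true;            // true while the away team is batting
let outs = 0;                      // outs in the current half-inning
let score = { away: 0, home: 0 };  // run totals
let bases = [0, 0, 0];             // [first, second, third], 1 = occupied
let gameOn = true;                 // flips to false when the game ends
const gameLog = [];                // one entry per finished game

// Put every variable (except the log) back to its start-of-game value.
function resetGame() {
  inning = 1;
  topOfInning = true;
  outs = 0;
  score = { away: 0, home: 0 };
  bases = [0, 0, 0];
  gameOn = true;
}
```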

For those of you who don’t know the rules of baseball, here you go. For our purposes, we want the inning variable to increment at the end of each inning and reset to 1 at the end of the game. We want outs to increment at each out and reset to 0 at the end of each half-inning. We want bases to be updated and then reset as needed. So, we’ll design some functions to handle each at-bat.

Here is a function that randomly generates a number and assigns it a baseball outcome. Note: the random-looking integers are the cumulative sum of the aforementioned batting results.
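A minimal sketch of such a function, assuming the 2019 counts from the table above. The cumulative sums work out to 25,947 (singles), 34,478 (plus doubles), 35,263 (plus triples), 42,039 (plus home runs), 59,918 (plus walks) and 186,518 (plus outs). The names `outcomeFor` and `randomOutcome`, and the optional `rng` parameter, are hypothetical.

```javascript
// Cumulative 2019 counts: each limit is the running total of the table above.
const CUMULATIVE_OUTCOMES = [
  [25947, "single"],
  [34478, "double"],
  [35263, "triple"],
  [42039, "homeRun"],
  [59918, "walk"],
  [186518, "out"],
];
const TOTAL_PA = 186518; // total plate appearances in 2019

// Map an integer in [0, TOTAL_PA) to the first bucket whose limit exceeds it.
function outcomeFor(n) {
  if (!Number.isInteger(n) || n < 0 || n >= TOTAL_PA) {
    throw new RangeError(`n must be an integer in [0, ${TOTAL_PA})`);
  }
  for (const [limit, outcome] of CUMULATIVE_OUTCOMES) {
    if (n < limit) return outcome;
  }
}

// Draw a random plate-appearance outcome. rng is injectable so the function
// can be tested deterministically; it defaults to Math.random.
function randomOutcome(rng = Math.random) {
  return outcomeFor(Math.floor(rng() * TOTAL_PA));
}
```

Because each bucket is as wide as its count, an outcome comes up with the same frequency it had in 2019.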

Here’s a helper function to sum the elements of an array:
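One plausible version of that helper, using `reduce` (the original implementation is not shown here, so treat this as a stand-in). With bases stored as an array of 0s and 1s, which is my assumption, it doubles as a runner counter.

```javascript
// Add up the elements of a numeric array; an empty array sums to 0.
function sum(arr) {
  return arr.reduce((total, value) => total + value, 0);
}

// Example: two runners on (first and third).
const runnersOn = sum([1, 0, 1]);
```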

Here is a function that handles walks:

This mutates the bases according to baseball walk rules and returns the number of runs scored; a team scores on a walk only if the bases are loaded (i.e. if all bases are occupied by players prior to the walk).
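A sketch of that logic, assuming bases are stored as `[first, second, third]` with 1 for an occupied base. The name `handleWalk` and the choice to pass the array in as a parameter (rather than read a global) are mine, so that the snippet runs on its own.

```javascript
// Apply a walk: the batter takes first, and runners move up only when forced.
// Mutates bases in place and returns the number of runs scored (0 or 1).
function handleWalk(bases) {
  const [first, second, third] = bases; // snapshot before mutating
  if (first && second && third) {
    return 1; // bases loaded: runner from third scores, bases stay loaded
  }
  if (first && second) bases[2] = 1; // runner on second is forced to third
  if (first) bases[1] = 1;           // runner on first is forced to second
  bases[0] = 1;                      // batter takes first
  return 0;
}
```

Note that a lone runner on second or third stays put, since nothing forces him to advance.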

Here is another function, to handle one of the hits (single, double, triple, home run):
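Here is how such a hit handler could look under the simplifications stated earlier: runners already on base advance two bases on singles and doubles. I am additionally assuming that runners advance three bases on a triple and that everyone scores on a home run. The name `handleHit`, the hit-type strings and the `[first, second, third]` layout are hypothetical.

```javascript
// Bases gained by the batter for each hit type (4 = scores).
const BATTER_ADVANCE = { single: 1, double: 2, triple: 3, homeRun: 4 };

// Apply a hit. Mutates bases in place and returns the runs scored.
function handleHit(bases, hitType) {
  const batterAdvance = BATTER_ADVANCE[hitType];
  if (batterAdvance === undefined) {
    throw new Error(`Unknown hit type: ${hitType}`);
  }
  // Simplification from the article: runners take two bases on singles and
  // doubles. Assumption: three on a triple, four on a home run.
  const runnerAdvance = Math.max(2, batterAdvance);
  let runs = 0;
  const next = [0, 0, 0];
  bases.forEach((occupied, i) => {
    if (!occupied) return;
    const destination = i + runnerAdvance; // index 3 or more means home plate
    if (destination >= 3) runs += 1;
    else next[destination] = 1;
  });
  if (batterAdvance === 4) runs += 1;  // the batter scores on a home run
  else next[batterAdvance - 1] = 1;    // otherwise the batter takes his base
  bases.splice(0, 3, ...next);         // write the new layout back in place
  return runs;
}
```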

Next, here is a function to handle any at-bat, regardless of its outcome:
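The sketch below shows one way to wire this up. To keep it runnable on its own, it takes the game state and the outcome as parameters instead of reading globals and drawing its own random number, and it carries compact stand-ins for the walk and hit handlers; both choices are my assumptions. The end-of-game rules I encode are: the home team skips the bottom of the 9th (or later) if it already leads, a lead taken in the bottom of the 9th or later ends the game immediately, and tied games go to extra innings.

```javascript
const HIT_BASES = { single: 1, double: 2, triple: 3, homeRun: 4 };

// Compact stand-in for the walk handler: forced runners move up one base.
function walk(bases) {
  const [first, second, third] = bases;
  const runs = first && second && third ? 1 : 0;
  if (first && second) bases[2] = 1;
  if (first) bases[1] = 1;
  bases[0] = 1;
  return runs;
}

// Compact stand-in for the hit handler: runners advance at least two bases.
function hit(bases, batterAdvance) {
  const runnerAdvance = Math.max(2, batterAdvance);
  let runs = batterAdvance === 4 ? 1 : 0;
  const next = [0, 0, 0];
  bases.forEach((occupied, i) => {
    if (!occupied) return;
    if (i + runnerAdvance >= 3) runs += 1;
    else next[i + runnerAdvance] = 1;
  });
  if (batterAdvance < 4) next[batterAdvance - 1] = 1;
  bases.splice(0, 3, ...next);
  return runs;
}

// Apply one plate-appearance outcome to the state object.
function atBatResult(state, outcome) {
  if (outcome === "out") {
    state.outs += 1;
    if (state.outs === 3) endHalfInning(state);
    return;
  }
  if (outcome !== "walk" && !(outcome in HIT_BASES)) {
    throw new Error(`Unknown outcome: ${outcome}`);
  }
  const runs = outcome === "walk"
    ? walk(state.bases)
    : hit(state.bases, HIT_BASES[outcome]);
  state.score[state.top ? "away" : "home"] += runs;
  // Walk-off: the home team takes the lead in the bottom of the 9th or later.
  if (!state.top && state.inning >= 9 && state.score.home > state.score.away) {
    state.gameOn = false;
  }
}

// Clear outs and bases, then either switch sides or end the game.
function endHalfInning(state) {
  state.outs = 0;
  state.bases = [0, 0, 0];
  const { away, home } = state.score;
  if (state.top) {
    if (state.inning >= 9 && home > away) state.gameOn = false; // no bottom half needed
    else state.top = false;
  } else if (state.inning >= 9 && home !== away) {
    state.gameOn = false;
  } else {
    state.inning += 1;
    state.top = true;
  }
}
```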

Finally, here is a function that runs atBatResult() in a loop until the game is over, i.e. until gameOn equals false. Once the game is over, it pushes the result (0: away team wins, 1: home team wins) to the gameLog array and resets the game state.
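A bare-bones sketch of that loop follows. So that it stands alone, the at-bat handler is passed in as a parameter and the state lives in a single object; both are my own simplifications, as is the helper name `newGameState`. Only `gameOn` and `gameLog` are names from the article.

```javascript
const gameLog = []; // 0: away team wins, 1: home team wins

// Hypothetical factory for a fresh start-of-game state.
function newGameState() {
  return {
    inning: 1,
    top: true,
    outs: 0,
    score: { away: 0, home: 0 },
    bases: [0, 0, 0],
    gameOn: true,
  };
}

let state = newGameState();

// Call the supplied at-bat handler until it sets state.gameOn to false,
// then log the winner and reset the state for the next game.
function playGame(atBatResult) {
  while (state.gameOn) {
    atBatResult(state);
  }
  const { away, home } = state.score;
  console.log(`Final score: away ${away}, home ${home}`);
  gameLog.push(home > away ? 1 : 0);
  state = newGameState();
}
```

The handler is expected to end the game eventually; a handler that never sets `gameOn` to false would loop forever.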

Let’s go through and make sure the sim is outputting reasonable results:

Which outputs the following:

Hooray! These are reasonable, baseball-looking scores, and the results in the gameLog correspond to what we see onscreen. That is, the home team is 2–3.

As a final step, let’s modify our function so that we can input the starting game state that we want to simulate, e.g. bottom of the 9th, down 1, bases loaded, 2 outs:

This function will set the game at the specified state (with default values corresponding to the start of a game) and play it out until a result is reached.
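The default-values part can be sketched with destructured parameters. This snippet only builds the starting state; playing it out still requires the game loop. The name `makeGameState`, the field names and the 4–3 score in the example are assumptions of mine.

```javascript
// Build a game state; any field left out falls back to its start-of-game value.
function makeGameState({
  inning = 1,
  top = true,                      // true: away team batting
  outs = 0,
  score = { away: 0, home: 0 },
  bases = [0, 0, 0],               // [first, second, third]
} = {}) {
  // Copy score and bases so callers' objects are never mutated by the sim.
  return { inning, top, outs, score: { ...score }, bases: [...bases], gameOn: true };
}

// Bottom of the 9th, home team down 1 (assumed 4-3), bases loaded, 2 outs.
const walkOffSpot = makeGameState({
  inning: 9,
  top: false,
  outs: 2,
  score: { away: 4, home: 3 },
  bases: [1, 1, 1],
});
```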

As it turns out, the best way to implement this simulator is to avoid all the messy global variables and go the OOP route. That’s right: classes, states, instances, inheritance and all that good stuff! After many painful hours of tinkering, and in some cases simply blowing everything up and starting from scratch, here is what I got:
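The original class-based code is not reproduced in this text, so what follows is only a minimal sketch of what such a design might look like, not the actual implementation. The class name `Game`, its fields, the injectable `rng` and the helper `homeWinRate` are all hypothetical; the probabilities and base-running simplifications are the ones stated earlier.

```javascript
// Cumulative 2019 counts out of 186,518 plate appearances.
// Hits are stored as the number of bases the batter takes.
const OUTCOMES = [
  [25947, 1], [34478, 2], [35263, 3], [42039, 4],
  [59918, "walk"], [186518, "out"],
];

class Game {
  // Defaults describe the start of a game; pass overrides to start mid-game.
  constructor({ inning = 1, top = true, outs = 0, away = 0, home = 0,
                bases = [0, 0, 0], rng = Math.random } = {}) {
    Object.assign(this, { inning, top, outs, away, home, rng });
    this.bases = [...bases]; // [first, second, third]
    this.over = false;
  }

  randomOutcome() {
    const n = Math.floor(this.rng() * 186518);
    return OUTCOMES.find(([limit]) => n < limit)[1];
  }

  walk() {
    const [first, second, third] = this.bases;
    const runs = first && second && third ? 1 : 0;
    if (first && second) this.bases[2] = 1;
    if (first) this.bases[1] = 1;
    this.bases[0] = 1;
    return runs;
  }

  hit(batterAdvance) {
    const runnerAdvance = Math.max(2, batterAdvance); // two bases on singles/doubles
    let runs = batterAdvance === 4 ? 1 : 0;
    const next = [0, 0, 0];
    this.bases.forEach((occupied, i) => {
      if (!occupied) return;
      if (i + runnerAdvance >= 3) runs += 1;
      else next[i + runnerAdvance] = 1;
    });
    if (batterAdvance < 4) next[batterAdvance - 1] = 1;
    this.bases = next;
    return runs;
  }

  // Apply one plate appearance; the outcome is random unless supplied.
  atBat(outcome = this.randomOutcome()) {
    if (outcome === "out") {
      this.outs += 1;
      if (this.outs === 3) this.endHalfInning();
      return;
    }
    const runs = outcome === "walk" ? this.walk() : this.hit(outcome);
    if (this.top) this.away += runs;
    else this.home += runs;
    // Walk-off: home takes the lead in the bottom of the 9th or later.
    if (!this.top && this.inning >= 9 && this.home > this.away) this.over = true;
  }

  endHalfInning() {
    this.outs = 0;
    this.bases = [0, 0, 0];
    const late = this.inning >= 9;
    if (this.top) {
      if (late && this.home > this.away) this.over = true;
      else this.top = false;
    } else if (late && this.home !== this.away) {
      this.over = true;
    } else {
      this.inning += 1;
      this.top = true;
    }
  }

  // Play to completion; returns 1 if the home team wins, 0 otherwise.
  play() {
    while (!this.over) this.atBat();
    return this.home > this.away ? 1 : 0;
  }
}

// Fraction of simulated games won by the home team from a given start state.
function homeWinRate(iterations, start = {}) {
  let wins = 0;
  for (let i = 0; i < iterations; i++) wins += new Game(start).play();
  return wins / iterations;
}
```

With this sketch, `homeWinRate(1000)` simulates 1,000 full games, and passing something like `{ inning: 9, top: false, outs: 2, away: 4, home: 3, bases: [1, 1, 0] }` as the second argument starts every game from a late-inning spot instead (the 4–3 score there is just an example).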

To test this out, let’s run a few sims of 1,000 iterations each from the start of a game. We’d expect the home team to win about 50% of the time, given that the simulator treats both teams equally and gives no home-field advantage:

Pretty good! All the results are hovering right around 50%, well within the margin of error on both sides.
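
To put a number on that margin of error, here is a quick sketch using the usual normal approximation for a proportion at roughly 95% confidence. The function name `marginOfError` and the default `z = 1.96` are my own choices, and the formula assumes the simulated games are independent.

```javascript
// Half-width of an approximate 95% confidence interval for a win rate p
// estimated from n independent simulated games (normal approximation).
function marginOfError(p, n, z = 1.96) {
  return z * Math.sqrt((p * (1 - p)) / n);
}

const moe = marginOfError(0.5, 1000);
```

For `p = 0.5` and `n = 1000` this comes out to about 0.031, so a 1,000-game sim landing anywhere from roughly 47% to 53% is consistent with a true 50% win rate.
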

Now, let’s try the other scenario listed above (bottom of the 9th, 2 outs, 2 men on) and see if our results are close to 16.47% for the home team (i.e. 100% minus 83.53%):

Hey, not too bad! It makes sense that these sim results would be a little higher, since teams typically put their closers in at the end of the game, who are more efficient on a pitch-by-pitch basis. “Hey, if that’s the case, why don’t teams just use their closers the whole game?” Here’s why.


I hope you enjoyed the demonstration! My main takeaways are the following:

  1. Avoid global variables; use classes and OOP principles whenever possible.
  2. Seemingly simple things, like baseball rules, can be quite tricky to convert from intuitive understanding into code.
  3. It’s a lot easier to spend hours and hours on a problem when you enjoy the subject matter!

Until next weekend!


