Another Baseball Simulator… in Python!

Two weeks ago, I did a little bit of web scraping, getting data from a simple table. Last week, I created a baseball simulator in JavaScript. This week, I’m combining the two: scraping data from a baseball stats site, then building a simulator… but this time, both in Python!

Scraping The Data

In JavaScript, we used the request().then().then() promise-chain syntax along with the DOM’s querySelector() and querySelectorAll() functions to find the table. For this exercise, we’ll use the Python equivalents: the ‘requests’ library for the initial GET request, and the famous ‘Beautiful Soup’ library to filter through the web page and pull out the desired table.

After installing these libraries on your machine (via pip, conda or whatever you prefer), import them at the top of your file. Then, to get the data from the website, we call requests.get() and use several BeautifulSoup functions to parse the response:
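As a rough sketch of that step (not necessarily the exact code used here): the URL below is a hypothetical placeholder, and parse_first_table() and fetch_table() are helper names made up for illustration. It assumes the page's first table is the one we want and that its first row holds the column headers.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical placeholder; substitute the stats page you actually want to scrape.
URL = "https://example.com/team-batting"


def parse_first_table(html):
    """Return (headers, rows) for the first <table> found in an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        raise ValueError("no <table> element found")
    all_rows = table.find_all("tr")
    if not all_rows:
        raise ValueError("table has no rows")
    # Assumption: the first row is the header row.
    headers = [c.get_text(strip=True) for c in all_rows[0].find_all(["th", "td"])]
    rows = [
        [c.get_text(strip=True) for c in tr.find_all(["th", "td"])]
        for tr in all_rows[1:]
    ]
    return headers, rows


def fetch_table(url=URL):
    """Download a page with a GET request and parse its first table."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return parse_first_table(response.text)
```

Keeping the parsing separate from the download means you can test it on a saved HTML string without hitting the site every time.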

The table we’re scraping contains total batting data for every MLB team!

At this point, we bring in our last library: the magical Pandas. Its core DataFrame data structure holds our batting numbers as a sort of Excel-style spreadsheet, which can be easily manipulated and filtered:
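Here is a minimal sketch of loading scraped rows into a DataFrame. The two rows are made-up sample values (not real stats), and the column names are assumed to follow the usual batting abbreviations; the real table has more columns.

```python
import pandas as pd

# Made-up sample values standing in for the scraped table (not real stats).
headers = ["Tm", "PA", "H", "2B", "3B", "HR", "BB"]
rows = [
    ["AAA", "6000", "1400", "280", "25", "200", "520"],
    ["BBB", "6100", "1350", "260", "30", "180", "500"],
]

df = pd.DataFrame(rows, columns=headers)

# Scraped cells arrive as strings, so convert the stat columns to numbers.
stat_cols = [col for col in df.columns if col != "Tm"]
df[stat_cols] = df[stat_cols].apply(pd.to_numeric)

print(df.head())
```

Converting to numeric types up front matters: arithmetic on string columns would concatenate text or raise errors instead of adding numbers.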

Using the wonderful Jupyter Notebook tool, which lets us run and test individual code snippets, here’s what we get:

A Pythonic version of the Baseball Reference table from our initial page

Even though these rows are team-wide totals, we can treat them as individual player stats and later load them into our simulator. We don’t need all 28 columns, so we can just grab the key ones and use them to generate a probability distribution for each “player”. Using our knowledge of baseball, we can calculate the percentage of plate appearances resulting in singles, doubles, triples, homers, walks and outs:
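One way to do that calculation, sketched under some assumptions: singles are hits minus extra-base hits, and an "out" is any plate appearance that is neither a hit nor a walk (which lumps in hit-by-pitches, sacrifices and so on). The sample numbers and column names are made up for illustration.

```python
import pandas as pd

# Made-up sample totals (not real stats); column names are assumed.
df = pd.DataFrame(
    {
        "Tm": ["AAA", "BBB"],
        "PA": [6000, 6100],
        "H": [1400, 1350],
        "2B": [280, 260],
        "3B": [25, 30],
        "HR": [200, 180],
        "BB": [520, 500],
    }
)

# Count each outcome. Singles are not listed directly, so derive them.
counts = pd.DataFrame(
    {
        "1B": df["H"] - df["2B"] - df["3B"] - df["HR"],
        "2B": df["2B"],
        "3B": df["3B"],
        "HR": df["HR"],
        "BB": df["BB"],
    }
)
# Assumption: everything that is not a hit or a walk counts as an out.
counts["OUT"] = df["PA"] - df["H"] - df["BB"]

# Divide each row by its plate appearances to turn counts into probabilities.
probs = counts.div(df["PA"], axis=0)
probs.insert(0, "Tm", df["Tm"])

print(probs.round(3))
```

Because the six outcomes partition every plate appearance, each row's probabilities sum to 1, which is exactly what a random draw needs later on.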

Here’s what our probability distribution looks like now:

Now, we can treat these 30 rows as players and load them onto teams! We’ll randomly select two groups of 9 rows and treat them as 9-player teams:
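A possible way to make the split, assuming the two teams should not share a "player": draw 18 distinct rows in one go and cut them in half. The DataFrame here is a placeholder with made-up values, and the names home_df and away_df are only illustrative.

```python
import pandas as pd

# Stand-in for the 30-row probability table (placeholder values, not real stats).
probs = pd.DataFrame({"Tm": [f"T{i:02d}" for i in range(30)], "HR": [0.03] * 30})

# Draw 18 distinct rows at once, then split so no "player" lands on both teams.
picked = probs.sample(n=18, random_state=42)
home_df = picked.iloc[:9].reset_index(drop=True)
away_df = picked.iloc[9:].reset_index(drop=True)

print(home_df["Tm"].tolist())
print(away_df["Tm"].tolist())
```

Passing random_state makes the draw repeatable; drop it if you want different teams on every run.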

Now, let’s go through and create the game and simulator classes! We can basically take the code from our JavaScript classes and convert it to Python, with several alterations to account for DataFrames and player-specific data:

Next, we create our team class, which is initialized with a 9-row DataFrame, each row serving as a player’s probability distribution:
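A minimal sketch of what such a class might look like. The attribute and method names (players, batter_index, runs, next_batter) are assumptions for illustration, not the original code.

```python
import pandas as pd


class Team:
    """Holds a 9-row DataFrame of players and tracks who bats next."""

    def __init__(self, players):
        if len(players) != 9:
            raise ValueError("a team needs exactly 9 players")
        # Reset the index so lineup positions are always 0 through 8.
        self.players = players.reset_index(drop=True)
        self.batter_index = 0
        self.runs = 0

    def next_batter(self):
        """Return the current batter's row, then advance the lineup."""
        row = self.players.iloc[self.batter_index]
        self.batter_index = (self.batter_index + 1) % 9  # wrap back to the top
        return row


# Usage with placeholder data: nine identical "players".
lineup = pd.DataFrame({"name": [f"P{i}" for i in range(9)], "HR": [0.03] * 9})
team = Team(lineup)
print(team.next_batter()["name"])
```

The modulo keeps the batting order cycling, so the lineup position carries over from one inning to the next.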

Now we create the Game class, which contains the functions for generating each at-bat and updating the game state accordingly:
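The sketch below shows one way the at-bat and state-update logic could work; it is a simplified stand-in rather than the original class. To keep it self-contained, each player is a plain dict of outcome probabilities (which a DataFrame row could supply), and the base-running rule is deliberately crude: on a hit, every runner advances exactly as many bases as the batter, and on a walk only forced runners move.

```python
import random

OUTCOMES = ["1B", "2B", "3B", "HR", "BB", "OUT"]
BASES_GAINED = {"1B": 1, "2B": 2, "3B": 3, "HR": 4}


class Game:
    """Tracks outs, base runners and runs for the side currently batting."""

    def __init__(self, rng=None):
        self.rng = rng or random.Random()
        self.outs = 0
        self.bases = [False, False, False]  # first, second, third
        self.runs = 0

    def at_bat(self, probs):
        """Draw one outcome from a player's probability distribution."""
        weights = [probs[o] for o in OUTCOMES]
        return self.rng.choices(OUTCOMES, weights=weights, k=1)[0]

    def update(self, outcome):
        """Apply an outcome to the game state."""
        if outcome == "OUT":
            self.outs += 1
        elif outcome == "BB":
            self._walk()
        else:
            self._hit(BASES_GAINED[outcome])

    def _walk(self):
        # Only forced runners move; a run scores only with the bases loaded.
        if all(self.bases):
            self.runs += 1
        elif self.bases[0] and self.bases[1]:
            self.bases[2] = True
        elif self.bases[0]:
            self.bases[1] = True
        self.bases[0] = True

    def _hit(self, n):
        # Simplification: every runner advances exactly n bases.
        new_bases = [False, False, False]
        for i, occupied in enumerate(self.bases):
            if occupied:
                if i + n >= 3:
                    self.runs += 1
                else:
                    new_bases[i + n] = True
        if n == 4:
            self.runs += 1  # the batter scores on a home run
        else:
            new_bases[n - 1] = True
        self.bases = new_bases

    def play_half_inning(self, lineup, start=0):
        """Bat until three outs; return (runs scored, next batter index)."""
        self.outs, self.bases = 0, [False, False, False]
        runs_before = self.runs
        i = start
        while self.outs < 3:
            self.update(self.at_bat(lineup[i % len(lineup)]))
            i += 1
        return self.runs - runs_before, i % len(lineup)


# Usage with a made-up probability distribution shared by all nine batters.
player = {"1B": 0.15, "2B": 0.05, "3B": 0.005, "HR": 0.03, "BB": 0.085, "OUT": 0.68}
game = Game(random.Random(0))
print(game.play_half_inning([player] * 9))
```

random.choices() does the weighted draw, so the weights do not even need to sum to exactly 1.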

Lastly, we add the Simulator class, which takes the user’s inputs (game state, players) and simulates however many games the user requests:
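To illustrate just the "run many games and tally the results" part, here is a sketch in which the Simulator is handed a play_game callable. That interface is an assumption made to keep the example self-contained, and toy_game() is a deliberately fake stand-in for a real game; a full version would play out innings with the Game class instead.

```python
import random


class Simulator:
    """Runs many independent games and aggregates the results."""

    def __init__(self, play_game, seed=None):
        # play_game: callable(rng) -> (home_runs, away_runs); hypothetical interface.
        self.play_game = play_game
        self.rng = random.Random(seed)

    def run(self, n_games):
        if n_games <= 0:
            raise ValueError("n_games must be positive")
        home_wins = away_wins = ties = 0
        home_total = away_total = 0
        for _ in range(n_games):
            home, away = self.play_game(self.rng)
            home_total += home
            away_total += away
            if home > away:
                home_wins += 1
            elif away > home:
                away_wins += 1
            else:
                ties += 1
        return {
            "games": n_games,
            "home_win_pct": home_wins / n_games,
            "away_win_pct": away_wins / n_games,
            "tie_pct": ties / n_games,
            "home_avg_runs": home_total / n_games,
            "away_avg_runs": away_total / n_games,
        }


def toy_game(rng):
    """Fake game: each side scores 0 to 9 runs uniformly at random."""
    return rng.randint(0, 9), rng.randint(0, 9)


summary = Simulator(toy_game, seed=1).run(1000)
print(summary)
```

Sharing one seeded random.Random across all games keeps a whole simulation run reproducible while still making each game different.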
