Fighting Clean: Predicting UFC Battle Outcomes

Aldrin Brillante
6 min read · Dec 13, 2021

A Data-Centric Approach to Discovering the Victors of the Ultimate Fighting Championship (UFC)

Friday, October 18th, 2019, UFC on ESPN 6 Main Event: Dominick Reyes vs. Chris Weidman, a Light Heavyweight bout. This was a very heated topic of discussion, both between the two fighters and among the fans behind them. Who was going to win?

With mixed martial arts becoming mainstream, a lot of people try to predict victors before the match even starts, based on historical data from what they’ve seen and heard. Now, we can do it ourselves! And that’s what I would like to do for this DS project. Who would win between two fighters? Can I train a machine learning algorithm to answer that question for me?

Problem Definition

To predict the winners of UFC fights from each fighter’s statistics prior to the fight.

Introduction

In this report, we predict the results of MMA fights in the Ultimate Fighting Championship (UFC) using machine learning algorithms.

Background — What is MMA/UFC?

Mixed Martial Arts (MMA) has developed gradually from barbaric, unorganized bouts into one of the fastest growing sports in the world, mainly because of the growth of the Ultimate Fighting Championship (UFC). Given the rise in popularity and in available fighter analytics, I decided to create a model that predicts the result of an MMA fight. It has many real-life applications, such as gambling, journalism, and improving fighter performance by analyzing an opponent’s advantages and refining training according to the data.

To summarize, the UFC is a modern-day, gladiator-style competition where two fighters enter an octagon-shaped arena to test their hand-to-hand combat skills against each other and determine who is the superior fighter.

Where Da Data Come From?

Fortunately, historical UFC fight data is available from www.UFCstats.com! This information can be used to train a statistical model to predict the winner of an upcoming fight. Even with an ML algorithm predicting fight outcomes, please note that there are countless unknowns and confounding variables outside the scope of this dataset that will limit the effectiveness of the model.

BTD: Behind The Data

The dataset contained 5144 total fights ranging from the conception of the UFC in 1993 to mid 2019. There are 1915 unique fighters with a median of 3 recorded fights for each.
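Summary numbers like these can be pulled out with a few lines of pandas. The sketch below is only an illustration on a tiny made-up table: the column names (`R_fighter`, `B_fighter`, `date`) and the helper `summarize_fights` are assumptions, not necessarily what the real dataset or my notebook used.

```python
import pandas as pd

# Toy stand-in for the fight table (one row per fight).
# Column names are assumptions made for this illustration.
fights = pd.DataFrame({
    "R_fighter": ["A", "A", "B", "C", "A"],
    "B_fighter": ["B", "C", "C", "D", "D"],
    "date": pd.to_datetime([
        "1993-11-12", "2005-04-09", "2005-08-20", "2012-01-14", "2019-06-08",
    ]),
})


def summarize_fights(df):
    """Return total fights, unique fighters, and median fights per fighter."""
    # Stack both corners so every fighter appearance is one entry.
    appearances = pd.concat([df["R_fighter"], df["B_fighter"]], ignore_index=True)
    fights_per_fighter = appearances.value_counts()
    return {
        "total_fights": len(df),
        "unique_fighters": int(fights_per_fighter.size),
        "median_fights": float(fights_per_fighter.median()),
    }


summary = summarize_fights(fights)

# Fights per calendar year, useful for spotting growth over time.
fights_per_year = fights["date"].dt.year.value_counts().sort_index()

print(summary)
print(fights_per_year)
```

Counting appearances across both corners matters here, since a fighter can show up in either column from one fight to the next.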

If you look at the picture above, you can see a large spike in interest in the UFC starting in 2005. After a bit more research, I concluded that this was likely tied to the rise of Monster Energy drink promotions and their collaboration with UFC fighting events.

Below is a screenshot of the dataset. Each row represents a completed fight between two competitors. Each fighter’s individual stats leading up to the fight are provided, along with the winner of the fight.

Each row includes a number of statistics for each fighter, such as Age, Height, Weight, and Reach.

There are also more interesting behavioral stats that say more about fighting style, such as:

  • Punches Attempted / Landed
  • Kicks Attempted / Landed
  • Takedowns Attempted / Landed
  • etc.

With this many factors in play, the natural question is whether winners and losers in the UFC share common traits or differ in consistent ways. I wanted to identify the key indicators of winning.

EDA — Exploratory Data Analysis

What is EDA? In data science, exploratory data analysis (EDA) is an approach to analyzing data sets to summarize their main characteristics, often using statistical graphics and other data visualization methods.

With that definition in mind, here are some screenshots:

The data is separated by weight class and grouped into winners and losers so that any obvious trend stands out right away.

REACH Conclusion:

According to the data, winners tend to have longer arms. This finding makes sense, as a longer reach lets the fighter keep more distance from their opponent.

HEIGHT Conclusion:

According to the data, the winners tend to be taller.

AGE Conclusion:

According to the data, winners tend to be younger as well.

Overall, the results make sense, but there are underlying complexities that machine learning can help uncover.
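For anyone who wants to reproduce this kind of winner-versus-loser comparison, here is a minimal pandas sketch on made-up numbers. The column names (`weight_class`, `Winner`, `R_`/`B_` prefixes for the two corners) and the helper `winners_vs_losers` are assumptions for illustration, so adjust them to whatever your copy of the data uses.

```python
import pandas as pd

# Tiny made-up table; column names are assumptions for this sketch.
fights = pd.DataFrame({
    "weight_class": ["Lightweight", "Lightweight", "Heavyweight", "Heavyweight"],
    "Winner": ["Red", "Blue", "Red", "Blue"],
    "R_Reach_cms": [180.0, 175.0, 200.0, 195.0],
    "B_Reach_cms": [175.0, 183.0, 196.0, 203.0],
    "R_age": [27, 33, 29, 36],
    "B_age": [31, 26, 34, 30],
})


def winners_vs_losers(df, stats=("Reach_cms", "age")):
    """Average each stat for winners and losers within every weight class."""
    # Keep only fights with a clear winner (drops draws, if any).
    df = df[df["Winner"].isin(["Red", "Blue"])]
    parts = []
    for prefix, corner in (("R_", "Red"), ("B_", "Blue")):
        part = df[["weight_class"]].copy()
        won = df["Winner"] == corner
        part["outcome"] = won.map({True: "winner", False: "loser"})
        for stat in stats:
            part[stat] = df[prefix + stat]
        parts.append(part)
    # One row per fighter per fight, labeled winner or loser.
    long_form = pd.concat(parts, ignore_index=True)
    return long_form.groupby(["weight_class", "outcome"])[list(stats)].mean()


table = winners_vs_losers(fights)
print(table)
```

Grouping inside each weight class keeps the comparison fair, since a heavyweight will out-reach a lightweight regardless of who wins.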

Data Pre-Processing

EDA surfaced a couple of gaps in the data. For example, many fighters’ reach values were missing.

Height and arm length are correlated in humans, so a missing reach can be estimated from fighters with similar measurements.

We can use the K-Nearest Neighbors algorithm to fill in the missing NaN values.
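One way to do this is scikit-learn's `KNNImputer`, which fills each missing value with the average from the most similar rows. The sketch below uses a made-up table with assumed column names (`Height_cms`, `Reach_cms`) and an arbitrary `n_neighbors=2`; it shows the idea rather than the exact settings I used.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Made-up fighter measurements; the last fighter's reach is missing.
# Column names are assumptions for this illustration.
fighters = pd.DataFrame({
    "Height_cms": [170.0, 172.0, 180.0, 182.0, 190.0, 171.0],
    "Reach_cms": [172.0, 176.0, 185.0, 187.0, 198.0, np.nan],
})

# Each NaN is replaced by the mean of that column over the k rows
# closest to it on the columns that are present (here, height).
imputer = KNNImputer(n_neighbors=2)
filled = pd.DataFrame(imputer.fit_transform(fighters), columns=fighters.columns)

print(filled)
```

In this toy case the fighter with the missing reach is 171 cm tall, so the two nearest neighbors are the 170 cm and 172 cm fighters, and the filled value is the average of their reaches.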


Modeling

The models tested to see which performed best on the prediction task were:

  • Logistic Regression Classifier
  • Random Forest Classifier
  • XGBoost Classifier

A grid search was performed on each model to tune its hyperparameters and find the settings that yielded the highest accuracy.
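As a rough sketch of what a per-model grid search looks like, the example below runs scikit-learn's `GridSearchCV` over two of the model types on synthetic data. The parameter grids, the 3-fold cross-validation, and the synthetic dataset are all assumptions for illustration, not the grids from this project. XGBoost is left out to keep the example dependency-free; its `XGBClassifier` follows the scikit-learn estimator interface, so it could be added to the `candidates` dictionary the same way.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the fight features and win/loss labels.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Each candidate pairs a model with a small, illustrative parameter grid.
candidates = {
    "logistic_regression": (
        LogisticRegression(max_iter=1000),
        {"C": [0.1, 1.0, 10.0]},
    ),
    "random_forest": (
        RandomForestClassifier(random_state=0),
        {"n_estimators": [50, 100], "max_depth": [3, None]},
    ),
}

results = {}
for name, (model, grid) in candidates.items():
    # Try every grid combination with 3-fold cross-validation on accuracy.
    search = GridSearchCV(model, grid, scoring="accuracy", cv=3)
    search.fit(X_train, y_train)
    results[name] = {
        "best_params": search.best_params_,
        "cv_accuracy": search.best_score_,
        # Accuracy of the refit best model on the held-out test split.
        "test_accuracy": search.score(X_test, y_test),
    }

for name, res in results.items():
    print(name, res)
```

Holding out a test split that the grid search never sees gives a fairer accuracy number than the cross-validation score alone.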

RESULTS AND CONCLUSION

XGBoost ended up being the most accurate, with a score of ~63%.

Screenshot of the model’s results on 2019 data

Most influential variables:

  • Age
  • Avg. Opponent Takedown Percentage
  • Avg. Opponent Significant Strike Percentage
  • Reach distance
  • Avg. HS (headshots) Landed
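A ranking like the one above typically comes from a tree model's feature importances. Here is a hedged sketch of how that can be read off: it uses scikit-learn's `GradientBoostingClassifier` as a stand-in for XGBoost, and the data is synthetic, built so that age drives the label. The feature names are placeholders, not the real dataset columns.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n = 500

# Synthetic features; names are placeholders for this illustration.
X = pd.DataFrame({
    "age": rng.normal(30, 4, n),
    "reach": rng.normal(183, 8, n),
    "noise": rng.normal(0, 1, n),
})

# Synthetic label: by construction, younger fighters win more often.
y = (X["age"] + rng.normal(0, 2, n) < 30).astype(int)

# Gradient boosting stands in for XGBoost here (an assumption of this sketch).
model = GradientBoostingClassifier(random_state=0).fit(X, y)

# Importances are normalized to sum to 1; higher means more influential.
importances = pd.Series(
    model.feature_importances_, index=X.columns
).sort_values(ascending=False)

print(importances)
```

Keep in mind that importances say which variables the model leaned on, not which direction they push the prediction.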

From these variables, the data can be interpreted as follows:

  • Youth > experience
  • Defense is the best strategy for wins
  • The takedowns and hits you allow are a major factor
  • A longer reach can be deemed an advantage
  • Headshots = very effective

Reyes vs. Weidman w/ Model

Dominick Reyes’s odds of winning against Chris Weidman: 60.9%

Chris Weidman’s odds of winning against Dominick Reyes: 57.6%

The model predicted correctly: Reyes won in the first round.

Acknowledgements:

  • Thank you to Kaggle user Rajeevw for developing a web scraper to pull ufcstats.com data.
  • Thank you to the various YouTube tutorials that assisted me throughout this whole process.
  • Thank you to my instructor and mentor, Aakash Sudhakar, for sticking with me and helping us out throughout this semester.
  • Thank you to the web2 internet, a lifesaver in helping me understand the material in this class.
