DATA ANALYSIS

Shark Attacks

I analyzed a shark attack dataset from Kaggle for my coding class, using Python.

Project Summary

The dataset includes the date of the attack, year, type, country, area, location, activity, name, sex, age, injury, fatality, time, species of shark, and the source. I also joined the shark attack dataset with a second data source that contains regions.
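The join with the region data can be sketched as a left join: every attack keeps its row, and a region is attached when its area matches. This is a minimal standard-library sketch, not the project's actual code. The inline CSV, its `State`/`Region` headers, and the sample rows are hypothetical, and it assumes the join key is the attack's `Area` (state) field.

```python
import csv
import io

# Hypothetical stand-in for the second data source (a state-to-region table).
# The real file's name and column headers are assumptions.
REGIONS_CSV = """State,Region
Florida,South
Hawaii,West
California,West
"""

# Hypothetical sample rows using a few of the dataset's fields; values are made up.
attacks = [
    {"Country": "USA", "Area": "Florida", "Type": "Unprovoked", "Sex": "M", "Age": "27"},
    {"Country": "USA", "Area": "Hawaii", "Type": "Provoked", "Sex": "F", "Age": "34"},
    {"Country": "AUSTRALIA", "Area": "New South Wales", "Type": "Unprovoked", "Sex": "M", "Age": "19"},
]


def load_region_lookup(text):
    """Build a {state: region} dict from CSV text with State and Region columns."""
    return {row["State"]: row["Region"] for row in csv.DictReader(io.StringIO(text))}


def left_join_regions(rows, lookup):
    """Attach a Region to every row; rows whose Area has no match get None (a left join)."""
    return [{**row, "Region": lookup.get(row["Area"])} for row in rows]


joined = left_join_regions(attacks, load_region_lookup(REGIONS_CSV))
for row in joined:
    print(row["Area"], row["Region"])
```

With pandas, the same idea would be `attacks_df.merge(regions_df, how="left", left_on="Area", right_on="State")`; a left join keeps attacks whose area has no region so they can be inspected rather than silently dropped.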

My goal was to find the most common age, gender, type of attack, region, and state among shark attack victims, to see which combination of factors describes the most likely victim. While these factors may not have a causal relationship with being attacked, the patterns are still interesting. I limited the geographic scope of the project to the USA. I explored the following relationships and distributions:
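Filtering to the USA and finding the most common value of each factor might look like the sketch below. It uses only the standard library, and the records (including their `Region` column) are hypothetical. Note that the most common value of each field on its own is not necessarily the most common combination, so the sketch also counts (sex, type, region) tuples directly.

```python
from collections import Counter

# Hypothetical sample records (already joined with a Region column); values are made up.
records = [
    {"Country": "USA", "Area": "Florida", "Region": "South", "Type": "Unprovoked", "Sex": "M", "Age": "27"},
    {"Country": "USA", "Area": "Florida", "Region": "South", "Type": "Unprovoked", "Sex": "F", "Age": "16"},
    {"Country": "USA", "Area": "Hawaii", "Region": "West", "Type": "Provoked", "Sex": "M", "Age": "27"},
    {"Country": "USA", "Area": "Florida", "Region": "South", "Type": "Unprovoked", "Sex": "M", "Age": "30"},
    {"Country": "BRAZIL", "Area": "Pernambuco", "Region": None, "Type": "Unprovoked", "Sex": "M", "Age": "20"},
]

# Restrict the geographic scope to the USA.
usa = [r for r in records if r["Country"] == "USA"]


def most_common(rows, field):
    """Return the most frequent non-empty value of `field`, or None if there is none."""
    counts = Counter(r[field] for r in rows if r.get(field))
    return counts.most_common(1)[0][0] if counts else None


# Most common value of each factor, considered one at a time.
profile = {f: most_common(usa, f) for f in ("Age", "Sex", "Type", "Region", "Area")}

# Most common joint combination of factors, counted directly.
top_combo, top_count = Counter((r["Sex"], r["Type"], r["Region"]) for r in usa).most_common(1)[0]

print(profile)
print(top_combo, top_count)
```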

  • Relationship between Gender and Region

  • Relationship between Gender and Area

  • Age Distribution

  • Relationship between Year, Age, and Order

  • Relationship between Type of Attack and Age

  • Distribution of Victim Gender
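Relationships such as gender and region, or gender and area, come down to counting how often each pair of values occurs, i.e. a contingency table (crosstab). Below is a minimal standard-library sketch with hypothetical records; skipping rows with a blank value is an assumption about how missing data should be handled.

```python
from collections import Counter, defaultdict

# Hypothetical joined USA records; values are made up for illustration.
records = [
    {"Sex": "M", "Region": "South", "Area": "Florida"},
    {"Sex": "M", "Region": "South", "Area": "Florida"},
    {"Sex": "F", "Region": "South", "Area": "Texas"},
    {"Sex": "M", "Region": "West", "Area": "Hawaii"},
    {"Sex": "F", "Region": "West", "Area": "California"},
    {"Sex": "", "Region": "West", "Area": "California"},  # blank sex: skipped
]


def crosstab(rows, row_field, col_field):
    """Count each (row_field, col_field) value pair, skipping rows where either is blank."""
    table = defaultdict(Counter)
    for r in rows:
        a, b = r.get(row_field), r.get(col_field)
        if a and b:
            table[a][b] += 1
    return table


sex_by_region = crosstab(records, "Sex", "Region")
sex_by_area = crosstab(records, "Sex", "Area")

for sex, counts in sorted(sex_by_region.items()):
    print(sex, dict(counts))
```

pandas offers the same summary as `pd.crosstab(df["Sex"], df["Region"])`, which is convenient when the counts feed straight into a chart.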

Conclusion

Overall, I concluded that more men than women were attacked, and that Florida was the state with the most shark attacks, with a median victim age of 27. This is consistent with the South region having the most shark attacks of any region. In addition, there appears to be no relationship between type of attack and age; however, unprovoked attacks were the most common type.
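A median age, and a rough check of whether age differs by type of attack, can be computed as sketched below. This is a descriptive comparison of group medians, not a significance test. The records are hypothetical, and skipping blank or non-numeric ages (rather than estimating them) is an assumption.

```python
import statistics
from collections import defaultdict

# Hypothetical records; values are made up for illustration.
records = [
    {"Type": "Unprovoked", "Age": "27"},
    {"Type": "Unprovoked", "Age": "Teen"},  # non-numeric: skipped
    {"Type": "Provoked", "Age": "35"},
    {"Type": "Unprovoked", "Age": "19"},
    {"Type": "Provoked", "Age": ""},  # blank: skipped
]


def parse_age(value):
    """Return the age as an int, or None when the field is blank or non-numeric."""
    value = value.strip()
    return int(value) if value.isdigit() else None


# Overall median of the usable ages.
ages = [a for a in (parse_age(r["Age"]) for r in records) if a is not None]
overall_median = statistics.median(ages)

# Median age within each type of attack.
by_type = defaultdict(list)
for r in records:
    age = parse_age(r["Age"])
    if age is not None:
        by_type[r["Type"]].append(age)
median_by_type = {t: statistics.median(v) for t, v in by_type.items()}

print(overall_median, median_by_type)
```

The median is less sensitive than the mean to a few very young or very old victims, which is one reason to report it for an age distribution.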

One limitation of this dataset is that it does not capture every factor that can contribute to a shark attack. In addition, the factors it does list are not always recorded: because shark attacks are unplanned, the victim is not always aware of details such as the shark species.
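One way to make this limitation concrete is to measure how often each field is missing. The sketch below is hypothetical: the records are made up, and the set of strings treated as "not recorded" is an assumption that would need to be checked against the actual data.

```python
# Hypothetical records with some blank or unknown fields; values are made up.
records = [
    {"Species": "White shark", "Time": "14h00", "Age": "27"},
    {"Species": "", "Time": "Afternoon", "Age": ""},
    {"Species": "Unknown", "Time": "", "Age": "35"},
    {"Species": "", "Time": "11h30", "Age": "19"},
]

# Assumption: besides blanks, these placeholder strings mean "not recorded".
UNKNOWN = {"", "unknown", "n/a"}


def missing_share(rows, field):
    """Fraction of rows whose `field` is blank or an unknown placeholder (0.0 for no rows)."""
    missing = sum(1 for r in rows if r.get(field, "").strip().lower() in UNKNOWN)
    return missing / len(rows) if rows else 0.0


shares = {f: missing_share(records, f) for f in ("Species", "Time", "Age")}
print(shares)
```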