Every day, 8 million Americans board an airplane, putting their lives in the hands of the professional pilots, mechanics, flight controllers, and rampers who all make safety a priority. Air travel is statistically one of the safest ways to travel: between 1995 and 2000, commercial airlines in the United States recorded 3 deaths per 10 billion passenger miles traveled ( https://en.wikipedia.org/wiki/Aviation_safety#cite_note-2 ). Compare that to the 41,945 road deaths in the US in the year 2000 ( http://www.iihs.org/iihs/topics/t/general-statistics/fatalityfacts/overview-of-fatality-facts ) and 425 fatalities for train travel ( https://oli.org/about-us/news/collisions-casulties ), and air travel starts to look really safe.
In this article we will explore all aircraft accidents recorded by the National Transportation Safety Board (NTSB) from 1962 up to the present. We will look at the number of persons injured through the lens of single variables like Purpose of Flight (for example, Air Races, Firefighting, Flight Tests, Personal use, Public use, Skydiving and more), Type of Engine (Reciprocating propeller, Turbo Jets, Turbo Fans) and Make/Model. We will explore how two variables interact when we look at things like how Phase of Flight (the point in the flight at which the accident happened, for example Landing or Takeoff) and Aircraft Category combine to affect Total Injured. And finally we will bring it all together in a summary and final presentation of 3 main plots.
This data covers 79,141 accidents recorded by the NTSB from 1962 onward. The dataset contains 31 variables, 5 of which we create in order to better look at the data. We will look strictly at the USA subset of the data because it contains the most information.
Let’s take a look at a summary of our data.
As you can see, we have the factors you would expect, like Date, City, State and Country. We also have Lat/Long pairs, which we will use both to plot our accidents and to group them by state.
There is quite a bit of information in this summary so let me highlight some of the more interesting pieces:
Some interesting things appear when we start to look at the mean number of injuries. Per crash, we observe the following statistics:
This gives an interesting factoid: if you were in an aircraft accident, according to this data, you have a 26% chance of being injured and a 19% chance of being seriously injured.
Here we plot every known accident from its lat/long pair and place it on a grid. It’s easy to see how the accidents themselves define boundary lines and paint a picture of our data.
Here we can see the distribution of states and cities in our data set. For cities, we have limited the chart to the 15 most accident-prone cities to keep it readable.
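A minimal sketch of how a "top 15" cut like this could be computed. The article's own code is not shown, so this Python example is only an illustration: the helper `top_n` and the sample city names are assumptions, not NTSB data.

```python
from collections import Counter

# Hypothetical sample: one city name per accident record (not real NTSB data).
accident_cities = ["Anchorage", "Miami", "Anchorage", "Houston", "Miami", "Anchorage"]


def top_n(values, n=15):
    """Return the n most frequent values with their counts, most frequent first."""
    return Counter(values).most_common(n)


# With the real data, n=15 would reproduce the cut described in the text.
top_cities = top_n(accident_cities, n=2)
print(top_cities)
```

Limiting the chart to the most frequent values keeps the axis readable when a factor has thousands of levels, as City does.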
Above we can see that most crashes are “Non-Fatal” and result in “Substantial” damage, but the aircraft is not destroyed.
Here we can clearly see that Cessna built the most aircraft involved in accidents. It’s important to note that this number, although high, does not mean that Cessna makes poor aircraft, only that it makes a lot of them. We don’t have the numbers for non-crash statistics, but I think you would see a high number of Cessna aircraft in those statistics as well.
We can also see that most engines are of the ‘Reciprocating’ type. This is a piston-driven engine (similar to what you find in a car) that drives a shaft that turns a propeller. Aircraft with these engines are also commonly called ‘Props’ or ‘Prop Planes’.
The Model graph shows mostly Cessna aircraft (models 152 through 172) and one Piper aircraft (the PA-28-140).
As we will see in the next section, Phase of Flight is one of our most correlated factors. Phase of Flight records when during the flight the accident occurred, such as during landing or takeoff.
The two charts above show, first, the total number of crashes per Aircraft Category, scaled with the log10() function, and second, the percentage of people injured during an accident. The higher the ratio, the more people are injured on average per crash. In the second chart, Rockets have the highest injury rate per crash at 100%, followed by Powered Parachutes.
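The two quantities behind these charts can be sketched as follows. This is an illustrative Python example, not the article's code: the records are made up, and the injury rate is assumed to mean injured people divided by all people aboard, which the text does not state explicitly.

```python
import math
from collections import defaultdict

# Hypothetical records: (aircraft_category, total_injured, total_uninjured).
records = [
    ("Airplane", 0, 2),
    ("Airplane", 1, 1),
    ("Airplane", 0, 4),
    ("Helicopter", 2, 0),
    ("Helicopter", 0, 2),
    ("Rocket", 1, 0),
]

crashes = defaultdict(int)  # number of accident records per category
injured = defaultdict(int)  # injured people per category
aboard = defaultdict(int)   # injured + uninjured people per category

for category, n_injured, n_uninjured in records:
    crashes[category] += 1
    injured[category] += n_injured
    aboard[category] += n_injured + n_uninjured

summary = {}
for category in crashes:
    summary[category] = {
        # log10 compresses the gap between very common and very rare categories
        "log10_crashes": math.log10(crashes[category]),
        # assumed definition: share of people aboard who were injured
        "injury_rate": injured[category] / aboard[category] if aboard[category] else 0.0,
    }

for category, stats in summary.items():
    print(category, stats)
```

Note that a category with a single crash, like the hypothetical Rocket row here, lands at 0 on the log10 scale and can reach a 100% injury rate from one record, so rare categories deserve a cautious reading.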
I ended up consolidating Purpose of Flight from 21 categories into 6 base categories. Many of the original categories had overlapping intents, so reducing them to 6 makes the data easier to visualize.
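A consolidation like this is usually just a lookup table. The Python sketch below shows the idea. The specific labels and groupings are hypothetical, since the article does not list which of the 21 original categories went into which base category.

```python
# Hypothetical mapping from original Purpose of Flight labels to base categories.
# The real grouping used in the article is not shown, so these are assumptions.
BASE_CATEGORY = {
    "Personal": "Personal",
    "Instructional": "Instructional",
    "Business": "Business",
    "Executive/Corporate": "Business",
    "Glider Tow": "Recreation",
    "Skydiving": "Recreation",
    "Air Race/Show": "Recreation",
    "Public Aircraft": "Government",
    "Firefighting": "Government",
}


def consolidate(purpose, default="Other"):
    """Map an original label to its base category; unmapped labels become `default`."""
    return BASE_CATEGORY.get(purpose, default)


print(consolidate("Skydiving"))     # a label covered by the mapping
print(consolidate("Flight Test"))   # an unmapped label falls back to "Other"
```

Keeping the mapping in one table makes the grouping easy to audit and change, and the fallback category guarantees that no record is dropped.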
Our data is structured in ~80K rows and 36 columns. Our descriptive factors are what we will use to try to explain the total number of injuries. Can we create a model to explain the variance in number of injuries using the variables Aircraft Category, Engine Type, Make, Model, Purpose of Flight and Phase of Flight? Let’s find out!
In this section, we will look at how two or more variables can combine to affect our total number of injuries. Let’s start by looking at total number of plane crashes by state. State is a variable we created for our data based on its Lat/Long pair.
As we can see, California has the most accidents, followed by Texas and then Florida. These are three of the most populous states, so those numbers line up with what we would expect based on the number of people living there.
I’ve also color-coded these by the most common source of accidents, as seen through the Purpose of Flight variable.
After removing the overwhelming ‘Personal’ category, which accounted for 62% of all accidents, we notice that in most states the leading source of accidents is Instructional flights. Second is ‘Aerial Application’, more commonly referred to as ‘Crop Dusting’: the act of spreading pesticides on crops from an airplane. Then we have one state, Vermont, where most of the accidents come from Glider Tow, in which a powered plane pulls a glider (a plane with wings but no engine). And one state, Wyoming, had the most aviation accidents from Business travel.
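The "leading purpose per state, with Personal removed" logic can be sketched in Python as below. The function name `leading_purpose_by_state` and the sample rows are illustrative assumptions, not the article's code or data.

```python
from collections import Counter, defaultdict

# Hypothetical (state, purpose_of_flight) pairs, one per accident record.
accidents = [
    ("VT", "Personal"), ("VT", "Personal"), ("VT", "Glider Tow"),
    ("WY", "Personal"), ("WY", "Business"), ("WY", "Business"), ("WY", "Instructional"),
    ("TX", "Personal"), ("TX", "Instructional"), ("TX", "Instructional"),
    ("TX", "Aerial Application"),
]


def leading_purpose_by_state(rows, exclude=("Personal",)):
    """Return the most frequent purpose per state, ignoring excluded purposes."""
    counts = defaultdict(Counter)
    for state, purpose in rows:
        if purpose not in exclude:
            counts[state][purpose] += 1
    return {state: c.most_common(1)[0][0] for state, c in counts.items()}


leaders = leading_purpose_by_state(accidents)
unfiltered = leading_purpose_by_state(accidents, exclude=())
print(leaders)
print(unfiltered)
```

Filtering before counting is what lets the second-tier categories show through. Without the exclusion, a dominant category such as Personal would lead in nearly every state.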
For completeness, let’s look at the unaltered state graph.
This multivariate chart shows Total Injured, Total Crashes, and Total Fatalities through the lens of Phase of Flight. We notice right away that each grouping has a different leader: the most injuries occur during the TAKEOFF phase, the most crashes during the LANDING phase, and the most fatalities during the MANEUVERING phase.
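A comparison like this comes from grouping records by Phase of Flight and taking three totals per group. The Python sketch below uses made-up records, chosen only so that each metric has a different leader. It is not the article's code or the NTSB figures.

```python
from collections import defaultdict

# Hypothetical records: (phase_of_flight, total_injured, total_fatal).
records = [
    ("TAKEOFF", 5, 0),
    ("TAKEOFF", 4, 1),
    ("LANDING", 1, 0),
    ("LANDING", 0, 0),
    ("LANDING", 2, 0),
    ("MANEUVERING", 1, 3),
    ("MANEUVERING", 0, 2),
]

# One running total per metric for each phase.
totals = defaultdict(lambda: {"crashes": 0, "injured": 0, "fatal": 0})
for phase, n_injured, n_fatal in records:
    totals[phase]["crashes"] += 1
    totals[phase]["injured"] += n_injured
    totals[phase]["fatal"] += n_fatal

# For each metric, find the phase with the largest total.
leaders = {
    metric: max(totals, key=lambda phase: totals[phase][metric])
    for metric in ("crashes", "injured", "fatal")
}
print(leaders)
```

Looking at the three totals side by side matters because a phase with many crashes is not necessarily the phase where crashes do the most harm.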
This map gives us a good feel for the total number of injuries by state. We have darkened each state according to the total number of injuries that occurred there.
With this plot, we can clearly see that Personal accidents (in green) account for most of our crashes. We have adjusted our total number of injured with a log function in order to better visualize it, and we are using that variable to control bubble size. We notice an unusual string of red in the central United States, representing accidents involving government flights.
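One way to turn an injury count into a bubble size is sketched below in Python. The article only says a log function was used, so the exact form here, `log10(1 + n)` with a hypothetical `scale` and `minimum`, is an assumption. Adding 1 keeps accidents with zero injuries defined, since the logarithm of 0 is not.

```python
import math


def bubble_size(total_injured, scale=4.0, minimum=1.0):
    """Map an injury count to a marker size that grows logarithmically.

    `scale` and `minimum` are hypothetical styling parameters.
    """
    return minimum + scale * math.log10(1 + total_injured)


# Going from 0 to 9 injuries adds as much size as going from 9 to 99.
sizes = [bubble_size(n) for n in (0, 9, 99)]
print(sizes)
```

The log scale keeps a handful of very large accidents from dwarfing every other point on the map.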
In this plot we have removed the overwhelming Personal accidents to get a better feel for what other types of accidents occur most often and in what geographical region. We notice a high concentration of Recreation injuries in the central part of the country, with very few on the East Coast and a medium amount on the West Coast.
Here we have a map of the 15 most dangerous crashes in NTSB history. We have used the Total Injury count to increase the size of the points on the map, with San Marcos being the location of the one crash with the highest number of injuries.
Now let’s take a look at a “Pairs Panel” from the psych package.