Exploring US crime data (Part II)

Continuing the exploration of US crime data from my previous post, I am now going to look at murder rates at the city level instead of the state level.

Let's start with cities with a population of 500,000+ people. The data is publicly available from 1985 to 2014 at this FBI site. However, that original data set only had the city names, but not their longitude-latitude coordinates, which is what I need for plotting cities with Plotly. Because there are only about 35 cities in my dataset, I just searched online and downloaded the information about the city coordinates and added it manually to my original file, as two extra columns. The final file is hosted here.
This is what the data looks like:



The format is a little different from the state-level data that we used in the previous post. Now we have one row per city, and for each city we have the name and coordinates, as well as the murder rate for each year between 1985 and 2014.
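If you prefer one record per city and year (handy for filtering by year later on), the wide table can be unpivoted with a few lines of Python. This is only a minimal sketch: the column names `city`, `lat` and `lon`, and the idea that each year column is named with its four digits, are my assumptions about the file layout, and the two sample rows are made-up placeholders, not real FBI figures.

```python
import csv
import io

# Made-up placeholder rows in the assumed layout: one row per city,
# its coordinates, then one murder-rate column per year.
SAMPLE = """city,lat,lon,1985,1986
City A,40.0,-100.0,5.0,6.0
City B,35.0,-90.0,2.5,3.5
"""

def wide_to_long(text):
    """Turn one-row-per-city data into one record per (city, year)."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        for column, value in row.items():
            if column.isdigit():  # assumed: year columns are "1985", "1986", ...
                records.append({
                    "city": row["city"],
                    "lat": float(row["lat"]),
                    "lon": float(row["lon"]),
                    "year": int(column),
                    "rate": float(value),
                })
    return records

long_rows = wide_to_long(SAMPLE)
```

With the real file you would pass the file contents instead of `SAMPLE`.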

Well, full disclosure: the original FBI data didn't have the latitude-longitude coordinates, just the city and state names. However, it is easy to add that information automatically using Python and a free API. Go to this other post of mine if you want more details.
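To give an idea of what such a lookup can look like, here is a hedged sketch using only the standard library. The choice of service is an assumption on my part for this example: I use OpenStreetMap's Nominatim search endpoint as a stand-in for "a free API", and the helper names (`build_query_url`, `first_coordinates`, `geocode`) are hypothetical. Check the service's usage policy before running it over a whole list of cities.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Nominatim stands in for the unnamed free geocoding API.
NOMINATIM_URL = "https://nominatim.openstreetmap.org/search"

def build_query_url(city, state):
    """Build a search URL for 'city, state', asking for a single JSON hit."""
    params = {"q": f"{city}, {state}, USA", "format": "json", "limit": 1}
    return NOMINATIM_URL + "?" + urllib.parse.urlencode(params)

def first_coordinates(hits):
    """Extract (lat, lon) as floats from a decoded response, or None if empty."""
    if not hits:
        return None
    return float(hits[0]["lat"]), float(hits[0]["lon"])

def geocode(city, state, user_agent="crime-data-blog-example"):
    """Query the service over the network and return (lat, lon) or None."""
    request = urllib.request.Request(
        build_query_url(city, state), headers={"User-Agent": user_agent}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return first_coordinates(json.load(response))
```

Calling `geocode("Detroit", "Michigan")` would perform one request; the two new columns are then just the two elements of the returned tuple.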


Let's look, for example, at 1985:





Note: I've chosen to make the area of each circle represent the murder rate for that city and that year, instead of the city's total population for that year, because I think it helps to understand the patterns better. In other words, after normalizing by city population, larger circles represent a higher number of murders per 100,000 people. We observe that Detroit is the most dangerous city in 1985 (in any year of my data set, really), followed by Dallas and Fort Worth, LA and the East Coast cities.
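For reference, this is roughly how a single-year bubble map with area-proportional markers can be described for Plotly. It is a sketch, not the exact code behind the figure: the function name `bubble_map` and the 40-pixel maximum marker size are my own choices for this example, and the inputs at the bottom are placeholders rather than real rates. The figure is built as a plain dictionary, which `plotly.graph_objects.Figure` accepts.

```python
def bubble_map(cities, lats, lons, rates, year, max_marker_px=40):
    """Return a Plotly figure description with one bubble per city."""
    # Plotly's suggested scaling when sizemode="area":
    # sizeref = 2 * max(sizes) / (largest marker size in px) ** 2
    sizeref = 2.0 * max(rates) / max_marker_px ** 2
    trace = {
        "type": "scattergeo",
        "lat": lats,
        "lon": lons,
        "mode": "markers",
        "text": [f"{c}: {r} murders per 100,000" for c, r in zip(cities, rates)],
        # sizemode="area" makes the circle AREA, not the diameter, track the rate
        "marker": {"size": rates, "sizemode": "area", "sizeref": sizeref},
    }
    layout = {
        "title": {"text": f"Murder rate by city, {year}"},
        "geo": {"scope": "usa"},
    }
    return {"data": [trace], "layout": layout}

# Placeholder inputs, not real figures.
fig = bubble_map(["City A", "City B"], [40.0, 35.0], [-100.0, -90.0],
                 [10.0, 2.5], 1985)
# To render: import plotly.graph_objects as go; go.Figure(fig).show()
```

Note that `sizeref` here depends on the largest rate of this one year only, so circle sizes from two separately drawn years are not directly comparable.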

I wonder how stable this ranking is over the years. To address this question, and just as a visual exploration, I'll add a slider so that I can plot the time evolution of all cities. I'll keep the area of each circle as the variable that encodes a city's murder rate, but to also convey the relative ranking of cities in a given year, I'll add color: I divide my 35 cities into four groups, ranked from the top group (the most dangerous cities that year, in the darkest shade of red), through the second-ranked group (the next most dangerous ones, in a lighter shade), and so on. In addition, to make the comparison fair across years, I need to make sure that a circle of a given size represents exactly the same murder rate in every year. That is to say, I normalize all values by the all-time highest murder rate (this is not something that Plotly does automatically, so if I did not normalize, the areas of the circles wouldn't be comparable across different years).
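The three ingredients (global normalization, four ranked color groups, and a slider over years) can be sketched as follows. Treat this as one possible way to do it under my own assumptions: the helper names, the four hex colors, the 40-pixel maximum marker size, and the choice of one trace per year toggled through a `restyle` slider step are all illustrative, and the numbers at the bottom are made-up placeholders.

```python
REDS = ["#67000d", "#cb181d", "#fb6a4a", "#fcbba1"]  # darkest = top group

def rank_groups(rates, n_groups=4):
    """Assign each city a group 0..n_groups-1; group 0 has the highest rates."""
    order = sorted(range(len(rates)), key=lambda i: rates[i], reverse=True)
    groups = [0] * len(rates)
    for position, index in enumerate(order):
        groups[index] = position * n_groups // len(rates)
    return groups

def normalize(rates_by_year):
    """Divide every rate by the all-time maximum over all cities and years."""
    all_time_max = max(max(rates) for rates in rates_by_year.values())
    return {year: [r / all_time_max for r in rates]
            for year, rates in rates_by_year.items()}

def slider_figure(lats, lons, rates_by_year, max_marker_px=40):
    """Build a figure description: one trace per year, shown via a slider."""
    years = sorted(rates_by_year)
    scaled = normalize(rates_by_year)
    # Sizes now lie in (0, 1], so one constant sizeref serves every year.
    sizeref = 2.0 / max_marker_px ** 2
    traces, steps = [], []
    for k, year in enumerate(years):
        traces.append({
            "type": "scattergeo", "lat": lats, "lon": lons, "mode": "markers",
            "visible": k == 0,  # only the first year is shown initially
            "marker": {
                "size": scaled[year], "sizemode": "area", "sizeref": sizeref,
                "color": [REDS[g] for g in rank_groups(rates_by_year[year])],
            },
        })
        steps.append({
            "label": str(year), "method": "restyle",
            "args": ["visible", [j == k for j in range(len(years))]],
        })
    layout = {"geo": {"scope": "usa"},
              "sliders": [{"active": 0, "steps": steps}]}
    return {"data": traces, "layout": layout}

# Made-up placeholder rates for four cities and two years.
fig = slider_figure([40.0, 35.0, 30.0, 45.0], [-100.0, -90.0, -95.0, -85.0],
                    {1985: [5.0, 1.0, 9.0, 3.0], 1986: [4.0, 2.0, 10.0, 1.0]})
# To render: import plotly.graph_objects as go; go.Figure(fig).show()
```

Because the ranking is recomputed inside each year while the sizes share one scale, color tells you where a city stands relative to the others that year, and area tells you how its rate compares with any other city in any other year.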


This is what the result looks like:







By sliding back and forth we quickly learn that, while Detroit always has one of the highest murder rates, cities like LA and NYC drop significantly in the ranking in the 1990s.
