Accident Trend Over the Year

data_day  = 
  data_day %>% 
  pivot_longer(
    `accident number`:`people killed number`,
    names_to   = "type",
    values_to = "number"
  )

plot_day = 
  data_day %>% 
  ggplot(aes(x = date,y = number, color = type))+
  geom_line()+
  labs(
   title = "Trend Over the Year",
   x = "Day of the Year",
   y = "Number") +
  annotate("text", x = as.Date("2018-11-15") , y = 695, label = "2018-11-15") +
  theme( legend.title = element_blank() )

ggplotly(plot_day) %>% 
    layout(legend = list(orientation = "h",   # show entries horizontally
                     x = 0.2, y = -0.2))  

Comments:

There is a steady trend over the year of 2018. Except that an outlier for accident number is observed around November. Aftering some additional exploration, we found that it’s 15 Nov 2018. We looked into this date and discovered that it is the date of a severe winter storm in New York, which can explain the sudden increase in accident number.

A video of the winter storm that day can be found at windstorm



Porportion of accidents by hour

hour_events_plot =
hour_events %>% 
  ggplot(aes(x = hour, y = proportion, color = type)) +
  geom_line() +
  annotate("rect", xmin = 8, xmax = 20, ymin = 0, ymax = Inf, fill = "blue", alpha = .1) +
  labs(
    title = "Proportion of Accident and Injured People by Hour", 
    x =  'Hour',
    y = 'Proportion') +
  theme( legend.title = element_blank() )

hour_events_plotly = ggplotly(hour_events_plot)
hour_events_plotly[['x']][['layout']][['shapes']] <- c()

hour_events_plotly %>%  
  layout( shapes = list(type = "rect", fillcolor = "pink", line = list(color = "pink"),
                       opacity = 0.3, 
                       x0 = 8, x1 = 20, xref = "x", y0 = 0, y1 = 0.08, yref = "y"),
          legend = list(orientation = "h",   # show entries horizontally
                     x = 0.3, y = -0.2))
#

Comments:

More car accidents happens from 8am to 8 pm, when people are on their way to work or getting back home. Although less accidents are around 7 am and 9 pm, those are the most fatal ones (because the killed proportion surpass the accident proportion line). We think this might be caused by fatigue driving, because it’s early in the morning and at night.


Proportion of Accident by Borough

boro_events = 
  tidy_data %>% 
  drop_na(borough, persons_injured, persons_killed) %>%
  group_by(borough) %>%
  summarize(accident = n(),
            injured = sum(persons_injured),
            killed = sum(persons_killed)) %>%
  mutate(prop_accident = round(accident/sum(accident),4),
         prop_injured = round(injured/sum(injured),4),
         prop_killed = round(killed/sum(killed),4)) %>%
  pivot_longer(
    prop_accident:prop_killed,
    values_to = "proportion",
    names_prefix = "prop_",
    names_to = "type",
  ) 

boro_events_plot = 
  boro_events %>% 
  ggplot(aes(x = borough, y = proportion, fill = type)) +
  geom_bar(stat = "identity",position = "dodge") +
  labs(title ="Proportion of Accident and Injured People by Borough", 
       x = 'Borough',
       y = 'Proportion'
  ) +
  theme( legend.title = element_blank() )

ggplotly(boro_events_plot)%>% 
    layout(legend = list(orientation = "h",   # show entries horizontally
                     x = 0.3, y = -0.2))  

Comments:

There is a positive correlation between number of accidents and number killed/injured adjusting for boroughs. In addition, accidents happened in Queens are the most fatal among all boroughs. There are more accidents happen in Queens and Brooklyn, while fewer accidents happen in Staten Island.


Top 10 Causes of Accident

cause_by_boro_plot =
  cause_by_boro %>% 
  ggplot(aes(x = cause, y = ratio, fill = borough)) +
      geom_bar(width = 0.5, stat = 'identity', show.legend = F) +
      coord_flip() +
      facet_grid(.~borough) +
      labs(title = 'Accident Causes by Borough',
           x = NULL,
           y = NULL) + 
  theme(axis.text.x = element_text(size = 6, angle = 45),
        legend.title = element_blank())

ggplotly(cause_by_boro_plot)  %>% 
  layout(legend = list(orientation = "h",   # show entries horizontally
                     x = 0.1, y = -0.1))  

Comments:

Try not to follow too closely while driving, since it’s the most common causal factor for collisions overall. When driving in Manhattan, be especially careful when passing or changing lanes as those factors contributes to around 30% of all accidents in Manhattanm and is a lot higher compare to other boroughs.


Collisions of Day for Different Vehicles

vehicle_type_plot = 
vehicle_type_data %>% 
  ggplot(aes(x = hour, y = number, color = vehicle_type)) +
  geom_line() +
  labs(
    color = "vehicle type",
    title = "Collisions of Day for Different Vehicles",
    x = "Hour of Day",
    y = "Number of Collisions"
    ) +
  theme(legend.title = element_blank())

ggplotly(vehicle_type_plot) %>% 
    layout(legend = list(orientation = "h",   # show entries horizontally
                     x = 0.4, y = -0.2))  

Comments:

Four most common vehicle types were chosen to be tested. Sport utility vehicle has the highest collisions number per day, followed by passenger vehicle. Two peaks of collisions are around eight o’clock in the morning and five o’clock in the afternoon.