The Squirrel Census (Portfolio Project)
When I was pursuing my data analytics certificate, I was given The Squirrel Census as an example of a dataset that could be used for a portfolio project. I recently moved to the United States and started seeing squirrels everywhere. I live in Texas rather than New York, but I love seeing them because we do not have them where I used to live. I decided to use this dataset to help build my portfolio and practice a little R, and also just for the joy of it. I really enjoyed reading the notes in the files and plan on calling their phone number one of these days.
If you go to their website, they explain how the data was collected and what its purpose was:
The Squirrel Census is a multimedia science, design, and storytelling project focusing on the Eastern gray (Sciurus carolinensis). They count squirrels and present their findings to the public.
On March 1, 2020 — with the help of 72 volunteer Squirrel Sighters, as well as NYC Open Data — they performed a sample count in 24 New York City parks, and gathered other material data. Four hundred and thirty-three squirrel sightings were tallied. The methodology was less focused on total squirrel numbers per hectare and more attuned to the stories — of squirrels, humans, and parks.
I will be using their datasets for my personal project, and I will focus on three factors as a guide for my analysis:
- The squirrel sighting count by fur color.
- The squirrel sighting count by Park Area.
- The squirrel sighting count by Park Name.
If you read their user guide, you will find more detailed explanations of the datasets.
I decided to use R to explore and analyze the datasets and draw conclusions.
I downloaded the datasets from the website and started my analysis, which I explain step by step below.
1 — Install needed packages and open them.
# first we install the packages we might need for the analysis
# (tidyverse already includes dplyr, readr, and ggplot2, so those three installs are optional)
install.packages("tidyverse")
install.packages("janitor")
install.packages("dplyr")
install.packages("readr")
install.packages("ggplot2")
# then we load them
library(tidyverse)
library(janitor)
library(dplyr)
library(readr)
library(ggplot2)
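Running `install.packages()` every time the script runs re-downloads packages that are already there. The sketch below shows one possible way to install only what is missing. It is not part of the original analysis: the helper name `packages_to_install` is hypothetical, and the actual install call is left commented out so nothing is downloaded by accident.

```r
# Hypothetical helper: return the names in `needed` that are not in `installed`.
packages_to_install <- function(needed,
                                installed = rownames(installed.packages())) {
  setdiff(needed, installed)
}

# tidyverse bundles dplyr, readr, and ggplot2, so two names cover this project.
needed <- c("tidyverse", "janitor")
to_install <- packages_to_install(needed)

# Uncomment to install whatever is missing:
# if (length(to_install) > 0) install.packages(to_install)
print(to_install)
```

If `to_install` comes back empty, everything is already installed and only the `library()` calls are needed.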
2 — Import the datasets.
park_data <- read_csv("park-data.csv")
squirrel_data <- read_csv("squirrel-data.csv")
stories <- read_csv("stories.csv")
3 — Check column names for uniformity.
colnames(squirrel_data)
colnames(park_data)
colnames(stories)
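To make the uniformity check a bit more concrete, here is a minimal base R sketch that lists the column names two tables share and converts names with spaces into snake_case. The column vectors are toy stand-ins that I am assuming for illustration, not a verified copy of the real files, and `to_snake` is a hypothetical helper. The janitor package loaded in step 1 offers `clean_names()` for a more thorough version of the same idea.

```r
# Toy stand-ins for colnames(park_data) and colnames(squirrel_data) (assumed).
park_cols <- c("Area ID", "Park Name", "Park ID", "Number of Squirrels")
squirrel_cols <- c("Area ID", "Park Name", "Park ID", "Primary Fur Color")

# Columns present in both tables; these are the ones merge() would join on.
shared_cols <- intersect(park_cols, squirrel_cols)
print(shared_cols)

# Hypothetical helper: lower-case a name and replace runs of
# non-alphanumeric characters with a single underscore.
to_snake <- function(x) {
  x <- tolower(x)
  x <- gsub("[^a-z0-9]+", "_", x)
  gsub("^_+|_+$", "", x)  # trim leading and trailing underscores
}

print(to_snake(park_cols))
```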
4 — Merge datasets and look for inconsistencies.
# first I merge the park data and the squirrel data
# then I merge the new data frame "df" with the stories into a second data frame "dff"
df <- merge(park_data, squirrel_data)
dff <- merge(df, stories)
# now I look for inconsistencies in the new data frame (dff)
colnames(dff)
str(dff)
View(dff)
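When `merge()` is called without a `by` argument, it joins on every column name the two tables share, so it can be worth naming the key explicitly and checking the row count afterwards. The sketch below does this on two toy data frames; the park names and values are made up for illustration, and the assumption that `Park ID` is the right key should be checked against the real files.

```r
# Toy park table: one row per park (values are illustrative only).
park_toy <- data.frame(
  `Park ID` = c(1, 2),
  `Park Name` = c("Park One", "Park Two"),
  check.names = FALSE  # keep the spaces in the column names
)

# Toy squirrel table: one row per sighting.
squirrel_toy <- data.frame(
  `Park ID` = c(1, 1, 2),
  `Primary Fur Color` = c("Gray", "Black", "Gray"),
  check.names = FALSE
)

# Join on an explicit key instead of relying on all shared column names.
joined <- merge(park_toy, squirrel_toy, by = "Park ID")

# Each sighting should match exactly one park, so the row count
# should equal the number of sightings.
stopifnot(nrow(joined) == nrow(squirrel_toy))
print(joined)
```

If the row count grows after a merge, one of the tables has more than one row per key, and every later count or mean will be affected by the duplicated rows.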
5 — Do calculations to gain insight about the data
# average of the Number of Squirrels column
mean(dff$`Number of Squirrels`)
# average number of sighters (the user guide says the maximum is 6)
mean(dff$`Number of Sighters`)
# maximum time spent looking for squirrels
max(dff$`Total Time (in minutes, if available)`)
# minimum time spent looking for squirrels
min(dff$`Total Time (in minutes, if available)`)
# maximum height above ground, in feet, at which a squirrel was seen
max(dff$`Above Ground (Height in Feet)`)
These calculations give us a few numbers from the data frame that might interest the reader:
- The mean number of squirrels seen is 33.44.
- The mean number of sighters is 3.41.
- The max time spent looking for squirrels was 80 minutes.
- The min time spent looking for squirrels was 23 minutes.
- The max height where a squirrel was seen was approximately 75 feet.
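The column name `Total Time (in minutes, if available)` suggests that some values may be missing. Assuming that is the case (I have not confirmed it for every column), `max()`, `min()`, and `mean()` return `NA` unless missing values are dropped with `na.rm = TRUE`. A minimal sketch on toy values:

```r
# Toy vector standing in for the total-time column, with one missing value.
total_time <- c(30, NA, 23, 80, 40)

# Without na.rm, a single NA makes the result NA.
print(max(total_time))

# With na.rm = TRUE, missing values are dropped before the calculation.
longest <- max(total_time, na.rm = TRUE)
shortest <- min(total_time, na.rm = TRUE)
n_missing <- sum(is.na(total_time))  # how many values were dropped

print(c(longest = longest, shortest = shortest, missing = n_missing))
```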
6 — Now we summarize the data
summary(dff$`Number of Squirrels`)
summary(dff$`Number of Sighters`)
summary(dff$`Total Time (in minutes, if available)`)
The summary function returns the following:
summary(dff$`Number of Squirrels`)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   1.00   16.00   34.00   33.44   51.00   59.00
summary(dff$`Number of Sighters`)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   1.00    2.00    2.00    3.11    4.00    6.00
summary(dff$`Total Time (in minutes, if available)`)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  23.00   30.00   30.00   37.66   40.00   80.00
If we compare the summary function's results with the mean, min, and max we calculated before, we see some slight differences in the numbers. For example, the mean number of sighters is 3.41 in step 5 and 3.11 here.
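One thing to keep in mind when reading these means: assuming `Number of Squirrels` is a park-level total that is repeated on every sighting row after the merge (an assumption worth checking against the user guide), a mean over all rows weights busy parks more heavily than a mean over parks. The toy data below is made up to show the difference.

```r
# Toy merged table: park 1 has three sighting rows, park 2 has one.
merged_toy <- data.frame(
  `Park ID` = c(1, 1, 1, 2),
  `Number of Squirrels` = c(3, 3, 3, 1),
  check.names = FALSE
)

# Mean over every row: park 1 is counted three times.
mean_over_rows <- mean(merged_toy$`Number of Squirrels`)

# Mean over parks: keep one row per park first.
per_park <- unique(merged_toy[, c("Park ID", "Number of Squirrels")])
mean_over_parks <- mean(per_park$`Number of Squirrels`)

print(c(rows = mean_over_rows, parks = mean_over_parks))
```

Both numbers are valid, but they answer different questions: the first describes the park a typical sighted squirrel was in, the second describes a typical park.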
7 — Filter the data to find relations between fur color, interactions with humans, and the area ID where the squirrels were seen, as well as the most common squirrel colors.
# count rows for each combination of fur color, human interaction, park, and area
most_common <- dff %>%
  group_by(`Primary Fur Color`, `Interactions with Humans`, `Number of Squirrels`, `Park ID`, `Area ID`, `Above Ground (Height in Feet)`, `Park Conditions`) %>%
  count() %>%
  slice_max(n, with_ties = FALSE)
# most common squirrel colors, counted within each park
sq_color <- dff %>%
  group_by(`Primary Fur Color`, `Number of Squirrels`, `Park ID`, `Area ID`, `Park Name`) %>%
  count() %>%
  slice_max(n, with_ties = FALSE)
View(most_common)
View(sq_color)
This chunk of code lets us sift through all the information. I classify my conclusions into four parts.
Highest number of squirrel sightings in ONE location.
We found that the highest number of squirrels seen in ONE location was 59, at Washington Square Park (Park ID = 10, Area ID = B), located in Central Manhattan. The most common fur color among these squirrels was gray.
Most common squirrel colors.
The most common primary fur colors among all squirrels observed were gray (59), black (59 and 51, in different parks), and cinnamon (44).
The Highest Squirrel (height in feet above ground)
The highest squirrel, according to the sighter's approximation, was about 75 ft above ground in McCarren Park (Area D, Brooklyn), and its primary fur color was gray.
Smallest number of squirrel sightings in ONE location.
The smallest number of squirrel sightings in one location was 1 squirrel, in Teardrop Park (Area C, Lower Manhattan), and its primary fur color was gray.
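As a quick cross-check on the fur color tally from step 7, base R's `table()` counts how often each value appears, and `sort()` puts the most common one first. This is a minimal sketch on a made-up vector, not the real column. With the real data, the input would be the `Primary Fur Color` column of the merged data frame.

```r
# Toy vector standing in for the Primary Fur Color column.
fur_toy <- c("Gray", "Gray", "Black", "Cinnamon", "Gray", "Black")

# Count each color, then sort so the most common color comes first.
fur_counts <- sort(table(fur_toy), decreasing = TRUE)
print(fur_counts)

# Name of the most common color in the toy data.
top_color <- names(fur_counts)[1]
print(top_color)
```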
8 — Use ggplot2 to make some visualizations about our data.
# ggplot2 was already loaded in step 1; loading it again is harmless
library(ggplot2)
# Squirrel Count by Fur Color
ggplot(dff, aes(x = `Primary Fur Color`, y = `Number of Squirrels`, fill = `Primary Fur Color`)) +
  stat_summary(fun = "length", geom = "bar") +
  labs(x = "Primary Fur Color", y = "Squirrel Count", title = "Squirrel Count by Fur Color")
As shown in this visualization, the most common primary fur color for the squirrels seen in the NY parks is gray, the second most common is black, and the third is cinnamon.
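It helps to know what the bar heights represent. With `fun = "length"`, `stat_summary()` applies `length()` to the y values in each group, so each bar shows the number of rows for that fur color rather than the sum of `Number of Squirrels`. The base R sketch below reproduces both quantities on made-up rows to show the difference.

```r
# Toy rows: two gray sightings and one black sighting (values illustrative).
plot_toy <- data.frame(
  `Primary Fur Color` = c("Gray", "Gray", "Black"),
  `Number of Squirrels` = c(59, 59, 51),
  check.names = FALSE
)

# What fun = "length" computes: the number of rows per fur color.
rows_per_color <- tapply(plot_toy$`Number of Squirrels`,
                         plot_toy$`Primary Fur Color`, length)

# What a sum would compute instead: the total of the y column per fur color.
sum_per_color <- tapply(plot_toy$`Number of Squirrels`,
                        plot_toy$`Primary Fur Color`, sum)

print(rows_per_color)
print(sum_per_color)
```

Counting rows is the right choice here as long as each row of the merged data frame is one squirrel sighting.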
# Squirrel sighting count by Park Area
ggplot(dff, aes(x = `Area ID`, y = `Number of Squirrels`, fill = `Area ID`)) +
  stat_summary(fun = "length", geom = "bar") +
  labs(x = "Park Area", y = "Number of Squirrels", title = "Squirrel Count by Area ID")
# Squirrel sighting count by Park Name
ggplot(dff, aes(x = `Park ID`, y = `Number of Squirrels`, fill = `Park Name`)) +
  stat_summary(fun = "length", geom = "bar") +
  labs(x = "Park Names", y = "Number of Squirrels", title = "Squirrel Count by Park Name")
The Park Area with the most squirrels seen was Area B (Central Manhattan), the second most populated with squirrels was Area A (Upper Manhattan), the third was Area D (Brooklyn), and the least populated was Area C (Lower Manhattan).
The highest number of squirrel sightings occurred at Tompkins Square Park, Union Square Park, and McCarren Park, respectively.
Finally, using Tableau, I made a geographical visualization that shows the areas where the squirrels were seen and their interactions with humans. You can explore it further using this link.
To conclude, this project gave me an opportunity to practice my skills and my use of R. It also taught me that I find squirrels a lot more than I thought, and I gained insights into their behavior and the dynamics between them and humans in the human environment. It also gave me the idea of collecting information about my daily interactions with nature while I walk my dog.
The Squirrel Census. (n.d.). NYC Open Data Week multi-park squirrel count. Retrieved July 12, 2023, from https://www.thesquirrelcensus.com/data