Avatar of Sarbhanu Baidya

Tableau: Four years of Blue Bikes

MIT License

Dashboard:

View with Tableau Public

Blue Bikes (formerly Hubway) is owned by the municipalities of Boston, Brookline, Cambridge and Somerville, and operated by Motivate International, Inc. in Boston, MA.

Snippets:

Stations page-1

Trips page-2

Trends page-3

Groups page-4

Seasonal page-5

Weather page-6

Data Source:

The datasets used in this project are provided by Blue Bikes (Motivate International, Inc.).

They include:

  • A comprehensive set of trip histories, updated every quarter.
  • Real-time system data, published in the open General Bikeshare Feed Specification (GBFS) format, a format recommended by the North American Bike Share Association (NABSA).

Home: link
Dataset Bucket: link

Focus of the analysis:

  • Station details by location, including the most and least used stations.
  • Heatmap of trips for the last four years.
  • Trip details: what does the usual commute look like by duration and by subscription type?
  • Trips taken by male and female Bluebikers.
  • What are the peak hours and peak weekdays? What are the most frequently travelled routes each year?
  • Age groups, and subscription status by age group.
  • Trips taken at late hours, by age.
  • Trends in trips taken: what were the busiest months? An estimate of total trips for the coming year.
  • Weather conditions and other factors that might influence Bluebikers.

Preprocessing:

Below are the steps performed with Pandas and NumPy in this notebook file to convert the collected data into a manageable format that can be used for visualization.

Analyzing available features

Motivate International, Inc. publishes its data each month, so a four-year time series from January 2019 to December 2022 comprises 48 separate files. [As of 01/09/2023, Dec-22 is the latest data published.]

  • Merging the data into separate dataframes by year (2019, 2020, 2021, 2022).
  • Extracting the features from each set. Note: not all years contain the same features, and some features differ in data type between years, so processing each year separately was the best way to preserve the features that matter.
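
The per-year merge described above could be sketched as follows. This is a minimal illustration, not the notebook's actual code: the helper name `merge_year` and the two in-memory "files" are assumptions made so the example runs on its own.

```python
import io

import pandas as pd


def merge_year(monthly_sources):
    """Concatenate one year's monthly trip files into a single DataFrame.

    `monthly_sources` is any iterable of paths or file-like objects
    accepted by pandas.read_csv (hypothetical helper, not from the notebook).
    """
    frames = [pd.read_csv(src) for src in monthly_sources]
    return pd.concat(frames, ignore_index=True)


# Tiny in-memory stand-ins for two monthly CSV files (illustrative rows only).
jan = io.StringIO("tripduration,starttime,bikeid\n350,2019-01-01 00:09:13,3689\n")
feb = io.StringIO("tripduration,starttime,bikeid\n420,2019-02-01 08:15:00,1234\n")

trips_2019 = merge_year([jan, feb])
print(trips_2019.shape)  # (2, 3)
```

With the real data, the twelve monthly file paths for a year would be passed in place of the in-memory buffers, and the call repeated once per year so that each year keeps its own columns and dtypes.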

Available features:

| Name | Type | Description | Availability (year) |
|---|---|---|---|
| tripduration | int | Duration in seconds | 19, 20, 21, 22 |
| starttime | string | Date Time in YYYY-MM-DD HH:MM:SS | 19, 20, 21, 22 |
| stoptime | string | Date Time in YYYY-MM-DD HH:MM:SS | 19, 20, 21, 22 |
| start station id | int | Start station terminal ID | 19, 20, 21, 22 |
| start station name | string | Start station name | 19, 20, 21, 22 |
| start station latitude | float | Latitude | 19, 20, 21, 22 |
| start station longitude | float | Longitude | 19, 20, 21, 22 |
| end station id | int | End station terminal ID | 19, 20, 21, 22 |
| end station name | string | End station name | 19, 20, 21, 22 |
| end station latitude | float | Latitude | 19, 20, 21, 22 |
| end station longitude | float | Longitude | 19, 20, 21, 22 |
| bikeid | int | Bike Number | 19, 20, 21, 22 |
| usertype | int | Subscriber or Casual | 19, 20 |
| birth year | int | Year of Birth | 19, 20 |
| gender | int | Male or Female | 19, 20 |
| postal code | int | Incomplete information | 20, 21, 22 |

Cleaning data:

| Year | Shape |
|---|---|
| 2019 | (2522771, 15) |
| 2020 | (2073448, 16) |
| 2021 | (2934378, 14) |
| 2022 | (3757281, 14) |
| Total | (11287878, 15) |

Data after Cleaning

Trip Data: NB: the files are not included in the repository. The source data files can be downloaded here: link

| Name | Type | Description |
|---|---|---|
| tripduration | int | Duration in seconds |
| starttime | string | Date Time in YYYY-MM-DD HH:MM:SS |
| stoptime | string | Date Time in YYYY-MM-DD HH:MM:SS |
| start station id | int | Start station terminal ID |
| end station id | int | End station terminal ID |
| bikeid | int | Bike Number |
| usertype | int | Subscriber or Casual, [0, 1] |
| gender | int | NaN, Male, Female [0, 1, 2] |
| age | int | (starttime year - birth year) |
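
The derived columns in this table could be produced along these lines. This is a sketch under stated assumptions: the string labels in `USERTYPE_CODES` and the order of the codes are guesses (the README only gives `[0, 1]`), and `clean_trips` is a hypothetical helper rather than the notebook's function.

```python
import pandas as pd

# Assumed label-to-code mapping; the README only states the codes [0, 1].
USERTYPE_CODES = {"Subscriber": 0, "Casual": 1}


def clean_trips(df):
    """Encode usertype and gender as integers and derive age from birth year."""
    out = df.copy()
    # Unmapped labels become <NA> instead of raising.
    out["usertype"] = out["usertype"].map(USERTYPE_CODES).astype("Int64")
    # Missing gender is stored as 0, matching "NaN, Male, Female [0, 1, 2]".
    out["gender"] = out["gender"].fillna(0).astype(int)
    # Age at the time of the trip: year of starttime minus year of birth.
    start_year = pd.to_datetime(out["starttime"]).dt.year
    out["age"] = start_year - out["birth year"]
    return out.drop(columns=["birth year"])


# Two illustrative rows, not taken from the real dataset.
raw = pd.DataFrame(
    {
        "starttime": ["2019-01-01 00:09:13", "2020-06-15 17:30:00"],
        "usertype": ["Subscriber", "Casual"],
        "gender": [1, None],
        "birth year": [1994, 1985],
    }
)
cleaned = clean_trips(raw)
print(cleaned[["usertype", "gender", "age"]])
```

`starttime` is left as a string, as in the table; only the year is parsed out to compute `age`.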

Data Added:

Collected station information from the available data:

Station Data: [View CSV]

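| Name | Type | Description |
|---|---|---|
| id | int | Terminal ID (not unique) |
| name | string | Name |
| lat | float | Latitude |
| long | float | Longitude |
| street | string | Street name |
| city | string | Neighbourhood |
| county | string | County name |
| state | string | State name (MA) |
| zip | int | Zip code |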
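
One way to collect the `id`, `name`, `lat` and `long` columns from the trip data is to stack the start and end station columns and drop duplicates. This is a sketch only; `collect_stations` is a hypothetical helper and the sample rows are illustrative.

```python
import pandas as pd

STATION_COLUMNS = ["id", "name", "lat", "long"]


def collect_stations(trips):
    """Build a station table from the start and end station columns of trips."""
    parts = []
    for prefix in ("start", "end"):
        part = trips[
            [
                f"{prefix} station id",
                f"{prefix} station name",
                f"{prefix} station latitude",
                f"{prefix} station longitude",
            ]
        ].copy()
        part.columns = STATION_COLUMNS
        parts.append(part)
    stations = pd.concat(parts, ignore_index=True)
    # De-duplicate on the full row: the terminal id alone is not unique.
    return stations.drop_duplicates().sort_values("id").reset_index(drop=True)


# Illustrative trips between two made-up stations.
trips = pd.DataFrame(
    {
        "start station id": [67, 33],
        "start station name": ["Station A", "Station B"],
        "start station latitude": [42.3581, 42.3487],
        "start station longitude": [-71.0932, -71.0976],
        "end station id": [33, 67],
        "end station name": ["Station B", "Station A"],
        "end station latitude": [42.3487, 42.3581],
        "end station longitude": [-71.0976, -71.0932],
    }
)
stations = collect_stations(trips)
print(stations)
```

The remaining columns (`street`, `city`, `county`, `state`, `zip`) come from reverse geocoding, described next.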

Geocoding API:

We used the Python client for Geocodio's API (link) to reverse geocode some of the referenced latitudes and longitudes, fetching geographical information and categorical features for the stations and terminal locations.
The implementation is in cell In [88] of this notebook (link).

The unofficial Python package for the Geocodio API is bennylope/pygeocodio by bennylope (link).

Implementation details

  1. Install pygeocodio (run in a notebook cell)
!pip install pygeocodio

  2. Create the API client
from geocodio import GeocodioClient
client = GeocodioClient('KEY_HERE', timeout=300)  # timeout: API connection timeout

Produced List

| Parameter | Type | Description |
|---|---|---|
| client.reverse() | List | Python list containing latitude/longitude information. |
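
As a rough sketch of how a reverse-geocoding result might be turned into the station columns: pygeocodio's `client.reverse()` accepts a `(lat, lng)` tuple (or a list of tuples for a batch), and the response follows Geocodio's JSON layout with an `address_components` object per result. The helper names and the hand-written `sample` response below are assumptions for illustration; they are not real API output and not the notebook's code.

```python
def reverse_geocode(client, lat, lng):
    """Reverse geocode one point; `client` is a pygeocodio GeocodioClient."""
    return client.reverse((lat, lng))


def station_fields(location):
    """Pick street, city, county, state and zip from the first result."""
    components = location["results"][0]["address_components"]
    return {
        "street": components.get("formatted_street"),
        "city": components.get("city"),
        "county": components.get("county"),
        "state": components.get("state"),
        "zip": components.get("zip"),
    }


# Hand-written response in the shape Geocodio documents (illustrative values).
sample = {
    "results": [
        {
            "address_components": {
                "formatted_street": "Massachusetts Ave",
                "city": "Cambridge",
                "county": "Middlesex County",
                "state": "MA",
                "zip": "02139",
            }
        }
    ]
}
print(station_fields(sample))
```

With a valid key, `station_fields(reverse_geocode(client, lat, lng))` would be called once per station coordinate.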

Adding weather information to the analysis:

Daily weather data from January 2019 to December 2022 was also added, using Visual Crossing's weather data (link).

Weather Data: [View CSV]

| Name | Type | Description |
|---|---|---|
| datetime | string | Each day from Jan, 2019 to Dec, 2022 |
| temp | float | Temperature |
| dew | float | Dew |
| humidity | float | Humidity |
| precip | float | Precipitation |
| precipcover | float | Precipitation cover |
| preciptype | string | Precipitation type |
| uvindex | int | UV index |
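
If the daily weather rows are joined to trips in pandas rather than in Tableau (the README does not say which), a left merge on the calendar date would do it. This is a sketch; `add_weather` is a hypothetical helper and the values are illustrative.

```python
import pandas as pd


def add_weather(trips, weather):
    """Attach each trip's daily weather row by matching on the calendar date."""
    trips = trips.copy()
    weather = weather.copy()
    # Truncate the trip timestamp to midnight so it matches the daily rows.
    trips["date"] = pd.to_datetime(trips["starttime"]).dt.normalize()
    weather["date"] = pd.to_datetime(weather["datetime"])
    weather = weather.drop(columns=["datetime"])
    # Left join keeps every trip, even on days with no weather record.
    return trips.merge(weather, on="date", how="left")


# Illustrative rows only.
trips = pd.DataFrame(
    {"starttime": ["2019-01-01 00:09:13", "2019-01-02 08:00:00"], "bikeid": [3689, 1234]}
)
weather = pd.DataFrame({"datetime": ["2019-01-01"], "temp": [3.5], "precip": [0.0]})

merged = add_weather(trips, weather)
print(merged[["starttime", "temp", "precip"]])
```

Trips on days without a weather row keep `NaN` in the weather columns instead of being dropped.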

  • The postal code information was incomplete, so it could not be used. Instead, the postcode of each station was collected later using the Geocodio API.
  • Removed the redundant columns ‘start station name’, ‘start station latitude’, ‘start station longitude’, ‘end station name’, ‘end station latitude’ and ‘end station longitude’.
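
Dropping the redundant station columns could look like the sketch below; the station IDs stay in the trip data, so names and coordinates can still be looked up in the station table. `drop_redundant` is a hypothetical helper and the sample row is illustrative.

```python
import pandas as pd

REDUNDANT_COLUMNS = [
    "start station name",
    "start station latitude",
    "start station longitude",
    "end station name",
    "end station latitude",
    "end station longitude",
]


def drop_redundant(trips):
    """Remove station name/coordinate columns; the station ids remain."""
    # errors="ignore" skips any listed column that is already absent.
    return trips.drop(columns=REDUNDANT_COLUMNS, errors="ignore")


# One illustrative trip row.
trips = pd.DataFrame(
    {
        "tripduration": [350],
        "start station id": [67],
        "start station name": ["Station A"],
        "start station latitude": [42.3581],
        "start station longitude": [-71.0932],
        "end station id": [33],
        "end station name": ["Station B"],
        "end station latitude": [42.3487],
        "end station longitude": [-71.0976],
    }
)
slim = drop_redundant(trips)
print(list(slim.columns))
```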

Reduction in raw data size (total):

| Before | After | Reduction |
|---|---|---|
| 2.30 GB | 1.24 GB | ~46.08% |