Shared Ride Efficiency in Brooklyn

In my previous post, I presented some interesting trip patterns of NYC taxis. Here, I focus my analysis on Brooklyn. Brooklyn is a very popular place for both tourists and locals to eat, drink, and have fun. I usually take the L or G subway train to visit Brooklyn. However, the L train is to be shut down in 2019 for 1.5 years [1], which may severely affect Brooklyn’s businesses. Taxi trips, and shared taxi trips in particular, may become a mainstream mode of transportation in Brooklyn.

To understand possible shared trips, I grouped pickup and dropoff locations by zip code, neighborhood, and county. To do this, I first converted the green taxi coordinates to addresses using the Geocoder API. Since the API is slow, I used single days (Wednesday, Saturday, and Sunday) to represent weekdays and weekends. I also included Uber data in this analysis, which includes the pickup neighborhood information. Check my first post on data wrangling here.

Trips within Brooklyn

I first looked at trips starting and ending within Brooklyn. I define these as local trips; they are usually short, about 2 miles and 10 minutes of travel time.

Hourly trips on weekday and weekend

Weekday (Wednesday) pickups in Brooklyn peaked at 9 pm. Saturday pickups started to increase in the late afternoon, peaked at midnight, and lasted into Sunday morning. Sunday, however, had fewer late-night pickups (fewer than Wednesday).

Where did taxis pick up and drop off passengers?


Animated pickup map by hour

Wednesday Pickup in Brooklyn

Saturday Pickup in Brooklyn

Sunday Pickup in Brooklyn



The spatiotemporal pattern revealed that evenings and late weekend nights were the busiest times for certain Brooklyn areas.

Top 10 zip codes for pickups are 11231, 11205, 11216, 11222, 11238, 11215, 11249, 11217, 11211, 11201

Top 10 zip codes for drop off are 11201, 11211, 11217, 11249, 11215, 11238, 11222, 11216, 11205, 11231

Top Neighborhoods


When I grouped trips by neighborhood name, the popular spots became familiar. Williamsburg, Greenpoint, Park Slope, and Clinton Hill were hot pickup and dropoff spots. In particular, Williamsburg led the second most popular spot by 100%.

Uber is also popular in these neighborhoods

I looked at Uber’s pickup records on the same days.

Similar to green taxis, Uber trips in Brooklyn also had a morning peak and an evening peak on the weekday, but trips dropped in the middle of the day. On Saturday (Friday) overnight and Sunday overnight, there were more trips, similar to green taxis.

The top Uber neighborhoods again include Williamsburg, Park Slope, and Greenpoint.

Popular Routes

Now that we know most trips start and end at similar hot spots, I investigated popular routes in Brooklyn. If some routes are very popular, it may be possible to redesign bus or shuttle routes. Also, if passengers share the same routes, they may also share taxi rides, which would not only help passengers save money but also increase the efficiency of taxi rides.

I started with Green taxi Saturday data and grouped trips by pickup and dropoff zip codes.

From the list of popular routes above, I noticed that these trips had the same or similar pickup and dropoff zip codes and short trip distances (< 2 miles).
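The grouping by pickup and dropoff zip code can be sketched in a few lines of Python. This is only an illustration, not the code used for this post: the function name top_routes and the (pickup_zip, dropoff_zip) tuple layout are assumptions.

```python
from collections import Counter


def top_routes(trips, n=10):
    """Return the n most frequent (pickup_zip, dropoff_zip) routes.

    `trips` is an iterable of (pickup_zip, dropoff_zip) pairs; this
    layout is an assumption made for the sketch.
    """
    # Count how many trips share each pickup/dropoff zip code pair.
    route_counts = Counter((pickup, dropoff) for pickup, dropoff in trips)
    # most_common sorts routes from most to least frequent.
    return route_counts.most_common(n)
```

A route whose pickup and dropoff zip codes are identical shows up here as a pair like ("11211", "11211"), which matches the short local trips described above.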

I defined shareable trips as trips with pickup times within 5 minutes of each other and with the same pickup and dropoff zip codes. If there is more than one trip for a route and time window, the number of shareable trips = total number of passengers / 6, where 6 is the capacity of each taxi. Otherwise, the number of non-shareable trips is 1 for that route and time window. Then, for each pickup zip code, I counted the total number of shareable and non-shareable trips. Shared ride efficiency for each zip code = #shareable trips / (#shareable trips + #non-shareable trips). Hourly ride efficiency is calculated as the mean efficiency across all zip codes.

1. Group by pickup and dropoff zip code (namely, route), and count the total trip number, total passenger number, aggr trip number, and non-aggr trip number. If the total trip number of a route is 1, aggr is 0. Otherwise, aggr is the total number of passengers divided by 6, given that 6 is the capacity of a taxi. Non-aggr is 0 when there is an aggr trip; otherwise, it is 1.

2. Calculate the shared ride percentage across all areas.

3. Average across all routes.

4. ..

Overall, I defined hourly shared ride efficiency.
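To make the definition concrete, here is a minimal sketch of the calculation as described above. It is not the original code. Two details are my assumptions: the 5-minute window is implemented as fixed clock bins (minute // 5), and passengers / 6 is kept as a plain division without rounding. The function name and the trip tuple layout are also hypothetical.

```python
from collections import defaultdict

TAXI_CAPACITY = 6    # seats per taxi, as stated in the post
WINDOW_MINUTES = 5   # pickup-time window, as stated in the post


def hourly_shared_ride_efficiency(trips):
    """Return {hour: mean shared ride efficiency across pickup zip codes}.

    `trips` is an iterable of (pickup_time, pickup_zip, dropoff_zip,
    passengers), where pickup_time is a datetime from a single day.
    """
    # Bucket trips by hour, 5-minute bin (assumption), and route.
    buckets = defaultdict(lambda: [0, 0])  # [trip count, passenger count]
    for pickup_time, pickup_zip, dropoff_zip, passengers in trips:
        window = pickup_time.minute // WINDOW_MINUTES
        key = (pickup_time.hour, window, pickup_zip, dropoff_zip)
        buckets[key][0] += 1
        buckets[key][1] += passengers

    # Sum shareable and non-shareable trips per (hour, pickup zip).
    totals = defaultdict(lambda: [0.0, 0.0])  # [shareable, non-shareable]
    for (hour, _win, pickup_zip, _drop), (n_trips, n_pass) in buckets.items():
        if n_trips > 1:
            # More than one trip on this route and window: shareable.
            totals[(hour, pickup_zip)][0] += n_pass / TAXI_CAPACITY
        else:
            totals[(hour, pickup_zip)][1] += 1

    # Efficiency per zip code, then the mean across zip codes per hour.
    per_hour = defaultdict(list)
    for (hour, _zip), (shareable, non_shareable) in totals.items():
        total = shareable + non_shareable
        if total > 0:  # skip zips whose only trips carried 0 passengers
            per_hour[hour].append(shareable / total)
    return {hour: sum(v) / len(v) for hour, v in per_hour.items()}
```

With fixed bins, two trips at 8:04 and 8:06 fall into different windows even though they are only 2 minutes apart; a sliding window would treat them as shareable, so the numbers depend on this choice.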

Hourly shared ride efficiency

From my calculation, over 15% of trips from Saturday 4 pm to Sunday 4 am are shareable. Over 10% of trips during the Wednesday morning rush hour are shareable.

Using the same analysis as above, I also analyzed trips between Brooklyn and Manhattan.

Total Hourly Trip and Shared ride efficiency

From my definition and calculation of hourly shared ride efficiency, it seems that shared ride efficiency should be correlated with the total number of trips. The more trips there are, the more likely it is that shared routes exist.

I plotted hourly shared ride efficiency and total hourly trip number for each day, and found that, in general, shared ride efficiency changes as the total hourly trip number changes. There are several peaks in hourly shared ride efficiency (red): Saturday 1 am, 2 am, and 8 pm; Sunday 1-2 am (a prolonged peak at midnight); and Wednesday 3 am and 9 am. These peaks may suggest that at certain times, passengers are more likely to go in a similar direction.

There are fewer trips from Brooklyn to Manhattan than trips within Brooklyn, and the shared ride efficiency is quite low on Wednesday. On the weekend, shared ride efficiency is above 0.06 only from 7 pm Saturday to early Sunday. There is a peak on Sunday afternoon for trips to Manhattan.


Linearly Correlated?

To understand the correlation between total hourly trips and hourly shared ride efficiency, I first calculated the Pearson correlation coefficient across 3 days (Sat, Sun, Wed) and 2 directions (bk-bk, bk-man), and obtained 0.899934, suggesting they are highly correlated. Again, we can see there are fewer Brooklyn to Manhattan trips.

From the above analysis, I realized that hourly shared ride efficiency is related to the day of the week, hour of the day, destination (bk or man), and total hourly trips. I then built a linear regression model to fit the data and to predict the hourly shared ride efficiency at any given time.
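For reference, the Pearson correlation coefficient is the covariance of the two series divided by the product of their standard deviations. The sketch below spells that out in plain Python; the function name pearson_r is my own, and the post's number may well have been computed with a library instead.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance, up to a factor of n).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (variances, up to the same factor).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)
```

Here xs would be the total hourly trip counts and ys the hourly shared ride efficiencies, one pair per hour, day, and direction.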

Sample size: 144 (3 days, 24 hours a day, 2 destinations)

Features are ‘hour’, ‘to’, ‘day’

The label is  ‘hourly_aggr_efficiency’

   hourly_aggr_efficiency  total_hourly_trip  hour  to  day
0                0.183372               1694     0   0    1
1                0.192233               1545     1   0    1
2                0.134911               1310     2   0    1
3                0.149407               1045     3   0    1

Using reg.score(X_test, y_test), I obtained an R^2 score of 0.82, suggesting a good fit to the data.

Last but not least, I used cross-validation to compare the predicted values with the actual test values. As we can see, at lower shared ride efficiency the model seems to overestimate, while as shared ride efficiency increases the model seems to underestimate. Overall, the regression model gives us a rough idea of the shareability of Brooklyn trips.
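For a regression model, the score reported here is the coefficient of determination, R^2 = 1 - SS_res / SS_tot. The short sketch below computes the same quantity by hand from actual and predicted values; the function name r_squared is hypothetical, and it is meant only to show what the number measures.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot
```

A perfect prediction gives 1, and a model that always predicts the mean of the test values gives 0, so 0.82 means the model explains most, but not all, of the variance in the held-out hours.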


[1] L train service between Brooklyn and Manhattan to be shut down for 18 months starting in 2019,
