Transit Module 22: Harnessing Social Media and Big Data Technologies for Transit Business Intelligence

HTML of the PowerPoint Presentation

(Note: This document has been converted from a PowerPoint presentation to 508-compliant HTML. The formatting has been adjusted for 508 compliance, but all the original text content is included, plus additional text descriptions for the images, photos and/or diagrams have been provided below.)


Slide 1:

This slide contains a graphic with the word “Welcome” in large letters. ITS Training Standards “WELCOME” slide, with reference to the U.S. Department of Transportation Office of Assistant Secretary for Research and Technology

Slide 2:

This slide contains a graphic with the word “Welcome” in large letters, photo of Kenneth Leonard, Director ITS Joint Program Office - Ken.Leonard@dot.gov - and on the bottom is a screenshot of the ITS JPO website - www.its.dot.gov/pcb

Slide 3:

Module 22:

Harnessing Social Media & Big Data Technologies for Transit Business Intelligence

Title slide: This slide contains the title, Module 22: Harnessing Social Media & Big Data Technologies for Transit Business Intelligence, with a graphic of a globe covered in various symbols to indicate the different types of social media technologies that may be used for transit business intelligence. These symbols are representative of cell phones, SMS messages, WiFi communications, cloud storage, email, etc.

Slide 4:

Instructor

This slide, entitled “Instructor” has a photo of the first instructor, Susan Bregman, on the left-hand side. Beneath Susan’s name, there is text that indicates her position as Principal at Oak Square Resources, LLC.

Susan Bregman

Principal

Oak Square Resources, LLC

Slide 5:

Instructor

This slide, also entitled “Instructor” has a photo of the second instructor, Manny Insignares, on the left-hand side. Beneath Manny’s name, there is text that indicates his position as the Vice President Technology at Consensus Systems Technologies.

Manny Insignares

Vice President Technology

Consensus Systems Technologies

Slide 6:

Learning Objectives

Slide 7:

Learning Objective 1

Slide 8:

Overview

Slide 9:

What is Business Intelligence?

Slide 10:

What are Potential Data Sources?

Slide 11:

How Can Business Intelligence Benefit Transit Operators?

Slide 12:

How Can Business Intelligence Benefit Transit Operators?

Example icon. Can be real-world (case study), hypothetical, a sample of a table, etc.

On the left-hand side of the slide, there is a graphic of a pie chart. The pie chart consists of a green portion that represents half of the pie. There are blue and orange portions, both representing the remaining two quarters of the pie.

GOAL

Improve customer satisfaction

ACTIONS

Slide 13:

How Can Business Intelligence Benefit Transit Operators?

Example icon. Can be real-world (case study), hypothetical, a sample of a table, etc.

On the left-hand side of the slide, there is a picture of a transit bus. The words “West Route” are displayed on the route indicator on the front of the bus.

GOAL

Improve service reliability for bus operations

ACTIONS

Slide 14:

How Can Business Intelligence Benefit Transit Operators?

Example icon. Can be real-world (case study), hypothetical, a sample of a table, etc.

On the left-hand side there is a picture of a transit light-rail train arriving at a station.

GOAL

Improve maintenance at rail stations

ACTIONS

Slide 15:

How Can Business Intelligence Benefit Transit Operators?

Example icon. Can be real-world (case study), hypothetical, a sample of a table, etc.

On the left-hand side, there is a graphic of a bar graph. It is not representative of any information displayed in the slide, merely a graphic. The tallest bar to the left is green. The next in the middle is orange. The shortest bar, farthest to the right is red.

GOAL

Improve transparency in performance reporting.

ACTIONS

Slide 16:

Activity Placeholder: This slide has the word “Activity” in large letters at the top of the slide, with a graphic of a hand on a computer keyboard below it.

Slide 17:

Question

Which of the following is NOT a source of data for business intelligence?

Answer Choices

  1. Automatic passenger counters (APC)
  2. Social media posts
  3. Electronic fare collection systems (EFCS)
  4. None of the above

Slide 18:

Review of Answers

A small graphical red and yellow X representing incorrect. a) Automatic passenger counters (APC)
Incorrect. APC data can be analyzed to support agency decision-making.

A small graphical red and yellow X representing incorrect. b) Social media posts
Incorrect. Social media posts can be analyzed to support agency decision-making.

A small graphical red and yellow X representing incorrect. c) Electronic fare collection systems data (EFCS)
Incorrect. EFCS data can be analyzed to support agency decision-making.

A small graphical green and yellow check mark representing correct. d) None of the above
Correct! All the data sources listed can be used to support transit decision-making.

Slide 19:

Learning Objective 2

Slide 20:

Overview

Slide 21:

What is Social Media?

This slide is entitled, “What is Social Media?”. On the left side of the slide, there is a graphic of a cellular phone screen, displaying the contents of the “Social” category. The applications displayed within this screen are in 3 columns and 3 rows, 9 applications in total. The applications in the first row, from left to right, are Facebook, LinkedIn, and Twitter. In Row 2, Facebook Messenger, Instagram, and Skype. In Row 3, Timehop and Hangout. The third application in this row has been cropped out of the picture.

Slide 22:

Taxonomy of Social Media

A collection of social media and networking application logos. The logos shown are YouTube, Twitter, Facebook, LinkedIn, and a Wi-Fi symbol.

Slide 23:

Taxonomy of Social Media

Social Networks

This slide is a continuation of the last, entitled, “Taxonomy of Social Media” with the subtitle, “Social Networks”. On the right side of the slide there are three social networking application graphics displayed from top to bottom. The first is Facebook (at the top), followed by Twitter, and LinkedIn (at the bottom).

Slide 24:

Taxonomy of Social Media

Media-Sharing Networks

This is also a continuation of the last, entitled, “Taxonomy of Social Media” with the subtitle, “Media-Sharing Networks”. On the right side of the slide there are three social media-sharing application graphics displayed from top to bottom. The first is Instagram (at the top), followed by YouTube, and Vimeo (at the bottom).

Slide 25:

Taxonomy of Social Media

Discussion Forums

This is also a continuation of the last, entitled, “Taxonomy of Social Media” with the subtitle, “Discussion Forums”. On the right side of the slide there are two discussion forum application graphics displayed from top to bottom. The first is Reddit (at the top), followed by Quora.

Slide 26:

Taxonomy of Social Media

Content Curation Platforms

This is also a continuation of the last, entitled, “Taxonomy of Social Media” with the subtitle, “Content Curation Platforms”. On the right side of the slide there are two content curation platform application graphics displayed from top to bottom. The first is Pinterest (at the top), followed by SlideShare.

Slide 27:

Taxonomy of Social Media

Consumer Review Networks

This is also a continuation of the last, entitled, “Taxonomy of Social Media” with the subtitle, “Consumer Review Networks”. On the right side of the slide there are two consumer review network application graphics displayed from top to bottom. The first is Yelp (at the top), followed by TripAdvisor.

Slide 28:

Taxonomy of Social Media

Blogging and Publishing Networks

This is also a continuation of the last, entitled, “Taxonomy of Social Media” with the subtitle, “Blogging and Publishing Networks”. On the right side of the slide there are three blogging and publishing network application graphics displayed from top to bottom. The first is Tumblr (at the top), followed by Blogger, and Medium (at the bottom).

Slide 29:

Agency-Generated Social Media

Overview

At the bottom of this slide, there is a red warning graphic (a triangle with a white exclamation point). Beside this warning graphic is a red rectangle with the following words within it: “Outbound communications typically do not support business intelligence activities”.

Slide 30:

Agency-Generated Social Media

Service Updates and Alerts

Slide 31:

Agency-Generated Social Media

Service Updates and Alerts

This slide is entitled, “Agency-Generated Social Media” with the subtitle, “Service Updates and Alerts”. There are screenshots of two tweets on this slide, one in the top-left corner and the second in the bottom-right corner. The first tweet is from the “MBTA Commuter Rail” account. The tweet reads “Kingston Train 038 (7:36 am from Kingston) is operating 5-15 minutes behind schedule between Abington and South Station due to a crossing gate issue.”. The screenshot indicates that this was posted at 8:06 AM on January 16, 2020. It has 1 Retweet and 2 Likes. The second tweet is from the “South Transit” account. The tweet reads “Elevator Alert – The Pioneer Square Station south end mezzanine to surface elevator is out of service”. Below this text is a link to this service alert. The link has the South Transit graphic on the right (a bus and three trains with a blue background) followed by the same text as the tweet with a link to content.govdelivery.com. The screenshot indicates that this tweet was posted at 10:23 AM on January 16, 2020.

Slide 32:

Agency-Generated Social Media

Emergency Communications

Slide 33:

Agency-Generated Social Media

COVID-19 Pandemic Communications

This slide is entitled, “Agency-Generated Social Media” with the subtitle, “COVID-19 Pandemic Communications”. This slide has another screenshot from Twitter on the right-hand side. The tweet is posted by the “NYCT Subway. Stay Home. Stop the Spread” account. The tweet reads, “We’re operating subway service for essential trips only. If you must travel, be sure to: #1 Check MTA.info or MYmta to see how often your line is running. #2 Wear a face covering. #3 Allow extra time to get where you’re going. If you need help, @ or DM us. 24/7.” This text is followed by a graphic. The graphic has a black bar at the top with the title displayed in white text, “Safe Travels”. The remainder of the graphic is yellow with black text. At the top right corner beneath the title, it reads “Keep them covered”. This is followed by a graphic of a person with a face covering giving a thumbs-up. The text beside this person reads “Cover your nose and mouth with a mask or cloth when you ride”. At the bottom of this graphic, smaller text reads, “Stop the spread. Save lives.” This is followed by the MTA logo.

Slide 34:

Agency-Generated Social Media

Public Safety Communications

This slide is also entitled, “Agency-Generated Social Media” with the subtitle, “Public Safety Communications”. There is a screenshot of the MBTA Transit Police Facebook account, from the account handler’s perspective. The right side of the graphic displays a window with the account’s avatar (the MBTA Transit Police Badge Symbol). This is followed by a menu with the following tabs displayed: Home, About, Photos, Videos, Posts, and Community. The tab for “Posts” is highlighted as if selected, and to the right is a post with a link to an article with a photo of a suspect entitled, “ID wanted. Random assault at Alewife Station.” Beneath this is subtext that reads, “If you know the whereabouts or identity of this individual please contact our Criminal Investigations Unit at 617-222-1050. If you would…”. This post has 19 reactions, 3 comments, and 16 shares.

Slide 35:

Agency-Generated Social Media

Marketing Activities

Slide 36:

Agency-Generated Social Media

Marketing Activities

This slide is entitled, “Agency-Generated Social Media”. This slide displays a screenshot of the Metro Los Angeles Instagram account. There is a photo of two transit buses riding between orange cones. The caption reads “#tbt. Many years ago, RTD bus operators trained in the LA River. #transithistory #GoMetro”. The post has 3725 likes and 3 comments that are displayed. The first reads, “No Way”, with four likes. The second reads, “For a second, I thought it was behind the scenes shots of the movie Speed” with 12 likes, and 2 replies. The third reads, “I just got trained the…”.

Slide 37:

Agency-Generated Social Media

Customer Service

Slide 38:

Agency-Generated Social Media

Customer Service

This slide is entitled, “Agency-Generated Social Media” with the subtitle, “Customer Service”. This graphic shows a Twitter post from SEPTA’s customer service account to address a customer complaint in real time. The graphic illustrates transit agency use of social media for customer service. The customer complaint reads, “A little wind and @SEPTA just falls apart…My 45-minute commute home should not be almost 3 hours”, followed by an emoji of a sad face. SEPTA’s customer service account responds, “Sorry to hear this, Erin. Were you riding on the Paoli/Thorndale Line this evening? A downed tree cause issues on the line. Our apologies for the inconvenience.” The graphic indicates that this response was posted at 6:26 PM on January 16, 2020.

Slide 39:

Agency-Generated Social Media

Solicit Customer Feedback

Slide 40:

Agency-Generated Social Media

Solicit Customer Feedback

This slide is entitled, “Agency-Generated Social Media” with the subtitle, “Solicit Customer Feedback”. This graphic shows an Instagram post from Long Beach Transit seeking customer feedback. The photo shows an aerial shot of Long Beach with the words “Talk Transit with us” in bold, larger font followed by “Let us know how to make the UCLA/Westwood Commuter Express service better for you” in smaller font. The words “UCLA/Westwood Commuter Express” are in bold. The caption for this post reads, “Long Beach Transit is holding a public meeting to listen to your valuable feedback about our UCLA/Westwood Commuter Express. Join us Wednesday, January 29 at 6:30pm at the Skylinks. 4800 E Wardlow Rd, Long Beach, CA 90808. If you can’t make the meeting but would still like to give feedback, please email comments@lbt.com.” The graphic indicates that this post garnered 52 likes.

Slide 41:

Agency-Generated Social Media

General Agency Announcements

Slide 42:

Agency-Generated Social Media

General Agency Announcements

This slide is entitled, “Agency-Generated Social Media” with the subtitle, “General Agency Announcements”. There are two overlapping graphics on this slide, one on the left and one on the right. The graphic on the left shows a Flickr image posted by the Metropolitan Transportation Authority that reads, “Pay your fare and watch cat videos with the same device”, followed by “OMNY. Just tap and go.” This text is in white font against a black background. The graphic on the right is from the DART Facebook page. DART shares agency press releases via Facebook. The screenshot of this post displays the Home page of the DART Facebook page with a post that reads “In 2019, Dallas Area Rapid Transit (DART) was far more than just the thing you ride. With 13 service area cities covering a 700 sq. mile area made up of 2.6 million citizens, more than 140 bus/shuttle routes, 11,000 bus stops, 14 On-Demand GoLink zones, 93 miles of light rail transit, 64 light rail stations, 5 commuter rail stations, and paratransit service for persons who are mobility impaired, DART continued to expand its mission to be your preferred choice of transportation for now and in the future”. This is followed by a picture of a DART bus in a parking lot and a link to the news release.

Slide 43:

Customer-Generated Social Media

Overview

This slide is entitled, “Customer-Generated Social Media” with the subtitle, “Overview”. At the bottom of this slide, there is a red warning graphic (a triangle with a white exclamation point). Beside this warning graphic is a red rectangle with the following words within it: “Organizations can use data mining techniques to analyze user-generated social media posts to support business intelligence activities”.

Slide 44:

Customer-Generated Social Media

Customer Questions

This slide is entitled, “Customer-Generated Social Media” with the subtitle, “Customer Questions”. There is a screenshot of a TTC Customer Service Twitter account responding to questions from riders. One rider tweets, “I’m on the 504 behind the streetcar that had the medical emergency. Person has been picked up by the paramedics. Just wondering why we’re not moving yet? I’ve been stuck here for over half an hour and late for work”, followed by an emoji of a worried face. TTC Customer Service responds, “The Op would have to be given the all clear by Rte Supervisor or the Mobile Supervisor on scene before they proceed. Depending on the situation they may be quickly inspecting the vehicle or taking witness info prior to giving the all clear.” The rider responds, “Got it, thank you”. TTC Customer service responds, “No problem at all”.

Slide 45:

Customer-Generated Social Media

Customer Complaints

This slide is entitled, “Customer-Generated Social Media” with the subtitle, “Customer Complaints”. There is a screenshot of MBTA handling complaints from riders over Twitter. One rider tweets, “Why is the park street escalator never operable for more than like 4 consecutive days at a time?”. MBTA responds, “Hi and thanks for reaching out Can you tell us which escalator you are referring to?”. The rider responds, “The one used to get out of the station from the eastbound green line arrivals”. MBTA responds, “Thank you. This escalator was being worked on earlier and should be back in service. We’ll follow up with Station Maintenance”.

Slide 46:

Customer-Generated Social Media

Customer Compliments

This slide is entitled, “Customer-Generated Social Media” with the subtitle, “Customer Compliments”. This graphic shows a Twitter conversation between a rider and a representative at TransLink. One rider tweets, “TransLink just announced on our train the track issue at Waterfront is fixed”. TransLink responds, “YES! We have just been updating the info everywhere. The track issue has cleared and service is returning to normal”. The rider responds, “Happy for you guys, you’ve had a rough week. Thanks for helping us all get around and thanks to the crews for working to get it all done!”. TransLink responds, “Thank you very much. I will pass along the praise to the rest of the team”.

Slide 47:

Crowdsourcing and Peer-to-Peer Communications

Overview

Slide 48:

Crowdsourcing and Peer-to-Peer Communications

Overview

There is a graphic of a cellular phone at the bottom, right corner of the slide. The iPhone is open to a transit application notification portal that reads “Service Alerts”.

Slide 49:

Activity Placeholder: This slide has the word “Activity” in large letters at the top of the slide, with a graphic of a hand on a computer keyboard below it.

Slide 50:

Question

Which of these is NOT a source of social media data for business intelligence?

Answer Choices

  1. Agency marketing posts
  2. Customer complaints
  3. Customer questions
  4. Peer-to-peer communications

Slide 51:

Review of Answers

A small graphical green and yellow check mark representing correct. a) Agency marketing posts
Correct! Marketing social media posts can generate goodwill for an agency, but they are not used to inform data-driven decisions.

A small graphical red and yellow X representing incorrect. b) Customer complaints
Incorrect. Customer complaints can provide valuable data.

A small graphical red and yellow X representing incorrect. c) Customer questions
Incorrect. Customer questions can provide valuable data.

A small graphical red and yellow X representing incorrect. d) Peer-to-peer communications
Incorrect. Peer-to-peer communications can provide valuable data.

Slide 52:

Learning Objective 3

Slide 53:

Overview

Slide 54:

What is Big Data?

This slide is entitled, “What is Big Data?”. There is a graphic on the left-hand side of the slide of a word cloud comprised of the words “big” and “data” intermixed in various, bright colors.

Slide 55:

Characteristics of Large Datasets

This slide is entitled, “Characteristics of Large Datasets”. There is a series of graphics on the left-hand side of the slide depicting the different characteristics of large datasets. The graphics representing the characteristic “variety” are a series of orange, rectangular boxes in a cascading arrangement. The following words are displayed within the boxes: Videos, Photos, GIS, Text, and Spreadsheets. The graphic representing the characteristic “volume” is a funnel shape with three orange circles funneling into it. The largest orange circle has the word, “Brontobytes” written in it. The next largest circle has the word, “Petabytes” written in it. The smallest circle has the word, “Terabytes” written in it. The graphic representing the characteristic “velocity” is an orange arrow, pointing to the left. The graphic of the arrow has two uneven, horizontal lines running parallel to it in order to make the image look like it is moving with some speed.

Large datasets are characterized by their variety, volume, and velocity

Slide 56:

Examples of the 3 Vs and Transit-Related Data

Data Description | Variety | Volume - Storage | Velocity - Frequency of updates
Vehicle Location, 100,000 trips per year | Structured | 3.6 GB per year | 50 bytes per vehicle every 5 seconds
Schedule Data (e.g., SEPTA bus) | Structured (GTFS) and compressed | 21 MB | Seasonal
Video from 300 Cameras | Video | 1.2 TB | Streaming
Geographic Information Files (NJT Bus) | Structured | 40 MB | Seasonal
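
The vehicle location figures in the table above can be checked with a short calculation. The sketch below is illustrative only: the slide gives the report size, the report interval, and the number of trips, but not the trip length, so the one-hour average trip duration (`ASSUMED_TRIP_DURATION_S`) is an assumption introduced here. Under that assumption the arithmetic reproduces the 3.6 GB per year shown on the slide.

```python
# Figures taken from the slide's vehicle location row.
BYTES_PER_REPORT = 50        # bytes per vehicle location report
REPORT_INTERVAL_S = 5        # one report every 5 seconds
TRIPS_PER_YEAR = 100_000

# Assumption (not stated on the slide): an average trip lasts one hour.
ASSUMED_TRIP_DURATION_S = 3600

# Number of location reports generated during one trip.
reports_per_trip = ASSUMED_TRIP_DURATION_S // REPORT_INTERVAL_S

# Total storage per year, in bytes and in decimal gigabytes.
bytes_per_year = TRIPS_PER_YEAR * reports_per_trip * BYTES_PER_REPORT
gigabytes_per_year = bytes_per_year / 1e9

print(f"{reports_per_trip} reports per trip")
print(f"{gigabytes_per_year:.1f} GB per year")
```

A different average trip length scales the result proportionally, so a 30-minute average would give roughly half the volume.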

Slide 57:

Transit-Related Big Data Includes Internal and External Sources

Slide 58:

Characteristics of Social Media Datasets

Slide 59:

Social Media Data Standardization Challenges

Standards may be emerging, but standardization is a challenge.

Slide 60:

International Efforts on Big Data Standards (1 of 2)

On the left-hand side of the slide, there is a graphic of the ISO/IEC JTC 1 Information Technology Preliminary Report on Big Data (2014).

SDO/Consortium | Interest Area
ISO/IEC JTC 1/SC 32 | Data management and interchange, including database languages, multimedia object management, metadata management and e-Business.
ISO/IEC JTC 1/SC 38 | Standardization for interoperable Distributed Application Platform and Services including Web Services, Service Oriented Architecture (SOA), and Cloud Computing.
ITU-T SG13 | Cloud computing for Big Data.
W3C | Web and Semantic related standards for markup, structure, query, semantics, and interchange.

Slide 61:

International Efforts on Big Data Standards (2 of 2)

On the left-hand side of the slide, there is a graphic of the ISO/IEC JTC 1 Information Technology Preliminary Report on Big Data (2014).

SDO/Consortium | Interest Area
Open Geospatial Consortium | Geospatial related standards for the specification, structure, query, and processing of location related data.
Organization for the Advancement of Structured Information Standards | Information access and exchange.
Transaction Processing Performance Council | Benchmarks for Big Data Systems.
TM Forum | Enable enterprises, service providers and suppliers to continuously transform in order to succeed in the digital economy.

Slide 62:

Activity Placeholder: This slide has the word “Activity” in large letters at the top of the slide, with a graphic of a hand on a computer keyboard below it.

Slide 63:

Question

Which of the below is not one of the 3 V characteristics of big data?

Answer Choices

  1. Velocity
  2. Viscosity
  3. Variety
  4. Volume

Slide 64:

Review of Answers

A small graphical red and yellow X representing incorrect. a) Velocity
Incorrect. Velocity refers to the speed required to convert input data into output data.

A small graphical green and yellow check mark representing correct. b) Viscosity
Correct! Viscosity is not one of the 3 Vs of Big Data, but a useful measure for assessing the quality of maple syrup and ketchup.

A small graphical red and yellow X representing incorrect. c) Variety
Incorrect. Variety refers to the diversity and inconsistency in the structured and unstructured data present in Big Data.

A small graphical red and yellow X representing incorrect. d) Volume
Incorrect. Volume refers to the quantity of data and growth rate.

Slide 65:

Learning Objective 4

Slide 66:

Overview

Slide 67:

Data Acquisition

This slide is entitled, “Data Acquisition”. There are three graphics on the left-hand side of the slide. The first, at the top-left corner, is a wide, yellow arrow pointing down. The arrow is filled with the icons of various social media and networking applications (Facebook, Skype, Twitter, LinkedIn, YouTube, etc.). The next graphic is repeated from Slide #55. This graphic is the funnel shape with three orange circles funneling into it. The largest orange circle has the word, “Brontobytes” written in it. The next largest circle has the word, “Petabytes” written in it. The smallest circle has the word, “Terabytes” written in it. The third graphic, in the lower-left corner, depicts a lake with blue water, surrounded by green trees and a snowy mountain backdrop. There are numbers embedded within the lake, representing data. The words “Data Lake” are written across the image.

Slide 68:

Data Preparation

This slide is entitled, “Data Preparation”. There are two graphics on the left-hand side of the slide, one above the other. The top graphic has six boxes representing valid and invalid data. The first box is grey with a small red “X” at the top-left corner. The words in this box read “Missing Data”. The next box, slightly beneath and to the left of the first, is also grey with a small red “X”. This second box reads “Outlier”. Beneath these boxes is a third box that is blue with a black check mark that reads “Valid Data”. Beneath that is a fourth grey box with a small red “X” that reads “Out of Range”. A fifth box beneath that is blue with a black check mark that reads “Valid Data”. The sixth and final box is beneath this and is grey with a red “X” that reads “Null Data”. The second graphic is a screenshot from a file directory on a computer that has green check marks next to valid data and red cross marks next to invalid data.
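
The categories named in the first graphic can be illustrated with a small validation pass. The following is a minimal sketch, not a method described in this module: the record layout, the `boardings` field, the allowed range, the distinction between a missing field and a null value, and the median-based outlier rule are all assumptions chosen for illustration.

```python
from statistics import median

def classify_records(records, field, low, high, outlier_factor=3.0):
    """Label each record with one of the categories named on the slide."""
    # Values that are present, non-null, and inside the allowed range.
    in_range = [
        r[field] for r in records
        if r.get(field) is not None and low <= r[field] <= high
    ]
    # Assumed outlier rule: distance from the median, scaled by the
    # median absolute deviation (MAD) of the in-range values.
    center = median(in_range) if in_range else None
    spread = median(abs(v - center) for v in in_range) if in_range else 0

    labels = []
    for r in records:
        if field not in r:
            labels.append("Missing Data")   # field absent from the record
        elif r[field] is None:
            labels.append("Null Data")      # field present but empty
        elif not (low <= r[field] <= high):
            labels.append("Out of Range")
        elif spread > 0 and abs(r[field] - center) > outlier_factor * spread:
            labels.append("Outlier")
        else:
            labels.append("Valid Data")
    return labels

# Hypothetical passenger-counter records for eight stops.
records = [
    {"stop": "A", "boardings": 12},
    {"stop": "B"},
    {"stop": "C", "boardings": None},
    {"stop": "D", "boardings": -4},
    {"stop": "E", "boardings": 14},
    {"stop": "F", "boardings": 95},
    {"stop": "G", "boardings": 11},
    {"stop": "H", "boardings": 13},
]

labels = classify_records(records, "boardings", low=0, high=200)
for record, label in zip(records, labels):
    print(record["stop"], label)
```

In practice the range limits and the outlier threshold would come from knowledge of the data source rather than from fixed defaults like these.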

Slide 69:

Data Analysis

This slide is entitled “Data Analysis”. There is a series of graphics on the left-hand side representing various methods of data analytics. The first graphic is a Gantt chart with the title “What algorithms/analytic methods do you TYPICALLY use?”. The x-axis at the top ranges from 0-100%. This chart also uses a color scheme to further organize the usage of the varying methods. Dark green represents “Most of the time”. Light green represents “Often”. Yellow represents “Sometimes”. Red represents “Rarely”. The y-axis along the left side lists the algorithms/analytic methods, displayed roughly in order of greatest use to least. This order is as follows: Regression (used 90% of the time), Decision trees (83%), Cluster analysis (87%), Time series (75%), Text mining (65%), Ensemble models (57%), Factor analysis (67%), Neural nets (66%), Random forests (53%), Association rules (64%), Bayesian (64%), Support vector machines (SVM) (56%), Anomaly detection (58%), Proprietary algorithms (45%), Rule induction (50%), Social network analysis (45%), Uplift modeling (43%), Survival analysis (44%), Link analysis (40%), Genetic algorithms (42%), and MARS (31%). The second graphic is a word cloud filled with terminology related to big data, with the largest words being “Big” in blue and “Data” in grey. The third is a general graphic histogram of a curve. The fourth graphic is a standard deviation graph representing the bell curve with a mean of 6 and three standard deviations from the mean of 0.5, 1, and 2. The fifth graphic is a scatter plot entitled, “Scatter Plot of Summer Temperatures”. This plot has a diagonal line dividing the positive and the negative bias, with the majority of the data points lying on the side of the negative bias.

Slide 70:

Data Presentation

This slide is entitled, “Data Presentation”. There are three graphics on the left-hand side of the slide and two graphics at the bottom. Graphic #1, in the top-left corner, depicts a transit agency dashboard. The dashboard is divided into four squares for Reliability, Ridership, Financials, and Customer Service. The Reliability square reads “How dependable is our service?” with feedback that reports that the Subway is 86% reliable, the Commuter Rail is 96% reliable, and a service called “The RIDE” is 95% reliable. The Ridership square reads “How many trips are taken on MBTA services on an average weekday?” followed by the answer in large numbers “1.18 Million”. The Financials square asks “How are we tracking against our operating budget?” followed by a graphic comparing spending year to date versus budget year to date (data from January 2020), with the spending exceeding the budget to date by $10. The Customer Service square asks “How do riders rate the MBTA?” followed by a graphic that shows three out of five stars colored gold. Graphic #2, beneath Graphic #1, displays a satellite heat map over the city of Austin, with the words “Showing where you can get to in an hour from any stop”. Graphic #3, in the lower-left corner, is a graph comparing planning time and travel time index values for Los Angeles city-wide data from 2003. The x-axis represents time of day (weekdays, non-holidays only) and the y-axis represents index values. Both the planning time and travel time peak twice, around 8 AM and 6 PM, with the planning time exceeding the travel time by roughly 5-6 index values. Between the plots for planning time and travel time, there is a vertical line representing the difference in those index values, the buffer time. There is text beside the buffer time that says “Buffer between expected (avg.) and 95th percentile travel times”.

Graphic #4 is a color-coded graph comparing the various methods of payment for transit users over the course of two years. The y-axis represents the percentage of a method used. The x-axis begins at Quarter 4, 2013 and ends with Quarter 4, 2015. The payment methods used during this time span include the following: cash fare, FareSaver ticket books, U-Pass BC, Monthly pass, Day pass, Compass Card, Employer pass, Concession tickets, and Other. From Quarter 4, 2013 to Quarter 2, 2015 the methods used remain roughly the same (25% cash fare, 16% monthly pass, 27% FareSaver ticket books, 2% day pass, 5% concession tickets, 12% U-Pass BC, 4% Compass Card, and 7% Other). The use of Compass Card increases to 30% by Quarter 4, 2015, and after Quarter 4, 2013 no Employer passes are used. Graphic #5 is in the bottom-right corner. It is a scatter plot (with lines) with the title, “Weekly Boarding Rides” and the subtitle “Bus and MAX”. The data is arranged by month, beginning with February at the left-hand side and ending with January at the right. The data shows that the MAX rides for both 2018-2019 and 2019-2020 remain within the range of 700,000 to 800,000 rides, with peaks in Summer (May – July) and Fall (October). The Bus rides for 2018-2019 and 2019-2020 vary from 1,000,000 to 1,180,000. These data lines peak in April, May, and October. They dip in July and drop more sharply in December.
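
The buffer time described for Graphic #3, the gap between the average and the 95th percentile travel time, can be computed from a set of observed travel times. The sketch below is an illustration under stated assumptions, not the calculation behind the slide's chart: the sample travel times and the free-flow time are hypothetical, the indices are taken as ratios to free-flow travel time, and the nearest-rank percentile method is one of several possible choices.

```python
import math
from statistics import mean

def percentile_nearest_rank(values, pct):
    """Return the pct-th percentile using the nearest-rank method."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def reliability_indices(travel_times_min, free_flow_min):
    """Compute travel time index, planning time index, and buffer time."""
    avg = mean(travel_times_min)
    p95 = percentile_nearest_rank(travel_times_min, 95)
    return {
        # Average travel time relative to free-flow travel time.
        "travel_time_index": avg / free_flow_min,
        # 95th percentile travel time relative to free-flow travel time.
        "planning_time_index": p95 / free_flow_min,
        # Extra minutes to allow beyond the average trip.
        "buffer_time_min": p95 - avg,
    }

# Hypothetical travel times (minutes) for one corridor and time of day.
sample = [20, 22, 21, 25, 30, 24, 23, 26, 40, 29]
indices = reliability_indices(sample, free_flow_min=20)

for name, value in indices.items():
    print(f"{name}: {value:.2f}")
```

Repeating the calculation for each time-of-day bin would yield two index curves of the kind the graphic plots, with the buffer as the gap between them.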

Slide 71:

Big Data Process Steps Summary

This slide is entitled, “Big Data Process Steps Summary”. There are four vertical columns containing an accumulation of graphics from the previous slides, with a large, blue arrow pointing to the right at the bottom of each of the columns of graphics. Column #1 contains the same graphics from the Data Acquisition slide (slide #67). The blue arrow beneath these graphics reads “Data Acquisition”. The second column contains the graphics from the Data Preparation slide (slide #68). Beneath this column, the blue arrow reads “Data Preparation”. The third column contains the graphics from the Data Analysis slide (slide #69). Beneath this column, the blue arrow reads “Data Analysis”. The fourth and final column contains some of the graphics from the Data Presentation slide (slide #70). The graphics included here are the agency dashboard, the heat map, and the plot of Weekly Boarding Rides. Beneath this column, the blue arrow reads “Data Presentation”.

Slide 72:

Data Presentation Example: MBTA Dashboard

This slide is entitled, “Data Presentation Example: MBTA Dashboard”. The left column of this slide contains two graphics regarding the average reliability of subway line “C”. There is a graphic of the MBTA agency dashboard that tracks the reliability over the period of one month. In large, black font the graphic reads “80%” for “December 1, 2019”, “80%” for “Past 7 Days”, and “79%” for “Past 30 Days”. Beneath this data is a plot of the transit reliability. The y-axis is the reliability percentage and the x-axis is a time period spanning from November 25th to December 1st. A target value of 90% is displayed on the plot by a solid, horizontal line at the 90% mark. The subway average is also displayed on the plot in a light, grey color. This line largely follows the target of 90% with minor dips of 1% on Nov. 26th, 29th, and Dec. 1st. This subway average also deviates from the target value with a minor peak of 2% on Nov. 28th. The reliability of subway line C stays below the target value, at 79%, until Nov. 27th. It rises to 84% on Nov. 28th and dips to 77% on Nov. 29th. It then rises again to 84% by Nov. 30th, only to drop back to 80% on Dec. 1st.
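The arithmetic behind a summary figure such as “Past 7 Days” can be sketched as follows. The daily values are approximate readings of the plot description above (Nov. 25th through 27th are assumed to sit at 79%), and the simple unweighted mean is an assumption; the dashboard itself may compute its figures differently.

```python
TARGET = 90  # target reliability (%), the horizontal line on the plot

# Approximate daily reliability (%) for line C, read from the description.
daily_reliability = {
    "Nov 25": 79,
    "Nov 26": 79,
    "Nov 27": 79,
    "Nov 28": 84,
    "Nov 29": 77,
    "Nov 30": 84,
    "Dec 1": 80,
}

# Unweighted mean over the seven days (an assumed method).
seven_day_mean = sum(daily_reliability.values()) / len(daily_reliability)

# Days on which reliability fell short of the target.
days_below_target = [
    day for day, value in daily_reliability.items() if value < TARGET
]

print(f"7-day mean: {seven_day_mean:.1f}%")
print(f"Days below target: {len(days_below_target)} of {len(daily_reliability)}")
```

Under these assumptions the mean rounds to 80%, which is consistent with the “Past 7 Days” figure described on the slide.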

Supplement icon indicating items or information that are further explained/detailed in the Student Supplement.

Slide 73:

Data Presentation Example: busstat.nyc

This slide is entitled, “Data Presentation Example: busstat.nyc”. There are two graphics in the left column regarding travel time. The first graphic is a plot entitled, “Cumulative Travel Time Across Stops”. The y-axis represents the minutes of travel time in 20-minute increments. The x-axis is representative of stops (in increments of 5). On this plot are the scheduled and actual data. The scheduled data is a dotted, grey line on the plot that forms a diagonal line with a slope of 2. The actual data is a solid, purple line that forms a diagonal line with a slope of 2.2, roughly following that of the scheduled data. Beneath this plot are three grey squares containing travel time data. The first reads “Excess Wait Time” in grey, followed by “2.0 mins” in orange. The second reads “Route Lateness Factor” in grey followed by “39.4%” in orange. The third and final square reads “Average Speed” in grey followed by “8.0 mph” in green.

Supplement icon indicating items or information that are further explained/detailed in the Student Supplement.

Slide 74:

Other Issues

Policy Issues

Slide 75:

Other Issues

Technical Issues

Slide 76:

Activity Placeholder: This slide has the word “Activity” in large letters at the top of the slide, with a graphic of a hand on a computer keyboard below it.

Slide 77:

Question

Which of the below is not a step described in Big Data processing?

Answer Choices

  1. Data Preparation
  2. Data Field Quantization
  3. Data Analysis
  4. Data Acquisition

Slide 78:

Review of Answers

A small graphical red and yellow X representing incorrect. a) Data Preparation
Incorrect. Data preparation is the step of removing data that is incomplete, incorrect, and/or out of range from analysis.

A small graphical green and yellow check mark representing correct. b) Data Field Quantization
Correct! Data field quantization evaluates elements of the General Relativity Theory to prove gravity exists, and is the basis for the general rule that buses will roll instead of fly.

A small graphical red and yellow X representing incorrect. c) Data Analysis
Incorrect. Data analysis is the interpretation of relationships between data to gain insights about a problem or solutions.

A small graphical red and yellow X representing incorrect. d) Data Presentation
Incorrect. Data presentation is the process of using the results of analysis to make a case or explanation about data.

Slide 79:

Learning Objective 5

Slide 80:

Overview

Working with Social Media

Slide 81:

Overview

Example icon. Can be real-world (case study), hypothetical, a sample of a table, etc.

Slide 82:

Chicago Transit Authority

This slide is entitled, “Chicago Transit Authority”. The left-hand column contains a graphic of subway destination stickers. The stickers are ordered and colored as follows (from top to bottom): Midway (in yellow, with a plane next to it), Harlem (in green), Linden (in black), and 54th / Cermak (in red).

Measuring Customer Sentiment

Slide 83:

Chicago Transit Authority

Negative tweets spiked at 9 AM on July 23, 2011.

Author’s relevant description: This slide is a continuation of the last, entitled, “Chicago Transit Authority”. There are three different types of graphs displayed on this slide, composed from data extracted from negative tweets. The first graph, in the top-left corner, shows the total tweets and the sentiment strength by time of day for July 23, 2011. All tweets and negative sentiment spiked at 9 AM in response to service delays. The second graph, in the top-right corner, shows total tweets and the normalized sentiment strength by time of day for July 23, 2011. The third graph, centered beneath the first two, is a bar graph displaying the total number of tweets by time of day for July 23, 2011.
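A minimal sketch of how tweets might be tallied by time of day to find such a spike is shown below. The `(hour, score)` pairs are made-up sample data, and the convention that a score below zero means negative sentiment is an assumption; the scoring method used in the CTA analysis is not described on the slide.

```python
from collections import Counter

# Made-up sample: (hour of day, sentiment score). Negative score = negative tweet.
tweets = [
    (7, 0.2),
    (8, -0.1),
    (9, -0.8),
    (9, -0.6),
    (9, -0.9),
    (9, 0.1),
    (10, -0.3),
    (11, 0.4),
]

# Total tweets per hour, and negative tweets per hour.
total_by_hour = Counter(hour for hour, _ in tweets)
negative_by_hour = Counter(hour for hour, score in tweets if score < 0)

# Hour with the most negative tweets.
peak_hour, peak_count = negative_by_hour.most_common(1)[0]

print(f"Negative tweets peak at hour {peak_hour} "
      f"({peak_count} of {total_by_hour[peak_hour]} tweets)")
```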

Slide 84:

Chicago Transit Authority

A tag cloud confirmed customer communication around 9 AM about delays on the Red and Blue Lines because of flooding.

This slide is also entitled, “Chicago Transit Authority”. There is a graphic of a word cloud on the right half of this slide. The word cloud was created from analysis after a flood caused delays on the Red and Blue lines. The largest words in the cloud include red, flooded, blue, train, 103rd street, soon, running, green, even, etc.
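The word counts behind a tag cloud like this one can be sketched in a few lines; words that occur more often are drawn larger. The sample posts and the stop-word list below are invented for illustration and are not the tweets analyzed in the CTA example.

```python
import re
from collections import Counter

# Made-up sample posts (not the actual tweets from the CTA analysis).
posts = [
    "Red line flooded, train not running",
    "Blue line train delayed, flooded tracks",
    "Is the red line running soon?",
]

# Hypothetical list of common words to leave out of the cloud.
STOP_WORDS = {"the", "is", "not", "a", "line"}

words = []
for post in posts:
    # Lowercase the post and split it into alphanumeric tokens.
    for word in re.findall(r"[a-z0-9]+", post.lower()):
        if word not in STOP_WORDS:
            words.append(word)

counts = Counter(words)

# The most frequent words would be rendered largest in the cloud.
for word, count in counts.most_common(4):
    print(word, count)
```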

Slide 85:

San Diego MTS

Please see extended text description below.

(Extended Text Description: This slide is entitled, "San Diego MTS" with the subtitle, "Combat Fare Evasion." This slide has a background of a red San Diego MTS bus approaching a stop. There are green palm trees near a traffic signal where the bus is stopped. A pedestrian is crossing. Overlaid on top are the following bullet items:

Slide 86:

San Diego MTS

Combat Fare Evasion

Slide 87:

Transport for London

This slide is entitled, “Transport for London”. In the left-hand column there is a photo of Big Ben in London, taken from the ground. Also captured in this photo is a sign for a Transport for London stop, a red circular sign with a rectangular black bar through the center that reads, “Underground”.

Optimizing Advertising

Slide 88:

Metro Transit

This slide is entitled, “Metro Transit”. There is a graphic on the left-hand side of a transit bus stopped at a stop sign in the rain.

Locating Bus Shelters

Slide 89:

Metro Transit

This slide is entitled, “Metro Transit”. There is a graphic on the left-hand side of a transit bus stopped at a stop sign in the rain.

Locating Bus Shelters

Slide 90:

Activity Placeholder: This slide has the word “Activity” in large letters at the top of the slide, with a graphic of a hand on a computer keyboard below it.

Slide 91:

Question

Based on these examples, analyzing social media data helped inform agency decisions about which of the following?

Answer Choices

  1. Where to upgrade bus shelters
  2. How to understand customer sentiment
  3. Where to add fare enforcement
  4. How to report non-fare revenues

Slide 92:

Review of Answers

A small graphical red and yellow X representing incorrect. a) Where to upgrade bus shelters
Incorrect. Agency did not consider social media posts.

A small graphical green and yellow check mark representing correct. b) How to understand customer sentiment
Correct! Researchers analyzed social media posts to assess CTA customer sentiment.

A small graphical red and yellow X representing incorrect. c) Where to add fare enforcement
Incorrect. Agency did not use social media to solve problem.

A small graphical red and yellow X representing incorrect. d) How to report non-fare revenues
Incorrect. None of the examples focused on non-fare revenues. Social media is not a source of this data.

Slide 93:

Module Summary

Slide 94:

Thank you for completing this module.

Feedback
Please use the Feedback link below to provide us with your thoughts and comments about the value of the training.

Thank you!
