Transit Module 22: Harnessing Social Media and Big Data Technologies for Transit Business Intelligence
HTML of the PowerPoint Presentation
(Note: This document has been converted from a PowerPoint presentation to 508-compliant HTML. The formatting has been adjusted for 508 compliance, but all the original text content is included, plus additional text descriptions for the images, photos and/or diagrams have been provided below.)
Slide 1:
![This slide contains a graphic with the word “Welcome” in large letters. ITS Training Standards “WELCOME” slide, with reference to the U.S. Department of Transportation Office of Assistant Secretary for Research and Technology](mod22ppt1.jpg)
Slide 2:
![This slide contains a graphic with the word “Welcome” in large letters, photo of Kenneth Leonard, Director ITS Joint Program Office - Ken.Leonard@dot.gov - and on the bottom is a screenshot of the ITS JPO website - www.its.dot.gov/pcb](mod22ppt2.jpg)
Slide 3:
Module 22:
Harnessing Social Media & Big Data Technologies for Transit Business Intelligence
![Title slide: This slide contains the title, Module 22: Harnessing Social Media & Big Data Technologies for Transit Business Intelligence, with a graphic of a globe covered in various symbols to indicate the different types of social media technologies that may be used for transit business intelligence. These symbols are representative of cell phones, SMS messages, WiFi communications, cloud storage, email, etc.](mod22ppt3.jpg)
Slide 4:
Instructor
![This slide, entitled “Instructor” has a photo of the first instructor, Susan Bregman, on the left-hand side. Beneath Susan’s name, there is text that indicates her position as Principal at Oak Square Resources, LLC.](mod22ppt4.jpg)
Susan Bregman
Principal
Oak Square Resources, LLC
Slide 5:
Instructor
![This slide, also entitled “Instructor” has a photo of the second instructor, Manny Insignares, on the left-hand side. Beneath Manny’s name, there is text that indicates his position as the Vice President Technology at Consensus Systems Technologies.](mod22ppt5.jpg)
Manny Insignares
Vice President Technology
Consensus Systems Technologies
Slide 6:
Learning Objectives
- Define how transit providers use business intelligence
- Define social media platforms and their applications to public transportation
- Define big data in relation to social media and transit
- Understand the process for applying big data analytics to social media to inform transit business intelligence
- Incorporate findings to support business intelligence with data-driven decisions
Slide 7:
Learning Objective 1
- Define how transit providers use business intelligence
Slide 8:
Overview
- What is business intelligence?
- What are potential data sources?
- How can business intelligence benefit transit operators?
Slide 9:
What is Business Intelligence?
- Combines information from multiple sources to support data-driven decisions
  - Quantitative - Ridership, fare revenue, mileage
  - Qualitative - Focus groups, interviews, social media
- Integrates data from internal and external sources
  - Internal - Automated vehicle location systems, customer panels
  - External - Social media posts, Census data
- Enables organizations to evaluate progress in achieving goals
- Supports internal decision-making
Slide 10:
What are Potential Data Sources?
- Qualitative data (agency-generated)
  - Customer surveys and panels
  - Focus groups and stakeholder interviews
- Quantitative data (agency-generated)
  - Automatic passenger counting data (APC)
  - Automated vehicle location data (AVL)
  - General Transit Feed Specification files (GTFS and GTFS-rt)
  - Electronic fare payment system datasets (EFPS)
- External data sources
  - Social media posts
  - Census files and other public datasets
Slide 11:
How Can Business Intelligence Benefit Transit Operators?
- Meet mandated reporting requirements
- Provide greater transparency in reporting to internal and external audiences
- Provide input for planning, operations, and capital investments
- Support briefings for senior staff and board of directors
Slide 12:
How Can Business Intelligence Benefit Transit Operators?
![Example icon. Can be real-world (case study), hypothetical, a sample of a table, etc.](mod22pptimexample.jpg)
![On the left-hand side of the slide, there is a graphic of a pie chart. The pie chart consists of a green portion that represents half of the pie. There are blue and orange portions, both representing the remaining two quarters of the pie.](mod22ppt7.jpg)
GOAL
Improve customer satisfaction
ACTIONS
- Conduct surveys and focus groups.
- Establish online customer panel.
- Analyze social media posts to understand customer sentiment.
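As a rough illustration of what "analyzing posts to understand customer sentiment" can mean in practice, the sketch below scores a post by counting words from two small word lists. The word lists, function names, and sample posts are hypothetical and chosen only for illustration; a production analysis would likely rely on a validated lexicon or a trained model.

```python
import re

# Hypothetical mini-lexicons, for illustration only.
POSITIVE = {"thanks", "great", "love", "clean", "helpful", "fast"}
NEGATIVE = {"late", "delay", "delayed", "dirty", "broken", "stuck", "crowded"}


def sentiment_score(post: str) -> int:
    """Return (number of positive words) minus (number of negative words)."""
    words = re.findall(r"[a-z']+", post.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives


def sentiment_label(score: int) -> str:
    """Map a numeric score to a coarse label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Invented sample posts.
for post in [
    "Thanks for the fast, clean ride!",
    "Bus is late again and the shelter light is broken",
    "Heading downtown on the 7",
]:
    print(sentiment_label(sentiment_score(post)), "|", post)
```

Word counting of this kind ignores negation and sarcasm, so it is best treated as a first pass that flags posts for closer review.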
Slide 13:
How Can Business Intelligence Benefit Transit Operators?
![Example icon. Can be real-world (case study), hypothetical, a sample of a table, etc.](mod22pptimexample.jpg)
![On the left-hand side of the slide, there is a picture of a transit bus. The words “West Route” displayed on the route indicator on the front of the bus.](mod22ppt9.jpg)
GOAL
Improve service reliability for bus operations
ACTIONS
- Review internal data on on-time performance and travel time.
- Examine social media posts to identify specific locations where bus routes are prone to delay.
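To make "review internal data on on-time performance" concrete, the sketch below computes the share of on-time arrivals per route from a handful of invented records. The record layout and the on-time window (up to 1 minute early, up to 5 minutes late) are assumptions for illustration; each agency defines its own threshold.

```python
from collections import defaultdict

# Hypothetical stop-level records: (route, scheduled arrival, actual arrival),
# with times in minutes after midnight. Values are invented for illustration.
RECORDS = [
    ("West", 480, 482),
    ("West", 495, 503),
    ("West", 510, 511),
    ("East", 480, 479),
    ("East", 500, 507),
]

EARLY_LIMIT = -1  # assumed: up to 1 minute early still counts as on time
LATE_LIMIT = 5    # assumed: up to 5 minutes late still counts as on time


def on_time_share(records):
    """Return {route: fraction of arrivals inside the on-time window}."""
    totals = defaultdict(int)
    on_time = defaultdict(int)
    for route, scheduled, actual in records:
        deviation = actual - scheduled
        totals[route] += 1
        if EARLY_LIMIT <= deviation <= LATE_LIMIT:
            on_time[route] += 1
    return {route: on_time[route] / totals[route] for route in totals}


for route, share in on_time_share(RECORDS).items():
    print(f"{route}: {share:.0%} on time")
```

Routes with a low share are natural candidates for cross-checking against social media posts that mention specific delay locations.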
Slide 14:
How Can Business Intelligence Benefit Transit Operators?
![Example icon. Can be real-world (case study), hypothetical, a sample of a table, etc.](mod22pptimexample.jpg)
![On the left-hand side there is a picture of a transit light-rail train arriving at a station.](mod22ppt11.jpg)
GOAL
Improve maintenance at rail stations
ACTIONS
- Review internal maintenance records.
- Analyze social media posts to identify issues on specific vehicles or stations.
- Encourage customers to report issues via social media (e.g., broken lights, overflowing trash, disabled ticketing machines).
Slide 15:
How Can Business Intelligence Benefit Transit Operators?
![Example icon. Can be real-world (case study), hypothetical, a sample of a table, etc.](mod22pptimexample.jpg)
![On the left-hand side, there is a graphic of a bar graph. It is not representative of any information displayed in the slide, merely a graphic. The tallest bar to the left is green. The next in the middle is orange. The shortest bar, farthest to the right is red.](mod22ppt13.jpg)
GOAL
Improve transparency in performance reporting.
ACTIONS
- Develop key performance indicators (KPI) from available data sources.
- Report KPIs via online performance dashboard.
Slide 16:
![Activity Placeholder: This slide has the word “Activity” in large letters at the top of the slide, with a graphic of a hand on a computer keyboard below it.](mod22ppt14.jpg)
Slide 17:
Question
Which of the following is NOT a source of data for business intelligence?
Answer Choices
- Automatic passenger counters (APC)
- Social media posts
- Electronic fare collection systems (EFCS)
- None of the above
Slide 18:
Review of Answers
a) Automatic passenger counters (APC)
Incorrect. APC data can be analyzed to support agency decision-making.
b) Social media posts
Incorrect. Social media posts can be analyzed to support agency decision-making.
c) Electronic fare collection systems data (EFCS)
Incorrect. EFCS data can be analyzed to support agency decision-making.
d) None of the above
Correct! All the data sources listed can be used to support transit decision-making.
Slide 19:
Learning Objective 2
- Define social media platforms and their applications to public transportation
Slide 20:
Overview
- What is social media?
- Taxonomy of social media platforms
- Use of social media by transit operators for agency-generated information
- Use of social media by transit customers and stakeholders for user-generated information
- Use of crowdsourcing and peer-to-peer platforms for sharing communication about transit
Slide 21:
What is Social Media?
![This slide is entitled, “What is Social Media?”. On the left side of the slide, there is a graphic of a cellular phone screen, displaying the contents of the “Social” category. The applications displayed within this screen are in 3 columns and 3 rows, 9 applications in total. The applications in the first row, from left to right are Facebook, LinkedIn, and Twitter. In Row 2, Facebook Messenger, Instagram, and Skype. In Row 3, Timehop and Hangout. The third application in this row has been cropped out of the picture.](mod22ppt16.jpg)
- Social media platforms are web-based or mobile applications that encourage users to interact with (and often influence) one another in real time.
- Social media, also called social networking, includes different types of applications.
- Platforms are mostly owned by private companies with proprietary formats and are not consistently regulated.
- Social media posts can share information (and misinformation).
- Social media is still evolving, and platforms continue to change.
Slide 22:
Taxonomy of Social Media
- Social networks
- Media sharing networks
- Discussion forums
- Content curation
- Consumer review networks
- Blogging and publishing networks
![A collection of social media and networking application logos. The logos shown are YouTube, Twitter, Facebook, LinkedIn, and a Wi-Fi symbol.](mod22ppt17.jpg)
Slide 23:
Taxonomy of Social Media
Social Networks
- Connect with other people online
- Share information, comments, and media
- Personal and professional networks
![This slide is a continuation of the last, entitled, “Taxonomy of Social Media” with the subtitle, “Social Networks”. On the right side of the slide there are three social networking application graphics displayed from top to bottom. The first is Facebook (at the top), followed by Twitter, and LinkedIn (at the bottom).](mod22ppt18.jpg)
Slide 24:
Taxonomy of Social Media
Media-Sharing Networks
- Share images, videos, and other types of media with others.
- Offer comments and other forms of feedback.
![This is also a continuation of the last, entitled, “Taxonomy of Social Media” with the subtitle, “Media-Sharing Networks”. On the right side of the slide there are three social media-sharing application graphics displayed from top to bottom. The first is Instagram (at the top), followed by YouTube, and Vimeo (at the bottom).](mod22ppt19.jpg)
Slide 25:
Taxonomy of Social Media
Discussion Forums
- Platforms serve as discussion boards
- Users can ask and answer questions, share information, and participate in discussions
![This is also a continuation of the last, entitled, “Taxonomy of Social Media” with the subtitle, “Discussion Forums”. On the right side of the slide there are two discussion forum application graphics displayed from top to bottom. The first is Reddit (at the top), followed by Quora.](mod22ppt20.jpg)
Slide 26:
Taxonomy of Social Media
Content Curation Platforms
- Identify and share content from multiple sources
- Content types include photographs, graphics, videos, presentations, and text
![This is also a continuation of the last, entitled, “Taxonomy of Social Media” with the subtitle, “Content Curation Platforms”. On the right side of the slide there are two content curation platform application graphics displayed from top to bottom. The first is Pinterest (at the top), followed by SlideShare.](mod22ppt21.jpg)
Slide 27:
Taxonomy of Social Media
Consumer Review Networks
- Generate reviews and share opinions about goods and services.
- Most consumer websites also include customer reviews (e.g., Amazon).
![This is also a continuation of the last, entitled, “Taxonomy of Social Media” with the subtitle, “Consumer Review Networks”. On the right side of the slide there are two consumer review network application graphics displayed from top to bottom. The first is Yelp (at the top), followed by TripAdvisor.](mod22ppt22.jpg)
Slide 28:
Taxonomy of Social Media
Blogging and Publishing Networks
- Create content on user-defined topics.
- Posts are typically longer than most social networking sites.
- Organizations may use platforms to share news.
![This is also a continuation of the last, entitled, “Taxonomy of Social Media” with the subtitle, “Blogging and Publishing Networks”. On the right side of the slide there are three blogging and publishing network application graphics displayed from top to bottom. The first is Tumblr (at the top), followed by Blogger, and Medium (at the bottom).](mod22ppt23.jpg)
Slide 29:
Agency-Generated Social Media
Overview
- Most transit operators use social media for outbound communications.
  - Service updates and alerts
  - Emergency communications
  - Marketing activities
  - Customer service
  - Solicit customer feedback
  - General agency communications
- Audiences may include riders, stakeholders, media, first responders, public officials, and community members.
![At the bottom of this slide, there is a red warning graphic (a triangle with a white exclamation point). Beside this warning graphic is red rectangle with the following words within it: “Outbound communications typically do not support business intelligence activities”.](mod22ppt24.jpg)
Slide 30:
Agency-Generated Social Media
Service Updates and Alerts
- Notify customers about service changes
  - Provide information about traffic delays and construction impacts
  - Provide details about service during special events
- Twitter is especially well-suited for real-time alerts
Slide 31:
Agency-Generated Social Media
Service Updates and Alerts
![This slide is entitled, “Agency-Generated Social Media” with the subtitle, “Service Updates and Alerts”. There are screenshots of two tweets on this slide, one in the top, left corner and the second in the bottom, right corner. The first tweet is from the “MBTA Commuter Rail” account. The tweet reads “Kingston Train 038 (7:36 am from Kingston) is operating 5-15 minutes behind schedule between Abington and South Station due to a crossing gate issue.”. The screenshot indicates that this was posted at 8:06 AM on January 16, 2020. It has 1 Retweet and 2 Likes. The second tweet is from the “South Transit” account. The tweet reads “Elevator Alert – The Pioneer Square Station south end mezzanine to surface elevator is out of service”. Below this text is a link to this service alert. The link has the South Transit graphic on the right (a bus and three trains with a blue background) followed by the same text as the tweet with a link to content.govdelivery.com. The screenshot indicates that this tweet was posted at 10:23 AM on January 16, 2020.](mod22ppt25.jpg)
Slide 32:
Agency-Generated Social Media
Emergency Communications
- Use social media to communicate during health emergencies, weather events, and natural disasters (e.g., COVID-19, hurricanes, earthquakes).
- Use social media to share public safety information (e.g., Amber alerts, criminal activity).
- Twitter is especially well-suited for real-time alerts.
Slide 33:
Agency-Generated Social Media
COVID-19 Pandemic Communications
![This slide is entitled, “Agency-Generated Social Media” with the subtitle, “COVID-19 Pandemic Communications”. This slide has another screenshot from Twitter on the right-hand side. The tweet is posted by the “NYCT Subway. Stay Home. Stop the Spread” account. The tweet reads, “We’re operating subway service for essential trips only. If you must travel, be sure to: #1 Check MTA.info or MYmta to see how often your line is running. #2 Wear a face covering. #3 Allow extra time to get where you’re going. If you need help, @ or DM us. 24/7.” This text is followed by a graphic. The graphic has a black bar at the top with the title displayed in white text, “Safe Travels”. The remainder of the graphic is yellow with black text. At the top right corner beneath the title, it reads “Keep them covered”. This is followed by a graphic of a person with a face covering giving a thumbs-up. The text beside this person reads “Cover your nose and mouth with a mask or cloth when you ride”. At the bottom of this graphic in smaller font reads, “Stop the spread. Save lives.” This is followed by the MTA logo.](mod22ppt26.jpg)
Slide 34:
Agency-Generated Social Media
Public Safety Communications
![This slide is also entitled, “Agency-Generated Social Media” with the subtitle, “Public Safety Communications”. There is a screenshot of the MBTA Transit Police Facebook account, from the account handler’s perspective. The right side of the graphic displays a window with the account’s avatar (the MBTA Transit Police Badge Symbol). This is followed by a menu with the following tabs displayed: Home, About, Photos, Videos, Posts, and Community. The tab for “Posts” is highlighted as if selected and to the right is a post with a link to an article with a photo of suspect entitled, “ID wanted. Random assault at Alewife Station.” Beneath this is subtext that reads, “If you know the whereabouts or identity of this individual please contact our Criminal Investigations Unit at 617-222-1050. If you would…”. This post has 19 reactions, 3 comments, and 16 shares.](mod22ppt27.jpg)
Slide 35:
Agency-Generated Social Media
Marketing Activities
- Social media can help agencies create an image or identity.
- Media-sharing and blogging platforms are a good match for these posts.
Slide 36:
Agency-Generated Social Media
Marketing Activities
![This slide is entitled, “Agency-Generated Social Media”. This slide displays a screenshot of the Metro Los Angeles Instagram account. There is a photo of two transit buses riding between orange cones. The caption reads “#tbt. Many years ago, RTD bus operators trained in the LA River. #transithistory #GoMetro”. The post has 3725 likes and 3 comments that are displayed. The first reads, “No Way”, with four likes. The second reads, “For a second, I thought it was behind the scenes shots of the movie Speed” with 12 likes, and 2 replies. The third reads, “I just got trained the…”.](mod22ppt28.jpg)
Slide 37:
Agency-Generated Social Media
Customer Service
- Provide real-time customer service.
- Address customer comments and complaints.
Slide 38:
Agency-Generated Social Media
Customer Service
![This slide is entitled, “Agency-Generated Social Media” with the subtitle, “Customer Service”. This graphic shows a Twitter post from SEPTA’s customer service account to address a customer complaint in real time. The graphic illustrates transit agency use of social media for customer service. The customer complaint reads, “A little wind and @SEPTA just falls apart…My 45-minute commute home should not be almost 3 hours”, followed by an emoji of a sad face. SEPTA’s customer service account responds, “Sorry to hear this, Erin. Were you riding on the Paoli/Thorndale Line this evening? A downed tree cause issues on the line. Our apologies for the inconvenience.” The graphic indicates that this response was posted at 6:26 PM on January 16, 2020.](mod22ppt29.jpg)
Slide 39:
Agency-Generated Social Media
Solicit Customer Feedback
- Use social media to reach out to customers.
- Seek feedback on projects or programs.
Slide 40:
Agency-Generated Social Media
Solicit Customer Feedback
![This slide is entitled, “Agency-Generated Social Media” with the subtitle, “Solicit Customer Feedback”. This graphic shows an Instagram post from Long Beach Transit seeking customer feedback. The photo shows an aerial shot of Long Beach with the words “Talk Transit with us” in bold, larger font followed by “Let us know how to make the UCLA/Westwood Commuter Express service better for you” in smaller font. The words “UCLA/Westwood Commuter Express” are in bold. The caption for this post reads, “Long Beach Transit is holding a public meeting to listen to your valuable feedback about our UCLA/Westwood Commuter Express. Join us Wednesday, January 29 at 6:30pm at the Skylinks. 4800 E Wardlow Rd, Long Beach, CA 90808. If you can’t make the meeting but would still like to give feedback, please email comments@lbt.com.” The graphic indicates that this post garnered 52 likes.](mod22ppt30.jpg)
Slide 41:
Agency-Generated Social Media
General Agency Announcements
- Share agency information
- Job listings
- Press releases
- Social posts can complement - but should not replace - traditional communications channels.
Slide 42:
Agency-Generated Social Media
General Agency Announcements
![This slide is entitled, “Agency-Generated Social Media” with the subtitle, “General Agency Announcements”. There are two overlapping graphics on this slide, one on the left and one on the right. The graphic on the left shows a Flickr image posted by the Metropolitan Transportation Authority that reads, “Pay your fare and watch cat videos with the same device”, followed by “OMNY. Just tap and go.” This text is in white font against a black background. The graphic on the right is from the DART Facebook page. DART shares agency press releases via Facebook. The screenshot of this post displays the Home page of the DART Facebook page with a post that reads “In 2019, Dallas Area Rapid Transit (DART) was far more than just the thing you ride. With 13 service area cities covering a 700 sq. mile area made up of 2.6 million citizens, more than 140 bus/shuttle routes, 11,000 bus stops, 14 On-Demand GoLink zones, 93 miles of light rail transit, 64 light rail stations, 5 commuter rail stations, and paratransit service for persons who are mobility impaired, DART continued to expand its mission to be your preferred choice of transportation for now and in the future”. This is followed by a picture of a DART bus in a parking lot and a link to the news release.](mod22ppt31.jpg)
Slide 43:
Customer-Generated Social Media
Overview
- Social media posts from transit customers, stakeholders, and others can provide unfiltered feedback
- User-generated posts typically include the following
  - Questions (e.g., where is the bus? what is the fare?)
  - Complaints (e.g., service, maintenance, safety, security)
  - Compliments (e.g., operator commendations)
- These inbound communications can be generated by riders, stakeholders, and community members and shared widely.
![This slide is entitled, “Customer-Generated Social Media” with the subtitle, “Overview”. At the bottom of this slide, there is a red warning graphic (a triangle with a white exclamation point). Beside this warning graphic is red rectangle with the following words within it: “Organizations can use data mining techniques to analyze user-generated social media posts to support business intelligence activities”.](mod22ppt32.jpg)
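As one simple example of the data mining mentioned on this slide, the sketch below sorts posts into the three categories listed above (questions, complaints, compliments) using keyword rules. The keyword sets, function name, and sample posts are hypothetical and for illustration only; real deployments typically use larger vocabularies or trained text classifiers.

```python
import re
from collections import Counter

# Hypothetical keyword rules, for illustration only.
COMPLAINT_WORDS = {"late", "broken", "dirty", "never", "stuck", "delay", "delayed"}
COMPLIMENT_WORDS = {"thanks", "thank", "great", "helpful", "happy"}


def categorize(post: str) -> str:
    """Assign a post to complaint, compliment, question, or other."""
    text = post.lower()
    words = set(re.findall(r"[a-z']+", text))
    # Complaints are checked first because many are phrased as questions.
    if words & COMPLAINT_WORDS:
        return "complaint"
    if words & COMPLIMENT_WORDS:
        return "compliment"
    if "?" in text:
        return "question"
    return "other"


# Invented sample posts.
POSTS = [
    "Where is the bus?",
    "Why is the escalator never working?",
    "Thanks to the crews for getting it all done!",
    "On the 504 now",
]

tally = Counter(categorize(post) for post in POSTS)
print(dict(tally))
```

Tallies like this can be tracked over time or by route to show where questions and complaints concentrate.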
Slide 44:
Customer-Generated Social Media
Customer Questions
![This slide is entitled, “Customer-Generated Social Media” with the subtitle, “Customer Questions”. There is a screenshot of a TTC Customer Service Twitter account responding to questions from riders. One rider tweets, “I’m on the 504 behind the streetcar that had the medical emergency. Person has been picked up by the paramedics. Just wondering why we’re not moving yet? I’ve been stuck here for over half an hour and late for work”, followed by an emoji of a worried face. TTC Customer Service responds, “The Op would have to be given the all clear by Rte Supervisor or the Mobile Supervisor on scene before they proceed. Depending on the situation they may be quickly inspecting the vehicle or taking witness info prior to giving the all clear.” The rider responds, “Got it, thank you”. TTC Customer service responds, “No problem at all”.](mod22ppt33.jpg)
Slide 45:
Customer-Generated Social Media
Customer Complaints
![This slide is entitled, “Customer-Generated Social Media” with the subtitle, “Customer Complaints”. There is a screenshot of MBTA handling complaints from riders over Twitter. One rider tweets, “Why is the park street escalator never operable for more than like 4 consecutive days at a time?”. MBTA responds, “Hi and thanks for reaching out Can you tell us which escalator you are referring to?”. The rider responds, “The one used to get out of the station from the eastbound green line arrivals”. MBTA responds, “Thank you. This escalator was being worked on earlier and should be back in service. We’ll follow up with Station Maintenance”.](mod22ppt34.jpg)
Slide 46:
Customer-Generated Social Media
Customer Compliments
![This slide is entitled, “Customer-Generated Social Media” with the subtitle, “Customer Compliments”. This graphic shows a Twitter conversation between a rider and a representative at TransLink. One rider tweets, “TransLink just announced on our train the track issue at Waterfront is fixed”. TransLink responds, “YES! We have just been updating the info everywhere. The track issue has cleared and service is returning to normal”. The rider responds, “Happy for you guys, you’ve had a rough week. Thanks for helping us all get around and thanks to the crews for working to get it all done!”. TransLink responds, “Thank you very much. I will pass along the praise to the rest of the team”.](mod22ppt35.jpg)
Slide 47:
Crowdsourcing and Peer-to-Peer Communications
Overview
- Crowdsourcing solicits ideas and feedback on a specific topic from a large group of people via the Internet.
- Some mobile applications create a platform for subscribers to share information with one another.
Slide 48:
Crowdsourcing and Peer-to-Peer Communications
Overview
- Crowdsourcing solicits ideas and feedback on a specific topic from a large group of people via the Internet.
- Some mobile applications create a platform for subscribers to share information with one another.
- Examples include:
  - Transit - Mobile app complements real-time data feeds with crowdsourced info
  - Pigeon - Google app for crowdsourced info
  - Clever Commute - Mobile app for sharing customer info for NJ Transit, LIRR, MNR services
![There is a graphic of a cellular phone at the bottom, right corner of the slide. The iPhone is open to a transit application notification portal that reads “Service Alerts”.](mod22ppt36.jpg)
Slide 49:
![Activity Placeholder: This slide has the word “Activity” in large letters at the top of the slide, with a graphic of a hand on a computer keyboard below it.](mod22ppt37.jpg)
Slide 50:
Question
Which of these is NOT a source of social media data for business intelligence?
Answer Choices
- Agency marketing posts
- Customer complaints
- Customer questions
- Peer-to-peer communications
Slide 51:
Review of Answers
a) Agency marketing posts
Correct! Marketing social media posts can generate goodwill for an agency, but they are not used to inform data-driven decisions.
b) Customer complaints
Incorrect. Customer complaints can provide valuable data.
c) Customer questions
Incorrect. Customer questions can provide valuable data.
d) Peer-to-peer communications
Incorrect. Peer-to-peer communications can provide valuable data.
Slide 52:
Learning Objective 3
- Define big data in relation to social media and transit
Slide 53:
Overview
- What is big data?
- Large datasets characterized by variety, volume, and velocity
- Sources of transit-related big data include internal and external data sources
- Characteristics of social media datasets
- Social media data standards are emerging
Slide 54:
What is Big Data?
![This slide is entitled, “What is Big Data?”. There is a graphic on the left-hand side of the slide of a word cloud comprised of the words “big” and “data” intermixed in various, bright colors.](mod22ppt39.jpg)
- Large volume of data
  - Structured data
  - Unstructured data
- Difficult to process with traditional database and software techniques
Slide 55:
Characteristics of Large Datasets
![This slide is entitled, “Characteristics of Large Datasets”. There are a series of graphics on the left-hand side of the slide depicting the different characterizations of large datasets. The graphics representing the characteristic “variety” are a series of orange, rectangular boxes in a cascading arrangement. The following words are displayed within the boxes: Videos, Photos, GIS, Text, and Spreadsheets. The graphic representing the characteristic “volume” is a funnel shape with three orange circles funneling into it. The largest orange circle has the word, “Brontobytes” written in it. The next largest circle has the word, “Petabytes” written in it. The smallest circle has the word, “Terabytes” written in it. The graphic representing the characteristic “velocity” is an orange arrow, pointing to the left. The graphic of the arrow has two uneven, horizontal lines running parallel to it in order to make the image look like it is moving with some speed.](mod22ppt40.jpg)
Large datasets are characterized by their variety, volume, and velocity
- Variety
  - Multiple sources
  - Multiple formats: text, photo, video, PDF, database, CSV, spreadsheets
  - Structured and unstructured
- Volume
  - Terabytes (10^12)
  - Petabytes (10^15)
  - Brontobytes (10^27) and upwards
- Velocity
  - Speed required to convert inputs into outputs
  - Streaming, which is continuous conversion from inputs to outputs
Slide 56:
Examples of the 3 Vs and Transit-Related Data
| Data Description | Variety | Volume - Storage | Velocity - Frequency of updates |
| --- | --- | --- | --- |
| Vehicle Location 100,000 trips per year | Structured | 3.6 GB per year | 50 bytes per vehicle every 5 seconds |
| Schedule Data (e.g., SEPTA bus) | Structured (GTFS) and compressed | 21 MB | Seasonal |
| Video from 300 Cameras | Video | 1.2 TB | Streaming |
| Geographic Information Files (NJT Bus) | Structured | 40 MB | Seasonal |
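The vehicle location figures in the table above can be tied together with simple arithmetic. The sketch below assumes an average trip duration of one hour, which is not stated on the slide; under that assumption, 50 bytes every 5 seconds across 100,000 trips works out to the 3.6 GB per year shown. The variable names are illustrative.

```python
BYTES_PER_REPORT = 50           # from the table: 50 bytes per vehicle
REPORT_INTERVAL_S = 5           # from the table: one report every 5 seconds
TRIPS_PER_YEAR = 100_000        # from the table
ASSUMED_TRIP_DURATION_S = 3600  # assumption (not on the slide): one hour per trip

# Number of location reports a vehicle sends during one trip.
reports_per_trip = ASSUMED_TRIP_DURATION_S // REPORT_INTERVAL_S

# Total storage for a year of trips, in bytes and in gigabytes (10^9 bytes).
bytes_per_year = TRIPS_PER_YEAR * reports_per_trip * BYTES_PER_REPORT
gigabytes_per_year = bytes_per_year / 10**9

print(f"{reports_per_trip} reports per trip")
print(f"{gigabytes_per_year:.1f} GB per year")
```

Changing the assumed trip duration or reporting interval scales the result proportionally, which is a quick way to estimate storage needs for a different fleet.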
Slide 57:
Transit-Related Big Data Includes Internal and External Sources
- Internal sources
  - Rider surveys and panels
  - Focus groups and stakeholder interviews
  - Automatic passenger counting data (APC)
  - Automated vehicle location data (AVL)
  - General Transit Feed Specification files (GTFS/GTFS-rt)
  - Electronic fare payment system datasets (EFPS)
- External data sources
  - Social media posts
  - Census files and other public datasets
  - Traffic data
  - Web pages (HTML)
Slide 58:
Characteristics of Social Media Datasets
- Unstructured text, written in natural language
- Uncategorized
- Voluminous
- Variety of formats (e.g., JPG, GIF, MP3, MP4)
Slide 59:
Social Media Data Standardization Challenges
Standards may be emerging, but standardization is a challenge.
- Social media is unstructured and may include natural text, images, and video.
- Social media platforms are mostly owned by private for-profit entities and data (e.g., posts) may use a proprietary format.
- Some social media have Application Programming Interfaces (APIs) for downloading data, but others have no API.
Slide 60:
International Efforts on Big Data Standards (1 of 2)
![There is a graphic on the left-hand side of the ISO/IEC JTC 1 Information Technology Preliminary Report on Big Data (2014).](mod22ppt41.jpg)
| SDO/Consortium | Interest Area |
| --- | --- |
| ISO/IEC JTC 1/SC 32 | Data management and interchange, including database languages, multimedia object management, metadata management and e-Business. |
| ISO/IEC JTC 1/SC 38 | Standardization for interoperable Distributed Application Platform and Services including Web Services, Service Oriented Architecture (SOA), and Cloud Computing. |
| ITU-T SG13 | Cloud computing for Big Data. |
| W3C | Web and Semantic related standards for markup, structure, query, semantics, and interchange. |
Slide 61:
International Efforts on Big Data Standards (2 of 2)
![There is a graphic on the left-hand side of the ISO/IEC JTC 1 Information Technology Preliminary Report on Big Data (2014).](mod22ppt42.jpg)
| SDO/Consortium | Interest Area |
| --- | --- |
| Open Geospatial Consortium | Geospatial related standards for the specification, structure, query, and processing of location related data. |
| Organization for the Advancement of Structured Information Standards | Information access and exchange. |
| Transaction Processing Performance Council | Benchmarks for Big Data Systems. |
| TM Forum | Enable enterprises, service providers and suppliers to continuously transform in order to succeed in the digital economy. |
Slide 62:
![Activity Placeholder: This slide has the word “Activity” in large letters at the top of the slide, with a graphic of a hand on a computer keyboard below it.](mod22ppt43.jpg)
Slide 63:
Question
Which of the below is not one of the 3 V characteristics of big data?
Answer Choices
- Velocity
- Viscosity
- Variety
- Volume
Slide 64:
Review of Answers
a) Velocity
Incorrect. Velocity refers to the speed required to convert input data into output data.
b) Viscosity
Correct! Viscosity is not one of the 3 Vs of Big Data, but a useful measure for assessing the quality of maple syrup and ketchup.
c) Variety
Incorrect. Variety refers to the diversity and inconsistency in the structured and unstructured data present in Big Data.
d) Volume
Incorrect. Volume refers to the quantity of data and growth rate.
Slide 65:
Learning Objective 4
- Understand the process for applying big data analytics to social media to inform transit business intelligence.
Slide 66:
Overview
- Data acquisition
- Data preparation
- Data analysis
- Data presentation
- Other Issues
  - Policy issues
  - Technical issues
Slide 67:
Data Acquisition
![This slide is entitled, “Data Acquisition”. There are three graphics on the left-hand side of the slide. The first, at the top-left corner is a wide, yellow arrow pointing down. The arrow is filled with the icons of various social media and networking applications (Facebook, Skype, Twitter, LinkedIn, YouTube, etc.). The next graphic is repeated from Slide #56. This graphic is the funnel shape with three orange circles funneling into it. The largest orange circle has the word, “Brontobytes” written in it. The next largest circle has the word, “Petabytes” written in it. The smallest circle has the word, “Terabytes” written in it. The third graphic, in the lower-left corner depicts a lake with blue water, surrounded by green trees and a snowy mountain backdrop. There are numbers embedded within the lake, representing data. The words “Data Lake” are written across the image.](mod22ppt45.jpg)
- Data acquisition is the means necessary to gather data for subsequent steps. These may include:
  - Data collection
  - Data recording of natural events
  - Data recording of human-made events
  - Data entry
- What data do I have?
  - Internal sources
  - External sources
- What data do I need that I don’t have?
- Do I need:
  - To do data scraping
  - To use an Application Programming Interface
- How much will new data cost me to acquire?
- What are my storage requirements?
  - Volume, security, in the cloud, in-house
- Where do I store my data?
  - We introduce the term "data lake"
Slide 68:
Data Preparation
![This slide is entitled, “Data Preparation”. There are two graphics on the left-hand side of the slide, one above the other. The top graphic has five boxes representing valid and invalid data. The first box is grey with a small red “X” at the top, left corner. The words in this box read “Missing Data”. The next box, slightly beneath and to the left of the first is also grey with a small red “X”. This second box reads “Outlier”. Beneath these boxes is a third box that is blue with a black check mark that reads “Valid Data”. Beneath that is a fourth grey box with a small red “X” that reads “Out of Range”. A fifth box beneath that is blue with a black check mark that reads “Valid Data”. The final box is beneath this and is grey with a red “X” that reads “Null Data”. The second graphic is a screenshot from a file directory on a computer that has green check marks next to valid data and red cross marks next to invalid data.](mod22ppt46.jpg)
- Data preparation removes data that is incomplete, incorrect, or out of range from analysis.
- Do I have the right data?
  - Granularity
  - Coverage
  - Content
  - Geographic region and data (GPS, GIS files)
  - Time frame
- If there is a standard available, this is the step to map data to the standard
- Data scrubbing and filtering occur in this step
  - Remove outliers
  - Handle missing data
  - Remove out of range data
  - Handle null data values
- Define any rules for sentiment analysis, topic maps, and linkages between disparate data sets.
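The scrubbing and filtering steps above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the record layout, the valid range, and the outlier cutoff are all assumptions, and "handling" missing or null values is simplified here to setting them aside with a reason.

```python
from statistics import median

# Hypothetical records: (stop_id, dwell_seconds). None marks a missing or null value.
RAW = [
    ("A", 32.0), ("B", None), ("C", 28.0), ("D", -5.0),
    ("E", 30.0), ("F", 35.0), ("G", 900.0), ("H", 31.0),
]

VALID_RANGE = (0.0, 3600.0)  # assumed plausible range, in seconds
OUTLIER_CUTOFF = 5.0         # assumed cutoff, in median absolute deviations


def scrub(records, valid_range=VALID_RANGE, cutoff=OUTLIER_CUTOFF):
    """Split records into (kept, rejected); rejected items carry a reason."""
    low, high = valid_range
    rejected, in_range = [], []
    for record in records:
        value = record[1]
        if value is None:
            rejected.append((record, "missing or null"))
        elif not low <= value <= high:
            rejected.append((record, "out of range"))
        else:
            in_range.append(record)

    if not in_range:
        return [], rejected

    # Flag outliers by distance from the median, measured in median
    # absolute deviations (MAD), which is less sensitive to extreme
    # values than the mean and standard deviation.
    values = [value for _, value in in_range]
    center = median(values)
    mad = median(abs(value - center) for value in values)
    kept = []
    for record in in_range:
        if mad > 0 and abs(record[1] - center) / mad > cutoff:
            rejected.append((record, "outlier"))
        else:
            kept.append(record)
    return kept, rejected


kept, rejected = scrub(RAW)
for record, reason in rejected:
    print(record, "->", reason)
```

Keeping the rejected records together with a reason, rather than silently dropping them, makes it possible to review how much data each rule removes.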
Slide 69:
Data Analysis
![This slide is entitled “Data Analysis”. There is a series of graphics on the left-hand side representing various methods of data analytics. The first graphic is a Gantt chart with the title “What algorithms/analytic methods do you TYPICALLY use?”. The x-axis at the top ranges from 0-100%. This chart also uses a color scheme to further organize the usage of the varying methods. Dark green represents “Most of the time”. Light green represents “Often”. Yellow represents “Sometimes”. Red represents “Rarely”. The y-axis along the left side lists the algorithms/analytic methods, displayed roughly in order of greatest use to least. This order is as follows: Regression (used 90% of the time), Decision trees (83%), Cluster analysis (87%), Time series (75%), Text mining (65%), Ensemble models (57%), Factor analysis (67%), Neural nets (66%), Random forests (53%), Association rules (64%), Bayesian (64%), Support vector machines (SVM) (56%), Anomaly detection (58%), Proprietary algorithms (45%), Rule induction (50%), Social network analysis (45%), Uplift modeling (43%), Survival analysis (44%), Link analysis (40%), Genetic algorithms (42%), and MARS (31%). The second graphic is a word cloud filled with terminology related to big data, with the largest words being “Big” in blue and “Data” in grey. The third is a general graphic histogram of a curve. The fourth graphic is a standard deviation graph representing the bell curve with a mean of 6 and three standard deviations from the mean of 0.5, 1, and 2. The fifth graphic is a scatter plot entitled, “Scatter Plot of Summer Temperatures”. This plot has a diagonal line dividing the positive and the negative bias, with the majority of the data points lying on the side of the negative bias.](mod22ppt47.jpg)
- Data analysis is the interpretation of relationships between data to gain insights about a problem or solution.
- Data analysis techniques include
  - Data mining
  - Data visualization
  - Topic maps
  - Sentiment analysis
  - Data similarity analysis
  - Stochastic analysis
  - Data correlation
- Artificial intelligence and machine learning
  - Image processing
  - Facial recognition
  - Automated license plate recognition (ALPR)
  - Predictive analytics
Slide 70:
Data Presentation
![This slide is entitled, “Data Presentation”. There are three graphics on the left-hand side of the slide and two graphics at the bottom. Graphic #1, in the top-left corner, depicts a transit agency dashboard. The dashboard is divided into four squares for Reliability, Ridership, Financials, and Customer Service. The Reliability square reads “How dependable is our service?” with feedback that reports that the Subway is 86% reliable, the Commuter Rail is 96% reliable, and a service called “The RIDE” is 95% reliable. The Ridership square reads “How many trips are taken on MBTA services on an average weekday?” followed by the answer in large numbers “1.18 Million”. The Financials square asks “How are we tracking against our operating budget?” followed by a graphic comparing spending year to date versus budget year to date (data from January 2020), with the spending exceeding the budget to date by $10. The Customer Service square asks “How do riders rate the MBTA?” followed by a graphic that shows three out of five stars colored gold. Graphic #2, beneath Graphic #1, displays a satellite heat map over the city of Austin, with the words “Showing where you can get to in an hour from any stop”. Graphic #3, in the lower-left corner, is a graph comparing planning time and travel time index values for Los Angeles city-wide data from 2003. The x-axis represents time of day (weekdays, non-holidays only) and the y-axis represents index values. Both the planning time and travel time peak twice around 8 AM and 6 PM, with the planning time exceeding the travel time by roughly 5-6 index values. Between the plots for planning time and travel time, there is a vertical line representing the difference in those index values, the buffer time. There is text beside the buffer time that says “Buffer between expected (avg.) and 95th percentile travel times”.](mod22ppt48.jpg)
- Data presentation is the process of using the results of analysis to provide an explanation or make a claim about the data.
- Agency dashboards draw data from multiple sources to share key performance indicators:
  - Ridership
  - Service performance
  - Financial
  - Customer satisfaction
  - Maintenance records
  - Electronic fare payment
![Graphic #4 is a color-coded graph comparing the various methods of payment for transit users over the course of two years. The y-axis represents the percentage of a method used. The x-axis begins at Quarter 4, 2013 and ends with Quarter 4, 2015. The payment methods used during this time span include the following: cash fare, FareSaver ticket books, U-Pass BC, Monthly pass, Day pass, Compass Card, Employer pass, Concession tickets, and Other. From Quarter 4, 2013 to Quarter 2, 2015 the methods used remain roughly the same (25% cash fare, 16% monthly pass, 27% FareSaver ticket books, 2% day pass, 5% concession tickets, 12% U-Pass BC, 4% Compass Card, and 7% Other). The use of Compass Card increases to 30% by Quarter 4, 2015 and after Quarter 4, 2013 no employee passes are used. Graphic #5 is in the bottom-right corner. It is a scatter plot (with lines) with the title, “Weekly Boarding Rides” and the subtitle “Bus and MAX”. The data is arranged by month, beginning with February at the left-hand side and ending with January at the right. The data shows that the MAX rides from both years 2018-2019 and 2019-2020 remain within the range of 700,000 to 800,000 rides with peaks in Summer (May – July) and Fall (October). The Bus rides from years 2018-2019 and 2019-2020 vary from 1,000,000 to 1,180,000. There are peaks in these data lines in the months of April, May, and October. There are dips in this data in July and a larger drop in the number of rides in December.](mod22ppt49.jpg)
Slide 71:
Big Data Process Steps Summary
![This slide is entitled, “Big Data Process Steps Summary”. There are four vertical columns containing an accumulation of graphics from the previous slides, with a large, blue arrow pointing to the right at the bottom of each of the columns of graphics. Column #1 contains the same graphics from the Data Acquisition slide (slide #67). The blue arrow beneath these graphics reads “Data Acquisition”. The second column contains the graphics from the Data Preparation slide (slide #68). Beneath this column, the blue arrow reads “Data Preparation”. The third column contains the graphics from the Data Analysis slide (slide #69). Beneath this column, the blue arrow reads “Data Analysis”. The fourth and final column contains some of the graphics from the Data Presentation slide (slide #70). The graphics included here are the agency dashboard, the heat map, and the plot of Weekly Boarding Rides. Beneath this column, the blue arrow reads “Data Presentation”.](mod22ppt50.jpg)
Slide 72:
Data Presentation Example: MBTA Dashboard
![This slide is entitled, “Data Presentation Example: MBTA Dashboard”. The left column of this slide contains two graphics regarding the average reliability of subway line “C”. There is a graphic of the MBTA agency dashboard that tracks the reliability over the period of one month. In large, black font the graphic reads “80%” for “December 1, 2019”, “80%” for “Past 7 Days”, and “79%” for “Past 30 Days”. Beneath this data is a plot of the transit reliability. The y-axis is the reliability percentage and the x-axis is a time period spanning from November 25th to December 1st. A target value of 90% is displayed on the plot by a solid, horizontal line at the 90% mark. The subway average is also displayed on the plot in a light, grey color. This line largely follows the target of 90% with minor dips of 1% on Nov. 26th, 29th, and Dec. 1st. This subway average also deviates from the target value with a minor peak of 2% on Nov. 28th. The reliability of the subway line C stays below the target value at 79% until Nov. 27th, then rises to 84% on Nov. 28th and dips to 77% on Nov. 29th. It then rises again to 84% by Nov. 30th only to drop back to 80% on Dec. 1st.](mod22ppt51.jpg)
- MBTA has an online dashboard for key performance indicators
- Supports transparency in reporting for internal and external audiences
- Supports drill down by mode, line, and route to get a snapshot of service performance.
- This dashboard does not include social media posts.
- The URL is in the Student Supplement.
![Supplement icon indicating items or information that are further explained/detailed in the Student Supplement.](mod22pptimsupplement.jpg)
Slide 73:
Data Presentation Example: busstat.nyc
![This slide is entitled, “Data Presentation Example: busstat.nyc”. There are two graphics in the left column regarding travel time. The first graphic is a plot entitled, “Cumulative Travel Time Across Stops”. The y-axis represents the minutes of travel time in 20-minute increments. The x-axis is representative of stops (in increments of 5). On this plot are the scheduled and actual data. The scheduled data is a dotted, grey line on the plot that forms a diagonal line with a slope of 2. The actual data is a solid, purple line that forms a diagonal line with a slope of 2.2, roughly following that of the scheduled data. Beneath this plot are three grey squares containing travel time data. The first reads “Excess Wait Time” in grey, followed by “2.0 mins” in orange. The second reads “Route Lateness Factor” in grey followed by “39.4%” in orange. The third and final square reads “Average Speed” in grey followed by “8.0 mph” in green.](mod22ppt53.jpg)
- busstat.nyc measures and displays performance for New York City buses.
- Project is in beta as of January 2020.
- Proposed metrics join data from multiple sources to generate performance indicators that reflect customer experience and agency progress toward meeting goals.
- Route lateness factor compares actual trip time to scheduled trip time. No social media posts were included.
- Project developed as a master's program capstone project at the NYU Center for Urban Science and Progress, sponsored by TransitCenter.
- The URL is in the Student Supplement.
![Supplement icon indicating items or information that are further explained/detailed in the Student Supplement.](mod22pptimsupplement.jpg)
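As a rough illustration of a metric that compares actual trip time to scheduled trip time, the sketch below computes excess trip time as a share of scheduled time. The formula is an assumption made for this example; the slide does not give busstat.nyc's exact definition, and the function names and trip data are hypothetical.

```python
def lateness_factor(scheduled_minutes, actual_minutes):
    """Excess trip time as a fraction of scheduled trip time (assumed definition)."""
    if scheduled_minutes <= 0:
        raise ValueError("scheduled trip time must be positive")
    return (actual_minutes - scheduled_minutes) / scheduled_minutes


def route_lateness_factor(trips):
    """Aggregate over (scheduled, actual) pairs by comparing total times."""
    scheduled_total = sum(scheduled for scheduled, _ in trips)
    actual_total = sum(actual for _, actual in trips)
    return lateness_factor(scheduled_total, actual_total)


# Hypothetical trips on one route: (scheduled minutes, actual minutes).
TRIPS = [(40, 50), (40, 44), (40, 56)]
print(f"{route_lateness_factor(TRIPS):.1%}")
```

Comparing totals rather than averaging per-trip ratios weights longer trips more heavily; either choice could be reasonable depending on what the indicator is meant to show.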
Slide 74:
Other Issues
Policy Issues
- Protecting user privacy
- Data security
- Regulatory environment and government agency policies that limit use of social media
- Understanding how well social media data represents agency customer base
- Analyzing social communications in multiple languages
Slide 75:
Other Issues
Technical Issues
- A data lake may be partitioned into "data ponds" to:
  - Limit access
  - Share data resources with another agency
  - Provide a means of data sharing between agencies.
- A regional lake may provide ponds for separate transit properties
- Open source/open data tools
  - Need to consider whether adequate technical support and security are available
- Resource requirements (e.g., skills, storage, hardware, licensing, in-house vs. contracted)
Slide 76:
![Activity Placeholder: This slide has the word “Activity” in large letters at the top of the slide, with a graphic of a hand on a computer keyboard below it.](mod22ppt55.jpg)
Slide 77:
Question
Which of the below is not a step described in Big Data processing?
Answer Choices
- Data Preparation
- Data Field Quantization
- Data Analysis
- Data Presentation
Slide 78:
Review of Answers
a) Data Preparation
Incorrect. Data preparation is the step of removing data that is incomplete, incorrect, and/or out of range from analysis.
b) Data Field Quantization
Correct! Data field quantization evaluates elements of the General Relativity Theory to prove gravity exists, and is the basis for the general rule that buses will roll instead of fly.
c) Data Analysis
Incorrect. Data analysis is the interpretation of relationships between data to gain insights about a problem or solution.
d) Data Presentation
Incorrect. Data presentation is the process of using the results of analysis to make a case or explanation about data.
Slide 79:
Learning Objective 5
- Incorporate findings to support business intelligence with data-driven decisions
Slide 80:
Overview
Working with Social Media
- Social media posts from transit customers, stakeholders, and others (inbound communications) can provide unfiltered feedback.
- Social media posts use natural language, which requires special analytical techniques to create meaningful datasets.
- Posts usually include usernames, which must be removed during analysis to protect privacy.
- Some transit agencies restrict use of social media by staff.
- Social media users may not be representative of all transit customers.
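One way to remove usernames before analysis is to drop the author field and mask any @-mentions in the post text. The sketch below assumes a simple dictionary record with hypothetical field names (`author`, `posted_at`, `text`); real platform exports will differ.

```python
import re

# An "@" followed by word characters, but not when it sits inside a word
# (so e-mail addresses such as rider@example.org are left alone).
MENTION = re.compile(r"(?<!\w)@\w+")


def redact(post):
    """Return a copy of the post without the author field and with mentions masked."""
    cleaned = {key: value for key, value in post.items() if key != "author"}
    cleaned["text"] = MENTION.sub("@user", post["text"])
    return cleaned


# Hypothetical post record; the field names are assumptions.
POST = {
    "author": "rider123",
    "posted_at": "2020-01-06T09:05:00",
    "text": "@transit_agency why is the train late? cc @friend_1",
}
print(redact(POST))
```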
Slide 81:
Overview
![Example icon. Can be real-world (case study), hypothetical, a sample of a table, etc.](mod22pptimexample.jpg)
- Chicago Transit Authority (IL)
- San Diego Metropolitan Transit System (CA)
- Transport for London (UK)
- Metro Transit (MN)
Slide 82:
Chicago Transit Authority
![This slide is entitled, “Chicago Transit Authority”. The left-hand column contains a graphic of subway destination stickers. The stickers are ordered and colored as follows (from top to bottom): Midway (in yellow, with a plane next to it), Harlem (in green), Linden (in black), and 54th / Cermak (in red).](mod22ppt58.jpg)
Measuring Customer Sentiment
- In one of the first papers on the topic, researchers analyzed tweets that mentioned the Chicago Transit Authority to better understand customer sentiment.
- Researchers assembled a dataset of Twitter posts that mentioned CTA or individual lines.
- Analysis determined that customers were more likely to express negative sentiments toward a situation than positive sentiments.
Slide 83:
Chicago Transit Authority
Negative tweets spiked at 9 AM on July 23, 2011.
![Author’s relevant description: This slide is a continuation of the last, entitled, “Chicago Transit Authority”. There are three different types of graphs displayed on this slide, composed from data extracted from negative tweets. The first graph in the top, left corner shows the total tweets and the sentiment strength by time of day for July 23, 2011. All tweets and negative sentiment spiked at 9 AM in response to service delays. The second graph in the top, right corner shows total tweets and the normalized sentiment strength by time of day for July 23, 2011. The third graph, centered beneath the first two, is a bar graph displaying the total number of tweets by time of day for July 23, 2011.](mod22ppt59.jpg)
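A spike like this can be found by scoring each post and counting negative posts per hour. The sketch below uses a tiny word-list score as a stand-in; it is not the researchers' method, and the word lists and posts are made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Tiny illustrative word lists; a real lexicon would be far larger.
NEGATIVE = {"delay", "delayed", "late", "stuck", "flooded", "worst"}
POSITIVE = {"thanks", "great", "love", "quick"}


def sentiment_score(text):
    """Positive word count minus negative word count for one post."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def negative_posts_by_hour(posts):
    """Count posts with a negative score, keyed by hour of day (0-23)."""
    counts = Counter()
    for timestamp, text in posts:
        if sentiment_score(text) < 0:
            counts[datetime.fromisoformat(timestamp).hour] += 1
    return counts


# Made-up posts for illustration only: (ISO timestamp, text).
POSTS = [
    ("2020-01-06T08:55:00", "Thanks for the quick ride"),
    ("2020-01-06T09:05:00", "Train delayed again, stuck at the platform"),
    ("2020-01-06T09:20:00", "Station flooded, worst commute"),
    ("2020-01-06T09:40:00", "Bus is late"),
    ("2020-01-06T14:10:00", "Love the new cars"),
]
hourly = negative_posts_by_hour(POSTS)
print(hourly.most_common(1))
```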
Slide 84:
Chicago Transit Authority
A tag cloud confirmed customer communication around 9 AM about delays on the Red and Blue Lines because of flooding.
![This slide is also entitled, “Chicago Transit Authority”. There is a graphic of a word cloud on the right half of this slide. The word cloud was created from analysis after a flood caused delays on the Red and Blue lines. The largest words among the cloud include red, flooded, blue, train, 103rd street, soon, running, green, even, etc.](mod22ppt60.jpg)
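The sizes of words in a tag cloud come from term frequencies. A minimal sketch of that counting step follows; the stop-word list and the sample posts are assumptions made for this example, and the rendering of the cloud itself is left out.

```python
import re
from collections import Counter

# Small assumed stop-word list; real lists are much longer.
STOPWORDS = {"the", "a", "is", "on", "at", "and", "to", "of", "are", "in"}


def term_weights(posts, top_n=5):
    """Return the top_n most frequent non-stop-word terms with their counts."""
    counts = Counter()
    for text in posts:
        words = re.findall(r"[a-z0-9]+", text.lower())
        counts.update(word for word in words if word not in STOPWORDS)
    return counts.most_common(top_n)


# Made-up posts for illustration only.
POSTS = [
    "Red line flooded at 103rd",
    "Blue line flooded too",
    "Red line trains running soon",
]
print(term_weights(POSTS, top_n=3))
```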
Slide 85:
San Diego MTS
![Please see extended text description below.](mod22ppt61.jpg)
(Extended Text Description: This slide is entitled, "San Diego MTS" with the subtitle, "Combat Fare Evasion." This slide has a background of a red San Diego MTS bus approaching a stop. There are green palm trees near a traffic signal where the bus is stopped. A pedestrian is crossing. Overlaid on top are the following bullet items:
- San Diego Metropolitan Transit System used big data to help combat fare evasion on trolleys.
- Trolleys use a barrier-free honor system to collect fares. Customers tap smartcards on fare validators on the platform.
- MTS contracted with a consultant to analyze fare payment patterns.)
Slide 86:
San Diego MTS
Combat Fare Evasion
- Analysis incorporated multiple data sources.
  - GTFS showed vehicle location.
  - Fare validators showed smartcard taps before boarding.
  - Automatic passenger counters calculated boardings per station.
- Data analysis correlated farecard taps with passenger counts and vehicle arrivals to determine locations for additional fare enforcement.
- Social media was not a data source for this analysis.
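The comparison of farecard taps with passenger counts can be sketched as a per-station gap between counted boardings and recorded taps. This is a simplified illustration, not the consultant's actual analysis: the station names and counts are hypothetical, and a gap can also reflect riders who are not required to tap, so it only suggests where to look.

```python
def untapped_share(boardings, taps):
    """Share of counted boardings with no matching smartcard tap."""
    if boardings <= 0:
        return 0.0
    return max(boardings - taps, 0) / boardings


def rank_stations(counts):
    """Rank stations by untapped share, highest first.

    counts maps station name -> (APC boardings, validator taps).
    """
    shares = {
        station: untapped_share(boardings, taps)
        for station, (boardings, taps) in counts.items()
    }
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)


# Hypothetical daily counts per station.
COUNTS = {
    "Station A": (1000, 900),
    "Station B": (400, 300),
    "Station C": (800, 800),
}
for station, share in rank_stations(COUNTS):
    print(f"{station}: {share:.0%} of boardings without a tap")
```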
Slide 87:
Transport for London
![This slide is entitled, “Transport for London”. In the left-hand column there is a photo of Big Ben in London from the ground. Also captured in this photo is a sign for a Transport for London stop, a red circular sign with a rectangular black bar through the center that reads, “Underground”.](mod22ppt62.jpg)
Optimizing Advertising
- Researchers tested a methodology for analyzing geotagged social media posts in Transport for London Underground stations to optimize advertising campaigns.
- Tweets were analyzed and categorized based on topics of interest (e.g., sports, entertainment).
- Information was intended to provide guidance for advertising campaigns at different stations.
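Categorizing posts by topic and tallying them per station can be sketched with simple keyword matching. This is only a stand-in for the researchers' methodology, which the slide does not detail; the keyword lists, station names, and posts below are all made up.

```python
import re
from collections import Counter, defaultdict

# Assumed keyword lists for two of the topics named on the slide.
TOPIC_KEYWORDS = {
    "sports": {"football", "match", "cricket", "tennis"},
    "entertainment": {"theatre", "film", "concert", "cinema"},
}


def classify(text):
    """Return the topic with the most keyword hits, or None if there are none."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    hits = {topic: len(words & keywords) for topic, keywords in TOPIC_KEYWORDS.items()}
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else None


def topics_by_station(posts):
    """Tally classified topics for each station from (station, text) pairs."""
    tally = defaultdict(Counter)
    for station, text in posts:
        topic = classify(text)
        if topic is not None:
            tally[station][topic] += 1
    return tally


# Made-up geotagged posts: (station, text).
POSTS = [
    ("Station X", "Heading to the match, big football night"),
    ("Station X", "Football fans everywhere"),
    ("Station Y", "Off to the theatre for a film premiere"),
    ("Station Y", "Running late again"),
]
for station, topics in topics_by_station(POSTS).items():
    print(station, topics.most_common(1))
```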
Slide 88:
Metro Transit
![This slide is entitled, “Metro Transit”. There is a graphic on the left-hand side of a transit bus stopped at a stop sign in the rain.](mod22ppt63.jpg)
Locating Bus Shelters
- Metro Transit in Minneapolis/St. Paul uses big data analytics.
- Strategic Initiatives Department draws on data from multiple sources to support data-driven decision making.
  - How to allocate resources for bus shelters and amenities?
  - How to improve on-time performance?
  - How to design a transit network to best meet customer needs?
Slide 89:
Metro Transit
![This slide is entitled, “Metro Transit”. There is a graphic on the left-hand side of a transit bus stopped at a stop sign in the rain.](mod22ppt64.jpg)
Locating Bus Shelters
- Data sources
  - Customer survey
  - Facilities
  - Ridership
  - Demographics
- Equity-focused measures were developed to inform decisions.
- Data sources do not include social media.
Slide 90:
![Activity Placeholder: This slide has the word “Activity” in large letters at the top of the slide, with a graphic of a hand on a computer keyboard below it.](mod22ppt65.jpg)
Slide 91:
Question
Based on these examples, analyzing social media data helped inform agency decisions about which of the following?
Answer Choices
- Where to upgrade bus shelters
- How to understand customer sentiment
- Where to add fare enforcement
- How to report non-fare revenues
Slide 92:
Review of Answers
a) Where to upgrade bus shelters
Incorrect. Agency did not consider social media posts.
b) How to understand customer sentiment
Correct! Researchers analyzed social media posts to assess CTA customer sentiment.
c) Where to add fare enforcement
Incorrect. Agency did not use social media to solve the problem.
d) How to report non-fare revenues
Incorrect. None of the examples focused on non-fare revenues. Social media is not a source of this data.
Slide 93:
Module Summary
- Learned how transit operators can use business intelligence tools to make data-driven decisions
- Saw examples of agency-generated and customer-generated social media posts
- Learned about potential sources of big data for use in transportation analysis
- Reviewed process for applying big data analytics to social media to inform transit business intelligence
- Reviewed examples of using big data to support business intelligence
Slide 94:
Thank you for completing this module.
Feedback
Please use the Feedback link below to provide us with your thoughts and comments about the value of the training.
Thank you!