Connected and multimodal passenger transport through big data analytics

Passenger transport is becoming more and more connected and multimodal. Instead of just taking a series of vehicles to complete a journey, the passenger is actually interacting with a connected cyber-physical social (CPS) transport system. In this study, we present a case study where big data from various sources is combined and analyzed to support and enhance the transport system in the Tampere region. Different types of static and real-time data sources and transportation-related APIs are investigated. The goal is to find ways in which big data and collaborative networks can be used to improve the CPS transport system itself and the passenger satisfaction related to it. The study shows that even though the exploitation of big data does not directly improve the state of the physical transport infrastructure, it helps in utilizing more of its capacity. In addition, the use of big data makes the transport system more attractive to passengers.


Introduction
Passenger transport is becoming more and more connected and multimodal. Instead of just purchasing a ticket to a single means of transportation, the future passenger will purchase a mobility service that will take him from the point of departure to the point of arrival. A multimodal transport service combines several different modes of transport, such as bus, train and a bicycle ride, under a single transport service. This multimodal and connected transport service is regarded as Mobility as a Service (MaaS) if the following requirements are met: 1) the physical transport service is augmented with other services such as trip planning, reservations and payments, 2) the services can be used through a single interface, and 3) the services are packaged into a tailored mobility package similar to a monthly mobile phone contract [1]. ABI Research [2] estimates that MaaS revenues will exceed $1 trillion by 2030, and customers' mindset is moving towards buying a service instead of owning.
Currently, during the journey, the passenger will typically use several different means of transportation and hopefully receive timely information concerning his itinerary. Ticket purchases, information seeking and sharing as well as real-time communications regarding the purchased journey will all be performed using a digital application. Thus, instead of just taking a series of vehicles to complete a journey, the passenger will actually be interacting with a connected cyber-physical [see e.g. 3] social (CPS) transport system. The system is constantly producing stream data, such as IoT data from the vehicles, mobility data of the passengers and interaction data created by passengers and transport service providers. When all this data is combined with static and standardized data such as public transport trajectory, ticket fare, service provider and geographic data, we have a big data landscape on which the connected CPS transport ecosystem can be built. The different layers of a CPS transport system are illustrated in Figure 1.
Fig. 1. The CPS transport system divided into three layers from the point of view of the passenger interacting with the system. Each layer and dimension of the CPS transport system is illustrated with an example.
The research on multimodal collaborative CPS transport systems is scant. There are existing models for dealing with some aspects of multimodal collaborative transport systems. One example is the proposal for a common architecture for payments handling in such a system [4]. However, there is no systematic study about all aspects of a multimodal collaborative CPS transport system. In this paper, we use a case study to examine the data architecture of a multimodal collaborative CPS transport system in the Tampere city region. Special attention is given to all available open data (both stream and static), to open interfaces that can be used to control the system (such as the traffic light API), and to how passengers interact with the data using digital applications. Based on this, a novel data architecture for a big data driven multimodal and connected CPS transport system is developed.
There are currently several initiatives that foster the advent of Transport 4.0. Transport 4.0 means that different and previously disconnected forms of transportation are being integrated into one single passenger interaction that combines several modalities of transport and that takes the passenger from door to door. In this paper, we adhere to the general definition of Industrie 4.0 [5] from which the concept of Transport 4.0 is derived. We also regard MaaS as a business model that implements the concept of Transport 4.0. The initiatives towards Transport 4.0 are related to legislation, to the availability and standardization of transportation-related data and application programming interfaces (APIs), or to both. One example of such an initiative at the level of the European Union is the first phase of the National Access Point (NAP) system, which will become mandatory by the end of 2019. The NAP system has to be implemented in every member state. It contains the essential information about every national mobility service provider, with more detailed information coming in subsequent phases scheduled for December 2020 and 2021 [6].
Currently there are multiple sources of transport-related open data. The most common are the SIRI and GTFS APIs that are used in public transport. SIRI is standardized in Europe and is used in several cities and regions to offer real-time location data on public transport [7]. GTFS is a globally used specification for publishing bus routes and timetables for journey planning [8]. It was originally created for Google Maps, but nowadays the majority of large public transport operators in Europe provide GTFS-formatted timetables. GTFS has also been extended to support real-time fleet updates [9].
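To make the role of GTFS more concrete, the following minimal Python sketch reads two of the comma-separated text tables that a GTFS feed contains, stops.txt and stop_times.txt, and lists the scheduled departures per stop. The column names follow the GTFS reference; the stop names, identifiers and times are invented sample data and are not taken from the Tampere feed. The sketch also assumes zero-padded HH:MM:SS times, so that sorting the strings gives chronological order.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows; a real feed ships these tables as files in a zip archive.
STOPS_TXT = """stop_id,stop_name,stop_lat,stop_lon
S1,Example Square,61.4980,23.7610
S2,Example Station,61.4985,23.7730
"""

STOP_TIMES_TXT = """trip_id,arrival_time,departure_time,stop_id,stop_sequence
T1,08:00:00,08:00:00,S1,1
T1,08:05:00,08:06:00,S2,2
T2,08:30:00,08:30:00,S1,1
T2,08:35:00,08:36:00,S2,2
"""


def read_table(text):
    """Parse one GTFS text table into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def departures_by_stop(stop_times):
    """Group departure times by stop_id and sort them chronologically."""
    grouped = defaultdict(list)
    for row in stop_times:
        grouped[row["stop_id"]].append(row["departure_time"])
    return {stop: sorted(times) for stop, times in grouped.items()}


stops = {row["stop_id"]: row["stop_name"] for row in read_table(STOPS_TXT)}
departures = departures_by_stop(read_table(STOP_TIMES_TXT))

for stop_id, times in departures.items():
    print(stops[stop_id], times)
```

A journey planner would join these tables with trips.txt and calendar.txt as well; the sketch only illustrates that the timetable data is plain, well-structured text that any application can consume.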
In this study, we present how big data and analytics can produce value for transport organizations, cities and end users. In addition, a data architecture derived from the case study is presented. The goal of the study was twofold. Firstly, it aimed to find ways in which big data and different interfaces can be used to improve the multimodal and collaborative CPS transport system itself and the passenger satisfaction. Secondly, the study aimed to illustrate the data architecture of such a system. The study contributes to the big data, Transport 4.0 and CPS transport systems debate where various types of data are collected from several actors participating in the collaborative network. The paper is organized as follows: First, we review the fundamental concepts behind the study, namely: 1) Big data analytics for decision makers, 2) Big data in MaaS and 3) Collaborative networks in passenger transport. In Section 3, we present both the research methodology used and the case of the Tampere city region in Finland. Sections 4 and 5 present the system architecture and results through example cases. Section 6 includes a discussion on the significance and contribution of the results.

Big Data and Connected Multimodal Passenger Transport Systems
The expected impacts of big data analytics in passenger transportation are related both to the assistance of decision makers and to the development of new data-driven services for passengers. Big data also serves as a technological enabler for new business models and for a new platform economy.

Big Data Analytics for Decision Makers
Decision makers at different levels of public administration as well as in the wider passenger transportation and travel industry benefit from convenient tools for strategic planning, such as feasibility studies and environmental, competitor, and trend analyses. Analyzed big data provides the means to follow and understand consumer and traveler behavior better and, in some cases, to upgrade customer service through optimization. The taxi business provides a good example of this. Several studies show how passengers' travel trajectories and city structure have been mined from taxi ride data [10; 11]. This information may be used to understand passenger behavior and in transport planning. In addition to this, the taxi business may also use taxi ride data for optimizing its own operations [12; 13]. Typical optimization cases in the taxi business include optimizing the location of idle taxis at a given point in time and optimizing the trajectory of a taxi that has a passenger at a given moment. This contributes to the goal of more sustainable cities, as the existing resources (taxis and routes) can be exploited in an optimal manner and the need for new resources may be decreased.
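The idle-taxi case can be illustrated with a deliberately simple sketch: historical pickups are counted per grid cell and hour, and the busiest cell for a given hour is suggested as a waiting location. This is only an assumed baseline, not the method used in the cited studies, which rely on considerably more elaborate models. The grid size, the function names and the pickup records are all hypothetical.

```python
import math
from collections import Counter

CELL_SIZE_DEG = 0.01  # assumed grid resolution in degrees

# Invented pickup records: (hour of day, latitude, longitude).
PICKUPS = [
    (8, 61.4981, 23.7612),
    (8, 61.4984, 23.7618),
    (8, 61.5032, 23.7751),
    (17, 61.5035, 23.7755),
    (17, 61.5031, 23.7758),
    (17, 61.4982, 23.7615),
]


def to_cell(lat, lon, size=CELL_SIZE_DEG):
    """Map a coordinate to an integer grid cell index."""
    return (math.floor(lat / size), math.floor(lon / size))


def busiest_cell(pickups, hour):
    """Return the grid cell with the most pickups in the given hour, or None."""
    counts = Counter(to_cell(lat, lon) for h, lat, lon in pickups if h == hour)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


print(busiest_cell(PICKUPS, 8))
print(busiest_cell(PICKUPS, 17))
```

In this toy data the suggested cell changes between the morning and the afternoon, which is the kind of pattern that lets idle vehicles be positioned where demand is expected.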
In addition to being able to optimize the usage of resources, it is also very important to react quickly to changes in traffic flows. Examples of events that produce such changes are mass events, a sudden malfunction in some part of the transport system and changes in weather. To be able to react to these events in a timely manner, real-time or near real-time information as well as information about the future, such as upcoming events and weather forecasts, is needed. This kind of data is readily available, e.g. from devices that are located in vehicles, but also from passengers' devices and cards in the form of mobile phone location data and smart card data. Chen et al. [14] have performed human mobility pattern mining on mobile phone data, and Tao et al. [15] have used smart card data in passenger transport related case studies.
In addition to the above-mentioned proprietary data sources, freely available open data sources and open interfaces for retrieving the data are a very important source of transportation-related data. Open datasets have been used, e.g., in data-driven customer demand prediction and dynamic pricing [16]. Pereira et al. [17] use freely available event data from the Internet to predict public transport rides in the City of Singapore.
Big data analytics supports transport companies as well as public communal, local district and governmental partners and planning associations in their strategic decisions. It also complements the decision process with data for potential assessment, different studies, and analyses. Big data analytics contributes to the itinerary and route planning of transportation companies (e.g. coach/bus and car rental companies), and probably also to that of tour operators, incoming agencies and travel management departments in private companies and public institutions, including fleet management.
Big data analytics tools can, for example, show mobility centers how they can include the data in their day-to-day work. Big data analytics also makes it possible to improve the customer service of travel intermediaries and travel service suppliers by allowing them to react faster, e.g. to strikes and/or unpredictable traffic disruptions.
In order to fully utilize big data analytics in passenger transportation, EU-wide guidelines and recommendations regarding the integration of new data sources as well as novel data formats are needed. Furthermore, existing privacy and legislation frameworks need to be taken into account. For example, in Europe, the GDPR (General Data Protection Regulation) [18] has to be considered when dealing with data that may raise privacy issues. Only data that has been properly anonymized is free from regulations concerning privacy. In turn, big data analytics provides an infrastructure and analytical functionality enabling the prediction of mobility and congestion hotspots.

Big Data in Multimodal Transport Services
Multimodal transport services mean that the passenger only purchases one ticket from the point of departure A to the point of arrival B. The journey from A to B typically involves several modalities of transport, such as airplane, ship, train, tram, bus, car sharing, taxi and/or bicycle. The transportation service providers that offer different modalities of transport are an example of a collaborative network. A collaborative network (CN) is a network of largely autonomous, geographically distributed, and heterogeneous organizations that collaborate to better achieve common goals, and whose interactions are supported by a computer network [19]. The modalities may change dynamically during the journey due to traffic incidents, weather or traffic conditions, among others. An important part of the multimodal transport service is the application that is used to communicate between the collaborative network of service providers and the passenger.
The MaaS business model is an example of an implementation of a multimodal transportation service where the digital applications related to it are an important part of the service. These digital applications are typically related to route planning, payments and communicating timely information from the transport provider to the passenger.
The tools used in big data analytics also enhance opportunities for start-ups and other companies to innovate new mobility services and enable new business models, such as new platforms, interfaces and APIs. All this is possible given that the MaaS companies adhere to the local privacy regulations. One good solution for sharing mobility data containing personal identifiers between organizations is to anonymize it. It is important to choose the correct level of anonymization so that the data still has value.
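The trade-off between privacy and data value can be sketched in a few lines of Python. In the sketch below, the passenger identifier is replaced by a keyed hash, the coordinates are rounded to a coarser grid and the timestamp is truncated to the hour. Note that keyed hashing is pseudonymization rather than full anonymization, so on its own it would not be enough to take the data outside the scope of privacy regulation. The record fields, the key and the chosen precision are assumptions made for illustration only.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-secret-key"  # hypothetical key, kept by the data owner


def pseudonymize_id(passenger_id, key=SECRET_KEY):
    """Replace an identifier with a keyed hash so it cannot be read directly."""
    digest = hmac.new(key, passenger_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def coarsen(record, decimals=2):
    """Reduce spatial and temporal precision; fewer decimals mean more privacy
    but less analytical value."""
    return {
        "passenger": pseudonymize_id(record["passenger_id"]),
        "lat": round(record["lat"], decimals),
        "lon": round(record["lon"], decimals),
        "hour": record["timestamp"][:13],  # keep "YYYY-MM-DDTHH" only
    }


sample = {
    "passenger_id": "card-123",
    "lat": 61.49812,
    "lon": 23.76094,
    "timestamp": "2019-05-02T08:17:45",
}
print(coarsen(sample))
```

Lowering `decimals` makes individual trips harder to single out but also blurs the mobility patterns, which is the balance the paragraph above refers to.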
Big data analytics could be integrated into existing mobility services like Moovel, Switch and Whim. However, completely new mobile service possibilities will also appear as analyzed data becomes available. Big data analytics also contributes to the emerging MaaS business model that many of the new start-ups have adopted.

Collaborative Networks in Passenger Transport
The collaborative network of multimodal transport service providers presented in the previous section is just one example of the existing and future networks in passenger transport. There are several other examples of collaborative networks in transport. Most of them involve organizations, but some also involve passengers in an active role. Crowdsourcing and the sharing economy are typical examples of collaborative networks in transport. The use of open data also calls for collaborative networks because the producers and users of open data are seldom those who build the applications. Crowdsourcing and social transportation offer new possibilities for passengers to organize their journeys [20]. Crowdsourcing may also be used in cases of traffic disruption where the official travel service providers fail to deliver accurate and on-time information. Crowdsourcing is a sourcing model in which individuals or organizations obtain goods and services, including ideas, from a large, relatively open and often rapidly evolving group of internet users. This type of group can be called a collaborative network. Crowdsourcing divides work between participants to achieve a cumulative result [e.g. 21; 22]. The sharing economy, also called the collaborative economy, is a mode of consumption where goods and services are not owned by a single user, but temporarily accessed by members of a collaborative network with or without charge. The sharing economy can refer to social peer-to-peer processes that include sharing of access to goods and services [23], or to any rental transaction facilitated by a two-sided market, sometimes also including business to consumer (B2C) [24].

Method and Case Description
In this study, we first review the relevant concepts in the conceptual part of the paper. Secondly, we picked a case region to support our findings on what kind of open data platforms and APIs there are and what is already being implemented on top of the APIs to make the stakeholders benefit from the data. Since the value of the data for the citizens using the services cannot be presented using a numerical method, we use a set of case examples to illustrate how the solutions built upon the data generate benefits to different stakeholders. Both pilots and known products are considered. The region chosen is the Tampere Region, which is the second largest metropolitan area in Finland with about 385 000 inhabitants. The city center of Tampere is located between two lakes, which limits the connection possibilities through the city. Thus, it is important to develop a system that exploits the currently existing physical infrastructure and uses it as efficiently as possible.
On the national level, Finland moved towards new legislation about mobility services, which aimed to unify all transport regulations into one code. A large part of the changes in the new act aimed to achieve cost savings in transportation through information system development and new open data interfaces. One of the main targets was to open all the essential data of transportation services to the public and to define provisions for more unified ticket and payment systems, which would allow better combination of different service providers and forms of transportation [25]. Finland as a country has also been an early adopter of new data interfaces, such as Digiroad and Digitraffic [26; 27]. Therefore, many of Finland's largest cities have APIs and sources freely available that are similar to those in Tampere. To simplify the results, the paper uses one case area, Tampere, which is familiar to the authors and is one of the cities with the most data and APIs available. The city also has a program to create new smart city solutions in cooperation with local research facilities and businesses.
On top of the national data, the City of Tampere also has multiple data sources that support mobility development. Some of the APIs are only in use in the City of Tampere, but some also cover the whole region. In addition to the already mentioned GTFS and SIRI APIs, the available APIs cover, for example, traffic lights, route planning (for different modes), parking, and incidents and roadworks [28]. A large part of the APIs are freely available, though some special information is available only upon agreement. For example, the open parking API gives basic information, but fares and exact capacities are in a closed API [29]. Different sources of both national and regional data can be seen in Figure 2. The figure also shows examples of the amount of regional data provided [29; 30; 31; 32; 33]. National data sources include, for example, different official platforms, like Digiroad, Digitransit and Digitraffic. Even though shown separately in Figure 2, most of the data sources are linked together. For example, Digiroad and national maps provide the basis for local mapping solutions, and Digitransit combines regional GTFS data to build national data [34]. It should also be noted that these services themselves rely on different global data sources; for example, Digitransit uses a mapping solution based on OpenStreetMap data [34].

Fig. 2. Examples of different data sources: Case Tampere Region
The City of Tampere also focuses on collaboration in the mobility area to create new solutions and pilots. ITS Factory is a network that consists of the city and local authorities, local businesses and research facilities. The main concept of the network is to connect different stakeholders. The city provides different test areas and opens new mobility data sources to allow companies to pilot and develop new services and products. Different research facilities are closely connected to the process as a part of the development and pilot phases. The City of Tampere also hosts test sites for piloting, for example, 5G networking, autonomous cars in everyday traffic and indoor positioning.

Results
Based on the case study of the Tampere region mobility services and on the different information sources in use, the authors propose a novel data architecture for digital services that enable MaaS. This architecture is depicted in Figure 3. As can be observed from Figure 3, there is a collaborative network of organizations and end users that all contribute to creating the big data that is the fuel of the MaaS solutions. In the bottom layer of the picture, we can observe that there is IoT data generated from the vehicles. This may be real-time location data as well as real-time occupancy rate data. Special sensors may also measure the availability of wheelchair, children's buggy and luggage slots, for example. The middle layer of the image consists of the big data collected by different organizations of the collaborative network. The leftmost data source contains open data such as the data available through the NAP interface or SIRI and GTFS data regarding timetables and trajectories. The transport and infrastructure providers create this data, and it is reliable and well structured.
The next data source is created by the transport service or infrastructure providers or obtained from their sensors. Typical transport infrastructure providers are route keepers and parking facility businesses. In addition to the data collected from the first layer, this layer contains mobility data from the passengers and business data.
The third data source contains data from digital services for passenger transport. In addition to business data, this data contains significant quantities of usage data from passengers. It includes, for example, all saved trajectories, paid trips and performed searches for each user of the service.
The last data source contains crowdsourced data, which is created by passengers and may sometimes be unreliable and unstructured. It contains data such as ratings of service providers and services as well as real-time information about traffic disruptions and about shared transportation resources such as ride sharing.
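The four data sources of the middle layer can be summarized as a small data model. The sketch below is only a reading aid under stated assumptions: the class name, the field names and the short source labels are hypothetical shorthand, and the reliability flag is set only where the text explicitly characterizes a source (open data as reliable and well structured, crowdsourced data as sometimes unreliable and unstructured).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DataSource:
    name: str
    produced_by: str
    examples: Tuple[str, ...]
    # True or False where the text characterizes reliability, None where it does not.
    described_as_reliable: Optional[bool]


MIDDLE_LAYER = (
    DataSource("open data", "transport and infrastructure providers",
               ("NAP interface", "SIRI", "GTFS timetables and trajectories"), True),
    DataSource("provider data", "transport service and infrastructure providers",
               ("sensor data", "passenger mobility data", "business data"), None),
    DataSource("digital service data", "digital services for passenger transport",
               ("saved trajectories", "paid trips", "performed searches"), None),
    DataSource("crowdsourced data", "passengers",
               ("service ratings", "disruption reports", "ride sharing offers"), False),
)


def needs_validation(sources):
    """Select the sources that the text flags as potentially unreliable."""
    return [s for s in sources if s.described_as_reliable is False]


for source in MIDDLE_LAYER:
    print(f"{source.name}: {', '.join(source.examples)}")
print([s.name for s in needs_validation(MIDDLE_LAYER)])
```

A digital service that merges these sources would typically treat the flagged source differently, for instance by cross-checking crowdsourced disruption reports against provider data before showing them to passengers.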
In our case area, the Tampere region, multiple different APIs serve as the backbone for new services that create value through data; these include, for example, the real-time location of city buses and congestion data on the road network obtained through the city's traffic lights. Nowadays, support for different public transport and routing APIs can be presumed. It should still be remembered that opening the data and making it accessible is just a start. The key to creating value from the data is to build solutions and applications that the end user uses and benefits from. In this study, we give examples of the kinds of benefits that are obtained from APIs and big data in our case region and of how they affect users, organizations or the city.
For the user, value can mean many things. Users nowadays take it for granted that different apps can show possible routes around the city or that public transport timetables can be found digitally, so they do not consider these features a benefit. The same applies to accident reports and basic traffic flow information, which are provided in many areas around the world. In our case area, traffic flow can be better estimated through traffic lights that share traffic volume and wait time information through the API [35]. The same API can also be used to check the status of a specific light (green/amber/red). The status itself does not necessarily give any value to the user, but there are different pilots that use the connected traffic lights. For example, the mobile app GLOSA aims to read the status of the next set of traffic lights and then show the car driver the desired approach speed so that no stop at a red light is necessary [36]. Currently, the pilot runs on 16 of the city's 173 traffic lights [37]. For traffic light junctions with a pedestrian and cyclist crossing button, CrossCycle is a mobile app that tracks the route of the cyclist and reserves the green light in advance, turning it green when the cyclist arrives. The app is supported in 36 light-controlled junctions around the main cycle paths heading to the city center [38]. Added value for the user can also be seen in arrival guidance to parking areas, which can be implemented in current navigation applications. Since the status and current capacity of the area can be queried beforehand, the user can be automatically routed to a location that will have free spaces [29] and can also be given extra information about parking fares and different services, like EV charging.
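The idea behind a green light speed advisory can be sketched as a small calculation. Given the distance to the stop line and the time window during which the light is expected to be green, the function below returns a speed within assumed limits that lets the vehicle arrive during green, or nothing if no such speed exists. This is a simplified illustration and not the algorithm of the GLOSA pilot; the function name, the parameters and the default speed limits are hypothetical.

```python
def advisory_speed_kmh(distance_m, green_start_s, green_end_s,
                       min_kmh=20.0, max_kmh=50.0):
    """Return a speed in km/h that reaches the stop line during green, else None.

    green_start_s and green_end_s are seconds from now; use 0 for green_start_s
    if the light is already green.
    """
    if distance_m <= 0 or green_end_s <= 0 or green_end_s < green_start_s:
        return None
    earliest_arrival = distance_m / (max_kmh / 3.6)  # driving at the speed limit
    latest_arrival = distance_m / (min_kmh / 3.6)    # driving at the slowest advised speed
    # Intersect the reachable arrival times with the green window.
    arrive_from = max(earliest_arrival, green_start_s)
    arrive_to = min(latest_arrival, green_end_s)
    if arrive_from > arrive_to:
        return None  # green cannot be reached within the speed limits
    # Advise the fastest speed that still arrives on green.
    return round(distance_m / arrive_from * 3.6, 1)


print(advisory_speed_kmh(200, 20, 40))  # light turns green in 20 s
print(advisory_speed_kmh(200, 0, 10))   # green ends before the car can arrive
```

When the function returns nothing, a real application would fall back to normal driving and a stop at red; the value for the driver arises only in the cases where a moderate speed change avoids the stop.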
For different infrastructure users, the value can be defined more easily. For transport providers, the better the information about delays and accidents, the easier it is to minimize delays of their own fleet and to calculate new routes when needed. For example, the public transport operator can get priority at the connected traffic lights through the public transport information systems, which can be used to improve punctuality and shorten journey times [39]. In a study conducted in Helsinki, it was noted that there are no uniform results on the effects of these benefits. The study ran a pilot by removing priority from public transport for a day. As a result, punctuality dropped on average by 3%, but the results were largely route dependent. It was also noted that the main effect of these benefits is to improve punctuality and not to shorten the actual journey time [40]. In Tampere, 118 traffic light junctions support public transport priority, which practically covers the areas where public transport is widely used [41]. Through a separate system, emergency vehicles can also be given priority at junctions to make the traffic flow in the desired direction [42].
The city can also benefit from the different services built on top of data. Different routing solutions that take congestion into account can distribute the traffic flow over different paths, which virtually generates more capacity, since the infrastructure is better used. Different applications that create value for the user can also collect information for the city, which learns about the routes and choices that users normally take.
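How congestion data can spread traffic over different paths is shown in the following sketch, which runs Dijkstra's algorithm on a small road graph where each edge's free-flow travel time is multiplied by a congestion factor. The graph, the node names and the factors are invented; in practice the factors would be derived from sources such as the traffic light data described earlier.

```python
import heapq

# Free-flow travel times in minutes between hypothetical junctions.
ROADS = {
    "A": {"B": 5.0, "C": 7.0},
    "B": {"D": 5.0},
    "C": {"D": 6.0},
    "D": {},
}


def fastest_route(graph, congestion, start, goal):
    """Dijkstra's algorithm on travel times scaled by per-edge congestion factors.

    congestion maps (from_node, to_node) to a multiplier; missing edges use 1.0.
    Returns (total_minutes, path) or None if the goal is unreachable.
    """
    queue = [(0.0, start, [start])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for neighbour, free_flow in graph.get(node, {}).items():
            factor = congestion.get((node, neighbour), 1.0)
            heapq.heappush(queue, (cost + free_flow * factor, neighbour, path + [neighbour]))
    return None


print(fastest_route(ROADS, {}, "A", "D"))                 # free-flow conditions
print(fastest_route(ROADS, {("B", "D"): 2.0}, "A", "D"))  # congestion on B to D
```

With free-flow conditions the route via B is fastest, while congestion on the B to D link moves the recommendation to the route via C, so part of the traffic is shifted to an otherwise underused path.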
The different solutions built on mobility data and APIs can also help to reach different sustainability targets. Different routing applications, which all rely on public APIs to gather information, can be used to promote new modes of transport and make them more attractive to the user by, for example, showing different recommendations for walking (interesting and walkable) and cycling (avoiding large ascents) directions. This can promote a modal shift towards sustainable modes of transport. Big data and analytics make it possible to route passengers and transport more efficiently and thus lower congestion and emissions in cities. Better traffic management will allow greater capacity for passengers and goods, allowing the present infrastructure to be used more efficiently and therefore resources to be used more effectively.

Conclusion
Big data analytics offers new opportunities for both decision makers and new businesses as well as for passengers. Many of the new opportunities also contribute to sustainable development, from an environmental or from a social perspective. Open data and APIs have a key role since they allow different services, and in this case transport modes, to be connected seamlessly into one easy-to-use solution for the passenger, who does not have to be aware of what is happening in the background. The system would also work in case of different incidents and congestion situations, since all this data would be available for the CPS transport platform that handles the journeys of users.
The study showed that even though the exploitation of big data does not directly improve the state of the physical transport infrastructure itself, it helps in utilizing more of its capacity. In the Tampere region case analyzed in this study, this means, for example, that the existing road network can be used more efficiently, as transport service providers and individual drivers can plan their itineraries to avoid traffic congestion thanks to the mobility data available from the road network. Another result of the study was that the use of big data makes the CPS transport system more attractive to passengers. An example of this is that displaying the real-time locations of the fleet to the passengers typically improves passenger satisfaction with the system. This applies especially in cases where there are delays in the public transportation timetables.
In the future, new sensors and IoT could allow entirely new ways to collect and provide data to users. For example, IoT sensors would make it possible to track the current occupancy rate of different public transport services, and all the data could be integrated into new services. Data from different transport modalities is already available. However, the collaborative network is the core of a functioning CPS transport system. Collaboration within the network, such as sharing and sourcing data and services, is what completes the connected multimodal passenger transport system.
Interesting further research questions could relate to new business model possibilities in collaborative networks and transport systems. Another line of future research could relate to the usability of the CPS transport system by different passenger groups, since sustainable mobility is not just about improving transport infrastructure and services, but also about overcoming social, economic, political and physical challenges. It would be important to study which passenger segments have a risk of being excluded from new digital mobility services. The CPS transport system requires different skills and attitudes from the passengers than a very traditional transport system that only has a physical dimension. Even though big data, open APIs and collaborative networks do provide significant improvements to the CPS transport system, the most important factor remains the usability of the system. Even the most intelligent system provides only very little value if it is not being used by the passengers.

Fig. 3. The data architecture for a MaaS service. The journey of the passenger starts with a bus and ends with a bicycle ride. The bottom layer illustrates the physical dimension of the system. It could consist of any combination and number of transport modes operated by different service providers as well as of physical infrastructure that is connected to the network. In this figure, the traffic lights are connected to the data repository as well as to the digital services.