Public Transport Safety Using K-Means To Profile Drivers
● About 1.3 million people die in road accidents globally every year.
● The death toll is high in developing countries, and road accidents are the leading cause of death for people aged 15 to 29 years.
● In Kenya, 3,000 deaths occur every year and about 40% of the victims are pedestrians.
● These road crashes cost Kenya 300 billion shillings annually, which is 5.6% of the GDP.
● With the increase in population, road fatalities are predicted to rise.
● The solution aims to use the K-means algorithm to profile public transport drivers.
Project status: Under Development
Internet of Things, Artificial Intelligence
Intel Technologies: Other
Overview / Usage
ABSTRACT
The project improves public transportation safety by profiling drivers according to their driving styles. Mobility in Kenya keeps increasing at an alarming rate. This is evident in the public transport industry, where the number of Public Service Vehicles (PSVs), popularly known as matatus, keeps growing. A report from the National Transport and Safety Authority states that the number of PSVs rose from 17,600 in 1990 to 40,000 in 2003 and now exceeds 100,000. It also states that 3,000 people die every year in road accidents, at a cost of 300 billion Kenya Shillings to the Gross Domestic Product (GDP). Bad driving habits contribute to the high number of deaths in road crashes.
Classifying drivers according to their driving style singles out bad drivers, and when the necessary actions are taken this can lead to improved public transport safety. The web-based driver profiling application uses data collected from matatus by sensors, together with observations from data collectors who provide human validation for the sensor data. The sensor data is analyzed using a machine learning algorithm called K-means, which is suitable for unlabeled data. The K-means algorithm groups the data into clusters, and the clusters are then labeled.
A matatu driver registers in the system using a USSD application, and their details are linked to a specific vehicle. Insurance companies, SACCOs and matatu owners can then access the driver profiles through the web-based application. In addition, they get access to the vehicle reports, including the human validation data.
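The registration step could look roughly like the sketch below. It assumes the Africa's Talking USSD convention in which the gateway sends everything typed so far in the session as one `text` string joined by `*`, and the reply starts with `CON` to continue or `END` to finish. The function name `handle_ussd`, the three prompts and the in-memory `DRIVERS` store are hypothetical; the project's actual menu and storage are not described here.

```python
# Hypothetical in-memory store; a real system would persist to a database.
DRIVERS = {}


def handle_ussd(session_id, phone_number, text):
    """Return the USSD reply for one gateway callback.

    `text` holds every input typed so far in the session, joined by '*'
    (it is empty on the first request). A reply starting with 'CON' keeps
    the session open and 'END' closes it. `session_id` is accepted only
    to mirror the callback fields; this sketch does not use it.
    """
    steps = text.split("*") if text else []

    if len(steps) == 0:
        return "CON Welcome. Enter your full name:"
    if len(steps) == 1:
        return "CON Enter your driving licence number:"
    if len(steps) == 2:
        return "CON Enter the vehicle registration number:"

    name, licence, vehicle = (s.strip() for s in steps[:3])
    if not (name and licence and vehicle):
        return "END Registration failed: all fields are required."

    # Link the driver's details to a specific vehicle.
    DRIVERS[phone_number] = {
        "name": name,
        "licence": licence,
        "vehicle": vehicle.upper(),
    }
    return f"END Thank you {name}. You are linked to vehicle {vehicle.upper()}."
```

In a Django view, this function would be called with the posted session, phone number and text fields, and its return value sent back as a plain-text response.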
The project was presented during the Miss Geek Africa competition and awarded first place for the impact it could have in promoting road safety and creating smart cities. Insurance companies can use the system in setting premiums for the drivers based on their risk level on the road. The National Transport and Safety Authority has recently launched a smart license that displays the details of the driver which could work with the driver profiling system to provide additional information.
The driver profiling system benefits many stakeholders in the transport system, as described above. It makes it possible to monitor each driver individually and therefore improve road safety.
Methodology / Approach
1.0 Proposed Solution
Profiling drivers using machine learning makes it possible to tell whether a driver is prone to the human errors that cause road accidents.
When a driver is profiled and determined to be high risk, the traffic police can be alerted about that driver. This makes it easier for the traffic police to manage road safety remotely.
Insurance companies can also use this information to create a driver risk profile based on the information provided about the drivers.
This solution is not as popular in Kenya as it is in developed countries. It works well in those countries partly because drivers there follow traffic rules strictly, unlike in Kenya. It can, however, contribute to road traffic safety if insurance companies reward drivers who have a good profile.
The context of implementing this solution in a developing country such as Kenya is entirely different from that of developed countries. This difference may be a challenge in implementing the solution, but with the support of policymakers, insurance companies and the traffic police the proposed system will thrive.
1.1 Background of the data
The sensor and human validation data used in the project was provided by the University of Nairobi in conjunction with Berkeley University, from a project the two carried out together. The data was a representative sample of various routes within Nairobi. There were two types of data: sensor data collected through sensors fitted in the public vehicles, and human validation data collected by data collectors who boarded the public vehicles. Both were in Comma-Separated Values (CSV) format. The sensor data did not have labels, which led to the use of an unsupervised machine learning algorithm. The most suitable algorithm was chosen after evaluating candidates against the sensor data.
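As a rough illustration of how the two CSV sources might be brought together, the sketch below reads both and attaches the collector's note to the matching sensor reading. The column names (`vehicle_id`, `timestamp`, `speed_kmh`, `accel_ms2`, `observation`) and the sample rows are made up for the example; the real schema is not given here.

```python
import csv
import io

# Made-up sample rows; the real column names and values are not known.
SENSOR_CSV = """vehicle_id,timestamp,speed_kmh,accel_ms2
KAA001,2017-05-01T08:00:00,42.0,0.8
KAA001,2017-05-01T08:00:05,55.5,2.9
KBB002,2017-05-01T08:00:00,30.0,0.3
"""

VALIDATION_CSV = """vehicle_id,timestamp,observation
KAA001,2017-05-01T08:00:05,harsh acceleration
"""


def read_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def attach_validation(sensor_rows, validation_rows):
    """Pair each sensor reading with the data collector's note, if any."""
    notes = {
        (r["vehicle_id"], r["timestamp"]): r["observation"]
        for r in validation_rows
    }
    merged = []
    for row in sensor_rows:
        key = (row["vehicle_id"], row["timestamp"])
        merged.append({
            "vehicle_id": row["vehicle_id"],
            "timestamp": row["timestamp"],
            "speed_kmh": float(row["speed_kmh"]),
            "accel_ms2": float(row["accel_ms2"]),
            "observation": notes.get(key),  # None when no note exists
        })
    return merged


merged = attach_validation(read_rows(SENSOR_CSV), read_rows(VALIDATION_CSV))
```

Matching on an exact timestamp is a simplification; real sensor and human records would probably need to be aligned within a time window.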
1.2 Machine Learning
Machine learning is an area of Artificial Intelligence that enables computers to learn from data without being explicitly programmed. When exposed to new data, the computer can learn, grow and adapt by itself (Junior, J.F., Carvalho, E., Ferreira, B.V., de Souza, C., Suhara, Y., Pentland, A. and Pessin, G., 2017).
Machine learning was the preferred method for developing the system because it enables the system to learn from the data and predict results. In the initial stages, the data will be split for training, validating and testing the model.
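One simple way to divide the data into training, validation and test portions is sketched below. The 70/15/15 proportions, the fixed seed and the function name `split_dataset` are assumptions for illustration, not values taken from the project.

```python
import random


def split_dataset(rows, train=0.7, validation=0.15, seed=42):
    """Shuffle rows and cut them into (train, validation, test) lists."""
    if not 0 < train < 1 or not 0 <= validation < 1 or train + validation >= 1:
        raise ValueError("fractions must leave room for a test set")
    shuffled = list(rows)  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )
```

For an unsupervised method, the held-out portions would mainly serve to check that the clusters found on the training portion also describe unseen trips.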
1.3 K-Means
K-means is one of the simplest unsupervised learning algorithms for solving clustering problems, and it works well with large datasets. The data provided for the project had no labels, and during analysis it showed a natural grouping into several clusters. This led to the selection of K-means, a machine learning algorithm well suited to data that forms clusters.
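To make the idea concrete, here is a minimal pure-Python sketch of the standard K-means iteration (assign each point to its nearest centroid, then move each centroid to the mean of its members). It is not the project's implementation, which is not shown here, and the two features in the sample data (average speed and harsh-braking events per trip) are hypothetical.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster `points` (tuples of numbers) into k groups.

    Returns (centroids, labels); labels[i] is the cluster of points[i].
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial seeds: k of the input points
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return centroids, labels


# Hypothetical trips: (average speed in km/h, harsh-braking events).
trips = [(40.0, 1.0), (42.0, 0.0), (38.0, 1.0),
         (75.0, 6.0), (80.0, 7.0), (78.0, 5.0)]
centroids, labels = kmeans(trips, k=2)
```

In practice a library implementation would normally be used instead of hand-written code; the sketch only shows what the algorithm does with unlabeled data.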
1.3.1 Benefits of K-Means
i. It is one of the simplest clustering algorithms to understand and implement.
ii. It works well with large datasets.
iii. It does not require labeled data.
1.3.2 Disadvantages of K-Means
i. Strong sensitivity to outliers and noise.
ii. Does not work well with non-circular cluster shapes.
iii. The number of clusters and the initial seed values need to be specified beforehand, and the fixed number of clusters makes it difficult to predict what K should be.
iv. Low capability to escape a local optimum.
v. Difficulty in comparing the quality of the clusters produced, since different initial partitions or values of K affect the outcome.
vi. Different initial partitions can result in different final clusters. It is helpful to rerun the program using the same as well as different K values, to compare the results achieved.
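Rerunning with different initial partitions and different K values can be sketched as follows. The comparison uses the within-cluster sum of squared distances (often called inertia), which is a common yardstick but an assumption here, since the project's own quality measure is not stated. The compact `kmeans` helper, the function names and the sample points are all illustrative.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, seed, max_iter=100):
    """Compact K-means iteration; returns (centroids, labels)."""
    centroids = random.Random(seed).sample(points, k)
    labels = None
    for _ in range(max_iter):
        new = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
               for p in points]
        if new == labels:
            break
        labels = new
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(d) / len(members)
                                     for d in zip(*members))
    return centroids, labels


def inertia(points, centroids, labels):
    """Within-cluster sum of squared distances; lower means tighter clusters."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))


def best_inertia(points, k, n_restarts=10):
    """Rerun with different initial seeds and keep the lowest inertia."""
    return min(
        inertia(points, *kmeans(points, k, seed))
        for seed in range(n_restarts)
    )


def inertia_by_k(points, k_values, n_restarts=10):
    """Best inertia found for each candidate number of clusters."""
    return {k: best_inertia(points, k, n_restarts) for k in k_values}


# Made-up points forming three loose groups.
points = [(1, 1), (1, 2), (2, 1),
          (10, 10), (10, 11), (11, 10),
          (20, 1), (20, 2), (21, 1)]
scores = inertia_by_k(points, [1, 2, 3, 4])
```

Inertia generally shrinks as K grows, so one heuristic is to pick the K after which the improvement levels off, rather than the K with the smallest value.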
1.4 Challenges of a Driver Profiling System in Kenya
Some drivers are sceptical about providing data that is eventually used to profile them. In addition, corruption involving offenders and the traffic police will make it harder to punish traffic offenders. Lastly, some PSV drivers may disconnect the device that transmits the data, which has already happened during the current data collection.
1.5 How Intel Technology Can Be Used For The Solution
The Intel® Data Analytics Acceleration Library provides a K-Means clustering algorithm, documented in the Developer Guide for Intel® Data Analytics Acceleration Library 2018 Update 3, which can be used for the clustering step.
Technologies Used
2.0 Resources
2.0.1 Hardware
- HP Notebook – Laptop
- Processor – Core i3 5th generation
- RAM – 4 GB
- SSD – 256 GB
2.0.2 Software
- Operating system – Ubuntu 16.04 LTS and Windows 10
- IDE – Sublime Text 3
- Jupyter Python notebooks
- PostgreSQL database
- scikit-learn – machine learning library
- Distributed version control – GitHub
- Gantt chart – project schedule
- Edraw – for drawing the system designs
- Balsamiq Mockups – for interface design
- Django framework – for backend development
- Django REST Framework – for implementing API endpoints
- Africa's Talking API – for the USSD gateway
- Bootstrap – for front-end development
- Ngrok – to expose the USSD pages online
- Mozilla Firefox version 38.05 – to run the web system
