Assignment 6 - Outlier mining

You are required to use outlier mining methods to detect the outliers in the given data set. On a section of a city road, several cameras recorded the license plates of passing vehicles from 2017-06-09 to 2017-06-12, together with the date and time at which each vehicle passed the start point and the finish point. The travel time was calculated afterwards, and the time serial is a transformed representation of the start time. Each instance therefore has 8 attributes: serial number, license plate number, date and time of passing the start point, date and time of passing the finish point, time serial, and travel time. There are 4977 instances in total. You need to complete the following tasks.

Task:
(1) Use a statistics-based approach to detect outliers in travel time. Calculate the mean and the variance of travel time, and write out the confidence interval. Plot a scatter diagram with the time serial on the X-axis and the travel time on the Y-axis, and mark the outliers you have identified.
(2) Use a distance-based approach to detect outliers in travel time. An object o in data set D is a DB(r, π) outlier if the fraction of objects in D that lie at a distance less than r from o is less than π. Let r vary from 0.1 to 0.3 in steps of 0.1 and π vary from 30 to 90 in steps of 30; for each combination, find the outliers and the number of outliers. You can use the Euclidean distance.
(3) Use a density-based approach to detect outliers in travel time. For different values of k, the number of neighbors (from 3 to 400 in steps of 5), calculate the LOF of each data point. Use 2.0 as the LOF threshold: an object is labeled an outlier if its LOF exceeds 2.0. First, plot a line chart with k on the X-axis and the number of outliers on the Y-axis. Second, with k = 350 and the Euclidean distance, calculate the LOF of each data point and give the top 4 outliers.
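A minimal sketch of the statistics-based check in Task (1), in pure Python, assuming travel time is roughly normally distributed. The assignment does not fix a confidence level, so the interval mean ± z·σ (σ being the population standard deviation) uses a hypothetical parameter `z`, 3.0 by default; 1.96 would correspond to a 95% interval under normality. Loading the data and drawing the scatter plot (for example with matplotlib) are left out, and `travel_times` is assumed to be a list of numbers in the same order as the time serial.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def statistical_outliers(
    travel_times: Sequence[float], z: float = 3.0
) -> Tuple[float, float, Tuple[float, float], List[int]]:
    """Return (mean, variance, (low, high), outlier indices).

    Hypothetical helper: flags every travel time outside mean +/- z * std,
    assuming the values are approximately normally distributed.
    """
    values = [float(t) for t in travel_times]
    mean = statistics.mean(values)
    var = statistics.pvariance(values)  # population variance of travel time
    std = math.sqrt(var)
    low, high = mean - z * std, mean + z * std  # assumed confidence interval
    outliers = [i for i, t in enumerate(values) if t < low or t > high]
    return mean, var, (low, high), outliers
```

With the toy input `[10.0] * 20 + [100.0]`, only index 20 falls outside the default interval; those indices are the points to highlight on the scatter plot.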
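For Task (2), the sketch below makes three assumptions that the assignment leaves open: π is a percentage (30 means 30%), o itself is not counted among the objects near it, and "distance less than r" is strict. Because travel time is one-dimensional, the Euclidean distance reduces to |a - b|, so the number of objects within r of o can be counted with two binary searches on the sorted values instead of comparing all pairs. The unit of travel time is not stated either; if r = 0.1 to 0.3 is meant for scaled data, the hypothetical `min_max_scale` helper can be applied first. All function names are illustrative.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List, Sequence, Tuple


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Scale values to [0, 1]; an optional, assumed preprocessing step."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def db_outliers(values: Sequence[float], r: float, pi_percent: float) -> List[int]:
    """Indices of DB(r, pi) outliers in 1-D data (pi given as a percentage)."""
    n = len(values)
    if n < 2:
        return []
    ordered = sorted(values)
    outliers = []
    for i, v in enumerate(values):
        # 1-D Euclidean distance: |v - w| < r  <=>  v - r < w < v + r
        within = bisect_left(ordered, v + r) - bisect_right(ordered, v - r)
        within -= 1  # do not count o itself (|v - v| = 0 < r)
        if within / (n - 1) < pi_percent / 100:
            outliers.append(i)
    return outliers


def db_grid(
    values: Sequence[float],
    rs: Tuple[float, ...] = (0.1, 0.2, 0.3),
    pis: Tuple[int, ...] = (30, 60, 90),
) -> Dict[Tuple[float, int], List[int]]:
    """Outlier indices for every (r, pi) combination in the assignment's grid."""
    return {(r, p): db_outliers(values, r, p) for r in rs for p in pis}
```

`db_grid(min_max_scale(travel_times))` would map each (r, π) pair to its outlier indices, and the length of each list gives the number of outliers.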
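For Task (3), a pure-Python sketch of LOF on one-dimensional travel times, following the usual definitions: the k-distance of p is its distance to the k-th nearest neighbor, reach-dist_k(p, o) = max(k-distance(o), d(p, o)), lrd_k(p) is the inverse of the mean reach-distance from p to its neighbors, and LOF_k(p) is the mean of lrd_k(o) / lrd_k(p) over those neighbors. Two simplifications are assumptions: exactly k neighbors are used (ties at the k-distance are broken arbitrarily instead of all being included), and a small `eps` is added before inverting so that runs of identical travel times do not cause division by zero. Neighbor lists are computed once for the largest k and their prefixes are reused for smaller k. All function names are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Neighbours = List[List[Tuple[float, int]]]


def knn_1d(values: Sequence[float], k: int) -> Neighbours:
    """For each point, its k nearest neighbours as (distance, index), closest first.

    In 1-D the neighbours are found by expanding left and right from the
    point's position in sorted order (Euclidean distance = |a - b|).
    """
    n = len(values)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of points")
    order = sorted(range(n), key=lambda i: values[i])
    result: Neighbours = [[] for _ in range(n)]
    for pos, i in enumerate(order):
        left, right = pos - 1, pos + 1
        neigh: List[Tuple[float, int]] = []
        while len(neigh) < k:
            d_left = values[i] - values[order[left]] if left >= 0 else float("inf")
            d_right = values[order[right]] - values[i] if right < n else float("inf")
            if d_left <= d_right:
                neigh.append((d_left, order[left]))
                left -= 1
            else:
                neigh.append((d_right, order[right]))
                right += 1
        result[i] = neigh
    return result


def lof_scores(neighbours: Neighbours, k: int, eps: float = 1e-10) -> List[float]:
    """LOF_k of every point, using the first k entries of each neighbour list."""
    n = len(neighbours)
    k_dist = [neighbours[i][k - 1][0] for i in range(n)]
    lrd = []
    for i in range(n):
        # reach-dist_k(p, o) = max(k-distance(o), d(p, o))
        total = sum(max(k_dist[j], d) for d, j in neighbours[i][:k])
        lrd.append(1.0 / (total / k + eps))  # eps: assumed guard against zero
    return [sum(lrd[j] for _, j in neighbours[i][:k]) / (k * lrd[i]) for i in range(n)]


def outlier_counts(
    values: Sequence[float], ks: Iterable[int] = range(3, 401, 5), threshold: float = 2.0
) -> Dict[int, int]:
    """Number of points with LOF > threshold for each k (data for the line chart)."""
    ks = list(ks)
    neigh = knn_1d(values, max(ks))  # prefixes of these lists serve smaller k
    return {k: sum(s > threshold for s in lof_scores(neigh, k)) for k in ks}


def top_outliers(values: Sequence[float], k: int = 350, top: int = 4) -> List[int]:
    """Indices of the `top` points with the largest LOF_k."""
    scores = lof_scores(knn_1d(values, k), k)
    return sorted(range(len(values)), key=lambda i: scores[i], reverse=True)[:top]
```

`outlier_counts(travel_times)` gives the (k, number of outliers) pairs to plot, and `top_outliers(travel_times)` returns the indices of the four points with the largest LOF at k = 350. On the toy list `[1.0, 2.0, 3.0, 4.0, 20.0]` with k = 2, the first four points score 1.0 and the isolated value 20.0 scores about 11.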