Towards Twitter Feeds Classification and Redundancy Removal

Abstract

This paper addresses the lack of a proper mechanism for classifying Twitter feeds according to the TV show they belong to and for removing redundant feeds when multiple news providers are aggregated. It presents a method for classifying Twitter feeds in the domain of TV shows and for filtering out feeds with duplicate content after multiple Twitter feed providers have been aggregated. A bag-of-words model is used for classification, and word-to-word similarity metrics are combined into a text-to-text metric to derive a score indicating how similar the contents of two given feeds are. The proposed classification solution has shown an accuracy of 81%.
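
The two ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the keyword sets per show, the exact-match word similarity, and the unweighted averaging are all assumptions made for illustration. The paper's actual word-to-word metrics and any term weighting are not specified here.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def classify(feed, show_vocab):
    """Bag-of-words classification (illustrative).

    show_vocab is a hypothetical mapping of show name -> set of keywords.
    Returns the show whose keywords occur most often in the feed,
    or None when no keyword matches.
    """
    bag = Counter(tokenize(feed))
    scores = {show: sum(bag[w] for w in words)
              for show, words in show_vocab.items()}
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def exact_match(a, b):
    """Placeholder word-to-word similarity (assumption): 1.0 if equal, else 0.0."""
    return 1.0 if a == b else 0.0


def directed_similarity(src, dst, word_sim):
    """Average, over the words of src, of the best match found in dst."""
    if not src or not dst:
        return 0.0
    return sum(max(word_sim(w, v) for v in dst) for w in src) / len(src)


def text_similarity(t1, t2, word_sim=exact_match):
    """Combine word-to-word scores into a symmetric text-to-text score in [0, 1]."""
    a, b = tokenize(t1), tokenize(t2)
    return 0.5 * (directed_similarity(a, b, word_sim)
                  + directed_similarity(b, a, word_sim))
```

In such a sketch, two feeds whose `text_similarity` exceeds a chosen threshold would be treated as duplicates; the threshold and a richer `word_sim` (for example a lexical or corpus-based measure) would need to be chosen empirically.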

2016
M.D.N. Perera

Faculty of Information Technology, University of Moratuwa, Sri Lanka. +94 3492170 dilan.namila@gmail.com

K.V.L. Deshapriya

Faculty of Information Technology, University of Moratuwa, Sri Lanka. +94 711606624 lakmalv91@gmail.com

C.D.K. Ilangasinghe

Faculty of Information Technology, University of Moratuwa, Sri Lanka. +94 770679671 ilchathu001@gmail.com

M.K.D.K. Alwis

Faculty of Information Technology, University of Moratuwa, Sri Lanka. +94 718970121 dilani.alwis@gmail.com

C. Wijesiriwardana

Faculty of Information Technology, University of Moratuwa, Sri Lanka. +94 718670601 chamanw@gmail.com