LogXtractor - An Intelligent Integrated Development Environment for Log Structuring and Log Extraction


Log files are essential for identifying the activities of a computer system. In practice, complex computer systems generate very large volumes of log data in a short period of time. Moreover, different software and embedded systems use different log formats. Analysing such data and gaining insights from it is extremely time consuming for any organisation. One of the major problems in the log analysis arena is the lack of a generic log parser and analyser that can derive useful information from any given log file. Therefore, the purpose of this project is to research, design, implement and test a prototype with which log analysts can map any log file and extract useful information without requiring programming skills. The proposed solution intelligently identifies the structure of a log file using a hybrid of pattern matching and machine learning algorithms, and automatically generates a script that expresses that structure in the Log Data Extraction Language. Hence, it enables analysts to extract useful data efficiently and accurately and to leverage the true value of analysing log files. The overall solution has been architected and developed as a web-based integrated development environment that utilises modern web technologies. Moreover, a RESTful server was implemented in Python for analysing and parsing log files. The core algorithms were implemented in C, as this provided a significant performance improvement. The author presents a novel log pattern identification algorithm that combines regular expression matching with an unsupervised learning approach to discover interesting and useful patterns in any log file, regardless of its structure.
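To make the idea of combining regular expression matching with unsupervised grouping more concrete, the following is a minimal sketch in Python. It is not the author's algorithm. The masking rules, the function names (`to_template`, `group_by_template`) and the sample log lines are all hypothetical, and plain grouping on the masked line stands in for a real unsupervised learning step. Variable fields are first masked with regular expressions, and lines that share the same masked template are then grouped without any labels.

```python
import re
from collections import defaultdict

# Hypothetical masking rules (assumptions for illustration only).
# Order matters: timestamps and IP addresses are masked before bare numbers.
MASKS = [
    (re.compile(r"\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}"), "<TIMESTAMP>"),
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def to_template(line):
    """Replace variable fields with placeholders and normalise whitespace."""
    for pattern, placeholder in MASKS:
        line = pattern.sub(placeholder, line)
    return " ".join(line.split())


def group_by_template(lines):
    """Group raw lines by their masked template (a crude, label-free grouping)."""
    groups = defaultdict(list)
    for line in lines:
        groups[to_template(line)].append(line)
    return dict(groups)


# Invented sample lines, not taken from any real system.
sample = [
    "2015-03-01 10:00:01 INFO Connection from 10.0.0.5 port 5123",
    "2015-03-01 10:00:07 INFO Connection from 10.0.0.9 port 6001",
    "2015-03-01 10:01:12 ERROR Disk usage at 93 percent",
]

templates = group_by_template(sample)
for template, members in templates.items():
    print(len(members), template)
```

In this sketch the two connection lines collapse into one template and the disk usage line forms its own. A fuller approach would replace the exact-match grouping with a similarity-based clustering step, so that lines whose variable fields are not covered by a masking rule can still be grouped.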

S. S. Serasinghe

Informatics Institute of Technology
No. 57, Ramakrishna Road, Colombo 6
+94 77 6868 537
sahan.serasinghe@gmail.com