M.Sc. Thesis: Kaushikkumar Gondaliya on Learning to Categorize Issues in Distributed Bug Tracker Systems

Supervisors: Elmar Rückert, Prof. Dr. Jan Peters, Christian Bernecker, Thorsten Busch

Finished: 27 June 2017

Abstract

A bug report plays an essential role in software development and maintenance and is a key element of efficient IT problem management. Because software products are complex, many teams (e.g., design, usability, and performance) work on a single product, and bug reports typically need to be routed to the appropriate expert team to be resolved. Routing them manually is time-consuming and inefficient, so the challenge is to find an approach that automates the routing process. With the advent of natural language processing methods and machine learning and deep learning algorithms, new automatic approaches have been proposed to address this challenge.

In this thesis, bug reports for IBM Deutschland GmbH's Business Process Manager (BPM) product were used to compare and assess the performance of prediction models. Different combinations of natural language processing methods (lemmatization, part-of-speech (POS) tagging, N-grams, and stopword removal) and classification algorithms (linear support vector machines (SVM), multinomial naive Bayes, and long short-term memory (LSTM) networks) were evaluated. Features were computed with the term frequency-inverse document frequency (TF-IDF) method. Our evaluation shows that the best prediction model for the bug triage system combines bigram features with a linear SVM. For the LSTM, more than 8000 features would be needed. The bug triage tool was implemented as a microservice architecture using Docker containers, which allows individual components of the framework to be extended.
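The best-performing pipeline described above (TF-IDF features with bigrams fed to a linear SVM) can be sketched in dependency-free Python. This is a minimal illustration under stated assumptions, not the thesis implementation: "bigram" is assumed to mean unigrams plus bigrams, the linear SVM is assumed to be trained one-vs-rest with a Pegasos-style subgradient solver, the other preprocessing steps named above (lemmatization, POS tagging, stopword removal) are omitted, and all names (`BugRouter`, `TfidfBigramVectorizer`, `lam`, `epochs`) as well as the example reports and team labels are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on non-alphanumerics; no lemmatization, POS tagging or stopword removal.
    return re.findall(r"[a-z0-9]+", text.lower())


def uni_and_bigrams(tokens):
    # Assumed "bigram" features: every unigram plus every adjacent token pair.
    return tokens + [a + " " + b for a, b in zip(tokens, tokens[1:])]


class TfidfBigramVectorizer:
    """Sparse TF-IDF over unigrams and bigrams, L2-normalised per document."""

    def fit(self, docs):
        n = len(docs)
        df = Counter()
        for doc in docs:
            df.update(set(uni_and_bigrams(tokenize(doc))))
        # Smoothed IDF: log((1 + n) / (1 + df)) + 1, so every weight is positive.
        self.idf = {term: math.log((1 + n) / (1 + c)) + 1 for term, c in df.items()}
        return self

    def transform(self, doc):
        # Raw term counts restricted to the fitted vocabulary (unseen n-grams are dropped).
        tf = Counter(t for t in uni_and_bigrams(tokenize(doc)) if t in self.idf)
        vec = {t: c * self.idf[t] for t, c in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        return {t: v / norm for t, v in vec.items()}


def dot(w, x):
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_linear_svm(X, y, lam=0.01, epochs=50):
    """Pegasos-style subgradient descent on the L2-regularised hinge loss (labels +1/-1, no bias)."""
    w, t = {}, 0
    for _ in range(epochs):
        for x, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)
            violated = yi * dot(w, x) < 1.0  # hinge loss is active for this example
            for f in w:                       # shrink step from the L2 regulariser
                w[f] *= 1.0 - eta * lam
            if violated:                      # subgradient step from the hinge loss
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * yi * v
    return w


class BugRouter:
    """Hypothetical router: one-vs-rest linear SVMs over TF-IDF uni+bigram features."""

    def __init__(self, lam=0.01, epochs=50):
        self.lam, self.epochs = lam, epochs

    def fit(self, reports, teams):
        self.vectorizer = TfidfBigramVectorizer().fit(reports)
        X = [self.vectorizer.transform(r) for r in reports]
        self.teams = sorted(set(teams))
        # One binary SVM per team: that team's reports are +1, all others -1.
        self.weights = {
            team: train_linear_svm(X, [1 if t == team else -1 for t in teams], self.lam, self.epochs)
            for team in self.teams
        }
        return self

    def predict(self, report):
        # Route the report to the team whose SVM gives the highest decision value.
        x = self.vectorizer.transform(report)
        return max(self.teams, key=lambda team: dot(self.weights[team], x))


if __name__ == "__main__":
    reports = [
        "login dialog shows confusing error message",
        "confusing wizard labels on login dialog",
        "engine slow under heavy load",
        "heavy load causes slow query response",
        "toolbar icons use inconsistent colours",
        "inconsistent diagram icons and toolbar spacing",
    ]
    teams = ["usability", "usability", "performance", "performance", "design", "design"]
    router = BugRouter().fit(reports, teams)
    print(router.predict("slow response under load"))  # performance
```

In practice a library would more likely be used; for example, scikit-learn's `TfidfVectorizer(ngram_range=(1, 2))` combined with `LinearSVC` expresses the same idea. Writing it out by hand keeps the TF-IDF weighting and the hinge-loss update visible.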

Thesis

Learning to Categorize Issues in Distributed Bug Tracker Systems