A Framework for Measuring Semantic Similarity between Software Requirements

Document Type: Original Article


Department of Computer Application, Integral University, Lucknow, India


Measuring sentence similarity is an essential problem in natural language processing, and computing the semantic similarity between software requirements written in natural language is a closely related one. In requirements engineering, measuring requirements similarity means finding the semantic symmetry between two software requirement sentences, regardless of the word order and the context of their words. Checking the similarity between the requirements received from various stakeholders helps identify the correct requirements. Reusing software components can also save substantial time and money, and comparing the requirements of a new project with those of previous projects, either before the new project starts or later during development, helps identify components that can be reused. This paper proposes a framework, ReSim, for identifying similarities between software requirements in order to improve reusability and identify the correct requirements. A crucial component of ReSim is measuring the similarity between software requirements. Some of the methods used to measure this similarity include the Dice, Jaccard, and cosine coefficients; in this paper, however, we use a recently developed hybrid method that considers not only semantic information, drawn from lexical databases, word embeddings, and corpus statistics, but also implied word order information, and that has produced significant improvements in measuring semantic similarity between words and sentences. The experiments use the PURE dataset to demonstrate the efficacy of the proposed framework.
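The baseline coefficients named in the abstract, and the kind of word-order signal a hybrid method can add, can be illustrated with a minimal Python sketch. It assumes simple lowercase word tokenization and exact word matching; the function names and example requirements are hypothetical. The word-order measure shown, 1 - ||r1 - r2|| / ||r1 + r2|| over word-position vectors, is one common formulation in hybrid sentence-similarity work and is an assumption here, not necessarily the one ReSim uses. The lexical-database, word-embedding, and corpus-statistics components of the hybrid method are omitted entirely.

```python
import math
import re
from collections import Counter


def tokenize(requirement: str) -> list[str]:
    """Lowercase a requirement and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", requirement.lower())


def dice(a: list[str], b: list[str]) -> float:
    """Dice coefficient on token sets: 2|A & B| / (|A| + |B|)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return 2 * len(sa & sb) / (len(sa) + len(sb))


def jaccard(a: list[str], b: list[str]) -> float:
    """Jaccard coefficient on token sets: |A & B| / |A | B|."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def cosine(a: list[str], b: list[str]) -> float:
    """Cosine similarity between term-frequency vectors of the two token lists."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values()) * sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def word_order(a: list[str], b: list[str]) -> float:
    """Hypothetical word-order similarity: 1 - ||r1 - r2|| / ||r1 + r2||.

    Simplification (assumption): a joint word missing from a sentence gets
    position 0 instead of the position of its most similar word, because this
    sketch has no lexical database to measure word similarity.
    """
    joint = list(dict.fromkeys(a + b))  # distinct words, in order of first appearance

    def order_vector(tokens: list[str]) -> list[int]:
        first: dict[str, int] = {}
        for i, t in enumerate(tokens, start=1):
            first.setdefault(t, i)  # 1-based position of each word's first occurrence
        return [first.get(w, 0) for w in joint]

    ra, rb = order_vector(a), order_vector(b)
    diff = math.sqrt(sum((x - y) ** 2 for x, y in zip(ra, rb)))
    total = math.sqrt(sum((x + y) ** 2 for x, y in zip(ra, rb)))
    return 1.0 - diff / total if total else 1.0


if __name__ == "__main__":
    r1 = tokenize("The system shall export reports in PDF format.")
    r2 = tokenize("The system must export reports as PDF files.")
    print(dice(r1, r2), jaccard(r1, r2), cosine(r1, r2))
    x = tokenize("The system exports reports")
    y = tokenize("Reports exports the system")
    print(jaccard(x, y), word_order(x, y))
```

For the first pair, Dice is 0.625 and Jaccard is 5/11 (about 0.455). The second pair uses the same words in a different order, so all three set or bag-of-words coefficients score it 1.0, while the word-order measure drops to about 0.580. This is the kind of distinction that word order information is meant to capture.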

