CASMS: Combining clustering with attention semantic model for identifying security bug reports
Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review
Detail(s)
Original language | English |
---|---|
Article number | 106906 |
Journal / Publication | Information and Software Technology |
Volume | 147 |
Online published | 26 Mar 2022 |
Publication status | Published - Jul 2022 |
Abstract
Context: Inappropriate public disclosure of security bug reports (SBRs) can attract malicious attackers to attack software systems; hence, detecting SBRs has become increasingly important for software maintenance. However, existing techniques for identifying SBRs remain unsatisfactory because of the class imbalance problem (non-security bug reports (NSBRs) outnumber SBRs), insufficient training information, and weak performance robustness.
Objective: This work aims to overcome the limitations of state-of-the-art SBR detection methods.
Method: In this work, we propose CASMS, an approach that efficiently alleviates the class imbalance problem and predicts whether bug reports are security related. CASMS first converts bug reports into weighted word embeddings based on the TF-IDF and word2vec techniques. Unlike previous studies, which select the NSBRs most dissimilar to SBRs, CASMS then automatically selects a number of diverse NSBRs using the Elbow method and the k-means clustering algorithm. Finally, the selected NSBRs and all SBRs are used to train an Attention CNN–BLSTM model that extracts contextual and sequential information.
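The embedding and NSBR-selection steps of the Method can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy word vectors stand in for a trained word2vec model, and the smoothed IDF formula, the farthest-first k-means initialisation, the automatic elbow rule (choose the k whose SSE point lies farthest from the line joining the first and last SSE values), and the round-robin pick of NSBRs across clusters are all assumptions. Every function name is hypothetical.

```python
import math
from collections import Counter


def tfidf_weighted_embeddings(docs, word_vecs):
    """Embed each tokenised report as the TF-IDF-weighted mean of its word vectors.

    word_vecs stands in for a trained word2vec model (hypothetical toy vectors).
    IDF uses the smoothed form log((1 + N) / (1 + df)) + 1, an assumption.
    """
    n_docs = len(docs)
    df = Counter(w for doc in docs for w in set(doc))
    dim = len(next(iter(word_vecs.values())))
    embeddings = []
    for doc in docs:
        acc, total = [0.0] * dim, 0.0
        for word, count in Counter(doc).items():
            if word not in word_vecs:
                continue  # skip out-of-vocabulary tokens
            weight = (count / len(doc)) * (math.log((1 + n_docs) / (1 + df[word])) + 1.0)
            acc = [a + weight * v for a, v in zip(acc, word_vecs[word])]
            total += weight
        embeddings.append([a / total for a in acc] if total else acc)
    return embeddings


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50):
    """Lloyd's k-means with deterministic farthest-first initialisation."""
    def assign(cents):
        return [min(range(k), key=lambda j: sq_dist(p, cents[j])) for p in points]

    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    for _ in range(iters):
        labels = assign(centroids)
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new.append([sum(col) / len(members) for col in zip(*members)] if members else centroids[j])
        if new == centroids:
            break
        centroids = new
    labels = assign(centroids)
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, sse


def elbow_k(sses):
    """sses[i] is the SSE for k = i + 1; return the k farthest from the first-last chord."""
    n = len(sses)
    if n < 3:
        return n  # too few points to locate an elbow; use the largest k tried
    x1, y1, x2, y2 = 1, sses[0], n, sses[-1]
    # Twice the triangle area, which is proportional to the distance from the chord.
    return max(range(n), key=lambda i: abs((y2 - y1) * (i + 1) - (x2 - x1) * sses[i]
                                           + x2 * y1 - y2 * x1)) + 1


def select_diverse_nsbrs(nsbr_vecs, n_select, k_max=6):
    """Hypothetical rule: elbow-chosen k, then pick round-robin across clusters,
    nearest-to-centroid first, so the sample covers every cluster."""
    k_max = min(k_max, len(nsbr_vecs))
    runs = [kmeans(nsbr_vecs, k) for k in range(1, k_max + 1)]
    k = elbow_k([sse for _, _, sse in runs])
    labels, centroids, _ = runs[k - 1]
    clusters = [sorted((i for i, lab in enumerate(labels) if lab == j),
                       key=lambda i: sq_dist(nsbr_vecs[i], centroids[j]))
                for j in range(k)]
    chosen = []
    while len(chosen) < n_select and any(clusters):
        for members in clusters:
            if members and len(chosen) < n_select:
                chosen.append(members.pop(0))
    return k, chosen


nsbrs = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
print(select_diverse_nsbrs(nsbrs, 4))  # (2, [0, 3, 1, 4])
```

Round-robin selection spreads the chosen NSBRs over every cluster, which is one simple way to favour diversity rather than dissimilarity to SBRs; the abstract does not state how CASMS chooses reports within a cluster.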
Results: The experimental results show that CASMS outperforms the three baselines (FARSEC, SMOTUNED, and LTRWES) in overall performance (g-measure) and in correctly identifying SBRs (recall), with improvements of 4.09%–24.26% and 10.33%–36.24%, respectively. The best results are readily obtained within a limited range of class ratios in the two-class training set (1:1 to 3:1), requiring around 20 experiments per project. In terms of robustness, measured by the standard deviation, CASMS is more stable than LTRWES.
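The evaluation measures named in the Results can be made concrete with a short sketch. It assumes the g-measure definition common in the SBR-prediction literature (e.g., FARSEC): the harmonic mean of recall (pd) and 1 - pf, where pf is the false-alarm rate on NSBRs. It also assumes robustness is summarised by the sample standard deviation of scores over repeated runs. The function names and the example labels are hypothetical, not data from the paper.

```python
from statistics import mean, stdev


def sbr_metrics(y_true, y_pred):
    """Return (recall, pf, g_measure); label 1 = SBR, 0 = NSBR.

    Assumed definition: g = 2 * recall * (1 - pf) / (recall + (1 - pf)).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0  # share of SBRs correctly flagged
    pf = fp / (fp + tn) if fp + tn else 0.0      # false-alarm rate on NSBRs
    spec = 1.0 - pf
    g = 2 * recall * spec / (recall + spec) if recall + spec else 0.0
    return recall, pf, g


def robustness(scores):
    """Mean and sample standard deviation over repeated runs (lower std = more stable)."""
    return mean(scores), stdev(scores)


# Usage with made-up illustrative labels (not results from the paper).
recall, pf, g = sbr_metrics([1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
                            [1, 1, 1, 0, 1, 0, 0, 0, 0, 0])
print(recall, round(g, 4))  # 0.75 0.7895
```

Because the g-measure is a harmonic mean, a model cannot score well by flagging every report as an SBR: perfect recall with a high false-alarm rate still yields a low g-measure.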
Conclusion: Overall, CASMS can alleviate the data imbalance problem and extract more semantic information, improving both performance and robustness. CASMS is therefore recommended as a practical approach for identifying SBRs.
Research Area(s)
- Security bug report, Clustering, Hybrid neural networks
Citation Format(s)
CASMS: Combining clustering with attention semantic model for identifying security bug reports. / Ma, Xiaoxue; Keung, Jacky; Yang, Zhen et al.
In: Information and Software Technology, Vol. 147, 106906, 07.2022.