Abstract
In the information-based, diversified, and intelligent era of big data, human activities related to production and daily life have become increasingly diverse and dynamic, generating vast, complex, and varied data. Data science, which builds on these data, presents significant opportunities and challenges for society. The ability to efficiently extract valuable information and knowledge from various data sources to provide useful services is becoming critically important. Against this backdrop, data mining, a core domain of knowledge discovery, is gaining even greater prominence in the big data era.
Data mining focuses on extracting latent knowledge from massive datasets and presenting it in forms such as rules, concepts, and patterns. Although data mining techniques based on frequency and utility have garnered significant attention and achieved substantial progress, they generally do not consider temporal factors. Future research needs to overcome several limitations and challenges: (1) Recent temporal information is more valuable than long-past information, and trends over time are more meaningful than the magnitude of observations, yet most current research fails to fully exploit these temporal characteristics. (2) A few studies consider the impact of temporal factors but do not leverage them to enhance mining performance. (3) The theoretical framework for mining with temporal considerations remains underdeveloped regarding efficiency, systematization, continuity, and depth. Therefore, this thesis primarily explores data mining and analysis techniques that incorporate temporal factors into frequency and utility considerations. The specific contents are as follows:
Fuzzy frequent periodic pattern mining. This part addresses the deficiencies of existing algorithms for mining fuzzy periodic frequent patterns in quantified temporal transaction databases by proposing a more efficient FP2M algorithm. The algorithm introduces two novel pruning strategies based on temporal factors, allowing for the filtering of more invalid candidate nodes during the mining process, thus improving the performance of the fuzzy periodic frequent pattern mining framework. Experimental comparisons demonstrate that the patterns mined by the FP2M algorithm are consistent with those mined by the state-of-the-art algorithms, proving its effectiveness. The FP2M algorithm outperforms current advanced algorithms regarding time consumption, memory usage, and the number of candidates, demonstrating its efficiency. To address the instability in mining fuzzy periodic frequent patterns, this study proposes the SFP2M algorithm for mining stable fuzzy periodic frequent patterns in quantified temporal transaction databases. The algorithm employs lability metrics and introduces pruning strategies based on lability. Experimental comparisons show that the SFP2M algorithm can discover many patterns that traditional frameworks might erroneously discard.
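As a rough illustration of what "periodic frequent" means, the sketch below checks a single pattern against a support threshold and a maximum-period threshold. It assumes the definition of periods commonly used in the periodic pattern mining literature (gaps between consecutive occurrences, including the gaps to the start and end of the database); it does not model fuzzy membership, and it is not the FP2M or SFP2M algorithm. The names `periods`, `is_periodic_frequent`, `min_sup`, and `max_per` are hypothetical.

```python
from typing import List


def periods(timestamps: List[int], db_end: int, db_start: int = 0) -> List[int]:
    """Gaps between consecutive occurrences, including both database boundaries.

    Assumption: this boundary-inclusive definition is the one in use.
    """
    points = [db_start] + sorted(timestamps) + [db_end]
    return [later - earlier for earlier, later in zip(points, points[1:])]


def is_periodic_frequent(timestamps: List[int], db_end: int,
                         min_sup: int, max_per: int) -> bool:
    """A pattern qualifies if it occurs often enough and never disappears
    for longer than max_per time units."""
    if len(timestamps) < min_sup:
        return False
    return max(periods(timestamps, db_end)) <= max_per


# Hypothetical pattern seen at times 1, 3, 4, and 8 in a database ending at time 10.
occurrences = [1, 3, 4, 8]
print(periods(occurrences, db_end=10))
print(is_periodic_frequent(occurrences, db_end=10, min_sup=3, max_per=4))
```

A pruning strategy based on temporal factors would typically discard a candidate as soon as one of its periods exceeds the threshold, without finishing the scan; the sketch computes all periods only for readability.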
Skyline pattern mining with temporal recency. Traditional two-dimensional skyline frequency-monetary pattern mining does not consider temporal factors. To address this limitation, this part proposes a three-dimensional framework for skyline recency-frequency-monetary pattern mining that incorporates temporal recency. Different levels of constraints are defined within this framework. The newly proposed (W/S)SRFM-Miner algorithm introduces two new pruning strategies based on temporal recency. It also designs a sparse storage structure, the maximum monetary list, for the maximal monetary matrix so that pruning conditions can be verified quickly. The (weak/strong) skyline patterns storage matrix can be rapidly updated during the mining process based on recency-frequency indices. Experiments show that the patterns mined by the algorithm are a superset of the two-dimensional skyline patterns, demonstrating its effectiveness. The algorithm performs competitively in terms of time consumption, memory usage, and candidate quantity compared to the most advanced two-dimensional skyline mining algorithms, showcasing its performance. The algorithm exhibits near-linear growth in time and memory across different data scales, demonstrating its scalability.
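The skyline idea can be sketched with a brute-force dominance filter. The example assumes the usual skyline definition (a pattern is kept if no other pattern is at least as good in every dimension and strictly better in one) and assumes that larger recency, frequency, and monetary values are all preferable. The pattern names and values are invented for illustration, the weak/strong distinction is not modeled, and this is not the (W/S)SRFM-Miner algorithm.

```python
from typing import Dict, Tuple

Vector = Tuple[float, ...]


def dominates(a: Vector, b: Vector) -> bool:
    """a dominates b if a is >= b in every dimension and > b in at least one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def skyline(patterns: Dict[str, Vector]) -> Dict[str, Vector]:
    """Keep the patterns that no other pattern dominates (quadratic scan)."""
    return {
        name: vec
        for name, vec in patterns.items()
        if not any(dominates(other, vec) for other in patterns.values())
    }


# Hypothetical patterns as (recency, frequency, monetary) triples.
rfm = {
    "ab": (5, 3, 40),
    "c": (4, 3, 35),
    "ad": (2, 6, 20),
    "e": (5, 2, 50),
    "f": (6, 2, 30),
}

# Two-dimensional view: drop recency, keep (frequency, monetary).
fm = {name: vec[1:] for name, vec in rfm.items()}

print(sorted(skyline(fm)))   # 2-D skyline
print(sorted(skyline(rfm)))  # 3-D skyline
```

In this toy data the pattern `f` is dominated in the frequency-monetary view but survives once recency is added, which is consistent with the superset relationship reported above.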
Real-time skyline pattern mining in dynamic incremental data. Existing skyline frequency-utility pattern mining methods are designed for static databases. However, static data ignores the influence of time and cannot reflect how knowledge changes over time. Real-world knowledge is time-sensitive: external influences constantly update dynamic data, and the knowledge hidden in such data is the most current and novel. This part therefore proposes a problem framework for mining skyline frequency-utility patterns in dynamic data, together with the ISFUM algorithm for mining skyline patterns in incremental transaction databases. The algorithm uses two new pruning strategies whose main idea is to exploit the previously mined skyline frequency-utility patterns to reduce the search space of the newly added data. It also uses global and local utility lists to avoid repeated scanning of the old database during incremental mining. The algorithm is experimentally compared with state-of-the-art algorithms for mining skyline frequency-utility patterns in static scenarios. The patterns it mines are consistent with theirs, demonstrating its effectiveness. It is no worse than the comparison algorithms in terms of time consumption, memory usage, number of candidates, and number of combining operations, demonstrating its performance. Its time and memory consumption do not grow exponentially across different data sizes, demonstrating its scalability.
Temporal trend mining based on transaction data. Existing utility-based mining frameworks in transaction datasets often neglect temporal factors. To address this gap, this part of the thesis proposes a model for mining temporal trend patterns, establishing a utility ratio as the metric for time trends. The proposed SSIM and PSSIM algorithms are designed to mine short-sighted patterns exhibiting decreasing utility trends in quantified temporal transaction databases. Both algorithms use pruning strategies based on utility ratios, with PSSIM additionally employing temporal pruning to identify the earliest time exhibiting short-sighted attributes. Comparisons with state-of-the-art high-utility mining algorithms without temporal factors show that the mined patterns are subsets of high-utility patterns, proving the algorithms' effectiveness. The algorithms perform competitively in terms of time consumption, memory usage, and candidate quantity, showcasing their performance. The algorithms exhibit near-linear growth in time and memory with different data scales, demonstrating their scalability. Furthermore, by adjusting thresholds, the algorithms can be compatible with traditional high-utility mining algorithms.
| Date of Award | 28 Jul 2025 |
|---|---|
| Original language | English |
| Awarding Institution | |
| Supervisor | Guoting Chen (External Supervisor), Jilu WANG (External Supervisor) & Linqi SONG (Supervisor) |