Analogical Relevance Index

01/08/2023
by Suryani Lim, et al.

Focusing on the most significant features of a dataset is useful in both machine learning (ML) and data mining. In ML, it can lead to higher accuracy, a faster learning process, and ultimately a simpler and more understandable model. In data mining, identifying significant features is essential not only for gaining a better understanding of the data but also for visualization. In this paper, we demonstrate a new way of identifying significant features inspired by analogical proportions. Such a proportion has the form "a is to b as c is to d", comparing two pairs of items (a, b) and (c, d) in terms of their similarities and dissimilarities. In a classification context, if the similarities/dissimilarities between a and b correlate with the fact that a and b have different labels, this knowledge can be transferred to c and d, inferring that c and d also have different labels. From a feature selection perspective, observing a large number of such pairs (a, b) where a and b have different labels provides a hint about the importance of the features on which a and b differ. Following this idea, we introduce the Analogical Relevance Index (ARI), a new statistical test of the significance of a given feature with respect to the label. ARI is a filter-based method. Filter-based methods are ML-agnostic but are generally unable to handle feature redundancy; ARI, however, can detect it. Our experiments show that ARI is effective and outperforms well-known methods on a variety of artificial datasets and some real datasets.
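The pair-based intuition in the abstract can be illustrated with a small sketch. This is not the ARI statistic itself, whose definition is given in the full paper; it is only a simplified illustration under stated assumptions. It assumes categorical (here Boolean) features, considers only pairs (a, b) that differ on exactly one feature, and reports, for each feature, the fraction of those pairs whose labels also differ. The function name `pair_difference_scores` and the toy dataset are hypothetical.

```python
from itertools import combinations, product


def pair_difference_scores(X, y):
    """For each feature, return the fraction of pairs differing ONLY on that
    feature whose labels also differ. Illustrative only; not the ARI test."""
    n_features = len(X[0])
    differing = [0] * n_features      # pairs that differ only on feature k
    label_changes = [0] * n_features  # ... and also have different labels
    for i, j in combinations(range(len(X)), 2):
        diff = [k for k in range(n_features) if X[i][k] != X[j][k]]
        if len(diff) != 1:
            continue  # keep only pairs that differ on a single feature
        k = diff[0]
        differing[k] += 1
        if y[i] != y[j]:
            label_changes[k] += 1
    # Features with no qualifying pair get a score of 0.0 by convention.
    return [c / d if d else 0.0 for c, d in zip(label_changes, differing)]


# Toy data (assumption): all 3-bit Boolean vectors, label equal to feature 0.
X = [list(bits) for bits in product([0, 1], repeat=3)]
y = [row[0] for row in X]
scores = pair_difference_scores(X, y)
print(scores)
```

On this toy dataset, changing feature 0 always changes the label while changing feature 1 or 2 never does, so the first feature receives the highest score. A real significance test would also need to account for sampling noise and for pairs that differ on several features at once.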
