Selecting features by utilizing the intuitionistic fuzzy entropy method
Feature selection is one of the most significant pre-processing activities; it reduces data dimensionality in order to improve the machine learning process. The evaluation of a feature selection method must consider classification performance, efficiency, stability, and many other factors. Uncertainty commonly occurs in the feature selection process because of time limitations, imprecise information, and the subjectivity of human judgment. The theory of intuitionistic fuzzy sets (IFSs) has proven to be a valuable tool for handling the uncertainty and ambiguity that arise in many practical situations. Thus, this study introduces a novel feature selection framework using intuitionistic fuzzy entropy. First, a new entropy for IFSs is proposed and compared with some previously developed entropy measures. Because entropy measures the uncertainty present in the data (features), features with higher entropy values are filtered out, and the remaining features with lower entropy values are used to classify the data. To verify the effectiveness of the proposed entropy-based feature selection, experiments are conducted on ten standard benchmark datasets using support vector machine, K-nearest neighbor, and Naïve Bayes classifiers. The outcomes of the study validate that the proposed entropy-based filter feature selection is more feasible and effective than existing filter-based feature selection methods.
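The filtering step described above (score each feature by its intuitionistic fuzzy entropy, then keep the lowest-entropy features) can be sketched as follows. This is only an illustration under stated assumptions, not the method proposed in the study: the abstract does not give the new entropy formula, so the sketch substitutes a well-known ratio-type IF entropy, E = (1/n) Σ (min(μ, ν) + π) / (max(μ, ν) + π). The fuzzification step (min-max scaling for μ and a Sugeno-type complement for ν), the parameter `lam`, and the function names `fuzzify`, `if_entropy`, and `select_features` are likewise hypothetical choices made for this example.

```python
from typing import Dict, List, Sequence, Tuple

Triple = Tuple[float, float, float]  # (membership, non-membership, hesitancy)


def fuzzify(column: Sequence[float], lam: float = 1.0) -> List[Triple]:
    """Map raw feature values to (mu, nu, pi) triples.

    Assumption: mu is the min-max scaled value and nu is a Sugeno-type
    complement (1 - mu) / (1 + lam * mu), with lam > 0. The study may use
    a different fuzzification scheme.
    """
    lo, hi = min(column), max(column)
    span = hi - lo
    triples = []
    for x in column:
        # A constant column carries no ordering information; place it at 0.5.
        mu = (x - lo) / span if span > 0 else 0.5
        nu = (1.0 - mu) / (1.0 + lam * mu)
        pi = max(0.0, 1.0 - mu - nu)  # clamp tiny negative rounding errors
        triples.append((mu, nu, pi))
    return triples


def if_entropy(triples: Sequence[Triple]) -> float:
    """Ratio-type IF entropy (a stand-in, not the entropy proposed in the study)."""
    total = 0.0
    for mu, nu, pi in triples:
        # The denominator is at least 0.5 because mu + nu + pi = 1.
        total += (min(mu, nu) + pi) / (max(mu, nu) + pi)
    return total / len(triples)


def select_features(data: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Return the names of the k features with the lowest IF entropy."""
    scores = {name: if_entropy(fuzzify(col)) for name, col in data.items()}
    return sorted(scores, key=scores.get)[:k]


if __name__ == "__main__":
    toy = {
        "crisp": [0.0, 1.0, 0.0, 1.0],   # values sit at the extremes
        "vague": [0.0, 0.5, 1.0, 0.5],   # half the values sit mid-range
    }
    for name, col in toy.items():
        print(name, round(if_entropy(fuzzify(col)), 3))
    print(select_features(toy, k=1))
```

In a full pipeline, the retained columns would then be passed to the classifiers named in the abstract (support vector machine, K-nearest neighbor, Naïve Bayes); the number of features to keep, `k`, is a free parameter in this sketch.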