2024 | OriginalPaper | Book chapter

Diminishing Unclear Consequences of Missing Values in Data Mining

Written by: Bhathawala Vaishnavi Pareshbhai, Sanjay H. Buch

Published in: ICT: Innovation and Computing

Publisher: Springer Nature Singapore

Abstract

In data mining, missing values pose significant challenges that can undermine the accuracy and reliability of analytical outcomes. This study addresses the handling of missing values in order to reduce the potential for ambiguous results in data mining processes. Recognizing the pivotal role of complete and accurate data in generating meaningful insights, the article explores various approaches for handling missing values, including omission, imputation, interpolation, and model-based techniques, and offers guidance on selecting the most appropriate strategy based on contextual factors. The study also discusses the potential of model-based imputation and its variants, and highlights the nuanced process of model selection along with its pros and cons. By providing an accessible framework that integrates both traditional and innovative methodologies, the study contributes to a holistic understanding of how to mitigate the impact of missing values.
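
The four families of approaches named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' framework: the function names, the use of `None` as the missing-value marker, the choice of the mean as the imputed value, and the single-predictor least-squares model are all assumptions made here for illustration only.

```python
from statistics import mean

def omit(values):
    """Omission: drop every missing entry (None marks a missing value)."""
    return [v for v in values if v is not None]

def impute_mean(values):
    """Imputation: replace each missing entry with the mean of the observed ones."""
    m = mean(omit(values))
    return [m if v is None else v for v in values]

def interpolate_linear(values):
    """Interpolation: fill interior gaps along the straight line between the
    nearest observed neighbours. Leading and trailing gaps stay None."""
    out = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = values[left] + step * (i - left)
    return out

def impute_regression(xs, ys):
    """Model-based imputation: fit y = a + b*x by least squares on the complete
    pairs, then predict each missing y. Assumes xs is fully observed."""
    pairs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    x_bar = mean(x for x, _ in pairs)
    y_bar = mean(y for _, y in pairs)
    sxx = sum((x - x_bar) ** 2 for x, _ in pairs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return [a + b * x if y is None else y for x, y in zip(xs, ys)]
```

The strategies differ in what they assume about the data: omission discards information, mean imputation ignores ordering and relationships between attributes, interpolation relies on neighbouring observations being informative, and the regression variant relies on a second, fully observed attribute.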

Metadata
Title
Diminishing Unclear Consequences of Missing Values in Data Mining
Written by
Bhathawala Vaishnavi Pareshbhai
Sanjay H. Buch
Copyright year
2024
Publisher
Springer Nature Singapore
DOI
https://doi.org/10.1007/978-981-99-9486-1_21