Numerical Data Imputation for Multimodal Data Sets: A Probabilistic Nearest-Neighbor Kernel Density Approach

AI-generated keywords: Data Imputation, kNN, KDE, Accuracy, Likelihood Estimation

AI-generated Key Points

  • Numerical data imputation algorithms are commonly used to replace missing values in incomplete datasets.
  • Current imputation methods seek to minimize the error between the imputed values and the unobserved ground truth, which can create artifacts and poor imputations when the data follow multimodal or complex distributions.
  • The $k$NN$\times$KDE algorithm is proposed as a new data imputation method that combines nearest neighbor estimation ($k$NN) with density estimation using Gaussian kernels (KDE).
  • Experiments were conducted using artificial and real-world datasets with different types and rates of missing data to evaluate the effectiveness of the $k$NN$\times$KDE algorithm.
  • Results demonstrate that the $k$NN$\times$KDE algorithm can handle complex original data structures and produces lower imputation errors compared to existing methods.
  • The approach provides probabilistic estimates with higher likelihoods than current techniques.
  • The code for the $k$NN$\times$KDE algorithm has been released as open-source on GitHub for easy access and use by the community (https://github.com/DeltaFloflo/knnxkde).
  • The study introduces a novel data imputation method that combines $k$NN and KDE techniques.
  • Extensive experiments show that the approach outperforms existing methods in terms of accuracy and likelihood estimation.
  • Researchers and practitioners can easily implement and apply the $k$NN$\times$KDE algorithm in their own work for improved outcomes.

Authors: Florian Lalande, Kenji Doya

30 pages, 8 figures, accepted in TMLR (Reproducibility certification)
License: CC BY 4.0

Abstract: Numerical data imputation algorithms replace missing values by estimates to leverage incomplete data sets. Current imputation methods seek to minimize the error between the unobserved ground truth and the imputed values. But this strategy can create artifacts leading to poor imputation in the presence of multimodal or complex distributions. To tackle this problem, we introduce the $k$NN$\times$KDE algorithm: a data imputation method combining nearest neighbor estimation ($k$NN) and density estimation with Gaussian kernels (KDE). We compare our method with previous data imputation methods using artificial and real-world data with different data missing scenarios and various data missing rates, and show that our method can cope with complex original data structure, yields lower data imputation errors, and provides probabilistic estimates with higher likelihood than current methods. We release the code in open-source for the community: https://github.com/DeltaFloflo/knnxkde
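
The artifact described in the abstract can be illustrated with a small numerical sketch. It is not taken from the paper: the two-mode mixture, the helper name `bimodal_sample`, and the 0.5 window are all assumptions chosen for illustration. The point it shows is that the single value minimizing squared error on bimodal data is the mean, which falls between the modes where almost no real observation lies.

```python
import numpy as np

def bimodal_sample(n, rng):
    """Hypothetical helper: equal mixture of N(-2, 0.3^2) and N(+2, 0.3^2)."""
    centers = rng.choice([-2.0, 2.0], size=n)
    return centers + 0.3 * rng.standard_normal(n)

rng = np.random.default_rng(0)
values = bimodal_sample(10_000, rng)

# The constant that minimizes mean squared error is the sample mean.
mse_optimal = values.mean()

# Fraction of real observations within 0.5 of that error-minimizing value.
near_mean = np.mean(np.abs(values - mse_optimal) < 0.5)

# Fraction of real observations within 0.5 of either mode.
distance_to_closest_mode = np.minimum(np.abs(values - 2.0), np.abs(values + 2.0))
near_modes = np.mean(distance_to_closest_mode < 0.5)

print(f"MSE-optimal value: {mse_optimal:.3f}")
print(f"share of data near that value: {near_mean:.4f}")
print(f"share of data near a mode: {near_modes:.4f}")
```

An imputed value near zero has a small average error here, yet it is implausible under the data distribution, which is the motivation for returning a probabilistic estimate instead of a single point.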

Submitted to arXiv on 29 Jun. 2023

Results of the summarizing process for the arXiv paper: 2306.16906v1

Numerical data imputation algorithms are commonly used to replace missing values in incomplete datasets. However, current imputation methods often struggle to accurately estimate missing values when dealing with multimodal or complex distributions, leading to poor imputation results. To address this issue, we propose a new data imputation method called the $k$NN$\times$KDE algorithm. This approach combines nearest neighbor estimation ($k$NN) with density estimation using Gaussian kernels (KDE). In order to evaluate the effectiveness of our method, we conducted experiments using both artificial and real-world datasets with different types and rates of missing data. Our results demonstrate that the $k$NN$\times$KDE algorithm is capable of handling complex original data structures and produces lower imputation errors compared to existing methods. Additionally, our approach provides probabilistic estimates with higher likelihoods than current techniques. To facilitate further research and application of our method, we have released the code as open-source on GitHub for the community to access and use (https://github.com/DeltaFloflo/knnxkde). In summary, our study introduces a novel data imputation method that combines $k$NN and KDE techniques. Through extensive experiments, we show that our approach outperforms existing methods in terms of accuracy and likelihood estimation. The availability of our open-source code enables researchers and practitioners to easily implement and apply the $k$NN$\times$KDE algorithm in their own work for improved outcomes.
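
The combination of the two steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `knn_kde_impute_sketch`, the Euclidean distance, the uniform weighting of neighbors, and the fixed `bandwidth` are all hypothetical choices, and the released code at the GitHub link above may differ on each of them. The sketch finds the nearest rows using the observed columns, then samples from a Gaussian kernel density built on those neighbors' values in the missing column.

```python
import numpy as np

def knn_kde_impute_sketch(X, row, col, k=10, bandwidth=0.1, n_draws=100, rng=None):
    """Draw plausible values for the missing entry X[row, col].

    Hypothetical helper for illustration only; NaN marks missing entries.
    """
    rng = np.random.default_rng() if rng is None else rng
    observed_cols = np.flatnonzero(~np.isnan(X[row]))

    # Donor rows must have the target column and the query's observed columns.
    needed = np.append(observed_cols, col)
    donors = np.flatnonzero(~np.isnan(X[:, needed]).any(axis=1))

    # kNN step: rank donors by Euclidean distance on the observed columns.
    diffs = X[np.ix_(donors, observed_cols)] - X[row, observed_cols]
    nearest = donors[np.argsort(np.linalg.norm(diffs, axis=1))[:k]]

    # KDE step: sampling from a Gaussian KDE amounts to picking a neighbor
    # at random and adding Gaussian noise with the kernel bandwidth.
    centers = rng.choice(X[nearest, col], size=n_draws)
    return centers + bandwidth * rng.standard_normal(n_draws)

# Toy data (assumed): the second column is bimodal around -1 and +1.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=500)
y = rng.choice([-1.0, 1.0], size=500) + 0.05 * rng.standard_normal(500)
X = np.column_stack([x, y])
X[0, 1] = np.nan  # hide one value to impute

draws = knn_kde_impute_sketch(X, row=0, col=1, k=20, bandwidth=0.05,
                              n_draws=200, rng=rng)
print(np.round(np.sort(draws)[[0, -1]], 2))  # smallest and largest draw
```

Because the output is a set of draws and not one number, both modes of the toy data stay represented, whereas a single averaged estimate would sit between them.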
Created on 30 Jun. 2023
