Numerical Data Imputation for Multimodal Data Sets: A Probabilistic Nearest-Neighbor Kernel Density Approach
AI-generated Key Points
- Numerical data imputation algorithms are commonly used to replace missing values in incomplete datasets.
- Current imputation methods minimize the error between the imputed values and the unobserved ground truth, which can create artifacts and lead to poor imputation for multimodal or complex distributions.
- The $k$NN$\times$KDE algorithm is proposed as a new data imputation method that combines nearest neighbor estimation ($k$NN) with density estimation using Gaussian kernels (KDE).
- Experiments were conducted using artificial and real-world datasets with different types and rates of missing data to evaluate the effectiveness of the $k$NN$\times$KDE algorithm.
- Results demonstrate that the $k$NN$\times$KDE algorithm can handle complex original data structures and produces lower imputation errors compared to existing methods.
- The approach provides probabilistic estimates with higher likelihoods than current techniques.
- The code for the $k$NN$\times$KDE algorithm has been released as open-source on GitHub for easy access and use by the community (https://github.com/DeltaFloflo/knnxkde).
- The study introduces a novel data imputation method that combines $k$NN and KDE techniques.
- Extensive experiments show that the approach outperforms existing methods in terms of accuracy and likelihood estimation.
- Researchers and practitioners can apply the $k$NN$\times$KDE algorithm in their own work using the released code.
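The point about multimodal distributions can be illustrated with a small sketch. The data below is a hypothetical two-mode variable invented for illustration, not data from the paper. It shows that the single value minimizing squared error (the mean) can fall in a region where almost no real observations lie, whereas a value drawn from the observed distribution stays on one of the modes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical bimodal variable: two well-separated Gaussian modes at -2 and +2.
samples = np.concatenate([
    rng.normal(-2.0, 0.3, size=500),
    rng.normal(+2.0, 0.3, size=500),
])

# The constant that minimizes mean squared error is the sample mean,
# which here sits between the two modes.
mean_estimate = samples.mean()

# Fraction of observed values lying within 0.5 of that estimate.
near_mean = np.mean(np.abs(samples - mean_estimate) < 0.5)

# Drawing from the empirical distribution keeps the imputed value on a mode.
sampled_estimate = rng.choice(samples)

print(f"mean estimate: {mean_estimate:.3f}")
print(f"fraction of data near the mean: {near_mean:.3f}")
print(f"sampled estimate: {sampled_estimate:.3f}")
```

This is only a one-dimensional caricature of the problem; it does not reproduce any experiment from the study.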
Authors: Florian Lalande, Kenji Doya
Abstract: Numerical data imputation algorithms replace missing values with estimates so that incomplete data sets can still be used. Current imputation methods seek to minimize the error between the unobserved ground truth and the imputed values, but this strategy can create artifacts that lead to poor imputation in the presence of multimodal or complex distributions. To tackle this problem, we introduce the $k$NN$\times$KDE algorithm: a data imputation method combining nearest neighbor estimation ($k$NN) and density estimation with Gaussian kernels (KDE). We compare our method with previous data imputation methods using artificial and real-world data with different missing-data scenarios and various missing-data rates, and show that our method can cope with complex original data structure, yields lower data imputation errors, and provides probabilistic estimates with higher likelihood than current methods. We release the code as open source for the community: https://github.com/DeltaFloflo/knnxkde
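The general idea of combining nearest neighbors with a Gaussian kernel density estimate can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' released implementation: the function name `knn_kde_impute_samples`, its parameters (`k`, `bandwidth`, `n_draws`), the use of a hard cutoff at `k` neighbors, and the toy data are all hypothetical choices made here. The sketch finds the rows closest to an incomplete row on its observed columns, then draws candidate values from a Gaussian KDE centered on those neighbors' values in the missing column.

```python
import numpy as np

def knn_kde_impute_samples(data, row, col, k=10, bandwidth=0.1, n_draws=100, rng=None):
    """Draw candidate values for data[row, col] from a KDE over its k nearest neighbors."""
    rng = np.random.default_rng() if rng is None else rng
    observed = ~np.isnan(data[row])  # columns observed in the target row
    # Donor rows have the target column and all of those observed columns present.
    donors = ~np.isnan(data[:, col]) & ~np.isnan(data[:, observed]).any(axis=1)
    donors[row] = False
    donor_idx = np.flatnonzero(donors)
    # Euclidean distance in the subspace of observed columns.
    dists = np.linalg.norm(data[donor_idx][:, observed] - data[row, observed], axis=1)
    nearest = donor_idx[np.argsort(dists)[:k]]
    centers = data[nearest, col]
    # Sampling from a Gaussian KDE: pick a center, then add Gaussian noise.
    picks = rng.choice(centers, size=n_draws)
    return picks + rng.normal(0.0, bandwidth, size=n_draws)

# Toy two-branch data set (hypothetical): y is either +(1 + x) or -(1 + x).
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=400)
sign = rng.choice([-1.0, 1.0], size=400)
y = sign * (1.0 + x) + rng.normal(0.0, 0.05, size=400)
data = np.column_stack([x, y])
data[0, 1] = np.nan  # hide one value of y

draws = knn_kde_impute_samples(data, row=0, col=1, k=20, bandwidth=0.05, rng=rng)
print("share of draws on the positive branch:", np.mean(draws > 0))
```

Because the output is a set of draws rather than a single number, both branches of the toy data are represented, and no draw lands in the empty region between them. A single point estimate can still be derived from the draws afterwards if one is needed.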