Generating High-fidelity, Synthetic Time Series Datasets with DoppelGANger
AI-generated Key Points
⚠ The license of the paper does not allow us to build upon its content, so the key points are generated from the paper metadata rather than the full article.
- Limited data access is a major challenge in data-driven networking research and development
- Privacy concerns often impede the sharing of confidential information within organizations and with external stakeholders
- Synthetic data models have had limited success due to their narrow scope
- DoppelGANger is a synthetic data generation framework based on generative adversarial networks (GANs)
- DoppelGANger is designed for time series datasets with both continuous and discrete features
- DoppelGANger employs a new conditional architecture that separates metadata generation from time series generation
- DoppelGANger achieves up to 43% better fidelity compared to baseline models
- DoppelGANger captures structural properties of the data that baseline methods are unable to learn
- DoppelGANger provides an easy mechanism for data holders to protect attributes of their data without significant loss of utility
- This research presents a novel approach for generating high-fidelity synthetic time series datasets using GANs, addressing limitations of existing models and offering promising potential for overcoming barriers related to limited data access in networking research and development.
Authors: Zinan Lin, Alankar Jain, Chen Wang, Giulia Fanti, Vyas Sekar
Abstract: Limited data access is a substantial barrier to data-driven networking research and development. Although many organizations are motivated to share data, privacy concerns often prevent the sharing of proprietary data, including between teams in the same organization and with outside stakeholders (e.g., researchers, vendors). Many researchers have therefore proposed synthetic data models, most of which have not gained traction because of their narrow scope. In this work, we present DoppelGANger, a synthetic data generation framework based on generative adversarial networks (GANs). DoppelGANger is designed to work on time series datasets with both continuous features (e.g., traffic measurements) and discrete ones (e.g., protocol name). Modeling time series and mixed-type data is known to be difficult; DoppelGANger circumvents these problems through a new conditional architecture that isolates the generation of metadata from time series, but uses metadata to strongly influence time series generation. We demonstrate the efficacy of DoppelGANger on three real-world datasets. We show that DoppelGANger achieves up to 43% better fidelity than baseline models, and captures structural properties of data that baseline methods are unable to learn. Additionally, it gives data holders an easy mechanism for protecting attributes of their data without substantial loss of data utility.
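The two-stage idea in the abstract (generate metadata on its own, then generate the time series conditioned on that metadata) can be illustrated with a minimal sketch. This is not the authors' implementation and it contains no GAN: simple random samplers stand in for the learned generators, and every name and number here (`PROTOCOLS`, `PROTOCOL_WEIGHTS`, `BASE_RATE`, `generate_metadata`, `generate_series`, `generate_sample`) is a hypothetical chosen only for illustration.

```python
import random

# Hypothetical metadata values and per-protocol traffic levels (illustrative only,
# not taken from the paper or its datasets).
PROTOCOLS = ["TCP", "UDP", "ICMP"]
PROTOCOL_WEIGHTS = [0.6, 0.3, 0.1]
BASE_RATE = {"TCP": 100.0, "UDP": 40.0, "ICMP": 5.0}


def generate_metadata(rng):
    """Stage 1: sample discrete metadata without looking at any time series."""
    return rng.choices(PROTOCOLS, weights=PROTOCOL_WEIGHTS, k=1)[0]


def generate_series(protocol, length, rng):
    """Stage 2: generate continuous measurements conditioned on the metadata."""
    level = BASE_RATE[protocol]
    value = level
    series = []
    for _ in range(length):
        # Each step depends on the previous value and is pulled toward a
        # metadata-dependent level, with noise scaled to that level.
        value = 0.8 * value + 0.2 * level + rng.gauss(0.0, 0.05 * level)
        series.append(max(value, 0.0))
    return series


def generate_sample(length, rng):
    """Combine both stages: the metadata is drawn first, then drives the series."""
    protocol = generate_metadata(rng)
    return {"protocol": protocol, "traffic": generate_series(protocol, length, rng)}


rng = random.Random(0)
samples = [generate_sample(24, rng) for _ in range(5)]
for sample in samples:
    print(sample["protocol"], [round(v, 1) for v in sample["traffic"][:3]])
```

Because the metadata sampler is separate from the series generator, the distribution over metadata could in principle be changed without touching the second stage. Whether DoppelGANger's attribute-protection mechanism works this way is not stated in the abstract, so treat that only as a property of this sketch.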