Generating High-fidelity, Synthetic Time Series Datasets with DoppelGANger

AI-generated keywords: DoppelGANger, GANs, Synthetic Data, Time Series, Dataset, Fidelity

AI-generated Key Points

The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

  • Limited data access is a major challenge in data-driven networking research and development
  • Privacy concerns often impede the sharing of confidential information within organizations and with external stakeholders
  • Synthetic data models have had limited success due to their narrow scope
  • DoppelGANger is a synthetic data generation framework based on generative adversarial networks (GANs)
  • DoppelGANger is designed for time series datasets with both continuous and discrete features
  • DoppelGANger employs a new conditional architecture that separates metadata generation from time series generation
  • DoppelGANger achieves up to 43% better fidelity compared to baseline models
  • DoppelGANger captures structural properties of the data that baseline methods are unable to learn
  • DoppelGANger provides an easy mechanism for data holders to protect attributes of their data without significant loss of utility
  • This research presents a new approach for generating high-fidelity synthetic time series datasets with GANs. It addresses the narrow scope of earlier synthetic data models and could help lower the data-access barriers that limit networking research and development.
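
DoppelGANger targets records that mix continuous features (e.g., traffic measurements) with discrete ones (e.g., protocol name). A common way to prepare such records for a neural generator is to one-hot encode the categorical fields and scale the numeric fields into a fixed range. The sketch below shows only that generic preprocessing step, under those assumptions. The field names (`protocol`, `traffic`), the protocol vocabulary, and the helper functions are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence

# Hypothetical vocabulary for a discrete feature (e.g., protocol name).
PROTOCOLS = ["TCP", "UDP", "ICMP"]


def one_hot(value: str, vocab: Sequence[str]) -> List[float]:
    """Encode a categorical value as a one-hot vector over `vocab`."""
    if value not in vocab:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if v == value else 0.0 for v in vocab]


def min_max_scale(series: Sequence[float]) -> List[float]:
    """Scale a continuous series (e.g., traffic measurements) into [0, 1]."""
    lo, hi = min(series), max(series)
    if hi == lo:
        # A constant series carries no range information; map it to zeros.
        return [0.0 for _ in series]
    return [(x - lo) / (hi - lo) for x in series]


def encode_record(record: Dict) -> Dict[str, List[float]]:
    """Split one hypothetical record into encoded metadata and an encoded series."""
    return {
        "metadata": one_hot(record["protocol"], PROTOCOLS),
        "series": min_max_scale(record["traffic"]),
    }


if __name__ == "__main__":
    example = {"protocol": "UDP", "traffic": [10.0, 30.0, 20.0, 50.0]}
    print(encode_record(example))
    # {'metadata': [0.0, 1.0, 0.0], 'series': [0.0, 0.5, 0.25, 1.0]}
```

Min-max scaling is used here only because it is simple; a real pipeline would also need to store each series' minimum and maximum so that generated values can be mapped back to their original units.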

Authors: Zinan Lin, Alankar Jain, Chen Wang, Giulia Fanti, Vyas Sekar

28 pages, 35 figures

Abstract: Limited data access is a substantial barrier to data-driven networking research and development. Although many organizations are motivated to share data, privacy concerns often prevent the sharing of proprietary data, including between teams in the same organization and with outside stakeholders (e.g., researchers, vendors). Many researchers have therefore proposed synthetic data models, most of which have not gained traction because of their narrow scope. In this work, we present DoppelGANger, a synthetic data generation framework based on generative adversarial networks (GANs). DoppelGANger is designed to work on time series datasets with both continuous features (e.g., traffic measurements) and discrete ones (e.g., protocol name). Modeling time series and mixed-type data is known to be difficult; DoppelGANger circumvents these problems through a new conditional architecture that isolates the generation of metadata from time series, but uses metadata to strongly influence time series generation. We demonstrate the efficacy of DoppelGANger on three real-world datasets. We show that DoppelGANger achieves up to 43% better fidelity than baseline models, and captures structural properties of data that baseline methods are unable to learn. Additionally, it gives data holders an easy mechanism for protecting attributes of their data without substantial loss of data utility.
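
The abstract's central design idea is to generate metadata on its own and then generate the time series conditioned on that metadata, which amounts to sampling from P(metadata) and then from P(series | metadata). The sketch below illustrates only that two-stage sampling structure. It assumes toy stand-ins for the two generators: `sample_metadata` and `sample_series` are hypothetical hand-written functions, not trained GAN generators, and the per-protocol traffic levels in `BASE_LEVEL` are invented for illustration.

```python
import random
from typing import List, Tuple

PROTOCOLS = ["TCP", "UDP", "ICMP"]
# Hypothetical per-protocol base traffic levels; a trained model would learn these.
BASE_LEVEL = {"TCP": 100.0, "UDP": 40.0, "ICMP": 5.0}


def sample_metadata(rng: random.Random) -> str:
    """Stage 1: draw the metadata (here, just a protocol name) on its own."""
    return rng.choice(PROTOCOLS)


def sample_series(protocol: str, length: int, rng: random.Random) -> List[float]:
    """Stage 2: draw a time series whose values depend on the sampled metadata."""
    base = BASE_LEVEL[protocol]
    # Noise is proportional to the base level, so each protocol keeps its own scale;
    # values are clipped at zero because traffic cannot be negative.
    return [max(0.0, base + rng.gauss(0.0, 0.1 * base)) for _ in range(length)]


def sample_record(length: int, rng: random.Random) -> Tuple[str, List[float]]:
    """Joint sample: metadata first, then the series conditioned on it."""
    protocol = sample_metadata(rng)
    return protocol, sample_series(protocol, length, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        proto, series = sample_record(length=5, rng=rng)
        print(proto, [round(x, 1) for x in series])
```

Because the series generator receives the metadata as input, every generated series is consistent with its own metadata (an ICMP record never gets TCP-scale traffic), which is the kind of coupling the conditional architecture is meant to preserve.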

Submitted to arXiv on 30 Sep. 2019

Summary of the arXiv paper 1909.13403v1

Limited data access is a major challenge in data-driven networking research and development. Although organizations are often motivated to share data, privacy concerns frequently prevent them from sharing proprietary information, both between teams within the same organization and with external stakeholders such as researchers and vendors. Researchers have proposed synthetic data models to address this issue, but most have gained little traction because of their narrow scope. In this study, the authors introduce DoppelGANger, a synthetic data generation framework based on generative adversarial networks (GANs). DoppelGANger is designed for time series datasets that contain both continuous features (e.g., traffic measurements) and discrete features (e.g., protocol name). Modeling time series and mixed-type data is known to be difficult, and DoppelGANger addresses this with a new conditional architecture that generates metadata separately from the time series while letting the metadata strongly influence how the time series is generated. The authors evaluate DoppelGANger on three real-world datasets. The results show that it achieves up to 43% better fidelity than baseline models and captures structural properties of the data that the baselines are unable to learn. DoppelGANger also gives data holders an easy mechanism for protecting attributes of their data without substantial loss of utility. Overall, the work offers a new approach to generating high-fidelity synthetic time series with GANs that addresses limitations of earlier models and could help reduce the data-access barriers facing networking research and development.
Created on 09 Aug. 2023
