Emergent Denoising of SDSS Galaxy Spectra Through Unsupervised Deep Learning

Authors: Oliver Camilleri, Zahra Sharbaf, Ignacio Ferreras

Abstract: Spectroscopy represents the ideal observational method to maximally extract information from galaxies regarding their star formation and chemical enrichment histories. However, absorption spectra of galaxies prove rather challenging at high redshift or in low mass galaxies, due to the need to spread the photons into a relatively large set of spectral bins. For this reason, the data from many state-of-the-art spectroscopic surveys suffer from low signal-to-noise (S/N) ratios, and prevent accurate estimates of the stellar population parameters. In this paper, we tackle the issue of denoising an ensemble by the use of unsupervised Deep Learning techniques trained on a homogeneous sample of spectra over a wide range of S/N. These methods reconstruct spectra at a higher S/N and allow us to investigate the potential for Deep Learning to faithfully reproduce spectra from incomplete data. Our methodology is tested on three key line strengths and is compared with synthetic data to assess retrieval biases. The results suggest a standard Autoencoder as a very powerful method that does not introduce systematics in the reconstruction. We also note in this work how careful the analysis needs to be, as other methods can -- on a quick check -- produce spectra that appear noiseless but are in fact strongly biased towards a simple overfitting of the noisy input. Denoising methods with minimal bias will maximise the quality of ongoing and future spectral surveys such as DESI, WEAVE, or WAVES.

Submitted to arXiv on 09 Oct. 2025
