Thesis defense: Reconstruction of the Large-Scale Structure of the Universe using photometric redshifts and machine learning techniques

Date

Start time

09:00

Location

Aud. Prof. Paulo Benevides Soares – IAG/USP (Rua do Matão, 1226 - Cidade Universitária)

Thesis defense
Student: Erik Vinicius Rodrigues de Lima
Program: Astronomy
Title: "Reconstruction of the Large-Scale Structure of the Universe using photometric redshifts and machine learning techniques"
Advisor: Prof. Dr. Laerte Sodré Junior

Judging Committee

Committee Chair: Prof. Dr. Laerte Sodré Junior - IAG/USP

  1. Profa. Dra. Cláudia Lucia Mendes de Oliveira - IAG/USP
  2. Prof. Dr. Gastão Cesar Bierrenbach Lima Neto - IAG/USP
  3. Dr. Clécio Roque de Bom – CBPF (by videoconference)
  4. Prof. Dr. Valerio Marra - UFES (by videoconference)
  5. Dr. Rafael Duarte Coelho dos Santos – INPE (by videoconference)


Abstract

To fully understand our Universe, we need to characterize and study the structures that form it. These structures are composed of galaxy clusters and filaments, voids, and walls, distributed in space in a web-like pattern that forms what we call the Cosmic Web. Since our observations of the sky offer only a two-dimensional view of the Universe, we need to estimate the distances to celestial objects in order to obtain a full, comprehensive, three-dimensional picture of the large-scale structure (LSS) that surrounds us. Current all-sky surveys are based on photometry, in which the light of an object is observed through a number of filters, each covering a specific wavelength range. This photometric information can be leveraged to obtain a faster and more efficient, although less precise, redshift measurement, called the photometric redshift (photo-z).

In this work, we developed a machine learning algorithm, based on a Bayesian Mixture Density Network (BMDN) architecture, to estimate photo-zs and their probability density functions (PDFs) for the Southern Photometric Local Universe Survey (SPLUS). SPLUS is a survey focused on the Southern Hemisphere; its latest, fifth internal data release covers over 4000 square degrees of the sky, with the goal of observing 9000 square degrees by the end of the project. The photometric information from this survey is complemented by photometry from the Galaxy Evolution Explorer (GALEX), the Vista Hemisphere Survey (VHS), and the unWISE project, based on Wide-field Infrared Survey Explorer (WISE) data, as well as morphological data from SPLUS. Combining these data provides broad wavelength coverage, from the ultraviolet to the mid-infrared.

Since our model is supervised, we need a training sample that contains the quantity we want to predict, the spectroscopic redshift. Because a larger dataset can lead to better generalization, we created what is possibly the largest publicly available spectroscopic redshift compilation in the Southern Hemisphere, with data from 1852 catalogs and over 8 million objects, of which 2.5 million are galaxies. Using these data, we train our models on magnitudes, colors, and morphological information, and choose the best aperture and network hyperparameters with a Bayesian optimization scheme using the Optuna package in Python. Through the analysis of several performance metrics, we verify that the trained model predicts accurate and precise photo-zs and provides well-calibrated PDFs. We also compared our results to those obtained by other methods (Random Forests, K-Nearest Neighbors, and Bayesian Automatic Relevance Determination Regression) trained on the same data, and to photometric redshifts from the Sloan Digital Sky Survey (SDSS) data release 18 and the DECam Local Volume Exploration Survey (DELVE) data release 2, for a sample of objects in the Stripe-82 region for which we also have estimates; the BMDN model performs better both for single-point estimates and for PDFs.
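To make the photo-z approach concrete, the sketch below shows a plain (non-Bayesian) mixture density network trained with a negative log-likelihood loss, with two of its hyperparameters tuned by an Optuna study. This is an illustrative assumption only, not the thesis code: synthetic random features stand in for the SPLUS, GALEX, VHS, and unWISE inputs, and the architecture, feature count, and search ranges are arbitrary choices.

# Minimal sketch (assumption): a plain Mixture Density Network that maps
# photometric features to a Gaussian-mixture redshift PDF, with hyperparameters
# tuned by Optuna on synthetic data. The thesis model is a Bayesian MDN with a
# richer input set; everything below is illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F
import optuna

class MDN(nn.Module):
    """Gaussian mixture density network: features -> (weights, means, widths)."""
    def __init__(self, n_features, n_components=5, n_hidden=128):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(n_features, n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, n_hidden), nn.ReLU(),
        )
        self.pi = nn.Linear(n_hidden, n_components)         # mixture weights (logits)
        self.mu = nn.Linear(n_hidden, n_components)         # component means
        self.log_sigma = nn.Linear(n_hidden, n_components)  # component log-widths

    def forward(self, x):
        h = self.body(x)
        return F.log_softmax(self.pi(h), dim=-1), self.mu(h), self.log_sigma(h)

def mdn_nll(log_pi, mu, log_sigma, z):
    """Negative log-likelihood of the spectroscopic redshift under the mixture PDF."""
    comp = torch.distributions.Normal(mu, log_sigma.exp())
    log_prob = comp.log_prob(z.unsqueeze(-1)) + log_pi
    return -torch.logsumexp(log_prob, dim=-1).mean()

# Synthetic stand-in for (magnitudes/colors, spec-z) pairs; the real training set
# would use the combined survey photometry and the spectroscopic compilation.
X = torch.randn(2048, 12)
z = torch.rand(2048)

def objective(trial):
    n_hidden = trial.suggest_int("n_hidden", 32, 256, log=True)
    n_components = trial.suggest_int("n_components", 3, 8)
    model = MDN(n_features=12, n_components=n_components, n_hidden=n_hidden)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(200):                       # short training loop for the sketch
        opt.zero_grad()
        loss = mdn_nll(*model(X[:1536]), z[:1536])
        loss.backward()
        opt.step()
    with torch.no_grad():                      # validation NLL drives the search
        return mdn_nll(*model(X[1536:]), z[1536:]).item()

study = optuna.create_study(direction="minimize", sampler=optuna.samplers.TPESampler())
study.optimize(objective, n_trials=20)
print(study.best_params)

In this kind of setup the validation log-likelihood is the quantity that Optuna's sampler minimizes; in the thesis the photometric aperture is treated as a searchable choice alongside the network hyperparameters.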
Using these accurate photo-zs as a starting point, and under the assumption that our estimates correspond to noisier versions of the spectroscopic redshifts, we develop further neural network models with the aim of recovering the LSS of the Universe, as seen with spectroscopic data, using photometric information only. For this task we choose models from the Autoencoder family, commonly used for dimensionality and noise reduction; Denoising Diffusion Probabilistic Models (DDPMs), which have become the state of the art for noise removal in images; and Graph Neural Networks (GNNs), which can leverage spatial information and the connections between samples to obtain more precise estimates. This part of the work is currently in progress. For future work, we plan to continue developing our LSS-recovery models by implementing a template-fitting step in the training process, effectively making it a "hybrid" model that takes advantage of the higher precision of machine learning and the better generalization capacity of template fitting, and by using the magnitude errors as inputs to the network, so that it is trained on magnitude distributions instead of single values. We also intend to provide our code to the community in a customizable and easy-to-use form.
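Since the LSS-recovery step is still in progress and the abstract does not fix a specific architecture, the following is only a hedged sketch of the autoencoder-family idea: a small 3D convolutional denoising autoencoder trained to map a density field gridded from photo-z positions onto the corresponding field gridded from spectroscopic redshifts. All shapes, layer choices, and the toy random fields are assumptions for illustration, not the thesis implementation.

# Minimal sketch (assumption): a 3D denoising autoencoder that learns to turn a
# photo-z-based density field into its spectroscopic-redshift counterpart.
import torch
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 16, 3, stride=2, padding=1), nn.ReLU(),   # 32^3 -> 16^3
            nn.Conv3d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # 16^3 -> 8^3
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(32, 16, 4, stride=2, padding=1), nn.ReLU(),  # 8^3 -> 16^3
            nn.ConvTranspose3d(16, 1, 4, stride=2, padding=1),              # 16^3 -> 32^3
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Toy stand-in: the "photo-z" fields are the "spec-z" fields plus extra noise,
# mimicking the line-of-sight smearing introduced by photometric redshifts.
specz_field = torch.rand(8, 1, 32, 32, 32)
photoz_field = specz_field + 0.1 * torch.randn_like(specz_field)

model = DenoisingAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
for _ in range(100):
    opt.zero_grad()
    loss = loss_fn(model(photoz_field), specz_field)  # learn to undo the photo-z noise
    loss.backward()
    opt.step()

A DDPM or GNN variant would keep the same training target but replace the encoder/decoder pair with, respectively, a learned reverse-diffusion process over the field or message passing on a graph of neighbouring galaxies.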


Keywords: large-scale structure, distances and redshifts, galaxies: photometry, methods: data analysis, techniques: photometric, catalogs, surveys