The Loss Surface of Multilayer Networks
Auteurs : Anna Choromanska, Mikael Henaff, Michael Mathieu, Gerard Ben Arous, Yann LeCun
Résumé : We study the connection between the highly non-convex loss function of a simple model of the fully-connected feed-forward neural network and the Hamiltonian of the spherical spin-glass model under the assumptions of: i) variable independence, ii) redundancy in network parametrization, and iii) uniformity. These assumptions enable us to explain the complexity of the fully decoupled neural network through the prism of the results from the random matrix theory. We show that for large-size decoupled networks the lowest critical values of the random loss function are located in a well-defined narrow band lower-bounded by the global minimum. Furthermore, they form a layered structure. We show that the number of local minima outside the narrow band diminishes exponentially with the size of the network. We empirically demonstrate that the mathematical model exhibits similar behavior as the computer simulations, despite the presence of high dependencies in real networks. We conjecture that both simulated annealing and SGD converge to the band containing the largest number of critical points, and that all critical points found there are local minima and correspond to the same high learning quality measured by the test error. This emphasizes a major difference between large- and small-size networks where for the latter poor quality local minima have non-zero probability of being recovered. Simultaneously we prove that recovering the global minimum becomes harder as the network size increases and that it is in practice irrelevant as global minimum often leads to overfitting.
Explorez l'arbre d'article
Cliquez sur les nœuds de l'arborescence pour être redirigé vers un article donné et accéder à leurs résumés et assistant virtuel
Recherchez des articles similaires (en version bêta)
En cliquant sur le bouton ci-dessus, notre algorithme analysera tous les articles de notre base de données pour trouver le plus proche en fonction du contenu des articles complets et pas seulement des métadonnées. Veuillez noter que cela ne fonctionne que pour les articles pour lesquels nous avons généré des résumés et que vous pouvez le réexécuter de temps en temps pour obtenir un résultat plus précis pendant que notre base de données s'agrandit.