Deep learning should not work as well as it seems to: according to traditional statistics and machine learning, any analysis that has too many adjustable parameters will overfit noisy training data, and then fail when faced with novel test data. In clear violation of this principle, modern neural networks often use vastly more parameters than data points, but they nonetheless generalize to new data quite well.
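This counterintuitive behavior can be reproduced even in models far simpler than deep networks. The sketch below is illustrative rather than from the article (the model, sizes, and data are assumptions): it fits a "random features" regression with 100 times more parameters than training points. Gradient descent on such a model converges to the minimum-norm interpolant, which the pseudoinverse computes directly here, and despite fitting the noisy training data essentially exactly, the fit often predicts held-out points far better than classical parameter counting would suggest.

```python
# A minimal sketch, assuming a toy setup: n = 20 noisy samples of sin(x),
# fit by a model with p = 2000 random ReLU features (vastly more parameters
# than data points). The minimum-norm interpolant is what gradient descent
# from zero initialization converges to in this linear-in-parameters setting.
import numpy as np

rng = np.random.default_rng(0)

n, p = 20, 2000                       # training points vs. adjustable parameters
x_train = rng.uniform(-3, 3, n)
y_train = np.sin(x_train) + 0.1 * rng.normal(size=n)

# Fixed random ReLU features: phi(x)_j = max(0, x * w_j + b_j)
w = rng.normal(size=p)
b = rng.normal(size=p)

def features(x):
    return np.maximum(0.0, np.outer(x, w) + b)

# Minimum-norm parameter vector among all that fit the training data exactly
theta = np.linalg.pinv(features(x_train)) @ y_train

x_test = np.linspace(-3, 3, 200)
train_mse = np.mean((features(x_train) @ theta - y_train) ** 2)
test_mse = np.mean((features(x_test) @ theta - np.sin(x_test)) ** 2)
print(f"train MSE: {train_mse:.2e}   test MSE: {test_mse:.3f}")
```

Training error here is near zero, since with 2000 features the model can interpolate 20 points, yet the test error typically remains modest rather than blowing up as the classical picture would predict.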
The shaky theoretical basis for generalization has been noted for many years. One proposal was that neural networks implicitly perform some sort of regularization, a statistical technique that penalizes the use of extra parameters. Yet efforts to formally characterize such an "implicit bias" toward smoother solutions have failed, said Roi Livni, an advanced lecturer in the department of electrical engineering at Tel Aviv University in Israel. "It might be that it's like a needle in a haystack, and if we look further, in the end we will find it. But it also might be that the needle is not there."
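For concreteness, here is a hedged sketch of what explicit regularization looks like in the simplest setting, alongside the one case where an implicit bias has been pinned down: for linear models, gradient descent started from zero is known to converge to the minimum-norm solution, which is the limit of ridge regression as its penalty shrinks to zero. The data and sizes below are illustrative assumptions; the open question Livni describes is whether any analogous characterization holds for deep networks.

```python
# Sketch, assuming a toy overparameterized linear regression (p > n).
# Ridge regression adds an explicit penalty lam * ||theta||^2 that
# discourages large parameters; the minimum-norm interpolant is the
# implicit bias of gradient descent from zero init in this setting.
import numpy as np

rng = np.random.default_rng(1)
n, p = 20, 200
X = rng.normal(size=(n, p))
y = X[:, 0] + 0.1 * rng.normal(size=n)   # only the first feature matters

lam = 1e-2
# Explicit regularization: theta = (X^T X + lam * I)^{-1} X^T y
theta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
# Implicit bias of gradient descent from zero: minimum-norm interpolant
theta_minnorm = np.linalg.pinv(X) @ y

print("ridge solution norm:   ", np.linalg.norm(theta_ridge))
print("min-norm solution norm:", np.linalg.norm(theta_minnorm))
```

As the penalty lam goes to zero, the ridge solution approaches the minimum-norm one, which is why implicit bias is often described as regularization that no one wrote down.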