The following article by Zhang et al. is well known for highlighting that the widespread success of deep learning in artificial intelligence brings with it a fundamental new theoretical challenge, namely: why don't today's deep nets overfit the training data? This question has come to animate the theory of deep learning.
Let's understand this question in the context of supervised learning, where the machine's goal is to learn to assign labels to inputs (for example, to label cat pictures with "1" and dog pictures with "0"). Deep learning tackles this task by training a net on a suitably large training set of images that have been correctly labeled by humans. The net's parameters are randomly initialized and then adjusted over many steps via the simplest algorithm imaginable: gradient descent on a loss that measures the current difference between the desired outputs and the actual outputs.
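To make this concrete, here is a minimal sketch of such a training loop in PyTorch. The data, architecture, and hyperparameters are illustrative assumptions (random stand-in "images" with binary cat/dog labels), not taken from the article; the point is only to show random initialization followed by gradient descent on a desired-vs-actual loss.

```python
# A minimal sketch (assumed setup, not the article's experiment):
# a randomly initialized net adjusted by gradient descent on a loss
# measuring the gap between desired and actual outputs.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for a labeled training set: 256 random "images"
# (flattened to 32*32*3 features) with binary labels (1 = cat, 0 = dog).
inputs = torch.randn(256, 32 * 32 * 3)
labels = torch.randint(0, 2, (256, 1)).float()

# A small fully connected net; parameters start out random.
net = nn.Sequential(
    nn.Linear(32 * 32 * 3, 128),
    nn.ReLU(),
    nn.Linear(128, 1),
)

loss_fn = nn.BCEWithLogitsLoss()                        # desired-vs-actual mismatch
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)   # plain gradient descent

for step in range(100):
    optimizer.zero_grad()
    predictions = net(inputs)              # actual outputs
    loss = loss_fn(predictions, labels)    # difference from desired outputs
    loss.backward()                        # compute gradients
    optimizer.step()                       # adjust parameters downhill
```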