This is particularly galling because in everyday life, we humans generalize phenomenally well.
This behaviour is strange when contrasted with human learning.
Indeed, we could almost certainly get considerably better results by continuing to train past 400 epochs.

Of course, ten seconds isn't very long, but if you want to trial dozens of hyper-parameter choices it's annoying, and if you want to trial hundreds or thousands of choices it starts to get debilitating.
In fact, this is part of a more general strategy, which is to use the validation_data to evaluate different trial choices of hyper-parameters such as the number of epochs to train for, the learning rate, the best network architecture, and so on.
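The strategy of scoring each trial choice on validation_data can be sketched as a simple grid search. This is a minimal sketch, not code from the text: `select_hyperparameters`, the `evaluate` callback and `toy_evaluate` are hypothetical names, and grid search is only one assumed way of enumerating trials. The key point is that every trial is scored on validation data, never on test data.

```python
from itertools import product
from typing import Callable, Iterable, Optional, Tuple


def select_hyperparameters(
    evaluate: Callable[[float, int], float],
    learning_rates: Iterable[float],
    epoch_counts: Iterable[int],
) -> Tuple[Optional[Tuple[float, int]], float]:
    """Return the (learning_rate, epochs) pair with the highest validation score.

    `evaluate` is a hypothetical callback: it should train a fresh network with
    the given hyper-parameters and return its accuracy on validation_data.
    """
    best_params, best_score = None, float("-inf")
    # Try every combination of learning rate and epoch count.
    for eta, epochs in product(learning_rates, epoch_counts):
        score = evaluate(eta, epochs)
        if score > best_score:
            best_params, best_score = (eta, epochs), score
    return best_params, best_score


def toy_evaluate(eta: float, epochs: int) -> float:
    # Stand-in for real training (assumption): the score peaks at eta=0.5, epochs=30.
    return -abs(eta - 0.5) - abs(epochs - 30) / 100


best, score = select_hyperparameters(toy_evaluate, [0.1, 0.5, 1.0], [10, 30, 60])
print(best)  # (0.5, 30)
```

Because the number of trials is the product of the grid sizes, the cost grows quickly: a 10 × 10 grid at ten seconds per trial already takes about 1,000 seconds.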
For instance, let's look at the cost on the test data. We can see that the cost on the test data improves until around epoch 15, but after that it actually starts to get worse, even though the cost on the training data is continuing to improve.
The cost is the quadratic cost function, C, introduced back in an earlier chapter.
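As a reminder of what is being plotted, here is a minimal NumPy sketch of the quadratic cost, assuming its standard form C = (1/2n) Σ_x ‖y(x) − a‖², where n is the number of inputs, y(x) is the desired output and a is the network's output for input x. The function name `quadratic_cost` and the array layout (one row per input) are assumptions for illustration.

```python
import numpy as np


def quadratic_cost(outputs: np.ndarray, targets: np.ndarray) -> float:
    """Quadratic cost C = 1/(2n) * sum over x of ||y(x) - a||^2.

    outputs: shape (n, k), the network's output a for each of n inputs.
    targets: shape (n, k), the desired output y(x) for each input.
    """
    n = outputs.shape[0]
    # Squared Euclidean norm of the error, one value per input x.
    squared_norms = np.sum((targets - outputs) ** 2, axis=1)
    return float(np.sum(squared_norms) / (2 * n))


a = np.array([[0.8, 0.2], [0.1, 0.9]])  # illustrative network outputs
y = np.array([[1.0, 0.0], [0.0, 1.0]])  # illustrative desired outputs
print(quadratic_cost(a, y))  # approximately 0.025
```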
We're not actually going to use softmax layers in the remainder of the chapter, so if you're in a great hurry, you can skip to the next section.
In particular, our neuron is trying to compute the function x → y = y(x).
In other words, more training data can sometimes compensate for differences in the machine learning algorithm used.
Maybe we really need at least 100 hidden neurons?

So it's possible to adopt more or less aggressive strategies for early stopping.
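One way to make "more or less aggressive" concrete is a patience rule: stop once a given number of consecutive epochs pass without a new best validation accuracy. This is an assumed rule chosen for illustration, not necessarily the strategy the text goes on to use; `early_stopping_epoch`, `patience` and the accuracy values below are hypothetical.

```python
from typing import Sequence


def early_stopping_epoch(val_accuracies: Sequence[float], patience: int) -> int:
    """Return the 0-based epoch at which training would stop.

    Training stops once `patience` consecutive epochs pass without a new best
    validation accuracy. A small patience is aggressive (stops early); a large
    patience is conservative. If the rule never triggers, the last epoch is
    returned (or -1 for an empty history).
    """
    best = float("-inf")
    epochs_since_best = 0
    for epoch, acc in enumerate(val_accuracies):
        if acc > best:
            best, epochs_since_best = acc, 0
        else:
            epochs_since_best += 1
            if epochs_since_best >= patience:
                return epoch
    return len(val_accuracies) - 1


# Hypothetical per-epoch validation accuracies, for illustration only.
accs = [0.90, 0.92, 0.93, 0.92, 0.93, 0.91, 0.94, 0.93]
print(early_stopping_epoch(accs, patience=2))  # 4: aggressive, misses the later 0.94
print(early_stopping_epoch(accs, patience=5))  # 7: conservative, never triggers
```

The trade-off is visible even in this toy history: the aggressive setting saves epochs but stops before the later improvement, while the conservative setting finds it at the cost of more training.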
If we continued to use λ = 0.1, that would mean much less weight decay, and thus much less of a regularization effect.
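The arithmetic behind this can be sketched under the assumption that training uses the standard L2-regularized gradient-descent update w → (1 − ηλ/n)w − η ∂C₀/∂w, so each step shrinks the weights by the factor 1 − ηλ/n. With λ held fixed, a larger training set size n pushes that factor towards 1, i.e. less decay. The values η = 0.5 and n = 1,000 versus 50,000 are illustrative assumptions, as are the helper names.

```python
def weight_decay_factor(eta: float, lmbda: float, n: int) -> float:
    """Per-step weight shrink factor under L2 regularization (assumed update rule):
    w -> (1 - eta * lmbda / n) * w - eta * dC0/dw."""
    return 1.0 - eta * lmbda / n


def matching_lambda(lmbda: float, n_old: int, n_new: int) -> float:
    """Lambda that keeps eta * lambda / n unchanged when n grows from n_old to n_new."""
    return lmbda * n_new / n_old


eta, lmbda = 0.5, 0.1  # illustrative values
for n in (1_000, 50_000):
    # Closer to 1 means weaker decay: about 0.99995 vs 0.999999.
    print(n, weight_decay_factor(eta, lmbda, n))
print(matching_lambda(lmbda, 1_000, 50_000))  # about 5.0
```

Under these assumptions, keeping the same regularization effect on a 50× larger training set means scaling λ up by the same factor of 50.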