Discussion:
We discussed the universal approximation theorem, which I presented in L1 form. Some additional information: for Lp approximation on unbounded domains, the activation function needs to be unbounded; for bounded continuous functions on bounded domains, we can approximate uniformly with any continuous, non-constant activation function.
For more information about the proof, see Hornik's paper: https://doi.org/10.1016/0893-6080(91)90009-T.
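To make the bounded-domain case concrete, here is a minimal numerical sketch (my own illustration, not from the discussion): a one-hidden-layer network with a sigmoid activation is fitted to a continuous target on [-1, 1], using random hidden weights and a least-squares solve for the output weights, and the sup-norm error shrinks as the width grows. Note that the random-features shortcut is only for convenience; the theorem says nothing about how to train.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_one_hidden_layer(x, y, width, rng):
    # Fix random input weights/biases, then solve a least-squares
    # problem for the output weights only (illustration, not training).
    W = rng.normal(scale=5.0, size=(width,))
    b = rng.uniform(-5.0, 5.0, size=(width,))
    H = sigmoid(np.outer(x, W) + b)  # hidden activations, shape (n, width)
    c, *_ = np.linalg.lstsq(H, y, rcond=None)
    return W, b, c

def predict(x, W, b, c):
    return sigmoid(np.outer(x, W) + b) @ c

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 500)   # bounded domain
y = np.sin(3 * np.pi * x)         # continuous target function

for width in (5, 20, 100):
    W, b, c = fit_one_hidden_layer(x, y, width, rng)
    err = np.max(np.abs(predict(x, W, b, c) - y))
    print(f"width={width:4d}  sup-error={err:.4f}")
```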
We also discussed the conjugate gradient method for optimising a neural network. In 2006, Hager and Zhang developed a line-search method for nonlinear functions (a "nonlinear conjugate gradient scheme"); see https://doi.org/10.1145/1132973.1132979. It is implemented in TensorFlow.
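As a rough illustration of the idea (not the TensorFlow implementation, and simplified from the paper): the sketch below runs a nonlinear conjugate gradient loop with the Hager-Zhang direction update, but substitutes a plain Armijo backtracking line search for their approximate-Wolfe search, plus a steepest-descent restart as a safeguard.

```python
import numpy as np

def backtracking(f, x, d, g, alpha=1.0, rho=0.5, c1=1e-4):
    # Plain Armijo backtracking; Hager and Zhang instead use an
    # approximate-Wolfe line search, so this is a simplification.
    fx, gd = f(x), g @ d
    while f(x + alpha * d) > fx + c1 * alpha * gd and alpha > 1e-12:
        alpha *= rho
    return alpha

def nonlinear_cg(f, grad, x0, tol=1e-8, max_iter=1000):
    x, g = x0.astype(float), grad(x0)
    d = -g
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        alpha = backtracking(f, x, d, g)
        x_new = x + alpha * d
        g_new = grad(x_new)
        y = g_new - g
        dy = d @ y
        if abs(dy) > 1e-12:
            # Hager-Zhang direction update (CG_DESCENT, 2006).
            beta = (y - 2.0 * d * (y @ y) / dy) @ g_new / dy
            d = -g_new + beta * d
        if abs(dy) <= 1e-12 or g_new @ d >= 0.0:
            d = -g_new  # restart with steepest descent as a safeguard
        x, g = x_new, g_new
    return x

# Example: minimise the 2-D Rosenbrock function; the minimiser is (1, 1).
f = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
grad = lambda x: np.array([
    -2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
    200 * (x[1] - x[0]**2),
])
print(nonlinear_cg(f, grad, np.array([-1.2, 1.0])))
```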