Deep Learning: Physics, numerical experiment and study of neural network and deep learning

euler.bonjour@gmail.com / https://github.com/khj1977
金輝俊 / Hwi Jun KIM

Aim

  • Aim: To make mega-cities in Japan robust;
  • Objective: To build a better object detection algorithm for automatically controlled vehicles;
  • Methodology: Physical modelling, numerical experiment and study.

Experiments with photos, and possibly with video, will be done. However, experiments with an actual vehicle are out of scope.

Introduction

Deep learning has been a hot topic around the world in recent years. It is one kind of machine learning algorithm and also one of the techniques of AI. Deep learning is a kind of neural network. What is a neural network? What is the difference between deep learning and a neural network? In this article, the author's considerations will be shown.

Before going into the issues of deep learning, why is deep learning researched at all? Here is a citation from the author's real M.Phil thesis:

In recent years, the importance of automatically controlled vehicles has grown. What kinds of theories or techniques are used in automatically controlled vehicles? They include control engineering, machine learning and more. The author was a specialist in machine learning at a Japanese venture company as Chief IT Architect, a general-manager-level position. Also, as shown in that thesis, the author was a specialist in adaptive robust control. Moreover, the author was a PM in an IT (SI) company and head of an office which was a mixture of PM and PdM. From the triple viewpoint of machine learning, robust control and PM/PdM, there could still be problems for automatically controlled vehicles if they were used in every situation of the daily life of an ordinary person.

Why automatically controlled vehicles? It could be said that the current topology of train-based transportation of people in Tokyo is not robust but weak with respect to accidents. That topology is a star network composed of the Yamanote line and private railway lines radiating from major terminals. A solution would be a distributed Tokyo. In a distributed Tokyo, there are many small but dense and compact towns or cities. Inside a city, life can be lived with walking, bicycles and small automatically controlled EVs. The towns are spread over distributed Tokyo almost randomly. Inter-connection between towns or cities is done by automatically controlled buses, of which there are very many. If there are that many buses and other kinds of vehicles, they could not all be driven by humans; automatically controlled buses would be the better choice. Therefore, automatically controlled vehicles are the better choice as a strategy for Tokyo or any big region in Japan.

One application of neural networks is object recognition, especially for automatically controlled vehicles. It is a kind of supervised learning. A neural network consists of networks of neurons and outputs. There are several outputs, and each output expresses a class or kind of object. Using those outputs, object recognition or detection is done.

Mathematically speaking, a neural network is a graph of neurons. There is an equivalent matrix representation of a neural network. Then, learning and recognition are all about matrix calculation.
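The claim that learning and recognition reduce to matrix calculation can be sketched as follows. The shapes and random weights here are hypothetical, chosen only for illustration:

```python
import numpy as np

# A tiny 2-layer network expressed purely as matrix calculation.
# All shapes and weights below are hypothetical, for illustration.
rng = np.random.default_rng(0)
x = rng.random(4)          # input vector (4 features)
W1 = rng.random((4, 3))    # layer-1 weight matrix
b1 = rng.random(3)         # layer-1 bias
W2 = rng.random((3, 2))    # layer-2 weight matrix
b2 = rng.random(2)         # layer-2 bias

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

z1 = sigmoid(x @ W1 + b1)  # layer 1: matrix product plus bias
y = sigmoid(z1 @ W2 + b2)  # layer 2: output of the graph of neurons
print(y.shape)             # (2,)
```

Recognition is then reading off which output component is largest; learning is adjusting the entries of W1, W2, b1 and b2.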

The question here is: why can object detection be done using a neural network? What is a step function? What is a sigmoid function, and why is it used?

According to a textbook on the brain used by medical school students in Japan, the brain consists of 13 billion neurons, and a kind of electricity is passed from neuron to neuron. This is very similar to an analogue electrical circuit. Actually, the author thinks it is very similar to a high-order dynamical system.

If it were a dynamical system, each node or neuron would be a state holding a value. So, using a sigmoid function, it is possible to express firing, or the memory of a value. That is very similar to a bit in a CPU or RAM. It means the brain is very similar to a parallel-processing computer with CPU and RAM.
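This bit-like firing can be seen by comparing the hard step function with the sigmoid, its smooth analogue counterpart:

```python
import numpy as np

def step(a):
    # Hard digital "bit": a neuron either fires (1) or not (0).
    return (a > 0).astype(float)

def sigmoid(a):
    # Smooth, analogue version of the step: output between 0 and 1.
    return 1.0 / (1.0 + np.exp(-a))

a = np.array([-2.0, -0.1, 0.1, 2.0])
print(step(a))     # [0. 0. 1. 1.]
print(sigmoid(a))  # near 0 or 1 for large |a|, graded near zero
```

For large |a| the sigmoid saturates toward 0 or 1, so it behaves like a bit while remaining differentiable, which is what learning algorithms need.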

It means, as pointed out by a researcher, that the brain is a biological analogue parallel computer.

We use the eye and the brain to recognize objects. A neural network emulates the behaviour of the brain and the eye. So, using a neural network as an analogue of the brain, it is possible to recognize objects with supervised learning. That is the author's supposition.

The question is: why deep learning? The author thinks the only difference between deep learning and an ordinary neural network is the depth of its layers of neurons. Actually, the brain consists of 13 billion neurons. A classical neural network may not have used that many neurons, perhaps because of tractability of analysis and the lack of CPUs fast enough to calculate at an appropriate speed. Since deep learning gets closer to the eye and, especially, the brain, it works well. This is the author's supposition.

Note: In deep learning, there is a technique called the Recurrent Neural Network (RNN). The author has not investigated RNNs. However, the author supposes they have a good effect on the output, because the brain may naturally be a feedback-loop dynamical system. The author supposes an RNN may emulate the dynamics of the brain better than a feedforward neural network does. However, an application of a neural network need not model every aspect of the brain, so whether to use an RNN would depend on the situation.

Also, the brain would consist of inter-connected subsystems, possibly recursively inter-connected, with multiple feedback loops. The author supposes it would be better to use that idea in the modelling of neural networks or deep learning.

There is always physics behind mathematics. This is the author's first step into deep learning.

Analysis of mathematical properties of the MNIST network with a little physics

In this section, some small physical and mathematical properties of the MNIST neural network are analyzed with a program and some study of numerical results. This is a first step toward understanding what deep learning is. Physically speaking, as shown in the introduction, a neural network is an emulation of the brain or some wetware. But that is only a frame. So, in practice, what kind of network is it, and what kind of mathematical and physical model is it? That is discussed in this section.

Consider the following program from [DL]: deep-learning-from-scratch/ch03/neuralnet_mnist.py.

The author would like to express special thanks to the author of [DL]. The book describes neural networks with simple but sufficient logic. Also, it includes very good programs that help in understanding neural networks.

        import numpy as np

        # sigmoid and softmax as in [DL]'s common functions
        def sigmoid(a):
            return 1.0 / (1.0 + np.exp(-a))

        def softmax(a):
            exp_a = np.exp(a - np.max(a))  # shift for numerical stability
            return exp_a / np.sum(exp_a)

        def predict(network, x):
            W1, W2, W3 = network['W1'], network['W2'], network['W3']
            b1, b2, b3 = network['b1'], network['b2'], network['b3']
            a1 = np.dot(x, W1) + b1
            z1 = sigmoid(a1)
            a2 = np.dot(z1, W2) + b2
            z2 = sigmoid(a2)
            a3 = np.dot(z2, W3) + b3
            y = softmax(a3)
            return y
        
        print(W1.shape)    # (784, 50)
        print(W2.shape)    # (50, 100)
        print(W3.shape)    # (100, 10)
        print(b1.shape)    # (50,)
        print(b2.shape)    # (100,)
        print(b3.shape)    # (10,)
        print(x[0].shape)  # (784,)
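The forward pass above can be exercised end-to-end with a randomly initialized network of exactly these shapes; the random weights are a stand-in for the trained parameters, so the output is a 10-class probability vector but not a meaningful prediction:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def softmax(a):
    exp_a = np.exp(a - np.max(a))  # shift for numerical stability
    return exp_a / np.sum(exp_a)

def predict(network, x):
    # Same 3-layer forward pass as neuralnet_mnist.py in [DL].
    W1, W2, W3 = network['W1'], network['W2'], network['W3']
    b1, b2, b3 = network['b1'], network['b2'], network['b3']
    z1 = sigmoid(np.dot(x, W1) + b1)
    z2 = sigmoid(np.dot(z1, W2) + b2)
    return softmax(np.dot(z2, W3) + b3)

# Random stand-in for the trained weights, using the shapes shown above.
rng = np.random.default_rng(0)
network = {
    'W1': rng.normal(0, 0.1, (784, 50)),  'b1': np.zeros(50),
    'W2': rng.normal(0, 0.1, (50, 100)),  'b2': np.zeros(100),
    'W3': rng.normal(0, 0.1, (100, 10)),  'b3': np.zeros(10),
}
x = rng.random(784)            # one MNIST-sized input image, flattened
y = predict(network, x)
print(y.shape)                 # (10,)
print(np.isclose(y.sum(), 1))  # True: softmax output sums to 1
```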
        

Input: 784 dim to 50 dim, which would be the retina.
Mid: 50 dim to 100 dim, which would be the middle layer of the eye or the bottom layer of the eye.
Output: 100 dim to 10 dim, which would be the bottom of the eye or a simple brain.

This is basically the eye, or the eye plus a simple brain.

The number of receptor cells in the retina, or the resolution of the retina, is enough for this simple problem. Why is the bottom only 50 dim? For the eye, is recognition really only 50-dimensional? What is the eye? The author thinks 100 dim into the simple brain is appropriate. The brain is a kind of supercomputer, but it is wetware and each neuron is connected to the others, so the number of combinations would be enough.

The author supposes the reason why 50 dimensions suffice at the input of the bottom of the eye is similar to a spectrum, as an analogy. With an FFT, the dimension of time-series data is packed into the frequency domain, and if the signal is digital, the dimension needed in the frequency domain may be small enough. There could be a recognition system close to signal processing, or some kind of feature detection.
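The spectrum analogy can be checked numerically: an FFT packs a time series into the frequency domain, where for a simple signal only a few coefficients carry almost all of the energy. The signal below is hypothetical, chosen only to illustrate the packing:

```python
import numpy as np

# A 784-sample signal made of two sine waves plus small noise.
n = 784
t = np.arange(n)
rng = np.random.default_rng(0)
signal = np.sin(2 * np.pi * 5 * t / n) + 0.5 * np.sin(2 * np.pi * 12 * t / n)
signal += 0.01 * rng.standard_normal(n)

spectrum = np.abs(np.fft.rfft(signal)) ** 2   # power spectrum
total = spectrum.sum()
top = np.sort(spectrum)[::-1][:4].sum()       # energy of 4 largest coefficients
print(top / total)  # close to 1: a few coefficients describe the whole signal
```

A 784-dimensional time-domain signal is thus effectively described by a handful of frequency-domain numbers, which is the kind of dimension packing supposed above.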

The sigmoid emulates a bit, but an analogue one. Each combination or connection between neurons emulates some kind of processing or calculation. So the network is a combination of analogue RAM and CPU.

As a researcher mentioned to the author, a model of the eye is better as a 3-layer model. How about the simple brain? The author supposes it would be at least 2 layers. Also, there would be feedback between the last layer of the simple brain and the bottom of the eye, considering the mechanism of learning of detection. Then, a modified version of the MNIST network would be a 5-layer model with feedback from the last layer of the simple brain to the bottom of the eye.

Why is the W1 matrix, which holds the weights of each neuron, not sparse? Why is everything connected to everything else? That would be odd considering what a brain is. Is it because the number of neurons is small and there are no subsystems? Or just a mistake of modelling? A mistake in the author's inspection of the W1 matrix, since it is a little too large to inspect by hand? A mistake in the learning of the W matrix? Or is the author's supposition incorrect? It should be investigated later.
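The sparsity question can be checked directly on the weights. A minimal sketch follows; here a random matrix of the same shape stands in for the real W1, which in the actual check would be loaded from the trained network of [DL]:

```python
import numpy as np

# Stand-in for the learned W1 of shape (784, 50); in the real check,
# load the matrix from the trained sample network of [DL] instead.
rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.1, size=(784, 50))

threshold = 1e-3                    # weights below this count as "zero"
near_zero = np.abs(W1) < threshold
sparsity = near_zero.mean()         # fraction of near-zero entries
print(f"sparsity: {sparsity:.3f}")  # close to 0 => densely connected
```

A sparsity close to 0 means nearly every input pixel influences every hidden neuron, i.e. the fully connected structure questioned above.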

In the section on the Convolutional Neural Network (CNN) in book [DL], it is implied that in some situations a fully connected network is a problem. But the suppositions taken by this author and by the author of [DL] are different. Thus, the above question needs to be investigated.

Modelling of a neural network for number detection with a re-modelling of MNIST: a first step toward a model for object detection

Overall architecture

Note: Although the author is an applied mathematician, he is not a specialist of the brain. The following modelling is his supposition. However, as computer science, it is serious. If this were an article on the function of the brain, that would be a problem. But this is an article on neural networks, deep learning and computer science, so the author thinks there is no problem, because the modelling is for learning on a CPU, or for an application, not for simulation of the brain itself.

The author supposes that, for the object detection function, the eye is connected to a primitive type of brain, since every animal including humans has a brain. How about clustering of information? That could be done by a primitive animal. But meaning could be obtained only by humans, or possibly by monkeys. Thus, a primitive part and a high-level part are supposed to be connected. It is supposed that the eye is connected to the cerebellum, since they are close to each other. Here, the cerebellum is supposed to be the primitive part, and the cerebrum the high-level part. Therefore, the author would like to make a model as follows:

                   |<---C----|
        Eye->Cerebellum->Cerebrum
          |<--C--|
        

Hypothesis 1: A starting point for modelling a neural network for object detection

Analysis and direction of modelling

The following is the simple eye-and-brain part of the modelling for object detection. This is the first hypothesis. After some investigation, another hypothesis may be developed.

cornea-->misc-->retina-->Cerebellum
          

  |<-C-|<--C-|<--C--|<--C--|<---C---|
eye->edge->color->layer->object->Cerebrum
          

  |<----C-----|<---C--|<----C--|
Cerebellum->category->what->output
          

In the model above, each block would basically be implemented by convolution (CONV). However, the object, category and what blocks would be affine, and then passed to softmax. In book [DL], ReLU is used after the CONV layer. However, considering the physical side of the brain, sigmoid seems more natural to the author. Which is better will be investigated later. It is uncertain whether a pooling (POOL) layer should be introduced. The author understands what a POOL layer is, but is not sure it is required; that will be investigated later. Since it can be thought that the functions of the brain teach, or are connected to, each other, there are several feedbacks; i.e. feedback from object to layer, or layer to color, or even object to color. This will be investigated later.
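One block of the hypothesized pipeline, CONV followed by sigmoid instead of ReLU, can be sketched in plain NumPy. The input size, kernel size and kernel values below are hypothetical:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def conv2d(image, kernel):
    # Naive valid-mode 2D convolution (cross-correlation), enough for a sketch.
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# One CONV block of the pipeline: convolution, then sigmoid activation.
rng = np.random.default_rng(0)
image = rng.random((28, 28))      # e.g. one MNIST-sized input
kernel = rng.normal(size=(3, 3))  # a hypothetical learned filter
feature_map = sigmoid(conv2d(image, kernel))
print(feature_map.shape)          # (26, 26)
```

Swapping the sigmoid for ReLU in the last step gives the [DL] variant, so the two candidates can be compared within the same sketch.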

Is there no feedback from the cerebellum to the eye? What about the color bath effect? If one thinks about a blue shirt, one sees blue things a lot in town. Is that an effect of the eye and brain, or feedback from the brain to the eye? The author supposes not: it would be a psychological effect, and psychology is supposed to be about in-brain dynamics. Therefore, there is no feedback from the cerebellum to the eye in this model.

According to [DL], this model is similar to AlexNet. However, there is a difference: the feedback. It would be interesting to compare the two models.

Feedback would be close to calibration, by analogy with the control of a rocket. The feedback would be PID with a tensor and negative feedback. For instance, the input of the color block would be something whose color is not unknown but has some candidates. Then the feedforward block is applied, but its output is not 100% certain, so it is fed back to the input as in control engineering. This hypothesis is not the same as pure deep learning. However, as control engineering, it could be OK. Since the author comes from control engineering, this hypothesis is taken. It is interesting but a little odd, so it will be studied via analytical modelling and numerical experiment.

If there is a supervised signal, feedback is appropriate, since the task is close to a tracking problem. If it is a detection problem, it is not tracking but estimation. However, even so, feedback is required, since it would be natural for a brain. The PID tensor would be built in the process of supervised learning, and it would improve the characteristics of the neural network. Hence, it would give better estimation in the sense of the physics of the neural network.

What are the details of the PID tensor? The author supposes the problem is what the characteristic input and output of a CONV block of the neural network are. In book [DL], there are implications about edge detection or something like it for the CONV block, but the author thinks that is not enough. So, numerical experiments and some study of the characteristics are required to understand what the input and output of a CONV block are. Only then can the PID tensor feedback be designed.
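As a minimal sketch of the PID-feedback idea, a discrete-time PID law can correct a block's input elementwise until the block's output tracks a supervised signal. The block here is a toy elementwise sigmoid standing in for a real CONV block, and all gains, shapes and signals are hypothetical:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def block(u):
    # Toy feedforward block standing in for a CONV block.
    return sigmoid(u)

rng = np.random.default_rng(0)
target = 0.2 + 0.6 * rng.random(8)  # hypothetical supervised signal
u = rng.random(8)                   # initial input of the block

kp, ki, kd = 0.5, 0.05, 0.1         # hypothetical PID gains
integral = np.zeros(8)
prev_error = np.zeros(8)

for _ in range(50):
    error = target - block(u)       # negative feedback on the output
    integral += error
    derivative = error - prev_error
    u = u + kp * error + ki * integral + kd * derivative
    prev_error = error

print(np.max(np.abs(target - block(u))))  # small residual after feedback
```

The correction tensor here is a vector only because the toy block is small; for a real CONV block the same elementwise law would act on the full input tensor, which is the sense of "PID with tensor" above.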

If this were a thesis for a medical school, the brain itself would be investigated first, and then the model made. However, since the author is an applied mathematician and this model is for computer science and object detection, the author's hypothesis is used for the modelling.

Detailed modelling for numerical experiment

Numerical experiment and study

Environment of numerical experiment is as follows:
  • Programming language: Python (ver. 3.9.x)
  • Parallel processing framework: TensorFlow with the Metal plugin
  • CPU and GPU: Apple M1 2020

Bibliography

  • [DL] Deep Learning from Scratch (in Japanese), O'Reilly Japan, Kouki Saito
  • [NC] Neural Control Engineering, The MIT Press, Steven J. Schiff
  • [DY] Book of structure of brain and dopamine hypothesis, section of psychotropic medicine and function of brain (in Japanese)
  • [YD] Yet another book of dopamine hypothesis, chapter 10: mental illness and schizophrenia (in Japanese)
  • [PA] Proactive behaviour of dopamine neurons and function of control of activity, Ichinose et al (in Japanese)
  • [WK] Wikipedia, https://ja.wikipedia.org/wiki/ (in Japanese)
  • [SR] The Real M.Phil Thesis | The Mathematical Foundation of Smart Material Systems / Yet Another Mori-Tanaka, Coventry University, Hwi Jun KIM
  • [WB] What is a brain? (in Japanese), Newton Press, Newton
  • [AE] Anatomy of eye (in Japanese), Newton Press, Newton
  • [DS] Object Detection With Deep Learning: A Review, IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 30, NO. 11, NOVEMBER 2019, Zhong-Qiu Zhao et al