Showing posts from December, 2017

Data and information multiplexing and demultiplexing with the WHT

If you have a neural network with, say, 128 outputs and you need only 10 of them, you can apply the Walsh Hadamard transform (WHT) to the output of the net and then use any 10 elements. During training the 10 error terms will be spread in orthogonal patterns over all 128 output neurons by the inverse WHT (which is usually the same as the forward WHT). Even where you want all 128 outputs, there could be cases where some outputs are required to carry more information than the others. In that case it could also make sense to apply a WHT, as the complexity of the output will go where it is needed.
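Here is a minimal sketch of the idea in plain Python, assuming a normalized (1/sqrt(n)) fast WHT so that the transform is orthonormal and its own inverse. The 128 "network outputs", the choice of the first 10 mixed elements and the target value of 0.5 are all hypothetical stand-ins; the point is only to show the error on 10 outputs being mapped back onto the 128 raw neurons by the same transform.

```python
import math

def fwht(x):
    """Normalized fast Walsh-Hadamard transform.

    The length must be a power of 2. With the 1/sqrt(n) scaling the
    transform is orthonormal and symmetric, so applying it twice gives
    back the original vector (it is its own inverse)."""
    n = len(x)
    if n == 0 or n & (n - 1) != 0:
        raise ValueError("length must be a power of 2")
    y = list(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in y]

# Hypothetical raw output of a net with 128 output neurons.
raw_out = [math.sin(0.1 * i) for i in range(128)]

# Forward: mix all 128 outputs, then use only 10 of the mixed values.
mixed = fwht(raw_out)
used = mixed[:10]

# Backward: error terms exist only for the 10 used values (hypothetical target 0.5).
err_mixed = [0.0] * 128
for k in range(10):
    err_mixed[k] = 0.5 - used[k]

# The inverse WHT (identical to the forward one here) spreads those
# 10 error terms in orthogonal patterns over the raw output neurons.
err_raw = fwht(err_mixed)
```

Because the transform is orthonormal, the total squared error is the same before and after mapping back; it is just distributed over many neurons instead of sitting on 10 of them.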
You can also use WHT-based random projections to achieve the same thing.
There are application-dependent questions of computational efficiency to consider, and compared with more expensive options you could possibly introduce some Gaussian noise terms.
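A WHT-based random projection can be sketched as a random sign flip of each element followed by a WHT, repeated a couple of times. This is one common construction, assumed here for illustration rather than taken from the post; the function name `wht_random_projection`, the `rounds` parameter and the seed are hypothetical. Each round is orthogonal, so the vector's length is preserved while its content is scattered across every element, and you can then read off any subset you need.

```python
import math
import random

def fwht(x):
    """Normalized fast Walsh-Hadamard transform (length must be a power of 2)."""
    n = len(x)
    if n == 0 or n & (n - 1) != 0:
        raise ValueError("length must be a power of 2")
    y = list(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in y]

def wht_random_projection(x, seed=0, rounds=2):
    """Hypothetical WHT-based random projection.

    Each round multiplies every element by a random +1/-1 (a fixed
    pattern per seed) and then applies the normalized WHT. Both steps
    are orthogonal, so the Euclidean length of x is preserved."""
    rng = random.Random(seed)
    n = len(x)
    sign_patterns = [[rng.choice((-1.0, 1.0)) for _ in range(n)] for _ in range(rounds)]
    y = list(x)
    for signs in sign_patterns:
        y = fwht([s * v for s, v in zip(signs, y)])
    return y

# Usage: project 128 hypothetical outputs and keep any 10 of them.
outputs = [math.cos(0.3 * i) for i in range(128)]
selected = wht_random_projection(outputs, seed=42)[:10]
```

The cost is O(n log n) per round instead of the O(n^2) of multiplying by a dense random matrix, which is where the efficiency trade-off mentioned above comes in.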
For evolution-based neural network training it seems like a good idea though, because it allows automatic focusing of resources on important …

A few thoughts about neural networks.

There is real information loss caused by the non-linear activation functions in neural networks. With no activation function (a linear network) there is no information loss, and a deep linear network is equivalent to a shallow one, apart, that is, from numeric rounding errors, which can compound over a number of layers.
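The equivalence of deep and shallow linear networks is easy to check numerically: two stacked weight matrices with no activation between them act exactly like their product, up to floating-point rounding. The tiny 2x2 weights and input below are made-up values for illustration, and inserting a tanh between the layers shows that the collapse no longer holds.

```python
import math

def matmul(a, b):
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def matvec(m, v):
    """Plain-Python matrix-vector product m @ v."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

# Hypothetical weights for a two-layer linear network and an input vector.
w1 = [[1.0, 2.0], [0.5, -1.0]]
w2 = [[0.0, 1.0], [3.0, 1.0]]
x = [0.3, -0.7]

deep = matvec(w2, matvec(w1, x))   # two linear layers, no activation
w = matmul(w2, w1)                 # the single equivalent layer
shallow = matvec(w, x)             # same result, up to rounding

# With a non-linearity between the layers, no single matrix reproduces the map.
nonlinear = matvec(w2, [math.tanh(v) for v in matvec(w1, x)])
```

Here both `deep` and `shallow` come out as roughly [0.85, -2.45]; comparing them with a small tolerance rather than exact equality reflects the rounding-error caveat.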
Assuming a significantly non-linear activation function, after a few layers the input information is washed out and the network follows a set trajectory in which it can no longer extract further information from the input to make further decisions. One way to fix that is to have, every layer or every few layers, some weights connecting back to the input data (if that option will work with back-propagation).
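One way to wire such connections back to the input is to concatenate the raw input onto the hidden state before every layer, so each layer has its own weights on the input. This is only a sketch of that idea under assumed choices (tanh activations, Gaussian initialization scaled by 1/sqrt(fan-in), zero biases); `make_params` and `forward` are hypothetical names. Concatenation is an ordinary differentiable operation, so this arrangement does work with back-propagation.

```python
import math
import random

def make_params(in_dim, width, n_layers, seed=1):
    """Create (weights, bias) for each layer.

    Every layer sees the previous hidden state concatenated with the raw
    input, so its fan-in is (previous width + in_dim)."""
    rng = random.Random(seed)
    params = []
    prev = in_dim
    for _ in range(n_layers):
        fan_in = prev + in_dim
        w = [[rng.gauss(0.0, 1.0 / math.sqrt(fan_in)) for _ in range(fan_in)]
             for _ in range(width)]
        b = [0.0] * width
        params.append((w, b))
        prev = width
    return params

def forward(params, x):
    """Forward pass that re-injects the input x at every layer."""
    h = list(x)
    for w, b in params:
        z = h + list(x)  # hidden state plus a fresh copy of the input
        h = [math.tanh(sum(wi * zi for wi, zi in zip(row, z)) + bi)
             for row, bi in zip(w, b)]
    return h

# Usage: a 6-layer, width-8 net on a 3-element input.
params = make_params(in_dim=3, width=8, n_layers=6)
out = forward(params, [1.0, 0.0, 0.0])
```

Re-injecting only every few layers would just mean skipping the concatenation on some layers and sizing their weights accordingly.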
You can take a chaos theory view of neural nets, in which the non-linear behavior of the net compounds (as in compound interest) layer after layer. The weighted sum operations only being able to partially cancel out t…
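The compounding can be made visible by sending two almost identical inputs through the same stack of random tanh layers and measuring how far apart they are after each layer. This is an illustrative experiment under assumed settings, not something from the post: the weight scale `gain`, width, depth and perturbation size are arbitrary choices. With a large gain a tiny difference tends to grow layer after layer (chaotic behavior); with a small gain it shrinks away.

```python
import math
import random

def perturbation_growth(gain, n_layers=20, width=64, eps=1e-6, seed=0):
    """Distance between two nearby inputs after each random tanh layer.

    Weights are i.i.d. Gaussian with standard deviation gain / sqrt(width),
    and both inputs pass through exactly the same weights."""
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(width)]
    b = list(a)
    b[0] += eps  # tiny perturbation of a single input element
    dists = []
    for _ in range(n_layers):
        w = [[rng.gauss(0.0, gain / math.sqrt(width)) for _ in range(width)]
             for _ in range(width)]
        a = [math.tanh(sum(wij * v for wij, v in zip(row, a))) for row in w]
        b = [math.tanh(sum(wij * v for wij, v in zip(row, b))) for row in w]
        dists.append(math.dist(a, b))
    return dists

# Usage: compare a large weight scale with a small one.
chaotic = perturbation_growth(gain=3.0)
ordered = perturbation_growth(gain=0.5)
```

Plotting the two lists on a log scale makes the compound-interest behavior easy to see: one roughly straight line going up, the other going down.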