

Neural Network Weight Sharing using Random Projections

If you have a weight vector and take multiple different random projections of it, you can use the projected values as the weights of a neural network.
The price you pay is a Gaussian noise term that limits the numerical precision of your new, enlarged weight set.
However, with the correct training algorithm some of the weights can be made very high precision at the expense of making others less precise (higher Gaussian noise.)
Vector random projections can be made invertible if your training algorithm needs that (which it probably does, unless you are using evolution.)
You can also use the same idea for other algorithms that could benefit from variable-precision parameters.

Fast random projection code:
You can create an inverse random projection by reversing the order of the operations in the random projection code.
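A minimal NumPy sketch of such a fast random projection and its inverse, assuming a power-of-two length and random ±1 sign flips as the randomizing step (the function names are illustrative, not from any particular library):

```python
import numpy as np

def fwht(x):
    # Iterative fast Walsh-Hadamard transform, O(n log n); len(x) must be a power of 2.
    x = x.astype(float).copy()
    h = 1
    while h < len(x):
        for i in range(0, len(x), h * 2):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x

def random_projection(x, flips):
    # Randomize with +/-1 sign flips, then fully mix with the WHT.
    return fwht(flips * x) / np.sqrt(len(x))

def inverse_projection(y, flips):
    # Reverse the order of operations: transform first, then undo the sign flips.
    return flips * fwht(y) / np.sqrt(len(y))
```

Because the scaled WHT is its own inverse and the sign flips are self-inverse, applying the operations in the opposite order recovers the original vector exactly.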

Data and information multiplexing and demultiplexing with the WHT

If you have a neural network with, say, 128 outputs and you need only 10 of them, you can apply the Walsh Hadamard transform (WHT) to the output of the net and then use any 10 elements.  During training the 10 error terms will be spread in orthogonal patterns over all 128 output neurons by the inverse WHT (which is usually the same as the forward WHT.)  Even where you want all 128 outputs, there could be cases where some outputs are required to carry a higher informational value than the others.  In that case it could also make sense to apply a WHT, as complexity of output will go where it is needed.
You can also use WHT based random projections to achieve the same thing.
There are application-dependent questions of computational efficiency to consider, and the cheaper options may introduce some Gaussian noise terms compared to more expensive ones.
For evolution based neural network training it seems like a good idea though, because it allows automatic focusing of resources on important …
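The multiplexing step above can be sketched in NumPy, assuming an orthonormal scaling of the WHT (the 128-element output here is just random stand-in data):

```python
import numpy as np

def fwht(x):
    # Fast Walsh-Hadamard transform; len(x) must be a power of 2.
    x = x.astype(float).copy()
    h = 1
    while h < len(x):
        for i in range(0, len(x), h * 2):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x

n = 128
raw = np.random.default_rng(1).normal(size=n)  # stand-in for the net's 128 raw outputs
mixed = fwht(raw) / np.sqrt(n)                 # orthonormal WHT of the outputs
used = mixed[:10]                              # use any 10 elements

# Spreading 10 error terms back over all 128 output neurons with the inverse WHT
# (the orthonormal WHT is its own inverse):
err = np.zeros(n)
err[:10] = 1.0                                 # hypothetical error terms
spread = fwht(err) / np.sqrt(n)                # dense, orthogonal error patterns
```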

A few thoughts about neural networks.

There is real information loss caused by the non-linear activation functions in neural networks: with no activation function (a linear network) there is no information loss, and a deep linear network is equivalent to a shallow one.  Apart, that is, from numeric rounding errors, which can compound over a number of layers.
Assuming a significantly non-linear activation function, after a few layers the input information is washed out and the network goes on a set trajectory where it is no longer able to extract any further information from the input to make any further decisions.  A way to fix that is to have some weights connecting back to the input data at every layer, or every few layers (if that option will work with back-propagation.)
You can take a chaos theory view of neural nets, where the non-linear behavior of the net compounds (as in compound interest) layer after layer.   The weighted sum operations are only able to partially cancel out t…
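A minimal sketch of the reinjection idea, assuming ReLU layers and untrained random weights (purely illustrative, with a hypothetical set of extra weight matrices `w_back` connecting each layer to the input):

```python
import numpy as np

rng = np.random.default_rng(3)
dim, depth = 32, 6

x0 = rng.normal(size=dim)  # the input data
ws = [rng.normal(size=(dim, dim)) / np.sqrt(dim) for _ in range(depth)]
w_back = [rng.normal(size=(dim, dim)) / np.sqrt(dim) for _ in range(depth)]

h = x0
for w, wb in zip(ws, w_back):
    # The ReLU discards information; the wb term reconnects each layer
    # back to the raw input so later layers can still consult it.
    h = np.maximum(0.0, w @ h) + wb @ x0
```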

Double weighting for neural networks

The weight and sum operation for a neural network is a dot product.
The Walsh Hadamard transform is a collection of dot product operations.
The Walsh Hadamard transform connects every single input point to the entirety of the output points. A weighted sum of a number of dot products is still a dot product.

The idea is to weight the inputs to n Walsh Hadamard transforms and then weight their outputs.  After running the input vector through the n double-weighted transforms, you sum together each of the corresponding dimensions and use the result as the input to the neuron activation function.  Thus each neuron accounts for 2n weight parameters.  The number of neurons is the order of the transform.  That makes the network fully connected on a layer basis with only a limited number of weights.  It should also allow the network to pick out regularities that might otherwise require time-consuming correlation operations.
If wi are weight vectors and WHT is the transform and x the input then sa…
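One double-weighted layer can be sketched as follows, assuming n transforms over a power-of-two dimension and tanh as a placeholder activation (the function name is illustrative):

```python
import numpy as np

def fwht(x):
    # Fast Walsh-Hadamard transform; len(x) must be a power of 2.
    x = x.astype(float).copy()
    h = 1
    while h < len(x):
        for i in range(0, len(x), h * 2):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x

def double_weighted_layer(x, w_in, w_out):
    # w_in and w_out have shape (n_transforms, dim): weight the input,
    # transform, weight the output, then sum the n transforms dimension
    # by dimension before the activation function.
    dim = len(x)
    acc = np.zeros(dim)
    for wi, wo in zip(w_in, w_out):
        acc += wo * fwht(wi * x) / np.sqrt(dim)
    return np.tanh(acc)  # placeholder activation
```

Each of the `dim` neurons sees 2n weights (one input weight and one output weight per transform), while the WHT supplies the full connectivity.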

Probing deep neural networks

Probing randomly initialized deep neural networks with 2D variations. Depth=number of layers.

Random Projections and the Gaussian distribution using the WHT

Using the Walsh Hadamard transform for rapid (n log(n)) full-mixing random projections or to generate random numbers from the Gaussian distribution:
Overview of the (fast) Walsh Hadamard transform:

Using the WHT for fast random projections and Gaussian RNG.
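A sketch of the Gaussian RNG idea: transform a vector of random ±1 values, so each output element is a scaled sum of many independent ±1 terms and is therefore approximately normal by the central limit theorem:

```python
import numpy as np

def fwht(x):
    # Fast Walsh-Hadamard transform; len(x) must be a power of 2.
    x = x.astype(float).copy()
    h = 1
    while h < len(x):
        for i in range(0, len(x), h * 2):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x

n = 4096
signs = np.random.default_rng(7).choice([-1.0, 1.0], size=n)
g = fwht(signs) / np.sqrt(n)  # approximately N(0, 1) samples
```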

Associative Memory links