
Data and information multiplexing and demultiplexing with the WHT

If you have a neural network with, say, 128 outputs and you need only 10 of them, you can apply the Walsh Hadamard transform (WHT) to the output of the net and then use any 10 elements.  During training the 10 error terms will be spread in orthogonal patterns over all 128 output neurons by the inverse WHT (which is usually the same as the forward WHT.)  Even where you want all 128 outputs, there can be cases where some outputs need to carry more information than others.  In that case it can also make sense to apply a WHT, since the complexity of the output will go where it is needed.
You can also use WHT based random projections to achieve the same thing.
There are application-dependent questions of computational efficiency to consider, and the cheaper options may introduce some Gaussian noise compared to more expensive ones.
For evolution-based neural network training it seems like a good idea though, because it allows the automatic focusing of resources on the important target dimensions.
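A minimal Python sketch of the idea, using the 128 and 10 sizes from the example above (the fast WHT routine, the use of NumPy and all names here are assumptions made for illustration, not code from the post):

import numpy as np

def wht(x):
    # Fast Walsh Hadamard transform, scaled by 1/sqrt(n) so the forward
    # and inverse transforms are identical.  Length must be a power of 2.
    y = np.array(x, dtype=np.float64, copy=True)
    n, h = y.size, 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b
        h *= 2
    return y / np.sqrt(n)

N_OUT, N_USED = 128, 10

# Forward direction: transform the raw network outputs and use any 10 elements.
raw_outputs = np.random.randn(N_OUT)       # stand-in for the net's 128 outputs
used_outputs = wht(raw_outputs)[:N_USED]   # the 10 values actually compared to targets

# Training direction: place the 10 error terms in a length-128 vector and apply
# the (identical) inverse WHT, which spreads them in orthogonal patterns over
# all 128 output neurons.
errors = np.zeros(N_OUT)
errors[:N_USED] = np.random.randn(N_USED)  # stand-in for the 10 error terms
spread_errors = wht(errors)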


  1. Using ladder random projections for data multiplexing and demultiplexing:



Popular posts from this blog

Neural Network Weight Sharing using Random Projections

If you have a weight vector and take multiple different vector random projections of that data, you can use those projections as the weights of a neural network instead.
The price you pay is a Gaussian noise term that limits the numerical precision of your new, enlarged weight set.
However, with the correct training algorithm some of the weights can be very high precision at the expense of making others less precise (higher Gaussian noise.)
Vector random projections can be made invertible if your training algorithm needs that (probably unnecessary if you are using evolution.)
You can also use the same idea for other algorithms that could benefit from variable-precision parameters.
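A minimal sketch of that idea; plain Gaussian projection matrices are used here only for clarity, and the sizes and names are assumptions (a faster WHT based construction is sketched under the fast random projection note below):

import numpy as np

rng = np.random.default_rng(0)

n_base = 64      # underlying trainable parameters
n_derived = 4    # number of weight vectors derived from them

base_weights = rng.standard_normal(n_base)

# One fixed random projection per derived weight vector.  Scaling by
# 1/sqrt(n_base) keeps the derived weights at roughly unit variance.
projections = [rng.standard_normal((n_base, n_base)) / np.sqrt(n_base)
               for _ in range(n_derived)]

# The derived weight vectors actually used by the network.  Training updates
# flow back into base_weights through the fixed projections, and precision can
# shift between the derived vectors as described above.
derived_weights = [P @ base_weights for P in projections]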

Fast random projection code:
You can create an inverse random projection by changing the order of the operations in the random projection code.
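A minimal sketch of a fast random projection of this kind, assuming the usual construction of a fixed random sign flip followed by a WHT; the inverse then runs the same two self-inverse operations in the opposite order, as described above (all names are assumptions for illustration):

import numpy as np

def wht(x):
    # Fast Walsh Hadamard transform, scaled by 1/sqrt(n) so it is self-inverse.
    y = np.array(x, dtype=np.float64, copy=True)
    n, h = y.size, 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b
        h *= 2
    return y / np.sqrt(n)

rng = np.random.default_rng(0)
n = 64
signs = rng.choice([-1.0, 1.0], size=n)    # fixed random sign flip pattern

def project(x):
    # forward random projection: sign flip, then WHT
    return wht(signs * x)

def inverse_project(y):
    # inverse random projection: the same operations in the reverse order
    # (WHT, then sign flip), possible because both steps are self-inverse
    return signs * wht(y)

x = rng.standard_normal(n)
assert np.allclose(inverse_project(project(x)), x)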

Double weighting for neural networks

The weight and sum operation for a neural network is a dot product.
The Walsh Hadamard transform is a collection of dot product operations.  
The Walsh Hadamard transform connects every single input point to every output point. The weighted sum of a number of dot products is still a dot product.

The idea is to weight the inputs to n Walsh Hadamard transforms and then weight their outputs.  After running the input vector through the n double weighted transforms you sum together each of the corresponding dimensions and use that as the input to the neuron activation function.  Thus each neuron accounts for 2n weight parameters.  The number of neurons is the order of the transform.  That makes the network fully connected on a layer basis with only a limited number of weights.  It should also allow the network to pick out regularities that might otherwise require time-consuming correlation operations.
If w_i are weight vectors and WHT is the transform and x the input then sa…
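A minimal sketch of that double weighting scheme (the ReLU activation and all names are assumptions chosen for illustration):

import numpy as np

def wht(x):
    # Fast Walsh Hadamard transform, scaled by 1/sqrt(d) so it is self-inverse.
    y = np.array(x, dtype=np.float64, copy=True)
    d, h = y.size, 1
    while h < d:
        for i in range(0, d, 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b
        h *= 2
    return y / np.sqrt(d)

def double_weighted_layer(x, in_weights, out_weights):
    # x: input vector of length d (a power of 2).
    # in_weights, out_weights: shape (n, d), giving 2n weight parameters per neuron.
    # Each of the n transforms is weighted on its inputs and on its outputs, the
    # results are summed dimension by dimension, then the activation is applied.
    acc = np.zeros_like(x)
    for w_in, w_out in zip(in_weights, out_weights):
        acc += w_out * wht(w_in * x)
    return np.maximum(acc, 0.0)   # ReLU activation, chosen just for the example

rng = np.random.default_rng(0)
d, n = 16, 3
x = rng.standard_normal(d)
y = double_weighted_layer(x, rng.standard_normal((n, d)), rng.standard_normal((n, d)))
# y has length d: one output per neuron, the layer being fully connected via the WHT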

Probing deep neural networks

Probing randomly initialized deep neural networks with 2D variations. Depth=number of layers.