
Probing deep neural networks

Probing randomly initialized deep neural networks with 2D variations.
Depth = number of layers.
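A rough sketch of what such a probe could look like is given here. The details are all assumptions and not taken from the post: Gaussian weights scaled by 1/sqrt(width), vector normalization after every layer (so the square activation does not overflow), a random base point with two random directions spanning the scanned plane, and the first output component used as the pixel value. The names make_layers, forward and probe are made up for illustration.

```python
import math
import random

def normalize(v):
    # Scale to unit length; an all-zero vector is returned unchanged.
    n = math.sqrt(sum(a * a for a in v)) or 1.0
    return [a / n for a in v]

def make_layers(depth, width, rng):
    # Assumption: Gaussian weights with standard deviation 1/sqrt(width).
    s = 1.0 / math.sqrt(width)
    return [[[rng.gauss(0.0, s) for _ in range(width)] for _ in range(width)]
            for _ in range(depth)]

def forward(layers, x, act):
    for w in layers:
        x = [act(sum(wij * xj for wij, xj in zip(row, x))) for row in w]
        x = normalize(x)  # assumption: keeps deep square networks in range
    return x

def probe(layers, act, width, grid, rng):
    # Scan the plane x0 + a*u + b*v for a, b in [-1, 1].
    x0 = normalize([rng.gauss(0.0, 1.0) for _ in range(width)])
    u = normalize([rng.gauss(0.0, 1.0) for _ in range(width)])
    v = normalize([rng.gauss(0.0, 1.0) for _ in range(width)])
    image = []
    for i in range(grid):
        a = 2.0 * i / (grid - 1) - 1.0
        row = []
        for j in range(grid):
            b = 2.0 * j / (grid - 1) - 1.0
            x = [p + a * q + b * r for p, q, r in zip(x0, u, v)]
            row.append(forward(layers, x, act)[0])  # first output as pixel
        image.append(row)
    return image

def square(t):
    return t * t

def relu(t):
    return max(0.0, t)

rng = random.Random(0)
layers = make_layers(depth=5, width=16, rng=rng)
image = probe(layers, square, width=16, grid=8, rng=rng)
```

The resulting grid of values can be handed to any plotting tool; raising depth in make_layers corresponds to the deeper cases listed below.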

Activation function square, depth 5.
Activation function square, depth 10.
Activation function square, depth 15.
Activation function square, depth 20.
Activation function square, depth 25.
Activation function square, depth 30.
Activation function square, depth 35.
Activation function square, depth 40.
Activation function square, depth 45.
Activation function square, depth 50.
Activation function ReLU, depth 5.
Activation function ReLU, depth 10.
Activation function ReLU, depth 15.
Activation function ReLU, depth 20.
Activation function ReLU, depth 25.
Activation function ReLU, depth 30.
Activation function ReLU, depth 35.


  1. Topological mixing in chaotic systems:



Popular posts from this blog

Neural Network Weight Sharing using Random Projections

If you have a weight vector and take several different random projections of it, you can use the projected vectors as the weights of a neural network instead.
The price you pay is a Gaussian noise term that limits the numerical precision of the new, enlarged weight set.
However, with the right training algorithm some of the weights can be held at very high precision at the expense of making others less precise (higher Gaussian noise).
Vector random projections can be made invertible if your training algorithm needs that (which it probably does, unless you are using evolution).
You can also use the same idea for other algorithms that could benefit from variable-precision parameters.

Fast random projection code:
You can create an inverse random projection by changing the order of the operations in the random projection code.
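The original code is not reproduced here, so the following is only a sketch of one common construction, assumed rather than confirmed to be the one the post refers to: flip the signs of the input with a seeded random pattern, then apply a fast Walsh Hadamard transform scaled by 1/sqrt(n). Both steps are their own inverse, so running them in the opposite order undoes the projection. The function names are made up for illustration, and the input length must be a power of 2.

```python
import math
import random

def wht(x):
    # Fast Walsh Hadamard transform scaled by 1/sqrt(n), so that applying
    # it twice returns the input. len(x) must be a power of 2.
    x = list(x)
    n = len(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    s = 1.0 / math.sqrt(n)
    return [v * s for v in x]

def random_signs(n, seed):
    # The same seed always gives the same +1/-1 pattern.
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]

def random_projection(x, seed):
    # Step 1: random sign flips. Step 2: transform.
    signs = random_signs(len(x), seed)
    return wht([s * v for s, v in zip(signs, x)])

def inverse_random_projection(y, seed):
    # Same two steps in the opposite order: transform, then sign flips.
    signs = random_signs(len(y), seed)
    return [s * v for s, v in zip(signs, wht(y))]

x = [1.0, 2.0, -3.0, 0.5, 0.0, 4.0, -1.0, 2.5]
y = random_projection(x, seed=42)
x_back = inverse_random_projection(y, seed=42)  # matches x up to rounding
```

Different seeds give different projections of the same vector, which is what the weight sharing idea above needs.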

Double weighting for neural networks

The weight-and-sum operation in a neural network is a dot product.
The Walsh Hadamard transform is a collection of dot product operations.
The Walsh Hadamard transform connects every input point to every output point. A weighted sum of a number of dot products is still a dot product.
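A small numeric check of both statements, written as a sketch: each output of the (unnormalized) transform is the dot product of the input with one row of +1 and -1 entries, and a weighted sum of those outputs equals one dot product of the input with a single combined vector. The helper names and the example numbers are made up for illustration.

```python
def hadamard_row(k, n):
    # Entry j of row k is -1 when k and j share an odd number of set bits.
    return [-1.0 if bin(k & j).count("1") % 2 else 1.0 for j in range(n)]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

n = 8
x = [1.0, 2.0, -3.0, 0.5, 0.0, 4.0, -1.0, 2.5]
c = [0.5, -1.0, 2.0, 0.0, 1.5, -0.5, 1.0, 3.0]

# The transform as a collection of dot products, one per output.
outputs = [dot(hadamard_row(k, n), x) for k in range(n)]

# A weighted sum of those dot products...
lhs = dot(c, outputs)

# ...is the same as one dot product with the combined vector sum_k c[k] * row_k.
combined = [sum(c[k] * hadamard_row(k, n)[j] for k in range(n)) for j in range(n)]
rhs = dot(combined, x)
```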

The idea is to weight the inputs to n Walsh Hadamard transforms and then weight their outputs. After running the input vector through the n double-weighted transforms, you sum the corresponding dimensions of the n results and use each sum as the input to a neuron activation function. Thus each neuron accounts for 2n weight parameters. The number of neurons is the order of the transform. That makes the network fully connected on a layer basis with only a limited number of weights. It should also allow the network to pick out regularities that might otherwise require time-consuming correlation operations.
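One way to read that description in code, as a sketch under assumptions: the weighting is taken to be elementwise multiplication, the transform is scaled by 1/sqrt(n), and the names double_weighted_layer, w_in and w_out are made up for illustration. The input length must be a power of 2.

```python
import math
import random

def wht(x):
    # Fast Walsh Hadamard transform scaled by 1/sqrt(n).
    x = list(x)
    n = len(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    s = 1.0 / math.sqrt(n)
    return [v * s for v in x]

def double_weighted_layer(x, w_in, w_out, act):
    # w_in and w_out each hold one weight vector per transform.
    d = len(x)
    total = [0.0] * d
    for a, b in zip(w_in, w_out):
        t = wht([ai * xi for ai, xi in zip(a, x)])  # weight inputs, transform
        for k in range(d):
            total[k] += b[k] * t[k]                 # weight outputs, sum up
    return [act(v) for v in total]

rng = random.Random(0)
d, n_transforms = 8, 3
w_in = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n_transforms)]
w_out = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n_transforms)]
x = [1.0, 2.0, -3.0, 0.5, 0.0, 4.0, -1.0, 2.5]
out = double_weighted_layer(x, w_in, w_out, lambda v: max(0.0, v))
```

With d neurons and n transforms the layer holds 2 * n * d weights, which is the 2n per neuron mentioned above, compared with d * d for a conventional fully connected layer.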
If wi are weight vectors and WHT is the transform and x the input then sa…