Hyper basis function network

In machine learning, a hyper basis function network, or HyperBF network, is a generalization of the radial basis function (RBF) network concept, in which a Mahalanobis-like distance is used in place of the Euclidean distance measure. Hyper basis function networks were first introduced by Poggio and Girosi in the 1990 paper "Networks for Approximation and Learning".

Network Architecture

The typical HyperBF network structure consists of a real input vector $x \in \mathbb{R}^n$, a hidden layer of activation functions, and a linear output layer. The output of the network, a scalar function of the input vector $\phi : \mathbb{R}^n \to \mathbb{R}$, is given by

$$\phi(x) = \sum_{j=1}^{N} a_j \, \rho_j(\|x - \mu_j\|)$$

where $N$ is the number of neurons in the hidden layer, and $\mu_j$ and $a_j$ are the center and weight of neuron $j$, respectively. The activation function $\rho_j(\|x - \mu_j\|)$ of the HyperBF network takes the form

$$\rho_j(\|x - \mu_j\|) = e^{-(x - \mu_j)^T R_j (x - \mu_j)}$$

where $R_j$ is a positive definite $n \times n$ matrix. Depending on the application, the following types of matrices $R_j$ are usually considered; a short code sketch after the list illustrates these cases.

  • $R_j = \frac{1}{2\sigma^2} I_{n \times n}$, where $\sigma > 0$. This case corresponds to the regular RBF network.
  • $R_j = \frac{1}{2\sigma_j^2} I_{n \times n}$, where $\sigma_j > 0$. In this case the basis functions are radially symmetric, but are scaled with different widths.
  • $R_j = \operatorname{diag}\left(\frac{1}{2\sigma_{j1}^2}, \ldots, \frac{1}{2\sigma_{jn}^2}\right)$, where $\sigma_{ji} > 0$. Every neuron has an elliptic shape with a varying size.
  • $R_j$ positive definite, but not diagonal.
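
The following is a minimal sketch of the HyperBF forward pass in NumPy, assuming the Gaussian activation defined above with per-neuron positive definite matrices $R_j$; the function and variable names are illustrative, not taken from the original paper.

```python
import numpy as np

def hyperbf_forward(x, centers, weights, R):
    """Evaluate phi(x) = sum_j a_j * exp(-(x - mu_j)^T R_j (x - mu_j)).

    x       : (n,)      input vector
    centers : (N, n)    neuron centers mu_j
    weights : (N,)      output weights a_j
    R       : (N, n, n) positive definite matrices R_j
    """
    diffs = x - centers                                # (N, n): x - mu_j for each neuron
    quad = np.einsum("ji,jik,jk->j", diffs, R, diffs)  # (x - mu_j)^T R_j (x - mu_j)
    return np.dot(weights, np.exp(-quad))

# Example: two neurons in R^2 with R_j = I / (2 * sigma_j^2), which reduces to an
# RBF network with per-neuron widths (the second case in the list above).
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
weights = np.array([1.0, -0.5])
sigmas = np.array([0.5, 1.0])
R = np.stack([np.eye(2) / (2.0 * s**2) for s in sigmas])
print(hyperbf_forward(np.array([0.2, -0.1]), centers, weights, R))
```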

Training

Training HyperBF networks involves estimating the weights $a_j$ as well as the shapes $R_j$ and centers $\mu_j$ of the neurons. Poggio and Girosi (1990) describe a training method with moving centers and adaptable neuron shapes. An outline of the method is provided below.

Consider the quadratic loss of the network, $H[\phi] = \sum_{i=1}^{M}(y_i - \phi(x_i))^2$, where the sum runs over the $M$ training pairs $(x_i, y_i)$. The following conditions must be satisfied at an optimum $\phi^*$:

$$\frac{\partial H(\phi^*)}{\partial a_j} = 0, \qquad \frac{\partial H(\phi^*)}{\partial \mu_j} = 0, \qquad \frac{\partial H(\phi^*)}{\partial W_j} = 0$$

where $R_j = W_j^T W_j$. Then, in the gradient descent method, the values of $a_j$, $\mu_j$, and $W_j$ that minimize $H[\phi]$ can be found as a stable fixed point of the following dynamical system:

$$\dot{a}_j = -\omega \frac{\partial H}{\partial a_j}, \qquad \dot{\mu}_j = -\omega \frac{\partial H}{\partial \mu_j}, \qquad \dot{W}_j = -\omega \frac{\partial H}{\partial W_j}$$

where $\omega > 0$ is a learning rate that determines the rate of convergence.
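
A minimal sketch of this training procedure in NumPy follows, assuming the Gaussian activation above, the parameterization $R_j = W_j^T W_j$, and discretized batch gradient descent with a fixed step size in place of the continuous-time dynamics; the gradients are obtained by differentiating $H$ directly, and all names are illustrative rather than taken from the original paper.

```python
import numpy as np

def batch_forward(X, centers, weights, W):
    """phi(x_i) = sum_j a_j * exp(-(x_i - mu_j)^T W_j^T W_j (x_i - mu_j)) for each row x_i of X."""
    diffs = X[:, None, :] - centers[None, :, :]      # (M, N, n): x_i - mu_j
    Wd = np.einsum("jkl,ijl->ijk", W, diffs)         # (M, N, n): W_j (x_i - mu_j)
    rho = np.exp(-np.sum(Wd**2, axis=-1))            # (M, N): activations rho_j(x_i)
    return rho @ weights, rho, diffs

def train_hyperbf(X, y, centers, weights, W, lr=1e-3, steps=1000):
    """Batch gradient descent on H = sum_i (y_i - phi(x_i))^2 over a_j, mu_j and W_j."""
    for _ in range(steps):
        phi, rho, diffs = batch_forward(X, centers, weights, W)
        err = y - phi                                # (M,) residuals y_i - phi(x_i)
        R = np.einsum("jkl,jkm->jlm", W, W)          # (N, n, n): R_j = W_j^T W_j
        Rd = np.einsum("jlm,ijm->ijl", R, diffs)     # (M, N, n): R_j (x_i - mu_j)
        Wd = np.einsum("jkl,ijl->ijk", W, diffs)     # (M, N, n): W_j (x_i - mu_j)
        # dH/da_j = -2 sum_i err_i rho_ij
        grad_a = -2.0 * rho.T @ err
        # dH/dmu_j = -4 a_j sum_i err_i rho_ij R_j (x_i - mu_j)
        grad_mu = -4.0 * (weights[None, :, None] * err[:, None, None]
                          * rho[:, :, None] * Rd).sum(axis=0)
        # dH/dW_j = 4 a_j sum_i err_i rho_ij W_j (x_i - mu_j)(x_i - mu_j)^T
        outer = np.einsum("ijk,ijl->ijkl", Wd, diffs)
        grad_W = 4.0 * (weights[None, :, None, None] * err[:, None, None, None]
                        * rho[:, :, None, None] * outer).sum(axis=0)
        weights -= lr * grad_a
        centers -= lr * grad_mu
        W -= lr * grad_W
    return centers, weights, W
```

If $R_j$ is restricted to one of the simpler diagonal forms listed earlier, only the corresponding width parameters $\sigma$ would be updated instead of the full matrices $W_j$.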

Overall, training HyperBF networks can be computationally challenging. Moreover, the large number of free parameters in a HyperBF network can lead to overfitting and poor generalization. However, HyperBF networks have the important advantage that a small number of neurons is often enough to learn complex functions.

References

  1. T. Poggio and F. Girosi (1990). "Networks for Approximation and Learning". Proceedings of the IEEE 78 (9): 1481–1497.
  2. R.N. Mahdi and E.C. Rouchka (2011). "Reduced HyperBF Networks: Regularization by Explicit Complexity Reduction and Scaled Rprop-Based Training". IEEE Transactions on Neural Networks 22: 673–686.
  3. F. Schwenker, H.A. Kestler and G. Palm (2001). "Three Learning Phases for Radial-Basis-Function Networks". Neural Networks 14: 439–458.