Alternating conditional expectations

Alternating conditional expectations (ACE) is an algorithm for finding the optimal transformations of the response and predictor variables in regression analysis. It was introduced by Breiman and Friedman (1985).

Introduction

In statistics, nonlinear transformations of variables are commonly used in regression problems. Alternating conditional expectations (ACE) is one of the methods for finding the transformations that produce the best-fitting additive model. Knowledge of such transformations aids in the interpretation and understanding of the relationship between the response and the predictors.

ACE transforms the response variable {\displaystyle Y} and the predictor variables {\displaystyle X_{i}} so as to minimize the fraction of variance not explained. The transformations are nonlinear and are obtained iteratively from the data.

Mathematical description

Let {\displaystyle Y,X_{1},\dots ,X_{p}} be random variables, where {\displaystyle X_{1},\dots ,X_{p}} are used to predict {\displaystyle Y}. Suppose {\displaystyle \theta (Y),\varphi _{1}(X_{1}),\dots ,\varphi _{p}(X_{p})} are zero-mean functions. With these transformations, the fraction of the variance of {\displaystyle \theta (Y)} not explained is

{\displaystyle e^{2}(\theta ,\varphi _{1},\dots ,\varphi _{p})={\frac {\mathbb {E} \left[\theta (Y)-\sum _{i=1}^{p}\varphi _{i}(X_{i})\right]^{2}}{\mathbb {E} \left[\theta ^{2}(Y)\right]}}}

In general, the optimal transformations that minimize the unexplained fraction are difficult to compute directly; instead, ACE calculates them iteratively. The procedure consists of the following steps (a minimal illustrative implementation follows the list):

  1. Hold {\displaystyle \varphi _{1}(X_{1}),\dots ,\varphi _{p}(X_{p})} fixed; minimizing {\displaystyle e^{2}} gives {\displaystyle \theta _{1}(Y)=\mathbb {E} \left[\sum _{i=1}^{p}\varphi _{i}(X_{i})\,{\Big \vert }\,Y\right]}
  2. Normalize {\displaystyle \theta _{1}(Y)} to unit variance.
  3. For each {\displaystyle k}, hold {\displaystyle \theta (Y)} and the other {\displaystyle \varphi _{i}(X_{i})} fixed; minimizing {\displaystyle e^{2}} gives {\displaystyle {\tilde {\varphi }}_{k}(X_{k})=\mathbb {E} \left[\theta (Y)-\sum _{i\neq k}\varphi _{i}(X_{i})\,{\Big \vert }\,X_{k}\right]}
  4. Iterate the above three steps until the decrease in {\displaystyle e^{2}} is within the error tolerance.
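
In the bivariate case ({\displaystyle p=1}), the iteration reduces to alternately smoothing {\displaystyle \theta (Y)} against {\displaystyle X} and {\displaystyle \varphi (X)} against {\displaystyle Y}. The following R sketch illustrates this; it is a toy under stated assumptions, not the acepack implementation, and the helpers cond_exp and ace_sketch are hypothetical names. Each conditional expectation is estimated with R's built-in super smoother supsmu.

# Estimate E[z | x] with a scatterplot smoother (hypothetical helper);
# approx() maps the smoothed values back to the original ordering of x.
cond_exp <- function(x, z) {
  fit <- supsmu(x, z)
  approx(fit$x, fit$y, xout = x, rule = 2)$y
}

# Toy bivariate ACE iteration (illustrative sketch only).
ace_sketch <- function(x, y, iters = 20) {
  theta <- as.vector(scale(y))                  # start from standardized y
  phi <- numeric(length(x))
  for (i in seq_len(iters)) {
    phi   <- cond_exp(x, theta)                 # step 3: phi(x) = E[theta(Y) | X]
    theta <- cond_exp(y, phi)                   # step 1: theta(y) = E[phi(X) | Y]
    theta <- (theta - mean(theta)) / sd(theta)  # step 2: rescale to unit variance
  }
  list(theta = theta, phi = phi)
}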

Bivariate case

The optimal transformations {\displaystyle \theta ^{*}(Y),\varphi ^{*}(X)} for {\displaystyle p=1} satisfy

{\displaystyle \rho ^{*}(X,Y)=\rho (\theta ^{*}(Y),\varphi ^{*}(X))=\max _{\theta ,\varphi }\rho (\theta (Y),\varphi (X))}

where {\displaystyle \rho } is the Pearson correlation coefficient. {\displaystyle \rho ^{*}(X,Y)} is known as the maximal correlation between {\displaystyle X} and {\displaystyle Y}, and it can be used as a general measure of dependence.

In the bivariate case, the ACE algorithm can also be regarded as a method for estimating the maximal correlation between two variables.
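
Because the transformed variables are constructed to be maximally correlated, an estimate of {\displaystyle \rho ^{*}(X,Y)} is simply the sample correlation of the transformed scores. The following sketch (using the acepack package discussed below; the data are illustrative) shows a nonlinear dependence that the ordinary correlation coefficient misses:

library(acepack)
set.seed(1)
x <- runif(500, -1, 1)
y <- x^2 + rnorm(500) / 10   # strong dependence, but nearly uncorrelated
cor(x, y)                    # near 0: linear correlation misses it
a <- ace(x, y)
cor(as.vector(a$tx), a$ty)   # estimated maximal correlation, close to 1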

Software implementation

The ACE algorithm was developed in the context of known distributions. In practice, data distributions are seldom known, and the conditional expectations must be estimated from the data. The R package acepack implements the ACE algorithm. The following example shows its usage:

library(acepack)
TWOPI <- 8 * atan(1)               # 2 * pi
x <- runif(200, 0, TWOPI)
y <- exp(sin(x) + rnorm(200) / 2)  # true model: log(y) = sin(x) + noise
a <- ace(x, y)
par(mfrow = c(3, 1))
plot(a$y, a$ty)  # view the response transformation theta(y)
plot(a$x, a$tx)  # view the carrier transformation phi(x)
plot(a$tx, a$ty) # examine the linearity of the fitted model
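
Since the simulated relationship is {\displaystyle \log y=\sin x+\varepsilon }, the estimated transformations should resemble {\displaystyle \theta (y)\propto \log y} and {\displaystyle \varphi (x)\propto \sin x}, and the third plot should be approximately linear.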

Discussion

The ACE algorithm provides a fully automated method for estimating optimal transformations in multiple regression. It also provides a method for estimating the maximal correlation between random variables. Since the iteration usually terminates after a limited number of passes, the time complexity of the algorithm is {\displaystyle O(np)}, where {\displaystyle n} is the number of samples and {\displaystyle p} the number of predictors. The algorithm is reasonably computationally efficient.

A strong advantage of the ACE procedure is its ability to incorporate variables of quite different types, in terms of the set of values they can assume. The transformation functions {\displaystyle \theta (y),\varphi _{i}(x_{i})} take values on the real line, but their arguments can take values on any set. For example, ordered real and unordered categorical variables can be incorporated in the same regression equation; variables of mixed type are admissible, as sketched below.
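
As a sketch of how this looks in practice, assuming the acepack interface (its ace() function accepts a cat argument giving the indices of categorical columns; the data here are illustrative):

library(acepack)
set.seed(1)
n  <- 300
x1 <- runif(n)                        # continuous predictor
x2 <- sample(1:3, n, replace = TRUE)  # unordered categorical predictor
y  <- exp(x1) + c(0, 2, -1)[x2] + rnorm(n) / 4
a  <- ace(cbind(x1, x2), y, cat = 2)  # mark column 2 as categorical
plot(a$x[, 2], a$tx[, 2])             # estimated scores for the three categories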

As a tool for data analysis, the ACE procedure provides graphical output to indicate a need for transformations and to guide their choice. If a particular plot suggests a familiar functional form for a transformation, the data can be pre-transformed using this functional form and the ACE algorithm rerun.

As with any regression procedure, a high degree of association between predictor variables can sometimes cause the individual transformation estimates to be highly variable, even though the complete model is reasonably stable. When this is suspected, running the algorithm on randomly selected subsets of the data, or on bootstrap samples, can assist in assessing the variability.
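
A minimal sketch of such a check, reusing the simulated example above and assuming the acepack interface:

library(acepack)
set.seed(1)
x <- runif(200, 0, 2 * pi)
y <- exp(sin(x) + rnorm(200) / 2)
boot_cor <- replicate(100, {
  idx <- sample(length(x), replace = TRUE)  # bootstrap resample
  a <- ace(x[idx], y[idx])
  cor(as.vector(a$tx), a$ty)                # maximal correlation on resample
})
summary(boot_cor)  # a narrow spread suggests stable transformation estimates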

References

  1. Breiman, L.; Friedman, J. H. (September 1985). "Estimating optimal transformations for multiple regression and correlation". Journal of the American Statistical Association. 80 (391): 580–598. This article incorporates text from this source, which is in the public domain.