Fast Multidimensional Scaling using Vector Extrapolation

本文Google Docs地址

Guy Rosman,Alexander M. Bronstein,Michael M. Bronstein,Avram Sidi,Ron Kimmel
Fast Multidimensional Scaling using Vector Extrapolation
Technion – Computer Science Department – Technical Report CIS-2008-01 –2008

摘要

MDS是一类低维表示方法,对象是给定距离矩阵的点集。在很多MDS应用中,算法的速

度和效率是关键问题,向量外推方法可以用来提高固定点迭代算法的收敛速度。本文

将提出用向量外推加速MDS数值解速度的方法。

2 多维标度

2.1 最小二乘MDS

在n维欧式空间中,坐标表达:{X=\left( {x_{ij} } \right)}

,是N{\times }m度量值;{d_{ij} \left( X \right)}    表示i、j之间的距离;{\delta _{ij} }表示输入距

离,本文将使用一个新的表示形式。

通常MDS问题可以用下述优化模型表示:

$latex \displaystyle

\mathop {\min }\limits_X \sum\limits_{i<j} {w_{ij} f_{ERR} \left( {d_

{ij} \left( X \right),\delta _{ij} } \right)} &fg=000000$

{w_{ij} }是表征每一对距离重要性的权值;{f_{ERR}    }是计算输入距离和近似距离的误差代价函数,选择距离误差函数的二次

项形式,即「stress 」函数:

$latex \displaystyle

f_{STRESS} \left( {d,\delta } \right)=\left( {d-\delta } \right)^2

&fg=000000$

定义距离平方之差的形式:

$latex \displaystyle f_{SSTRESS}

\left( {d,\delta } \right)=\left( {d^2-\delta ^2} \right)^2 &fg=000000$

得到一个函数,通常称之为「sstress」函数。

2.2 SMACOF 算法

为了最小化stress函数,{s\left( {\rm    {\bf X}} \right)=\sum\limits_{i<j}^ {w_{ij} \left( {d_{ij} \left( {\rm    {\bf X}} \right)-\delta _{ij} } \right)^2} },我们需要求出{s\left( {\rm {\bf X}} \right)}关于{{\rm {\bf X}}}    梯度:

$latex \displaystyle \nabla s\left( {\rm

{\bf X}} \right)=2{\rm {\bf VX}}-2{\rm {\bf B}}\left( {\rm {\bf X}}

\right){\rm {\bf X}} &fg=000000$

这里{{\rm {\bf V}}}{{\rm {\bf B}}}

过下式给出:

$latex \displaystyle \left(

{\rm {\bf V}} \right)_{ij} =\left\{ {\begin{array}{c} -w_{ij} \mbox{ } if

i\ne j \\ \sum\limits_{k\ne i}^ {w_{ik} } if i=j \\ \end{array}} \right. \

\ \ \ \ (1)&fg=000000$

$latex \displaystyle \left( {\rm {\bf

B}} \right)_{ij} =\left\{ {\begin{array}{c} \mbox{-w}_{ij} \delta _{ij} d_

{ij}^{-1} \left( {\rm {\bf X}} \right) \mbox{if }i\ne j \mbox{and} d_{ij}

\left( {\rm {\bf X}} \right)\ne 0 \\ 0 \mbox{if }i\ne j \mbox{and} d_{ij}

\left( {\rm {\bf X}} \right)=0 \\ -\sum\nolimits_{k\ne i} {b_{ik} } \mbox

{if }i=j \\ \end{array}} \right. \ \ \ \ \ (2)&fg=000000$

使用一阶优化,则通过下式得到:

{2{\rm {\bf VX}}=2{\rm {\bf B}}\left( {\rm {\bf X}} \right){\rm {\bf    X}}},或者:

$latex \displaystyle

{\rm {\bf X}}={\rm {\bf V}}^\dag {\rm {\bf B}}\left( {\rm {\bf X}}

\right){\rm {\bf X}} \ \ \ \ \ (3)&fg=000000$

这里{\dag }表示矩阵伪逆。

由(3)可以得到迭代形式(方法见文献[25]):

\displaystyle {\rm {\bf X}}^{\left( {k+1} \right)}={\rm {\bf V}}^    \dag {\rm {\bf B}}\left( {{\rm {\bf X}}^{\left( k \right)}} \right){\rm    {\bf X}}^{\left( k \right)} \ \ \ \ \ (4)

(3)式可视为一个固定点,将(3)迭代使之收敛到stress 代价函数的局部极

小值。以上处理思路称为「SMACOF」——standing from Scaling by Majorizing a

Complicated Function。

stress函数有一个重要特性就是可以保证一个stress值的单调下降序列。这是其他优

化问题少见的。另一方面,SMACOF算法的收敛速度是比较慢的。

2.3 经典标度

另一类使用广泛的代价函数是「strain

(定义见下文),其引出了一类代数MDS算法,称为「经典标度」(classical

scaling)。

{\Delta ^{\left( 2 \right)}}表示输入距离矩阵的平方;

{{\rm {\bf D}}^{\left( \mbox{2} \right)}\left( {\rm {\bf X}}    \right)}表示目标欧式距离的平方:

\displaystyle {\rm {\bf D}}^\mbox{2}\left( {\rm {\bf X}}\right)\mbox{=}{\rm {\bf c1}}^T+{\rm {\bf 1c}}^T-2{\rm {\bf XX}}^T \ \ \ \ \ (5)

这里{c_i =\left\langle {x_i ,x_i } \right\rangle }

{{\rm {\bf J}}\mbox{=}{\rm {\bf I}}\mbox{-}\frac{1}{N}{\rm {\bf    11}}^T},表示中心矩阵。

假设给定距离均为欧式距离,得到这些点的内积矩阵(Gram 矩阵):

$latex \displaystyle {\rm {\bf B}}_\Delta =\frac{1}{2}{\rm

{\bf J}}\Delta ^{\left( 2 \right)}{\rm {\bf J}} &fg=000000$

对于欧式距离的情形,{\left( {{\rm {\bf B}}_\Delta } \right)_{ij} =    \left\langle {x_i ,x_j } \right\rangle }

但实际上,输入矩阵往往不是欧式的,二者的差异度可以通过一个测量值近似表示,

称之为「strain」:

$latex \displaystyle \begin

{array}{l} \left\| {-\frac{1}{2}{\rm {\bf J}}\left( {{\rm {\bf D}}^{\left(

2 \right)}\left( {\rm {\bf X}} \right)-\Delta ^{\left( 2 \right)}} \right)

{\rm {\bf J}}} \right\|_F^2 \\ {\rm {\bf =}}\left\| {{\rm {\bf XX}}^T+

\frac{1}{2}{\rm {\bf J}}\left( {\Delta ^{\left( 2 \right)}} \right){\rm

{\bf J}}} \right\|_F^2 \\ =\left\| {{\rm {\bf XX}}^T-{\rm {\bf B}}_\Delta }

\right\|_F^2 \\ \end{array} &fg=000000$

通过对{{\rm {\bf B}}_\Delta }进行特征值分解,可以找到

strain函数的全局最优解。

经典标度方法的缺陷在于它的自适应性不强,而且计算开销也较大。因此在后文的论

述中,将只关注stress函数以及SMACOF算法。然而,全局优化可保证经典标度方法在

SMACOF的初始阶段具有良好性能。

3 向量外推方法

考虑使用历次迭代方法来加速SMACOF算法的收敛速度,如使用向量外推方法来预测收

敛极限。

本文考察了两个向量外推方法:

1、 「MPE」——最小多项式外推(minimal polynomial extrapolation,文献[11]

);

2、「RRE」——减秩外推(reduced rank extrapolation,文献[31,19])。

两个方法在加速非线性/线性/大型稀疏系统方程中固定点迭代的向量序列收敛速度方

面都具有高效的性能。

二者均需要考察一个经过线性固定点迭代过程得出的序列({{\rm {\bf x}}_0    ,{\rm {\bf x}}_1 ,...}):

$latex

\displaystyle x_{n+1} ={\rm {\bf A}}x_n +b, n=0,1,… \ \ \ \ \ (6)

&fg=000000$

此处{{\rm {\bf A}}}是给定{N\times N}    矩阵,{{\rm {\bf b}}}是给定N维向量,{x_0 }是用户选取的初始向量。这个序列有一个极限s,它是下述方程的

唯一解:

$latex \displaystyle {\rm {\bf

x}}={\rm {\bf Ax}}+{\rm {\bf b}} \ \ \ \ \ (7)&fg=000000$

假设{\rho \left( {\rm {\bf A}} \right)}是谱半

径,{\rho \left( {\rm {\bf A}} \right)<{\rm x}},另一

方面(7)式也可以写为{\left( {{\rm {\bf I}}-{\rm {\bf A}}} \right)    x={\rm {\bf b}}},由于1不是{{\rm {\bf A}}}的奇

异值,所以矩阵{{\rm {\bf I}}-{\rm {\bf A}}}是非奇异的。

给定(6)式的序列,令

$latex \displaystyle

{\rm {\bf u}}_{\rm {\bf n}} {\rm {\bf =\Delta x}}_{\rm {\bf n}} {\rm {\bf

=x}}_{{\rm {\bf n+1}}} {\rm {\bf -x}}_{\rm {\bf n}} {\rm {\bf , n=0,1,…}}

&fg=000000$

再定义误差向量:

$latex \displaystyle ?_n

=x_n -s\mbox{, }n=0,1,… \ \ \ \ \ (8)&fg=000000$

考虑{{\rm {\bf s}}={\rm {\bf As}}+{\rm {\bf b}}},我

们可以通过n步迭代从初始误差得到当前误差:

\displaystyle ?_n =\left( {{\rm {\bf Ax}}_{n-1} +b} \right)-\left(    {{\rm {\bf As}}+{\rm {\bf b}}} \right)={\rm {\bf A}}\left( {{\rm {\bf x}}_    {n-1} -{\rm {\bf s}}} \right)={\rm {\bf A}}?_{n-1} \ \ \ \ \ (9)

由此可以得到:

$latex \displaystyle

?_n ={\rm {\bf A}}^n?_0 \mbox{, }n=0,1,… \ \ \ \ \ (10)&fg=000000$

通过对k+1连续{x_i }(k +1 consecutive)采用「加权平

均」来近似s:

$latex \displaystyle

{\rm {\bf s}}_{n,k} =\sum\limits_{i=0}^k {\gamma _i {\rm {\bf x}}_{n+i} }

\mbox{; }\sum\limits_{i=0}^k {\gamma _i } =1 \ \ \ \ \ (11)&fg=000000$

将(8)带入(11),并考虑{\sum\nolimits_{i=0}    ^k {\gamma _i } =1},可以得到:

\displaystyle {\rm {\bf s}}_{n,k} =\sum\limits_{i=0}^k {\gamma _i    \left( {{\rm {\bf s}}+?_{n+i} } \right)} ={\rm {\bf s}}+\sum\limits_{i=0}^k    {\gamma _i } ?_{n+i} \ \ \ \ \ (12)

再带入(10),得到

\displaystyle {\rm {\bf s}}_{n,k} ={\rm {\bf s}}+\sum\limits_    {i=0}^k {\gamma _i {\rm {\bf A}}^{n+i}} ?_0 \ \ \ \ \ (13)

为了使{?_{n+i\mbox{, }} i=0,1,...}的加权和{\sum\nolimits_{i=0}^k {\gamma _i {\rm {\bf A}}^{n+i}?_0 } }尽可

能小,我们必须选择合适的{\gamma _i }

现在,给定一个N{\times }N矩阵B,以及任意一个N维向

u,则存在具有最小阶数(最多N)的唯一莫尼多项式(monic polynomial)

{P\left( z \right)},是u的零化多项式:{P    \left( {\rm {\bf B}} \right){\rm {\bf u}}=0},这个多项式称为

B关于u的「极小多项式」(minimal polynomial)。{P\left(    z \right)}的零解部分或者全部是B的特征值。

因此,若A关于{?_n }的极小多项式是:

$latex \displaystyle P\left( z \right)=\sum\limits_{i=0}^k

{c_i z^i} ;\mbox{ }c_k =1 &fg=000000$

即:{P\left( {\rm {\bf A}} \right)?_n =0},联立(10)式,得:

$latex

\displaystyle \sum\limits_{i=0}^k {c_i {\rm {\bf A}}^i?_n } =\sum

\limits_{i=0}^k {c_i ?_{n+i} } =0 \ \ \ \ \ (14)&fg=000000$

(14)是N个线性方程的集,k个未知参数{c_0 ,c_1    ,...,}同时{c_k =1}。这N个方程的解是唯一的,因

{P\left( z \right)}的解是唯一的。我们可以由x{_    {i}}的信息单独得到c{_{i}},联立(14)和(9),有:

$latex

\displaystyle 0=\sum\limits_{i=0}^k {c_i {\rm {\bf A}}?_{n+i} } =\sum

\limits_{i=0}^k {c_i ?_{n+i+1} } &fg=000000$

减去(14),由此得到线性系统:

$latex \displaystyle \sum\limits_{i=0}^k {c_i {\rm {\bf u}}

_{n+i} } ={\rm {\bf 0}} \ \ \ \ \ (15)&fg=000000$

一旦{c_0 ,c_1 ,...,c_{k-1} }确定,我们令{c_k    }=1、{\gamma _i ={c_i } \mathord{\left/ {\vphantom {{c_i    } {\sum\nolimits_{j=0}^k {c_j } }}} \right. \kern-\nulldelimiterspace}    {\sum\nolimits_{j=0}^k {c_j } },\mbox{ }i=0,1,...,k.}

综上所述,若k是A关于{?_n }的极小多项式的阶数,则存在序列

满足{\sum\nolimits_{i=0}^k {\gamma _i =1} },则有:{\sum\nolimits_{i=0}^k {\gamma _i {\rm {\bf x}}_{n+i} ={\rm {\bf s}}} }    (Why not {{\rm {\bf s}}_{n,k} }?)

值得注意的是,不论是否{\rho \left( {\rm {\bf A}} \right)<1}    s都是{\left( {{\rm {\bf I}}-{\rm {\bf A}}}    \right){\rm {\bf x}}={\rm {\bf b}}}的解,因此,无论{\mathop {\lim }\limits_{n\rightarrow \infty } {\rm {\bf x}}_n }

是否存在,{{\rm {\bf s}}=\sum\nolimits_{i=0}^k {\gamma _i {\rm {\bf    x}}_{n+i} } }都成立。

为简化表述,使用如下标记:

$latex

\displaystyle {\rm {\bf U}}_s^{\left( j \right)} =\left[ {{\rm {\bf u}}_j

\vert {\rm {\bf u}}_{j+1} \vert …\vert {\rm {\bf u}}_{j+s} } \right] \ \

\ \ \ (16)&fg=000000$

因此,{{\rm {\bf U}}_s^{\left( j \right)} }是一个N

{\times }(j+1)矩阵,则(14)可以表述

为:

$latex \displaystyle {\rm {\bf U}}

_k^{\left( n \right)} {\rm {\bf c}}={\rm {\bf 0}}\mbox{; }{\rm {\bf c}}=

\left[ {c_0 ,c_1 ,…c_k } \right]^T \ \ \ \ \ (17)&fg=000000$

当然,用{\sum\nolimits_{i=0}^k {c_i } }分解(17),

可以得到:

$latex \displaystyle {\rm {\bf

U}}_k^{\left( n \right)} \gamma ={\rm {\bf 0}}\mbox{; }\gamma =\left[

{\gamma _0 ,\gamma _1 ,…\gamma _k } \right]^T \ \ \ \ \ (18)

&fg=000000$

3.1 MPE派生

前文所述,极小多项式的阶数可以达到N,若N很大

,这就可能会导致比较大的存储开销;此外,我们还没有方法可以准确得知这个阶数

,鉴于此,可以采取一种解决途径:先任意找一个远远小于阶数的正指数k,带入(15),则(15)则不再是一致的,因此也不再

有对于{c_0 ,c_1 ,...,c_{k-1} }(其中{c_\mbox{k}    =1})的唯一解。通常对于这样的问题,我们可以采取最小二乘方法求解

,沿着这个思路,计算(15)中描述的{\gamma _0 ,    \gamma _1 ,...\gamma _k },然后计算向量{{\rm {\bf s}}_    {n,k} =\sum\limits_{i=0}^k {\gamma _i {\rm {\bf x}}_{n+i} } }

计算结果即为s的近似。以上方法称为「极小多项式外推算法」(MPE——

minimal polynomial extrapolation),其算法流程见表-1:

\centerline{\includegraphics[width=5.83in,height=1.41in]{mytex31.eps}}

\caption{MPE算法流程}

3.2 RRE派生

同3.1,在(18)中带入k,则不再对{\gamma _0 ,    \gamma _1 ,...\gamma _k }具有唯一解,同样应用最小二乘算法(约束

{\sum\nolimits_{i=0}^k {\gamma _i =1} }),然后计算

s的近似{{\rm {\bf s}}_{n,k} =\sum\limits_{i=0}^k {\gamma _i    {\rm {\bf x}}_{n+i} } }。这个方法称为「减秩外推」(RRE——

reduced rank extrapolation),算法流程见表-2:

\centerline{\includegraphics[width=5.83in,height=1.22in]{mytex32.eps}}

\caption{RRE算法}

3.3 对非线性方程的处理

前文所述,在SMACOF算法中存在非线性方程,现在用向量外推方法处理这个问题。假

设非线性方程记作:

$latex \displaystyle

{\rm {\bf x}}={\rm {\bf F}}\left( {\rm {\bf x}} \right) \ \ \ \ \ (19)

&fg=000000$

这里{{\rm {\bf F}}\left( {\rm {\bf x}} \right)}是N维

向量值函数,x是N维未知向量,通过下式得到近似值x{_{n}}

$latex \displaystyle {\rm

{\bf x}}_{n+1} ={\rm {\bf F}}\left( {{\rm {\bf x}}_n } \right)\mbox{, }

n=0,1,… \ \ \ \ \ (20)&fg=000000$

并且假设这个序列收敛于解s{{\rm {\bf F}}}

SMACOF迭代的右边函数,由于x接近s{{\rm {\bf F}}\left(    {\rm {\bf x}} \right)}可通过泰勒级数展开:

\displaystyle {\rm {\bf F}}\left( {\rm {\bf x}} \right)={\rm {\bf    F}}\left( {\rm {\bf s}} \right)+{\rm {\bf {F}'}}\left( {\rm {\bf s}}    \right)\left( {{\rm {\bf x}}-{\rm {\bf s}}} \right)+o\left( {\left\| {{\rm    {\bf x}}-{\rm {\bf s}}} \right\|^2} \right)\mbox{ as }{\rm {\bf    x}}\rightarrow {\rm {\bf s}}

此处{{\rm {\bf {F}'}}\left( {\rm {\bf x}} \right)}

{{\rm {\bf F}}\left( {\rm {\bf x}} \right)}的雅各比矩阵,

又因为{{\rm {\bf F}}\left( {\rm {\bf s}} \right)={\rm {\bf s}}}    ,则上式可写为:

$latex \displaystyle {\rm {\bf

F}}\left( {\rm {\bf x}} \right)={\rm {\bf s}}+{\rm {\bf {F}’}}\left( {\rm

{\bf s}} \right)\left( {{\rm {\bf x}}-{\rm {\bf s}}} \right)+o\left(

{\left\| {{\rm {\bf x}}-{\rm {\bf s}}} \right\|^2} \right)\mbox{ as }{\rm

{\bf x}}\rightarrow {\rm {\bf s}} &fg=000000$

假设序列{{\rm {\bf x}}_0 ,{\rm {\bf x}}_1 ,...,}收敛于

s,则当n足够大时,{{\rm {\bf x}}_n }逼近s

因此有:

$latex \displaystyle {\rm {\bf x}}_{n+1} ={\rm

{\bf s}}+{\rm {\bf {F}’}}\left( {\rm {\bf s}} \right)\left( {{\rm {\bf x}}

_n -{\rm {\bf s}}} \right)+o\left( {\left\| {{\rm {\bf x}}_n -{\rm {\bf

s}}} \right\|^2} \right)\mbox{ as n}\rightarrow \infty &fg=000000$

即:{{\rm {\bf x}}_{n+1} -{\rm {\bf s}}={\rm {\bf {F}'}}\left( {\rm    {\bf s}} \right)\left( {{\rm {\bf x}}_n -{\rm {\bf s}}} \right)+o\left(    {\left\| {{\rm {\bf x}}_n -{\rm {\bf s}}} \right\|^2} \right)\mbox{ as    n}\rightarrow \infty }

对于所有的大n,向量{{\rm {\bf x}}_n }可视为从形如{\left( {{\rm {\bf I}}-{\rm {\bf A}}} \right){\rm {\bf x}}={\rm {\bf b}}}    的线性系统产生:

$latex

\displaystyle {\rm {\bf x}}_{n+1} ={\rm {\bf Ax}}_n +{\rm {\bf b}}\mbox{,

}n=0,1,…, \ \ \ \ \ (21)&fg=000000$

此处{{\rm {\bf A}}={\rm {\bf {F}'}}\left( {\rm {\bf s}}    \right)}{{\rm {\bf b}}=\left[ {{\rm {\bf I}}-{\rm {\bf    {F}'}}\left( {\rm {\bf s}} \right)} \right]{\rm {\bf s}}}

MPE和RRE已经应用在大型稀疏数值系统中,如计算流体动力学、半导体研究和X-射线

断层扫描系统。

3.4 MPE和RRE的有效应用

MPE和RRE的关键问题是关于最小二乘问题的精确解

和尽可能减少计算时间和存储空间开销。

关于最小二乘问题的解决,可以采用对基{{\rm {\bf U}}_k^{\left( n    \right)} }进行QR分解:

$latex \displaystyle

{\rm {\bf U}}_k^{\left( n \right)} ={\rm {\bf Q}}_k {\rm {\bf R}}_k

&fg=000000$

此处{{\rm {\bf Q}}_k }是N{\times }(k

+1)阶酉矩阵,可记为

$latex \displaystyle

{\rm {\bf Q}}_k =\left[ {{\rm {\bf q}}_0 \vert {\rm {\bf q}}_1 \vert …

\vert {\rm {\bf q}}_k } \right] \ \ \ \ \ (22)&fg=000000$

其中,{{\rm {\bf q}}_i^\ast {\rm {\bf q}}_j =\delta _{ij} }

{{\rm {\bf R}}_k }是(k+1){\times }    (k+1)阶上三角矩阵,对角线元素均为正数:

$latex \displaystyle {\rm {\bf R}}_k =\left[ {{\begin

{array}{*{20}c} {r_{00} } \hfill } \hfill & \cdots \hfill } \hfill \\ \hfill } \hfill & \cdots \hfill }

\hfill \\ \hfill & \hfill & \ddots \hfill & \vdots \hfill \\ \hfill &

\hfill & \hfill } \hfill \\ \end{array} }} \right]\mbox{; }r_{ii}

\mbox{>0, }i=0,1,…,k \ \ \ \ \ (23)&fg=000000$

QR分解可采用改良Gram-Schmidt正交方法(MGS),表-3描述了对{{\rm    {\bf U}}_k^{\left( n \right)} }应用MGS的流程:

\centerline{\includegraphics[width=5.93in,height=1.48in]{mytex33.eps}}

\caption{MGS算法}

此处{\left\| {\rm {\bf x}} \right\|\mbox{-}\sqrt {{\rm {\bf x}}^\ast    {\rm {\bf x}}} }{{\rm {\bf u}}_i^{\left( {j+1} \right)}    }重写为{{\rm {\bf u}}_i^{\left( j \right)} }

如此则{{\rm {\bf u}}_{n+i} ,{\rm {\bf u}}_i^{\left( j \right)} ,{\rm    {\bf q}}_i }占有相同的存储空间。

很重要的一点是,当构建矩阵{{\rm {\bf U}}_k^{\left( n \right)} }    时,我们将{{\rm {\bf x}}_{n+i} }重写为{{\rm {\bf u}}_{n+i} ={\rm {\bf x}}_{n+i} },而只保存{{\rm {\bf x}}_n };下一步,当计算矩阵{{\rm {\bf Q}}_k }    时,我们把{{\rm {\bf q}}_i \mbox{,i=0,1,...,k}}    重写为{{\rm {\bf u}}_{n+i} }。这就意味着,在计

{{\rm {\bf Q}}_k }{{\rm {\bf R}}_x }    的每一阶段,我们都只保留k+2个向量,{{\rm {\bf x}}_{n+1}    ,...,{\rm {\bf x}}_{n+k+1} }无需保存。

表-4展示了采用QR分解的MPE和RRE算法,它们具有一致的框架:

\centerline{\includegraphics[width=6.00in,height=5.63in]{mytex34.eps}}

\caption{采用QR分解的MPR/RRE算法}

3.5 误差估计

1、 对于线性序列:当(6)中的迭代向量{{\rm {\bf    x}}_i }是线性时,有:

$latex \displaystyle {\rm

{\bf r}}\left( {\rm {\bf x}} \right)={\rm {\bf b}}-\left( {{\rm {\bf I}}-

{\rm {\bf A}}} \right){\rm {\bf x}}=\left( {{\rm {\bf Ax}}+{\rm {\bf b}}}

\right)-{\rm {\bf x}} &fg=000000$

即:{{\rm {\bf r}}\left( {{\rm {\bf x}}_n } \right)={\rm {\bf x}}_    {n+1} -{\rm {\bf x}}_n ={\rm {\bf u}}_n }

在应用MPE和RRE算法时,考虑{\sum\nolimits_{i=0}^k {\gamma _i =1} }    ,可得

$latex \displaystyle

{\rm {\bf r}}\left( {{\rm {\bf s}}_{n,k} } \right)=\sum\limits_{i=0}^k

{\gamma _i {\rm {\bf u}}_{n+i} } ={\rm {\bf U}}_k^{\left( n \right)} \gamma

\ \ \ \ \ (24)&fg=000000$

我们考察其{l_2 }范数,即:

$latex

\displaystyle \left\| {{\rm {\bf r}}\left( {{\rm {\bf s}}_{n,k} } \right)}

\right\|=\left\| {{\rm {\bf U}}_k^{\left( n \right)} \gamma } \right\|

&fg=000000$

2、 对于非线性序列:当(20)中的{{\rm {\bf x}}_i    }是线性时,有:

$latex \displaystyle {\rm {\bf

r}}\left( {{\rm {\bf s}}_{n,k} } \right)={\rm {\bf F}}\left( {{\rm {\bf

s}}_{n,k} } \right)-{\rm {\bf s}}_{n,k} \approx {\rm {\bf U}}_k^{\left( n

\right)} \gamma &fg=000000$

因此:

$latex \displaystyle \left\| {{\rm {\bf r}}\left(

{{\rm {\bf s}}_{n,k} } \right)} \right\|\approx \left\| {{\rm {\bf U}}_k^

{\left( n \right)} \gamma } \right\| &fg=000000$

不论{{\rm {\bf x}}_i }是否是线性的,{\left\|    {{\rm {\bf U}}_k^{\left( n \right)} \gamma } \right\|}都可以无需

计算{{\rm {\bf s}}_{n,k} }而得出:

$latex

\displaystyle \left\| {{\rm {\bf U}}_k^{\left( n \right)} \gamma } \right

\|=\left\{ {{\begin{array}{*{20}c} {r_{kk} \left| {\gamma _k } \right|

\mbox{ for MPE}} \hfill \\ {\sqrt \lambda \mbox{ for RRE}} \hfill \\ \end

{array} }} \right. &fg=000000$

此处{r_{kk} }是矩阵{{\rm {\bf R}}_k }

对角线上的最后一个元素,{\lambda }是上一节算法的第三步中

得到的参数(文献[44])。

3.6 MPE和RRE的误差分析

关于MPE和RRE的线性误差分析在文献[42、46、45、48、49]中已有论述。本文将重点

介绍非线性的情况下,向量外推方法可以达到怎样的性能。

定理1 假设向量{{\rm {\bf x}}_n }满足:

$latex \displaystyle {\rm {\bf x}}_n ={\rm {\bf s}}+\sum

\limits_{i=1}^p {{\rm {\bf v}}_i \lambda _i^n } &fg=000000$

此处,向量{{\rm {\bf v}}_i }是线性独立的,非零标量{\lambda _i \ne 1},取值不同,且满足:

$latex

\displaystyle \left| {\lambda _1 } \right|\ge \left| {\lambda _2 }

\right|\ge … &fg=000000$

若有{\left| {\lambda _k } \right|\ge \left| {\lambda _{k+1} }    \right|},则对于MPE和RRE,有:

$latex

\displaystyle {\rm {\bf s}}_{n,k} -{\rm {\bf s}}=o\left( {\lambda _{k+1}^n

} \right)\mbox{ as }n\rightarrow \infty &fg=000000$

以及:

$latex \displaystyle \mathop {\lim }\limits_{n

\rightarrow \infty } \sum\limits_{i=0}^k {\gamma _i^{\left( {n,k} \right)}

z^i} =\prod\limits_{i=1}^k {\frac{\lambda -\lambda _i }{1-\lambda _i }}

&fg=000000$

此处,{\gamma _i^{\left( {n,k} \right)} }代表{\gamma _i }

{{\rm {\bf x}}_n }是(6)中迭代产生时

{\lambda _i }部分或者全部是迭代矩阵{{\rm {\bf    A}}}{{\rm {\bf A}}}是对角化的)的非零特征值

{{\rm {\bf v}}_i }是对应的特征向量。

定理2 假设向量{{\rm {\bf x}}_n }{{\rm    {\bf x}}_{n+1} ={\rm {\bf Ax}}_n +{\rm {\bf b}}}迭代产生,矩阵

{{\rm {\bf I}}-{\rm {\bf A}}}非奇异,{\left( {{\rm    {\bf I}}-{\rm {\bf A}}} \right){\rm {\bf x}}={\rm {\bf b}}}的解

{{\rm {\bf s}}}{{\rm {\bf A}}}的非

零特征值为:

$latex \displaystyle \left| {\lambda _1 }

\right|\ge \left| {\lambda _2 } \right|\ge … &fg=000000$

不论是否{\left| {\lambda _k } \right|\ge \left| {\lambda _{k+1} }    \right|},对于

(i)RRE无条件满足;

(ii)MPE在提供{{\rm {\bf I}}-{\rm {\bf A}}}的特征值(这

些特征值均位于一条复平面上通过原点的直线的同一侧,例如当{{\rm {\bf    A}}+{\rm {\bf A}}^\ast }是正定)时,

下式均成立:

$latex \displaystyle {\rm {\bf s}}_{n,k} –

{\rm {\bf s}}=o\left( {\lambda _{k+1}^n } \right)\mbox{ as }n\rightarrow

\infty &fg=000000$

综合定理1、2,注意到我们并未假设{\lim _{n\rightarrow \infty } {\rm    {\bf x}}_n }存在,对于{\lim _{n\rightarrow \infty } {\rm    {\bf x}}_n }不存在的情况,{\left| {\lambda _1 } \right|    \ge 1}的条件是必须的。

事实上,若用{{\rm {\bf x}}_i }逼近s,可得误差

$latex \displaystyle ?_n ={\rm {\bf x}}_n -{\rm {\bf s}}=o

\left( {\lambda _1^n } \right)\mbox{ as }n\rightarrow \infty

&fg=000000$

因此,无论{\lim _{n\rightarrow \infty } {\rm {\bf x}}_n }    存在与否,只要给定{\left| {\lambda _{k+1} } \right|    <1},我们均可得到{\lim _{n\rightarrow \infty } {\rm    {\bf s}}_{n,k} ={\rm {\bf s}}}。此外,当给定{\left|    {\lambda _{k+1} } \right|<\left| {\lambda _1 } \right|}时,序

{{\rm {\bf x}}_n }收敛于s{{\rm {\bf    s}}_{n,k} }收敛于s的速度更快。即,MPE和RRE方法加速了序列

{\left\{ {{\rm {\bf x}}_n } \right\}}的收敛速度。

3.7 循环MPE/RRE

定理1、2需要固定k,然后令n趋于无限大,很显然

实际中无法满足这个条件;此外增加k也可以使得MPE和RRE的收敛速度加快,然而我们

同样无法无限增大k。

循环(或者再启动)方法可以用来解决上述问题,在循环模式中,n和k是固定的,

表-5的算法流程中1~3步称为一个「循环」,第i次循环的{{\rm {\bf s}}_    {n,k} }记作{{\rm {\bf s}}_{n,k}^{\left( i \right)} }

\centerline{\includegraphics[width=5.91in,height=1.30in]{mytex35.eps}}

\caption{循环模式下的向量外推算法}

循环模式带来的两个好处(文献[48,49]):

1、产生一个紧致的误差上界;

2、防止了「广义最小残差停滞」(GMRES stagnates)

循环MPE/RRE在非线性系统下的性能分析(文献[50,51])带来的启发意义是,当第i次

循环中k接近k{_{i}}时,矩阵{{\rm {\bf {F}'}}\left(    {\rm {\bf s}} \right)}关于{?_\mbox{0} \mbox{=}{\rm {\bf    x}}_\mbox{0} -{\rm {\bf s}}}的极小多项式的阶数——序列{\left\{ {{\rm {\bf s}}_{n,k}^{\left( i \right)} } \right\}_{i=0}^\infty }    将二阶收敛于{{\rm {\bf s}}}

但由于k{_{i}}可以等于N,且无法知道它的准确大小,对于大规

模问题的应用,其存储开销可能会很大;另一方面,尝试从循环MPE/RRE实现二阶收敛

可能是不现实的。

3.8 与Krylov子空间方法结合

对于线性序列应用,MPE和RRE方法与Krylov子空间

方法——如Arnoldi(文献[1])、GMRES(文献[38])——关系很大。文献[43]给

出了如下定理:

定理3 考察线性系统{\left( {{\rm {\bf I}}-{\rm {\bf A}}}    \right){\rm {\bf x}}={\rm {\bf b}}},其中{{\rm {\bf x}}_0    }是初始向量,令向量序列{\left\{ {{\rm {\bf x}}_n }    \right\}}通过{{\rm {\bf x}}_{n+1} ={\rm {\bf Ax}}_n +{\rm    {\bf b}}}生成。分别应用MPE、RRE,从该序列中生成{{\rm {\bf    s}}_{0,k}^{MPE} }{{\rm {\bf s}}_{0,k}^{RRE} }    ;同样分别应用k步Arnoldi和GMRES,从{\left( {{\rm {\bf    I}}-{\rm {\bf A}}} \right){\rm {\bf x}}={\rm {\bf b}}}中生成

{{\rm {\bf s}}_k^{\mbox{Arnoldi}} }{{\rm {\bf    s}}_k^{\mbox{GMRES}} }。则有如下关系:

{{\rm {\bf s}}_{0,k}^{MPE} ={\rm {\bf s}}_k^{\mbox{Arnoldi}} }    {{\rm {\bf s}}_{0,k}^{RRE} ={\rm {\bf s}}_k^{\mbox    {GMRES}} }

3.9 循环SMACOF算法

为了加速SMACOF的收敛速度,我们将循环模式应用进来,由于这是个非线性问题,故

外推法的近似极限向量并不一定会有低的stress值。因此我们必须采取某种保护措施

一种方法是检查外推极限的stress值,这个值若较高,则采用最后一次的迭代向量代

替,这个方法在表- 5 循环模式下的向量外推算法中的第五步做一个简单更改即可,

表-6做出了算法描述:
第二种方法是减小步长。

参考文献

[1] Walter .E. Arnoldi. The principle of minimized iterations in the

solution
of the matrix eigenvalue problem. Quart. Appl. Math., 9:17–29, 1951.
[2] Brian T. Bartell, Garrison W. Cottrell, and Richard K. Belew. Latent
semantic indexing is an optimal special case of multidimensional scaling.
In SIGIR ’92: Proceedings of the 15th annual international ACM SIGIR
conference on Research and development in information retrieval, pages
161–167, New York, NY, USA, 1992. ACM Press.
[3] Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps and spectral

tech-
niques for embedding and clustering. In T. G. Dietterich, S. Becker, and
Z. Ghahramani, editors, Advances in Neural Inf. Proc. Sys., volume 14,
pages 585–591, Cambridge, MA, 2002. MIT Press.
[4] Ronald F. Boisvert, Roldan Pozo, Karin Remington, Richard Barrett, and
Jack J. Dongarra. The Matrix Market: A web resource for test matrix
collections. In Ronald F. Boisvert, editor, Quality of Numerical Software,
Assessment and Enhancement, pages 125–137, London, 1997. Chapman
[5] Ingwer Borg and Patrick Groenen. Modern multidimensional scaling: The-
ory and applications. Springer Verlag, New York, 1997.
[6] Ulrik Brandes and Christian Pich. Eigensolver methods for progressive
multidimensional scaling of large data. In Michael Kaufmann and Dorothea
Wagner, editors, Graph Drawing, Karlsruhe, Germany, September 18-20,
2006, pages pp. 42–53. Springer, 2007.
[7] Achi Brandt and Vladimir Mikulinsky. On recombining iterants in multi-
grid algorithms and problems with small islands. SIAM J. Sci. Comput.,
16(1):20–28, 1995.
[8] Alex M. Bronstein, Michael M. Bronstein, and Ron Kimmel. Expression-
invariant face recognition via spherical embedding. In Proc. IEEE Inter-
national Conf. Image Processing (ICIP), 2005.
[9] Alex M. Bronstein, Michael M. Bronstein, and Ron Kimmel. Three-
dimensional face recognition. International Journal of Computer Vision,
64(1):5–30, August 2005.
[10] Michael M. Bronstein, Alex M. Bronstein, R. Kimmel, and I. Yavneh.
Multigrid multidimensional scaling. Numerical Linear Algebra with Appli-
cations, Special issue on multigrid methods, 13(2-3):149–171, March-April
2006.
[11] Stan Cabay and L.W. Jackson. A polynomial extrapolation method for
finding limits and antilimits of vector sequences. SIAM J. Numer. Anal.,
13:734–752, 1976.
[12] Matthew Chalmers. A linear iteration time layout algorithm for

visualising
high-dimensional data. In IEEE Visualization, pages 127–132, 1996.
[13] Ka W. Cheung and Hing C. So. A multidimensional scaling framework for
mobile location using time-of-arrival measurements. IEEE transactions on
signal processing, 53(2):460–470, 2005.
[14] Ronald R. Coifman, Stephane Lafon, Ann B. Lee, Mauro Maggioni, Boaz
Nadler, Frederick Warner, and Steven W. Zucker. Geometric diffusions as
a tool for harmonic analysis and structure definition of data. Proc. Natl.
Acad. Sci. USA, 102(21):7426–7431, May 2005.
[15] Lee G. Cooper. A review of multidimensional scaling in marketing

research.
Applied Psychological Measurement, 7(4):427–450, 1983.
[16] Payel Das, Mark Mol, Hernan Stamati, Lydia E. Kavraki, , and Cecilia
Clementi. Low-dimensional, free-energy landscapes of protein-folding reac-
tions by nonlineardimensionality reduction. Proc. Natl. Acad. Sci. USA,
103(26):9885–9890, June 2006.
[17] Vin de Silva and Joshua B. Tenenbaum. Global versus local methods in
nonlinear dimensionality reduction. In Suzanna Becker, Sebastian Thrun,
and Klaus Obermayer, editors, Advances in Neural Inf. Proc. Sys., pages
705–712. MIT Press, 2002.
[18] Roberto Moreno Diaz and Alexis Quesada Arencibia, editors. Coloring of
DT-MRI Fiber Traces using Laplacian Eigenmaps, Las Palmas de Gran
Canaria, Spain, February 24–28 2003. Springer Verlag.
[19] Robert P. Eddy. Extrapolating to the limit of a vector sequence. In

P.C.C.
Wang, editor, Information Linkage Between Applied Mathematics and In-
dustry, pages 387–396, New York, 1979. Academic Press.
[20] Asi Elad and Ron Kimmel. On bending invariant signatures for surfaces.
IEEE Trans. Pattern Anal. Mach. Intell., 25(10):1285–1295, 2003.
[21] Christos Faloutsos and King-Ip Lin. FastMap: A fast algorithm for

index-
ing, data-mining and visualization of traditional and multimedia datasets.
In Michael J. Carey and Donovan A. Schneider, editors, Proceedings of the
1995 ACM SIGMOD International Conference on Management of Data,
pages 163–174, San Jose, California, 22–25 May 1995.
[22] Ronald A. Fisher. The systematic location of genes by means of

crossover
observations. The American Naturalist, 56:406–411, 1922.
[23] Emden R. Gansner, Yehuda Koren, and Stephen C. North. Graph drawing
by stress majorization. In J´anos Pach, editor, Graph Drawing, volume 3383
of Lecture Notes in Computer Science, pages 239–250. Springer, 2004.
[24] Gene H. Golub and Charles F. Van Loan. Matrix Computations. The Johns
Hopkins University Press, London, third edition, 1996.
[25] Louis Guttman. A general nonmetric technique for finding the smallest
coordinate space for a configuration of points. Psychometrika, 33:469–506,
1968.
[26] Yoshi hiro Taguchi and Yoshitsugu Oono. Relational patterns of gene

ex-
pression via non-metric multidimensional scaling analysis. Bioinformatics,
21(6):730–740(11), march 2005.
[27] Anthony Kearsley, Richard Tapia, and Michael W. Trosset. The solution
of the metric stress and sstress problems in multidimensional scaling using
Newton’s method. Computational Statistics, 13(3):369–396, 1998.
[28] Yosi Keller, Stephane Lafon, and Michael Krauthammer. Protein cluster
analysis via directed diffusion. In The fifth Georgia Tech International
Conference on Bioinformatics, November 2005.
[29] Jospeh B. Kruskal. Multidimensional scaling by optimizing goodness of

fit
to a nonmetric hypothesis. Psychometrika, 29:1–27, 1964.
[30] Cecilio Mar-Molinero and Carlos Serrano-Cinca. Bank failure: a

multidi-
mensional scaling approach. The European Journal of Finance, 7(2):165–
183, June 2001.
[31] M. Me˘sina. Convergence acceleration for the iterative solution of the

equa-
tions X = AX + f. Comput. Methods Appl. Mech. Engrg., 10:165–173,
1977.
[32] Alistair Morrison, Greg Ross, and Matthew Chalmers. Fast multidimen-
sional scaling through sampling, springs and interpolation. Information
Visualization, 2(1):68–77, 2003.
[33] Philip H. Frances Patrick Groenen. Visualizing time-varying

correlations
across stock markets. Journal of Empirical Finance, 7:155–172, 2000.
[34] Robert Pless. Using Isomap to explore video sequences. In Proceedings

of
the 9th International Conference on Computer Vision, pages 1433–1440,
Nice, France, October 2003.
[35] Keith T. Poole. Nonparametric unfolding of binary choice data.

Political
Analysis, 8(3):211–237, March 2000.
[36] Guy Rosman, Alex M. Bronstein, Michael M. Bronstein, and Ron Kimmel.
Topologically constrained isometric embedding. In Proc. Conf. on Machine
Learning and Pattern Recognition (MLPR), 2006.
[37] Sam T. Roweis and Lawrence K. Saul. Nonlinear dimensionality reduction
by locally linear embedding. Science, 290:2323–2326, 2000.
[38] Yousef Saad and Martin H. Schultz. GMRES: A generalized minimal resid-
ual method for solving nonsymmetric linear systems. SIAM J. Sci. Statist.
Comput., 7:856–869, 1986.
[39] J.R. Schmidt. On the numerical solution of linear simultaneous

equations
by an iterative method. Phil. Mag., 7:369–383, 1941.
[40] Eric. L. Schwartz, Alan Shaw, and Estarose Wolfson. A numerical

solution
to the generalized mapmaker’s problem: Flattening nonconvex polyhedral
surfaces. IEEE Trans. Pattern Anal. Mach. Intell., 11:1005–1008, Novem-
ber 1989.
[41] Daniel Shanks. Nonlinear transformations of divergent and slowly

conver-
gent sequences. J. Math. and Phys., 34:1–42, 1955.
[42] Avram Sidi. Convergence and stability properties of minimal polyno-
mial and reduced rank extrapolation algorithms. SIAM J. Numer. Anal.,
23:197–209, 1986. Originally appeared as NASA TM-83443 (1983).
[43] Avram Sidi. Extrapolation vs. projection methods for linear systems of
equations. J. Comp. Appl. Math., 22:71–88, 1988.
[44] Avram Sidi. Efficient implementation of minimal polynomial and reduced
rank extrapolation methods. J. Comp. Appl. Math., 36:305–337, 1991.
Originally appeared as NASA TM-103240 ICOMP-90-20.
[45] Avram Sidi. Convergence of intermediate rows of minimal polynomial and
reduced rank extrapolation tables. Numer. Algorithms, 6:229–244, 1994.
[46] Avram Sidi and J. Bridger. Convergence and stability analyses for some
vector extrapolation methods in the presence of defective iteration

matrices.
J. Comp. Appl. Math., 22:35–61, 1988.
[47] Avram Sidi, William F. Ford, and David A. Smith. Acceleration of con-
vergence of vector sequences. SIAM J. Numer. Anal., 23:178–196, 1986.
Originally appeared as NASA TP-2193, (1983).
[48] Avram Sidi and Yair Shapira. Upper bounds for convergence rates of

vector
extrapolation methods on linear systems with initial iterations. Technical
Report 701, Computer Science Department, Technion–Israel Institute of
Technology, 1991. Appeared also as NASA Technical memorandum 105608,
ICOMP-92-09, (1992).
[49] Avram Sidi and Yair Shapira. Upper bounds for convergence rates of ac-
celeration methods with initial iterations. Numer. Algorithms, 18:113–132,
1998.
[50] Stig Skelboe. Computation of the periodic steady-state response of

nonlin-
ear networks by extrapolation methods. IEEE Trans. Circuits and Systems,
27:161–175, 1980.
[51] David A. Smith, William F. Ford, and Avram Sidi. Extrapolation methods
for vector sequences. SIAM Rev., 29:199–233, 1987.
[52] Joshua B. Tenenbaum, Vin de Silva, and John C. Langford. A global
geometric framework for nonlinear dimensionality reduction. Science,
290(5500):2319–2323, December 2000.
[53] Warren S. Torgerson. Multidimensional scaling I. theory and method.
PSym, 17:401–419, 1952.
[54] Shusaku Tsumoto and Shoji Hirano. Visualization of rule’s similarity

using
multidimensional scaling. In ICDM, pages 339–346, 2003.
[55] Jason Tsong-Li Wang, Xiong Wang, King-Ip Lin, Dennis Shasha, Bruce A.
Shapiro, and Kaizhong Zhang. Evaluating a class of distance-mapping
algorithms for data mining and clustering. In KDD ’99: Proceedings of the
fifth ACM SIGKDD international conference on Knowledge discovery and
data mining, pages 307–311, New York, NY, USA, 1999. ACM Press.
[56] Kilian Q. Weinberger and Laurence K. Saul. Unsupervised learning of
image manifolds by semidefinite programming. In Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition, volume 2, pages
988–995, Washington D.C., 2004. IEEE Computer Society.
[57] Tynia Yang, Jinze Liu, Leonard McMillan, and Wei Wang. A fast approx-
imation to multidimensionalscaling. In IEEE workshop on Computation
Intensive Methods for Computer Vision, 2006.
[58] Gil Zigelman, Ron Kimmel, and Nahum Kiryati. Texture mapping us-
ing surface flattening via multidimensional scaling. IEEE Transactions on
Visualization and Computer Graphics, 8(2):198–207, 2002.

Advertisements
  1. No trackbacks yet.

发表评论

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / 更改 )

Twitter picture

You are commenting using your Twitter account. Log Out / 更改 )

Facebook photo

You are commenting using your Facebook account. Log Out / 更改 )

Google+ photo

You are commenting using your Google+ account. Log Out / 更改 )

Connecting to %s