Optimal Sparse L1-Norm Principal-Component Analysis

In many applications, a preference for processing only a few data coordinates motivates imposing sparsity on the designed principal components (PCs). The sparse principal-component-analysis (sparse PCA) problem is a variant of the classical PCA problem in which each PC is constrained to have a limited number of nonzero entries.

The main contribution of this research is to present an algorithm for the optimal computation of the robust sparse \(L_1\)-norm principal component $\mathbf{q}_{{\it{L}}_1}^{S}$ of any data matrix \({\bf{X}}\in \mathbb{R}^{D\times N}\), mathematically formulated as $$\mathbf{q}_{{\it{L}}_1}^{S}= \underset{\substack{\mathbf{q}\in \mathbb{R}^{\it{D}},~\|\mathbf{q}\|=1,\\ \|\mathbf{q}\|_{0}= S} }{\arg\max} \|{\bf{X}}^{T}\mathbf{q}\|_{1}.$$ The above problem has an equivalent maximization representation as $$ \underset{\substack{\mathbf{q}\in \mathbb{R}^{\it{D}},~ \|\mathbf{q}\|=1,\\ \|\mathbf{q}\|_{0}= S}} {\max} \|{\bf{X}}^{T}\mathbf{q}\|_{1} =\underset{{\bf{b}} \in \{\pm 1\}^{N}}{\max}\hspace{0.1 cm} \underset{\substack{\mathcal{I} \subseteq \mathcal{D},\\ |{\mathcal{I}}|=S}}{\max}\hspace{0.1 cm} \| {\bf{X}}_{{\mathcal{I}},:} {\bf{b}} \|.$$ The presented work splits the algorithm into two cases, depending on whether the sample support $N$ or the data dimension $D$ is the fixed (small) quantity.
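One way to see this equivalence (a standard dual-norm argument, stated here only as a clarifying remark) is to write $$\|{\bf X}^{T}\mathbf{q}\|_{1}=\underset{{\bf b}\in\{\pm1\}^{N}}{\max}~{\bf b}^{T}{\bf X}^{T}\mathbf{q},$$ and to note that, for a fixed sign vector ${\bf b}$ and a fixed support $\mathcal{I}$ with $|\mathcal{I}|=S$, the unit-norm vector $\mathbf{q}$ supported on $\mathcal{I}$ that maximizes ${\bf b}^{T}{\bf X}^{T}\mathbf{q}=({\bf X}_{\mathcal{I},:}\,{\bf b})^{T}\mathbf{q}_{\mathcal{I}}$ attains, by the Cauchy–Schwarz inequality, the value $\|{\bf X}_{\mathcal{I},:}\,{\bf b}\|$. Maximizing over ${\bf b}$ and $\mathcal{I}$ then yields the right-hand side above.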

Case 1: $N < D$

Since $N$ is fixed (small), we exhaustively consider all binary vectors ${\bf{b}} \in \{\pm 1\}^{N}$, i.e., $\mathcal{O}(2^N)$ candidates. For every fixed binary vector $\bf b$, the optimal index set consists of the indices of the $S$ largest-magnitude entries of ${\bf X}{\bf b}$ and is computed with complexity $\mathcal{O}(D)$. Thus the overall computational cost is $\mathcal{O}(2^N D)$; for fixed sample support $N$, the algorithm exhibits computational cost $\mathcal{O}(D)$, linear in the data dimension.
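A minimal numpy sketch of the Case 1 search follows; the function name and interface are illustrative and not taken from the paper, and the per-${\bf b}$ index-set selection uses a standard partial sort.

```python
import itertools
import numpy as np

def sparse_l1_pc_exhaustive_b(X, S):
    """Case 1 sketch (N < D): exhaust all sign vectors b in {+-1}^N.

    For each fixed b, the best size-S index set is given by the S rows of X
    with the largest |(X b)_i|, and the candidate metric is the L2-norm of
    that sub-vector.  Returns the S-sparse, unit-norm L1-PC estimate.
    """
    D, N = X.shape
    best_val, best_q = -np.inf, None
    for signs in itertools.product((-1.0, 1.0), repeat=N):
        b = np.asarray(signs)
        v = X @ b                                        # (Xb)_i for every row i
        idx = np.argpartition(np.abs(v), D - S)[D - S:]  # indices of the S largest magnitudes
        val = np.linalg.norm(v[idx])
        if val > best_val and val > 0:
            best_q = np.zeros(D)
            best_q[idx] = v[idx] / val                   # unit-norm vector supported on idx
            best_val = val
    return best_q, best_val
```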

Case 2: $D < N$

Since $D$ is fixed (small), we exhaustively consider all index sets $\mathcal{I} \subseteq \mathcal{D}$ with $|{\mathcal{I}}|=S$, i.e., $\mathcal{O}(D^S)$ candidates. For every fixed index set $\mathcal I$, the optimal binary vector is computed with complexity $\mathcal{O}(N^S)$. Thus the overall computational cost is $\mathcal{O} (D^S N^S)$; for fixed data dimension $D$, the algorithm exhibits computational cost $\mathcal{O} (N^S)$, polynomial in the sample support $N$. The complete optimal algorithm is given in pseudocode in Fig. 1.
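A sketch of the Case 2 search under the same illustrative interface; for clarity, the inner problem (the exact $L_1$-PC of ${\bf X}_{\mathcal{I},:}$) is solved here by brute force over ${\bf b}$, whereas the optimal algorithm replaces that step with the polynomial-time $\mathcal{O}(N^S)$ routine exploiting $\text{rank}({\bf X}_{\mathcal{I},:}) \le S$.

```python
import itertools
import numpy as np

def sparse_l1_pc_exhaustive_supports(X, S):
    """Case 2 sketch (D < N): exhaust all size-S supports I.

    The inner maximization over b is written as brute force for readability;
    the optimal algorithm instead uses a polynomial-in-N solver.
    """
    D, N = X.shape
    best_val, best_q = -np.inf, None
    for I in itertools.combinations(range(D), S):
        XI = X[list(I), :]                        # S x N sub-matrix X_{I,:}
        for signs in itertools.product((-1.0, 1.0), repeat=N):
            v = XI @ np.asarray(signs)
            val = np.linalg.norm(v)
            if val > best_val and val > 0:
                best_q = np.zeros(D)
                best_q[list(I)] = v / val         # unit-norm vector supported on I
                best_val = val
    return best_q, best_val
```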





Outlier Processing via L1-Principal Subspaces

With the advent of big data, there is a growing demand for smart algorithms that can extract relevant information from high-dimensional, large data sets potentially corrupted by faulty measurements (outliers). In this context, we present a novel line of research that utilizes the robust nature of $L_1$-norm subspaces for data dimensionality reduction and outlier processing. An overview of the proposed algorithm is as follows:

Step 1

Obtain $L_1$-principal components $\left( {\bf{Q}}_{L_1} \right)_{D\times P}$ of data matrix ${\bf{X}}\in\mathbb{R}^{D\times N}$ as $$ \mathbf{Q}_{{\it{L}}_1}= \hspace{-0.1 cm}\underset{ {\mathbf{Q}}\in \mathbb{R}^{\it{D\times P}},~ {\bf{Q}}^{T}{\bf{Q}}= {\bf{I}}_P } {\arg\max} \|{\bf{X}}^{T}\mathbf{Q}\|_{1}.$$

Step 2

Evaluate the reliability weight of each sample as $\overline{w}_n = \|{\bf{x}}_n - {\bf{Q}}_{L_1}{\bf{Q}}_{L_1}^{T}{\bf{x}}_n \|_2^2~~~ \forall~ n \in \{1, 2, \ldots, N \}$ and normalize it (relative degree of corruption) as $$ w_n = \frac{\overline{w}_n}{\sum_{i=1}^N \overline{w}_i}.$$
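A minimal numpy sketch of Step 2 (assuming the orthonormal $L_1$ principal components ${\bf Q}_{L_1}$ are already available; names are illustrative):

```python
import numpy as np

def reliability_weights(X, Q):
    """Normalized residual energy of every sample outside span(Q).

    X : (D, N) data matrix;  Q : (D, P) orthonormal L1-norm principal components.
    """
    residual = X - Q @ (Q.T @ X)           # projection residual, one column per sample
    w_bar = np.sum(residual ** 2, axis=0)  # squared L2 residual of each sample
    return w_bar / w_bar.sum()             # relative degree of corruption
```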

Step 3

Identify the potential outliers via conventional $(K=2)$-means clustering over the computed one-dimensional weights, and remove the samples assigned to the higher-mean cluster. With $\mathcal I$ denoting the indices of the samples to be removed, $${\bf{X}}\left(:,{\mathcal{I}} \right)=\emptyset~~ \text{and} ~~{\bf{X}}_{clean} \leftarrow {\bf{X}}.$$
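A sketch of Step 3 using a small Lloyd-type 2-means on the one-dimensional weights (illustrative implementation, not the paper's code):

```python
import numpy as np

def remove_outliers(X, w, iters=100):
    """Lloyd's 2-means on the 1-D weights w; samples falling in the
    higher-mean cluster are treated as outliers and removed."""
    c = np.array([w.min(), w.max()])                      # initialize the two centroids
    labels = np.zeros(len(w), dtype=int)
    for _ in range(iters):
        labels = (np.abs(w - c[1]) < np.abs(w - c[0])).astype(int)
        new_c = np.array([w[labels == k].mean() if np.any(labels == k) else c[k]
                          for k in (0, 1)])
        if np.allclose(new_c, c):                         # centroids converged
            break
        c = new_c
    outlier_idx = np.where(labels == int(np.argmax(c)))[0]  # higher-mean cluster
    return np.delete(X, outlier_idx, axis=1), outlier_idx
```

Combined with the previous sketch, ${\bf X}_{clean}$ would be obtained as `X_clean, _ = remove_outliers(X, reliability_weights(X, Q))`.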

Step 4

Recalculate the $L_1$-principal components $\left({\bf{Q}}_{L_1}^{clean}\right)$ over the outlier-processed data ${\bf{X}}_{clean}$ as $$\mathbf{Q}_{{\it{L}}_1}^{clean}= \hspace{-0.1 cm}\underset{ {\mathbf{Q}}\in \mathbb{R}^{\it{D\times P}},~ {\bf{Q}}^{T}{\bf{Q}}= {\bf{I}}_P } {\arg\max} \|{\bf{X}}^{T}_{clean}\mathbf{Q}\|_{1}.$$ It is expected that the principal subspaces ${\bf{Q}}_{L_1}^{clean}$ designed in this way reveal deeper insight into the given data matrix than the original ${\bf Q}_{L_{1}}$ (or ${\bf Q}_{L_{2}}$) principal subspaces. For the benefit of the readers, we also outline the pseudo-code of the proposed scheme in Figure 2.





Efficient L1-Norm Principal-Component Analysis via Bit Flipping

It was shown recently that the $K$ L1-norm principal components (L1-PCs) of a real-valued data matrix ${\bf{X}} \in \mathbb{R}^{D\times N}$ ($N$ data samples of $D$ dimensions) can be exactly calculated with cost $\mathcal{O}(2^{NK})$ or, when advantageous, $\mathcal{O}(N^{dK-K+1})$, where $d = \text{rank}({\bf X})$. However, the computational cost is prohibitive when $\bf X$ is large (“big” data of large $N$ and/or “heavy” data of large $D$). This paper presents a novel near-optimal algorithm for the calculation of the $K < d$ L1-PCs of $\bf X$ with cost ${\mathcal O}(ND\min\{N, D\} + N^2(K^4 + dK^2) + dNK^3)$, which is comparable to that of standard ($L_2$-norm) PC analysis.

Given a data matrix $\bf{X}$ $=$ $[{{\bf{x}}_1}~ {{\bf{x}}_2} \cdots {\bf{x}_{\it{N}}}] \in \mathbb{R}^{D \times N} $ of rank $d \leq \min\{D, N\}$, we are interested in calculating a low-rank data subspace of dimensionality $K < d$ in the form of an orthonormal basis $\mathbf{Q}_{{\it{L}}_1}$ that solves $$ \mathbf{Q}_{{\it{L}}_1}= \hspace{-0.1 cm}\underset{ {\mathbf{Q}}\in \mathbb{R}^{\it{D\times K}},~ {\bf{Q}}^{T}{\bf{Q}}= {\bf{I}}_K } {\arg\max} \|{\bf{X}}^{T}\mathbf{Q}\|_{1}.$$ The above problem can be equivalently represented as a binary combinatorial maximization problem, $$ {\bf{B}}_{\text{opt}}= \underset{{\bf{B}} \in \{\pm 1\}^{N\times K}}{\arg\max} \| {\bf{XB}} \|_{*},$$ such that $\|{\bf{X}}^{T}{\mathbf{Q}}_{{\it{L}}_1}\|_{1} = \| {\bf{XB}}_{\text{opt}} \|_{*} $. The presented algorithm begins with the simplest case of a single $L_1$ principal component ($K=1$) and then generalizes the concept to multiple $L_1$-PCs ($K>1$). For ease of reference, the complete pseudocode of L1-BF for $K = 1$ is provided in Fig. 1. A formal proof of convergence, a theoretical analysis of the convergence points, and a detailed asymptotic-complexity derivation are also provided. The proposed algorithm bridges the gap between outlier-resistant and computationally efficient $L_1$-principal-component analysis.
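A minimal numpy sketch of the $K=1$ bit-flipping idea follows; the initialization and the flip-selection schedule of the actual L1-BF algorithm may differ, so this is an illustrative sketch rather than the paper's exact procedure.

```python
import numpy as np

def l1_pc_bit_flipping(X, max_sweeps=100):
    """Greedy single-bit-flipping sketch for K = 1.

    Starting from an (assumed) initialization of b, flip any sign of b that
    increases ||X b||_2 and stop when a full sweep yields no improvement.
    The L1-PC estimate is then X b / ||X b||_2.
    """
    D, N = X.shape
    b = np.sign(X.T @ X.sum(axis=1))          # illustrative initialization
    b[b == 0] = 1.0
    best = np.linalg.norm(X @ b)
    for _ in range(max_sweeps):
        improved = False
        for n in range(N):
            b[n] *= -1                        # tentative flip of bit n
            val = np.linalg.norm(X @ b)
            if val > best + 1e-12:
                best, improved = val, True    # keep the flip
            else:
                b[n] *= -1                    # revert
        if not improved:
            break
    q = X @ b
    return q / np.linalg.norm(q), b
```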





Fastest-Known Near-ML Decoding of Golden Codes

The Golden code is a full-rate, full-diversity space-time code for two or more receive antennas that achieves exceptional error-rate performance when the symbols are drawn from $M$-ary quadrature-amplitude-modulation (QAM) constellations.

In this paper, we present a novel reliability-based decoder for the Golden code with the lowest known average complexity, practically matching the error-rate performance of the ML decoder. In the developed algorithm, we first apply a zero-forcing (ZF) filter to the received signal. The output of the ZF filter is then passed through a likelihood-based reliability-metric calculator. Symbols deemed reliable are directly decoded and removed from the received signal. The remaining symbols (deemed unreliable) are decoded via reduced-dimension (low-complexity) ML or near-ML decoders.

We follow the generic signal model over $T=2$ time slots with $M_t=2$ transmit antennas and $M_r=2$ receive antennas. Assuming perfect channel state information (CSI) at the receiver, the received signal matrix $\bf Y$ $\in$ $\mathbb{C}^{T \times M_r}$ is given by $$ \overbrace{\left[ \begin{array}{c c} y_{1} & y_{3}\\ y_{2} & y_{4}\\ \end{array} \right]}^{{ \bf { Y}}}= \sqrt{\frac{\rho}{2}} \overbrace{\left[ \begin{array}{c c} h_{1} & h_{3}\\ h_{2} & h_{4}\\ \end{array} \right]}^{\bf H}\bf{X}+ \overbrace{\left[ \begin{array}{c c} n_{1} & n_{3}\\ n_{2} & n_{4}\\ \end{array} \right]}^{\bf N}$$ where $\rho$ is the signal-to-noise ratio (SNR) per receive antenna, $\bf H$ is the channel matrix, and $\bf N$ is the noise matrix. The channel-equivalent real-valued representation of the above equation is given by $$ {\bf y_e}=\sqrt{\frac{\rho}{2}}{\bf H_e}{\bf G}{\bf s}+{\bf n_e}$$ where ${\bf y_e}$, ${\bf n_e}$, and ${\bf s}$ are constructed by alternately stacking the element-wise real and imaginary components of ${\bf Y}$, ${\bf N}$, and ${\bf \tilde s}$, respectively. The zero-forcing (ZF) filter is given by $$ {\bf W} = ({\bf H_e}{\bf G})\Bigl( \sqrt{\frac{\rho}{2}}{\bf G}^{T}{\bf H_e}^{T}{\bf H_e} {\bf G}\Bigr) ^{-1}$$ and the output of the ZF filter is ${\bf \bar{s}} = {\bf W}^{T}{\bf {y}_e}.$ We then apply reliability-based decoding to retain the reliable symbols and use the (higher-cost) ML decoder only for the unreliable ones. For the benefit of the readers, we also outline the pseudo-code of the proposed near-ML scheme in Fig. 1.
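A minimal numpy sketch of the ZF front end following the formulas above (matrix names mirror the notation; this is illustrative, not the paper's implementation):

```python
import numpy as np

def zf_output(H_e, G, y_e, rho):
    """Zero-forcing filtering of the stacked real-valued received vector.

    H_e : real-valued equivalent channel matrix
    G   : Golden-code generator matrix
    y_e : stacked real-valued received vector
    rho : SNR per receive antenna
    Returns the ZF estimate s_bar = W^T y_e.
    """
    A = H_e @ G                                            # effective mixing matrix H_e G
    W = A @ np.linalg.inv(np.sqrt(rho / 2.0) * (A.T @ A))  # ZF filter from the formula above
    return W.T @ y_e                                       # soft symbol estimates s_bar
```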

 

Copyright © The above publication material is presented to ensure timely dissemination of scholarly and technical work. Kindly cite the corresponding article if you use any part of the code provided above.