Note

The following content provides technical and mathematical background for the mprod-package. Most users of downstream applications such as TCAM would probably like to skip this part

\(\newcommand{\mat}[1]{\mathbf{#1}}\) \(\newcommand{\matM}{\mat{M}}\) \(\newcommand{\matMt}{\matM^{\T}}\) \(\newcommand{\matMi}{\matM^{-1}}\) \(\newcommand{\T}{\mat{T}}\) \(\newcommand{\xx}{\times}\) \(\newcommand{\mpn}{m \xx p \xx n}\) \(\newcommand{\pmn}{p \xx m \xx n}\) \(\newcommand{\tens}[1]{\mathcal{#1}}\) \(\newcommand{\tA}{\tens{A}}\) \(\newcommand{\tAt}{\tA^{\T}}\) \(\newcommand{\thA}{\widehat{\tA}}\) \(\newcommand{\thAt}{\thA^{\T}}\) \(\newcommand{\tC}{\tens{C}}\) \(\newcommand{\tCt}{\tC^{\T}}\) \(\newcommand{\thC}{\widehat{\tC}}\) \(\newcommand{\thCt}{\thC^{\T}}\) \(\newcommand{\tB}{\tens{B}}\) \(\newcommand{\tBt}{\tB^{\T}}\) \(\newcommand{\thB}{\widehat{\tB}}\) \(\newcommand{\thBt}{\thB^{\T}}\) \(\newcommand{\tsub}[1]{\xx_{#1}}\) \(\newcommand{\tsM}{\tsub{3}\matM}\) \(\newcommand{\tsMinv}{\tsub{3}\matM^{-1}}\) \(\newcommand{\mm}{\star_{\scriptscriptstyle \matM } }\) \(\newcommand{\RR}{\mathbb{R}}\) \(\newcommand{\tI}{\tens{I}}\) \(\newcommand{\thI}{\widehat{\tI}}\) \(\newcommand{\tE}{\tens{E}}\) \(\newcommand{\tQ}{\tens{Q}}\) \(\newcommand{\tQt}{\tQ^{\T}}\) \(\newcommand{\thQ}{\widehat{\tQ}}\) \(\newcommand{\thQt}{\thQ^{\T}}\) \(\newcommand{\tV}{\tens{V}}\) \(\newcommand{\tVt}{\tV^{\T}}\) \(\newcommand{\thV}{\widehat{\tV}}\) \(\newcommand{\thVt}{\thV^{\T}}\) \(\newcommand{\tU}{\tens{U}}\) \(\newcommand{\tUt}{\tU^{\T}}\) \(\newcommand{\thU}{\widehat{\tU}}\) \(\newcommand{\thUt}{\thU^{\T}}\) \(\newcommand{\tS}{\tens{S}}\) \(\newcommand{\tSt}{\tS^{\T}}\) \(\newcommand{\thS}{\widehat{\tS}}\) \(\newcommand{\thSt}{\thS^{\T}}\) \(\newcommand{\hsigma}{\hat{\sigma}}\) \(\newcommand{\rnk}{\operatorname{rank}}\) \(\newcommand{\rrho}{\boldsymbol{\rho}}\) \(\newcommand{\TNorm}[1]{\|#1\|_{2}}\) \(\newcommand{\FNorm}[1]{\|#1\|_{F}}\) \(\newcommand{\NNorm}[1]{\|#1\|_{*}}\) \(\newcommand{\FNormS}[1]{\FNorm{#1}^2}\) \(\newcommand{\TNormS}[1]{\TNorm{#1}^2}\)

The main functionality of mprod-package is factorization of tensors, that is, expressing a tensor \(\tA \in \RR^{d_1 \xx ... \xx d_N}\) as a product of other, “simpler” tensors. For this aim, one must first obtain some notion of tensor-tensor multiplication. The “M-product” (denoted by \(\mm\) ), defined in 1, refers to a “family” of tensor-tensor products, and provides the notion of multiplication which enables the factorization of tensors. Here, we briefly walk through the steps of \(\mm\)-product’s formal construction.

The M-product¶

We begin with some definitions. Let \(\matM\) be an \(n\xx n\) unitary matrix (\(\matM \matMt = \mat{I}_n = \matMt \matM\)), and a tensor \(\tA \in \RR^{\mpn}\). We define the domain transform specified by \(\matM\) as \(\thA := \tA \tsM\), where \(\tsM\) denotes the tensor-matrix multiplication of applying \(\matM\) to each of the tensor \(n\) dimensional tube fibers (\(\tA_{i,j,:}\)).

A practical demonstration using scipy and numpy libraries:

[1]:

import numpy as np
from scipy.stats import ortho_group # used for sampling random unitary matrices
                                    # from the Haar distribution

m, p, n = 10, 5, 8

A = np.random.randn(m, p, n) # generate a random tensor
M = ortho_group.rvs(n)       # random sample unitary M

A_hat = np.zeros_like(A)
for i in range(m):
    for j in range(p):
        A_hat[i,j,:] = M @ A[i,j,:]

Attention

The tensor-matrix product implementation is much more efficient than the above for loop

The transpose of a real \(\mpn\) tensor \(\tA\) with respect to \(\matM\), denoted by \(\tA^{\T}\), is a \(\pmn\) tensor for which

\[[\widehat{\tA^{\T}}]_{:,:,i} = [\thA^{\T}]_{:,:,i} = {[\thA]_{:,:,i}}^{\T}\]

Given two tensors \(\tA \in \RR^{\mpn}\) and \(\tB \in \RR^{p \xx r \xx n}\) , the facewise tensor-tensor product of \(\tA\) and \(\tB\), denoted by \(\tA \vartriangle \tB\) , is the \(m \xx r \xx n\) tensor for which

\[[\tA \vartriangle \tB]_{:,:,i} = \tA_{:,:,i} \tB_{:,:,i}\]

The \(\mm\) -product of \(\tA \in \RR^{\mpn}\) and \(\tB \in \RR^{p \xx r \xx n}\) is defined by

\[\tA \mm \tB := (\thA \vartriangle \thB) \tsMinv \in \RR^{m \xx r \xx n}\]

The mprod-package offers utility functions like m_prod implementing \(\mm\) as well as random and spectral analysis based generators of unitary transforms

[2]:

from mprod import  m_prod
from mprod import  generate_haar, generate_dct

funm_haar, invm_haar = generate_haar(n) # Utility wrapper arround
                                        #  scipy.stats.ortho_group
funm_dct, invm_dct = generate_dct(n)    # Generates dct and idct transforms using scipy's
                                        #  fft module. the default dct type is 2

# generate random tensor B
r = 15
B = np.random.randn(p,r,n)

# Multiply A and B with respect to a randomly sampled M
C_haar = m_prod(A,B,funm_haar, invm_haar)

# Multiply A and B with respect to M = dct
C_dct = m_prod(A,B,funm_dct, invm_dct)

print(np.linalg.norm(C_haar - C_dct))

102.53857064543121

As shown above, given two distinct transforms \({\matM}_1, {\matM}_2\) , we have that \(\tA \star_{\scriptstyle \matM_1} \tB\) and \(\tA \star_{\scriptstyle \matM_2} \tB\) are not equal in general. This fact, as we will see, provides high flexibility when applying \(\mm\) based dimensionality reduction schemes.

Two tensors \(\tA, \tB \in \RR^{1 \xx m \xx n}\) are called \(\mm\) -orthogonal slices if \(\tA^{\T} \mm \tB = \mathbf{0}\), where \(\mathbf{0} \in \RR^{1\xx 1 \xx n}\) is the zero tube fiber, while \(\tQ \in \RR^{m \xx m \xx n}\) is called \(\mm\) -unitary if \(\tQ^{\T} \mm \tQ = \tI = \tQ \mm \tQ^{\T}\) . A tensor \(\tB \in \RR^{p \xx k \xx n}\) is said to be a pseudo \(\mm\) -unitary tensor (or pseudo \(\mm\)-orthogonal) if \(\tB^{\T} \mm \tB\) is f-diagonal (i.e., all frontal slices are diagonal), and all frontal slices of \((\tB^{\T} \mm \tB) \tsM\) are diagonal matrices with entries that are either ones or zeros.

TSVDM¶

Let \(\tA \in \RR^{\mpn}\) be a real tensor, then is possible to write the full tubal singular value decomposition of \(\tA\) as

\[\tA = \tU \mm \tS \mm \tV^{\T}\]

where \(\tU, \tV\) are \((m \xx m \xx n)\) and \((p \xx p \xx n)\) , \(\mm\)-unitary tensors respectively, and \(\tS \in \RR^{\mpn}\) is an f-diagonal tensor, that is, a tensor whose frontal slices ( \(\tS_{:,:,i}\) ) are matrices with zeros outside their main diagonal.

We use the notation \(\hsigma_{j}^{(i)}\) do denote the \(j^{th}\) largest singular value on the \(i^{th}\) lateral face of \(\thS\):

\[\hsigma_{j}^{(i)} := \thS_{j,j,i}\]

[3]:

from mprod.decompositions import svdm
from mprod import tensor_mtranspose

U,S,V = svdm(A, funm_haar, invm_haar)

print("U:", "x".join(map(str, U.shape)))
print("S:", "x".join(map(str, S.shape)))
print("V:", "x".join(map(str, V.shape)),"\n")

# Note that for practical reasons, S is stored in a lean datastructure
# To obtain the "tensorial" representation of S, we do as follows
tens_S = np.zeros((p,p,n))
for i in range(n):
    tens_S[:S.shape[0],:S.shape[0],i] = np.diag(S[:,i])


# reconstruct the tensor
Vt = tensor_mtranspose(V,funm_haar, invm_haar)
US = m_prod(U, tens_S, funm_haar, invm_haar)
USVt = m_prod(US, Vt, funm_haar, invm_haar)

print("||A - USV'||^2 =",np.linalg.norm(A - USVt)**2) # practically 0

U: 10x5x8
S: 5x8
V: 5x5x8

||A - USV'||^2 = 5.098920121109679e-27

Tensor ranks and truncations¶

The t-rank of \(\tA\) is the number of nonezero tubes of \(\tS\):

\[r = | \left\{ i = 1, \dots, n ~;~ \FNormS{\tS_{i,i,:}} > 0 \right\} |\]

\(\tA^{(q)} = \tU_{:,1:q, :} \mm \tS_{1:q,1:q,:} \mm {\tV_{:,1:q,:}}^{\T}\) denotes the t-rank \(q\) truncation of \(\tA\) under \(\mm\)

The multi-rank of \(\tA\) under \(\mm\), denoted by the vector \(\rrho \in \mathbb{N}^{n}\) whose \(i^{th}\) entry is

\[\rrho_i = \rnk (\thA_{:,:,i})\]

The multi-rank \(\rrho\) truncation of \(\tA\) under \(\mm\) is given by the tensor \(\tA_{\rrho}\) for which

\[\widehat{\tA_{\rrho}}_{:,:,i} = \thU_{:,1:\rrho_i, i} \thS_{1:\rrho_i,1:\rrho_i,i} {\thV_{:,1:\rrho_i,i}}^{\T}\]

The implicit rank under \(\mm\) of a tensor \(\tA\) with multi-rank \(\rrho\) under \(\mm\) is

\[r = \sum_{i=1}^{n} \rrho_i\]

Note that for t-rank truncation the \(\tU\) and \(\tV\) factors are \(\mm\)-orthogonal, while for multi-rank truncation they are only pseudo \(\mm\)-orthogonal.

[4]:

# t-rank 4 trunctation
q = 4
tens_S_t_hat = funm_haar(tens_S.copy())
tens_S_t_hat[q:,q:,:] = 0
tens_S_t = invm_haar(tens_S_t_hat)
A4 = m_prod(m_prod(U, tens_S_t, funm_haar, invm_haar), Vt, funm_haar, invm_haar)


# multi-rank rho trunctation
rho = [1,3,2,2,3,1,4,3] # this is the multi-rank vector
tens_S_rho_hat = funm_haar(tens_S.copy())
for i in range(n):
    tens_S_rho_hat[rho[i]:,rho[i]:,i] = 0

tens_S_rho = invm_haar(tens_S_rho_hat)
A_rho = m_prod(m_prod(U, tens_S_rho, funm_haar, invm_haar), Vt, funm_haar, invm_haar)

Let \(\tA = \tU \mm \tS \mm \tV^{\T} \in \RR^{\mpn}\), we will use \(j_1,\dots, j_{np}\) and \(i_1,\dots, i_{np}\) to denote the indexes of the non-zeros of \(\thS\) ordered in decreasing order. That is

\[\hsigma_{\ell} := \hsigma_{j_{\ell}}^{(i_{\ell})}\]

where \(\hsigma_1 \geq \hsigma_2 \geq \dots \geq \hsigma_{np}\) .

For \(q = 1 , \dots , p n\) , the explicit rank- \(q\) truncation under \(\mm\) of a tensor \(\tA\), denoted by \(\tA_q = \tA_{\rrho}\) , where \(\tA_{\rrho}\) is the tensor of multi-rank \(\rrho\) under \(\mm\) such that

\[\rrho_i = \max \{ j = 1, \dots ,p ~|~ (j,i) \in \{(j_1, j_1), \dots, (j_q, i_q)\} \} .\]

In words, we keep the \(q\) top singular values of any frontal slice of \(\thS\), and zero out the rest.

Note

We have that \(\tA^{(q)}, \tA_{\rrho}\) and \(\tA_{q}\) are the best t-rank \(q\), multi-rank \(\rrho\) and explicit-rank \(q\) (under \(\mm\)) approximations of \(\tA\), respectively.

The effect of choosing different transforms¶

To demonstrate how might the choice of \(\matM\) influence the resulting decomposition, we use the real-world time-series dataset obtained from a study on Pediatric Ulcerative Colitis (PUC) by 2.

First, we obtain the data table from our analysis GitHub repo, construct a tensor from the data and apply TSVDM with respect to both randomly sampled \(\matM\) and the DCT.

Note that in generate_haar function call, we set the random_state parameter to an integer (123) just so that the results are reproducible.

[5]:

import pandas as pd
from mprod import table2tensor

file_path = "https://raw.githubusercontent.com/UriaMorP/" \
            "tcam_analysis_notebooks/main/Schirmer2018/Schirmer2018.tsv"

data_raw = pd.read_csv(file_path, index_col=[0,1], sep="\t"
                       , dtype={'Week':int})

data_tensor, map1, map3 =  table2tensor(data_raw)

m,p,n = data_tensor.shape

# Generate transforms according to the
# relevant dimensions
funm_haar, invm_haar = generate_haar(n,random_state=123)
funm_dct, invm_dct = generate_dct(n)


# Haar
Uhaar, Shaar, Vhaar = svdm(data_tensor, funm_haar, invm_haar)
print("shape of S, by randomly sampled transform:", Shaar.shape)
# DCT
Udct, Sdct, Vdct = svdm(data_tensor, funm_dct, invm_dct)
print("shape of S, by DCT:", Sdct.shape)

shape of S, by randomly sampled transform: (87, 4)
shape of S, by DCT: (4, 4)

In this case, we have that the t-rank of our data under the DCT domain transform is 4, and 87 under \(\mm\) where \(\matM\) is obtained from randomly sampling the Haar distribution.

Even though it is not generally true that choosing \(\matM\) as DCT (the t-product) results in better compression, the fact that it does so for time-series data makes perfect sense; Since we assume that time-series data are samples of continuous functions, which, are easy to approximate well using very few DCT basis elements.

1: Misha E. Kilmer, Lior Horesh, Haim Avron, and Elizabeth Newman. Tensor-tensor algebra for optimal representation and compression of multiway data. Proceedings of the National Academy of Sciences, 118(28):e2015851118, jul 2021. URL: https://www.pnas.org/content/118/28/e2015851118 https://www.pnas.org/content/118/28/e2015851118.abstract, doi:10.1073/PNAS.2015851118.
2: Melanie Schirmer, Lee Denson, Hera Vlamakis, Eric A. Franzosa, Sonia Thomas, Nathan M. Gotman, Paul Rufo, Susan S. Baker, Cary Sauer, James Markowitz, Marian Pfefferkorn, Maria Oliva-Hemker, Joel Rosh, Anthony Otley, Brendan Boyle, David Mack, Robert Baldassano, David Keljo, Neal LeLeiko, Melvin Heyman, Anne Griffiths, Ashish S. Patel, Joshua Noe, Subra Kugathasan, Thomas Walters, Curtis Huttenhower, Jeffrey Hyams, and Ramnik J. Xavier. Compositional and Temporal Changes in the Gut Microbiome of Pediatric Ulcerative Colitis Patients Are Linked to Disease Course. Cell Host & Microbe, 24(4):600–610.e4, oct 2018. URL: https://linkinghub.elsevier.com/retrieve/pii/S1931312818304931, doi:10.1016/j.chom.2018.09.009.