# [Eeglablist] value of PCA pre-processing before running ICA on EEG data?

arno arno at salk.edu
Thu Aug 3 12:41:45 PDT 2006

```Dear Anish,
> [1] How do I know how many dimensions to reduce the data to?  So far, I have
> been choosing to keep just enough principal components such that ~ 99% of
> the variance is retained (but I only picked that value arbitrarily), which
> usually halves the dimension of the data.
>
There is no accepted rule. If you have 'n' sample points, you should use no
more than sqrt(n) "channels" (so if you have more than sqrt(n) channels
in your data, use PCA to reduce the dimensionality). This is because
there are number_channel^2 values in the weight matrix, so you need at
least one value in the data (one time frame) per value in the matrix. In
our experience, it is good to have number_channel < sqrt(n/20), i.e.
number_channel^2 < n/20.
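
As a rough worked example of the two limits (a sketch only: the value of n
is made up, and the practical guideline is read here as about 20 time
frames per weight, i.e. number_channel^2 < n/20):

    n = 300000;                          % hypothetical number of time frames
    max_strict    = floor(sqrt(n));      % one frame per weight    -> 547
    max_practical = floor(sqrt(n/20));   % ~20 frames per weight   -> 122

So with this much data and, say, 128 channels, reducing to about 120
dimensions with PCA would stay within the practical limit.
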
> [2] Once I reduce the dimensionality of the data with PCA to 'p'
> uncorrelated components, how many independent components 'c' do I choose to
> extract?  Should c=p?
>
Yes, necessarily, when using the runica() algorithm. There is an option to
runica ('ncomps') that modifies the ICA algorithm to obtain fewer components
than channels, but it should not be used (it has returned strange
results). If someone is interested in investigating this behavior, they
are very welcome to. For now, just use the 'pca' option.
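
A minimal sketch of such a call, assuming your data is in a
channels-by-frames matrix (the variable name 'data' and the value 60 are
only placeholders):

    p = 60;                                       % number of principal components to keep
    [weights, sphere] = runica(data, 'pca', p);   % returns p independent components
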
> [3] Is there any difference between: [a] running ICA and extracting (say) 60
> components from the original raw data, and [b] first running PCA to reduce
> the raw data to the largest 60 principal components, and then running ICA
> and extracting 60 independent components from the pre-processed data?  If
> there is a difference, which is the more appropriate method?
>
There is no difference if you use the 'pca' option of the runica() function.
> If anyone can offer any insight into these questions, it would be greatly
> appreciated.  So far, I have just been picking arbitrary values for 'c' and
> 'p' (ie. trial and error) and hoping for things to work out.  I am really
> stumped about question [3] though... I don't know which method is better, or
> even if it makes a difference which I choose.
>