[Eeglablist] AMICA15 segmentation fault

Norman Forschack forschack at cbs.mpg.de
Wed Dec 16 14:09:39 PST 2015


Got you, that's very helpful. I think you could assemble all your excellent communication from eeglablist archive and, with a little formatting, come up with a very nice paper.
I wish you well and merry xmas!
Norman

----- On Dec 16, 2015, at 10:44 PM, Jason Palmer japalmer29 at gmail.com wrote:

> For full rank data, the default is to use the inverse "symmetric square root" of
> the covariance matrix, which basically rotates back from the eigenspace into
> the data space, after scaling to unity along the eigendimensions. For full rank
> data, this will essentially give a starting point to ICA that is closer to the
> optimum than the eigenvector basis, which looks like checkerboard after the
> first few eigenvectors. When you are reducing rank, the "rotation back" to the
> data space isn't uniquely defined as in the full rank case (just the inverse of
> the eigenprojection). The least squares tries to match the channels with
> largest variance. This is usually much closer to the data space than just the
> eigenbasis itself. It will always reduce dimensions less than mineig, and more
> only if pcakeep is less than that. With "approximate sphering" it will use the
> symmetric sqrt for full rank data, and approximate the inverse rotation in a
> smaller space when dimensions are removed. It should really only affect the
> starting point and how long it takes to converge to the actual ICA optimized
> solution, but given that it takes a long time anyway, starting closer to the
> solution can save a lot of iterations/time (so do_approx_sphere is
> recommended). "Sphering" is how the symmetric sqrt cov matrix decorrelation is
> referred to, as opposed to simple projection onto the eigenvectors (to use the
> "principle components" as initial activations and eigenvectors as initial maps.
> 
> Hope that's somewhat clear.
> Best,
> Jason
> 
> -----Original Message-----
> From: Norman Forschack [mailto:forschack at cbs.mpg.de]
> Sent: Wednesday, December 16, 2015 1:06 PM
> To: japalmer at ucsd.edu
> Subject: Re: [Eeglablist] AMICA15 segmentation fault
> 
> Hi Jason,
> 
> great answer, thanks so much, many things became clear.
> So pca option is as save as analyzing full rank data, given the least squares
> for eigensubspace rotation are reasonably small.
> I wonder whether there are cases where the least squares are not sufficiently
> small. What is your experience with that?
> 
> cheers
> Norman
> 
> ----- On Dec 16, 2015, at 9:12 PM, Jason Palmer japalmer29 at gmail.com wrote:
> 
>> Hi Norman,
>> 
>> The negative eigenvalues are essentially due to machine precision
>> error in the positive semi-definiteness of the covariance matrix.
>> Amica removes all negative eigendimensions, and all dimensions with
>> eigenvalue < mineig (see Sphering ...). The number of dimensions kept
>> is printed as "Num eigs kept" in the output, which in your case is 57
>> out 61, so including the negatives and possibly some small dimensions
>> to eliminate 4. That eigensubspace is then rotated again to
>> approximately equal the data in least squares, preserving
>> decorrelation (when you have do_approx_sphere checked, otherwise just the
>> eigenvector space projection is used).
>> 
>> So the number of dimensions given to amica is the minimum of (num eigs
>> > mineig) and pcakeep. The minimum eigenvalues are displayed whether
>> they are kept or not.
>> 
>> The run time is basically proportional to the amount of data you have.
>> So with approximately the same number of channels and 3 times as much
>> data, it will take about 3 times as long. The slowness is why I try to
>> optimize the blocksize for iteration time, since this can save up to a
>> factor of 2 in run time, but with large data + # cores, you may have
>> to worry about stack overflow segfault since there is no prescribed
>> stack limit and the signal is not currently caught.
>> 
>> Best,
>> Jason
>> 
>> -----Original Message-----
>> From: Norman Forschack [mailto:forschack at cbs.mpg.de]
>> Sent: Wednesday, December 16, 2015 11:43 AM
>> To: japalmer at ucsd.edu
>> Cc: eeglablist
>> Subject: Re: [Eeglablist] AMICA15 segmentation fault
>> 
>> Hi Jason,
>> 
>> thanks a lot for your quick reply and yes you're right, reducing block
>> size and steps solves the issue as well as doing amica on a subset of
>> the data, at least for the dataset I tested it on.
>> Unfortunately, computation time now increased by a factor of three
>> (with eight cores). I'll test a little further to find the optimal settings.
>> 
>> When having pcakeep option enabled, I still get super small but
>> negative eigenvalues. Do you think this is a problem and would you
>> rather recommend not using pca at all but removing channels to give full rank
>> data to amica?
>> 
>> cheers
>> Norman
>> 
>> ----- On Dec 16, 2015, at 4:51 PM, Jason Palmer japalmer29 at gmail.com wrote:
>> 
>>> Hi Nathan,
>>> 
>>> Thanks for your early beta testing ... I was getting local
>>> feedback/debugging before announcing new version to wider list, but I
>>> guess it makes more sense to release a sort of beta version to the
>>> eeglablist. The new binaries (with sphering bug hopefully fixed) and
>>> test version of eeglab plugin is at:
>>> http://sccn.ucsd.edu/~jason/amica_web.html
>>> and there is a help link on that page for more information on
>>> arguments in the gui. So, anyone is welcome to try amica15, and the
>>> plugin, and help test it.
>>> 
>>> I think what you are actually getting is a stack overflow because it
>>> is attempting to use too large a matrix block size with a large
>>> number of samples.
>>> I think do_opt_block is enabled by default, and it tries 256:256:1024
>>> to find the fastest blocksize. You can change this with the Block sizes, etc.
>>> button.
>>> In the gui, or the blk_min, blk_step, blk_max keywords.
>>> 
>>> Probably if you set smaller values, like blk_min = 64, blk_step = 64,
>>> blk_max = 256, it will get through the tests without segmentation
>>> fault. I'm still working on catching this error in the code.
>>> 
>>> So I don't think your issue has to do with sphering--it is just
>>> printing the eigenvalue info twice (which I will remove) but I don't
>>> think anything is going wrong.
>>> 
>>> For anyone who would like to look at and/or compile the source code,
>>> you can get it at:
>>> https://github.com/japalmer29/amica
>>> with a readme on how to compile it using a demo version of the Intel
>>> Fortran compiler on various platforms.
>>> 
>>> Thanks, and happy winter!
>>> Jason
>>> 
>>> 
>>> -----Original Message-----
>>> From: eeglablist-bounces at sccn.ucsd.edu
>>> [mailto:eeglablist-bounces at sccn.ucsd.edu]
>>> On Behalf Of Norman Forschack
>>> Sent: Monday, December 14, 2015 4:40 PM
>>> To: eeglablist
>>> Subject: [Eeglablist] AMICA15 segmentation fault
>>> 
>>> Dear eeglabbers,
>>> 
>>> first of all, Jason, thanks a lot for providing the new version of
>>> amica for us, your input is always highly appreciated!
>>> 
>>> I tested the new version with the dataset from the homepage and it
>>> works neatly (after adjusting the call to the Ubuntu binary to 'amica15ub').
>>> 
>>> However, when running it on another EEG dataset with 62 channels, I
>>> have some trouble with negative but very small eigenvalues, which
>>> might be related to this post
>>> http://sccn.ucsd.edu/pipermail/eeglablist/2015/010104.html
>>> This causes amica to exit after the message 'segmentation fault'. The
>>> data is comprised of merged ten minutes blocks and is minimally preprocessed:
>>> 1. PREP pipeline robust average reference 2. resampling to 250Hz 3.
>>> 1Hz high pass 4. thresholding data points > 180 microV
>>> 
>>> The issue occurs when pcakeep option is enabled (see below for
>>> command line
>>> output)
>>> 
>>> I tried without pcakeep but with removed channels, mainly EOG and
>>> PREP interpolated channels, in order to input full rank data. This
>>> yields positive eigenvalues, but again, the algorithm exits with 'segmentation
>>> fault'.
>>> 
>>> I have no idea how to overcome this, but I'm wondering why
>>> eigenvalues are calculated twice? Are they supposed to differ at the
>>> second instance? In my case they don't. (see command line output
>>> below) Or could there probably be something wrong with my data
>>> preprocessing, am I missing something?
>>> 
>>> Any hint how to solve the problem, is highly anticipated.
>>> 
>>> Best
>>> Norman
>>> 
>>> 
>>> run with data dimensions reduced:
>>> 
>>> Writing data file: /.../tmpdata96489.fdt
>>> mkdir: cannot create directory '/.../amicaouttmp/': File exists
>>>           1 processor name = kambodscha
>>>           1 host_num =   2033974327
>>> This is MPI process           1 of           1 ; I am process           1 of
>>>           1 on node: kambodscha
>>>           1  : node root process           1 of           1
>>> Processing arguments ...
>>> num_files =            1
>>> FILES:
>>> /.../tmpdata96489.fdt
>>> num_dir_files =            1
>>> initial matrix block_size =          128
>>> do_opt_block =            1
>>> blk_min =          256
>>> blk_step =          256
>>> blk_max =         1024
>>> number of models =            1
>>> max_thrds =            8
>>> use_min_dll =            1
>>> min dll =   1.000000000000000E-009
>>> use_grad_norm =            1
>>> min grad norm =   1.000000000000000E-007
>>> number of density mixture components =            3
>>> pdf type =            0
>>> max_iter =         2000
>>> num_samples =            1
>>> data_dim =           61
>>> field_dim =      1115606
>>> do_history =            0
>>> histstep =           10
>>> share_comps =            0
>>> share_start =          100
>>> comp_thresh =   0.990000000000000
>>> share_int =          100
>>> initial lrate =   5.000000000000000E-002
>>> minimum lrate =   1.000000000000000E-008
>>> minimum data covariance eigenvalue =   1.000000000000000E-012
>>> lrate factor =   0.500000000000000
>>> initial rholrate =   5.000000000000000E-002
>>> rho0 =    1.50000000000000
>>> min rho =    1.00000000000000
>>> max rho =    2.00000000000000
>>> rho lrate factor =   0.500000000000000
>>> kurt_start =            3
>>> num kurt =            5
>>> kurt interval =            1
>>> do_newton =            1
>>> newt_start =           50
>>> newt_ramp =           10
>>> initial newton lrate =    1.00000000000000
>>> do_reject =            1
>>> num reject =            3
>>> reject sigma =    3.00000000000000
>>> reject start =            3
>>> reject interval =            3
>>> write step =           20
>>> write_nd =            0
>>> write_LLt =            1
>>> dec window =            1
>>> max_decs =            3
>>> fix_init =            0
>>> update_A =            1
>>> update_c =            1
>>> update_gm =            1
>>> update_alpha =            1
>>> update_mu =            1
>>> update_beta =            1
>>> invsigmax =    100.000000000000
>>> invsigmin =   0.000000000000000E+000
>>> do_rho =            1
>>> load_rej =            0
>>> load_c =            0
>>> load_gm =            0
>>> load_alpha =            0
>>> load_mu =            0
>>> load_beta =            0
>>> load_rho =            0
>>> load_comp_list =            0
>>> do_mean =            1
>>> do_sphere =            1
>>> pcakeep =           57
>>> pcadb =    30.0000000000000
>>> byte_size =            4
>>> doscaling =            1
>>> scalestep =            1
>>> mkdir: cannot create directory '/.../amicaouttmp/': File exists
>>> output directory =  /.../amicaouttmp/
>>>           1 : setting num_thrds to            8  ...
>>>           1 : using           8 threads.
>>>           1 : node_thrds =            8
>>> bytes in real =            1
>>>           1 : REAL nbyte =            1
>>> getting segment list ...
>>> blocks in sample =      1115606
>>> total blocks =      1115606
>>> node blocks =      1115606
>>> node            1  start: file            1  sample            1  index
>>>           1
>>> node            1  stop : file            1  sample            1  index
>>>     1115606
>>>           1 : data =   -4.96359205245972       -14.1346406936646
>>> getting the mean ...
>>>  mean =  -0.314507310018316      -0.304240499189297
>>> -5.144710467579993E-002
>>> subtracting the mean ...
>>> getting the covariance matrix ...
>>> cnt =      1115606
>>> doing eig nx =           61  lwork =        37210
>>> minimum eigenvalues =  -5.406684296262385E-013
>>> -1.285030795456605E-014
>>>  2.538997580317705E-013
>>> maximum eigenvalues =    4158.38269386037        1002.93581119057
>>>   747.275462874866
>>> num eigs kept =           57
>>> getting the sphering matrix ...
>>> minimum eigenvalues =  -5.406684296262385E-013
>>> -1.285030795456605E-014
>>>  2.538997580317705E-013
>>> maximum eigenvalues =    4158.38269386037        1002.93581119057
>>>   747.275462874866
>>> num eigs kept =           57
>>> sphering the data ...
>>> numeigs =           57
>>>           1 : Allocating variables ...
>>>           1 : Initializing variables ...
>>>           1 : Determining optimal block size ....
>>> /home/.../amica15ub /.../amicaouttmp/input.param: Segmentation fault
>>> No gm present, setting num_models to 1 No W present, exiting
>>> 
>>> 
>>> run with interpolated channels removed:
>>> 
>>> Writing data file: /.../tmpdata95717.fdt
>>> mkdir: cannot create directory '/.../amicaouttmp/': File exists
>>>           1 processor name = kambodscha
>>>           1 host_num =   2033974327
>>> This is MPI process           1 of           1 ; I am process           1 of
>>>           1 on node: kambodscha
>>>           1  : node root process           1 of           1
>>> Processing arguments ...
>>> num_files =            1
>>> FILES:
>>> /.../tmpdata95717.fdt
>>> num_dir_files =            1
>>> initial matrix block_size =          128
>>> do_opt_block =            1
>>> blk_min =          256
>>> blk_step =          256
>>> blk_max =         1024
>>> number of models =            1
>>> max_thrds =            8
>>> use_min_dll =            1
>>> min dll =   1.000000000000000E-009
>>> use_grad_norm =            1
>>> min grad norm =   1.000000000000000E-007
>>> number of density mixture components =            3
>>> pdf type =            0
>>> max_iter =         2000
>>> num_samples =            1
>>> data_dim =           57
>>> field_dim =      1116754
>>> do_history =            0
>>> histstep =           10
>>> share_comps =            0
>>> share_start =          100
>>> comp_thresh =   0.990000000000000
>>> share_int =          100
>>> initial lrate =   5.000000000000000E-002
>>> minimum lrate =   1.000000000000000E-008
>>> minimum data covariance eigenvalue =   1.000000000000000E-012
>>> lrate factor =   0.500000000000000
>>> initial rholrate =   5.000000000000000E-002
>>> rho0 =    1.50000000000000
>>> min rho =    1.00000000000000
>>> max rho =    2.00000000000000
>>> rho lrate factor =   0.500000000000000
>>> kurt_start =            3
>>> num kurt =            5
>>> kurt interval =            1
>>> do_newton =            1
>>> newt_start =           50
>>> newt_ramp =           10
>>> initial newton lrate =    1.00000000000000
>>> do_reject =            1
>>> num reject =            3
>>> reject sigma =    3.00000000000000
>>> reject start =            3
>>> reject interval =            3
>>> write step =           20
>>> write_nd =            0
>>> write_LLt =            1
>>> dec window =            1
>>> max_decs =            3
>>> fix_init =            0
>>> update_A =            1
>>> update_c =            1
>>> update_gm =            1
>>> update_alpha =            1
>>> update_mu =            1
>>> update_beta =            1
>>> invsigmax =    100.000000000000
>>> invsigmin =   0.000000000000000E+000
>>> do_rho =            1
>>> load_rej =            0
>>> load_c =            0
>>> load_gm =            0
>>> load_alpha =            0
>>> load_mu =            0
>>> load_beta =            0
>>> load_rho =            0
>>> load_comp_list =            0
>>> do_mean =            1
>>> do_sphere =            1
>>> pcakeep =           57
>>> pcadb =    30.0000000000000
>>> byte_size =            4
>>> doscaling =            1
>>> scalestep =            1
>>> mkdir: cannot create directory '/.../amicaouttmp/': File exists
>>> output directory = /.../amicaouttmp/
>>>           1 : setting num_thrds to            8  ...
>>>           1 : using           8 threads.
>>>           1 : node_thrds =            8
>>> bytes in real =            1
>>>           1 : REAL nbyte =            1
>>> getting segment list ...
>>> blocks in sample =      1116754
>>> total blocks =      1116754
>>> node blocks =      1116754
>>> node            1  start: file            1  sample            1  index
>>>           1
>>> node            1  stop : file            1  sample            1  index
>>>     1116754
>>>           1 : data =   -4.96453666687012       -14.1324272155762
>>> getting the mean ...
>>>  mean =  -0.286942247592682      -0.278907745979486
>>> -4.659303225201700E-002
>>> subtracting the mean ...
>>> getting the covariance matrix ...
>>> cnt =      1116754
>>> doing eig nx =           57  lwork =        32490
>>> minimum eigenvalues =   0.328194400980202       0.540830260003019
>>>  0.590640211589886
>>> maximum eigenvalues =    3591.26449346892        979.044648222552
>>>   723.438382000281
>>> num eigs kept =           57
>>> getting the sphering matrix ...
>>> minimum eigenvalues =   0.328194400980202       0.540830260003019
>>>  0.590640211589886
>>> maximum eigenvalues =    3591.26449346892        979.044648222552
>>>   723.438382000281
>>> num eigs kept =           57
>>> sphering the data ...
>>> numeigs =           57
>>>           1 : Allocating variables ...
>>>           1 : Initializing variables ...
>>>           1 : Determining optimal block size ....
>>> /home/.../amica15ub /.../amicaouttmp/input.param: Segmentation fault
>>> No gm present, setting num_models to 1 No W present, exiting
>>> 
>>> 
>>> Sometimes I also get a more verbose feedback, but I cannot reproduce
>>> this
>>> consistently:
>>> 
>>> forrtl: severe (174): SIGSEGV, segmentation fault occurred
>>> forrtl: severe (174): SIGSEGV, segmentation fault occurred
>>> /.../amica15ub /.../amicaouttmp/input.param: Signal 46
>>> 
>>> 
>>> 
>>> ___________________________________________________________
>>> Norman Forschack, Dipl.-Psych.
>>> Max-Planck Institute for Human Cognitive and Brain Sciences
>>> Stephanstraße 1a
>>> 04103 Leipzig
>>> 
>>> mail: forschack at cbs.mpg.de
>>> phone: +49341 9940171
>>> web: http://www.cbs.mpg.de/~forschack
>>> _______________________________________________
>>> Eeglablist page: http://sccn.ucsd.edu/eeglab/eeglabmail.html
>>> To unsubscribe, send an empty email to
>>> eeglablist-unsubscribe at sccn.ucsd.edu
>>> For digest mode, send an email with the subject "set digest mime" to
> > > eeglablist-request at sccn.ucsd.edu



More information about the eeglablist mailing list