[Eeglablist] AMICA15 segmentation fault
Norman Forschack
forschack at cbs.mpg.de
Wed Dec 16 14:09:39 PST 2015
Got you, that's very helpful. I think you could assemble all your excellent communication from eeglablist archive and, with a little formatting, come up with a very nice paper.
I wish you well and merry xmas!
Norman
----- On Dec 16, 2015, at 10:44 PM, Jason Palmer japalmer29 at gmail.com wrote:
> For full rank data, the default is to use the inverse "symmetric square root" of
> the covariance matrix, which basically rotates back from the eigenspace into
> the data space, after scaling to unity along the eigendimensions. For full rank
> data, this will essentially give a starting point to ICA that is closer to the
> optimum than the eigenvector basis, which looks like checkerboard after the
> first few eigenvectors. When you are reducing rank, the "rotation back" to the
> data space isn't uniquely defined as in the full rank case (just the inverse of
> the eigenprojection). The least squares tries to match the channels with
> largest variance. This is usually much closer to the data space than just the
> eigenbasis itself. It will always reduce dimensions less than mineig, and more
> only if pcakeep is less than that. With "approximate sphering" it will use the
> symmetric sqrt for full rank data, and approximate the inverse rotation in a
> smaller space when dimensions are removed. It should really only affect the
> starting point and how long it takes to converge to the actual ICA optimized
> solution, but given that it takes a long time anyway, starting closer to the
> solution can save a lot of iterations/time (so do_approx_sphere is
> recommended). "Sphering" is how the symmetric sqrt cov matrix decorrelation is
> referred to, as opposed to simple projection onto the eigenvectors (to use the
> "principle components" as initial activations and eigenvectors as initial maps.
>
> Hope that's somewhat clear.
> Best,
> Jason
>
> -----Original Message-----
> From: Norman Forschack [mailto:forschack at cbs.mpg.de]
> Sent: Wednesday, December 16, 2015 1:06 PM
> To: japalmer at ucsd.edu
> Subject: Re: [Eeglablist] AMICA15 segmentation fault
>
> Hi Jason,
>
> great answer, thanks so much, many things became clear.
> So pca option is as save as analyzing full rank data, given the least squares
> for eigensubspace rotation are reasonably small.
> I wonder whether there are cases where the least squares are not sufficiently
> small. What is your experience with that?
>
> cheers
> Norman
>
> ----- On Dec 16, 2015, at 9:12 PM, Jason Palmer japalmer29 at gmail.com wrote:
>
>> Hi Norman,
>>
>> The negative eigenvalues are essentially due to machine precision
>> error in the positive semi-definiteness of the covariance matrix.
>> Amica removes all negative eigendimensions, and all dimensions with
>> eigenvalue < mineig (see Sphering ...). The number of dimensions kept
>> is printed as "Num eigs kept" in the output, which in your case is 57
>> out 61, so including the negatives and possibly some small dimensions
>> to eliminate 4. That eigensubspace is then rotated again to
>> approximately equal the data in least squares, preserving
>> decorrelation (when you have do_approx_sphere checked, otherwise just the
>> eigenvector space projection is used).
>>
>> So the number of dimensions given to amica is the minimum of (num eigs
>> > mineig) and pcakeep. The minimum eigenvalues are displayed whether
>> they are kept or not.
>>
>> The run time is basically proportional to the amount of data you have.
>> So with approximately the same number of channels and 3 times as much
>> data, it will take about 3 times as long. The slowness is why I try to
>> optimize the blocksize for iteration time, since this can save up to a
>> factor of 2 in run time, but with large data + # cores, you may have
>> to worry about stack overflow segfault since there is no prescribed
>> stack limit and the signal is not currently caught.
>>
>> Best,
>> Jason
>>
>> -----Original Message-----
>> From: Norman Forschack [mailto:forschack at cbs.mpg.de]
>> Sent: Wednesday, December 16, 2015 11:43 AM
>> To: japalmer at ucsd.edu
>> Cc: eeglablist
>> Subject: Re: [Eeglablist] AMICA15 segmentation fault
>>
>> Hi Jason,
>>
>> thanks a lot for your quick reply and yes you're right, reducing block
>> size and steps solves the issue as well as doing amica on a subset of
>> the data, at least for the dataset I tested it on.
>> Unfortunately, computation time now increased by a factor of three
>> (with eight cores). I'll test a little further to find the optimal settings.
>>
>> When having pcakeep option enabled, I still get super small but
>> negative eigenvalues. Do you think this is a problem and would you
>> rather recommend not using pca at all but removing channels to give full rank
>> data to amica?
>>
>> cheers
>> Norman
>>
>> ----- On Dec 16, 2015, at 4:51 PM, Jason Palmer japalmer29 at gmail.com wrote:
>>
>>> Hi Nathan,
>>>
>>> Thanks for your early beta testing ... I was getting local
>>> feedback/debugging before announcing new version to wider list, but I
>>> guess it makes more sense to release a sort of beta version to the
>>> eeglablist. The new binaries (with sphering bug hopefully fixed) and
>>> test version of eeglab plugin is at:
>>> http://sccn.ucsd.edu/~jason/amica_web.html
>>> and there is a help link on that page for more information on
>>> arguments in the gui. So, anyone is welcome to try amica15, and the
>>> plugin, and help test it.
>>>
>>> I think what you are actually getting is a stack overflow because it
>>> is attempting to use too large a matrix block size with a large
>>> number of samples.
>>> I think do_opt_block is enabled by default, and it tries 256:256:1024
>>> to find the fastest blocksize. You can change this with the Block sizes, etc.
>>> button.
>>> In the gui, or the blk_min, blk_step, blk_max keywords.
>>>
>>> Probably if you set smaller values, like blk_min = 64, blk_step = 64,
>>> blk_max = 256, it will get through the tests without segmentation
>>> fault. I'm still working on catching this error in the code.
>>>
>>> So I don't think your issue has to do with sphering--it is just
>>> printing the eigenvalue info twice (which I will remove) but I don't
>>> think anything is going wrong.
>>>
>>> For anyone who would like to look at and/or compile the source code,
>>> you can get it at:
>>> https://github.com/japalmer29/amica
>>> with a readme on how to compile it using a demo version of the Intel
>>> Fortran compiler on various platforms.
>>>
>>> Thanks, and happy winter!
>>> Jason
>>>
>>>
>>> -----Original Message-----
>>> From: eeglablist-bounces at sccn.ucsd.edu
>>> [mailto:eeglablist-bounces at sccn.ucsd.edu]
>>> On Behalf Of Norman Forschack
>>> Sent: Monday, December 14, 2015 4:40 PM
>>> To: eeglablist
>>> Subject: [Eeglablist] AMICA15 segmentation fault
>>>
>>> Dear eeglabbers,
>>>
>>> first of all, Jason, thanks a lot for providing the new version of
>>> amica for us, your input is always highly appreciated!
>>>
>>> I tested the new version with the dataset from the homepage and it
>>> works neatly (after adjusting the call to the Ubuntu binary to 'amica15ub').
>>>
>>> However, when running it on another EEG dataset with 62 channels, I
>>> have some trouble with negative but very small eigenvalues, which
>>> might be related to this post
>>> http://sccn.ucsd.edu/pipermail/eeglablist/2015/010104.html
>>> This causes amica to exit after the message 'segmentation fault'. The
>>> data is comprised of merged ten minutes blocks and is minimally preprocessed:
>>> 1. PREP pipeline robust average reference 2. resampling to 250Hz 3.
>>> 1Hz high pass 4. thresholding data points > 180 microV
>>>
>>> The issue occurs when pcakeep option is enabled (see below for
>>> command line
>>> output)
>>>
>>> I tried without pcakeep but with removed channels, mainly EOG and
>>> PREP interpolated channels, in order to input full rank data. This
>>> yields positive eigenvalues, but again, the algorithm exits with 'segmentation
>>> fault'.
>>>
>>> I have no idea how to overcome this, but I'm wondering why
>>> eigenvalues are calculated twice? Are they supposed to differ at the
>>> second instance? In my case they don't. (see command line output
>>> below) Or could there probably be something wrong with my data
>>> preprocessing, am I missing something?
>>>
>>> Any hint how to solve the problem, is highly anticipated.
>>>
>>> Best
>>> Norman
>>>
>>>
>>> run with data dimensions reduced:
>>>
>>> Writing data file: /.../tmpdata96489.fdt
>>> mkdir: cannot create directory '/.../amicaouttmp/': File exists
>>> 1 processor name = kambodscha
>>> 1 host_num = 2033974327
>>> This is MPI process 1 of 1 ; I am process 1 of
>>> 1 on node: kambodscha
>>> 1 : node root process 1 of 1
>>> Processing arguments ...
>>> num_files = 1
>>> FILES:
>>> /.../tmpdata96489.fdt
>>> num_dir_files = 1
>>> initial matrix block_size = 128
>>> do_opt_block = 1
>>> blk_min = 256
>>> blk_step = 256
>>> blk_max = 1024
>>> number of models = 1
>>> max_thrds = 8
>>> use_min_dll = 1
>>> min dll = 1.000000000000000E-009
>>> use_grad_norm = 1
>>> min grad norm = 1.000000000000000E-007
>>> number of density mixture components = 3
>>> pdf type = 0
>>> max_iter = 2000
>>> num_samples = 1
>>> data_dim = 61
>>> field_dim = 1115606
>>> do_history = 0
>>> histstep = 10
>>> share_comps = 0
>>> share_start = 100
>>> comp_thresh = 0.990000000000000
>>> share_int = 100
>>> initial lrate = 5.000000000000000E-002
>>> minimum lrate = 1.000000000000000E-008
>>> minimum data covariance eigenvalue = 1.000000000000000E-012
>>> lrate factor = 0.500000000000000
>>> initial rholrate = 5.000000000000000E-002
>>> rho0 = 1.50000000000000
>>> min rho = 1.00000000000000
>>> max rho = 2.00000000000000
>>> rho lrate factor = 0.500000000000000
>>> kurt_start = 3
>>> num kurt = 5
>>> kurt interval = 1
>>> do_newton = 1
>>> newt_start = 50
>>> newt_ramp = 10
>>> initial newton lrate = 1.00000000000000
>>> do_reject = 1
>>> num reject = 3
>>> reject sigma = 3.00000000000000
>>> reject start = 3
>>> reject interval = 3
>>> write step = 20
>>> write_nd = 0
>>> write_LLt = 1
>>> dec window = 1
>>> max_decs = 3
>>> fix_init = 0
>>> update_A = 1
>>> update_c = 1
>>> update_gm = 1
>>> update_alpha = 1
>>> update_mu = 1
>>> update_beta = 1
>>> invsigmax = 100.000000000000
>>> invsigmin = 0.000000000000000E+000
>>> do_rho = 1
>>> load_rej = 0
>>> load_c = 0
>>> load_gm = 0
>>> load_alpha = 0
>>> load_mu = 0
>>> load_beta = 0
>>> load_rho = 0
>>> load_comp_list = 0
>>> do_mean = 1
>>> do_sphere = 1
>>> pcakeep = 57
>>> pcadb = 30.0000000000000
>>> byte_size = 4
>>> doscaling = 1
>>> scalestep = 1
>>> mkdir: cannot create directory '/.../amicaouttmp/': File exists
>>> output directory = /.../amicaouttmp/
>>> 1 : setting num_thrds to 8 ...
>>> 1 : using 8 threads.
>>> 1 : node_thrds = 8
>>> bytes in real = 1
>>> 1 : REAL nbyte = 1
>>> getting segment list ...
>>> blocks in sample = 1115606
>>> total blocks = 1115606
>>> node blocks = 1115606
>>> node 1 start: file 1 sample 1 index
>>> 1
>>> node 1 stop : file 1 sample 1 index
>>> 1115606
>>> 1 : data = -4.96359205245972 -14.1346406936646
>>> getting the mean ...
>>> mean = -0.314507310018316 -0.304240499189297
>>> -5.144710467579993E-002
>>> subtracting the mean ...
>>> getting the covariance matrix ...
>>> cnt = 1115606
>>> doing eig nx = 61 lwork = 37210
>>> minimum eigenvalues = -5.406684296262385E-013
>>> -1.285030795456605E-014
>>> 2.538997580317705E-013
>>> maximum eigenvalues = 4158.38269386037 1002.93581119057
>>> 747.275462874866
>>> num eigs kept = 57
>>> getting the sphering matrix ...
>>> minimum eigenvalues = -5.406684296262385E-013
>>> -1.285030795456605E-014
>>> 2.538997580317705E-013
>>> maximum eigenvalues = 4158.38269386037 1002.93581119057
>>> 747.275462874866
>>> num eigs kept = 57
>>> sphering the data ...
>>> numeigs = 57
>>> 1 : Allocating variables ...
>>> 1 : Initializing variables ...
>>> 1 : Determining optimal block size ....
>>> /home/.../amica15ub /.../amicaouttmp/input.param: Segmentation fault
>>> No gm present, setting num_models to 1 No W present, exiting
>>>
>>>
>>> run with interpolated channels removed:
>>>
>>> Writing data file: /.../tmpdata95717.fdt
>>> mkdir: cannot create directory '/.../amicaouttmp/': File exists
>>> 1 processor name = kambodscha
>>> 1 host_num = 2033974327
>>> This is MPI process 1 of 1 ; I am process 1 of
>>> 1 on node: kambodscha
>>> 1 : node root process 1 of 1
>>> Processing arguments ...
>>> num_files = 1
>>> FILES:
>>> /.../tmpdata95717.fdt
>>> num_dir_files = 1
>>> initial matrix block_size = 128
>>> do_opt_block = 1
>>> blk_min = 256
>>> blk_step = 256
>>> blk_max = 1024
>>> number of models = 1
>>> max_thrds = 8
>>> use_min_dll = 1
>>> min dll = 1.000000000000000E-009
>>> use_grad_norm = 1
>>> min grad norm = 1.000000000000000E-007
>>> number of density mixture components = 3
>>> pdf type = 0
>>> max_iter = 2000
>>> num_samples = 1
>>> data_dim = 57
>>> field_dim = 1116754
>>> do_history = 0
>>> histstep = 10
>>> share_comps = 0
>>> share_start = 100
>>> comp_thresh = 0.990000000000000
>>> share_int = 100
>>> initial lrate = 5.000000000000000E-002
>>> minimum lrate = 1.000000000000000E-008
>>> minimum data covariance eigenvalue = 1.000000000000000E-012
>>> lrate factor = 0.500000000000000
>>> initial rholrate = 5.000000000000000E-002
>>> rho0 = 1.50000000000000
>>> min rho = 1.00000000000000
>>> max rho = 2.00000000000000
>>> rho lrate factor = 0.500000000000000
>>> kurt_start = 3
>>> num kurt = 5
>>> kurt interval = 1
>>> do_newton = 1
>>> newt_start = 50
>>> newt_ramp = 10
>>> initial newton lrate = 1.00000000000000
>>> do_reject = 1
>>> num reject = 3
>>> reject sigma = 3.00000000000000
>>> reject start = 3
>>> reject interval = 3
>>> write step = 20
>>> write_nd = 0
>>> write_LLt = 1
>>> dec window = 1
>>> max_decs = 3
>>> fix_init = 0
>>> update_A = 1
>>> update_c = 1
>>> update_gm = 1
>>> update_alpha = 1
>>> update_mu = 1
>>> update_beta = 1
>>> invsigmax = 100.000000000000
>>> invsigmin = 0.000000000000000E+000
>>> do_rho = 1
>>> load_rej = 0
>>> load_c = 0
>>> load_gm = 0
>>> load_alpha = 0
>>> load_mu = 0
>>> load_beta = 0
>>> load_rho = 0
>>> load_comp_list = 0
>>> do_mean = 1
>>> do_sphere = 1
>>> pcakeep = 57
>>> pcadb = 30.0000000000000
>>> byte_size = 4
>>> doscaling = 1
>>> scalestep = 1
>>> mkdir: cannot create directory '/.../amicaouttmp/': File exists
>>> output directory = /.../amicaouttmp/
>>> 1 : setting num_thrds to 8 ...
>>> 1 : using 8 threads.
>>> 1 : node_thrds = 8
>>> bytes in real = 1
>>> 1 : REAL nbyte = 1
>>> getting segment list ...
>>> blocks in sample = 1116754
>>> total blocks = 1116754
>>> node blocks = 1116754
>>> node 1 start: file 1 sample 1 index
>>> 1
>>> node 1 stop : file 1 sample 1 index
>>> 1116754
>>> 1 : data = -4.96453666687012 -14.1324272155762
>>> getting the mean ...
>>> mean = -0.286942247592682 -0.278907745979486
>>> -4.659303225201700E-002
>>> subtracting the mean ...
>>> getting the covariance matrix ...
>>> cnt = 1116754
>>> doing eig nx = 57 lwork = 32490
>>> minimum eigenvalues = 0.328194400980202 0.540830260003019
>>> 0.590640211589886
>>> maximum eigenvalues = 3591.26449346892 979.044648222552
>>> 723.438382000281
>>> num eigs kept = 57
>>> getting the sphering matrix ...
>>> minimum eigenvalues = 0.328194400980202 0.540830260003019
>>> 0.590640211589886
>>> maximum eigenvalues = 3591.26449346892 979.044648222552
>>> 723.438382000281
>>> num eigs kept = 57
>>> sphering the data ...
>>> numeigs = 57
>>> 1 : Allocating variables ...
>>> 1 : Initializing variables ...
>>> 1 : Determining optimal block size ....
>>> /home/.../amica15ub /.../amicaouttmp/input.param: Segmentation fault
>>> No gm present, setting num_models to 1 No W present, exiting
>>>
>>>
>>> Sometimes I also get a more verbose feedback, but I cannot reproduce
>>> this
>>> consistently:
>>>
>>> forrtl: severe (174): SIGSEGV, segmentation fault occurred
>>> forrtl: severe (174): SIGSEGV, segmentation fault occurred
>>> /.../amica15ub /.../amicaouttmp/input.param: Signal 46
>>>
>>>
>>>
>>> ___________________________________________________________
>>> Norman Forschack, Dipl.-Psych.
>>> Max-Planck Institute for Human Cognitive and Brain Sciences
>>> Stephanstraße 1a
>>> 04103 Leipzig
>>>
>>> mail: forschack at cbs.mpg.de
>>> phone: +49341 9940171
>>> web: http://www.cbs.mpg.de/~forschack
>>> _______________________________________________
>>> Eeglablist page: http://sccn.ucsd.edu/eeglab/eeglabmail.html
>>> To unsubscribe, send an empty email to
>>> eeglablist-unsubscribe at sccn.ucsd.edu
>>> For digest mode, send an email with the subject "set digest mime" to
> > > eeglablist-request at sccn.ucsd.edu
More information about the eeglablist
mailing list