[Eeglablist] AMICA15 segmentation fault
Jason Palmer
japalmer29 at gmail.com
Wed Dec 16 13:44:52 PST 2015
For full rank data, the default is to use the inverse "symmetric square root" of the covariance matrix, which basically rotates back from the eigenspace into the data space, after scaling to unity along the eigendimensions. For full rank data, this will essentially give a starting point to ICA that is closer to the optimum than the eigenvector basis, which looks like checkerboard after the first few eigenvectors. When you are reducing rank, the "rotation back" to the data space isn't uniquely defined as in the full rank case (just the inverse of the eigenprojection). The least squares tries to match the channels with largest variance. This is usually much closer to the data space than just the eigenbasis itself. It will always reduce dimensions less than mineig, and more only if pcakeep is less than that. With "approximate sphering" it will use the symmetric sqrt for full rank data, and approximate the inverse rotation in a smaller space when dimensions are removed. It should really only affect the starting point and how long it takes to converge to the actual ICA optimized solution, but given that it takes a long time anyway, starting closer to the solution can save a lot of iterations/time (so do_approx_sphere is recommended). "Sphering" is how the symmetric sqrt cov matrix decorrelation is referred to, as opposed to simple projection onto the eigenvectors (to use the "principle components" as initial activations and eigenvectors as initial maps.
Hope that's somewhat clear.
Best,
Jason
-----Original Message-----
From: Norman Forschack [mailto:forschack at cbs.mpg.de]
Sent: Wednesday, December 16, 2015 1:06 PM
To: japalmer at ucsd.edu
Subject: Re: [Eeglablist] AMICA15 segmentation fault
Hi Jason,
great answer, thanks so much, many things became clear.
So pca option is as save as analyzing full rank data, given the least squares for eigensubspace rotation are reasonably small.
I wonder whether there are cases where the least squares are not sufficiently small. What is your experience with that?
cheers
Norman
----- On Dec 16, 2015, at 9:12 PM, Jason Palmer japalmer29 at gmail.com wrote:
> Hi Norman,
>
> The negative eigenvalues are essentially due to machine precision
> error in the positive semi-definiteness of the covariance matrix.
> Amica removes all negative eigendimensions, and all dimensions with
> eigenvalue < mineig (see Sphering ...). The number of dimensions kept
> is printed as "Num eigs kept" in the output, which in your case is 57
> out 61, so including the negatives and possibly some small dimensions
> to eliminate 4. That eigensubspace is then rotated again to
> approximately equal the data in least squares, preserving
> decorrelation (when you have do_approx_sphere checked, otherwise just the eigenvector space projection is used).
>
> So the number of dimensions given to amica is the minimum of (num eigs
> > mineig) and pcakeep. The minimum eigenvalues are displayed whether
> they are kept or not.
>
> The run time is basically proportional to the amount of data you have.
> So with approximately the same number of channels and 3 times as much
> data, it will take about 3 times as long. The slowness is why I try to
> optimize the blocksize for iteration time, since this can save up to a
> factor of 2 in run time, but with large data + # cores, you may have
> to worry about stack overflow segfault since there is no prescribed
> stack limit and the signal is not currently caught.
>
> Best,
> Jason
>
> -----Original Message-----
> From: Norman Forschack [mailto:forschack at cbs.mpg.de]
> Sent: Wednesday, December 16, 2015 11:43 AM
> To: japalmer at ucsd.edu
> Cc: eeglablist
> Subject: Re: [Eeglablist] AMICA15 segmentation fault
>
> Hi Jason,
>
> thanks a lot for your quick reply and yes you're right, reducing block
> size and steps solves the issue as well as doing amica on a subset of
> the data, at least for the dataset I tested it on.
> Unfortunately, computation time now increased by a factor of three
> (with eight cores). I'll test a little further to find the optimal settings.
>
> When having pcakeep option enabled, I still get super small but
> negative eigenvalues. Do you think this is a problem and would you
> rather recommend not using pca at all but removing channels to give full rank data to amica?
>
> cheers
> Norman
>
> ----- On Dec 16, 2015, at 4:51 PM, Jason Palmer japalmer29 at gmail.com wrote:
>
>> Hi Nathan,
>>
>> Thanks for your early beta testing ... I was getting local
>> feedback/debugging before announcing new version to wider list, but I
>> guess it makes more sense to release a sort of beta version to the
>> eeglablist. The new binaries (with sphering bug hopefully fixed) and
>> test version of eeglab plugin is at:
>> http://sccn.ucsd.edu/~jason/amica_web.html
>> and there is a help link on that page for more information on
>> arguments in the gui. So, anyone is welcome to try amica15, and the
>> plugin, and help test it.
>>
>> I think what you are actually getting is a stack overflow because it
>> is attempting to use too large a matrix block size with a large
>> number of samples.
>> I think do_opt_block is enabled by default, and it tries 256:256:1024
>> to find the fastest blocksize. You can change this with the Block sizes, etc.
>> button.
>> In the gui, or the blk_min, blk_step, blk_max keywords.
>>
>> Probably if you set smaller values, like blk_min = 64, blk_step = 64,
>> blk_max = 256, it will get through the tests without segmentation
>> fault. I'm still working on catching this error in the code.
>>
>> So I don't think your issue has to do with sphering--it is just
>> printing the eigenvalue info twice (which I will remove) but I don't
>> think anything is going wrong.
>>
>> For anyone who would like to look at and/or compile the source code,
>> you can get it at:
>> https://github.com/japalmer29/amica
>> with a readme on how to compile it using a demo version of the Intel
>> Fortran compiler on various platforms.
>>
>> Thanks, and happy winter!
>> Jason
>>
>>
>> -----Original Message-----
>> From: eeglablist-bounces at sccn.ucsd.edu
>> [mailto:eeglablist-bounces at sccn.ucsd.edu]
>> On Behalf Of Norman Forschack
>> Sent: Monday, December 14, 2015 4:40 PM
>> To: eeglablist
>> Subject: [Eeglablist] AMICA15 segmentation fault
>>
>> Dear eeglabbers,
>>
>> first of all, Jason, thanks a lot for providing the new version of
>> amica for us, your input is always highly appreciated!
>>
>> I tested the new version with the dataset from the homepage and it
>> works neatly (after adjusting the call to the Ubuntu binary to 'amica15ub').
>>
>> However, when running it on another EEG dataset with 62 channels, I
>> have some trouble with negative but very small eigenvalues, which
>> might be related to this post
>> http://sccn.ucsd.edu/pipermail/eeglablist/2015/010104.html
>> This causes amica to exit after the message 'segmentation fault'. The
>> data is comprised of merged ten minutes blocks and is minimally preprocessed:
>> 1. PREP pipeline robust average reference 2. resampling to 250Hz 3.
>> 1Hz high pass 4. thresholding data points > 180 microV
>>
>> The issue occurs when pcakeep option is enabled (see below for
>> command line
>> output)
>>
>> I tried without pcakeep but with removed channels, mainly EOG and
>> PREP interpolated channels, in order to input full rank data. This
>> yields positive eigenvalues, but again, the algorithm exits with 'segmentation fault'.
>>
>> I have no idea how to overcome this, but I'm wondering why
>> eigenvalues are calculated twice? Are they supposed to differ at the
>> second instance? In my case they don't. (see command line output
>> below) Or could there probably be something wrong with my data
>> preprocessing, am I missing something?
>>
>> Any hint how to solve the problem, is highly anticipated.
>>
>> Best
>> Norman
>>
>>
>> run with data dimensions reduced:
>>
>> Writing data file: /.../tmpdata96489.fdt
>> mkdir: cannot create directory '/.../amicaouttmp/': File exists
>> 1 processor name = kambodscha
>> 1 host_num = 2033974327
>> This is MPI process 1 of 1 ; I am process 1 of
>> 1 on node: kambodscha
>> 1 : node root process 1 of 1
>> Processing arguments ...
>> num_files = 1
>> FILES:
>> /.../tmpdata96489.fdt
>> num_dir_files = 1
>> initial matrix block_size = 128
>> do_opt_block = 1
>> blk_min = 256
>> blk_step = 256
>> blk_max = 1024
>> number of models = 1
>> max_thrds = 8
>> use_min_dll = 1
>> min dll = 1.000000000000000E-009
>> use_grad_norm = 1
>> min grad norm = 1.000000000000000E-007
>> number of density mixture components = 3
>> pdf type = 0
>> max_iter = 2000
>> num_samples = 1
>> data_dim = 61
>> field_dim = 1115606
>> do_history = 0
>> histstep = 10
>> share_comps = 0
>> share_start = 100
>> comp_thresh = 0.990000000000000
>> share_int = 100
>> initial lrate = 5.000000000000000E-002
>> minimum lrate = 1.000000000000000E-008
>> minimum data covariance eigenvalue = 1.000000000000000E-012
>> lrate factor = 0.500000000000000
>> initial rholrate = 5.000000000000000E-002
>> rho0 = 1.50000000000000
>> min rho = 1.00000000000000
>> max rho = 2.00000000000000
>> rho lrate factor = 0.500000000000000
>> kurt_start = 3
>> num kurt = 5
>> kurt interval = 1
>> do_newton = 1
>> newt_start = 50
>> newt_ramp = 10
>> initial newton lrate = 1.00000000000000
>> do_reject = 1
>> num reject = 3
>> reject sigma = 3.00000000000000
>> reject start = 3
>> reject interval = 3
>> write step = 20
>> write_nd = 0
>> write_LLt = 1
>> dec window = 1
>> max_decs = 3
>> fix_init = 0
>> update_A = 1
>> update_c = 1
>> update_gm = 1
>> update_alpha = 1
>> update_mu = 1
>> update_beta = 1
>> invsigmax = 100.000000000000
>> invsigmin = 0.000000000000000E+000
>> do_rho = 1
>> load_rej = 0
>> load_c = 0
>> load_gm = 0
>> load_alpha = 0
>> load_mu = 0
>> load_beta = 0
>> load_rho = 0
>> load_comp_list = 0
>> do_mean = 1
>> do_sphere = 1
>> pcakeep = 57
>> pcadb = 30.0000000000000
>> byte_size = 4
>> doscaling = 1
>> scalestep = 1
>> mkdir: cannot create directory '/.../amicaouttmp/': File exists
>> output directory = /.../amicaouttmp/
>> 1 : setting num_thrds to 8 ...
>> 1 : using 8 threads.
>> 1 : node_thrds = 8
>> bytes in real = 1
>> 1 : REAL nbyte = 1
>> getting segment list ...
>> blocks in sample = 1115606
>> total blocks = 1115606
>> node blocks = 1115606
>> node 1 start: file 1 sample 1 index
>> 1
>> node 1 stop : file 1 sample 1 index
>> 1115606
>> 1 : data = -4.96359205245972 -14.1346406936646
>> getting the mean ...
>> mean = -0.314507310018316 -0.304240499189297
>> -5.144710467579993E-002
>> subtracting the mean ...
>> getting the covariance matrix ...
>> cnt = 1115606
>> doing eig nx = 61 lwork = 37210
>> minimum eigenvalues = -5.406684296262385E-013
>> -1.285030795456605E-014
>> 2.538997580317705E-013
>> maximum eigenvalues = 4158.38269386037 1002.93581119057
>> 747.275462874866
>> num eigs kept = 57
>> getting the sphering matrix ...
>> minimum eigenvalues = -5.406684296262385E-013
>> -1.285030795456605E-014
>> 2.538997580317705E-013
>> maximum eigenvalues = 4158.38269386037 1002.93581119057
>> 747.275462874866
>> num eigs kept = 57
>> sphering the data ...
>> numeigs = 57
>> 1 : Allocating variables ...
>> 1 : Initializing variables ...
>> 1 : Determining optimal block size ....
>> /home/.../amica15ub /.../amicaouttmp/input.param: Segmentation fault
>> No gm present, setting num_models to 1 No W present, exiting
>>
>>
>> run with interpolated channels removed:
>>
>> Writing data file: /.../tmpdata95717.fdt
>> mkdir: cannot create directory '/.../amicaouttmp/': File exists
>> 1 processor name = kambodscha
>> 1 host_num = 2033974327
>> This is MPI process 1 of 1 ; I am process 1 of
>> 1 on node: kambodscha
>> 1 : node root process 1 of 1
>> Processing arguments ...
>> num_files = 1
>> FILES:
>> /.../tmpdata95717.fdt
>> num_dir_files = 1
>> initial matrix block_size = 128
>> do_opt_block = 1
>> blk_min = 256
>> blk_step = 256
>> blk_max = 1024
>> number of models = 1
>> max_thrds = 8
>> use_min_dll = 1
>> min dll = 1.000000000000000E-009
>> use_grad_norm = 1
>> min grad norm = 1.000000000000000E-007
>> number of density mixture components = 3
>> pdf type = 0
>> max_iter = 2000
>> num_samples = 1
>> data_dim = 57
>> field_dim = 1116754
>> do_history = 0
>> histstep = 10
>> share_comps = 0
>> share_start = 100
>> comp_thresh = 0.990000000000000
>> share_int = 100
>> initial lrate = 5.000000000000000E-002
>> minimum lrate = 1.000000000000000E-008
>> minimum data covariance eigenvalue = 1.000000000000000E-012
>> lrate factor = 0.500000000000000
>> initial rholrate = 5.000000000000000E-002
>> rho0 = 1.50000000000000
>> min rho = 1.00000000000000
>> max rho = 2.00000000000000
>> rho lrate factor = 0.500000000000000
>> kurt_start = 3
>> num kurt = 5
>> kurt interval = 1
>> do_newton = 1
>> newt_start = 50
>> newt_ramp = 10
>> initial newton lrate = 1.00000000000000
>> do_reject = 1
>> num reject = 3
>> reject sigma = 3.00000000000000
>> reject start = 3
>> reject interval = 3
>> write step = 20
>> write_nd = 0
>> write_LLt = 1
>> dec window = 1
>> max_decs = 3
>> fix_init = 0
>> update_A = 1
>> update_c = 1
>> update_gm = 1
>> update_alpha = 1
>> update_mu = 1
>> update_beta = 1
>> invsigmax = 100.000000000000
>> invsigmin = 0.000000000000000E+000
>> do_rho = 1
>> load_rej = 0
>> load_c = 0
>> load_gm = 0
>> load_alpha = 0
>> load_mu = 0
>> load_beta = 0
>> load_rho = 0
>> load_comp_list = 0
>> do_mean = 1
>> do_sphere = 1
>> pcakeep = 57
>> pcadb = 30.0000000000000
>> byte_size = 4
>> doscaling = 1
>> scalestep = 1
>> mkdir: cannot create directory '/.../amicaouttmp/': File exists
>> output directory = /.../amicaouttmp/
>> 1 : setting num_thrds to 8 ...
>> 1 : using 8 threads.
>> 1 : node_thrds = 8
>> bytes in real = 1
>> 1 : REAL nbyte = 1
>> getting segment list ...
>> blocks in sample = 1116754
>> total blocks = 1116754
>> node blocks = 1116754
>> node 1 start: file 1 sample 1 index
>> 1
>> node 1 stop : file 1 sample 1 index
>> 1116754
>> 1 : data = -4.96453666687012 -14.1324272155762
>> getting the mean ...
>> mean = -0.286942247592682 -0.278907745979486
>> -4.659303225201700E-002
>> subtracting the mean ...
>> getting the covariance matrix ...
>> cnt = 1116754
>> doing eig nx = 57 lwork = 32490
>> minimum eigenvalues = 0.328194400980202 0.540830260003019
>> 0.590640211589886
>> maximum eigenvalues = 3591.26449346892 979.044648222552
>> 723.438382000281
>> num eigs kept = 57
>> getting the sphering matrix ...
>> minimum eigenvalues = 0.328194400980202 0.540830260003019
>> 0.590640211589886
>> maximum eigenvalues = 3591.26449346892 979.044648222552
>> 723.438382000281
>> num eigs kept = 57
>> sphering the data ...
>> numeigs = 57
>> 1 : Allocating variables ...
>> 1 : Initializing variables ...
>> 1 : Determining optimal block size ....
>> /home/.../amica15ub /.../amicaouttmp/input.param: Segmentation fault
>> No gm present, setting num_models to 1 No W present, exiting
>>
>>
>> Sometimes I also get a more verbose feedback, but I cannot reproduce
>> this
>> consistently:
>>
>> forrtl: severe (174): SIGSEGV, segmentation fault occurred
>> forrtl: severe (174): SIGSEGV, segmentation fault occurred
>> /.../amica15ub /.../amicaouttmp/input.param: Signal 46
>>
>>
>>
>> ___________________________________________________________
>> Norman Forschack, Dipl.-Psych.
>> Max-Planck Institute for Human Cognitive and Brain Sciences
>> Stephanstraße 1a
>> 04103 Leipzig
>>
>> mail: forschack at cbs.mpg.de
>> phone: +49341 9940171
>> web: http://www.cbs.mpg.de/~forschack
>> _______________________________________________
>> Eeglablist page: http://sccn.ucsd.edu/eeglab/eeglabmail.html
>> To unsubscribe, send an empty email to
>> eeglablist-unsubscribe at sccn.ucsd.edu
>> For digest mode, send an email with the subject "set digest mime" to
> > eeglablist-request at sccn.ucsd.edu
More information about the eeglablist
mailing list