[Eeglablist] ICA running very slowly

Arnaud Delorme arno at ucsd.edu
Sat Feb 4 07:47:32 PST 2017


Yes, this is right, which is why the data are automatically converted to double precision for filtering, resampling, and ICA.

Arno
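
The round-off argument quoted below can be illustrated with a small sketch in base MATLAB (no EEGLAB calls). The values are synthetic and chosen only to make the effect visible; this is not a description of how EEGLAB performs its conversion.

```matlab
% Sketch with synthetic values: the same arithmetic in single and in double.

% 1) Order of operations. In single precision, neighbouring values near
%    1e8 are 8 apart, so adding 1 to -1e8 is lost to rounding.
a = single(1e8);  b = single(-1e8);  c = single(1);
left_single  = (a + b) + c;          % 1
right_single = a + (b + c);          % 0, because b + c rounds back to -1e8
left_double  = (1e8 + -1e8) + 1;     % 1
right_double = 1e8 + (-1e8 + 1);     % 1, double has bits to spare here

% 2) Accumulation. Add 0.1 one million times; the ideal result is 1e5.
n = 1e6;
sum_single = single(0);
sum_double = 0;
for k = 1:n
    sum_single = sum_single + single(0.1);
    sum_double = sum_double + 0.1;
end
err_single = abs(double(sum_single) - 1e5);
err_double = abs(sum_double - 1e5);

fprintf('single: (a+b)+c = %g, a+(b+c) = %g\n', left_single, right_single);
fprintf('double: (a+b)+c = %g, a+(b+c) = %g\n', left_double, right_double);
fprintf('accumulation error: single %g, double %g\n', err_single, err_double);
```

Filtering and ICA repeat such additions and multiplications many times over long recordings, which is presumably why these steps are the ones run in double precision.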

> On Feb 4, 2017, at 2:38 AM, Andreas Widmann <widmann at uni-leipzig.de> wrote:
> 
> Ok. So what do you think about Nima’s argument that "double precision computation is essential because round off in single precision quickly destroys any natural commutativity of the linear operations" (http://journal.frontiersin.org/article/10.3389/fninf.2015.00016/full)?
> 
> Best,
> Andreas
> 
>> My opinion on this is that there is no reason to force the import of data as double precision, because the number of relevant bits in EEG data is lower than 19 bits (I have searched for the reference but could not find it). BioSemi records 24 bits of data, for example. The standard EDF format is limited to 16 bits, so some data may be lost, which is why BioSemi adapted the format to BDF. Other manufacturers use 32-bit formats. I do not know of any data recorded with more than 32-bit precision.
>> 
>> Single precision is 32 bits (32 zeros and ones). Therefore there is plenty of room for EEG data. Converting to double precision (64 bits) when the data is being imported would not be useful. It is useful sometimes to convert the data to double precision when filtering or when running ICA (because of numerical imprecisions, ICA with single precision data can be different from ICA with double precision data). The data is always converted to double precision before running ICA (unless there is not enough RAM and then the algorithm will run in single precision).
>> 
>> So, you should keep your data in single precision. EEGLAB will automatically handle the conversion to double precision when filtering, resampling, or running ICA (and then convert the result back to single precision). You cannot disable that feature (nor should you want to).
>> 
>> Arno
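
To make the bit counts above concrete: IEEE 754 single precision uses 32 bits in total, of which 24 bits are significand precision. A 24-bit integer sample is therefore stored exactly, but little headroom remains for intermediate results. A minimal base-MATLAB sketch with made-up sample values (amplifier scaling is not modelled):

```matlab
% Sketch: how many bits of a sample survive in single precision.
% The sample values are made up for illustration.

eps_single = eps('single');          % 2^-23, relative spacing of single
eps_double = eps('double');          % 2^-52, relative spacing of double

% Largest positive 24-bit signed sample: stored exactly in single.
s24 = 2^23 - 1;
exact24 = double(single(s24)) == s24;             % true

% Integers are exact in single only up to 2^24.
limit = single(2^24);
absorbed = (limit + single(1)) == limit;          % true: the +1 is rounded away

% A full-range 32-bit integer sample does not fit.
s32 = 2^31 - 1;
exact32 = double(single(s32)) == s32;             % false

fprintf('eps single = %g, eps double = %g\n', eps_single, eps_double);
fprintf('24-bit exact: %d, 2^24 + 1 absorbed: %d, 32-bit exact: %d\n', ...
        exact24, absorbed, exact32);
```

This is consistent with storing data in single precision while doing the numerically sensitive steps in double.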
>> 
>>> On Feb 2, 2017, at 1:10 PM, Hiebel, Hannah (hannah.hiebel at uni-graz.at) <hannah.hiebel at uni-graz.at> wrote:
>>> 
>>> 
>>> Dear Andreas,
>>> 
>>> sorry for the misunderstanding. When running your function (eeglab 13.5.4b) the result is double (under the condition that option_single = 0).
>>> That is, once manually converted to double, the format remains double.
>>> 
>>> What I reported was the other case: if *not* converting manually to double after the import, the data remains single only if both option_single and option_savetwofiles are set to false. If option_savetwofiles is set to true, there actually is an automatic conversion to double when loading the set again.
>>> 
>>> Like this:
>>> 
>>> function testcase_precision_reverse
>>> 
>>> EEG = pop_loadbv('/yourpath', 'yourfile.vhdr');
>>> result1 = class( EEG.data )
>>> pop_saveset( EEG, 'filename', 'prec_test_rev.set', 'filepath', '/yourpath');
>>> EEG = pop_loadset( 'filename', 'prec_test_rev.set', 'filepath', '/yourpath');
>>> result2 = class( EEG.data )
>>> 
>>> end
>>> 
>>> Result:
>>> if option_savetwofiles = 0: result1 = single, result2 = single
>>> if option_savetwofiles = 1: result1 = single, result2 = double
>>> 
>>> I just thought it might be good to report this as well (it also explains why I initially ended up with different formats).
>>> 
>>> 
>>> However, I will convert manually to double and see if it has any influence on the ICA runtimes. So you recommend always converting to double after importing Brain Vision Analyzer files?
>>> How much impact does it have, in your opinion, on subsequent processing? Is it just preferable or absolutely necessary?
>>> 
>>> 
>>> Best,
>>> Hannah
>>> 
>>> ________________________________________
>>> From: Andreas Widmann <widmann at uni-leipzig.de>
>>> Sent: Thursday, 2 February 2017 17:29
>>> To: Hiebel, Hannah (hannah.hiebel at uni-graz.at)
>>> Cc: mmiyakoshi at ucsd.edu; eeglablist at sccn.ucsd.edu
>>> Subject: Re: [Eeglablist] ICA running very slowly
>>> 
>>>> When now loading the dataset again with pop_loadset, the format remains single if only the set-file exists;
>>> What do you mean by "remains"? Did you manually convert to double before (i.e. after import) as suggested? As I wrote, this conversion is *not* done automatically just by setting the option. Could you please save the following function (no attachments allowed here) in a file, run it after adjusting filename and path to something existing on your system (and possibly repairing single quotes broken by my mail app), and report the result?
>>> 
>>> function testcase_precision
>>> 
>>> EEG = pop_loadbv('/yourpath', 'yourfile.vhdr');
>>> EEG.data = double( EEG.data );
>>> pop_saveset( EEG, 'filename', 'prec_test.set', 'filepath', '/yourpath');
>>> EEG = pop_loadset( 'filename', 'prec_test.set', 'filepath', '/yourpath');
>>> class( EEG.data )
>>> 
>>> end
>>> 
>>>> To get back to my data: what I can tell is that the data format is single precision after the import (I import Brain Vision Analyzer files with the eeglab extension: bva-io v1.5.12, pop_loadbv).
>>> Yes, bva-io imports with single precision. This is rather due to historic reasons. Not sure what the current EEGLAB policy is here. I would not object changing this to double in a future version. Arno, what do you think?
>>> 
>>> Best,
>>> Andreas
>>> 
>>>> On 02.02.2017 at 16:44, Hiebel, Hannah (hannah.hiebel at uni-graz.at) <hannah.hiebel at uni-graz.at> wrote:
>>>> 
>>>> Dear Andreas and Makoto,
>>>> 
>>>> of course, I am working on it! I am a bit reluctant to update the eeglab version in the middle of my analysis – wouldn’t this potentially result in additional inconsistencies? I used eeglab 13.4.4b on my main computer; I just ran additional tests with the other version.
>>>> 
>>>> I eventually managed to figure out what affects data precision: irrespective of the eeglab version (I tested with 13.4.4b, 13.5.4b, and 13.6.5b), changes result from saving/loading a dataset, depending on the settings in "memory and other options". You cannot see this when debugging your own script as it occurs in the process of saving/loading itself. I think the function responsible is pop_loadset(), which has a different effect on the data format (EEG.data) depending on how the dataset was saved before (option_savetwofiles in pop_editoptions; GUI: Memory and other options: "If set, save not one but two files for each dataset").
>>>> 
>>>> option_savetwofiles = 1  -->  two files are saved: .set and .fdt
>>>> option_savetwofiles = 0 -->  one file is saved: .set
>>>> 
>>>> When now loading the dataset again with pop_loadset, the format remains single if only the set-file exists; the data is converted to double if the data is stored in the fdt-file (2 files were saved before). I think it happens within the function eeg_checkset which is called by pop_loadset. I don’t know if this is intentional or not… but if so, it’s very difficult to track (at least for me).
>>>> 
>>>> To get back to my data: what I can tell is that the data format is single precision after the import (I import Brain Vision Analyzer files with the eeglab extension: bva-io v1.5.12, pop_loadbv). As far as I understand the function, this seems to be "standard" (unless an older Matlab version is used).
>>>> 
>>>> I am currently working on a simplified version I can more easily pass on to you (raw data files, code). Of course I rejected the identical bad segments (same mat-file with start/end points used for rejection).
>>>> Thanks  for the comment on linked mastoid reference - I am still not sure here but I hope it becomes clear in the script then.
>>>> 
>>>> Best,
>>>> Hannah
>>>> 
>>>> From: Makoto Miyakoshi <mmiyakoshi at ucsd.edu>
>>>> Sent: Thursday, 2 February 2017 04:09
>>>> To: Hiebel, Hannah (hannah.hiebel at uni-graz.at)
>>>> Cc: Andreas Widmann; eeglablist at sccn.ucsd.edu
>>>> Subject: Re: [Eeglablist] ICA running very slowly
>>>> 
>>>> Dear Hannah,
>>>> 
>>>>> or take on your offer, Makoto.
>>>> 
>>>> Actually I'm not a debugging guy. The official bug report place for EEGLAB is Bugzilla.
>>>> https://sccn.ucsd.edu/bugzilla/enter_bug.cgi
>>>> You can file your claim here. Thank you for your patience and cooperation.
>>>> 
>>>> That being said, if you don't see any error message, it's very hard for me to imagine what is wrong. You may also want to give us more info. For example, is it always very slow? If not, when does it become slow? Also, all other basic info, such as sampling rate, data length, number of channels, etc.
>>>> 
>>>> Makoto
>>>> 
>>>> 
>>>> On Mon, Jan 30, 2017 at 2:51 AM, Hiebel, Hannah (hannah.hiebel at uni-graz.at)<hannah.hiebel at uni-graz.at> wrote:
>>>> Dear Andreas and Makoto,
>>>> 
>>>> thank you for your additional suggestions.
>>>> 
>>>> I am not really familiar with bugtracker, the easiest way for me would be sharing the data via Dropbox and send you the link separately, or take on your offer, Makoto.
>>>> I wonder if I could have a look at the code first – if you spot anything wrong here, you wouldn’t have to make the effort with the actual data. I could then easily provide the datasets sufficient to run runica().
>>>> Andreas, if I wanted to provide everything you’d need to replicate my whole routine, I’d have to make available several datasets, additional mat-files (e.g. with info about bad segments) and functions...
>>>> I'd suggest starting with the final datasets and then decide how to best proceed, if that's okay.
>>>> 
>>>> Regarding your comments:
>>>> I thought only re-referencing to average reference reduces the data rank (I re-referenced to linked mastoids instead). Maybe there are other steps potentially resulting in rank-deficiency I am not aware of?
>>>> When checking with rank(EEG.data(:,:)) it seems fine, I don’t know if that’s sufficient.
>>>> 
>>>> Makoto, I am working with ICA for the first time and thus have no experience with how clean the data should be. One subject with long runtimes indeed doesn’t have the best data quality (neck muscle tension) but I had the same runtime problems in a subject with very clean EEG data.
>>>> 
>>>> Thank you, Andreas, for your explanation regarding single vs. double precision. One additional remark: when I run the same script on different computers with different Matlab/eeglab versions (same setting for option_single), the format (EEG.data) differs (single/double precision) – this is quite confusing for me, to be honest.
>>>> 
>>>> Best regards,
>>>> Hannah
>>>> 
>>>> From: Makoto Miyakoshi <mmiyakoshi at ucsd.edu>
>>>> Sent: Thursday, 26 January 2017 23:20
>>>> To: Andreas Widmann
>>>> Cc: Hiebel, Hannah (hannah.hiebel at uni-graz.at); eeglablist at sccn.ucsd.edu
>>>> Subject: Re: [Eeglablist] ICA running very slowly
>>>> 
>>>> Dear Hannah and Andreas,
>>>> 
>>>>> Btw, did you check data rank?
>>>> 
>>>> Yeah this is another thing. runica() has a rank checker, but if it does not work well for whatever reason, the calculation will be difficult!
>>>> 
>>>> By the way, AMICA does not seem to change its computation speed regardless of data quality. runica() clearly does.
>>>> 
>>>> Hannah, if you are willing to share the data, just give me data that is sufficient to run runica(). If you don't have any method to share data, I'll give you SCCN server account separately. Let me know.
>>>> 
>>>> Makoto
>>>> 
>>>> 
>>>> 
>>>> On Wed, Jan 25, 2017 at 10:04 AM, Andreas Widmann <widmann at uni-leipzig.de> wrote:
>>>> Dear Hannah,
>>>> 
>>>> please provide one affected raw dataset and the absolutely minimal script demonstrating the issue (mainly your various filters, artifact rejection, and possibly epoching or re-referencing etc). Presumably easiest is via the bugtracker. Also better for future reference.
>>>> 
>>>> Using double precision is indeed to be preferred (the firfilt plugin uses double precision for filtering internally anyway). Note, however, that data are not automatically converted to double precision by unsetting the option; they are merely NOT converted automatically to single. Depending on your raw data format and importer you possibly have to do that manually. Btw, did you check data rank?
>>>> 
>>>> Best,
>>>> Andreas
>>>> 
>>>>> On 25.01.2017 at 15:18, Hiebel, Hannah (hannah.hiebel at uni-graz.at) <hannah.hiebel at uni-graz.at> wrote:
>>>>> 
>>>>> Dear Andreas, dear Makoto,
>>>>> 
>>>>> I have re-run the pre-processing routine and ICA for one of the affected subjects with consistent Matlab version (R2015b) and still get the same results in terms of runtime (>40 h / 14.5 h / < 1h depending on the previously used high-pass filter). Thus, the problems don’t seem to have been caused by inconsistent versions.
>>>>> 
>>>>> Thank you Makoto for your suggestion, I haven’t used the trimOutlier() plugin so far - I will try to check for outliers that way. If bad data quality was the reason, shouldn’t the long runtimes (in a specific subject) occur irrespective of the used high-pass filter?
>>>>> 
>>>>> One aspect I noticed is that the data format (EEG.data) is single precision – could this be the problem? I’ve just read in the PREP pipeline paper that double precision computation is essential for filtering; it is mentioned that the eeg_checkset function converts EEG data to single precision per default and one should override this default by changing the eeglab settings (pop_editoptions, set option_single to false). I have changed the eeglab options following the instructions on the eeglab wiki page ('File' -> 'Memory and other options' -> 'If set, use single precision under...' uncheck it). In my understanding this should be the same, right?
>>>>> 
>>>>> I appreciate your offer to try to replicate the problem. I am not sure though what I would have to make available to you - would the pre-processed dataset(s) of one affected subject be sufficient? Also, do you need to be able to actually run my scripts or would it be enough to see the relevant parts of the code? (because in the main scripts I also retrieve information from additional files and call custom-made functions to process co-registered eye-tracking data…).
>>>>> 
>>>>> Thanks a lot for your effort,
>>>>> Hannah
>>>>> 
>>>>> 
>>>>> Hannah Hiebel, Mag.rer.nat.
>>>>> Cognitive Psychology & Neuroscience
>>>>> Department of Psychology, University of Graz
>>>>> Universitätsplatz 2, 8010 Graz, Austria
>>>>> From: Andreas Widmann <widmann at uni-leipzig.de>
>>>>> Sent: Thursday, 19 January 2017 21:53
>>>>> To: Hiebel, Hannah (hannah.hiebel at uni-graz.at)
>>>>> Cc: eeglablist at sccn.ucsd.edu
>>>>> Subject: Re: [Eeglablist] ICA running very slowly
>>>>> 
>>>>> Hi Hannah,
>>>>> 
>>>>> I would like to try to replicate this behavior. Could you please make available one of the affected datasets and the relevant parts of the code used for pre-processing and ICA, e.g. via the bugtracker or Dropbox? Are there possibly data discontinuities without boundary markers? Did you keep MATLAB version constant?
>>>>> 
>>>>> Best,
>>>>> Andreas
>>>>> 
>>>>>> On 19.01.2017 at 09:41, Hiebel, Hannah (hannah.hiebel at uni-graz.at) <hannah.hiebel at uni-graz.at> wrote:
>>>>>> 
>>>>>> Dear Alberto and Tarik,
>>>>>> 
>>>>>> thank you very much for your suggestions. I work on a computer with an i7 3.60 GHz processor and 8 GB RAM, or on a notebook with an i7 2.5 GHz processor and 8 GB RAM – this should be okay.
>>>>>> Gladly, the ICA eventually finds a solution and the IC maps look good. However, the question for me is still why does the ICA become >10 times slower after changing the pre-processing routine. I’ve continued testing and indeed the high-pass filter seems to be responsible for the differences.
>>>>>> 
>>>>>> In my recent routine I used the eeglab windowed sinc FIR filter with 1 Hz cut-off frequency, 1 Hz transition bandwidth, 0.001 passband ripple, Kaiser window. When I change the filter (settings) while keeping all other steps the same, I see huge differences in ICA runtime in some subjects. That is, when using a 0.1 Hz Butterworth filter instead, ICA is running fast again (< 1h for the subjects where it took > 30h before). With the eeglab basic FIR filter with 1 Hz passband edge and default settings defined by the internal heuristic (resulting in 0.5 Hz cut-off, 1 Hz trans. bandwidth) it’s also running much faster in most subjects but already takes >20h in the “problematic” cases.
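
For windowed-sinc FIR filters, the cut-off is the -6 dB point at the centre of the transition band, so the two FIR settings described above place their band edges differently. A small sketch of that arithmetic in base MATLAB follows; the helper name hp_edges is made up for illustration and is not an EEGLAB or firfilt function. The 0.1 Hz Butterworth filter is an IIR design and is not covered by this arithmetic.

```matlab
% Sketch: band edges of a high-pass windowed-sinc FIR filter, assuming the
% cut-off (-6 dB) sits in the middle of the transition band.
% hp_edges is a hypothetical helper, not part of EEGLAB or firfilt.
hp_edges = @(cutoff, tbw) struct( ...
    'stopband_edge', cutoff - tbw / 2, ...
    'passband_edge', cutoff + tbw / 2);

% Windowed sinc filter from the message: 1 Hz cut-off, 1 Hz transition band.
wsinc = hp_edges(1.0, 1.0);      % stopband edge 0.5 Hz, passband edge 1.5 Hz

% Basic FIR filter from the message: 0.5 Hz cut-off, 1 Hz transition band.
basic = hp_edges(0.5, 1.0);      % stopband edge 0 Hz, passband edge 1 Hz

fprintf('windowed sinc: stop %.2f Hz, pass %.2f Hz\n', ...
        wsinc.stopband_edge, wsinc.passband_edge);
fprintf('basic FIR:     stop %.2f Hz, pass %.2f Hz\n', ...
        basic.stopband_edge, basic.passband_edge);
```

Under this assumption, the first filter reaches its full stopband attenuation from 0.5 Hz downward, whereas the second reaches it only at 0 Hz, so the first removes noticeably more slow activity.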
>>>>>> 
>>>>>> This gives me the impression that the higher cut-off frequency causes the problems (or maybe stopband edge and attenuation are more decisive?).
>>>>>> That's very surprising as I would not have expected the filter to have such an impact and a higher cut-off is normally recommended.
>>>>>> 
>>>>>> I’d be very grateful if anyone could provide more insight!
>>>>>> 
>>>>>> Best,
>>>>>> Hannah
>>>>>> 
>>>>>> 
>>>>>> Hannah Hiebel, Mag.rer.nat.
>>>>>> Cognitive Psychology & Neuroscience
>>>>>> Department of Psychology, University of Graz
>>>>>> Universitätsplatz 2, 8010 Graz, Austria
>>>>>> 
>>>>>> From: Alberto Sainz <albertosainzc at gmail.com>
>>>>>> Sent: Wednesday, 18 January 2017 04:29
>>>>>> To: Hiebel, Hannah (hannah.hiebel at uni-graz.at)
>>>>>> Cc: eeglablist at sccn.ucsd.edu
>>>>>> Subject: Re: [Eeglablist] ICA running very slowly
>>>>>> 
>>>>>> I would suggest trying a different computer. I have been running ICA on a 14-electrode, 30-minute continuous EEG recording (around 40 MB) on two different computers. A 2 GHz dual-core computer took 1 h; a 2.2 GHz i7 takes around 5 minutes.
>>>>>> 
>>>>>> I know your data is larger, but just to say that the processor (and probably the RAM, if it is too small) matters a lot.
>>>>>> 
>>>>>> Good luck
>>>>>> 
>>>>>> 2017-01-16 20:26 GMT+01:00 Hiebel, Hannah (hannah.hiebel at uni-graz.at)<hannah.hiebel at uni-graz.at>:
>>>>>> Dear all,
>>>>>> 
>>>>>> I am using ICA to clean my EEG data for eye-movement related artifacts. I’ve already done some testing in the past to see how certain pre-processing steps affect the quality of my decomposition (e.g. filter settings). In most cases, it took approximately 1-2 hours to run ICA for single subjects (62 channels: 59 EEG, 3 EOG channels).
>>>>>> 
>>>>>> Now that I run ICA on my final datasets it suddenly takes hours upon hours to do only a few steps. It still works fine in some subjects but in others runica takes up to 50 hours. I observed that in some cases the weights blow up (learning rate is lowered many times); in others it starts right away without lowering the learning rate but every step takes ages.
>>>>>> I’ve done some troubleshooting to see if a specific pre-processing step causes this behavior but I cannot find a consistent pattern. It seems to me though that (at least in some cases) the high-pass filter played a role – can anyone explain how this is related? Could a high-pass filter potentially be too strict?
>>>>>> 
>>>>>> On the eeglablist I could only find discussions about rank deficiency (mostly due to using average reference) as a potential reason. I re-referenced to linked mastoids – does this also affect the rank? When I check with rank(EEG.data(:, :)) it returns 62 though, which is equal to the number of channels. For some of the “bad” subjects I nonetheless tried without re-referencing – no improvement. Also, reducing dimensionality with pca ("pca, 61") didn’t help.
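
The rank question can be checked on synthetic data with base MATLAB only. The sketch below uses random numbers in place of EEG and assumes that the last two channels play the role of the mastoids; it shows only the linear algebra, not what EEGLAB's re-referencing does internally.

```matlab
% Sketch: effect of re-referencing on data rank, using random data.
% Assumption: channels 7 and 8 stand in for the two mastoid channels.
nchan = 8;
X = randn(nchan, 2000);                       % full rank with probability 1

% Average reference: subtract the mean over channels at every sample.
X_avg = X - repmat(mean(X, 1), nchan, 1);

% Linked mastoids, keeping both mastoid channels in the data.
ref = mean(X([7 8], :), 1);
X_lm_keep = X - repmat(ref, nchan, 1);        % rows 7 and 8 become +/- the same signal

% Linked mastoids, dropping one of the two mastoid channels afterwards.
X_lm_drop = X_lm_keep(1:7, :);

r_raw  = rank(X);            % 8
r_avg  = rank(X_avg);        % 7
r_keep = rank(X_lm_keep);    % 7
r_drop = rank(X_lm_drop);    % 7, i.e. full rank for the 7 remaining channels

fprintf('raw %d, average ref %d, linked (both kept) %d, linked (one dropped) %d\n', ...
        r_raw, r_avg, r_keep, r_drop);
```

In this toy case a linked-mastoid reference costs one rank only when both mastoid channels stay in the data. A rank equal to the channel count, as reported above, suggests that this kind of dependence is not present.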
>>>>>> 
>>>>>> Any advice would be very much appreciated!
>>>>>> 
>>>>>> Many thanks in advance,
>>>>>> Hannah
>>>>>> 
>>>>>> 
>>>>>> Hannah Hiebel, Mag.rer.nat.
>>>>>> Cognitive Psychology & Neuroscience
>>>>>> Department of Psychology, University of Graz
>>>>>> Universitätsplatz 2, 8010 Graz, Austria
>>>>>> 
>>>>>> _______________________________________________
>>>>>> Eeglablist page: http://sccn.ucsd.edu/eeglab/eeglabmail.html
>>>>>> To unsubscribe, send an empty email to eeglablist-unsubscribe at sccn.ucsd.edu
>>>>>> For digest mode, send an email with the subject "set digest mime" to eeglablist-request at sccn.ucsd.edu
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Makoto Miyakoshi
>>>> Swartz Center for Computational Neuroscience
>>>> Institute for Neural Computation, University of California San Diego
>>> 
>> 
> 



