Example code of 'eigensym' and way faster 'eiggen2'

Ioannis A. Venetis

Friday, 10 January 2020

4:37 a.m.

Running the following script I get a huge difference

timerB = 0.010171000
timerA = 0.22064500
ratio = 21.693540

It does not matter if LOOP2 comes in place of LOOP1.

Also, I get eigenvalues in descending order in LOOP2 instead of ascending as in eigensym, but that's not the issue (I guess).

I hope the code does not contain any silly mistakes. Thanks for your time anyway.

Yiannis

<hansl>
clear
set verbose off
mynameisseed = randgen1(i,1000000,10000000)
set seed mynameisseed
set seed #8402212
reps = 1000
matrices AtA = array(reps)
loop i = 1..reps -q
    A = mrandgen(i,-5,5,4,4)
    AtA[i] = A'A
endloop

# LOOP 2
set stopwatch
loop i = 1..reps -q
    matrix V = {}
    matrix W = {}
    D = eiggen2(AtA[i],&V,&W)
    Tmp = mreverse(msortby(Re(D)~Re(V)',1))
    D = Tmp[,1]
    V = Tmp[,2:]'
    # print D
    # print V
endloop
timerB = $stopwatch
print timerB

# LOOP 1
set stopwatch
loop i = 1..reps -q
    matrix V = {}
    matrix W = {}
    D = eigensym(AtA[i],&V)
    # print D
    # print V
endloop
timerA = $stopwatch
print timerA
ratio = timerA/timerB
print ratio
</hansl>

Sven Schreiber

Friday, 10 January

4:48 a.m.

On 10.01.2020 at 10:37, Ioannis A. Venetis wrote:

> Running the following script I get a huge difference
>
> timerB = 0.010171000
> timerA = 0.22064500
> ratio = 21.693540
>
> It does not matter if LOOP2 comes in place of LOOP1.

On 2019d on Windows (old Ivybridge) I get a ratio of 8.5, using 10000 reps instead of just 1000. Which version are you using?

cheers
sven

Ioannis Venetis

5:10 a.m.

Wow! Hmm, is something going on? I use

gretl 2019d
MS Windows (x86_64)
build date 2019-12-22

Of course it is not your job, but I also attach some info from my PC properties:

OS Name: Microsoft Windows 10 Enterprise
Version: 10.0.17763 Build 17763
System Type: x64-based PC
System SKU: 2YW27AV
Processor: Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz, 3192 Mhz, 6 Core(s), 12 Logical Processor(s)
BIOS Version/Date: HP Q50 Ver. 01.04.07, 1/31/2019
Installed Physical Memory (RAM): 16.0 GB
Total Physical Memory: 15.8 GB
Available Physical Memory: 10.2 GB
Total Virtual Memory: 18.2 GB
Available Virtual Memory: 12.3 GB

Thanks again
Yiannis

Sven Schreiber

5:19 a.m.

On 10.01.2020 at 11:10, Ioannis Venetis wrote:

> Wow! mmm Something is going on? I use gretl 2019d MS Windows (x86_64) build date 2019-12-22

BTW just three days ago eigensym was switched to a different Lapack background function I think, so maybe we should try a brand new snapshot for this.

> Of course not your job but I also attach some info from my PC properties
> OS Name Microsoft Windows 10 Enterprise
> Version 10.0.17763 Build 17763
> System Type x64-based PC
> System SKU 2YW27AV
> Processor Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz, 3192 Mhz, 6 Core(s),

It could always be that openblas has different optimizations for different Lapack functions for your newer CPU.

cheers
sven

Ioannis Venetis

5:41 a.m.

> BTW just three days ago eigensym was switched to a different Lapack
> background function I think, so maybe we should try a brand new snapshot
> for this.
>
> It could always be that openblas has different optimizations for
> different Lapack functions for your newer CPU.
>
> cheers
> sven

Using 2020a-git (2020-01-08) returns the same times. Also, I run the script from either the desktop location or from the hard disk. No other program is running, etc.

I attach my $sysinfo, in case it is of any value:

? eval $sysinfo
bundle anonymous:
 nproc = 12
 blascore = "Haswell"
 hostname = "DESKTOP-OFGVHSS"
 os = "windows"
 mpi = 0
 blas = "openblas"
 omp_num_threads = 12
 ncores = 6
 omp = 1
 blas_parallel = "OpenMP"
 mpimax = 12
 wordlen = 64

Yiannis

Ioannis Venetis

6:37 a.m.

OK, I went back to my old PC (I have two, I admit!) and I use 2019d in both cases. On the old PC I get a ratio of 10 (less than half). 'eigensym' takes almost the same time on both machines, while 'eiggen2' is much slower on the old one. So on my new (and, I guess, better) PC I get the same timerA from 'eigensym' but a much faster timerB from 'eiggen2'. Riddle to be solved... :)

The same thing happens with 2020a-git on both machines.

Thanks
Yiannis

Allin Cottrell

7:33 p.m.

On Fri, 10 Jan 2020, Ioannis A. Venetis wrote:

> Running the following script I get a huge difference
>
> timerB = 0.010171000
> timerA = 0.22064500
> ratio = 21.693540

where the fast "timerB" pertains to use of eiggen2() -- now renamed as eigen(), eigen-decomposition of a general matrix -- and the seemingly much slower "timerA" pertains to eigensym(), eigen-decomposition of a symmetric matrix. Both of these are employed on matrices which are by construction symmetric.

There are definitely some points to note here, though I've not been able to reproduce anything like the asymmetry you found.

First, I think you may need more replications to generate meaningful results.

Second, the matrices you pass to the respective eigen functions are tiny, 4 x 4. There's nothing wrong with that, but it means that any function that's at all "optimized" (let alone parallelized) is almost sure to take longer than the plainest of plain vanilla algorithms. We recently (git, snapshots) made eigensym() default to a cleverly optimized lapack variant, dsyevr(). I've just recently changed that so that we use dsyevr only if the order of the input matrix is at least 10. That helps a little in the case of tiny input.

Another point is that when openblas includes a parallelized version of a lapack function, by default all available threads get used in its execution. But today's "consumer" CPUs typically offer twice as many threads as real/physical cores (hyper-threading) and on dense computations such as lapack functions using all threads can slow things down quite a bit. This will probably hurt most for tiny input where multi-threading isn't really justified to start with.

Anyway, I'm appending below a modified version of your script, with a switch to control tiny versus bigger input. I recommend running this with OMP_NUM_THREADS set in the environment to the number of physical cores on your system.

On my (kinda elderly) home system (4 cores, 8 threads max) here's what I'm seeing:

With OMP_NUM_THREADS=4

With tiny_mat = 1 (order 4, 20000 replications):
  eigen() time: 0.2217s
  eigensym() time: 0.2200s
  ratio of eigen() time to eigensym() time: 1.00768

With tiny_mat = 0 (order 40, 2000 replications):
  eigen() time: 1.1705s
  eigensym() time: 0.6310s
  ratio of eigen() time to eigensym() time: 1.85502

Script:

<hansl>
set verbose off
set seed 8402212 # or whatever

tiny_mat = 0 # or 0 for bigger input!

reps = tiny_mat ? 20000 : 2000
dim = tiny_mat ? 4 : 40

matrices AA = array(reps)
loop i=1..reps -q
    A = mrandgen(i, -5, 5, dim, dim)
    AA[i] = A'A
endloop

matrix V = {}
matrix W = {}

set stopwatch
loop i=1..reps -q
    D = eigen(AA[i], &V, &W)
    Tmp = mreverse(msortby(Re(D)~Re(V)', 1))
    D = Tmp[,1]
    V = Tmp[,2:]'
endloop
eigen_time = $stopwatch
printf "eigen() time: %.4fs\n", eigen_time

set stopwatch
loop i=1..reps -q
    D = eigensym(AA[i], &V)
endloop
eigensym_time = $stopwatch
printf "eigensym() time: %.4fs\n", eigensym_time

printf "ratio of eigen() time to eigensym() time: %g\n", eigen_time / eigensym_time
</hansl>

Allin
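
The size-based choice of LAPACK driver mentioned in this message (dsyevr only when the order is at least 10) could look roughly like the C sketch below. This is only an illustration: the enum, the macro and the function name are hypothetical and not taken from the gretl sources, and the assumption that the fallback for tiny input is plain dsyev is a guess based on the routines named elsewhere in the thread.

```c
/* Hypothetical names throughout: illustrative only, not gretl's code. */
typedef enum {
    SYM_DRIVER_PLAIN,     /* assumed fallback, e.g. LAPACK dsyev */
    SYM_DRIVER_OPTIMIZED  /* e.g. LAPACK dsyevr, better for larger input */
} sym_driver;

/* Threshold quoted in the message: dsyevr only from order 10 upwards. */
#define DSYEVR_MIN_ORDER 10

/* Pick the driver for a symmetric matrix of order n. */
sym_driver choose_sym_driver(int n)
{
    return (n >= DSYEVR_MIN_ORDER) ? SYM_DRIVER_OPTIMIZED : SYM_DRIVER_PLAIN;
}
```

With this rule the 4 x 4 matrices of the original script would take the plain path, while the order-40 case would take the optimized one.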

Allin Cottrell

7:59 p.m.

On Fri, 10 Jan 2020, Allin Cottrell wrote:

[...]

> Script:
>
> <hansl>
> set verbose off
> set seed 8402212 # or whatever
>
> tiny_mat = 0 # or 0 for bigger input!

Oops. As you might have guessed, that line should have read:

tiny_mat = 1 # or 0 for bigger input!

"tiny_mat" (4 x 4) was supposed to be the default, as in Ioannis's original script. But the timings I posted were correct, for my system.

Allin

Ioannis Venetis

Saturday, 11 January

2 a.m.

Allin, thanks a lot! I will have a look tomorrow morning from my laptop (I am away today) and on Monday morning in the office. Yesterday, when I ran my script at home (new laptop), I got the same times as Sven. I will test further using your code and report back. I suspect that my office PC is responsible.

Yiannis

Riccardo (Jack) Lucchetti

Sunday, 12 January

5:07 a.m.

On Fri, 10 Jan 2020, Allin Cottrell wrote:

...

Home laptop:

tiny = 0:
  eigen() time: 2.6018s
  eigensym() time: 0.8643s
  ratio of eigen() time to eigensym() time: 3.01019

tiny = 1:
  eigen() time: 0.4468s
  eigensym() time: 0.2054s
  ratio of eigen() time to eigensym() time: 2.17544

Work PC:

tiny = 0:
  eigen() time: 0.6631s
  eigensym() time: 0.5610s
  ratio of eigen() time to eigensym() time: 1.18205

tiny = 1:
  eigen() time: 0.1055s
  eigensym() time: 0.0857s
  ratio of eigen() time to eigensym() time: 1.23118

-------------------------------------------------------
Riccardo (Jack) Lucchetti
Dipartimento di Scienze Economiche e Sociali (DiSES)
Università Politecnica delle Marche
(formerly known as Università di Ancona)
r.lucchetti(a)univpm.it
http://www2.econ.univpm.it/servizi/hpp/lucchetti
-------------------------------------------------------

Sven Schreiber

Saturday, 11 January

5:41 a.m.

On 11.01.2020 at 01:33, Allin Cottrell wrote:

> Another point is that when openblas includes a parallelized version of a lapack function, by default all available threads get used in its execution. But today's "consumer" CPUs typically offer twice as many threads as real/physical cores (hyper-threading) and on dense computations such as lapack functions using all threads can slow things down quite a bit. This will probably hurt most for tiny input where multi-threading isn't really justified to start with.

Hm, is openblas really so naive? Or to put it differently, is it the responsibility of the caller to pick the non-parallel version if needed?

> Anyway, I'm appending below a modified version of your script, with a switch to control tiny versus bigger input. I recommend running this with OMP_NUM_THREADS set in the environment to the number of physical cores on your system. On my (kinda elderly) home system (4 cores, 8 threads max) here's what I'm seeing:

Hm, I'm using a perhaps even older 4-core system _without_ HT which has omp_num_threads = 4 as per $sysinfo, and with the brand new snapshot I get:

tiny YES: ratio 0.11
tiny NO: ratio 0.37

So eigensym looks pretty bad "always". Since Ioannis had a newer PC, this doesn't look like a pre-Haswell CPU issue. Perhaps something Windows-specific (with OpenMP)?

cheers
sven

Allin Cottrell

3:26 p.m.

On Sat, 11 Jan 2020, Sven Schreiber wrote:

...

It's possible there was some specific issue with the openblas library we've been packaging for Windows, which was quite an old version. That's now replaced in today's snapshots with a new build of current OpenBLAS (version 0.3.7). I'd be interested to hear if that makes any difference.

Oh, one other point. At present eigensym() tests for symmetry of the input. That's not going to affect the timing much, but maybe we should junk the test? In some other functions that expect symmetric input we don't bother to test -- it's up to the user.

Allin

Sven Schreiber

3:37 p.m.

On 11.01.2020 at 21:26, Allin Cottrell wrote:

> On Sat, 11 Jan 2020, Sven Schreiber wrote:
>
>> So eigensym looks pretty bad "always". Since Ioannis had a newer PC,
>> this doesn't look like a pre-Haswell CPU issue. Perhaps something
>> Windows-specific (with OpenMP)?
>
> It's possible there was some specific issue with the openblas library we've been packaging for Windows, which was quite an old version. That's now replaced in today's snapshots with a new build of current OpenBLAS (version 0.3.7). I'd be interested to hear if that makes any difference.

The snapshot refuses to start, asking for libgfortran-5.dll or something.

> Oh, one other point. At present eigensym() tests for symmetry of the input. That's not going to affect the timing much, but maybe we should junk the test? In some other functions that expect symmetric input we don't bother to test -- it's up to the user.

Hm, don't know. I remember reading such a warning in the help for some function. This warning would have to be copied to the eigensym doc I guess. But for tiny matrices it may make a relative difference I guess.

But what would be the quickest or easiest user-testable way for checking symmetry?

(Can't test because my snapshot isn't starting ;-)

thanks
sven

Allin Cottrell

9:12 p.m.

On Sat, 11 Jan 2020, Sven Schreiber wrote:

> The snapshot refuses to start, asking for libgfortran-5.dll or something.

Ah, OK. That's now included in the latest snapshots.

> Hm, don't know. I remember reading such a warning in the help for some function. This warning would have to be copied to the eigensym doc I guess. But for tiny matrices it may make a relative difference I guess.
>
> But what would be the quickest or easiest user-testable way for checking symmetry?
>
> (Can't test because my snapshot isn't starting ;-)

I think that shouldn't be a problem for the user: you should know if a matrix is symmetric or not without having to test. Is it the result of A'A, or a covariance matrix, or some such?

In fact, what actually matters for the relevant lapack functions is not that the matrix is symmetric, but that it should be _taken as_ symmetric: lapack will read only the lower or upper triangle. (I'll have to check on this, but maybe we could standardize on, say, only the lower triangle is ever read.)

Allin
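
To make the two ideas in this exchange concrete, here is a minimal C sketch of (a) a symmetry test, reporting the largest absolute difference between mirrored elements, and (b) what "taken as symmetric" amounts to if only the lower triangle is read. It assumes column-major storage, as LAPACK uses; the function names are hypothetical and this is not how gretl itself implements either step.

```c
#include <math.h>
#include <stddef.h>

/* Largest |a(i,j) - a(j,i)| for an n x n matrix stored column-major,
   so that element (i,j) lives at a[i + j*n]. Zero means exactly symmetric. */
double max_asymmetry(const double *a, int n)
{
    double worst = 0.0;
    for (int j = 0; j < n; j++) {
        for (int i = j + 1; i < n; i++) {
            double d = fabs(a[i + (size_t) j * n] - a[j + (size_t) i * n]);
            if (d > worst) {
                worst = d;
            }
        }
    }
    return worst;
}

/* "Taken as symmetric": copy the lower triangle over the upper one, giving
   the matrix that a routine reading only the lower triangle effectively sees. */
void mirror_lower_to_upper(double *a, int n)
{
    for (int j = 0; j < n; j++) {
        for (int i = j + 1; i < n; i++) {
            a[j + (size_t) i * n] = a[i + (size_t) j * n];
        }
    }
}
```

In practice a caller would compare the result of max_asymmetry() against a small tolerance rather than against zero, since a product such as A'A can differ from its transpose by rounding error.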

Ioannis Venetis

Sunday, 12 January

3:53 a.m.

OK, I ran Allin's script (I re-attach it at the end of this message) at home (Windows 10 Home, Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz 1.99 GHz, installed memory (RAM) 16.0GB) with 2020a-git, build date 2020-01-11, and I get

(tiny_mat = 1)
eigen() time: 0.1368s
eigensym() time: 1.9458s
ratio of eigen() time to eigensym() time: 0.0702871

(tiny_mat = 0)
eigen() time: 0.7492s
eigensym() time: 2.6358s
ratio of eigen() time to eigensym() time: 0.284221

Points: eigen() is faster than Allin's due to (I guess) the newer machine I have, but eigensym() is considerably slower in any case.

Bear in mind that I posted the $sysinfo earlier on, but I have no clue about MPI or how to change OMP_NUM_THREADS (for fancy stuff I rely on Jack's intervention ;) ).

Tomorrow I will post results from my office.

Yiannis

Sven Schreiber

5:52 a.m.

On 12.01.2020 at 09:53, Ioannis Venetis wrote:

> ( Windows 10 Home
> Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz 1.99 GHz
> Installed memory(RAM) 16.0GB)
> with 2020a-git build date 2020-01-11

There were two snapshots with that date I believe. I'm using the one from "right now" (half an hour ago or so). The problem does not go away, ratios stay the same.

> ratio of eigen() time to eigensym() time: 0.0702871
> [...]
> ratio of eigen() time to eigensym() time: 0.284221

My numbers are marginally better for eigensym, perhaps because this machine doesn't have HT.

I couldn't find any obvious hints about slow performance of dsyevr or dsyev with openblas on Windows on the web so far, unfortunately. Maybe it's some compiler option, but of course one would then expect it to slow down _everything_.

thanks
sven

Ioannis Venetis

3:18 p.m.

New subject: isnan() returns -1 instead of 1.

Hi all,

In 2020a-git, build date 2020-01-1 (I haven't tested other versions), the function isnan() returns -1 instead of 1, although the help says

"... Given a matrix argument, returns a matrix of the same dimensions with 1s in positions where the corresponding element of the input is NaN and 0s elsewhere"

Also, shouldn't the help read "...of the input is NA and 0s elsewhere"? Other 'text' inputs for not-a-number or not-available are not understood by gretl: for example, in the script below you cannot set Z[3,3]=nan or NAN or NaN. Still, if you set Z[3,3]=0/0 it evaluates to Z[3,3]=nan, which is then understood by isnan(), which returns -1.

<hansl>
Z = ones(5,5)
Z[3,3]=NA
eval isnan(Z)
eval !ok(Z)
eval sumc(sumr(isnan(Z)))
eval sumc(sumr(!ok(Z)))
Z[3,]=NA
eval isnan(Z)
</hansl>

Thanks
Yiannis

Allin Cottrell

4:32 p.m.

New subject: isnan() returns -1 instead of 1.

On Sun, 12 Jan 2020, Ioannis Venetis wrote:

...

Ah, that occurs only on Windows. I hadn't noticed, but the C99 standard just says "The isnan macro returns a nonzero value if and only if its argument has a NaN value." Microsoft has helpfully chosen to return -1 rather than 1. So there's now a workaround for MS behavior in git.

> Also shouldn't it be in the help "...of the input is NA and 0s elsewhere" [...]

Actually, no: isnan() is literally a test for NaN.

Allin
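
Since C99 only guarantees a nonzero result from isnan(), a portable caller has to map that result to 1 explicitly. The C sketch below shows one way to do so; the function names are hypothetical and this is not claimed to be the workaround that went into gretl's git.

```c
#include <math.h>
#include <stddef.h>

/* C99 only promises that isnan() is nonzero for NaN, so map any nonzero
   result (1, -1, or anything else) to exactly 1. */
int nan_flag(double x)
{
    return isnan(x) ? 1 : 0;
}

/* Fill mask[i] with 1.0 where x[i] is NaN and 0.0 elsewhere. */
void nan_mask(const double *x, double *mask, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        mask[i] = (double) nan_flag(x[i]);
    }
}
```

Summing such a mask then gives the count of NaN elements regardless of which nonzero value the platform's isnan happens to return.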

Allin Cottrell

4:44 p.m.

On Sun, 12 Jan 2020, Sven Schreiber wrote:

> On 12.01.2020 at 09:53, Ioannis Venetis wrote:
>> ( Windows 10 Home
>> Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz 1.99 GHz
>> Installed memory(RAM) 16.0GB)
>> with 2020a-git build date 2020-01-11
>
> There were two snapshots with that date I believe. I'm using the one from "right now" (half an hour ago or so). The problem does not go away, ratios stay the same.

I did some testing on Windows 10 today, and I saw the same: eigen() a lot faster than eigensym(), with the update to openblas 0.3.7 not making an appreciable difference. I also found:

(a) disabling the symmetry test in eigensym made essentially no difference;
(b) restricting the number of OMP threads to the number of real cores didn't change the ordering; and
(c) running the script via gretlcli rather than the GUI didn't make a difference (it shouldn't, of course, but just checking).

But... I did find that if you scale up the order of the input you eventually reach a breakeven point, after which eigensym overtakes. On the haswell laptop I was using the breakeven was around order 80, and by order 90 eigensym was substantially faster (on Windows, just to be clear).

In further testing on Linux/haswell I saw that eigensym was much faster at order 40, but actually it was slower in the "tiny" case (order 4), though nothing like as much as on Windows. I also saw that both dsyev and dgeev appear to do multi-threading.

Pending a proper understanding of what's going on, maybe we should make eigensym() divert to eigen() on Windows for order less than 90 or so.

Allin

Sven Schreiber

Monday, 13 January

7:59 a.m.

On 12.01.2020 at 22:44, Allin Cottrell wrote:

> On Sun, 12 Jan 2020, Sven Schreiber wrote:
>
>> The problem does not go away, ratios stay the same.
>
> I did some testing on Windows 10 today, and I saw the same: eigen() a lot faster than eigensym(), with the update to openblas 0.3.7 not making an appreciable difference. I also found: (a) disabling the symmetry test in eigensym made essentially no difference; (b) restricting the number of OMP threads to the number of real cores didn't change the ordering;
> [...]
> though nothing like as much as on Windows. I also saw that both dsyev and dgeev appear to do multi-threading.

Thanks for the thorough testing. Perhaps it would be interesting to also restrict the number of OMP threads to 1 (instead of ncores)? (I don't know how to do that on Windows, except by wrapping the whole thing in a dummy mpi block with np=1 and omp-threads=1.)

> Pending a proper understanding of what's going on. maybe we should make eigensym() divert to eigen() on Windows for order less than 90 or so.

Certainly an option.

thanks
sven

Allin Cottrell

12:19 p.m.

On Mon, 13 Jan 2020, Sven Schreiber wrote:

> Thanks for the thorough testing. Perhaps it would be interesting to also restrict the number of OMP threads to 1 (instead of ncores)?

Yes, very interesting! See below.

> (I don't know how to do that on Windows, except by wrapping the whole thing in a dummy mpi block with np=1 and omp-threads=1.)

google: windows set environment variable. (It's not that hard.)

Anyway, here's what I found on forcing single-threaded behavior:

* eigensym() is uniformly faster than eigen(), on both Windows and Linux.
* It makes more difference to the eigensym() times, but even eigen() is a bit faster when single-threaded.
* This applies for matrices up to order 200 (which is as big as I've tried).

Just as one example, here's the comparison for input of order 90, on a dual-boot haswell laptop with 2 physical cores, max 4 threads:

OMP_NUM_THREADS=1
            Win10     Linux
eigen:      6.2417s   6.2913s
eigensym:   1.7279s   1.5508s

OMP_NUM_THREADS=2
            Win10     Linux
eigen:      6.3512s   6.7276s
eigensym:   5.6093s   2.1323s

So rather than divert from eigensym to eigen on Windows, what we really want to do is run single-threaded eigensym on both platforms, if we can figure out how to do that. (We don't want everything OMP to be single-threaded.)

Allin

Sven Schreiber

2:24 p.m.

On 13.01.2020 at 18:19, Allin Cottrell wrote:

> On Mon, 13 Jan 2020, Sven Schreiber wrote:
>
>> (I don't know how to do that on Windows, except by wrapping the whole
>> thing in a dummy mpi block with np=1 and omp-threads=1.)
>
> google: windows set environment variable. (It's not that hard.)

OK sorry, next time :-)

> Anyway, here's what I found on forcing single-threaded behavior:
> [...]
> So rather than divert from eigensym to eigen on Windows, what we really want to do is run single-threaded eigensym on both platforms, if we can figure out how to do that. (We don't want everything OMP to be single-threaded.)

What about this:
https://github.com/xianyi/OpenBLAS/blob/ce3651516f12079f3ca2418aa85b9ad57...

And similar experiences:

* https://github.com/scipy/scipy/pull/9056
* https://github.com/xianyi/OpenBLAS/pull/1971 , Shift transition to multithreading towards larger matrix sizes

This seems to be a pretty pervasive problem of openblas. I wonder what other algebra functions in gretl might be affected.

cheers
sven

Allin Cottrell

2:52 p.m.

On Mon, 13 Jan 2020, Sven Schreiber wrote:

> On 13.01.2020 at 18:19, Allin Cottrell wrote:
>> So rather than divert from eigensym to eigen on Windows, what we
>> really want to do is run single-threaded eigensym on both platforms,
>> if we can figure out how to do that. (We don't want everything OMP to
>> be single-threaded.)
>
> What about this:
> https://github.com/xianyi/OpenBLAS/blob/ce3651516f12079f3ca2418aa85b9ad57...

We're on the same page! I just pushed to git an openblas-specific patch for DSYEV* which stores the prior number of threads, obtained via openblas_get_num_threads(); sets the number of threads to 1 via openblas_set_num_threads(); and then restores the prior thread count.

The bad news is that this really does seem to be a design flaw in openblas -- I've now tried matrices of order 400 and multi-threading is still slower for DSYEV*. The good news is that DGEEV runs a bit faster with multi-threading for really big matrices (at least on sandybridge). It remains to be seen how this pans out for other LAPACK functions.

Allin
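
The save / set-to-one / restore pattern described in this message can be sketched in C as follows. This is a hedged illustration, not the actual gretl patch: run_single_threaded and the demo_* stand-ins are hypothetical names, and the thread getter and setter are passed as function pointers so that the sketch runs without linking OpenBLAS. In real use one would pass openblas_get_num_threads and openblas_set_num_threads, which OpenBLAS declares in its cblas.h.

```c
/* Function-pointer types matching the shape of OpenBLAS's
   int openblas_get_num_threads(void) and void openblas_set_num_threads(int). */
typedef int (*get_threads_fn)(void);
typedef void (*set_threads_fn)(int);

/* Run work(data) with the thread count forced to 1, then restore the
   previous setting. Returns whatever work() returns. */
int run_single_threaded(get_threads_fn get_threads, set_threads_fn set_threads,
                        int (*work)(void *), void *data)
{
    int saved = get_threads();  /* remember the prior thread count */
    set_threads(1);             /* force single-threaded execution */
    int err = work(data);       /* e.g. the call into a DSYEV* routine */
    set_threads(saved);         /* restore, even if work() reported an error */
    return err;
}

/* Hypothetical stand-ins for the OpenBLAS calls, for demonstration only. */
static int demo_threads = 4;
static int demo_get_threads(void) { return demo_threads; }
static void demo_set_threads(int n) { demo_threads = n; }

/* Demo work function: records the thread count in effect while it runs. */
static int demo_work(void *data)
{
    *(int *) data = demo_get_threads();
    return 0;
}
```

Calling run_single_threaded(demo_get_threads, demo_set_threads, demo_work, &seen) leaves seen equal to 1 and demo_threads back at 4. Scoping the change to one call in this way keeps other OpenMP-parallel code at its usual thread count, which is the concern raised earlier in the thread.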

Sven Schreiber

Wednesday, 15 January

9:40 a.m.

On 13.01.2020 at 20:52, Allin Cottrell wrote:

...

I have some preliminary evidence that it also affects cholesky() and qrdecomp() with tiny matrices, at least on Windows. (This is not with the latest snapshot, but if I understand correctly the underlying routines DPOTRF/DPOTRS and DGEQRF/DORGQR were not called differently yet anyway.)

The speedup from using just one thread for openmp seems to be about 15%, very roughly speaking. I don't have time to do a more thorough analysis right now, unfortunately.

cheers
sven
