On Tue, 17 Sep 2019, Sven Schreiber wrote:
Am 17.09.2019 um 11:19 schrieb Riccardo (Jack) Lucchetti:
> On Tue, 17 Sep 2019, Sven Schreiber wrote:
>
>> Am 17.09.2019 um 09:47 schrieb Riccardo (Jack) Lucchetti:
>>>
>>> matrix sel = ranking(muniform(n,1))
>>> return X[sel[1:k],]
>>>
>> the doc for ranking says "...plus one half the number of elements...",
>> so it can be non-integer, no?
>
> In theory, yes. But the odds of having a tie among pseudo-random
> uniforms are so small I wouldn't worry about it.
Hm, I don't know. Small, of course, but if millions of future gretl
users start running huge simulations based on that, Murphy's law says
there's going to be an error. As you know, the probability isn't zero
for computer-represented numbers.
Yes. But doubles are 8 byte large. But the probability of extracting k
different numbers out of n choices without repetition is
$\frac{(n-1)!}{(n-k)! n^{k-1}}$ (if I recall correctly), so when n = 2^32
I guess we're pretty safe for any reasonable value of k.
But anyway, doesn't ranking involve expensive sorting? My gut
feeling is
that using mrandgen(i,...) to produce "more" than enough random indices,
then simply discarding the duplicates with uniq(), could be faster. (Yes
you need an additional check whether enough indices are left over, and
you might have to do extra work, but most of the time it wouldn't be
required.)
Sorting via qsort (which is what we use) is normally O(n log n), so I
guess it's about the same speed as uniq(). Maybe it could be interesting
to write a speed comparison script.
-------------------------------------------------------
Riccardo (Jack) Lucchetti
Dipartimento di Scienze Economiche e Sociali (DiSES)
Università Politecnica delle Marche
(formerly known as Università di Ancona)
r.lucchetti(a)univpm.it
http://www2.econ.univpm.it/servizi/hpp/lucchetti
-------------------------------------------------------