[Gretl-users] Re: Resample without replacement (Artur Tarassow)

Tuesday, 17 September 2019

On Tue, 17 Sep 2019, Sven Schreiber wrote:

...
 Am 17.09.2019 um 11:19 schrieb Riccardo (Jack) Lucchetti:
> On Tue, 17 Sep 2019, Sven Schreiber wrote:
> 
>> Am 17.09.2019 um 09:47 schrieb Riccardo (Jack) Lucchetti:
>>> 
>>>     matrix sel = ranking(muniform(n,1))
>>>     return X[sel[1:k],]
>>> 
>> the doc for ranking says "...plus one half the number of elements...",
>> so it can be non-integer, no?
> 
> In theory, yes. But the odds of having a tie among pseudo-random
> uniforms are so small I wouldn't worry about it.

 Hm, I don't know. Small, of course, but if millions of future gretl
 users start running huge simulations based on that, Murphy's law says
 there's going to be an error. As you know, the probability isn't zero
 for computer-represented numbers. 
Yes. But doubles are 8 byte large. But the probability of extracting k 
different numbers out of n choices without repetition is 
$\frac{(n-1)!}{(n-k)! n^{k-1}}$ (if I recall correctly), so when n = 2^32 
I guess we're pretty safe for any reasonable value of k.

...
 But anyway, doesn't ranking involve expensive sorting? My gut
feeling is
 that using mrandgen(i,...) to produce "more" than enough random indices,
 then simply discarding the duplicates with uniq(), could be faster. (Yes
 you need an additional check whether enough indices are left over, and
 you might have to do extra work, but most of the time it wouldn't be
 required.) 
Sorting via qsort (which is what we use) is normally O(n log n), so I 
guess it's about the same speed as uniq(). Maybe it could be interesting 
to write a speed comparison script.

-------------------------------------------------------
   Riccardo (Jack) Lucchetti
   Dipartimento di Scienze Economiche e Sociali (DiSES)

   Università Politecnica delle Marche
   (formerly known as Università di Ancona)

   r.lucchetti(a)univpm.it
   http://www2.econ.univpm.it/servizi/hpp/lucchetti
-------------------------------------------------------

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

[Gretl-users] Re: Resample without replacement (Artur Tarassow)