[Gretl-users] Re: Reading large csv and sorting data set -- a comparison with 2 python libs

Monday, 21 December 2020

On 14.12.20 at 17:41, Allin Cottrell wrote:
...
 On Mon, 14 Dec 2020, Artur Tarassow wrote:

 [...]

> On 13.12.20 at 15:35, Allin Cottrell wrote:
>
>> Sorting a dataset: This was not optimized for a huge number of 
>> observations. We were allocating more temporary storage than strictly 
>> necessary and moreover, at some points, calling malloc() per 
>> observation when it was possible to substitute a single call to get a 
>> big chunk of memory. Neither of these points was much of an issue 
>> with a normal-size dataset but they became a serious problem in the 
>> huge case. That's now addressed in git.
>
> As I already wrote to you privately: this change is a real boost, as 
> sorting time dropped from 14 to 7.5 seconds. Thanks for this!
>
> By the way, does this increased speed in sorting also affect the 
> aggregate() function?

 Right now it's specific to the case of sorting an entire dataset, but it 
 would be worth taking a look at the aggregate case too.
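
For readers who want to see the allocation point concretely: the difference between calling malloc() once per observation and grabbing one big chunk can be sketched roughly as below. This is only an illustration, not gretl's actual code. The function names (rows_alloc_per_obs, rows_alloc_block and their free counterparts) and the row-pointer layout are assumptions made up for this example, and overflow of n * k is not checked.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: one malloc() call per observation.
   With n in the millions this means millions of allocator calls. */
double **rows_alloc_per_obs(size_t n, size_t k)
{
    assert(n > 0 && k > 0);
    double **rows = malloc(n * sizeof *rows);
    if (rows == NULL) return NULL;
    for (size_t i = 0; i < n; i++) {
        rows[i] = malloc(k * sizeof **rows);
        if (rows[i] == NULL) {
            /* undo the rows allocated so far */
            while (i > 0) free(rows[--i]);
            free(rows);
            return NULL;
        }
    }
    return rows;
}

void rows_free_per_obs(double **rows, size_t n)
{
    if (rows == NULL) return;
    for (size_t i = 0; i < n; i++) free(rows[i]);
    free(rows);
}

/* Hypothetical helper: a single malloc() for all n * k values;
   the row pointers just index into that one block. */
double **rows_alloc_block(size_t n, size_t k)
{
    assert(n > 0 && k > 0);
    double **rows = malloc(n * sizeof *rows);
    if (rows == NULL) return NULL;
    double *block = malloc(n * k * sizeof *block);
    if (block == NULL) {
        free(rows);
        return NULL;
    }
    for (size_t i = 0; i < n; i++) rows[i] = block + i * k;
    return rows;
}

void rows_free_block(double **rows)
{
    if (rows == NULL) return;
    free(rows[0]); /* rows[0] is the start of the single block */
    free(rows);
}
```

Both variants give the caller the same rows[i][j] access, so sorting code written against one works with the other; the block variant simply replaces n allocator calls with one and keeps the data contiguous.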

Hi all,

I've finalized the short documentation of this little project, including 
Allin's response to the first version:

https://github.com/atecon/gretl_pandas_pypolars

Best,
Artur
