On 30.10.21 at 17:59, Allin Cottrell wrote:
On Sat, 30 Oct 2021, Allin Cottrell wrote:
> [D]o we really want to make the return value in the X,Y case a matrix
> rather than a vector?
>
> I'm not sure that's very helpful, but if we stay with a matrix result
> I think Artur is right: in a matrix operation where the left-hand
> operand is m x n and the right-hand one is p x n it seems unintuitive
> to give a p x m result; seems m x p would be better. (That point is of
> course invisible if m == p but it appears otherwise.)
Hmm, second thoughts... If we return a matrix it's then easier to
document the order in which the distances appear (whichever way that may
be), as in "element i,j holds the distance between row i of X and row j
of Y" (or the transpose). And provided that is clearly documented maybe
m x p versus p x m doesn't matter.
I fully support the view that a matrix is easier to document and, I
would argue, easier to understand too.
Of course, the order does not matter in principle. That is exactly why I
asked whether a de facto standard exists. I simply tried to show what
sklearn (a de facto standard for data science) returns. By the way, the
"Distances" package for Julia behaves like sklearn. Why not follow those
packages, which may make it easier for users to adapt to gretl?
<Julia>
using Pkg
Pkg.add("Distances")
using Distances
x = [[1, 2], [3, 4]]
y = [[1, 2], [0, 3]]

# Element (i, j) is the distance between x[i] and y[j].
pairwise(Euclidean(), x, y)
# 2×2 Matrix{Float64}:
#  0.0      1.41421
#  2.82843  3.16228

pairwise(Cityblock(), x, y)
# 2×2 Matrix{Int64}:
#  0  2
#  4  4
</Julia>
Gretl, by contrast, returns the transpose.
For the "euclidean" metric:
  0.0000  2.8284
  1.4142  3.1623
For the "manhattan" metric:
  0  4
  2  4
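
To make the two orientations concrete, here is a small stdlib-only Python sketch of the convention discussed above ("element i,j holds the distance between row i of X and row j of Y"). The helper names pairwise, euclidean, manhattan and transpose are my own for illustration; they are not the gretl, sklearn or Distances API.

```python
import math

def euclidean(a, b):
    # Straight-line distance between two points of equal length.
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def manhattan(a, b):
    # Sum of absolute coordinate differences.
    return sum(abs(ai - bi) for ai, bi in zip(a, b))

def pairwise(X, Y, metric):
    # m x p result: entry [i][j] is the distance between X[i] and Y[j].
    return [[metric(x, y) for y in Y] for x in X]

def transpose(M):
    # p x m result: entry [j][i] is the distance between X[i] and Y[j].
    return [list(col) for col in zip(*M)]

X = [[1, 2], [3, 4]]
Y = [[1, 2], [0, 3]]

D_euc = pairwise(X, Y, euclidean)   # m x p orientation
D_man = pairwise(X, Y, manhattan)   # m x p orientation
D_man_t = transpose(D_man)          # p x m orientation
```

With this data, D_man has the same layout as the Julia Cityblock result, and D_man_t has the layout of the gretl "manhattan" output shown above.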
Best,
Artur