Skip to content

Distinguish data from distance type in vector space#484

Draft
GuilhemN wants to merge 1 commit into
nmslib:masterfrom
GuilhemN:INPUTTYPE
Draft

Distinguish data from distance type in vector space#484
GuilhemN wants to merge 1 commit into
nmslib:masterfrom
GuilhemN:INPUTTYPE

Conversation

@GuilhemN
Copy link
Copy Markdown

@GuilhemN GuilhemN commented Jun 9, 2021

Related to #482

@GuilhemN
Copy link
Copy Markdown
Author

GuilhemN commented Jun 9, 2021

I migrated SpaceSparseJaccard using this in another branch (using ann-benchmarks with my PR erikbern/ann-benchmarks#239).

The results are quite impressive, before:
plot

After:
movielens10m-jaccard

@searchivarius
Copy link
Copy Markdown
Member

Hi @GuilhemN I have reviewed the request. Yes, we can follow this path, but it is a rather substantial change of the C++ API so I need to discuss it with other people. And also if we are doing something like this, don't you think it's easier to have just a float distance instead? An extra template parameter, you know, it would be nice to get rid of dist_t then forever.

@GuilhemN
Copy link
Copy Markdown
Author

Yes, we can follow this path, but it is a rather substantial change of the C++ API so I need to discuss it with other people.

Sure, no problem.

And also if we are doing something like this, don't you think it's easier to have just a float distance instead? An extra template parameter, you know, it would be nice to get rid of dist_t then forever.

Wouldn't that be a bit weird for hamming distance? And also maybe slower than ints?
But I can do it if you want, that would leave us with just one template parameter (that I called dist_uint_t in this proposal, but should probably be renamed to something like data_t).
And it would actually also break compatibility with the former python interface so it would probably require a new major version, right?

@searchivarius
Copy link
Copy Markdown
Member

searchivarius commented Jun 14, 2021 via email

@searchivarius
Copy link
Copy Markdown
Member

Hi @GuilhemN after some consideration, I think I will likely go with your approach. I don't worry about float being inefficient, but using float instead of int will break compatibility. Please, give me some time, I will batch update the library (there are several important PRs) when I have a bit more time.

Thank you!

@GuilhemN
Copy link
Copy Markdown
Author

Hi! Ok thank you, let me know when I should rebase/rework on this PR :)

@searchivarius
Copy link
Copy Markdown
Member

Hi @GuilhemN I have to postpone things a bit, then I will update the library "en masse". I will let you know if something needs to be changed, but likely it's going to be ok.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants