Enable autocomplete for postalcodes #1514

missinglink · 2021-02-17T08:29:54Z

I tried to do a /v1/autocomplete for a partial postalcode with ?layers=postalcode today and discovered that it doesn't return anything until the final character has been typed.
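
For reference, a request of this kind can be built as in the minimal sketch below. The host `http://localhost:4000` and the helper name `autocomplete_url` are assumptions for illustration; only the `/v1/autocomplete` path and the `text` and `layers` parameters come from the description above.

```python
from urllib.parse import urlencode

# Hypothetical base URL; substitute the address of your own Pelias API instance.
BASE_URL = "http://localhost:4000"


def autocomplete_url(text, layers=None):
    """Build a /v1/autocomplete URL for the given (possibly partial) input."""
    params = {"text": text}
    if layers:
        # The layers parameter is a comma-separated list of layer names.
        params["layers"] = ",".join(layers)
    return f"{BASE_URL}/v1/autocomplete?{urlencode(params)}"


# One request per keystroke while typing the postalcode 90210.
for partial in ("9", "90", "902", "9021", "90210"):
    print(autocomplete_url(partial, layers=["postalcode"]))
```

Sending each of these URLs in turn is a quick way to see at which keystroke the postalcode first appears in the results.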

This DRAFT PR is a simple attempt at fixing that.

It would require more unit tests and some reasonable level of perf/recall/precision testing before it can be merged.

missinglink · 2021-02-17T08:36:05Z

missinglink · 2021-02-17T08:37:38Z

missinglink · 2021-02-17T08:39:19Z

huh actually works pretty well @orangejulius @Joxit thoughts?

Joxit

LGTM

I ran some acceptance tests: no regressions, and this will improve UX on postalcode autocomplete!

This adds autocomplete tests specifically around postalcode behavior. Most of them were missed and I actually meant to commit them as part of #544. Additionally, I've added one more test for an incomplete postalcode, to measure progress against pelias/pelias#676. We may be able to improve results with pelias/api#1514

orangejulius · 2021-02-17T14:51:20Z

Oh, very nice. This has been really annoying for a long time. We've had the issue reported in the past, going all the way back to 2017: pelias/pelias#676

I just added another acceptance test for this in pelias/acceptance-tests#545 and I'll follow up with a bit more testing.

orangejulius · 2021-02-17T17:39:03Z

I can report that autocomplete queries with a focus.point (and no other params) can work really well now:

Postalcode queries are still pretty tough, there's a lot of potential for exact housenumber matches, but this looks like a great improvement overall.

More testing is needed, it looks like there might be some address autocomplete queries that got a bit worse. But this is exciting :)

missinglink · 2021-02-17T21:18:29Z

> it looks like there might be some address autocomplete queries that got a bit worse

Could you please post some examples?

orangejulius · 2021-02-17T22:30:20Z

Ok, still investigating and making sure the differences I'm seeing aren't from differences in test settings/etc, but it looks like the differences aren't that major.

Here's one that's a little confusing: autocomplete takes a couple more characters to get 30 w 26 correct

https://pelias.github.io/compare/#/v1/autocomplete?text=30+w+26

It looks like this query has replaced the second of two 30 w 26th addresses with a bunch of addresses of the format 26XX route 30 west. If it's just this one case and there's something weird with the 30 w 26th, new york address record, that's not a problem. But I wonder if it's indicative of other problems.

Very preliminary results so far, stay tuned for more :)

missinglink · 2021-02-18T00:47:16Z

Hmm yeah that is weird, I'm on mobile ATM but I can't see the layers listed in the debug output.

The PR targets the clean.layers property which I assumed would always be there, and possibly is there under the hood.

missinglink · 2021-02-18T00:49:09Z

Adding layers=coarse,address,street,venue,locality,neighbourhood,county,localadmin,region,macrocounty,country,macroregion,borough,postalcode (all of them) to the query mirrors the old behavior

orangejulius · 2021-02-18T02:23:52Z

Ok, I figured out the difference in query.

It turns out all the records shown on the dev screenshot have the same score.

The baseline query has this in the must clause:

It gives the highest score to records that match the input as an in-order phrase.

The query from this branch ends up looking like this:

All the documents shown in the screenshot above include 26 somewhere, and they have a perfect phrase match for 30 w, so that explains the scoring.

On the one hand, I don't like that this PR essentially adds another weird edge case to our tokenizing/parsing code, but on the other hand the underlying behavior is not the fault of this PR. It's just what our autocomplete queries already do.

I guess this could affect any address query ending in numbers, whether it's a numeric street like in this case, or perhaps a "European style" street address, with the number written last?

missinglink · 2021-02-18T03:57:51Z

I'm back at the keyboard again now and added this commit which seems to have done the trick 63c168c

@orangejulius does that fix the issues you were seeing?

missinglink · 2021-02-18T04:02:15Z

> I guess this could affect any address query ending in numbers, whether it's a numeric street like in this case, or perhaps a "European style" street address, with the number written last?

It shouldn't affect any address queries because if the address layer is targeted then the existing behaviour remains unchanged.

The change of behaviour only applies when address is not present in the layers being queried.
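
The rule described above can be sketched roughly as follows. This is an illustrative Python model, not the actual Pelias code: the function name `allows_numeric_prefix_match` is hypothetical, and it takes the already-resolved list of layers (the `clean.layers` property mentioned earlier) as its input.

```python
def allows_numeric_prefix_match(layers):
    """Return True when the relaxed behaviour should apply.

    `layers` models the resolved list of layers being queried.
    Hypothetical helper for illustration only.
    """
    return "address" not in layers


print(allows_numeric_prefix_match(["postalcode"]))         # True
print(allows_numeric_prefix_match(["address", "street"]))  # False
```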

missinglink · 2021-02-18T04:16:13Z

This query confuses me, since it's explicitly targeting the street layer I would expect it to list all street documents starting with 3 but it doesn't...

https://pelias.github.io/compare/#/v1/autocomplete?boundary.gid=whosonfirst%3Alocality%3A85977539&layers=street&text=west+3&debug=1

It's not actually bad, just not what I was expecting and might indicate a bug or that I've missed something.

missinglink · 2021-02-23T22:09:55Z

Some test cases to consider before merging.

missinglink · 2021-04-09T01:10:52Z

rebased origin/master

missinglink · 2021-04-09T01:25:17Z

After rebasing and manually reviewing the cases I mentioned above I don't see any regressions.

missinglink · 2021-04-26T22:57:01Z

We can merge this once we review the differences for the tests in pelias/acceptance-tests#549 and confirm that they are positive and that there are no major regressions.

missinglink · 2021-04-28T02:29:53Z

I rebased again and reviewed the test cases linked above, there were three changes:

1x improvement

https://pelias.github.io/compare/#/v1/autocomplete?layers=postalcode&boundary.country=USA&text=9021

2x regressions

https://pelias.github.io/compare/#/v1/autocomplete?boundary.country=FRA&text=03100

https://pelias.github.io/compare/#/v1/autocomplete?boundary.country=USA&text=04106

missinglink · 2021-04-28T02:39:06Z

For the vast majority of cases it's a marked improvement 🎉
The recall is much better with this PR, but precision suffers for postcodes starting with a 0, which have sorting issues.

The sorting is quite interesting:

- Looking at the first screenshot (the improvement), all 5 results are good but it's not clear why they are ordered the way they are 🤷
- Similarly, the scoring for the second screenshot seems to be undefined: the postcodes are scored first and last, with three other results in between.
- On the last screenshot it's including a bunch of "Country Road 4106" results for the query "04106", which I think is undesirable. Once those are stripped away the sorting still seems to be undefined.

missinglink · 2021-04-28T02:49:25Z

hmm... I think maybe I'm getting confused since I've been away from work for a while.

The problem with the last two queries is that they don't specify layers=postalcode like the first one does. IIRC this PR is only supposed to change behaviour when the address layer isn't being targeted.

When adding layers=postalcode the results look great, so either the dev env and the prod env are different or this code is not honouring that address restriction.

orangejulius · 2021-06-02T18:11:13Z

I just updated this PR with some new commit messages to provide a better "high level" context of the change, rather than the development-focused explanation that was originally present. The original commits are preserved in an archive branch.

Here's what I wrote for the explanation, hopefully it's clear and helpful as it will appear in the Pelias API release notes:

Enable support for autocompleting postalcode by relaxing constraints around when we allow partial matches on numeric inputs.

Historically, Pelias has not allowed the text parameter input to match a "partial" result when the entire input consists only of numbers. Considering there are often lots of results that could match a numeric input like 1, 10, or even 1014, and most of them are addresses that have little chance of being relevant, this approach isn't without merit.

However, postalcodes are another possible match, and people want to search for them reasonably often. They also often consist purely of numbers.

So this change allows those numeric partial matches when the address layer won't be queried. Thanks to some of our performance optimizations, this will always be the case for short inputs like text=1234 as long as the layers parameter is unset or doesn't explicitly contain the address layer.

In practice, what this means is that a query like text=9021 will now result in the 90210 US postalcode first, whereas previously it would return a bunch of irrelevant venue results scattered around the world.
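
As a toy illustration of the difference between complete and partial matching on numeric input, consider the sketch below. It is a simplified model and not how Pelias or Elasticsearch actually evaluate queries; the function `matches` and the sample list of postalcodes are made up for the example.

```python
def matches(text, candidate, allow_partial):
    """Toy model: a complete match always counts; a prefix match only if allowed."""
    if candidate == text:
        return True
    return allow_partial and candidate.startswith(text)


postalcodes = ["90210", "90211", "10014"]

# Old behaviour for all-numeric input: only complete matches count.
print([p for p in postalcodes if matches("9021", p, allow_partial=False)])

# New behaviour when the address layer is not queried: prefixes match too.
print([p for p in postalcodes if matches("9021", p, allow_partial=True)])
```

With partial matching disabled the first list is empty until the full postalcode is typed, which mirrors the behaviour originally reported in this PR.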


Joxit approved these changes Feb 17, 2021


orangejulius mentioned this pull request Feb 17, 2021

Add postalcode autocomplete tests pelias/acceptance-tests#545

Merged

missinglink force-pushed the autocomplete_postalcodes branch from 63c168c to 6b085e8 Compare April 9, 2021 01:10

missinglink marked this pull request as ready for review April 9, 2021 01:11

missinglink force-pushed the autocomplete_postalcodes branch from 6b085e8 to 0778fd2 Compare April 28, 2021 01:54

orangejulius changed the title ~~allow prefix matching numerals on non-address queries~~ Enable autocomplete for postalcodes Jun 2, 2021

orangejulius force-pushed the autocomplete_postalcodes branch from 0778fd2 to 2ab2149 Compare June 2, 2021 18:08

orangejulius force-pushed the autocomplete_postalcodes branch from 2ab2149 to 32b6c39 Compare June 2, 2021 18:12

orangejulius force-pushed the autocomplete_postalcodes branch from 32b6c39 to f060140 Compare June 3, 2021 04:09

missinglink merged commit 75aab51 into master Jun 3, 2021

missinglink deleted the autocomplete_postalcodes branch June 3, 2021 08:22

orangejulius mentioned this pull request Jul 19, 2021

added postcode match boost for autocomplete #1542

Closed

orangejulius mentioned this pull request Aug 25, 2021

WOF postal codes not found in autocomplete without full text pelias/pelias#676

Closed

missinglink mentioned this pull request Jan 28, 2022

config: enable/disable prefix matching numerals feature via config flag #1596

Merged
