@agnelvishal
Contributor

Author names are sometimes wrapped in span and a elements, so I modified the code to search those as well.
div could be included too if necessary.

Often the author name is included in the text of 'span' and 'a', so the author name is searched there too.
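
To illustrate the idea, here is a minimal sketch of reading a byline from element text when there is no content attribute. It is not the code from this PR: the markup, the `candidate_text` helper, and the tag list are hypothetical, and it uses the standard library's `xml.etree.ElementTree` in place of lxml, so `itertext()` stands in for `text_content()`.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: one byline sits in a meta content attribute,
# the other only in the text of <span> and <a>.
SNIPPET = """
<div>
  <meta name="author" content="Jane Doe" />
  <span class="author">By <a rel="author" href="/staff/john">John Smith</a></span>
</div>
"""


def candidate_text(el):
    """Return the content attribute if present, else the element's full text."""
    content = el.get("content")
    if content:
        return content.strip()
    # itertext() also walks nested children, so the <span> picks up the
    # text of the <a> inside it. Whitespace is collapsed to single spaces.
    return " ".join("".join(el.itertext()).split())


root = ET.fromstring(SNIPPET)
candidates = [
    candidate_text(el)
    for el in root.iter()
    if el.tag in ("meta", "span", "a")
]
print(candidates)
```

Note that the span yields "By John Smith" while the nested a yields "John Smith", so the same name can show up more than once and would still need byline parsing and de-duplication afterwards.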
@codelucas
Owner

Great find @agnelvishal, looking!! 💯👍

Owner

@codelucas left a comment

See comments. This is a great change, but I have some suggestions.

    if match.tag == 'meta' or match.tag == 'span' or match.tag == 'a':
        mm = match.xpath('@content')
        if not mm:
            mm=str(match.text_content()).split()
Owner

I'm confident that this will increase recall, but will precision take a hit?

I am going to approve this, but can you also add some URLs where you found these changes helpful?

    if match.tag == 'meta' or match.tag == 'span' or match.tag == 'a':
        mm = match.xpath('@content')
        if not mm:
            mm=str(match.text_content()).split()
Owner

nit: Python style, please add spaces around the = on line 154

@agnelvishal
Contributor Author

agnelvishal commented Nov 30, 2018

if len(content) > 0:
This could be changed to if len(content) > 0 and len(content) < 30: to get better precision.
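
A rough sketch of that length check, to show the intended effect. It assumes `content` is a plain string at this point, and the `looks_like_byline` helper and the sample strings are made up for illustration. The cutoff of 30 is the heuristic value suggested above, not a tuned number.

```python
MAX_BYLINE_LEN = 30  # heuristic cutoff from the suggestion above


def looks_like_byline(content, max_len=MAX_BYLINE_LEN):
    """Keep only non-empty candidates shorter than max_len characters."""
    return 0 < len(content) < max_len


# Hypothetical candidates, as they might come from span/a text.
samples = [
    "John Smith",
    "",
    "Subscribe to our newsletter for daily updates from the newsroom",
]

kept = [s for s in samples if looks_like_byline(s)]
print(kept)
```

The trade-off is that long multi-author bylines would also be dropped, so the limit trades some recall for precision.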
