solos/sohutv

#sohutv

#About

A crawler for http://tv.sohu.com

#Queue: Python deque or Redis list (doubly linked list)
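
A minimal sketch of the in-process variant, assuming the queue is used as a first-in, first-out crawl frontier. `Frontier`, `push`, and `pop` are illustrative names and are not taken from the repository. With Redis, the same behavior would typically map to `LPUSH` to enqueue and `RPOP` to dequeue on a list key.

```python
from collections import deque


class Frontier:
    """FIFO queue of URLs waiting to be crawled (hypothetical helper)."""

    def __init__(self, seeds=()):
        self._queue = deque(seeds)

    def push(self, url):
        # Append on the right; this is O(1) on a deque.
        self._queue.append(url)

    def pop(self):
        # Pop from the left so URLs are crawled in discovery order.
        # Returns None when the queue is empty.
        return self._queue.popleft() if self._queue else None

    def __len__(self):
        return len(self._queue)
```

A deque allows cheap operations at both ends, so the same structure could also serve a depth-first crawl by popping from the right instead.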

#Deduplication: MurmurHash with a Redis bitmap
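
The idea can be sketched as follows: hash each URL to a bit offset, then test and set that bit. This is an illustration under assumptions, not the repository's code. `murmur3_32` is a pure-Python version of the 32-bit MurmurHash3 variant (the README does not say which variant is used), and `Bitmap` is an in-memory stand-in for a Redis bitmap whose `setbit` mimics the way Redis `SETBIT` returns the previous bit value. `seen_before` and the bitmap size are hypothetical.

```python
M32 = 0xFFFFFFFF


def _rotl32(x, r):
    # Rotate a 32-bit value left by r bits.
    return ((x << r) | (x >> (32 - r))) & M32


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """32-bit MurmurHash3 of `data`, returned as an unsigned integer."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & M32
    n_blocks = len(data) // 4
    for i in range(n_blocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = (_rotl32((k * c1) & M32, 15) * c2) & M32
        h = (_rotl32(h ^ k, 13) * 5 + 0xE6546B64) & M32
    tail = data[4 * n_blocks:]
    if tail:
        # The remaining 1-3 bytes are mixed in without the final rotation.
        k = int.from_bytes(tail, "little")
        h ^= (_rotl32((k * c1) & M32, 15) * c2) & M32
    # Finalization: fold in the length and avalanche the bits.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & M32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & M32
    h ^= h >> 16
    return h


class Bitmap:
    """In-memory stand-in for a Redis bitmap (assumption for this sketch)."""

    def __init__(self, num_bits):
        self.num_bits = num_bits
        self._bytes = bytearray((num_bits + 7) // 8)

    def setbit(self, offset, value):
        # Store the bit and return its previous value, like Redis SETBIT.
        byte, bit = divmod(offset, 8)
        mask = 1 << bit
        old = 1 if self._bytes[byte] & mask else 0
        if value:
            self._bytes[byte] |= mask
        else:
            self._bytes[byte] &= ~mask & 0xFF
        return old


def seen_before(bitmap, url):
    """Mark `url` as seen; return True if its bit was already set."""
    offset = murmur3_32(url.encode("utf-8")) % bitmap.num_bits
    return bitmap.setbit(offset, 1) == 1
```

With a single hash per URL, two different URLs can map to the same bit, so a small fraction of unseen URLs may be reported as duplicates. A larger bitmap lowers that rate.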

#Concurrency: gevent Pool
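
A rough sketch of bounding concurrency with a pool. `crawl_all`, `fetch`, and the default size of 10 are assumptions made for illustration. The fallback import is only there so the snippet runs where gevent is not installed; it swaps in a thread pool, which is a different concurrency model.

```python
try:
    from gevent.pool import Pool  # cooperative greenlet pool, if gevent is installed
except ImportError:
    # Stand-in so the sketch still runs without gevent: a thread pool
    # that also exposes a blocking map() method.
    from multiprocessing.pool import ThreadPool as Pool


def crawl_all(urls, fetch, concurrency=10):
    """Run fetch(url) for every URL with at most `concurrency` tasks in flight."""
    pool = Pool(concurrency)
    # map() blocks until every task finishes and preserves input order.
    return pool.map(fetch, urls)
```

With gevent, blocking libraries such as requests generally only yield to other greenlets if `gevent.monkey.patch_all()` has been called early in the program.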

#Fetching and data extraction: requests, lxml, regular expressions
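
The regular-expression side of extraction might look like the sketch below. It assumes the fields are embedded in the page as simple JavaScript variable assignments, which the README does not confirm. `SAMPLE_HTML` is invented for the example, and `extract_vars` is a hypothetical helper. In a real run the HTML would come from a `requests` response, and lxml could handle element-based extraction.

```python
import re

# Matches simple assignments such as: var vid = "123456";
JS_VAR = re.compile(r"""var\s+(\w+)\s*=\s*["']([^"']*)["']\s*;""")


def extract_vars(html, wanted):
    """Return the requested variables found in `html` as a dict of strings."""
    found = {name: value for name, value in JS_VAR.findall(html)}
    return {name: found[name] for name in wanted if name in found}


# Made-up page fragment used only to exercise the pattern.
SAMPLE_HTML = """
<script>
var vid = "123456";
var playlistId = "9001";
var cid = '2';
</script>
"""

fields = extract_vars(SAMPLE_HTML, ["vid", "playlistId", "cid", "tvid"])
```

Names that do not appear in the page, such as `tvid` in this sample, are simply left out of the result.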

#Data storage: MySQL

vid, nid, pid, cover, playlistId, o_playlistId, cid, subcid, osubcid, category, ateCode, pianhua, tag, tvid, pubdate, last, brief, title
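
A sketch of writing one record with a parameterized insert, using the field list above. It relies on assumptions: the standard-library `sqlite3` module stands in for MySQL so the example runs anywhere, the table name `video` is hypothetical, and every column is typed `TEXT` because the README does not give a schema. With a MySQL driver the placeholder is usually `%s` rather than `?`, and identifiers are quoted with backticks.

```python
import sqlite3

FIELDS = [
    "vid", "nid", "pid", "cover", "playlistId", "o_playlistId", "cid",
    "subcid", "osubcid", "category", "ateCode", "pianhua", "tag", "tvid",
    "pubdate", "last", "brief", "title",
]


def create_table(conn):
    # Column names are quoted so none of them is parsed as an SQL keyword.
    cols = ", ".join(f'"{name}" TEXT' for name in FIELDS)
    conn.execute(f"CREATE TABLE IF NOT EXISTS video ({cols})")


def save_record(conn, record):
    """Insert one crawled record; fields missing from `record` become NULL."""
    cols = ", ".join(f'"{name}"' for name in FIELDS)
    marks = ", ".join("?" for _ in FIELDS)
    values = [record.get(name) for name in FIELDS]
    # Values are passed separately so the driver handles escaping.
    conn.execute(f"INSERT INTO video ({cols}) VALUES ({marks})", values)
    conn.commit()
```

Usage, under the same assumptions: `conn = sqlite3.connect(":memory:")`, then `create_table(conn)` and `save_record(conn, {"vid": "1", "title": "demo"})`.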

#Countering anti-crawling measures: use random proxies

import random
import requests
proxy = random.choice(PROXIES)  # pick one proxy at random
r = requests.get(url, stream=False, verify=False, timeout=timeout, headers=headers, proxies=proxy)
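
The snippet above leaves `PROXIES` undefined. One plausible shape, given that the `proxies` argument of `requests.get` takes a dict mapping URL scheme to proxy URL, is a list of such dicts. The addresses below are placeholders from a documentation-only IP range, and `pick_proxy` is a hypothetical helper.

```python
import random

# Placeholder addresses from the documentation-only range 192.0.2.0/24.
PROXIES = [
    {"http": "http://192.0.2.10:8080", "https": "http://192.0.2.10:8080"},
    {"http": "http://192.0.2.11:3128", "https": "http://192.0.2.11:3128"},
]


def pick_proxy(proxies=PROXIES, rng=random):
    """Return one proxy mapping chosen uniformly at random."""
    return rng.choice(proxies)
```

The returned dict can be passed directly as the `proxies=` argument of a request.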
