Collect jokes from websites and train on the resulting corpus, combining the penny's score with machine learning algorithms.
- Crawler module: JokeCrawler
- Scoring and ranking of jokes: JokeRanker
- Search and display of jokes: JokePenny
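As a rough illustration of how the three modules could fit together, here is a minimal sketch in Python 3 using only the standard library. Only the class names JokeCrawler, JokeRanker and JokePenny come from this README. The `Joke` record, every method name, and the assumption that each joke sits in its own `<p>` element are hypothetical choices made for the example, and the crawler here parses HTML that has already been downloaded rather than fetching anything itself.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from urllib.parse import urljoin


@dataclass
class Joke:
    """Hypothetical record type: the joke text plus its current score."""
    text: str
    score: float = 0.0


class _PageParser(HTMLParser):
    """Collects the text of every <p> element and the href of every <a>."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self.links = []
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self.paragraphs.append("")
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self.paragraphs[-1] += data


class JokeCrawler:
    """Extracts jokes and follow-up links from one downloaded page."""

    def extract(self, base_url, html):
        parser = _PageParser()
        parser.feed(html)
        # Assumption: one joke per non-empty <p> element.
        jokes = [Joke(p.strip()) for p in parser.paragraphs if p.strip()]
        # Resolve relative links against the page URL for later crawling.
        links = [urljoin(base_url, href) for href in parser.links]
        return jokes, links


class JokeRanker:
    """Scores each joke with a caller-supplied function, best first."""

    def rank(self, jokes, score_fn):
        for joke in jokes:
            joke.score = score_fn(joke.text)
        return sorted(jokes, key=lambda j: j.score, reverse=True)


class JokePenny:
    """Keyword search over already-ranked jokes; preserves ranking order."""

    def __init__(self, ranked_jokes):
        self.jokes = ranked_jokes

    def search(self, keyword):
        keyword = keyword.lower()
        return [j for j in self.jokes if keyword in j.text.lower()]
```

Passing the scoring function into `JokeRanker.rank` keeps the ranker independent of whichever model eventually produces the score, so a placeholder such as `len` can stand in while the classifier is still being built.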
- urllib2, urlparse: low-level libraries that may be needed for crawling the web.
- scrapy: for efficiency and concurrency, we may need a mature web crawler framework.
- Naive Bayes: chosen because it is one of the more efficient classification algorithms.
- LDA (latent Dirichlet allocation): this and related clustering algorithms may be needed for discovering new joke topics.
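To make the naive Bayes item concrete, the following is a minimal sketch of a multinomial naive Bayes text classifier with add-one (Laplace) smoothing, written in plain Python 3. It is not the project's implementation: the class name `NaiveBayes`, its `train` and `predict` methods, the whitespace tokenization, and the idea of labelling jokes with strings such as "funny" are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Hypothetical multinomial naive Bayes classifier over word counts."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()                       # every word seen in training

    def train(self, text, label):
        # Assumption: lowercase whitespace tokenization is good enough.
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def predict(self, text):
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_logp = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Log prior plus the sum of smoothed log likelihoods; working in
            # log space avoids underflow on long texts.
            logp = math.log(n_docs / total_docs)
            for w in words:
                logp += math.log((counts[w] + 1) / (total_words + len(self.vocab)))
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label
```

Training is a single counting pass and prediction costs one lookup per word per label, which is the kind of efficiency the list above alludes to. The same `predict` call could serve as the scoring input for JokeRanker, although how the two are wired together is left open here.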
Copyright © 2013 LengerFulluse, Inc. http://lengerfulluse.github.com/
The program is distributed under the terms of the GNU General Public License (or the Lesser GPL).