jmahmood/PyCrawler

 
 


The original PyCrawler is great, but I want it to play nice with Django's ORM (and integrate Beautiful Soup 3.2 while I'm at it).

Instead of taking 4 arguments on the command line, PyCrawler Django takes one argument (a URL). The rest of the configuration is set directly in the model file itself.

(These options really should be in the settings file, but I didn't want to include a patch file. If you think it's worth it, feel free to message me.)

Thanks to theanti9 for putting his PyCrawler code up; it made my job a lot easier. Take a look at his original version if you don't want any dependencies.

__BUGS__

If you are running this on Ubuntu and getting unicode errors, it's because Ubuntu is using Beautiful Soup 3.1.0.1 or some other version from the 3.1.x branch. Do yourself a favour and install 3.2:

    sudo apt-get remove python-beautifulsoup
    
    sudo easy_install beautifulsoup
    
If you run into anything like this, feel free to message me and I'll try to help you out.
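
Before reinstalling, it can help to confirm which Beautiful Soup version Python is actually importing. The snippet below is a minimal sketch, not part of PyCrawler: `is_affected_version` is a hypothetical helper name, and it assumes the installed Beautiful Soup 3 module exposes its version as `BeautifulSoup.__version__`.

```python
import re


def is_affected_version(version):
    """Return True if `version` belongs to the Beautiful Soup 3.1.x branch.

    Hypothetical helper for illustration; not part of PyCrawler.
    """
    # Pull out the numeric components, e.g. "3.1.0.1" -> [3, 1, 0, 1].
    parts = [int(p) for p in re.findall(r"\d+", version)]
    # Only the major and minor components decide the branch.
    return parts[:2] == [3, 1]


if __name__ == "__main__":
    try:
        # Beautiful Soup 3.x installs as a top-level module named BeautifulSoup.
        import BeautifulSoup
    except ImportError:
        print("Beautiful Soup 3 is not installed")
    else:
        # Assumption: the module exposes its version string as __version__.
        installed = BeautifulSoup.__version__
        if is_affected_version(installed):
            print("Beautiful Soup %s is on the 3.1.x branch; install 3.2" % installed)
        else:
            print("Beautiful Soup %s should be fine" % installed)
```

Run it with the same Python interpreter that runs the crawler, so that it reports the copy of Beautiful Soup the crawler will import.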

About

A simple python web crawler - now for Django!
