jmahmood/PyCrawler
Folders and files
| Name | Name | Last commit date | ||
|---|---|---|---|---|
Repository files navigation
The original PyCrawler is great, but I want it to play nice with Django's ORM (and integrate Beautiful Soup 3.2 while I'm at it)
Instead of taking 4 arguments and being run from the command line, PyCrawler Django has one argument (a URL) and various configuration aids that are done directly in the model file itself.
(They really should be in the settings file, but I didn't want to include a patch file. If you think it's worth it, feel free to message me.
Thanks to theanti9 who put his pycrawler code up; it made my job a lot easier. Take a look at his original version if you don't want any dependencies.
__BUGS__
If you are running this on Ubuntu and getting unicode errors, it's because Ubuntu is using Beautiful Soup 3.101 or some other 3.1x branch version. Do yourself a favour and install 3.2;
sudo apt-get remove beautifulsoup
sudo easy_install beautifulsoup
If you run into anything like this, feel free to message me and I'll try to help you out.