Manual:Pywikibot/PAWS

From mediawiki.org
See Wikitech:PAWS for more information.

This document provides a quick interactive overview of Pywikibot, using a notebook hosted in the Wikimedia Cloud Services environment through PAWS (PAWS: A Web Shell).

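Note that the PAWS terminal supports copy and paste only in Chromium-based browsers. (Chrome, Opera, Safari, and the new Microsoft Edge all work.) If you use another browser, you can only try the right-click menu, or you will need to type the commands mentioned in this walk-through by hand. You can also create a bash file containing the commands and invoke it in the terminal with bash file.sh
Warning: Any notebook or terminal running on PAWS may be terminated without prior notice. You are encouraged to run your task elsewhere (such as Toolforge) if it runs for hours.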

Create a Wikimedia account

To follow this walk-through, you only need a Wikipedia/Wikimedia account. Use Special:CreateAccount to create one.

Once you have created an account, please visit https://test.wikipedia.org/ and check that your username appears in the top right corner (this works around task T120327).

If you are a new Wikimedia user, log in with your account on Meta-Wiki, Wikipedia, Wikidata, and Commons. On each of them, read and clear any pending messages shown at the top of the page.

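Running the notebook

To start the hosted notebook, visit https://hub-paws.wmcloud.org/hub

Click "Log in with MediaWiki", then click "Allow" when asked to approve "authentication using OAuth". On your first visit to PAWS, a server needs to be created. Click the green "Start My Server" button. It is normal for a new server to take a few minutes to start.

When it is ready, you will be redirected to a link like https://paws.wmflabs.org/paws/user/<username>/tree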

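Running a terminal

To start a new interactive terminal:

  1. Go to your PAWS home
  2. Click: File > New > Terminal

This opens a new window with a Linux '$' prompt.

This terminal is not an emulator. It is a real bash shell running in a real Linux installation inside a Docker container, so you can use any bash command and any other command available on the installed Linux.

To see some of the available commands, use ls /bin/.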

$ ls /bin/
bash         cat            domainname  journalctl  mkdir          pwd         stty                            tar           zcmp
unzip2       chacl          echo        kill        mknod          rbash       su                              tempfile      zdiff
../..
$ ls /usr/bin/
2to3-3.4                 dvipdf                     lcf                         printf               systemd-path                         
X11                      dwp                        ld                          prlimit              systemd-run
../..

To see them all, press TAB twice.


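Logging in to the wiki

This establishes your account on the server and allows you to log in from the command line. The following command should confirm that you can log in to testwiki. It uses OAuth, so you do not need to enter a password.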

$ pwb.py login
Logging in to wikipedia:test as <username>
Logged in on wikipedia:test as <username>.

You can set the default site by creating a file named user-config.py in your $HOME directory (/home/paws) and adding the mylang and family variables:

mylang = 'test'
family = 'wikipedia'

You can type vim user-config.py in the terminal, press I to insert text, add the text, press Esc to exit insert mode, then type :wq and press Enter to save and finish editing.
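
As an alternative to editing with vim, the same file can be written from a Python session. The following is a minimal sketch, assuming the two settings shown above are all you need. The helper name write_user_config and its parameters are hypothetical and are not part of Pywikibot.

```python
from pathlib import Path


def write_user_config(directory, lang="test", family="wikipedia"):
    """Write a minimal user-config.py into `directory` and return its path.

    Hypothetical helper for illustration; it only writes the two
    settings shown above (mylang and family).
    """
    config_path = Path(directory) / "user-config.py"
    config_path.write_text(
        f"mylang = {lang!r}\nfamily = {family!r}\n",
        encoding="utf-8",
    )
    return config_path
```

On PAWS, calling write_user_config(Path.home()) should place the file in the home directory described above. Note that it overwrites any existing user-config.py in the target directory.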


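Creating a page

To create a page, enter the following command in the terminal, replacing "<username>" with your username, and press "Y" when prompted to accept the changes: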

$ pwb.py add_text -up -talk -page:"User talk:<username>" -text:"Hello. ~~~~"
Loading User talk:<username>...

>>> User talk:<username> <<<
@@ -0,0 +1 @@
+ Hello. ~~~~

Do you want to accept these changes? ([Y]es, [N]o, [a]ll, open in [b]rowser): Y
Page [[User talk:<username>]] saved

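You have now made an edit. Open https://test.wikipedia.org/wiki/User_talk:<username> in a web browser to see the change.

You can read more about each command-line script by using the '-help' command-line option.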

$ pwb.py add_text -help
...

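Fetching a page

The "listpages" command can be used to fetch many pages.

To fetch the content of the page you created in the previous section, enter the following command: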

$ pwb.py listpages -page:"User talk:<username>" -save
   1 <username>
Saving User talk:<username> to /home/paws/User_talk_<username>
1 page(s) found

Now, if you run ls, you should find the saved page.

A real script example

When a website used on Wikipedia changes its URLs, the links on Wikipedia become outdated, and possibly also dead if the website does not redirect from the old URLs to the new ones. For example, Encyclopedia Britannica (EB) has changed its links, such as moving pages from http://www.britannica.com/EBchecked/media/ to http://www.britannica.com/topic/[topic name]/images-videos/*. You can find a list of uses of the old URL on English Wikipedia at https://en.wikipedia.org/wiki/Special:LinkSearch/http://www.britannica.com/EBchecked/media Updating all of these links manually would be very time-consuming. Thankfully, EB has maintained redirects from its old URLs to the new URLs, so this does not need to be fixed immediately.

For a simpler example, English Wikipedia currently contains links to http://britannica.com/EBchecked/ instead of http://www.britannica.com/EBchecked/; i.e. a 'www.' subdomain is missing in the URL.

English Wikipedia currently has 14 cases: https://en.wikipedia.org/wiki/Special:LinkSearch/http://britannica.com/EBchecked/

Wikipedias in other languages also have this problem. For example, there is one case on German Wikipedia: w:de:Spezial:Weblinksuche/http://britannica.com/EBchecked/

To fix those links, we can use Pywikibot's replace.py script. In this demo we will use the '-simulate' argument to avoid writing to the wiki, as there are strict rules about automated editing of English Wikipedia.

First, let's list all of the pages which link to http://britannica.com/EBchecked/.

$ pwb.py listpages -lang:en -weblink:"britannica.com/EBchecked/"
   1 Bhatner fort
   2 Mohammad Ishaq Khan
   3 Fringe theories/Noticeboard/Archive 7
   4 El Riego phase
   5 Catalonia/Archive 4
   6 Stephen I of Hungary
   7 Stephen I of Hungary/Archive 1
   8 Väinö Tanner
   9 Tokaji
  10 Transylvania/Archive5
  11 Hungarians in Romania
  12 Transylvania
  13 Uttarakhand
  14 Françoise Giroud
14 page(s) found

Now we check that those pages actually contain the literal URL in their text, i.e. that the link does not come from a template.

$ pwb.py listpages -lang:en -weblink:"britannica.com/EBchecked/" -grep:"britannica.com\/EBchecked"
   1 Bhatner fort
   2 Mohammad Ishaq Khan
   3 Fringe theories/Noticeboard/Archive 7
   4 El Riego phase
   5 Catalonia/Archive 4
   6 Stephen I of Hungary
   7 Stephen I of Hungary/Archive 1
   8 Väinö Tanner
   9 Tokaji
  10 Transylvania/Archive5
  11 Hungarians in Romania
  12 Transylvania
  13 Uttarakhand
  14 Françoise Giroud
14 page(s) found
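
The -grep filter above keeps only pages whose text matches a regular expression. A rough sketch of that idea in plain Python follows, assuming the page texts are already available as a dictionary. The sample texts, the template name, and the function name pages_matching are made up for illustration; this is not how listpages is implemented.

```python
import re


def pages_matching(page_texts, pattern):
    """Return the titles whose text contains a match for `pattern`."""
    regex = re.compile(pattern)
    return [title for title, text in page_texts.items() if regex.search(text)]


# Made-up sample data: one page holds the literal URL, the other only
# transcludes a (hypothetical) template that would generate the link.
sample = {
    "Page with literal link": "See [http://britannica.com/EBchecked/topic/1 EB].",
    "Page using a template": "See {{Hypothetical EB template|1}}.",
}

matches = pages_matching(sample, r"britannica\.com/EBchecked")
print(matches)
```

In this sketch the dot is escaped so that it matches only a literal '.'. In the command above the unescaped dot matches any character, which is harmless for this search.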

Now use replace to add the missing "www.".

$ pwb.py replace -lang:en -simulate -weblink:"britannica.com/EBchecked/" -grep:"britannica.com\/EBchecked" "http://britannica.com/EBchecked/" "http://www.britannica.com/EBchecked/"
The summary message for the command line replacements will be something like: Bot: Automated text replacement  (-http://britannica.com/EBchecked/ +http://www.britannica.com/EBchecked/)
Press Enter to use this automatic message, or enter a description of the
changes your bot will make: 
Logging in to wikipedia:en as <username>
Retrieving 14 pages from wikipedia:en.
Retrieving 14 pages from wikipedia:en.


>>> Stephen I of Hungary <<<
@@ -47 +47 @@
- Stephen's birth date is uncertain because it was not recorded in contemporaneous documents.{{sfn|Györffy|1994|p=64}} Hungarian and Polish chronicles written centuries later give three different years: 967, 969 and 975.{{sfn|Kristó|2001|p=15}} The unanimous testimony of his three late 11th-century or early 12th-century [[hagiographies]] and other Hungarian sources, which state that Stephen was "still an adolescent" in 997,<ref>''Hartvic, Life of King Stephen of Hungary'' (ch. 5), p. 381.</ref> substantiate the reliability of the later year (975).{{sfn|Györffy|1994|p=64}}{{sfn|Kristó|2001|p=15}} Stephen's ''[[Life of Saint Stephen, King of Hungary (Vita minor)|Lesser Legend]]'' adds that he was born in [[Esztergom]],{{sfn|Györffy|1994|p=64}}{{sfn|Kristó|2001|p=15}}<ref name=Britannica>{{cite encyclopedia|title=Stephen I|url=http://britannica.com/EBchecked/topic/565415/Stephen-I|encyclopedia=[[Encyclopædia Britannica]]|publisher=Encyclopædia Britannica, Inc.|year=2008|accessdate=2008-07-29}}</ref> which implies that he was born after 972 because his father, [[Géza, Grand Prince of the Hungarians]], chose Esztergom as royal residence around that year.{{sfn|Györffy|1994|p=64}} Géza promoted the spread of Christianity among his subjects by force, but never ceased worshipping pagan gods.{{sfn|Kontler|1999|p=51}}{{sfn|Berend|Laszlovszky|Szakács|2007|p=331}} Both his son's ''[[Life of Saint Stephen, King of Hungary (Vita maior)|Greater Legend]]'' and the nearly contemporaneous [[Thietmar of Merseburg]] described Géza as a cruel monarch, suggesting that he was a despot who mercilessly consolidated his authority over the rebellious Hungarian lords.{{sfn|Berend|Laszlovszky|Szakács|2007|p=331}}{{sfn|Bakay|1999|p=547}}
+ Stephen's birth date is uncertain because it was not recorded in contemporaneous documents.{{sfn|Györffy|1994|p=64}} Hungarian and Polish chronicles written centuries later give three different years: 967, 969 and 975.{{sfn|Kristó|2001|p=15}} The unanimous testimony of his three late 11th-century or early 12th-century [[hagiographies]] and other Hungarian sources, which state that Stephen was "still an adolescent" in 997,<ref>''Hartvic, Life of King Stephen of Hungary'' (ch. 5), p. 381.</ref> substantiate the reliability of the later year (975).{{sfn|Györffy|1994|p=64}}{{sfn|Kristó|2001|p=15}} Stephen's ''[[Life of Saint Stephen, King of Hungary (Vita minor)|Lesser Legend]]'' adds that he was born in [[Esztergom]],{{sfn|Györffy|1994|p=64}}{{sfn|Kristó|2001|p=15}}<ref name=Britannica>{{cite encyclopedia|title=Stephen I|url=http://www.britannica.com/EBchecked/topic/565415/Stephen-I|encyclopedia=[[Encyclopædia Britannica]]|publisher=Encyclopædia Britannica, Inc.|year=2008|accessdate=2008-07-29}}</ref> which implies that he was born after 972 because his father, [[Géza, Grand Prince of the Hungarians]], chose Esztergom as royal residence around that year.{{sfn|Györffy|1994|p=64}} Géza promoted the spread of Christianity among his subjects by force, but never ceased worshipping pagan gods.{{sfn|Kontler|1999|p=51}}{{sfn|Berend|Laszlovszky|Szakács|2007|p=331}} Both his son's ''[[Life of Saint Stephen, King of Hungary (Vita maior)|Greater Legend]]'' and the nearly contemporaneous [[Thietmar of Merseburg]] described Géza as a cruel monarch, suggesting that he was a despot who mercilessly consolidated his authority over the rebellious Hungarian lords.{{sfn|Berend|Laszlovszky|Szakács|2007|p=331}}{{sfn|Bakay|1999|p=547}}

Do you want to accept these changes? ([y]es, [N]o, [e]dit, open in [b]rowser, [a]ll, [q]uit): N

...

In PAWS, and any terminal that supports color, the diff of changes will show the added "www." in green text color, making it easier to find the proposed changes.
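
To make the simulated edit more concrete, here is a minimal sketch of the same idea in plain Python: apply the text replacement in memory, show a diff, and save nothing. It uses only the standard library; the function name simulate_replace and the shortened sample text are assumptions for illustration, not replace.py's actual implementation.

```python
import difflib

OLD = "http://britannica.com/EBchecked/"
NEW = "http://www.britannica.com/EBchecked/"


def simulate_replace(text, old=OLD, new=NEW):
    """Return (new_text, diff_lines) without saving anything anywhere."""
    new_text = text.replace(old, new)
    diff = list(
        difflib.unified_diff(
            text.splitlines(), new_text.splitlines(), lineterm="", n=0
        )
    )
    return new_text, diff


# Shortened sample taken from the citation in the diff above.
sample = "|url=http://britannica.com/EBchecked/topic/565415/Stephen-I|"
new_text, diff = simulate_replace(sample)
print("\n".join(diff))
```

Because the old string starts with "http://britannica.com", URLs that already contain "www." are left untouched, so running the replacement a second time produces an empty diff.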

Installing Pywikibot

Warning: Do not write passwords in files on the server; the files are public!

Next we will use a PAWS Python session.

  1. Go to your PAWS home,
  2. click 'New' on the right hand side, and
  3. select 'Python 3'.

This opens a new window.

In the text box, enter the following, then select "Run" from the "Cell" menu (or press Shift+Enter to run it).

import pywikibot

A new text box appears below. Run the following command to create an APISite object connected to https://test.wikipedia.org/:

site = pywikibot.Site('test', 'wikipedia')

Describe "site" by entering it into the new text box and selecting "Run".

site

It should display

 Out[3]: APISite("test", "wikipedia")

Create a page object:

page = pywikibot.Page(site, 'test')

Check whether it exists by running:

page.exists()

It should output

 VERBOSE:pywiki:Found 1 wikipedia:test processes running, including this one.
 Out[5]: True

Display the text of the page:

page.text

Change the page text in the object:

page.text = 'Hello world'

Save the page to the wiki:

page.save()

The response should be:

Page [[Test]] saved
INFO:pywiki:Page [[Test]] saved

The interactive Python 3 notebook allows many lines to be run together. The steps above can be placed in a single text box and run:

import pywikibot

site = pywikibot.Site('test', 'wikipedia')
page = pywikibot.Page(site, 'test')

page.text = 'Hello world!'
page.save()
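
When the same cell is run repeatedly, it can be useful to skip the save when the page already has the desired text. The following is a small sketch of that idea; the helper name update_page_text is hypothetical, and it assumes only that the page object has a text attribute and a save() method that accepts a summary, as Pywikibot page objects do.

```python
def update_page_text(page, new_text, summary):
    """Save `new_text` to `page` only when it differs from the current text.

    Returns True if a save was attempted, False if nothing changed.
    """
    if page.text == new_text:
        return False
    page.text = new_text
    page.save(summary=summary)
    return True


# In a PAWS notebook (assumes Pywikibot is available, as in the cells above):
#   import pywikibot
#   site = pywikibot.Site('test', 'wikipedia')
#   page = pywikibot.Page(site, 'test')
#   update_page_text(page, 'Hello world!', 'Testing Pywikibot on PAWS')
```

Passing an explicit edit summary is a design choice of this sketch; the cells above call page.save() without one.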

The log of the interactive Python session can be saved or downloaded for future reference.

Accessing online documentation in PAWS

Pywikibot documentation may be found at wmdoc:pywikibot. It is primarily sourced from docstrings, which can be loaded in the interactive Python 3 notebook using the Python built-in function help().

For example, to see the parameters of the save method used above, run either of the following:

help(page.save)

help(pywikibot.Page.save)

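Editing Pywikibot scripts

The Pywikibot library and scripts are located in /srv/paws and are read-only. The installed Pywikibot library cannot be modified in PAWS.

Scripts can be modified after they have been copied to your PAWS home.

For example, to run a modified "checkimages.py":

  1. In the terminal, enter cp /srv/paws/pwb/scripts/checkimages.py ~
  2. In a browser, go to your PAWS home and click on the file checkimages.py.
  3. Edit the file in the browser. For example, after the code start = time.time() on line 1775, add a new line 1776 that prints your name: print("MYNAME's version.")
  4. In the editing interface, use the "File" menu and click "Save" to save your changes.
  5. In the terminal, enter pwb.py ~/checkimages.py -simulate (if no '-limit:x' is given, the program runs until all images have been checked, which may take a long time.)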

See also


If you need more help setting up Pywikibot, visit the #pywikibot IRC channel or the pywikibot@ mailing list.