Wikipedia talk:AutoWikiBrowser/Regular expression/Archive 1
This page has archives. Sections older than 180 days may be automatically archived by Lowercase sigmabot III when more than 1 section is present.
Ref formula
According to the "Metacharacter" table at Wikipedia:AutoWikiBrowser/Regular expression#Regular expression definitions, the less than sign < and the greater than sign > must be "escaped", that is, preceded by a backslash \. However, Wikipedia:AutoWikiBrowser/Regular expression#Token matching gives several formulas that don't escape < or >. When I tried the second of those formulas, <ref[^>]*>([^<]|<(?!/ref>))+</ref> it worked the same with or without escaping. Neither version works very well for its intended purpose of finding references, because it doesn't account for ref names (example: <ref name=maleev1955a/> ), so the formula will recognize the first part of a ref name without recognizing its end, and will include normal prose until it eventually gets to the end of a more normal <ref> ... </ref> reference. At the moment I think the solution is to add a forward slash to the formula so it will reject ref names, like this: <ref[^>/]*>([^<]|<(?!/ref>))+</ref> and also to remove > and < from the list of characters that must be escaped. While I'm at it, \n for new line doesn't work on Wikipedia. Is anybody still reading this talk page? Art LaPella (talk) 06:13, 22 March 2010 (UTC)
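The difference between the two formulas above can be checked outside AWB. The following is only a sketch in Python's re module (AWB itself uses the .NET engine, so behaviour there may differ in details); the sample sentence is made up for illustration.

```python
import re

# Both patterns are copied from the comment above, without escaping < or >.
ORIGINAL = r"<ref[^>]*>([^<]|<(?!/ref>))+</ref>"
PROPOSED = r"<ref[^>/]*>([^<]|<(?!/ref>))+</ref>"

# Made-up sample: a self-closing named ref followed by prose and a normal ref.
text = "Fact one.<ref name=maleev1955a/> Ordinary prose. Fact two.<ref>Maleev 1955</ref>"

original_match = re.search(ORIGINAL, text).group(0)
proposed_match = re.search(PROPOSED, text).group(0)

print(original_match)  # starts at the self-closing ref and swallows the prose
print(proposed_match)  # only the ordinary <ref>...</ref> pair
```

One side effect of the proposed version, under these assumptions, is that any opening tag containing a slash in an attribute value would also be skipped.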
Using Groups
Can anyone please teach me how to use groups? I don't understand how to use $1, $2, $3, etc. For example if I have a string like:
prefix|word1|word2|suffix
then how can I transform it into
prefix|word2|word1|suffix
I always have the same prefix and suffix, which should make the script easier, I think.
thanks Ark25 (talk) 20:56, 30 April 2010 (UTC)
- I found it:
- prefix\|(.*)\|(.*)\|suffix --> prefix|$2|$1|suffix
- I think this kind of replace is common, it should be documented — Ark25 (talk) 21:58, 30 April 2010 (UTC)
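For readers who want to try the swap outside AWB, here is a minimal sketch in Python. It assumes Python's re module, where the replacement groups are written \1 and \2 instead of AWB's $1 and $2; the find pattern is the one given above.

```python
import re

find = r"prefix\|(.*)\|(.*)\|suffix"
replace = r"prefix|\2|\1|suffix"  # Python spells $2 and $1 as \2 and \1

swapped = re.sub(find, replace, "prefix|word1|word2|suffix")
print(swapped)  # prefix|word2|word1|suffix
```

Each pair of parentheses in the find pattern captures one word, and the replacement simply writes the two captures back in the opposite order.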
Help needed with something
I don't get this damn bot and the code at all, but I'd like to know how to force it to find a random part of text between two common parts of code and remove it. The part of code looks like;
|Other = text
}}
Now I tried some things, but they would totally rampage the entire article. Is there a regular expression that will force it to only pick the part of random text and remove it? I want that whole line gone and replaced with just }}. --Light Daxter (talk) 21:16, 28 June 2010 (UTC)
- Did you mean |Other = text }} or |Other = text
- }} ? The first is easy; the only hard part is escaping the metacharacters (see the Wikipedia:AutoWikiBrowser/Regular expression table entitled "Metacharacters (must be escaped)") to get \|Other = text \}\}. The second is harder. Use the Multiline parameter, but that doesn't always do what I expect. I'll experiment with it if that's what you wanted. Art LaPella (talk) 23:12, 28 June 2010 (UTC)
The latter is what I need, I hope you can help me out! --Light Daxter (talk) 12:25, 29 June 2010 (UTC)
- Yes. Before I explain how, first I'll ask again if that's what you really want. Why would you want to remove the last parameter of a template, only if the }} is encoded on the next line? Perhaps you meant to remove the last parameter whether or not the }} is encoded on the last line.
- PROBLEM 1: This changes |Other = text
- }} to just }}, ONLY if it is encoded on two lines.
- SOLUTION 1: In AWB, click "Options", "Find and Replace" Enabled, and "Normal Settings" to get to the Find & Replace menu. Under "Find", type (\|Other\s*=\s*text\s*\<br\>)$ or you may want instead to type it without the \<br\> to make it (\|Other\s*=\s*text\s*)$ Under "Replace", type $1LINE ENDS HERE Among the 6 check boxes to the right, check "Regex" and "MultiLine", and leave "Enabled" checked. Go back to the "Find" column underneath where you just entered (\|Other\s*=\s*text\s*\<br\>)$ and this time enter \|Other\s*=\s*text\s*\<br\>LINE ENDS HERE.:?\}\} once again omitting the br if desired. In the Replace column underneath $1LINE ENDS HERE type }} In the six check boxes, check "Case Sensitive", "Regex", and "SingleLine", and leave "Enabled" checked. Click "Done". I encoded the brs so I could test it on this talk page.
- P.S. A third Find & Replace line is needed in case the next line of the article isn't a }}. So find LINE ENDS HERE replace with nothing and check "Case Sensitive" (and leave "Enabled").
- PROBLEM 2: This changes |Other = text
- }} to just }}, WHETHER OR NOT it is encoded on two lines.
- SOLUTION 2: Go to the Find & Replace menu as in Solution 1. Under "Find" type (\|Other\s*=\s*text\s*\<br\>\s*)\}\} (or omit the br as before). Under "Replace" type }} Check "Regex" and leave "Enabled" checked. Art LaPella (talk) 00:08, 30 June 2010 (UTC)
I added the }} because the other field is the last one, I thought it'd be easier, but maybe not. I tried both solutions, but neither work. Maybe it is of importance to note I am trying this on wikia, but I doubt it. I don't suppose the amount of spaces between |other, = or the text is of importance either. I don't suppose I can just let you use the bot account and do it that way? --Light Daxter (talk) 14:29, 30 June 2010 (UTC)
- The number of spaces shouldn't matter because I encoded \s* between other, =, and text (and <br>). \s means a space, and * means any number of repetitions including zero. I haven't ever used wikia or bot accounts, but I've seen questions like this more often at Wikipedia talk:AutoWikiBrowser. Art LaPella (talk) 20:28, 30 June 2010 (UTC)
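As a rough cross-check of Solution 2 outside AWB, the sketch below uses Python's re module rather than AWB's .NET engine, and leaves out the br test marker. The second pattern, which accepts any parameter value instead of the literal word "text", is an assumption added for illustration and is not from the thread.

```python
import re

# Solution 2 without the <br> marker: the literal value "text" before the closing }}.
LITERAL = r"\|Other\s*=\s*text\s*\}\}"
# Hypothetical generalisation: any value that contains no pipe or brace.
ANY_VALUE = r"\|\s*Other\s*=[^|{}]*\}\}"

two_lines = "{{Box\n|Name = x\n|Other = text\n}}"
one_line = "{{Box\n|Name = x\n|Other = text }}"
free_text = "{{Box\n|Name = x\n|Other = some random words\n}}"

# \s* also matches the newline, so both layouts collapse to a bare }}.
print(repr(re.sub(LITERAL, "}}", two_lines)))
print(repr(re.sub(LITERAL, "}}", one_line)))
print(repr(re.sub(ANY_VALUE, "}}", free_text)))
```

The generalised pattern would stop working if the parameter value itself contains a template or a piped link.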
How do I add a line break
For example using the expression: (\[\[..:.*\]\])(\[\[..:.*\]\]) or for that matter any expression:
(Capture group1)(Capture group2)
and transform into:
(Capture group1) (Capture group2)
I.e. what should I place between $1 and $2? And I don't want to place
as I want the effect to be shown in the editor. I tried the documented \n but it doesn't seem to work; it just places the text \n between the two parts. --79.183.192.85 (talk) 03:50, 20 May 2012 (UTC)
- I haven't gotten \n to work either, and nobody else seems to watch this page, so I recommend Wikipedia talk:AutoWikiBrowser. Art LaPella (talk) 04:27, 20 May 2012 (UTC)
- Thanks, I solved this problem after noticing that I can simply place each capture group on a separate line, but I'll try Wikipedia talk:AutoWikiBrowser as I have several more questions. --79.183.192.85 (talk) 06:43, 20 May 2012 (UTC)
Which engine of Regular Expression does AWB use
I'm looking for more documentation. Maybe one from the list in Comparison of regular expression engines? Regards, SunCreator (talk) 16:19, 7 August 2012 (UTC)
Error with Template replace
The page lists \{\{\s*?[Ff]lagicon\s*?\|.*?\}\} as a way to remove {{flagicon|whatever}} from pages, however I have been trying to use this for my own template removal and all it is doing is selecting every sentence with a template and trying to remove it. I am trying to remove {{border-radius|whatever}} from pages, what regex should I use that doesn't remove most of the page? 80.47.36.24 (talk) 15:53, 6 July 2013 (UTC)
- I have tried \{\{\s*?[Bb]order-radius\s*?\|.*?\}\} and \{\{\s*?[Bb]order\-radius\s*?\|.*?\}\} and \{\{\s*?[Bb]order(?:-)radius\s*?\|.*?\}\} all have the same result. 80.47.36.24 (talk) 16:06, 6 July 2013 (UTC)
Please help with replacement
On my wiki project I want to mass-change
|[[NBA in season 20х|20х]] || [[LA Lakers]]
to
| style="background-color:#FFD700" | [[NBA in season 20х|20х]]
| style="background-color:#FFD700" | [[La Lakers]]
where х is 01, 02, 03 ... 13
How to replace?--5.139.240.78 (talk) 21:22, 31 October 2013 (UTC)
- Find:
\|(\[\[NBA in season 20\d{2}\|20\d{2}\]\]) \|\| \[\[LA Lakers\]\]
- Replace:
| style="background-color:#FFD700" | $1 | style="background-color:#FFD700" | [[La Lakers]]
- Good luck! GoingBatty (talk) 23:47, 3 November 2013 (UTC)
[\s]*
I've seen in an error message that the regexp used for template parameters rule is:
(\|[\s]*)parameter([\s]*=)
Is there a difference in .NET regexp between \s* and [\s]*? I usually use the first, simple \s*, but if the latter is "better" in any way I'd like to know when I should use it.
Zebulon84 (talk) 15:13, 5 November 2013 (UTC)
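A character class that contains only the shorthand \s matches exactly what \s matches, so the two spellings should behave the same. The sketch below checks this in Python's re module only; it is not a test of the .NET engine that AWB uses, and the sample strings are made up.

```python
import re

plain = re.compile(r"(\|\s*)parameter(\s*=)")
bracketed = re.compile(r"(\|[\s]*)parameter([\s]*=)")

# Made-up samples with no, some, and mixed whitespace around the name.
samples = ["|parameter=", "| parameter =", "|\n\tparameter  ="]

# Compare the captured groups of both spellings on every sample.
same = all(
    plain.search(s).groups() == bracketed.search(s).groups() for s in samples
)
print(same)
```

The brackets only start to matter when a second item is added to the class, for example [\s_]* to allow underscores as well.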
Help searching for dates
I'm trying to use the database scanner to search for dates, in particular, in the range from 0000-01-01 to 1599-12-31, with the date in that format. I ran the database scanner and received article names that didn't seem to have anything that matched the first pattern I tried (since I didn't save that pattern, I won't waste your time on something that might have a typo). So I tried the AWB Regex Tester, and tested the following pattern:
[0-1][0-5]\d\d-[0-1]\d-[0-3]\d
I then put some text to search that seemed like it should cause some hits, but the tester said there were no matches:
2014-02-01 upchuck
1492-01-23 bozo
When in the course of human events
0012-12-25
So what am I missing? Jc3s5h (talk) 00:09, 2 February 2014 (UTC)
- Never mind, I see one error in the regular expression (first digit of month could also be 2 or 3), although that doesn't really explain why it didn't work. Jc3s5h (talk) 02:36, 2 February 2014 (UTC)
- Well... You could get much more specific given your date range. I assume that you are saying it is YYYY-MM-DD (i.e. with zero padding).
(1[0-5]\d\d|0[1-9]\d{2}|00[1-9]\d|000\d)-(0[1-9]|1[012])-(0[1-9]|[12]\d|3[01])
- That would limit it such that it does not find things like 1289-13-27, or 1289-01-45. Looking at it again, given that you are including the year 0000, the year construct is overly complicated. It could be:
(1[0-5]\d\d|0\d{3})-(0[1-9]|1[012])-(0[1-9]|[12]\d|3[01])
- Makyen (talk) 07:05, 2 February 2014 (UTC)
- Thanks. I'll try that next time. I ended up using this:
[^0-9][0-1][0-5]\d\d-[0-1]\d-[0-3]\d[^0-9]
- That prevented picking up various long serial-number-like things that happened to have a date-like sequence in the middle. Jc3s5h (talk) 14:32, 2 February 2014 (UTC)
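The stricter pattern and the "no neighbouring digits" guard can be combined. The sketch below is in Python's re module and swaps the [^0-9] guards for lookarounds, which is a substitution of my own: unlike [^0-9], a lookaround does not need a character to be present, so dates at the very start or end of the text still match. The last two sample lines are made up.

```python
import re

# Makyen's second pattern, wrapped in lookarounds instead of [^0-9] ... [^0-9].
DATE = re.compile(
    r"(?<!\d)(1[0-5]\d\d|0\d{3})-(0[1-9]|1[012])-(0[1-9]|[12]\d|3[01])(?!\d)"
)

sample = "\n".join([
    "2014-02-01 upchuck",
    "1492-01-23 bozo",
    "When in the course of human events",
    "0012-12-25",
    "1289-13-27",        # month 13: rejected
    "SN991492-01-2345",  # date-like run inside a serial number: rejected
])

found = [m.group(0) for m in DATE.finditer(sample)]
print(found)  # ['1492-01-23', '0012-12-25']
```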
replace years
I need to make a regular expression to link years, for example: 1966 → [[1966]] --Ibrahim.ID »» 19:42, 12 February 2014 (UTC)
- @Ibrahim.ID: Since not all numbers are intended as years, the solution depends on the range of years you want to search for. The following regex find/replace script should find and put square brackets around all numbers between 1900 and 2099:
- (19|20)([0-9][0-9]) → [[$1$2]]
- Though note that, in general, years shouldn't be linked according to the Manual of Style. SiBr4 (talk) 19:56, 16 February 2014 (UTC)
- I suggest that you qualify that regular expression a bit. As it is, it will match any four digits (e.g. a sequence such as 2731934878374 → 273[[1934]]878374). While your selector could get more complicated depending on what you were actually desiring to do, the following will limit matches to those where the four digits in the 1900–2099 range are surrounded by a word boundary:
- \b(19|20)([0-9][0-9])\b → [[$1$2]]
- or (an alternate construction of the same thing):
- \b((19|20)(\d\d))\b → [[$1]]
- These both result in:
- 2731934878374 2013-04-19 1966 2013/04/19 1966 → 2731934878374 [[2013]]-04-19 [[1966]] [[2013]]/04/19 [[1966]]
- @Ibrahim.ID:, I would second the issue @SiBr4: mentioned regarding the Manual of Style being against the general linking of years which you say you desire. Your request for a regular expression to perform year linking implies a wholesale change to linking all years on a page, perhaps across multiple pages. Doing so is against the policy stated in WP:YEARLINK. On the other hand, it is possible that you are making changes, somewhere other than enwiki, where such linking is desirable. Makyen (talk) 21:39, 16 February 2014 (UTC)
- @Makyen: I noticed that but didn't know how to fix it (I'm pretty new to regex too), so thanks for that. SiBr4 (talk) 21:46, 16 February 2014 (UTC)
- You could further simplify it by removing one set of parentheses:
- \b((19|20)\d\d)\b → [[$1]]
- or even do this:
- \b((19|20)\d{2})\b → [[$1]]
- Hope this helps! GoingBatty (talk) 23:49, 16 February 2014 (UTC)
- Done: \b((19|20)\d\d)\b → [[$1]] helps and is working well, thank you and thanks to all. Ibrahim.ID »» 01:31, 17 February 2014 (UTC)
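The word-boundary version can be tried on the test string quoted above. This is a sketch in Python's re module (so the replacement uses \1 where AWB uses $1); it is only meant to show what \b changes and is not a recommendation to link years.

```python
import re

test = "2731934878374 2013-04-19 1966 2013/04/19 1966"

# Without \b the four digits are also found inside the long number.
unbounded = re.sub(r"(19|20)([0-9][0-9])", r"[[\1\2]]", test)
# With \b only free-standing four-digit runs in 1900-2099 are linked.
bounded = re.sub(r"\b((19|20)\d\d)\b", r"[[\1]]", test)

print(unbounded)
print(bounded)
```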
changing case
I thought I saw s.t. on changing case, but I can't find it. Is there an easy way to sub [A-Z] with [a-z] or vice versa? — kwami (talk) 03:47, 25 February 2012 (UTC)
- I haven't found it. My workaround is to have my software give me messages to fix something myself.
- Example, enforcing the "sentence case" clause at MOS:HEAD:
- Before: ==Rules and Regulations==
- My software changes it to: ==Rules and QRegulations==
- That makes the software highlight the line so I notice it. If it's a false positive (proper nouns are common) then I manually remove the Q. In this case, I manually remove the Q and then change "R" to "r". Art LaPella (talk) 05:42, 25 February 2012 (UTC)
- I thought there was s.t. w the magic word 'lc': you would put in s.t. like {{lc:$1}}, and the output would be in l.c. It was exceedingly simple. — kwami (talk) 06:41, 25 February 2012 (UTC)
- Not in my experience. You could try Wikipedia talk:AutoWikiBrowser. Art LaPella (talk) 20:23, 25 February 2012 (UTC)
- Okay. Thanks. — kwami (talk) 03:26, 27 February 2012 (UTC)
Although AWB does not support changing regex hits to upper case directly (unlike eg perl regexes it's not possible just to add eg \U in the replacement string), the same result can be achieved by using subst and MediaWiki's template syntax in the replacement:
- {{subst:uc:$1}}
Hope this helps. Jheald (talk) 22:42, 6 August 2014 (UTC)
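For comparison, engines that accept a function as the replacement can change case directly. The sketch below is Python, not AWB find-and-replace, and the helper name lower_later_words is made up; it lowers every capital letter that follows whitespace, which is the "sentence case" heading fix described earlier in this section.

```python
import re

def lower_later_words(heading):
    """Lower-case each capital letter that directly follows whitespace."""
    return re.sub(r"(?<=\s)[A-Z]", lambda m: m.group(0).lower(), heading)

fixed = lower_later_words("==Rules and Regulations==")
print(fixed)  # ==Rules and regulations==
```

As noted above, proper nouns are common in headings, so every change made this way would still need a manual check.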
newline in \W?
Do \W and \D include characters \n, \t (newline, tab; maybe more)? -DePiep (talk) 10:03, 2 November 2014 (UTC)
- A quick test in AWB's regex tester shows that these do match whitespace (at least newlines, spaces and tabs). SiBr4 (talk) 11:17, 2 November 2014 (UTC)
- Thanx. Adjusted the page. Testing by myself was too tricky. -DePiep (talk) 17:53, 2 November 2014 (UTC)
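The same quick test can be repeated outside AWB's regex tester. The sketch below uses Python's re module, which agrees with the result reported above for these three characters; it does not prove anything about other engines.

```python
import re

whitespace = {"newline": "\n", "tab": "\t", "space": " "}

# For each character: does \W match it, and does \D match it?
results = {
    name: (bool(re.fullmatch(r"\W", ch)), bool(re.fullmatch(r"\D", ch)))
    for name, ch in whitespace.items()
}
print(results)
```

\W means "not a word character" and \D means "not a digit", so anything that is neither a letter, digit nor underscore, including whitespace, falls into both.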
Can we match undefined text?
I'd like to consolidate lines in a list, where most form nearly identical pairs or triplets. For example, there may be:
- A(1)B
- A(2)B
- C(1)D
- C(2)D
- C(3)D,
and I'd like to replace those with:
- A(1,2)B
- C(1,2,3)D.
Since A,B,C,D are not predefined, I need AWB to see if the text before the parenthesis in one line matches the text in the next line. Is that possible? — kwami (talk) 03:17, 5 December 2014 (UTC)
- @Kwamikagami: Probably - could you please give an example? Thanks! GoingBatty (talk) 00:02, 7 December 2014 (UTC)
- Yes. At User:PotatoBot/Lists/Glottolog_log#Multiple ISO codes, but no glottocode, language articles are listed multiple times. Go down a screen or two; I consolidated the first few manually. E.g., lines 72–80, 'Central Banda language', should be one line, with the stuff in parentheses merged. See line 65 'Buyang language' for one I merged manually. — kwami (talk) 01:46, 7 December 2014 (UTC)
- @Kwamikagami: Try setting up an advanced find and replace rule as follows:
- Find:
#\[\[(.*?)\]\] \((.*?)\)\: (.*?)\n#\[\[\1\]\] \((.*?)\)\: \3\n
- Replace:
#[[$1]] ($2, $4): $3\n
- Be sure you check the "Regular expression" box, and try setting Apply No. of times to "10". GoingBatty (talk) 02:29, 7 December 2014 (UTC)
- Perfect! I had no idea you could use \digit that way. I'll add it to the instructions. — kwami (talk) 03:48, 7 December 2014 (UTC)
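The backreference rule and the "Apply No. of times" setting can be imitated in a short loop. This is a sketch in Python's re module; the list lines and the codes aaa, bbb, ccc, ddd are invented placeholders, not real entries from the PotatoBot log, and the helper name merge_duplicates is made up.

```python
import re

# Same rule as above; \1 and \3 inside the pattern must repeat earlier captures.
PAIR = re.compile(r"#\[\[(.*?)\]\] \((.*?)\): (.*?)\n#\[\[\1\]\] \((.*?)\): \3\n")

def merge_duplicates(text, max_passes=10):
    """Merge adjacent lines for the same article, one pair per pass."""
    for _ in range(max_passes):
        merged = PAIR.sub(r"#[[\1]] (\2, \4): \3\n", text)
        if merged == text:  # nothing left to merge
            break
        text = merged
    return text

listing = (
    "#[[Foo language]] (aaa): no glottocode\n"
    "#[[Foo language]] (bbb): no glottocode\n"
    "#[[Foo language]] (ccc): no glottocode\n"
    "#[[Bar language]] (ddd): no glottocode\n"
)
print(merge_duplicates(listing))
```

Each pass only merges two lines into one, which is why a triplet needs two passes and why the rule is applied repeatedly.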
Probably a simple regex change...
Hi, This is probably a very simple regex replacement but I can't seem to get it working. How would you go about replacing something like ''[[Bart Simpson 10]]'' to {{BS|10}}? Thanks in advance, Solar Dragon (talk) 17:35, 2 February 2015 (UTC)
- For that exact change, replace \[\[Bart Simpson 10\]\] with {{BS|10}}. If with "something like" you mean you want to replace a range of strings with a single regex, you would need to clarify. SiBr4 (talk) 18:32, 2 February 2015 (UTC)
- Yeah, I mean a range. For example, links from [[Bart Simpson 1]] to [[Bart Simpson 100]] need to be changed to {{BS|1}} to {{BS|100}}. Solar Dragon (talk) 19:02, 2 February 2015 (UTC)
- \[\[Bart Simpson (\d+)\]\] → {{BS|$1}} replaces links containing any number (0–∞). To replace strictly only numbers between 1 and 100, a somewhat more complex regex is needed, such as \[\[Bart Simpson ([1-9]\d?|100)\]\] → {{BS|$1}}. SiBr4 (talk) 19:29, 2 February 2015 (UTC)
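The range-limited pattern can be checked on a few links. This is a sketch in Python's re module (replacement group written \1 instead of $1); the sample string is made up.

```python
import re

RANGE_1_TO_100 = r"\[\[Bart Simpson ([1-9]\d?|100)\]\]"

links = "[[Bart Simpson 1]] [[Bart Simpson 100]] [[Bart Simpson 101]] [[Bart Simpson 0]]"
converted_links = re.sub(RANGE_1_TO_100, r"{{BS|\1}}", links)
print(converted_links)
```

The alternation [1-9]\d? covers 1 to 99 and the literal 100 covers the last value, so 0 and 101 are left as plain links.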
Month name to digits
Is there a quick way to detect a full month name and then convert it to the month number?
e.g say I was converting 26 September 1850 to use the {{start date}} template, the template requires the month to be a number, short of doing a regex check of if month x then number y, is there a quicker way? - X201 (talk) 16:06, 1 April 2015 (UTC)
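A plain regex replacement string cannot look a value up in a table, so a name-to-number conversion needs either one rule per month or some code around the regex. The following is a sketch of the second option in Python, assuming it runs in a separate script and not in AWB's find-and-replace box; the names MONTHS, DATE and to_start_date are made up for the example.

```python
import re

MONTHS = {
    name: number
    for number, name in enumerate(
        ["January", "February", "March", "April", "May", "June", "July",
         "August", "September", "October", "November", "December"],
        start=1,
    )
}

# Day, full month name, four-digit year, e.g. "26 September 1850".
DATE = re.compile(r"\b(\d{1,2}) (" + "|".join(MONTHS) + r") (\d{4})\b")

def to_start_date(text):
    """Rewrite 'D Month YYYY' as a {{start date}} call with a numeric month."""
    return DATE.sub(
        lambda m: "{{start date|%s|%d|%d}}"
        % (m.group(3), MONTHS[m.group(2)], int(m.group(1))),
        text,
    )

print(to_start_date("Opened 26 September 1850."))
```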
Greed
I am searching for \<div class=.*references-small.*\>((.|\n)*)\<\/div\> and replacing it with {{refbegin}}$1{{refend}}. The problem is that when there are multiple instances of the text:
<div class="references-small"> content </div> <div class="references-small"> content </div> <div class="references-small"> content </div>
Then only the outer use is replaced:
{{refbegin}} content </div> <div class="references-small"> content </div> <div class="references-small"> content {{refend}}
Thoughts? -- Gadget850 talk 12:08, 1 May 2015 (UTC)
- Add ? after (.|\n)*, so you end up with \<div class=.*references-small.*\>((.|\n)*?)\<\/div\> - X201 (talk) 13:10, 1 May 2015 (UTC)
- Perfect! Thanks! -- Gadget850 talk 13:24, 1 May 2015 (UTC)
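The effect of the added ? can be seen by counting replacements. This is a sketch in Python's re module, with the backslashes before < and > dropped and with each div on its own line; the page text is made up.

```python
import re

GREEDY = r"<div class=.*references-small.*>((.|\n)*)</div>"
LAZY = r"<div class=.*references-small.*>((.|\n)*?)</div>"

page = (
    '<div class="references-small">\n* Ref A\n</div>\n'
    "Middle text\n"
    '<div class="references-small">\n* Ref B\n</div>\n'
)

# subn returns the new text and the number of replacements made.
greedy_text, greedy_count = re.subn(GREEDY, r"{{refbegin}}\1{{refend}}", page)
lazy_text, lazy_count = re.subn(LAZY, r"{{refbegin}}\1{{refend}}", page)

print(greedy_count, lazy_count)  # 1 2
print(lazy_text)
```

The greedy version runs on to the last closing div on the page, while the lazy version stops at the first one it meets and so treats each block separately.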
Match whole page
How to match entire content of a page (via regex) to substitute it with a predefined static variable (template/phrase)? XXN (talk) 11:13, 21 April 2014 (UTC)
- Catch regex "." (without the quotes) -- Magioladitis (talk) 12:26, 21 April 2014 (UTC)
- "." can only catch one line, can't it? According to the regex guide, "." matches any character except newlines. So if you set Find&Replace to ".* → Foo", it replaces every line with "Foo". ". → Foo" even replaces every single character to "Foo". If I understand the original question correctly, XXN wants to replace an entire page with one instance of some text. SiBr4 (talk) 13:12, 21 April 2014 (UTC)
- @SiBr4 and XXN: What about "(.\n*)*" then? -- Magioladitis (talk) 15:31, 21 April 2014 (UTC)
- Yesterday I also created several expressions using dot, parentheses and asterisk, but none of them worked correctly.
- @Magioladitis, (.\n*)* works when replacing with nothing. But if I put some text in the "replace" field, it replaces each line in turn with this text, not the entire page :( XXN (talk) 15:43, 21 April 2014 (UTC)
- XXN Replace with nothing and then prepend text. Not perfect but it works. -- Magioladitis (talk) 16:06, 21 April 2014 (UTC)
- Good idea. Thank you! // XXN (talk) 16:40, 21 April 2014 (UTC)
- ^.*$ → Foo with the "SingleLine" checkbox checked works. The "SingleLine" checkbox explicitly makes "." match newlines. — Makyen (talk) 20:04, 21 April 2014 (UTC)
- Perfect! Thank you! --XXN, 09:22, 26 August 2015 (UTC)
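The single-line behaviour has a direct counterpart in Python, where the flag is called re.DOTALL. The sketch below assumes that this flag plays the same role as AWB's "SingleLine" checkbox; the page text is made up.

```python
import re

page = "First line\nSecond line\nThird line\n"

# With DOTALL the dot also matches newlines, so ^.*$ spans the whole page.
replaced = re.sub(r"^.*$", "Foo", page, flags=re.DOTALL)

# Without it, the dot stops at the first newline and $ cannot match there.
line_by_line = re.search(r"^.*$", page)

print(repr(replaced))  # 'Foo'
print(line_by_line)    # None
```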
Regex for film article citations
There are a lot of citations of old newspaper film articles that have the form:
- <ref>Article title<newline>author's name. newspaper (1923-Current File) [newspaper location] dd mmm yyyy: pageno. </ref>
For example, from Red Mountain (film):
<ref>Trevor Howard Signed for Allen Film; Ladd Again Hero of Outdoors
Schallert, Edwin. Los Angeles Times (1923-Current File) [Los Angeles, Calif] 27 Sep 1950: 19. </ref>
I'd like to build a regex that captures the components so I can build a {{cite news}} from them. (Don't ask me what the (1923-Current File) is about - I have no idea and I've received no answer from the editor who added them - it seems useless and I plan to drop it). However, I can't work out how to write a regex that will capture expressions of this form without picking up a lot of other things, or spreading its capturing across multiple <ref>s. Any suggestions from you regex aces out there? Of course the pattern varies slightly with regard to number of embedded spaces, capitalisation, hyphen/endash, and so on, but if someone could show me the essential form, I can tweak it myself to cope with those variations. Thanks. Colonies Chris (talk) 18:00, 12 June 2015 (UTC)
- @Colonies Chris: Try this:
- Find:
<ref>(.*?)\n(\w+), (\w+)\. ([\w ]+)\(1923-Current File\) \[([\w, ]+)\] (\d{1,2} \w{3} \d{4}): (\d+)\.\s*<\/ref>
- Replace:
<ref>{{cite news |title=$1 |last=$2 |first=$3 |newspaper=$4 |location=$5 |date=$6 |page=$7}}</ref>
- Good luck! GoingBatty (talk) 03:35, 28 August 2015 (UTC)
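The suggested rule can be tried on the Red Mountain example quoted above. This is a sketch in Python's re module, with $1 to $7 written as \1 to \7. Note that group 4 keeps the space before "(1923-Current File)", so the newspaper value carries a trailing space unless the pattern is tweaked.

```python
import re

FIND = (r"<ref>(.*?)\n(\w+), (\w+)\. ([\w ]+)\(1923-Current File\) "
        r"\[([\w, ]+)\] (\d{1,2} \w{3} \d{4}): (\d+)\.\s*<\/ref>")
REPLACE = (r"<ref>{{cite news |title=\1 |last=\2 |first=\3 |newspaper=\4 "
           r"|location=\5 |date=\6 |page=\7}}</ref>")

raw = ("<ref>Trevor Howard Signed for Allen Film; Ladd Again Hero of Outdoors\n"
       "Schallert, Edwin. Los Angeles Times (1923-Current File) "
       "[Los Angeles, Calif] 27 Sep 1950: 19. </ref>")

captured = re.search(FIND, raw).groups()
cited = re.sub(FIND, REPLACE, raw)
print(captured)
print(cited)
```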
Replacing dates with regex
Hi, can someone help me with something?
I'm trying to make a regex string to replace dates using AWB. The regex I have so far is:
([0-9]+)([st]|[nd]|[rd]|[th]) (January), ([1,2][0,9][0,1,9][0-9])
Which is replaced to this:
$3 $1, $4
Some of the years are linked, so I also have:
([0-9]+)([st]|[nd]|[rd]|[th]) (January), \[\[([1,2][0,9][0,1,9][0-9])\]\]
replaced to the same. However, this doesn't seem to be working for some reason, although I'm really not sure why. Examples of what I want to replace are 15th April, 1996 and 6th May, 1996. I have regex code for each month rather than make it all in one line. The code is designed to replace any date between 1st January, 1990 up to 31st December, 2999. I want to get rid of the extensions "th" etc. This isn't for Wikipedia, it's for another wiki, but I've tried everything I can think of and I just can't get it working. Thank you, Solar Dragon (talk) 18:35, 9 October 2015 (UTC)
- Making only minimal changes to your regex, rather than suggest a totally different one, ([st]|[nd]|[rd]|[th]) → (st|nd|rd|th). --Unready (talk) 02:34, 10 October 2015 (UTC)
- Also, for the year, the commas should be removed (within [...] they match literal commas). Your current year regex wouldn't match all years between 1990 and 2999, however; it matches any year in the intervals 1000–1019, 1090–1099, 1900–1919, 1990–2019, 2090–2099, 2900–2919 and 2990–2999. A better regex that matches exactly the years 1990–2999 could be (199\d|2\d{3}). SiBr4 (talk) 10:05, 10 October 2015 (UTC)
- Thanks for the help. I actually managed to figure it out myself in the end though. Solar Dragon (talk) 12:54, 11 October 2015 (UTC)
Converting wikitables help needed
I'm trying to transform some wikitables into templates on TV episode pages using AWB. In doing so, I only want the tables in certain sections changed (e.g., sections starting with "Episode" or "Series"). I'm trying to approach it by having the script search for that section header until the start of a wikitable, and replace the stuff above the wikitable. I'm rather new to RegEx so kindly bear with any ignorance on my part. What I currently have is a shitton of subrules addressing possible parameters of {{Episode list}}, which is generally working, but I can't get the first part to only address wikitables in episode lists (and not, say, a wikitable of a song list within the same article, like on My_Little_Pony:_Friendship_Is_Magic_(season_1)).
The code I was trying is searching for ((\=)*\s*(Episode.*|Season.*|Series.*|Pilot.*)(\n.*)*)(\n)(\<onlyinclude\>)?(?:\{\|)(?:\s*class\s*\=\"\s*wikit.*)(\n\s*\|\+.*)?(\n\s*\|\-.*)? and replacing with $1$5{{Episode table. AWB has crashed on me twice while I've fiddled with the coding so I figured I should come here and ask.
Is there maybe a way to do if/then statements in regex? Or have it only work in certain sections of the article? Any help would be appreciated. Please ping me in any replies. Thank you. EvergreenFir (talk) Please {{re}} 19:05, 7 April 2016 (UTC)
Character class for all uppercase letters in \w?
When parsing Vancouver style authors (and in other scenarios) I need to validate capitals, but [A-Z] doesn't cover diacritics. \w covers diacritics, but is case insensitive. I've searched a bit but I can't find anything suitable, short of putting every possible diacritical uppercase character in square brackets []... Is there a better way? ~ Tom.Reding (talk ⋅dgaf) 15:01, 21 May 2016 (UTC)
- Has been answered here, for those interested. ~ Tom.Reding (talk ⋅dgaf) 23:05, 26 May 2016 (UTC)
$1 followed by 0
Hi,
On a book on the French Wikisource, I'd like to replace bad dates like 161O (with the letter O instead of the number 0), so I have the regex 16([0-9])O that I want to replace with 16$10, but it doesn't work (logically, the software seems to understand $10 instead of $1 followed by 0).
Any idea how to solve that ? (for the moment, I do without capturing groups 160O -> 1600, 161O -> 1610, etc. it works but it's not really efficient)
Cdlt, VIGNERON * discut. 20:46, 30 July 2018 (UTC)
- @GoingBatty: any idea? Cheers, VIGNERON * discut. 13:34, 9 September 2018 (UTC)
- @VIGNERON: Try this:
16([0-9])O → 16${1}0
- Hope this helps! GoingBatty (talk) 13:53, 9 September 2018 (UTC)
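The same ambiguity exists in other engines, each with its own way of delimiting the group number. As a sketch, Python's re module writes the delimited form as \g<1>, which plays the role that ${1} plays in the rule above.

```python
import re

# \g<1>0 means "group 1, then a literal 0"; a bare \10 would mean group 10.
corrected = re.sub(r"16([0-9])O", r"16\g<1>0", "161O and 160O")
print(corrected)  # 1610 and 1600
```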
Regex for infobox values
Hello people,
I was trying to make some edits on my home wiki (bswiki), and tried to make some changes, but didn't succeed in particular task: How to change:
{{Infobox biography |day of birth = 6 |month of birth = July |year of birth = 1975 }}
to:
{{Infobox person |birth_date = {{birth date and age|1975|7|6}} }}
Other parameters I successfully changed, except this one. Thanks in advance. :) --Munja (talk) 17:50, 16 July 2018 (UTC)
- @Munja: Try this:
\{\{Infobox biography(\n\s*)\|day of birth\s*\=\s*(\d{1,2})\n\s*\|month of birth\s*\=\s*July\n\s*\|year of birth\s*\=\s*(\d{4})(\n\s*)\}\} → {{Infobox person$1|birth_date = {{birth date and age|$3|7|$2}}$4}}
- Hope this helps! GoingBatty (talk) 14:17, 9 September 2018 (UTC)
- @GoingBatty: Thanks. But how would this go for each month (not only July). I want to make it parametric also. --Munja (talk) 21:12, 21 September 2018 (UTC)
- @Munja: I would have 12 rules - one for each month. Maybe another editor could provide a parametric solution for you. GoingBatty (talk) 23:11, 23 September 2018 (UTC)
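A parametric version is possible when the replacement is computed by code instead of a fixed string. The sketch below is Python and would have to run outside AWB's find-and-replace box; the pattern is the July rule above with the month turned into a capture group, and the names MONTH_NUMBERS, INFOBOX and convert are made up for the example.

```python
import re

MONTH_NUMBERS = {
    name: number
    for number, name in enumerate(
        ["January", "February", "March", "April", "May", "June", "July",
         "August", "September", "October", "November", "December"],
        start=1,
    )
}

INFOBOX = re.compile(
    r"\{\{Infobox biography(\n\s*)\|day of birth\s*=\s*(\d{1,2})"
    r"\n\s*\|month of birth\s*=\s*(\w+)"
    r"\n\s*\|year of birth\s*=\s*(\d{4})(\n\s*)\}\}"
)

def convert(match):
    lead, day, month, year, tail = match.groups()
    number = MONTH_NUMBERS.get(month)
    if number is None:
        return match.group(0)  # unknown month name: leave the infobox untouched
    return ("{{Infobox person" + lead + "|birth_date = {{birth date and age|"
            + year + "|" + str(number) + "|" + day + "}}" + tail + "}}")

source = ("{{Infobox biography\n|day of birth = 6\n"
          "|month of birth = July\n|year of birth = 1975\n}}")
converted = INFOBOX.sub(convert, source)
print(converted)
```

The month names here are English; on another wiki the table would need the local month names.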
Match any string in the middle of the pattern, except a certain string
Hi,
I'm correcting the formatting of a grammar book of Biblical Hebrew in Wikisource. It has a lot of references to Bible verses. Most of them are formatted using a template ("GHGbible-ref"), but some are not formatted and appear as 12:34 (pattern: \d+:\d+).
Those that aren't formatted most often refer to the same book of the Bible as the last "GHGbible-ref" template that appeared before them. Some more text may appear between the end of the template and the \d+:\d+.
Here's example wiki syntax (from a previous version of s:Page:Gesenius' Hebrew Grammar (1910 Kautzsch-Cowley edition).djvu/343; and please try to ignore the very complicated words):
- In ''conditional clauses'' both in the protasis and apodosis, or only in the latter, {{GHGbible-ref|book=Ps|chapter=23|verse=4}} {{GHGheb|text=גַּם כִּֽי־אֵלֵךְ... לֹֽא־אִירָא רָע|translate=yea, though I walk}} (or ''had to walk'')... ''I fear'' (or ''I would fear'') ''no evil''. Very frequently also in an apodosis, the protasis to which must be supplied from the context, e.g. {{GHGbible-ref|book=Jb|chapter=5|verse=8}} ''but as for me, I would seek unto God'' (were I in thy place); 3:13, 16, 14:14 f.
Here's the pattern that I tried to use: (\{\{GHGbible-ref\|book=([^|]+)\|.*\}\}.*)\b(\d+):(\d+)\b. It's wrong, because it begins matching at "{{GHGbible-ref|book=Ps|chapter=23|verse=4}}". I need it to begin matching at "{{GHGbible-ref|book=Jb|chapter=5|verse=8}}".
I need a way to get the second instance of .* in this pattern to match everything except "\{\{GHGbible-ref". I guess I need to use something with negative look-ahead or look-behind, but I am failing to find what exactly.
Any help will be appreciated! --Amir E. Aharoni (talk) 18:45, 6 October 2018 (UTC)
- @Amire80: In case you haven't found the solution yet:
- you need to be sure that the first .* stays in the template, so I suggest replacing it with [^{}]*. If the template may contain other templates, you can remove the { if stopping at that template's end is not a problem, or use a recursive regex to have balanced braces (ask if needed).
- a negative look-ahead should do it for the second one; replace it with (.(?!\{\{GHGbible-ref\|))*. If you want to select 3:13 instead of 14:14, use a lazy quantifier *?. You may also use a non-capturing group (?:) (a bit less readable, but it doesn't mess with the replacement string).
- → (\{\{GHGbible-ref\|book=([^|]+)\|[^{}]*\}\}(?:.(?!\{\{GHGbible-ref\|))*?)\b(\d+):(\d+)\b
- --Zebulon84 (talk) 07:54, 24 October 2018 (UTC)
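The two patterns can be compared on a shortened version of the example passage. This is a sketch in Python's re module; the passage is abbreviated by hand from the quoted wikitext, and the Hebrew text is replaced by a placeholder.

```python
import re

FIRST_TRY = r"(\{\{GHGbible-ref\|book=([^|]+)\|.*\}\}.*)\b(\d+):(\d+)\b"
SUGGESTED = (r"(\{\{GHGbible-ref\|book=([^|]+)\|[^{}]*\}\}"
             r"(?:.(?!\{\{GHGbible-ref\|))*?)\b(\d+):(\d+)\b")

passage = ("{{GHGbible-ref|book=Ps|chapter=23|verse=4}} "
           "{{GHGheb|text=x|translate=yea, though I walk}} ... e.g. "
           "{{GHGbible-ref|book=Jb|chapter=5|verse=8}} ''but as for me''; "
           "3:13, 16, 14:14 f.")

first = re.search(FIRST_TRY, passage)
better = re.search(SUGGESTED, passage)

# (book, chapter, verse) picked up by each pattern
print(first.group(2), first.group(3), first.group(4))
print(better.group(2), better.group(3), better.group(4))
```

The look-ahead stops the match from walking across the start of another GHGbible-ref call, so the book captured is always the nearest preceding one.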
Appeal for information
How do I get rid of this and put back importScript('User:Ohconfucius/script/EngvarB.js'); in workable form? Please. Keith-264 (talk) 20:24, 5 November 2018 (UTC)
Reordering of template parameters and spaces
Does someone have code that reorders infobox fields and spaces per the template list? I have a template that I need to remove unused parameters and rename deprecated parameters to their correct one, so while doing that, I'd also like to clean the infoboxes at the same time. Would appreciate any help here. --Gonnym (talk) 09:49, 3 April 2019 (UTC)
Removing a specific empty parameter
Loading from Category:Pages using duplicate arguments in template calls trying to replace in {{Infobox settlement}}:
\|[ ]{0,1}image_skyline[ ]{0,40}=[ ]{0,40}$
replace with 'nothing'
|image_skyline = |image_skyline = Aguiar Paraíba.jpg
result
| = | = Aguiar Paraíba.jpg
How to fix? TerraCyprus (talk) 15:42, 7 October 2020 (UTC)
- Can you use \n or \r instead of $ to represent the newline character? You are looking for empty image_skyline parameters. – Jonesey95 (talk) 15:55, 7 October 2020 (UTC)
- Thank you, even better, so the whole line is removed. Now it matches | and = and removes the whole line as wanted. But still it matches wrongly in the line that has a value, second line in this example:
| settlement_type = | = Aguilar de Montuenga - 009 (33705333192).jpg | image_size =
TerraCyprus (talk) 16:00, 7 October 2020 (UTC)
- @TerraCyprus: Try this: \|( *?)image_skyline( *?)=( *?)\n - X201 (talk) 08:20, 8 October 2020 (UTC)
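The last suggestion can be checked on the two-line example from the start of this section. This is a sketch in Python's re module, with the two parameters written on separate lines as they would be in the infobox.

```python
import re

EMPTY_SKYLINE = r"\|( *?)image_skyline( *?)=( *?)\n"

infobox_lines = "|image_skyline = \n|image_skyline = Aguiar Paraíba.jpg\n"

# Only the line whose value is empty ends in optional spaces plus a newline.
cleaned = re.sub(EMPTY_SKYLINE, "", infobox_lines)
print(repr(cleaned))
```

Because the parameter name is spelled out in the pattern and the newline is required right after the optional spaces, the line that carries a file name is left alone.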
Searching in wiki with a suffix
Consider the text "abcd/lmno". What regex can be used to retrieve a list of pages having the suffix "/lmno"? Adithyak1997 (talk) 11:30, 2 October 2020 (UTC)
- @Adithyak1997: Have you tried searching with lmno insource:/\/lmno/? GoingBatty (talk) 21:17, 25 December 2020 (UTC)
- @GoingBatty: I am not sure whether it's my fault, but your code does not actually return the results which contain the prefix/suffix. E.g., if we use layout insource:/\/layout/, then it retrieves Box Layout, but it does not retrieve Wikipedia:Pages/Layout (example only). Adithyak1997 (talk) 08:36, 26 December 2020 (UTC)
- That's because you need to use intitle, not insource. While your made-up example doesn't work, a search for intitle:/\/ab[^c]/ shows what would be expected. Primefac (talk) 10:17, 26 December 2020 (UTC)
- @Adithyak1997: Sorry for misunderstanding your request. I thought you were looking for a list of pages that contained text with the suffix "/lmno". @Primefac: Thank you for answering the question regarding the names of the pages. GoingBatty (talk) 14:43, 26 December 2020 (UTC)
- Thanks a lot for the solution. Adithyak1997 (talk) 15:28, 26 December 2020 (UTC)
Remove unsupported infobox parameter
Hi everyone! I'm trying to remove the unsupported |format= parameter from {{Infobox song}} when making other changes. I tried using:
(\{\{Infobox song[^}]+)\|\s*format\s*\=\s*[^\|]+\|
→$1|
However this doesn't work if an earlier parameter contains a template (such as {{cite web}}), and only partially removes the parameter value if |format= contains a piped link or a template (such as {{hlist}}). Has anyone already figured out a way to remove an infobox parameter? Thanks! GoingBatty (talk) 20:11, 25 December 2020 (UTC)
- If that's the only parameter you want to remove, create a Module in AWB, and in the stock/preloaded text, just replace the ArticleText =... line with ArticleText = WikiFunctions.Tools.RemoveTemplateParameter(ArticleText, "infobox song", "format"); . You'll probably want to change the Summary to "" (i.e. blank) as well. Then just check the "enable this module" box, click the "Make module" button, and away you go. Primefac (talk) 20:17, 25 December 2020 (UTC)
- @Primefac: Thanks for the quick reply! I receive an error when trying to load a custom module, so I opened a separate discussion here. Thanks! GoingBatty (talk) 21:14, 25 December 2020 (UTC)
- @Primefac: I see you have PrimeBOT working on this now - thank you! Is this how you have the bot configured? GoingBatty (talk) 05:00, 31 December 2020 (UTC)
- At the core of it, yes (I have a lot of other stuff included as well for formatting etc). One of these days I'll probably post my module code somewhere. Primefac (talk) 13:13, 31 December 2020 (UTC)
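The two failure modes described in the opening post of this thread can be reproduced with a small Python sketch. It replays the posted find pattern unchanged, with `$1|` written as `\1|` for Python. The infobox samples are invented for illustration, and Python's `re` is assumed to behave like AWB's .NET engine for this pattern.

```python
import re

# The find pattern from the opening post, copied verbatim.
FIND = re.compile(r"(\{\{Infobox song[^}]+)\|\s*format\s*\=\s*[^\|]+\|")

# Hypothetical inputs: a plain value, a piped link, and an earlier nested template.
plain = "{{Infobox song\n| name = X\n| format = CD single\n| genre = Pop\n}}"
piped = "{{Infobox song\n| name = X\n| format = [[CD single|CD]]\n| genre = Pop\n}}"
nested = "{{Infobox song\n| writer = {{hlist|A|B}}\n| format = CD single\n| genre = Pop\n}}"

plain_result = FIND.sub(r"\1|", plain)    # parameter removed cleanly
piped_result = FIND.sub(r"\1|", piped)    # stops at the pipe inside the link
nested_result = FIND.sub(r"\1|", nested)  # [^}]+ cannot cross the earlier "}}"

for result in (plain_result, piped_result, nested_result):
    print(result, end="\n\n")
```

The piped case leaves a stray `|CD]]` behind and the nested case is not changed at all, which is consistent with the advice above to use a template-aware helper instead of a single regex.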
url update
I've found that the NRHP documents in Arkansas have moved. These are cited in hundreds of article references. If someone can help with the regex, I would like to update the urls so the ref links work again (the old url now goes to a "landing" page where it would be preferable to go to the document as it did before).
I need to change the url
from: http://www.arkansaspreservation.com/National-Register-Listings/PDF/VB0202.nr.pdf
to: https://www.arkansasheritage.com/docs/default-source/national-registry/vb0202-pdf
Thanks, MB 01:39, 19 February 2021 (UTC)
- @MB: If it's always only the four digit number to be converted, try:
http\:\/\/www\.arkansaspreservation\.com\/National-Register-Listings\/PDF\/VB(\d{4})\.nr\.pdf
→https://www.arkansasheritage.com/docs/default-source/national-registry/vb$1-pdf
- Happy editing! GoingBatty (talk) 04:06, 19 February 2021 (UTC)
- GoingBatty, that didn't quite work because it's not just the four-digit number, it needs to handle the six-character field including the two alpha string before the number. That seems to be a two-letter code for the county, so it can be anything. Can you give me another update to try? MB 04:33, 19 February 2021 (UTC)
- @MB: Hmmmm...I don't think the solution at Wikipedia talk:AutoWikiBrowser/Regular expression/Archive 1#changing case will work from within a reference. GoingBatty (talk) 05:09, 19 February 2021 (UTC)
- @MB: You could have 75 rules - one for each county in Arkansas. Maybe someone else can provide a more elegant solution. GoingBatty (talk) 05:13, 19 February 2021 (UTC)
- GoingBatty The case doesn't matter in a url. I just verified that with these by manually changing the case in the browser address window. We just need to have the same two letters in any case. MB 05:18, 19 February 2021 (UTC)
- @MB: Yay! Based on that, try:
http\:\/\/www\.arkansaspreservation\.com\/National-Register-Listings\/PDF\/(\w{2}\d{4})\.nr\.pdf
→https://www.arkansasheritage.com/docs/default-source/national-registry/$1-pdf
- If you needed to convert to lowercase, Help:Substitution#Limitation has a suggestion for how to perform substitutions inside <ref>...</ref> . GoingBatty (talk) 05:21, 19 February 2021 (UTC)
- GoingBatty, that seems to work. I've done about 30 so far and they all work now. Thanks for the help. MB 05:39, 19 February 2021 (UTC)
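For anyone who wants to check the final rule from this thread before running it, here is a minimal Python sketch. It assumes Python's `re` matches the same text as AWB's .NET engine here. The function names are hypothetical, and the lowercase variant relies on a Python callback, which is not something the AWB find-and-replace box offers.

```python
import re

# Same idea as the rule above: capture the two-character county code plus
# the four digits, then reuse them in the new URL.
OLD_URL = re.compile(
    r"http://www\.arkansaspreservation\.com/National-Register-Listings/PDF/"
    r"(\w{2}\d{4})\.nr\.pdf"
)
NEW_PREFIX = "https://www.arkansasheritage.com/docs/default-source/national-registry/"


def update_url(text: str) -> str:
    """Rewrite old NRHP document URLs, keeping the identifier's case."""
    return OLD_URL.sub(NEW_PREFIX.replace("\\", "\\\\") + r"\1-pdf", text)


def update_url_lowercase(text: str) -> str:
    """Python-only variant that also lowercases the identifier."""
    return OLD_URL.sub(lambda m: NEW_PREFIX + m.group(1).lower() + "-pdf", text)


old = "http://www.arkansaspreservation.com/National-Register-Listings/PDF/VB0202.nr.pdf"
print(update_url(old))
print(update_url_lowercase(old))
```

As noted in the thread, the site was reported to accept either case, so the first form is enough for the AWB rule.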
How to skip through parameters
I have some basic skills in regex, but I don't know how to skip through parameters. \n* does not do the job.
I am trying to add the updated template {{BSWW World Ranking|AAA}} after BSWW Rank = to Template:Infobox national football team. For example what I am trying to do:
Before:
{{Infobox national football team
...
| FIFA Trigramme = BRA
...
| BSWW Rank = 2
After:
{{Infobox national football team
...
| FIFA Trigramme = BRA
...
| BSWW Rank = {{BSWW World Ranking|BRA}}
It works in most cases when using the following codes:
Find:
([\|]\s*)(FIFA\sTrigramme\s*\=\s*)(\p{Lu}{3})(\n*)(\|\s*BSWW\sRank\s*\=\s*)(\d{1,})
Replace:
$1$2$3$4${5}{{BSWW World Ranking|$3}}
However, if there is at least one parameter between "FIFA Trigramme" and "BSWW Rank" no changes will be made. Is there any solution how to quickly skip through multiple parameters? Maybe I should work with ^ and $ and set to MultiLine? Regards.--User:Tomcat7 (talk) 19:55, 16 August 2021 (UTC)
- @Tomcat7: You could try something like
(\| *FIFA\sTrigramme *\= *)(\p({Lu}{3})([^}]*)(\| *BSSW Rank *\= *)\d{1,}
→$1$2$3$4{{BSWW World Ranking|$2}}
Happy editing! GoingBatty (talk) 01:14, 17 August 2021 (UTC)
- It didn't work, I think there is a bracket missing. But thanks anyway.--User:Tomcat7 (talk) 17:19, 24 August 2021 (UTC)
- @Tomcat7: Oops - one parenthesis too many! Try
(\| *FIFA\sTrigramme *\= *)(\p{Lu}{3})([^}]*)(\| *BSWW Rank *\= *)\d{1,}
→$1$2$3$4{{BSWW World Ranking|$2}}
Happy editing! GoingBatty (talk) 22:12, 24 August 2021 (UTC)
- I would make a module. If you always have a |FIFA Trigramme= param with a value, then use GetTemplateParameterValue to get the value, which you can then later put into an UpdateTemplateParameterValue call for the |BSWW Rank= value. Primefac (talk) 18:53, 24 August 2021 (UTC)
- @Primefac: Would you be willing to post a simple template replacement module example at WP:AWB/CM? GoingBatty (talk) 22:12, 24 August 2021 (UTC)
- For the record, there are about four pages that need updating still, so I'm not sure how much more use this will be, but here ya go. It's not elegant, but it works.
Template parameter replacement
|
---|
public string ProcessArticle(string ArticleText, string ArticleTitle, int wikiNamespace, out string Summary, out bool Skip)
{
    Skip = false;
    Summary = "";
    string templateName = "";
    templateName = "infobox national football team";
    // Avoid template redirects
    ArticleText = commonName(ArticleText, templateName);
    // Update parameter
    ArticleText = Tools.NestedTemplateRegex(templateName).Replace(ArticleText, m => paramUpdate(m.Value));
    return ArticleText;
}

public static string paramUpdate(string templateCall)
{
    string paramVal = "";
    paramVal = Tools.GetTemplateParameterValue(templateCall, "FIFA Trigramme");
    string newParam = "{{BSWW World Ranking|" + paramVal + "}}";
    templateCall = Tools.UpdateTemplateParameterValue(templateCall, "BSWW Rank", newParam);
    return templateCall;
}

// Avoid template redirects
public static string commonName(string ArticleText, string templateName)
{
    // Switch the template name to the default
    List<string> otherNames = new List<string>();
    string[] array1 = new string[]{"National football team", "European national under-19 football team", "Infobox official football team", "Infobox National football team", "European national under-21 football team"};
    otherNames.AddRange(array1);
    foreach(string callName in otherNames)
    {
        ArticleText = WikiFunctions.Tools.RenameTemplate(ArticleText, callName, templateName);
    }
    return ArticleText;
}
|
- I'm sure there are better ways to do it, but I build my modules with updating in mind so I tend to think more in subprograms. Primefac (talk) 22:45, 24 August 2021 (UTC)
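The corrected find-and-replace from earlier in this thread can also be tried in Python. This is only a sketch: Python's `re` has no `\p{Lu}`, so `[A-Z]{3}` is substituted as an assumption (it covers ASCII capitals only), and the infobox text is a made-up sample with one extra parameter between the two fields.

```python
import re

# [^}]* skips over any parameters between the two fields, as long as none
# of them contains a closing brace (for example a nested template).
FIND = re.compile(
    r"(\| *FIFA\sTrigramme *= *)([A-Z]{3})([^}]*)(\| *BSWW Rank *= *)\d+"
)
REPLACE = r"\1\2\3\4{{BSWW World Ranking|\2}}"

# Hypothetical infobox fragment.
before = (
    "{{Infobox national football team\n"
    "| FIFA Trigramme = BRA\n"
    "| Coach = Someone\n"
    "| BSWW Rank = 2\n"
    "}}"
)

after = FIND.sub(REPLACE, before)
print(after)
```

The `[^}]*` shortcut stops working as soon as an intervening parameter holds a template, which is one reason the module approach above is more robust.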
Regex help
I would like to improve a reference in thousands of NRHP infoboxes. This requires finding the value of the refnum parameter (always 8 digits) which must be there (it's a required parameter for the infobox)
- | refnum = 11000523<ref name="nris">{{NRISref|version=2013a}}</ref>
and adding it to the NRISref template that is usually used to create the ref:
- | refnum = 11000523<ref name="nris">{{NRISref|version=2013a|refnum=11000523}}</ref>
Does this seem do-able this way? MB 01:39, 1 September 2021 (UTC)
- @MB: Hi there! You could start with
(\| *refnum *\= *)(\d{8})(\<ref name\="nris"\>\{\{NRISref\|version\=2013a)(\}\}\<\/ref\>)
→$1$2$3|refnum=$2$4
and keep expanding based on any edge cases this doesn't fix. Hope this helps! GoingBatty (talk) 02:28, 1 September 2021 (UTC)
- Well, let me clarify further. The ref may or may not be named. And the transclusion of {{NRISref}} may have other parameters and other values for version=. So to be most general, I want to find the 8-digit refnum and then, if NRISref is present, just add |refnum=nnnnnnnn. Since parameter order doesn't matter, it could be the first parameter so the rest of the line can be ignored:
- {{NRISref |refnum=nnnnnnnn |any other existing parameters}}
- It looks like the above solution would handle the specific example I gave. MB 04:09, 1 September 2021 (UTC)
- @MB: OK - try
(\| *refnum *\= *)(\d{8}) *(\<ref(?: name\=\"?\w+\"?)?\>\{\{NRISref)
→$1$2$3 |refnum=$2
instead. This should work whether the reference is named or not. I'm presuming you'd skip the article if it already has the |refnum= parameter, so it doesn't check if the parameter already exists. Happy editing! GoingBatty (talk) 04:42, 1 September 2021 (UTC)
- That works. I found out that some newer numbers are 9 digits, so I added a 9-digit variation. I found some edge cases where this parameter has a line-feed in the middle - but I just remove it and do a re-parse. Overall, it works fine. Thanks again for the help. MB 20:56, 1 September 2021 (UTC)
- @MB: You can change (\d{8}) to (\d{8,9}) to catch both 8-digit and 9-digit values. GoingBatty (talk) 21:12, 1 September 2021 (UTC)
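A quick way to sanity-check the final pattern from this thread, with the `\d{8,9}` tweak folded in, is a Python sketch like the one below. It assumes Python's `re` agrees with AWB's .NET engine for this pattern. The second sample line (unnamed ref, 9-digit number) is invented.

```python
import re

# Capture the refnum value, then the opening of the ref up to "{{NRISref",
# and re-emit both with |refnum= inserted as the first template parameter.
FIND = re.compile(r'(\| *refnum *= *)(\d{8,9}) *(<ref(?: name="?\w+"?)?>\{\{NRISref)')
REPLACE = r"\1\2\3 |refnum=\2"

named = '| refnum = 11000523<ref name="nris">{{NRISref|version=2013a}}</ref>'
unnamed = "| refnum = 110005234<ref>{{NRISref|version=2010a}}</ref>"  # hypothetical

named_result = FIND.sub(REPLACE, named)
unnamed_result = FIND.sub(REPLACE, unnamed)
print(named_result)
print(unnamed_result)
```

As GoingBatty notes, nothing here checks whether `|refnum=` is already present inside the template, so pages that already have it would need to be skipped.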
External links section
Hello.
I sorted the Wikipedia:AutoWikiBrowser/Regular expression#External links section a bit, but I think there are too many links: that is not helping beginners to test their regexes or learn how to use them.
I have been using AWB for years now, but I only just grasped that it uses .NET regexes, with the help of someone on stackuser.com.
I find it a bit confusing to have links to documentation about Perl and Python (but I don't know to what extent they can be helpful).
A cleanup by someone a bit more tech-savvy than me would be helpful to non-tech-savvy users.
Concerning online regex testing tools, it may be nice to have several to suit everybody's tastes, but I actually think that regex101.com is the best and most used tool at the moment, so it may be the only one to be left, with a note saying that the regex flavour to be used is ECMAScript (JavaScript), because the default one is PCRE and the tokens aren't the same.
Thanks, Şÿℵדαχ₮ɘɼɾ๏ʁ 10:49, 10 September 2021 (UTC)
- I've been advised not to use regex101 as it doesn't support .NET regexes, RegEx Storm seems to be a better option. Şÿℵדαχ₮ɘɼɾ๏ʁ 15:24, 10 September 2021 (UTC)
- @SyntaxTerror: Thanks for your work in this section. Since AWB has a built-in Regex tester, maybe you could add some comments to these links to briefly explain how AWB users might find these other Regex testers more useful. Thanks! GoingBatty (talk) 16:47, 10 September 2021 (UTC)
- @GoingBatty: Actually, I never used the built-in tester.
- I like to use regex101 because of its syntax highlighting, and also because it explains the meaning of the tokens and how the regex actually works point by point.
- Even if it doesn't support the .NET flavour, I think I'm going to continue to use it with ECMAScript, as there are not many differences for the use I have of it.
- Again, I don't really want to edit technical pages on subjects I don't know too much (and also my technical English is far from perfect). Şÿℵדαχ₮ɘɼɾ๏ʁ 17:40, 10 September 2021 (UTC)
How to change to lowercase, etc.
Is there more in-depth documentation somewhere that would let me discover how to change matched strings to lowercase, and things like that? Or can someone tell me what variant of regex is used in AWB and JWB, so I can search for more docs outside? Or just tell me how to do it? Dicklyon (talk) 19:14, 30 January 2022 (UTC)
- If you want to change "Alphabet", "alphabet", and "AlphaBet" all to lowercase, just search for "alphabet" with case sensitive off, and replace with "alphabet". Primefac (talk) 20:46, 30 January 2022 (UTC)
- What I want to do is change "(Any Words)" to "(any words)". The parens establish the context. I've seen docs on things like \L\1 for some kinds of regex, but doesn't work for ours. I made edits like this one with a bunch of patterns in JWB, but then there were more patterns that I had not anticipated. What I'm asking for probably won't quite solve it either, but it will be another tool to work with. Dicklyon (talk) 00:44, 31 January 2022 (UTC)
- @Dicklyon: You could try \((\w)([^\)]+)\) to ({{subst:lc:$1}}$2) . GoingBatty (talk) 01:20, 31 January 2022 (UTC)
- Where can I read about this subst:lc: thing? Dicklyon (talk) 03:44, 31 January 2022 (UTC)
- @Dicklyon: At Help:Magic_words#Formatting, where you can also see that {{lcfirst:string}} might be easier, as in \(([^\)]+)\) to ({{subst:lcfirst:$1}}) if you like. GoingBatty (talk) 04:28, 31 January 2022 (UTC)
- Ah, yes, I see, if I don't mess up and put only one curly brace, that invokes the action after I save. Thanks. Dicklyon (talk) 04:33, 31 January 2022 (UTC)
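To make the mechanics of the suggestion above concrete: the regex itself never changes case, it only wraps the matched text in a magic word that MediaWiki expands on save. The Python sketch below shows the wikitext that the second suggestion would produce. The function name is hypothetical, and Python's `\1` stands in for AWB's `$1`.

```python
import re

# Match any parenthesised run of text and capture its contents.
PARENTHESISED = re.compile(r"\(([^)]+)\)")


def wrap_in_lcfirst(wikitext: str) -> str:
    """Wrap parenthesised text in {{subst:lcfirst:...}} for MediaWiki to expand on save."""
    return PARENTHESISED.sub(r"({{subst:lcfirst:\1}})", wikitext)


print(wrap_in_lcfirst("Foo (Any Words) bar"))
```

Note that `lcfirst` lowercases only the first character, so the saved result of this example would be "(any Words)"; wrapping in `{{subst:lc:...}}` instead lowercases the whole captured string.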
Query help
I'm looking to do an AWB run to change all-caps titles in CS1 references to title case. Would someone here be able to help me construct an appropriate RegEx find and replace? Specifically, I'm looking for a query that finds the content in the |title= parameter within a reference, and if that content contains the string " AND " or " THE " (both with spaces), wraps it in {{subst:title case}} . I'll be manually checking to make sure there aren't false positives or other issues. Thanks for the help! {{u|Sdkb}} talk 21:38, 20 August 2022 (UTC)
- @Sdkb: Hi there! You could try something like:
{{(\s*[Cc]it(?:e|ation))([^}]+)(\|\s*title\s*\=)([\w\s\–\&]+\s(?:AND|THE)\s[\w\s\–\&]+)([\|\}])
→{{$1$2$3{{subst:title case|$4}}$5
- but I don't think this will work because Wikipedia:Substitution reminds us that ref-tags refuse to run "subst:" unless temporarily renamed as "<xref name=xx>" or similar.
- As an example, try manually using {{subst:title case}} on the reference #73 in the Phineas and Ferb article. GoingBatty (talk) 03:47, 21 August 2022 (UTC)
- Thanks for the help, @GoingBatty! Ack, looks like another casualty of phab:T4700. I'll return to it once that ticket is resolved, assuming I'm still alive in the century in which it is taken up lol. Cheers, {{u|Sdkb}} talk 23:30, 21 August 2022 (UTC)
Breaking up lines in the "Find" field?
Is there any way to break up the expression in the "Find" field into separate lines to allow easier visual parsing, without affecting the meaning of the expression? Something like:
th ( is | at )
which will match "this" or "that"? —swpbT • go beyond • bad idea 21:06, 14 December 2021 (UTC)
- You mean, putting that instead of th(is|at) ? No, I don't think so. Primefac (talk) 10:50, 15 December 2021 (UTC)
- Yes, that's what I mean. Obviously it doesn't add any value in my simple example, but some regex strings get quite long, with lots of nested groups. I put in a feature request: [1] —swpbT • go beyond • bad idea 14:41, 15 December 2021 (UTC)
- To editor Primefac: FYI, I've learned that yes, there is a simple way: the "IgnorePatternWhitespace" option can be used inline as so: (?x: pattern in which whitespace will be ignored). A checkbox would be nice, but this works. —swpbT • go beyond • bad idea 13:44, 16 September 2022 (UTC)
- Maybe a somewhat easier improvement would be to at least implement some kind of syntax highlighting/bracket matching for regexp in AWB. ~~~~ User:1234qwer1234qwer4 (talk) 20:29, 31 January 2022 (UTC)
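The whitespace-ignoring option mentioned in this thread has a close counterpart in Python, which makes for an easy demonstration. The sketch below uses Python's `re.VERBOSE` flag as a stand-in for .NET's IgnorePatternWhitespace. Treating the two as equivalent for this simple pattern is an assumption, and the AWB spelling remains the inline `(?x: ... )` form reported above.

```python
import re

COMPACT = re.compile(r"th(is|at)")

# Same pattern with spaces added for readability; VERBOSE makes them insignificant.
SPACED = re.compile(r"th ( is | at )", re.VERBOSE)

# Verbose mode also allows the pattern to span lines and carry comments.
MULTILINE = re.compile(
    r"""
    th          # common prefix
    (
        is      # completes the first word
      | at      # completes the second word
    )
    """,
    re.VERBOSE,
)

for word in ("this", "that", "th is"):
    print(word, [bool(p.fullmatch(word)) for p in (COMPACT, SPACED, MULTILINE)])
```

A literal space that must still match inside such a pattern has to be escaped or written as `[ ]` or `\s`, since bare whitespace is discarded.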
"Token matching" vs "Regular matching" sections
I'm not sure I understand why these two sections are named as they are. The "token matching" section shows some examples of matching within defined wikitext elements. But to my understanding, "token" in regex refers to matched groups that are later referenced by name or number, as described in the "String matching" section of the "Regular expression definitions" table. Maybe the "Token matching" section should be called something like "Matching inside wikitext elements"? And I'm not sure what the "Regular matching" section adds. —swpbT • go beyond • bad idea 13:53, 16 September 2022 (UTC)
Speeding up mass-edit via AWB
Can I apply single edits to multiple pages with one click (my case) or do I have to click start and save for each page individually? And if so, how do AWB/JWB users who mass-edit 1000s of pages do it? Qwerty284651 (talk) 15:18, 31 January 2023 (UTC)
- You have to click save for each page. The only way around that, by design, is to get approval to run a bot. —swpbT • go beyond • bad idea 21:02, 31 January 2023 (UTC)
- To run a pre-made bot or create my own? By approval, I assume, you mean BRFA? Qwerty284651 (talk) 21:14, 31 January 2023 (UTC)
- @Qwerty284651 You have to go through the BRFA process to run a bot, whether it's made by you or someone else. AWB bots automate the clicking of the save button at a rate of one edit every 10 seconds. GoingBatty (talk) 21:59, 31 January 2023 (UTC)
- @GoingBatty, good to know for future reference. Thanks. Qwerty284651 (talk) 22:14, 31 January 2023 (UTC)
- For what it's worth, unless the edit run is going to be >500 pages, you might as well just do it manually. Sure, it's tedious and boring, but a one-time run with a relatively few number of edits will basically be over by the time the bot request trialling is done anyway. Primefac (talk) 08:56, 1 February 2023 (UTC)
- It was a 10-page edit run to clean up redirects after a page move. Luckily, AWB has a Ctrl+S shortcut. If you spam it, it goes through every page. Holding won't do it. But, oh, well, at least it's something and it's faster than a bot's 6 edits/minute speed (according to GoingBatty). Qwerty284651 (talk) 11:33, 1 February 2023 (UTC)
- A 10-page edit run does not really require "spamming" the save button... it's 10 clicks. Primefac (talk) 13:17, 1 February 2023 (UTC)
- It seems we have a different definition of the word "spamming". Qwerty284651 (talk) 13:24, 1 February 2023 (UTC)
Fixing disambiguation links
Primefac (talk) 14:30, 1 February 2023 (UTC)
How to config a bot for an BRFA
Primefac (talk) 15:39, 4 February 2023 (UTC)
AWB regex tester link to help page not working
Whilst checking regex on the AWB regex tester I noticed the link button next to the closing X button and clicked it for more info on regex, since I wasn't getting the expected output, and it redirected me to a dead link: https://learn.microsoft.com/en-us/library/az24scfc.aspx (I looked up archived versions on all the archive websites I could think of, but to no avail; this was a permanent dead link). I am, therefore, requesting the replacement of the aforementioned embedded link with this one: https://archive.is/UfpRD. Qwerty284651 (talk) 19:12, 5 February 2023 (UTC)
- @Qwerty284651: Hi there! I was able to duplicate this issue in AWB. I suggest you create a bug report on Phabricator per Wikipedia talk:AutoWikiBrowser#Before you post. GoingBatty (talk) 03:26, 6 February 2023 (UTC)
- @GoingBatty, could you be so kind enough to do it for me? I don't want to share my credentials to Oath? Qwerty284651 (talk) 01:51, 7 February 2023 (UTC)
- @Qwerty284651: Done! GoingBatty (talk) 03:10, 7 February 2023 (UTC)
- @GoingBatty, thanks. Qwerty284651 (talk) 04:13, 7 February 2023 (UTC)
Mass replacement br with plainlist
I tried replacing <br/> in this table with plainlist to avoid the annoying pause screen reader users come across. This is what I managed to come up with:
\|\|([^[Ff]]\{\{\s*?[Ff]lagicon\s*?\|.*?\}\}|([^\|\|]+)<br />([^[Ff]]\{\{\s*?[Ff]lagicon\s*?\|.*?\}\}|[^\|\|]+)
I replace it with || {{plainlist|* $1 * $2}} , but it returns "unterminated group". How do you make the 2 values on either side of <br/> move to a new line, for example,
{{flagicon|...}} [[...]] ... <br/> [[flagicon|...}} [[...]] ...
-->
{{plainlist| * {{Flagicon|...}} [[...]] ... * {{Flagicon|...}} [[...]] ...}}
Qwerty284651 (talk) 01:10, 13 December 2022 (UTC)
- I feel like there are easier ways to do this, but purely from a "please answer my question" standpoint, you haven't closed your opening ( after the \|\|. I'm not sure if you want it to be before the br or at the end, but that's where you're unbalanced. Also, as a thought, your regex isn't going to catch the first pair of names in the table, because it's only looking at the || separator between columns. Primefac (talk) 11:04, 13 December 2022 (UTC)
- I've been thinking about this: a lot of Wikipedia pages, and the heavy majority of them, I dare say, are polluted with the misuse of <br /> , for it creates annoying pauses in screen readers per WCAG, and the main goal of wiki is to make editing of pages available for everyone, including the visually impaired. Using plainlist as in the above example solves this problem. {{Plainlist}} displays as if <br/> was used on desktop and probably Mac as well, and on mobile as if the lines were separated with a newline. There is a slight difference in spacing. I want to request an RFC to solve this problem across wiki. What are your thoughts on this? Qwerty284651 (talk) 04:37, 7 February 2023 (UTC)
- You're going to get a fair amount of resistance if you are intending on replacing all uses of br with {{plainlist}}. If you want my honest opinion on your initial query and proposal, turning a one-line table row entry into a multi-line monstrosity by using plainlist is a terrible idea from a coding perspective, because it's going to be a nightmare to delineate the table rows from the plainlist rows. Primefac (talk) 10:05, 7 February 2023 (UTC)
- Thanks for the feedback, Prime. Qwerty284651 (talk) 12:10, 7 February 2023 (UTC)
- What is a better alternative for <br/> ? Qwerty284651 (talk) 19:49, 7 February 2023 (UTC)
- I tend to defer to folks like Redrose64 to answer questions like those. Primefac (talk) 19:56, 7 February 2023 (UTC)
- @Redrose64, what is, in your opinion, a better alternative to <br/> , the dreaded pause maker in screen readers, so wiki pages, where applicable, become more WCAG-friendly? Qwerty284651 (talk) 05:03, 8 February 2023 (UTC)
How to change Uppercase/lowercase?
What would be the regex to change "uppercase letter" into "lowercase same-letter" &tc? (P→p)? DePiep (talk) 10:40, 19 February 2023 (UTC)
- Some flavours of regexp support \L in the output string – (.)→\L$1 changes A→a – but I don't think that works in AWB. One solution is to change to {{subst:lc:$1}} and let MediaWiki do the work, per Help:Magic words#Formatting. Certes (talk) 12:13, 19 February 2023 (UTC)
- OK, indeed \L$1\E does not work here. But subst is smart :-) thx DePiep (talk) 12:17, 19 February 2023 (UTC)
I need a regular expression for this
The original text is 15. 12. 1983, but I want it to read 15.12.1983. In AWB's advanced settings I am putting (\d{1,2}.\s\d{1,2}.\s\d{4}) in the find box and (\d{1,2}.\d{1,2}.\d{4}) in the replace box, but it's not working.--Tmamatha (talk) 07:35, 22 June 2023 (UTC)
- Your example is trying to replace one group with another, which will not work. In other words, you need more groups. To tweak your example, it would be (\d{1,2}\.)\s(\d{1,2}\.)\s(\d{4}) replaced with $1$2$3 . Note also that I'm being a bit pedantic and using \. instead of just . so that it does not match any character (since . is "any character" in regex). Primefac (talk) 08:54, 22 June 2023 (UTC)
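The answer above can be checked with a few lines of Python. This is a sketch, with `\1\2\3` as Python's spelling of `$1$2$3`, and it assumes the two engines treat this pattern the same way. It also shows why the escaped dot matters.

```python
import re

# Three groups: day with its dot, month with its dot, and the year.
# The whitespace between them sits outside the groups, so it is dropped.
SPACED_DATE = re.compile(r"(\d{1,2}\.)\s(\d{1,2}\.)\s(\d{4})")


def tighten_date(text: str) -> str:
    """Remove the spaces inside dates written like '15. 12. 1983'."""
    return SPACED_DATE.sub(r"\1\2\3", text)


print(tighten_date("Born 15. 12. 1983 in Prague"))

# With an unescaped dot, any character is accepted where the full stop should be.
LOOSE = re.compile(r"\d{1,2}.\s\d{1,2}.\s\d{4}")
print(bool(LOOSE.fullmatch("15x 12y 1983")))
```

The replace box holds a template built from group references, not another pattern, which is why the original attempt with `\d{1,2}` in the replacement could not work.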
Recursive subgroups?
A call for future editors to restore and expand the section below when more info is available:
Recursive subgroups
\[\[(Image:[^][|]+)\|([^][]*(\[\[[^][]+\]\][^][]*)*)\]\]
Qwerty284651 (talk) 21:51, 31 January 2023 (UTC)
- @Qwerty284651: What's the context? Mathematically, being a subgroup is transitive (A<B & B<C ⇒ A<C), so no recursion is necessary. Certes (talk) 13:00, 3 February 2023 (UTC)
- @Certes, this is the context: a query for input on the AWB:Regex talk page. An editor and I couldn't come to an agreement on whether to keep this section or not. They claimed the example provided added no value to the article and removed it; I reverted that, but they removed it again. So, I filed a query asking if anyone could elaborate on what recursive subgroups are. You mentioned recursion, which reminded me of said matter, and now we are here. Qwerty284651 (talk) 13:15, 3 February 2023 (UTC)
- The only reference I can find to "recursive subgroups" in a regular expression context is https://www.itworkman.com/php-pcre-regular-expression-annotation-and-recursive-mode/. (That page is aimed at PHP coders but should apply to other PCRE implementations.) In that sense, recursive subgroups is a relatively new and experimental feature which is neither used nor needed in the example you link. The example given in the PHP page matches an arbitrary depth of nested parentheses (like (this)). (In Lua we'd use %b for that, but recursive subgroups are more flexible and can do things Lua can't.) Of course, the term may have other meanings of which I'm unaware. Certes (talk) 13:36, 3 February 2023 (UTC)
- Now that's an in-depth explanation I was looking for. Qwerty284651 (talk) 13:46, 3 February 2023 (UTC)
- I don't know if this adds to or subtracts from the conversation, but the example cited by Qwerty284651 (I haven't tested it) seems to attempt to solve the problem of matching a construct like
[[Image:Nap.jpg|Picture of [[Napoleon]]]]
without stopping at the first ]] in a way that uses basic industry-wide regular expression syntax. The idea would also work for
{{Cite xxx|ref={{harvid|x|1900}}}}
(a particular challenge of mine). As AWB uses the Microsoft RE engine, you'd be using the MS RE dialect. Somewhere in the AWB code (I forget where) it uses the Balancing Group Definitions (halfway down that page). I used to work for Microsoft and every time I try to understand that section I feel more stupid. But, again, I haven't tested the cited example which uses nested )* constructs. David Brooks (talk) 18:13, 3 February 2023 (UTC)
- Adding: if you have the source, WikiFunctions\Parse\UnbalancedBrackets.cs has a bunch of them, and they are fairly easy to modify without fully understanding them. David Brooks (talk) 02:26, 4 February 2023 (UTC)
- @Certes @DavidBrooks, I moved the "recursive subgroups" part of the discussion here. Qwerty284651 (talk) 16:53, 4 February 2023 (UTC)
- Beware that {{...}} is particularly tricky to match because templates can have triple braces around parameters. {{T1|{{T2|{{T3}}}}}} and {{{P1|{{{P2}}}}}} both have six closing braces, but the first is three }} template ends and the second is two }}} parameter ends. Certes (talk) 17:44, 4 February 2023 (UTC)
- Isn't the first example three nested templates, whereas the second example is a couple of positional parameters? Qwerty284651 (talk) 04:25, 7 February 2023 (UTC)
- Yes, and a regexp presented with six closing braces will have to work that out. Pairing single braces usually works, but they occasionally pop up unmatched in unexpected places. Certes (talk) 12:44, 7 February 2023 (UTC)
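- For illustration only: the nesting problem discussed above can be sketched outside any regex engine by counting depth, which is roughly the bookkeeping that balancing groups or recursion do inside one. The Python below is a hypothetical helper (find_balanced is an invented name, not AWB code, and it is not the .NET balancing-group syntax). It first shows a plain non-greedy regex stopping at the first ]], then the depth counter finding the real end, and finally the triple-brace trap: pairing only {{ with }} handles the nested templates but stops two braces short on the nested parameters.

```python
import re

def find_balanced(text, start, open_tok, close_tok):
    """Return the index just past the close token that balances the
    open token at `start`, or -1 if there is no balanced match."""
    if not text.startswith(open_tok, start):
        return -1
    depth = 0
    i = start
    while i < len(text):
        if text.startswith(open_tok, i):
            depth += 1                 # one level deeper
            i += len(open_tok)
        elif text.startswith(close_tok, i):
            depth -= 1                 # one level closed
            i += len(close_tok)
            if depth == 0:
                return i               # outermost construct is closed
        else:
            i += 1
    return -1                          # ran out of text while still nested

link = "See [[Image:Nap.jpg|Picture of [[Napoleon]]]] here"
start = link.index("[[")

# A non-greedy regex stops at the first "]]" and loses the outer close.
naive = re.search(r"\[\[.*?\]\]", link).group(0)
# Depth counting keeps going until the outer "[[" is balanced.
balanced = link[start:find_balanced(link, start, "[[", "]]")]

# Both strings end in six closing braces, but they pair up differently.
templates = "{{T1|{{T2|{{T3}}}}}}"     # three "}}" template ends
params = "{{{P1|{{{P2}}}}}}"           # two "}}}" parameter ends
templates_end = find_balanced(templates, 0, "{{", "}}")
short = params[:find_balanced(params, 0, "{{", "}}")]

print(naive)      # [[Image:Nap.jpg|Picture of [[Napoleon]]
print(balanced)   # [[Image:Nap.jpg|Picture of [[Napoleon]]]]
print(templates_end == len(templates))   # True
print(short)      # {{{P1|{{{P2}}}}  (two braces short)
```

Treating {{ and }} as the only tokens is the simplifying assumption here; a matcher that must also cope with {{{...}}} parameters has to decide how to split each run of braces, which this sketch deliberately does not attempt.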