Implement generic MW-API endpoints to replace the math endpoints of restbase
Open, Needs Triage, Public

Assigned To

None

Authored By

	Physikerwelt
	Jul 11 2024, 11:59 AM

Description

Currently, https://wikimedia.org/api/rest_v1/media/math/ lists the endpoints described below.

Figure out if it is possible to redirect requests made to https://wikimedia.org/api/rest_v1/media/math/ to the new MW REST endpoints via a proxy, so that the change is transparent to the user. @daniel

check/{type}

Example: curl -v -d 'q=E=mc^2' https://wikimedia.org/api/rest_v1/media/math/check/tex

returns {"success":true,"checked":"E=mc^{2}","requiredPackages":[],"identifiers":["E","m","c"],"endsWithDot":false} with the header x-resource-location: 4c0004393a88f350a93bcef62106d556c7fc827b

https://github.com/wikimedia/mediawiki-extensions-Math/blob/master/src/InputCheck/MathoidChecker.php is the implementation; it gets the respective information from Mathoid, backed by a WAN cache.
- Check if the cache key is exactly the same as it used to be and determine if it needs to be exactly the same.
- Implement tests for successful and failing examples
formula/{hash}

Example: curl https://wikimedia.org/api/rest_v1/media/math/formula/4c0004393a88f350a93bcef62106d556c7fc827b

returns {"q":"E=mc^{2}","type":"tex"}, so it can be extracted from the same WANCache.

- Figure out if this endpoint is used

render/{format}/{hash}

Example: https://wikimedia.org/api/rest_v1/media/math/render/mml/4c0004393a88f350a93bcef62106d556c7fc827b returns

<math xmlns="http://www.w3.org/1998/Math/MathML" display="block" alttext="E=mc^{2}">
  <semantics>
    <mrow>
      <mi>E</mi>
      <mo>=</mo>
      <mi>m</mi>
      <msup>
        <mi>c</mi>
        <mrow class="MJX-TeXAtom-ORD">
          <mn>2</mn>
        </mrow>
      </msup>
    </mrow>
    <annotation encoding="application/x-tex">E=mc^{2}</annotation>
  </semantics>
</math>

- Refactor https://github.com/wikimedia/mediawiki-extensions-Math/blob/master/src/SpecialMathShowImage.php to get the content from the hash independently of the special page
- Develop the endpoint based on the to-be-created special page backend

Related Objects

Mentioned In: T271001: Transition to MathML rendering as default

Event Timeline

Physikerwelt created this task. Jul 11 2024, 11:59 AM

Restricted Application added a subscriber: Aklapper. Jul 11 2024, 11:59 AM

Physikerwelt updated the task description. Jul 11 2024, 12:00 PM

daniel added a project: RESTBase Sunsetting. Jul 11 2024, 4:09 PM

Physikerwelt merged a task: T255807: Implement formula endpoint using the MW:Rest-API. Jul 12 2024, 5:48 PM

Physikerwelt added subscribers: SalixAlba, NSoiffer, Debenben and 3 others.

Physikerwelt merged a task: T255808: Implement render endpoint using the MW:Rest-API. Jul 12 2024, 5:50 PM

Physikerwelt added subscribers: eprodromou, Lucas_Werkmeister_WMDE.

Krinkle mentioned this in T271001: Transition to MathML rendering as default. Aug 27 2024, 4:03 PM

Yesterday we discussed the problems that arise when check can't store the rendered formula indefinitely. In that case, any requests that try to retrieve the rendered formula based on the hash will fail.

How about, instead of a hash, we use gzdeflate and base64_encode to encode the normalized formula in the "hash"? This will break URLs when the formula gets too big, but anything up to 800 or so bytes compressed and encoded should work. Larger formulas would fail, but those should be very rare.
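To make the proposal concrete, here is a minimal sketch in Python rather than PHP. It assumes raw DEFLATE (the stream format that PHP's gzdeflate emits) and, as a further assumption not stated above, the URL-safe base64 alphabet without padding, since the standard alphabet contains "+" and "/". The function names are hypothetical.

```python
import base64
import zlib


def encode_formula(tex: str) -> str:
    """Compress the normalized TeX and return a URL-safe token (hypothetical helper)."""
    # wbits=-15 selects a raw DEFLATE stream without a zlib header or checksum.
    compressor = zlib.compressobj(level=9, wbits=-15)
    raw = compressor.compress(tex.encode("utf-8")) + compressor.flush()
    # Assumption: URL-safe alphabet with the "=" padding stripped.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def decode_formula(token: str) -> str:
    """Reverse encode_formula; raises on malformed or truncated input."""
    padded = token + "=" * (-len(token) % 4)  # restore the stripped padding
    raw = base64.urlsafe_b64decode(padded)
    return zlib.decompress(raw, wbits=-15).decode("utf-8")


if __name__ == "__main__":
    token = encode_formula("E=mc^{2}")
    print(token, len(token))
    print(decode_formula(token))
```

For very short formulas the token can be longer than the input, so the encoding is mainly about making the URL self-describing, not about saving space.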

In T369809#10098732, @daniel wrote:

Yesterday we discussed the problems that arise when check can't store the rendered formula indefinitely. In that case, any requests that try to retrieve the rendered formula based on the hash will fail.

When we replaced the database backend with the cache, I investigated that one could use a special type of cache (DB) that stores the data infinitely. See https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Math/+/975432 . Thus, I don't fully understand the problem.

How about, instead of a hash, we use gzdeflate and base64_encode to encode the normalized formula in the "hash"? This will break URLs when the formula gets too big, but anything up to 800 or so bytes compressed and encoded should work. Larger formulas would fail, but those should be very rare.

We could do that, but we would need to define and implement the edge cases:

- On the server side, there need to be 414 responses if the client sends longer requests than we want to process. Maybe this is already handled by the MW REST API code, but it would need to be checked.
- We would need to run the check again on the second request.
- We would need to identify when we receive incomplete data.
- I am unsure if we can/should define a limit independent of the browser. According to StackOverflow, 2000 might be a good value, but maybe it's best to use the same limit as defined on the server side (1).
- We would need to add a new error message. Most examples will be missing closing math tags, e.g. <math> formula1 some long wikitext <math>formula2</math>
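The length check and the incomplete-data check could look roughly like the sketch below. It is written in Python for illustration and is not MW REST API code. The function name, the limit value and the choice of 400 for malformed tokens are all assumptions; the token format assumed is raw DEFLATE plus URL-safe base64 without padding.

```python
import base64
import binascii
import zlib
from typing import Tuple

MAX_TOKEN_LENGTH = 1000  # placeholder; the actual limit is still to be decided


def resolve_token(token: str) -> Tuple[int, str]:
    """Return (HTTP status, formula or error message) for an encoded formula."""
    if len(token) > MAX_TOKEN_LENGTH:
        return 414, "encoded formula too long"
    try:
        padded = token + "=" * (-len(token) % 4)
        raw = base64.b64decode(padded, altchars=b"-_", validate=True)
        decompressor = zlib.decompressobj(wbits=-15)  # raw DEFLATE
        data = decompressor.decompress(raw)
        # A cut-off stream decompresses without error but never reaches
        # its end-of-stream marker, so eof stays False.
        if not decompressor.eof:
            raise ValueError("deflate stream ended early")
        tex = data.decode("utf-8")
    except (binascii.Error, zlib.error, ValueError):
        return 400, "encoded formula is truncated or malformed"
    return 200, tex
```

A truncated token is rejected either at the base64 stage (invalid length) or at the DEFLATE stage (missing end of stream), which is what makes incomplete data detectable at all.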

In T369809#10108051, @Physikerwelt wrote:

When we replaced the database backend with the cache, I investigated that one could use a special type of cache (DB) that stores the data infinitely. See https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Math/+/975432 . Thus, I don't fully understand the problem.

Setting up a database for permanent persistence adds operational overhead. For a permanent solution, that would be justified. For temporary backwards-compatibility, I don't think it would be.

We could do that, but we would need to define and implement the edge cases:

On the server side, there need to be 414 responses if the client sends longer requests than we want to process. Maybe this is already handled by the MW REST API code, but it would need to be checked.

We would need to run the check again on the second request.

We would need to identify when we receive incomplete data.

I am unsure if we can/should define a limit independent of the browser. According to StackOverflow, 2000 might be a good value, but maybe it's best to use the same limit as defined on the server side (1).

We would need to add a new error message. Most examples will be missing closing math tags, e.g. <math> formula1 some long wikitext <math>formula2</math>

That all seems pretty solvable. Encoding the formula (gzdeflate+base64) doesn't save much space (maybe 20%), but it would make it less likely that people just "send their own", and it would make it obvious when the data was truncated. If we want to be extra sure, we can add a checksum or even an HMAC (it doesn't need to be full length, since we don't need strong security).

As to running the check again - we only need to do that on a cache miss. The encoded version of the formula still serves as a cache key.

Since we have to account for the size of other parts of the URL, I'd go for a limit of 1000 bytes encoded, or even less.
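As a rough illustration of the truncated HMAC and the check-on-cache-miss idea, here is a Python sketch. The secret, the tag length, the "." separator and the function names are hypothetical; a plain dict stands in for the real cache, and `run_check` stands in for whatever performs the check.

```python
import hashlib
import hmac
from typing import Callable, Dict, Optional

SECRET_KEY = b"hypothetical-server-secret"  # assumption: some server-side secret
TAG_LENGTH = 8  # hex characters; deliberately shorter than a full HMAC


def _tag(token: str) -> str:
    digest = hmac.new(SECRET_KEY, token.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:TAG_LENGTH]


def sign_token(token: str) -> str:
    """Append a truncated HMAC to the encoded formula."""
    return f"{token}.{_tag(token)}"


def verify_token(signed: str) -> Optional[str]:
    """Return the encoded formula if the tag matches, else None."""
    token, sep, tag = signed.rpartition(".")
    if not sep:
        return None
    ok = hmac.compare_digest(tag.encode("utf-8"), _tag(token).encode("utf-8"))
    return token if ok else None


def lookup(signed: str, cache: Dict[str, dict], run_check: Callable[[str], dict]) -> dict:
    """Use the encoded formula as cache key; run the check only on a miss."""
    token = verify_token(signed)
    if token is None:
        raise ValueError("token was truncated or not issued by this server")
    if token not in cache:
        cache[token] = run_check(token)
    return cache[token]
```

Because "." is not part of the URL-safe base64 alphabet, it can separate the tag from the encoded formula without ambiguity.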

Amire80 subscribed. Mon, Oct 14, 8:28 PM

Physikerwelt moved this task from Inbox to SVG-Only on the Math board. Mon, Nov 4, 9:09 PM
