Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add PrefixMappingEntityIdParser #146

Merged
merged 2 commits into from
Nov 3, 2016
Merged

Conversation

manicki
Copy link
Member

@manicki manicki commented Sep 20, 2016

The parser can be used to prefix IDs of foreign entities with a repository name, and resolve foreign repository prefixes to local prefixes based on the prefix mapping.

This depends on wmde/WikibaseDataModel#679 and tests will be failing until that PR is merged.

Also I just realized that his repository might not be a right location for the new parser. As it relies on a feature to be added in DataModel 6.2 (or so), having the parser here would break DataModelServices compatibility with any earlier DataModel versions. I guess we want to keep this broad compatibility. Should I rather put this parser in the DataModel repo?

Ticket: T146274

@brightbyte
Copy link

Not sure we need to broad compat. If we do, we can check for EntityId::splitSerialization existing, and do nothing if it doesn't.

* @param string $prefix Prefix to be added
* @param EntityIdParser $idParser
*/
public function __construct( $prefix, EntityIdParser $idParser ) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps use Assert::propertyType( 'string', $prefix, '$prefix' ) to make sure $prefix is a string, and not a damp sock ;)

Copy link
Contributor

@JeroenDeDauw JeroenDeDauw Sep 20, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if ( !is_string( $prefix ) ) {
    throw new InvalidArgumentException();
}

Such ancient PHP tho!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

public function parse($idSerialization) {
// Do not add prefix if $idSerialization already starts with it
if ( strncmp( $idSerialization, $this->prefix, strlen( $this->prefix ) ) !== 0 ) {
$idSerialization = $this->prefix . $idSerialization;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This requires the prefix to contain a trailing ":". If we require this, then it should be checked by the constructor. It seems nicer to use EntityId::joinSerialization( [ $this->prefix, $idSerialization ] ).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was about to comment on this too. Reading the description again, it seems like the intent here is to be a generic prefixing parser that does not know about the meaning of different parts of IDs. If that where not the case, the class name should also change from something technical to using the correct domain term. As long as it's either way, which it currently is, it's fine by me.

In case we'd want a service specific to foreign repository "prefixes" [0], I'd disagree with your suggestion to use EntityId::joinSerialization. Making a static call to code in another component instead of a much simpler single line of code that just does two string concats is a very poor tradeoff IMO. I'd be different if the format was likely to change.

[0] In the PR to EntityId I can't find how these things are called. "Prefixes" is an accurate technical term, though perhaps not the best choice of naming for something with more specific meaning in our domain.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. I thought of this parser as a kind of generic parser that adds some prefix and not something foreign-id-specific. If we follow this path the code using the parser must ensure that the colon is present in a prefix passed to the parser.
This might be a more generic approach but on the other hand I believe the only use of such parser we now can think of would be adding a foreign repo prefix, wouldn't it? So it indeed seems reasonable to simply have ForeignEntitiyIdPrefixingParser (better name needed).
In the latter case I would also prefer not to hard code the colon in this parser. That said I agree with @JeroenDeDauw and thinking of having a static call just to avoid simple a string concat makes me pull out my hair.

In the PR to EntityId I can't find how these things are called. "Prefixes" is an accurate technical term, though perhaps not the best choice of naming for something with more specific meaning in our domain.

At some point we have been leaning towards calling these "foreign repository names", so something along this lines could probably be part of the name of the parser instead of "prefix".

* @param string $idSerialization
* @return EntityId
*/
public function parse($idSerialization) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spaces

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bah, forgot it is not using MW PHP CS config. Fixed

*/
public function testParse_invalidInput( $prefix, $input ) {
$parser = new PrefixingEntityIdParser( $prefix, new BasicEntityIdParser() );
$this->setExpectedException( EntityIdParsingException::class );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Travis is still running the tests with 5.3 and 5.4. Time to bump to 5.5+... either as a separate PR or in here, whichever is easier

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. I have even prepared a branch doing this but then realized #145 is already doing this. I wanted to refer to that PR in the description of this one but had obviously forgotten.

public function testParse_invalidInput( $prefix, $input ) {
$parser = new PrefixingEntityIdParser( $prefix, new BasicEntityIdParser() );
$this->setExpectedException( EntityIdParsingException::class );
$parser->parse( $input );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this type of dataProvider usage pays for itself. How about having two methods:

  • testGivenInvalidId_exceptionIsThrown
  • testGivenEmptyId_exceptionIsThrown

And now I wrote this I realized that actually you are testing the decorated parser, BasicEntityIdParser, which is real production code. The one valid thing being tested here is that exceptions thrown by the decorated parser are not somehow caught. You only need one test case for that. And you can easily avoid binding to BasicEntityIdParser by using a test double.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to a single test, and switched to mock BasicEntityIdParser

@thiemowmde
Copy link
Contributor

This is a 1:1 copy of the SuffixEntityIdParser we already have in this repository. How did that happen?

@manicki
Copy link
Member Author

manicki commented Sep 21, 2016

@thiemowmde apart from the fact it does the opposite than SuffixEntityIdParser :)

* @return EntityId
*/
public function parse($idSerialization) {
// Do not add prefix if $idSerialization already starts with it
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, why does a parser add something that was not given in the input? This feels wrong. This means: All inputs with a prefix not supported by this parser will "successfully" be parsed. Or in other words: This parser assumes all inputs either start with the expected or none prefix. We would need some kind of PrefixValidator to make sure this is always the case. I don't think this is the best approach.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you describe is not exactly what this parser was intended to do. Particularly this:

This parser assumes all inputs either start with the expected or none prefix.

Could be that the class description is not clear enough. Or it should be called ...ParserWrapper or so.

Idea here is to add a defined prefix (if needed) and parse it using the parser passed in the constructor.

Copy link
Contributor

@JeroenDeDauw JeroenDeDauw Sep 21, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

confused-cat-meme-2

Perhaps it's me being all the sleepies, but I don't understand what you mean.

Edit: oh, @manicki already commented now

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intention is this: we read JSON coming from repo foo. To avoid conflicts with local IDs, all IDs that are coming from foo get the "foo:" prefix. They are then passed on to the regular ID parser.

@thiemowmde
Copy link
Contributor

thiemowmde commented Sep 21, 2016

Oh my. Then I suggest to find a better name for this. What it does is:

  • Assume all inputs:
    • either start with the expected prefix
    • or are valid suffixes that are still valid when the prefix is added. Note that there is no code that makes sure the inputs are valid suffixes. I wonder where and how this will be guaranteed (see my other comment).
  • Then it checks if the prefix is already there
  • and adds the prefix if it was not there.
  • Then it calls a parser.

This is not a parser. Better remove the inner parse call and rename the class and the method to reflect better what it actually does.

return array(
array( 'wikidata:', 'Q1337', new ItemId( 'wikidata:Q1337' ) ),
array( 'foo:', 'P123', new PropertyId( 'foo:P123' ) ),
array( 'foo:', 'bar:Q1337', new ItemId( 'foo:bar:Q1337' ) ),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the test case that scares me the most. Let's say I do have two localized Wikibase repositories, "debase" and "enbase". How can code ever assume a prefixed EntityId like "enbase:Q1" is not prefixed, and prefix it with "debase:"? What does "debase:enbase:Q1" mean then? What if the inner EntityId parser accepts "enbase:Q1"? What will happen then? Some kind of remote validation? Possibly going in a loop? What's the difference between "debase:enbase:Q1" and "enbase:debase:Q1"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference between "debase:enbase:Q1" and "enbase:debase:Q1"?

The meaning of "enbase" on "debase" and the other way round... I think... That should be something that is well understood before implementing such IDs, which is not happening in this PR, which is dealing with them after they already have been implemented.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume we are talking about the same.

This code allows prefixed EntityIDs with multiple prefixes. As far as I remember we never talked about this. I strongly suggest to not allow this anywhere in any code as long as we don't really need this, and have a specification for this that covers questions like the ones I raised above.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thiemowmde I believe questions you ask are a bit beyond the scope of what this "parser" (which might not end up being call a parser any more) is supposed to do.
Your questions rather seem to me to be about the idea of "foreign entity ids" in general. It is definitely good that you raised them.
I don't know if I am the best person to try to answer them, and if you want to discuss them here, but to pick few of them:

How can code ever assume a prefixed EntityId like "enbase:Q1" is not prefixed, and prefix it with "debase:"?

Given the current (I believe unwritten) specification of foreign entity IDs, the ID could contain multiple parts separated by a colon. Left-most of them is the "foreign repository name", while all other is the ID coming from that foreign repo. In order to resolve the ID (ie. to fetch the entity the ID is referring to) there will be some lookup mechanism which will talk to the foreign repo.
So if I had a local Wikibase instance, and two foreign repos configured, using names "enbase" and "debase", when I see an entity ID like "enbase:Q1" or "enbase:debase:Q1" what I would know is that these entities are coming from the repo I call "enbase". I don't really need to know what "Q1" and "debase:Q1" stand for in this case. When resolving "enbase:debase:Q1" requires enbase repo to do lookup in another foreign repo (which enbase labelled as "debase") then fine, this will happen. My local repo will not really know about that.

So basically, as @JeroenDeDauw already said while I was typing, this "parser" does not know specifics of implementation of "foreign ids", this is done in other places.

Checking if the prefix is already in place would avoid having things like "enbase:enbase:Q1" but you're right when multiple prefixes are allowed, some "loops" on further "positions" might be possible, and something should/could be done about them. Still I don't think it is to be concern of this simple prefixing "parser".

Also, BTW, I don't know what you mean by "localized" in

Let's say I do have two localized Wikibase repositories, "debase" and "enbase".

How Wikibase instance could be localized?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added an explanation about "chained" foreign IDs in https://phabricator.wikimedia.org/T133381. Basically, they work like interwiki prefixes, which can look like de:wikibooks:fr:wikinews:commons.

@manicki manicki force-pushed the prefixing-entity-id-parser branch from e258458 to 7d1142a Compare September 21, 2016 09:28
@brightbyte
Copy link

For the record: I think this is a "parser" by virtue of implementing a parser interface. It's a parser that causes the resulting EntityId to have a specific prefix. We could add "Proxy" or "Wrapper" to the end of the name, but "EntityIdParser" should somewhere be in it, too.

public function parse( $idSerialization ) {
// Do not add prefix if $idSerialization already starts with it
if ( strncmp( $idSerialization, $this->prefix, strlen( $this->prefix ) ) !== 0 ) {
$idSerialization = $this->prefix . $idSerialization;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use EntityId::joinSerialization() to attach the prefix.

private $idParser;

/**
* @param string $prefix Prefix to be added

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should make clear whether the prefix given here should or should not include the ":" at the end. With the current code, it does, but I think it should not.

@thiemowmde
Copy link
Contributor

thiemowmde commented Sep 21, 2016

Your questions rather seem to me to be about the idea of "foreign entity ids" in general.

Not sure what you mean.

This class makes assumptions that are not backed by anything.

It should never ever happen that some code gets an arbitrary string that may or may not be prefixed. What's the use case of this? Either it is prefixed and we know that, or it is not. Edit: Or we have a list of known prefixes, fed into an actual parser that parses remaining entity IDs that do not start with a known prefix as if they start with the local prefix. Which I believe is what this class is meant to do, but does not do, because it does not know anything about other prefixes.

This code is fragile.

This code scares me, and I will most probably try to work around it with something that gives more guarantees.

The intention is this: we read JSON coming from repo foo. To avoid conflicts with local IDs, all IDs that are coming from foo get the "foo:" prefix.

Fine. Why not remove the if from this code then, and always add the prefix? Which most probably means we will not need this class, when literally all it does is a trivial string concatenation. Currently there is not a single validity check or guarantee in here that justifies the complexity of all this.

Oh, here is more:

  • Let's say an entity type allows colons in entity IDs. The if in this class will not add the prefix when an entity ID starts with a substring that happens to be identical to the prefix. This can easily be avoided by always adding the prefix.
  • Let's say we are on repo "a:", and we get an ID "a:Q1" from repo "b:", but repo "b:" does not target our repo "a:" with this prefix, but some other repo. The ID should be resolved as b:a:Q1, which means we will as repo "b:" for resolving "a:Q1", which may (or may not) be resolved by ourself (or an other repo that is known as "a:" to "b:". Currently all this is skipped and the code assumes prefixes are global.

*/
public function parse( $idSerialization ) {
// Do not add prefix if $idSerialization already starts with it
if ( strncmp( $idSerialization, $this->prefix, strlen( $this->prefix ) ) !== 0 ) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I completely missed this, and I think this is the cause for Thiemo's criticism: this is wrong. The prefix must ALWAYS by applied. If the repo we call "foo" sends us an Id with the prefix "foo" we can not assume that that's the same foo. We must add the prefix anyway, resulting in "foo:foo:Q123", meaning "Q123 in the repo that the repo we call foo calls foo". Confusing, but consistent :)

@manicki manicki force-pushed the prefixing-entity-id-parser branch from 7d1142a to bec0161 Compare September 21, 2016 12:46
@manicki manicki changed the title Add PrefixingEntityIdParser Add PrefixMappingEntityIdParser Sep 21, 2016
@manicki
Copy link
Member Author

manicki commented Sep 21, 2016

Thanks @thiemowmde @brightbyte @JeroenDeDauw for the review so far.

After talking to @brightbyte today (in person, the outcome of this I've put in https://phabricator.wikimedia.org/T146274) I've expanded what I am trying to add in this PR.
Apart from adding a prefix, this "parser" is also resolving prefixes.
So if for instance local repo knows that foo:bar: prefix is what this repo prefixes with awesome:, the parser will return the instance of EntityId prefixed with awesome:.

I've changed the description and title of the PR, and did some suggested changes:

  • prefix is always added. Indeed foo in one repo does not have to be the same in another repo. Sorry @thiemowmde that it took me few rounds of comments to get what you meant. You were obviously right
  • the parser now does not want any colons (especially trailing colons) in the prefix. It also adds a prefix using EntityId::joinSerialization.
  • open question is still the name of the new class. One thing is should this be called Parser? I agree with @brightbyte that it should s it implements the Parser interface. Should this then be prefixMappingProxyEntityIdParser? Names start to be dangerously long :) And as this is now not expecting any random prefix but the repo name, should this be also reflected in a name, as @JeroenDeDauw suggested? Any thoughts or name suggestions?

Per @thiemowmde's comment:

Let's say an entity type allows colons in entity IDs. The if in this class will not add the prefix when an entity ID starts with a substring that happens to be identical to the prefix. This can easily be avoided by always adding the prefix.

I believe we actually want to disallow using colons in entity IDs. This should be definitely made clear in EntityId docs!

@brightbyte
Copy link

For the record: Entity IDs cannot contain colons. We should add this this to the docs somewhere. IDs with colons would not only cause confusion in the context of repo prefixes for foreign IDs, it would also be confusing the the context of RDF prefixes and MediaWiki namespaces. So, please, no colons in IDs :)

Copy link

@brightbyte brightbyte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-1 only for the mangled name of the test class. Everything else are just suggestions.

* @return EntityId
*/
public function parse( $idSerialization ) {
$prefixedIdSerialization = EntityId::joinSerialization( array( $this->prefix, '', $idSerialization ) );

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest to do this in an else-branched below, only for unmapped prefixes.

public function parse( $idSerialization ) {
$prefixedIdSerialization = EntityId::joinSerialization( array( $this->prefix, '', $idSerialization ) );
$serializationParts = EntityId::splitSerialization( $idSerialization );
$serializationRepoName = $serializationParts[0];

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$foreignPrefix? Maybe worth a comment: "a prefix used by the foreign repo"

*
* @license GPL-2.0+
*/
class PrefMappixingEntityIdParserTest extends \PHPUnit_Framework_TestCase {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like "Mappixing", but the name of the test class should be consistent with the class under test :)

$prefixedIdSerialization = EntityId::joinSerialization( array( $this->prefix, '', $idSerialization ) );
$serializationParts = EntityId::splitSerialization( $idSerialization );
$serializationRepoName = $serializationParts[0];
if ( array_key_exists( $serializationRepoName, $this->prefixMapping ) ) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally prefer isset( $this->prefixMapping[$serializationRepoName] ) for readability. Note that these behave slightly differently: isset() will return falls if the key exists but the value is null, while array_key_exists() will return true in that case. That difference isn't relevant here, though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that isset reads better. While writing this I assumed that this prefix mapping will be coming from some reader class that would take care of validation/sanitization of mappings, so this code would not need to worry if mappings are sane (ie. if we expect values to be strings we will be sure we have string at this point). Thus here we could only check if the key exists and take its value if so. But wait, doesn't it mean that isset fits well here too :) Ahh, switched to isset then!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is not really a difference between isset and array_key_exists: http://maettig.com/1397246220. I do prefer array_key_exists when an array may contain null values. isset may fail then.

Copy link
Contributor

@JeroenDeDauw JeroenDeDauw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR has gotten an impressive number of comments, about one for each 4 lines by now :)

class PrefMappixingEntityIdParserTest extends \PHPUnit_Framework_TestCase {

private function getRegularParser( $input, EntityId $returnValue ) {
$regularParser = $this->getMock( BasicEntityIdParser::class );
Copy link
Contributor

@JeroenDeDauw JeroenDeDauw Sep 22, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use the interface?

Also applies to the tests below

$regularParser = $this->getMock( BasicEntityIdParser::class );
$regularParser->expects( $this->once() )
->method( 'parse' )
->with( 'wikidata:QQQ' )
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the name of the test, I'm wondering why you are also testing that the call is made just once, and that a given parameter value is passed along. These are different concerns already tested elsewhere.

*/
public function testGivenInvalidPrefix_exceptionIsThrown( $prefix ) {
$regularParser = $this->getMock( BasicEntityIdParser::class );
$regularParser->expects( $this->never() )->method( 'parse' );
Copy link
Contributor

@JeroenDeDauw JeroenDeDauw Sep 22, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure there is value in testing that this thing is never called. If it has no side effects, which it likely does not, you can't even tell from the outside. So this seems like needless binding to implementation detail.

If there is some reason to verify "decorated parser is not called", then I'd change this line to

$regularParser->expects( $this->never() )->method( $this->anything() );

Also applies to test below

*/
public function __construct( $prefix, array $prefixMapping, EntityIdParser $idParser ) {
Assert::parameterType( 'string', $prefix, '$prefix' );
Assert::parameter( strpos( $prefix, ':' ) === false, '$prefix', 'must not contain a colon' );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add the relevant @throws docs to the constructor? We'd be doing this in case of inline exception throwing, and where it's definitely more valuable since it's a lot less easy to see what's happening otherwise.

@manicki manicki force-pushed the prefixing-entity-id-parser branch from bec0161 to 300716b Compare September 22, 2016 08:16
@manicki
Copy link
Member Author

manicki commented Sep 22, 2016

@brightbyte @JeroenDeDauw I am too lazy to write "Done" under each of your comments. I agree with each of your recent points and have done changes suggested. Thanks for taking a look at this code!

@JonasKress
Copy link
Contributor

Thank you @manicki for being bold!

@brightbyte
Copy link

Uh... PHP Fatal error: Call to undefined method Wikibase\DataModel\Entity\EntityId::splitSerialization() in /home/travis/build/wmde/WikibaseDataModelServices/src/EntityId/PrefixMappingEntityIdParser.php on line 61

Didn't I just merge the PR that would fix this?

@@ -23,7 +23,8 @@
"php": ">=5.3.0",
"wikibase/data-model": "~6.0|~5.0|~4.2",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will need to change to 6.2, or we need to add B/C code to handle earlier versions of the data model.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, this should be changed to 6.2 when it is released.
This will also solved the PHP fatal you're quoting above. Although you've merged the PR adding foreign ids to data model it still needs to be tagged.
To test this PR locally you could temporarily hack composer json to use master (or rather the branch of wmde/WikibaseDataModel#679)

Copy link

@brightbyte brightbyte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only very minor issues left.

use Wikibase\DataModel\Entity\EntityIdParser;

/**
* EntityIdParser that adds a fixed prefix and parses resulting string as an EntityId.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does more now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

expanded

Assert::parameterType( 'string', $prefix, '$prefix' );
Assert::parameter( strpos( $prefix, ':' ) === false, '$prefix', 'must not contain a colon' );
$this->prefix = $prefix;
$this->prefixMapping = $prefixMapping;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assert::parameterElementType( 'string', $prefixMapping, '$prefixMapping' )

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done. Also added an assertion on keys (should be only strings too)
BTW: how there is no such assertion (array with string-only keys and values) in Assert yet - how often we want such constraints? does it make any sense to consider expanding Assert to include such thing?

*/
public function parse( $idSerialization ) {
$serializationParts = EntityId::splitSerialization( $idSerialization );
$foreignPrefix = $serializationParts[0]; // sa prefix used by foreign repository

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sa prefix?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must have fallen asleep on a keyboard

@manicki manicki force-pushed the prefixing-entity-id-parser branch from 300716b to 75f5419 Compare September 23, 2016 07:35
Copy link

@brightbyte brightbyte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is blocked on WikibaseDataModel 6.2 being released, so no "approve".

The code looks fine to merge. I added some comments regarding readability and documentation. I'd also recommend adding a few more test cases.

* to a local prefix according to the prefix mapping
* and parses resulting string as an EntityId.
* This can be used to prefix IDs of entities coming from a foreign repository
* with the repository name to avoid clashes with IDs of local entities.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Not a blocker for this PR, but a TODO): This should reference a proper spec document. That spec should contain essentially what is described in https://phabricator.wikimedia.org/T133381. The spec should live in docs/foreign-entity-ids.wiki (or .md?), but I'm not sure in which repo. It could live in Wikibase, so we have all the docs in one place. But that would break with the idea that documentation should live in the same repo as code, and should always be kept in sync with the code, commit by commit. So we'' want a docs directory in the WikibaseDataModelServices I suppose.


/**
* @param string $prefix Prefix to be added. It should not contain a colon, in particular at the end of the prefix
* @param string[] $prefixMapping

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the meaning and interaction of $prefix and $prefixMapping is unclear here. A reference ot a spec would help a lot.

Assert::parameterType( 'string', $prefix, '$prefix' );
Assert::parameter( strpos( $prefix, ':' ) === false, '$prefix', 'must not contain a colon' );
Assert::parameterElementType( 'string', $prefixMapping, '$prefixMapping' );
Assert::parameterElementType( 'string', array_keys( $prefixMapping ), '$prefixMapping' );

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid a misleading error message, this should be

Assert::parameterElementType( 'string', array_keys( $prefixMapping ), 'array_keys( $prefixMapping )' );

}

/**
* Adds the prefix to the given string, maps the resulting string according to prefix mapping definition

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implies the mapping is done after adding the prefix. It would be possible to do it that way, but the code below does not do that - it applies the mapping OR adds the prefix.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the code does it that way but I agree the description could be more clear. I've rephrased it slightly. And I agree the having some docs on this whole thing will be more than helpful!
Nah, looking at the new version it is basically the same as used to be before. Any suggestions?

* @return EntityId
*/
public function parse( $idSerialization ) {
$serializationParts = EntityId::splitSerialization( $idSerialization );

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More readable:
list( $repoName, $extraPrefixes , $relativeId )

class PrefixMappingEntityIdParserTest extends \PHPUnit_Framework_TestCase {

private function getRegularParser( $input, EntityId $returnValue ) {
$regularParser = $this->getMock( EntityIdParser::class );

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could simply return a new BasicEntityIdParser(), no need to mock.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've apparently gone too far with mocking things here. So mocks are now only use with tests that don't really "parse" anything, in cases like here BasicEntityIdParser is used

$this->assertEquals( $expected, $parser->parse( $input ) );
}

public function validIdSerializationProvider() {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer provideValidIdSerialization(), since function names should be verbs. But we are sadly inconsistent in that regard, so whatever.

public function mappedSerializationIdProvider() {
return array(
array( 'foo', array( 'bar' => 'wd' ), 'bar:Q1337', 'wd:Q1337', new ItemId( 'wd:Q1337' ) ),
array( 'foo', array( 'bar' => 'wd' ), 'bar:x:Q1337', 'wd:x:Q1337', new ItemId( 'wd:x:Q1337' ) ),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps add a test case that makes sure that x:bar:Q1337 is not mapped.

}

public function validIdSerializationProvider() {
return array(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also test the case of adding an empty prefix.

@brightbyte
Copy link

@JeroenDeDauw have your concerns been addressed?

@manicki manicki force-pushed the prefixing-entity-id-parser branch 2 times, most recently from 404e279 to f00abfb Compare September 29, 2016 09:47
@manicki
Copy link
Member Author

manicki commented Sep 29, 2016

@brightbyte sorry, I missed the github ping that you've commented.
I've basically made all changes suggested except for adding docs/specs. I think it completely makes sense. I am just wondering how to deal with the docs being distributed/duplicated over all repos. Should the docs on foreign entity ids be common in all repos, or should the docs in this particular repo only focus on mapping/prefixing without other things being mentioned? For me the former seems more sane but it would then potentially be mentioning things that might be out of scope of this repo, wouldn't it?

I've also adjusted Travis CI config, as this change would require Data Model 6.2+.

@manicki manicki force-pushed the prefixing-entity-id-parser branch 2 times, most recently from c814d8f to 42bfe71 Compare September 29, 2016 10:00
@brightbyte
Copy link

@manicki thanks. Looks good now, though I can't find a way to get a diff of what you changed since my review. Github really like amend / force push. Reviews should probably be addressed by adding a commit.

Anyway: +1, blocked on the release of DataModel 6.2. Spec document is pending.

Shall we make a ticket for adding the spec document, or do you want to make a pull request right away? I think the spec for foreign IDs should be in the DataModel component - that component defines the EntityId object, so it should also define the semantics, even if it doesn't implement all of it.

Copy link

@brightbyte brightbyte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@manicki
Copy link
Member Author

manicki commented Sep 29, 2016

though I can't find a way to get a diff of what you changed since my review.

git diff 75f5419 42bfe71 shows those differences (plus some "noise" from changes done meanwhile on master) but github's compare view cannot show something similar, apparently.

Reviews should probably be addressed by adding a commit.

Yes. From now on I will be trying to add commits on top, and squash them into single commit when reviewers say it is ready.

Shall we make a ticket for adding the spec document, or do you want to make a pull request right away?

I am starting writing it down right away.

@brightbyte
Copy link

thanks @maniki!

@brightbyte
Copy link

brightbyte commented Oct 28, 2016

@maniki: What's the status here? Do we have a id mapping spec now somewhere?

Two comments regarding documentation are still pending.

@manicki
Copy link
Member Author

manicki commented Oct 31, 2016

@brightbyte so the docs on foreign ids, including a part on prefix mapping are now in https://github.com/wmde/WikibaseDataModel/blob/master/docs/foreign-entity-ids.wiki (added in wmde/WikibaseDataModel#682)
If needed this spec could be expanded/made more clear/improved in any other desired aspect, and this should indeed happen before this could be merged.

@brightbyte
Copy link

It seems to me like we can merge this. The only thing that I'd still like to see is an explicit reference to the prefix mapping spec/docs from the class level and method level documentation.

Assert::parameter( strpos( $prefix, ':' ) === false, '$prefix', 'must not contain a colon' );
Assert::parameterElementType( 'string', $prefixMapping, '$prefixMapping' );
Assert::parameterElementType( 'string', array_keys( $prefixMapping ), 'array_keys( $prefixMapping )' );
$this->prefix = $prefix;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, that could just be $prefixMapping['']....

The parser can be used to prefix IDs of foreign entities with
a repository name, and resolve foreign repository prefixes
to local prefixes based on the prefix mapping.
@manicki manicki force-pushed the prefixing-entity-id-parser branch from 6c76408 to 2c7b998 Compare November 1, 2016 10:13
Require the "default" prefix to be always provided to the constructor.
Also adds references to foreign id docs.
@manicki
Copy link
Member Author

manicki commented Nov 1, 2016

Rebased the PR after switching to short array syntax.
@brightbyte added references to docs, and removed the explicit prefix parameter of the constructor. Empty string key is now required in the mapping array.
@JeroenDeDauw could you please have another look at this and mark as approved if you were OK wth it? I am not sure if github allows to have it merged before that, as you "requested changes".

@brightbyte
Copy link

Giving @JeroenDeDauw a chance to have a look. Poke me to merge this tomorrow. I think I can override "requested changes".

@manicki
Copy link
Member Author

manicki commented Nov 3, 2016

So let's assume now is the "tomorrow". Ping @brightbyte !

@brightbyte brightbyte merged commit 5e3d423 into master Nov 3, 2016
@brightbyte brightbyte deleted the prefixing-entity-id-parser branch November 3, 2016 16:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants