Add PrefixMappingEntityIdParser #146

manicki · 2016-09-20T14:44:13Z

The parser can be used to prefix IDs of foreign entities with a repository name, and resolve foreign repository prefixes to local prefixes based on the prefix mapping.

~~This depends on wmde/WikibaseDataModel#679 and tests will be failing until that PR is merged.~~

Also I just realized that his repository might not be a right location for the new parser. As it relies on a feature to be added in DataModel 6.2 (or so), having the parser here would break DataModelServices compatibility with any earlier DataModel versions. I guess we want to keep this broad compatibility. Should I rather put this parser in the DataModel repo?

Ticket: T146274

brightbyte · 2016-09-20T18:38:22Z

Not sure we need to broad compat. If we do, we can check for EntityId::splitSerialization existing, and do nothing if it doesn't.

brightbyte · 2016-09-20T18:39:45Z

src/EntityId/PrefixingEntityIdParser.php

+	 * @param string $prefix Prefix to be added
+	 * @param EntityIdParser $idParser
+	 */
+	public function __construct( $prefix, EntityIdParser $idParser ) {


Perhaps use Assert::propertyType( 'string', $prefix, '$prefix' ) to make sure $prefix is a string, and not a damp sock ;)

if ( !is_string( $prefix ) ) { throw new InvalidArgumentException(); }

Such ancient PHP tho!

brightbyte · 2016-09-20T18:41:01Z

src/EntityId/PrefixingEntityIdParser.php

+	public function parse($idSerialization) {
+		// Do not add prefix if $idSerialization already starts with it
+		if ( strncmp( $idSerialization, $this->prefix, strlen( $this->prefix ) ) !== 0 ) {
+			$idSerialization = $this->prefix . $idSerialization;


This requires the prefix to contain a trailing ":". If we require this, then it should be checked by the constructor. It seems nicer to use EntityId::joinSerialization( [ $this->prefix, $idSerialization ] ).

I was about to comment on this too. Reading the description again, it seems like the intent here is to be a generic prefixing parser that does not know about the meaning of different parts of IDs. If that where not the case, the class name should also change from something technical to using the correct domain term. As long as it's either way, which it currently is, it's fine by me.

In case we'd want a service specific to foreign repository "prefixes" [0], I'd disagree with your suggestion to use EntityId::joinSerialization. Making a static call to code in another component instead of a much simpler single line of code that just does two string concats is a very poor tradeoff IMO. I'd be different if the format was likely to change.

[0] In the PR to EntityId I can't find how these things are called. "Prefixes" is an accurate technical term, though perhaps not the best choice of naming for something with more specific meaning in our domain.

Indeed. I thought of this parser as a kind of generic parser that adds some prefix and not something foreign-id-specific. If we follow this path the code using the parser must ensure that the colon is present in a prefix passed to the parser.
This might be a more generic approach but on the other hand I believe the only use of such parser we now can think of would be adding a foreign repo prefix, wouldn't it? So it indeed seems reasonable to simply have ForeignEntitiyIdPrefixingParser (better name needed).
In the latter case I would also prefer not to hard code the colon in this parser. That said I agree with @JeroenDeDauw and thinking of having a static call just to avoid simple a string concat makes me pull out my hair.

In the PR to EntityId I can't find how these things are called. "Prefixes" is an accurate technical term, though perhaps not the best choice of naming for something with more specific meaning in our domain.

At some point we have been leaning towards calling these "foreign repository names", so something along this lines could probably be part of the name of the parser instead of "prefix".

JeroenDeDauw · 2016-09-20T20:35:55Z

src/EntityId/PrefixingEntityIdParser.php

+	 * @param string $idSerialization
+	 * @return EntityId
+	 */
+	public function parse($idSerialization) {


bah, forgot it is not using MW PHP CS config. Fixed

JeroenDeDauw · 2016-09-20T20:47:21Z

tests/unit/EntityId/PrefixingEntityIdParserTest.php

+	 */
+	public function testParse_invalidInput( $prefix, $input ) {
+		$parser = new PrefixingEntityIdParser( $prefix, new BasicEntityIdParser() );
+		$this->setExpectedException( EntityIdParsingException::class );


Travis is still running the tests with 5.3 and 5.4. Time to bump to 5.5+... either as a separate PR or in here, whichever is easier

Right. I have even prepared a branch doing this but then realized #145 is already doing this. I wanted to refer to that PR in the description of this one but had obviously forgotten.

JeroenDeDauw · 2016-09-20T20:52:45Z

tests/unit/EntityId/PrefixingEntityIdParserTest.php

+	public function testParse_invalidInput( $prefix, $input ) {
+		$parser = new PrefixingEntityIdParser( $prefix, new BasicEntityIdParser() );
+		$this->setExpectedException( EntityIdParsingException::class );
+		$parser->parse( $input );


I don't think this type of dataProvider usage pays for itself. How about having two methods:

testGivenInvalidId_exceptionIsThrown

testGivenEmptyId_exceptionIsThrown

And now I wrote this I realized that actually you are testing the decorated parser, BasicEntityIdParser, which is real production code. The one valid thing being tested here is that exceptions thrown by the decorated parser are not somehow caught. You only need one test case for that. And you can easily avoid binding to BasicEntityIdParser by using a test double.

Changed to a single test, and switched to mock BasicEntityIdParser

thiemowmde · 2016-09-21T07:34:43Z

This is a 1:1 copy of the SuffixEntityIdParser we already have in this repository. How did that happen?

manicki · 2016-09-21T07:37:46Z

@thiemowmde apart from the fact it does the opposite than SuffixEntityIdParser :)

thiemowmde · 2016-09-21T07:41:12Z

src/EntityId/PrefixingEntityIdParser.php

+	 * @return EntityId
+	 */
+	public function parse($idSerialization) {
+		// Do not add prefix if $idSerialization already starts with it


Wait, why does a parser add something that was not given in the input? This feels wrong. This means: All inputs with a prefix not supported by this parser will "successfully" be parsed. Or in other words: This parser assumes all inputs either start with the expected or none prefix. We would need some kind of PrefixValidator to make sure this is always the case. I don't think this is the best approach.

What you describe is not exactly what this parser was intended to do. Particularly this:

This parser assumes all inputs either start with the expected or none prefix.

Could be that the class description is not clear enough. Or it should be called ...ParserWrapper or so.

Idea here is to add a defined prefix (if needed) and parse it using the parser passed in the constructor.

Perhaps it's me being all the sleepies, but I don't understand what you mean.

Edit: oh, @manicki already commented now

The intention is this: we read JSON coming from repo foo. To avoid conflicts with local IDs, all IDs that are coming from foo get the "foo:" prefix. They are then passed on to the regular ID parser.

thiemowmde · 2016-09-21T07:50:00Z

Oh my. Then I suggest to find a better name for this. What it does is:

Assume all inputs:
- either start with the expected prefix
- or are valid suffixes that are still valid when the prefix is added. Note that there is no code that makes sure the inputs are valid suffixes. I wonder where and how this will be guaranteed (see my other comment).
Then it checks if the prefix is already there
and adds the prefix if it was not there.
Then it calls a parser.

This is not a parser. Better remove the inner parse call and rename the class and the method to reflect better what it actually does.

thiemowmde · 2016-09-21T08:12:11Z

tests/unit/EntityId/PrefixingEntityIdParserTest.php

+		return array(
+			array( 'wikidata:', 'Q1337', new ItemId( 'wikidata:Q1337' ) ),
+			array( 'foo:', 'P123', new PropertyId( 'foo:P123' ) ),
+			array( 'foo:', 'bar:Q1337', new ItemId( 'foo:bar:Q1337' ) ),


This is the test case that scares me the most. Let's say I do have two localized Wikibase repositories, "debase" and "enbase". How can code ever assume a prefixed EntityId like "enbase:Q1" is not prefixed, and prefix it with "debase:"? What does "debase:enbase:Q1" mean then? What if the inner EntityId parser accepts "enbase:Q1"? What will happen then? Some kind of remote validation? Possibly going in a loop? What's the difference between "debase:enbase:Q1" and "enbase:debase:Q1"?

What's the difference between "debase:enbase:Q1" and "enbase:debase:Q1"?

The meaning of "enbase" on "debase" and the other way round... I think... That should be something that is well understood before implementing such IDs, which is not happening in this PR, which is dealing with them after they already have been implemented.

I assume we are talking about the same.

This code allows prefixed EntityIDs with multiple prefixes. As far as I remember we never talked about this. I strongly suggest to not allow this anywhere in any code as long as we don't really need this, and have a specification for this that covers questions like the ones I raised above.

@thiemowmde I believe questions you ask are a bit beyond the scope of what this "parser" (which might not end up being call a parser any more) is supposed to do.
Your questions rather seem to me to be about the idea of "foreign entity ids" in general. It is definitely good that you raised them.
I don't know if I am the best person to try to answer them, and if you want to discuss them here, but to pick few of them:

How can code ever assume a prefixed EntityId like "enbase:Q1" is not prefixed, and prefix it with "debase:"?

Given the current (I believe unwritten) specification of foreign entity IDs, the ID could contain multiple parts separated by a colon. Left-most of them is the "foreign repository name", while all other is the ID coming from that foreign repo. In order to resolve the ID (ie. to fetch the entity the ID is referring to) there will be some lookup mechanism which will talk to the foreign repo.
So if I had a local Wikibase instance, and two foreign repos configured, using names "enbase" and "debase", when I see an entity ID like "enbase:Q1" or "enbase:debase:Q1" what I would know is that these entities are coming from the repo I call "enbase". I don't really need to know what "Q1" and "debase:Q1" stand for in this case. When resolving "enbase:debase:Q1" requires enbase repo to do lookup in another foreign repo (which enbase labelled as "debase") then fine, this will happen. My local repo will not really know about that.

So basically, as @JeroenDeDauw already said while I was typing, this "parser" does not know specifics of implementation of "foreign ids", this is done in other places.

Checking if the prefix is already in place would avoid having things like "enbase:enbase:Q1" but you're right when multiple prefixes are allowed, some "loops" on further "positions" might be possible, and something should/could be done about them. Still I don't think it is to be concern of this simple prefixing "parser".

Also, BTW, I don't know what you mean by "localized" in

Let's say I do have two localized Wikibase repositories, "debase" and "enbase".

How Wikibase instance could be localized?

I added an explanation about "chained" foreign IDs in https://phabricator.wikimedia.org/T133381. Basically, they work like interwiki prefixes, which can look like de:wikibooks:fr:wikinews:commons.

brightbyte · 2016-09-21T09:54:30Z

For the record: I think this is a "parser" by virtue of implementing a parser interface. It's a parser that causes the resulting EntityId to have a specific prefix. We could add "Proxy" or "Wrapper" to the end of the name, but "EntityIdParser" should somewhere be in it, too.

brightbyte · 2016-09-21T09:56:34Z

src/EntityId/PrefixingEntityIdParser.php

+	public function parse( $idSerialization ) {
+		// Do not add prefix if $idSerialization already starts with it
+		if ( strncmp( $idSerialization, $this->prefix, strlen( $this->prefix ) ) !== 0 ) {
+			$idSerialization = $this->prefix . $idSerialization;


Please use EntityId::joinSerialization() to attach the prefix.

brightbyte · 2016-09-21T09:57:17Z

src/EntityId/PrefixingEntityIdParser.php

+	private $idParser;
+
+	/**
+	 * @param string $prefix Prefix to be added


This should make clear whether the prefix given here should or should not include the ":" at the end. With the current code, it does, but I think it should not.

thiemowmde · 2016-09-21T11:38:31Z

Your questions rather seem to me to be about the idea of "foreign entity ids" in general.

Not sure what you mean.

This class makes assumptions that are not backed by anything.

It should never ever happen that some code gets an arbitrary string that may or may not be prefixed. What's the use case of this? Either it is prefixed and we know that, or it is not. Edit: Or we have a list of known prefixes, fed into an actual parser that parses remaining entity IDs that do not start with a known prefix as if they start with the local prefix. Which I believe is what this class is meant to do, but does not do, because it does not know anything about other prefixes.

This code is fragile.

This code scares me, and I will most probably try to work around it with something that gives more guarantees.

The intention is this: we read JSON coming from repo foo. To avoid conflicts with local IDs, all IDs that are coming from foo get the "foo:" prefix.

Fine. Why not remove the if from this code then, and always add the prefix? Which most probably means we will not need this class, when literally all it does is a trivial string concatenation. Currently there is not a single validity check or guarantee in here that justifies the complexity of all this.

Oh, here is more:

Let's say an entity type allows colons in entity IDs. The if in this class will not add the prefix when an entity ID starts with a substring that happens to be identical to the prefix. This can easily be avoided by always adding the prefix.
Let's say we are on repo "a:", and we get an ID "a:Q1" from repo "b:", but repo "b:" does not target our repo "a:" with this prefix, but some other repo. The ID should be resolved as b:a:Q1, which means we will as repo "b:" for resolving "a:Q1", which may (or may not) be resolved by ourself (or an other repo that is known as "a:" to "b:". Currently all this is skipped and the code assumes prefixes are global.

brightbyte · 2016-09-21T11:58:00Z

src/EntityId/PrefixingEntityIdParser.php

+	 */
+	public function parse( $idSerialization ) {
+		// Do not add prefix if $idSerialization already starts with it
+		if ( strncmp( $idSerialization, $this->prefix, strlen( $this->prefix ) ) !== 0 ) {


Oh, I completely missed this, and I think this is the cause for Thiemo's criticism: this is wrong. The prefix must ALWAYS by applied. If the repo we call "foo" sends us an Id with the prefix "foo" we can not assume that that's the same foo. We must add the prefix anyway, resulting in "foo:foo:Q123", meaning "Q123 in the repo that the repo we call foo calls foo". Confusing, but consistent :)

manicki · 2016-09-21T13:40:34Z

Thanks @thiemowmde @brightbyte @JeroenDeDauw for the review so far.

After talking to @brightbyte today (in person, the outcome of this I've put in https://phabricator.wikimedia.org/T146274) I've expanded what I am trying to add in this PR.
Apart from adding a prefix, this "parser" is also resolving prefixes.
So if for instance local repo knows that foo:bar: prefix is what this repo prefixes with awesome:, the parser will return the instance of EntityId prefixed with awesome:.

I've changed the description and title of the PR, and did some suggested changes:

prefix is always added. Indeed foo in one repo does not have to be the same in another repo. Sorry @thiemowmde that it took me few rounds of comments to get what you meant. You were obviously right
the parser now does not want any colons (especially trailing colons) in the prefix. It also adds a prefix using EntityId::joinSerialization.
open question is still the name of the new class. One thing is should this be called Parser? I agree with @brightbyte that it should s it implements the Parser interface. Should this then be prefixMappingProxyEntityIdParser? Names start to be dangerously long :) And as this is now not expecting any random prefix but the repo name, should this be also reflected in a name, as @JeroenDeDauw suggested? Any thoughts or name suggestions?

Per @thiemowmde's comment:

Let's say an entity type allows colons in entity IDs. The if in this class will not add the prefix when an entity ID starts with a substring that happens to be identical to the prefix. This can easily be avoided by always adding the prefix.

I believe we actually want to disallow using colons in entity IDs. This should be definitely made clear in EntityId docs!

brightbyte · 2016-09-21T16:09:46Z

For the record: Entity IDs cannot contain colons. We should add this this to the docs somewhere. IDs with colons would not only cause confusion in the context of repo prefixes for foreign IDs, it would also be confusing the the context of RDF prefixes and MediaWiki namespaces. So, please, no colons in IDs :)

brightbyte

-1 only for the mangled name of the test class. Everything else are just suggestions.

brightbyte · 2016-09-21T16:13:34Z

src/EntityId/PrefixMappingEntityIdParser.php

+	 * @return EntityId
+	 */
+	public function parse( $idSerialization ) {
+		$prefixedIdSerialization = EntityId::joinSerialization( array( $this->prefix, '', $idSerialization ) );


I'd suggest to do this in an else-branched below, only for unmapped prefixes.

brightbyte · 2016-09-21T16:14:10Z

src/EntityId/PrefixMappingEntityIdParser.php

+	public function parse( $idSerialization ) {
+		$prefixedIdSerialization = EntityId::joinSerialization( array( $this->prefix, '', $idSerialization ) );
+		$serializationParts = EntityId::splitSerialization( $idSerialization );
+		$serializationRepoName = $serializationParts[0];


$foreignPrefix? Maybe worth a comment: "a prefix used by the foreign repo"

brightbyte · 2016-09-21T16:17:09Z

tests/unit/EntityId/PrefixMappingEntityIdParserTest.php

+ *
+ * @license GPL-2.0+
+ */
+class PrefMappixingEntityIdParserTest extends \PHPUnit_Framework_TestCase {


I like "Mappixing", but the name of the test class should be consistent with the class under test :)

brightbyte · 2016-09-21T16:19:06Z

src/EntityId/PrefixMappingEntityIdParser.php

+		$prefixedIdSerialization = EntityId::joinSerialization( array( $this->prefix, '', $idSerialization ) );
+		$serializationParts = EntityId::splitSerialization( $idSerialization );
+		$serializationRepoName = $serializationParts[0];
+		if ( array_key_exists( $serializationRepoName, $this->prefixMapping ) ) {


I generally prefer isset( $this->prefixMapping[$serializationRepoName] ) for readability. Note that these behave slightly differently: isset() will return falls if the key exists but the value is null, while array_key_exists() will return true in that case. That difference isn't relevant here, though.

I agree that isset reads better. While writing this I assumed that this prefix mapping will be coming from some reader class that would take care of validation/sanitization of mappings, so this code would not need to worry if mappings are sane (ie. if we expect values to be strings we will be sure we have string at this point). Thus here we could only check if the key exists and take its value if so. But wait, doesn't it mean that isset fits well here too :) Ahh, switched to isset then!

There is not really a difference between isset and array_key_exists: http://maettig.com/1397246220. I do prefer array_key_exists when an array may contain null values. isset may fail then.

JeroenDeDauw

This PR has gotten an impressive number of comments, about one for each 4 lines by now :)

JeroenDeDauw · 2016-09-22T01:11:39Z

tests/unit/EntityId/PrefixMappingEntityIdParserTest.php

+class PrefMappixingEntityIdParserTest extends \PHPUnit_Framework_TestCase {
+
+	private function getRegularParser( $input, EntityId $returnValue ) {
+		$regularParser = $this->getMock( BasicEntityIdParser::class );


Why not use the interface?

Also applies to the tests below

JeroenDeDauw · 2016-09-22T01:14:06Z

tests/unit/EntityId/PrefixMappingEntityIdParserTest.php

+		$regularParser = $this->getMock( BasicEntityIdParser::class );
+		$regularParser->expects( $this->once() )
+			->method( 'parse' )
+			->with( 'wikidata:QQQ' )


Looking at the name of the test, I'm wondering why you are also testing that the call is made just once, and that a given parameter value is passed along. These are different concerns already tested elsewhere.

JeroenDeDauw · 2016-09-22T01:17:37Z

tests/unit/EntityId/PrefixMappingEntityIdParserTest.php

+	 */
+	public function testGivenInvalidPrefix_exceptionIsThrown( $prefix ) {
+		$regularParser = $this->getMock( BasicEntityIdParser::class );
+		$regularParser->expects( $this->never() )->method( 'parse' );


Not sure there is value in testing that this thing is never called. If it has no side effects, which it likely does not, you can't even tell from the outside. So this seems like needless binding to implementation detail.

If there is some reason to verify "decorated parser is not called", then I'd change this line to

$regularParser->expects( $this->never() )->method( $this->anything() );

Also applies to test below

JeroenDeDauw · 2016-09-22T01:19:28Z

src/EntityId/PrefixMappingEntityIdParser.php

+	 */
+	public function __construct( $prefix, array $prefixMapping, EntityIdParser $idParser ) {
+		Assert::parameterType( 'string', $prefix, '$prefix' );
+		Assert::parameter( strpos( $prefix, ':' ) === false, '$prefix', 'must not contain a colon' );


Can you add the relevant @throws docs to the constructor? We'd be doing this in case of inline exception throwing, and where it's definitely more valuable since it's a lot less easy to see what's happening otherwise.

manicki · 2016-09-22T08:20:41Z

@brightbyte @JeroenDeDauw I am too lazy to write "Done" under each of your comments. I agree with each of your recent points and have done changes suggested. Thanks for taking a look at this code!

JonasKress · 2016-09-22T10:04:13Z

Thank you @manicki for being bold!

brightbyte · 2016-09-22T17:03:17Z

Uh... PHP Fatal error: Call to undefined method Wikibase\DataModel\Entity\EntityId::splitSerialization() in /home/travis/build/wmde/WikibaseDataModelServices/src/EntityId/PrefixMappingEntityIdParser.php on line 61

Didn't I just merge the PR that would fix this?

brightbyte · 2016-09-22T11:53:50Z

composer.json

@@ -23,7 +23,8 @@
 		"php": ">=5.3.0",
 		"wikibase/data-model": "~6.0|~5.0|~4.2",


This will need to change to 6.2, or we need to add B/C code to handle earlier versions of the data model.

yes, this should be changed to 6.2 when it is released.
This will also solved the PHP fatal you're quoting above. Although you've merged the PR adding foreign ids to data model it still needs to be tagged.
To test this PR locally you could temporarily hack composer json to use master (or rather the branch of wmde/WikibaseDataModel#679)

brightbyte

Only very minor issues left.

brightbyte · 2016-09-22T17:04:09Z

src/EntityId/PrefixMappingEntityIdParser.php

+use Wikibase\DataModel\Entity\EntityIdParser;
+
+/**
+ * EntityIdParser that adds a fixed prefix and parses resulting string as an EntityId.


This does more now.

brightbyte · 2016-09-22T17:05:29Z

src/EntityId/PrefixMappingEntityIdParser.php

+		Assert::parameterType( 'string', $prefix, '$prefix' );
+		Assert::parameter( strpos( $prefix, ':' ) === false, '$prefix', 'must not contain a colon' );
+		$this->prefix = $prefix;
+		$this->prefixMapping = $prefixMapping;


Assert::parameterElementType( 'string', $prefixMapping, '$prefixMapping' )

done. Also added an assertion on keys (should be only strings too)
BTW: how there is no such assertion (array with string-only keys and values) in Assert yet - how often we want such constraints? does it make any sense to consider expanding Assert to include such thing?

brightbyte · 2016-09-22T17:05:59Z

src/EntityId/PrefixMappingEntityIdParser.php

+	 */
+	public function parse( $idSerialization ) {
+		$serializationParts = EntityId::splitSerialization( $idSerialization );
+		$foreignPrefix = $serializationParts[0]; // sa prefix used by foreign repository


I must have fallen asleep on a keyboard

brightbyte

This is blocked on WikibaseDataModel 6.2 being released, so no "approve".

The code looks fine to merge. I added some comments regarding readability and documentation. I'd also recommend adding a few more test cases.

brightbyte · 2016-09-27T15:20:54Z

src/EntityId/PrefixMappingEntityIdParser.php

+ * to a local prefix according to the prefix mapping
+ * and parses resulting string as an EntityId.
+ * This can be used to prefix IDs of entities coming from a foreign repository
+ * with the repository name to avoid clashes with IDs of local entities.


(Not a blocker for this PR, but a TODO): This should reference a proper spec document. That spec should contain essentially what is described in https://phabricator.wikimedia.org/T133381. The spec should live in docs/foreign-entity-ids.wiki (or .md?), but I'm not sure in which repo. It could live in Wikibase, so we have all the docs in one place. But that would break with the idea that documentation should live in the same repo as code, and should always be kept in sync with the code, commit by commit. So we'' want a docs directory in the WikibaseDataModelServices I suppose.

brightbyte · 2016-09-27T15:21:44Z

src/EntityId/PrefixMappingEntityIdParser.php

+
+	/**
+	 * @param string $prefix Prefix to be added. It should not contain a colon, in particular at the end of the prefix
+	 * @param string[] $prefixMapping


the meaning and interaction of $prefix and $prefixMapping is unclear here. A reference ot a spec would help a lot.

brightbyte · 2016-09-27T15:22:32Z

src/EntityId/PrefixMappingEntityIdParser.php

+		Assert::parameterType( 'string', $prefix, '$prefix' );
+		Assert::parameter( strpos( $prefix, ':' ) === false, '$prefix', 'must not contain a colon' );
+		Assert::parameterElementType( 'string', $prefixMapping, '$prefixMapping' );
+		Assert::parameterElementType( 'string', array_keys( $prefixMapping ), '$prefixMapping' );


To avoid a misleading error message, this should be

Assert::parameterElementType( 'string', array_keys( $prefixMapping ), 'array_keys( $prefixMapping )' );

brightbyte · 2016-09-27T15:23:58Z

src/EntityId/PrefixMappingEntityIdParser.php

+	}
+
+	/**
+	 * Adds the prefix to the given string, maps the resulting string according to prefix mapping definition


This implies the mapping is done after adding the prefix. It would be possible to do it that way, but the code below does not do that - it applies the mapping OR adds the prefix.

Well, the code does it that way but I agree the description could be more clear. I've rephrased it slightly. And I agree the having some docs on this whole thing will be more than helpful!
Nah, looking at the new version it is basically the same as used to be before. Any suggestions?

brightbyte · 2016-09-27T15:25:06Z

src/EntityId/PrefixMappingEntityIdParser.php

+	 * @return EntityId
+	 */
+	public function parse( $idSerialization ) {
+		$serializationParts = EntityId::splitSerialization( $idSerialization );


More readable:
list( $repoName, $extraPrefixes , $relativeId )

brightbyte · 2016-09-27T15:27:26Z

tests/unit/EntityId/PrefixMappingEntityIdParserTest.php

+class PrefixMappingEntityIdParserTest extends \PHPUnit_Framework_TestCase {
+
+	private function getRegularParser( $input, EntityId $returnValue ) {
+		$regularParser = $this->getMock( EntityIdParser::class );


Could simply return a new BasicEntityIdParser(), no need to mock.

I've apparently gone too far with mocking things here. So mocks are now only use with tests that don't really "parse" anything, in cases like here BasicEntityIdParser is used

brightbyte · 2016-09-27T15:37:32Z

tests/unit/EntityId/PrefixMappingEntityIdParserTest.php

+		$this->assertEquals( $expected, $parser->parse( $input ) );
+	}
+
+	public function validIdSerializationProvider() {


I would prefer provideValidIdSerialization(), since function names should be verbs. But we are sadly inconsistent in that regard, so whatever.

brightbyte · 2016-09-27T15:38:24Z

tests/unit/EntityId/PrefixMappingEntityIdParserTest.php

+	public function mappedSerializationIdProvider() {
+		return array(
+			array( 'foo', array( 'bar' => 'wd' ), 'bar:Q1337', 'wd:Q1337', new ItemId( 'wd:Q1337' ) ),
+			array( 'foo', array( 'bar' => 'wd' ), 'bar:x:Q1337', 'wd:x:Q1337', new ItemId( 'wd:x:Q1337' ) ),


perhaps add a test case that makes sure that x:bar:Q1337 is not mapped.

brightbyte · 2016-09-27T15:40:49Z

tests/unit/EntityId/PrefixMappingEntityIdParserTest.php

+	}
+
+	public function validIdSerializationProvider() {
+		return array(


Also test the case of adding an empty prefix.

brightbyte · 2016-09-27T15:43:13Z

@JeroenDeDauw have your concerns been addressed?

manicki · 2016-09-29T09:55:03Z

@brightbyte sorry, I missed the github ping that you've commented.
I've basically made all changes suggested except for adding docs/specs. I think it completely makes sense. I am just wondering how to deal with the docs being distributed/duplicated over all repos. Should the docs on foreign entity ids be common in all repos, or should the docs in this particular repo only focus on mapping/prefixing without other things being mentioned? For me the former seems more sane but it would then potentially be mentioning things that might be out of scope of this repo, wouldn't it?

I've also adjusted Travis CI config, as this change would require Data Model 6.2+.

brightbyte · 2016-09-29T13:22:52Z

@manicki thanks. Looks good now, though I can't find a way to get a diff of what you changed since my review. Github really like amend / force push. Reviews should probably be addressed by adding a commit.

Anyway: +1, blocked on the release of DataModel 6.2. Spec document is pending.

Shall we make a ticket for adding the spec document, or do you want to make a pull request right away? I think the spec for foreign IDs should be in the DataModel component - that component defines the EntityId object, so it should also define the semantics, even if it doesn't implement all of it.

brightbyte

+1

manicki · 2016-09-29T13:37:57Z

though I can't find a way to get a diff of what you changed since my review.

git diff 75f5419 42bfe71 shows those differences (plus some "noise" from changes done meanwhile on master) but github's compare view cannot show something similar, apparently.

Reviews should probably be addressed by adding a commit.

Yes. From now on I will be trying to add commits on top, and squash them into single commit when reviewers say it is ready.

Shall we make a ticket for adding the spec document, or do you want to make a pull request right away?

I am starting writing it down right away.

brightbyte · 2016-09-29T13:42:32Z

thanks @maniki!

brightbyte · 2016-10-28T16:54:53Z

@maniki: What's the status here? Do we have a id mapping spec now somewhere?

Two comments regarding documentation are still pending.

manicki · 2016-10-31T09:30:05Z

@brightbyte so the docs on foreign ids, including a part on prefix mapping are now in https://github.com/wmde/WikibaseDataModel/blob/master/docs/foreign-entity-ids.wiki (added in wmde/WikibaseDataModel#682)
If needed this spec could be expanded/made more clear/improved in any other desired aspect, and this should indeed happen before this could be merged.

brightbyte · 2016-10-31T15:33:45Z

It seems to me like we can merge this. The only thing that I'd still like to see is an explicit reference to the prefix mapping spec/docs from the class level and method level documentation.

brightbyte · 2016-10-31T15:35:35Z

src/EntityId/PrefixMappingEntityIdParser.php

+		Assert::parameter( strpos( $prefix, ':' ) === false, '$prefix', 'must not contain a colon' );
+		Assert::parameterElementType( 'string', $prefixMapping, '$prefixMapping' );
+		Assert::parameterElementType( 'string', array_keys( $prefixMapping ), 'array_keys( $prefixMapping )' );
+		$this->prefix = $prefix;


Hm, that could just be $prefixMapping['']....

The parser can be used to prefix IDs of foreign entities with a repository name, and resolve foreign repository prefixes to local prefixes based on the prefix mapping.

Require the "default" prefix to be always provided to the constructor. Also adds references to foreign id docs.

manicki · 2016-11-01T10:59:45Z

Rebased the PR after switching to short array syntax.
@brightbyte added references to docs, and removed the explicit prefix parameter of the constructor. Empty string key is now required in the mapping array.
@JeroenDeDauw could you please have another look at this and mark as approved if you were OK wth it? I am not sure if github allows to have it merged before that, as you "requested changes".

brightbyte · 2016-11-01T18:08:20Z

Giving @JeroenDeDauw a chance to have a look. Poke me to merge this tomorrow. I think I can override "requested changes".

manicki · 2016-11-03T09:36:43Z

So let's assume now is the "tomorrow". Ping @brightbyte !

brightbyte suggested changes Sep 20, 2016

View reviewed changes

JeroenDeDauw reviewed Sep 20, 2016

View reviewed changes

JeroenDeDauw suggested changes Sep 20, 2016

View reviewed changes

thiemowmde reviewed Sep 21, 2016

View reviewed changes

manicki force-pushed the prefixing-entity-id-parser branch from e258458 to 7d1142a Compare September 21, 2016 09:28

brightbyte suggested changes Sep 21, 2016

View reviewed changes

manicki force-pushed the prefixing-entity-id-parser branch from 7d1142a to bec0161 Compare September 21, 2016 12:46

manicki changed the title ~~Add PrefixingEntityIdParser~~ Add PrefixMappingEntityIdParser Sep 21, 2016

brightbyte suggested changes Sep 21, 2016

View reviewed changes

JeroenDeDauw reviewed Sep 22, 2016

View reviewed changes

manicki force-pushed the prefixing-entity-id-parser branch from bec0161 to 300716b Compare September 22, 2016 08:16

brightbyte mentioned this pull request Sep 22, 2016

Support foreign EntityIds. wmde/WikibaseDataModel#678

Merged

brightbyte reviewed Sep 22, 2016

View reviewed changes

brightbyte suggested changes Sep 22, 2016

View reviewed changes

manicki force-pushed the prefixing-entity-id-parser branch from 300716b to 75f5419 Compare September 23, 2016 07:35

brightbyte reviewed Sep 27, 2016

View reviewed changes

manicki force-pushed the prefixing-entity-id-parser branch 2 times, most recently from 404e279 to f00abfb Compare September 29, 2016 09:47

manicki force-pushed the prefixing-entity-id-parser branch 2 times, most recently from c814d8f to 42bfe71 Compare September 29, 2016 10:00

brightbyte approved these changes Sep 29, 2016

View reviewed changes

manicki mentioned this pull request Sep 29, 2016

Add PrefixMappingEntityIdParserFactory #149

Merged

manicki mentioned this pull request Oct 31, 2016

Add functions to assert a value is an allowable repository name, or an array indexed by repository names #155

Closed

brightbyte reviewed Oct 31, 2016

View reviewed changes

Add PrefixMappingEntityIdParser

2c7b998

The parser can be used to prefix IDs of foreign entities with a repository name, and resolve foreign repository prefixes to local prefixes based on the prefix mapping.

manicki force-pushed the prefixing-entity-id-parser branch from 6c76408 to 2c7b998 Compare November 1, 2016 10:13

Remove redundant $prefix parameter.

e4c8274

Require the "default" prefix to be always provided to the constructor. Also adds references to foreign id docs.

manicki mentioned this pull request Nov 1, 2016

Add functions to assert a value is a valid repository name, or an array indexed by valid repository names wmde/WikibaseDataModel#687

Merged

brightbyte merged commit 5e3d423 into master Nov 3, 2016

brightbyte deleted the prefixing-entity-id-parser branch November 3, 2016 16:40

		@@ -23,7 +23,8 @@
		"php": ">=5.3.0",
		"wikibase/data-model": "~6.0\|~5.0\|~4.2",

Add PrefixMappingEntityIdParser #146

Add PrefixMappingEntityIdParser #146

Conversation

manicki commented Sep 20, 2016 • edited Loading

brightbyte commented Sep 20, 2016

Choose a reason for hiding this comment

JeroenDeDauw Sep 20, 2016 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

thiemowmde commented Sep 21, 2016

manicki commented Sep 21, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

JeroenDeDauw Sep 21, 2016 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

thiemowmde commented Sep 21, 2016 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

brightbyte commented Sep 21, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

thiemowmde commented Sep 21, 2016 • edited Loading

Choose a reason for hiding this comment

manicki commented Sep 21, 2016

brightbyte commented Sep 21, 2016

brightbyte left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

JeroenDeDauw left a comment

Choose a reason for hiding this comment

JeroenDeDauw Sep 22, 2016 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

JeroenDeDauw Sep 22, 2016 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

manicki commented Sep 22, 2016 • edited Loading

JonasKress commented Sep 22, 2016

brightbyte commented Sep 22, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

brightbyte left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

brightbyte left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

manicki commented Sep 20, 2016 •

edited

Loading

JeroenDeDauw Sep 20, 2016 •

edited

Loading

JeroenDeDauw Sep 21, 2016 •

edited

Loading

thiemowmde commented Sep 21, 2016 •

edited

Loading

thiemowmde commented Sep 21, 2016 •

edited

Loading

JeroenDeDauw Sep 22, 2016 •

edited

Loading

JeroenDeDauw Sep 22, 2016 •

edited

Loading

manicki commented Sep 22, 2016 •

edited

Loading

manicki commented Sep 29, 2016 •

edited

Loading

brightbyte commented Oct 28, 2016 •

edited

Loading