-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add PrefixMappingEntityIdParser #146
Conversation
Not sure we need to broad compat. If we do, we can check for EntityId::splitSerialization existing, and do nothing if it doesn't. |
* @param string $prefix Prefix to be added | ||
* @param EntityIdParser $idParser | ||
*/ | ||
public function __construct( $prefix, EntityIdParser $idParser ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps use Assert::propertyType( 'string', $prefix, '$prefix' ) to make sure $prefix is a string, and not a damp sock ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if ( !is_string( $prefix ) ) {
throw new InvalidArgumentException();
}
Such ancient PHP tho!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
public function parse($idSerialization) { | ||
// Do not add prefix if $idSerialization already starts with it | ||
if ( strncmp( $idSerialization, $this->prefix, strlen( $this->prefix ) ) !== 0 ) { | ||
$idSerialization = $this->prefix . $idSerialization; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This requires the prefix to contain a trailing ":". If we require this, then it should be checked by the constructor. It seems nicer to use EntityId::joinSerialization( [ $this->prefix, $idSerialization ] ).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was about to comment on this too. Reading the description again, it seems like the intent here is to be a generic prefixing parser that does not know about the meaning of different parts of IDs. If that where not the case, the class name should also change from something technical to using the correct domain term. As long as it's either way, which it currently is, it's fine by me.
In case we'd want a service specific to foreign repository "prefixes" [0], I'd disagree with your suggestion to use EntityId::joinSerialization
. Making a static call to code in another component instead of a much simpler single line of code that just does two string concats is a very poor tradeoff IMO. I'd be different if the format was likely to change.
[0] In the PR to EntityId I can't find how these things are called. "Prefixes" is an accurate technical term, though perhaps not the best choice of naming for something with more specific meaning in our domain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. I thought of this parser as a kind of generic parser that adds some prefix and not something foreign-id-specific. If we follow this path the code using the parser must ensure that the colon is present in a prefix passed to the parser.
This might be a more generic approach but on the other hand I believe the only use of such parser we now can think of would be adding a foreign repo prefix, wouldn't it? So it indeed seems reasonable to simply have ForeignEntitiyIdPrefixingParser
(better name needed).
In the latter case I would also prefer not to hard code the colon in this parser. That said I agree with @JeroenDeDauw and thinking of having a static call just to avoid simple a string concat makes me pull out my hair.
In the PR to EntityId I can't find how these things are called. "Prefixes" is an accurate technical term, though perhaps not the best choice of naming for something with more specific meaning in our domain.
At some point we have been leaning towards calling these "foreign repository names", so something along this lines could probably be part of the name of the parser instead of "prefix".
* @param string $idSerialization | ||
* @return EntityId | ||
*/ | ||
public function parse($idSerialization) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spaces
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bah, forgot it is not using MW PHP CS config. Fixed
*/ | ||
public function testParse_invalidInput( $prefix, $input ) { | ||
$parser = new PrefixingEntityIdParser( $prefix, new BasicEntityIdParser() ); | ||
$this->setExpectedException( EntityIdParsingException::class ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Travis is still running the tests with 5.3 and 5.4. Time to bump to 5.5+... either as a separate PR or in here, whichever is easier
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. I have even prepared a branch doing this but then realized #145 is already doing this. I wanted to refer to that PR in the description of this one but had obviously forgotten.
public function testParse_invalidInput( $prefix, $input ) { | ||
$parser = new PrefixingEntityIdParser( $prefix, new BasicEntityIdParser() ); | ||
$this->setExpectedException( EntityIdParsingException::class ); | ||
$parser->parse( $input ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this type of dataProvider usage pays for itself. How about having two methods:
- testGivenInvalidId_exceptionIsThrown
- testGivenEmptyId_exceptionIsThrown
And now I wrote this I realized that actually you are testing the decorated parser, BasicEntityIdParser, which is real production code. The one valid thing being tested here is that exceptions thrown by the decorated parser are not somehow caught. You only need one test case for that. And you can easily avoid binding to BasicEntityIdParser by using a test double.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to a single test, and switched to mock BasicEntityIdParser
This is a 1:1 copy of the SuffixEntityIdParser we already have in this repository. How did that happen? |
@thiemowmde apart from the fact it does the opposite than |
* @return EntityId | ||
*/ | ||
public function parse($idSerialization) { | ||
// Do not add prefix if $idSerialization already starts with it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, why does a parser add something that was not given in the input? This feels wrong. This means: All inputs with a prefix not supported by this parser will "successfully" be parsed. Or in other words: This parser assumes all inputs either start with the expected or none prefix. We would need some kind of PrefixValidator to make sure this is always the case. I don't think this is the best approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What you describe is not exactly what this parser was intended to do. Particularly this:
This parser assumes all inputs either start with the expected or none prefix.
Could be that the class description is not clear enough. Or it should be called ...ParserWrapper
or so.
Idea here is to add a defined prefix (if needed) and parse it using the parser passed in the constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps it's me being all the sleepies, but I don't understand what you mean.
Edit: oh, @manicki already commented now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intention is this: we read JSON coming from repo foo. To avoid conflicts with local IDs, all IDs that are coming from foo get the "foo:" prefix. They are then passed on to the regular ID parser.
Oh my. Then I suggest to find a better name for this. What it does is:
This is not a parser. Better remove the inner |
return array( | ||
array( 'wikidata:', 'Q1337', new ItemId( 'wikidata:Q1337' ) ), | ||
array( 'foo:', 'P123', new PropertyId( 'foo:P123' ) ), | ||
array( 'foo:', 'bar:Q1337', new ItemId( 'foo:bar:Q1337' ) ), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the test case that scares me the most. Let's say I do have two localized Wikibase repositories, "debase" and "enbase". How can code ever assume a prefixed EntityId like "enbase:Q1" is not prefixed, and prefix it with "debase:"? What does "debase:enbase:Q1" mean then? What if the inner EntityId parser accepts "enbase:Q1"? What will happen then? Some kind of remote validation? Possibly going in a loop? What's the difference between "debase:enbase:Q1" and "enbase:debase:Q1"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the difference between "debase:enbase:Q1" and "enbase:debase:Q1"?
The meaning of "enbase" on "debase" and the other way round... I think... That should be something that is well understood before implementing such IDs, which is not happening in this PR, which is dealing with them after they already have been implemented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume we are talking about the same.
This code allows prefixed EntityIDs with multiple prefixes. As far as I remember we never talked about this. I strongly suggest to not allow this anywhere in any code as long as we don't really need this, and have a specification for this that covers questions like the ones I raised above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thiemowmde I believe questions you ask are a bit beyond the scope of what this "parser" (which might not end up being call a parser any more) is supposed to do.
Your questions rather seem to me to be about the idea of "foreign entity ids" in general. It is definitely good that you raised them.
I don't know if I am the best person to try to answer them, and if you want to discuss them here, but to pick few of them:
How can code ever assume a prefixed EntityId like "enbase:Q1" is not prefixed, and prefix it with "debase:"?
Given the current (I believe unwritten) specification of foreign entity IDs, the ID could contain multiple parts separated by a colon. Left-most of them is the "foreign repository name", while all other is the ID coming from that foreign repo. In order to resolve the ID (ie. to fetch the entity the ID is referring to) there will be some lookup mechanism which will talk to the foreign repo.
So if I had a local Wikibase instance, and two foreign repos configured, using names "enbase" and "debase", when I see an entity ID like "enbase:Q1" or "enbase:debase:Q1" what I would know is that these entities are coming from the repo I call "enbase". I don't really need to know what "Q1" and "debase:Q1" stand for in this case. When resolving "enbase:debase:Q1" requires enbase repo to do lookup in another foreign repo (which enbase labelled as "debase") then fine, this will happen. My local repo will not really know about that.
So basically, as @JeroenDeDauw already said while I was typing, this "parser" does not know specifics of implementation of "foreign ids", this is done in other places.
Checking if the prefix is already in place would avoid having things like "enbase:enbase:Q1" but you're right when multiple prefixes are allowed, some "loops" on further "positions" might be possible, and something should/could be done about them. Still I don't think it is to be concern of this simple prefixing "parser".
Also, BTW, I don't know what you mean by "localized" in
Let's say I do have two localized Wikibase repositories, "debase" and "enbase".
How Wikibase instance could be localized?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added an explanation about "chained" foreign IDs in https://phabricator.wikimedia.org/T133381. Basically, they work like interwiki prefixes, which can look like de:wikibooks:fr:wikinews:commons
.
e258458
to
7d1142a
Compare
For the record: I think this is a "parser" by virtue of implementing a parser interface. It's a parser that causes the resulting EntityId to have a specific prefix. We could add "Proxy" or "Wrapper" to the end of the name, but "EntityIdParser" should somewhere be in it, too. |
public function parse( $idSerialization ) { | ||
// Do not add prefix if $idSerialization already starts with it | ||
if ( strncmp( $idSerialization, $this->prefix, strlen( $this->prefix ) ) !== 0 ) { | ||
$idSerialization = $this->prefix . $idSerialization; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use EntityId::joinSerialization() to attach the prefix.
private $idParser; | ||
|
||
/** | ||
* @param string $prefix Prefix to be added |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should make clear whether the prefix given here should or should not include the ":" at the end. With the current code, it does, but I think it should not.
Not sure what you mean. This class makes assumptions that are not backed by anything. It should never ever happen that some code gets an arbitrary string that may or may not be prefixed. What's the use case of this? Either it is prefixed and we know that, or it is not. Edit: Or we have a list of known prefixes, fed into an actual parser that parses remaining entity IDs that do not start with a known prefix as if they start with the local prefix. Which I believe is what this class is meant to do, but does not do, because it does not know anything about other prefixes. This code is fragile. This code scares me, and I will most probably try to work around it with something that gives more guarantees.
Fine. Why not remove the Oh, here is more:
|
*/ | ||
public function parse( $idSerialization ) { | ||
// Do not add prefix if $idSerialization already starts with it | ||
if ( strncmp( $idSerialization, $this->prefix, strlen( $this->prefix ) ) !== 0 ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I completely missed this, and I think this is the cause for Thiemo's criticism: this is wrong. The prefix must ALWAYS by applied. If the repo we call "foo" sends us an Id with the prefix "foo" we can not assume that that's the same foo. We must add the prefix anyway, resulting in "foo:foo:Q123", meaning "Q123 in the repo that the repo we call foo calls foo". Confusing, but consistent :)
7d1142a
to
bec0161
Compare
Thanks @thiemowmde @brightbyte @JeroenDeDauw for the review so far. After talking to @brightbyte today (in person, the outcome of this I've put in https://phabricator.wikimedia.org/T146274) I've expanded what I am trying to add in this PR. I've changed the description and title of the PR, and did some suggested changes:
Per @thiemowmde's comment:
I believe we actually want to disallow using colons in entity IDs. This should be definitely made clear in EntityId docs! |
For the record: Entity IDs cannot contain colons. We should add this this to the docs somewhere. IDs with colons would not only cause confusion in the context of repo prefixes for foreign IDs, it would also be confusing the the context of RDF prefixes and MediaWiki namespaces. So, please, no colons in IDs :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-1 only for the mangled name of the test class. Everything else are just suggestions.
* @return EntityId | ||
*/ | ||
public function parse( $idSerialization ) { | ||
$prefixedIdSerialization = EntityId::joinSerialization( array( $this->prefix, '', $idSerialization ) ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest to do this in an else-branched below, only for unmapped prefixes.
public function parse( $idSerialization ) { | ||
$prefixedIdSerialization = EntityId::joinSerialization( array( $this->prefix, '', $idSerialization ) ); | ||
$serializationParts = EntityId::splitSerialization( $idSerialization ); | ||
$serializationRepoName = $serializationParts[0]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$foreignPrefix? Maybe worth a comment: "a prefix used by the foreign repo"
* | ||
* @license GPL-2.0+ | ||
*/ | ||
class PrefMappixingEntityIdParserTest extends \PHPUnit_Framework_TestCase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like "Mappixing", but the name of the test class should be consistent with the class under test :)
$prefixedIdSerialization = EntityId::joinSerialization( array( $this->prefix, '', $idSerialization ) ); | ||
$serializationParts = EntityId::splitSerialization( $idSerialization ); | ||
$serializationRepoName = $serializationParts[0]; | ||
if ( array_key_exists( $serializationRepoName, $this->prefixMapping ) ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I generally prefer isset( $this->prefixMapping[$serializationRepoName] ) for readability. Note that these behave slightly differently: isset() will return falls if the key exists but the value is null, while array_key_exists() will return true in that case. That difference isn't relevant here, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that isset reads better. While writing this I assumed that this prefix mapping will be coming from some reader class that would take care of validation/sanitization of mappings, so this code would not need to worry if mappings are sane (ie. if we expect values to be strings we will be sure we have string at this point). Thus here we could only check if the key exists and take its value if so. But wait, doesn't it mean that isset fits well here too :) Ahh, switched to isset then!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is not really a difference between isset and array_key_exists: http://maettig.com/1397246220. I do prefer array_key_exists when an array may contain null
values. isset may fail then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR has gotten an impressive number of comments, about one for each 4 lines by now :)
class PrefMappixingEntityIdParserTest extends \PHPUnit_Framework_TestCase { | ||
|
||
private function getRegularParser( $input, EntityId $returnValue ) { | ||
$regularParser = $this->getMock( BasicEntityIdParser::class ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use the interface?
Also applies to the tests below
$regularParser = $this->getMock( BasicEntityIdParser::class ); | ||
$regularParser->expects( $this->once() ) | ||
->method( 'parse' ) | ||
->with( 'wikidata:QQQ' ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the name of the test, I'm wondering why you are also testing that the call is made just once, and that a given parameter value is passed along. These are different concerns already tested elsewhere.
*/ | ||
public function testGivenInvalidPrefix_exceptionIsThrown( $prefix ) { | ||
$regularParser = $this->getMock( BasicEntityIdParser::class ); | ||
$regularParser->expects( $this->never() )->method( 'parse' ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure there is value in testing that this thing is never called. If it has no side effects, which it likely does not, you can't even tell from the outside. So this seems like needless binding to implementation detail.
If there is some reason to verify "decorated parser is not called", then I'd change this line to
$regularParser->expects( $this->never() )->method( $this->anything() );
Also applies to test below
*/ | ||
public function __construct( $prefix, array $prefixMapping, EntityIdParser $idParser ) { | ||
Assert::parameterType( 'string', $prefix, '$prefix' ); | ||
Assert::parameter( strpos( $prefix, ':' ) === false, '$prefix', 'must not contain a colon' ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add the relevant @throws docs to the constructor? We'd be doing this in case of inline exception throwing, and where it's definitely more valuable since it's a lot less easy to see what's happening otherwise.
bec0161
to
300716b
Compare
@brightbyte @JeroenDeDauw I am too lazy to write "Done" under each of your comments. I agree with each of your recent points and have done changes suggested. Thanks for taking a look at this code! |
Thank you @manicki for being bold! |
Uh... PHP Fatal error: Call to undefined method Wikibase\DataModel\Entity\EntityId::splitSerialization() in /home/travis/build/wmde/WikibaseDataModelServices/src/EntityId/PrefixMappingEntityIdParser.php on line 61 Didn't I just merge the PR that would fix this? |
@@ -23,7 +23,8 @@ | |||
"php": ">=5.3.0", | |||
"wikibase/data-model": "~6.0|~5.0|~4.2", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will need to change to 6.2, or we need to add B/C code to handle earlier versions of the data model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, this should be changed to 6.2 when it is released.
This will also solved the PHP fatal you're quoting above. Although you've merged the PR adding foreign ids to data model it still needs to be tagged.
To test this PR locally you could temporarily hack composer json to use master (or rather the branch of wmde/WikibaseDataModel#679)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only very minor issues left.
use Wikibase\DataModel\Entity\EntityIdParser; | ||
|
||
/** | ||
* EntityIdParser that adds a fixed prefix and parses resulting string as an EntityId. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does more now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
expanded
Assert::parameterType( 'string', $prefix, '$prefix' ); | ||
Assert::parameter( strpos( $prefix, ':' ) === false, '$prefix', 'must not contain a colon' ); | ||
$this->prefix = $prefix; | ||
$this->prefixMapping = $prefixMapping; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assert::parameterElementType( 'string', $prefixMapping, '$prefixMapping' )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done. Also added an assertion on keys (should be only strings too)
BTW: how there is no such assertion (array with string-only keys and values) in Assert yet - how often we want such constraints? does it make any sense to consider expanding Assert to include such thing?
*/ | ||
public function parse( $idSerialization ) { | ||
$serializationParts = EntityId::splitSerialization( $idSerialization ); | ||
$foreignPrefix = $serializationParts[0]; // sa prefix used by foreign repository |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sa prefix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I must have fallen asleep on a keyboard
300716b
to
75f5419
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is blocked on WikibaseDataModel 6.2 being released, so no "approve".
The code looks fine to merge. I added some comments regarding readability and documentation. I'd also recommend adding a few more test cases.
* to a local prefix according to the prefix mapping | ||
* and parses resulting string as an EntityId. | ||
* This can be used to prefix IDs of entities coming from a foreign repository | ||
* with the repository name to avoid clashes with IDs of local entities. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Not a blocker for this PR, but a TODO): This should reference a proper spec document. That spec should contain essentially what is described in https://phabricator.wikimedia.org/T133381. The spec should live in docs/foreign-entity-ids.wiki (or .md?), but I'm not sure in which repo. It could live in Wikibase, so we have all the docs in one place. But that would break with the idea that documentation should live in the same repo as code, and should always be kept in sync with the code, commit by commit. So we'' want a docs directory in the WikibaseDataModelServices I suppose.
|
||
/** | ||
* @param string $prefix Prefix to be added. It should not contain a colon, in particular at the end of the prefix | ||
* @param string[] $prefixMapping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the meaning and interaction of $prefix and $prefixMapping is unclear here. A reference ot a spec would help a lot.
Assert::parameterType( 'string', $prefix, '$prefix' ); | ||
Assert::parameter( strpos( $prefix, ':' ) === false, '$prefix', 'must not contain a colon' ); | ||
Assert::parameterElementType( 'string', $prefixMapping, '$prefixMapping' ); | ||
Assert::parameterElementType( 'string', array_keys( $prefixMapping ), '$prefixMapping' ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid a misleading error message, this should be
Assert::parameterElementType( 'string', array_keys( $prefixMapping ), 'array_keys( $prefixMapping )' );
} | ||
|
||
/** | ||
* Adds the prefix to the given string, maps the resulting string according to prefix mapping definition |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implies the mapping is done after adding the prefix. It would be possible to do it that way, but the code below does not do that - it applies the mapping OR adds the prefix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, the code does it that way but I agree the description could be more clear. I've rephrased it slightly. And I agree the having some docs on this whole thing will be more than helpful!
Nah, looking at the new version it is basically the same as used to be before. Any suggestions?
* @return EntityId | ||
*/ | ||
public function parse( $idSerialization ) { | ||
$serializationParts = EntityId::splitSerialization( $idSerialization ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More readable:
list( $repoName, $extraPrefixes , $relativeId )
class PrefixMappingEntityIdParserTest extends \PHPUnit_Framework_TestCase { | ||
|
||
private function getRegularParser( $input, EntityId $returnValue ) { | ||
$regularParser = $this->getMock( EntityIdParser::class ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could simply return a new BasicEntityIdParser(), no need to mock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've apparently gone too far with mocking things here. So mocks are now only use with tests that don't really "parse" anything, in cases like here BasicEntityIdParser is used
$this->assertEquals( $expected, $parser->parse( $input ) ); | ||
} | ||
|
||
public function validIdSerializationProvider() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer provideValidIdSerialization(), since function names should be verbs. But we are sadly inconsistent in that regard, so whatever.
public function mappedSerializationIdProvider() { | ||
return array( | ||
array( 'foo', array( 'bar' => 'wd' ), 'bar:Q1337', 'wd:Q1337', new ItemId( 'wd:Q1337' ) ), | ||
array( 'foo', array( 'bar' => 'wd' ), 'bar:x:Q1337', 'wd:x:Q1337', new ItemId( 'wd:x:Q1337' ) ), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps add a test case that makes sure that x:bar:Q1337 is not mapped.
} | ||
|
||
public function validIdSerializationProvider() { | ||
return array( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also test the case of adding an empty prefix.
@JeroenDeDauw have your concerns been addressed? |
404e279
to
f00abfb
Compare
@brightbyte sorry, I missed the github ping that you've commented. I've also adjusted Travis CI config, as this change would require Data Model 6.2+. |
c814d8f
to
42bfe71
Compare
@manicki thanks. Looks good now, though I can't find a way to get a diff of what you changed since my review. Github really like amend / force push. Reviews should probably be addressed by adding a commit. Anyway: +1, blocked on the release of DataModel 6.2. Spec document is pending. Shall we make a ticket for adding the spec document, or do you want to make a pull request right away? I think the spec for foreign IDs should be in the DataModel component - that component defines the EntityId object, so it should also define the semantics, even if it doesn't implement all of it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Yes. From now on I will be trying to add commits on top, and squash them into single commit when reviewers say it is ready.
I am starting writing it down right away. |
thanks @maniki! |
@maniki: What's the status here? Do we have a id mapping spec now somewhere? Two comments regarding documentation are still pending. |
@brightbyte so the docs on foreign ids, including a part on prefix mapping are now in https://github.com/wmde/WikibaseDataModel/blob/master/docs/foreign-entity-ids.wiki (added in wmde/WikibaseDataModel#682) |
It seems to me like we can merge this. The only thing that I'd still like to see is an explicit reference to the prefix mapping spec/docs from the class level and method level documentation. |
Assert::parameter( strpos( $prefix, ':' ) === false, '$prefix', 'must not contain a colon' ); | ||
Assert::parameterElementType( 'string', $prefixMapping, '$prefixMapping' ); | ||
Assert::parameterElementType( 'string', array_keys( $prefixMapping ), 'array_keys( $prefixMapping )' ); | ||
$this->prefix = $prefix; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, that could just be $prefixMapping['']....
The parser can be used to prefix IDs of foreign entities with a repository name, and resolve foreign repository prefixes to local prefixes based on the prefix mapping.
6c76408
to
2c7b998
Compare
Require the "default" prefix to be always provided to the constructor. Also adds references to foreign id docs.
Rebased the PR after switching to short array syntax. |
Giving @JeroenDeDauw a chance to have a look. Poke me to merge this tomorrow. I think I can override "requested changes". |
So let's assume now is the "tomorrow". Ping @brightbyte ! |
The parser can be used to prefix IDs of foreign entities with a repository name, and resolve foreign repository prefixes to local prefixes based on the prefix mapping.
This depends on wmde/WikibaseDataModel#679 and tests will be failing until that PR is merged.Also I just realized that his repository might not be a right location for the new parser. As it relies on a feature to be added in DataModel 6.2 (or so), having the parser here would break DataModelServices compatibility with any earlier DataModel versions. I guess we want to keep this broad compatibility. Should I rather put this parser in the DataModel repo?Ticket: T146274