Wikidata:Property proposal/SIMBAD catalog properties (used more than 1 million times)
Gaia Data Release 2 ID
Description | identifier for an astronomical object in Gaia Data Release 2 |
---|---|
Data type | External identifier |
Domain | astronomical objects |
Allowed values | [0-9]{18,19} |
Example 1 | BS Cnc (Q2889194) → 661284024235415808 |
Example 2 | Gliese 450 (Q5880899) → 4031586157514097024 |
Example 3 | TYC 3645-2080-1 (Q75838267) → 1943381923013901440 |
Source | Gaia Data Release 2 (Q51905050) |
Planned use | migrate all P528 values qualified with P972 Q51905050 to this property |
Formatter URL | https://simbad.u-strasbg.fr/simbad/sim-id?Ident=Gaia%20DR2%20$1 |
2MASS ID
Description | identifier for an astronomical object in the Two Micron All Sky Survey |
---|---|
Data type | External identifier |
Domain | astronomical objects |
Allowed values | J[0-9]{8}[+-][0-9]{7} |
Example 1 | BS Cnc (Q2889194) → J08390909+1935327 |
Example 2 | Gliese 450 (Q5880899) → J11510737+3516188 |
Example 3 | TYC 3645-2080-1 (Q75838267) → J23350993+4851114 |
Source | 2MASS (Q1454942) |
Planned use | migrate all P528 values qualified with P972 Q1454942 to this property |
Formatter URL | https://simbad.u-strasbg.fr/simbad/sim-id?Ident=2MASS%20$1 |
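As a rough illustration of how the Allowed values and Formatter URL rows above might work together, here is a minimal Python sketch. The pattern and URL template are copied from the table; the function names, the whole-string matching, and the percent-encoding of the `+` sign are assumptions made for this sketch, not a description of how Wikidata itself expands formatter URLs.

```python
import re
from urllib.parse import quote

# Copied from the "Allowed values" and "Formatter URL" rows of the 2MASS table.
ALLOWED_VALUES = re.compile(r"J[0-9]{8}[+-][0-9]{7}")
FORMATTER_URL = "https://simbad.u-strasbg.fr/simbad/sim-id?Ident=2MASS%20$1"


def is_valid_2mass_id(value: str) -> bool:
    """Check that the whole string matches the allowed-values pattern."""
    return ALLOWED_VALUES.fullmatch(value) is not None


def build_simbad_url(value: str) -> str:
    """Substitute a validated ID for $1 in the formatter URL.

    Assumption: the ID is percent-encoded, so that "+" in a query string
    is not read as a space by the receiving server.
    """
    if not is_valid_2mass_id(value):
        raise ValueError(f"not a 2MASS ID: {value!r}")
    return FORMATTER_URL.replace("$1", quote(value, safe=""))


if __name__ == "__main__":
    # Example 1 from the table: BS Cnc (Q2889194)
    print(build_simbad_url("J08390909+1935327"))
```

The same two helpers could be parameterized by pattern and template to cover the other proposed properties.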
Tycho-2 Catalogue ID
Description | identifier for an astronomical object in the Tycho-2 Catalogue |
---|---|
Data type | External identifier |
Domain | astronomical objects |
Allowed values | [0-9]{1,4}-[0-9]{1,4}-1 |
Example 1 | BS Cnc (Q2889194) → 1395-2445-1 |
Example 2 | Gliese 450 (Q5880899) → 2526-2357-1 |
Example 3 | TYC 3645-2080-1 (Q75838267) → 3645-2080-1 |
Source | The Tycho-2 catalogue of the 2.5 million brightest stars (Q2725928) |
Planned use | migrate all P528 values qualified with P972 Q2725928 to this property |
Formatter URL | https://simbad.u-strasbg.fr/simbad/sim-id?Ident=TYC%20$1 |
Gaia Data Release 1 ID
Description | identifier for an astronomical object in Gaia Data Release 1 |
---|---|
Data type | External identifier |
Domain | astronomical objects |
Allowed values | [0-9]{18,19} |
Example 1 | BS Cnc (Q2889194) → 661284019938140032 |
Example 2 | Gliese 450 (Q5880899) → 4031586157514097024 |
Example 3 | TYC 3645-2080-1 (Q75838267) → 1943381923012780160 |
Source | Gaia Data Release 1 (Q37859523) |
Planned use | migrate all P528 values qualified with P972 Q37859523 to this property |
Formatter URL | https://simbad.u-strasbg.fr/simbad/sim-id?Ident=Gaia%20DR1%20$1 |
SDSS object ID
Description | identifier for an astronomical object in the Sloan Digital Sky Survey |
---|---|
Data type | External identifier |
Domain | astronomical objects |
Allowed values | J[0-9]{6}\.[0-9]{2}[+-][0-9]{6}\.[0-9] |
Example 1 | BS Cnc (Q2889194) → J083909.03+193532.4 |
Example 2 | Gliese 450 (Q5880899) → J115106.57+351627.2 |
Example 3 | TYC 3645-2080-1 (Q75838267) → J233509.93+485111.4 |
Source | Sloan Digital Sky Survey (Q840332) |
Planned use | migrate all P528 values qualified with P972 Q840332 to this property |
Formatter URL | https://simbad.u-strasbg.fr/simbad/sim-id?Ident=SDSS%20$1 |
OGLE-III object ID
Description | identifier for an astronomical object in the Optical Gravitational Lensing Experiment |
---|---|
Data type | External identifier |
Domain | astronomical objects |
Example 1 | R99 (Q22087000) → BRIGHT-LMC-MISC-429 |
Example 2 | R85 (Q28406638) → BRIGHT-LMC-MISC-9 |
Example 3 | SV* HV 2827 (Q74703824) → LMC-CEP-4689 |
Source | The Optical Gravitational Lensing Experiment. The OGLE-III catalog of variable stars. I. Classical Cepheids in the Large Magellanic Cloud (Q67054966) |
Planned use | migrate all P528 values qualified with P972 Q67054966 to this property |
Formatter URL | https://simbad.u-strasbg.fr/simbad/sim-id?Ident=OGLE%20$1 |
Motivation
The specific combination of catalog code (P528) qualified by catalog (P972) is used in 24 million statements, the vast majority of which are for astronomical objects. About 14 million of these statements come from six catalogues, so migrating those statements to use these properties would remove the 14 million triples taken up by the P972 qualifiers. (Another 18 catalogues each have more statements than there are statements for inventory number (P217) with qualifier collection (P195) The Palace Museum (Q2047427), which numbered 127545 as of 6 August 2024.)
(This migration would be similar to the migration that took place after the properties proposed at Wikidata:Property proposal/proper motion components were created. While this page intends to handle only the six largest catalogues, if you believe there are other large catalogues whose catalog codes would do well to be migrated to properties, please say so in a comment.) Mahir256 (talk) 21:56, 6 August 2024 (UTC)
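To make the planned migration concrete, the following Python sketch rewrites a catalog code (P528) statement qualified with catalog (P972) into a statement on a dedicated property. The catalogue items are the six named above; the statement layout (a plain dictionary rather than real Wikibase JSON), the function name, and the target property IDs are placeholders invented for illustration, since none of the proposed properties has an ID yet. The value is copied unchanged.

```python
from typing import Optional

# Keys are the catalogue items named in the proposal. The values are
# PLACEHOLDERS: the proposed properties have no real property IDs yet.
CATALOG_TO_PROPERTY = {
    "Q51905050": "P_GAIA_DR2",  # Gaia Data Release 2
    "Q1454942": "P_2MASS",      # 2MASS
    "Q2725928": "P_TYCHO2",     # Tycho-2 Catalogue
    "Q37859523": "P_GAIA_DR1",  # Gaia Data Release 1
    "Q840332": "P_SDSS",        # Sloan Digital Sky Survey
    "Q67054966": "P_OGLE3",     # OGLE-III
}


def migrate_statement(statement: dict) -> Optional[dict]:
    """Return the replacement statement, or None if this one is not covered.

    `statement` uses a simplified, hypothetical layout:
    {"property": ..., "value": ..., "qualifiers": {property_id: [values]}}
    """
    if statement.get("property") != "P528":
        return None
    qualifiers = statement.get("qualifiers", {})
    catalogs = qualifiers.get("P972", [])
    if len(catalogs) != 1:
        return None  # ambiguous or unqualified: leave for manual review
    target = CATALOG_TO_PROPERTY.get(catalogs[0])
    if target is None:
        return None  # catalogue is not one of the six handled here
    # Drop the P972 qualifier and keep any other qualifiers as they are.
    remaining = {p: v for p, v in qualifiers.items() if p != "P972"}
    return {"property": target, "value": statement["value"], "qualifiers": remaining}


if __name__ == "__main__":
    old = {
        "property": "P528",
        "value": "661284024235415808",
        "qualifiers": {"P972": ["Q51905050"]},
    }
    print(migrate_statement(old))
```

Dropping the P972 qualifier is where the saving described above would come from; statements for any other catalogue are left untouched.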
Discussion
- @Mahir256 Is there any specific reason why we want to reduce the number of P528 statements? Ghuron (talk) 00:03, 7 August 2024 (UTC)
- @Ghuron: We have dedicated external identifier properties rather than lumping them all in a single property and qualifying them, just as we have dedicated website account properties rather than always using website account on (P553) qualified with website username or ID (P554). This proposal is intended as a logical parallel of both of those decisions. Mahir256 (talk) 17:18, 12 August 2024 (UTC)
- @Mahir256: Let me rephrase how I understood your rationale: if p:P528/pq:P972 wd:Q51905050 occurs more than a million times, then that is both a necessary and sufficient condition for creating a new property, since it reduces the number of triples and thus reduces the risk of Blazegraph crashing. Is that a correct summary? Ghuron (talk) 22:44, 12 August 2024 (UTC)
- @Ghuron: I would not phrase it quite so absolutely, but I do want to see the number of triples reduced and believe this is a way to do it; an extremely high number of identically structured uses of a generic identification property like catalog code (P528) with the same qualifiers suggests that a more specialized identifier property is worth introducing to streamline things, just as has been done multiple times before. Mahir256 (talk) 16:50, 13 August 2024 (UTC)
- See also Wikidata:Property proposal/New General Catalogue ID, and various failed proposals for properties for astronomical catalogues such as Wikidata:Property_proposal/Archive/15#HD.--GZWDer (talk) 12:28, 7 August 2024 (UTC)
- NGC ID is actually an example of a misleading external ID. This is a very old catalog, and historians are debating how its IDs correspond to objects in modern catalogs. The most authoritative source for that discussion is this site, which is difficult to assign as a "formatter URL". SEDS, which is used now, is OK for only ~80% of elements. Ghuron (talk) 18:58, 7 August 2024 (UTC)
- I also propose (since we have mul aliases) adding each of the catalog IDs as mul aliases. This is controversial though.--GZWDer (talk) 12:32, 7 August 2024 (UTC)
- Support Having unique identifiers for astronomical objects and being able to correlate them is important; something hard to do with catalog code. ArthurPSmith (talk) 20:45, 7 August 2024 (UTC)
- Oppose I don't think this proposal will improve anything. If anything it may cause further confusion:
- As stated by Ghuron, is there any reason why we need to reduce the number of P528 statements? In the first place there are millions of Gaia IDs because of the import of the Simbad database (I am NOT against this import btw).
- Also, I wonder why only some catalogues would have their own properties. This will create a weird in-between state, with some catalogues in P528 and others having their own properties. This makes no sense imo.
- Romuald 2 (talk) 15:31, 8 August 2024 (UTC)
- There is nothing wrong with having separate external ID properties for the most used identifiers with the correct "formatter URL".
But I have 2 major objections:
- (1) I don't see any reason to use https://simbad.u-strasbg.fr/simbad/sim-id?Ident= as a URL. For items that are in SIMBAD, we already have Property:P3083, which links to SIMBAD. For the rare items that are not in SIMBAD, this link will result in a 404.
- (2) Bearing (1) in mind, it would make sense to link to genuinely useful external databases that are only partially synchronized with SIMBAD (like HyperLEDA or the Gaia Archive). That leads to questions about the proposed set of properties:
- Why did we choose Gaia DR2, given that these are only temporary IDs and the permanent ones are Gaia DR3?
- Why did we choose Tycho-2, given that it is pretty much 100% imported into SIMBAD?
- Ghuron (talk) 12:52, 9 August 2024 (UTC)
- @Romuald 2: Reducing the number of RDF triples that Wikidata consists of is generally a good thing: there is a lot of ongoing discussion about the health of the Query Service, and the fewer triples a single running Blazegraph instance has to hold, the better. I had also noted that 18 other catalogs have more entries than the most frequent inventory number source; I left them off this page only because it would have become too long. If these six go through, then I will promptly propose properties for those 18 (and as I stated in the motivation above, if you believe there are other large catalogues whose catalog codes would do well to be migrated to properties, please say so in a comment). Mahir256 (talk) 17:18, 12 August 2024 (UTC)
- @Ghuron: The reason I selected the SIMBAD formatter URL is that the external IDs I tried with that URL all seemed to resolve to the right objects; if there are in fact objects for which this resolution doesn't work, it would be great if you could name some. The caveat "(used more than 1 million times)" in the title of this property proposal page is important; because your imports did not yield more than 1 million Gaia DR3 identifiers, I did not think to propose a property for it here, though I'd gladly support one for Gaia DR3 if you think it would be useful. I don't know who "we" is as regards either Gaia DR2 or Tycho-2; you're the one who mass-imported the objects, so I'm working with the catalog codes I see on those objects. Mahir256 (talk) 17:18, 12 August 2024 (UTC)
- @Ghuron and @GZWDer, would you like to give your opinions? Regards, ZI Jony (Talk) 18:37, 16 September 2024 (UTC)
- I view external identifiers somewhat differently than @Mahir256. In my understanding, a new external identifier is needed when it provides a link to a new, previously unrelated external data source. In the proposed cases, we would be getting a connection to the same SIMBAD that we are already connected to via Property:P3083. Personally, the proposed identifiers will not bring me any value (nor will they cause any harm).
I understand the idea that this will reduce the number of triples, but I think that the measly few million that we are discussing here are a drop in the ocean. Our goal is to upload data to Wikidata, not to optimize it in a way that makes life easier for the foundation's engineers. Let them do their job and we will do ours. Ghuron (talk) 19:00, 16 September 2024 (UTC)