16 Responses

  1. Steve Pepper
    Steve Pepper March 23, 2009 at 18:37 |

    I don’t think this is the right way to go – but I’m willing to be convinced, especially if Wikipedia is willing to take certain actions. Let me just mention one problem:

    I need an identifier for the Italian composer Giacomo Puccini. Which of the following Wikipedia URLs should I use as the PSI:

    [+ 50 or so more]

  2. Vegard Sandvold
    Vegard Sandvold March 24, 2009 at 01:32 |

    Steve, it’s a pleasure to hear from you. I enjoyed your tutorial last Wednesday on Topic Maps Norway 2009. Getting some of the fundamentals of topic maps in place was really helpful.

    I guess you know better than me which actions Wikipedia could/should take to provide you with a proper PSI for Puccini. Here’s what I think, as I try to break down the problem.

    The first URL is a nonstandard form of the second URL, which can be considered the canonical form. Would it be better if Wikipedia disallowed nonstandard article URLs, instead of just redirecting to the canonical one?

    The other URLs belong to non-English Wikipedia sites. This is really off the top of my head, but wouldn’t it be possible to make scopes out of language subdomains, like the language scopes of your Italian opera TM? That way, your Puccini PSI would be a language-independent canonical article URL, with subdomains representing scopes, perhaps with English as the default scope.

    How do you think these changes would improve the situation?

  3. Steve Pepper
    Steve Pepper March 24, 2009 at 14:26 |

    For Wikipedia URIs to be able to function as PSIs, Wikipedia must explicitly designate one URI as the canonical URI for each subject; otherwise users won’t know which one to use. This already happens (albeit implicitly) within a single language version of Wikipedia through redirection, as when …/Puccini redirects to …/Giacomo_Puccini. However, this doesn’t work across different languages.

    I for one would not be comfortable for English to become the default, because of the cultural bias this involves. I think there needs to be one URI which is language independent. My first thought was to suggest:


    but this would be unfair to the 54,000 speakers of Southern Pashayi in Afghanistan (ISO 639 language code “psi”) who might one day want their own Wikipedia (see http://www.ethnologue.org/show_language.asp?code=psi). An alternative might be:


    If Wikipedia were to implement this as the canonical identifier, explicitly stated on each page devoted to the subject in question, my first concern would go away. (One could envisage the possibility of such a PSI automatically resolving to the natural language version of the user’s choice, although this could also raise new concerns.)

    It wouldn’t alleviate my second concern, though:

    When we developed the Published Subjects paradigm, it was generally agreed that the PSD (published subject descriptor, aka published subject indicator) – the resource that the PSI resolves to – should contain the _bare minimum_ of assertions necessary to unambiguously identify the subject. The thinking was that the more assertions one makes about a subject, the greater the chance that people will disagree, and the more likely that some people will refuse to use the PSI (and that, of course, defeats the whole point of having a PSI). Wikipedia articles typically make many assertions about each subject and although they strive for objectivity, in practice this is never fully attainable.

    My third concern is that editorial policy at Wikipedia would prevent you from creating articles about some subjects. (I doubt, for example, that they would countenance an article about my mother’s dog, Barney, http://psi.ontopedia.net/Barney.) This means that Wikipedia could at best only provide a starter set of PSIs, which would have to be supplemented in other ways.

    At Ontopedia we have experimented with different approaches, but we have mostly followed the convention of using our own “namespace” (http://psi.ontopedia.net/) together with the “local part” (e.g. Giacomo_Puccini) that Wikipedia uses as its (implicit) canonical URI.
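The convention described above (own namespace plus Wikipedia's local part) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the `/wiki/` path prefix is taken from the Wikipedia URLs quoted elsewhere in this thread, and nothing here is Ontopedia's actual tooling.

```python
from urllib.parse import urlsplit

# Namespace quoted in the comment above.
ONTOPEDIA_NAMESPACE = "http://psi.ontopedia.net/"


def wikipedia_local_part(url: str) -> str:
    """Return the article-title segment of a Wikipedia article URL.

    Assumes the usual http://<lang>.wikipedia.org/wiki/<Title> layout.
    """
    parts = urlsplit(url)
    if not parts.netloc.endswith("wikipedia.org"):
        raise ValueError(f"not a Wikipedia URL: {url}")
    prefix = "/wiki/"
    if not parts.path.startswith(prefix):
        raise ValueError(f"not an article URL: {url}")
    local_part = parts.path[len(prefix):]
    if not local_part:
        raise ValueError(f"no article title in: {url}")
    return local_part


def namespaced_psi(url: str, namespace: str = ONTOPEDIA_NAMESPACE) -> str:
    """Combine a PSI namespace with the local part of a Wikipedia URL."""
    return namespace + wikipedia_local_part(url)


print(namespaced_psi("http://en.wikipedia.org/wiki/Giacomo_Puccini"))
```

Because only the local part is reused, the language subdomain drops out of the identifier. Two language editions yield the same PSI only when they happen to use the same article title, so this sketch does not by itself settle which edition's title should count as canonical.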

  4. Vegard Sandvold
    Vegard Sandvold March 24, 2009 at 20:11 |

    I can only speculate, but I believe that a richer PSD could work as an incentive for people to adopt the PSI. I understand the rationale behind minimal and objective PSDs, but what if the lack of information and context makes me question the quality of the PSI, and its dependability as a lasting identifier for my topic?

    I mean, which PSD/PSI feels more authoritative and dependable?

    I choose Wikipedia. It just feels more inviting and cared for. And care breeds confidence. Sure, people can disagree on the definition of a topic, and that happens all the time on Wikipedia. But most edit wars are eventually resolved, and the end product is close to the bare minimum of assertions people are willing to agree upon. Objective enough, you could say.

    Let’s flip the question around…

    Can I use Wikipedia article URLs as PSIs for my own topic maps? Would you advise against that for any reason?

  5. Dutchbob
    Dutchbob March 25, 2009 at 21:58 |

    Hei Vegard,

    interesting and necessary discussion. You might also want to have a look at this paper on “Cool URIs for the Semantic Web”:

    The idea of PSIs is very interesting and has strong points, although I prefer the idea of having several unique identifiers at different places on the web for a specific object/topic/idea. That way, selecting the “best” description really becomes a community-driven initiative. People might want to assert similarity between pointers to the same object through constructs like owl:sameAs, which lets anybody state similarity between URIs if they feel there is one. An interesting question would be how we can trace where such statements came from, so that we can (by integrating a FOAF/trusted-peer concept) keep only those statements about such URIs that we really trust.
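A minimal sketch of that idea in Python, assuming each owl:sameAs statement is recorded together with the party who made it. The statement format, the source names ("alice", "bob", "mallory") and the function name are hypothetical, and a plain union-find stands in for a real RDF store and FOAF-based trust model. The URIs are ones mentioned in this thread or built from its conventions.

```python
from collections import defaultdict


def same_as_clusters(statements, trusted_sources):
    """Group URIs linked by sameAs statements that come from trusted sources.

    statements: iterable of (uri_a, uri_b, source) tuples.
    trusted_sources: set of source names whose statements we accept.
    Returns a list of sets, one per group of two or more linked URIs.
    """
    parent = {}

    def find(uri):
        # Union-find lookup with path halving.
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]
            uri = parent[uri]
        return uri

    for uri_a, uri_b, source in statements:
        if source not in trusted_sources:
            continue  # ignore statements whose origin we do not trust
        root_a, root_b = find(uri_a), find(uri_b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = defaultdict(set)
    for uri in list(parent):
        clusters[find(uri)].add(uri)
    return [group for group in clusters.values() if len(group) > 1]


# Hypothetical statements, each tagged with who asserted it.
STATEMENTS = [
    ("http://psi.ontopedia.net/Giacomo_Puccini",
     "http://en.wikipedia.org/wiki/Giacomo_Puccini", "alice"),
    ("http://en.wikipedia.org/wiki/Giacomo_Puccini",
     "http://viaf.org/137043", "bob"),
    ("http://en.wikipedia.org/wiki/Giacomo_Puccini",
     "http://en.wikipedia.org/wiki/Paul_Foot", "mallory"),
]

for group in same_as_clusters(STATEMENTS, {"alice", "bob"}):
    print(sorted(group))
```

With "mallory" left out of the trusted set, the bogus link to the Paul Foot article never enters the merge, which is the filtering step the comment asks for. How to decide whom to trust is left open here.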

  6. Vegard Sandvold
    Vegard Sandvold March 26, 2009 at 22:08 |

    Hi Robert!

    I understand more of your comment now, after listening to your presentation today about Topic Maps, Semantic Web (and search). The concept of “compound identifiers” using owl:sameAs is very interesting, together with the decentralized infrastructure for topic identification and description. I still don’t know enough about RDF, OWL, FOAF and the rest to really feel I can have a qualified opinion about what works or not, but what you’re saying resonates well with my general understanding.

    Thanks for sharing!

  7. Vegard Sandvold
    Vegard Sandvold March 28, 2009 at 13:01 |

    Are Gulbrandsen comments on this post on his own blog Everythings a Subject:

    My view is that I would currently use Wikipedia, because on some subjects it’s the best source I’ve got. I agree with Steve Pepper, but imagine that it could be useful in some contexts to be a bit fuzzy on purpose. A widely defined and a bit fuzzy subject might be exactly what we want, to be able to “start a conversation”.

    He links to this blog post by Lars Marius Garshol: http://www.garshol.priv.no/blog/91.html

    (Note: Backlinks have for some reason disappeared from my blog. I need to check out why.)

  8. Martin Stricker
    Martin Stricker April 2, 2009 at 15:59 |

    re: PSI for Puccini

    How about using DBpedia[1]? It has already consolidated the different Wikipedia identifiers, with nice labels (“names”) and comments in many languages, plus media references and additional relations, and it is implemented with HTTP content negotiation. For instance:


    [1] http://dbpedia.org/About
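To make the content negotiation point concrete, here is a small Python sketch using only the standard library. The resource URI below follows the `http://dbpedia.org/resource/<Title>` pattern and is an assumption, not the example from the comment (which is missing here); the function names are hypothetical, and the redirect and content type the server actually returns are not guaranteed.

```python
import urllib.request

# Assumed DBpedia resource URI pattern; not taken from the comment above.
PUCCINI_URI = "http://dbpedia.org/resource/Giacomo_Puccini"


def negotiated_request(uri: str, media_type: str) -> urllib.request.Request:
    """Build a GET request that asks for one representation of the subject."""
    return urllib.request.Request(uri, headers={"Accept": media_type})


def fetch_representation(uri: str, media_type: str, timeout: float = 10.0):
    """Send the request and report where it ended up and what came back.

    urlopen follows redirects, so the final URL may differ from the
    identifier that was requested.
    """
    request = negotiated_request(uri, media_type)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.geturl(), response.headers.get("Content-Type", "")


# The same identifier, two requested representations.
html_request = negotiated_request(PUCCINI_URI, "text/html")
rdf_request = negotiated_request(PUCCINI_URI, "application/rdf+xml")
print(html_request.get_header("Accept"), rdf_request.get_header("Accept"))
```

The point of the design is that a topic map would store only the one identifier, while people and machines each ask for the representation they can read. Calling `fetch_representation` needs network access, so it is not run here.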

  9. Steve Pepper
    Steve Pepper April 23, 2009 at 23:27 |

    Today I came across an interesting example of why I wouldn’t want to use Wikipedia as a source of PSIs – or any other source, including DBpedia – unless they guarantee to maintain a policy of stability.

    Browsing the article about “Paul Foot” [1] I clicked on the link to “International Socialists (UK)” [2] and was redirected to the page on the “Socialist Workers Party (Britain)” [3]. Although related, these are *not* the same subject. Even worse, the URL shown in the address bar of my browser for the SWP page was that of the IS page.

    [1] http://en.wikipedia.org/wiki/Paul_Foot
    [2] http://en.wikipedia.org/wiki/International_Socialists_(UK)
    [3] http://en.wikipedia.org/wiki/Socialist_Workers_Party_(Britain)

    It’s one thing to redirect from …/Puccini to …/Giacomo_Puccini (which one can understand are the same subject), but quite another to redirect from one subject to another.

  10. Vegard Sandvold
    Vegard Sandvold April 24, 2009 at 12:03 |

    Thank you for that example, Steve!

  11. Martin Stricker
    Martin Stricker April 24, 2009 at 12:14 |

    Re: Steve Pepper / International Socialists (UK)

    Interesting point, which touches on the question of authority. As I see it (not being an expert on the British left), Wikipedia thinks the International Socialists (UK) are indeed a (historical) part of another subject. Such decisions have to be made, I think, and they are often made for practical reasons.

    For me there are two distinct issues. First, do I respect Wikipedia’s authority in principle (as the best option for now, because of their peer-reviewed, open approach)? Second, what options do I have if I don’t agree with this primary source of PSIs, or if I want to have a finer-grained conversation? I’d think this would be the occasion for my own PSIs.

    Anyway, Wikipedia has a distinct PSI for the International Socialists (UK), just in case:

  12. Martin Stricker
    Martin Stricker April 24, 2009 at 12:30 |

    Sorry, please ignore the last paragraph of the previous post – it doesn’t matter, as the page still points to the other one for identification.

  13. Joao Lima
    Joao Lima May 5, 2009 at 13:14 |

    I think the best source for person entities as subjects (person PSIs) is the VIAF project (still in beta): “The Deutsche Nationalbibliothek, the Library of Congress, the Bibliothèque nationale de France, and OCLC are jointly conducting a project to match and link the authority records for personal names in the retrospective personal name authority files of the Deutsche Nationalbibliothek (dnb), the Library of Congress (LC), and the Bibliothèque nationale de France (BnF).” (From: http://www.oclc.org/research/projects/viaf/)

    ISO 21127:2006, also known as the CIDOC CRM ontology, considers that everything could be a *subject* of an “information object”. So it’s necessary to create terminologies of specialized entities (like VIAF for persons, TGN for locations, etc.) that can be referenced precisely and without ambiguity. These global terminologies should be merged with national and local terminologies in each specific system implementation.

  14. Joao Lima
    Joao Lima May 5, 2009 at 17:48 |

    The Giacomo Puccini identifier at VIAF is http://viaf.org/137043 .

  15. John Cowan
    John Cowan November 24, 2009 at 03:31 |

    I proposed this some years back at http://en.wikipedia.org/w/index.php?title=Wikipedia:Village_pump_(policy)&oldid=17918242#Wikipedia_pages_as_Published_Subject_Indicators and got exactly nowhere, as you can see. Maybe someone with more clout in the Wikipedia community would have better luck.

  16. Steve Pepper
    Steve Pepper November 24, 2009 at 19:55 |

    @John: I think more is required than the insertion of published subjects boilerplate. At the very least you need a commitment to stability of URLs once published, including a commitment not to redirect from one previously independent page to another. I can’t see Wikipedia accepting that — and nor should they.
