Semantic Web Interest Group IRC Scratchpad

Welcome to the Semantic Web Interest Group scratchpad generated automatically from discussions on IRC at irc.freenode.net port 6667 channel #swig by the chump bot, instructions in the chump user manual. Please use UTF-8 charset on IRC and pastebin for code or data more than 10 lines long.


last updated at 2003-12-11 22:00
DanC_jam: "Semantic Web Protocol Use Cases" nifty!
DanC_jam: people laugh at Moore's suggestion of semantic web servers in fridges talking to stoves... I suggest that's like people 10 or 20 years ago laughing at star trek communicators, i.e. cellphones.
DanC_jam: hmm... getStatements is a typical RPC call with a few scalar args; WSDL 2.0 should be able to map this to GET. the argument about the result type should go in the Accept: header
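A minimal sketch of the GET mapping DanC_jam describes, assuming a hypothetical endpoint URL and hypothetical query parameter names (s, p, o); none of these come from the use-cases draft. The scalar arguments go into the query string and the requested result type goes into the Accept: header. The request is only built, never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint; the draft does not define a URL layout.
BASE = "http://example.org/rdf/statements"

def build_get_statements(subject=None, predicate=None, obj=None,
                         accept="application/rdf+xml"):
    """Build (but do not send) a GET request for a getStatements-style call."""
    # Hypothetical parameter names: s, p, o. Unset arguments act as wildcards.
    args = {"s": subject, "p": predicate, "o": obj}
    query = urlencode({k: v for k, v in args.items() if v is not None})
    url = BASE + ("?" + query if query else "")
    # The result type is negotiated through Accept:, not passed as an argument.
    return Request(url, headers={"Accept": accept}, method="GET")

req = build_get_statements(
    subject="http://example.org/doc",
    predicate="http://purl.org/dc/elements/1.1/title",
)
print(req.get_method(), req.full_url)
print("Accept:", req.get_header("Accept"))
```

Because the call is safe and its arguments are a few scalars, the whole request fits in a URI that can be bookmarked or cached.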
DanC_jam: hmm.. updateStatements... how about a where clause? how much of SQL can we do?
DanC_jam: hm... options()... seems to contradict the goal of interoperability, i.e. going away from "proprietary" protocols
DanC_jam: "RDF Data Model in XML Schema". hmm... interesting.
DanC_jam: soap binding doesn't look straightforward to me at all. cf whenToUseGet-7
 
evlist: This work is inspired by UDEF.
evlist: UDEF is basing their mapping on exact synonymy, which is too narrow to really work.
evlist: Three levels: data, information, knowledge.
evlist: XML is an enabler to work at the information layer.
evlist: Goals: exploit partial interoperability through declarative annotation.
evlist: Conceptual Indexing (work done at Sun labs) is another inspiration.
evlist: Concepts can be linked by relations (is_a, instance_of, part_of, ...) instead of just synonymy.
evlist: UDEF relies on the US 2002 NAICS industry ontology.
evlist: This ontology lacks subsumption transitivity.
evlist: In Ambroziak's proposal, a set of rules is defined for each vocabulary. These sets of rules use a common ontology and processors are derived from these rules.
evlist: A rule is a triple "(concept, match, extract)".
evlist: Concept is a pointer into the ontology, match is an XPath expression and extract is an XSLT snippet.
evlist: Ambroziak is not sure if XPath is enough for that work.
evlist: The generation of XSLT out of these rules isn't implemented yet, but a conceptual model of this mapping has been implemented.
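A rough sketch of the "(concept, match, extract)" rule triple, under loud assumptions: the concept IDs, the sample order document and the rule set are all invented for illustration, a Python callable stands in for the XSLT snippet, and the match expressions use only the limited XPath subset that xml.etree.ElementTree supports. It is not Ambroziak's implementation.

```python
import xml.etree.ElementTree as ET
from typing import Callable, NamedTuple

class Rule(NamedTuple):
    concept: str                          # pointer into the shared ontology (hypothetical ID)
    match: str                            # path expression selecting elements
    extract: Callable[[ET.Element], str]  # stands in for the XSLT snippet

def apply_rules(xml_text, rules):
    """Return (concept, extracted value) pairs for every element a rule matches."""
    root = ET.fromstring(xml_text)
    found = []
    for rule in rules:
        for element in root.findall(rule.match):
            found.append((rule.concept, rule.extract(element)))
    return found

# Invented vocabulary and rules, one rule set per vocabulary.
ORDER = ("<order><buyer><name> Acme </name></buyer>"
         "<total currency='USD'>42.50</total></order>")
RULES = [
    Rule("onto:Organization", "./buyer/name", lambda e: (e.text or "").strip()),
    Rule("onto:MonetaryAmount", "./total", lambda e: f"{e.text} {e.get('currency')}"),
]

annotations = apply_rules(ORDER, RULES)
print(annotations)
```

Two vocabularies whose rules point at the same concept become partially interoperable without either schema changing.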
Jhendler: interesting - the use of OWL for mapping XMLS's to each other has been proposed, and a couple of student level projects have shown the possibility
Jhendler: interesting to see similar thought in XML community
 
mdubinko: Jack Rugh, Retrieval Systems Corporation and Julia Lennen, Tax Management, Inc./BNA
mdubinko: Challenge: portfolios with over 500K citations
mdubinko: Citations include lots of details (case name, volume, reporter, page, point page, year)
mdubinko: more complications: same case can be recorded by multiple recorders, or more than one per printed page
mdubinko: high standard of accuracy: better to have a no-link than a link-to-wrong-case
mdubinko: SGML markup triggers (<cite.parallel ref="TCM\40\99">...) initiate processing
mdubinko: Normalization for punctuation, space, etc., then attempt exact match. In the case of multiple matches, uses fuzzy logic to decide
mdubinko: word frequency in case names tracked, log weighted
mdubinko: each matched word, by weight, contributes to an overall "match number", between 0 and 1
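A small sketch of the matching steps noted above, with assumptions stated plainly: the weighting formula (log of inverse frequency plus one), the 0.8 threshold, the database layout and the sample case names are all invented here; the talk's actual logic spans several slides and is not reproduced. The sketch normalizes the citation key, tries an exact lookup, and only when several cases share that key scores the case names to a "match number" between 0 and 1.

```python
import math
import re
from collections import Counter

def normalize(text):
    """Lowercase, turn punctuation into spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]+", " ", text.lower()).split())

def build_weights(case_names):
    """Assumed weighting: rarer words weigh more, log(N / frequency) + 1."""
    n = len(case_names)
    freq = Counter(w for name in case_names for w in set(normalize(name).split()))
    return {w: math.log(n / c) + 1.0 for w, c in freq.items()}

def match_number(cited_name, candidate, weights):
    """Share of the candidate's word weight found in the cited name, 0..1."""
    cited = set(normalize(cited_name).split())
    words = set(normalize(candidate).split())
    total = sum(weights.get(w, 1.0) for w in words)
    hit = sum(weights.get(w, 1.0) for w in words & cited)
    return hit / total if total else 0.0

def link(key, cited_name, db, weights, threshold=0.8):
    """Return the linked case name, or None: no link beats a wrong link."""
    candidates = db.get(normalize(key), [])
    if len(candidates) == 1:
        return candidates[0]
    scored = sorted(((match_number(cited_name, c, weights), c) for c in candidates),
                    reverse=True)
    if len(scored) >= 2 and scored[0][0] >= threshold and scored[0][0] > scored[1][0]:
        return scored[0][1]
    return None

# Invented data: two cases reported on the same page.
names = ["Smith v. Commissioner", "Jones v. Commissioner", "Smith v. United States"]
weights = build_weights(names)
db = {normalize("TCM\\40\\99"): names[:2]}

print(link("TCM\\40\\99", "Jones v. Commissioner, T.C. Memo", db, weights))
print(link("TCM\\40\\99", "Commissioner", db, weights))
```

Cases that fall through to None are the ones an editorial interface would hand to a human.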
mdubinko: logic diagram spans over 6 slides!
mdubinko: reasons for failed match: not in db, multiple hits, or typographical error in db
mdubinko: solution: editorial interface, allowing human intervention, repair, flagging
mdubinko: results: 98% linking, no false positives (I think that's what she said)
 
DanC_jam: Coolheads Consulting
evlist: "Michel Biezunski is the grand father of Topic Maps" -- Kal Ahmed
evlist: Purpose: enhance the productivity of the tax law assistance call centers & serve as a model to demonstrate the concept of a central entry point for technical information at IRS;
evlist: Currently at the 3rd step (all 95 taxpayer information publications, FAQ & tele-tax topics document types).
evlist: System supports both XML & SGML.
evlist: Topic Map created automatically from information gathered in the document.
evlist: Topics extracted from headers, keywords & tele-tax topics and separated into key topics (chosen by experts), form topics and other topics.
evlist: Key topics are available through alphabetic indexes, form topics are accessible through a specialized list & other topics are available through a search engine.
evlist: This reduces the number of topics to browse (the total number of topics is ~ 10,000).
DanC_jam: "merge on the basis of names" hmm... if those names are URIs, that coincides with RDF semantics.
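A toy illustration of that remark, assuming invented URIs and plain Python tuples in place of a real RDF library: when names are URIs, merging two graphs is just the union of their triples, and statements about the same URI end up describing one node.

```python
def merge(*graphs):
    """Merge graphs by set union; equal URIs denote the same node."""
    merged = set()
    for graph in graphs:
        merged |= set(graph)
    return merged

def describe(graph, subject):
    """All (predicate, object) pairs known about one subject."""
    return {(p, o) for s, p, o in graph if s == subject}

# Invented example data from two independent sources.
FORM = "http://example.org/topic/form-1040"
source_a = {(FORM, "http://example.org/title", "Form 1040")}
source_b = {(FORM, "http://example.org/relatedTo", "http://example.org/topic/filing")}

merged = merge(source_a, source_b)
print(describe(merged, FORM))
```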
DanC_jam: is the topicmap he's talking about handy in HTTP space? ala the NCI ontology in owl?
DanC_jam: "Aggregation" slide is cool; sounds familiar, w.r.t. doing RDF stuff in W3C
DanC_jam: "Methodology: Incremental process"
DanC_jam: "Impact on existing practices: minimal changes in workflow"
DanC_jam: "Discovery of new issues: global consistency vs. local consistency"
evlist: Tax Map Maintenance Workshops organized to maintain the consistency of information.
DanC_jam: "Design decision: not to interfere with pre-existing workflow."
DanC_jam: this jives really well with what I'm talking about at the 'practical RDF' town hall tonight: The Semantic Web and its applications at W3C
DanC_jam: see also call for discussion on W3C Semantic Web Best Practices WG charter
evlist: Conclusion: "It is possible to benefit from cooperative work while leaving as much freedom as possible to the creators of the information sources."
 
Rich Salz, XML Security Standards and Best Practices
kendallclark: I missed the first bits, but Rich offers a pretty thorough overview of the XML security landscape.
kendallclark: "Unlike TCP DoS, one XML req enough to DoS, if it's properly mangled..." -- A bit scary.
kendallclark: Suggests a 4x perf slowdown to validate.
kendallclark: PKI is 'ridiculously hard'.
kendallclark: Best practices: 1. Secure the transport layer; 2. Mask internal resources; 3. Implement XML filtering; 4. Protect against XDoS; 5. Schema-validate all messages;
kendallclark: 6. Be able to transform all messages; 7. Sign all outgoing messages; 8. Timestamp all messages; 9. Encrypt sensitive message fields; 10. Securely log all sensitive events.
kendallclark: Best practices are processing intensive.
 
evlist: Roger Sperberg sets the context (AnswerBooks, Q&A format, TOC generated automatically, index & "end" table in XML, ...)
evlist: Nikita Ogievetsky describing the principle (global external Topic Map common to all the books).
evlist: The TOCs are included in the Topic Map and books can be constructed through their TOC.
evlist: This system lets you create new books through drag & drop (but of course that wouldn't be possible for other types of books).
 
evlist: The problem with Topic Maps is that there are many options, never one right way, where do I start?
evlist: But library science came to the rescue: lots of studies about structures apply to TM.
evlist: The 3 ways to describe TM models: PSI, PSI Metadata, TMCL. A 4th way is missing (prescriptive for human consumption).
evlist: This 4th way is Topic Maps design patterns
evlist: TM design patterns (TMDP) are smaller than an ontology. May be built on top of themselves.
kendallclark: TMDPs are prescriptive & human readable.
evlist: Diagramming is an important part of pattern description.
kendallclark: (Ooh, he loses me w/ UML diagrams. :<)
kendallclark: (A rather crowded session, btw.)
kendallclark: Hierarchical Naming Pattern (motive: solve problem of naming hierarchical ordered topics)
kendallclark: Requires 2 sorts of names: a long & short name, that is, a context and contextless name.
kendallclark: Need to avoid: topic naming constraint
kendallclark: Design: create 2 names for each topic, save for the root; scope the short name by the parent topic (i.e., parent is the context for the short name)
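A sketch of the Hierarchical Naming Pattern as noted above, assuming a made-up Topic class and sample place names rather than any real Topic Maps API. Every topic keeps a long, contextless name; every non-root topic also gets a short name scoped by its parent, so two topics can share a short name without landing in the same scope.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Topic:
    long_name: str                     # contextless name, valid everywhere
    parent: Optional["Topic"] = None   # None for the root
    short_name: Optional[str] = None   # only meaningful in the parent's scope

    def names(self):
        """Return (name, scope) pairs; scope is the parent's long name or None."""
        result = [(self.long_name, None)]
        if self.parent is not None and self.short_name is not None:
            result.append((self.short_name, self.parent.long_name))
        return result

# Invented hierarchy with two topics whose short name is "Paris".
root = Topic("Places")
france = Topic("Places / France", root, "France")
texas = Topic("Places / Texas", root, "Texas")
paris_fr = Topic("Places / France / Paris", france, "Paris")
paris_tx = Topic("Places / Texas / Paris", texas, "Paris")

for topic in (root, paris_fr, paris_tx):
    print(topic.names())
```

The two "Paris" names carry different scopes, which is how the pattern sidesteps the topic naming constraint.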
evlist: Kal Ahmed showing nice UML representations of Topic Maps
evlist: Now showing a "topic per concept thesaurus pattern".
evlist: And compares this to a "topic per term" pattern.
evlist: Now describing the faceted classification pattern.
evlist: Kal Ahmed hopes TM navigators will implement some of these design patterns at some point.
evlist: Promote your TM patterns on http://www.topicmapcentral.com/wiki/Wiki.jsp !
 
Curing the Web's Identity Crisis: Subject Indicators for RDF
kendallclark: (I think the "Identity Crisis" here came from an article of mine. Which is nice.)
kendallclark: Basically, a topic maps meets RDF talk (well, so far, I was a few minutes late.)
kendallclark: (Drops Hendler and Lassila from SciAm article, alas.)
kendallclark: I wish I grokked Topic Maps better; I've been saying that for 4 years, -sigh-
kendallclark: Shows the layer cake, with the comment: "There's no Topic Maps in here because the W3C didn't invent them. You can't expect them to..." -- Uh, we can't? Actually, I think that's something we should expect.
kendallclark: Ah, Pepper gives me mad props! "Title of an important article by..." me. Heh.
danbri: Unicode's in the layercake...
kendallclark: URIs identified for 2 distinct purposes: identify info resources; identify things that info resources describe
kendallclark: (This has implications for the social meaning cluster, as well.)
kendallclark: (And this isn't being offered as a new conclusion; as Pepper points out, the APW, in 2.2.5, talks about this distinction.)
kendallclark: (I've been working on a paper with Bijan P. about using David Lewis's notion of convention (as a 2-party cooperative coordination game) to address some of these issues. Wanted to submit to Phil of CS conference in Italy, but may hold it for next year's ISWC...)
kendallclark: Ah, we finally come to the point: "This fundamental ontological fact (gag!) -- that interesting things in the world don't have addresses -- isn't addressed either by RDF or by the APW."
kendallclark: The come-to-jesus point: Published Subject Indicators. /me stifles-yawn
timbl: Of course, an RDF document may describe many resources, and resources may be described by many RDF documents. There isn't always a clear "subject" for a web page.
timbl: /me wonders what can't be named by an RDF document
larsbot: clear subject: true, which is why PSI documents usually have to be written specifically to be PSIs
kendallclark: FWIW, my 'yawn' above wasn't meant as criticism. As Lars points out, it's hard to do a lot of detail in this venue, in 45 minutes.
Jhendler: the topic maps people seem very confused about naming vs. referencing and all those things
Jhendler: Mondeca seems to have been getting it worked out a bit as they describe in their FAQ and are working on using RDF/OWL to provide the semantics for topic maps
MarkB: +1 to Jhendler; I mentioned to Sam Hunting that TopicMaps seem to be doing the equivalent of using java.lang.Object.equals() where "==" is required
 
eric: Russian doll because when you look at one of their documents, there are many different XML documents hidden here.
eric: This stuff is called Unified Delta.
eric: Robin La Fontaine giving a list of the benefits of expressing differences between XML documents in XML.
eric: Unified Delta provides "n XML files in one". The source data and the changes are integrated in a single document. All the versions of the document must have the same root element.
eric: A particular source document can be extracted with 60 lines of XSLT.
eric: Minor versions can also be easily deleted.
eric: All this is done with minimal duplication.
eric: Applications include content management, i18n, variant management, collaborative authoring, ...
eric: Robin La Fontaine giving markup examples.
eric: The first version of the archive is built out of the original by adding a "dxu:vset" attribute to the root element.
evlist: Modified text goes into "dxu:PCDATA" elements.
evlist: Added/removed elements identified through "dxu:vset" attributes.
evlist: Updated attributes are transformed into "dxu:attribute" elements.
evlist: That scales pretty well to mixed content.
evlist: The whole thing is quite elegant and readable.
evlist: The time to extract any version is the same and increases linearly with the size of the archive.
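A simplified sketch of extracting one version from such an archive in a single pass, which is why the cost is the same for every version and grows with the archive size. The assumptions are significant: the namespace URI is a placeholder, "dxu:vset" is treated as a space-separated list of version names, and "dxu:attribute" elements are not handled. The real syntax is documented at the DeltaXML site, and the talk's extractor is XSLT, not Python.

```python
import xml.etree.ElementTree as ET

DXU = "urn:example:dxu"            # placeholder namespace URI (assumption)
VSET = f"{{{DXU}}}vset"
PCDATA = f"{{{DXU}}}PCDATA"

# Invented archive holding two versions of one document.
ARCHIVE = (
    f'<doc xmlns:dxu="{DXU}" dxu:vset="v1 v2">'
    '<title><dxu:PCDATA dxu:vset="v1">Draft</dxu:PCDATA>'
    '<dxu:PCDATA dxu:vset="v2">Final</dxu:PCDATA></title>'
    '<para dxu:vset="v2">Added in v2.</para>'
    '<para>Unchanged.</para>'
    '</doc>'
)

def extract(element, version):
    """Copy `element` as it looked in `version`; None if it did not exist then."""
    vset = element.get(VSET)
    if vset is not None and version not in vset.split():
        return None
    out = ET.Element(element.tag,
                     {k: v for k, v in element.attrib.items() if k != VSET})
    out.text = element.text or ""
    last = None  # last child appended to `out`; later text goes into its tail

    def add_text(text):
        if last is None:
            out.text += text
        else:
            last.tail = (last.tail or "") + text

    for child in element:
        kept = extract(child, version)
        if kept is not None and child.tag == PCDATA:
            add_text(kept.text or "")   # unwrap the text wrapper
        elif kept is not None:
            kept.tail = ""
            out.append(kept)
            last = kept
        add_text(child.tail or "")      # text after the child belongs to the parent
    return out

root = ET.fromstring(ARCHIVE)
v1 = ET.tostring(extract(root, "v1"), encoding="unicode")
v2 = ET.tostring(extract(root, "v2"), encoding="unicode")
print(v1)
print(v2)
```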
evlist: Can give you an interlinear presentation of the same document in different languages.
evlist: Also lets you produce the list of changes to be translated.
evlist: Nice concept of variant management applied to SVG documents.
evlist: See http://www.deltaxml.com/unified/
evlist: No specific provision for metadata, but any metadata in the document is "versionized".
 
danja: The HUGE list
danja: still going strong
 
XML Query and RDF Query and life, the universe, and...
DanC_jam: attendance list in paper
logger: See discussion
DanC_jam: BOF notice
danbri: See also some notes I took on PaulC's intro. Moving to IRC note-taking now.
DanC_jam: speaking of PaulC's exhortation to study xquery, my comments that I want to double-check: 12Apr on xquery constructors not being functional
DanC_jam: ... and 14 Mar 2003 on fn:escape-uri
DanC_jam: hmm... took a look at XQuery use cases. the TOC looks like a lot of greek letters, but if you read you get some stories.
DanC_jam: the yin/yang paper was nominated by Robie
DanC_jam: follow-up discussion is vaguely directed at www-rdf-rules
 
Created by the Daily Chump bot. Hosted by PlanetRDF.