Semantic Web Interest Group IRC Scratchpad

last updated at 2003-12-11 22:00

Semantic Web Servers - Engineering the Semantic Web

posted by DanC_jam at 2003-12-11 22:00 (+) tags:

DanC_jam: "Semantic Web Protocol Use Cases" nifty!
DanC_jam: people laugh at Moore's suggestion of semantic web servers in fridges talking to stoves... I suggest that's like people 10 or 20 years ago laughing at star trek communicators, i.e. cellphones.
DanC_jam: hmm... getStatements is a typical RPC call with a few scalar args; WSDL 2.0 should be able to map this to GET. the argument about the result type should go in the Accept: header
DanC_jam: hmm.. updateStatements... how about a where clause? how much of SQL can we do?
DanC_jam: hm... options()... seems to contradict the goal of interoperability, i.e. going away from "proprietary" protocols
DanC_jam: "RDF Data Model in XML Schema". hmm... interesting.
DanC_jam: soap binding doesn't look straightforward to me at all. cf whenToUseGet-7
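
A minimal sketch of the GET mapping DanC_jam suggests for getStatements: the scalar arguments go in the query string and the desired result type goes in the Accept: header. The endpoint URL, the parameter names (subject, predicate, object) and the helper name build_get_statements are assumptions made for illustration; none of them come from the use-cases document. The request is only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint; not taken from the use-cases document.
BASE = "http://example.org/store/statements"


def build_get_statements(subject=None, predicate=None, obj=None,
                         accept="application/rdf+xml"):
    """Build (but do not send) a GET request for matching statements."""
    # Parameter names are illustrative assumptions.
    args = {"subject": subject, "predicate": predicate, "object": obj}
    # Only the arguments that were actually supplied end up in the query string.
    query = urlencode({k: v for k, v in args.items() if v is not None})
    url = BASE + ("?" + query if query else "")
    # The result type is negotiated through the Accept header, not an argument.
    return Request(url, headers={"Accept": accept}, method="GET")


req = build_get_statements(subject="http://example.org/fridge#temp")
print(req.get_method(), req.full_url)
print(req.get_header("Accept"))
```

Keeping the result type out of the argument list means the same URL can serve several serializations, which is the point of the Accept: remark above.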

Towards Semantic Interoperability of XML Vocabularies - Jacek R. Ambroziak

posted by evlist at 2003-12-11 21:46 (+) tags:

evlist: This work is inspired by UDEF.
evlist: UDEF bases its mapping on exact synonymy, which is too narrow to really work.
evlist: Three levels: data, information, knowledge.
evlist: XML is an enabler to work at the information layer.
evlist: Goals: exploit partial interoperability through declarative annotation.
evlist: Conceptual Indexing (work done at Sun labs) is another inspiration.
evlist: Concept can be linked by relations (is_a, instance_of, part_of, ...) instead of just synonymy.
evlist: UDEF relies on the US 2002 NAICS industry ontology.
evlist: This ontology lacks subsumption transitivity.
evlist: In Ambroziak's proposal, a set of rules is defined for each vocabulary. These sets of rules use a common ontology and processors are derived from these rules.
evlist: A rule is a triple "(concept, match, extract)".
evlist: Concept is a pointer into the ontology, match is an XPath expression and extract is an XSLT snippet.
evlist: Ambroziak is not sure if XPath is enough for that work.
evlist: The generation of XSLT out of these rules isn't implemented yet, but a conceptual model of this mapping has been implemented.
Jhendler: interesting - the use of OWL for mapping XMLS's to each other has been proposed, and a couple of student level projects have shown the possibility
Jhendler: interesting to see similar thought in the XML community
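
A rough sketch of the "(concept, match, extract)" rule triple from evlist's notes, assuming two made-up vocabularies that share one ontology concept. Since the talk says the XSLT generation is not implemented yet, the extract part is represented here by a plain Python callable standing in for the XSLT snippet, and match uses only the XPath subset that ElementTree supports. The concept ID onto:PostalCode and the element names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Callable, NamedTuple


class Rule(NamedTuple):
    concept: str                          # pointer into the shared ontology (hypothetical ID)
    match: str                            # XPath expression (ElementTree subset)
    extract: Callable[[ET.Element], str]  # stand-in for the XSLT snippet


def apply_rules(doc, rules):
    """Return (concept, extracted value) pairs for every node a rule matches."""
    found = []
    for rule in rules:
        for node in doc.findall(rule.match):
            found.append((rule.concept, rule.extract(node)))
    return found


def text_of(elem):
    return (elem.text or "").strip()


# One rule set per vocabulary, both pointing at the same concept.
invoice_rules = [Rule("onto:PostalCode", ".//shipTo/zip", text_of)]
order_rules = [Rule("onto:PostalCode",
                    ".//address[@type='delivery']/postcode", text_of)]

invoice = ET.fromstring(
    "<invoice><shipTo><zip> 02139 </zip></shipTo></invoice>")
order = ET.fromstring(
    "<order><address type='delivery'><postcode>02139</postcode></address></order>")

print(apply_rules(invoice, invoice_rules))
print(apply_rules(order, order_rules))
```

Both documents yield the same (concept, value) pair even though their markup differs, which is the partial interoperability the talk is after.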

Using Fuzzy Logic to Create Links: Resolving References to Cited Court Cases

posted by mdubinko at 2003-12-11 20:54 (+) tags:

mdubinko: Jack Rugh, Retrieval Systems Corporation and Julia Lennen, Tax Management, Inc./BNA
mdubinko: Challenge: portfolios with over 500K citations
mdubinko: Citations include lots of details (case name, volume, reporter, page, point page, year)
mdubinko: more complications: same case can be recorded by multiple recorders, or more than one per printed page
mdubinko: high standard of accuracy: better to have a no-link than a link-to-wrong-case
mdubinko: SGML markup triggers (<cite.parallel ref="TCM\40\99">...) initiate processing
mdubinko: Normalization for punctuation, space, etc., then attempt exact match. In the case of multiple matches, uses fuzzy logic to decide
mdubinko: word frequency in case names tracked, log weighted
mdubinko: each matched word, by weight, contributes to an overall "match number", between 0 and 1
mdubinko: logic diagram spans over 6 slides!
mdubinko: reasons for failed match: not in db, multiple hits, or typographical error in db
mdubinko: solution: editorial interface, allowing human intervention, repair, flagging
mdubinko: results: 98% linking, no false positives (I think that's what she said)
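
A small sketch of the scoring idea in mdubinko's notes: normalize the case name, weight each word by a logarithm of its rarity, and let matched words contribute to a "match number" between 0 and 1. The exact weighting formula, the threshold and the function names are assumptions; the talk's real logic spans six slides and is not reproduced here. This is a weighted word overlap, not a full fuzzy-logic system.

```python
import math
import re
from collections import Counter


def normalize(name):
    """Lower-case, replace punctuation with spaces, split into words."""
    return re.sub(r"[^a-z0-9 ]+", " ", name.lower()).split()


def word_weights(case_names):
    """Log-weighted rarity: words found in fewer case names weigh more."""
    counts = Counter(w for n in case_names for w in set(normalize(n)))
    total = len(case_names)
    return {w: math.log(1 + total / c) for w, c in counts.items()}


def match_number(cited, candidate, weights, default=1.0):
    """Weighted share of the cited name's words found in the candidate (0..1)."""
    cited_words = set(normalize(cited))
    cand_words = set(normalize(candidate))
    denom = sum(weights.get(w, default) for w in cited_words)
    if denom == 0:
        return 0.0
    hit = sum(weights.get(w, default) for w in cited_words & cand_words)
    return hit / denom


def link(cited, case_names, weights, threshold=0.8):
    """Return the best candidate, or None: no link beats a wrong link."""
    best = max(case_names, key=lambda c: match_number(cited, c, weights))
    return best if match_number(cited, best, weights) >= threshold else None


db = ["Smith v. Commissioner", "Jones v. Commissioner", "Smith v. United States"]
w = word_weights(db)
print(link("Smith v Commissioner", db, w))
print(link("Brown v. Board", db, w))
```

The threshold encodes the accuracy rule from the talk: a citation that scores too low is left unlinked for the editorial interface rather than linked to a guess.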

Semantic Integration at the IRS: The Tax Map - Michel Biezunski

posted by evlist at 2003-12-11 20:53 (+) tags:

DanC_jam: Coolheads Consulting
evlist: "Michel Biezunski is the grand father of Topic Maps" -- Kal Ahmed
evlist: Purpose: enhance the productivity of the tax law assistance call centers & serve as a model to demonstrate the concept of a central entry point for technical information at IRS.
evlist: Currently at the 3rd step (all 95 taxpayer information publications, FAQ & tele-tax topics document types).
evlist: System supports both XML & SGML.
evlist: Topic Map created automatically from information gathered in the document.
evlist: Topics extracted from headers, keywords & tele-tax topics and separated into key topics (chosen by experts), form topics and other topics.
evlist: Key topics are available through alphabetic indexes, form topics are accessible through a specialized list & other topics are available through a search engine.
evlist: This reduces the number of topics to browse (the total number of topics is ~ 10,000).
DanC_jam: "merge on the basis of names" hmm... if those names are URIs, that coincides with RDF semantics.
DanC_jam: is the topicmap he's talking about handy in HTTP space? ala the NCI ontology in owl?
DanC_jam: "Aggregation" slide is cool; sounds familiar, w.r.t. doing RDF stuff in W3C
DanC_jam: "Methodology: Incremental process"
DanC_jam: "Impact on existing practices: minimal changes in workflow"
DanC_jam: "Discovery of new issues: global consistency vs. local consistency"
evlist: Tax Map Maintenance Workshops organized to maintain the consistency of information.
DanC_jam: "Design decision: not to interfere with pre-existing workflow."
DanC_jam: this jives really well with what I'm talking about at the 'practical RDF' town hall tonight: The Semantic Web and its applications at W3C
DanC_jam: see also call for discussion on W3C Semantic Web Best Practices WG charter
evlist: Conclusion: "It is possible to benefit from cooperative work while leaving as much freedom as possible to the creators of the information sources."

Rich Salz, XML Security Standards and Best Practices

posted by kendallclark at 2003-12-11 19:55 (+) tags:

kendallclark: I missed the first bits, but Rich offers a pretty thorough overview of the XML security landscape.
kendallclark: "Unlike TCP DoS, one XML req enough to DoS, if it's properly mangled..." -- A bit scary.
kendallclark: Suggests a 4x perf slowdown to validate.
kendallclark: PKI is 'ridiculously hard'.
kendallclark: Best practices: 1. Secure the transport layer; 2. Mask internal resources; 3. Implement XML filtering; 4. Protect against XDoS; 5. Schema-validate all messages;
kendallclark: 6. Be able to transform all messages; 7. Sign all outgoing messages; 8. Timestamp all messages; 9. Encrypt sensitive message fields; 10. Securely log all sensitive events.
kendallclark: Best practices are processing intensive.

Book Builders: Content Repurposing with Topic Maps - Nikita Ogievetsky & Roger Sperberg

posted by evlist at 2003-12-11 19:41 (+) tags:

evlist: Roger Sperberg sets the context (AnswerBooks, Q&A format, TOC generated automatically, index & "end" table in XML, ...)
evlist: Nikita Ogievetsky describing the principle (global external Topic Map common to all the books).
evlist: The TOC are included in the Topic Map and books can be constructed through their TOC.
evlist: This system lets you create new books through drag & drop (but of course that wouldn't be possible for other types of books).

Topic Map Design Patterns For Information Architecture - Kal Ahmed

posted by evlist at 2003-12-11 18:59 (+) tags:

evlist: The problem with Topic Maps is that there are many options and never one right way: where do I start?
evlist: But library science came to the rescue: lots of studies about structures apply to TM.
evlist: The 3 ways to describe TM models: PSI, PSI Metadata, TMCL. A 4th way is missing (prescriptive for human consumption).
evlist: This 4th way is Topic Maps design patterns
evlist: TM design patterns (TMDP) are smaller than an ontology. May be built on top of themselves.
kendallclark: TMDPs are prescriptive & human readable.
evlist: Diagramming is an important part of pattern description.
kendallclark: (Ooh, he loses me w/ UML diagrams. :<)
kendallclark: (A rather crowded session, btw.)
kendallclark: Hierarchical Naming Pattern (motive: solve problem of naming hierarchical ordered topics)
kendallclark: Requires 2 sorts of names: a long & short name, that is, a context and contextless name.
kendallclark: Need to avoid: topic naming constraint
kendallclark: Design: create 2 names for each topic, save for the root; scope the short name by the parent topic (i.e., parent is the context for the short name)
evlist: Kal Ahmed showing nice UML representations of Topic Maps
evlist: Now showing a "topic per concept thesaurus pattern".
evlist: And compares this to a "topic per term" pattern.
evlist: Now describing the faceted classification pattern.
evlist: Kal Ahmed hopes TM navigators will implement some of these design patterns at some point.
evlist: Promote your TM patterns on http://www.topicmapcentral.com/wiki/Wiki.jsp !
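
A toy model of the Hierarchical Naming Pattern from kendallclark's notes: every topic except the root gets two names, a long contextless name and a short name scoped by its parent topic. The Topic class, its field names and the example topics are hypothetical; this is plain Python, not a Topic Maps API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Topic:
    long_name: str                    # contextless name, valid anywhere
    parent: Optional["Topic"] = None  # None for the root topic
    short_name: Optional[str] = None  # only meaningful in the parent's scope

    def names(self):
        """Return (name, scope) pairs; scope None means unconstrained."""
        result = [(self.long_name, None)]
        if self.parent is not None and self.short_name is not None:
            # The parent topic is the scope (context) of the short name.
            result.append((self.short_name, self.parent.long_name))
        return result


def display_name(topic, context=None):
    """Use the short name when browsing under the parent, else the long name."""
    if context is not None and topic.parent is context and topic.short_name:
        return topic.short_name
    return topic.long_name


root = Topic("Animals")
mammals = Topic("Animals / Mammals", parent=root, short_name="Mammals")

print(mammals.names())
print(display_name(mammals, context=root))
print(display_name(mammals))
```

Scoping the short name keeps two different "Mammals"-style short names under different parents from being treated as the same name, which is the topic naming constraint problem the pattern is meant to avoid.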

Curing the Web's Identity Crisis: Subject Indicators for RDF

posted by kendallclark at 2003-12-11 16:13 (+) tags:

kendallclark: (I think the "Identity Crisis" here came from an article of mine. Which is nice.)
kendallclark: Basically, a topic maps meets RDF talk (well, so far, I was a few minutes late.)
kendallclark: (Drops Hendler and Lassila from SciAm article, alas.)
kendallclark: I wish I grokked Topic Maps better; I've been saying that for 4 years, -sigh-
kendallclark: Shows the layer cake, with the comment: "There's no Topic Maps in here because the W3C didn't invent them. You can't expect them to..." -- Uh, we can't? Actually, I think that's something we should expect.
kendallclark: Ah, Pepper gives me mad props! "Title of an important article by..." me. Heh.
danbri: Unicode's in the layercake...
kendallclark: URIs identified for 2 distinct purposes: identify info resources; identify things that info resources describe
kendallclark: (This has implications for the social meaning cluster, as well.)
kendallclark: (And this isn't being offered as a new conclusion; as Pepper points out, the APW, in 2.2.5, talks about this distinction.)
kendallclark: (I've been working on a paper with Bijan P. about using David Lewis's notion of convention (as a 2-party cooperative coordination game) to address some of these issues. Wanted to submit to Phil of CS conference in Italy, but may hold it for next year's ISWC...)
kendallclark: Ah, we finally come to the point: "This fundamental ontological fact (gag!) -- that interesting things in the world don't have addresses -- isn't addressed either by RDF or by the APW."
kendallclark: The come-to-jesus point: Published Subject Indicators. /me stifles-yawn
timbl: Of course, an RDF document may describe many resources, and resources may be described by many RDF documents. There isn't always a clear "subject" for a web page.
timbl: /me wonders what can't be named by an RDF document
larsbot: clear subject: true, which is why PSI documents usually have to be written specifically to be PSIs
kendallclark: FWIW, my 'yawn' above wasn't meant as criticism. As Lars points out, it's hard to do a lot of detail in this venue, in 45 minutes.
Jhendler: the topic maps people seem very confused about naming vs. referencing and all those things
Jhendler: Mondeca seems to have been getting it worked out a bit as they describe in their FAQ and are working on using RDF/OWL to provide the semantics for topic maps
MarkB: +1 to Jhendler; I mentioned to Sam Hunting that TopicMaps seem to be doing the equivalent of using java.lang.Object.equals() where "==" is required

Russian Dolls and XML: Designing Multi-Version XML Documents - Robin La Fontaine & Thomas Nichols

posted by eric at 2003-12-11 15:49 (+) tags:

eric: Russian doll because when you look at one of their documents, there are many different XML documents hidden here.
eric: This stuff is called Unified Delta.
eric: Robin La Fontaine giving a list of the benefits of expressing differences between XML documents in XML.
eric: Unified Delta provides "n XML files in one". The source data and the changes are integrated in a single document. All the versions of the document must have the same root element.
eric: A particular source document can be extracted with 60 lines of XSLT.
eric: Minor versions can also be easily deleted.
eric: All this is done with minimal duplication.
eric: Applications include content management, i18n, variant management, collaborative authoring, ...
eric: Robin La Fontaine giving markup examples.
eric: The first version of the archive is built out of the original by adding a "dxu:vset" attribute to the root element.
evlist: Modified text goes into "dxu:PCDATA" elements.
evlist: Added/removed elements identified through "dxu:vset" attributes.
evlist: Updated attributes are transformed into "dxu:attribute" elements.
evlist: That scales pretty well to mixed content.
evlist: The whole thing is quite elegant and readable.
evlist: The time to extract any version is the same and increases linearly with the size of the archive.
evlist: Can give you an interlinear presentation of the same document in different languages.
evlist: Also lets you produce the list of changes to be translated.
evlist: Nice concept of variant management applied to SVG documents.
evlist: See http://www.deltaxml.com/unified/
evlist: No specific provision for metadata, but any metadata in the document is "versionized".
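
A heavily simplified sketch of the "n XML files in one" idea, assuming each element may carry a version-set attribute listing the versions it belongs to and that an element without one belongs to all versions. The un-namespaced attribute name vset, the space-separated value format and the function names are assumptions for illustration; this is not the actual Unified Delta format, and it ignores the "dxu:PCDATA" and "dxu:attribute" elements mentioned above.

```python
import copy
import xml.etree.ElementTree as ET

VSET = "vset"  # hypothetical, un-namespaced stand-in for "dxu:vset"


def extract_version(root, version):
    """Return a copy of the archive holding only what belongs to `version`."""
    out = copy.deepcopy(root)  # leave the archive itself untouched
    _prune(out, version)
    out.attrib.pop(VSET, None)
    return out


def _prune(elem, version):
    # One pass over the tree, so the cost grows linearly with archive size.
    for child in list(elem):
        vset = child.attrib.pop(VSET, None)
        if vset is not None and version not in vset.split():
            # Note: this also drops the child's tail text, so the sketch
            # is only safe for element-only content.
            elem.remove(child)
        else:
            _prune(child, version)


archive = ET.fromstring(
    '<doc vset="v1 v2 v3">'
    '<title>Report</title>'
    '<para vset="v1 v2">Old intro</para>'
    '<para vset="v3">New intro</para>'
    '<para>Shared text</para>'
    '</doc>')

for v in ("v1", "v3"):
    print(v, [p.text for p in extract_version(archive, v).findall("para")])
```

Shared content is stored once and only the differing elements are tagged, which is where the minimal duplication in eric's notes comes from.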

Dave Beckett's Resource Description Framework (RDF) Resource Guide

posted by danja at 2003-12-11 13:47 (+) tags:

danja: The HUGE list
danja: still going strong

XML Query and RDF Query and life, the universe, and...

posted by DanC_jam at 2003-12-11 00:38 (+) tags:

DanC_jam: attendance list in paper
logger: See discussion
DanC_jam: BOF notice
danbri: See also some notes I took on PaulC's intro. Moving to IRC note-taking now.
DanC_jam: speaking of PaulC's exhortation to study xquery, my comments that I want to double-check: 12Apr on xquery constructors not being functional
DanC_jam: ... and 14 Mar 2003 on fn:escape-uri
DanC_jam: hmm... took a look at XQuery use cases. the TOC looks like a lot of greek letters, but if you read you get some stories.
DanC_jam: the yin/yang paper was nominated by Robie
DanC_jam: follow-up discussion is vaguely directed at www-rdf-rules
