This is topic The Semantic Web in forum Books, Films, Food and Culture at Hatrack River Forum.


To visit this topic, use this URL:
http://www.hatrack.com/ubb/main/ultimatebb.php?ubb=get_topic;f=2;t=022587

Posted by fugu13 (Member # 2859) on :
 
Much of this was originally posted in Jon Boy's completion thread; I thought I'd start a topic on it.

The Semantic Web is an effort (by one of the original creators of the Web) to encode meaning (metadata, if you will) about resources (on the web and elsewhere) in a program-'understandable' format. Or rather, to get people to include such encoded meaning in their web pages, much as they include titles and such in it today.

The Semantic Web is largely based upon the Resource Description Framework (RDF) concept, which encodes meaning in 'triples' having a subject, a predicate, and an object. The subject and predicate are both resource specifiers, while the object is either a resource specifier or raw data (if one had one's name stored somewhere, one could use either the name itself or the resource specifier referring to where it is stored when using it as an object, for instance).

You may or may not be familiar with another technology being used in the Semantic Web (it's quite popular among librarians, for instance), the Dublin Core Metadata specification, which can be expressed in RDF.

The importance of the Semantic Web is that it makes it possible for programs to decipher meaning in resources such as web pages, documents, maps, et cetera. For instance, if the relevant documents are marked up using Semantic Web technologies, one could search for the restaurants mentioned in a particular book which are in a certain city and serve fettuccine alfredo. Not only would such queries be possible, they would be easy.
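To make the triple idea concrete, here's a toy sketch in Python (not real RDF tooling -- every identifier below is invented) of how such a query breaks down into simple triple lookups:

```python
# Toy illustration, not real RDF tooling: triples are just
# (subject, predicate, object) tuples, and every identifier is invented.
triples = [
    ("book:TravelGuide", "mentions", "rest:LuigisPlace"),
    ("book:TravelGuide", "mentions", "rest:CasaBella"),
    ("rest:LuigisPlace", "locatedIn", "city:Boston"),
    ("rest:LuigisPlace", "serves", "dish:FettuccineAlfredo"),
    ("rest:CasaBella", "locatedIn", "city:Austin"),
]

def objects(subject, predicate):
    """All objects of triples with the given subject and predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}

# "Restaurants mentioned in the book, located in Boston,
#  that serve fettuccine alfredo"
matches = [
    r for r in sorted(objects("book:TravelGuide", "mentions"))
    if "city:Boston" in objects(r, "locatedIn")
    and "dish:FettuccineAlfredo" in objects(r, "serves")
]
print(matches)  # ['rest:LuigisPlace']
```

Once the facts are stored as triples, the query is just a chain of lookups -- no parsing of human prose required.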

Here's a 'simple' document using several RDF/Semantic Web technologies to describe a bit about me and my relation with hatrack/jatraqueros.

<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:foaf="http://xmlns.com/foaf/0.1/">

<foaf:Person>
<foaf:name>Russell Duhon</foaf:name>
<foaf:title>Mr</foaf:title>
<foaf:firstName>Russell</foaf:firstName>
<foaf:surname>Duhon</foaf:surname>
<foaf:nick>fugu13</foaf:nick>
<foaf:mbox_sha1sum>b5af28278eea1df56e008f649438701eb05318cb</foaf:mbox_sha1sum>
<foaf:homepage rdf:resource="http://fugu13.com"/>
<foaf:depiction rdf:resource="http://homepage.mac.com/fugu13/me.jpg"/>
<foaf:schoolHomepage rdf:resource="http://www.wustl.edu"/>
<foaf:aimChatID>fugu13</foaf:aimChatID>

<foaf:knows>
<foaf:Person>
<foaf:name>Jamie Taylor</foaf:name>
<foaf:mbox_sha1sum>04355f35cb7e897146fd0e005a992f4e3cc75610</foaf:mbox_sha1sum>
<foaf:aimChatID>mackillian</foaf:aimChatID>
</foaf:Person>
</foaf:knows>

<foaf:holdsAccount>
<foaf:OnlineAccount rdf:about="http://www.hatrack.com/ubb/cgi/ultimatebb.cgi?ubb=get_profile;u=00002859">
<foaf:accountServiceHomepage rdf:resource="http://www.hatrack.com/ubb/forum/ultimatebb.php?ubb=forum;f=2"/>
<foaf:accountName>fugu13</foaf:accountName>
</foaf:OnlineAccount>
</foaf:holdsAccount>

</foaf:Person>

<foaf:Group>
<foaf:name>Jatraqueros</foaf:name>
<foaf:homepage rdf:resource="http://www.hatrack.com/ubb/forum/ultimatebb.php?ubb=forum"/>

<foaf:member>
<foaf:Person>
<dc:identifier>fugu13</dc:identifier>
<foaf:name>Russell Duhon</foaf:name>
<foaf:mbox_sha1sum>b5af28278eea1df56e008f649438701eb05318cb</foaf:mbox_sha1sum>
</foaf:Person>
</foaf:member>

<foaf:member>
<foaf:Person>
<dc:identifier>mackillian</dc:identifier>
<foaf:name>Jamie Taylor</foaf:name>
<foaf:mbox_sha1sum>04355f35cb7e897146fd0e005a992f4e3cc75610</foaf:mbox_sha1sum>
</foaf:Person>
</foaf:member>

</foaf:Group>

</rdf:RDF>

A minor explanation -- the SHA sums are used instead of emails. It's possible to compute a SHA sum from an email, but not vice versa, and each email has exactly one SHA sum. This means they're still useful as a globally unique identifier, can be compared against emails (make a SHA of the email and compare the SHAs), and don't reveal people's emails.
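For the curious, a foaf:mbox_sha1sum is just the SHA-1 digest of the full mailto: URI, which is easy to reproduce with Python's standard library (the address below is a made-up example, not anyone's real email):

```python
import hashlib

# foaf:mbox_sha1sum is the SHA-1 hex digest of the full mailto: URI.
# The address here is a made-up example, not anyone's real email.
def mbox_sha1sum(email):
    return hashlib.sha1(("mailto:" + email).encode("utf-8")).hexdigest()

digest = mbox_sha1sum("someone@example.org")
print(digest)  # a 40-character hex string
# Two FOAF files refer to the same mailbox exactly when their digests
# match -- the sums are comparable, but the email itself stays hidden.
```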
Remember, this entire thing breaks down into 'triples' of subject, predicate, and object. I'd post the triple breakdown, but the W3C's RDF parser is currently down.

However, you can think of some of the triples as being like this:

personresource has name Russell Duhon
(in this triple, personresource is the subject, name is the predicate, and Russell Duhon is the object)

personresource1 knows personresource2

personresource2 has name Jamie Taylor

See how these three triples of data make it possible to relate me to Jamie? More importantly, they're program-understandable pieces of data. Using known-unique identifiers for people (email, AIM ID, that sort of thing), an RDF program can stitch together the relationships of people into huge maps of data.
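A minimal sketch of that stitching, in Python with invented, shortened identifiers: two separately published files describe people, and the shared mbox_sha1sum value lets a program discover they describe the same person.

```python
# Toy sketch with invented, shortened identifiers: two separately
# published files describe people, and the shared mbox_sha1sum value
# lets a program discover that they describe the same person.
file_a = [
    ("a:p1", "mbox_sha1sum", "b5af28"),
    ("a:p1", "name", "Russell Duhon"),
    ("a:p1", "knows", "a:p2"),
    ("a:p2", "mbox_sha1sum", "04355f"),
]
file_b = [
    ("b:px", "mbox_sha1sum", "04355f"),
    ("b:px", "name", "Jamie Taylor"),
]
triples = file_a + file_b

def prop(subject, predicate):
    return next((o for s, p, o in triples
                 if s == subject and p == predicate), None)

# Who does a:p1 know, by name? Follow "knows", then join on the SHA sum.
known = prop("a:p1", "knows")          # 'a:p2' -- file_a gives no name
sha = prop(known, "mbox_sha1sum")
same_person = [s for s, p, o in triples
               if p == "mbox_sha1sum" and o == sha]
names = {prop(s, "name") for s in same_person} - {None}
print(names)  # {'Jamie Taylor'}
```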

The applications of RDF I show here are mainly people oriented, but RDF is a generalized technology -- it can hold semantic data about anything. As new RDF technologies are developed, more and more relationships can be formalized, and most importantly, translated and connected to each other.

For instance, there could be an RDF technology for recipes, and one for units. They would have triples like:
sixteen ounces is the same as one pound.
reciperesource1 is named Chocolate Chunk cookies
reciperesource1 has ingredient ingredient1
ingredient1 measures 8 ounces
ingredient1 type is Chocolate Chunks

Then imagine your grocery store uses the cooking ingredients and monetary RDF applications to describe their offerings. They might have triples like:

Nestles Chocolate Chunks type is Chocolate Chunks
Nestles Chocolate Chunks measures 1/4 pound.

Say you wanted to make chocolate chunk cookies, and you knew you didn't have any chocolate chunks, so you told your computer to order enough chocolate chunks to make chocolate chunk cookies. The computer would reason: the Chocolate Chunk cookies ingredient with type chocolate chunks requires 8 ounces. The grocery store ingredient with type chocolate chunks measures 1/4 pound. 1/4 pound is 4 ounces. 8 ounces is 2 * 1/4 pound.

And so on. Using your credit card data from your private data store, and the price (which would also be associated with Nestles Chocolate Chunks), it would order the Nestles Chocolate Chunks (2 packages) so that you would have enough for chocolate chunk cookies.
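The arithmetic the agent does here is nothing exotic; sketched in Python, with the quantities taken from the hypothetical triples above:

```python
import math
from fractions import Fraction

# Sketch of the unit arithmetic, with quantities from the hypothetical
# triples above: the recipe needs 8 ounces, the store sells 1/4-pound
# packages, and one unit triple says sixteen ounces is one pound.
OUNCES_PER_POUND = 16

needed_oz = 8
package_oz = Fraction(1, 4) * OUNCES_PER_POUND  # 1/4 pound = 4 ounces

packages = math.ceil(needed_oz / package_oz)    # round up to whole packages
print(packages)  # 2
```

The hard part isn't the math, of course -- it's that RDF lets the recipe, the unit conversions, and the store's catalog all be read by the same program.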

Here's an idea of the basics of the sorts of stuff that can already be done with Semantic Information:

http://beta.plink.org/profile.php?id=910a4f5ef7f0e062ae1fcab1ca981d14ef40563e

It's important to remember that this was not all in one file or anything; this information was compiled by tying together multiple people's files using common identifiers. You can browse all sorts of information on the people by using the links.

Furthermore, this information is site agnostic. Any site can import such RDF files and relate the data. So a business could have FOAF (friend of a friend, the particular RDF technology used on plink) files for all its employees, and dynamically create organizational trees and other organizational data using different criteria.
 
Posted by saxon75 (Member # 4589) on :
 
This is interesting stuff. Thanks fugu!

For some reason completely unfathomable to me, I'm reminded of the Geek Code. I am unfortunately not able to remember my Geek Code off the top of my head.

Edit: Ahh... here we go:

------BEGIN GEEK CODE BLOCK------
Ver: 3.1
GE d+(-) s+:+>+: a-- C++(+++) UL P L++ !E---
W++(+++)>$ N o? K- w+(++) !O M V- PS+() PE+
Y+ PGP- t++ 5+@ X- R(++) tv+> b+>+++ DI++ D+
G e++>++++ h--- r+++ z+++(--)
------END GEEK CODE BLOCK------

[ March 18, 2004, 11:54 PM: Message edited by: saxon75 ]
 
Posted by fugu13 (Member # 2859) on :
 
Not so oddly, there's a FOAF element for a person's geek code [Smile]
 
Posted by saxon75 (Member # 4589) on :
 
A question: Is this substantively different from XML, or is it more of an implementation? If it is different, what are the major differences?
 
Posted by xnera (Member # 187) on :
 
I'm familiar with FOAF, because LiveJournal just implemented FOAF support. Yes, I am a LiveJournal geek. [Smile]
 
Posted by fugu13 (Member # 2859) on :
 
XML is a syntax specification. RDF is a logical language for making statements. XML shows the hierarchical organization of data. RDF expresses the meaning of data.

It is possible (and common) to express RDF using XML (as I did in the document above), creating a format called RDF/XML.

This makes particular sense because XML is both easily parsed and easily embedded in other xml documents, of which there are many. While there are other ways of expressing RDF which are as easy (or easier) to parse, they are not so easily embedded in XML, particularly on the same "level of parsing", so to speak.

[ March 19, 2004, 12:15 AM: Message edited by: fugu13 ]
 
Posted by saxon75 (Member # 4589) on :
 
I admit to being kind of a bonehead when it comes to XML. The main thing I have trouble wrapping my head around is: how is this useful? I find it terribly interesting, but I just can't figure it out.

For example, I used to use XML to store data for several of the pages at my website (this was before I got the domain name). For example, I stored the "Weekly" Wisdom in a database of sorts using XML, then used JavaScript to parse the XML and present the data in an attractive format. But now that I have other tools at my disposal, I find it much more convenient to use a MySQL database to store my website's content.

I get that XML presents a standard way of encoding information, but I just don't get what that enables.
 
Posted by fugu13 (Member # 2859) on :
 
First, XML.

XML is not a data store, though it can be useful as such for XML documents. XML is a markup language. Specifically, it is an (the) eXtensible Markup Language.

It is useful for marking up documents and data, extensibly (using technologies such as namespaces). For instance, I might enclose a title in a pair of tags like so: <title>My title</title>, and later on in the document I might enclose a quote in tags like so: <quote>my quote</quote>.

This really doesn't work very well with the MySQL data model, particularly if you consider that all these tags are part of a larger document that's mostly text. (Also, XML does hierarchical data structures, particularly ones containing arbitrary additional data, very well. MySQL sucks at those.)

Now, these tag names are arbitrary, meaning only what I use them to mean (that is, if I treat any tag called title like I want a title to be treated, it "is" a title, just as much as a tag named boobies would "be" a title if I treated it as a title in my programming).

However, it is also possible to specify tags which have generally expected meanings. For instance, there's the Dublin Core Metadata specification. We might assign the specification to a namespace called 'dc'. So then we could refer to 'title as it's meant in the sense the Dublin Core Metadata specification means it' by saying <dc:title>My Title</dc:title>. Now anyone can open XML files and find the titles in the sense the Dublin Core Metadata specification uses them by finding the dc:title tags (actually, the title tags in the Dublin Core namespace, but that's just being more exact).
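Python's standard ElementTree module shows how this works in practice -- the prefix is just local shorthand, and the namespace URI is what actually identifies the element (document contents invented for illustration):

```python
import xml.etree.ElementTree as ET

# Invented document: a dc:title in the Dublin Core namespace alongside a
# plain local <title>. The prefix is arbitrary; the URI is what matters.
doc = """<doc xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>My Title</dc:title>
  <title>a different, unqualified title element</title>
</doc>"""

root = ET.fromstring(doc)
ns = {"dc": "http://purl.org/dc/elements/1.1/"}

print(root.find("dc:title", ns).text)  # My Title
# Internally the element's name is the namespace URI plus the local
# name, not the prefix the document happened to use:
print(root.find("dc:title", ns).tag)   # {http://purl.org/dc/elements/1.1/}title
```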

Now we're starting to get into RDF. RDF is about using generally accepted tags following a certain relationship pattern to encode meaning. Read up above where I'm talking about the grocery store. Imagine trying to make the grocery store's system interoperate with your computer's recipe needs without knowing anything other than it uses MySQL and, say, PHP. All one has to know is that the grocery store uses RDF, and which RDF technologies it uses (in this case, Units and Ingredients), and one can interoperate. That's just not possible with things like MySQL.

RDF, XML, and such are about interoperability. XML is about encoding data in a commonly understood, yet marked up way (unlike plain text, which carries essentially no structural information). RDF is about encoding meaning using commonly understood, low level vocabularies which may be used to build much larger constructs.
 
Posted by fugu13 (Member # 2859) on :
 
For instance, consider a system which can combine user login data from other sources. It stores login information in XML, using 'login' to refer to a login-specifier namespace (which I just made up).

So you might have a login file that looks like so:
<logins xmlns="http://mysite/default/namespace" xmlns:login="http://example.org/login">
<login:entity>
<login:username>Jim</login:username>
<login:password>Samuel</login:password>
</login:entity>
.
.
.
</logins>
Now, whenever I want to (for instance) find the password of someone with a particular username, I can use the XPath expression '/logins/login:entity[login:username="Jim"]/login:password'
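The same lookup can be sketched with Python's standard ElementTree module; rather than an XPath predicate, this version walks the entity elements directly (the file contents are the hypothetical ones above, abbreviated to one entry):

```python
import xml.etree.ElementTree as ET

# The hypothetical logins file from above, abbreviated to one entry.
doc = """<logins xmlns="http://mysite/default/namespace"
                 xmlns:login="http://example.org/login">
  <login:entity>
    <login:username>Jim</login:username>
    <login:password>Samuel</login:password>
  </login:entity>
</logins>"""

LOGIN = "{http://example.org/login}"

def password_for(xml_text, username):
    root = ET.fromstring(xml_text)
    for entity in root.findall(f"{LOGIN}entity"):
        if entity.findtext(f"{LOGIN}username") == username:
            return entity.findtext(f"{LOGIN}password")
    return None

print(password_for(doc, "Jim"))  # Samuel
```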

It makes sense to store a system like this in MySQL; however, remember we combine login information from many different sites. How are they going to send us their login information? We could just specify a format, but what if the format is problematic, or there are a lot of sites that collect login information, all in different formats?

This is where our XML becomes useful. A person might send us the following file to be merged with ours:

<users xmlns="http://his.site.default.namespace" xmlns:user="http://example.org/login" xmlns:dc="http://purl.org/dc/elements/1.1/">
<user:entity>
<user:username>Lucy</user:username>
<user:password>Gimpy</user:password>
<dc:title>President</dc:title>
</user:entity>
.
.
.
</users>

First, we can pick out all the entities in the http://example.org/login namespace, even though he used a different prefix (perhaps he used the login prefix elsewhere). Now, we could pick out the username and password elements from there and just insert those, but why do the work? We can just insert the entity element we picked out, as follows:

<logins xmlns="http://mysite/default/namespace" xmlns:login="http://example.org/login" xmlns:dc="http://purl.org/dc/elements/1.1/">
<login:entity>
<login:username>Jim</login:username>
<login:password>Samuel</login:password>
</login:entity>
<login:entity>
<login:username>Lucy</login:username>
<login:password>Gimpy</login:password>
<dc:title>President</dc:title>
</login:entity>
.
.
</logins>

Our XML library handles converting the namespace prefix to what we're using locally, and adds the Dublin Core namespace now that we have an element in it.

Even with this additional, foreign data (the dc:title element), we are still able to use the exact same XPath expression as before, and we did not have to perform time-consuming, test-requiring modifications to a MySQL table schema, et cetera. In fact, I could write a program to perform this example's combination in around ten lines of code. All I would have to know about the documents being accepted is that they had entity elements in the http://example.org/login namespace. I would collect all those elements by selecting with the XPath expression '//login:entity' -- even though the documents might assign the prefix user, or francis, to the same namespace, because I would have already told my parser that by login I meant the http://example.org/login namespace! Then I would just insert them as children of my logins element. The other six or seven lines of code would be bookkeeping, mainly.
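Here is roughly that ten-line program, sketched with Python's ElementTree (documents abbreviated; the only thing assumed about the foreign file is that it has entity elements in the http://example.org/login namespace):

```python
import xml.etree.ElementTree as ET

LOGIN = "{http://example.org/login}"

# Our file and the foreign file, abbreviated. Note the foreign file binds
# the *same* namespace URI to a different prefix (user instead of login).
ours = ET.fromstring(
    '<logins xmlns:login="http://example.org/login">'
    '<login:entity><login:username>Jim</login:username>'
    '<login:password>Samuel</login:password></login:entity>'
    '</logins>'
)
theirs = ET.fromstring(
    '<users xmlns:user="http://example.org/login"'
    ' xmlns:dc="http://purl.org/dc/elements/1.1/">'
    '<user:entity><user:username>Lucy</user:username>'
    '<user:password>Gimpy</user:password>'
    '<dc:title>President</dc:title></user:entity>'
    '</users>'
)

# The prefixes differ, but the parser resolved both to the same URI, so
# the same lookup finds the entities in either document -- and they can
# be grafted straight into our tree.
for entity in theirs.findall(f"{LOGIN}entity"):
    ours.append(entity)

usernames = [e.text for e in ours.iter(f"{LOGIN}username")]
print(usernames)  # ['Jim', 'Lucy']
```

On re-serialization ElementTree will pick its own prefixes, which is fine: only the namespace URIs carry meaning.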
 
Posted by rivka (Member # 4859) on :
 
It looks kinda like English . . .

*head asplodes!*
 
Posted by TomDavidson (Member # 124) on :
 
To translate: XML is really only important if you're interested in sharing data with people who might not be using the same system to store data that you are. If you're doing something for your own personal site, do whatever you want -- but keep in mind that XML zealots will regard you as an eccentric island of ultimately encoded information. [Smile]
 
Posted by fugu13 (Member # 2859) on :
 
[Razz]

There are other uses of XML, too. For instance, marking up documents for later data mining. Or marking up a document in a basic, content-centered form to be transformed into different formats (HTML, PDF, WML, RSS, et cetera) on demand.

Or just being able to take advantage of all the tools available for XML (such as XPath and XSLT).

To give an example, here's how I create http://fugu13.com/consulting (warning, not yet done) using an XML publishing system called Cocoon. I'll use the news page.

First, a fragment of the 'site map':

<map:match pattern="cms/*.html">
<map:aggregate element="cms">
<map:part src="cms/header.xml"/>
<map:part src="cms/menu.xml"/>
<map:part src="cms/{1}.xml" label="cms-print"/>
<map:part src="cms/sidebar.xml"/>
<map:part src="cms/footer.xml"/>
</map:aggregate>
<map:transform src="cms/fullpagetransform.xsl">
<map:parameter name="rss" value="rss/{1}.rss"/>
<map:parameter name="print" value="print/{1}.html"/>
</map:transform>
<map:serialize/>
</map:match>

Whenever someone requests anything of the format cms/*.html (I'll leave off the cms/ from now on; it's on everything, it's just a common directory), Cocoon first aggregates together five files: header.xml, menu.xml, {1}.xml, sidebar.xml, and footer.xml, where the {1} stands for whatever the * was.

From there, Cocoon transforms the resulting document using an XSLT stylesheet called fullpagetransform.xsl. It passes two values to the stylesheet, rss/{1}.rss and print/{1}.html. As these are request data, the presentation layer shouldn't have to deal with them except as data -- which makes it a lot easier to move them around on the page and such. Anywhere I want one of those values to appear, I just put <xsl:value-of select="$rss"/> (or $print).

Here's the news.xml document that is used when someone requests the news.html page.

<content>
<page-title>News</page-title>
<article anchor="newsite">
<title>Sunday, March 14, 2004: Fugu Consulting Site Launched</title>
<para>
The Fugu Consulting web site was launched today. It is generated using the Apache Cocoon XML
web development framework, and then deployed as static pages to the server. It uses several
Semantic Web applications, including RSS and FOAF.
</para>
</article>
</content>

Very simple: I have my root tag, content, then children called page-title and article, which has children called title and para. This is a very, very simple document, though my format for content documents allows them to get somewhat more complex.

Now here's a few fragments from fullpagetransform.xsl:

<head>
<title>
<xsl:value-of select="content/page-title"/><xsl:text> :: </xsl:text><xsl:value-of select="header/site-title"/>
</title>

First, tags that I just want to appear I just write, such as <head> and <title>. What this does is take the value of the page-title child of the content section, and the value of the site-title child from the header section, and from them make the title. To see the result, simply look at the title of http://fugu13.com/consulting/news.html

<table>
<tr>
<td valign="top">
<div id="menu">
<xsl:apply-templates select="menu"/>
</div>
</td>
<td valign="top">
<div id="content">
<xsl:apply-templates select="content"/>
</div>
</td>
<td valign="top">
<div id="sidebar">
<xsl:apply-templates select="sidebar"/>
</div>
</td>
</tr>
</table>

This is a section for putting all the other content in place. <xsl:apply-templates select="content"/> is like a function call: it applies the template that matches content to content. Which leads us to our next section:

<xsl:template match="content">
<ul id="paramenu">
<xsl:for-each select="article[@anchor]">
<li>
<a>
<xsl:attribute name="href">
<xsl:text>#</xsl:text>
<xsl:value-of select="@anchor"/>
</xsl:attribute>
<xsl:value-of select="title"/>
</a>
</li>
</xsl:for-each>
<xsl:text> </xsl:text>
</ul>

This first part of the content-matching template does something very simple -- it constructs a list of links to the different articles on the page, but only the ones I have given an anchor attribute. If I don't want an article to appear here, I delete or comment out its anchor attribute. Each link is given the title of the article.

<xsl:for-each select="article">
<a>
<xsl:attribute name="name"><xsl:value-of select="@anchor"/></xsl:attribute>
</a>

This comes basically right after; it's the beginning of the section that displays each article -- this is the part that constructs the anchor that the links in the article list above connect to.

One of the big advantages of XSLT is that it is functional. I don't need to manage the values of my variables or make sure I call functions in the right order or some such. Values are values, and all I have to do is remember the scope I'm in (very easy to do in XSLT; the select statements make it very explicit) to know how to reference them.

I also have a part of my site map that notices when someone requests a document of the form print/*.html. When someone's printing a document, all they need is the content. Here's the entire print stylesheet:

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:param name="normal"/>
<xsl:template match="content">
<html>
<head>
<title>
<xsl:value-of select="page-title"/>
</title>
<link rel="Stylesheet" href="print-page.css" type="text/css" />
</head>
<body>
<div id="content">
<a>
<xsl:attribute name="href"><xsl:text>../</xsl:text><xsl:value-of select="$normal"/></xsl:attribute>
<xsl:text>Return to normal page.</xsl:text>
</a>
<xsl:for-each select="article">
<div>
<h3><xsl:value-of select="title"/></h3>
<xsl:for-each select="para">
<p><xsl:value-of select="."/></p>
</xsl:for-each>
</div>
</xsl:for-each>
</div>
</body>
</html>
</xsl:template>
</xsl:stylesheet>

There's a title, the link to the print CSS stylesheet, in the body a link back to the normal version of the page (I stored the value in $normal in the site map), and then for each article I print the title, followed by the paragraphs, one after another.

To see the result on the news page, just visit http://fugu13.com/consulting/print/news.html.
 
Posted by fugu13 (Member # 2859) on :
 
The above probably looks really complicated. The thing is, it isn't. It is verbose, but the actual logic is very straightforward.

Quite simply, I find that for many purposes, using XML as a common format to be retransformed into other formats is very, very useful.

Would I still use an SQL backend for a large, content-driven site? Sure. But I'd use it through an opaque XML database that used the SQL database as its backend. The overhead is worth the access to the huge array of tools and technologies that are XML-based.

For smaller, well-defined sites I'm perfectly happy using SQL directly (well, through a problem-domain-specific SQL interfacing object; using SQL all over the place is messy and error-prone). The reason I'm not doing so for my consulting site is that I want to be able to keep near the forefront of Semantic Web development with it, and that is made much easier by making XML a first-class part of the site. For instance, soon I'm going to add an automatically generated Dublin Core metadata file for each page.
 
Posted by saxon75 (Member # 4589) on :
 
You know what, rivka? I'm right there with you.

-----

Most of what I know about programming in general and web development in particular has been self-taught. The problem, though, with teaching yourself something is that you don't always have the best teacher.
 
Posted by TomDavidson (Member # 124) on :
 
The sad thing about this -- speaking as a big fan of the democratic web -- is that it will become increasingly hard for laypeople to create their own websites that comply with the tools being used, without using development tools designed for the process.

The increased verbosity is no problem for professionals, but I can see it turning off a whole generation of wannabe webbers.
 
Posted by fugu13 (Member # 2859) on :
 
Oh, and a bit more on storing documents. If you're just sticking documents of various types in a data store, it doesn't make sense to come up with an SQL schema for each type. In fact, if each document is pretty much some text with embedded markup, it's not really possible to come up with a schema for each type. And if you just throw the files in as plain text, you've got a problem if you ever want to do any processing of them.

For instance, say you have a lot of fairy tales stored. Perhaps you want to know all the titles. If the documents are stored in XML, it's easy --

<xsl:for-each select="/stories/story">
<xsl:value-of select="title"/>
</xsl:for-each>

You could also use another method of working with XML, such as the DOM, or SAX, instead of XSLT.

Then say you wanted to know the titles of all the stories whose heroes have names starting with "A" (perhaps you're doing some odd linguistics study). (I am of course assuming a particular choice of markup elements. There are many possible ones that make sense, and any that make sense would be similarly easy to query.)

<xsl:for-each select="/stories/story/character[@type='hero' and starts-with(name, 'A')]">

<xsl:text>Hero: </xsl:text>
<xsl:value-of select="name"/>

<xsl:text>\nStory: </xsl:text>
<xsl:value-of select="../title"/>

<xsl:text>\n\n\n</xsl:text>
</xsl:for-each>

(I included some spacing characters just to make it a tad more realistic).

By using XML markup you just have to mark up any aspects of the document you think you might want to reference later. You don't have to worry about exactly how you're going to parse it -- so long as the markup makes sense with the document, you'll be able to very easily get at any aspect of the document you want to look at.

What's more, you have a huge number of tools for doing that parsing, in any computer language. And so long as your XML files are easily accessible, you're not tied to anything, and you (or your successors) can make queries in whatever way is most comfortable and easiest.
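For instance, the title query above, done with Python's standard ElementTree instead of XSLT (story data invented, using the same <stories>/<story>/<title> markup):

```python
import xml.etree.ElementTree as ET

# The same title query as the XSLT fragment above, against invented
# data using the same <stories>/<story>/<title> markup.
doc = """<stories>
  <story><title>The Glass Hill</title></story>
  <story><title>Ashputtel</title></story>
</stories>"""

root = ET.fromstring(doc)
titles = [story.findtext("title") for story in root.findall("story")]
print(titles)  # ['The Glass Hill', 'Ashputtel']
```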
 
Posted by saxon75 (Member # 4589) on :
 
Of course, I'm sure that there are any number of elitist-type geeks out there who will view this as a good thing.

[Edit: That was to Tom.]

[ March 19, 2004, 11:27 AM: Message edited by: saxon75 ]
 
Posted by saxon75 (Member # 4589) on :
 
Oh, and another thing: Back in the so-called "Golden Age of Computing," most software types were essentially hobbyists. As software as a field became more mature and complexity increased, seat-of-the-pants hacking became less and less common in commercial and industrial software design, but even now there are thousands of hobbyists out there hacking out code. And given the popularity of the open source movement, many of these amateur programmers are involved with large and interesting projects. It doesn't seem like there's much reason to suspect that this will change in the future.
 
Posted by fugu13 (Member # 2859) on :
 
Tom -- that's a pretty big concern for many of the proponents of the Semantic Web as well, which is why the current strategies are to integrate RDF into common tools, make it an optional addition instead of a requirement, and provide automatic creation tools such as foaf-a-matic:

http://www.ldodds.com/foaf/foaf-a-matic.html

As for integrating RDF, many blogging tools automatically generate (embedded) RDF documents for blogs and create RSS feeds (another Semantic Web technology) automatically. LJ recently integrated FOAF so that it is automatically generated at specific URLs. Now all one has to do is make an LJ bot and it will be able to quickly aggregate huge stores of RDF data.

For instance, my LJ FOAF page is here: http://www.livejournal.com/community/lj_nifty/96165.html

A 'FOAF-aware' tool could use that file to quickly find hundreds of LJs of people at any degree of separation from me. Say you're researching 'link spreading': using LJ FOAF files and a bit of page parsing, you can follow how popular links spread among groups of friends.
 
Posted by TomDavidson (Member # 124) on :
 
The problem is that more and more of these people are going to view web design as a "project." This will, unfortunately -- or fortunately, depending on your outlook -- go a long way towards standardizing web development.

If you're typing a story to put up on your web page, do you WANT to use XML markup? Does the idea of putting tags around the first appearance of every character appeal to you? If not, do you want to use a special XML-enabled program to write your story?

If not, you're out of this loop. (I speak here for a moment of myself; I like to do all my stuff in notepad, just as a matter of course. I can do this -- with some irritation -- for CSS and the like, but it would be RIDICULOUS to XMLify my site in notepad.)

Don't get me wrong: I think XML development is a wonderful, beautiful thing. I just think that UBIQUITOUS XML -- which is the goal, of course, of the Semantic Web -- will suppress individual creativity.

[ March 19, 2004, 11:42 AM: Message edited by: TomDavidson ]
 
Posted by fugu13 (Member # 2859) on :
 
I think you'll find that you're already behind the times on personal web sites, Tom. People are using, for the most part, blogs of various kinds to put their presence on the web. The blogs on blog sites are highly customizable, and the tendency has become to trade styles, modifying them to one's taste.

And most of the blog sites embed RDF markup of the page.

Also, you misunderstand my story example. I'm not suggesting everyone do it, and neither are most proponents of the semantic web (XML is just a detail for them, anyways [Wink] ). However, if you're intending to store documents yourself for later retrieval, XML makes a lot of sense. And more and more tools are doing parts of it automatically. For instance, you might be fascinated to learn that the next version of MS Office uses extensive metadata which is designed from the ground up to be represented with XML.

Should a normal person mark up their stories like that? Not by hand, no. But if they use popular story-writing software X, it might be done automatically for them. And there are people who would want to mark up their old documents by hand to make them more accessible later, such as academics and corporations. My point was that the promise of XML is not just in interchange.
 
Posted by fugu13 (Member # 2859) on :
 
To clarify a bit on the blogs issue -- when there were no good ways to make a site your own, writing your own homepage in Notepad allowed you to express your creativity. Now CSS, templates, and blog software make it possible to do just about anything your typical amateur would be able to do anyway, with much less hassle.

People who are able to do more, do more. But for most of the people who are moving to the web today, CSS, templates, and blogs allow them to express their creativity much better than they were able to when writing pages by hand, when they were having way too much fun with the <blink> tag.
 
Posted by fugu13 (Member # 2859) on :
 
And there are many popular tools out there which could have RDF integrated into them -- address book software, for instance, could generate FOAF files which people could publish to places like plink. Google could return RDF with its search results, tagging descriptions and titles. Chat software could also turn out FOAF files, or even better, could use the common FOAF format generated by address book software (and an RSS-based system of managing buddy information) to compile buddy lists. Calendars could use the RDF version of vCal to interchange data (the RDF version, rather than another vCal format, because it lets any RDF-enabled software use the data). Online games could publish FOAF files. Recipe applications could use a Recipe RDF format, and recipe books could provide the files for their recipes in it.

Et cetera.

RDF is a breakthrough because it provides standard formats with easily parsed meaning. Most standard formats out there do not carry meaning well. RDF does.
 
Posted by Pod (Member # 941) on :
 
Saxon: I think the problem is that the "semantic" web concept is actually quite mundane. The issue at hand here is basically: how can you encode -everything- that a website is about in some sort of format that can be deterministically processed?

The issue that I have with doing this is that a lot of this information is -extremely- boring. Fugu might find it handy and interesting to express all of the people he's connected to, out to two or three levels of indirection, but frankly, to me (and no offense to fugu), I'm not terribly interested in who the friends of his friends are.

Ultimately, the message that comes out of this sort of endeavor is simply that meta-data can be useful for expressing relationships between marked-up items. And that, to me, isn't terribly surprising.

The reason I don't find this interesting is that, in effect, all this is doing is taking properties we know about objects and encoding them explicitly as properties of the objects. This just means we're expanding our notion of what an object is. It has the added complication that these tasks take a -lot- of time and effort (I did this as a job for a while).

The truly interesting task is to take the objects as they are and attempt to figure out the relationships between them. And, at least in linguistics, there are efforts to do this, with tools like Latent Semantic Analysis, which figures out which items are similar simply by where they occur and where they -don't- occur (well, and a bunch of non-linear math).

[ March 19, 2004, 12:14 PM: Message edited by: Pod ]
 
Posted by Pod (Member # 941) on :
 
Tom:

Get a text editor with macros [Razz]

Or, I'm still extremely efficient with copy & paste.
 
Posted by fugu13 (Member # 2859) on :
 
The interesting thing about this is that it does it in a commonly understood way, and that enables so much more.

Right now, hardly any of the information available to humans is available to a given program in a way that it can "understand" the meanings (parse the meanings is a better way of putting it).

By providing RDF data for as much as possible, we are increasing the possible scope of programs many-fold. Truly ubiquitous computing is dependent on ubiquitous meaning. A computer can order stuff for you using easy descriptions -- for instance, order one pound of chocolate chunks and do not pay more than $5 -- if the grocery store uses RDF. It doesn't have to have an interface specific to the grocery store; all it needs to know is that the grocery store speaks the Payment, Ingredient, and Measurement vocabularies of RDF -- which are low-level, commonly understood vocabularies.
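Sketched in Python with made-up data: the store publishes its stock as RDF-style (subject, predicate, object) triples, and the query is a simple match over them. The predicate names and items below are invented stand-ins for the shared-vocabulary URIs a real store would use.

```python
# Mock store data as (subject, predicate, object) triples. The predicate
# names ("ingredient", "price", "unit") stand in for hypothetical shared
# RDF vocabulary terms; real ones would be full URIs.
store_triples = [
    ("item:1", "ingredient", "chocolate chunks"),
    ("item:1", "price", 4.50),
    ("item:1", "unit", "pound"),
    ("item:2", "ingredient", "chocolate chunks"),
    ("item:2", "price", 6.25),
    ("item:2", "unit", "pound"),
    ("item:3", "ingredient", "flour"),
    ("item:3", "price", 2.00),
    ("item:3", "unit", "pound"),
]

def props(triples, subject):
    """Collect all predicate/object pairs for one subject into a dict."""
    return {p: o for s, p, o in triples if s == subject}

def find_items(triples, ingredient, max_price):
    """Return subjects whose triples match the ingredient and price limit."""
    subjects = sorted({s for s, p, o in triples})
    return [s for s in subjects
            if props(triples, s).get("ingredient") == ingredient
            and props(triples, s).get("price", float("inf")) <= max_price]

# "Order one pound of chocolate chunks, and don't pay more than $5":
matches = find_items(store_triples, "chocolate chunks", 5.00)
```

The point of the sketch is that nothing in `find_items` is specific to this store -- any data source emitting the same vocabulary could be queried by the same few lines.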

You may not want to know who my friends are, but say you're in a city and need to see a mechanic. Wouldn't it be interesting to know if any friends of yours or friends of friends of yours were or recommended mechanics in the same geographical area?

These sorts of things are already under development.

Oh, another way for applications to produce RDF I thought of -- bookmark lists in browsers. While being in the root bookmark folder doesn't mean much, pages in the same folder bear some sort of relationship to each other -- and even vague associations like that can be encoded in RDF and leveraged to, for instance, create a huge online directory for finding related web pages. Imagine if your browser had a "related pages" button which, when clicked, asked "bookmark-google" to return all pages in this vast store of related web pages that were repeatedly related to the page you were on.
 
Posted by TomDavidson (Member # 124) on :
 
"For instance, you might be fascinated to learn that the next version of MS Office uses extensive metadata which is designed from the ground up to be represented as XML."

Not really, no. [Smile] We were with Office 2003 from the beta. *grin* For corporate purposes, XML is nifty-keen.

The thing is, on a personal level, I can't imagine trying to encode every element of my life, and I'm vaguely suspicious of people who have the time to try. *grin*

[ March 19, 2004, 12:39 PM: Message edited by: TomDavidson ]
 
Posted by fugu13 (Member # 2859) on :
 
Pod -- it's not that metadata can be used for expressing meaning; it's that if we use metadata to express meaning, the potential power of computing skyrockets.
 
Posted by fugu13 (Member # 2859) on :
 
Actually, I'm talking about the one intended to run on Longhorn [Big Grin] . You think you've seen metadata now . . .
 
Posted by fugu13 (Member # 2859) on :
 
Tom -- it's not about encoding every element of your life, it's about having the parts of it that are already encoded available in a format that conveys meaning -- for instance, posts on a weblog, friends on livejournal, email contacts in your address book, bookmarks in your browser. All these and more can be encoded by their programs -- without you having to worry about it -- and made available in RDF format.
 
Posted by TomDavidson (Member # 124) on :
 
I just shudder to think that people might for a moment think I'd CARE about their contacts, bookmarks, or daily blogs. *laugh*

I guess, when it really comes down to it, it's that I DON'T want the Web to be ubiquitous in people's lives -- at least, not to the extent that people are going to try to make the details of their lives ubiquitous to everyone else. [Smile]

There's a REASON I'm not exhibitionistic enough to have a blog. *grin* And I'm highly uncomfortable with the thought that people might WANT to see who's in my address book.
 
Posted by Pod (Member # 941) on :
 
Let me paraphrase one of my professors

Professor: "How long do you think it would take you to part-of-speech tag a sentence?"
My friend: "I don't know... about a minute."
Professor: "Okay, let's say you worked 40 hours a week, then. At that pace it'd take you the duration of your graduate school career to tag the entire Penn Treebank [a corpus of part-of-speech tagged sentences]."

You see the problem here?

You very well can mark up all of the text you use. But the investment required to make explicit all the interesting meta-data we might want is just so vast that it's absolute fiction to believe that it's useful.

Devices that process meta-data are indeed useful; however, devices that can see the relationships the meta-data provide, without us having to provide them, are what we need. (And really, if you have such a device, you could use it to create meta-data.)
 
Posted by fugu13 (Member # 2859) on :
 
Taking an example closer to home, consider forum posts. The amount of meaning available in forum posts is stupendous.

Forum posts are related to each other, contain common-topic information that may be easily decoded (links), are linked to people through unique identifiers (at least forum name, and possibly email (through an SHA hash if a person wants to keep it private), AIM ID, homepage, and others), and are time-specific (making all sorts of interesting things possible in terms of data searching) -- and I'm certain there's more data that could be encoded with ease.

If forums were to provide that information in RDF format, aggregating sites could do all sorts of tricks with the data.
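To make that concrete, here's a sketch in Python with invented posts and made-up predicate names (a real forum would use URI-based vocabulary terms): each post reduces to a handful of triples, and a query like "all links posted by a given member in a given date range" becomes trivial.

```python
# Hypothetical forum posts flattened into (subject, predicate, object) triples.
post_triples = [
    ("post:1", "author", "fugu13"),
    ("post:1", "date", "2004-03-19"),
    ("post:1", "links-to", "http://www.w3.org/RDF/"),
    ("post:2", "author", "Pod"),
    ("post:2", "date", "2004-03-19"),
    ("post:3", "author", "fugu13"),
    ("post:3", "date", "2004-02-01"),
    ("post:3", "links-to", "http://purl.org/dc/"),
]

def links_by(author, since, until):
    """Links posted by `author` between two ISO dates, inclusive."""
    by_author = {s for s, p, o in post_triples
                 if (p, o) == ("author", author)}
    in_range = {s for s, p, o in post_triples
                if s in by_author and p == "date" and since <= o <= until}
    return sorted(o for s, p, o in post_triples
                  if s in in_range and p == "links-to")
```

An aggregator that understood these predicates could run the same query across every forum that published them, without any per-forum parsing.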
 
Posted by Pod (Member # 941) on :
 
Also, boo for Microsoft meta-data. Co-opting data formats for one's own proprietary gain is scummy, and really, defeats the entire purpose of having uniform syntactic standards.
 
Posted by TomDavidson (Member # 124) on :
 
Evil, creepy, nosy things, yes. [Smile]
 
Posted by fugu13 (Member # 2859) on :
 
Pod -- the idea isn't to have every bit of information be marked up; the idea is to have data whose meaning we already know be marked up as such automatically.

Time it takes to encode your weblog in RDF: 0.

Time it takes to encode the relationships in your bookmarks in RDF (assuming browsers start supporting it): 0.

Time it takes to encode forum posts in RDF (assuming forum software does it automatically): 0.

Sure, if you thought a particular piece of information was important to have encoded and there wasn't a tool to do it, you might do it by hand, but for the most part encoding information would be transparent to the user.

In other words, the problem doesn't exist, because the sorts of things proponents of the semantic web want marked up can be marked up automatically and transparently. It's a bogeyman of metadata, not a real problem.

[ March 19, 2004, 12:59 PM: Message edited by: fugu13 ]
 
Posted by fugu13 (Member # 2859) on :
 
Pod -- while I don't like the MS XML format, it's a perfectly acceptable XML format. And the reason I don't like it is that it's feature-complete (for Word documents, at least), and so many of the features in Word are badly chosen for an XML document format.

A document format designed from the ground up to be XML-based would be much less crufty, but that this one is crufty isn't the fault of the people doing the XML implementation, and it is standards-compliant.
 
Posted by Pod (Member # 941) on :
 
Fugu:

I'm just not sold. If it can be marked up automatically, doesn't that just mean that it's already unambiguous and searchable?

Also, why so excited about MS XML? Whatever happened to LaTeX? You're an OSS geek, aren't you?
 
Posted by Pod (Member # 941) on :
 
Fugu:

I think you need a better example than forum data, 'cause I don't think Tom or I are terribly interested in searching them.

My opinion of the matter is that there are two types of data, ambiguous and unambiguous. Unambiguous data should already have some sort of heuristic property that can be used to index the data. Ambiguous data can't be uniquely indexed, and so needs to have meta-data associated with it, for two possible reasons: to standardize the format, or to disambiguate the data (or both). Performing either task on ambiguous data is hard, and requires either a lot of ingenuity or a lot of hard work. In either regard, meta-data is merely a tool, and the real work is done by some other device.
 
Posted by Pod (Member # 941) on :
 
Really, I'm just skeptical; this may be just another case of people over-selling XML formats. Sure, meta-data is wonderful, but deriving interesting meta-data is either hard or time-consuming, not to mention that it requires a standard for markup that everyone has to agree on. And that's an entirely different can of worms.
 
Posted by fugu13 (Member # 2859) on :
 
I'm not excited about MS XML; it's just a good example of the increasing use of XML and RDF (I have not yet verified whether it can output RDF, but the transform, at least for the metadata, should be easy).

It's not about Tom or you being interested in searching them; it's about applications which can search them having the data available. For instance, did you read my mechanic example? An application for finding people who provide or recommend various services in a given geographic area and who are also within someone's circle of friends/acquaintances would be incredibly powerful.
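The mechanic query boils down to a small graph walk. Here's a sketch in Python with invented names and made-up predicate dictionaries standing in for FOAF's URI-based properties: expand "knows" links out to two hops, then keep the people who are mechanics in the right city.

```python
# Hypothetical FOAF-style data: who knows whom, plus per-person facts.
knows = [
    ("you", "alice"), ("you", "bob"),
    ("alice", "carol"), ("bob", "dave"),
]
people = {
    "carol": {"occupation": "mechanic", "city": "Bloomington"},
    "dave":  {"occupation": "chef",     "city": "Bloomington"},
    "erin":  {"occupation": "mechanic", "city": "Bloomington"},
}

def within_hops(start, edges, hops):
    """Everyone reachable from `start` in at most `hops` knows-links."""
    frontier, seen = {start}, set()
    for _ in range(hops):
        frontier = {b for a, b in edges if a in frontier} - seen - {start}
        seen |= frontier
    return seen

def nearby_mechanics(start, hops, city):
    """Mechanics in `city` within `hops` of `start` in the knows-graph."""
    return sorted(p for p in within_hops(start, knows, hops)
                  if people.get(p, {}).get("occupation") == "mechanic"
                  and people.get(p, {}).get("city") == city)
```

Note that erin, a mechanic in the same city but outside the friend graph, is excluded -- that filtering by acquaintance is exactly what plain directory search can't do.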

As for the encoding nature, reread some of my posts. The point isn't just that it's encoded in an accessible format, but that it's encoded in an accessible and commonly understood format.

For instance, take bookmarks. Remember my bookmark-based related pages search function above (you can't tell me many people wouldn't find that insanely useful)?

Well, all the bookmark files certainly carry the necessary data already, which is part of the point -- but they're in their own formats! For the data to be commonly accessible, it needs to be in a commonly accessible format.

Okay, so now we have "bookmark-xml" that all browsers can export their bookmarks in, so we can easily aggregate this data.

But wait, what if we want to make a note of which bookmarks are people's homepages, which are commercial sites, et cetera? That information's not encoded in the bookmark files -- but it is encoded in other types of files.

However, how do we relate bookmark-xml with those other formats? Well, it would be easy if all the formats were vocabularies of a common, root, meaning-carrying encoding. That common, root, meaning-carrying encoding is called RDF.

Yes, the information is available already -- in formats that any non-specialized program can't make heads or tails of! How is a text parser supposed to know that on one site <h2> refers to the title of a book, and on another <h2> refers to the title of a movie? Well, the sites that have the data know -- such as Amazon and IMDB. If they were to provide their data in RDF so that the meaning was clear to programs, rather than just the presentation, it would be possible to, for instance, quickly find books written by actors (hey, that could be very useful for a journalist, and it's off the cuff) without requiring complicated data format parsing particular to each site's pages.
 
Posted by fugu13 (Member # 2859) on :
 
The format's already agreed upon, Pod; it's called RDF, and it's been accepted as a standard by most of the major players.
 
Posted by Pod (Member # 941) on :
 
Okay, but my point, Fugu, is that you can already trip across that <h2>, go "hmm, don't know what this is," toss it into the Amazon or IMDB search field, and pop out a result, which, you'll note, clearly specifies the format of the object in question.

If that's the case, why do we need a new meta-wrapper around all data? I agree, more intelligent searches are a good idea, but what we need are more powerful search tools, not just reformatting information we already know into explicit meta-data. That is my point. We shouldn't have to reformat all of the web to do this. I mean, in a way, that's the point of what Google is.

Also, with the example of a good mechanic, I don't know...

What's the point? Word of mouth? General statistics, I think, would be nicer. Also, with this FOAF stuff, you have the potential for an exponential explosion of search results, with no way of ordering them, either. As a search heuristic, I'm not convinced that it helps any.

It just sounds too idealistic to me.
 
Posted by BannaOj (Member # 3206) on :
 
Ok while I'm an engineer I'm not terribly computer programming savvy.

To me, as a lay person, I'm trying to figure out how this matters to me.

I am currently attempting to build a website to display my show dogs (Cardigan Welsh Corgis). I'm using Adobe PageMaker.

How does all this metadata help me? Does it help me join webrings? How does it network me with other Cardigan Corgi people? Many of the Corgi people are less computer-savvy than me, and have annoying cutesy-music websites on free server space with popup ads. They rarely blog; they are too busy out showing dogs. I highly doubt they are going to be doing this sort of encoding.

So I repeat, what does this do for me?

AJ
 
Posted by BannaOj (Member # 3206) on :
 
What would be cool, though, is automatic pedigree generation for dogs, based on the metadata.
All you'd have to do is put in the sire and dam and get the pedigree spit out, complete with links to each generation's dogs.

But that would once again require lots of coding by mostly amateurs who would look at you funny if you said the word "meta-data" to begin with.

AJ
 
Posted by Bokonon (Member # 480) on :
 
Bah, BeOS had (non-XML) metadata built into its filesystem since 1995.

Longhorn is just a Johnny-Come-Lately.

-Bok
 
Posted by TomDavidson (Member # 124) on :
 
Bok, almost ALL operating systems have had metadata built into the file system for years. It's just that some OSes integrate more data than others. [Smile]
 
Posted by fugu13 (Member # 2859) on :
 
It's not about human understanding; it's about program understanding!

You, as a human, can understand that bit about h2 based on context, because you are human. For a program to do that is exceptionally complex. Why should programmers have to write complex, error-prone parsers to figure out semantic data when sites like Amazon already have all that semantic data available, just not yet presented in a generic format programs understand?

It's about making information available to programs, not keeping it readable only by humans, as we do now.

We need a new meta-wrapper because the current meta-wrappers aren't semantic wrappers. They are (usually) presentation wrappers. That's like asking why we need HTML when we have image files -- after all, any information that can be presented in HTML can be presented as a picture. A semantic wrapper is a completely different format from any non-semantic wrapper, just as an HTML file is a completely different format from an image file.

Notice that the main job of a web browser is to convert HTML into images -- why on earth should we have those images? After all, we already have the HTML! We have the images because the images are easier for humans to understand. Similarly, we should have RDF because it is easier for programs to understand (where by understand we mean parse for meaning).

The idea that something already encoded once should never be re-encoded is absurd. After all, sound files come in multiple formats. Why? Because different formats have different advantages!

As for the bookmark thing, the point is that I'm describing how to make a better search engine.

Let's take a few givens -- in a few years, browsers export and automatically upload (if it's turned on) RDF files of their bookmarks. This data is kept in an online repository (not too hard).

Now say someone wanted to find pages closely related to a given page. They might query for pages in the same non-main folder as the page for at least 100 users. Obviously the optimal algorithm would be more complex, but you get the idea.

The thing is, the details of a person's query would be behind the scenes. What they'd actually do is press a button on their browser or click a link on a page, and be taken to the results.

This could be greatly expanded very easily -- they could upload their FOAF file(s), for instance, and any page that was bookmarked by a friend or friend of a friend would be marked in some way. Et cetera.

That's all that would be required. People uploading their bookmark files in RDF, people optionally uploading their FOAF files in RDF, a rather trivial amount of parsing, and ta-da! Relational search results generated by the very best web spiders -- humans.
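The "related pages" co-occurrence idea above can be sketched in a few lines of Python, with invented data (users, folder names, and URLs are all hypothetical): each uploaded bookmark file reduces to (user, folder, url) entries, and two pages count as related once enough users file them together.

```python
# Hypothetical aggregated bookmark uploads: (user, folder, url).
bookmarks = [
    ("u1", "rdf stuff", "http://www.w3.org/RDF/"),
    ("u1", "rdf stuff", "http://purl.org/dc/"),
    ("u2", "semweb",    "http://www.w3.org/RDF/"),
    ("u2", "semweb",    "http://purl.org/dc/"),
    ("u3", "misc",      "http://www.w3.org/RDF/"),
    ("u3", "misc",      "http://example.com/cats"),
]

def related(url, min_users=2):
    """Pages sharing a bookmark folder with `url` for at least `min_users`."""
    counts = {}
    for user, folder, page in bookmarks:
        if page == url:
            continue
        # Does this user keep `url` in the same folder as `page`?
        if (user, folder, url) in bookmarks:
            counts[page] = counts.get(page, 0) + 1
    return sorted(p for p, n in counts.items() if n >= min_users)
```

A real system would obviously need smarter thresholds and anonymization, but the core "humans as spiders" computation really is this small once the data is in a common format.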

Of course we need more powerful search tools, but we also need to realize that there are huge amounts of data out there which just need some slight reformatting to be made easily understandable by programs.

You're suggesting that we only use semantic data insofar as we parse it using complex, intensive, and error prone algorithms. Why not also make that data available in an easily parseable format in the first place? Why require all that work? Most of the metadata is already there, just not presented in a way machines can understand. Why continue the obfuscation?
 
Posted by fugu13 (Member # 2859) on :
 
Banna -- there are probably pedigree tracking programs out there already. Things like that would be possible using online repositories of the results from such programs in an RDF format. People wouldn't have to write the metadata by hand at all.
 
Posted by fugu13 (Member # 2859) on :
 
Pod, you've written computer programs, right?

Okay, we'll consider an example program using RDF.

Say we want to find all the authors of all the books that were made into films that came out in 2003.

Well, in an RDF program, in pseudocode, we might do this:
code:
imdb = connect(imdb-rdf-query-url)

titlesquery = triples(--result--:titlesof:books, books:madeinto:movies, movies:cameoutin:2003)

titles = imdb.query(titlesquery)

amazon = connect(amazon-rdf-query-url)

authorsquery = triples(--result--:authorsof:books, books:havetitles:*titles*)

authors = amazon.query(authorsquery)

print tripleresults(--result--:namesof:*authors*)

That's pretty much it. We're talking maybe 20 lines of code once you include boilerplate (closing connections, using the correct URIs -- several of the words in the triples above are actually URIs, but that wouldn't add any LOC -- that sort of thing).

Conciseness and meaning like this are just not possible in the current way of doing things -- constructing, executing, and parsing a query to IMDB, then using some results from that for a query to Amazon, is complicated. RDF uncomplicates it.

Furthermore, notice that there's nothing site-specific about the RDF triples. It's all about the meaning of the data. Any site that can understand RDF and the particular low-level vocabularies (books and movies) can understand those triples.

[ March 19, 2004, 03:56 PM: Message edited by: fugu13 ]
 
Posted by K.A.M.A. (Member # 6045) on :
 
*wipes everything out*
 
Posted by Pod (Member # 941) on :
 
Dude,

What I'm saying is that the Amazon format IS machine-readable. The format ALWAYS comes after the title information. It is in a predictable location, as designated by the tags.

Look at HTML: we have the syntax for HTML, which ideally is an XML format. Then the semantics of how the syntax is to be interpreted are located in the browser, or, if the designer wants to specify more closely what things mean, they use Cascading Style Sheets.

There, there's your semantics.

Why do we need a universal hammer, to beat all data formats into a fixed semantic framework? So long as you have a syntax that gets at the properties that someone is concerned with, I don't understand why we need some sort of new semantic framework. The current framework is as expressive as you want it to be. If you want your docs to be more expressive in terms of meta-data, go for it. There's nothing holding you back. In fact, I'd wager there's more than enough meta-data out there to do what you want -already-. That's my point.

Next, we have the option of making complex parsers, or making our data extremely complex. I'd say make the tools complicated and leave the data alone; that way data creation isn't hard, and anyone can do it. Complex tools only need to be made once.

And I never said anything about error-prone searching or parsing algorithms. The point is to make tools that actually work. [Wink] My point is that all data that is easy to transform into RDF format is already machine-processable, so why do we need RDF? If we have coherent local semantics (i.e., how CSS works), a global semantics is redundant so long as the local semantics are handled in an intelligent way.

This is the problem with SGML. SGML asks you to specify too much crap to be comprehensible. The point of using XML is that you only need to specify those things that you're going to use. It's task-oriented, without other cruft. I understand the problem of interoperability between local formats (try playing with phonesets for acoustic data -- that's a hairy problem), but the problem of interoperable formats doesn't go away when you attempt to integrate everything into any global format. Different formats are useful for different things (as you've pointed out), and may in fact be mutually exclusive. But all switching to a new format does is bring all of these things to a head, as opposed to offering a solution for how they are to be resolved.

So what I'm saying is: easily convertible data formats are already interoperable without a new global data format, or require minimal mutilation, and for those things that are not easily convertible, we are still left with no recourse, unless you want to kludge them together the best you can.

Again, I'm not convinced.

Furthermore, how is collecting a bunch of bookmarks any different from what Google does? And what's more, I trust Google's method of associating links people put on webpages more than I do statistics on people's personal filing systems for bookmarks.
 
Posted by Bokonon (Member # 480) on :
 
Tom, not really true. Unix systems and Windows have minimal metadata attributes (file extensions and ?? Or do you consider the default application stuff metadata? That's really system stuff; it's not an actual aspect of the file.). Apple had rudimentary metadata.

BeOS had true metadata. You had, say, a contact file for a person? One piece of metadata for it could be, say, a picture of the person. Emails could be just the message, with all the To:, From:, etc., data as metadata attributed to the file (which also made for much easier viewing of the message from any app). You could also do searches on the metadata.

Lately some OSes and filesystems have gotten to where BeOS was (and BeOS has been dead for 2-3 years now), and Windows won't be close (or surpass it) until 2005 (Longhorn) at the earliest.

Of course, my message was just a way to point out that Longhorn's stuff isn't conceptually groundbreaking.

-Bok
 
Posted by Pod (Member # 941) on :
 
Alright, fugu, so you're more concerned with a query format than with some sort of overarching semantic format.
 
Posted by fugu13 (Member # 2859) on :
 
The formats are only easily convertible at the source. If you're not at the source, aggregating that data from all those sources would take huge parsing libraries. Instead of one small RDF parsing library.
 
Posted by BannaOj (Member # 3206) on :
 
fugu, my question is: how do I utilize this powerful pedigree-type system when no one who has any relevance for my search is using it? It seems pretty pointless.

Just for grins, I'll give you the name of my dog's father: Ch. Phi Vestavia Nautilus PT

and his mother: Ch. Kingsbury's Copyright

How do I use your semantic system to get to the rest of the pedigree?

Right now I can google it if I had to.

AJ
 
Posted by Bokonon (Member # 480) on :
 
Anna, that's sort of the issue that things like XML are, in part, trying to address. Right now, it might not mean much, but using an open, flexible system like XML increases the likelihood that, going forward, your data can be easily translated to whatever systems are created that can do this.

Whether it's a Vet app that uses lineage to track birth defects/genetic diseases, a canine family tree application, or maybe a dog show results application that can easily display who else in a dog's family tree was a winner, and when, and who was the breeder or trainer, etc...

You can find that now using multiple repositories that are often redundant and not easily converted from one format to the next. Having a single format allows the applications more flexibility in WHAT they want to do with the data, rather than HOW they need to prepare the data for use.

-Bok
 
Posted by BannaOj (Member # 3206) on :
 
But you can't do it right now?

And if you can't, how are you going to get Joe Blow, small-time kennel owner, with a site like this or this, to do stuff like that?

AJ

[ March 19, 2004, 04:42 PM: Message edited by: BannaOj ]
 
Posted by fugu13 (Member # 2859) on :
 
Pod -- no, it's just that it can serve as a query format. However, it is also a semantic format.

Consider RSS. RSS is a part of the semantic web, and is in all versions related to RDF (in some versions it is RDF).

Why are so many websites providing RSS feeds? After all, all the information in the RSS feeds is already available on the websites.

They are providing RSS feeds because RSS feeds are a better encoding for certain purposes, such as being read by an RSS reader. For an RSS reader to have to be able to parse any site in existence and determine the semantic articles, their subjects, their descriptions, what their primary link on the web is, their author, their date of publication, et cetera, is ridiculous.

That's what you're saying semantic web applications should have to do when you're saying the information is already out there and applications should just have to understand it as it is.
 
Posted by TomDavidson (Member # 124) on :
 
fugu, would YOU use a browser that publicly posted all your bookmarks for the world to see?
 
Posted by fugu13 (Member # 2859) on :
 
I'd certainly use such a browser if I had options that allowed me to: turn off that behavior, submit anonymously if I wanted, choose where the information went, and customize what information was sent (folder-based exclusion, for instance).

All of which would be easy to implement.
 
Posted by BannaOj (Member # 3206) on :
 
I guess I'm just not getting it. I was trying to understand, really. But for it to be useful for me seems virtually impossible, since the people who actually make the web pages I check aren't doing the kind of programming necessary to make it work.

So I don't see the point.

AJ
 
Posted by fugu13 (Member # 2859) on :
 
It's in its infancy. There are only three major semantic web technologies right now: Dublin Core Metadata (DC), RSS, and Friend Of A Friend (FOAF) (well, other than RDF itself, to which all are closely related).

Consider HTML. Not everybody started making web sites overnight. In fact, many people said the same thing about HTML -- that it was too arcane for any normal person to actually use.

You're talking about an application in a very specialized domain -- dog pedigrees. It's not useful to you right now because, frankly, not much attention has been given to applying it to dog pedigrees. It's in its infancy. Its presence is not a magic bullet. A lot of work is still required to make it work. However, the way RDF works means most of that work need only be done by the vocabulary designers, rather than those using the vocabularies.

It would not be hard at all to imagine, for instance, once an RDF vocabulary has been created for dog pedigrees, a simple tool for creating pedigrees -- you either enter a name or select an already-entered name for a dog's parents, then enter a name for the dog. Rinse, repeat. You could import RDF files using the pedigree vocabulary from other people, and their dogs would appear on the list of possible dogs to choose as parents. Et cetera. There would of course be birthdates and other data involved; I'm not sure exactly what sorts of things are kept track of in dog pedigrees, but you get the idea. Then at the end the person would export a file for their dogs called pedigrees.rdf and put the following in their html, in the head tag:
code:
<link title="Dog Pedigrees" type="application/rdf+xml" rel="alternate" href="pedigrees.rdf" />

Or possibly link to it on the page itself.

That's definitely within the reach of the people who made those web pages.
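And the pedigree data itself really is just sire/dam triples. Here's a sketch in Python using AJ's example dogs, with hypothetical subject identifiers and predicate names (a real vocabulary would define URI-based terms for these):

```python
# Hypothetical pedigree triples: two of these per dog give its parents.
pedigree = [
    ("dog:ajs-dog",   "name", "AJ's dog"),
    ("dog:ajs-dog",   "sire", "dog:nautilus"),
    ("dog:ajs-dog",   "dam",  "dog:copyright"),
    ("dog:nautilus",  "name", "Ch. Phi Vestavia Nautilus PT"),
    ("dog:copyright", "name", "Ch. Kingsbury's Copyright"),
]

def prop(subject, predicate):
    """First object for (subject, predicate), or None if absent."""
    return next((o for s, p, o in pedigree
                 if s == subject and p == predicate), None)

def ancestors(subject):
    """Walk sire/dam links recursively, yielding each ancestor's name."""
    for parent_rel in ("sire", "dam"):
        parent = prop(subject, parent_rel)
        if parent is not None:
            yield prop(parent, "name")
            yield from ancestors(parent)
```

Once other breeders publish pedigrees.rdf files in the same vocabulary, merging their triples into this list is all it takes for `ancestors` to walk back through generations across many kennels' sites.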

[ March 19, 2004, 05:22 PM: Message edited by: fugu13 ]
 
Posted by BannaOj (Member # 3206) on :
 
I understand that dog pedigrees are a bit arcane.

But right now I can just google and at least find what I need with a bit of hunting.

I don't know -- can't you make this semantic web AI somehow, so it can learn the information from existing web pages? It seems like that would be a lot less tedious than requiring everybody in the world to start programming in this, when most people can barely manage html.

AJ
 
Posted by fugu13 (Member # 2859) on :
 
The idea is not to have everyone in the world start programming RDF, but to have the tools they already use do it for them, or at worst to have easy-to-use tools, such as this one:

http://www.ldodds.com/foaf/foaf-a-matic.html

make it really, really easy.

If you have a blog, you're probably already using RDF without knowing it. If you have a livejournal, you're already using RDF. Et cetera. Applications are enabled to create RDF; people don't type it out by hand.
 
Posted by BannaOj (Member # 3206) on :
 
ok that makes more sense... you just sort of sneak it into the code behind the scenes and the people that are using it never really know it exists even if they reap the benefits.

On the other hand that would make people like Tom more paranoid than they already are.

AJ
 
Posted by fugu13 (Member # 2859) on :
 
Google searches work well for returning single datapoints, but not for finding multiple datapoints.

Consider if your grocery store has an online database. Perhaps you want to know what kinds and the prices of chocolate chips they carry. Easy enough, you'd go to the site and find the chocolate chips.

But what if you want to compare prices from all the nearby grocery stores? Well, first you have to look up all the nearby grocery stores, then you have to visit all their sites, then you have to compare the prices manually.

If there was an online business location database that spoke RDF, which returned globally unique URIs for each grocery store chain (just the homepage of the store would work), and each grocery store's website spoke RDF, an RDF enabled application for browsing for different kinds of stores and services would be able to understand and execute a query like: find the types and prices of chocolate chips available at grocery stores within 10 miles, and group by type.

The really cool thing is, that application wouldn't have to know about the particulars of each grocery store's system at all! That's what's holding such applications up from being ubiquitous -- there are occasional applications that only need to draw from one or two websites, such as for movie information, but the difficulty is in parsing the content of the sites. With RDF, parsing the content of a site becomes trivial, and programmers can focus instead on ways to let people make complex and useful queries, and on presenting the results in useful ways (on a map, et cetera).
 
Posted by fugu13 (Member # 2859) on :
 
Here's one way of putting it. RDF and the Semantic Web are here to stay:

http://www.ilrt.bris.ac.uk/discovery/rdf/resources/
 


Copyright © 2008 Hatrack River Enterprises Inc. All rights reserved.
Reproduction in whole or in part without permission is prohibited.


Powered by Infopop Corporation
UBB.classic™ 6.7.2