Hatrack River Forum
Author Topic: The Semantic Web
fugu13 (Member # 2859) posted:
Much of this was originally posted in Jon Boy's completion thread; I thought I'd start a topic on it.

The Semantic Web is an effort (by one of the original creators of the Web) to encode meaning (metadata, if you will) about resources (on the web and elsewhere) in a program-'understandable' format. Or rather, to get people to include such encoded meaning in their web pages, much as they include titles and such in it today.

The Semantic Web is largely based upon the Resource Description Framework (RDF) concept, which encodes meaning in 'triples' having a subject, a predicate, and an object. The subject and predicate are both resource specifiers, while the object is either a resource specifier or raw data (if one had one's name stored somewhere, one could use either the name itself or the resource specifier referring to where it is stored when using it as an object, for instance).

You may or may not be familiar with another technology being used in the Semantic Web (it's quite popular among librarians, for instance): the Dublin Core Metadata specification, which can be expressed in RDF.

The importance of the Semantic Web is that it makes it possible for programs to decipher meaning in resources such as web pages, documents, maps, et cetera. For instance, if the relevant documents are marked up using Semantic Web technologies, one could search for the restaurants mentioned in a particular book which are in a certain city and serve fettuccine alfredo. Not only would such queries be possible, they would be easy.

Here's a 'simple' document using several RDF/Semantic Web technologies to describe a bit about me and my relation with hatrack/jatraqueros.

<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:foaf="http://xmlns.com/foaf/0.1/">

<foaf:Person>
<foaf:name>Russell Duhon</foaf:name>
<foaf:title>Mr</foaf:title>
<foaf:firstName>Russell</foaf:firstName>
<foaf:surname>Duhon</foaf:surname>
<foaf:nick>fugu13</foaf:nick>
<foaf:mbox_sha1sum>b5af28278eea1df56e008f649438701eb05318cb</foaf:mbox_sha1sum>
<foaf:homepage rdf:resource="http://fugu13.com"/>
<foaf:depiction rdf:resource="http://homepage.mac.com/fugu13/me.jpg"/>
<foaf:schoolHomepage rdf:resource="http://www.wustl.edu"/>
<foaf:aimChatID>fugu13</foaf:aimChatID>

<foaf:knows>
<foaf:Person>
<foaf:name>Jamie Taylor</foaf:name>
<foaf:mbox_sha1sum>04355f35cb7e897146fd0e005a992f4e3cc75610</foaf:mbox_sha1sum>
<foaf:aimChatID>mackillian</foaf:aimChatID>
</foaf:Person>
</foaf:knows>

<foaf:holdsAccount>
<foaf:OnlineAccount rdf:about="http://www.hatrack.com/ubb/cgi/ultimatebb.cgi?ubb=get_profile;u=00002859">
<foaf:accountServiceHomepage rdf:resource="http://www.hatrack.com/ubb/forum/ultimatebb.php?ubb=forum;f=2"/>
<foaf:accountName>fugu13</foaf:accountName>
</foaf:OnlineAccount>
</foaf:holdsAccount>

</foaf:Person>

<foaf:Group>
<foaf:name>Jatraqueros</foaf:name>
<foaf:homepage rdf:resource="http://www.hatrack.com/ubb/forum/ultimatebb.php?ubb=forum"/>

<foaf:member>
<foaf:Person>
<dc:identifier>fugu13</dc:identifier>
<foaf:name>Russell Duhon</foaf:name>
<foaf:mbox_sha1sum>b5af28278eea1df56e008f649438701eb05318cb</foaf:mbox_sha1sum>
</foaf:Person>
</foaf:member>

<foaf:member>
<foaf:Person>
<dc:identifier>mackillian</dc:identifier>
<foaf:name>Jamie Taylor</foaf:name>
<foaf:mbox_sha1sum>04355f35cb7e897146fd0e005a992f4e3cc75610</foaf:mbox_sha1sum>
</foaf:Person>
</foaf:member>

</foaf:Group>

</rdf:RDF>

A minor explanation -- the SHA sums are used instead of emails. It's possible to compute an SHA sum from an email, but not vice versa, and for practical purposes each email has exactly one SHA sum. This means they're still useful as a globally unique identifier, can be compared to emails (make an SHA of the email and compare the SHAs), and don't reveal people's emails.
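These sums are easy to reproduce yourself. As I understand the FOAF spec, mbox_sha1sum is the SHA-1 of the full mailto: URI, not the bare address. A quick sketch in Python (the address here is a made-up example, not anyone's real one):

```python
import hashlib

def mbox_sha1sum(email):
    # FOAF hashes the full mailto: URI, not the bare address
    return hashlib.sha1(("mailto:" + email).encode("ascii")).hexdigest()

# A made-up address for illustration:
published = mbox_sha1sum("someone@example.org")
# Anyone who knows the email can recompute and compare the sums:
print(mbox_sha1sum("someone@example.org") == published)  # True
```

The comparison only goes one way: given a published sum, you can check a guessed address against it, but you can't recover the address from the sum.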
Remember, this entire thing breaks down into 'triples' of subject, predicate, and object. I'd post the triple breakdown, but the w3's rdf parser is currently down.

However, you can think of some of the triples as being like this:

personresource has name Russell Duhon
(in this triple, personresource is the subject, name is the predicate, and Russell Duhon is the object)

personresource1 knows personresource2

personresource2 has name Jamie Taylor

See how these three triples of data make it possible to relate me to Jamie? What's more important, they're program-understandable pieces of data. Using known-unique identifiers for people (email, AIM ID, that sort of thing), an RDF program can stitch together the relationships of people in huge maps of data.
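To make the "program understandable" part concrete, here's a toy sketch in Python: triples as plain tuples, with the resource names and the query helper invented purely for illustration.

```python
# Triples as plain (subject, predicate, object) tuples; names are invented
triples = [
    ("person1", "name", "Russell Duhon"),
    ("person1", "knows", "person2"),
    ("person2", "name", "Jamie Taylor"),
]

def objects(subject, predicate):
    # All objects o such that the triple (subject, predicate, o) is asserted
    return [o for s, p, o in triples if s == subject and p == predicate]

# Follow 'knows' from person1, then look up each acquaintance's name
known = [objects(p, "name")[0] for p in objects("person1", "knows")]
print(known)  # ['Jamie Taylor']
```

A real RDF store does the same kind of join, just over millions of triples gathered from many sources.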

The applications of RDF I show here are mainly people oriented, but RDF is a generalized technology -- it can hold semantic data about anything. As new RDF technologies are developed, more and more relationships can be formalized, and most importantly, translated and connected to each other.

For instance, there could be an RDF technology for recipes, and one for units. They would have triples like:
sixteen ounces is the same as one pound.
reciperesource1 is named Chocolate Chunk cookies
reciperesource1 has ingredient ingredient1
ingredient1 measures 8 ounces
ingredient1 type is Chocolate Chunks

Then imagine your grocery store uses the cooking-ingredient and monetary RDF applications to describe their offerings. They might have triples like:

Nestles Chocolate Chunks type is Chocolate Chunks
Nestles Chocolate Chunks measures 1/4 pound.

Say you wanted to make chocolate chunk cookies, and you knew you didn't have any chocolate chunks, so you told your computer to order enough chocolate chunks to make chocolate chunk cookies. The computer would go: Chocolate Chunk cookies ingredient with type chocolate chunks requires 8 ounces. Grocery store ingredient with type chocolate chunks measures 1/4 pound. 1/4 pound is 4 ounces. 8 ounces is 2 * 1/4 pound.

And so on. Using your credit card data from your private data store, and the price which would also be associated with Nestles Chocolate Chunks, it would order the Nestles Chocolate Chunks (2 packages) so that you would have enough for chocolate chunk cookies.
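The arithmetic the computer does is trivial once the facts are machine-readable. A sketch (the quantities come from the example above; the variable names are mine):

```python
import math

# Facts from the triples above, as plain numbers (variable names are mine)
OUNCES_PER_POUND = 16          # sixteen ounces is the same as one pound
needed_oz = 8                  # the recipe's chocolate chunk ingredient
package_lb = 0.25              # the store's package size, 1/4 pound

package_oz = package_lb * OUNCES_PER_POUND    # 4 ounces per package
packages = math.ceil(needed_oz / package_oz)  # round up to whole packages
print(packages)  # 2
```

The hard part isn't the math, it's that without shared vocabularies the computer has no way to know the recipe's "ounces" and the store's "pound" refer to comparable units at all.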

Here's an idea of the basics of the sorts of stuff that can already be done with Semantic Information:

http://beta.plink.org/profile.php?id=910a4f5ef7f0e062ae1fcab1ca981d14ef40563e

It's important to remember that this was not all in one file or anything; this information was compiled by tying together multiple people's files using common identifiers. You can browse all sorts of information on the people by using the links.

Furthermore, this information is site agnostic. Any site can import such RDF files and relate the data. So a business could have FOAF (friend of a friend, the particular RDF technology used on plink) files for all its employees, and dynamically create organizational trees and other organizational data using different criteria.

Posts: 15770 | Registered: Dec 2001
saxon75 (Member # 4589) posted:
This is interesting stuff. Thanks fugu!

For some reason completely unfathomable to me, I'm reminded of the Geek Code. I am unfortunately not able to remember my Geek Code off the top of my head.

Edit: Ahh... here we go:

------BEGIN GEEK CODE BLOCK------
Ver: 3.1
GE d+(-) s+:+>+: a-- C++(+++) UL P L++ !E---
W++(+++)>$ N o? K- w+(++) !O M V- PS+() PE+
Y+ PGP- t++ 5+@ X- R(++) tv+> b+>+++ DI++ D+
G e++>++++ h--- r+++ z+++(--)
------END GEEK CODE BLOCK------

[ March 18, 2004, 11:54 PM: Message edited by: saxon75 ]

Posts: 4534 | Registered: Jan 2003
fugu13 posted:
not so oddly, there's a FOAF element for a person's geek code [Smile]
saxon75 posted:
A question: Is this substantively different from XML, or is it more of an implementation? If it is different, what are the major differences?
xnera (Member # 187) posted:
I'm familiar with FOAF, because LiveJournal just implemented FOAF support. Yes, I am a LiveJournal geek. [Smile]
Posts: 1805 | Registered: Jun 1999
fugu13 posted:
XML is a syntax specification. RDF is a logical language for making statements. XML shows the hierarchical organization of data. RDF expresses the meaning of data.

It is possible (and common) to express RDF using XML (as I did in the document above), creating a format called RDF/XML.

This makes particular sense because XML is both easily parsed and easily embedded in other xml documents, of which there are many. While there are other ways of expressing RDF which are as easy (or easier) to parse, they are not so easily embedded in XML, particularly on the same "level of parsing", so to speak.

[ March 19, 2004, 12:15 AM: Message edited by: fugu13 ]

saxon75 posted:
I admit to being kind of a bonehead when it comes to XML. The main thing I have trouble wrapping my head around is: how is this useful? I find it terribly interesting, but I just can't figure it out.

For example, I used to use XML to store data for several of the pages at my website (this was before I got the domain name). For instance, I stored the "Weekly" Wisdom in a database of sorts using XML, then used JavaScript to parse the XML and present the data in an attractive format. But now that I have other tools at my disposal, I find it much more convenient to use a MySQL database to store my website's content.

I get that XML presents a standard way of encoding information, but I just don't get what that enables.

fugu13 posted:
First, XML.

XML is not a data store, though it can be useful as such for XML documents. XML is a markup language. Specifically, it is an (the) eXtensible Markup Language.

It is useful for marking up documents and data, extensibly (using technologies such as namespaces). For instance, I might enclose a title in a pair of tags like so: <title>My title</title>, and later on in the document I might enclose some quote in tags like so: <quote>my quote</quote>.

This really doesn't work very well with the MySQL data model, particularly if you consider that all these tags are part of a larger document that's mostly text. (Also, XML handles hierarchical data structures, particularly ones containing arbitrary additional data, very well. MySQL sucks at those.)

Now, these tag names are arbitrary, meaning only what I use them to mean (that is, if I treat any tag called title like I want a title to be treated, it "is" a title, just as much as a tag named boobies would "be" a title if I treated it as a title in my programming).

However, it is also possible to specify tags which have generally expected meanings. For instance, there's the Dublin Core Metadata specification. We might assign the specification to a namespace called 'dc'. So then we could refer to 'title in the sense the Dublin Core Metadata specification means it' by saying <dc:title>My Title</dc:title>. Now anyone can open XML files and find the titles in the sense the Dublin Core Metadata specification uses them by finding the dc:title tags (actually, the title tags in the Dublin Core namespace, but that's just being more exact).
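You can see the namespace machinery at work with nothing but Python's standard library. A minimal sketch: the 'dc' prefix is throwaway, and lookup goes by the namespace URI.

```python
import xml.etree.ElementTree as ET

doc = """<doc xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>My Title</dc:title>
</doc>"""

root = ET.fromstring(doc)
# The 'dc' prefix is local to this document; lookup goes by namespace URI
title = root.find("{http://purl.org/dc/elements/1.1/}title")
print(title.text)  # My Title
```

A document that bound the same URI to the prefix 'meta' instead of 'dc' would give the exact same result.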

Now we're starting to get into RDF. RDF is about using generally accepted tags following a certain relationship pattern to encode meaning. Read up above where I'm talking about the grocery store. Imagine trying to make the grocery store's system interoperate with your computer's recipe needs without knowing anything other than it uses MySQL and, say, PHP. All one has to know is that the grocery store uses RDF, and which RDF technologies it uses (in this case, Units and Ingredients), and one can interoperate. That's just not possible with things like MySQL.

RDF, XML, and such are about interoperability. XML is about encoding data in a commonly understood, yet marked up way (unlike plain text, which carries essentially no structural information). RDF is about encoding meaning using commonly understood, low level vocabularies which may be used to build much larger constructs.

fugu13 posted:
For instance, consider you have a system which can combine user login data from other sources. It stores login information in XML, using 'login' to mean the login specifier namespace (which I just made up).

So you might have a login file that looks like so:
<logins xmlns="http://mysite/default/namespace" xmlns:login="http://example.org/login">
<login:entity>
<login:username>Jim</login:username>
<login:password>Samuel</login:password>
</login:entity>
.
.
.
</logins>
Now, whenever I want to (for instance) find the password of someone with a particular username, I can use the xpath expression '/logins/login:entity[login:username="Jim"]/login:password'
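Here's roughly that lookup using Python's standard ElementTree (a sketch; ElementTree supports namespace-qualified searches, though not full XPath, so the predicate becomes an explicit loop):

```python
import xml.etree.ElementTree as ET

doc = """<logins xmlns:login="http://example.org/login">
  <login:entity>
    <login:username>Jim</login:username>
    <login:password>Samuel</login:password>
  </login:entity>
  <login:entity>
    <login:username>Ann</login:username>
    <login:password>Rover</login:password>
  </login:entity>
</logins>"""

ns = {"login": "http://example.org/login"}
root = ET.fromstring(doc)

# Find the password of the entity whose username is "Jim"
pw = None
for entity in root.findall("login:entity", ns):
    if entity.findtext("login:username", namespaces=ns) == "Jim":
        pw = entity.findtext("login:password", namespaces=ns)

print(pw)  # Samuel
```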

It makes sense to store a system like this in MySQL; however, remember we combine login information from many different sites. How are they going to send us their login information? We could just specify a format, but what if the format is problematic, or there are a lot of sites that collect login information, all in different formats?

This is where our XML becomes useful. A person might send us the following file to be merged with ours:

<users xmlns="http://his.site.default.namespace" xmlns:user="http://example.org/login" xmlns:dc="http://purl.org/dc/elements/1.1/">
<user:entity>
<user:username>Lucy</user:username>
<user:password>Gimpy</user:password>
<dc:title>President</dc:title>
</user:entity>
.
.
.
</users>

First, we can pick out all the entity elements in the http://example.org/login namespace, even though he used a different identifier (perhaps he used the login identifier elsewhere). Now, we could pick out the username and password elements from there and just insert those, but why do the work? We can just insert the entity element we picked out as follows:

<logins xmlns="http://mysite/default/namespace" xmlns:login="http://example.org/login" xmlns:dc="http://purl.org/dc/elements/1.1/">
<login:entity>
<login:username>Jim</login:username>
<login:password>Samuel</login:password>
</login:entity>
<login:entity>
<login:username>Lucy</login:username>
<login:password>Gimpy</login:password>
<dc:title>President</dc:title>
</login:entity>
.
.
</logins>

Our XML library handles converting the namespace identifier to what we're using locally, and adds the dublin core namespace now that we have an element in it.

Even with this additional, foreign data (the dc:title element), we are still able to use the exact same xpath expression as before; we did not have to perform time-consuming and test-requiring modifications to a MySQL table schema, et cetera. In fact, I could write a program to perform this example's combination in around ten lines of code. All I would have to know of the documents being accepted is that they had entity elements in the http://example.org/login namespace. I would collect all those documents by selecting with the xpath expression '//login:entity' -- even though the documents might assign the name user, or francis, to the same namespace, because I would have already told my parser that by login I meant the http://example.org/login namespace! Then I would just insert them as children of my logins element. The other six or seven lines of code would be bookkeeping, mainly.
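Here's roughly that merge with Python's standard ElementTree, using the same made-up namespace; note the differing prefixes never matter, only the URI:

```python
import xml.etree.ElementTree as ET

LOGIN_NS = "{http://example.org/login}"

# Our file and theirs use different prefixes for the same namespace URI
ours = ET.fromstring(
    '<logins xmlns:login="http://example.org/login">'
    "<login:entity><login:username>Jim</login:username>"
    "<login:password>Samuel</login:password></login:entity>"
    "</logins>"
)
theirs = ET.fromstring(
    '<users xmlns:user="http://example.org/login">'
    "<user:entity><user:username>Lucy</user:username>"
    "<user:password>Gimpy</user:password></user:entity>"
    "</users>"
)

# Prefixes vanish after parsing; elements are keyed by namespace URI,
# so their entities are found regardless of what prefix they chose
for entity in theirs.findall(LOGIN_NS + "entity"):
    ours.append(entity)

print(len(ours.findall(LOGIN_NS + "entity")))  # 2
```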

rivka (Member # 4859) posted:
It looks kinda like English . . .

*head asplodes!*

Posts: 32919 | Registered: Mar 2003
TomDavidson (Member # 124) posted:
To translate: XML is really only important if you're interested in sharing data with people who might not be using the same system to store data that you are. If you're doing something for your own personal site, do whatever you want -- but keep in mind that XML zealots will regard you as an eccentric island of ultimately encoded information. [Smile]
Posts: 37449 | Registered: May 1999
fugu13 posted:
[Razz]

There are other uses of XML, too. For instance, marking up documents for later data mining. Or marking up a document in a basic, content-centered form to be transformed to different formats (html, pdf, wml, rss, et cetera) on demand.

Or just being able to take advantage of all the tools available for xml (such as xpath and xslt).

To give an example, here's how I create http://fugu13.com/consulting (warning, not yet done) using an xml publishing system called cocoon. I'll use the news page.

First, a fragment of the 'site map'

<map:match pattern="cms/*.html">
<map:aggregate element="cms">
<map:part src="cms/header.xml"/>
<map:part src="cms/menu.xml"/>
<map:part src="cms/{1}.xml" label="cms-print"/>
<map:part src="cms/sidebar.xml"/>
<map:part src="cms/footer.xml"/>
</map:aggregate>
<map:transform src="cms/fullpagetransform.xsl">
<map:parameter name="rss" value="rss/{1}.rss"/>
<map:parameter name="print" value="print/{1}.html"/>
</map:transform>
<map:serialize/>
</map:match>

Whenever someone requests anything of the format cms/*.html (I'll leave off the cms from now on; it's on everything, it's just a common directory), cocoon first aggregates together five files: header.xml, menu.xml, {1}.xml, sidebar.xml, and footer.xml, where the {1} stands for whatever the * was.

From there, cocoon transforms the resulting document using an xslt stylesheet called fullpagetransform.xsl. It passes two values to the stylesheet, rss/{1}.rss and print/{1}.html . As these are request data, the presentation layer shouldn't have to deal with them except as data -- which makes it a lot easier to move them around on the page and such. Anywhere I want one of those values to appear, I just put <xsl:value-of select="$rss"/> (or $print).

Here's the news.xml document that is used when someone requests the news.html page.

<content>
<page-title>News</page-title>
<article anchor="newsite">
<title>Sunday, March 14, 2004: Fugu Consulting Site Launched</title>
<para>
The Fugu Consulting web site was launched today. It is generated using the Apache Cocoon XML
web development framework, and then deployed as static pages to the server. It uses several
Semantic Web applications, including RSS and FOAF.
</para>
</article>
</content>

Very simple: I have my root tag, content, then children called page-title and article, which has children called title and para. This is a very, very simple document, though my format for content documents allows them to get somewhat more complex.

Now here's a few fragments from fullpagetransform.xsl:

<head>
<title>
<xsl:value-of select="content/page-title"/><xsl:text> :: </xsl:text><xsl:value-of select="header/site-title"/>
</title>

First, tags that I just want to appear I just write, such as <head> and <title>. What this does is take the value of the page-title child of the content section and the value of the site-title child from the header section, and from them make the title. To see the result, simply look at the title of http://fugu13.com/consulting/news.html

<table>
<tr>
<td valign="top">
<div id="menu">
<xsl:apply-templates select="menu"/>
</div>
</td>
<td valign="top">
<div id="content">
<xsl:apply-templates select="content"/>
</div>
</td>
<td valign="top">
<div id="sidebar">
<xsl:apply-templates select="sidebar"/>
</div>
</td>
</tr>
</table>

This is a section for putting all the other content in place. <xsl:apply-templates select="content"/> is like a function call. It applies the template that matches content to content. Which leads us to our next section:

<xsl:template match="content">
<ul id="paramenu">
<xsl:for-each select="article[@anchor]">
<li>
<a>
<xsl:attribute name="href">
<xsl:text>#</xsl:text>
<xsl:value-of select="@anchor"/>
</xsl:attribute>
<xsl:value-of select="title"/>
</a>
</li>
</xsl:for-each>
<xsl:text> </xsl:text>
</ul>
This first part of the content-matching template does something very simple -- it constructs a list of links to the different articles on the page, but only the ones I have given an anchor attribute to. If I don't want an article to appear here, I delete or comment out its anchor attribute. Each link is given the title of the article.

<xsl:for-each select="article">
<a>
<xsl:attribute name="name"><xsl:value-of select="@anchor"/></xsl:attribute>
</a>

This comes basically right after; it's the beginning of a section to display each article -- this is the part that constructs the anchor tag for the list of articles to link to.

One of the big advantages of xslt is that it is functional. I don't need to manage the values of my variables or make sure I call functions in the right order or somesuch. Values are values, and all I have to do is remember the scope I'm in (very easy to do in xslt, the select statements make it very explicit) to remember how to reference them.

I also have a part of my site map that notices when someone requests a document in the form print/*.html . When someone's printing a document, all they need is the content. Here's the entire print stylesheet:

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:param name="normal"/>
<xsl:template match="content">
<html>
<head>
<title>
<xsl:value-of select="page-title"/>
</title>
<link rel="Stylesheet" href="print-page.css" type="text/css" />
</head>
<body>
<div id="content">
<a>
<xsl:attribute name="href"><xsl:text>../</xsl:text><xsl:value-of select="$normal"/></xsl:attribute>
<xsl:text>Return to normal page.</xsl:text>
</a>
<xsl:for-each select="article">
<div>
<h3><xsl:value-of select="title"/></h3>
<xsl:for-each select="para">
<p><xsl:value-of select="."/></p>
</xsl:for-each>
</div>
</xsl:for-each>
</div>
</body>
</html>
</xsl:template>
</xsl:stylesheet>

There's a title, the link to the print css stylesheet, in the body a link back to the normal version of the page (I stored the value in $normal in the site map), and then for each article I print the title, followed by the paragraphs, one after another.

To see the result on the news page, just visit http://fugu13.com/consulting/print/news.html .

fugu13 posted:
The above probably looks really complicated. The thing is, it isn't. It is verbose, but the actual logic is very straightforward.

Quite simply, I find that for many purposes using XML as a common format to be retransformed into other format is very, very useful.

Would I still use an SQL backend for a large, content-driven site? Sure. But I'd use it through an opaque XML database that used the SQL database as its backend. The overhead is worth the access to the huge array of tools and technologies that are XML based.

For smaller, well-defined sites I'm perfectly happy with using SQL directly (well, through a problem-domain-specific SQL interfacing object; using SQL all over the place is messy and error prone). The reason I'm not for my consulting site is I want to be able to keep near the forefront of Semantic Web development with it, and that is made so much easier by making XML a first-class principle of the site. For instance, soon I'm going to add an automatically generated Dublin Core metadata file for each page.

saxon75 posted:
You know what, rivka? I'm right there with you.

-----

Most of what I know about programming in general and web development in specific has been self-taught. The problem, though, with teaching yourself something is that you don't always have the best teacher.

TomDavidson posted:
The sad thing about this -- speaking as a big fan of the democratic web -- is that it will become increasingly hard for laypeople to create their own websites that comply with the tools being used, without using development tools designed for the process.

The increased verbosity is no problem for professionals, but I can see it turning off a whole generation of wannabe webbers.

fugu13 posted:
Oh, and a bit more on storing documents. If you're just sticking documents of various types in a data store, it doesn't make sense to come up with an SQL schema for each type. In fact, if each document is pretty much some text with embedded markup, it's not really possible to come up with a schema for each type. If you just throw the files in as plain text, you've got a problem if you ever want to do any processing of them.

For instance, say you have a lot of fairy tales stored. Perhaps you want to know all the titles. If the documents are stored in XML, it's easy --

<xsl:for-each select="/stories/story">
<xsl:value-of select="title"/>
</xsl:for-each>

You could also use another method of working with XML, such as the DOM, or SAX, instead of XSLT.

Then say you wanted to know the title of all the stories for heroes with names starting with "A" (perhaps you're doing some odd linguistics study). (I am of course assuming a particular choice of markup elements. There are many possible ones that make sense, and any that make sense would be similarly easy to query.)

<xsl:for-each select="/stories/story/character[@type='hero' and starts-with(name, 'A')]">

<xsl:text>Hero: </xsl:text>
<xsl:value-of select="name"/>

<xsl:text>\nStory: </xsl:text>
<xsl:value-of select="../title"/>

<xsl:text>\n\n\n</xsl:text>
</xsl:for-each>

(I included some spacing characters just to make it a tad more realistic).
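The same query works outside XSLT, too. A sketch with Python's standard ElementTree, using invented story data shaped like the markup assumed above:

```python
import xml.etree.ElementTree as ET

# Invented data, shaped like the markup assumed above
doc = """<stories>
  <story>
    <title>The Ash Lad</title>
    <character type="hero"><name>Askeladden</name></character>
  </story>
  <story>
    <title>Jack and the Beanstalk</title>
    <character type="hero"><name>Jack</name></character>
  </story>
</stories>"""

root = ET.fromstring(doc)
titles = [
    story.findtext("title")
    for story in root.findall("story")
    if story.findtext("character[@type='hero']/name", "").startswith("A")
]
print(titles)  # ['The Ash Lad']
```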

By using XML markup you just have to mark up any aspects of the document you think you might want to reference later. You don't have to worry about exactly how you're going to parse it -- so long as the markup makes sense with the document, you'll be able to very easily get at any aspect of the document you want to look at.

What's more, you have a huge number of tools for doing that parsing, in any computer language. And so long as your XML files are easily accessible, you're not tied to anything and can make queries (or your successors can make queries) in whatever way they find most comfortable and easiest.

saxon75 posted:
Of course, I'm sure that there are any number of elitist-type geeks out there who will view this as a good thing.

[Edit: That was to Tom.]

[ March 19, 2004, 11:27 AM: Message edited by: saxon75 ]

saxon75 posted:
Oh, and another thing: Back in the so-called "Golden Age of Computing," most software types were essentially hobbyists. As software as a field became more mature and complexity increased, seat-of-the-pants hacking became less and less common in commercial and industrial software design, but even now there are thousands of hobbyists out there hacking out code. And given the popularity of the open source movement, many of these amateur programmers are involved with large and interesting projects. It doesn't seem like there's much reason to suspect that this will change in the future.
fugu13 posted:
Tom -- that's a pretty big concern for many of the proponents of the Semantic Web as well, which is why the current strategies are to integrate RDF into common tools, make it an optional addition instead of required, and provide automatic creation tools such as foaf-a-matic:

http://www.ldodds.com/foaf/foaf-a-matic.html

As for integrating RDF, many blogging tools automatically generate RDF documents (embedded) for blogs and create RSS feeds (another Semantic Web tech) automatically. LJ recently integrated FOAF so it is automatically generated at specific URLs. Now all one has to do is make an lj bot and it will be able to quickly aggregate huge stores of RDF data.

For instance, my LJ FOAF page is here: http://www.livejournal.com/community/lj_nifty/96165.html

A 'FOAF-aware' tool could use that file to quickly find hundreds of LJ's of people at any degree of separation from me. Consider if you're researching 'link spreading'. Using LJ FOAF files and a bit of page parsing you can follow how popular links spread among groups of friends.

TomDavidson posted:
The problem is that more and more of these people are going to view web design as a "project." This will, unfortunately -- or fortunately, depending on your outlook -- go a long way towards standardizing web development.

If you're typing a story to put up on your web page, do you WANT to use XML markup? Does the idea of putting tags around the first appearance of every character appeal to you? If not, do you want to use a special XML-enabled program to write your story?

If not, you're out of this loop. (I speak here for a moment of myself; I like to do all my stuff in notepad, just as a matter of course. I can do this -- with some irritation -- for CSS and the like, but it would be RIDICULOUS to XMLify my site in notepad.)

Don't get me wrong: I think XML development is a wonderful, beautiful thing. I just think that UBIQUITOUS XML -- which is the goal, of course, of the Semantic Web -- will suppress individual creativity.

[ March 19, 2004, 11:42 AM: Message edited by: TomDavidson ]

Posts: 37449 | Registered: May 1999  |  IP: Logged | Report this post to a Moderator
fugu13
Member
Member # 2859

 - posted      Profile for fugu13   Email fugu13         Edit/Delete Post 
I think you'll find that you're already behind the times on personal web sites, Tom. People are using, for the most part, blogs of various kinds to put their presence on the web. The blogs on blog sites are highly customizable, and the tendency has become to trade styles, modifying them to one's taste.

And most of the blog sites embed RDF markup in their pages.

Also, you misunderstand my story example. I'm not suggesting everyone do it, and neither are most proponents of the Semantic Web (XML is just a detail for them, anyway [Wink] ). However, if you intend to store documents yourself for later retrieval, XML makes a lot of sense. And more and more tools are doing parts of it automatically. For instance, you might be fascinated to learn that the next version of MS Office uses extensive metadata designed from the ground up to be represented as XML.

Should a normal person mark up their stories like that? Not by hand, no. But if they use popular story-writing software X, it might be done automatically for them. And there are people, such as academics and corporations, who would want to mark up their old documents by hand to make them more accessible later. My point was that the promise of XML is not just in interchange.

Posts: 15770 | Registered: Dec 2001  |  IP: Logged | Report this post to a Moderator
fugu13
Member
Member # 2859

 - posted      Profile for fugu13   Email fugu13         Edit/Delete Post 
To clarify a bit on the blogs issue -- when there were no good ways to make a site your own, writing your own homepage in Notepad allowed you to express your creativity. Now CSS, templates, and blog software make it possible to do just about anything your typical amateur could do anyway, with much less hassle.

People who are able to do more, do more. But for most of the people who are moving to the web today, CSS, templates, and blogs allow them to express their creativity much better than they were able to when writing pages by hand, when they were having way too much fun with the <blink> tag.

Posts: 15770 | Registered: Dec 2001  |  IP: Logged | Report this post to a Moderator
fugu13
Member
Member # 2859

 - posted      Profile for fugu13   Email fugu13         Edit/Delete Post 
And there are many popular tools out there which could have RDF integrated into them -- address book software, for instance, could generate FOAF files which people could publish to places like plink. Google could return RDF with its search results, tagging descriptions and titles. Chat software could also turn out FOAF files, or better yet, could use the common FOAF format generated by address book software (and an RSS-based system of managing buddy information) to compile buddy lists. Calendars could use the RDF format of vCal to interchange data (the reason to prefer the RDF version over another vCal format is that any RDF-enabled software can then use the data). Online games could publish FOAF files. Recipe applications could use a Recipe RDF format, and recipe books could provide the files for their recipes in it.

Et cetera.

RDF is a breakthrough because it provides standard formats with easily parsed meaning. Most standard formats out there do not carry meaning well. RDF does.
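The triple model itself is simple enough to sketch in a few lines. This toy 'store' is not a real RDF library, and the prefixed names below stand in for the full URIs real RDF would use:

```python
# Toy triple store: each fact is a (subject, predicate, object) tuple,
# exactly as in the RDF model. All names here are illustrative.
TRIPLES = [
    ("hatrack:fugu13", "foaf:name",  "fugu13"),
    ("hatrack:fugu13", "foaf:knows", "hatrack:TomDavidson"),
    ("post:1234",      "dc:creator", "hatrack:fugu13"),
    ("post:1234",      "dc:date",    "2004-03-19"),
]

def match(triples, s=None, p=None, o=None):
    """Return all triples matching the pattern; None is a wildcard."""
    return [(ts, tp, to) for ts, tp, to in triples
            if s in (None, ts) and p in (None, tp) and o in (None, to)]

# Who does fugu13 know?
print([o for _, _, o in match(TRIPLES, s="hatrack:fugu13", p="foaf:knows")])
# Which resources did fugu13 create?
print([s for s, _, _ in match(TRIPLES, p="dc:creator", o="hatrack:fugu13")])
```

The point of the standard is that any program, anywhere, can run that same wildcard match over anyone's data without knowing the format in advance.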

Posts: 15770 | Registered: Dec 2001  |  IP: Logged | Report this post to a Moderator
Pod
Member
Member # 941

 - posted      Profile for Pod           Edit/Delete Post 
Saxon: I think the problem is that the "semantic" web concept is actually quite mundane. The issue at hand is basically: how can you encode -everything- that a website is about in some sort of format that can be deterministically processed?

The issue I have with doing this is that a lot of this information is -extremely- boring. Fugu might find it handy and interesting to express all of the people he's connected to, out to two or three levels of indirection, but frankly (and no offense to fugu), I'm not terribly interested in who the friends of his friends are.

Ultimately, the message that comes out of this sort of endeavor is simply that metadata can be useful for expressing relationships between marked-up items. And that, to me, isn't terribly surprising.

The reason I don't find this interesting is that, in effect, all this does is take properties we know about objects and encode them explicitly as properties of those objects. This just means we're expanding our notion of what an object is. It has the added complication that these tasks take a -lot- of time and effort (I did this as a job for a while).

The truly interesting task is to take the objects as they are and attempt to figure out the relationships between them. And, at least in linguistics, there are efforts to do this with tools like Latent Semantic Analysis, which figures out which items are similar simply by where they occur and where they -don't- occur (well, that and a bunch of non-linear math).
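For flavor, here's a drastically simplified cousin of that idea -- not LSA itself (no SVD, no dimensionality reduction), just the underlying intuition that terms are similar when they occur in the same places, sketched with raw term-document vectors and cosine similarity. The corpus is made up:

```python
import math

# Tiny invented corpus. The term-document incidence vectors built
# below are the raw input that real LSA would factor with SVD.
docs = [
    "the chef cooked pasta",
    "the chef cooked sauce",
    "the mechanic fixed the car",
    "the mechanic fixed the engine",
]

terms = sorted({w for d in docs for w in d.split()})
vec = {t: [1 if t in d.split() else 0 for d in docs] for t in terms}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Terms that always co-occur come out highly similar; terms that
# never share a document come out at zero.
print(round(cosine(vec["chef"], vec["cooked"]), 6))  # ~1.0
print(cosine(vec["chef"], vec["mechanic"]))          # 0.0
```

No one marked anything up by hand here -- which is exactly Pod's point about deriving relationships rather than encoding them.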

[ March 19, 2004, 12:14 PM: Message edited by: Pod ]

Posts: 4482 | Registered: May 2000  |  IP: Logged | Report this post to a Moderator
Pod
Member
Member # 941

 - posted      Profile for Pod           Edit/Delete Post 
Tom:

Get a text editor with macros [Razz]

Or, I'm still extremely efficient with copy & paste.

Posts: 4482 | Registered: May 2000  |  IP: Logged | Report this post to a Moderator
fugu13
Member
Member # 2859

 - posted      Profile for fugu13   Email fugu13         Edit/Delete Post 
The interesting thing is that RDF does this in a commonly understandable way, and that enables so much more.

Right now, hardly any of the information available to humans is available to a given program in a way that it can "understand" the meanings (parse the meanings is a better way of putting it).

By providing RDF data for as much as possible, we increase the possible scope of programs many-fold. Truly ubiquitous computing depends on ubiquitous meaning. A computer can order stuff for you from easy descriptions -- for instance, "order one pound of chocolate chunks and do not pay more than $5" -- if the grocery store uses RDF. It doesn't have to have an interface specific to the grocery store; all it needs to know is that the grocery store speaks the Payment, Ingredient, and Measurement vocabularies of RDF -- which are low-level, commonly understood vocabularies.

You may not want to know who my friends are, but say you're in a city and need to see a mechanic. Wouldn't it be interesting to know whether any friends of yours, or friends of friends, were mechanics -- or had recommended mechanics -- in the same geographical area?

These sorts of things are already under development.
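A sketch of how such a service might answer that query. The data below is entirely made up, and a real system would walk foaf:knows links from published FOAF files rather than a hard-coded dictionary:

```python
from collections import deque

# Hypothetical data a FOAF aggregator might have assembled.
KNOWS = {
    "me":    ["alice", "bob"],
    "alice": ["carol"],
    "bob":   ["dave"],
    "carol": [],
    "dave":  [],
}
PROFILE = {
    "carol": {"city": "Bloomington", "occupation": "mechanic"},
    "dave":  {"city": "Seattle",     "occupation": "mechanic"},
}

def nearby_mechanics(start, city, max_degree=2):
    """Breadth-first search over 'knows' links, collecting mechanics
    in the given city along with their degree of separation."""
    seen, frontier, found = {start}, deque([(start, 0)]), []
    while frontier:
        person, degree = frontier.popleft()
        info = PROFILE.get(person, {})
        if info.get("occupation") == "mechanic" and info.get("city") == city:
            found.append((person, degree))
        if degree < max_degree:
            for friend in KNOWS.get(person, []):
                if friend not in seen:
                    seen.add(friend)
                    frontier.append((friend, degree + 1))
    return found

print(nearby_mechanics("me", "Bloomington"))  # [('carol', 2)]
```

The degree of separation even gives you a natural way to rank the results: a friend's recommendation before a friend-of-a-friend's.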

Oh, another way for applications to produce RDF I thought of -- bookmark lists in browsers. While being in the root bookmark folder doesn't mean much, pages in the same folder bear some sort of relationship to each other -- and even vague associations like that can be encoded in RDF and leveraged to, for instance, create a huge online directory for finding related web pages. Imagine if your browser had a "related pages" button which, when clicked, asked "bookmark-google" to return all the pages in that vast store that were repeatedly related to the page you were on.

Posts: 15770 | Registered: Dec 2001  |  IP: Logged | Report this post to a Moderator
TomDavidson
Member
Member # 124

 - posted      Profile for TomDavidson   Email TomDavidson         Edit/Delete Post 
"For instance, you might be fascinated to learn that the next version of MS Office uses extensive metadata which is designed to from the ground up to be represented with xml."

Not really, no. [Smile] We were with Office 2003 from the beta. *grin* For corporate purposes, XML is nifty-keen.

The thing is, on a personal level, I can't imagine trying to encode every element of my life, and I'm vaguely suspicious of people who have the time to try. *grin*

[ March 19, 2004, 12:39 PM: Message edited by: TomDavidson ]

Posts: 37449 | Registered: May 1999  |  IP: Logged | Report this post to a Moderator
fugu13
Member
Member # 2859

 - posted      Profile for fugu13   Email fugu13         Edit/Delete Post 
Pod -- it's not that metadata can be used for expressing meaning, it's that if we use metadata to express meaning, the potential power of computing skyrockets.
Posts: 15770 | Registered: Dec 2001  |  IP: Logged | Report this post to a Moderator
fugu13
Member
Member # 2859

 - posted      Profile for fugu13   Email fugu13         Edit/Delete Post 
Actually, I'm talking about the one intended to run on Longhorn [Big Grin] . You think you've seen metadata now . . .
Posts: 15770 | Registered: Dec 2001  |  IP: Logged | Report this post to a Moderator
fugu13
Member
Member # 2859

 - posted      Profile for fugu13   Email fugu13         Edit/Delete Post 
Tom -- it's not about encoding every element of your life, it's about having the parts that are already encoded available in a format that conveys meaning -- for instance, posts on a weblog, friends on LiveJournal, email contacts in your address book, bookmarks in your browser. All these and more can be encoded by their programs -- without you having to worry about it -- and made available in RDF format.
Posts: 15770 | Registered: Dec 2001  |  IP: Logged | Report this post to a Moderator
TomDavidson
Member
Member # 124

 - posted      Profile for TomDavidson   Email TomDavidson         Edit/Delete Post 
I just shudder to think that people might for a moment think I'd CARE about their contacts, bookmarks, or daily blogs. *laugh*

I guess, when it really comes down to it, it's that I DON'T want the Web to be ubiquitous in people's lives -- at least, not to the extent that people are going to try to make the details of their lives ubiquitous to everyone else. [Smile]

There's a REASON I'm not exhibitionistic enough to have a blog. *grin* And I'm highly uncomfortable with the thought that people might WANT to see who's in my address book.

Posts: 37449 | Registered: May 1999  |  IP: Logged | Report this post to a Moderator
Pod
Member
Member # 941

 - posted      Profile for Pod           Edit/Delete Post 
Let me paraphrase one of my professors:

Professor: "How long do you think it would take you to part-of-speech tag a sentence?"
My friend: "I don't know... about a minute."
Professor: "Okay, let's say you worked 40 hours a week. At that pace, it would take you the duration of your graduate school career to tag the entire Penn Treebank [a corpus of part-of-speech-tagged sentences]."

You see the problem here?

You very well can mark up all of the text you use. But the investment required to make explicit all the interesting metadata we might want is just so vast that it's absolute fiction to believe that it's useful.

Devices that process metadata are indeed useful; however, what we need are devices that can see the relationships the metadata would provide, without us having to provide them. (And really, if you have such a device, you could use it to create the metadata.)

Posts: 4482 | Registered: May 2000  |  IP: Logged | Report this post to a Moderator
fugu13
Member
Member # 2859

 - posted      Profile for fugu13   Email fugu13         Edit/Delete Post 
Taking an example closer to home, consider forum posts. The amount of meaning available in forum posts is stupendous.

Forum posts are related to each other, contain easily decoded common-topic information (links), are linked to people through unique identifiers (at least a forum name, and possibly an email address (hashed with SHA if a person wants to keep it private), AIM ID, homepage, and others), and are time-specific (making all sorts of interesting things possible in terms of data searching) -- and I'm certain they have more data that could be encoded with ease.

If forums were to provide that information in RDF format, aggregating sites could do all sorts of tricks with the data.
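A sketch of what 'the forum software does it automatically' might look like. The dc: and foaf: property names are from the real Dublin Core and FOAF vocabularies, but the URLs, the post data, and the exact modeling here are made up, and a real serializer would emit proper RDF rather than hand-built strings:

```python
import hashlib

# A hypothetical forum post, as the forum software sees it internally.
post = {
    "url":    "http://example.com/forum/topic/42#post7",
    "author": "fugu13",
    "email":  "fugu13@example.com",
    "date":   "2004-03-19",
    "topic":  "http://example.com/forum/topic/42",
}

def post_triples(p):
    """Emit N-Triples-style statements for one post. The email is
    published only as a SHA-1 hash of the mailto: URI, the trick
    FOAF uses (foaf:mbox_sha1sum) to keep addresses private."""
    mbox = hashlib.sha1(("mailto:" + p["email"]).encode()).hexdigest()
    s = "<%s>" % p["url"]
    return [
        s + ' dc:creator "%s" .' % p["author"],
        s + ' dc:date "%s" .' % p["date"],
        s + " dcterms:isPartOf <%s> ." % p["topic"],
        s + ' foaf:mbox_sha1sum "%s" .' % mbox,
    ]

for line in post_triples(post):
    print(line)
```

The hashed mailbox is what lets an aggregator link this post to the same person's LJ FOAF file without ever exposing the address itself.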

Posts: 15770 | Registered: Dec 2001  |  IP: Logged | Report this post to a Moderator
Pod
Member
Member # 941

 - posted      Profile for Pod           Edit/Delete Post 
Also, boo for Microsoft metadata. Co-opting data formats for one's own proprietary gain is scummy and, really, defeats the entire purpose of having uniform syntactic standards.
Posts: 4482 | Registered: May 2000  |  IP: Logged | Report this post to a Moderator
TomDavidson
Member
Member # 124

 - posted      Profile for TomDavidson   Email TomDavidson         Edit/Delete Post 
Evil, creepy, nosy things, yes. [Smile]
Posts: 37449 | Registered: May 1999  |  IP: Logged | Report this post to a Moderator
fugu13
Member
Member # 2859

 - posted      Profile for fugu13   Email fugu13         Edit/Delete Post 
Pod -- the idea isn't to have every bit of information marked up; the idea is to have data whose meaning we already know marked up as such automatically.

Time it takes to encode your weblog in RDF: 0.

Time it takes to encode the relationships in your bookmarks in RDF (assuming browsers start supporting it): 0.

Time it takes to encode forum posts in RDF (assuming forum software does it automatically): 0.

Sure, if you thought a particular piece of information was important to have encoded and there wasn't a tool to do it you might do it by hand, but for the most part encoding information would be transparent to the user.

In other words, the problem doesn't exist, because the sorts of things proponents of the Semantic Web want marked up can be marked up automatically and transparently. It's a bogeyman of metadata, not a real problem.

[ March 19, 2004, 12:59 PM: Message edited by: fugu13 ]

Posts: 15770 | Registered: Dec 2001  |  IP: Logged | Report this post to a Moderator
fugu13
Member
Member # 2859

 - posted      Profile for fugu13   Email fugu13         Edit/Delete Post 
Pod -- while I don't like the MS XML format, it's a perfectly acceptable XML format. And the reason I don't like it is that it's feature-complete (for Word documents, at least), and many of the features in Word are badly chosen for an XML document format.

A document format designed from the ground up to be XML-based would be much less crufty; but that this one is crufty isn't the fault of the people doing the XML implementation, and it is standards-compliant.

Posts: 15770 | Registered: Dec 2001  |  IP: Logged | Report this post to a Moderator
Pod
Member
Member # 941

 - posted      Profile for Pod           Edit/Delete Post 
Fugu:

I'm just not sold. If it can be marked up automatically, doesn't that just mean it's already unambiguous and searchable?

Also, why so excited about MS XML? Whatever happened to LaTeX? You're an OSS geek, aren't you?

Posts: 4482 | Registered: May 2000  |  IP: Logged | Report this post to a Moderator
Pod
Member
Member # 941

 - posted      Profile for Pod           Edit/Delete Post 
Fugu:

I think you need a better example than forum data, 'cause I don't think Tom or I are terribly interested in searching it.

My opinion of the matter is that there are two types of data: ambiguous and unambiguous. Unambiguous data should already have some heuristic property that can be used to index it. Ambiguous data can't be uniquely indexed, and so needs metadata associated with it for two possible reasons: to standardize the format, or to disambiguate the data (or both). Performing either task on ambiguous data is hard, and requires either a lot of ingenuity or a lot of hard work. In either regard, metadata is merely a tool, and the real work is done by some other device.

Posts: 4482 | Registered: May 2000  |  IP: Logged | Report this post to a Moderator
Pod
Member
Member # 941

 - posted      Profile for Pod           Edit/Delete Post 
Really, I'm just skeptical that this isn't just another case of people overselling XML formats. Sure, metadata is wonderful, but deriving interesting metadata is either hard or time-consuming, not to mention that it requires a standard for markup that everyone has to agree on. And that's an entirely different can of worms.
Posts: 4482 | Registered: May 2000  |  IP: Logged | Report this post to a Moderator
fugu13
Member
Member # 2859

 - posted      Profile for fugu13   Email fugu13         Edit/Delete Post 
I'm not excited about MS XML; it's just a good example of the increasing use of XML and RDF (I have not yet verified whether it can output RDF, but the transform, at least for the metadata, should be easy).

It's not about Tom or you being interested in searching them; it's about applications which can search them having the data available. For instance, did you read my mechanic example? An application for finding people who provide, or recommend providers of, various services in a given geographic area -- people who are also within someone's circle of friends and acquaintances -- would be incredibly powerful.

As for the encoding, reread some of my posts. The point isn't just that the data is encoded in an accessible format, but that it's encoded in an accessible and commonly understood format.

For instance, take bookmarks. Remember my bookmark-based related pages search function above (you can't tell me many people wouldn't find that insanely useful)?

Well, all the bookmark files certainly carry the necessary data already, which is part of the point -- but they're in their own formats! For the data to be commonly accessible, it needs to be in a commonly accessible format.

Okay, so now we have "bookmark-xml" that all browsers can export their bookmarks in, so we can easily aggregate this data.

But wait -- what if we want to note which bookmarks are people's homepages, which are commercial sites, et cetera? That information isn't encoded in the bookmark files -- but it is encoded in other types of files.

However, how do we relate bookmark-xml to those other formats? Well, it would be easy if all the formats were vocabularies of a common, root, meaning-carrying encoding. That common, root, meaning-carrying encoding is called RDF.

Yes, the information is available already -- in formats that no non-specialized program can make heads or tails of! How is a text parser supposed to know that on one site <h2> refers to the title of a book, and on another <h2> refers to the title of a movie? Well, the sites that have the data know -- such as Amazon and IMDB. If they were to provide their data in RDF, so that the meaning was clear to programs rather than just the presentation, it would be possible to, for instance, quickly find books written by actors (hey, that could be very useful for a journalist, and it's off the cuff) without requiring complicated data-format parsing particular to each site's pages.

Posts: 15770 | Registered: Dec 2001  |  IP: Logged | Report this post to a Moderator
fugu13
Member
Member # 2859

 - posted      Profile for fugu13   Email fugu13         Edit/Delete Post 
The format's already agreed upon, Pod; it's called RDF, and it's been accepted as a standard by most of the major players.
Posts: 15770 | Registered: Dec 2001  |  IP: Logged | Report this post to a Moderator
Pod
Member
Member # 941

 - posted      Profile for Pod           Edit/Delete Post 
Okay, but my point, Fugu, is that you can already trip across that <h2>, go "hmm, don't know what this is," toss it into the Amazon or IMDB search field, and pop out a result -- which, you'll note, clearly specifies the format of the object in question.

If that's the case, why do we need a new meta-wrapper around all data? I agree that more intelligent searches are a good idea, but what we need are more powerful search tools, not just reformatting information we already know into explicit metadata. That is my point. We shouldn't have to reformat the entire web to do this. In a way, that's the point of what Google is.

Also, with the example of a good mechanic, I don't know...

What's the point? Word of mouth? General statistics, I think, would be nicer. Also, with this FOAF stuff, you have the potential for an exponential explosion of search results, with no way of ordering them either. As a search heuristic, I'm not convinced it helps any.

It just sounds too idealistic to me.

Posts: 4482 | Registered: May 2000  |  IP: Logged | Report this post to a Moderator
BannaOj
Member
Member # 3206

 - posted      Profile for BannaOj   Email BannaOj         Edit/Delete Post 
Ok, while I'm an engineer, I'm not terribly computer-programming savvy.

To me, as a layperson, I'm trying to figure out how this matters to me.

I am currently attempting to build a website to display my show dogs (Cardigan Welsh Corgis). I'm using Adobe PageMaker.

How does all this metadata help me? Does it help me join webrings? How does it network me with other Cardigan Corgi people? Many of the Corgi people are less computer savvy than I am, and have annoying cutesy music websites on free server space with popup ads. They rarely blog; they are too busy out showing dogs. I highly doubt they are going to be doing this sort of encoding.

So I repeat: what does this do for me?

AJ

Posts: 11265 | Registered: Mar 2002  |  IP: Logged | Report this post to a Moderator
BannaOj
Member
Member # 3206

 - posted      Profile for BannaOj   Email BannaOj         Edit/Delete Post 
What would be cool, though, is automatic pedigree generation for dogs, based on the metadata.
All you'd have to do is put in the sire and dam and get the pedigree spit out, complete with links to each generation's dogs.

But that would once again require lots of coding by mostly amateurs who would look at you funny if you so much as said the word "metadata".

AJ

Posts: 11265 | Registered: Mar 2002  |  IP: Logged | Report this post to a Moderator
Bokonon
Member
Member # 480

 - posted      Profile for Bokonon           Edit/Delete Post 
Bah, BeOS has had (non-XML) metadata built into its filesystem since 1995.

Longhorn is just a Johnny-Come-Lately.

-Bok

Posts: 7021 | Registered: Nov 1999  |  IP: Logged | Report this post to a Moderator
TomDavidson
Member
Member # 124

 - posted      Profile for TomDavidson   Email TomDavidson         Edit/Delete Post 
Bok, almost ALL operating systems have had metadata built into the file system for years. It's just that some OSes integrate more data than others. [Smile]
Posts: 37449 | Registered: May 1999  |  IP: Logged | Report this post to a Moderator
fugu13
Member
Member # 2859

 - posted      Profile for fugu13   Email fugu13         Edit/Delete Post 
It's not about human understanding, it's about program understanding!

You, as a human, can understand that bit about <h2> from context, because you are human. For a program to do that is exceptionally complex. Why should programmers have to write complex, error-prone parsers to figure out semantic data when sites like Amazon already have all that semantic data available, just not yet presented in a generic format programs understand?

It's about making information available to programs, not hiding it for humans only, like we do now.

We need a new meta-wrapper because the current meta-wrappers aren't semantic wrappers. They are (usually) presentation wrappers. That's like asking why we need HTML when we have image files -- after all, any information that can be presented in HTML can be presented as a picture. A semantic wrapper is a completely different format from any non-semantic wrapper, just as an HTML file is a completely different format from an image file. Notice that the main job of a web browser is to convert HTML into images -- why on earth should we have those images? After all, we already have the HTML! We have the images because images are easier for humans to understand. Similarly, we should have RDF because it is easier for programs to understand (where by "understand" we mean "parse for meaning").

The idea that something encoded once should never be re-encoded is absurd. After all, sound files come in multiple formats. Why? Because different formats have different advantages!

As for the bookmark thing, the point is that I am describing how to make a better search engine.

Let's take a few givens -- in a few years, browsers export and automatically upload (if the feature is turned on) RDF files of their bookmarks. This data is kept in an online repository (not too hard).

Now say someone wants to find pages closely related to a given page. They might query for pages that sit in the same non-main folder as that page for at least 100 users. Obviously the optimal algorithm would be more complex, but you get the idea.

The thing is, the details of a person's query would be behind the scenes. What they'd actually do is press a button in their browser or click a link on a page, and be taken to the results.

This could be greatly expanded very easily -- they could upload their FOAF file(s), for instance, and any page that was bookmarked by a friend or friend of a friend would be marked in some way. Et cetera.

That's all that would be required: people uploading their bookmark files in RDF, people optionally uploading their FOAF files in RDF, a rather trivial amount of parsing, and ta-da! Relational search results generated by the very best web spiders -- humans.
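A toy sketch of that query. Everything here is hypothetical -- made-up page names, each inner list standing for one user's bookmark folder, and the '100 users' threshold scaled down to 3 so the example stays small:

```python
from collections import Counter

# Hypothetical uploaded bookmark data: for each user, the pages
# they filed together in the same non-root folder.
FOLDERS = [
    ["orson.example/hatrack", "sf-reviews.example", "writers.example"],
    ["orson.example/hatrack", "sf-reviews.example"],
    ["orson.example/hatrack", "sf-reviews.example", "news.example"],
    ["news.example", "weather.example"],
]

def related_pages(page, folders, min_users=3):
    """Pages filed alongside `page` by at least `min_users` distinct
    users -- the 'same folder for at least 100 users' idea, scaled down."""
    counts = Counter()
    for folder in folders:
        if page in folder:
            counts.update(p for p in folder if p != page)
    return [p for p, n in counts.items() if n >= min_users]

print(related_pages("orson.example/hatrack", FOLDERS))  # ['sf-reviews.example']
```

The threshold is what filters out coincidental pairings: a page filed next to yours by one user means little, but by a hundred users it means a lot.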

Of course we need more powerful search tools, but we also need to realize that there are huge amounts of data out there which just need some slight reformatting to be made easily understandable by programs.

You're suggesting that we use semantic data only insofar as we can extract it with complex, intensive, and error-prone algorithms. Why not also make that data available in an easily parseable format in the first place? Why require all that work? Most of the metadata is already there, just not presented in a way machines can understand. Why continue the obfuscation?

Posts: 15770 | Registered: Dec 2001  |  IP: Logged | Report this post to a Moderator
fugu13
Member
Member # 2859

 - posted      Profile for fugu13   Email fugu13         Edit/Delete Post 
Banna -- there are probably pedigree-tracking programs out there already. Things like that would be possible using online repositories, in an RDF format, of the results from such programs. People wouldn't have to write the metadata by hand at all.
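For what it's worth, the pedigree idea is a natural fit for triples -- each dog just needs a sire link and a dam link, and a program can walk the rest. A sketch, with entirely made-up dogs:

```python
# Hypothetical pedigree data as RDF-style (subject, predicate, object)
# triples, of the kind a tracking program could publish automatically.
PEDIGREE_TRIPLES = [
    ("Rex",  "sire", "Duke"),
    ("Rex",  "dam",  "Bella"),
    ("Duke", "sire", "Max"),
    ("Duke", "dam",  "Daisy"),
]

def pedigree(dog, depth=3):
    """Nested dict of ancestors, built by following sire/dam triples."""
    if depth == 0:
        return {}
    parents = {rel: obj for s, rel, obj in PEDIGREE_TRIPLES if s == dog}
    return {rel: {"name": p, "ancestors": pedigree(p, depth - 1)}
            for rel, p in parents.items()}

print(pedigree("Rex"))
```

Put in the sire and dam once, and every descendant's pedigree falls out of the same shared data -- no funny looks about "metadata" required.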
Posts: 15770 | Registered: Dec 2001  |  IP: Logged | Report this post to a Moderator
fugu13
Member
Member # 2859

 - posted      Profile for fugu13   Email fugu13         Edit/Delete Post 
Pod, you've written computer programs, right?

Okay, we'll consider an example program using RDF.

Say we want to find all the authors of all the books that were made into films that came out in 2003.

Well, in an RDF program, in pseudocode, we might do this:
code:
imdb = connect(imdb-rdf-query-url)

titlesquery = triples(--result--:titlesof:books, books:madeinto:movies, movies:cameoutin:2003)

titles = imdb.query(titlesquery)

amazon = connect(amazon-rdf-query-url)

authorsquery = triples(--result--:authorsof:books, books:havetitles:*titles*)

authors = amazon.query(authorsquery)

print tripleresults(--result--:namesof:*authors*)

That's pretty much it. We're talking maybe 20 lines of code once you include boilerplate (closing connections, using the correct URIs -- several of the words in the triples above are actually URIs, but that wouldn't add any LOC -- that sort of thing).

Conciseness and meaning like this are just not possible in the current way of doing things -- constructing, executing, and parsing a query to imdb, then using some results from that for a query to amazon is complicated. RDF uncomplicates it.

Furthermore, notice that there's nothing site-specific about the RDF triples. It's all about the meaning of the data. Any site that understands RDF and the particular low-level vocabularies (books and movies) can understand those triples.
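To make the pseudocode concrete, here is the same two-stage query run against toy in-memory stand-ins for the two sites' query endpoints. The titles, property names, and data are all invented for illustration; real RDF would use full URIs and a network query protocol:

```python
# Toy stand-ins for the two sites' RDF stores.
IMDB = [
    ("LOTR: The Return of the King", "adaptedFrom", "The Return of the King"),
    ("LOTR: The Return of the King", "releasedIn",  "2003"),
    ("Big Fish",                     "adaptedFrom", "Big Fish"),
    ("Big Fish",                     "releasedIn",  "2003"),
    ("Spider-Man",                   "releasedIn",  "2002"),
]
AMAZON = [
    ("The Return of the King", "author", "J. R. R. Tolkien"),
    ("Big Fish",               "author", "Daniel Wallace"),
]

def subjects(store, p, o):
    """Subjects of all triples matching (?, p, o)."""
    return [s for s, tp, to in store if tp == p and to == o]

def objects(store, s, p):
    """Objects of all triples matching (s, p, ?)."""
    return [to for ts, tp, to in store if ts == s and tp == p]

# Stage 1 (the 'imdb' query): 2003 films adapted from books -> book titles.
films_2003 = subjects(IMDB, "releasedIn", "2003")
books = [b for f in films_2003 for b in objects(IMDB, f, "adaptedFrom")]
# Stage 2 (the 'amazon' query): look those titles up for their authors.
authors = [a for b in books for a in objects(AMAZON, b, "author")]
print(authors)  # ['J. R. R. Tolkien', 'Daniel Wallace']
```

The join between the two stores happens on shared values, not on either site's page layout -- which is the whole argument in miniature.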

[ March 19, 2004, 03:56 PM: Message edited by: fugu13 ]

Posts: 15770 | Registered: Dec 2001  |  IP: Logged | Report this post to a Moderator

Copyright © 2008 Hatrack River Enterprises Inc. All rights reserved.
Reproduction in whole or in part without permission is prohibited.

