Thursday, March 13, 2008

This blog has moved

Blogging on this topic has moved to The Cyc Foundation blog.

Tuesday, August 01, 2006

The Usability Barrier

At the AAAI conference in Boston, Peter Norvig of Google raised a concern about the Semantic Web to Tim Berners-Lee. The concern has primarily to do with ease of use.

We're not going to have a smarter Web until and unless the new tools are easy enough to use. Until then, we'll have Web 2.0, as described here. As Zombini explains, these Web 2.0 technologies already have semantic qualities, but they are held back by "using older, more limited technologies because of our lack of understanding and desire to learn complex new languages."

Cyc Foundation Update

A lot has been happening at the Cyc Foundation, even though nothing has been happening with this blog. Some folks have expressed confusion over where to go for Cyc Foundation information. People who want to get their hands dirty and participate can just email me (johnd at cycfoundation.org). Tell me a little about yourself and why you're interested, and we'll add you to the mailing list and get you access to the wiki.

Better yet, we now have a server that is operational. So, the logical next step will be to put up a website. We're working on it, along with several other projects.

  • We're making an application that will let people (and some background programs) determine links between Wikipedia article titles and Cyc concept terms. This will grow the Cyc ontology and provide a way to access Wikipedia content semantically.
  • We're getting ready to publish the Cyc ontology as a Semantic Web resource, with one URI per concept.
  • We're creating REST Web services for easy access to create, read, update, delete and query Cyc concepts, facts and rules.

Further descriptions of all of these are available on the members-only Wiki and will also be available on the website.

Friday, May 26, 2006

First Skypecast

We had 10 programmers from around the country on a Skypecast (conference call) for 90 minutes last night discussing creating web services for Cyc. We've got some issues to work out with microphones and with getting on the call in the first place. Still, it was a worthwhile call.

We'll be talking again next Wednesday (May 31, 2006) at 8:30pm Central Time. Central time is one hour earlier than the East Coast and two hours later than the West Coast. It's a public call, so anyone can join, but we usually mute people to improve sound quality. There's chat that happens alongside the audio, and we may use VNC to do screen sharing next time. To join the call, you first have to join Skype (for free). Go to www.skype.com. Then, on the Skype home page, there's a link for skypecasts. Then look for "Find a Skypecast" "starting soon". Or you can just visit the link: https://skypecasts.skype.com/skypecasts/upcoming

We don't need to get your skype address ahead of time. You just click your way into the call. Once in, we may add you to a group chat, and if you want to participate in Cyclify, you can put your contact info into chat.

Hope to see you there!

P.S. Please test how your microphone sounds before joining the call. Do this by placing a test call to the skype number: echo123. Most people have their mikes way too close at first, and once you're on the call it's hard to tell.

Saturday, May 20, 2006

Natural Language Understanding is Hard

Sorry for the gap in posts. I will try to be more consistent.

If you ever wonder why it is so difficult for computers to understand natural language, check out this article by George Miller, creator of WordNet. In one example from his article, he points out that the following couplet from a Robert Frost poem . . .
"But I have promises to keep, and miles to go before I sleep."
. . . has over 3 trillion possible meanings.

Sunday, April 30, 2006

Thanks to early supporters

We've had 1.5 handfuls of folks contact us from around the U.S. interested in helping with the Cyc Foundation. We're grateful for whatever help we can get. Broader support will probably start showing up once the first game is out there on the Web and once Wikipedia linking is underway. Of course, it would help to have help to get those things moving. So, we're dealing with a chicken and the egg kind of a thing. And not even Cyc knows which came first.

Friday, April 21, 2006

But that's not how people think...

I've often heard that complaint when I explain that Cyc uses deduction to do reasoning. That's not how people think. First of all, I don't want to claim that Cyc "thinks" at all, but sometimes it's expedient to use expressions like that.

As for how people think, who knows? There's a lot of research in that area, but we really don't know yet how people think. I suspect it's quite different than the processing that goes on in the Cyc inference engine. But who cares? It was never a goal to emulate human beings; it was merely a goal to in some sense understand the world of humans.

Consider the airplane. It is the man-made version of a bird, only faster, more powerful and controllable. Where are its feathers? Why don't its wings flap?

Many of man's tools serve the purpose of amplifying some existing capability. The telephone amplifies the ear and mouth. The bicycle amplifies the power of the legs. A technology such as Cyc can amplify the power of the mind even if it doesn't work at all like one.

Thursday, April 20, 2006

"We've Got to Think About the Game"

I said I'd go into more detail about each of the activities in the Cyc Foundation's Cyclify initiative. The purpose that cuts across all activities is to grow the Cyc Knowledge Base. Initially, the focus is on what we call breadth or coverage. We want everything one could think of to have a concept term or a way of forming one from functional terms. A concept term is something like #$CreditCard. A term built functionally is something like (#$BorderBetweenFn #$France #$Germany). The '#$' is something used internally by Cyc to distinguish Cyc terms from other kinds of symbols. If you're not doing programming, you don't have to use it. But sometimes it helps as a way of knowing that you're talking about #$France the Cyc term as opposed to "France" the string or France the country.
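For the programmers reading along, here is a minimal sketch of how the two kinds of terms above could be read into a program. It is not Cyc code and does not use the Cyc API; the helper names (tokenize, parse, is_cyc_term) are made up for illustration, and the only thing taken from Cyc is the parenthesized notation and the '#$' prefix.

```python
def tokenize(text):
    """Split a CycL-style expression into parentheses and symbols."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Parse one term from the front of the token list (consumes tokens)."""
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens[0] != ")":
            items.append(parse(tokens))
        tokens.pop(0)  # drop the closing parenthesis
        return items
    return token


def is_cyc_term(symbol):
    """A symbol carrying the '#$' prefix names a Cyc term, not a plain string."""
    return isinstance(symbol, str) and symbol.startswith("#$")


# A plain concept term stays a single symbol.
credit_card = parse(tokenize("#$CreditCard"))

# A functional term becomes a list: the function first, then its arguments.
border = parse(tokenize("(#$BorderBetweenFn #$France #$Germany)"))

print(credit_card)
print(border)
print(is_cyc_term("#$France"), is_cyc_term("France"))
```

The last line shows the point of the prefix: "#$France" is recognized as a Cyc term, while the bare string "France" is not.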

Anyway, we want to fill up Cyc with concepts and with useful facts about the concepts. As a result of the way development proceeded with fits and starts based on where funding was coming from, the coverage in Cyc is very broad but spotty. There may be terms for #$Parsley, #$Sage and #$Rosemary, but #$Thyme might be missing. (Sounds like a song.) As for facts, Cyc may know that yelling is louder than talking, but not know that talking is typically louder than whispering.

The games will be one way of helping to fill in the content. The first game presents the player with statements for which she must choose True, False or Skip. Since many of the questions are common sense things that we all know, challenge is introduced in the form of time pressure. The first game will also have other question types; for example, one type allows the player to choose several answers.

The game is fed by statements that you might call prospective facts. There are several ways to come up with these. One is to have Cyc use natural language generation to create partial search strings -- "The adult elephant can weigh as much as *" -- and parse what you get back. Another is to use abduction -- find things that might possibly be true and propose that they are true. The statements that feed the game are in what we call "the pipeline".
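To make the partial-search-string idea a little more concrete, here is a rough sketch of the parsing half. It assumes the text coming back from a search has already been collected into a list of strings; the function names and the sample snippets are invented for illustration and are not real search results or part of the actual pipeline.

```python
import re


def template_to_pattern(template):
    """Turn a partial search string ending in '*' into a regex that captures the wildcard part."""
    prefix = template.rstrip("*").strip()
    # Capture whatever follows the prefix, up to the end of the sentence.
    return re.compile(re.escape(prefix) + r"\s+([^.!?]+)", re.IGNORECASE)


def extract_candidates(template, snippets):
    """Return the text that filled the '*' slot in each snippet that matches."""
    pattern = template_to_pattern(template)
    candidates = []
    for snippet in snippets:
        match = pattern.search(snippet)
        if match:
            candidates.append(match.group(1).strip())
    return candidates


template = "The adult elephant can weigh as much as *"

# Invented sample text standing in for whatever a search returns.
snippets = [
    "The adult elephant can weigh as much as six tons. It also eats a lot.",
    "This page has nothing to say on the subject.",
]

print(extract_candidates(template, snippets))
```

Each extracted candidate would still be only a prospective fact; it goes into the pipeline for players to confirm or reject, not straight into the knowledge base.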

There are issues in creating the pipeline. We want to create useful prospective facts; and we also want to be able to categorize them, so that you as a player won't keep getting shown statements that you know nothing about. That's a challenge we need to address presently.

I'll continue with more about the game next time.

P.S. Bonus points for whoever can tell what musical the title of this post came from.

Monday, April 17, 2006

What will we be doing?

Cyclify, the grass-roots initiative of the Cyc Foundation, is composed of the following projects:
  • Games for growing the knowledge base
  • Wikipedia knowledge collection
  • Pairing subject-matter experts (SMEs, pronounced "smees") with ontologists
  • Alignment with other sources (for example, WordNet)
I'll make a separate post for each of these, in order to go into more detail. Something that applies to all of them is the fact that they depend on the help of people who are not experts in artificial intelligence. We still need the experts, but they'll be working side-by-side with people who are life-smart. Experts in other fields, experts at being human (or at least much better at it than computers), and people who can help with coordinating, supporting and motivating everyone's play.

Outside of Cyclify, the Cyc Foundation will be working on issues related to standards, ResearchCyc coordination and other things.

We'll be using the Basecamp program, which was developed with Ruby on Rails. We'll be setting up accounts for people who join the Cyc Foundation. You can request membership by sending an email with your background and interests to johndcyc@gmail.com.

Sunday, April 16, 2006

New Article in New Scientist about Cyc

New Scientist just published a new article about Cyc in which they say, "Cycorp has also just launched a trivia game for the public that will help fill in gaps in Cyc's knowledge."

The Cyc Foundation is being entrusted with the responsibility for further development of the game. We're currently designing a new front-end for the game that will provide an arcade-like feel. We hope it will make game play more compelling. And we're planning to extend the kinds of knowledge that the game will gather.

This will be a new genre in gaming: Games that Matter, where every question you answer brings the world a little closer to a truly intelligent computer.

[By the way, a Cyc Foundation website is in the works and is expected next month. In the meantime, we may set up at least a home page in the next few days where we'll be able to collect a little information from those of you who would like to help with this effort.]

21st Century Glasnost and Perestroika

While I was in college in the mid-Eighties, there was a buzz in the air about Gorbachev's glasnost and perestroika. Those were times filled with anticipation of changing relationships, although few truly knew how much change was coming.

Changes at Cycorp over the last few years remind me of those times. The iron curtain on Cyc is coming down with the release of the full Cyc ontology into open source (glasnost), and there is an opportunity for the Cyc ontology, developed over 20 years, to contribute to the restructuring of the Web (perestroika) that is taking place.

This new glasnost extends beyond the release of an ontology that can be used and extended by all. The entire Cyc system is now available for free for research purposes; and, although inference engine source code is not included, there are over 18,000 functions and macros available for the research community to work with, including those that support natural language parsing and generation. As with Gorbachev, the glasnost is the result of a changing attitude toward working with the outside world.

Cycorp's "Reykjavik Summit" occurred in 2001, when DARPA invited many of the leading minds of ontology and artificial intelligence to Austin for a summit. It was essentially Cyc's coming out party -- a time for Cycorp to say, "We know we've been keeping mostly to ourselves. We're ready to share, and we'd like to know what you think about what we've been working on." Present were Marvin Minsky, Ed Feigenbaum, Ron Brachman (meeting organizer), George Miller, Bill Woods, Deborah McGuinness, Hector Levesque, Scott Fahlman, Danny Bobrow, John Sowa, Fritz Lehmann and more. John McCarthy added his two cents in a separate visit. McCarthy published the paper Programs with Common Sense back in 1959, in which he asserted, "In order for a program to be capable of learning something it must first be capable of being told it."

By the end, even those expected to be the biggest critics agreed that they would like to get their hands on as much of the Cyc technology as possible -- especially the knowledge base content. That was the beginning of the ResearchCyc project, and it was when the decision was made to have OpenCyc get all of the concept terms that were released in ResearchCyc.

So, that was glasnost. Perestroika is coming in the form of The Cyc Foundation, which has a goal of working cooperatively with as much of the "Web 2.0" community as bandwidth permits. For example, we're working on a web services interface for the Cyc API. We'll be linking Cyc concepts with Wikipedia concepts and, as a result, providing a new way to navigate Wikipedia. We hope to use Ruby on Rails on the interface that administers workflow (we call it "playflow", since it supports games) for the games that add knowledge to the Cyc knowledge base. And we'd like to work with the social tagging community to give them a way to use a shared tagset without giving up the ease of use and social networking that they are accustomed to. In all cases, we'll have to explore with the rest of the Web community how to capitalize on the advantages of our respective technologies.

The wall has been knocked down. I anticipate a time of even greater change, and I'm looking forward to the next several years!

Saturday, April 15, 2006

The Semantics of Semantics

Bill Jarrold of SRI sent me some comments about my presentation on the 13th (see below), and I'd like to respond. He said the presentation was pretty good, but noted:

Some people will differ with your characterization that OWL contains no semantics. People are working on adding rules to OWL and OWL-Full is quite descriptive. OWL-DL is much weaker, but is computationally pretty good (description logics run in polynomial time). But, in spirit, you are right. From what little I know, Tim Berners Lee seems to urge everyone to keep moving, that through common use we will eventually arise at some sort of folksonomy like effect.

Okay, I want to clarify what I meant when I was talking about Semantic Web standards.

With regard to representing the meaning in documents, I made the claim that, despite a number of new W3C standards, we're in the same situation as we were with Electronic Data Interchange (EDI) 20 years ago. That's not true. The W3C standards are actually a huge advance over EDI (by which I really mean X12). Currently, however, they don't aim to deal with the issue of a common vocabulary that (within the business domain) EDI focused on for 25 years. There is nothing inherent in the W3C standards that keeps us from taking that extra step, so I'm excited that the Cyc Foundation will be able to offer a part of the solution to that issue.

People often talk about the Semantic Web by comparing "syntax" to "semantics." I divide the knowledge representation problem into syntax, vocabulary, ontology and semantics (as defined in my previous blog post). It's not completely accurate, but accuracy can be the enemy of clarity sometimes. :-)

OWL has support for semantics. OWL-Full has quite a bit more support for semantics than OWL-DL. Neither contains a lot of meaning about things in the world, because the intention is to rely on ontologies expressed in OWL. It is up to users of OWL to add the meaningful terms that depend on the semantics that OWL provides.

As a result, we have a proliferation of ontologies from which, it is hoped, a common set of meaningful terms will emerge. At this point, there is some meaning in each of the ontologies, but there is not a shared meaning across ontologies.

I'm going to save discussion of folksonomies and emergent semantics for another post. For now, suffice it to say: I don't oppose ground-up development of ontologies, and there is no inherent contradiction between doing that and having a unifying hub ontology. I look forward to working with the OWL community in creating a sustainable, semantically rich Web.

Syntax, Semantics and In Between

I'm about to post a response to a comment I received about Thursday night's presentation, but first I want to define some terms. (The fact that I have to try to establish this common ground vocabulary in order to discuss these issues ironically argues my point for me, as I hope you'll see in the next post.) I invite corrections to my naive definitions, as long as the corrections can be stated in something close to English.

Syntax is expressed by the simple grammar rules you get in any computer language. (If you want to be more technical, I'm talking about the kind of simple prescriptive grammars that can be defined with BNF.) OWL and CycL also have a syntax that can be defined with simple grammar rules.

Vocabulary is an agreed-upon set of terms. There may be an implied connection to the things in the world the terms refer to ("clock" is a device we use to tell time), but it is not required that the terms be interrelated in any way. OWL intentionally has a quite small vocabulary. Vocabularies are handled by a proliferation of OWL ontologies. Electronic Data Interchange (EDI) has a large vocabulary of business terms. Cyc currently has a much smaller business vocabulary than EDI. It is also smaller than the sum of the vocabularies in OWL-based ontologies. Cyc has a very large vocabulary of terms that refer to things in the everyday world.

Ontology is a formal set of statements, built from a vocabulary, and about the things that the vocabulary terms refer to in the world. If we have a vocabulary that includes "dog" and "mammal", an ontology made from that vocabulary would have a statement that the set of all dogs (referred to by the vocabulary term "dog") is a subset of the set of all mammals (referred to by the vocabulary term "mammal"). More simply, a dog is a kind of mammal.

Finally, semantics (for the purposes of comparing knowledge representation choices) refers to the meanings of statements ("Your mother's brother is your uncle.") expressed in a form suitable for logical manipulation: (implies (and (mother ?X ?M) (brother ?M ?B)) (uncle ?X ?B)). I argue that an important distinction can be made between support for semantics and semantics. OWL has support for semantics (and a small amount of actual semantics involving the language primitives), and every related ontology adds to the actual semantics expressed using OWL. Cyc contains the support for semantics as well as the ontology content that gives it actual semantics. Cyc is compatible with OWL and can extend the semantics of OWL in the same way that any other ontology can.
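To show what "suitable for logical manipulation" can look like in practice, here is a small sketch that applies the uncle rule to a couple of facts. It is a hand-coded illustration of one deduction step and says nothing about how the Cyc inference engine actually works; the function name, the tuple layout and the people in the facts are all invented for the example.

```python
def infer_uncles(facts):
    """Apply the rule once: mother(X, M) and brother(M, B) imply uncle(X, B)."""
    mothers = [(x, m) for (pred, x, m) in facts if pred == "mother"]
    brothers = [(m, b) for (pred, m, b) in facts if pred == "brother"]
    derived = set()
    for x, m in mothers:
        for m2, b in brothers:
            if m == m2:  # the same ?M must appear in both premises
                derived.add(("uncle", x, b))
    return derived


# Facts are (predicate, arg1, arg2) tuples, read the same way as in the rule.
facts = {
    ("mother", "Pat", "Ann"),   # Ann is Pat's mother
    ("brother", "Ann", "Bob"),  # Bob is Ann's brother
}

print(infer_uncles(facts))
```

Nothing in the facts says that Bob is Pat's uncle; the statement falls out of the rule. That is the difference between storing vocabulary and having semantics a program can work with.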

Friday, April 14, 2006

Cyc Foundation announced at Cyclify meeting

Here's an announcement we posted today on the OpenCyc site.

Austin, TX. April 14, 2006 --

A new independent non-profit organization was announced Thursday night at the monthly meeting of Cyclify-Austin, the Cyc User's group. The Cyc Foundation is now forming to manage the OpenCyc ontology and to grow the ontology and knowledge base exponentially with the help of volunteers from all walks of life.

Foundation president, John De Oliveira, compared the Foundation's "Cyclify" effort to the Wikipedia project. He said, "The Wikimedia Foundation asks us to 'Imagine a world in which every single person is given free access to the sum of all human knowledge.' In the Cyclify project, led by The Cyc Foundation, we ask you to imagine a world in which every single person is given free access to programs that reason with the sum of all human knowledge."

PPT Slides of the Announcement at the Cyclify Meeting
(download first & follow along)


Podcast of the Announcement at the Cyclify Meeting
(duration 1 hr. 18 mins.)

Thursday, April 13, 2006

Portrait of a Past Future

Portraits of me and my 6 brothers and sisters were done by an oil painter while we were on a trip to Florida when I was about 7 years old. He purposefully painted us to look a few years older than we actually were at the time (I forget why). We never quite turned out looking that way. As a result, I've got an example of what someone in the past thought (a part of) the future would be like.

Sunday, December 22, 2002

Test Post

This is a test. This is my first blog entry, so I'm just testing how this works. The site for my primary project is here.