Sunday, April 30, 2006

Thanks to early supporters

We've had 1.5 handfuls of folks contact us from around the U.S. interested in helping with the Cyc Foundation. We're grateful for whatever help we can get. Broader support will probably start showing up once the first game is out there on the Web and once Wikipedia linking is underway. Of course, it would help to have help to get those things moving. So, we're dealing with a chicken and the egg kind of a thing. And not even Cyc knows which came first.

Friday, April 21, 2006

But that's not how people think...

I've often heard that complaint when I describe that Cyc uses deduction to do reasoning. That's not how people think. First of all, I don't want to claim that Cyc "thinks" at all, but sometimes it's expedient to use expressions like that.

As for how people think, who knows? There's a lot of research in that area, but we really don't know yet how people think. I suspect it's quite different than the processing that goes on in the Cyc inference engine. But who cares? It was never a goal to emulate human beings; it was merely a goal to in some sense understand the world of humans.

Consider the airplane. It is the man-made version of a bird, only faster, more powerful and controllable. Where are its feathers? Why don't its wings flap?

Many of man's tools serve the purpose of amplifying some existing capability. The telephone amplifies the ear and mouth. The bicycle amplifies the power of the legs. A technology such as Cyc can amplify the power of the mind even if it doesn't work at all like one.

Thursday, April 20, 2006

"We've Got to Think About the Game"

I said I'd go into more detail about each of the activities in the Cyc Foundation's Cyclify initiative. The purpose that cuts across all activities is to grow the Cyc Knowledge Base. Initially, the focus is on what we call breadth or coverage. We want everything one could think of to have a concept term or a way of forming one from functional terms. A concept term is something like #$CreditCard. A term built functionally is something like (#$BorderBetweenFn #$France #$Germany). The '#$' is something used internally by Cyc to distinguish Cyc terms from other kinds of symbols. If you're not doing programming, you don't have to use it. But sometimes it helps as a way of knowing that you're talking about #$France the Cyc term as opposed to "France" the string or France the country.
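
For the programmers in the audience, here is a rough sketch in Python of the distinction between a concept term, a functionally built term and a plain string. The class names `Constant` and `FunctionalTerm` are made up for illustration and are not part of the Cyc API; only the printed `#$` notation comes from Cyc.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Constant:
    """An atomic concept term such as #$CreditCard (hypothetical class)."""
    name: str

    def __str__(self) -> str:
        # The '#$' prefix marks this as a Cyc term rather than a string.
        return "#$" + self.name


@dataclass(frozen=True)
class FunctionalTerm:
    """A term built by applying a function constant to argument terms."""
    function: Constant
    args: Tuple["Term", ...]

    def __str__(self) -> str:
        parts = [str(self.function)] + [str(arg) for arg in self.args]
        return "(" + " ".join(parts) + ")"


Term = Union[Constant, FunctionalTerm]

france = Constant("France")
germany = Constant("Germany")
border = FunctionalTerm(Constant("BorderBetweenFn"), (france, germany))

print(border)              # (#$BorderBetweenFn #$France #$Germany)
print(france == "France")  # False: the term is not the string
```

The point of the sketch is only that a new concept can be formed from existing terms without anyone having to coin a new constant for it.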

Anyway, we want to fill up Cyc with concepts and with useful facts about the concepts. As a result of the way development proceeded with fits and starts based on where funding was coming from, the coverage in Cyc is very broad but spotty. There may be terms for #$Parsley, #$Sage and #$Rosemary, but #$Thyme might be missing. (Sounds like a song.) As for facts, Cyc may know that yelling is louder than talking, but not know that talking is typically louder than whispering.

The games will be one way of helping to fill in the content. The first game presents the player with statements for which she must choose True, False or Skip. Since many of the questions are common sense things that we all know, challenge is introduced in the form of time pressure. The first game will also have other question types; for example, one type allows the player to choose several answers.

The game is fed by statements that you might call prospective facts. There are several ways to come up with these. One is to have Cyc use natural language generation to create partial search strings -- "The adult elephant can weigh as much as *" -- and parse what you get back. Another is to use abduction -- find things that might possibly be true and propose that they are true. The statements that feed the game are in what we call "the pipeline".
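
To make the partial-search-string idea a little more concrete, here is a minimal sketch in Python of pulling a candidate answer out of a returned text snippet. The function name `fill_wildcard` and the sample snippet are invented for illustration; real parsing of search results would need to be much more careful than one regular expression.

```python
import re
from typing import Optional


def fill_wildcard(pattern: str, snippet: str) -> Optional[str]:
    """Return the text that the '*' in pattern lines up with in snippet."""
    prefix, star, suffix = pattern.partition("*")
    if not star:
        raise ValueError("pattern needs a '*' wildcard")
    suffix = suffix.strip()
    if suffix:
        end = re.escape(suffix)
    else:
        # No suffix: stop at sentence-ending punctuation or end of text.
        end = r"(?=[.!?](?:\s|$)|$)"
    regex = re.escape(prefix.strip()) + r"\s+(.+?)\s*" + end
    match = re.search(regex, snippet, flags=re.IGNORECASE)
    return match.group(1) if match else None


pattern = "The adult elephant can weigh as much as *"
# Invented snippet standing in for a search result.
snippet = "We read that the adult elephant can weigh as much as six tons. It eats a lot."

print(fill_wildcard(pattern, snippet))  # six tons
```

Whatever comes back is still only a prospective fact. It goes into the pipeline so that players can confirm or reject it.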

There are issues in creating the pipeline. We want to create useful prospective facts; and we also want to be able to categorize them, so that you as a player won't keep getting shown statements that you know nothing about. That's a challenge we need to address presently.
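
One simple way to picture the categorization problem is to tag each prospective fact with topics and only show a player the facts that overlap with what she knows. The sketch below is in Python and is purely an assumption about how this might look; the `ProspectiveFact` class, the topic labels and the `facts_for_player` function are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class ProspectiveFact:
    """A statement waiting in the pipeline, tagged with topics."""
    text: str
    topics: FrozenSet[str]


def facts_for_player(pipeline: Iterable[ProspectiveFact],
                     known_topics: FrozenSet[str]) -> List[ProspectiveFact]:
    """Keep only the facts that share at least one topic with the player."""
    return [fact for fact in pipeline if fact.topics & known_topics]


pipeline = [
    ProspectiveFact("Talking is typically louder than whispering.",
                    frozenset({"sound", "everyday life"})),
    ProspectiveFact("Thyme is a kind of herb.",
                    frozenset({"cooking"})),
]

for fact in facts_for_player(pipeline, frozenset({"cooking"})):
    print(fact.text)  # Thyme is a kind of herb.
```

The hard part, of course, is assigning the topics in the first place, which is the challenge mentioned above.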

I'll continue with more about the game next time.

P.S. Bonus points for whoever can tell what musical the title of this post came from.

Monday, April 17, 2006

What will we be doing?

Cyclify, the grass-roots initiative of the Cyc Foundation, is composed of the following projects:
  • Games for growing the knowledge base
  • Wikipedia knowledge collection
  • Pairing subject-matter experts (SMEs, pronounced "smees") with ontologists
  • Alignment with other sources (for example, WordNet)
I'll make a separate post for each of these, in order to go into more detail. Something that applies to all of them is the fact that they depend on the help of people who are not experts in artificial intelligence. We still need the experts, but they'll be working side-by-side with people who are life-smart. Experts in other fields, experts at being human (or at least much better at it than computers), and people who can help with coordinating, supporting and motivating everyone's play.

Outside of Cyclify, the Cyc Foundation will be working on issues related to standards, ResearchCyc coordination and other things.

We'll be using the Basecamp program, which was developed with Ruby on Rails. We'll be setting up accounts for people who join the Cyc Foundation. You can request membership by sending an email with your background and interests to johndcyc@gmail.com.

Sunday, April 16, 2006

New Article in New Scientist about Cyc

New Scientist just published a new article about Cyc in which they say, "Cycorp has also just launched a trivia game for the public that will help fill in gaps in Cyc's knowledge."

The Cyc Foundation is being entrusted with the responsibility for further development of the game. We're currently designing a new front-end for the game that will provide an arcade-like feel. We hope it will make game play more compelling. And we're planning to extend the kinds of knowledge that the game will gather.

This will be a new genre in gaming: Games that Matter, where every question you answer brings the world a little closer to a truly intelligent computer.

[By the way, a Cyc Foundation website is in the works and is expected next month. In the meantime, we may set up at least a home page in the next few days where we'll be able to collect a little information from those of you who would like to help with this effort.]

21st Century Glasnost and Perestroika

While I was in college in the mid-Eighties, there was a buzz in the air about Gorbachev's glasnost and perestroika. Those were times filled with anticipation of changing relationships, although few truly knew how much change was coming.

Changes at Cycorp over the last few years remind me of those times. The iron curtain on Cyc is coming down with the release of the full Cyc ontology into open source (glasnost), and there is an opportunity for the Cyc ontology, developed over 20 years, to contribute to the restructuring of the Web (perestroika) that is taking place.

This new glasnost extends beyond the release of an ontology that can be used and extended by all. The entire Cyc system is now available for free for research purposes; and, although inference engine source code is not included, there are over 18,000 functions and macros available for the research community to work with, including those that support natural language parsing and generation. As with Gorbachev, the glasnost is the result of a changing attitude toward working with the outside world.

Cycorp's "Reykjavik Summit" occurred in 2001, when DARPA invited many of the leading minds of ontology and artificial intelligence to Austin for a summit. It was essentially Cyc's coming out party -- a time for Cycorp to say, "We know we've been keeping mostly to ourselves. We're ready to share, and we'd like to know what you think about what we've been working on." Present were Marvin Minsky, Ed Feigenbaum, Ron Brachman (meeting organizer), George Miller, Bill Woods, Deborah McGuinness, Hector Levesque, Scott Fahlman, Danny Bobrow, John Sowa, Fritz Lehmann and more. John McCarthy added his two cents in a separate visit. McCarthy published the paper Programs with Common Sense back in 1959, in which he asserted, "In order for a program to be capable of learning something it must first be capable of being told it."

By the end, even those expected to be the biggest critics agreed that they would like to get their hands on as much of the Cyc technology as possible -- especially the knowledge base content. That was the beginning of the ResearchCyc project, and it was when the decision was made to have OpenCyc get all of the concept terms that were released in ResearchCyc.

So, that was glasnost. Perestroika is coming in the form of The Cyc Foundation, which has a goal of working cooperatively with as much of the "Web 2.0" community as bandwidth permits. For example, we're working on a web services interface for the Cyc API. We'll be linking Cyc concepts with Wikipedia concepts and, as a result, providing a new way to navigate Wikipedia. We hope to use Ruby on Rails on the interface that administers workflow (we call it "playflow", since it supports games) for the games that add knowledge to the Cyc knowledge base. And we'd like to work with the social tagging community to give them a way to use a shared tagset without giving up the ease of use and social networking that they are accustomed to. In all cases, we'll have to explore with the rest of the Web community how to capitalize on the advantages of our respective technologies.

The wall has been knocked down. I anticipate a time of even greater change, and I'm looking forward to the next several years!

Saturday, April 15, 2006

The Semantics of Semantics

Bill Jarrold of SRI sent me some comments about my presentation on the 13th (see below), and I'd like to respond. He said the presentation was pretty good, but noted:

Some people will differ with your characterization that OWL contains no semantics. People are working on adding rules to OWL and OWL-Full is quite descriptive. OWL-DL is much weaker, but is computationally pretty good (description logics run in polynomial time). But, in spirit, you are right. From what little I know, Tim Berners Lee seems to urge everyone to keep moving, that through common use we will eventually arise at some sort of folksonomy like effect.

Okay, I want to clarify what I meant when I was talking about Semantic Web standards.

With regard to representing the meaning in documents, I made the claim that, despite a number of new W3C standards, we're in the same situation as we were with Electronic Data Interchange (EDI) 20 years ago. That's not true. The W3C standards are actually a huge advance over EDI (by which I really mean X12). Currently, however, they don't aim to deal with the issue of a common vocabulary that (within the business domain) EDI focused on for 25 years. There is nothing inherent in the W3C standards that keeps us from taking that extra step, so I'm excited that the Cyc Foundation will be able to offer a part of the solution to that issue.

People often talk about the Semantic Web by comparing "syntax" to "semantics." I divide the knowledge representation problem into syntax, vocabulary, ontology and semantics (as defined in my previous blog post). It's not completely accurate, but accuracy can be the enemy of clarity sometimes. :-)

OWL has support for semantics. OWL-Full has quite a bit more support for semantics than OWL-DL. Neither contains a lot of meaning about things in the world, because the intention is to rely on ontologies expressed in OWL. It is up to users of OWL to add the meaningful terms that depend on the semantics that OWL provides.

As a result, we have a proliferation of ontologies from which, it is hoped, a common set of meaningful terms will emerge. At this point, there is some meaning in each of the ontologies, but there is not a shared meaning across ontologies.

I'm going to save discussion of folksonomies and emergent semantics for another post. For now, suffice it to say: I don't oppose ground-up development of ontologies, and there is no inherent contradiction between doing that and having a unifying hub ontology. I look forward to working with the OWL community in creating a sustainable, semantically rich Web.

Syntax, Semantics and In Between

I'm about to post a response to a comment I received about Thursday night's presentation, but first I want to define some terms. (The fact that I have to try to establish this common ground vocabulary in order to discuss these issues ironically argues my point for me, as I hope you'll see in the next post.) I invite corrections to my naive definitions, as long as the corrections can be stated in something close to English.

Syntax is expressed by the simple grammar rules you get in any computer language. (If you want to be more technical, I'm talking about the kind of simple prescriptive grammars that can be defined with BNF.) OWL and CycL also have a syntax that can be defined with simple grammar rules.

Vocabulary is an agreed-upon set of terms. There may be an implied connection to the things in the world the terms refer to ("clock" is a device we use to tell time), but it is not required that the terms be interrelated in any way. OWL intentionally has a quite small vocabulary. Vocabularies are handled by a proliferation of OWL ontologies. Electronic Data Interchange (EDI) has a large vocabulary of business terms. Cyc currently has a much smaller business vocabulary than EDI. It is also smaller than the sum of the vocabularies in OWL-based ontologies. Cyc has a very large vocabulary of terms that refer to things in the everyday world.

Ontology is a formal set of statements, built from a vocabulary, and about the things that the vocabulary terms refer to in the world. If we have a vocabulary that includes "dog" and "mammal", an ontology made from that vocabulary would have a statement that the set of all dogs (referred to by the vocabulary term "dog") is a subset of the set of all mammals (referred to by the vocabulary term "mammal"). More simply, a dog is a kind of mammal.
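
As a small illustration, subset statements like the one about dogs and mammals can be stored as a table and chained together. This is only a sketch in Python; the `subset_of` table and the `is_kind_of` function are hypothetical, and the extra "mammal is a subset of animal" statement is added here just to show the chaining.

```python
from typing import Dict, Set

# Each key is a vocabulary term; the value lists its direct supersets.
subset_of: Dict[str, Set[str]] = {
    "dog": {"mammal"},
    "mammal": {"animal"},  # extra statement assumed for the example
}


def is_kind_of(term: str, ancestor: str, table: Dict[str, Set[str]]) -> bool:
    """True if term equals ancestor or reaches it through subset links."""
    seen: Set[str] = set()
    frontier = [term]
    while frontier:
        current = frontier.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        frontier.extend(table.get(current, ()))
    return False


print(is_kind_of("dog", "mammal", subset_of))  # True
print(is_kind_of("dog", "animal", subset_of))  # True, by chaining
print(is_kind_of("mammal", "dog", subset_of))  # False
```

A bare vocabulary could not answer any of these questions, because nothing in it relates one term to another.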

Finally, semantics (for the purposes of comparing knowledge representation choices) refers to the meanings of statements ("Your mother's brother is your uncle.") expressed in a form suitable for logical manipulation: (implies (and (mother ?X ?M) (brother ?M ?B)) (uncle ?X ?B)). I argue that an important distinction can be made between support for semantics and semantics. OWL has support for semantics (and a small amount of actual semantics involving the language primitives), and every related ontology adds to the actual semantics expressed using OWL. Cyc contains the support for semantics as well as the ontology content that gives it actual semantics. Cyc is compatible with OWL and can extend the semantics of OWL in the same way that any other ontology can.
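
To show what "suitable for logical manipulation" buys you, here is a toy Python sketch that applies the uncle rule to a couple of facts. It is not how the Cyc inference engine works; the function `infer_uncles` and the names Alice, Beth and Carl are invented, and each pair follows the argument order of the rule, so `("Alice", "Beth")` stands for (mother Alice Beth).

```python
from typing import Set, Tuple

Pair = Tuple[str, str]

# Facts, written as (first argument, second argument) of each predicate.
mother: Set[Pair] = {("Alice", "Beth")}   # (mother ?X ?M)
brother: Set[Pair] = {("Beth", "Carl")}   # (brother ?M ?B)


def infer_uncles(mother: Set[Pair], brother: Set[Pair]) -> Set[Pair]:
    """Apply (implies (and (mother ?X ?M) (brother ?M ?B)) (uncle ?X ?B))."""
    return {
        (x, b)
        for (x, m) in mother
        for (m2, b) in brother
        if m == m2  # the shared variable ?M must bind to the same person
    }


print(infer_uncles(mother, brother))  # {('Alice', 'Carl')}
```

Nobody told the program the uncle fact directly. It follows from the two facts and the rule.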

Friday, April 14, 2006

Cyc Foundation announced at Cyclify meeting

Here's an announcement we posted today on the OpenCyc site.

Austin, TX. April 14, 2006 --

A new independent non-profit organization was announced Thursday night at the monthly meeting of Cyclify-Austin, the Cyc users' group. The Cyc Foundation is now forming to manage the OpenCyc ontology and to grow the ontology and knowledge base exponentially with the help of volunteers from all walks of life.

Foundation president, John De Oliveira, compared the Foundation's "Cyclify" effort to the Wikipedia project. He said, "The Wikimedia Foundation asks us to 'Imagine a world in which every single person is given free access to the sum of all human knowledge.' In the Cyclify project, led by The Cyc Foundation, we ask you to imagine a world in which every single person is given free access to programs that reason with the sum of all human knowledge."

PPT Slides of the Announcement at the Cyclify Meeting
(download first & follow along)


Podcast of the Announcement at the Cyclify Meeting
(duration 1 hr. 18 mins.)

Thursday, April 13, 2006

Portrait of a Past Future

Portraits of me and my 6 brothers and sisters were done by an oil painter while we were on a trip to Florida when I was about 7 years old. He purposefully painted us to look a few years older than we actually were at the time (I forget why). We never quite turned out looking that way. As a result, I've got an example of what someone in the past thought (a part of) the future would be like.