Data Strategy

July 27, 2007

From information extraction to semantic representation

Filed under: Information Retrieval, Search — chucklam @ 6:14 pm

I suppose it is a huge coincidence that on the day I posted videos of Powerset’s first public demo, I got this workshop announcement in my inbox:

MASTERING THE GAP
From Information Extraction to Semantic Representation

http://tev.itc.it/mtg2007.html
—————————————————————-
The Workshop focuses on the interface between the information extracted from content objects and the semantic layer in which this information is explicitly represented in the form of ontologies and their instances. The Workshop will provide an opportunity for discussing adequate methods, processes (pipelines) and representation formats for the annotation process.

Automating the process of semantic annotation of content objects is a crucial step for bootstrapping the Semantic Web. This process requires a complex flow of activities which combines competences from different areas. The Workshop will focus precisely on the interface between the information extracted from content objects (e.g., using methods from NLP, image processing, text mining, etc.) and the semantic layer in which this information is explicitly represented in the form of ontologies and their instances. The workshop will provide an opportunity for: discussing adequate methods, processes (pipelines) and representation formats for the annotation process; reaching a shared understanding with respect to the terminology in the area; discussing the lessons learned from projects in the area, and putting together a list of the most critical issues to be tackled by the research community to make further progress in the area.

Some of the possible challenges to be discussed at the workshop are:

  • How can ontological/domain knowledge be fed back into the extraction process?
  • How can the semantic layer be extended by the results of information extraction (e.g. ontology learning)?
  • What are the steps of an annotation process, and which steps can be standardized for higher flexibility? Which parts are intrinsically application-specific?
  • What are the requirements for formats for representing semantic annotations?
  • Where is the borderline of automation and how can it be further pushed?
  • How can the linking with the semantic layer be supported on the concept/schema level as well as on the instance level?
  • How can knowledge extracted from different sources with different tools and perhaps different reference ontologies (interoperability) be merged (semi-)automatically?
  • How can extraction technologies for different media (e.g. text and images) be combined and how can the merged extraction results be represented in order to create synergies?

I don’t have much expertise in information extraction or the Semantic Web, but the challenges listed seem quite appropriate for Powerset to look at.


1 Comment »

  1. You should take a look at http://www.linguisticagents.com. It’s a start-up company that has developed a natural language understanding technology that will be used in many applications in addition to search. This technology uses a deep parsing algorithm that is based on nano-syntax technology.

    Comment by Andy — July 29, 2007 @ 12:11 am

