I suppose it is a huge coincidence that on the day I posted videos of Powerset’s first public demo, I got this workshop announcement in my inbox:
MASTERING THE GAP
From Information Extraction to Semantic Representation
The Workshop focuses on the interface between the information extracted from content objects and the semantic layer in which this information is explicitly represented in the form of ontologies and their instances. The Workshop will provide an opportunity for discussing adequate methods, processes (pipelines) and representation formats for the annotation process.
Automating the process of semantic annotation of content objects is a crucial step for bootstrapping the Semantic Web. This process requires a complex flow of activities which combines competences from different areas. The Workshop will focus precisely on the interface between the information extracted from content objects (e.g., using methods from NLP, image processing, text mining, etc.) and the semantic layer in which this information is explicitly represented in the form of ontologies and their instances. The workshop will provide an opportunity for: discussing adequate methods, processes (pipelines) and representation formats for the annotation process; reaching a shared understanding with respect to the terminology in the area; discussing the lessons learned from projects in the area, and putting together a list of the most critical issues to be tackled by the research community to make further progress in the area.
Some of the possible challenges to be discussed at the workshop are:
- How can ontological/domain knowledge be fed back into the extraction process?
- How can the semantic layer be extended by the results of information extraction (e.g. ontology learning)?
- What are the steps of an annotation process, which steps can be standardized for higher flexibility? Which parts are intrinsically application-specific?
- What are the requirements towards formats for representing semantic annotations?
- Where is the borderline of automation and how can it be further pushed?
- How can the linking with the semantic layer be supported on the concept/schema level as well as on the instance level?
- How can knowledge extracted from different sources with different tools and perhaps different reference ontologies (interoperability) be merged (semi-)automatically?
- How can extraction technologies for different media (e.g. text and images) be combined and how can the merged extraction results be represented in order to create synergies?
I don’t have much expertise in information extraction or the Semantic Web, but the challenges listed seem quite appropriate for Powerset to look at.