BOA

Project page: http://aksw.org/Projects/BOA (an AKSW alumni project)

BOA is an iterative bootstrapping strategy for extracting RDF from unstructured data.

Related projects: DeFacto, LIMES

General Overview

Most knowledge sources on the Data Web were extracted from structured or semi-structured data. Thus, they encompass only a small fraction of the information available on the document-oriented Web. BOA is an iterative bootstrapping strategy for extracting RDF from unstructured data. The idea behind BOA is to use the Data Web as background knowledge for the extraction of natural language patterns that represent predicates found on the Data Web. These patterns are used to extract instance knowledge from natural language text. This knowledge is finally fed back into the Data Web, thereby closing the loop. We evaluated our approach on two data sets using DBpedia as background knowledge. Our results show that we can extract several thousand new facts in one iteration with very high accuracy. Moreover, we provide the first repository of natural language representations of predicates found on the Data Web.
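The bootstrapping loop described above can be illustrated with a schematic sketch: seed facts from the knowledge base are matched against text, the strings between matched entity pairs become patterns, and the patterns are then applied to extract new facts. All sentences, entity pairs, and the substring heuristic below are toy illustrations, far simpler than the actual BOA implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Schematic sketch of the BOA bootstrapping idea on toy data.
public class BoaSketch {

    // Step 1: for every seed pair (subject, object) from the background
    // knowledge, find sentences containing both entities and keep the
    // text between them as a natural-language pattern.
    static Set<String> extractPatterns(List<String[]> seeds, List<String> corpus) {
        Set<String> patterns = new HashSet<>();
        for (String[] seed : seeds) {
            for (String sentence : corpus) {
                int s = sentence.indexOf(seed[0]);
                int o = sentence.indexOf(seed[1]);
                if (s >= 0 && o > s) {
                    patterns.add(sentence.substring(s + seed[0].length(), o).trim());
                }
            }
        }
        return patterns;
    }

    // Step 2: apply the learned patterns to find new subject candidates,
    // which would then be fed back into the knowledge base.
    static List<String> extractSubjects(Set<String> patterns, List<String> corpus) {
        List<String> subjects = new ArrayList<>();
        for (String sentence : corpus) {
            for (String pattern : patterns) {
                int p = sentence.indexOf(" " + pattern + " ");
                if (p > 0) {
                    subjects.add(sentence.substring(0, p));
                }
            }
        }
        return subjects;
    }

    public static void main(String[] args) {
        // Toy seed facts for a "wrote" predicate (e.g. drawn from DBpedia).
        List<String[]> seeds = Arrays.asList(
                new String[] { "Goethe", "Faust" },
                new String[] { "Dante", "Inferno" });
        List<String> corpus = Arrays.asList(
                "Goethe wrote Faust over several decades.",
                "Dante wrote Inferno as part of the Divine Comedy.",
                "Kafka wrote The Trial but never finished it.");

        Set<String> patterns = extractPatterns(seeds, corpus);
        System.out.println(patterns);                          // [wrote]
        System.out.println(extractSubjects(patterns, corpus)); // [Goethe, Dante, Kafka]
    }
}
```

Note how the last sentence yields "Kafka" as a subject candidate that was not among the seeds; in BOA such newly extracted facts seed the next iteration.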

Presentation

The following presentation was held at EKAW 2012 in Galway:

The following presentation was held at the WeKEx workshop at ISWC 2011 in Bonn:

Generated Knowledge

The generated knowledge can be accessed at the BOA Dydra repository.

Library of Natural-Language Representations of Formal Relations

The results of the BOA approach can be downloaded in the form of a Lucene index. The patterns in this index were derived by applying DBpedia background knowledge to the English Wikipedia. The index was created as follows:

// Lucene 3.x index construction; `mapping`, `pattern`, and `writer`
// (an open IndexWriter) come from the surrounding BOA code.
Document doc = new Document();
// URI of the predicate the pattern represents
doc.add(new Field("uri", mapping.getProperty().getUri(), Field.Store.YES, Field.Index.NOT_ANALYZED));
// the pattern's natural-language representation
doc.add(new Field("nlr", pattern.getNaturalLanguageRepresentationWithoutVariables().trim(), Field.Store.YES, Field.Index.ANALYZED));
// the pattern's confidence score
doc.add(new NumericField("confidence", Field.Store.YES, true).setDoubleValue(pattern.getConfidence()));
writer.addDocument(doc);

You can query the index like this:

// match patterns whose natural-language representation contains the phrase ...
Query query1 = new TermQuery(new Term("nlr", searchPhrase));
// ... and whose confidence lies in [confidenceThreshold, 1]
Query query2 = NumericRangeQuery.newDoubleRange("confidence", confidenceThreshold, 1D, true, true);

BooleanQuery booleanQuery = new BooleanQuery();
booleanQuery.add(query1, BooleanClause.Occur.MUST);
booleanQuery.add(query2, BooleanClause.Occur.MUST);

ScoreDoc[] hits = indexSearcher.search(booleanQuery, 100).scoreDocs;

// print the top five matching patterns
for (int i = 0; i < hits.length && i < 5; i++) {
    Document hit = indexSearcher.doc(hits[i].doc);
    System.out.println(hit.get("uri"));
    System.out.println(hit.get("nlr"));
    System.out.println(hit.get("confidence"));
}

You can download this index here. Keep in mind that you need at least Lucene version 3.0. We applied very strict rules during pattern filtering, so only few patterns were actually generated. Also, no score constraints were applied to the patterns it contains, so the index also includes very weak patterns.
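Since no score constraints were applied when building the index, a consumer may want to filter the retrieved patterns by confidence on the client side before using them. A minimal sketch in plain Java (the pattern strings, confidence values, and threshold below are made up for illustration):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Client-side confidence filtering for retrieved patterns.
public class PatternFilter {

    // Keep only patterns at or above the threshold, best first.
    static List<String> filter(Map<String, Double> patterns, double threshold) {
        List<Map.Entry<String, Double>> kept = new ArrayList<>();
        for (Map.Entry<String, Double> e : patterns.entrySet()) {
            if (e.getValue() >= threshold) {
                kept.add(e);
            }
        }
        // sort by descending confidence
        kept.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Double> e : kept) {
            result.add(e.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        // toy pattern/confidence pairs, as they might come out of the index
        Map<String, Double> patterns = new LinkedHashMap<>();
        patterns.put("was born in", 0.87);
        patterns.put("'s birthplace ,", 0.42);
        patterns.put("once visited", 0.05);
        // drops the weak "once visited" pattern
        System.out.println(filter(patterns, 0.4)); // [was born in, 's birthplace ,]
    }
}
```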

Source code: http://code.google.com/p/boa/ (checkout: http://code.google.com/p/boa/source/checkout)
Maintainer: Dr. Daniel Gerber


Team: Prof. Dr. Axel-C. Ngonga Ngomo, Dr. Daniel Gerber, René Speck