

{"id":68,"date":"2021-10-29T11:53:11","date_gmt":"2021-10-29T09:53:11","guid":{"rendered":"https:\/\/project.inria.fr\/mikrolog\/?page_id=68"},"modified":"2023-09-27T10:55:06","modified_gmt":"2023-09-27T08:55:06","slug":"presentation","status":"publish","type":"page","link":"https:\/\/project.inria.fr\/mikrolog\/","title":{"rendered":"MiKroloG: The Microdata Knowledge Graph"},"content":{"rendered":"\n<h2 class=\"has-text-align-left wp-block-heading\">Context<\/h2>\n\n\n\n<p class=\"has-text-align-left\">Searching the web has changed our daily lives; documents on the web containing a list of keywords can be found in a snap. However, keyword searches often return many irrelevant documents, pushing users to refine their keyword list by trial and error. Users then wanted to find <a href=\"https:\/\/blog.google\/products\/search\/introducing-knowledge-graph-things-not\/\" data-type=\"URL\" data-id=\"https:\/\/blog.google\/products\/search\/introducing-knowledge-graph-things-not\/\">Things, not strings<\/a>. As a string, \u201cParis\u201d refers to the capital of France, but also to a city in Canada, a city in Arkansas, a movie, a band, a ship, a person, or a manuscript. Across languages, the ambiguity grows: in French, \u201cparis\u201d also means \u201cbets\u201d. Searching for Things means that a keyword query returns a Thing or a collection of Things instead of documents. This requires a knowledge graph (KG) storing entities that represent Things, i.e., instances of concepts such as a person, a place, or an event. For example, there is one entity for <a href=\"https:\/\/dbpedia.org\/page\/Paris\" data-type=\"URL\" data-id=\"https:\/\/dbpedia.org\/page\/Paris\">Paris<\/a> as a city and another for Paris as a manuscript of 1844. 
Thanks to these knowledge graphs, users who search for <a href=\"https:\/\/www.google.com\/search?q=movies+of+james+cameron\">movies of James Cameron<\/a> receive a list of movies in which James Cameron and each of his movies are Things, i.e., entities defined in the KG.<\/p>\n\n\n\n<p>However, searching the web and searching for Things involve a trade-off. Searching the web offers diversity at the price of noise; searching for Things delivers exact answers but loses diversity. Is there a way to have diversity without noise?<\/p>\n\n\n\n<h2 class=\"has-text-align-left wp-block-heading\">Objectives<\/h2>\n\n\n\n<h2 class=\"wp-block-heading\"><a href=\"https:\/\/docs.google.com\/presentation\/d\/1fAhDgP26qWp_Dm3F7nlIIi9RFWVKrQpU\/edit?usp=sharing&amp;ouid=102294686548638158772&amp;rtpof=true&amp;sd=true\"><strong>Slides of the CominLabs Days, 25-27 September 2023<\/strong><\/a><\/h2>\n\n\n\n<p>In MiKroloG, we aim to search the web with Things. As a KG query returns a collection of entities, these entities can be used to retrieve the web pages that refer to them. In this way, we can obtain both diversity and accuracy. For instance, we may search for websites selling \u201cJames Cameron movies\u201d, ordered by price and rating, or for experimental data on the COVID pandemic published by UK public universities, ordered by date. 
These queries first explore existing knowledge graphs to retrieve collections of Things such as &#8220;UK public universities&#8221; and &#8220;James Cameron movies&#8221;. Then, we determine which &#8220;commercial websites&#8221; or &#8220;university web pages&#8221; refer to these Things, i.e., we search the web with Things.<\/p>\n\n\n\n<p><strong>Searching the web with Things<\/strong> requires a close connection between the web of documents and knowledge graphs. Currently, this connection is partially provided by microdata embedded in web pages. About half of all web pages embed microdata that follow the <a href=\"https:\/\/schema.org\/\" data-type=\"URL\" data-id=\"https:\/\/schema.org\/\">Schema.org<\/a> ontology to describe people, places, organizations, events, products, and drugs. This represents billions of facts spread over millions of constantly evolving websites. <a href=\"https:\/\/datasetsearch.research.google.com\/\">Google Dataset Search<\/a> relies on microdata to search for datasets on the web, and <a href=\"https:\/\/shopping.google.com\/\" data-type=\"URL\" data-id=\"https:\/\/shopping.google.com\/\">Google Shopping<\/a> relies on microdata to feed its marketplace and search for products.<\/p>\n\n\n\n<p><strong>To search the web with Things<\/strong>, we face three main scientific challenges:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/project.inria.fr\/mikrolog\/files\/2023\/09\/Capture-de\u0301cran-2023-09-04-a\u0300-12.03.02.png\"><img loading=\"lazy\" decoding=\"async\" width=\"684\" height=\"432\" src=\"https:\/\/project.inria.fr\/mikrolog\/files\/2023\/09\/Capture-de\u0301cran-2023-09-04-a\u0300-12.03.02.png\" alt=\"\" class=\"wp-image-156\" srcset=\"https:\/\/project.inria.fr\/mikrolog\/files\/2023\/09\/Capture-de\u0301cran-2023-09-04-a\u0300-12.03.02.png 684w, https:\/\/project.inria.fr\/mikrolog\/files\/2023\/09\/Capture-de\u0301cran-2023-09-04-a\u0300-12.03.02-300x189.png 300w, 
https:\/\/project.inria.fr\/mikrolog\/files\/2023\/09\/Capture-de\u0301cran-2023-09-04-a\u0300-12.03.02-150x95.png 150w\" sizes=\"auto, (max-width: 684px) 100vw, 684px\" \/><\/a><\/figure>\n\n\n\n<ul class=\"wp-block-list\"><li>Users are accustomed to keyword search, but transforming a keyword query into a knowledge graph query is difficult, especially for complex queries.<\/li><li>As with traditional web search, users expect ranked results in a snap. Providing top-ranked results for complex queries over large knowledge graphs is very challenging.<\/li><li>Microdata can connect web pages to knowledge graph entities, but these links must first be computed by matching microdata descriptions to knowledge graph entities. Performing this entity matching at a large scale is challenging.<\/li><\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Proposal<\/h2>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/project.inria.fr\/mikrolog\/files\/2023\/09\/Capture-de\u0301cran-2023-09-17-a\u0300-23.03.57.png\"><img loading=\"lazy\" decoding=\"async\" width=\"882\" height=\"667\" src=\"https:\/\/project.inria.fr\/mikrolog\/files\/2023\/09\/Capture-de\u0301cran-2023-09-17-a\u0300-23.03.57.png\" alt=\"\" class=\"wp-image-182\" srcset=\"https:\/\/project.inria.fr\/mikrolog\/files\/2023\/09\/Capture-de\u0301cran-2023-09-17-a\u0300-23.03.57.png 882w, https:\/\/project.inria.fr\/mikrolog\/files\/2023\/09\/Capture-de\u0301cran-2023-09-17-a\u0300-23.03.57-300x227.png 300w, https:\/\/project.inria.fr\/mikrolog\/files\/2023\/09\/Capture-de\u0301cran-2023-09-17-a\u0300-23.03.57-768x581.png 768w, https:\/\/project.inria.fr\/mikrolog\/files\/2023\/09\/Capture-de\u0301cran-2023-09-17-a\u0300-23.03.57-150x113.png 150w\" sizes=\"auto, (max-width: 882px) 100vw, 882px\" \/><\/a><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>Context Searching the web 
has changed our daily lives; documents on the web containing a list of keywords can be found in a snap. Then,\u2026<\/p>\n<p> <a class=\"continue-reading-link\" href=\"https:\/\/project.inria.fr\/mikrolog\/\"><span>Continue reading<\/span><i class=\"crycon-right-dir\"><\/i><\/a> <\/p>\n","protected":false},"author":1754,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-68","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/project.inria.fr\/mikrolog\/wp-json\/wp\/v2\/pages\/68","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/project.inria.fr\/mikrolog\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/project.inria.fr\/mikrolog\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/project.inria.fr\/mikrolog\/wp-json\/wp\/v2\/users\/1754"}],"replies":[{"embeddable":true,"href":"https:\/\/project.inria.fr\/mikrolog\/wp-json\/wp\/v2\/comments?post=68"}],"version-history":[{"count":15,"href":"https:\/\/project.inria.fr\/mikrolog\/wp-json\/wp\/v2\/pages\/68\/revisions"}],"predecessor-version":[{"id":188,"href":"https:\/\/project.inria.fr\/mikrolog\/wp-json\/wp\/v2\/pages\/68\/revisions\/188"}],"wp:attachment":[{"href":"https:\/\/project.inria.fr\/mikrolog\/wp-json\/wp\/v2\/media?parent=68"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}