Swoogle

Paul issack minoltan
4 min read · Dec 29, 2020

I am not sure whether you have heard about Swoogle or not. In recent years, Swoogle has gained popularity within the Semantic Web community.

What is Swoogle?

Swoogle is often called a Semantic Web search engine. It is a crawler-based indexing and retrieval system for Semantic Web documents written in RDF or OWL, which typically carry an .rdf or .owl extension. It extracts metadata for each discovered document and computes relations between documents. The documents are also indexed by an information retrieval system that can use either character N-grams or URIs as terms, to find documents matching a user’s query or to compute the similarity among a set of documents. One of the properties Swoogle computes is ontology rank, a measure of the importance of a Semantic Web document.
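To get a feel for what character-N-gram indexing means here, below is a minimal Python sketch that matches a query term against a URI by N-gram overlap. The function names, trigram size, and scoring formula are my own illustration, not Swoogle’s actual implementation.

```python
# Minimal sketch (not Swoogle's code): index strings by character N-grams
# and score how similar a query term is to a term/URI found in a document.
from collections import Counter

def char_ngrams(text, n=3):
    """Break a term or URI into overlapping character N-grams."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def similarity(query, document, n=3):
    """Rough cosine-style overlap between two N-gram profiles."""
    q, d = char_ngrams(query, n), char_ngrams(document, n)
    shared = sum((q & d).values())
    norm = (sum(q.values()) * sum(d.values())) ** 0.5
    return shared / norm if norm else 0.0

# Example: score a short query term against a full URI found in an SWD.
print(similarity("foaf:Person", "http://xmlns.com/foaf/0.1/Person"))
```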

What does Wikipedia say?

“Swoogle was a search engine for Semantic Web ontologies, documents, terms and data published on the Web. Swoogle employed a system of crawlers to discover RDF documents and HTML documents with embedded RDF content. Swoogle reasoned about these documents and their constituent parts (e.g., terms and triples) and recorded and indexed meaningful metadata about them in its database.”

“Swoogle provided services to human users through a browser interface and to software agents via RESTful web services. Several techniques were used to rank query results inspired by the PageRank algorithm developed at Google but adapted to the semantics and use patterns found in semantic web documents.”

Introduction

Currently, the Semantic Web, in the form of RDF and OWL documents, is essentially a web universe parallel to the web of HTML documents. There is as yet no standard way for HTML (or even XHTML) documents to embed RDF and OWL markup or to reference it in a way that carries meaning. Semantic Web documents reference one another, as well as HTML documents, in meaningful ways. This situation makes it appropriate to design and build specialized Internet search engines customized for Semantic Web Documents (SWDs).

Two questions naturally follow: “what is the best way to index, digest and cache such documents?” and “is it possible to create a meaningful rank measure that uses link semantics?”

The system is intended to support human users as well as software agents and services. At this stage, human users are expected to be Semantic Web researchers and developers who are interested in accessing a significant fraction of the RDF and OWL documents found on the Web. Software APIs will support programs that need to find SWDs matching certain descriptions, e.g., those containing certain terms, similar to other SWDs, or using certain classes or properties.

The system consists of a database that stores metadata about the SWDs, two distinct web crawlers that locate new and modified SWDs, components that compute useful document metadata, components to compute semantic relationships among the SWDs, an N-Gram based indexing and retrieval engine, a simple user interface for querying the system, and agent/web service APIs to provide useful services.

We describe an algorithm, Ontology Rank, inspired by the PageRank algorithm, that is used to rank hits returned by the retrieval engine. This algorithm takes advantage of the fact that the graph formed by SWDs has a richer set of relations; in other words, the edges in this graph have explicit semantics. Some are defined or derivable from the RDF and OWL languages (e.g., imports, usesTerm, version, extends, etc.) and others by common ontologies (e.g., FOAF’s knows). We will also present some preliminary results summarizing the characteristics of the portion of the Semantic Web that our system has crawled and analyzed.
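To make that idea concrete, here is a small Python sketch of a rank computation that weights each link by its relation type before spreading rank PageRank-style. The relation weights, damping factor, and function are invented for illustration; this is not the published Ontology Rank algorithm.

```python
# Illustrative only: a weighted, PageRank-style rank over links between SWDs,
# where each link's weight depends on its semantic relation type.
RELATION_WEIGHT = {"imports": 1.0, "extends": 0.8, "usesTerm": 0.4}  # assumed weights

def ontology_rank(links, damping=0.85, iterations=30):
    """links: iterable of (source, relation, target) triples between SWDs."""
    docs = {d for s, _, t in links for d in (s, t)}
    rank = {d: 1.0 / len(docs) for d in docs}
    out_weight = {d: 0.0 for d in docs}
    for s, rel, _ in links:
        out_weight[s] += RELATION_WEIGHT.get(rel, 0.1)
    for _ in range(iterations):
        new = {d: (1 - damping) / len(docs) for d in docs}
        for s, rel, t in links:
            share = RELATION_WEIGHT.get(rel, 0.1) / out_weight[s]
            new[t] += damping * rank[s] * share
        rank = new
    return rank

links = [("a.owl", "imports", "b.owl"), ("c.rdf", "usesTerm", "b.owl")]
print(ontology_rank(links))
```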

Following are broad uses of Swoogle:

  1. Finding appropriate ontologies
  2. Finding instance data
  3. Studying the structure of the Semantic Web

Swoogle Architecture

Swoogle’s architecture has four major components:

  1. SWD discovery - The discovery component finds potential SWDs throughout the Web and keeps up-to-date information about them.
  2. Metadata creation - The metadata creation component caches a snapshot of an SWD and generates objective metadata about SWDs at both the syntax level and the semantic level.
  3. Data analysis - The data analysis component uses the cached SWDs and the created metadata to derive analytical reports, such as the classification of SWDs into ontologies (SWOs) and databases (SWDBs), the rank of SWDs, and the IR index of SWDs.
  4. Interface - The interface component focuses on providing data services to the Semantic Web community.

This architecture is data-centric and extensible: different components work on different tasks independently.
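As a rough illustration of that data-centric design, the sketch below models the four components as independent functions that read from and write to a single shared metadata store. All function names and data shapes are invented for illustration and do not reflect Swoogle’s actual code.

```python
# Schematic only (all names invented): the four components as independent
# stages that share one metadata store, mirroring the data-centric design.
def discover(store, seed_urls):
    """1. SWD discovery: record candidate documents."""
    for url in seed_urls:
        store[url] = {"status": "discovered"}

def create_metadata(store):
    """2. Metadata creation: attach syntax/semantic metadata to each SWD."""
    for url in store:
        store[url]["metadata"] = {"triples": 0, "terms": []}

def analyze(store):
    """3. Data analysis: derive reports such as a rank for each SWD."""
    for url in store:
        store[url]["rank"] = 1.0 / len(store)

def serve_query(store, keyword):
    """4. Interface: answer queries over the collected metadata."""
    return [url for url in store if keyword in url]

store = {}
discover(store, ["http://example.org/a.owl"])
create_metadata(store)
analyze(store)
print(serve_query(store, ".owl"))
```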

Reference: https://books.google.lk/books?id=-CcmoI2b2Z8C&pg=PA147&lpg=PA147&dq=swoogle&source=bl&ots=Y2AFOd4IPc&sig=ACfU3U3hU_nA9BMDBTDfdymBvjkL1ndGBg&hl=en&sa=X&ved=2ahUKEwjqg8nz_vHtAhX-yzgGHe_kCEkQ6AEwBnoECAcQAg#v=onepage&q=swoogle&f=false
