Call for book chapter proposals: Artificial Intelligence in Cybersecurity: The State of the Art

We are editing a book on Artificial Intelligence for Cybersecurity.

The new book will be entitled (tentatively)
Artificial Intelligence in Cybersecurity: The State of the Art
Publisher: IOS-Press, Series: Frontiers in Artificial Intelligence and Applications

and we are requesting book chapter proposals.

The scope is any research that includes both artificial neural networks (including deep learning) and cybersecurity; see, e.g., https://www.computer.org/publications/tech-news/trends/the-use-of-artificial-intelligence-in-cybersecurity

A book chapter shall be an overview of a line of work by the chapter authors based on two or more related publications in quality conferences or journals. The intention is that an extensive collection of such chapters will provide an overview of the whole field.

To contribute to the book, please provide a brief book chapter proposal to ksarker@bowiestate.edu consisting of the following:

  • Title of the chapter
  • List of chapter authors
  • A brief abstract (one paragraph)
  • The approximate number of pages
  • The list of already published conference or journal papers the chapter will be based on

We will notify contributors within three weeks whether their chapter will be included.

Further, please take note of the following:

  • We will do a light cross-review for feedback (since the material is based on already peer-reviewed publications)
  • Each contributing author must be available to lightly review at most one other chapter within four weeks.

We expect the publication of the book by the end of 2023.

We are looking forward to your contribution!

Editors
Dr. Kaushik Roy
Professor and Interim Chair
Department of Computer Science
Director, Center for Cyber Defense
Director, Cyber Defense and AI Lab
North Carolina A&T State University

Co-editor:
Dr. Kishor Datta Gupta
Assistant Professor, Clark Atlanta University

Co-editor:
Dr. Md Kamruzzaman Sarker
Assistant Professor, Bowie State University

Knowledge Graph from Wikipedia Category

A knowledge graph (or, more formally, an ontology) makes information easier to grasp. When we search Google for anything, we see well-structured information on the right side of the results page. For example, if we search Google for Albert Einstein, it shows:

 

[Image: Albert Einstein, Google search knowledge panel]


This panel contains facts, or axioms, such as:

  • Albert Einstein ---- Born ---- March 14, 1879, Ulm, Germany
  • Albert Einstein ---- Education ---- University of Zurich (1905), ETH Zürich (1896–1900)

These facts are the outcome of using a knowledge graph or, more technically, an ontology. Google built its knowledge graph by scraping information from across the web, much of it from Wikipedia.

Wikipedia is the largest hub of open information. Wikipedia’s articles are assigned to various categories (https://en.wikipedia.org/wiki/Category:Main_topic_classifications) according to their relatedness. For example, the Albert Einstein article is categorized under German inventors (among other categories). One of the parent categories of German inventors is Inventors by nationality.

[Image: Wikipedia article category hierarchy]

I was looking for an open-source knowledge graph close to the Wikipedia category hierarchy that I could use off the shelf. I found that DBpedia provides a Wikipedia category hierarchy via skos:broader relations, but not the exact one I was looking for. So I had to build the Wikipedia hierarchy knowledge graph from scratch.

I found two approaches to solve this: scrape the Wikipedia category pages, or use the Wikipedia data dump.

I tried scraping Wikipedia first, starting from the main category and then walking through its subcategories and pages until none were left. But this was a time-consuming process, as the program had to visit each category page to find its children and subsequently their children.

As the scraping process was time-consuming, and I needed to make it reproducible, I opted for the data dump: http://dumps.wikimedia.org/enwiki/latest/. It has all the information we need in SQL format. Among the dumps, two tables are of interest: page information and category information, stored in the page table and the categorylinks table.

 

Page/Articles:
Download: http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-page.sql.gz 
Information on page table: https://www.mediawiki.org/wiki/Manual:Page_table.
This table has around 49 million entries, as of January 20, 2020. 

CategoryLinks:
Download: http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-categorylinks.sql.gz 
Information on the category table: https://www.mediawiki.org/wiki/Manual:Categorylinks_table
This table has around 140 million entries, as of January 20, 2020. 
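Rather than loading these multi-gigabyte dumps into a MySQL server, one lightweight option is to stream them and pull tuples out of the INSERT statements. Below is a minimal Python sketch; the function name is my own, and the simple regex only handles straightforward rows (titles with unusual escaping would need a real tokenizer):

```python
import gzip
import re

# Each dump file contains lines like:
# INSERT INTO `page` VALUES (736,0,'Albert_Einstein',...),(12,0,'Anarchism',...);
# The first three columns of the page table are page_id, page_namespace,
# and page_title, which is all we need here.
TUPLE_RE = re.compile(r"\((\d+),(\d+),'((?:[^'\\]|\\.)*)'")

def iter_pages(dump_path):
    """Yield (page_id, page_namespace, page_title) from enwiki-latest-page.sql.gz."""
    with gzip.open(dump_path, "rt", encoding="utf-8", errors="replace") as f:
        for line in f:
            if not line.startswith("INSERT INTO"):
                continue
            for m in TUPLE_RE.finditer(line):
                yield int(m.group(1)), int(m.group(2)), m.group(3)
```

The same pattern works for the categorylinks dump, with the column layout adjusted accordingly.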

The page table gives us the page_id and page_title. It also indicates the page type via the page_namespace column: 14 for category pages and 0 for article pages.

The categorylinks table provides the actual hierarchical information. Its columns cl_from and cl_to hold the relation: cl_from is the page id of the article or subcategory, and cl_to is the title of the parent category.

Let’s see some examples. If we want to get the page id of the Albert Einstein article, we can execute:

SELECT page_id, page_title, page_namespace FROM page WHERE page_title='Albert_Einstein' AND page_namespace=0;

Then, if we want the categories of this page (its page_id is 736), we can execute:

SELECT cl_from, cl_to FROM categorylinks WHERE cl_from=736;

This returns around 148 rows. A snippet of the output shows that German_inventors is among the categories.

To get the parent categories of German_inventors, we first look up the page_id of the German_inventors category page:

SELECT page_id, page_title, page_namespace FROM page WHERE page_title='German_inventors' AND page_namespace=14;

After getting the page_id (1033282), we can look up its parent categories:

SELECT cl_from, cl_to FROM categorylinks WHERE cl_from=1033282;

We need to continue this back-and-forth lookup until we have traversed the whole category hierarchy.
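This back-and-forth lookup amounts to a breadth-first walk over the category graph. Here is a sketch, assuming the page and categorylinks tables have been imported into a local SQLite file with the dump's column names (the function name and database layout beyond those columns are my own illustration):

```python
import sqlite3
from collections import deque

def ancestor_categories(db_path, page_id):
    """Collect all transitive parent categories of a page via BFS.

    The visited set is essential: the Wikipedia category graph
    contains cycles, so a naive recursion would never terminate.
    """
    conn = sqlite3.connect(db_path)
    cur = conn.cursor()
    seen, queue = set(), deque([page_id])
    while queue:
        current = queue.popleft()
        # cl_to stores the parent category *title*
        rows = cur.execute(
            "SELECT cl_to FROM categorylinks WHERE cl_from = ?",
            (current,)).fetchall()
        for (cat_title,) in rows:
            if cat_title in seen:
                continue
            seen.add(cat_title)
            # Map the category title back to its page id (namespace 14)
            # so we can look up *its* parents in the next round.
            row = cur.execute(
                "SELECT page_id FROM page "
                "WHERE page_title = ? AND page_namespace = 14",
                (cat_title,)).fetchone()
            if row:
                queue.append(row[0])
    conn.close()
    return seen
```

Starting from page_id 736 (Albert Einstein), this would gather German_inventors, then Inventors_by_nationality, and so on up the hierarchy.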

 

Making a concrete knowledge graph

Using the OWLAPI, Apache Jena, or Owlready2 libraries, we can easily build a concrete knowledge graph. One caveat: the Wikipedia hierarchy has cycles. For example, it contains both

1949_establishments_in_Asia childCategoryOf 1949_establishments_in_India
and 
1949_establishments_in_India childCategoryOf 1949_establishments_in_Asia

which creates a cyclic relation. The Owlready2 library treats each concept as a Python class, and Python class inheritance does not allow cycles, so Owlready2 cannot handle this (as of January 20, 2020). The OWLAPI and Jena libraries can. Here is the code to create a single fact/axiom using the OWLAPI:

// Adds one subclass axiom (child SubClassOf parent) to the ontology.
void createRelation(String childName, String parentName) {
    // Build IRIs for the child and parent category names.
    IRI cIRI = IRI.create(onto_prefix + beautifyName(childName));
    IRI pIRI = IRI.create(onto_prefix + beautifyName(parentName));
    OWLClass cClass = owlDataFactory.getOWLClass(cIRI);
    OWLClass pClass = owlDataFactory.getOWLClass(pIRI);
    // Assert child SubClassOf parent and add it to the ontology.
    OWLAxiom owlAxiom = owlDataFactory.getOWLSubClassOfAxiom(cClass, pClass);
    owlOntologyManager.addAxiom(owlOntology, owlAxiom);
}
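Each createRelation call above boils down to a single rdfs:subClassOf triple, and a flat triple file (N-Triples) has no trouble representing cycles. Here is a dependency-free Python sketch of the same idea; the namespace prefix is an assumed placeholder, and beautify_name mirrors the (unshown) Java helper with assumed behavior:

```python
# Emit subclass axioms as N-Triples lines, which Jena or the OWLAPI
# can load. Cycles pose no problem: the output is just a set of triples.
ONTO_PREFIX = "http://example.org/wiki-category#"  # assumed namespace
SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

def beautify_name(name):
    # Assumed behavior: make the category title IRI-safe.
    return name.strip().replace(" ", "_")

def create_relation(child_name, parent_name):
    """Return one child rdfs:subClassOf parent axiom as an N-Triples line."""
    return "<%s%s> <%s> <%s%s> ." % (
        ONTO_PREFIX, beautify_name(child_name),
        SUBCLASS_OF,
        ONTO_PREFIX, beautify_name(parent_name))

# The cyclic pair from above is simply two triples:
triples = {
    create_relation("1949_establishments_in_India", "1949_establishments_in_Asia"),
    create_relation("1949_establishments_in_Asia", "1949_establishments_in_India"),
}
```

Writing these lines to a .nt file gives a graph any OWL/RDF toolchain can consume, which is why the triple-based OWLAPI and Jena approaches tolerate the cycles that break Owlready2's class-based model.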

After building it, the knowledge graph has:

  • Total axioms: 7,864,012
  • Total classes: 1,901,708
  • Total subClassOf axioms: 5,962,304

Here is a screenshot of the knowledge graph:

[Image: Knowledge graph built from the Wikipedia category hierarchy]

The complete knowledge graph can be downloaded from here.
Making a knowledge graph is fun!

Thanks:

  1. https://databus.dbpedia.org/dbpedia/generic/categories/2019.08.30
  2. https://kodingnotes.wordpress.com/2014/12/03/parsing-wikipedia-page-hierarchy/