Open PhD Theses

If you are looking for a PhD thesis proposal, here is a list of open PhD thesis topics available under the supervision of Prof. Andrea G. B. Tettamanzi within the WIMMICS research team of INRIA Sophia Antipolis - Méditerranée and I3S (CNRS and Université Nice Sophia Antipolis).

Active Learning for Axiom Discovery

Context

The rise of deep learning has been made possible by the availability of large amounts of computing power at affordable prices. Its performance has made it possible to solve problems that previously seemed out of reach, especially in the field of perception, such as computer vision and the analysis of natural language texts. However, tasks that involve reasoning still require symbolically represented knowledge. Substantial breakthroughs can be achieved by combining deep learning with symbolic reasoning. What hinders these developments is mainly the cost of building knowledge bases rich not only in factual information, which is relatively abundant and easy to capture, but also in rules, constraints, and relationships (in summary, axioms) that make it possible to infer implicit knowledge by reasoning.

The definition of the standards that collectively go under the name of “semantic Web” has provided a technological framework to produce open data as well as to define vocabularies and ontologies to make those data interoperable. Nowadays, a huge mass of machine-readable knowledge is available on the semantic Web, which opens up enormous opportunities for research. An obvious thing to do is to analyze it and learn new knowledge from it. Potential applications range from bio-informatics to computational finance.

Objectives

The main goal of this thesis is to combine symbolic reasoning and active learning to make the automatic discovery of axioms possible, thus helping to overcome the knowledge acquisition bottleneck, while radically changing the way we look at the semantic Web: instead of postulating an a priori conceptualization of reality (i.e., an ontology) and requiring that our knowledge about facts complies with it, we propose to start from collected observations about facts and learn an ontology which is able to account for them.

Discovering new axioms from a knowledge base containing both axioms (background knowledge) and assertions (facts) may be regarded as a sort of generate-and-test procedure, whereby candidate axioms are generated following some heuristics and then tested to determine whether they are compatible with the facts recorded in the knowledge base and consistent with the background knowledge.

The main problem is that testing a candidate axiom requires reasoning with the knowledge base plus the axiom, which can be computationally very expensive. Therefore, testing every candidate axiom would be prohibitive. A way to overcome this problem is to learn a model capable of predicting whether a candidate axiom will fit the knowledge base or not, as a surrogate for reasoning. Reasoning, however, remains an option which can be used, every once in a while, as an “oracle” to classify those candidate axioms for which the trained model has a hard time making a reliable prediction. This looks like a perfect scenario for applying active learning. Indeed, the intuition behind active learning is that a machine learning algorithm trained on little labeled data can improve its results if it is allowed to choose which data to use during the learning process. The general setting can be described as follows: (i) the learner has a small amount of labeled data, which it uses to construct an initial model; (ii) a large set of unlabeled data is also available; (iii) an “oracle” can be asked to associate labels with some of the unlabeled data. The main problem is then to determine when and for which data the learner will ask for a label, the aim being to revise the model in order to improve it. In this thesis, the reasoner will take the place of the “oracle” and the new axioms to be tested will take the place of the set of unlabeled data.
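
The following Python sketch shows the general shape of such a loop, based on pool-based uncertainty sampling. Every name in it (encode, reasoner_accepts, the choice of a logistic-regression surrogate) is an illustrative assumption, not part of an existing system: it only shows how the reasoner could be confined to the role of an oracle that is queried on the least certain candidate axioms.

    # Sketch of pool-based active learning for axiom testing (illustrative only).
    # encode() maps a candidate axiom to a feature vector and reasoner_accepts()
    # calls the reasoner; both are hypothetical placeholders, not an existing API.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def active_axiom_learning(candidates, encode, reasoner_accepts,
                              n_seed=20, n_rounds=10, batch=5):
        pool = list(candidates)
        # Seed set labeled by the reasoner (assumed to contain both accepted
        # and rejected axioms, so that a binary classifier can be trained).
        labeled = pool[:n_seed]
        labels = [reasoner_accepts(a) for a in labeled]
        pool = pool[n_seed:]
        model = LogisticRegression()
        for _ in range(n_rounds):
            model.fit(np.array([encode(a) for a in labeled]), labels)
            if not pool:
                break
            # Uncertainty sampling: query the reasoner only on the axioms whose
            # predicted probability of acceptance is closest to 0.5.
            proba = model.predict_proba(np.array([encode(a) for a in pool]))[:, 1]
            most_uncertain = np.argsort(np.abs(proba - 0.5))[:batch]
            for i in sorted(most_uncertain, reverse=True):
                axiom = pool.pop(i)
                labeled.append(axiom)
                labels.append(reasoner_accepts(axiom))  # expensive oracle call
        return model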

Organization

The thesis will be carried out in the SPARKS research group of the I3S Laboratory, located in the technology park of Sophia Antipolis, on the French Riviera/Côte d’Azur, famous worldwide as a landmark of science, invention, innovation, and research.

The thesis will be jointly supervised by Andrea G. B. Tettamanzi and Célia da Costa Pereira.

Bibliographical References

  1. Thu Huong Nguyen, Andrea G. B. Tettamanzi: Learning Class Disjointness Axioms Using Grammatical Evolution. EuroGP 2019: 278-294
  2. Duc Minh Tran, Claudia d'Amato, Binh Thanh Nguyen, Andrea G. B. Tettamanzi: Comparing Rule Evaluation Metrics for the Evolutionary Discovery of Multi-relational Association Rules in the Semantic Web. EuroGP 2018: 289-305
  3. Dario Malchiodi, Andrea G. B. Tettamanzi: Predicting the possibilistic score of OWL axioms through modified support vector clustering. SAC 2018: 1984-1991
  4. Dario Malchiodi, Célia da Costa Pereira, Andrea G. B. Tettamanzi: Predicting the Possibilistic Score of OWL Axioms Through Support Vector Regression. SUM 2018: 380-386
  5. Andrea G. B. Tettamanzi, Catherine Faron-Zucker, Fabien Gandon: Possibilistic testing of OWL axioms against RDF data. Int. J. Approx. Reasoning 91: 114-130 (2017)
  6. Duc Minh Tran, Claudia d'Amato, Binh Thanh Nguyen, Andrea G. B. Tettamanzi: An evolutionary algorithm for discovering multi-relational association rules in the semantic web. GECCO 2017: 513-520
  7. Claudia d'Amato, Andrea G. B. Tettamanzi, Duc Minh Tran: Evolutionary Discovery of Multi-relational Association Rules from Ontological Knowledge Bases. EKAW 2016: 113-128
  8. Claudia d'Amato, Steffen Staab, Andrea G. B. Tettamanzi, Duc Minh Tran, Fabien L. Gandon: Ontology enrichment by discovering multi-relational association rules from ontological knowledge bases. SAC 2016: 333-338
  9. Andrea G. B. Tettamanzi, Catherine Faron-Zucker, Fabien L. Gandon: Dynamically Time-Capped Possibilistic Testing of SubClassOf Axioms Against RDF Data to Enrich Schemas. K-CAP 2015: 7:1-7:8
  10. Andrea G. B. Tettamanzi, Catherine Faron-Zucker, Fabien L. Gandon: Testing OWL Axioms against RDF Facts: A Possibilistic Approach. EKAW 2014: 519-530
  11. Burr Settles. Active learning literature survey. Technical report, University of Wisconsin Madison Department of Computer Sciences, 2009

Learning Ontologies from Linked Open Data

Context

The semantic Web has come of age. The first massive deployment of its concepts is what is called Linked Open Data (LOD). Yet, LOD covers but the data layer of the semantic Web, whose data model is the Resource Description Framework (RDF). As of today, billions of RDF triples are available on the Web. They can be queried by means of a specialized query language, SPARQL, through a number of SPARQL endpoints.
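
As an illustration of how such endpoints are accessed, the short Python snippet below (a minimal sketch using the SPARQLWrapper library; the DBpedia endpoint and the query are only examples) retrieves the ten classes with the largest number of instances:

    # Query a public SPARQL endpoint (here DBpedia) for the most populated classes.
    from SPARQLWrapper import SPARQLWrapper, JSON

    endpoint = SPARQLWrapper("https://dbpedia.org/sparql")
    endpoint.setQuery("""
        SELECT ?class (COUNT(?s) AS ?n) WHERE {
            ?s a ?class .
        } GROUP BY ?class ORDER BY DESC(?n) LIMIT 10
    """)
    endpoint.setReturnFormat(JSON)
    results = endpoint.query().convert()

    for row in results["results"]["bindings"]:
        print(row["class"]["value"], row["n"]["value"])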

The existence of the LOD opens up enormous opportunities for research. The LOD is often regarded as a giant knowledge base, not just raw data. Now that it has reached the status of "big RDF data", an obvious thing to do is to analyze it and extract "smart data" from it, i.e., learn new knowledge from it.

The common approach to the semantic Web places a strong emphasis on a principled conceptual analysis of a domain of interest, leading to the construction or reuse of ontologies (written in OWL 2, a knowledge representation language specifically designed for the semantic Web) as a prerequisite step for the organization of the LOD, much like a database schema must be designed before a database can be populated. This follows a time-honored tradition of knowledge engineering in artificial intelligence.

This approach has some limitations: it is aprioristic and dogmatic about how knowledge should be organized; while it is quite successful when applied to specific domains, it does not scale well to more general settings; it does not lend itself to a collaborative effort; and so on.

Objectives

The aim of this research is to radically change the way we look at the semantic Web: instead of postulating an a priori conceptualization of reality (i.e., an ontology) and requiring that our knowledge about facts complies with it, we propose to start from collected observations about facts and learn an ontology which is able to account for them. This is in many respects similar to how modern Science proceeds, trying to discover natural laws that explain empirical observations. The main research question addressed in this PhD thesis is therefore

How can we learn OWL 2 ontologies from RDF data in an open world?

In a sense, ontology learning may be classified as a special case of knowledge discovery from data (KDD) or "data mining" (in fact, we might refer to it as "RDF mining"), where the data are in the form of RDF triples and knowledge is to be described in terms of OWL 2 axioms. This offers the opportunity of building upon the large body of methods and techniques that have been proposed in the literature on KDD, including inductive logic programming and statistical relational learning. However, at the same time its scope is larger, for its objective is not just to find "actionable" patterns, but something more general, such as a schema or an explanation of the recorded facts.

Discovering axioms from a finite, albeit huge, set of known facts also raises philosophical problems that touch upon the discipline of epistemology. The task may be regarded as a form of inductive reasoning, in that it proceeds from particular instances of concepts and relations (the RDF triples) to broader generalizations (the OWL 2 axioms). Karl Popper is the philosopher who has made the most valuable contribution to our understanding of the problem of induction, by proposing the principle of falsifiability, which lies at the foundation of his critical rationalism and of the modern scientific method: all knowledge is provisional, conjectural, hypothetical; we can never finally prove our scientific theories, we can merely (provisionally) corroborate or (conclusively) refute them. Popper solved the problem of induction by describing our never-ending quest for knowledge as an evolutionary process, whereby theories are generated through a process which lies outside logic and are then subjected to rigorous and severe tests; only theories that manage to successfully overcome such tests survive. In a sense, this is the continuation of a natural process, namely natural selection, in which theories are embodied in organisms and organisms that follow "incorrect" theories become extinct.

We propose, therefore, to draw inspiration from Popper's evolutionary approach to epistemology and to imitate it by using an evolutionary algorithm to explore the set of OWL 2 axioms in search of the ones that are best suited to describe the recorded RDF facts.
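
The sketch below (Python) illustrates the general shape of such an evolutionary search; random_axiom, mutate, crossover, and fitness are placeholders for the components to be designed during the thesis (e.g., a grammar-based generator of candidate SubClassOf axioms and a fitness function that scores an axiom against the RDF facts):

    # Skeleton of an evolutionary search over candidate OWL 2 axioms (illustrative).
    import random

    def evolve_axioms(fitness, random_axiom, mutate, crossover,
                      pop_size=100, generations=50, tournament_size=3,
                      mutation_rate=0.2):
        population = [random_axiom() for _ in range(pop_size)]
        for _ in range(generations):
            scored = [(fitness(a), a) for a in population]

            def select():
                # Tournament selection: the best axiom out of a small random sample.
                sample = random.sample(scored, tournament_size)
                return max(sample, key=lambda pair: pair[0])[1]

            offspring = []
            while len(offspring) < pop_size:
                child = crossover(select(), select())
                if random.random() < mutation_rate:
                    child = mutate(child)
                offspring.append(child)
            population = offspring
        return max(population, key=fitness)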

In addition, since even the most comprehensive RDF repository contains a finite set of facts, while the real world is virtually infinite, only a tiny fraction of reality is captured by RDF facts. Therefore, the knowledge base represented by an RDF repository is incomplete and the open-world assumption must be adopted. Moreover, given the heterogeneous and collaborative character of the LOD, some facts recorded in an RDF repository may be erroneous; therefore, the epistemic uncertainty of axioms extracted from a set of RDF facts must be accounted for (i.e., represented and dealt with). Various mathematical formalisms for dealing with uncertainty will be critically compared in this context. Among them, we may cite probability theory, Dempster-Shafer theory of evidence (and its elaborations, such as the transferable belief model), and possibility theory.
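
To make the counts involved concrete, the sketch below (using the rdflib library) shows how confirmations and counterexamples of a candidate SubClassOf axiom could be gathered from an RDF graph. The ratio it computes is only an illustrative stand-in, not an actual possibilistic or probabilistic score, and treating missing type assertions as counterexamples is precisely the closed-world shortcut that a proper treatment of uncertainty under the open-world assumption must avoid:

    # Naive test of a SubClassOf(sub, sup) axiom against RDF facts (illustrative).
    from rdflib import Graph, RDF, URIRef

    def test_subclassof(graph: Graph, sub: URIRef, sup: URIRef):
        instances = set(graph.subjects(RDF.type, sub))   # support of the axiom
        confirmations = sum(1 for i in instances if (i, RDF.type, sup) in graph)
        # Naive closed-world reading: anything not asserted counts against the axiom.
        counterexamples = len(instances) - confirmations
        score = confirmations / len(instances) if instances else None
        return confirmations, counterexamples, score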

Organization

The thesis will take place within the WIMMICS research team of the I3S laboratory, which specializes in semantic Web models and technologies and in artificial intelligence, under the joint supervision of Andrea G. B. Tettamanzi and Catherine Faron Zucker.

The research activities will proceed according to the following plan:

  1. Familiarization with semantic Web and LOD models and technologies
  2. State of the art on ontology learning, especially inductive logic programming approaches
  3. Development of a method for OWL 2 axiom evaluation
  4. Development of an evolutionary algorithm for OWL 2 ontology learning
  5. Experiments and validation of the proposed methods

Bibliographical References

  1. Alexandre Delteil, Catherine Faron-Zucker, and Rose Dieng. "Learning ontologies from RDF annotations". IJCAI'2001 Workshop on Ontology Learning, CEUR-WS.org, 2001.
  2. Daniel Fleischhacker, Johanna Völker, and Heiner Stuckenschmidt. "Mining RDF data for property axioms". OTM 2012.
  3. Sebastian Hellmann, Jens Lehmann, and Sören Auer. Learning of owl class descriptions on very large knowledge bases. Int. J. Semantic Web Inf. Syst., 5(2):25–48, 2009.
  4. Jens Lehmann. Dl-learner: Learning concepts in description logics. Journal of Machine Learning Research, 10:2639–2642, 2009.
  5. Alexander Maedche and Steffen Staab. "Ontology learning". In Handbook on Ontologies, International Handbooks on Information Systems, pages 173–190. Springer, Berlin, 2004.
  6. S. Muggleton, L. De Raedt, D. Poole, I. Bratko, P. Flach, K. Inoue, and A. Srinivasan. ILP turns 20: Biography and future challenges. Machine Learning, 86:3–23, 2012.
  7. Karl Popper. Logik der Forschung. Verlag von Julius Springer, Vienna, 1935
  8. Karl Popper. Objective Knowledge: An Evolutionary Approach. Oxford University Press, Oxford, 1972.
  9. Gerd Stumme, Andreas Hotho, and Bettina Berendt. Semantic web mining: State of the art and future directions. Journal of Web Semantics, 4(2):124–143, 2006.
  10. Andrea Tettamanzi and Marco Tomassini. Soft Computing: Integrating evolutionary, neural, and fuzzy systems. Springer, Berlin, 2001.

An Approach to Requirements Engineering Based on a Possibilistic BDI Model

The Web has proven to be a catalyst for innovation in terms of architecture, protocols, paradigms, and programming languages that have profoundly changed the way we build and use computer applications. It has given a tremendous boost to remote collaborative work, and more and more tools have emerged, such as forums, blogs, and wikis. In the field of software development, online collaborative development platforms, such as Launchpad, or platforms for collecting user requirements, such as OneDesk, are examples of initiatives in which more and more end users of IT applications freely express their views and needs. At the same time, crowdsourcing approaches, such as crowdfunding, crowdvoting, or blogsourcing, have proliferated. It is now possible to take advantage of easy access to a large number of people to achieve a given objective.

In this context, this thesis addresses the following question: "How can we exploit the wealth of data and interactions made available on the Web to capture and translate users' needs into technical solutions and, more specifically, how can we leverage the Web to manage these requirements?"

Platforms for collaborative online development make it possible to share a large number of varied artifacts supporting the requirements engineering process. These artifacts, from the less structured, like conversation threads, to the more structured, like user stories, need to be organized. Tools are needed to enable stakeholders to access these numerous requirements and manipulate them throughout the software development lifecycle, while ensuring the coherence of the whole.

In this context, we propose an approach that consists in establishing a correspondence between a beliefs-desires-intentions (BDI) model based on possibility theory, recently developed within the theory of cognitive agents [1], and the requirements engineering process [5]. This correspondence will be leveraged to formally model the decompositions and dependencies between requirements and to formulate a method for resolving conflicts among the requirements of different users and for prioritizing them.

The beliefs of the BDI model represent the assumptions about the problem addressed by the software under development, namely knowledge about the business domain (such as business rules) and the context of the development project [6]. Logical approaches to the revision and merging of beliefs, including approaches based on argumentation theory [3], will be used to merge beliefs from heterogeneous sources and obtain a consistent set of requirements.

The needs expressed by the various stakeholders correspond to the desires of the BDI model. A distinctive feature of desires is that they can be contradictory and inconsistent. In addition, we know from the BDI model that desires can have different degrees of justification, which are the result of a deliberative process.

The intentions (or goals) of the BDI model are a subset of desires which is consistent, both internally and with the beliefs. In the context of requirements engineering, goals represent the requirements deduced from the needs freely expressed by the stakeholders. This is where the methods developed in the framework of the BDI model can intervene, enabling conflict resolution and the selection of a maximal consistent set of requirements that takes into account both the needs expressed by the stakeholders (desires) and the contextual elements of the project and the business domain (beliefs) [2].
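
As a toy illustration of this selection step (a sketch only: the possibilistic BDI model of [1, 2] handles graded beliefs and a proper logical notion of consistency, whereas here conflicts are given explicitly and the selection is a simple greedy pass), requirements can be seen as items weighted by their degree of justification, from which a conflict-free subset is chosen:

    # Toy goal selection: build a consistent set of requirements, preferring
    # the most justified ones. Names and the greedy strategy are illustrative.

    def select_goals(desires, conflicts, infeasible):
        """desires: dict mapping a requirement to its justification degree in [0, 1]
           conflicts: set of frozensets {r1, r2} of mutually inconsistent requirements
           infeasible: requirements ruled out by beliefs about the project context"""
        goals = set()
        for req in sorted(desires, key=desires.get, reverse=True):
            if req in infeasible:
                continue                      # contradicted by the beliefs
            if any(frozenset({req, g}) in conflicts for g in goals):
                continue                      # would make the goal set inconsistent
            goals.add(req)
        return goals

    # Two stakeholders express requirements that cannot hold together:
    desires = {"single_sign_on": 0.9, "offline_mode": 0.7, "no_external_idp": 0.6}
    conflicts = {frozenset({"single_sign_on", "no_external_idp"})}
    print(select_goals(desires, conflicts, infeasible=set()))
    # -> {'single_sign_on', 'offline_mode'}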

A first step in this direction has been taken using argumentation theory to formally model the decompositions and dependencies that may exist between requirements expressed as goals [4]. Argumentation theory thus makes it possible to identify consistent subsets of requirements, as well as their coupling with the different categories of users.

The Linking Open Data (LOD) project, which aims at building the Web of open linked data by publishing various open data sources on the Web in RDF and adding RDF links between data from these different sources, now makes it possible to enrich knowledge on a given subject with knowledge available online. Within the framework of requirements engineering, such an enrichment makes it possible to reason over knowledge that is, for the most part, little formalized a priori (user stories, blogs, ...). Semantic Web techniques and languages [7] will make it possible to aggregate different sources and domains of information in order to broaden the search for information and/or connections; the theory of cognitive agents and argumentation theory will be used to reason over these extended sets of knowledge.

The ultimate goal of this research is to propose a decision-support tool for the negotiation and prioritization of requirements. Based on the many and varied artifacts exploited throughout the software development process, this tool should provide those in charge of software development with instruments for steering requirements through a global, coherent, and up-to-date view of the software under development.

Organization

The thesis will take place in the WIMMICS research team of the I3S laboratory under the joint supervision of Andrea G. B. Tettamanzi and Isabelle Mirbel.

References

  1. Célia da Costa Pereira, Andrea G. B. Tettamanzi. "An Integrated Possibilistic Framework for Goal Generation in Cognitive Agents". AAMAS 2010.
  2. Célia da Costa Pereira, Andrea G. B. Tettamanzi. "Belief-Goal Relationships in Possibilistic Goal Generation". ECAI 2010.
  3. Célia da Costa Pereira, Andrea G. B. Tettamanzi, Serena Villata. "Changing One's Mind: Erase or Rewind? Possibilistic Belief Revision with Fuzzy Argumentation Based on Trust". IJCAI 2011.
  4. Isabelle Mirbel, Serena Villata. "Enhancing Goal-based Requirements Consistency: an Argumentation-based Approach". CLIMA 2012.
  5. Klaus Pohl. Requirements Engineering: Fundamentals, Principles, and Techniques. Springer, 2011.
  6. Isabelle Mirbel. "A polymorphic context frame to support scalability and evolvability of information system development processes". ICEIS 2004.
  7. Fabien Gandon, Catherine Faron-Zucker, Olivier Corby. Le web sémantique : Comment lier les données et les schémas sur le web ? Dunod, 2012.