Deep dive

Understand how the HBP Knowledge Graph works
The mechanisms "under the hood"

What makes the HBP Knowledge Graph run?

The HBP Knowledge Graph provides tools and APIs that hide the complexity of semantic, multi-datasource meta-data management. Several technical components work together to make this possible, and this page gives a short tour of them.

If you have any questions / suggestions / comments, please get in touch with us!

The components

The building blocks of the HBP Knowledge Graph

The HBP Knowledge Graph is based on BlueBrain Nexus, which provides a multi-modal solution for an eventually consistent data store.

The components of the HBP KG originate either from BlueBrain Nexus (blue boxes) or from extensions built and/or integrated by HBP (yellow boxes). The diagram shows write operations (blue arrows; asynchronous ones are dotted) and read operations (green arrows). The arrows indicate the direction of the data flow.

BlueBrain Nexus

As part of BlueBrain Nexus, Apache Cassandra stores an event log of JSON-LD messages and is the primary storage component. The built-in indexing mechanism then indexes the JSON-LD into multiple index databases: Blazegraph serves as a triple store, and Elasticsearch is used for full-text queries. Since the indexing mechanism works asynchronously, these databases are eventually consistent.
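To illustrate what eventual consistency means here, the following minimal sketch replays an append-only event log into two independent indices. It uses in-memory stand-ins only; the class names `EventLog` and `Index` are hypothetical and do not correspond to Nexus, Cassandra, Blazegraph or Elasticsearch APIs.

```python
class EventLog:
    """Append-only log standing in for the primary event store."""

    def __init__(self):
        self.events = []

    def append(self, doc_id, payload):
        self.events.append((doc_id, payload))


class Index:
    """A secondary index that replays the log from its own offset."""

    def __init__(self, log):
        self.log = log
        self.offset = 0       # position of the next event to process
        self.documents = {}

    def catch_up(self):
        # A real indexer would run this continuously in the background.
        for doc_id, payload in self.log.events[self.offset:]:
            self.documents[doc_id] = payload
        self.offset = len(self.log.events)


log = EventLog()
graph_index = Index(log)
text_index = Index(log)

log.append("dataset-1", {"name": "Example dataset"})
graph_index.catch_up()

# graph_index already sees the write, text_index does not yet.
stale = "dataset-1" not in text_index.documents

text_index.catch_up()
# Once both indices have caught up, they agree again.
consistent = graph_index.documents == text_index.documents
```

Between the two `catch_up` calls, a reader may get different answers depending on which index it asks. That window is the price of asynchronous indexing.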

HBP extensions

To achieve the described solution, additional services were built around the standard Nexus infrastructure:

KG Query API

An additional indexing client normalizes the incoming payload (full qualification), executes inference logic, indexes the data in ArangoDB and interprets semantics (e.g. it recognizes spatial anchoring payloads, rasterizes them and indexes them in an additional Apache Solr index).
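As a rough illustration of the normalization step, the sketch below expands prefixed keys of a JSON-LD payload into fully qualified IRIs using the payload's own `@context`. It is a simplified assumption of what "full qualification" involves: the function name `fully_qualify` is hypothetical, and it handles only plain prefix-to-IRI context entries, not the complete JSON-LD expansion algorithm.

```python
def fully_qualify(payload):
    """Expand prefixed keys (e.g. "schema:name") using the payload's @context."""
    context = payload.get("@context", {})
    qualified = {}
    for key, value in payload.items():
        if key == "@context":
            continue  # the context is consumed, not copied
        prefix, sep, local = key.partition(":")
        if sep and prefix in context:
            key = context[prefix] + local
        qualified[key] = value
    return qualified


doc = {
    "@context": {"schema": "http://schema.org/"},
    "@id": "dataset-1",
    "schema:name": "Example dataset",
}
expanded = fully_qualify(doc)
```

After this step, two payloads that use different prefixes for the same vocabulary end up with identical keys, which makes them comparable in the index.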

We decided on an additional index because we needed a simpler way to traverse the graph and to recombine query results in a client-specific way. The direct consequence of this need is the KG Query API, which allows clients to execute semantically unambiguous queries on the data, transparently combines spatial search with standard meta-data queries, and supports reflection, automatic client-code generation and abstraction.

KG Editor

Although eventual consistency is good for scalability, it causes problems for applications such as the KG Editor, where delayed updates can lead to confusing states in a reactive UI with data manipulation (e.g. changes a user has just applied may not yet be reflected in the database). The KG Sync API therefore provides a synchronous alternative API created primarily for this use case: creations, modifications and deletions are applied to the Arango index directly after they have been transferred to the Nexus API. They are therefore immediately reflected in queries of the KG Query API, which allows us to provide a responsive UI. The standard indexing process overwrites this "temporary indexing" after a while.
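The write path described above can be sketched as a small write-through wrapper. This is an illustration with in-memory stand-ins, not the KG Sync API itself: `QueryIndex`, `SyncWriter` and the `primary_write` callable are hypothetical names.

```python
class QueryIndex:
    """In-memory stand-in for the index read by the query API."""

    def __init__(self):
        self.documents = {}

    def upsert(self, doc_id, payload):
        self.documents[doc_id] = payload


class SyncWriter:
    """Writes to the primary store, then applies the change to the index right away."""

    def __init__(self, primary_write, index):
        self.primary_write = primary_write  # hypothetical callable wrapping the primary API
        self.index = index

    def save(self, doc_id, payload):
        self.primary_write(doc_id, payload)  # 1. the primary store accepts the change
        self.index.upsert(doc_id, payload)   # 2. "temporary indexing" for immediate reads


accepted = []
index = QueryIndex()
writer = SyncWriter(lambda doc_id, payload: accepted.append(doc_id), index)
writer.save("person-1", {"name": "Ada"})

# The later asynchronous indexing run simply overwrites the temporary entry.
index.upsert("person-1", {"name": "Ada"})
```

Because the primary write happens first, the index never contains a change that the primary store has not accepted, and the regular indexing run remains the source of truth.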

Import scripts

Automated import scripts (typically written in Python) load data from a specific source, transform it into the required JSON-LD structures and use the Nexus API to upload the data to the Knowledge Graph. These scripts can be triggered externally. At HBP, we use a job scheduler that manages these kinds of recurring jobs.
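A minimal sketch of the load, transform and upload steps might look as follows. The CSV source, the field names and the `upload` callable are assumptions made for illustration; in a real script, `upload` would wrap the call to the Nexus API, which is deliberately not shown here.

```python
import csv
import io


def to_json_ld(row):
    """Map one source record to a JSON-LD document (hypothetical field names)."""
    return {
        "@context": {"schema": "http://schema.org/"},
        "@type": "schema:Person",
        "schema:identifier": row["id"],
        "schema:name": row["name"],
    }


def run_import(source, upload):
    """Load rows from a CSV source, transform them and pass each one to `upload`."""
    count = 0
    for row in csv.DictReader(source):
        upload(to_json_ld(row))
        count += 1
    return count


# Stand-ins for a source file and for the upload call.
source = io.StringIO("id,name\n1,Ada\n2,Grace\n")
uploaded = []
imported = run_import(source, uploaded.append)
```

Passing the upload step in as a parameter keeps the transformation testable without network access and lets a scheduler run the same function against the real API.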

KG Search

KG Search is built as a standalone application that uses the HBP Knowledge Graph as its original data source. This not only reduces the dependency between the systems and allows us to scale the Search component independently of the HBP Knowledge Graph, but is also a good showcase of how other (external) clients can integrate with the KG.

KG Search consists of the Search UI (a React web application), a reverse proxy (for access restriction and dispatching) and Elasticsearch (an instance completely independent of the one used by BlueBrain Nexus). The underlying data is provided by the KG Search indexer, which reads the data through stored KG Query specifications and stores it in the format and granularity expected by the KG Search UI. In addition to the data, the KG Search indexer also queries information about the data structure from the KG Query API (it executes a "meta" query), which is used to generate the Elasticsearch mapping file as well as directives for the UI (e.g. about layout).
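The mapping generation can be pictured with the sketch below, which turns a list of field descriptions into an Elasticsearch mapping body. The shape of the "meta" query result (`name` and `kind` keys) and the `ES_TYPES` table are assumptions for illustration, and the output uses the typeless mapping layout of recent Elasticsearch versions, which may differ from what the KG Search indexer actually produces.

```python
# Hypothetical translation table from field kinds to Elasticsearch field types.
ES_TYPES = {"text": "text", "keyword": "keyword", "date": "date", "number": "double"}


def build_mapping(fields):
    """Turn a list of field descriptions into an Elasticsearch mapping body."""
    properties = {}
    for field in fields:
        # Unknown kinds fall back to "keyword" so the field stays filterable.
        properties[field["name"]] = {"type": ES_TYPES.get(field["kind"], "keyword")}
    return {"mappings": {"properties": properties}}


# Stand-in for the result of a "meta" query.
meta = [
    {"name": "title", "kind": "text"},
    {"name": "releaseDate", "kind": "date"},
    {"name": "license", "kind": "unknown"},
]
mapping = build_mapping(meta)
```

Deriving the mapping from the data structure means the search index follows schema changes in the Knowledge Graph without manual edits to the mapping file.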

Contact

Any questions?

Find more information depending on:

Or contact us by e-mail: kg-team@humanbrainproject.eu