AI Case Study

Researchers from Japan's National Institute for Materials Science automate the creation of knowledge graphs of the materials design process using natural language processing

Researchers from the National Institute for Materials Science (NIMS) Japan and Toyota Technological Institute developed a natural language processing method that extracts key concepts from academic journal articles and summarises the relations between processes, structures, properties and the resulting performance. The method is intended to aid materials design research and production.



Construction And Engineering

Project Overview

The researchers developed "a computer-aided material design system to represent and extract knowledge related to material design from natural language texts. A machine learning model is trained on a text corpus weakly labeled by minimal annotated relationship data (~100 labeled relationships) to extract knowledge from scientific articles. The knowledge is represented by relationships between scientific concepts, such as {annealing, grain size, strength}. The extracted relationships are represented as a knowledge graph formatted according to design charts, inspired by the process-structure-property-performance (PSPP) reciprocity. The design chart provides an intuitive effect of processes on properties and prospective processes to achieve the certain desired properties. Our approach is knowledge graph population and graph search. The knowledge graph represents knowledge concerning material design. It consists of factors and their relations following PSPP reciprocity, and a PSPP chart is considered as a part of the graph with factors related to desired properties. Thus in this approach, we first extract the structure of the graph from text (knowledge graph population) and then find a PSPP chart from the graph for a desired material (graph search)."
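
The two steps named in the quotation, knowledge graph population and graph search, can be illustrated with a minimal sketch. It is not the researchers' implementation: the factor names, the stage labels, the relation list and the function names populate_graph and find_pspp_chart are all hypothetical, and the search is a plain backward traversal chosen for brevity.

```python
from collections import defaultdict

# Hypothetical factors and their PSPP stage (illustrative, not the paper's data).
STAGE = {
    "annealing": "process",
    "cold rolling": "process",
    "grain size": "structure",
    "dislocation density": "structure",
    "strength": "property",
    "ductility": "property",
}

# Hypothetical extracted relations as (cause factor, effect factor) pairs.
RELATIONS = [
    ("annealing", "grain size"),
    ("cold rolling", "dislocation density"),
    ("grain size", "strength"),
    ("dislocation density", "strength"),
    ("grain size", "ductility"),
]


def populate_graph(relations):
    """Knowledge graph population: map each factor to the factors that affect it."""
    parents = defaultdict(set)
    for cause, effect in relations:
        parents[effect].add(cause)
    return parents


def find_pspp_chart(parents, desired_property):
    """Graph search: walk backwards from the desired property and
    collect every edge that leads to it."""
    chart = set()
    stack = [desired_property]
    seen = {desired_property}
    while stack:
        node = stack.pop()
        for cause in parents.get(node, ()):
            chart.add((cause, node))
            if cause not in seen:
                seen.add(cause)
                stack.append(cause)
    return chart


graph = populate_graph(RELATIONS)
for cause, effect in sorted(find_pspp_chart(graph, "ductility")):
    print(f"{cause} ({STAGE[cause]}) -> {effect} ({STAGE[effect]})")
```

Reading the resulting edges from left to right gives a small design chart: the processes that could be tried, the structures they change, and the property those structures influence.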

Reported Results

"We believe the PSPP chart helps more efficient material developments by suggesting a prospective process... The end-to-end system proved our concept, and its relation identification performance was superior to that of other baseline models."


The researchers used a "CNN model for identifying the relation of a factor pair by weakly supervised learning. We leverage weakly supervised learning to identify relationships in the PSPP chart. Weakly supervised learning is used to train a model with a minimal number of annotations for relation identification. In a typical supervised setting, the training data for relation identification is a sentence labeled with entities in the sentence and the relations among them. However, labeling sentences is expensive because an annotator must read a sentence, and label the entities and relations described therein. This renders each label clean and strong. On the contrary, in a weakly supervised setting, a knowledge base produces a pair of entities and their relations and all sentences containing these entities are weakly labeled with the given relation."
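
The weak labelling step described above can be sketched as follows. This is an illustration under assumptions, not the researchers' code: the seed knowledge base, the example sentences, the single "related" label and the helper names mentions and weakly_label are hypothetical, and matching is done with a simple case-insensitive phrase search.

```python
import re

# Hypothetical seed knowledge base: factor pair -> relation label.
KNOWLEDGE_BASE = {
    ("annealing", "grain size"): "related",
    ("grain size", "strength"): "related",
}

# Hypothetical corpus sentences.
SENTENCES = [
    "Annealing at 600 C increased the grain size of the alloy.",
    "The grain size was measured by EBSD.",
    "A smaller grain size generally raises strength.",
]


def mentions(sentence, factor):
    """True if the factor appears in the sentence as a whole phrase."""
    pattern = r"\b" + re.escape(factor) + r"\b"
    return re.search(pattern, sentence, flags=re.IGNORECASE) is not None


def weakly_label(sentences, knowledge_base):
    """Label every sentence that contains both factors of a known pair
    with that pair's relation. No human reads the sentence."""
    labeled = []
    for sentence in sentences:
        for (first, second), relation in knowledge_base.items():
            if mentions(sentence, first) and mentions(sentence, second):
                labeled.append((sentence, first, second, relation))
    return labeled


for sentence, first, second, relation in weakly_label(SENTENCES, KNOWLEDGE_BASE):
    print(f"[{first} / {second}: {relation}] {sentence}")
```

Labels produced this way are cheap but can be noisy, because a sentence may mention both factors without actually stating the relation; this is the trade-off against the clean, strong labels of the fully supervised setting.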


R And D

Core Research And Development


"The processing-structure-property-performance (PSPP) reciprocity explains effect of processes on properties in three stages. The first stage is ‘process’ that can be controlled to develop a new material. The second stage is ‘structure’ of the material that the processes build. The third stage is ‘property’ that the structure gives. The properties in the third stage give the total performance of the new material. The PSPP design chart represents end-to-end knowledge in the form of relationships among factors."

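The stage ordering in the PSPP description can be written down as a small data structure. The sketch below is illustrative only: the Stage enum and the follows_pspp helper are hypothetical names, and the rule that an influence links one stage to the very next one is an assumption made for this example rather than a constraint stated in the quotation.

```python
from enum import IntEnum


class Stage(IntEnum):
    """PSPP stages in the order in which influence flows."""
    PROCESS = 1
    STRUCTURE = 2
    PROPERTY = 3
    PERFORMANCE = 4


def follows_pspp(cause_stage, effect_stage):
    """True when the effect sits exactly one stage after the cause
    (assumed rule: process -> structure -> property -> performance)."""
    return effect_stage - cause_stage == 1


# annealing (process) -> grain size (structure) fits the chain
print(follows_pspp(Stage.PROCESS, Stage.STRUCTURE))
# a direct process -> property link skips the structure stage
print(follows_pspp(Stage.PROCESS, Stage.PROPERTY))
```

A check of this kind could be used to keep a design chart laid out in columns, one per stage, with relations drawn only between neighbouring columns.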


"Our corpus consisted of publicly accessible scientific articles on ScienceDirect. ScienceDirect is an Elsevier platform providing access to articles in journals in a variety of fields, such as social sciences and engineering. Approximately 3400 articles were collected using the keyword search on ScienceDirect. The keywords were ‘material’ and ‘microstructure’, i.e. each article was related to both ‘material’ and ‘microstructure’.

The CNN models were trained on a pair of factors and sentences. As described in Section 3.2, each sentence mentions both factors. For the relationship data, about 5000 sentences were found in the corpus in total, roughly 50 sentences for each pair of factors on average."