Abstract
Causal knowledge is knowledge of the relationship between the cause and the effect of an incident. It plays a critical role in how humans understand the world, because it supports reasoning and thereby influences decision making. Extracting causality between events or entities helps people understand the sequence and evolution of information, and in turn to make predictions and decisions. Causal knowledge discovery of this kind is valuable in many fields, such as finance, medicine, biology, and environmental science. Automatic extraction of causal knowledge is also a crucial step for many natural language processing tasks, such as event prediction, future scenario generation, question answering, and discourse comprehension. However, because natural language text is ambiguous and diverse, extracting causal knowledge from it remains a challenging open problem in artificial intelligence.
In response to this problem, most early attempts used manually constructed linguistic and syntactic rules to extract causal knowledge from small or domain-specific datasets. Although such rule-based methods can achieve high accuracy, they transfer poorly across domains and require extensive domain knowledge. With the steady growth of computing power and the popularity of machine learning techniques, the current mainstream methods combine rules with machine learning and treat the task as a pipeline: they first extract candidate causal pairs with rules and then use machine learning algorithms to filter out the non-causal pairs among the candidates. This approach requires less domain knowledge, but it relies heavily on manually selected text features and often demands considerable human effort and time for feature engineering.
To tackle these problems, we formulate causal knowledge extraction as a sequence labeling problem and solve it with a deep learning model that uses no handcrafted features. We then investigate different Bi-LSTM based end-to-end models that extract cause and effect directly, without extracting candidate causal pairs and identifying their relations as separate steps. In addition, to address the tag class imbalance problem in causal sequence labeling, we propose an end-to-end model that uses Focal Loss as its loss function: Bi-LSTM-Softmax (FL). Experimental results show that the model effectively strengthens the association between cause and effect and thus outperforms the baseline models.
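To make the role of Focal Loss in imbalanced tagging concrete, the following is a minimal sketch of the standard focal loss for a single token, FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t), where p_t is the softmax probability of the gold tag. The three-tag set, the example scores, and the values of gamma and alpha are assumptions chosen for illustration; they are not taken from the experiments reported here.

```python
import math

# Hypothetical tag set for illustration: O = other, C = cause, E = effect.
TAGS = ["O", "C", "E"]


def softmax(scores):
    """Turn raw per-tag scores into probabilities (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(scores, gold_tag, gamma=2.0, alpha=1.0):
    """Focal loss for one token: -alpha * (1 - p_t)**gamma * log(p_t)."""
    p_t = softmax(scores)[TAGS.index(gold_tag)]
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


def cross_entropy(scores, gold_tag):
    """Plain cross-entropy, i.e. focal loss with gamma = 0 and alpha = 1."""
    return focal_loss(scores, gold_tag, gamma=0.0, alpha=1.0)


# Assumed example scores: a confidently correct "O" token and an uncertain "C" token.
easy_o = [4.0, 0.0, 0.0]
hard_c = [1.0, 0.5, 0.0]

for name, scores, gold in [("easy O", easy_o, "O"), ("hard C", hard_c, "C")]:
    ce = cross_entropy(scores, gold)
    fl = focal_loss(scores, gold)
    print(name, "cross-entropy:", round(ce, 4), "focal:", round(fl, 4))
```

In this sketch the factor (1 - p_t)^gamma shrinks the loss of the confidently tagged majority-class token far more than that of the uncertain cause token, so the rarer cause and effect tags contribute a larger share of the training signal.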
Keywords
Causal Knowledge Extraction, Sequence Labeling, Bi-LSTM Networks, Focal Loss