Semi-Open Attribute Extraction from Chinese Functional Description Text

li zhang (UCI); Yanzeng Li (Peking University)*; Ruoyu Zhang (Peking University); Wenjie Li (Wangxuan Institute of Computer Technology, Peking University)


Attribute extraction is a task to identify the attribute and the corresponding attribute value from unstructured text, which is important for extensive applications like web information retrieval and the recommended system. The traditional relation extraction-based methods or joint extraction-based systems are often perform attribute classify based on subject and attribute-value pairs, and extract the attribute triples in the scope of ontology schema categories, which is in the assumption of the close-world and cannot satisfy the diversity of attributes. In this work, we propose a semi-open information extraction system for attribute extraction in a multi-component framework. With the proposed semi-open attribute extraction system (SOAE), more attribute-value pairs can be discovered by extracting literal triples without the limitation of pre-defined ontology. An additional co-trained ontology-based attribute extraction model is appended as a component following the assumption of the partial-closed world (PCWA), remission the performance degradation of SOAE caused by missing of the literal predicate in raw text and contribute to extract richer attribute triples and construct more dense knowledge graph. For evaluating the performance of the attribute extraction system, we construct a Chinese functional description text dataset CNShipNet and conduct experiments on it. The experimental results demonstrate that our proposed approach outperforms several state-of-the-art baselines with a large margin.