Prediction of Importance of Figures in Scholarly Papers

This paper shows that the importance of a figure in a scholarly paper can be predicted with machine learning. The growing number of scholarly papers makes it difficult to keep pace with research. A long paper in the current ACM/IEEE format runs to eight pages, and these pages include details that most readers may not need. In fact, before reading the details, we usually try to get a rough understanding of the whole paper by browsing through all eight pages. Summarization techniques have been explored to address this problem. However, they focus on text, which leaves room for research that makes papers easier to read by summarizing their figures. Scholarly papers contain a wide variety of figures, ranging from figures that give an overview of the paper to highly contextualized figures that can be understood only after reading the detailed text. Selecting the important figures is therefore a key issue in summarizing scholarly papers. This paper shows that the figure that should be presented first to readers can be selected by comparing the sizes, page numbers, or color features of the figures. We also describe how our results can be applied in more practical settings, such as searching, exploring, and serendipitously encountering digital documents.
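
The selection criterion above compares figures within the same paper. One way to turn such comparisons into a learned selector is pairwise logistic regression over relative features: the model learns weights w so that σ(w·(x_a − x_b)) estimates the probability that figure a should be shown before figure b, and the figure with the highest w·x is shown first. The Python sketch below assumes this setup. The `Figure` fields, the within-paper normalization, the hyperparameters, and the toy labels are all hypothetical illustrations, not the paper's actual features, model, or data. The color feature is taken as a precomputed scalar (for example a colorfulness score); extracting it from pixels is out of scope.

```python
import math
from dataclasses import dataclass


@dataclass
class Figure:
    """Hypothetical per-figure record; field names are illustrative only."""
    width: float         # rendered width on the page (any consistent unit)
    height: float        # rendered height on the page
    page: int            # 1-based page number of the figure
    colorfulness: float  # precomputed scalar color feature (assumed given)


def features(fig: Figure, paper_figs: list[Figure]) -> list[float]:
    """Normalize each feature by its maximum within the same paper."""
    max_area = max(f.width * f.height for f in paper_figs) or 1.0
    max_page = max(f.page for f in paper_figs) or 1
    max_color = max(f.colorfulness for f in paper_figs) or 1.0
    return [
        fig.width * fig.height / max_area,  # relative size
        fig.page / max_page,                # relative position in the paper
        fig.colorfulness / max_color,       # relative color feature
    ]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_pairwise(pairs, epochs=500, lr=0.5, l2=0.01):
    """Gradient ascent on the L2-regularized log-likelihood of
    P(a before b) = sigmoid(w . (x_a - x_b)), where each pair is
    (features of the preferred figure, features of another figure)."""
    dim = len(pairs[0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [-l2 * wi for wi in w]  # gradient of the L2 penalty
        for xa, xb in pairs:
            d = [a - b for a, b in zip(xa, xb)]
            p = sigmoid(sum(wi * di for wi, di in zip(w, d)))
            for i in range(dim):
                # d/dw_i of log p is (1 - p) * d_i, averaged over pairs
                grad[i] += (1.0 - p) * d[i] / len(pairs)
        w = [wi + lr * gi for wi, gi in zip(w, grad)]
    return w


def select_first_figure(paper_figs: list[Figure], w: list[float]) -> int:
    """Index of the figure with the highest linear score w . x."""
    scores = [sum(wi * xi for wi, xi in zip(w, features(f, paper_figs)))
              for f in paper_figs]
    return max(range(len(paper_figs)), key=lambda i: scores[i])


def make_pairs(paper_figs: list[Figure], first_idx: int):
    """Pair the labeled first figure against every other figure in its paper."""
    xs = [features(f, paper_figs) for f in paper_figs]
    return [(xs[first_idx], xs[j]) for j in range(len(xs)) if j != first_idx]


# Toy, made-up training data: two papers with a labeled "show first" figure.
paper_a = [Figure(16, 9, 1, 0.8), Figure(8, 6, 4, 0.3), Figure(8, 5, 6, 0.2)]
paper_b = [Figure(8, 6, 2, 0.4), Figure(17, 10, 1, 0.7), Figure(7, 5, 5, 0.1)]
weights = train_pairwise(make_pairs(paper_a, 0) + make_pairs(paper_b, 1))

paper_c = [Figure(6, 4, 3, 0.2), Figure(7, 5, 5, 0.3), Figure(15, 8, 1, 0.9)]
print(weights, select_first_figure(paper_c, weights))
```

Because the model is linear, ranking figures by w·x gives the same order as running every pairwise comparison, so choosing the first figure needs only one score per figure.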

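This paper introduces a technique for compiling a paper into a card format. Current scholarly papers consist of figures and text spanning several pages or more. The number of published papers grows every year, and even experts cannot read every page. In practice, before reading a paper, readers grasp its outline from information such as the title, the summary (abstract), and the figures. This process is repeated for every candidate paper, yet the current multi-page paper format is not a suitable interface for it. Research is therefore under way on extracting figures and text from papers and compiling them into cards. This study introduces a machine learning method that determines whether a figure in a paper is suitable for such a summary.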
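
The card-building process described above can be sketched as a small pipeline: shorten the abstract, ask a figure classifier which figures are suitable for the summary, and attach the first accepted figure. This Python sketch assumes a hypothetical `is_suitable` callback in place of the trained classifier. The `Card` layout, the first-sentences summary (a naive stand-in for real text summarization), the 0.5 threshold, and the file names are illustrative choices, not the system described in this paper.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Card:
    """Hypothetical card layout: a title, a short summary and at most one figure."""
    title: str
    summary: str
    figure_path: Optional[str]


def first_sentences(text: str, n: int = 2) -> str:
    """Keep the first n sentences, using a naive split after '.', '!' or '?'."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:n])


def build_card(title: str, abstract: str, figure_paths: list[str],
               is_suitable: Callable[[str], bool], n_sentences: int = 2) -> Card:
    """Attach the first figure the classifier accepts (None if no figure qualifies)."""
    chosen = next((p for p in figure_paths if is_suitable(p)), None)
    return Card(title, first_sentences(abstract, n_sentences), chosen)


# Placeholder scores standing in for a trained classifier's output (hypothetical).
scores = {"fig1.png": 0.2, "fig2.png": 0.9, "fig3.png": 0.7}
card = build_card(
    "Prediction of Importance of Figures in Scholarly Papers",
    "This paper shows that the importance of a figure can be predicted. "
    "The growing number of papers makes it hard to keep pace. More text follows.",
    list(scores),
    lambda path: scores[path] >= 0.5,
)
print(card)
```

Passing the classifier as a callback keeps the card layout independent of how suitability is decided, so a rule-based filter and a learned model can be swapped without touching `build_card`.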
