Prediction of Importance of Figures in Scholarly Papers


This paper shows that the importance of a figure in a scholarly paper can be predicted with a machine learning technique. The growing number of scholarly papers makes it difficult to keep pace with research. A long paper in the current ACM / IEEE format runs to eight pages, and these pages include details that may not be necessary for most readers. In fact, even before we read the details, we try to obtain a rough understanding of the entire content by browsing through the eight pages. To address this issue, summarization techniques have been explored. However, these techniques focus on text, so there is room for research on making a paper easier to read by summarizing its figures. A wide variety of figures can be found in scholarly papers, ranging from figures that depict the overview of a paper to highly contextualized figures that can be understood only after reading the detailed text. Therefore, selecting important figures is a key issue in the summarization of scholarly papers. This paper shows that the figure that should be presented first to readers can be selected based on a comparison of the sizes, page numbers, or color features of the figures. We also describe how our result can be applied in more practical cases involving the searching, exploring, and serendipitous encounter of digital documents.