Document Atlas
Text Corpora Visualization

About

November 12th, 2007 by admin

This is a utility for visualizing large corpora of text documents. First, it identifies the relevant semantics in the input corpus using Latent Semantic Indexing. Then the whole corpus is projected onto the discovered semantics and positioned on a 2D plane using multidimensional scaling. The user can explore the 2D plane using an intuitive interface. The density of documents is used to generate a background relief, so that the visualization resembles a map. Keywords describing specific areas are also written on the map. Together, these features give the user an easier path towards understanding the corpus.
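To make the pipeline concrete, here is a minimal sketch of the same three steps in Python with NumPy. It is not the tool's actual code. The TF-IDF weighting, the choice of classical (Torgerson) multidimensional scaling, the histogram used as a stand-in for the background relief, the toy documents, and all function names are assumptions made for illustration.

```python
from collections import Counter

import numpy as np


def tfidf_matrix(docs):
    """Build a documents-by-terms TF-IDF matrix from whitespace-tokenized text."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {term: j for j, term in enumerate(vocab)}
    counts = np.zeros((len(docs), len(vocab)))
    for i, toks in enumerate(tokenized):
        for term, n in Counter(toks).items():
            counts[i, index[term]] = n
    doc_freq = (counts > 0).sum(axis=0)
    # Smoothed inverse document frequency (an assumed weighting scheme).
    idf = np.log((1 + len(docs)) / (1 + doc_freq)) + 1.0
    return counts * idf, vocab


def lsi_project(X, k):
    """Latent Semantic Indexing: keep the top-k singular directions of X."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]  # document coordinates in the latent space


def classical_mds(points, dim=2):
    """Classical MDS: embed points in `dim` dimensions from pairwise distances."""
    diff = points[:, None, :] - points[None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)
    n = sq_dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ sq_dist @ centering  # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(gram)
    order = np.argsort(vals)[::-1][:dim]  # largest eigenvalues first
    return vecs[:, order] * np.sqrt(np.clip(vals[order], 0.0, None))


def density_grid(coords, bins=8):
    """Count documents per grid cell; a crude stand-in for the map relief."""
    hist, _, _ = np.histogram2d(coords[:, 0], coords[:, 1], bins=bins)
    return hist


# Toy corpus, invented for this example.
docs = [
    "stock market prices fall",
    "market traders sell stock",
    "football team wins match",
    "team scores in football match",
    "new phone released by company",
    "company shows new phone model",
]

X, vocab = tfidf_matrix(docs)
latent = lsi_project(X, k=3)            # step 1: latent semantic space
coords = classical_mds(latent, dim=2)   # step 2: positions on the 2D plane
relief = density_grid(coords, bins=8)   # step 3: document density per cell
```

Each row of `coords` is the map position of one document, and `relief` holds the document count per grid cell. A real map would presumably smooth these counts and label dense regions with their highest-weighted terms, which this sketch leaves out.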

docatlas.png