HistoGraph is a tool for the graph-based exploration and crowd-based indexation of multimedia collections. It treats such collections as networks. The underlying assumption is simple: if two people are mentioned together in a document, they may have something to do with each other. Whether such a relationship is interesting is in the eye of the beholder. Co-occurrence networks quickly become huge and unwieldy, which forces us to filter them based on a second simple assumption: the more often two entities co-occur, the more likely it is that they have a meaningful relationship. We combine these two assumptions with mathematical models (co-occurrence frequencies weighted by tf-idf specificity, and Jaccard distances), which allow us to rank the list of co-occurrences.
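
To make the ranking idea concrete, here is a minimal sketch in Python. It is not HistoGraph's actual implementation: the function name, the toy documents and the exact weighting (raw co-occurrence count multiplied by the mean inverse document frequency of the two entities) are assumptions chosen for illustration. The Jaccard distance is computed over the sets of documents in which each entity appears.

```python
import math
from collections import defaultdict
from itertools import combinations
from typing import NamedTuple


class CoOccurrence(NamedTuple):
    a: str
    b: str
    count: int               # number of documents mentioning both entities
    weighted_count: float    # count scaled by an idf-style specificity (assumed formula)
    jaccard_distance: float  # 0.0 = always together, 1.0 = never together


def rank_cooccurrences(docs):
    """Rank entity pairs; `docs` maps a document id to the set of entities it mentions."""
    # entity -> ids of the documents that mention it
    mentions = defaultdict(set)
    for doc_id, entities in docs.items():
        for entity in entities:
            mentions[entity].add(doc_id)

    # (entity, entity) -> ids of the documents that mention both
    shared = defaultdict(set)
    for doc_id, entities in docs.items():
        for a, b in combinations(sorted(entities), 2):
            shared[(a, b)].add(doc_id)

    n_docs = len(docs)

    def idf(entity):
        # Entities found in every document get a specificity of 0.
        return math.log(n_docs / len(mentions[entity]))

    ranking = []
    for (a, b), both in shared.items():
        union = mentions[a] | mentions[b]
        distance = 1.0 - len(both) / len(union)
        weighted = len(both) * (idf(a) + idf(b)) / 2
        ranking.append(CoOccurrence(a, b, len(both), weighted, distance))

    # Closest pairs first; ties broken by the larger weighted count.
    ranking.sort(key=lambda c: (c.jaccard_distance, -c.weighted_count))
    return ranking


# Toy data, invented for this example.
docs = {
    "doc1": {"Alice", "Bob", "Carol"},
    "doc2": {"Alice", "Bob"},
    "doc3": {"Carol", "Dave"},
}
ranking = rank_cooccurrences(docs)
for pair in ranking:
    print(pair)
```

In this toy collection, Alice and Bob appear only together, so their Jaccard distance is 0 and they are ranked first. A real collection would filter the ranked list with a threshold before drawing the network.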

HistoGraph combines tools such as YAGO-AIDA, for the automatic detection and disambiguation of named entities (people, places, institutions and dates), with crowd-based annotations. Thanks to enrichment with DBpedia and VIAF links, HistoGraph can handle multilingual documents. By default, every automatically detected entity remains pending until a human user validates it. HistoGraph is available as open source under the MIT licence. The application is designed to serve two purposes: to facilitate the non-hierarchical exploration of multimedia collections based on existing metadata and automatic entity detection, and to support the crowd-based indexation of such collections. HistoGraph can handle any digitized text and image documents.

This description was contributed by 'Shawn Day' and first appeared in the Dirt directory. It is re-used under a CC BY 4.0 license.