TXM is free, open-source, cross-platform text analysis software based on Unicode, XML and TEI. It runs on Windows, Mac OS X and Linux. It is also available as J2EE-compliant, GWT-based portal software for online access, with built-in access control (see a demo portal: http://portal.textometrie.org/demo).

It offers a comprehensive range of analysis tools (concordances, collocate search, frequency lists, etc.) based on the powerful CQP full-text search engine (http://cwb.sourceforge.net), and a range of statistical tools (factorial analysis, clustering, cooccurrence analysis, etc.) based on R packages (http://www.r-project.org).
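
To give a concrete sense of two of the tools named above, here is a minimal Python sketch of a frequency list and a keyword-in-context concordance. It is an illustration only and does not use TXM, CQP or R; the function names (`tokenize`, `frequency_list`, `concordance`), the whitespace tokenizer and the sample sentence are all assumptions made for this example.

```python
from collections import Counter


def tokenize(text):
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()


def frequency_list(tokens):
    # Return (word, count) pairs, most frequent first.
    return Counter(tokens).most_common()


def concordance(tokens, keyword, width=2):
    # Return one (left context, keyword, right context) triple per hit,
    # with up to `width` tokens of context on each side.
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits


tokens = tokenize("the cat sat on the mat and the dog sat too")
print(frequency_list(tokens)[:2])
for left, kw, right in concordance(tokens, "sat"):
    print(f"{left:>12} [{kw}] {right}")
```

A real corpus engine would index the text and support queries over annotations such as lemma or part of speech; this sketch only scans a token list in memory.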

It can analyze three types of textual corpora with various source formats:

  • Written texts (possibly aligned to facsimile images): system clipboard content, TXT (raw text), XML, XML-TEI formats
  • Speech transcriptions (synchronized to audio or video): Word/Writer/TXT based, XML-TRS (from Transcriber software) formats
  • Parallel corpora (several languages per corpus): XML-TMX format

During import, it lemmatizes all texts and tags them with parts of speech (POS) on the fly, using the TreeTagger software.
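
As a rough illustration of what lemma and POS annotation looks like, the Python sketch below reads tagger output laid out as one token per line with tab-separated word, POS tag and lemma, which is the layout TreeTagger is commonly run with. It does not show how TXM stores annotations internally; the `Token` type, the `parse_tagged` helper and the sample lines are hypothetical and written by hand for this example.

```python
from typing import NamedTuple


class Token(NamedTuple):
    word: str
    pos: str
    lemma: str


def parse_tagged(lines):
    # Each non-empty line is assumed to hold: word <TAB> POS tag <TAB> lemma.
    tokens = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        word, pos, lemma = line.split("\t")
        tokens.append(Token(word, pos, lemma))
    return tokens


# Hand-written sample (not real tagger output); the tags are illustrative.
sample = ["The\tDT\tthe", "cats\tNNS\tcat", "slept\tVBD\tsleep"]
tokens = parse_tagged(sample)
print([t.lemma for t in tokens])
```

With annotations in this shape, searches can match on the lemma or the tag rather than only on the surface word form.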

This description was contributed by 'Serge Heiden' and first appeared on the Dirt directory. It is re-used under a CC BY 4.0 license.