>Tool
Semamap: activist research on semantic indexing of texts on social transformation. Semamap focuses on the linguistic structures underlying large text collections. It has been applied to the output of social gatherings such as ESF Paris 2003, with its seminars and ateliers, and to the collection of texts in the e-library. It is also an inquiry into the possibilities of doing semantic analysis with free software packages, and thus of being independent from academia and other institutionalized research laboratories. This version releases some visualizations of a latent semantic analysis performed over the e-library texts on social transformation and over the texts of the seminars and ateliers held during ESF Paris 2003.
Semamap is developed by Alejandra Perez Nunez in collaboration with Luka Frelih, Yves Degoyon and Fabian Voegueli.
>Synopsis
Semamap consists of several non-commercial, open source software packages used programmatically to render a visualization of the semantic structures contained in text data sets. It is an activist research project and a repackaging of software tools that render images from the matrix decomposition (the LSA statistical technique) of moderately large collections of texts.
Semamap repackages existing non-commercial, open source software libraries to perform semantic indexing of texts. It is designed for other activist researchers interested in semi-automated indexing of text collections and in visualizing moderately large text databases. For them it offers a repackaging of existing open source software packages and two types of visualization that represent the proximity between texts.
It is also meant for any user interested in browsing the collection of texts on social transformation in the recently released e-library. Finally, it is meant for the community interested in the collection of texts produced during the seminars and ateliers held at ESF Paris 2003.
In the future we would like to offer a research road map for developing a more stable version of this mapping tool.
>> How is the visualisation of the data applied?
A set of open tools has been applied to unstructured and structured collections of text (without and with a database). Type 2 texts from ESF Paris and type 3 texts from the e-library have been analysed and turned into visual representations, namely force-directed graphs. This package, made of open packages and a set of scripts written to tie them together, may be applied to other open collections of texts. Other elements of comprehension:
Multidimensional semantic space > Weight of words > Distance between words/documents > Force-directed graphs > Multi-language challenge, wordset situation and LSI > Synonymies, co-citations > Queries and retrieval
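To illustrate the "weight of words" and "distance between words/documents" elements, here is a minimal Python sketch. It assumes TF-IDF weighting and cosine distance, which are common choices and not necessarily the ones used in Semamap. The function names and the sample titles are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, one weight vector per document)."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    n_docs = len(docs)
    # Document frequency: in how many documents each word occurs.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed idf, so a word found in every document keeps a small weight.
        vectors.append(
            [tf[w] * (math.log((1 + n_docs) / (1 + df[w])) + 1.0) for w in vocab]
        )
    return vocab, vectors


def cosine_distance(u, v):
    """0.0 for vectors pointing the same way, 1.0 for vectors sharing no word."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


# Hypothetical titles, used only for illustration.
docs = [
    "social forum and social movements",
    "free software and social movements",
    "semantic indexing of texts",
]
vocab, vectors = tfidf_vectors(docs)
weights_doc0 = dict(zip(vocab, vectors[0]))
d01 = cosine_distance(vectors[0], vectors[1])
d02 = cosine_distance(vectors[0], vectors[2])
```

Here the first two titles share several words, so they end up closer to each other than to the third title, which shares none with the first.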
>> How has it been developed?
First, we mapped current open source research and development of software packages and libraries that can help in building visualisations based on semantic analysis. The aim of this first step was to identify the available software packages; we then compiled and assembled them for indexing and browsing textual data. It is worth noting that a large part of semantic analysis research is carried out by academic and private research laboratories that do not usually make public the tools and code they use or develop. This adds to several other factors that make semantic analysis difficult to develop for an activist researcher with no budget for wordsets or for fast machines that can perform statistical analysis. Given our infrastructure, we developed a model that mainly uses the titles of the texts, which are turned into visual representations showing the proximity between words and texts.
Several steps have been followed:
LSA-LSI (Python package for performing LSA) > Stan James and others. We have begun to apply it to the texts in the e-bibliography on a text-by-text basis > Research on LSI > Mtx.py > Matrix -> dot (Graphviz) Matrix to radial graphs > Matrix to database > mtx2db.py > db -> Visualization (PIL and Walter Zorn JavaScript library) > Matrix to force-directed graph > Research on open packages for force-directed graphs > ccwords.py RSF > CCVisu Java applet > Monster DHTML > SVG > Visual search engine > grep.py > ebiblioTooltip.cgi > queries.sql > most related words to text > most proximate texts > applied to the whole selection of texts (ESF Paris's presentation, ESF Paris's workshops, Ebiblio + multilanguage: English, French, Spanish, Italian, Portuguese).
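The "matrix to database" and "most proximate texts" steps could look roughly like the following sketch. It is not the code of mtx2db.py or queries.sql: the table name `proximity`, its columns, the function names and the sample similarity values are all assumptions made for illustration. It uses only Python's standard `sqlite3` module.

```python
import sqlite3


def store_similarities(conn, titles, sim):
    """Store the upper triangle of a symmetric similarity matrix as rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS proximity (a TEXT, b TEXT, score REAL)"
    )
    rows = [
        (titles[i], titles[j], sim[i][j])
        for i in range(len(titles))
        for j in range(i + 1, len(titles))
    ]
    conn.executemany("INSERT INTO proximity VALUES (?, ?, ?)", rows)
    conn.commit()


def most_proximate(conn, title, limit=3):
    """Return (other_title, score) pairs, most similar first."""
    query = """
        SELECT CASE WHEN a = ? THEN b ELSE a END AS other, score
        FROM proximity
        WHERE a = ? OR b = ?
        ORDER BY score DESC
        LIMIT ?
    """
    return conn.execute(query, (title, title, title, limit)).fetchall()


# Hypothetical titles and similarity values, not project data.
titles = ["text A", "text B", "text C"]
sim = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.3],
    [0.1, 0.3, 1.0],
]
conn = sqlite3.connect(":memory:")
store_similarities(conn, titles, sim)
nearest = most_proximate(conn, "text A")
```

Storing each pair once keeps the table small; the `CASE` expression lets the query find a text whichever column it was stored in.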

> Implementation:
We have used an unsupervised statistical technique called Latent Semantic Analysis (LSA). This technique is based on SVD (Singular Value Decomposition) and analyzes the statistical relationships among words in a collection of texts. It yields a word-by-document matrix whose coefficients indicate the relation between the words and their contexts in a given set of documents. From this analysis we obtain a list of the words that are mentioned most frequently, as well as the paragraphs that are most closely related to them. What we obtain from the analyzed documents are networks of relations between words per document, which we use as our main interface for browsing the e-bibliography. An interface defines the communication boundary between two entities, such as a piece of software, a hardware device, or a user. It generally refers to an abstraction that an entity provides of itself to the outside. It may also provide a means of translation between entities that do not speak the same language, such as a human and a computer. Semantic analysis, together with image rendering scripts, provides the interface between the users and the e-library. You can contact me here with any questions.
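As a rough illustration of the technique, the sketch below applies a truncated SVD to a toy word-by-document count matrix and compares documents in the reduced space. It is a sketch under stated assumptions, not the LSA package used in the project: the SVD is approximated in pure Python by power iteration with deflation so that the example has no dependencies, it assumes the matrix has rank at least `k`, and the words, counts and function names are hypothetical.

```python
import math


def transpose(m):
    return [list(col) for col in zip(*m)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def truncated_svd(a, k, iters=300):
    """Approximate the k largest singular triplets (sigma, u, v) of `a`
    by power iteration on A^T A, deflating after each triplet."""
    a = [row[:] for row in a]  # work on a copy
    triplets = []
    for _component in range(k):
        at = transpose(a)
        v = [1.0 + 0.1 * j for j in range(len(a[0]))]  # fixed start vector
        for _step in range(iters):
            v = matvec(at, matvec(a, v))
            n = norm(v)
            v = [x / n for x in v]
        av = matvec(a, v)
        sigma = norm(av)
        u = [x / sigma for x in av]
        triplets.append((sigma, u, v))
        # Deflate: subtract the rank-one component just found.
        for i in range(len(a)):
            for j in range(len(a[0])):
                a[i][j] -= sigma * u[i] * v[j]
    return triplets


def cosine(p, q):
    return sum(x * y for x, y in zip(p, q)) / (norm(p) * norm(q))


# Toy counts: rows are words, columns are four documents.
terms = ["forum", "social", "movement", "software", "free", "code"]
counts = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
]
triplets = truncated_svd(counts, k=2)
sigmas = [s for s, _, _ in triplets]
# Coordinates of document j in the 2-dimensional latent space.
doc_coords = [[s * v[j] for s, _, v in triplets] for j in range(len(counts[0]))]
sim_01 = cosine(doc_coords[0], doc_coords[1])
sim_02 = cosine(doc_coords[0], doc_coords[2])
```

In this toy data, documents 0 and 1 share the words "forum" and "social", so they lie closer together in the latent space than documents 0 and 2, which share no words.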
>Links
Here you can find an extensive library of articles on semantic issues. Enjoy!
“Introducing the Arabic WordNet Project”, William BLACK, Sabri ELKATEB, Horacio RODRIGUEZ, Musa ALKHALIFA, Piek VOSSEN, Adam PEASE, Christiane FELLBAUM
“Bursty and Hierarchical Structure in Streams”, Jon Kleinberg
“Research Problem Statement”, Dean Earl Wright
“Meta Latent Semantic Analysis”, Marin Simina, Costin Barbu
“Finding User Semantics on the Web using Word Co-occurrence Information”, Junichiro Mori, Yutaka Matsuo, and Mitsuru Ishizuka
“Which statistics reflect semantics? Rethinking synonymy and word similarity”, Derrick Higgins
“Mapping knowledge domains”, Richard M. Shiffrin and Katy Borner
“A Comparison of LSA, WordNet and PMI-IR for Predicting User Click Behavior”, Ishwinder Kaur and Anthony J. Hornof
“Mapping topics and topic bursts in PNAS”, Ketan K. Mane and Katy Borner
“From paragraph to graph: Latent semantic analysis for information visualization”, Thomas K. Landauer, Darrell Laham, and Marcia Derr
“Annotea and Semantic Web Supported Collaboration”, Marja-Riitta Koivunen, Ph.D
“Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL”, Peter D. Turney
“An Associative Information Visualizer”, Howard D. White, Xia Lin, Jan Buzydlowski
“The LSA Package”, Fridolin Wild
“Blogviz Mapping the dynamics of Information Diffusion in Blogspace”, Manuel Lima
“Visual Web Mining”, Amir H. Youssefi, David J. Duke, Mohammed J. Zaki
