SciELO - Scientific Electronic Library Online

 

Universidad, Ciencia y Tecnología

Print version ISSN 1316-4821; online version ISSN 2542-3401

Abstract

ARTIGAS FUENTES, Fernando; GIL GARCIA, Reynaldo; BADIA CONTELLES, José Manuel and PONS PORRATA, Aurora. VICINITY CALCULATION WITH GRAPHS IN TEXT MINING. uct [online]. 2008, vol.12, n.48, pp.163-170. ISSN 1316-4821.

Finding the documents most similar to a given one is crucial in Text Mining because it is the basic step of many techniques, such as classification and information retrieval. Documents are usually represented in a high-dimensional feature space, where each term appearing in the documents is treated as a feature and the weight of each term reflects its importance in the document. There are many approaches to finding the vicinity of an object, but their performance decreases drastically as the number of dimensions grows, which prevents their application to documents. In this paper, we present an access method based on a graph structure that approximately determines the vicinity of a new document. The resulting method has high selectivity and an acceptable error rate when it is embedded in a classifier and compared with the exhaustive method that evaluates all documents. Our experimental analysis shows that the proposed method is feasible for problems of very high dimensionality, such as Text Mining.
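
To make the setting concrete, the following is a minimal sketch of the exhaustive baseline that the abstract compares against: every stored document is scored against the query and the top k are returned. It is not the graph-based access method of the paper, whose details are not given here. The TF-IDF weighting, the cosine similarity, the toy corpus, and all function names (`build_idf`, `weigh`, `cosine`, `exhaustive_vicinity`) are assumptions chosen for illustration.

```python
import math
from collections import Counter


def build_idf(docs):
    """Inverse document frequency per term (assumed weighting, not from the paper)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}


def weigh(doc, idf):
    """Sparse term -> weight vector; terms unseen in the collection are dropped."""
    tf = Counter(doc)
    return {term: count * idf[term] for term, count in tf.items() if term in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def exhaustive_vicinity(query, vectors, k):
    """Score the query against every document and return the k most similar indices."""
    scored = sorted(
        ((cosine(query, vec), idx) for idx, vec in enumerate(vectors)),
        key=lambda pair: (-pair[0], pair[1]),
    )
    return [idx for _, idx in scored[:k]]


# Hypothetical toy corpus, already tokenized.
corpus = [
    "graph index for text mining".split(),
    "text mining with document classification".split(),
    "solar cell photovoltaic model".split(),
]
idf = build_idf(corpus)
vectors = [weigh(doc, idf) for doc in corpus]
query = weigh("classification of text documents".split(), idf)
nearest = exhaustive_vicinity(query, vectors, k=2)
print(nearest)
```

The cost of this baseline grows linearly with the number of stored documents, since each query touches all of them. An approximate access method of the kind the abstract describes aims to evaluate only a small fraction of the collection, at the price of occasionally missing a true neighbor.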

Keywords: Data Mining; Text Mining; Access Methods; Very high-dimensional indexing; Neighborhood calculation.


 

All content of this journal, except where otherwise noted, is licensed under a Creative Commons License.