Universidad, Ciencia y Tecnología
Print version ISSN 1316-4821; online version ISSN 2542-3401
Abstract
ARTIGAS FUENTES, Fernando; GIL GARCIA, Reynaldo; BADIA CONTELLES, José Manuel and PONS PORRATA, Aurora. VICINITY CALCULATION WITH GRAPHS IN TEXT MINING. uct [online]. 2008, vol.12, n.48, pp.163-170. ISSN 1316-4821.
Finding the documents most similar to a given one is crucial in Text Mining, because this search underlies many techniques such as classification and information retrieval. Documents are usually represented in a high-dimensional feature space, where each term appearing in the documents is treated as a feature and its weight reflects its importance in the document. Many approaches exist for finding the vicinity of an object, but their performance degrades drastically as the number of dimensions grows, which prevents their application to documents. In this paper, we present an access method based on a graph structure that approximately determines the vicinity of a new document. When embedded in a classifier and compared with the exhaustive method that evaluates all documents, the proposed method achieves high selectivity and an acceptable error rate. Our experimental analysis shows that the proposed method is feasible for problems of very high dimensionality, such as Text Mining.
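The abstract does not describe how the authors build or traverse their graph, so the following is only a generic sketch of the idea, not the paper's access method. It assumes documents are L2-normalized term-frequency vectors compared by cosine similarity, links each document to its `m` most similar documents, and contrasts an exhaustive scan with a greedy walk over that graph. All function names, the parameter `m`, the weighting scheme, and the toy corpus are hypothetical.

```python
import math
from collections import Counter

# Illustrative sketch only: generic cosine k-NN and greedy graph search,
# NOT the access method of Artigas Fuentes et al. All names are hypothetical.

def tf_vector(text):
    """Sparse, L2-normalized term-frequency vector (assumed weighting scheme)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {term: c / norm for term, c in counts.items()}

def cosine(u, v):
    """Cosine similarity; vectors are unit length, so this is a sparse dot product."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    return sum(w * v.get(term, 0.0) for term, w in u.items())

def exhaustive_knn(query, docs, k):
    """Baseline: score every document and keep the k most similar."""
    ranked = sorted(range(len(docs)), key=lambda i: cosine(query, docs[i]), reverse=True)
    return ranked[:k]

def build_knn_graph(docs, m):
    """Link each document to its m most similar documents (quadratic, for illustration)."""
    graph = {}
    for i, d in enumerate(docs):
        others = [j for j in range(len(docs)) if j != i]
        others.sort(key=lambda j: cosine(d, docs[j]), reverse=True)
        graph[i] = others[:m]
    return graph

def greedy_graph_search(query, docs, graph, start):
    """Move to a more similar neighbor until none improves; return (best, #evaluated)."""
    current = start
    best = cosine(query, docs[current])
    evaluated = {current}
    improved = True
    while improved:
        improved = False
        for j in graph[current]:
            if j in evaluated:
                continue  # already scored; it cannot beat the current best
            evaluated.add(j)
            score = cosine(query, docs[j])
            if score > best:
                best, current, improved = score, j, True
    return current, len(evaluated)

corpus = [
    "graph based index for text retrieval",
    "text mining with document vectors",
    "neighborhood graph for approximate search",
    "classification of documents by nearest neighbors",
    "cooking recipes with fresh vegetables",
]
docs = [tf_vector(t) for t in corpus]
graph = build_knn_graph(docs, m=2)
query = tf_vector("approximate search in a neighborhood graph")

print(exhaustive_knn(query, docs, k=1))                  # [2], after scoring all 5 documents
print(greedy_graph_search(query, docs, graph, start=0))  # (2, 3): same answer, 3 documents scored
```

On this toy corpus the greedy walk reaches the same nearest neighbor as the exhaustive scan while scoring only three of the five documents. On real collections a greedy walk can stop at a local optimum, which is the kind of approximation error graph-based search trades for selectivity; the quadratic graph construction here is likewise only for illustration.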
Keywords: Data Mining; Text Mining; Access Methods; Very high-dimensional indexing; Neighborhood calculation.