IJFANS International Journal of Food and Nutritional Sciences

ISSN Print 2319-1775, Online 2320-7876

Dimensionality Reduction for Brazilian Business Descriptions

Venkateswarlu B1*, Dr Somasekhar Donthu2
doi: 10.48047/IJFANS/V11/ISS10/421

Abstract

This study addresses dimensionality reduction for a dataset of business descriptions of Brazilian enterprises classified by economic activity. The aim is to reduce the size of the data matrix without sacrificing significant information.
Dataset Overview: The dataset contains 1080 documents of free-text business descriptions of Brazilian enterprises, divided into nine categories defined by the National Classification of Economic Activities (CNAE). Prepositions were removed, words were reduced to their canonical forms, and each document was represented as a vector of word frequencies.
Data Sparsity: The dataset is extremely sparse, with zeros occupying 99.22% of the matrix. Most terms do not appear in most documents, which creates a high-dimensionality problem.
Dimensionality Reduction: Dimensionality reduction addresses this problem by reducing the number of variables, or features. It is divided into two categories: feature extraction and feature selection.
Feature Engineering and Extraction: Feature extraction converts raw data into features that can be used in modelling. Feature transformation modifies the data to improve the accuracy of algorithms. Feature selection eliminates superfluous features.
Primary Goal: The main objective is to reduce the dimensionality of the data matrix while preserving crucial information, that is, to remove terms (features) while keeping as much important information as possible.
Vector Space Model: The texts in the database are represented with a vector space model, in which every term becomes a dimension.
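
The word-frequency representation and the sparsity measure described above can be illustrated with a minimal sketch. The abstract does not give an implementation, so the function names `build_term_frequency_matrix` and `sparsity`, the whitespace tokenisation, and the three-document toy corpus are all assumptions made for illustration. They are not the CNAE data or the authors' code.

```python
from collections import Counter


def build_term_frequency_matrix(documents):
    """Return (vocabulary, matrix); matrix[i][j] counts term j in document i.

    Assumes documents are already preprocessed (e.g. prepositions removed,
    words reduced to canonical form) and can be split on whitespace.
    """
    tokenized = [doc.lower().split() for doc in documents]
    # Every distinct term becomes one dimension of the vector space.
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    index = {term: j for j, term in enumerate(vocabulary)}
    matrix = []
    for tokens in tokenized:
        row = [0] * len(vocabulary)
        for term, count in Counter(tokens).items():
            row[index[term]] = count
        matrix.append(row)
    return vocabulary, matrix


def sparsity(matrix):
    """Fraction of matrix entries that are zero."""
    total = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for value in row if value == 0)
    return zeros / total if total else 0.0


# Hypothetical toy corpus, not taken from the dataset in the abstract.
docs = [
    "retail sale clothing",
    "wholesale sale food",
    "software development service",
]
vocabulary, tf = build_term_frequency_matrix(docs)
print(len(vocabulary), sparsity(tf))
```

On this toy corpus the script prints `8 0.625`: eight dimensions, with 15 of the 24 entries equal to zero. On a real corpus with a large vocabulary and short descriptions the zero fraction rises sharply, which is the situation the 99.22% figure describes.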
Term Weighting: Term weighting approaches are used to strengthen the dimensionality reduction. They identify the terms with the highest discriminative power and remove the less discriminative ones, so that only the most relevant terms are kept.
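
One possible way to rank terms by discriminative power is sketched below. The abstract does not name a specific weighting scheme, so TF-IDF is used here purely as an assumed, commonly used example. The helpers `tfidf_term_scores` and `select_top_terms`, the cut-off `k`, and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def tfidf_term_scores(documents):
    """Score each term by its TF-IDF weight summed over all documents.

    TF-IDF is an assumption here; the source does not state which
    term-weighting scheme is applied.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = Counter()
    for tokens in tokenized:
        for term, tf in Counter(tokens).items():
            idf = math.log(n_docs / doc_freq[term])
            scores[term] += tf * idf
    return scores


def select_top_terms(documents, k):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    scores = tfidf_term_scores(documents)
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical toy corpus, not taken from the dataset in the abstract.
docs = [
    "retail sale clothing",
    "wholesale sale food",
    "software development service sale",
]
kept = select_top_terms(docs, k=4)
print(kept)
```

Here "sale" occurs in every document, so its inverse document frequency is zero and it is the first term to be dropped. The remaining terms tie, and the alphabetical tie-break keeps `['clothing', 'development', 'food', 'retail']`. In practice the cut-off `k` would be chosen by checking how classification quality changes as terms are removed.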
