
Bulletin of Kazakh National Women's Teacher Training University


FEATURE SELECTION FOR AUTOMATIC DETECTION OF TEXT GENRE

https://doi.org/10.52512/2306-5079-2021-85-1-84-90

Abstract

This article discusses algorithms that can be used to determine the genre of a text by studying and analyzing its symbols. There are differing approaches to defining the genre of a text. The algorithm considered here processes the text by removing unnecessary characters so that only the text remains, and then compares the result with a database. The article describes a practical method for automatic recognition of text genre based on all parameters. The logistic regression, decision tree, random forest, MLPClassifier, AdaBoostClassifier, SVM, and GaussianNB algorithms are compared in order to select the most important parameters for the texts. Determining the genre of texts is now relevant in all areas of the information society.
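
As a rough illustration of the preprocessing the abstract mentions (removing unnecessary characters, then analyzing the symbols that remain), the following minimal sketch uses only the Python standard library. The function names, the tag-stripping rule, and the four features are assumptions chosen for illustration; they are not the authors' parameter set or implementation.

```python
import re


def clean_text(raw: str) -> str:
    """Drop markup-like tags and collapse whitespace so only the text remains.

    The tag pattern is an assumption; the article does not say which
    characters count as unnecessary.
    """
    no_tags = re.sub(r"<[^>]+>", " ", raw)
    return re.sub(r"\s+", " ", no_tags).strip()


def symbol_features(text: str) -> dict:
    """Compute a few hypothetical symbol-level features of a cleaned text."""
    n_chars = len(text) or 1
    words = re.findall(r"\w+", text)
    # Split on runs of sentence-ending punctuation and ignore empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        # Share of characters that are neither alphanumeric nor whitespace.
        "punct_ratio": sum(
            not c.isalnum() and not c.isspace() for c in text
        ) / n_chars,
        # Share of characters that are digits.
        "digit_ratio": sum(c.isdigit() for c in text) / n_chars,
        # Mean word length in characters.
        "avg_word_len": sum(len(w) for w in words) / (len(words) or 1),
        # Mean number of words per sentence.
        "avg_sentence_len": len(words) / (len(sentences) or 1),
    }


sample = "<p>Prices rose 5% in 2021!  Analysts were surprised.</p>"
cleaned = clean_text(sample)
features = symbol_features(cleaned)
print(cleaned)
print(features)
```

Feature dictionaries of this kind, computed per text, could then be passed to any of the classifiers listed in the abstract to compare which parameters matter most for genre detection.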

About the Authors

S. Sh. Shanshar
Al-Farabi Kazakh National University
Almaty, Kazakhstan



I. M. Ualiyeva
Al-Farabi Kazakh National University
Almaty, Kazakhstan





For citations:


Shanshar S.Sh., Ualiyeva I.M. FEATURE SELECTION FOR AUTOMATIC DETECTION OF TEXT GENRE. Bulletin of Kazakh National Women's Teacher Training University. 2021;(1):84-90. (In Kazakh) https://doi.org/10.52512/2306-5079-2021-85-1-84-90



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2306-5079 (Print)