DISCOURSE BASED OPINION MINING ON ROMAN URDU DATA

  • Dr. Zareem Sharaf, Szabist, Karachi, Pakistan
  • Dr. Husnain Manzoor Ali, Szabist, Karachi, Pakistan

Abstract

Roman Urdu, that is, Urdu written in the Latin script, is a common mode of communication on social media, where comments, reviews, feedback and social networking posts are generated in it in large volumes. Despite this volume, little work has been done on sentiment and opinion analysis for it. Roman Urdu is a low-resource language, which poses particular challenges for opinion mining. Adequate opinion mining is not only about understanding the overall sentiment of a document or a single paragraph; it must also extract sentiments at a very granular level and relate each sentiment to the aspect it corresponds to. At a more advanced level, the analysis can go beyond positive or negative attitude and identify complex attitude types. We therefore developed a model for discourse-based opinion mining that takes into account the impact that various discourse elements have on the overall sentiment of a text. None of the literature we surveyed reports a discourse-based analysis of Roman Urdu text, so our work can be considered a first attempt in this direction. The overall aim of this research is to gain insight into the nature of user-generated content in Roman Urdu, to build the necessary resources, and to devise algorithms that advance sentiment analysis and classification for Roman Urdu.

Published: 2019-06-30
Section: Articles