Language Detection in English-Hindi Code-Mixed Tweets
Milind Mishra, Sahil Shandil, Kartik Sharma, Sumit Sharan Shukla, Rakshul Mahajan
The identification of language in social media text has emerged as an active area of study in recent years. In regions where English is not the primary language, most social media communication is code-mixed. Pre-trained contextual embeddings have achieved state-of-the-art performance on a variety of downstream tasks. More recently, models such as BERT have shown that pre-trained language models are especially useful for learning general language representations when they are trained on large quantities of unlabeled data. This research presents extensive experiments that use transfer learning and BERT fine-tuning to identify languages in Twitter text. For language-model pre-training, the work uses a dataset of Hindi-English-Urdu code-mixed text; for word-level language classification, it uses Hindi-English code-mixed text. The findings demonstrate that representations pre-trained on code-mixed data outperform their monolingual counterparts.
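The abstract frames language identification as word-level classification on top of a fine-tuned BERT model. A common step in such a setup is expanding each word's language tag onto BERT's subword tokens, so the classification loss is counted once per word. The sketch below illustrates that step under stated assumptions: the tag set (`en`, `hi`, `other`), the toy vocabulary, and the helper names `wordpiece_split` and `align_labels` are hypothetical and not taken from the paper; a real pipeline would use the pre-trained model's own tokenizer and vocabulary.

```python
from typing import Dict, List, Set, Tuple

IGNORE_INDEX = -100  # label value skipped by the loss (PyTorch's default ignore_index)

# Hypothetical word-level tag set; the paper's actual labels may differ.
LABELS: Dict[str, int] = {"en": 0, "hi": 1, "other": 2}


def wordpiece_split(word: str, vocab: Set[str]) -> List[str]:
    """Greedy longest-match-first split of one word into subword pieces.

    Continuation pieces carry the '##' prefix, as in BERT's WordPiece scheme.
    Returns ['[UNK]'] when the word cannot be covered by the vocabulary.
    """
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            piece = word[start:end] if start == 0 else "##" + word[start:end]
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return ["[UNK]"]
        pieces.append(match)
        start = end
    return pieces


def align_labels(
    words: List[str], tags: List[str], vocab: Set[str]
) -> Tuple[List[str], List[int]]:
    """Expand word-level language tags to subword tokens.

    Only the first piece of each word keeps the word's label; later pieces and
    the [CLS]/[SEP] special tokens get IGNORE_INDEX, so each word is scored once.
    """
    if len(words) != len(tags):
        raise ValueError("words and tags must have the same length")
    tokens = ["[CLS]"]
    labels = [IGNORE_INDEX]
    for word, tag in zip(words, tags):
        # Lowercase to mimic an uncased model; never emit zero pieces for a word.
        pieces = wordpiece_split(word.lower(), vocab) or ["[UNK]"]
        tokens.extend(pieces)
        labels.append(LABELS[tag])
        labels.extend([IGNORE_INDEX] * (len(pieces) - 1))
    tokens.append("[SEP]")
    labels.append(IGNORE_INDEX)
    return tokens, labels


if __name__ == "__main__":
    toy_vocab = {"main", "office", "ja", "##raha", "hoon", "today"}
    words = ["Main", "office", "jaraha", "hoon", "today"]
    tags = ["hi", "en", "hi", "hi", "en"]
    tokens, labels = align_labels(words, tags, toy_vocab)
    for tok, lab in zip(tokens, labels):
        print(f"{tok:10s} {lab}")
```

With this alignment, a token-classification head predicts one label per subword, and each word's language is read from its first piece; the `-100` entries are skipped by PyTorch's `CrossEntropyLoss`, whose default `ignore_index` is `-100`.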
"Language Detection in English-Hindi Code-Mixed Tweets", IJNRD - INTERNATIONAL JOURNAL OF NOVEL RESEARCH AND DEVELOPMENT (www.IJNRD.org), ISSN:2456-4184, Vol.9, Issue 10, page no.b883-b887, October-2024, Available :https://ijnrd.org/papers/IJNRD2410195.pdf
Paper Reg. ID: IJNRD_301333
Published Paper Id: IJNRD2410195
Research Area: Science and Technology
Location: Gharuan, Punjab, India
Publisher: IJNRD (IJ Publication) Janvi Wave