Detecting exaggerated numerals in financial text

Roy, Rima

Please use this identifier to cite or link to this item: http://20.198.91.3:8080/jspui/handle/123456789/8991

Title:	Detecting exaggerated numerals in financial text
Authors:	Roy, Rima
Advisors:	Naskar, Sudip Kumar
Keywords:	Natural Language Processing (NLP); Contextual Embedding for a given token (number); Supervised Learning; Deep Learning
Issue Date:	2023
Publisher:	Jadavpur University, Kolkata, West Bengal
Abstract:	Numerals play a very important role in financial text: changing a single numeral changes the meaning of the text. For example, if one extra zero is mistyped in "Google earnings increase 10%", the sentence becomes "Google earnings increase 100%", which means something quite different in financial terms. Changing a numeral in a financial text can therefore lead to different outcomes in financial forecasting systems. In this thesis, I outline an automated system that uses natural language processing to detect this type of exaggerated numeral by predicting the range of the numeral at a given position in the text. For this task, I use the standard dataset ‘numeracy-600k’, which has two subsets containing large collections of market comments and article titles. The work involves computing a contextual embedding of a numeral token using the BERT-SEC-NUM model; exploring different machine learning and deep learning models such as logistic regression, random forest, XGBoost, LightGBM, CNN, LSTM, and MLP; and evaluating the performance of the models and discussing the challenges through comprehensive experiments.
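
The range-prediction idea in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. It assumes that the "range" of a numeral is the number of digits in its integer part (capped at 7, with 0 for values below 1), and that a classifier, not shown here, supplies a predicted range for the masked position. The names `magnitude_bucket`, `mask_numeral`, `is_exaggerated` and the `[NUM]` placeholder are hypothetical and chosen only for illustration.

```python
import re

# Matches the first plain integer or decimal numeral in a sentence.
NUM_PATTERN = re.compile(r"\d+(?:\.\d+)?")


def magnitude_bucket(value: float, max_bucket: int = 7) -> int:
    """Return the assumed range label: digit count of the integer part.

    Values below 1 map to bucket 0; large values are capped at max_bucket.
    This bucketing scheme is an assumption made for the sketch.
    """
    integer_part = int(abs(value))
    if integer_part == 0:
        return 0
    return min(len(str(integer_part)), max_bucket)


def mask_numeral(text: str, mask: str = "[NUM]"):
    """Replace the first numeral with a placeholder and return (masked_text, value)."""
    match = NUM_PATTERN.search(text)
    if match is None:
        raise ValueError("no numeral found in text")
    masked = text[:match.start()] + mask + text[match.end():]
    return masked, float(match.group())


def is_exaggerated(text: str, predicted_bucket: int) -> bool:
    """Flag the numeral if its observed range exceeds the range predicted from context.

    predicted_bucket stands in for the output of a trained classifier.
    """
    _, value = mask_numeral(text)
    return magnitude_bucket(value) > predicted_bucket
```

In this sketch, if a model predicted bucket 2 for the masked position in the abstract's example, "Google earnings increase 100%" would be flagged while "Google earnings increase 10%" would not.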
URI:	http://20.198.91.3:8080/jspui/handle/123456789/8991
Appears in Collections:	Dissertations

Files in This Item:

File	Description	Size	Format
MCA ( Dept of Computer Science and Engineering) RimaRoy.pdf		620.62 kB	Adobe PDF
