Please use this identifier to cite or link to this item:
http://20.198.91.3:8080/jspui/handle/123456789/8991
| Title: | Detecting exaggerated numerals in financial text |
| Authors: | Roy, Rima |
| Advisors: | Naskar, Sudip Kumar |
| Keywords: | Natural Language Processing (NLP); Contextual Embedding for a given token (number); Supervised Learning; Deep Learning |
| Issue Date: | 2023 |
| Publisher: | Jadavpur University, Kolkata, West Bengal |
| Abstract: | Numerals play a very important role in financial text: changing a single numeral changes the meaning of the text. For example, if the sentence "Google earnings increase 10%" is mistyped with one extra zero, it becomes "Google earnings increase 100%", which means something quite different in financial terms. Such altered numerals can therefore lead to different outcomes in financial forecasting systems. In this thesis, I outline an automated system that helps to detect this type of exaggerated numeral in financial text by using natural language processing to predict the range of the numeral at a given position in the text. For this task, I use the standard dataset ‘numeracy-600k’, which has two subsets containing a large set of market comments and article titles. The work involves computing the contextual embedding of a numeral token using the BERT-SEC-NUM model and exploring different machine learning and deep learning models, such as logistic regression, random forest, XGBoost, LightGBM, CNN, LSTM, and MLP. I also evaluate the performance of the models and discuss the challenges through comprehensive experiments. |
| URI: | http://20.198.91.3:8080/jspui/handle/123456789/8991 |
| Appears in Collections: | Dissertations |
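
The range-prediction idea described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration and not the thesis's implementation: the bucketing scheme (0 for values below 1, otherwise the number of integer digits, capped at 7), the names `magnitude_class` and `is_exaggerated`, and the rule that a numeral counts as exaggerated when its magnitude bucket exceeds the bucket predicted from context. The context-based classifier itself is not modelled; only its output (`predicted_class`) is passed in.

```python
MAX_CLASS = 7  # assumed cap: numerals with 7 or more integer digits share one bucket


def magnitude_class(value: float) -> int:
    """Map a numeral to an order-of-magnitude bucket (hypothetical scheme).

    Returns 0 for |value| < 1, otherwise the number of digits in the
    integer part, capped at MAX_CLASS.
    """
    value = abs(value)
    if value < 1:
        return 0
    digits = len(str(int(value)))  # count digits of the integer part
    return min(digits, MAX_CLASS)


def is_exaggerated(stated: float, predicted_class: int) -> bool:
    """Flag a numeral whose magnitude bucket is larger than the bucket
    a (hypothetical) context-based classifier predicted for its position."""
    return magnitude_class(stated) > predicted_class


# Usage with the abstract's example: suppose a classifier predicted a
# two-digit magnitude for the masked numeral in "Google earnings increase [NUM]%".
predicted = 2
print(is_exaggerated(10, predicted))   # stated numeral matches the predicted range
print(is_exaggerated(100, predicted))  # stated numeral exceeds the predicted range
```

Framing the problem as classification over magnitude buckets means the detector never has to guess the exact value, only whether the stated numeral falls in a plausible range for its context.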
Files in This Item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| MCA ( Dept of Computer Science and Engineering) RimaRoy.pdf | | 620.62 kB | Adobe PDF | View/Open |
Items in IR@JU are protected by copyright, with all rights reserved, unless otherwise indicated.