Please use this identifier to cite or link to this item:
http://localhost:8081/jspui/handle/123456789/18870

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Kumar, Rahul | - |
| dc.date.accessioned | 2026-02-05T11:00:11Z | - |
| dc.date.available | 2026-02-05T11:00:11Z | - |
| dc.date.issued | 2024-06 | - |
| dc.identifier.uri | http://localhost:8081/jspui/handle/123456789/18870 | - |
| dc.guide | Kumar, Sanjeev | en_US |
| dc.description.abstract | The growing complexity of natural language processing problems has exposed the shortcomings of static word embeddings in capturing sentence context. Transformer-based models that rely on self-attention have been used to address this. However, BERT's tokenizer uses the WordPiece algorithm, which divides words into sub-words and can impair the model's understanding of contextual meaning. This research aims to improve the vector embeddings of words by improving the tokenizer. The tokenizer was extended with domain-specific terms (tokens), and the model was trained on a dataset containing these tokens. Our results show that the cosine similarity of the prepared words improved, suggesting better contextual representation. These changes in the word embeddings imply that certain words have better embedding quality than in previously existing models, with some words obtaining more accurate vector-space representations. Furthermore, our methodology emphasizes the importance of tailoring tokenization techniques to particular domains, which may yield more accurate dynamic embeddings from language models. This research highlights a limitation of existing WordPiece-based tokenizers and shows that adding the special tokens to the tokenizer's vocabulary produces a significant difference in the dynamic embeddings generated by the BERT model. | en_US |
| dc.language.iso | en | en_US |
| dc.publisher | IIT, Roorkee | en_US |
| dc.title | WORD EMBEDDING IMPROVEMENTS IN NATURAL LANGUAGE PROCESSING | en_US |
| dc.type | Dissertations | en_US |
| Appears in Collections: | MASTERS' THESES (MFSDS & AI) | |
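The abstract's central mechanism, WordPiece splitting a word into sub-word pieces unless the whole word is already in the vocabulary, can be illustrated with a minimal sketch. It assumes a hypothetical toy vocabulary (`vocab`) and implements the greedy longest-match-first rule that BERT-style WordPiece tokenizers apply when tokenizing a word, together with the cosine similarity used to compare embedding vectors. The function names and vocabulary are illustrative only; this is not the author's code.

```python
from math import sqrt


def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first WordPiece split of a single word.

    Continuation pieces carry the '##' prefix, as in BERT's vocabulary.
    Returns [unk] if any part of the word cannot be matched.
    """
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]
        tokens.append(match)
        start = end
    return tokens


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


# Hypothetical toy vocabulary; real BERT vocabularies hold roughly 30k entries.
vocab = {"bio", "##in", "##form", "##atics"}

print(wordpiece_tokenize("bioinformatics", vocab))
# ['bio', '##in', '##form', '##atics']

# Adding the domain-specific term as a whole token stops the split.
print(wordpiece_tokenize("bioinformatics", vocab | {"bioinformatics"}))
# ['bioinformatics']
```

With the whole word in the vocabulary, the model can learn a single embedding for it instead of composing it from unrelated pieces such as `##in`; in a real model the new token also needs its own embedding row, which is then trained on data containing the term.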
Files in This Item:
| File | Description | Size | Format |
|---|---|---|---|
| 22566018_RAHUL KUMAR.pdf | | 1.06 MB | Adobe PDF |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
