Location
Library Room 1576
Date and Time
Abstract
Word embeddings are vector-based representations of semantic relationships between words that enable computational analysis across large text corpora. However, the stability and validity of word embeddings remain a significant concern in NLP, particularly for languages with limited corpora. We define stability as the percentage overlap between a word's nearest neighbors, identified via cosine similarity, across embedding spaces. This project examines how different embedding algorithms influence model stability across resource levels, focusing on fastText in comparison to Word2Vec and GloVe. FastText incorporates subword information through character-level n-gram modeling, a feature that has shown particular promise for languages with limited training data. These results have broader implications for low-resource language technologies and computational linguistics, especially in domains where reproducibility and robustness are essential.
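For concreteness, the stability metric described above can be computed as the average overlap of top-k nearest-neighbor lists between two embedding spaces. The sketch below is a minimal illustration, not the project's actual code: it assumes two embedding matrices whose rows are aligned to a shared vocabulary, and the function names and k = 10 default are illustrative.

```python
import numpy as np

def top_k_neighbors(emb: np.ndarray, k: int = 10) -> np.ndarray:
    """Indices of each word's k nearest neighbors by cosine similarity."""
    # Normalize rows so the dot product equals cosine similarity.
    normed = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sims = normed @ normed.T
    np.fill_diagonal(sims, -np.inf)  # exclude each word as its own neighbor
    return np.argsort(-sims, axis=1)[:, :k]  # sort descending, keep top k

def stability(emb_a: np.ndarray, emb_b: np.ndarray, k: int = 10) -> float:
    """Mean percent overlap of k-nearest-neighbor sets between two embedding
    spaces over the same vocabulary (rows aligned word-for-word)."""
    nn_a = top_k_neighbors(emb_a, k)
    nn_b = top_k_neighbors(emb_b, k)
    overlaps = [len(set(a) & set(b)) / k for a, b in zip(nn_a, nn_b)]
    return 100.0 * float(np.mean(overlaps))

# Toy example: a second space simulating a slightly perturbed retraining.
rng = np.random.default_rng(0)
emb1 = rng.normal(size=(1000, 100))
emb2 = emb1 + 0.1 * rng.normal(size=(1000, 100))
print(f"Stability at k=10: {stability(emb1, emb2):.1f}%")
```

In practice the two matrices would come from separate training runs (e.g. fastText versus Word2Vec, or repeated runs on the same corpus), and stability is typically reported as this overlap averaged over the vocabulary at a fixed k.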