Pretraining Federated Text Models for Next Word Prediction

In the winter of 2020, I completed my MS in data science and my capstone project on pretraining neural networks for federated next word prediction. The focus of this work was to contribute research to the growing field of federated learning, to which I was introduced during my machine learning course, DATA558 at University of Washington. Brendan McMahan, a principal researcher in federated learning, gave a guest lecture on his work at Google on training federated deep learning models which introduced me to the topic. My enterprising classmate and soon to be research partner followed up with Brendan after the lecture. This led to a project the following academic year in which he and I worked with two project sponsors from the Google federated learning team to conduct research in the field and contribute experimental findings to their ongoing work.

My research focused on pretraining methods for federated deep learning models designed for next word prediction. This is an important area of applied research for federated learning, as the Google mobile keyboard application uses federated learning to train models to predict words in text messages, offering user suggestions for the top three predicted words given some starting text. Model pretraining promises enhanced accuracy and reduced training time, which we demonstrate in our research results.

While the Google mobile keyboard is an early application of federated learning, there are many open areas of research, some with tremendous impact potential. For example, most clinical research with machine learning models is limited to electronic medical records owned by a single institution. Models may not generalize from training data at one institution to another, and training datasets are often limited to the size of the dataset at one institution. Federated models present the possibility of learning from medical records from many institutions without compromising privacy or security by bringing the model to each distributed dataset and averaging the learned parameters, as opposed to centrally aggregating patient data, which is often a legal and ethical impossibility. Just as Google uses federated models to avoid centrally aggregating user text messages, so too federated models could be used to generate predictions and recommendations for patients based on distributed medical data.

I’m grateful to my research partner, mentors on the federated learning team at Google, and professor at UW for the opportunity to get involved in this exciting area of research. Feel free to check out the code and paper for our work, and please don’t hesitate to reach out if the topic sparks your interest :)