But anyways, research is hard. I'm trying to make a neural net that will read in a bunch of short sequences of words (called n-grams) and try to use the middle word to predict the words around it. You might say, "that doesn't sound very useful, why would you want to predict words around a word?" and I totally agree; it isn't very useful. What is useful is that to predict the surrounding words, the system first projects the middle word into a vector in a continuous space (at first to a random spot). Then, when it predicts wrong, it adjusts the vector a little bit in the direction that would have predicted the right word. After many iterations, these vectors end up containing information about the word that is useful to the system when predicting nearby words. Since similar words tend to be surrounded by the same words, similar words get clumped together. Once we have these projections, we can do cool stuff like translate phrases from English to French semantically rather than just matching the most common words.
So I write the code for this neural net, run it on a small piece of data, and it seems to learn (the number it gets wrong while training goes down). Cool beans. I get the actual data and holy moley, there are 350,000,000 samples inside. My code can run about 225 samples per second, so that would take almost 3 weeks... Right now I am trying to run 10,000,000 samples, and it is not going so well. Not only is it taking longer than the predicted 12 hours, but after going down at first, the cost is now wavering back and forth very far from 0. This probably means my learning rate is too high, but the cost does seem to be decreasing slowly if I look at it from afar.
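That "wavering but slowly decreasing" pattern is exactly what an oversized learning rate looks like, and you can see it even on a made-up one-dimensional cost (this is just an illustration with invented numbers, not my training code): a big step overshoots the minimum every time, so the cost bounces around far from 0 while still shrinking on average.

```python
def descend(lr, steps=20, x=10.0):
    """Gradient descent on the toy cost x**2, whose gradient is 2*x."""
    costs = []
    for _ in range(steps):
        x -= lr * 2 * x
        costs.append(x * x)
    return costs

small = descend(0.1)   # converges smoothly toward 0
large = descend(0.99)  # overshoots each step: cost wavers, decreasing slowly
print(small[-1], large[-1])
```

With the big learning rate, each step flips x to the other side of the minimum and only shaves a sliver off its magnitude, which is the "far from 0 but decreasing if you squint" behavior I'm seeing.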
To make us all feel better, here are some pictures from my weekend wandering:
|A bakery with no sugar or white flour. It has integral bread, but that just sounds derivative.|
|Don't cook, just eat!|
|This picture isn't funny, but that bread looks delicious.|
|I too drink my mushrooms as tea.|