USA: JL Lemma Code Making Big Data Small
A Harvard computer scientist demonstrates that a 30-year-old theorem is still best suited to reduce data and speed up algorithms.
Cambridge/USA - When we think about digital information, we often think about size. A daily email newsletter, for example, may be 75 to 100 kilobytes in size. But data also has dimensions, determined by the number of variables in a piece of data. An email, for instance, can be viewed as a high-dimensional vector with one coordinate for each word in the dictionary, where the value in each coordinate is the number of times that word is used in the email. So a 75 KB email that is 1,000 words long would result in a vector whose number of dimensions runs into the millions.
This geometric view of data is useful in some applications, such as learning spam classifiers, but the more dimensions there are, the longer an algorithm can take to run and the more memory it uses.
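To make the word-count view of an email concrete, here is a minimal Python sketch. It is an illustration only: the function name `email_to_vector`, the five-word vocabulary, and the sample text are all hypothetical, and a real dictionary would have vastly more entries, which is where the very large dimension comes from.

```python
from collections import Counter


def email_to_vector(text, vocabulary):
    """Return a count vector with one coordinate per vocabulary word."""
    # Naive tokenization (assumption): lowercase and split on whitespace.
    counts = Counter(text.lower().split())
    # Words missing from the email get a count of 0.
    return [counts[word] for word in vocabulary]


# Hypothetical toy vocabulary standing in for a full dictionary.
vocabulary = ["data", "dimension", "email", "spam", "vector"]
email = "Spam email about data and more data"

vector = email_to_vector(email, vocabulary)
print(vector)  # [2, 0, 1, 1, 0]
```

The vector always has as many coordinates as the vocabulary has words, no matter how short the email is, so most coordinates are zero for any single message.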