USA: JL Lemma Code Making Big Data Small

Editor: Alexander Stark

A Harvard computer scientist demonstrates that a 30-year-old theorem is still the best way to reduce data and speed up algorithms.

Pre-processing large data into lower dimensions is key to speedy algorithmic processing.
(Source: Harvard John A. Paulson School of Engineering and Applied Sciences)

Cambridge/USA — When we think about digital information, we often think about size. A daily email newsletter, for example, may be 75 to 100 kilobytes in size. But data also has dimensions, based on the number of variables in a piece of data. An email, for example, can be viewed as a high-dimensional vector with one coordinate for each word in the dictionary, where the value of that coordinate is the number of times the word appears in the email. So a 75 KB email that is 1,000 words long corresponds to a vector with millions of coordinates, one for every word in the dictionary.
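
To make that picture concrete, here is a minimal Python sketch (not taken from the article) of how an email could be turned into such a word-count vector; the tiny dictionary and the email_to_vector helper are illustrative assumptions, and a real dictionary would have hundreds of thousands of entries.

# Illustrative sketch: map an email to a word-count vector with one
# coordinate per dictionary word (toy dictionary, hypothetical helper).
import re
from collections import Counter

def email_to_vector(email_text, dictionary):
    """Return a len(dictionary)-dimensional vector of word counts."""
    words = re.findall(r"[a-z']+", email_text.lower())
    counts = Counter(words)
    return [counts.get(word, 0) for word in dictionary]

dictionary = ["meeting", "offer", "free", "report", "winner"]  # toy example
vec = email_to_vector("Free offer: you are a winner! Free report inside.", dictionary)
print(vec)  # [0, 1, 2, 1, 1]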

This geometric view of data is useful in some applications, such as learning spam classifiers, but the more dimensions the data has, the longer an algorithm can take to run and the more memory it uses.
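
As a rough illustration of the kind of dimensionality reduction the Johnson-Lindenstrauss lemma guarantees, the following Python sketch projects high-dimensional vectors onto a random lower-dimensional subspace with a Gaussian matrix; the dimensions, the synthetic data, and the scaling are standard textbook choices, not details taken from the article or from the researcher's own code.

# Illustrative Johnson-Lindenstrauss-style random projection
# (assumed parameters; not the researcher's implementation).
import numpy as np

rng = np.random.default_rng(0)

d = 10_000   # original dimension, e.g. one coordinate per dictionary word
k = 200      # reduced dimension
n = 50       # number of email vectors

# Synthetic sparse word-count vectors standing in for real emails.
X = rng.poisson(0.01, size=(n, d)).astype(float)

# Gaussian projection matrix, scaled so squared distances are preserved
# in expectation.
P = rng.normal(0.0, 1.0 / np.sqrt(k), size=(d, k))
X_low = X @ P  # each email is now a k-dimensional vector

# Pairwise distances survive the projection up to a small relative error.
print(np.linalg.norm(X[0] - X[1]), np.linalg.norm(X_low[0] - X_low[1]))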