New math lets data take the lead

Big data, the use of powerful computation to find insights in massive fields of information, is in many ways a new science. As such, Han Liu said, it requires a new approach in mathematics.

[Photo: Han Liu inside Sherrerd Hall]

“Things are starting to change and change fundamentally,” said Liu, an assistant professor of operations research and financial engineering.

For centuries, science has followed the same pattern: Scientists make conjectures, test them, and try to disprove their hypotheses. Big data has changed that process.

“People are collecting large amounts of data. They analyze the data to find hidden patterns and use the patterns to lead to new hypotheses,” Liu said. “Many of these hypotheses are very counterintuitive and surprising.”

These new methods rely on statistics and probability, and on advanced computing techniques in which the data “train” computers to work with them, so that future sets of data yield even better results.

Scientists are already using this approach for many problems, from researching artificial intelligence to probing the genetic background of complex diseases and biological processes. The successful programs can find patterns that human intuition cannot see. But Liu said that most of the work is still very focused on applications.

In his Statistical Machine Learning Lab, Liu and his team are developing broad analytic tools that allow researchers to analyze complex scientific and business data with the weakest possible assumptions. In particular, they use data and computation as lenses to explore science and machine intelligence. “We need to build fundamental principles to make this a solid field,” Liu said.

Liu said that the current generation of students will play a critical role as big data develops as a science. “We are not trying to teach them techniques, but to be smarter, to be deep thinkers,” he said.