Have you ever read "The Minority Report", the Philip K. Dick short story? Okay, maybe not, but you at least remember the movie, which was pretty stellar for an adaptation. The visual effects were stunning: the Precog room, the transparent screens you could manipulate with your hands, and the street scenes where digital ads were triggered by your presence. All from a short story published in 1956 and a movie released in 2002.
Minority Report showed a world awash in information. Systems know your past and use it to suggest what you want now, or even what you will want in the future. It is all driven by a seemingly endless stream of data across all sorts of channels (and maybe some supernatural humanoids).
That is what came to mind when I read a post by Julia Silge, a data scientist at Stack Overflow. To celebrate the release of the book she co-authored, she took a slice of Stack Overflow's unstructured text data to demonstrate the text mining techniques the book explores.
With topic modeling, you do not need to label the data or know its topics beforehand; you typically only choose how many topics to look for. That is the power of the technique. Through machine learning, you discover what the topics are and surface patterns that can lead to interesting and unexpected findings. From there, you can ask deeper questions. For instance, you can dig into the enormous wealth of Stack Overflow data and analyze how the tags on questions and answers align with the discovered topics, as shown in the chart below:
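A common choice of topic model is latent Dirichlet allocation (LDA). The post does not show its code, so the following is only a minimal, hypothetical Python sketch: LDA fitted by collapsed Gibbs sampling on four made-up, pre-tokenized question titles, plus a helper that averages topic proportions per tag to mimic the tag-to-topic comparison. The corpus, the one-tag-per-question simplification, the function names, and the `alpha`/`beta`/`iterations` settings are all assumptions for illustration, not the code from the post or the book.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling; returns the count tables."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]                 # tokens per (doc, topic)
    topic_word = [defaultdict(int) for _ in range(num_topics)]   # tokens per (topic, word)
    topic_total = [0] * num_topics                               # tokens per topic
    assignments = []

    # Start by giving every token a random topic.
    for d, doc in enumerate(docs):
        z = [rng.randrange(num_topics) for _ in doc]
        for w, k in zip(doc, z):
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Take this token out of the counts...
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # ...resample its topic from the full conditional...
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # ...and add it back under the new topic.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


def top_words(topic_word, n=3):
    """Highest-count words per topic, i.e. what each topic is 'about'."""
    return [sorted(tw, key=tw.get, reverse=True)[:n] for tw in topic_word]


def tag_topic_profile(doc_topic, tags, alpha=0.1):
    """Average smoothed topic proportions over the questions with each tag."""
    num_topics = len(doc_topic[0])
    sums = defaultdict(lambda: [0.0] * num_topics)
    counts = defaultdict(int)
    for row, tag in zip(doc_topic, tags):
        total = sum(row) + num_topics * alpha
        for k in range(num_topics):
            sums[tag][k] += (row[k] + alpha) / total
        counts[tag] += 1
    return {tag: [s / counts[tag] for s in sums[tag]] for tag in sums}


# Hypothetical toy corpus: pre-tokenized question titles, one tag each.
questions = [
    "python pandas dataframe merge columns".split(),
    "pandas dataframe groupby python".split(),
    "javascript react component state".split(),
    "react javascript hooks state".split(),
]
tags = ["pandas", "pandas", "reactjs", "reactjs"]

doc_topic, topic_word = lda_gibbs(questions, num_topics=2)
print(top_words(topic_word))
print(tag_topic_profile(doc_topic, tags))
```

With vocabularies this cleanly separated, the two topics should split the pandas questions from the React ones, though which topic index ends up with which theme depends on the random seed. A real corpus would also need stop-word removal, far more documents, and some experimentation with the number of topics.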
How are the results? Eerily good, as she showed by running a few questions through the model. You can already imagine the applications: any massive body of text can be analyzed to understand people's interests, to infer how they are likely to decide, and to influence their choices. It brings us ever closer to the world of Minority Report, where the data seems to understand us better than we understand ourselves.
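Running a new question "through the model" typically means estimating its topic mix while the learned topics stay fixed. Here is a minimal Python sketch of that step, assuming hand-written, hypothetical topic-word probabilities in place of a fitted model; `infer_topic_mix` and the simple EM "fold-in" update it uses are illustrative choices, not the procedure from the post.

```python
def infer_topic_mix(tokens, topic_words, iterations=50):
    """Estimate a new document's topic proportions with the topics held fixed.

    A simple EM fold-in: only the document's mixture weights are updated.
    """
    topics = list(topic_words)
    # Ignore words that no topic can generate.
    known = [w for w in tokens
             if any(topic_words[t].get(w, 0.0) > 0 for t in topics)]
    theta = {t: 1.0 / len(topics) for t in topics}  # start from a uniform mix
    if not known:
        return theta
    for _ in range(iterations):
        totals = {t: 0.0 for t in topics}
        for w in known:
            # E-step: how responsible is each topic for this token?
            scores = {t: theta[t] * topic_words[t].get(w, 0.0) for t in topics}
            norm = sum(scores.values())
            for t in topics:
                totals[t] += scores[t] / norm
        # M-step: the new weights are the average responsibilities.
        theta = {t: totals[t] / len(known) for t in topics}
    return theta


# Hypothetical topics, e.g. as a fitted topic model might produce (each sums to 1).
topic_words = {
    "python-data": {"python": 0.3, "pandas": 0.3, "dataframe": 0.3,
                    "react": 0.05, "javascript": 0.05},
    "frontend": {"python": 0.05, "pandas": 0.025, "dataframe": 0.025,
                 "react": 0.45, "javascript": 0.45},
}

mix = infer_topic_mix("how to merge a pandas dataframe in python".split(), topic_words)
print(max(mix, key=mix.get))  # python-data
```

Words the topics have never seen (here "how", "merge", and so on) are simply skipped, which is one reason real pipelines train on a large vocabulary.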
What comes to mind for you when you think of how these techniques can enhance your work? Or do you think this borders on invasive and creepy?
So I was curious what the most popular question on Stack Overflow was. Here it is:

Why is it faster to process a sorted array than an unsorted array?