Student: Matthew DeCarlo
Presentation Viewed: Introduction to Big Data: Are You Ready?
Presenter: Daniel Liu
Daniel Liu’s session, “Introduction to Big Data: Are You Ready?”, covered a broad range of topics related to Big Data. Specifically, he talked about what Big Data is and isn’t, and discussed some technologies and strategies that an organization can implement to manage its Big Data needs. Liu defined ‘big data’ as necessarily meeting four criteria: volume, velocity, value, and variety. The data must be of extremely large volume, be generated at an extremely fast pace, be of business value, and come in a variety of formats (text, documents, media, etc.).
He also gave some statistics that were particularly interesting. On the topic of velocity, one of the key components of ‘big data,’ he explained that 90% of all the world’s data has been created in just the last two years. Considering the sheer amount of digital data that the world contains, it’s mind-blowing to think that the vast majority of it, 90%, was created in the past two years. That means most of the data was created after I started college, and that was only in 2012. Speaking of 2012, during that year the average number of Internet-connected devices was 9 billion. Compared with the world population of 7 billion, that means there are more Internet-connected devices than there are people on Earth. This number is not only large now; it is set to multiply in the years leading up to 2020. By 2020, it’s estimated that 50 billion devices will be connected to the Internet. These 50 billion devices will produce data at an even greater rate than devices do today.
Another topic Liu talked about was the schema for Big Data. More specifically, he explained that because Big Data has variety, you cannot know the schema before you begin modeling a big data environment. Rather than relying on a data model that employs only relational databases, organizations must integrate newer ‘big data’ technologies into that same model. While his discussion was far from technical, Liu finished his presentation by showing some of the different tools that can be used to manage Big Data. He introduced Hive, Hadoop, NoSQL, and MapReduce, to name a few.
Liu promised that by the end of his session we would all have a better understanding of the techniques and technologies that enable organizations to analyze data; I definitely left feeling he had delivered on that promise.