CIS607, Spring 2019
Big Data and Deep Learning
Course Description:
Big Data and Deep Learning are two active research areas related to several computer science disciplines, such as Databases, Data Mining, and Machine Learning, as well as to applications in industry. Big Data has become a widely used term because collections of real-life data in both research and industry are so large and so complex that they are difficult to process efficiently with traditional database and data mining tools. The challenges include capture, curation, storage, search, sharing, transfer, integration, analysis, and visualization. Deep learning is a new research area in machine learning that attempts to model high-level abstractions in data by using architectures composed of multiple non-linear transformations. It belongs to the family of machine learning methods based on learning representations. Observations from data can be represented in many ways, but some representations make it easier to learn tasks of interest from examples. Research in this area attempts to define what makes one representation better than another and how to create models that learn such representations.
This graduate research seminar will first survey the basic concepts, research directions, and real-world applications of Big Data and of Deep Learning. The seminar will then encourage discussion of the overlapping research and applications where Big Data meets Deep Learning, especially from the perspectives of data structure, complexity, and learning representations. On the one hand, Big Data and big systems help Deep Learning models improve their performance. On the other hand, Deep Learning can help analyze data of large scale and great variety. The instructor will introduce these topics and their overlap in a couple of lectures. Students are expected to read and discuss papers from journals, conference proceedings, or unpublished manuscripts on the Web. Each student is expected to give one presentation on a paper or topic that he/she is interested in. The final report for each student can be either a survey paper or a small implementation.
Prerequisites:
None. Basic knowledge of databases, data mining, AI, and machine learning will be helpful.
Time and Place:
Fridays 4:00pm-5:20pm, 200 Deschutes Hall.
Instructor:
Dejing Dou, 303 Deschutes, phone 541-346-4572, email dou@cs.uoregon.edu.
Office hours:
Mondays 4:00pm-5:00pm, or by appointment.
Evaluation:
There is no exam for this seminar. Attendance and participation, paper
reading, the paper presentation, and the final report will determine the course score. Students
are encouraged to pursue further research projects on the topics
discussed in this seminar, but this is not required.
Schedule and Lecture Notes:
Homework:
Papers for Reading (continually updated):
- Introduction and Survey on Big Data and Deep Learning
- A. Jacobs. The pathologies of big data. Communications of the ACM 52, No. 8, pp. 36-44, 2009.
- Y. Bengio. Learning deep architectures for AI. Foundations and Trends in Machine Learning 2, No. 1, pp. 1-127, 2009.
- D. Agrawal, S. Das, and A. El Abbadi. Big data and cloud computing: current state and future opportunities. In Proceedings of the 14th International Conference on Extending Database Technology, pp. 530-533, 2011.
- Y. LeCun, Y. Bengio, and G. Hinton. Deep Learning. Nature 521(7553), pp. 436-444, 2015.
- Research in Big Data
- J. Cohen, B. Dolan, M. Dunlap, J. M. Hellerstein, and C. Welton. MAD skills: new analysis practices for big data. In Proceedings of the VLDB Endowment 2, No. 2, pp. 1481-1492, 2009.
- G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: a system for large-scale graph processing. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, pp. 135-146, 2010.
- H. Herodotou, H. Lim, G. Luo, N. Borisov, L. Dong, F. B. Cetin, and S. Babu. Starfish: A Self-tuning System for Big Data Analytics. In Proceedings of the 5th Biennial Conference on Innovative Data Systems Research (CIDR), pp. 261-272, 2011.
- Y. Chen, S. Alspaugh, and R. Katz. Interactive analytical processing in big data systems: A cross-industry study of MapReduce workloads. In Proceedings of the VLDB Endowment 5, No. 12, pp. 1802-1813, 2012.
- J. C. Corbett et al. Spanner: Google's Globally-Distributed Database. In Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation, pp. 251-264, 2012.
- Research in Deep Learning
- G. E. Hinton, S. Osindero, and Y. Teh. A fast learning algorithm for deep belief nets. Neural Computation 18, No. 7, pp. 1527-1554, 2006.
- R. Collobert and J. Weston. A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of the 25th International Conference on Machine Learning, pp. 160-167, 2008.
- H. Lee, R. Grosse, R. Ranganath, and A. Y. Ng. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In Proceedings of the 26th Annual International Conference on Machine Learning, pp. 609-616, 2009.
- J. Weston, F. Ratle, H. Mobahi, and R. Collobert. Deep learning via semi-supervised embedding. In Neural Networks: Tricks of the Trade, pp. 639-655, 2012.
- R. Gens and P. Domingos. Discriminative Learning of Sum-Product Networks. In Proceedings of the 2012 Neural Information Processing Systems Conference, pp. 3248-3256, 2012.
- When Big Data meets Deep Learning
- G. E. Hinton and R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science 313(5786), pp. 504-507, 2006.
- Chen, X.W. and Lin, X. Big data deep learning: challenges and perspectives. IEEE Access, 2, pp. 514-525, 2014.
- Zhou, Z.H., Chawla, N.V., Jin, Y. and Williams, G.J. Big data opportunities and challenges: Discussions from data analytics perspectives. IEEE Computational Intelligence Magazine, 9(4), pp. 62-74, 2014.
- Perozzi, B., Al-Rfou, R. and Skiena, S. DeepWalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 701-710, 2014.
- Zhang, K. and Chen, X.W. Large-scale deep belief nets with MapReduce. IEEE Access, 2, pp. 395-403, 2014.
- Najafabadi, M.M., Villanustre, F., Khoshgoftaar, T.M., Seliya, N., Wald, R. and Muharemagic, E. Deep learning applications and challenges in big data analytics. Journal of Big Data, 2(1), p. 1, 2015.
- Al-Jarrah, O.Y., Yoo, P.D., Muhaidat, S., Karagiannidis, G.K. and Taha, K. Efficient machine learning for big data: A review. Big Data Research, 2(3), pp. 87-93, 2015.
- Xing, E.P., Ho, Q., Dai, W., Kim, J.K., Wei, J., Lee, S., Zheng, X., Xie, P., Kumar, A. and Yu, Y. Petuum: A new platform for distributed machine learning on big data. IEEE Transactions on Big Data, 1(2), pp. 49-67, 2015.
- Lv, Y., Duan, Y., Kang, W., Li, Z. and Wang, F.Y. Traffic flow prediction with big data: a deep learning approach. IEEE Transactions on Intelligent Transportation Systems, 16(2), pp. 865-873, 2015.
- Wang, J., Liu, W., Kumar, S. and Chang, S.F. Learning to hash for indexing big data - a survey. Proceedings of the IEEE, 104(1), pp. 34-57, 2016.
- Zhang, Q., Yang, L.T. and Chen, Z. Privacy preserving deep computation model on cloud for big data feature learning. IEEE Transactions on Computers, 65(5), pp. 1351-1362, 2016.
- Alsheikh, M.A., Niyato, D., Lin, S., Tan, H.P. and Han, Z. Mobile big data analytics using deep learning and apache spark. IEEE Network, 30(3), pp. 22-29, 2016.
- Gu, X., Zhang, H., Zhang, D. and Kim, S. Deep API learning. In Proceedings of the 24th ACM International Symposium on Foundations of Software Engineering (FSE), pp. 631-642, 2016.
- Chen, Q., Song, X., Yamada, H. and Shibasaki, R. Learning Deep Representation from Big and Heterogeneous Data for Traffic Accident Inference. In Proceedings of the 30th AAAI Conference on Artificial Intelligence (AAAI-16), pp. 338-344, 2016.
- Kipf, T.N. and Welling, M. Semi-supervised classification with graph convolutional networks. In Proceedings of the 5th International Conference on Learning Representations (ICLR), 2017.
- Li, H., Ota, K. and Dong, M. Learning IoT in Edge: Deep Learning for the Internet of Things with Edge Computing. IEEE Network, 32(1), pp. 96-101, 2018.
Useful Links