IEOR E4526 Analytics on the Cloud

Instructor: Hardeep Johar

Cloud computing refers to the processing and storage of data on remote servers. It enables the analysis of large datasets (aka ‘big data’) by making it possible to spread processing across multiple machines (parallelism) without buying expensive hardware. Cloud platforms also provide mechanisms for storing and managing large datasets across many servers.

The goal of this course is to introduce you to the programming issues involved in using clouds for data analytics. While we will learn how to work with the infrastructure of cloud platforms and discuss some distributed computing, the focus of the course is on programming. Topics covered will include MapReduce, parallelism, rewriting algorithms (statistical, operations research, and machine learning) for the cloud, and the basics of porting applications so that they run on the cloud. We will mostly work in Python (2 or 3), so prior familiarity with Python is a must.
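To give a feel for the MapReduce topic, here is a toy word count in Python. It is an illustrative sketch, not course material: the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical, and a thread pool on a single machine stands in for the cluster nodes that would run map tasks in a real deployment. In standard CPython, threads show the structure of the computation but will not speed up CPU-bound work because of the global interpreter lock.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple


def map_phase(line: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(mapped: Iterable[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Shuffle: group every emitted value under its key (the word)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for word, count in pairs:
            groups[word].append(count)
    return dict(groups)


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single total."""
    return word, sum(counts)


def word_count(lines: List[str], workers: int = 4) -> Dict[str, int]:
    # Hypothetical stand-in for a cluster: each map task processes one line
    # independently, so the tasks can be handed to separate workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, lines))  # results keep input order
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(w, c) for w, c in grouped.items())


if __name__ == "__main__":
    docs = ["the cloud stores data", "the cloud processes data in parallel"]
    print(word_count(docs))
    # → {'the': 2, 'cloud': 2, 'stores': 1, 'data': 2, 'processes': 1, 'in': 1, 'parallel': 1}
```

The map and reduce functions never share state, which is what lets a framework run many copies of them on different machines; only the shuffle step needs to bring matching keys together.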

500 W. 120th St., Mudd 315, New York, NY 10027    212-854-2942                 
©2014 Columbia University