Jupiter Dynamics

Big Data & Hadoop


What is Big Data?

Learning goal: In this module, you will understand the meaning of big data, how traditional systems are limited in their ability to handle big data, and how the Hadoop ecosystem helps to solve this problem. You will learn about the various parts of the Hadoop ecosystem and their roles.

Content:
  • Characteristics of big data
  • Traditional data management systems and their limitations
  • What is Hadoop?
  • Why is Hadoop used?
  • The Hadoop ecosystem
  • Big data/Hadoop use cases
HDFS Architecture

Learning goal: After this module, you will be able to install, set up and configure a Hadoop cluster, and you will know the basic Hadoop shell commands. You will also learn about HDFS, Hadoop's distributed file storage system: why it is used, how it differs from traditional file systems, and how files are read from and written to it. You will work hands-on in implementing what is taught in this module.

Content:
  • HDFS internals and use cases
  • HDFS daemons
  • Files and blocks
  • Namenode memory concerns
  • Secondary namenode
  • HDFS access options
  • Installing and configuring Hadoop
  • Hadoop daemons
  • Basic Hadoop commands
  • Hands-on exercise
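The "files and blocks" topic above comes down to simple arithmetic: HDFS splits each file into fixed-size blocks (128 MB by default in Hadoop 2.x; the last block may be smaller) and stores several replicas of each block (3 by default). A minimal plain-Java sketch of that bookkeeping, assuming the default block size and replication factor:

```java
public class HdfsBlockMath {
    static final long BLOCK_SIZE = 128L * 1024 * 1024; // HDFS default block size (Hadoop 2.x)
    static final int REPLICATION = 3;                  // default replication factor

    // Number of HDFS blocks needed to store a file of the given size.
    static long blockCount(long fileSizeBytes) {
        if (fileSizeBytes == 0) return 0;
        return (fileSizeBytes + BLOCK_SIZE - 1) / BLOCK_SIZE; // ceiling division
    }

    // Raw cluster storage consumed, counting every replica.
    static long rawStorage(long fileSizeBytes) {
        return fileSizeBytes * REPLICATION;
    }

    public static void main(String[] args) {
        long oneGb = 1024L * 1024 * 1024;
        // A 1 GB file occupies 8 blocks (8 x 128 MB) and 3 GB of raw storage.
        System.out.println(blockCount(oneGb));          // 8
        System.out.println(rawStorage(oneGb) / oneGb);  // 3
    }
}
```

This is also why the "namenode memory concerns" topic matters: the namenode keeps metadata for every block in memory, so many small files cost far more than a few large ones.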
HDFS workshop

Learning goal: This will be a workshop module where you will learn advanced concepts of HDFS. You will work hands-on in implementing what is taught in this module.

Content:
  • HDFS API
  • How to use the Configuration class
  • Using HDFS in MapReduce and programmatically
  • HDFS permissions and security
  • Additional HDFS tasks
  • HDFS web interface
  • Hands-on exercise
Cloud computing overview

Learning goal: Learn the fundamentals of cloud computing and how Hadoop can be installed on a server cluster in the cloud.

Content:
  • SaaS/PaaS/IaaS
  • Characteristics of cloud computing
  • Cluster configurations
  • Configuring masters and slaves
MapReduce basics

Learning goal: In this module, you will understand the MapReduce framework and how it works on HDFS, and you will learn the basics of MapReduce programming and data flow. (Basic Java knowledge will be required in the MapReduce modules.)

Content:
  • Functional programming concepts
  • List processing
  • Mapping and reducing lists
  • Putting them together in MapReduce
  • Word Count example application
  • Understanding the driver, mapper and reducer
  • A closer look at MapReduce data flow
  • Additional MapReduce functionality
  • Fault tolerance
  • Hands-on exercises
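The "mapping and reducing lists" idea behind the Word Count example can be shown without any Hadoop dependency. A minimal plain-Java sketch (illustrative only, not the Hadoop API): the map step turns each line into (word, 1) pairs, and the reduce step sums the counts per word, which is what the mapper and reducer classes do in the real framework.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class WordCountSketch {

    // Map step: each input line becomes a list of (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle + reduce step: group the pairs by key and sum the values per word.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> allPairs = new ArrayList<>();
        for (String line : lines) {
            allPairs.addAll(map(line)); // in Hadoop, map tasks run in parallel, one per input split
        }
        return reduce(allPairs);        // in Hadoop, reduce tasks run per key partition
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
            wordCount(List.of("big data big ideas", "big data"));
        System.out.println(counts); // {big=3, data=2, ideas=1}
    }
}
```

In the real framework the same roles are played by a `Mapper` class, a `Reducer` class, and a driver that wires them together; the difference is that Hadoop distributes the map and reduce calls across the cluster and handles the shuffle for you.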
Hands-on work on MapReduce

Learning goal: This will be a complete hands-on module where you will work on several exercises in class.
Understand combiners & partitioners

Learning goal: Learn advanced MapReduce algorithms to manage and manipulate data, including unstructured data.

Content:
  • Understand input and output formats
  • Distributed cache
  • Understanding counters
  • Chaining, listing and killing jobs
  • Hands-on exercise
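The two title topics of this module can be sketched in a few lines of plain Java (illustrative only, not the Hadoop API itself): a partitioner decides which reduce task receives a given key, and a combiner pre-aggregates map output locally so that less data crosses the network during the shuffle. The partitioning rule below mirrors the masked-hash-modulo rule used by Hadoop's default `HashPartitioner`.

```java
import java.util.HashMap;
import java.util.Map;

public class ShuffleSketch {

    // Same rule as Hadoop's default HashPartitioner: mask off the sign bit,
    // then take the hash modulo the number of reduce tasks. Every occurrence
    // of the same key therefore lands on the same reducer.
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // A combiner for word count is just the reducer logic applied to local
    // map output: sum counts per word before anything leaves the map node.
    static Map<String, Integer> combine(Map<String, Integer> local, String word, int count) {
        local.merge(word, count, Integer::sum);
        return local;
    }

    public static void main(String[] args) {
        int numReducers = 4;
        System.out.println(partition("hadoop", numReducers)
                == partition("hadoop", numReducers)); // always true

        // Local pre-aggregation: three ("big", 1) pairs shrink to one ("big", 3).
        Map<String, Integer> local = new HashMap<>();
        combine(local, "big", 1);
        combine(local, "big", 1);
        combine(local, "big", 1);
        System.out.println(local); // {big=3}
    }
}
```

Note that a combiner is an optimization Hadoop may apply zero or more times, so it is only safe for operations like sums and counts where local pre-aggregation does not change the final result.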
Pig program structure and execution process

Learning goal: Hive is data warehouse software for managing and querying large-scale datasets; it uses a SQL-like language, HiveQL, to query the data. Pig is a platform for analysing large data sets through a high-level language. In this module you will focus on learning both tools to query and analyse large amounts of data stored in distributed storage systems.

Content:
  • Joins & filtering using Pig
  • Group & co-group
  • Schema merging and redefining functions
  • Pig functions
  • Understanding Hive
  • Using the Hive command line interface
  • Data types and file formats
  • Basic DDL operations
  • Schema design
  • Hands-on examples
HBase overview, architecture & installation

Learning goal: HBase is a column-oriented database management system that runs on top of HDFS. Sqoop is a tool designed to transfer data between Hadoop and relational databases. Learn the basics of HBase, ZooKeeper and Sqoop in this module.

Content:
  • HBase admin: test
  • HBase data access
  • Overview of ZooKeeper
  • Sqoop overview and installation
  • Importing and exporting data in Sqoop
  • Hands-on exercise
Overview of Oozie and Flume

Learning goal: Oozie is a workflow scheduler system to manage Hadoop jobs. Flume is a service for efficiently collecting, aggregating and moving large amounts of streaming data into HDFS. YARN is the second generation of MapReduce, and HDFS Federation is the second generation of HDFS. Learn the basics of Oozie, Flume and Hadoop 2.2 in this module.

Content:
  • Oozie features and challenges
  • How Flume works
  • Connecting Flume with HDFS
  • YARN
  • HDFS Federation
  • Authentication and high availability in Hadoop
Designing structures for POC

Learning goal: In this module you will work hands-on on a proof of concept (POC) for the analysis of a large amount of web log data. You will also discuss the project you will work on.

Content:
  • Developing MapReduce code
  • Pushing data into HDFS using Flume
  • Running the MapReduce code
  • Analysing the output
  • Project

©2015 ERBrains | All rights reserved.
