Courses: Software Training
Locality: Marathahalli, Outer Ring Road
Hadoop Administration: Cloudera and Hortonworks Distributions
Trainer Details: Sanyasi Naidu K
Overall IT experience: 14 years.
1) Hadoop Hortonworks Certified Administrator (HDPCA v2.3)
2) VMware Certified Administrator (VCP v5.0)
3) Red Hat Certified Administrator (RHEL 7.x/6.x/4.x)
Detailed Course content for each module
Introduction to Hadoop
- Enterprise Data Trends @ Scale
- What is Big Data?
- A Market for Big Data
- Characteristics of Big Data
- The 3 Vs, 5 Vs, and 7 Vs of Big Data
- Most Common New Types of Data
- Moving from Causation to Correlation
- What is Hadoop, and Why Hadoop?
- Traditional Systems vs. Hadoop
- What is Hadoop 2.0?
- Overview of a Hadoop Cluster and Core Components of Hadoop
- Different Distributions of Hadoop
- Hadoop Use Case
- Lab exercise :- Login to Your Cluster
Hadoop Architecture
- Characteristics of Hadoop (Fault Tolerance, Replication, Block Size, Robustness)
- What is a Node, Rack, Cluster, Datacenter, and Data Hub?
- MapReduce Architecture
- HDFS Architecture
- Understanding Block Storage
- Demonstration: Understanding Block Storage
- The NameNode
- The DataNodes
- HDFS Clients
Installing Hadoop Cluster using Cloudera Manager
- Minimum Hardware Requirements
- Minimum Software Requirements
- A Formidable Starter Cluster
- Lab exercise :- Setting up the Environment
- Lab exercise :- Installing Cloudera Manager/Ambari and CDH/HDP
- Lab exercise :- Adding Services to Cluster
Configuring Hadoop
- Hadoop configuration files (core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, bigtop_utils, masters and slaves files)
- Configuration Considerations
- Deployment Layout
- Configuring Hadoop Ports
- Configuring HDFS
- What Does the File System Check Look For?
- Replication Factor
- Understanding Hadoop Logs
- What is Cloudera Manager / Ambari?
- Configuration via Cloudera Manager/Ambari
- Management, Monitoring, REST API, and Thrift Server Overview
- Lab exercise :- Commissioning and Decommissioning of Nodes
- Lab exercise :- Stopping and Starting CDH Services/HDP Services
- Lab exercise :- Using HDFS Commands: hadoop fsck syntax and the hadoop dfsadmin command
Ensuring Data Integrity
- Replication Placement
- Data Integrity: Writing Data
- Data Integrity: Reading Data
- Data Integrity: Block Scanning
- Running a File System Check
- What Does the File System Check Look For?
- hadoop fsck Syntax
- Data Integrity File System Check: Commands & Output
- hadoop dfsadmin Command
- NameNode Information
- Changing the Replication Factor
- Lab exercise :- Verify Data with Block Scanner and fsck
MapReduce and YARN
- MapReduce
- Understanding MapReduce
- What is YARN?
- YARN Architecture (RM, NM, AM, Container)
- Hadoop as Next-Gen Platform
- Beyond MapReduce
- YARN Use Case
- Lifecycle of a YARN Application
- Configuring YARN
- Configuring MapReduce Tools
- YARN Application Logs
- YARN CLI
- Lab exercise :- Troubleshooting a MapReduce Job
Job Schedulers
- Overview of Job Scheduling
- The Built-in Schedulers
- Overview of the Capacity Scheduler
- Configuring the Capacity Scheduler
- Defining Queues
- Configuring Capacity Limits
- Configuring User Limits
- Configuring Permissions
- Overview of the Fair Scheduler
- Multi-Tenancy Limits
- Lab exercise :- Configuring the Capacity Scheduler
Enterprise Data Movement
- Backup and Recovery
- What should you back up?
- HDFS Snapshots
- HDFS Data Backups
- HDFS Data Automate & Restore
- Overview of BDR (Backup and Disaster Recovery)
- Lab exercise :- Using HDFS Snapshots
Managing Resources
- Configuring cgroups with Static Service Pools
- The Fair Scheduler
- Configuring Dynamic Resource Pools
- YARN Memory and CPU Settings
HDFS Web Services
- What is WebHDFS?
- Setting up WebHDFS
- Using WebHDFS
- WebHDFS Authentication
- Copying Files to HDFS
- Hadoop HDFS over HTTP
- Who Uses WebHCat REST API?
- Running WebHCat
- Using WebHCat
- Lab exercise :- Using WebHDFS
Hive Administration
- Introduction to and Architecture of Hive
- Comparing Hive with RDBMS
- Hive Components: Hive MetaStore, HiveServer2, HCatalog
- Hive Clients: Beeline
- Hive Security
- Hive & Impala
- Impala Query Scheduling
- Lab exercise :- Understanding Hive Tables
Sqoop
- Overview of Sqoop
- The Sqoop Import Tool
- Importing a Table
- Importing Specific Columns
- Importing from a Query
- The Sqoop Export Tool
- Exporting to a Table
- Lab exercise :- Using Sqoop
Flume
- Flume Introduction
- Installing Flume
- Flume Events
- Flume Sources
- Flume Channels
- Flume Channel Selectors
- Flume Sinks
- Multiple Sinks
- Flume Interceptors
- Design Patterns
- Configuring Individual Components
- Flume Netcat Source Example
- Flume Exec Source Example
- Flume Configuration
- Monitoring Flume
- Lab exercise :- Install and Test Flume
Oozie
- Oozie Overview
- Oozie Components
- Jobs, Workflows, Coordinators, Bundles
- Workflow Actions and Decisions
- Oozie Job Submission
- Oozie Server Workflow Coordinator
- Oozie Console
- Interfaces to Oozie
- Oozie Server Configuration
- Oozie Scripts
- The Oozie CLI
- Using the Oozie CLI
- Submit Jobs through HTTP
- Oozie Actions
- Oozie Metrics
- Lab exercise :- Running an Oozie Workflow
HBase
- Overview
- Why HBase?
- CAP Theorem
- Architecture
- HBase Components and Daemons
- HBase Administration and Cluster Management
- Using Hive and Impala with HBase
Cluster Monitoring and Troubleshooting
- Cloudera Manager Monitoring Features
- Configuring Events and Alerts
- Monitoring Hadoop Clusters
- Troubleshooting Hadoop Services
- Common Misconfigurations
- Monitoring Cluster Services using Charts
- Using Trigger Option
- Monitoring JVM Processes
- Understanding JVM Memory
- Eclipse Memory Analyzer
- JVM Memory Heap Dump
- Java Management Extensions (JMX)
- Garbage Collection Tuning
Commissioning and Decommissioning of Cluster Nodes
- Decommissioning and Commissioning Nodes
- Decommissioning Nodes
- Steps for Decommissioning a Node
- Decommissioning Node States
- Steps for Commissioning a Node
- Balancer
- Balancer Threshold Setting
- Configuring Balancer Bandwidth
- Lab exercise :- Commissioning & Decommissioning Nodes
Backup and Recovery
- What should you back up?
- HDFS Snapshots
- HDFS Data Backups
- HDFS Data Automate & Restore
- Hive & Backup
- BDR (Backup and Disaster Recovery)
- Lab exercise :- Using HDFS Snapshots
Rack Awareness
- Rack Awareness
- YARN Rack Awareness
- Replica Placement
- Rack Topology
- Rack Topology Script
- Configuring the Rack Topology Script
- Lab exercise :- Configuring Rack Awareness
NameNode High Availability
- NameNode Architecture
- Cloudera NameNode High Availability
- HDFS HA Components
- Understanding NameNode HA
- NameNodes in HA
- Failover Modes
- NameNode Architectures
- hdfs haadmin Command
- Protecting Metadata Repositories
- Lab exercise :- Configure NameNode High Availability using Cloudera Manager
Security in Hadoop
- Security Concepts
- Why is Hadoop Security Required?
- Kerberos Synopsis
- How Kerberos Works
- Enabling Kerberos via Cloudera Manager
- Lab exercise :- Installing and Configuring Kerberos
Miscellaneous
Overview of the following:
- Kafka
- Solr