Differences between two revisions of project:rpihadoop: 2017/10/31 08:42 by licho [Reference] and 2017/12/21 08:58 (current) by licho [Zpracovani Dat (Data Processing)].
  - Data Analysis
{{project:hadoop-data-analysis-arch.png}}

{{project:kafka-spark.jpg?650}}
==== HDFS ====
{{project:hadoop-hdfs-arch.png?550}}
mkdir -p /opt/hadoop_tmp/hdfs/datanode
chown -R hduser:hadoop /opt/hadoop_tmp
chmod -R 750 /opt/hadoop_tmp</code>
  - **Formatting and starting ''hdfs'' from the master node:**<code>/opt/hadoop-2.7.4/bin/hdfs namenode -format
/opt/hadoop-2.7.4/sbin/start-dfs.sh
curl http://hadoop-rpi1.labka.cz:50070/
  * **Disk Space** – Sufficient disk space for the configurations used by channels or sinks.
  * **Directory Permissions** – Read/Write permissions for the directories used by the agent.
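Both prerequisites can be checked on the agent host before installing. The following is a minimal POSIX sh sketch; the function names ''check_dir_rw'' and ''check_free_kb'', the example directory and the size threshold are hypothetical and are not provided by Flume itself.

```shell
#!/bin/sh
# Sketch: pre-flight checks for a Flume agent host.
# The helper names, directory and threshold are illustrative assumptions.

# check_dir_rw DIR: succeed if DIR exists and the current user can read and write it
check_dir_rw() {
  if [ -d "$1" ] && [ -r "$1" ] && [ -w "$1" ]; then
    echo "OK   rw: $1"
  else
    echo "FAIL rw: $1" >&2
    return 1
  fi
}

# check_free_kb DIR MIN_KB: succeed if the filesystem holding DIR has at least MIN_KB free
check_free_kb() {
  # df -P prints a header plus one data line; column 4 is the available 1K blocks
  _avail=$(df -Pk "$1" 2>/dev/null | awk 'NR == 2 { print $4 }')
  if [ "${_avail:-0}" -ge "$2" ]; then
    echo "OK   free: $1 (${_avail} KB)"
  else
    echo "FAIL free: $1 (${_avail:-?} KB < $2 KB)" >&2
    return 1
  fi
}
```

For example, ''check_dir_rw /var/lib/flume && check_free_kb /var/lib/flume 1048576'' checks a hypothetical channel directory for read/write access and at least 1 GB of free space.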
=== Flume Installation ===
  - **Download** the latest stable release of the Apache Flume binary distribution from the Apache download mirrors at [[http://flume.apache.org/download.html|Flume Download]]. At the time of writing, apache-flume-1.5.0 was the latest version, and its tarball (''apache-flume-1.5.0.1-bin.tar.gz'') is used for the installation below.
  - **Copy** ''apache-flume-1.5.0.1-bin.tar.gz'' from the downloads folder to the preferred Flume installation directory, usually ''/usr/lib/flume'', and **unpack** the tarball:<code>$ sudo mkdir /usr/lib/flume
$ sudo chmod -R 777 /usr/lib/flume
$ cp apache-flume-1.5.0.1-bin.tar.gz /usr/lib/flume/
$ cd /usr/lib/flume
$ tar -xzf apache-flume-1.5.0.1-bin.tar.gz</code>
  - **Set ''FLUME_HOME'' and ''FLUME_CONF_DIR''** environment variables in ''~/.bashrc'' and add the Flume ''bin'' directory to the ''PATH'' environment variable:<code>$ vi ~/.bashrc</code>
  - **Edit:** In the ''FLUME_CONF_DIR'' directory, rename ''flume-env.sh.template'' to ''flume-env.sh'' and set ''JAVA_HOME'' to the Java installation directory.
  - If **memory channels** will be used in the Flume agents, it is advisable to raise the memory limits in the ''JAVA_OPTS'' variable. By default, the minimum and maximum memory values are 100 MB and 200 MB respectively (''-Xms100m -Xmx200m''); increase them to **500 MB** and **1000 MB** respectively:<code>JAVA_HOME="/path/to/java"
JAVA_OPTS="-Xms500m -Xmx1000m -Dcom.sun.management.jmxremote"</code>
  - **Done:** With these settings, the Flume installation is complete.
  - **Verification:** Verify the installation by running<code>$ flume-ng help</code>in a terminal. If it prints the ''flume-ng'' usage text, the installation was successful.
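The ''~/.bashrc'' entries mentioned in the list can look like the sketch below. It assumes the tarball was unpacked in ''/usr/lib/flume'' into a directory named ''apache-flume-1.5.0.1-bin''; adjust ''FLUME_HOME'' if the unpacked directory name differs.

```shell
# Lines to append to ~/.bashrc (sketch; the install path is an assumption
# based on unpacking apache-flume-1.5.0.1-bin.tar.gz in /usr/lib/flume)
export FLUME_HOME=/usr/lib/flume/apache-flume-1.5.0.1-bin

# Flume's configuration files (flume-env.sh, agent configs) live in conf/
export FLUME_CONF_DIR="$FLUME_HOME/conf"

# Put the flume-ng launcher on the PATH
export PATH="$PATH:$FLUME_HOME/bin"
```

After ''source ~/.bashrc'' (or opening a new terminal), ''flume-ng'' should be found on the ''PATH''.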
==== Step 7: Oozie ====
Source: http://www.rohitmenon.com/index.php/apache-oozie-installation/
=== Prerequisites ===
  * **Hadoop 2** is installed on the machine.
=== Oozie Installation ===
Hadoop location: ''/opt/hadoop-2.7.4''

  - Work from your home directory (here ''/home/hduser''):<code>$ pwd
/home/hduser</code>
  - **Download Oozie:**<code>$ wget http://supergsego.com/apache/oozie/3.3.2/oozie-3.3.2.tar.gz</code>
  - **Untar:**<code>$ tar xvzf oozie-3.3.2.tar.gz</code>
  - **Build Oozie:**<code>$ cd oozie-3.3.2/bin
$ ./mkdistro.sh -DskipTests</code>
=== Oozie Server Setup ===
  - Copy the built binaries to the home directory as ''oozie'':<code>$ cd ../../
$ cp -R oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/ oozie</code>
  - Create the required ''libext'' directory:<code>$ cd oozie
$ mkdir libext</code>
  - Copy all the required jars from ''hadooplibs'' to the ''libext'' directory:<code>$ cp ../oozie-3.3.2/hadooplibs/target/oozie-3.3.2-hadooplibs.tar.gz .
$ tar xzvf oozie-3.3.2-hadooplibs.tar.gz
$ cp oozie-3.3.2/hadooplibs/hadooplib-1.1.1.oozie-3.3.2/* libext/</code>
  - Get ExtJS 2.2. This library is not bundled with Oozie and must be downloaded separately; the Oozie Web Console uses it:<code>$ cd libext
$ wget http://extjs.com/deploy/ext-2.2.zip
$ cd ..</code>
  - Update ''/opt/hadoop-2.7.4/etc/hadoop/core-site.xml'' by adding the following properties inside the ''<configuration>'' element:<code><property>
  <name>hadoop.proxyuser.hduser.hosts</name>
  <value>localhost</value>
</property>
<property>
  <name>hadoop.proxyuser.hduser.groups</name>
  <value>hadoop</value>
</property></code>
  - Here, ''hduser'' is the user name and it belongs to the ''hadoop'' group.
  - Prepare the WAR file:<code>$ ./bin/oozie-setup.sh prepare-war

setting CATALINA_OPTS="$CATALINA_OPTS -Xmx1024m"

INFO: Adding extension: /home/hduser/oozie/libext/commons-beanutils-1.7.0.jar
INFO: Adding extension: /home/hduser/oozie/libext/commons-beanutils-core-1.8.0.jar
INFO: Adding extension: /home/hduser/oozie/libext/commons-codec-1.4.jar
INFO: Adding extension: /home/hduser/oozie/libext/commons-collections-3.2.1.jar
INFO: Adding extension: /home/hduser/oozie/libext/commons-configuration-1.6.jar
INFO: Adding extension: /home/hduser/oozie/libext/commons-digester-1.8.jar
INFO: Adding extension: /home/hduser/oozie/libext/commons-el-1.0.jar
INFO: Adding extension: /home/hduser/oozie/libext/commons-io-2.1.jar
INFO: Adding extension: /home/hduser/oozie/libext/commons-lang-2.4.jar
INFO: Adding extension: /home/hduser/oozie/libext/commons-logging-1.1.jar
INFO: Adding extension: /home/hduser/oozie/libext/commons-math-2.1.jar
INFO: Adding extension: /home/hduser/oozie/libext/commons-net-1.4.1.jar
INFO: Adding extension: /home/hduser/oozie/libext/hadoop-client-1.1.1.jar
INFO: Adding extension: /home/hduser/oozie/libext/hadoop-core-1.1.1.jar
INFO: Adding extension: /home/hduser/oozie/libext/hsqldb-1.8.0.7.jar
INFO: Adding extension: /home/hduser/oozie/libext/jackson-core-asl-1.8.8.jar
INFO: Adding extension: /home/hduser/oozie/libext/jackson-mapper-asl-1.8.8.jar
INFO: Adding extension: /home/hduser/oozie/libext/log4j-1.2.16.jar
INFO: Adding extension: /home/hduser/oozie/libext/oro-2.0.8.jar
INFO: Adding extension: /home/hduser/oozie/libext/xmlenc-0.52.jar

New Oozie WAR file with added 'ExtJS library, JARs' at /home/hduser/oozie/oozie-server/webapps/oozie.war

INFO: Oozie is ready to be started</code>
  - Create the sharelib on HDFS:<code>$ ./bin/oozie-setup.sh sharelib create -fs hdfs://localhost:54310
setting CATALINA_OPTS="$CATALINA_OPTS -Xmx1024m"
the destination path for sharelib is: /user/hduser/share/lib</code>
  - Create the Oozie DB:<code>$ ./bin/ooziedb.sh create -sqlfile oozie.sql -run
setting CATALINA_OPTS="$CATALINA_OPTS -Xmx1024m"

Validate DB Connection
DONE
Check DB schema does not exist
DONE
Check OOZIE_SYS table does not exist
DONE
Create SQL schema
DONE
Create OOZIE_SYS table
DONE

Oozie DB has been created for Oozie version '3.3.2'

The SQL commands have been written to: oozie.sql</code>
  - To start Oozie as a daemon, use the following command:<code>$ ./bin/oozied.sh start

Setting OOZIE_HOME: /home/hduser/oozie
Setting OOZIE_CONFIG: /home/hduser/oozie/conf
Sourcing: /home/hduser/oozie/conf/oozie-env.sh
setting CATALINA_OPTS="$CATALINA_OPTS -Xmx1024m"
Setting OOZIE_CONFIG_FILE: oozie-site.xml
Setting OOZIE_DATA: /home/hduser/oozie/data
Setting OOZIE_LOG: /home/hduser/oozie/logs
Setting OOZIE_LOG4J_FILE: oozie-log4j.properties
Setting OOZIE_LOG4J_RELOAD: 10
Setting OOZIE_HTTP_HOSTNAME: rohit-VirtualBox
Setting OOZIE_HTTP_PORT: 11000
Setting OOZIE_ADMIN_PORT: 11001
Setting OOZIE_HTTPS_PORT: 11443
Setting OOZIE_BASE_URL: http://rohit-VirtualBox:11000/oozie
Setting CATALINA_BASE: /home/hduser/oozie/oozie-server
Setting OOZIE_HTTPS_KEYSTORE_FILE: /home/hduser/.keystore
Setting OOZIE_HTTPS_KEYSTORE_PASS: password
Setting CATALINA_OUT: /home/hduser/oozie/logs/catalina.out
Setting CATALINA_PID: /home/hduser/oozie/oozie-server/temp/oozie.pid

Using CATALINA_OPTS: -Xmx1024m -Dderby.stream.error.file=/home/hduser/oozie/logs/derby.log
Adding to CATALINA_OPTS: -Doozie.home.dir=/home/hduser/oozie -Doozie.config.dir=/home/hduser/oozie/conf -Doozie.log.dir=/home/hduser/oozie/logs -Doozie.data.dir=/home/hduser/oozie/data -Doozie.config.file=oozie-site.xml -Doozie.log4j.file=oozie-log4j.properties -Doozie.log4j.reload=10 -Doozie.http.hostname=rohit-VirtualBox -Doozie.admin.port=11001 -Doozie.http.port=11000 -Doozie.https.port=11443 -Doozie.base.url=http://rohit-VirtualBox:11000/oozie -Doozie.https.keystore.file=/home/hduser/.keystore -Doozie.https.keystore.pass=password -Djava.library.path=

Using CATALINA_BASE: /home/hduser/oozie/oozie-server
Using CATALINA_HOME: /home/hduser/oozie/oozie-server
Using CATALINA_TMPDIR: /home/hduser/oozie/oozie-server/temp
Using JRE_HOME: /usr/lib/jvm/java-6-oracle
Using CLASSPATH: /home/hduser/oozie/oozie-server/bin/bootstrap.jar
Using CATALINA_PID: /home/hduser/oozie/oozie-server/temp/oozie.pid</code>
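If ''prepare-war'' does not report adding the ExtJS library and JARs as in the output above, a quick look at ''libext'' helps. A minimal sketch follows; ''check_libext'' is a hypothetical helper, and it only checks that ''ext-2.2.zip'' and at least one jar are present, not that the jars match the installed Hadoop version.

```shell
#!/bin/sh
# Sketch: sanity-check the Oozie libext directory before "oozie-setup.sh prepare-war".
# check_libext is a hypothetical helper; it checks presence only, not versions.

# check_libext [DIR]: succeed if DIR holds ext-2.2.zip and at least one .jar
check_libext() {
  dir=${1:-libext}
  if [ ! -f "$dir/ext-2.2.zip" ]; then
    echo "missing $dir/ext-2.2.zip (needed for the Oozie Web Console)" >&2
    return 1
  fi
  # If no jar exists the glob stays unexpanded and ls fails
  if ! ls "$dir"/*.jar >/dev/null 2>&1; then
    echo "no .jar files in $dir (copy the hadooplib jars first)" >&2
    return 1
  fi
  echo "libext looks ready: $(ls "$dir"/*.jar | wc -l) jars plus ExtJS"
}
```

Run from ''~/oozie'' as ''check_libext libext && ./bin/oozie-setup.sh prepare-war'' so the WAR is only rebuilt once both pieces are in place.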

  - To start Oozie as a foreground process instead, use:<code>$ ./bin/oozied.sh run</code>Check the Oozie log file ''logs/oozie.log'' to make sure Oozie started properly.
  - Check the status of Oozie from the command line:<code>$ ./bin/oozie admin -oozie http://localhost:11000/oozie -status
System mode: NORMAL</code>
  - The Oozie Web Console is at [[http://localhost:11000/oozie|Oozie Web Console]]{{http://www.rohitmenon.com/wp-content/uploads/2013/12/OozieWebConsole.png|Oozie Web Console}}
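Right after ''oozied.sh start'' the server may need a few seconds before the status command above reports ''NORMAL''. The sketch below polls it; ''wait_for_oozie'' is a hypothetical helper, and it assumes the ''oozie'' client is on the ''PATH'' and the server listens on the default URL used in this guide.

```shell
#!/bin/sh
# Sketch: poll the Oozie server until it reports "System mode: NORMAL".
# Assumes the oozie client is on PATH; wait_for_oozie is a hypothetical helper.

# wait_for_oozie [URL] [MAX_ATTEMPTS] [DELAY_SECONDS]
wait_for_oozie() {
  url=${1:-http://localhost:11000/oozie}
  attempts=${2:-30}
  delay=${3:-2}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # 'oozie admin -status' prints "System mode: NORMAL" once the server is up
    if oozie admin -oozie "$url" -status 2>/dev/null | grep -q 'System mode: NORMAL'; then
      echo "Oozie is up at $url"
      return 0
    fi
    # Sleep only if another attempt follows
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "Oozie did not reach NORMAL mode at $url" >&2
  return 1
}
```

For example, ''./bin/oozied.sh start && wait_for_oozie'' waits up to about a minute with the default settings.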
=== Oozie Client Setup ===
  - **Installation:**<code>$ cd ..
$ cp oozie/oozie-client-3.3.2.tar.gz .
$ tar xvzf oozie-client-3.3.2.tar.gz
$ mv oozie-client-3.3.2 oozie-client
$ cd oozie-client/bin</code>
  - Add ''/home/hduser/oozie-client/bin'' to ''PATH'' in ''.bashrc'' and restart your terminal.
  - Your Oozie server and client setup on a single-node cluster is now ready. In the next post, we will configure and schedule some Oozie workflows.
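The ''.bashrc'' change from the client setup steps above can be sketched as follows, assuming the client was unpacked and renamed to ''/home/hduser/oozie-client''. The ''OOZIE_URL'' line is an optional addition: the Oozie CLI uses that variable when no ''-oozie'' option is given.

```shell
# ~/.bashrc additions for the Oozie client (sketch; path assumes the
# client was unpacked and renamed to /home/hduser/oozie-client)
export PATH="$PATH:/home/hduser/oozie-client/bin"

# Optional: default server URL, so "-oozie http://..." can be omitted
export OOZIE_URL="http://localhost:11000/oozie"
```

With both lines in place, ''oozie admin -status'' is equivalent to the longer status command used for the server.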
  
==== Step 8: Zookeeper ====
project/rpihadoop.1509435744.txt.gz · Last modified: 2017/10/31 08:42 by licho