Differences
This shows you the differences between two versions of the page.
| project:rpihadoop [2017/10/31 08:42] licho [Reference] | project:rpihadoop [2017/12/21 08:58] (current) licho [Zpracovani Dat] | ||
|---|---|---|---|
| Line 11: | Line 11: | ||
| - Data Analysis | - Data Analysis | ||
| {{project:hadoop-data-analysis-arch.png}} | {{project:hadoop-data-analysis-arch.png}} | ||
| + | |||
| + | {{project:kafka-spark.jpg?650}} | ||
| ==== HDFS ==== | ==== HDFS ==== | ||
| {{project:hadoop-hdfs-arch.png?550}} | {{project:hadoop-hdfs-arch.png?550}} | ||
| Line 398: | Line 400: | ||
| mkdir -p /opt/hadoop_tmp/hdfs/datanode | mkdir -p /opt/hadoop_tmp/hdfs/datanode | ||
| chown -R hduser:hadoop /opt/hadoop_tmp | chown -R hduser:hadoop /opt/hadoop_tmp | ||
| - | chmod -R 750 /opt/hadoop_tmp | + | chmod -R 750 /opt/hadoop_tmp</code> |
| - | /opt/hadoop-2.7.4/bin/hdfs namenode -format</code> | + | - **Starting ''hdfs'' from the master node:**<code>/opt/hadoop-2.7.4/bin/hdfs namenode -format |
| - | - **Starting ''hdfs'':**<code> | + | |
| /opt/hadoop-2.7.4/sbin/start-dfs.sh | /opt/hadoop-2.7.4/sbin/start-dfs.sh | ||
| curl http://hadoop-rpi1.labka.cz:50070/ | curl http://hadoop-rpi1.labka.cz:50070/ | ||
| Line 684: | Line 685: | ||
| * **Disk Space** – Sufficient disk space for configurations used by channels or sinks. | * **Disk Space** – Sufficient disk space for configurations used by channels or sinks. | ||
| * **Directory Permissions** – Read/Write permissions for directories used by agent. | * **Directory Permissions** – Read/Write permissions for directories used by agent. | ||
| - | == Apache Flume Installation == | + | === Flume Installation === |
| - **Download** the latest stable release of the Apache Flume binary distribution from the Apache download mirrors at [[http://flume.apache.org/download.html|Flume Download]]. At the time of writing, apache-flume-1.5.0 is the latest version, and ''apache-flume-1.5.0.1-bin.tar.gz'' is used for this installation. | - **Download** the latest stable release of the Apache Flume binary distribution from the Apache download mirrors at [[http://flume.apache.org/download.html|Flume Download]]. At the time of writing, apache-flume-1.5.0 is the latest version, and ''apache-flume-1.5.0.1-bin.tar.gz'' is used for this installation. | ||
| - | - **Copy** the ''apache-flume-1.5.0.1-bin.tar.gz''from downloads folder to our preferred flume installation directory, usually into ''/usr/lib/flume''*and**unpack**the tarball. Below are the set of commands to perform these activities. Flume installation Shell <code> $ sudo mkdir /usr/lib/flume $ sudo chmod -R 777 /usr/lib/flume | + | - **Copy** ''apache-flume-1.5.0.1-bin.tar.gz'' from the downloads folder to the preferred Flume installation directory, usually ''/usr/lib/flume'', and **unpack** the tarball with the commands below. <code> $ sudo mkdir /usr/lib/flume $ sudo chmod -R 777 /usr/lib/flume |
| $ cp apache-flume-1.5.0.1-bin.tar.gz /usr/lib/flume/ | $ cp apache-flume-1.5.0.1-bin.tar.gz /usr/lib/flume/ | ||
| $ cd /usr/lib/flume | $ cd /usr/lib/flume | ||
| $ tar -xzf apache-flume-1.5.0.1-bin.tar.gz</code> | $ tar -xzf apache-flume-1.5.0.1-bin.tar.gz</code> | ||
| - **Set ''FLUME_HOME'', ''FLUME_CONF_DIR''** environment variables in the ''.bashrc'' file and add the Flume ''bin'' directory to the ''PATH'' environment variable:<code>$ vi ~/.bashrc</code> | - **Set ''FLUME_HOME'', ''FLUME_CONF_DIR''** environment variables in the ''.bashrc'' file and add the Flume ''bin'' directory to the ''PATH'' environment variable:<code>$ vi ~/.bashrc</code> | ||
| - | - In FLUME_CONF_DIR directory, rename flume-env.sh.template file to **flume-env.sh** and provide value for JAVA_HOME environment variable with Java installation directory. | + | - **Edit:** In the ''FLUME_CONF_DIR'' directory, rename ''flume-env.sh.template'' to ''flume-env.sh'' and set the ''JAVA_HOME'' environment variable to the Java installation directory. |
| - | - If we are going to use **memory channels** while setting flume agents, it is preferable to increase the memory limits in **JAVA_OPTS** variable. By default, the minimum and maximum memory values are 100 MB and 200 MB respectively (Xms100m -Xmx200m). Better to increase these limits to **500 MB** and **1000 MB** respectively. Shell: <code>JAVA_HOME="path" | + | - If the Flume agents will use **memory channels**, increase the memory limits in the ''JAVA_OPTS'' variable. By default, the minimum and maximum heap sizes are 100 MB and 200 MB respectively (''-Xms100m -Xmx200m''); raise them to **500 MB** and **1000 MB**: <code>JAVA_HOME="path" |
| - | JAVAOPTS="-Xms200m -Xmx800m -Dcom.sun/management.jmxremote"</code> | + | JAVA_OPTS="-Xms500m -Xmx1000m -Dcom.sun.management.jmxremote"</code> |
| - | - With these settings, we can consider flume installation as completed. | + | - **Work done:** With these settings, we can consider flume installation as completed. |
| - | - We can verify the flume installation with**$ flume-ng –help** command on terminal. If we get output similar to below then flume installation is successful. | + | - **Verification:** Verify the Flume installation by running <code>$ flume-ng --help</code> in a terminal. If the command prints its usage text, the installation is successful. |
| - | + | ||
| ==== Step 7: Oozie ==== | ==== Step 7: Oozie ==== | ||
| - | http://www.rohitmenon.com/index.php/apache-oozie-installation/ | + | == Prerequisite: == |
| + | * **Hadoop 2** is installed on our machine. | ||
| + | === Oozie Installation === | ||
| + | Hadoop location: ''/opt/hadoop-2.7.4'' | ||
| + | |||
| + | - From your home directory execute the following commands (my home directory is /home/hduser):<code>$ pwd | ||
| + | /home/hduser</code> | ||
| + | - **Download Oozie:** <code>$ wget http://supergsego.com/apache/oozie/3.3.2/oozie-3.3.2.tar.gz</code> | ||
| + | - **Untar:** <code>$ tar xvzf oozie-3.3.2.tar.gz</code> | ||
| + | - **Build Oozie** <code>$ cd oozie-3.3.2/bin | ||
| + | $ ./mkdistro.sh -DskipTests</code> | ||
| + | === Oozie Server Setup === | ||
| + | - Copy the built binaries to the home directory as ‘oozie’<code>$ cd ../../ | ||
| + | $ cp -R oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/ oozie</code> | ||
| + | - Create the required libext directory<code>$ cd oozie | ||
| + | $ mkdir libext</code> | ||
| + | - Copy all the required jars from hadooplibs to the libext directory using the following command:<code>$ cp ../oozie-3.3.2/hadooplibs/target/oozie-3.3.2-hadooplibs.tar.gz . | ||
| + | $ tar xzvf oozie-3.3.2-hadooplibs.tar.gz | ||
| + | $ cp oozie-3.3.2/hadooplibs/hadooplib-1.1.1.oozie-3.3.2/* libext/</code> | ||
| + | - Get ExtJS 2.2. This library is not bundled with Oozie and must be downloaded separately; the Oozie Web Console uses it:<code>$ cd libext | ||
| + | $ wget http://extjs.com/deploy/ext-2.2.zip | ||
| + | $ cd ..</code> | ||
| + | - Update **../hadoop/conf/core-site.xml** as follows:<code><property> | ||
| + | <name>hadoop.proxyuser.hduser.hosts</name> | ||
| + | <value>localhost</value> | ||
| + | </property> | ||
| + | <property> | ||
| + | <name>hadoop.proxyuser.hduser.groups</name> | ||
| + | <value>hadoop</value> | ||
| + | </property></code> | ||
| + | - Here, ‘hduser’ is the username, and it belongs to the ‘hadoop’ group. | ||
| + | - Prepare the WAR file<code>$ ./bin/oozie-setup.sh prepare-war | ||
| + | |||
| + | setting CATALINA_OPTS="$CATALINA_OPTS -Xmx1024m" | ||
| + | |||
| + | INFO: Adding extension: /home/hduser/oozie/libext/commons-beanutils-1.7.0.jar | ||
| + | INFO: Adding extension: /home/hduser/oozie/libext/commons-beanutils-core-1.8.0.jar | ||
| + | INFO: Adding extension: /home/hduser/oozie/libext/commons-codec-1.4.jar | ||
| + | INFO: Adding extension: /home/hduser/oozie/libext/commons-collections-3.2.1.jar | ||
| + | INFO: Adding extension: /home/hduser/oozie/libext/commons-configuration-1.6.jar | ||
| + | INFO: Adding extension: /home/hduser/oozie/libext/commons-digester-1.8.jar | ||
| + | INFO: Adding extension: /home/hduser/oozie/libext/commons-el-1.0.jar | ||
| + | INFO: Adding extension: /home/hduser/oozie/libext/commons-io-2.1.jar | ||
| + | INFO: Adding extension: /home/hduser/oozie/libext/commons-lang-2.4.jar | ||
| + | INFO: Adding extension: /home/hduser/oozie/libext/commons-logging-1.1.jar | ||
| + | INFO: Adding extension: /home/hduser/oozie/libext/commons-math-2.1.jar | ||
| + | INFO: Adding extension: /home/hduser/oozie/libext/commons-net-1.4.1.jar | ||
| + | INFO: Adding extension: /home/hduser/oozie/libext/hadoop-client-1.1.1.jar | ||
| + | INFO: Adding extension: /home/hduser/oozie/libext/hadoop-core-1.1.1.jar | ||
| + | INFO: Adding extension: /home/hduser/oozie/libext/hsqldb-1.8.0.7.jar | ||
| + | INFO: Adding extension: /home/hduser/oozie/libext/jackson-core-asl-1.8.8.jar | ||
| + | INFO: Adding extension: /home/hduser/oozie/libext/jackson-mapper-asl-1.8.8.jar | ||
| + | INFO: Adding extension: /home/hduser/oozie/libext/log4j-1.2.16.jar | ||
| + | INFO: Adding extension: /home/hduser/oozie/libext/oro-2.0.8.jar | ||
| + | INFO: Adding extension: /home/hduser/oozie/libext/xmlenc-0.52.jar | ||
| + | |||
| + | New Oozie WAR file with added 'ExtJS library, JARs' at /home/hduser/oozie/oozie-server/webapps/oozie.war | ||
| + | |||
| + | INFO: Oozie is ready to be started</code> | ||
| + | - Create sharelib on HDFS<code>$ ./bin/oozie-setup.sh sharelib create -fs hdfs://localhost:54310 | ||
| + | setting CATALINA_OPTS="$CATALINA_OPTS -Xmx1024m" | ||
| + | the destination path for sharelib is: /user/hduser/share/lib</code> | ||
| + | - Create the Oozie DB<code>$ ./bin/ooziedb.sh create -sqlfile oozie.sql -run | ||
| + | setting CATALINA_OPTS="$CATALINA_OPTS -Xmx1024m" | ||
| + | |||
| + | Validate DB Connection | ||
| + | DONE | ||
| + | Check DB schema does not exist | ||
| + | DONE | ||
| + | Check OOZIE_SYS table does not exist | ||
| + | DONE | ||
| + | Create SQL schema | ||
| + | DONE | ||
| + | Create OOZIE_SYS table | ||
| + | DONE | ||
| + | |||
| + | Oozie DB has been created for Oozie version '3.3.2' | ||
| + | |||
| + | The SQL commands have been written to: oozie.sql</code> | ||
| + | - To start Oozie as a daemon use the following command:<code>$ ./bin/oozied.sh start | ||
| + | |||
| + | Setting OOZIE_HOME: /home/hduser/oozie | ||
| + | Setting OOZIE_CONFIG: /home/hduser/oozie/conf | ||
| + | Sourcing: /home/hduser/oozie/conf/oozie-env.sh | ||
| + | setting CATALINA_OPTS="$CATALINA_OPTS -Xmx1024m" | ||
| + | Setting OOZIE_CONFIG_FILE: oozie-site.xml | ||
| + | Setting OOZIE_DATA: /home/hduser/oozie/data | ||
| + | Setting OOZIE_LOG: /home/hduser/oozie/logs | ||
| + | Setting OOZIE_LOG4J_FILE: oozie-log4j.properties | ||
| + | Setting OOZIE_LOG4J_RELOAD: 10 | ||
| + | Setting OOZIE_HTTP_HOSTNAME: rohit-VirtualBox | ||
| + | Setting OOZIE_HTTP_PORT: 11000 | ||
| + | Setting OOZIE_ADMIN_PORT: 11001 | ||
| + | Setting OOZIE_HTTPS_PORT: 11443 | ||
| + | Setting OOZIE_BASE_URL: http://rohit-VirtualBox:11000/oozie | ||
| + | Setting CATALINA_BASE: /home/hduser/oozie/oozie-server | ||
| + | Setting OOZIE_HTTPS_KEYSTORE_FILE: /home/hduser/.keystore | ||
| + | Setting OOZIE_HTTPS_KEYSTORE_PASS: password | ||
| + | Setting CATALINA_OUT: /home/hduser/oozie/logs/catalina.out | ||
| + | Setting CATALINA_PID: /home/hduser/oozie/oozie-server/temp/oozie.pid | ||
| + | |||
| + | Using CATALINA_OPTS: -Xmx1024m -Dderby.stream.error.file=/home/hduser/oozie/logs/derby.log | ||
| + | Adding to CATALINA_OPTS: -Doozie.home.dir=/home/hduser/oozie -Doozie.config.dir=/home/hduser/oozie/conf -Doozie.log.dir=/home/hduser/oozie/logs -Doozie.data.dir=/home/hduser/oozie/data -Doozie.config.file=oozie-site.xml -Doozie.log4j.file=oozie-log4j.properties -Doozie.log4j.reload=10 -Doozie.http.hostname=rohit-VirtualBox -Doozie.admin.port=11001 -Doozie.http.port=11000 -Doozie.https.port=11443 -Doozie.base.url=http://rohit-VirtualBox:11000/oozie -Doozie.https.keystore.file=/home/hduser/.keystore -Doozie.https.keystore.pass=password -Djava.library.path= | ||
| + | |||
| + | Using CATALINA_BASE: /home/hduser/oozie/oozie-server | ||
| + | Using CATALINA_HOME: /home/hduser/oozie/oozie-server | ||
| + | Using CATALINA_TMPDIR: /home/hduser/oozie/oozie-server/temp | ||
| + | Using JRE_HOME: /usr/lib/jvm/java-6-oracle | ||
| + | Using CLASSPATH: /home/hduser/oozie/oozie-server/bin/bootstrap.jar | ||
| + | Using CATALINA_PID: /home/hduser/oozie/oozie-server/temp/oozie.pid</code> | ||
| + | |||
| + | - To start Oozie as a foreground process use the following command:<code>$ ./bin/oozied.sh run</code> Check the Oozie log file logs/oozie.log to ensure Oozie started properly. | ||
| + | - Use the following command to check the status of Oozie from command line:<code>$ ./bin/oozie admin -oozie http://localhost:11000/oozie -status | ||
| + | System mode: NORMAL</code> | ||
| + | - URL for the Oozie Web Console is [[http://localhost:11000/oozie|Oozie Web Console]]{{http://www.rohitmenon.com/wp-content/uploads/2013/12/OozieWebConsole.png|Oozie Web Console}} | ||
| + | === Oozie Client Setup === | ||
| + | - **Installation:** <code>$ cd .. | ||
| + | $ cp oozie/oozie-client-3.3.2.tar.gz . | ||
| + | $ tar xvzf oozie-client-3.3.2.tar.gz | ||
| + | $ mv oozie-client-3.3.2 oozie-client | ||
| + | $ cd oozie-client/bin</code> | ||
| + | - Add ''/home/hduser/oozie-client/bin'' to ''PATH'' in ''.bashrc'' and restart your terminal. | ||
| + | - The Oozie server and client setup on a single-node cluster is now ready. | ||
| ==== Step 8: Zookeeper ==== | ==== Step 8: Zookeeper ==== | ||
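
The Flume step in the diff above opens ''.bashrc'' for editing but does not list the lines to add. A minimal sketch of what they might look like follows. It assumes the tarball unpacks to ''/usr/lib/flume/apache-flume-1.5.0.1-bin'' and that the configuration lives in its ''conf'' subdirectory; neither path is stated on the page, so adjust both to the actual layout.

```shell
# Assumed unpack location (not stated in the page): the directory that
# tar creates under /usr/lib/flume for apache-flume-1.5.0.1-bin.tar.gz.
export FLUME_HOME="/usr/lib/flume/apache-flume-1.5.0.1-bin"

# Assumed configuration directory: the conf/ folder inside FLUME_HOME,
# where flume-env.sh.template is renamed to flume-env.sh.
export FLUME_CONF_DIR="$FLUME_HOME/conf"

# Put the flume-ng launcher on the command search path.
export PATH="$PATH:$FLUME_HOME/bin"
```

After saving the file, running `source ~/.bashrc` (or opening a new terminal) makes the variables available, and `flume-ng --help` should then resolve without a full path.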