This document is for openSUSE users who want to use Hadoop.
Environment setup note:
OS: openSUSE 11.2 (of course ^^)
HD: 80GB
Prepare two PCs for the single-host and cluster practice.
Set the IP addresses to match your own environment.
Server1:
10.10.x.y server.digitalairlines.com server
Server2:
10.10.v.w server2.digitalairlines.com server2
Partition
- swap 1GB
- / 73.5GB
User Admin
- User: root password: linux
- User: max password: linux
Software
- select Base Development Packages
- update openSUSE packages
- install the java-1_6_0-sun-devel package (I ran into problems with OpenJDK ^^||); you can install it from the update repository
Services (Daemons)
- Start sshd and enable it at boot
- #rcsshd start
- #chkconfig sshd on
For cloning to many machines
- If you want to clone your hard disk to deploy it, edit /etc/fstab and /boot/grub/menu.lst so that the hard disk device name is /dev/sda1 rather than /dev/disk/by-id
- Delete /etc/udev/rules.d/70-persistent-net.rules for the network interface card (if you don't delete it, the NIC on the cloned disk will be named eth1 instead of eth0)
Prepare the software
- Create Directory OSSF at /opt
- #mkdir /opt/OSSF
- Download hadoop-0.20.2.tar.gz to /opt/OSSF
- You can download the software here
- http://www.apache.org/dyn/closer.cgi/hadoop/core/
---------------------------------------- Practice ------------------------------------------
Hadoop on a single host
At Server1
Please log in as max with the password linux
Note that the shell prompt is shown as >
Step 1. Create an SSH key to connect over SSH without a password
Use the non-interactive method to create the DSA key pair for Server1
>ssh-keygen -N '' -d -q -f ~/.ssh/id_dsa
Copy the public key to authorized_keys
>cp ~/.ssh/id_dsa.pub ~/.ssh/authorized_keys
>ssh-add ~/.ssh/id_dsa
Identity added: /root/.ssh/id_dsa (/root/.ssh/id_dsa)
Test connecting over SSH without a password (using the key)
>ssh localhost
The authenticity of host 'localhost (::1)' can't be established.
RSA key fingerprint is 05:22:61:78:05:04:7e:d1:81:67:f2:d5:8a:42:bb:9f.
Are you sure you want to continue connecting (yes/no)? (please type yes)
Log out of the SSH session
>exit
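If the passwordless login does not work, the permissions of ~/.ssh are a common cause. The following is a minimal sketch, not part of the original steps: install_pubkey is a hypothetical helper name, and it appends the key instead of overwriting authorized_keys as the cp command above does.

```shell
# install_pubkey PUBKEY_FILE SSH_DIR
# Appends the public key to SSH_DIR/authorized_keys only if it is not
# already listed, then restricts the permissions of the directory and file.
install_pubkey() {
    pubkey_file=$1
    ssh_dir=$2
    mkdir -p "$ssh_dir"
    touch "$ssh_dir/authorized_keys"
    # -x: match the whole line, -F: fixed string, -q: no output
    if ! grep -qxF -- "$(cat "$pubkey_file")" "$ssh_dir/authorized_keys"; then
        cat "$pubkey_file" >> "$ssh_dir/authorized_keys"
    fi
    chmod 700 "$ssh_dir"
    chmod 600 "$ssh_dir/authorized_keys"
}
```

For the setup in this document it would be called as: install_pubkey ~/.ssh/id_dsa.pub ~/.ssh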
Step 2. Install Hadoop
Extract the Hadoop package (we prepared it in /opt/OSSF); please use sudo to do it
(because a regular user has no write permission on the /opt folder)
>sudo tar zxvf /opt/OSSF/hadoop-0.20.2.tar.gz -C /opt
It will ask for the root password; please enter linux
Change the owner of /opt/hadoop-0.20.2 to max and the group to users
> sudo chown -R max:users /opt/hadoop-0.20.2/
Create the /var/hadoop folder
> sudo mkdir /var/hadoop
Change the owner of /var/hadoop to max and the group to users
> sudo chown -R max:users /var/hadoop/
Step 3. Set up the Hadoop configuration
3-1. Set up the environment in hadoop-env.sh
>vi /opt/hadoop-0.20.2/conf/hadoop-env.sh
#Please add these settings (adjust them to your environment)
export JAVA_HOME=/usr/lib/jvm/java-1.6.0-sun
export HADOOP_HOME=/opt/hadoop-0.20.2
export HADOOP_CONF_DIR=/opt/hadoop-0.20.2/conf
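3-2. Add the following to core-site.xml, between <configuration> and </configuration>
You can copy and paste it ^^
Note: the backslash in front of ${user.name} is kept literally, so the directory ends up named hadoop-\max, as the Step 4 output shows.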
>vi /opt/hadoop-0.20.2/conf/core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/var/hadoop/hadoop-\${user.name}</value>
</property>
</configuration>
3-3. Add the following to hdfs-site.xml (sets the replication factor), between <configuration> and </configuration>
You can copy and paste it ^^
>vi /opt/hadoop-0.20.2/conf/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
3-4. Add the following to mapred-site.xml (for the JobTracker), between <configuration> and </configuration>
You can copy and paste it ^^
>vi /opt/hadoop-0.20.2/conf/mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
</configuration>
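To double-check what was saved in the three files, a value can be read back from the command line. This is only a sketch: get_prop is a hypothetical helper, and it assumes the one-tag-per-line layout used in this document (it is not a real XML parser).

```shell
# get_prop FILE PROPERTY_NAME
# Prints the <value> found on the first <value> line after
# the line <name>PROPERTY_NAME</name>.
get_prop() {
    awk -v name="$2" '
        index($0, "<name>" name "</name>") { found = 1; next }
        found && /<value>/ {
            sub(/.*<value>/, "")
            sub(/<\/value>.*/, "")
            print
            exit
        }' "$1"
}
```

For example: get_prop /opt/hadoop-0.20.2/conf/core-site.xml fs.default.name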
Step 4. Format HDFS
>/opt/hadoop-0.20.2/bin/hadoop namenode -format
10/07/20 00:51:13 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = server/127.0.0.2
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 0.20.2
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
10/07/20 00:51:13 INFO namenode.FSNamesystem: fsOwner=max,users,video
10/07/20 00:51:13 INFO namenode.FSNamesystem: supergroup=supergroup
10/07/20 00:51:13 INFO namenode.FSNamesystem: isPermissionEnabled=true
10/07/20 00:51:14 INFO common.Storage: Image file of size 93 saved in 0 seconds.
10/07/20 00:51:14 INFO common.Storage: Storage directory /var/hadoop/hadoop-\max/dfs/name has been successfully formatted.
10/07/20 00:51:14 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at server/127.0.0.2
************************************************************/
Step 5. Start hadoop
>/opt/hadoop-0.20.2/bin/start-all.sh
starting namenode, logging to /opt/hadoop-0.20.2/logs/hadoop-max-namenode-server.out
localhost: starting datanode, logging to /opt/hadoop-0.20.2/logs/hadoop-max-datanode-server.out
localhost: starting secondarynamenode, logging to /opt/hadoop-0.20.2/logs/hadoop-max-secondarynamenode-server.out
starting jobtracker, logging to /opt/hadoop-0.20.2/logs/hadoop-max-jobtracker-server.out
localhost: starting tasktracker, logging to /opt/hadoop-0.20.2/logs/hadoop-max-tasktracker-server.out
Step 6. Check the Hadoop status
Open the following URLs in a web browser on Server1
Hadoop Admin (JobTracker)
http://localhost:50030
Hadoop Task Tracker
http://localhost:50060
Hadoop DFS (NameNode)
http://localhost:50070
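Besides the web pages, the five daemons started in Step 5 can be checked from the shell. The sketch below assumes that the JDK's jps tool is on the PATH and that the daemons appear under the class names NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker; missing_daemons is a hypothetical helper name.

```shell
# missing_daemons
# Reads "jps" output on stdin and prints every expected Hadoop daemon
# that is NOT listed. No output means all five were found.
missing_daemons() {
    jps_output=$(cat)
    for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        # -w matches whole words, so "NameNode" is not satisfied
        # by a line that only contains "SecondaryNameNode"
        echo "$jps_output" | grep -qw "$d" || echo "$d"
    done
}
```

Usage: jps | missing_daemons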
Lab 2 HDFS command practice
1. Show the hadoop fs command help
>/opt/hadoop-0.20.2/bin/hadoop fs
Use the hadoop command to list HDFS
(We have not uploaded any file to HDFS yet, so it will print an error message)
>/opt/hadoop-0.20.2/bin/hadoop fs -ls
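The labs type the full /opt/hadoop-0.20.2/bin/hadoop path every time. As an optional convenience that is not part of the original steps, the bin folder can be added to the PATH of the current shell; the sketch below adds it only once.

```shell
# Add the Hadoop bin folder to PATH unless it is already there.
HADOOP_HOME=/opt/hadoop-0.20.2
case ":$PATH:" in
    *":$HADOOP_HOME/bin:"*) ;;                 # already present, do nothing
    *) PATH="$PATH:$HADOOP_HOME/bin" ;;
esac
export HADOOP_HOME PATH
```

After this, "hadoop fs -ls" is equivalent to the long form used in the labs.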
2. Upload the /opt/hadoop-0.20.2/conf folder to HDFS and name it input
The syntax is
#hadoop fs -put Local-Dir HDFS-Folder-Name
>/opt/hadoop-0.20.2/bin/hadoop fs -put /opt/hadoop-0.20.2/conf input
3. Check HDFS again
3-1 List your HDFS home directory
> /opt/hadoop-0.20.2/bin/hadoop fs -ls
Found 1 items
drwxr-xr-x - max supergroup 0 2010-07-18 21:16 /user/max/input
If you don't specify a path, the default path is /user/<username>
You can use an absolute path too, for example
> /opt/hadoop-0.20.2/bin/hadoop fs -ls /user/max/
Tip: you can look at the /var/hadoop folder before and after you upload to HDFS
(you will see the local data folder change)
>ls -lh /var/hadoop/hadoop-\\max/dfs/data/current/
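The default-path rule from 3-1 (a relative HDFS path is looked up under /user/<username>) can be written down as a tiny function. This is only an illustration of the rule; hdfs_abs is a hypothetical name and the function does not talk to Hadoop at all.

```shell
# hdfs_abs PATH [USER]
# Prints the absolute HDFS path that a relative PATH refers to.
# USER defaults to the current login name.
hdfs_abs() {
    case "$1" in
        /*) printf '%s\n' "$1" ;;                                # already absolute
        *)  printf '/user/%s/%s\n' "${2:-$(id -un)}" "$1" ;;     # relative to the home folder
    esac
}
```

So "hadoop fs -ls input" run by max and "hadoop fs -ls /user/max/input" point at the same folder.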
3-2 List input folder on HDFS
>/opt/hadoop-0.20.2/bin/hadoop fs -ls input
Found 13 items
-rw-r--r-- 1 max supergroup 3936 2010-07-21 16:00 /user/max/input/capacity-scheduler.xml
-rw-r--r-- 1 max supergroup 535 2010-07-21 16:00 /user/max/input/configuration.xsl
-rw-r--r-- 1 max supergroup 379 2010-07-21 16:00 /user/max/input/core-site.xml
-rw-r--r-- 1 max supergroup 2367 2010-07-21 16:00 /user/max/input/hadoop-env.sh
-rw-r--r-- 1 max supergroup 1245 2010-07-21 16:00 /user/max/input/hadoop-metrics.properties
-rw-r--r-- 1 max supergroup 4190 2010-07-21 16:00 /user/max/input/hadoop-policy.xml
-rw-r--r-- 1 max supergroup 254 2010-07-21 16:00 /user/max/input/hdfs-site.xml
-rw-r--r-- 1 max supergroup 2815 2010-07-21 16:00 /user/max/input/log4j.properties
-rw-r--r-- 1 max supergroup 270 2010-07-21 16:00 /user/max/input/mapred-site.xml
-rw-r--r-- 1 max supergroup 10 2010-07-21 16:00 /user/max/input/masters
-rw-r--r-- 1 max supergroup 10 2010-07-21 16:00 /user/max/input/slaves
-rw-r--r-- 1 max supergroup 1243 2010-07-21 16:00 /user/max/input/ssl-client.xml.example
-rw-r--r-- 1 max supergroup 1195 2010-07-21 16:00 /user/max/input/ssl-server.xml.example
4. Download files from HDFS to the local disk
Check your local folder first
>ls
Use the command "hadoop fs -get" to download the folder
>/opt/hadoop-0.20.2/bin/hadoop fs -get input fromHDFS
Check your local folder again
>ls
5. Use -cat to check the file on HDFS
>/opt/hadoop-0.20.2/bin/hadoop fs -cat input/slaves
localhost
6. Delete files on HDFS with -rm (for a directory, use -rmr)
Check the files in the input folder first; you will see that /user/max/input/slaves exists
> /opt/hadoop-0.20.2/bin/hadoop fs -ls /user/max/input
Found 13 items
-rw-r--r-- 1 max supergroup 3936 2010-07-21 16:00 /user/max/input/capacity-scheduler.xml
-rw-r--r-- 1 max supergroup 535 2010-07-21 16:00 /user/max/input/configuration.xsl
-rw-r--r-- 1 max supergroup 379 2010-07-21 16:00 /user/max/input/core-site.xml
-rw-r--r-- 1 max supergroup 2367 2010-07-21 16:00 /user/max/input/hadoop-env.sh
-rw-r--r-- 1 max supergroup 1245 2010-07-21 16:00 /user/max/input/hadoop-metrics.properties
-rw-r--r-- 1 max supergroup 4190 2010-07-21 16:00 /user/max/input/hadoop-policy.xml
-rw-r--r-- 1 max supergroup 254 2010-07-21 16:00 /user/max/input/hdfs-site.xml
-rw-r--r-- 1 max supergroup 2815 2010-07-21 16:00 /user/max/input/log4j.properties
-rw-r--r-- 1 max supergroup 270 2010-07-21 16:00 /user/max/input/mapred-site.xml
-rw-r--r-- 1 max supergroup 10 2010-07-21 16:00 /user/max/input/masters
-rw-r--r-- 1 max supergroup 10 2010-07-21 16:00 /user/max/input/slaves
-rw-r--r-- 1 max supergroup 1243 2010-07-21 16:00 /user/max/input/ssl-client.xml.example
-rw-r--r-- 1 max supergroup 1195 2010-07-21 16:00 /user/max/input/ssl-server.xml.example
Use hadoop fs -rm to delete the file named slaves
>/opt/hadoop-0.20.2/bin/hadoop fs -rm input/slaves
Deleted hdfs://localhost:9000/user/max/input/slaves
Check the files in the input folder again; you will see that /user/max/input/slaves no longer exists
> /opt/hadoop-0.20.2/bin/hadoop fs -ls /user/max/input
Found 12 items
-rw-r--r-- 1 max supergroup 3936 2010-07-22 15:08 /user/max/input/capacity-scheduler.xml
-rw-r--r-- 1 max supergroup 535 2010-07-22 15:08 /user/max/input/configuration.xsl
-rw-r--r-- 1 max supergroup 379 2010-07-22 15:08 /user/max/input/core-site.xml
-rw-r--r-- 1 max supergroup 2367 2010-07-22 15:08 /user/max/input/hadoop-env.sh
-rw-r--r-- 1 max supergroup 1245 2010-07-22 15:08 /user/max/input/hadoop-metrics.properties
-rw-r--r-- 1 max supergroup 4190 2010-07-22 15:08 /user/max/input/hadoop-policy.xml
-rw-r--r-- 1 max supergroup 254 2010-07-22 15:08 /user/max/input/hdfs-site.xml
-rw-r--r-- 1 max supergroup 2815 2010-07-22 15:08 /user/max/input/log4j.properties
-rw-r--r-- 1 max supergroup 270 2010-07-22 15:08 /user/max/input/mapred-site.xml
-rw-r--r-- 1 max supergroup 10 2010-07-22 15:08 /user/max/input/masters
-rw-r--r-- 1 max supergroup 1243 2010-07-22 15:08 /user/max/input/ssl-client.xml.example
-rw-r--r-- 1 max supergroup 1195 2010-07-22 15:08 /user/max/input/ssl-server.xml.example
Use hadoop fs -rmr to delete the folder
>/opt/hadoop-0.20.2/bin/hadoop fs -rmr input
Deleted hdfs://localhost:9000/user/max/input
Lab 3 Hadoop example practice
1. grep example
1-1 Upload the /opt/hadoop-0.20.2/conf folder to HDFS and name it source
The syntax is
#hadoop fs -put Local-Folder HDFS-Folder-Name
>/opt/hadoop-0.20.2/bin/hadoop fs -put /opt/hadoop-0.20.2/conf source
1-2 Check that the source folder was uploaded successfully
> /opt/hadoop-0.20.2/bin/hadoop fs -ls /user/max/
Found 1 items
drwxr-xr-x - max supergroup 0 2010-07-23 15:13 /user/max/source
1-3 Use the grep example to find the strings starting with dfs in the files of the source folder, and save the result to output-1
>/opt/hadoop-0.20.2/bin/hadoop jar /opt/hadoop-0.20.2/hadoop-0.20.2-examples.jar grep source output-1 'dfs[a-z.]+'
1-4 Check Result
>/opt/hadoop-0.20.2/bin/hadoop fs -ls output-1
Found 2 items
drwxr-xr-x - max supergroup 0 2010-07-20 00:33 /user/max/output/_logs
-rw-r--r-- 1 max supergroup 96 2010-07-20 00:33 /user/max/output/part-00000
>/opt/hadoop-0.20.2/bin/hadoop fs -cat output-1/part-00000
3 dfs.class
2 dfs.period
1 dfs.file
1 dfs.replication
1 dfs.servers
1 dfsadmin
1 dfsmetrics.log
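To get a feeling for what the grep example computes, roughly the same count can be produced on the local copy of the folder with ordinary shell tools. This is a sketch for comparison only, assuming GNU grep; local_grep_count is a hypothetical name, and the order of entries with equal counts may differ from the Hadoop output.

```shell
# local_grep_count DIR REGEX
# Counts every match of the extended regex REGEX in the files of DIR
# and prints "count match", highest count first.
local_grep_count() {
    # -o: print only the matched text, -h: no file names, -E: extended regex
    grep -ohE -- "$2" "$1"/* | sort | uniq -c | sort -rn | awk '{print $1, $2}'
}
```

For the lab data this would be: local_grep_count /opt/hadoop-0.20.2/conf 'dfs[a-z.]+'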
2. wordcount practice
2-1 Count how many times each word occurs in the source folder and save the result to output-2
>/opt/hadoop-0.20.2/bin/hadoop jar /opt/hadoop-0.20.2/hadoop-0.20.2-examples.jar wordcount source output-2
2-2 Check result
>/opt/hadoop-0.20.2/bin/hadoop fs -ls output-2
Found 2 items
drwxr-xr-x - max supergroup 0 2010-07-20 02:00 /user/max/output-2/_logs
-rw-r--r-- 1 max supergroup 10886 2010-07-20 02:01 /user/max/output-2/part-r-00000
Display the result with -cat
>/opt/hadoop-0.20.2/bin/hadoop fs -cat output-2/part-r-00000
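The wordcount result can be compared with a local count made by standard tools. The sketch below assumes that words are split on whitespace and that each output line is "word, tab, count" sorted by word; local_wordcount is a hypothetical name and nothing here runs on Hadoop.

```shell
# local_wordcount DIR
# Splits all files in DIR on whitespace and prints "word<TAB>count",
# sorted by word in byte order.
local_wordcount() {
    cat "$1"/* \
        | tr -s '[:space:]' '\n' \
        | grep -v '^$' \
        | LC_ALL=C sort \
        | uniq -c \
        | awk '{print $2 "\t" $1}'
}
```

For the lab data this would be: local_wordcount /opt/hadoop-0.20.2/conf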
Lab 4 Hadoop Cluster
--- Please do this on Server 2 ---
Log in as max with the password linux
1. Prepare the folders
>sudo mkdir /opt/hadoop-0.20.2
When it asks for the root password, please enter linux
>sudo mkdir /var/hadoop
>sudo chown -R max:users /opt/hadoop-0.20.2/
>sudo chown -R max:users /var/hadoop
Set up name resolution (this is very important)
>sudo vi /etc/hosts
Comment out the line that resolves server2's name to 127.0.0.2
#127.0.0.2 server2.digitalairlines.com server2
Add the IP addresses of server1 and server2 (adjust them to your environment)
10.10.x.y server.digitalairlines.com server
10.10.v.w server2.digitalairlines.com server2
-----------------------------------------------------------------------------------------------------------------------
--- Please do this on Server 1 ---
1-1 Stop Hadoop
>/opt/hadoop-0.20.2/bin/stop-all.sh
1-2 Delete the old Hadoop data
>rm -rf /var/hadoop/*
1-3 Modify the NameNode configuration
>vi /opt/hadoop-0.20.2/conf/core-site.xml
Please change
<value>hdfs://localhost:9000</value>
to Server1's IP address
<value>hdfs://Srv1-IP:9000</value>
***You can use "ip address show" or "/sbin/ifconfig" to display the IP address***
1-4 Modify the HDFS replication setting
>vi /opt/hadoop-0.20.2/conf/hdfs-site.xml
Please change
<value>1</value>
to
<value>2</value>
1-5 Modify the JobTracker configuration
>vi /opt/hadoop-0.20.2/conf/mapred-site.xml
Please change
<value>localhost:9001</value>
to
<value>Srv1-IP:9001</value>
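The three edits in 1-3 to 1-5 can also be done without opening vi. The following is a sketch under assumptions: it relies on GNU sed (for -i), on the files looking exactly like the ones created in Step 3, and point_to_master is a hypothetical helper name.

```shell
# point_to_master CONF_DIR MASTER_IP
# Replaces localhost by MASTER_IP in core-site.xml and mapred-site.xml
# and raises dfs.replication from 1 to 2 in hdfs-site.xml.
point_to_master() {
    conf_dir=$1
    master=$2
    sed -i "s#hdfs://localhost:9000#hdfs://$master:9000#" "$conf_dir/core-site.xml"
    sed -i "s#<value>localhost:9001</value>#<value>$master:9001</value>#" "$conf_dir/mapred-site.xml"
    # On the line after the dfs.replication name, change the value 1 to 2
    sed -i '/<name>dfs.replication<\/name>/{n;s#<value>1</value>#<value>2</value>#;}' "$conf_dir/hdfs-site.xml"
}
```

Usage, with your own Server1 address in place of the placeholder: point_to_master /opt/hadoop-0.20.2/conf Srv1-IP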
1-6 Set up slaves (the hosts listed as slaves take the DataNode and TaskTracker roles)
>vi /opt/hadoop-0.20.2/conf/slaves
Please delete localhost
Please add Srv1-IP
Please add Srv2-IP
***The IP addresses look like 10.10.x.y***
1-7 Set up name resolution
>sudo vi /etc/hosts
Comment out the line that resolves server1's name to 127.0.0.2
#127.0.0.2 server.digitalairlines.com server
Add the IP addresses of server1 and server2 for name resolution
10.10.x.y server.digitalairlines.com server
10.10.v.w server2.digitalairlines.com server2
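Since a leftover 127.0.0.2 entry is easy to overlook, the hosts file can be checked after editing. This is a minimal sketch; loopback_names is a hypothetical helper that only reads the file and prints any host name still mapped to 127.0.0.2 on a line that is not commented out.

```shell
# loopback_names HOSTS_FILE
# Prints every name that an active (uncommented) line maps to 127.0.0.2.
# No output means the entry was commented out as described above.
loopback_names() {
    awk '!/^[[:space:]]*#/ && $1 == "127.0.0.2" {
        for (i = 2; i <= NF; i++) print $i
    }' "$1"
}
```

Usage on either server: loopback_names /etc/hosts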
1-8 Modify the SSH client configuration
>sudo vi /etc/ssh/ssh_config
Uncomment StrictHostKeyChecking and set it to no
# StrictHostKeyChecking ask
StrictHostKeyChecking no
1-9 Copy the SSH keys to the other node
>scp -r ~/.ssh Srv2-IP:~/
Warning: Permanently added '10.10.v.w' (RSA) to the list of known hosts.
Password: (please enter max's password)
Test connecting over SSH without a password (using the key)
Connect to server1
>ssh Srv1-IP
>exit
Connect to server2
>ssh Srv2-IP
>exit
1-10 Copy Hadoop to Server 2
>scp -r /opt/hadoop-0.20.2/* Srv2-IP:/opt/hadoop-0.20.2/
1-11 Format HDFS
>/opt/hadoop-0.20.2/bin/hadoop namenode -format
1-12 Start DFS (it uses /opt/hadoop-0.20.2/conf/slaves to decide where to start DataNodes)
>/opt/hadoop-0.20.2/bin/start-dfs.sh
Check that two DataNodes are started
starting namenode, logging to /opt/hadoop-0.20.2/logs/hadoop-max-namenode-linux-7tce.out
10.10.x.y: starting datanode, logging to /opt/hadoop-0.20.2/logs/hadoop-max-datanode-linux-7tce.out
10.10.v.w: starting datanode, logging to /opt/hadoop-0.20.2/logs/hadoop-max-datanode-server2.out
localhost: starting secondarynamenode, logging to /opt/hadoop-0.20.2/logs/hadoop-max-secondarynamenode-linux-7tce.out
Please check "http://Srv1-IP:50070/"
Check "Live Nodes"; it should be 2
1-13 Start the JobTracker
>/opt/hadoop-0.20.2/bin/start-mapred.sh
Please check "http://Srv1-IP:50030/"
Check "Nodes"; it should be 2
Now run the programs from Lab 3 to test the cluster
 
