Install Hadoop on Windows 10

Enable and Configure the Windows 10 Linux Subsystem (WSL)

  • Install and configure WSL by enabling the Windows Subsystem for Linux feature in Windows 10, as shown in the sketch below.
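
    A minimal sketch of enabling WSL, assuming a recent Windows 10 build (2004 or later); on older builds, enable the optional feature manually and reboot:

    # Run from an elevated (administrator) PowerShell prompt.
    # On Windows 10 2004+ this installs WSL together with a default Ubuntu distribution:
    wsl --install
    # On older builds, enable the optional feature instead, then reboot:
    dism.exe /online /enable-feature /featurename:Microsoft-Windows-Subsystem-Linux /all /norestart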

Download and Install the Hadoop Binaries

  • Go to the Hadoop releases page and find the download URL for the version you need; this guide uses Hadoop 3.3.0.
  • Run the following command to download it (see the checksum sketch after this list):
    wget http://mirror.intergrid.com.au/apache/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz
  • Create a hadoop folder to hold the extracted Hadoop 3.3.0:
    mkdir ~/hadoop
  • Extract the downloaded Hadoop binaries:
    tar -xvzf hadoop-3.3.0.tar.gz -C ~/hadoop
  • Enter the hadoop folder:
    cd ~/hadoop/hadoop-3.3.0/
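
    If you want to verify the download, check the archive against the SHA-512 checksum Apache publishes. A minimal sketch; the .sha512 URL below follows Apache's usual naming convention and is an assumption:

    # Download the published checksum file (assumed URL pattern)
    wget https://downloads.apache.org/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz.sha512
    # Print both digests and compare them by eye
    sha512sum hadoop-3.3.0.tar.gz
    cat hadoop-3.3.0.tar.gz.sha512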

Configure Passphraseless ssh

  • Check whether you can ssh to localhost:
    ssh localhost
  • If you hit the error ssh: connect to host localhost port 22: Connection refused, try the following commands:
    sudo apt-get install ssh
    sudo service ssh restart
  • If you cannot connect without a password, run the following commands to initialize a private/public key pair (a verification sketch follows this list):
    ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    chmod 0600 ~/.ssh/authorized_keys
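
    To confirm that passwordless login now works, a quick non-interactive check; BatchMode makes ssh fail with an error instead of prompting:

    # Prints "ssh OK" only if key-based login succeeds without a prompt
    ssh -o BatchMode=yes localhost 'echo ssh OK'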

Configure Pseudo-Distributed Mode (Single-Node Mode)

  • Configure the environment variables by editing ~/.bashrc:
    vi ~/.bashrc
  • Add the following environment variables:
    export JAVA_HOME=/usr/lib/jvm/java-1.11.0-openjdk-amd64
    export HADOOP_HOME=~/hadoop/hadoop-3.3.0
    export PATH=$PATH:$HADOOP_HOME/bin
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
  • Run the following command to reload the variables:
    source ~/.bashrc
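  • As a quick sanity check that the variables took effect (assuming the JAVA_HOME path above matches your installed JDK):
    echo $JAVA_HOME $HADOOP_HOME
    hadoop version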
  • Edit etc/hadoop/hadoop-env.sh:
    vi etc/hadoop/hadoop-env.sh
  • Set the JAVA_HOME variable:
    export JAVA_HOME=/usr/lib/jvm/java-1.11.0-openjdk-amd64
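  • If you are unsure of the JDK path on your system, it can be located like this (a sketch, assuming a Debian/Ubuntu layout with OpenJDK installed):
    # Resolve the real path of the java binary, e.g. /usr/lib/jvm/java-11-openjdk-amd64/bin/java
    readlink -f $(which java)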
  • Edit etc/hadoop/core-site.xml:
    vi etc/hadoop/core-site.xml
  • Add the following configuration; fs.defaultFS tells HDFS clients where to find the NameNode:
    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://localhost:9000</value>
        </property>
    </configuration>
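  • To confirm the value is picked up (hdfs getconf reads the configuration without needing a running cluster; assumes HADOOP_CONF_DIR is set as above):
    hdfs getconf -confKey fs.defaultFS    # should print hdfs://localhost:9000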
  • Edit etc/hadoop/hdfs-site.xml:
    vi etc/hadoop/hdfs-site.xml
  • Add the following configuration; a replication factor of 1 is appropriate because a single node cannot hold multiple replicas:
    <configuration>
        <property>
            <name>dfs.replication</name>
            <value>1</value>
        </property>
    </configuration>
  • Edit etc/hadoop/mapred-site.xml:
    vi etc/hadoop/mapred-site.xml
  • Add the following configuration to run MapReduce jobs on YARN:
    <configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
        <property>
            <name>mapreduce.application.classpath</name>
            <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
        </property>
    </configuration>
  • Edit etc/hadoop/yarn-site.xml:
    vi etc/hadoop/yarn-site.xml
  • Add the following configuration; the env-whitelist lets NodeManager pass these variables (including HADOOP_MAPRED_HOME, referenced in mapred-site.xml above) through to containers:
    <configuration>
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        <property>
            <name>yarn.nodemanager.env-whitelist</name>
            <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
        </property>
    </configuration>

Format the NameNode

    bin/hdfs namenode -format

Run the DFS Daemons

  • Start the NameNode and DataNode daemons:
    hadoop@xx:~/hadoop/hadoop-3.3.0$ sbin/start-dfs.sh
    Starting namenodes on [localhost]
    Starting datanodes
    Starting secondary namenodes [xx]
  • Check the status with jps:
    hadoop@xx:~/hadoop/hadoop-3.3.0$ jps
    4384 DataNode
    4198 NameNode
    4615 SecondaryNameNode
    4766 Jps
  • View the NameNode web UI:
    Visit http://localhost:9870/dfshealth.html#tab-overview
    [Screenshot: NameNode overview page]
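
    With the daemons up, a quick HDFS smoke test; this creates your user's home directory in HDFS and lists it (a sketch using the standard hdfs CLI):

    # Create /user/<your-name> in HDFS and verify it shows up
    bin/hdfs dfs -mkdir -p /user/$(whoami)
    bin/hdfs dfs -ls /user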

Run YARN

sbin/start-yarn.sh

  • Check the status:

hadoop@xx:~/hadoop/hadoop-3.3.0$ jps
4384 DataNode
5424 Jps
4898 ResourceManager
5186 NodeManager
4198 NameNode
4615 SecondaryNameNode
  • You can see that the NodeManager and ResourceManager daemons are now running.
  • Visit the YARN web UI at http://localhost:8088/cluster
    [Screenshot: YARN cluster view]
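
    To exercise the whole stack end to end, you can submit the pi example that ships with the release (a sketch; the jar version matches the 3.3.0 download used above):

    # Estimates pi with 2 map tasks and 5 samples each; runs as a YARN job
    bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.0.jar pi 2 5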

Shut Down the Services

sbin/stop-yarn.sh
sbin/stop-dfs.sh
  • Check the status; after both scripts finish, jps should list only the Jps process itself:
    ~/hadoop/hadoop-3.3.0$ jps

FAQ

  • “etc/hadoop/hadoop-env.sh” E212: Can’t open file for writing
    Make sure you run vi from the correct Hadoop directory, $HADOOP_HOME; the relative path etc/hadoop/hadoop-env.sh only resolves from there.
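
    A quick check, assuming the variables from ~/.bashrc are loaded:

    cd $HADOOP_HOME && pwd    # should print a path ending in hadoop/hadoop-3.3.0
    vi etc/hadoop/hadoop-env.sh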