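Installing Shared Data Analysis and Machine Learning Tools#

Preparation#

Choose a directory on the shared file system. This guide uses /share/apps as the example; every tool below is installed under it.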

mkdir -p /share/apps
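Before installing anything, it can help to confirm that the target directory exists and is writable by the installing account. The helper below is a minimal sketch; check_share_dir is a hypothetical name, not part of any tool used in this guide.

```shell
# Hypothetical helper: succeed only if the directory exists and is writable.
check_share_dir() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing: $dir" >&2
        return 1
    fi
    if [ ! -w "$dir" ]; then
        echo "not writable: $dir" >&2
        return 1
    fi
    echo "ok: $dir"
}
```

For example, `check_share_dir /share/apps` prints `ok: /share/apps` on success. To see which file system type backs the directory, `df -T /share/apps` can be run on a compute node.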

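Install any version of conda. This copy is used only to install the tools into the directory above and does not affect the conda version that end users see.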

yum -y install epel-release
yum -y install conda

Installing Python#

Create a Python 3.9 conda environment, then install conda itself into that environment.

conda create -p /share/apps/python python=3.9
conda init bash

Log out of the current terminal session, log back in, and then run:

conda activate /share/apps/python
pip3 install -U pip
conda install conda
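After activation, python, pip3, and conda should all resolve inside /share/apps/python; otherwise the later pip3 commands would install into a different Python. The following is a small sketch of that check, where resolves_under is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if command $1 resolves to a path under prefix $2.
resolves_under() {
    path="$(command -v "$1" 2>/dev/null)" || return 1
    case "$path" in
        "$2"/*)
            echo "$path"
            ;;
        *)
            echo "$1 resolves to $path, not under $2" >&2
            return 1
            ;;
    esac
}
```

A possible use is `resolves_under python /share/apps/python && resolves_under pip3 /share/apps/python`, which prints both resolved paths when the environment is active.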

Installing Machine Learning Libraries#

pip3 install pandas numpy scipy matplotlib seaborn scikit-learn xgboost
  • Install TensorFlow and TensorBoard

pip3 install tensorflow
pip3 install tensorboard
  • Install PyTorch

pip3 install torch
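Once the libraries are installed, a quick import test can catch a broken or missing package early. The sketch below loops over module names and tries to import each one with a given interpreter; check_imports is a hypothetical helper. Note that import names can differ from package names (scikit-learn is imported as sklearn).

```shell
# Hypothetical helper: $1 is the Python interpreter, the rest are module names.
# Prints one line per module and returns non-zero if any import fails.
check_imports() {
    py="$1"
    shift
    failed=0
    for mod in "$@"; do
        if "$py" -c "import $mod" >/dev/null 2>&1; then
            echo "ok $mod"
        else
            echo "MISSING $mod"
            failed=1
        fi
    done
    return "$failed"
}
```

For the packages above, this could be called as `check_imports /share/apps/python/bin/python pandas numpy scipy matplotlib seaborn sklearn xgboost tensorflow torch`.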

Installing Jupyter#

Install JupyterLab

pip3 install jupyterlab jupyterlab-language-pack-zh-CN

Installing R#

Install R 4.3

conda install -c conda-forge r-base=4.3

Installing Julia#

  • From https://julialang.org/downloads/ download the Generic Linux on x86, 64-bit (glibc) build

  • Extract it, for example:

tar xfz julia-1.9.3-linux-x86_64.tar.gz -C /share/apps

Installing Spark#

Go to the Spark download page and download a "Pre-built for Apache Hadoop" package, for example spark-3.5.0-bin-hadoop3.tgz.

Extract it into the shared directory:

tar xfz spark-3.5.0-bin-hadoop3.tgz -C /share/apps
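The guide does not say how users find the unpacked Julia and Spark binaries. One option, sketched here purely as an assumption, is a shell snippet (for example a file under /etc/profile.d, a hypothetical location) that prepends both bin directories to PATH. The directory names assume the archives unpack to julia-1.9.3 and spark-3.5.0-bin-hadoop3; adjust them to whatever the tar commands actually created.

```shell
# Prepend a directory to PATH unless it is already listed.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;
        *) PATH="$1:$PATH" ;;
    esac
}

# Assumed install locations under the shared directory.
path_prepend /share/apps/julia-1.9.3/bin
path_prepend /share/apps/spark-3.5.0-bin-hadoop3/bin
export PATH
```

Guarding against duplicates keeps PATH from growing each time the snippet is sourced in a nested shell.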

Installing Jupyter Kernels#

Installing the Python 2 Kernel#

  1. Install Python 2. Before installing, make sure you have left the current conda environment:

conda deactivate

Create the Python 2 environment:

conda create -p /share/apps/python2 python=2
  2. Install the Python 2 kernel

/share/apps/python2/bin/pip install ipykernel
/share/apps/python2/bin/python2 -m ipykernel install
  3. Move the kernel into the conda Python 3 environment that hosts Jupyter

mv /usr/local/share/jupyter/kernels/python2 /share/apps/python/share/jupyter/kernels/
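After the move, the kernel should appear next to the default Python 3 kernel. With the environment active, `jupyter kernelspec list` reports what Jupyter can see. The sketch below does a similar check without Jupyter by listing every subdirectory that contains a kernel.json; list_kernels is a hypothetical helper.

```shell
# Hypothetical helper: print the name of each kernel directory under $1
# that contains a kernel.json file.
list_kernels() {
    for spec in "$1"/*/kernel.json; do
        [ -f "$spec" ] || continue
        basename "$(dirname "$spec")"
    done
}
```

Running `list_kernels /share/apps/python/share/jupyter/kernels` should include python2 once the move has succeeded.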

Installing the R Kernel#

  • Install

conda activate /share/apps/python
conda install -c r r-irkernel
  • Edit the R command line: in the first line of /share/apps/python/share/jupyter/kernels/ir/kernel.json, set the full path of the R executable:

{"argv": ["/share/apps/python/lib/R/bin/R", "--slave", "-e", "IRkernel::main()", "--args", "{connection_file}"],
 "display_name":"R",
 "language":"R"
}
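  • Add R startup settings: in /share/apps/python/lib/R/etc, add a file named Rprofile.site (the site-wide profile that R reads at startup) with the following content: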

options(bitmapType='cairo')
options(jupyter.plot_mimetypes="image/svg+xml")

Make sure the X11 libraries are installed on the host that runs Jupyter (yum group install "Server with GUI").

Installing the Scala Kernel (Spark)#

  • Enter the Python 3 environment

conda activate /share/apps/python
  • Install the software

pip3 install spylon_kernel
python -m spylon_kernel install
conda install openjdk
pip3 install pyspark
  • Move the kernel into the conda Python 3 environment that hosts Jupyter

mv /usr/local/share/jupyter/kernels/spylon-kernel /share/apps/python/share/jupyter/kernels/
  • Add the Spark location to the kernel definition: edit /share/apps/python/share/jupyter/kernels/spylon-kernel/kernel.json and add SPARK_HOME and JAVA_HOME under env. For example:

{"argv": ["/share/apps/python/bin/python", "-m", "spylon_kernel", "-f", "{connection_file}"],
"display_name": "spylon-kernel","env": {"SPARK_HOME":"/share/apps/spark-3.5.0-bin-hadoop3", "JAVA_HOME":"/share/apps/python", "PYTHONUNBUFFERED": "1", "SPARK_SUBMIT_OPTS": "-Dscala.usejavacp=true"}, "language": "scala", "name": "spylon-kernel"}

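A wrong SPARK_HOME or JAVA_HOME usually only shows up when the kernel fails to start, so it may be worth checking both paths first. The sketch below tests that an expected executable exists under a given home directory; check_home is a hypothetical helper, and the assumption that the conda openjdk package places java under bin/ of the environment should be verified on the actual system.

```shell
# Hypothetical helper: $1 is a home directory, $2 an executable expected
# relative to it. Succeeds only if that executable exists and is runnable.
check_home() {
    if [ -x "$1/$2" ]; then
        echo "ok: $1/$2"
    else
        echo "missing: $1/$2" >&2
        return 1
    fi
}
```

With the values from the example kernel.json, the checks would be `check_home /share/apps/spark-3.5.0-bin-hadoop3 bin/spark-submit` and `check_home /share/apps/python bin/java`.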
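Installing the Julia Kernel#

Julia does not support layering a user-level environment on top of a system-level one (Python and R both do), so the Julia kernel must be installed by each user who needs it. See the Jupyter usage guide for details.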

Installing RStudio#

  • Download the RStudio Server rpm package

wget https://download2.rstudio.org/server/rhel8/x86_64/rstudio-server-rhel-2023.09.1-494-x86_64.rpm
  • Unpack the rpm package

rpm2cpio rstudio-server-rhel-2023.09.1-494-x86_64.rpm | cpio -idmv
  • Move the files into the shared directory

mv usr/lib/rstudio-server /share/apps

Installing Singularity#

Installing Singularity on Linux#

yum install -y epel-release
yum install -y singularity-ce

Downloading Images That Match the Cluster's Linux Version#

mkdir -p /share/apps/containers
singularity pull /share/apps/containers/centos7.sif docker://centos:7
singularity pull /share/apps/containers/rocky8.sif docker://rockylinux:8
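To confirm that a pulled image really matches the intended distribution, one can print /etc/os-release from inside it with singularity exec. The wrapper below is a minimal sketch that first checks the image file exists; image_os_release is a hypothetical helper name.

```shell
# Hypothetical helper: print /etc/os-release from inside the image at $1.
image_os_release() {
    image="$1"
    if [ ! -f "$image" ]; then
        echo "no such image: $image" >&2
        return 1
    fi
    singularity exec "$image" cat /etc/os-release
}
```

For example, `image_os_release /share/apps/containers/rocky8.sif` should report a Rocky Linux 8 release.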

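Installing Git#

The Git packaged with the base operating system is typically a 1.x release, while many developers need Git 2.x. We install Git 2.x with conda.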

conda activate /share/apps/python
conda install git
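With the environment active, the installed Git can be compared against a minimum version. The comparison below is a sketch that relies on the version sort of GNU sort (the -V option); version_ge is a hypothetical helper.

```shell
# Hypothetical helper: succeed when version $1 is greater than or equal to
# version $2. Depends on GNU sort for the -V (version sort) option.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}
```

Since `git --version` prints a line of the form "git version X.Y.Z", a possible check is `version_ge "$(git --version | awk '{print $3}')" 2.0.0 && echo "Git 2.x or newer"`.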