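Installing Shared Data Analysis and Machine Learning Tools#
Preparation#
Choose a directory on the shared file system. We use /share/apps as the example; all of the tools below are installed under this directory.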
mkdir -p /share/apps
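Install conda (any version will do). This copy is only used to install the tools into the directory above; it does not affect the conda version that end users run.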
yum -y install epel-release
yum -y install conda
Installing Python#
Create a Python 3.9 conda environment in the shared directory, and install conda itself into that environment:
conda create -p /share/apps/python python=3.9
conda init bash
Log out of the current terminal and log back in (so that conda init takes effect), then run:
conda activate /share/apps/python
pip3 install -U pip
conda install conda
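Every later pip3/conda command assumes the shared environment is the active one; a mistyped prefix (or forgetting conda activate) silently installs into the wrong place. A minimal guard, assuming conda activate exports CONDA_PREFIX as usual; the function name require_env is hypothetical:

```shell
# Hypothetical helper: fail unless the conda environment at the given
# prefix is the currently active one (conda activate sets CONDA_PREFIX).
require_env() {
  want="$1"
  if [ "${CONDA_PREFIX:-}" != "$want" ]; then
    echo "expected active env $want, got '${CONDA_PREFIX:-none}'" >&2
    return 1
  fi
  echo "active env: $want"
}
```

Usage: require_env /share/apps/python && pip3 install -U pip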
Installing Machine Learning Libraries#
pip3 install pandas numpy scipy matplotlib seaborn scikit-learn xgboost
Install TensorFlow and TensorBoard:
pip3 install tensorflow
pip3 install tensorboard
Install PyTorch:
pip3 install torch
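To confirm the libraries above are importable from the shared interpreter, a small sketch (the helper check_modules is hypothetical) tries each import and lists failures. Note that some import names differ from pip package names, e.g. scikit-learn is imported as sklearn.

```shell
# Hypothetical helper: try "import <module>" with the given interpreter for
# each module name; print ok/MISSING per module, return 1 if any failed.
check_modules() {
  py="$1"; shift
  missing=0
  for m in "$@"; do
    if "$py" -c "import $m" >/dev/null 2>&1; then
      echo "ok      $m"
    else
      echo "MISSING $m"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: check_modules /share/apps/python/bin/python pandas numpy scipy matplotlib seaborn sklearn xgboost tensorflow tensorboard torch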
Installing Jupyter#
Install JupyterLab together with the Simplified Chinese language pack:
pip3 install jupyterlab jupyterlab-language-pack-zh-CN
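Installing R#
Install R 4.3 (r-base is the conda-forge package name; pinning 4.3 keeps the version in line with this section):
conda install -c conda-forge r-base=4.3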
Installing Julia#
From https://julialang.org/downloads/ download the Generic Linux on x86, 64-bit (glibc) build.
Unpack it, for example:
tar xfz julia-1.9.3-linux-x86_64.tar.gz -C /share/apps
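tar -C only sets where the archive is unpacked; the directory actually created comes from the archive itself. A small sketch (helper name tarball_topdir is hypothetical) lists that top-level directory before extracting, so you know the resulting path to the julia binary:

```shell
# Hypothetical helper: print the top-level directory name(s) stored in a
# .tar.gz, i.e. what "tar xfz FILE -C /share/apps" will create.
tarball_topdir() {
  # strip a leading "./", keep the first path component, de-duplicate
  tar tzf "$1" | awk -F/ '{ sub(/^\.\//, ""); if ($1 != "") print $1 }' | sort -u
}
```

Usage: tarball_topdir julia-1.9.3-linux-x86_64.tar.gz prints the directory name; the binary is then under /share/apps/<that directory>/bin/julia. The same helper works for the Spark tarball.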
Installing Spark#
Go to the Spark download page and download a package "Pre-built for Apache Hadoop", for example spark-3.5.0-bin-hadoop3.tgz.
Unpack it into the shared directory:
tar xfz spark-3.5.0-bin-hadoop3.tgz -C /share/apps
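The unpacked directory is later used as SPARK_HOME in the Scala kernel definition, so it is worth checking that it looks like a Spark distribution. A minimal sketch, assuming the standard bin/spark-submit launcher; check_spark_home is a hypothetical name:

```shell
# Hypothetical helper: verify that a directory contains an executable
# bin/spark-submit before using it as SPARK_HOME.
check_spark_home() {
  home="$1"
  if [ -x "$home/bin/spark-submit" ]; then
    echo "SPARK_HOME ok: $home"
  else
    echo "no executable bin/spark-submit under $home" >&2
    return 1
  fi
}
```

Usage: check_spark_home /share/apps/spark-3.5.0-bin-hadoop3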
Installing Jupyter Kernels#
Python 2 Kernel#
Install Python 2. Before installing, make sure you have left the current conda environment:
conda deactivate
Create the Python 2 environment:
conda create -p /share/apps/python2 python=2
Install the Python 2 kernel:
/share/apps/python2/bin/pip install ipykernel
/share/apps/python2/bin/python2 -m ipykernel install
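ipykernel install without --user writes the kernel spec to /usr/local/share/jupyter/kernels on the local machine. Move it into the Jupyter installation of the conda Python 3 environment: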
mv /usr/local/share/jupyter/kernels/python2 /share/apps/python/share/jupyter/kernels/
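Kernel specs are moved twice in this guide (python2 here, spylon-kernel below). A small sketch of a safer move, assuming each spec is a directory containing kernel.json; the helper move_kernelspec is hypothetical. It refuses to overwrite an existing spec and creates the destination if needed:

```shell
# Hypothetical helper: move kernel spec directory NAME from SRC kernels dir
# to DST kernels dir, without clobbering an existing spec of the same name.
move_kernelspec() {
  name="$1"; src="$2"; dst="$3"
  if [ ! -f "$src/$name/kernel.json" ]; then
    echo "no kernel.json in $src/$name" >&2
    return 1
  fi
  if [ -e "$dst/$name" ]; then
    echo "$dst/$name already exists" >&2
    return 1
  fi
  mkdir -p "$dst"
  mv "$src/$name" "$dst/"
}
```

Usage: move_kernelspec python2 /usr/local/share/jupyter/kernels /share/apps/python/share/jupyter/kernels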
R Kernel#
Install:
conda activate /share/apps/python
conda install -c r r-irkernel
Set the R command: edit the first line of /share/apps/python/share/jupyter/kernels/ir/kernel.json so that it calls R by its full path:
{"argv": ["/share/apps/python/lib/R/bin/R", "--slave", "-e", "IRkernel::main()", "--args", "{connection_file}"],
"display_name":"R",
"language":"R"
}
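The same edit can be scripted with sed. This sketch assumes the installed kernel.json contains the text "argv": ["R", exactly as in the stock spec; if the pattern is not found, the hypothetical helper set_kernel_r_path reports an error instead of silently doing nothing. It keeps a kernel.json.bak backup.

```shell
# Hypothetical helper: replace the bare "R" in argv[0] of an IRkernel
# kernel.json with an absolute path to the R binary (keeps a .bak copy).
set_kernel_r_path() {
  file="$1"; rbin="$2"
  if ! grep -q '"argv": \["R",' "$file"; then
    echo "pattern not found in $file" >&2
    return 1
  fi
  sed -i.bak 's|"argv": \["R",|"argv": ["'"$rbin"'",|' "$file"
}
```

Usage: set_kernel_r_path /share/apps/python/share/jupyter/kernels/ir/kernel.json /share/apps/python/lib/R/bin/R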
Add R startup settings: R reads site-wide settings from R_HOME/etc/Rprofile.site, so create the file Rprofile.site in /share/apps/python/lib/R/etc:
options(bitmapType='cairo')
options(jupyter.plot_mimetypes="image/svg+xml")
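A minimal sketch for writing these two options, assuming R_HOME is /share/apps/python/lib/R as above; the helper write_rprofile_site is hypothetical. It appends rather than overwrites, in case the conda R package already ships an Rprofile.site, and skips the write if the options are already present:

```shell
# Hypothetical helper: append the Jupyter plotting options to a site-wide
# R profile, once (re-running does not duplicate them).
write_rprofile_site() {
  file="$1"
  mkdir -p "$(dirname "$file")"
  if [ -f "$file" ] && grep -q 'jupyter.plot_mimetypes' "$file"; then
    return 0
  fi
  cat >> "$file" <<'EOF'
options(bitmapType = 'cairo')
options(jupyter.plot_mimetypes = "image/svg+xml")
EOF
}
```

Usage: write_rprofile_site /share/apps/python/lib/R/etc/Rprofile.site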
Make sure the X11 libraries are installed on every host that runs Jupyter (yum group install "Server with GUI").
Scala Kernel (Spark)#
Enter the Python 3 environment:
conda activate /share/apps/python
Install the packages:
pip3 install spylon_kernel
python -m spylon_kernel install
conda install openjdk
pip3 install pyspark
Move the kernel spec into the Jupyter installation of the conda Python 3 environment:
mv /usr/local/share/jupyter/kernels/spylon-kernel /share/apps/python/share/jupyter/kernels/
Add the Spark location to the kernel definition: edit /share/apps/python/share/jupyter/kernels/spylon-kernel/kernel.json and add SPARK_HOME and JAVA_HOME under env. For example:
{"argv": ["/share/apps/python/bin/python", "-m", "spylon_kernel", "-f", "{connection_file}"],
"display_name": "spylon-kernel","env": {"SPARK_HOME":"/share/apps/spark-3.5.0-bin-hadoop3", "JAVA_HOME":"/share/apps/python", "PYTHONUNBUFFERED": "1", "SPARK_SUBMIT_OPTS": "-Dscala.usejavacp=true"}, "language": "scala", "name": "spylon-kernel"}
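Instead of editing by hand, the kernel.json above can be regenerated from parameters, which keeps SPARK_HOME in sync with the directory actually unpacked. This is a sketch; the helper write_spylon_kernel is hypothetical and it overwrites any existing kernel.json in the given directory:

```shell
# Hypothetical helper: write the spylon-kernel spec with the given python,
# SPARK_HOME and JAVA_HOME (same fields as the example kernel.json).
write_spylon_kernel() {
  dir="$1"; py="$2"; spark_home="$3"; java_home="$4"
  mkdir -p "$dir"
  cat > "$dir/kernel.json" <<EOF
{"argv": ["$py", "-m", "spylon_kernel", "-f", "{connection_file}"],
 "display_name": "spylon-kernel",
 "env": {"SPARK_HOME": "$spark_home", "JAVA_HOME": "$java_home",
         "PYTHONUNBUFFERED": "1", "SPARK_SUBMIT_OPTS": "-Dscala.usejavacp=true"},
 "language": "scala", "name": "spylon-kernel"}
EOF
}
```

Usage: write_spylon_kernel /share/apps/python/share/jupyter/kernels/spylon-kernel /share/apps/python/bin/python /share/apps/spark-3.5.0-bin-hadoop3 /share/apps/python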
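Julia Kernel#
Julia does not support merging system-level and user-level environments the way Python and R do, so each user who needs the Julia kernel must install it themselves. See Using Jupyter for details.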
Installing RStudio#
Choose the build for your OS version and copy its download command, for example:
wget https://download2.rstudio.org/server/rhel8/x86_64/rstudio-server-rhel-2023.09.1-494-x86_64.rpm
Unpack the RPM without installing it:
rpm2cpio rstudio-server-rhel-2023.09.1-494-x86_64.rpm | cpio -idmv
Move the files into the shared directory:
mv usr/lib/rstudio-server /share/apps
Installing Singularity#
Installing Singularity on Linux#
yum install -y epel-release
yum install -y singularity-ce
Pulling Images That Match the Cluster's Linux Versions#
mkdir -p /share/apps/containers
singularity pull /share/apps/containers/centos7.sif docker://centos:7
singularity pull /share/apps/containers/rocky8.sif docker://rockylinux:8
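To see which of these images a given node needs, its distribution can be read from /etc/os-release. A sketch, assuming the ID and VERSION_ID fields of os-release; the helper os_image_name is hypothetical and returns names in the same style as the .sif files above (centos7, rocky8):

```shell
# Hypothetical helper: print ID plus major VERSION_ID from an os-release
# file, e.g. ID="rocky" VERSION_ID="8.8" -> rocky8.
os_image_name() {
  file="${1:-/etc/os-release}"
  (
    . "$file"                      # os-release is shell-compatible
    echo "${ID}${VERSION_ID%%.*}"  # keep only the major version
  )
}
```

Usage: run os_image_name on a compute node and compare the result with the files in /share/apps/containers.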
Installing Git#
Some stock OS releases (for example CentOS 7) ship Git 1.x, while many developers need Git 2. We install Git 2 with conda:
conda activate /share/apps/python
conda install git
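A quick check that the conda-installed git is version 2 or later. This sketch parses the usual "git version X.Y.Z" output; the helper git_is_v2 is hypothetical:

```shell
# Hypothetical helper: succeed if the given "git --version" output reports
# major version 2 or later.
git_is_v2() {
  # "git version 2.42.0" -> third field "2.42.0" -> major "2"
  major=$(printf '%s\n' "$1" | awk '{ split($3, v, "[.]"); print v[1] }')
  [ -n "$major" ] && [ "$major" -ge 2 ]
}
```

Usage: git_is_v2 "$(/share/apps/python/bin/git --version)" && echo "Git 2 available"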