JupyterHub setup on CentOS or Red Hat server

I set up a collaboration tool for a new Data Science team that allows everyone to share their work and use the computational resources efficiently. I'm a Data Scientist with a background in Computer Engineering, and I wear many hats!

At first it was hard to know where to start, but I found good references on the Internet. Eventually, I ended up with JupyterHub for my team!

I jotted down some notes during my installation and thought I would share them. I hope they help someone. I first installed JupyterHub on CentOS and we later migrated to a Red Hat server. The following steps apply to both CentOS and Red Hat.

Check sudo access. You need sudo access because JupyterHub will be served from a dedicated account that relies on sudo.
Download the Anaconda distribution, which comes with default packages and libraries. Anaconda3-5.2.0-Linux-x86_64.sh is the latest one as I write this blog.
Install Anaconda at /opt/anaconda3 (enter this path when the installer asks for the install location). Command to run:

sudo bash Anaconda3-5.2.0-Linux-x86_64.sh

Install npm, nodejs and configurable-http-proxy.

sudo yum install npm nodejs
sudo npm install -g configurable-http-proxy

Optional: Upgrade JupyterHub if you are reinstalling.

sudo /opt/anaconda3/bin/pip install --upgrade jupyterhub

Install SudoSpawner, which enables JupyterHub to spawn single-user servers without being root. See the SudoSpawner GitHub repository for more information.

sudo /opt/anaconda3/bin/pip install git+https://github.com/jupyter/sudospawner

Create a user account without a password to serve JupyterHub. This post uses the name rhea.

sudo useradd rhea

Add SudoSpawner to the sudoers file.

Edit the sudoers file,
```
sudo visudo
```
Add the lines below to the file. The jupyterhub group they refer to is created in a later step.

Defaults secure_path=/sbin:/bin:/usr/sbin:/usr/bin:/opt/anaconda3/bin
Cmnd_Alias JUPYTER_CMD=/opt/anaconda3/bin/sudospawner
rhea ALL=(%jupyterhub)NOPASSWD:JUPYTER_CMD

PAM configuration. These commands let rhea read /etc/shadow so that it can check user passwords. For more details, see the JupyterHub wiki.

sudo groupadd shadow
sudo chgrp shadow /etc/shadow
sudo chmod g+r /etc/shadow
sudo usermod -a -G shadow rhea

To run JupyterHub on port 80,

sudo setcap 'cap_net_bind_service=+ep' /usr/bin/node

Check the group permissions on /etc/shadow, which rhea relies on as a non-root user. NOTE: rhea should not be added to the wheel group.

```
ls -l /etc/shadow
```
If the group does not have read and write permission, set it by running
```
sudo chmod g+rw /etc/shadow
```
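
The permission check above can also be scripted. The sketch below is one possible way to do it and is not part of the original setup: `group_can_read` is a hypothetical helper that looks at the mode string printed by `ls -ld` and reports whether the group-read bit is set.

```shell
# Hypothetical helper (not part of JupyterHub): succeeds if the
# file's group has read permission, judged from the ls mode string.
group_can_read() {
  # In a mode string such as -rw-r-----, character 5 is the group-read bit.
  [ "$(ls -ld -- "$1" | cut -c5)" = "r" ]
}

# Example: report the state of /etc/shadow without changing anything.
if group_can_read /etc/shadow; then
  echo "group can read /etc/shadow"
else
  echo "group cannot read /etc/shadow"
fi
```

The helper only inspects the file; changing the permission is still done with the chmod command shown above.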

JupyterHub configuration

Create a separate directory to hold the JupyterHub configuration file, SSL certificate and key file.

sudo mkdir /etc/jupyterhub
sudo chown rhea /etc/jupyterhub
cd /etc/jupyterhub/

Generate JupyterHub config file,

sudo -u rhea /opt/anaconda3/bin/jupyterhub --generate-config

Generate SSL certificate and key file,

sudo -u rhea openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout jhubk.key -out jhubc.pem

Edit JupyterHub config file and add the below lines,

# JupyterHub administrator usernames. Only these users can start and stop other users' servers, and the JupyterHub server itself, from the browser.
c.Authenticator.admin_users = {'admin1', 'admin2'}
c.JupyterHub.ip = 'xxx.xx.xx.xx' # Add your server IP address
c.JupyterHub.port = 8888 # Port number on which you want to serve JupyterHub
c.JupyterHub.ssl_cert = '/etc/jupyterhub/jhubc.pem'
c.JupyterHub.ssl_key = '/etc/jupyterhub/jhubk.key'
c.JupyterHub.spawner_class = 'sudospawner.SudoSpawner'

Create the JupyterHub user accounts. For example, to create the admin1 account,

```
sudo adduser admin1
sudo passwd admin1
```
If it is an administrator account, add it to the wheel group,

sudo usermod -aG wheel admin1

Create a JupyterHub users group and add users to it. A user who has an account on the server cannot access JupyterHub unless that account is added to this group.

Create the JupyterHub users group

sudo groupadd jupyterhub

To add the "admin1" user to this group

sudo usermod -a -G jupyterhub admin1
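
To confirm that a user really ended up in the group, a small check like the one below may help. It is only a sketch: `in_group` is a hypothetical helper name, and admin1 and jupyterhub are simply the example names used in this post.

```shell
# Hypothetical helper: succeeds if user $1 belongs to group $2.
in_group() {
  # id -nG prints the user's group names separated by spaces.
  id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx -- "$2"
}

# Example with the names used in this post.
if in_group admin1 jupyterhub; then
  echo "admin1 is in the jupyterhub group"
else
  echo "admin1 is not in the jupyterhub group"
fi
```

The same helper can be pointed at other accounts, for example `in_group rhea wheel` to confirm that rhea was not added to wheel.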

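Test the rhea sudo setup. Run these commands as a user who is in the jupyterhub group. The first command should print the sudospawner help without asking for a password,

sudo -u rhea sudo -n -u $USER sudospawner --help

The second command should fail, because rhea is only allowed to run sudospawner,

sudo -u rhea sudo -n -u $USER echo 'fail'

It should print,

sudo: a password is required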

Check if PAM is working,

sudo -u rhea /opt/anaconda3/bin/python -c "import pamela, getpass; print(pamela.authenticate('$USER', getpass.getpass()))"

Open the port you set in the JupyterHub config file, if you haven't done so already.
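
How the port is opened depends on the firewall in use. As a rough sketch, assuming firewalld is the active firewall (typically the default on CentOS 7 and RHEL 7) and that the port is 8888 as in the config above, it could look like this. The `run` wrapper and the `APPLY` variable are hypothetical names added here so the commands are only printed unless you ask for them to be executed.

```shell
# Sketch: open the JupyterHub port with firewalld.
# Assumption: firewalld is the active firewall on this server.
PORT="${PORT:-8888}"   # must match c.JupyterHub.port in jupyterhub_config.py

# Print each command by default; run it only when APPLY=1 is set.
run() {
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"
  else
    echo "$*"
  fi
}

run sudo firewall-cmd --permanent --add-port="${PORT}/tcp"
run sudo firewall-cmd --reload
```

Run as-is, the snippet only prints the two firewall-cmd commands. Setting APPLY=1 in the environment executes them.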

Let's serve the JupyterHub!!!

Run the command below to start JupyterHub

sudo -u rhea /opt/anaconda3/bin/jupyterhub -f /etc/jupyterhub/jupyterhub_config.py

To run the start command as a background process, use the command below. The full path to the config file is given so that it works from any directory.

nohup sudo -u rhea /opt/anaconda3/bin/jupyterhub -f /etc/jupyterhub/jupyterhub_config.py &

Hope this helps. Please provide your comments if any.

Errors and workarounds,

1. PAM session error. This error prevented JupyterHub from restarting. The workaround is discussed in this GitHub issue

Solution
Edit the PAM file with sudo vi /etc/pam.d/login and comment out the following line if it is not commented already: #session required pam_loginuid.so

2. If you get a "failed to create PAM sessions" error (discussed in a GitHub issue).

Solution

Edit your JupyterHub config file and set c.PAMAuthenticator.open_sessions = False

3. If you see "Proxy appears to be running at [] but I can't access it (HTTP 403: Forbidden)"

Solution:

Kill the proxy that is currently running: find it with ps ax | grep proxy and kill that process

4. When you get a "500 Page error not found" message

Solution:

Check whether the "rhea" account has been added to the wheel group, for example with id -nG rhea. If so, remove rhea from wheel, for example with sudo gpasswd -d rhea wheel.



gsadhas
