Posts

Showing posts from 2018

Real-time Stream Processing and Analytics in Large Scale Using Apache NiFi, HDFS, Hive and Power BI

Twitter’s developer platform provides numerous API endpoints for collecting data and building apps on Twitter. The Twitter streaming API lets us collect live tweets. In this blog, I show how I used Twitter streaming data to build interactive dashboards, using Apache NiFi, Power BI and Hive. The tweets are filtered by certain keywords and geo-location. You can find the Apache NiFi template I built for this work in my GitHub repo. The keywords and geo-locations in the template differ from the ones I used in my work. Apache NiFi: collects tweets from the Twitter stream, transforms the data and routes it to different systems such as Power BI and the Hive database. Power BI Streaming Dataset: Power BI supports streaming datasets. I created a streaming dataset and ingested data from NiFi through the streaming dataset API. The Power BI dashboard is built on the streaming dataset. To learn how to create a streaming dataset, please check thi...
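To make the filter-then-ingest idea concrete, here is a minimal Python sketch of the same flow outside NiFi. It is not the NiFi template from the post. The keywords, the bounding box, the row fields and the `push_url` argument are all hypothetical placeholders, and it assumes that a tweet must match both a keyword and the bounding box, and that the streaming dataset's push URL accepts a JSON array of row objects.

```python
import json
from urllib import request

# Hypothetical filter settings; the post does not list the real values.
KEYWORDS = {"hadoop", "nifi"}
# Bounding box as (west, south, east, north) in degrees; illustrative only.
BOUNDING_BOX = (-125.0, 24.0, -66.0, 50.0)


def matches(tweet, keywords=KEYWORDS, box=BOUNDING_BOX):
    """Return True if the tweet mentions a keyword and its point lies in the box."""
    text = tweet.get("text", "").lower()
    if not any(keyword in text for keyword in keywords):
        return False
    # Geo-tagged tweets carry a GeoJSON point: [longitude, latitude].
    coords = (tweet.get("coordinates") or {}).get("coordinates")
    if not coords:
        return False
    lon, lat = coords
    west, south, east, north = box
    return west <= lon <= east and south <= lat <= north


def to_row(tweet):
    """Keep only the fields assumed to exist in the streaming dataset."""
    return {
        "id": tweet["id_str"],
        "text": tweet["text"],
        "created_at": tweet["created_at"],
    }


def build_payload(tweets):
    """Filter tweets and serialise the matching rows as a JSON array."""
    rows = [to_row(t) for t in tweets if matches(t)]
    return json.dumps(rows).encode("utf-8")


def push_rows(push_url, payload):
    """POST the rows to the dataset push URL (assumed to accept a JSON array)."""
    req = request.Request(
        push_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return resp.status
```

In practice you would call `build_payload` on each batch of collected tweets and pass the result to `push_rows` together with the push URL that Power BI shows for the streaming dataset.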

JupyterHub setup on Cent OS or Red Hat server

I set up a collaboration tool for a new Data Science team that allows everyone to share their work and use computational resources efficiently. I'm a Data Scientist with a Computer Engineering background, and I wear many hats! It was initially hard to know where to start, but I found good references on the Internet. Eventually, I ended up with JupyterHub for my team! I jotted down some notes during my installation and thought I would share them with others. I hope they help someone. I initially installed JupyterHub on CentOS, and we later migrated to a Red Hat server. The following steps are applicable to both CentOS and Red Hat. Check sudo access. You should have sudo access because we are going to serve JupyterHub via a sudo user account. Download the Anaconda distribution, which comes with default packages and libraries. Anaconda3-5.2.0-Linux-x86_64.sh is the latest one as I'm writing this blog. Install Anaconda at the /opt/anaconda3 location. Command to run: ...