
Real-time Stream Processing and Analytics at Large Scale Using Apache NiFi, HDFS, Hive and Power BI

Twitter’s developer platform provides numerous API endpoints to collect data and build apps on Twitter. The Twitter streaming API allows us to collect live tweets. In this blog, I show how I used Twitter streaming data to build interactive dashboards.
I used Apache NiFi, Power BI and Hive in this work. The tweets are filtered based on certain keywords and geolocation. You can find the Apache NiFi template I built for this work in my GitHub repo. The NiFi template contains keywords and geolocations that differ from the ones I used in my work.
  • Apache NiFi – collects tweets from the Twitter stream, performs data transformation, and routes the collected data to different systems such as Power BI and the Hive database.
  • Power BI Streaming Dataset – Power BI supports streaming datasets. I created a streaming dataset and ingested the data from NiFi through the streaming dataset API. The Power BI dashboard is built on top of the streaming dataset. To learn how to create a streaming dataset, please check this link: Power BI Streaming dataset.
  • Hive – the Power BI streaming dataset is used to build the dashboard, but it stores only the latest records; for example, beyond roughly 200 tweets, new tweets delete the old ones. All the tweets collected from the Twitter stream are therefore stored in HDFS, and a Hive table is built on top of the HDFS files. The Hive dataset can be used for further processing such as data analysis and model building.
You can download the NiFi template from the GitHub repo and import it into your environment. The processors below need to be configured to run the template.

GetTwitter processor
The GetTwitter processor allows us to connect to the Twitter API endpoints. Configure the processor to set the Consumer Key, Consumer Secret, Access Token, and Access Token Secret. By default, the NiFi template collects tweets that match the keywords “insurance, claim, rain, winter, storm” and are tweeted from the United States. Modify the “Terms to Filter On” and “Locations to Filter On” properties if needed.
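If you want to sanity-check the four credential values before pasting them into the GetTwitter processor, a small script like the one below can help. This is only an optional sketch using the tweepy library (it is not part of the NiFi template), and the placeholder key values are assumptions you must replace with your own.

import tweepy

# Placeholder credentials – replace with the keys from your Twitter developer app.
consumer_key = "YOUR_CONSUMER_KEY"
consumer_secret = "YOUR_CONSUMER_SECRET"
access_token = "YOUR_ACCESS_TOKEN"
access_token_secret = "YOUR_ACCESS_TOKEN_SECRET"

# Authenticate with the same four values the GetTwitter processor expects.
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

# Raises an error if the keys are invalid; otherwise prints the account name.
user = api.verify_credentials()
print("Authenticated as:", user.screen_name)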


Power BI data ingestion
The processed data needs to be sent to the Power BI streaming dataset. The InvokeHTTP processor allows us to call the Power BI streaming dataset API endpoint; from NiFi, the data is ingested into Power BI through a REST API call. Note: create the Power BI streaming dataset if you do not have one before proceeding further. Configure the InvokeHTTP processor to set:
  • Remote URL – the Power BI API URL for the streaming dataset
  • Basic Authentication Username – Power BI username
  • Basic Authentication Password – Power BI password
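For reference, the REST call that InvokeHTTP performs can be reproduced with a few lines of Python. This is a minimal sketch: the push URL placeholders and the column names (created_at, user, tweet, location) are assumptions for illustration; use the URL and schema of your own streaming dataset.

import json
import requests

# Placeholder push URL of the Power BI streaming dataset (copy it from the dataset's API info page).
push_url = "https://api.powerbi.com/beta/<workspace-id>/datasets/<dataset-id>/rows?key=<access-key>"

# One tweet formatted to match the (hypothetical) streaming dataset schema.
rows = {
    "rows": [
        {
            "created_at": "2019-01-15T10:30:00Z",
            "user": "some_user",
            "tweet": "Heavy rain and a winter storm are expected this week",
            "location": "United States",
        }
    ]
}

# Power BI streaming datasets accept new rows through a simple HTTP POST.
response = requests.post(
    push_url,
    data=json.dumps(rows),
    headers={"Content-Type": "application/json"},
)
print(response.status_code)  # 200 means the rows were accepted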


Hive data ingestion
The Power BI streaming dataset has limited storage and keeps only the latest tweets. To access all the data, we store the tweets in HDFS files, and a Hive table needs to be created on top of the HDFS files. The processed tweets are stored as a delimited file, so create the Hive table schema based on the delimited input that is fed to the PutHDFS processor. Configure the PutHDFS processor to set:
  • Kerberos Principal – set this for authentication, for example hdfs-twitter@domain.com
  • Directory – the full path to the HDFS directory where you want to store the collected tweets
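As an illustration of what that table definition can look like, the sketch below creates an external Hive table over the HDFS directory that PutHDFS writes to, using the pyhive library. The host, username, directory, column names, and the '|' delimiter are all assumptions; adjust them to match the delimited output your pipeline actually produces.

from pyhive import hive

# Hypothetical connection details for the Hive server.
conn = hive.Connection(host="hive-server.domain.com", port=10000, username="hdfs-twitter")
cursor = conn.cursor()

# External table over the HDFS directory used by the PutHDFS processor.
cursor.execute("""
    CREATE EXTERNAL TABLE IF NOT EXISTS tweets (
        created_at STRING,
        user_name  STRING,
        tweet_text STRING,
        location   STRING,
        device     STRING
    )
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '|'
    STORED AS TEXTFILE
    LOCATION '/user/hdfs-twitter/tweets'
""")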



The NiFi template has two streams: one for the Power BI streaming dataset and another for HDFS storage.

Power BI Dashboard
The following Power BI reports were created using the data ingested from NiFi.

Visualize live tweets

Users and their device type

Search tweets


Hive
The NiFi template has a pipeline to store the tweets in HDFS, and the Hive table is built on top of the HDFS files. Once the table is created successfully, you can access the data in Hive.
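As a quick example of that access, the sketch below runs a query similar to the “Users and their device type” report against the hypothetical tweets table defined earlier; the connection details and column names are assumptions.

from pyhive import hive

# Hypothetical connection details – same Hive server as in the table-creation sketch.
conn = hive.Connection(host="hive-server.domain.com", port=10000, username="hdfs-twitter")
cursor = conn.cursor()

# Count tweets per device type.
cursor.execute("""
    SELECT device, COUNT(*) AS tweet_count
    FROM tweets
    GROUP BY device
    ORDER BY tweet_count DESC
""")
for device, tweet_count in cursor.fetchall():
    print(device, tweet_count)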

Hive query

Thoughts on future work!
  • Build a machine learning model on the collected data
  • Change the NiFi pipeline to run inference via an API call to the model
  • Update the Power BI report to include the inference results
