Twitter’s developer platform provides numerous API endpoints
for collecting data and building apps on Twitter. Twitter streaming lets us
collect live tweets. In this blog, I show you how I used Twitter streaming data
to build interactive dashboards.
I used Apache NiFi, Power BI, and Hive in this work. The
tweets are filtered by certain keywords and geolocation. You can find
the Apache NiFi template I built for this work in my GitHub repo. Note that the
keywords and geolocations in the template differ from the ones I used in
my work.
- Apache NiFi: collects tweets from the Twitter stream, transforms the data, and routes the collected data to different systems such as Power BI and a Hive database.
- Power BI streaming dataset: Power BI supports streaming datasets. I created a streaming dataset and ingested data into it from NiFi through the streaming dataset API. The Power BI dashboard is built on this streaming dataset. To learn how to create a streaming dataset, please check this link: Power BI Streaming dataset.
- Hive: the Power BI streaming dataset is used to build the dashboard, but it stores only the latest records. For example, it can hold 200+ tweets, after which new tweets replace the oldest ones. All the tweets collected from the Twitter stream are therefore also stored in HDFS, with a Hive table built on top of the HDFS files. The Hive dataset can be used for further processing such as data analysis and model building.
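The "latest records only" behaviour described above can be pictured as a fixed-size first-in, first-out buffer. The following is a minimal sketch of that idea, not how Power BI implements it; the capacity of 200 and the tweet shape are assumptions made for illustration.

```python
from collections import deque

# Assumed capacity for illustration only; the real limit is set by Power BI.
MAX_ROWS = 200


def make_buffer(max_rows=MAX_ROWS):
    """Return a FIFO buffer that drops its oldest item once it is full."""
    return deque(maxlen=max_rows)


buffer = make_buffer()
for tweet_id in range(250):
    # Each appended tweet beyond the capacity evicts the oldest one.
    buffer.append({"id": tweet_id})

oldest_id = buffer[0]["id"]
newest_id = buffer[-1]["id"]
```

After the loop, the buffer holds only the most recent `MAX_ROWS` tweets, which is why a second, permanent store such as HDFS is needed for full history.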
You can download the NiFi template from the GitHub repo and import it into your environment. The following processors need to be configured before the template will run:
- Get Twitter processor
- Power BI data ingestion
- Hive data ingestion
Get Twitter processor
The GetTwitter processor lets us connect to Twitter API endpoints. Configure the processor by setting the Consumer Key, Consumer Secret, Access Token, and Access Token Secret properties. By default, the NiFi template collects tweets that match the keywords “insurance, claim, rain, winter, storm” and were tweeted from the United States. Modify the “Terms to Filter On” and “Locations to Filter On” properties if you need different filters.
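To make the filtering concrete, here is a small sketch of keyword and location matching. It is not the GetTwitter processor's code: the bounding-box values, the `coordinates` field as a `(lon, lat)` tuple, and the rule that both conditions must hold are assumptions made for illustration.

```python
KEYWORDS = ("insurance", "claim", "rain", "winter", "storm")

# Rough, illustrative bounding box for the contiguous United States:
# (west_lon, south_lat, east_lon, north_lat)
US_BBOX = (-125.0, 24.0, -66.0, 50.0)


def matches_keywords(text, keywords=KEYWORDS):
    """Case-insensitive substring match (so "training" also matches "rain")."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in keywords)


def in_bbox(lon, lat, bbox=US_BBOX):
    """Return True when the point lies inside the bounding box."""
    west, south, east, north = bbox
    return west <= lon <= east and south <= lat <= north


def keep_tweet(tweet):
    """Keep a tweet only if it has coordinates and passes both filters."""
    coords = tweet.get("coordinates")  # hypothetical (lon, lat) tuple or None
    if coords is None:
        return False
    lon, lat = coords
    return matches_keywords(tweet.get("text", "")) and in_bbox(lon, lat)
```

For example, `keep_tweet({"text": "Winter storm warning", "coordinates": (-95.0, 40.0)})` passes both checks, while the same text tweeted from London would be dropped.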
Power BI data ingestion
Processed data needs to be sent to the Power BI streaming dataset. The InvokeHTTP processor lets us call the Power BI streaming dataset API endpoint, so the data is ingested from NiFi into Power BI through a REST API call. Note: create the Power BI streaming dataset before you proceed if you do not have one already. Configure the InvokeHTTP processor by setting:
- Remote URL: the Power BI API URL for the streaming dataset
- Basic Authentication Username: Power BI username
- Basic Authentication Password: Power BI password
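Outside NiFi, the same REST call can be sketched in a few lines of Python. This is a hedged illustration rather than the template's configuration: `PUSH_URL` is a placeholder, the row field names are hypothetical and must match whatever schema your streaming dataset defines, and authentication is left out.

```python
import json
import urllib.request

# Placeholder only: copy the real URL from your streaming dataset's settings.
PUSH_URL = "https://example.invalid/powerbi/streaming-dataset/rows"


def build_rows(tweets):
    """Map tweet dicts to dataset rows; these field names are hypothetical."""
    return [
        {"created_at": t["created_at"], "user": t["user"], "text": t["text"]}
        for t in tweets
    ]


def build_request(rows, url=PUSH_URL):
    """Build a POST request whose body is a JSON array of rows."""
    body = json.dumps(rows).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def push(rows, url=PUSH_URL):
    """Send the rows and return the HTTP status code (makes a network call)."""
    with urllib.request.urlopen(build_request(rows, url), timeout=10) as resp:
        return resp.status
```

Keeping `build_request` separate from `push` makes it possible to inspect the payload without sending anything, which is handy while the dataset schema is still changing.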
Hive data ingestion
The Power BI streaming dataset has limited storage and keeps only the latest tweets. To retain all the data, we store the tweets in HDFS files and create a Hive table on top of them. The processed tweets are stored as a delimited file, so please create the Hive table schema to match the delimited input file passed to the PutHDFS processor. Configure the PutHDFS processor by setting:
- Kerberos Principal: set this for authentication, for example hdfs-twitter@domain.com
- Directory: the full path to the HDFS directory where you want to store the collected tweets
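The relationship between the delimited file and the Hive schema can be sketched as follows. The column list, the `|` delimiter, and the table name and path in the usage note are all assumptions; the template's actual fields may differ, so adjust them to the file you really write.

```python
FIELDS = ("id", "created_at", "user", "source", "text")  # hypothetical columns
DELIMITER = "|"  # assumed delimiter


def to_delimited(tweet, fields=FIELDS, delimiter=DELIMITER):
    """Flatten a tweet dict into one delimited line."""
    values = []
    for name in fields:
        value = str(tweet.get(name, ""))
        # Replace the delimiter and line breaks so each tweet stays on one row.
        for bad in (delimiter, "\r", "\n"):
            value = value.replace(bad, " ")
        values.append(value)
    return delimiter.join(values)


def hive_ddl(table, location, fields=FIELDS, delimiter=DELIMITER):
    """Build a CREATE EXTERNAL TABLE statement matching the delimited layout."""
    columns = ",\n  ".join(f"`{name}` STRING" for name in fields)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {columns}\n)\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"STORED AS TEXTFILE\n"
        f"LOCATION '{location}'"
    )
```

Calling `hive_ddl("tweets", "/tmp/tweets")` (both names hypothetical) yields a statement you can paste into a Hive client. Generating the row format and the DDL from the same `FIELDS` tuple keeps the column order in the file and in the table from drifting apart.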
The NiFi template has two streams: one for the Power BI streaming dataset and one for HDFS storage.
Power BI Dashboard
The following Power BI reports were created using the data we ingested from NiFi.
Figure: Visualize live tweets
Figure: Users and their device type
Figure: Search tweets
Hive
The NiFi template has a pipeline that stores the tweets in HDFS, and the Hive table is built on top of the HDFS files. Once the table has been created successfully, you can access the data in Hive.
Figure: Hive query
Thoughts on future work
- Build a machine learning model on the collected data
- Change the NiFi pipeline to run inference via an API call to the model
- Update the Power BI report to include the inference results