In part 1 of this series, we covered the definition and types of streaming data; that blog can be found here. Once you have identified a business need to use streaming data in your Azure analytics solution, it is time to determine which options and services are available to connect to and ingest this data.

Within the Azure streaming analytics platform, the following ingestion options are available: IoT Hub, Event Hubs, and Service Bus. In part 2 of this series, we will ingest streaming data using Event Hubs, leveraging Python to generate some telemetry data to push to the Event Hub.

First, we need to create the event hub resource in our Azure Resource Group.

Create the Event Hub namespace. The namespace is a container that allows you to create one or many event hubs. I chose the Basic pricing tier for demo purposes. Throughput units control the Event Hub traffic: one throughput unit allows 1 MB per second of ingress and 2 MB per second of egress. Since this is only a proof of concept, I chose one throughput unit. This is a setting to evaluate based on data size and frequency of ingestion to prevent throttling.
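
As a rough sanity check before choosing a throughput unit count, you can estimate the requirement from your expected message size and rate. Below is a minimal back-of-the-envelope sketch in Python based on the 1 MB per second ingress and 2 MB per second egress quotas mentioned above; the workload numbers are hypothetical placeholders, not measurements from this demo.

import math

# Hypothetical workload assumptions: replace with your own estimates.
avg_message_size_bytes = 1024     # roughly 1 KB per JSON reading
messages_per_second = 2000        # expected peak ingest rate
readers_per_message = 1           # how many consumers read each event

ingress_mb_per_sec = avg_message_size_bytes * messages_per_second / (1024 * 1024)
egress_mb_per_sec = ingress_mb_per_sec * readers_per_message

# One throughput unit covers 1 MB/s of ingress and 2 MB/s of egress.
units_for_ingress = math.ceil(ingress_mb_per_sec / 1.0)
units_for_egress = math.ceil(egress_mb_per_sec / 2.0)
throughput_units = max(units_for_ingress, units_for_egress, 1)

print("Estimated ingress: {:.2f} MB/s".format(ingress_mb_per_sec))
print("Suggested throughput units: {}".format(throughput_units))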

After we create the Event Hub namespace, we need to create and configure the Event Hub itself. The Event Hub is the resource that receives, processes, and stores the data messages.

There are several configuration settings when creating an event hub that can be important depending on the amount and frequency of data being pushed to it. The partition count determines how the messages are split into ordered subsets for sequencing and reading; newer data is always added to the end of a partition. The level of concurrency available for reading an event hub has a huge impact on performance, and since the partition count cannot be changed after creation, it should be set according to the expected workload for scalability. Microsoft recommends that the number of partitions be equal to or greater than the number of throughput units for best performance. Message retention determines how long messages are stored in the event hub.
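
If you prefer to script these settings rather than set them in the portal, the sketch below shows one possible approach using the azure-mgmt-eventhub management SDK with Azure AD authentication. The resource group, namespace, location, partition count, and retention values are placeholders, and this management package is separate from the messaging library used later in this post, so treat it as an outline rather than a drop-in script.

from azure.identity import DefaultAzureCredential
from azure.mgmt.eventhub import EventHubManagementClient
from azure.mgmt.eventhub.models import EHNamespace, Sku, Eventhub

# Placeholder identifiers: replace with your own subscription and names.
subscription_id = "<subscription-id>"
resource_group = "rg-streaming-demo"
namespace_name = "StreamWeatherData"
event_hub_name = "weatherdatamessagestream"

client = EventHubManagementClient(DefaultAzureCredential(), subscription_id)

# Basic tier namespace with a single throughput unit (capacity=1).
client.namespaces.begin_create_or_update(
    resource_group,
    namespace_name,
    EHNamespace(location="eastus", sku=Sku(name="Basic", tier="Basic", capacity=1)),
).result()

# Event hub with 2 partitions and 1 day of message retention.
client.event_hubs.create_or_update(
    resource_group,
    namespace_name,
    event_hub_name,
    Eventhub(partition_count=2, message_retention_in_days=1),
)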

Now that we have the Event Hub instance created, we need to get the shared access key that defines how an application connects to the Event Hub. From the namespace window, select Shared access policies.

Once in the shared access policies, select RootManageSharedAccessKey.

You want to use the primary connection string of RootManageSharedAccessKey.  Click the copy button to place it in your clipboard.
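
The connection string you just copied bundles the endpoint, policy name, and key in a Key=Value;Key=Value format. If you would rather keep a single secret in your script and derive the individual pieces from it, a small helper like the sketch below works; the sample string is a placeholder, not a real key.

# Parse an Event Hub namespace connection string into its parts.
# Format: Endpoint=sb://<namespace>.servicebus.windows.net/;SharedAccessKeyName=<policy>;SharedAccessKey=<key>
def parse_connection_string(conn_str):
    parts = dict(segment.split("=", 1) for segment in conn_str.strip().rstrip(";").split(";"))
    namespace = parts["Endpoint"].replace("sb://", "").split(".")[0]
    return namespace, parts["SharedAccessKeyName"], parts["SharedAccessKey"]

# Placeholder example (not a real key):
conn_str = "Endpoint=sb://StreamWeatherData.servicebus.windows.net/;SharedAccessKeyName=RootManageSharedAccessKey;SharedAccessKey=<your-key>"
namespace, user, key = parse_connection_string(conn_str)
print(namespace, user)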

For the sake of this demo, I created a Python script that generates record sets to send as messages to the event hub. Below you will see the test code and the parameters to configure. The key prerequisites for this script are the json module and the azure-servicebus library for Python; there is also a dedicated Event Hub library, so feel free to use whichever library you are most comfortable with. We will use the shared access key from above to make the connection to our Event Hub. The remainder of the script simply generates random weather data and sends it to the event hub as JSON messages. Replace the following items with your own configuration values: ADDRESS, USER, KEY, service_namespace, and the event hub name passed to send_event.

import uuid
import datetime
import random
import json
from azure.servicebus.control_client import ServiceBusService

# Address can be in either of these formats:
# "amqps://<URL-encoded-SAS-policy>:<URL-encoded-SAS-key>@<mynamespace>.servicebus.windows.net/myeventhub"
# "amqps://<mynamespace>.servicebus.windows.net/myeventhub"
# For example:
ADDRESS = "amqps://StreamWeatherData.servicebus.windows.net/weatherdatamessagestream"

# SAS policy and key are not required if they are encoded in the URL
USER = "RootManageSharedAccessKey"
KEY = "fh6m8IpVSACft9UQyConSxWIbs/TBtqUHfAEDJesWiw="

# Connect to the namespace using the shared access policy name and key copied above.
# (ADDRESS is shown for reference; ServiceBusService uses the namespace, USER, and KEY directly.)
sbs = ServiceBusService(service_namespace='StreamWeatherData',
                        shared_access_key_name=USER,
                        shared_access_key_value=KEY)

# Generate ten random device IDs to simulate ten telemetry sources.
devices = []
for x in range(0, 10):
    devices.append(str(uuid.uuid4()))

# Send ten rounds of readings, one JSON message per device per round.
for y in range(0, 10):
    for dev in devices:
        reading = {"WeatherTimeStamp": str(datetime.datetime.utcnow()),
                   "Temperature": random.randint(20, 100),
                   "Visibility": random.randint(0, 10),
                   "WindSpeed": random.randint(0, 60)}
        s = json.dumps(reading)
        sbs.send_event('weatherdatamessagestream', s)
    # print(reading)
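
As noted above, there is also a dedicated Event Hub client library. For comparison, here is a minimal sketch of the same idea using the newer azure-eventhub (v5) package and the connection string copied earlier; the connection string is a placeholder, and this alternative is not the approach used in the rest of this post.

import json
import datetime
import random
from azure.eventhub import EventHubProducerClient, EventData

# Placeholder: paste the primary connection string from RootManageSharedAccessKey.
CONN_STR = "Endpoint=sb://StreamWeatherData.servicebus.windows.net/;SharedAccessKeyName=RootManageSharedAccessKey;SharedAccessKey=<your-key>"

producer = EventHubProducerClient.from_connection_string(
    conn_str=CONN_STR,
    eventhub_name="weatherdatamessagestream",
)

with producer:
    # Batch up a handful of random readings and send them in one call.
    batch = producer.create_batch()
    for _ in range(10):
        reading = {"WeatherTimeStamp": str(datetime.datetime.utcnow()),
                   "Temperature": random.randint(20, 100),
                   "Visibility": random.randint(0, 10),
                   "WindSpeed": random.randint(0, 60)}
        batch.add(EventData(json.dumps(reading)))
    producer.send_batch(batch)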

Once you modify the above Python code according to your environment and run it in your IDE of choice, you should be able to refresh the Metrics section of the Event Hub. Below you will see the events we generated from the Python script.
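
If you prefer to verify from code rather than the portal metrics blade, you can read the events back with the azure-eventhub consumer client. The sketch below is a minimal receiver, again with a placeholder connection string; it reads each partition from the beginning, prints the message bodies, and blocks until you stop it (for example with Ctrl+C).

from azure.eventhub import EventHubConsumerClient

# Placeholder connection string (not a real key).
CONN_STR = "Endpoint=sb://StreamWeatherData.servicebus.windows.net/;SharedAccessKeyName=RootManageSharedAccessKey;SharedAccessKey=<your-key>"

consumer = EventHubConsumerClient.from_connection_string(
    conn_str=CONN_STR,
    consumer_group="$Default",
    eventhub_name="weatherdatamessagestream",
)

def on_event(partition_context, event):
    # Print the partition and JSON body of each received event.
    print(partition_context.partition_id, event.body_as_str())

with consumer:
    # starting_position="-1" reads from the start of each partition.
    consumer.receive(on_event=on_event, starting_position="-1")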

Now that we have data in our event hub, we have a couple of options for integrating and using the data. We can set up a streaming job to process the data continuously, or react to individual events with some form of trigger. For the streaming job option, I will be releasing another blog shortly to finish the third and final part of this series.

Stay tuned…

As businesses and corporations become exposed to more tools and begin adopting a wider variety of data points associated with their industry, more opportunities and challenges arise with integrating the disparate data sources to provide insights about the business. In this blog series, I want to talk about streaming data into Azure for analytics.

Generally, standard applications are tied to a relational backend database. These applications require specific user interaction to generate and modify data written to a transactional database. Historically, to get this data into your data warehouse, an Extraction, Transformation, and Loading (ETL) process is developed. Most of the time, the ETL process to refresh the data runs at specified intervals. From my experience, this type of solution is still suitable today, but it is considered a warm or cold repository depending on the frequency of data loads. With the transition of technology, ETL has also turned into ELT (Extract, Load, Transform).

The Internet of Things (IoT) has introduced a new element of cloud architecture among modern businesses and corporations, and we also see business intelligence and advanced analytics transitioning toward more near real-time solutions. Data from sensors, logs, portable devices, social media, and control and network systems can be generated quickly by user interaction or through some form of automation. A few examples of streaming data in these platforms would be GPS information in transportation, social media posts, devices that measure temperature, and manufacturing equipment with sensors that generate logs.

Due to the frequency at which these data points accumulate, they are usually not stored on the devices and systems for long periods, so the data needs to be captured and stored quickly in order to maintain history. The data can be structured, unstructured, or semi-structured because of the many forms and devices it can be generated from. I look forward to providing some solutions for working with this type of data using Azure tools and services. Please stay tuned for future parts to this blog series.