While product documentation may contain all the necessary information, it is sometimes hard to imagine how things will come out until you finish the work. I would be happy if my small experiment in retrieving Analytics data via Python is useful to others.
I hope you can learn how to overlay two or more Analytics datasets in the same graph. Having multiple datasets in a single graph makes it easier to see the correlation among data points from different datasets.
Because of its length, this topic spans two posts. This post talks about how to retrieve your Analytics data using Python. I have to assume that you know how to use a RESTful API; this post focuses on retrieving the data from ZFS Storage using Python and the "requests" package.
I am going to discuss how Analytics datasets work in ZFS Storage and how to download their data via the REST API.
In ZFS Storage, statistics data are called "datasets." The same term is used in the context of the ZFS filesystem, but the two are not related at all.
In ZFS Storage's standard Analytics, each graph shows a single dataset, so multiple datasets result in separate charts; we are unable to overlay the graphs. If we could overlay them, or calculate numbers based on the Analytics data points, we could probably analyze them from the same angle.
In this post, we are going to retrieve the Analytics data. I will attach sample Python code using Python 3 and the "requests" package, and I will write another entry about overlaying the downloaded Analytics data.
I will talk about the steps to retrieve a single dataset first. We can also create worksheets from the RESTful API to gather datasets of your choice into a worksheet; we will discuss reading data from a worksheet on a different occasion. We need to follow the steps below to download the data from ZFS Storage Analytics. It is easier to try the calls out with a RESTful API client such as Advanced REST Client first, and then write the code.
You can access ZFS Storage's RESTful API by specifying "https://<ZFS Storage IP address>:215/.z". Accessing this URL from a browser shows you the Browser User Interface (BUI); adding an action that starts with "/api" after the host and port lets you use the REST API.
The URL you need to authenticate the client looks like the one below. (The IP address is from my closed test box.)
https://192.168.150.21:215/api/access/v1
In the sample program list.py, I use the "requests" package to access ZFS Storage. The other calls are made the same way, but I pass the following parameters.
You need to set your ZFS Storage user ID and password in user_id and password, respectively.
Specifying no_proxy in proxies tells the "requests" package that we do not need to go through a proxy server to connect to ZFS Storage. If http_proxy or https_proxy is set in your environment variables, the package tries to use it, so it is a good idea to stop that behavior.
verify=False disables SSL certificate verification, which is needed because the appliance uses a self-signed certificate. Even with verify set to False, we still get the message below, so I suppress the warning using "urllib3."
InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
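Putting those pieces together, a minimal authentication sketch could look like the snippet below. The X-Auth-User/X-Auth-Key request headers and the X-Auth-Session response header are my assumptions about the appliance's login exchange; check the RESTful API guide for your release before relying on them.

```python
import requests
import urllib3

# Suppress only the warning that verify=False triggers.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

def login(host, user_id, password):
    """Authenticate to ZFS Storage and return a session token.

    Assumes the appliance accepts credentials in the X-Auth-User /
    X-Auth-Key headers and replies with an X-Auth-Session header.
    """
    response = requests.post(
        "https://{}:215/api/access/v1".format(host),
        headers={"X-Auth-User": user_id, "X-Auth-Key": password},
        proxies={"no_proxy": host},  # skip any http(s)_proxy from the environment
        verify=False,                # the appliance uses a self-signed certificate
    )
    response.raise_for_status()
    return response.headers["X-Auth-Session"]
```

Subsequent calls can send the returned token back in an X-Auth-Session header instead of re-authenticating every time.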
For the sake of explanation, let me call the part of the URL after the host and port an "action." In the example below, "/api/access/v1" is an action.
https://192.168.150.21:215/api/access/v1
To retrieve the data from a dataset, we need a URL like the one below.
https://192.168.150.21:215/api/analytics/v1/datasets/nfs3.ops[op]
At the time we authenticate our client, we do not yet know the exact action we are going to use to read the data from the datasets. So, we first need to list the datasets and their actions, or "href" entries, using the URL below.
https://192.168.150.21:215/api/analytics/v1/datasets
Using the above URL, you get a response such as the one below. The "name" field has the name of the dataset, "explanation" is its description, and "href" has the action that corresponds to the dataset.
So if you want to get the data for "NFSv3 operations per second," you need its "href" -- "/api/analytics/v1/datasets/nfs3.ops[op]". Then we can build the next URL: "https://192.168.150.21:215/api/analytics/v1/datasets/nfs3.ops[op]".
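For illustration, one entry of that list could look like the snippet below. The shape follows the field descriptions above; the values are illustrative rather than an actual capture from my box, and the real response contains one entry per dataset.

```json
{
    "datasets": [
        {
            "name": "nfs3.ops[op]",
            "explanation": "NFSv3 operations per second broken down by type of operation",
            "href": "/api/analytics/v1/datasets/nfs3.ops[op]"
        }
    ]
}
```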
When you specify something like https://192.168.150.21:215/api/analytics/v1/datasets/nfs3.ops[op], you get the dataset's properties instead of its data.
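A properties response of that shape could look roughly like the snippet below. The exact set of properties varies by software release, so treat everything other than name, explanation, and href as an assumption.

```json
{
    "dataset": {
        "name": "nfs3.ops[op]",
        "explanation": "NFSv3 operations per second broken down by type of operation",
        "href": "/api/analytics/v1/datasets/nfs3.ops[op]",
        "activity": "none"
    }
}
```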
To read data from the dataset, you need to add "/data" at the end of the URL. The appliance then returns a response like the one below.
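As an illustration of the shape only (the timestamps are made up), a single data point with no available data could look like this:

```json
{
    "data": {
        "sample": 1585699200,
        "data": {
            "value": 0
        },
        "startTime": "20200401T00:00:00",
        "samples": 1585699201
    }
}
```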
The dictionary in "data" has three keys: sample, startTime, and samples.
By specifying a range, we can get multiple data points.
There are a few issues I found during my experiment. Without specifying anything, ZFS Storage provides us "live" data. You can create a params variable such as the one below and pass it to "requests" to avoid this problem.
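A minimal sketch of such a params variable, assuming the "start" and "seconds" parameter names from the guide's "Get Dataset Data" section (the date range itself is hypothetical):

```python
def build_params(start, seconds):
    # "start" and "seconds" follow the "Get Dataset Data" section of the
    # RESTful API guide: the timestamp of the first sample to return and
    # the number of samples to retrieve from that point.
    return {"start": start, "seconds": seconds}

# Hypothetical range: one hour of one-second samples from April 1st, 2020.
params = build_params("20200401T00:00:00", 3600)
```

Passing this dictionary as params=params to requests.get() appends it to the URL as a query string.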
You can find a description of these parameters in the Oracle ZFS Storage Appliance RESTful API Guide, Release OS 8.8.x. (See the "Get Dataset Data" section.)
Please also note that ZFS Storage removes your Analytics data based on the retention policy, so make sure you are querying a date/time range where data is still available.
ZFS Storage replies in different formats depending on the type of data you query. For instance, datasets with and without a drill-down have different response formats. In the previous example, since there was no data available, the field came back as '"value": 0.' When we query NFS command IOPS, however, each data point comes back in the form of a dictionary entry.
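Illustrative only (the operation names and numbers are made up): with a drill-down, each data point carries a nested list of key/value dictionaries in addition to the total value.

```json
{
    "sample": 1585699260,
    "data": {
        "value": 1200,
        "data": [
            { "key": "read", "value": 800 },
            { "key": "write", "value": 400 }
        ]
    },
    "startTime": "20200401T00:01:00",
    "samples": 1585699261
}
```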
Having multiple formats in the response is not very handy. For example, if I use json_normalize to create a Pandas DataFrame from the dictionary, the result contains NaN columns.
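To show what I mean, here is a small self-contained example with two made-up data points, one without and one with a drill-down. json_normalize creates a "data.data" column that only the second point fills, so the first row ends up as NaN.

```python
import pandas as pd

# Two hypothetical data points: one without a drill-down (plain value)
# and one with a drill-down (a nested list of key/value dictionaries).
points = [
    {"sample": 1, "startTime": "20200401T00:00:00", "samples": 1,
     "data": {"value": 0}},
    {"sample": 2, "startTime": "20200401T00:00:01", "samples": 1,
     "data": {"value": 1200,
              "data": [{"key": "read", "value": 800},
                       {"key": "write", "value": 400}]}},
]

# Nested dictionaries are flattened into "data.value" and "data.data"
# columns; rows lacking a key are filled with NaN.
df = pd.json_normalize(points)
print(df.columns.tolist())
```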
I guess my understanding of Pandas still has room to improve.
In the next post, I would like to explain how to overcome the varying formats.
Thank you for reading this far.
The sample code is available on GitLab. The code assumes Python 3.6 and ZFS Storage 8.7.17. Any Python version that understands Python 3 code should work, given the necessary packages, and since the ZFS Storage REST API has not changed, the same code should work with older or newer versions of the ZFS Storage software. However, the above is the combination in which I ran and debugged my code.
GitLab: https://gitlab.com/hisao.tsujimura/public/blob/master/zfssa-rest-analytics/list1.py