I Can Read the Product Manual, So Why This Post?
While product documentation may contain sufficient information, it is sometimes hard to imagine how things will turn out until you finish the work. I would be happy if my small experiment in retrieving Analytics data via Python proves useful to others.
I hope that you can learn how to overlay two or more Analytics datasets in the same graph.
Having multiple datasets in a single graph makes it easier to read the correlation of data points across different datasets.
Because of its length, this topic is split into two posts.
This post covers how to retrieve your Analytics data using Python.
Summary of This Post
I have to assume that you know how to use a RESTful API. This post focuses on how to retrieve the data from ZFS Storage using Python and the “requests” package.
I am going to discuss:
- Dataset vs. Worksheet
- The Goal of This Post
- Reading Data From a Dataset
- Problems Found
Dataset vs. Worksheet
In ZFS Storage, statistics data are called “datasets.” The same term is used in the context of the ZFS file system, but the two are not related at all.
- Dataset
- A set or collection of data. For example, the number of NFS commands processed every second, or disk I/Os per second, is a dataset.
- Worksheet
- In contrast, a worksheet holds data points from multiple datasets at the same time. In the previous example, we would have the numbers of NFS commands and disk I/Os processed every second plotted in the same graph. Worksheets can be saved and reopened whenever necessary.
The Goal of This Post
In ZFS Storage’s standard Analytics, each graph shows a single dataset, so multiple datasets end up in separate charts; we are unable to overlay the graphs. If we could overlay the graphs, or calculate numbers based on the Analytics data points, we could analyze different metrics from the same angle.
In this post, we are going to retrieve the Analytics data. I will attach a sample program that uses Python 3 and the “requests” package. I will write another entry on overlaying the downloaded Analytics data.
Reading Data From a Dataset
I will first cover the steps to retrieve a single dataset. We can also create worksheets through the RESTful API to gather datasets of your choice into one worksheet; we will discuss reading data from a worksheet on a different occasion. We need to follow the steps below to download data from ZFS Storage Analytics.
- Authenticate the client
- List the datasets to obtain the path we need to build a request
- Download the data in a dataset
It is easier to try the calls out first with a RESTful API client such as Advanced REST Client, and then write the code.
Authenticate The Client
You can reach ZFS Storage’s RESTful API through “https://<ZFS Storage IP address>:215”. Opening this URL in a browser shows you the Browser User Interface (BUI); appending a path that starts with “/api” lets you use the REST API instead.
The URL to authenticate the client looks like the one below. (The IP address is from my closed test box.)
https://192.168.150.21:215/api/access/v1
In the sample program list.py, I use the “requests” package to access ZFS Storage.
The other calls follow the same pattern; I pass the following parameters.
- auth=(user_id, password)
- proxies = no_proxy
- verify=False
You need to set ZFS Storage’s user ID and password in user_id and password, respectively.
Specifying no_proxy in proxies tells the “requests” package not to go through a proxy server to connect to ZFS Storage. If you have http_proxy or https_proxy set in your environment variables, the package tries to use them, so it is a good idea to stop that behavior.
verify=False disables verification of the SSL certificate. Even with verify set to False, we still get the message below, so I suppress the warning using “urllib3.”
InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
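The pieces above fit together in a small sketch. The address and credentials are placeholders for your own appliance, and the NO_PROXY dictionary is my assumption for the post’s no_proxy variable (its exact contents are not shown in the post):

```python
# Minimal sketch of the authentication call. The address and
# credentials below are placeholders for your own appliance.
import requests
import urllib3

# Silence the InsecureRequestWarning that verify=False triggers.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

ZFSSA_URL = "https://192.168.150.21:215"
NO_PROXY = {"http": None, "https": None}  # do not use any proxy server

def authenticate(user_id, password):
    """Call /api/access/v1 with basic auth and return the response."""
    resp = requests.get(
        ZFSSA_URL + "/api/access/v1",
        auth=(user_id, password),
        proxies=NO_PROXY,
        verify=False,  # skip SSL certificate verification
    )
    resp.raise_for_status()  # fail loudly on HTTP errors
    return resp

# Example (needs a reachable appliance):
# authenticate("my_user", "my_password")
```

Setting the proxy values to None keeps “requests” from picking up http_proxy/https_proxy from the environment for this call.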
List Datasets
For the sake of explanation, let me call the part of the URL starting with “/api” an “action.” In the example below, “/api/access/v1” is an action.
https://192.168.150.21:215/api/access/v1
To retrieve the data from a dataset, we need a URL like the one below.
https://192.168.150.21:215/api/analytics/v1/datasets/nfs3.ops[op]
At the time we authenticate the client, we do not yet know the exact action for reading data from a dataset.
So we need to list the datasets and their actions, or “href” entries, first.
https://192.168.150.21:215/api/analytics/v1/datasets
Using the above URL, you would get a response, such as below.
The “name” field has the name of the dataset, “explanation” is the description and “href” has the action that corresponds to the dataset.
So if you want the data from “NFSv3 operations per second,” you need to find its “href,” which is “/api/analytics/v1/datasets/nfs3.ops[op]”. Then we can build the next URL: “https://192.168.150.21:215/api/analytics/v1/datasets/nfs3.ops[op]”.
An abbreviated example of the listing response (trimmed; the real list contains many more datasets):

```
{
  "datasets": [
    {
      "name": "nfs3.ops[op]",
      "explanation": "NFSv3 operations per second",
      "href": "/api/analytics/v1/datasets/nfs3.ops[op]"
    },
    ...
  ]
}
```
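Picking the “href” out of the listing is a small loop. The {"datasets": [...]} shape of the response dict is an assumption based on the fields named above; adjust it to what your appliance actually returns:

```python
# Sketch: find the "href" for a dataset by its "name" in the listing
# returned by /api/analytics/v1/datasets. The {"datasets": [...]}
# shape is an assumption; check it against your appliance's output.
def find_href(listing, dataset_name):
    """Return the href whose "name" matches dataset_name, else None."""
    for entry in listing.get("datasets", []):
        if entry.get("name") == dataset_name:
            return entry.get("href")
    return None

# Usage with a stubbed listing:
listing = {
    "datasets": [
        {"name": "nfs3.ops[op]",
         "explanation": "NFSv3 operations per second",
         "href": "/api/analytics/v1/datasets/nfs3.ops[op]"},
    ]
}
print(find_href(listing, "nfs3.ops[op]"))
# prints /api/analytics/v1/datasets/nfs3.ops[op]
```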
Read Data From the Dataset
When you specify something like https://192.168.150.21:215/api/analytics/v1/datasets/nfs3.ops[op], you get the dataset’s properties instead of its data.
An abbreviated example of the property response (fields trimmed):

```
{
  "dataset": {
    "name": "nfs3.ops[op]",
    "explanation": "NFSv3 operations per second",
    "href": "/api/analytics/v1/datasets/nfs3.ops[op]",
    ...
  }
}
```
To read data from the dataset, you need to append “/data” to the end of the URL. It then returns a response like the one below.
A response with a single data point looks something like this (the values are placeholders):

```
{
  "data": {
    "sample": 123456,
    "data": {
      "value": 0
    },
    "startTime": "20200101T00:00:00",
    "samples": 123457
  }
}
```
The dictionary in “data” has three keys – sample, startTime, and samples.
- sample – The sequential number of this data point
- startTime – The timestamp of this data point, in UTC
- samples – The sequential number of the last data point in this dataset
By specifying a range, we can get multiple data points.
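The read path can be sketched as a small helper. The href is the one from the NFSv3 example above; the appliance URL and credentials are placeholders:

```python
# Sketch of reading data points from a dataset. The URL and the
# credentials are placeholders for your own appliance.
import requests

ZFSSA_URL = "https://192.168.150.21:215"
DATASET_HREF = "/api/analytics/v1/datasets/nfs3.ops[op]"

def build_data_url(base, href):
    """Append "/data" to a dataset href to address its data points."""
    return base + href + "/data"

def fetch_data(user_id, password, params=None):
    """GET the data points; params may hold startTime/span/granularity."""
    resp = requests.get(
        build_data_url(ZFSSA_URL, DATASET_HREF),
        auth=(user_id, password),
        params=params,
        verify=False,
    )
    resp.raise_for_status()
    return resp.json()

# Example (needs a reachable appliance):
# points = fetch_data("my_user", "my_password",
#                     {"span": "hour", "granularity": "minute"})
```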
Problems Found
I found a few issues during my experiment.
(1) It collects “current” data.
Without specifying anything, ZFS Storage will provide us “live” data.
You can create a params variable like the one below and pass it to “requests” to avoid this problem. (I leave the startTime value as a placeholder here; the parameters are described below.)

```
params = {
    "startTime": "<start date/time>",
    "span": "hour",
    "granularity": "minute",
}
```
You can find a description of these parameters in Oracle ZFS Storage Appliance RESTful API Guide, Release OS 8.8.x. (See “Get Dataset Data” section.)
Below is a description of each parameter.
- startTime: Start date/time for your query
- span: Data collection span (minute, hour, day, week, month, year)
- granularity: How detailed your data should be (minute, hour, day, week, month, or year)
Please also note that ZFS Storage removes Analytics data according to its retention policy, so make sure you query a date/time range where data is still available.
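One way to stay inside the retention window is to compute startTime relative to the current time. The "%Y%m%dT%H:%M:%S" timestamp format used here is an assumption based on the timestamps the API returns; check the RESTful API Guide for the format your release accepts:

```python
# Build a startTime value some hours in the past. The timestamp
# format is an assumption; verify it against the RESTful API Guide.
from datetime import datetime, timedelta

def start_time_hours_ago(hours, now=None):
    """Format a UTC timestamp `hours` before `now` for startTime."""
    now = now or datetime.utcnow()
    return (now - timedelta(hours=hours)).strftime("%Y%m%dT%H:%M:%S")

params = {
    "startTime": start_time_hours_ago(1),  # one hour ago
    "span": "hour",
    "granularity": "minute",
}
```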
(2) There are multiple patterns in the response
ZFS Storage replies in a different format depending on the type of data you query. For instance, datasets with and without a drill-down have different response formats.
In the previous example, since there was no data available, the field came back as a plain ‘"value": 0’. However, when we query NFS command IOPS with a drill-down, each data point comes back as a dictionary with nested entries.
Having multiple formats in the response is not very handy. For example, when I use json_normalize to create a pandas data frame from the dictionaries, the fields that appear in only some records come back as NaN columns.
I guess my understanding of Pandas still has room to grow.
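The NaN effect is easy to reproduce with made-up records (these are not actual appliance output; the nested “breakdown” field is hypothetical, standing in for a drill-down):

```python
# Illustration with made-up records: json_normalize over records of
# different shapes leaves NaN in the columns some records lack.
import pandas as pd

records = [
    {"sample": 1, "startTime": "20200101T00:00:00", "value": 0},
    {"sample": 2, "startTime": "20200101T00:01:00", "value": 5,
     "breakdown": [{"key": "read", "value": 3}]},  # hypothetical drill-down
]

df = pd.json_normalize(records)
print(df)
# The "breakdown" column exists for every row, but the first record
# never had it, so that cell is filled with NaN.
```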
In The Next Post
In the next post, I would like to explain how to overcome the varying formats.
Thank you for reading this far.
References
- Oracle ZFS Storage Appliance Analytics Guide, Release OS8.8.x
- Oracle ZFS Storage Appliance RESTful API Guide, Release OS8.8.x
Sample Code
The sample code is available on GitLab. The code was written for Python 3.6 and ZFS Storage 8.7.17.
Any Python interpreter that understands Python 3 code should work, given the necessary packages.
The ZFS Storage REST API has not changed, so the same code should work with older or newer versions of the ZFS Storage software.
However, the combination above is the one I used to run and debug my code.
Gitlab: https://gitlab.com/hisao.tsujimura/public/blob/master/zfssa-rest-analytics/list1.py
