
An Oracle blog about ZFS Storage

How To Retrieve Analytics Data from ZFS Storage?

Hisao Tsujimura
Principal Software Engineer

I Can Read the Product Manual, So Why This Post?


While the product documentation may contain sufficient information, it is sometimes hard to imagine how things will come out until you finish the work.  I would be happy to learn that my small experiment in retrieving Analytics data via Python is useful to others.

I hope that you can learn how to overlay two or more Analytics datasets in the same graph.
Having multiple datasets in a single graph makes it easier to read the correlation of the data points across different datasets.

Due to its length, this topic is split into two posts.
This post talks about how to retrieve your Analytics data using Python.

Summary of This Post

I assume that you already know how to use RESTful APIs.  This post focuses on how to retrieve the data from ZFS Storage using Python and the "requests" package.
I am going to discuss:

  1. Dataset vs. worksheet
  2. The goal of this post
  3. Reading data from a dataset
  4. Problems found

Dataset vs. Worksheet

In ZFS Storage, statistics data are called "datasets."  The same term is used in the context of the ZFS file system, but the two are not related at all.

  • Dataset
    • A set or collection of data for a single statistic.  For example, the number of NFS commands processed every second, or disk I/O operations every second, is a dataset.
  • Worksheet
    • In contrast, a worksheet holds data points from multiple datasets at the same time.  In the previous example, we would have the numbers for NFS commands and disk I/O processed every second, plotted in the same graph.  Worksheets can be saved and opened whenever necessary.

The Goal of This Post

In ZFS Storage's standard Analytics, each graph shows a single dataset.  Multiple datasets therefore result in separate charts; we are unable to overlay the graphs.  If we can overlay the graphs, or calculate numbers based on the Analytics data points, we can compare different statistics from the same angle.

In this post, we are going to retrieve the Analytics data.  I will attach sample Python code using Python 3 and the "requests" package.  I will write another entry about overlaying the downloaded Analytics data.


Reading Data From a Dataset

I will talk about the steps to retrieve a single dataset first.  We can also create worksheets from the RESTful API to gather datasets of your choice into a worksheet; reading data from a worksheet will be discussed on a different occasion.
We need to follow the steps below to download the data from ZFS Storage Analytics.

  1. Authenticate the client.
  2. List the datasets to obtain the path we need to create a request.
  3. Download the data from a dataset.

It may be easier to try things out first with a RESTful API client such as Advanced REST Client and then write the code.

Authenticate The Client

You can access ZFS Storage by specifying "https://<ZFS Storage IP address>:215/.z".  Accessing this URL from a browser shows you the Browser User Interface (BUI).  Using an action that starts with "/api" instead allows you to use the REST API.
The URL you need to authenticate the client looks like the one below.  (The IP address is from my closed test box.)

https://192.168.150.21:215/api/access/v1

In the sample program list.py, I use the "requests" package to access ZFS Storage.
The other calls follow the same pattern; I use the following parameters.

  1. auth=(user_id, password)
  2. proxies=no_proxy
  3. verify=False

You need to set ZFS Storage's user ID and password in user_id and password, respectively.

Specifying no_proxy in proxies tells the "requests" package that we do not need to go through a proxy server to connect to ZFS Storage.  If you have http_proxy or https_proxy set in your environment variables, the package tries to use them, so it is a good idea to stop that behavior.

verify=False is the setting to skip verification of the SSL certificate.  Because we still get the message below even with verify set to False, I suppress the warning using "urllib3."

InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings InsecureRequestWarning)
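Putting these together, a minimal sketch of the authentication step could look like the code below.  (The address, user ID, and password are placeholders for my test box; adjust them for your environment.)

import requests
import urllib3

# Suppress the InsecureRequestWarning shown for the self-signed certificate.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

zfssa = "https://192.168.150.21:215"
no_proxy = {"http": None, "https": None}   # do not go through a proxy

# Authenticate the client using basic authentication.
response = requests.get(zfssa + "/api/access/v1",
                        auth=("user_id", "password"),
                        proxies=no_proxy,
                        verify=False)
print(response.status_code)   # 200 means the request was authorized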

List Datasets

For the sake of explanation, let me name part of the URL: I will call the part of the URL starting with "/api" an "action."  In the example below, "/api/access/v1" is an action.

https://192.168.150.21:215/api/access/v1

To retrieve the data from a dataset, we need a URL like the one below.

https://192.168.150.21:215/api/analytics/v1/datasets/nfs3.ops[op]

At the time we authenticate our client, we do not yet know the exact action we are going to use to read the data from the datasets.
So, we need to list the datasets and their actions, or "href" entries, first.

https://192.168.150.21:215/api/analytics/v1/datasets

Using the above URL, you get a response such as the one below.
The "name" field has the name of the dataset, "explanation" is the description, and "href" has the action that corresponds to the dataset.
So if you want to get the data for "NFSv3 operations per second broken down by type of operation," you need its "href" -- "/api/analytics/v1/datasets/nfs3.ops[op]".  Then, we can specify the next URL as "https://192.168.150.21:215/api/analytics/v1/datasets/nfs3.ops[op]".

{
"datasets": [
    {
        "name": "arc.accesses[hit/miss]",
        "grouping": "Cache",
        "explanation": "ARC accesses per second broken down by hit/miss",
        "incore": 649724,
        "size": 20304264,
        "suspended": false,
        "activity": "none",
        "overhead": "low",
        "since": "Thu Apr 25 2019 03:09:03 GMT+0000 (UTC)",
        "last_access": "Thu Apr 25 2019 03:09:02 GMT+0000 (UTC)",
        "dataset": "dataset-000",
        "href": "/api/analytics/v1/datasets/arc.accesses[hit/miss]"
    },
    (......)
    {
        "name": "nfs3.ops",
        "grouping": "Protocol",
        "explanation": "NFSv3 operations per second",
        "incore": 205920,
        "size": 4338584,
        "suspended": false,
        "activity": "none",
        "overhead": "low",
        "since": "Thu Apr 25 2019 03:09:17 GMT+0000 (UTC)",
        "last_access": "Sun Jul 28 2019 04:20:14 GMT+0000 (UTC)",
        "dataset": "dataset-026",
        "href": "/api/analytics/v1/datasets/nfs3.ops"
    },
    {
        "name": "nfs3.ops[op]",
        "grouping": "Protocol",
        "explanation": "NFSv3 operations per second broken down by type of operation",
        "incore": 212627,
        "size": 8420936,
        "suspended": false,
        "activity": "none",
        "overhead": "low",
         "since": "Thu Apr 25 2019 03:09:17 GMT+0000 (UTC)",
        "last_access": "Wed Aug 21 2019 08:38:03 GMT+0000 (UTC)",
        "dataset": "dataset-027",
        "href": "/api/analytics/v1/datasets/nfs3.ops[op]"
    },
    (.....)
]
}
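As a sketch (reusing the placeholder address, credentials, and proxy settings from the authentication step), listing the datasets and picking out the "href" for nfs3.ops[op] could look like this:

# List all datasets and show their names, descriptions, and hrefs.
r = requests.get(zfssa + "/api/analytics/v1/datasets",
                 auth=("user_id", "password"),
                 proxies=no_proxy,
                 verify=False)
datasets = r.json()["datasets"]
for ds in datasets:
    print(ds["name"], "-", ds["explanation"], "->", ds["href"])

# Pick out the href for "NFSv3 operations per second broken down by type".
href = [ds["href"] for ds in datasets if ds["name"] == "nfs3.ops[op]"][0]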

Read Data From the Dataset

When you specify something like https://192.168.150.21:215/api/analytics/v1/datasets/nfs3.ops[op], you get the dataset properties instead of the data.

{
    "dataset": {
        "href": "/api/analytics/v1/datasets/nfs3.ops[op]",
        "name": "nfs3.ops[op]",
        "grouping": "Protocol",
        "explanation": "NFSv3 operations per second broken down by type of operation",
        "incore": 369754,
        "size": 4853020,
        "suspended": false,
        "activity": "none",
        "overhead": "low",
        "since": "Mon Mar 25 2019 05:41:58 GMT+0000 (UTC)",
        "last_access": "Thu Oct 31 2019 07:53:04 GMT+0000 (UTC)"
    }
}

For you to read data from the dataset, you need to add "/data" at the end.  Then it returns a response like the one below.

{
    "data": {
        "sample": 432552627,
        "data": {
            "value": 0
        },
        "startTime": "20191031T07:56:35",
        "samples": 432552628
    }
}

The dictionary in "data" has three keys besides the nested "data" entry – sample, startTime, and samples.

  • sample – The sequence number of this data point.
  • startTime – The timestamp of this data point.  Time is in UTC.
  • samples – The sequence number of the last data point in this dataset.

By specifying a range, we can get multiple data points.
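Continuing the sketch (href and the connection settings are carried over from the earlier snippets), reading the current data point could look like this; specifying a range is covered under "Problems Found" below.

# Append "/data" to the dataset's action to read its data points.
r = requests.get(zfssa + href + "/data",
                 auth=("user_id", "password"),
                 proxies=no_proxy,
                 verify=False)
print(r.json())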

Problems Found

I found a few issues during my experiment.

(1) It collects "current" data.

Without specifying anything, ZFS Storage provides us "live" data.
You can create a params variable such as the one below and pass it to "requests" to avoid this problem.

params = {
    "startTime" : "20190808T00:00:00",
    "span" : "day",
    "granularity" : "hour"
}


You can find a description of these parameters in the Oracle ZFS Storage Appliance RESTful API Guide, Release OS 8.8.x.  (See the "Get Dataset Data" section.)
Below is a description of each parameter.

  • startTime: Start date/time for your query.
  • span: Data collection span (minute, hour, day, week, month, or year).
  • granularity: How detailed your data should be (minute, hour, day, week, month, or year).


Please also note that ZFS Storage removes your Analytics data based on the retention policy, so please make sure you are querying a date/time range where data is available.
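Putting it together (again with the placeholder connection settings from the earlier sketches), passing params to "requests" could look like this:

params = {
    "startTime" : "20190808T00:00:00",
    "span" : "day",
    "granularity" : "hour"
}

# Query one day of data starting at startTime, one data point per hour.
r = requests.get(zfssa + href + "/data",
                 auth=("user_id", "password"),
                 proxies=no_proxy,
                 verify=False,
                 params=params)
print(r.json())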


(2) There are multiple patterns in the response

ZFS Storage replies in a different format depending on what type of data you query.  For instance, datasets with and without a drill-down have different response formats.

In the previous example, since there was no data available, the field came back as '"value": 0.'  However, when we query NFS operations broken down by type, each data point comes back as a list of dictionary entries.

(.....)
{'data': {'data': [{'key': 'read', 'max': 578, 'min': 0, 'value': 0},
                   {'key': 'getattr', 'max': 72, 'min': 0, 'value': 0}],
          'max': 699,
          'min': 0,
          'value': 0},
 'sample': 426440359,
 'samples': 426443959,
 'startTime': '20190822T01:43:32'},
(.....)


Having multiple response formats is not very handy.  For example, if I use json_normalize to create a pandas DataFrame from the dictionary, it creates something like the output below.
You get NaN cells in some columns.

22 426438231 ... NaN
23 426441831 ... [{'max': 578, 'key': 'read', 'value': 0, 'min'...
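For reference, a rough sketch of that step would be the code below, where rows is assumed to hold the list of data-point dictionaries returned by the API.

import pandas as pd

# Flatten the list of data-point dictionaries into a DataFrame.
# Points without a drill-down lack the nested list, so those cells become NaN.
df = pd.json_normalize(rows)
print(df)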


I guess my understanding of pandas still needs improvement.


In The Next Post

In the next post, I would like to explain how to overcome the varying formats.
Thank you for reading this far.

References

Sample Code

The sample code is available on GitLab.  The code assumes Python 3.6 and ZFS Storage 8.7.17.
Any Python version that understands Python 3 code should work, as long as the necessary packages are installed.
The ZFS Storage REST API has not changed, so the same code should work with older or newer versions of the ZFS Storage software.
However, the above is the combination on which I ran and debugged my code.

GitLab: https://gitlab.com/hisao.tsujimura/public/blob/master/zfssa-rest-analytics/list1.py
