Counting Distinctly

Scouring the forum this morning I almost had an 'instant message' session with Darice, we were pinging the forum back and forth so quickly - a great way to communicate and really shows the value of the forum. There were problems getting counting working when re-grouping. We ended up exchanging emails and I think things are going to be working now.
Another thread on counting came up today too and is worthy of mention here. It came from Vadim (nagornyi):





Here is sample XML:

<ROWSET>
<ROW>
<DEPT>A1</DEPT>
<DATA>2112</DATA>
</ROW>
<ROW>
<DEPT>A1</DEPT>
<DATA>7985</DATA>
</ROW>
<ROW>
<DEPT>B1</DEPT>
<DATA>8452</DATA>
</ROW>
<ROW>
<DEPT>C2</DEPT>
<DATA>2159</DATA>
</ROW>
</ROWSET>


Using Table Wizard, I am getting this report:

A1 2112
7985
B1 8452
C2 2159

Everything OK, but at the report header I need to count the number of DEPT, i.e. 3. How to get it? Thanks.




I have to admit I rushed at this one and missed the crux of the issue - Im making amends here and sharing a hidden feature of the RTF templates. I missed the fact that Vadim wanted the number of unique DEPTs not the total number ie 3 not 4.


There are ways to do this programmatically in XSL but its nasty, very nasty. Thats why we have a function that makes life tres simple - 'distinct_values'. Its a Publisher extension that does all the heavy lifting for you in the background, it takes the format:


xdoxslt:distinct_values(Node name)


and returns a numeric value.


In Vadim's case


<?count(xdoxslt:distinct_values(/ROWSET/ROW/DEPT))?>
will do the trick and return '3' rather than '4'.


A word of caution, its expensive, just like DISTINCT in SQL its going to require a lot of processing to read the node tree in and process it eliminating duplicates. There are other slightly less expensive ways to achieve the same using XPATH but for now, just use it wisely.



An update - Nilanshu came up with the XPATH solution I was aluding to above ... remember the data must be ordered by the data you want to count and remove the duplicates from.


If your data is ordered, then you may simply count all ROWS that have a different DEPT than the previous one, keeping into account the first row of course. This might give you a better performance for large XML files.

<?count(ROW[position()=1 or ./preceding-sibling::ROW[1]/DEPT!=DEPT])?>

XPATH 2.0 does provide something similar, but AFAIK XPATH 2.0 does not work in Oracle XMLP yet.

<?count(distinct-values('DEPT')?>

Comments:

I don't expect that the XSL expression delivers good performance for large data sets: The algorithm is O(square(n)) - all good sorting algorithms are O(n*log(n)) - this is at most a sorting problem. Having a hash table or bins - the problem could be solved in nearly O(n). It may avoid running out of memory if the other expressions rely on the data set fitting into memory.

Posted by Klaus Fabian on February 01, 2010 at 06:59 AM MST #

Hi, Distinct count syntax does not work for my RTF template. I'm using the syntax as follows: But when I deploy in the application or try to view from local machine the Excel/PDF/HTML outputs. The errors thrown is as follows: ************************************ at java.lang.reflect.Method.invoke(Unknown Source) at oracle.apps.xdo.common.xml.XSLT10gR1.invokeNewXSLStylesheet(Unknown Source) at oracle.apps.xdo.common.xml.XSLT10gR1.transform(Unknown Source) at oracle.apps.xdo.common.xml.XSLTWrapper.transform(Unknown Source) at oracle.apps.xdo.template.fo.util.FOUtility.generateFO(Unknown Source) at oracle.apps.xdo.template.fo.util.FOUtility.generateFO(Unknown Source) at oracle.apps.xdo.template.FOProcessor.createFO(Unknown Source) at oracle.apps.xdo.template.FOProcessor.generate(Unknown Source) at RTF2PDF.runRTFto(RTF2PDF.java:629) at RTF2PDF.runXDO(RTF2PDF.java:439) at RTF2PDF.main(RTF2PDF.java:289) Caused by: oracle.xdo.parser.v2.XPathException: Function 'distinct-values' not found. at oracle.xdo.parser.v2.XSLProcessor.reportException(XSLProcessor.java:806) at oracle.xdo.parser.v2.XSLProcessor.newXSLStylesheet(XSLProcessor.java:571) ... 14 more ******************************************************** Please let me know what needs to be done. Thanks Ashwini

Posted by Ashwini on April 20, 2010 at 02:26 AM MDT #

I'm using the syntax as posted in the website count(distinct-values('Name') (enclosed in angle brackets and a question mark). But does not work!.

Posted by Ashwini on April 20, 2010 at 02:29 AM MDT #

Thanks for the article. I could get Distinct values syntax right. I have used the following in my RTF file: if@inlines: count(xdoxslt:distinct_values(ProjectNumber))> ‘1’ Enclosed in the angular brackets and a question mark.

Posted by Ashwini on April 21, 2010 at 03:24 AM MDT #

Post a Comment:
  • HTML Syntax: NOT allowed
About

Follow bipublisher on Twitter Find Us on Facebook BI Publisher Youtube ChannelDiscussion Forum

Join our BI Publisher community to get the most and keep updated with the latest news, How-to, Solutions! Share your feedback and let us hear your voice @bipublisher on Twitter, on our official Facebook page, and Youtube!

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today