Wednesday Dec 04, 2013

Last element for JSON array: what do you think?

After releasing the maintenance version of the JSON UDFs last week, it is time to think about which features I should implement in the upcoming major version.

Many users have asked me for a way to explicitly append an element to the end of a JSON array. This feature makes sense for two functions: json_append and json_set.

I have four ideas for how to implement this. Each has its pros and cons.



  1. Create a new function called json_append_last which works exactly like json_append, but adds the element to the end of the array. I.e., for the JSON document {"colors": ["red", "green", "blue"]}, json_append_last(doc, 'colors', 'orange') returns {"colors": ["red", "green", "blue", "orange"]}


    1. Pros:


      1. Such code is easy to understand


    2. Cons:


      1. There is no sense in implementing an analogous json_set_last function, so this possibility exists for json_append only

      2. You have to branch your code explicitly depending on whether you want to add an element at a known position or at the end of the array



  2. Use a string keyword, for example 'LAST'. I.e., for the JSON document {"colors": ["red", "green", "blue"]}, json_append(doc, 'colors', 'LAST', 'orange') returns {"colors": ["red", "green", "blue", "orange"]}


    1. Pros:


      1. It is easy to implement the same thing for json_set

      2. You don't have to branch your code when you need to add an element both at a known position and at the end of the array


    2. Cons:


      1. It is easy to misuse. For example, if you mistakenly pass the JSON document {"colors": {"LAST": "transparent"}} to json_append(doc, 'colors', 'LAST', 'orange'), it returns {"colors": {"LAST": "orange"}} instead of an error



  3. Use NULL instead of a string keyword. I.e., for the JSON document {"colors": ["red", "green", "blue"]}, json_append(doc, 'colors', NULL, 'orange') returns {"colors": ["red", "green", "blue", "orange"]}


    1. Pros:


      1. It is easy to implement the same thing for json_set

      2. You don't have to branch your code when you need to add an element both at a known position and at the end of the array

      3. No way to misuse it: the JSON functions don't take NULL as an argument anywhere else.


    2. Cons:


      1. It looks like a hack: NULL is not the index of the last element

      2. It is hard to guess what the function does without looking into the user manual



  4. Use a negative index, so -1 means the last element, -2 the one before last, and so on. I.e., for the JSON document {"colors": ["red", "green", "blue"]}, json_append(doc, 'colors', -1, 'orange') returns {"colors": ["red", "green", "blue", "orange"]}, and json_set(doc, 'colors', -2, 'orange') returns {"colors": ["red", "green", "orange"]}. json_append returns the original document in the case of -2.


    1. Pros:


      1. Works for both json_append and json_set

      2. You don't have to branch your code when you need to add an element both at a known position and at the end of the array

      3. No way to misuse it: the JSON functions don't take negative numbers as arguments anywhere else.


    2. Cons:


      1. Confusing: while users of programming languages which use negative array indexes can guess what these functions do, one still needs to look in the manual to find out how -1, -2, etc. are interpreted

      2. Error-prone: it is easy to mix up such numbers
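
For comparison, here is how the same append would look under each proposal. This is only a sketch: json_append_last and the 'LAST'/NULL/negative-index behaviors are proposals, not shipped functions.

set @doc = '{"colors": ["red", "green", "blue"]}';

select json_append_last(@doc, 'colors', 'orange');   -- option 1 (proposed new function)
select json_append(@doc, 'colors', 'LAST', 'orange'); -- option 2 (string keyword)
select json_append(@doc, 'colors', NULL, 'orange');   -- option 3 (NULL as position)
select json_append(@doc, 'colors', -1, 'orange');     -- option 4 (negative index)

Each call would return {"colors": ["red", "green", "blue", "orange"]}.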





In conclusion, I think solutions 1 and 3 are the least error-prone, while solution 4 introduces useful side effects. I believe it is better to implement either 1 or 3, but certainly not both.

But what do you think? Please comment either here or at bugs.mysql.com.

Thursday Nov 28, 2013

JSON UDF functions version 0.2.1 have been released.

Today a new version of the JSON UDFs, 0.2.1, was released. This is a maintenance release which adds no new functionality and contains only bug fixes. However, it also includes improvements to the build and test procedures. As usual, you can download source and binary packages at MySQL Labs. Binary packages were built for MySQL server 5.6.14. If you want to use another version of the server, you need to recompile the functions.

What was changed? Let me quote the ChangeLog.

Functionality added or changed:

Added the cmake option WITH_PCRE, which allows you to specify whether the system or the bundled version of PCRE should be used. The bundled version is the default on Windows. To compile with the bundled version, run: "cmake . -DMYSQL_DIR=/path/to/mysql/dir -DWITH_PCRE=bundled"; to turn the bundled version off on Windows, run: "cmake . -DMYSQL_DIR=/path/to/mysql/dir -DWITH_PCRE=system"


This means you no longer need to type additional commands and move files from one directory to another to build the JSON UDFs on Windows.

Added the possibility to run tests using the command make test. By default the tests run without valgrind. If you need to run them under valgrind, add the option VALGRIND=1 to the cmake command: "cmake . -DMYSQL_DIR=/path/to/mysql/dir -DVALGRIND=1"


You can test automatically on both UNIX and Windows. To test on UNIX, simply run make test. On Windows: open the solution my_json_udf.sln in Visual Studio, then choose the target ALL_TESTS and run it. If you know how to run a custom project of a solution from the command line, please let me know too =)

Added the cmake target build_source_dist, which creates a source tarball. Usage: "make build_source_dist".

This addition is mostly for developers and is needed to create packages for MySQL Labs.

The minimum supported cmake version changed to 2.8. This is necessary to run tests and build PCRE.

Bugs Fixed:

70392/17491678 MySQL JSON UDFs binary is called libmy_json_udf.so but DDL uses libmy_json.so

70573/17583505 Typo in README for JSON_MERGE

70605/17596660 Include all references to documentation to README file

70569/17583335 JSON_VALID allows mixed case in the keyword names

70568/17583310 JSON_VALID treats invalid values as valid

70570/17583367 JSON_VALID allows \u two-hex-digit while standard allows only \u four-hex-digit

70574/17583525 JSON_MERGE treats document without opening bracket as valid

70486/17548362 When using json_replace(), '}' of the end disappear.

70839/17751860 JSON_VALID allows to have two elements with the same key


I hope this version will be more stable than the first one, and I hope you enjoy it.

Wednesday Nov 27, 2013

To be safe or to be fast?

When I designed the first version of the JSON UDFs, which was reviewed only internally, I let all functions validate input and output JSON. But my colleagues told me to remove this functionality, because it makes functions such as json_search, json_replace or json_contains_key deadly slow if they find the occurrence at the beginning of a long document. So the first published version of the JSON UDFs, 0.2.0, does not have this functionality. What we expected is that users would call json_valid if they want to be 100% sure the document is valid.
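
In other words, the intended pattern is to guard manipulation with an explicit validity check. A minimal sketch, assuming the 0.2.x signatures (key path followed by the new value, json_valid returning 1 for a valid document):

set @doc = '{"colors": ["red", "green", "blue"]}';
-- validate once, then manipulate only if the document is well-formed
select if(json_valid(@doc) = 1,
          json_replace(@doc, 'colors', 0, 'pink'),
          NULL);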

But I was not surprised that some users expect the JSON functions to work as they did in the first version: validate first, then process. For example, Ammon Sutherland writes: "json_set - according to the documentation a sort of 'INSERT... ON DUPLICATE KEY UPDATE' function which checks and parses the JSON (but not the whole document in this version for some reason)." Ulf Wendel also writes: "Taken from the README. For most functions manipulating or generating JSON documents is true: Warning! This version does not check whole document for validity. Hey, it is a labs.mysql.com pre-production release."

So it is certain that at least some users want the behavior which was rejected. But since I believe others still want better speed, I decided to put the implementation of a second set of functions, safe JSON functions, into my plans.

The first, and most logical, candidate for this implementation is json_safe_merge, because currently json_merge is mostly unusable: if you have already checked that the documents are valid, you can easily split them yourself. Therefore I created a separate feature request for this function: http://bugs.mysql.com/bug.php?id=70575

But I am not sure about the other functions. This is why I created one more public feature request: "Implement SAFE versions for all JSON functions". Whether I implement it will depend on your opinions. Therefore, if you want me to do so, please vote in the bug report by clicking the "Affects Me!" button, and comment if you want this feature to be implemented only for a few functions (or sooner for a particular function).

Tuesday Nov 05, 2013

Late feedback


The MySQL Community team asked me to write about DevConf 2013 a few months ago. The conference was in June 2013, but I remembered this promise of mine only now: a month after my participation in MySQL Connect and the Expert Troubleshooting seminar (change the country to United Kingdom if you see a blank page). I think it is too late for feedback, but I still have a few thoughts which I want to record.

DevConf (formerly PHPConf) has always been a place where I try new topics. Partly because I know the audience there very well and they would be bored if I repeated a story I told last year, but also because it is much easier to get feedback in your own native language. In recent years this habit seems to have started to change, and I presented an improved version of my 2012 MySQL Connect talk about MySQL backups. Of course, I also had a seminar with a unique topic, made for this conference for the first time: Troubleshooting MySQL Performance with EXPLAIN and Using Performance Schema to Troubleshoot MySQL. These topics, improved, were later presented at the expert seminar. It is interesting how my habits change and how one public speaking activity feeds into the next.

What is good about DevConf is that it forces you to create new ideas and present them really well, because the audience is not forgiving at all: they catch everything you miss or prepare not well enough. This can be bad if you want to give a marketing-style talk for free, but it allows you to present technical features in really good detail: all those spontaneous discussions really help.

In 2013 Oracle had a booth at the conference and was represented by a group of people. Dmitry Lenev presented the topic "New features of replication in MySQL 5.6" and Victoria Reznichenko worked on the booth.




What was new at the conference this year was greater interest in NoSQL, scaling and rapid development solutions. This, unfortunately, means less interest in MySQL than there used to be. However, at the same time, the "Common" track was really a MySQL track: not only Oracle people, but people from other companies presented on it.

Thursday Oct 31, 2013

JSON UDFs have own bugs.mysql.com category

The JSON UDFs got their own category in the MySQL Bugs Database: "Server: JSON UDF"


Use this category to post new bug reports and to vote for existing ones.

Thursday Oct 10, 2013

JSON UDFs first feedback

Yesterday Ulf Wendel published a great blog post with a deep analysis of the JSON UDF functions which I presented at MySQL Connect on September 21.

Ulf found a few bugs in these functions; I reported them all at bugs.mysql.com. You can find the numbers in my comment on his blog post. But he also raised concerns which cannot be considered pure bugs: rather feature requests, or even design peculiarities.

* The first concern, of course, is the documentation. Ulf writes: "Here’s what the README, the only documentation available apart from the *.c[omment]/*.h[elp] files".

I agree that a single README file is not enough, but this is still a Labs project, for which I cannot press the MySQL documentation team into writing proper documentation for me. Still, you can find more information than the single README file: the slides from MySQL Connect, which are available on the conference website (they published them today!) at https://oracleus.activeevents.com/2013/connect/fileDownload/session/470E8AA1C46CAA3A9ACFBDB7653FC313/CON3130_Smirnova.pdf as well as on my own website at http://www.microbecal.com/ftp_load/JSON_UDFs.odp or http://www.microbecal.com/ftp_load/JSON_UDFs.pdf I strongly recommend downloading these slides, because they are the best user manual for the JSON functions that exists so far.

But I also want to hear your suggestions about where I should put a complete user manual. I was going to post updates to my blog and update the README file. Is this sufficient? Or do you think it would be better if I put the documentation on my own website? Or packed it as a separate *-doc package at labs.mysql.com? Any other ideas? If you also miss the documentation, please comment.

* Another request I have for you concerns the parameter syntax. Ulf writes: "A function that works “by key” on a subset of a JSON document, usually has the prototype: JSON_SOMETHING(string json_document, string key [, string key...], other_parameter). The first argument passed to the functions is always the JSON document to work on. It follows a variable number of arguments that describe on which subpart of the entire document the function shall work. Having this string key [, string key...] arguments in the middle of the function signature is a bit odd. Some programming languages may forbid this for style considerations. However, depending on the function, further arguments may follow, such as the value to search for or to add."

We discussed this syntax internally before I started implementing the functions and arrived at this style. But I understand that some people may prefer a different syntax. If you are such a person, please file a feature request. At this stage it is easy to rewrite the functions so they support a better style of user arguments. I will think about it for some time and will post a poll on my blog with proposals which I think could be better than the current syntax. It is also possible to have a different set of functions with the same functionality but a different argument style. The only requirement for this feature is user suggestions and votes. Therefore, please suggest and vote!
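
To make the current convention concrete: the variable-length key path sits between the document and any trailing argument, so a lookup two levels deep looks roughly like this (a sketch following the prototype Ulf quotes):

select json_contains_key('{"a":{"b":"c"}}', 'a', 'b');  -- 1: the key path a.b exists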

* Another feature for which I need your suggestions is "speed vs validity checks". I will cite Ulf again: "Taken from the README. For most functions manipulating or generating JSON documents is true: Warning! This version does not check whole document for validity. Hey, it is a labs.mysql.com pre-production release :-)."

But this design was chosen not because these functions are in the Labs, and not because this is an alpha version. We discussed this functionality internally but decided not to implement validity checks, in order to speed the functions up. If you think we were wrong, please open a feature request at bugs.mysql.com about *safe* versions of the functions. I also think that having two sets of "safe" and "fast" functions could work. But I need to hear your opinions before implementing this feature.

* Another feature for which I want to count user votes is search abilities. Currently the function JSON_SEARCH uses only exact match. There is no support for wildcards, case-insensitive search or full-text search. I got complaints about this limitation not only from Ulf, but from users at MySQL Connect too. Therefore I created two feature requests: http://bugs.mysql.com/bug.php?id=70571 about LIKE and http://bugs.mysql.com/bug.php?id=70572 about FULLTEXT. If you need these features, please go and vote for them using the "Affects Me!" button.

The current behavior of JSON_SEARCH is tricky, and Ulf, expectedly, made a mistake when trying to use it. The correct syntax is select json_search('{"key":"value"}', '"value"' ); and select json_search('{"a":{"b":"c"}}', '"c"' ), instead of select json_search('{"key":"value"}', "value" ); and select json_search('{"a":{"b":"c"}}', "c" ), which he used:

mysql> select json_search('{"key":"value"}', '"value"' )\G
*************************** 1. row ***************************
json_search('{"key":"value"}', '"value"' ): key::
1 row in set (0.00 sec)

mysql> select json_search('{"a":{"b":"c"}}', '"c"' );
+----------------------------------------+
| json_search('{"a":{"b":"c"}}', '"c"' ) |
+----------------------------------------+
| b:a::                                  |
+----------------------------------------+
1 row in set (0.00 sec)


This syntax is used to let you search for objects:

mysql> select json_search('{"a":{"b":"c"}}', '{"b":"c"}' );
+----------------------------------------------+
| json_search('{"a":{"b":"c"}}', '{"b":"c"}' ) |
+----------------------------------------------+
| a::                                          |
+----------------------------------------------+
1 row in set (0.00 sec)


You may also notice the "::" at the end of the key path. This denotes the root element and is there by design.

And Ulf's last concern is about syntactic sugar: "With a bit of syntactic sugar, say “SELECT document.member[offset]” this could look much more appealing. On the contrary, I am talking syntaxtic sugar only! Syntactic sugar is really hard to add with todays MySQL."

I so agree here! I wish I could implement the functions that way! But it is not possible to change MySQL syntax with UDFs.

And while we cannot do much about syntactic sugar, you can still help make the other features better. Please comment and vote on the user manual format and location, the parameter syntax, safe versions of the functions, and the extended search capabilities.

Thank you! :)

Thursday Jun 27, 2013

Vote for bugs which impact you!

Matt Lord already announced this change, but I am so happy that I want to repeat it. The MySQL Community Bugs Database Team introduced a new button, "Affects Me". After you click this button, a counter assigned to each bug report increases by one. This means that we, MySQL Support and Engineering, will see how many users are affected by the bug.


Why is this important? We have always considered community input when prioritizing bug fixes, and this is one more point of reference for us. Before this change we only had a counter for support customers, which increased when they opened a support request complaining that they were affected by a bug. But our customers are smart and don't always open a support request when they hit a bug: sometimes they simply implement a workaround. Or there could be other circumstances in which they don't create a ticket. Or the bug could be in a just-released version which big shops are afraid to use in production. Therefore, when discussing which bug to prioritize, we sometimes cannot rely only on the "affects paying customers" number and instead have to guess whether a bug can affect a large group of our users. We used the number of bug report subscribers, the most recent comments, and forum searches, but all these methods gave only an approximation.


Therefore I want to ask you: if you hit a bug which was already reported but not fixed yet, please click the "Affects Me" button! It takes just a few seconds, but your voice will be heard.

Saturday Apr 13, 2013

Yet another UDF tutorial

Some time ago I wrote a blog post describing the way I verify MySQL Server bugs. But my job consists not only of bugs which can be verified just by passing SQL queries to the server.

One such example is UDF bugs.

The MySQL User Reference Manual is a good source of information for those who want to write UDFs, as is the book "MySQL 5.1 Plugin Development" by Sergei Golubchik and Andrew Hutchings. But while the book describes in detail how to write UDFs, it was created when the current MySQL version was 5.1 and does not contain information about how to build a UDF nowadays. The User Reference Manual has this information, of course, but it misses details, such as how to build a UDF with a custom library. And, last but not least, I need a layout which allows me to test my UDF quickly with any server version I need.

So here is a brief overview of how I do it.

Let's take a MySQL bug report which I just created as an example: "Bug #68946     UDF *_init and *_deinit functions called only once for multiple-row select".

All the code necessary to repeat the issue is attached to the bug report, so you can download it and try, or simply read this description.

After unpacking the archive we can see the following layout:


-rwxrwx--- 1 sveta sveta  1070 Apr 13 11:48 CMakeLists.txt
-rwxrwx--- 1 sveta sveta   180 Apr 13 12:16 initid_bug.cc
-rwxrwx--- 1 sveta sveta   146 Apr 13 12:16 initid_bug.h
drwxrwx--- 1 sveta sveta     0 Apr 13 11:22 initid_bug_udf
-rwxrwx--- 1 sveta sveta   715 Apr 13 12:18 initid_bug_udf.cc
-rwxrwx--- 1 sveta sveta    76 Apr 13 11:48 initid_bug_udf.def
-rwxrwx--- 1 sveta sveta   484 Apr 13 12:08 initid_bug_udf.h
-rwxrwx--- 1 sveta sveta  6281 Apr 13 13:07 Makefile
-rwxrwx--- 1 sveta sveta   299 Apr 13 12:28 Makefile.unix


Let's start with the code.


$ cat initid_bug_udf.def 
LIBRARY         initid_bug_udf
VERSION         0.1
EXPORTS
  initid_bug

This is a common *.def file, which contains the version of the UDF and the function names. I am going to use a single function, demonstrating an issue with the initid->ptr pointer.

initid_bug.h and initid_bug.cc contain the declaration and definition of a helper function necessary to demonstrate the problem. For this particular bug report I didn't need to create this function, but I had a future blog post in mind when I created it, so I have an example of an external library which should be linked with the UDF:


$ cat initid_bug.h
/* MySQL */
#pragma once

#ifndef _INITID_BUG
#define _INITID_BUG

#define MAXRES 10

char* multiply_by_ten(int value);

#endif /* _INITID_BUG */

$ cat initid_bug.cc 
/* MySQL */
#include <stdio.h>

#include "initid_bug.h"

char* multiply_by_ten(int value)
{
    char* result= new char[MAXRES];
    sprintf(result, "%d", value * 10);

    return result;
}

The files initid_bug_udf.h and initid_bug_udf.cc contain the code of the UDF itself.

initid_bug_udf.h is a simple header file:


$ cat initid_bug_udf.h
/* MySQL */
#include <my_global.h>
#include <my_sys.h>
#include <mysql.h>
#include <string.h>

#include "initid_bug.h"

#ifdef __WIN__
#define WINEXPORT __declspec(dllexport)
//#define strcpy strcpy_s
#else
#define WINEXPORT
#endif

extern "C" {
WINEXPORT long long initid_bug(UDF_INIT *initid, UDF_ARGS *args, char *is_null, char *error);

WINEXPORT my_bool initid_bug_init(UDF_INIT *initid, UDF_ARGS *args, char *message);

WINEXPORT void initid_bug_deinit(UDF_INIT *initid);
}

And initid_bug_udf.cc contains the code demonstrating the issue:


$ cat initid_bug_udf.cc
/* MySQL */
#include "initid_bug_udf.h"

long long initid_bug(UDF_INIT *initid, UDF_ARGS *args, char *is_null, char *error)
{
    int result= atoi(initid->ptr);
    char* bug= multiply_by_ten(result);
    memcpy(initid->ptr, bug, strlen(bug));
    delete[] bug;
    return result;
}

my_bool initid_bug_init(UDF_INIT *initid, UDF_ARGS *args, char *message)
{
  if (!(initid->ptr= (char*)malloc(MAXRES)))
  {
    strcpy(message,"Couldn't allocate memory for result buffer");
    return 1;
  }
  memset(initid->ptr, '\0', MAXRES);

  memcpy(initid->ptr, "1", strlen("1"));
  initid->maybe_null= 1;
  initid->const_item= 0;

  return 0;
}

void initid_bug_deinit(UDF_INIT *initid)
{
  if (initid->ptr)
    free(initid->ptr);
}

So far so good; however, there is nothing interesting here yet.

And now the part I am writing this tutorial for: how to build and test it.

The user manual contains build instructions for version 5.5 and up at http://dev.mysql.com/doc/refman/5.6/en/udf-compiling.html

But since I want to test my UDF with several MySQL installations, I need an easy way to pass the value of MySQL's basedir to my build and test scripts. For this purpose I introduced the variable MYSQL_DIR in CMakeLists.txt:


$ cat CMakeLists.txt 
CMAKE_MINIMUM_REQUIRED(VERSION 2.6)
# Avoid warnings in higher versions
if("${CMAKE_MAJOR_VERSION}.${CMAKE_MINOR_VERSION}" GREATER 2.6)
 CMAKE_POLICY(VERSION 2.8)
endif()

PROJECT(initid_bug_udf)

# The version number.
set (initid_bug_udf_VERSION_MAJOR 0)
set (initid_bug_udf_VERSION_MINOR 1)

# Path for MySQL include directory
SET(MYSQL_DIR_NAME_DOCSTRING "Path to MySQL directory")
IF(DEFINED MYSQL_DIR)
  SET(MYSQL_DIR ${MYSQL_DIR} CACHE STRING ${MYSQL_DIR_NAME_DOCSTRING} FORCE)
ELSE()
  MESSAGE(WARNING "${MYSQL_DIR_NAME_DOCSTRING} was not specified. If something goes wrong re-run with option -DMYSQL_DIR")
ENDIF()

INCLUDE_DIRECTORIES("${MYSQL_DIR}/include")

 


I also added the library here:




ADD_LIBRARY(initid_bug initid_bug.cc)

ADD_DEFINITIONS("-DMYSQL_DYNAMIC_PLUGIN")
ADD_DEFINITIONS("-fPIC")
ADD_DEFINITIONS("-g")

ADD_LIBRARY(initid_bug_udf MODULE initid_bug_udf.cc initid_bug_udf.def)


And linked it:


IF(${CMAKE_SYSTEM_NAME} MATCHES "Windows")
TARGET_LINK_LIBRARIES(initid_bug_udf initid_bug wsock32)
ELSE()
TARGET_LINK_LIBRARIES(initid_bug_udf initid_bug)
ENDIF()


In other respects this CMakeLists.txt is the same as described in the user manual.

Now it is easy to build the UDF for any server installed on the same machine as the UDF sources.

On Linux/Solaris/Mac:

cmake . -DMYSQL_DIR=/home/sveta/src/mysql-5.6
make

On some Mac machines the build fails to create 64-bit binaries. You can build universal binaries instead by providing the option -DCMAKE_OSX_ARCHITECTURES="x86_64;i386;ppc"

On Windows:

You need Visual Studio (I did not test with Express, but I hope it works) and cmake (cmake.org). If you want to run the automatic tests, you also need Perl.

To create makefiles run:


"C:\Program Files (x86)\CMake 2.8\bin\cmake.exe" -G "Visual Studio 11 Win64" . -DMYSQL_DIR="D:/build/mysql-5.5"


As you can see, I have Visual Studio 11. If you have another version, change accordingly, for example: "Visual Studio 10 Win64"


Then compile:


devenv initid_bug_udf.sln /build Release


In all cases, change the MYSQL_DIR value so it points to the basedir of your MySQL installation. It does not matter whether you compiled the MySQL server yourself or have pre-compiled binaries.

After compiling the library, install it or test it. Since I care mostly about tests, I did not create an install script, but I created tests and a makefile to run them.

Tests are located in the initid_bug_udf directory:


$ ls -l initid_bug_udf
total 4
drwxrwx--- 1 sveta sveta 4096 Apr 13 13:08 include
drwxrwx--- 1 sveta sveta    0 Apr 13 11:31 plugin
drwxrwx--- 1 sveta sveta    0 Apr 13 11:59 r
drwxrwx--- 1 sveta sveta    0 Apr 13 11:21 t


They are in MTR format. I put all installation-related and commonly used scripts into the include directory and copy the UDF binary into the plugin directory.

The test case itself is easy:


$ cat initid_bug_udf/t/initid_bug.test 
--source suite/initid_bug_udf/include/initid_bug_udf_install.inc
--source suite/initid_bug_udf/include/initid_bug_udf_testdata.inc

select initid_bug() from t1;

--source suite/initid_bug_udf/include/initid_bug_udf_cleanup.inc


So are the initialization and cleanup files:


$ cat initid_bug_udf/include/initid_bug_udf_install.inc 
--disable_query_log
let $win=`select @@version_compile_os like 'win%'`;
if($win==1)
{
--source suite/initid_bug_udf/include/initid_bug_udf_install_win.inc
}
if($win==0)
{
--source suite/initid_bug_udf/include/initid_bug_udf_install_unix.inc
}
--enable_result_log
--enable_query_log

$ cat initid_bug_udf/include/initid_bug_udf_install_unix.inc 
--disable_query_log
--disable_warnings
drop function if exists initid_bug;
--enable_warnings
create function initid_bug returns integer soname 'libinitid_bug_udf.so';
--enable_query_log

$ cat initid_bug_udf/include/initid_bug_udf_install_win.inc 
--disable_query_log
--disable_warnings
drop function if exists initid_bug;
--enable_warnings
create function initid_bug returns integer soname 'initid_bug_udf.dll';
--enable_query_log

However, you can use the *install* scripts as templates for installation automation if you care more about usage than testing.



Note that I did not test this particular UDF on Windows, so the Windows-related code can be buggy.

Also, if you use versions prior to 5.5, you cannot use the initid_bug_udf/include/initid_bug_udf_install.inc script, and have to distinguish between the Windows and UNIX installation scripts in a different way.


$ cat initid_bug_udf/include/initid_bug_udf_testdata.inc 
--disable_warnings
drop table if exists t1;
--enable_warnings
create table t1(f1 int);
insert into t1 values(1);
insert into t1 select f1 from t1;
insert into t1 select f1 from t1;
insert into t1 select f1 from t1;


And, finally, the cleanup:


$ cat initid_bug_udf/include/initid_bug_udf_cleanup.inc 
--disable_query_log
--disable_warnings
drop table if exists t1;
drop function if exists initid_bug;
--enable_warnings
--enable_query_log


To run the tests easily I created the file Makefile.unix. I leave the task of creating a similar file for Windows to you, because it depends on your Windows Perl installation: scripts for Cygwin, ActiveState Perl or whichever Perl you have can vary in slight details.


$ cat Makefile.unix 
#MYSQL_DIR - path to source dir
test_initid_bug:
	cp libinitid_bug_udf.so initid_bug_udf/plugin
	cp -R initid_bug_udf  $(MYSQL_DIR)/mysql-test/suite
	cd $(MYSQL_DIR)/mysql-test; \
		perl mtr  --suite=initid_bug_udf --mysqld=--plugin-dir=$(MYSQL_DIR)/mysql-test/suite/initid_bug_udf/plugin


Finally, to run the tests you simply need to call the following command:


$ MYSQL_DIR=/home/sveta/src/mysql-5.6 make -f Makefile.unix test_initid_bug


It will copy the test directory under the mysql-test/suite directory of the MySQL installation pointed to by the MYSQL_DIR variable, then run the test.

If you did not create a result file, you will see a test failure and the result of the queries. You can create the result file after that and put it into the r directory. The ZIP archive attached to the bug report contains the result file.
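
If you prefer to try the function manually rather than through MTR, a minimal session could look like this (a sketch; it assumes the compiled library has already been copied into the server's plugin directory):

create function initid_bug returns integer soname 'libinitid_bug_udf.so';
create table t1(f1 int);
insert into t1 values(1),(1),(1);
select initid_bug() from t1;  -- with several rows the reuse of initid->ptr becomes visible
drop table t1;
drop function initid_bug;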

Wednesday Jan 30, 2013

Troubleshooting Performance Diagrams


Last year, when I was speaking about MySQL performance at Devconf in Moscow, I expected my audience to be very experienced, as always happens at PHPClub conferences. So I had to choose: either make a full-day seminar and explain every basic aspect of performance, or rely on their knowledge and make a one-and-a-half-hour seminar. I prefer short speeches, so I chose the latter.



But even with such a mature audience you never know whether they all know one or another basic thing: somebody may be good at analyzing EXPLAIN output while somebody else is good at reading InnoDB Monitor printouts. Also, the native language of the audience is not English, and it is always good to have a short reference to simple things described in their native language, in this case Russian. This is why I created these slides the first time.



I was in doubt whether to translate them to English, because the information there repeats the official MySQL user reference manual in many aspects, although the diagrams can still be useful for an English-speaking audience. So I did not translate those slides at first.



Time passed, and in a few days I am speaking about MySQL Troubleshooting at FOSDEM. It is a 30-minute talk this time! And I just created slides and content for an 8-hour Oracle University training! Can I zip this 8-hour training into a 30-minute talk? Of course not. But I still want to give users as much as possible in these 30 minutes. So the idea of add-ons came back.



You can download them in PDF format either from my own website or from slideshare.

Saturday Jan 26, 2013

My way to verify MySQL bug reports

I promised to write this blog post a long time ago, at one of the conferences in Russia. I don't know why I delayed it, but finally here it is.

We, members of the MySQL bugs verification group, have to verify bugs in all currently supported versions. We use not only the version reported, but also test the development source tree of each supported major version, to identify recent regressions.

You can imagine that even for a simple bug report about wrong results with a perfect test case, which requires me simply to run a few queries, I would have to start 4 or more MySQL servers: one for each of the currently supported versions 5.0, 5.1 and 5.5, plus one for current development. And an unknown number of servers if I could not repeat the issue or wanted to check whether it is a regression.

Even with all these basic 4 servers running, I would still have to type all the queries at least 4 times. How much time would it take to verify a single bug report this way?

I know some members of my group prefer this way, because typing queries manually is the same action our customers perform. And indeed, some bugs are repeatable only if you type the queries manually.

But I prefer to test such erroneous exceptions manually only, and not to make it my routine job.


So how do I test bug reports?


Every version of the MySQL server comes with a regression test suite: a program called mtr (mysql-test-run.pl), its libraries, the mysqltest program (you should not call it directly, though) and a set of tests. The good thing about the MySQL test suite is that you can create your own test cases. So I do.

I write my tests in MTR format, then run MTR with the record option and examine the result. Actually this is a kind of hack, because users are expected to create the result file first, then compare the output of the running test with that result file. But my purpose is to repeat the bug report, not to create a proper test case for it, so I can be lazy.

But simply running MTR manually still takes time, so I found a way to automate this process as well.

I created a BASH script called do_test.sh, which runs through all my MySQL trees and runs the tests for me automatically, then prints the result.

Let me explain it a little bit.


$ cat ~/scripts/do_test.sh
#!/bin/bash
# runs MySQL tests in all source directories

# prints usage information
usage ()
{
    echo "$VERSION"
    echo "
do_test copies MySQL test files from any place
to each source directory, then runs them

Usage: `basename $0` [option]... [testfile ...]
    or `basename $0` [option]... -d dirname [test ...]
    or `basename $0` [option]... [-b build [build option]... ]...

Options:
    -d --testdir    directory, contains test files

I have a directory where I store test files. It has a subdirectory t, where the tests to be run are stored; a subdirectory r, where results, sorted by MySQL server version number, are stored; and a directory named archive, where tests are stored for archival purposes.


    -s --srcdir     directory, contains sources directories

This is the path to the directory where the MySQL packages are located. I called it srcdir, but this is actually not so strict: the program works with binary packages as well.


    -b --build      mysql source directory

The name of the MySQL source directory. You can specify any package name. For example, to run tests in the 5.6.9 package in my current directory I call the program as do_test -s . -b mysql-5.6.9-rc-linux-glibc2.5-x86_64


    -c --clean      remove tests from src directory after execution
    -t --suite      suite where to put test

MTR can have test suites with their own rules for how to run test cases. If you want to run your tests in a specific suite, specify this option. You can also have a directory for your own suite, but in this case you need to create the directories your_suite, your_suite/t and your_suite/r in the mysql-test/suite directory of your MySQL installation before doing this.

As I said, I am lazy, so I mostly run tests in the main test suite. This may not be a good idea if you use the MySQL installation not only for testing its own bugs, but for some other tests too.

The rest of the code speaks for itself, so I will not explain it. All you need to do to run this program is simply call it, do_test.sh, passing the paths to your test, the source dir and the MySQL installation.


    -v --version    print version number, then exit
    -h --help       print this help, then exit

You can also pass any option to the mysqltest program.

    "
}

# error exit
error()
{
    printf "$@" >&2
    exit $E_CDERROR
}

# creates default values
initialize()
{

This is probably not very obvious: these are my default paths and, most importantly, the default set of servers I test:


    TESTDIR=/home/sveta/src/tests
    SRCDIR=/home/sveta/src
    BUILDS="mysql-5.0 mysql-5.1 mysql-5.5 mysql-trunk"
    CLEAN=0 #false
    MYSQLTEST_OPTIONS="--record --force"
    TESTS_TO_PASS=""
    TESTS=""
    SUITE=""
    SUITEDIR=""
    OLD_PWD=`pwd`
    VERSION="do_test v0.2 (May 28 2010)"
}

# parses arguments/sets values to defaults
parse()
{
    TEMP_BUILDS=""

    while getopts "cvhd:s:b:t:" Option
    do
        case $Option in
            c) CLEAN=1;;
            v) echo "$VERSION";;
            h) usage; exit 0;;
            d) TESTDIR="$OPTARG";;
            s) SRCDIR="$OPTARG";;
            b) TEMP_BUILDS="$TEMP_BUILDS $OPTARG";;
            t) SUITE="$OPTARG"; SUITEDIR="/suite/$SUITE"; MYSQLTEST_OPTIONS="$MYSQLTEST_OPTIONS --suite=$SUITE";;
            *) usage; exit 0; ;;
        esac
    done

    if [[ $TEMP_BUILDS ]]
    then
        BUILDS="$TEMP_BUILDS"
    fi
}

# copies tests to source directories
copy()
{
    cd "$TESTDIR/t"
    TESTS_TO_PASS=`ls *.test 2>/dev/null | sed s/.test$//`
    cd $OLD_PWD
    for build in $BUILDS
    do
        # cp -i to reject silent overwrite
        cp "$TESTDIR"/t/*.{test,opt,init,sql,cnf} "$SRCDIR/$build/mysql-test$SUITEDIR/t" 2>/dev/null
    done
}

# runs tests
run()
{
    for build in $BUILDS
    do
        cd "$SRCDIR/$build/mysql-test"
        ./mysql-test-run.pl $MYSQLTEST_OPTIONS $TESTS_TO_PASS
    done
    cd $OLD_PWD
}

# copies result and log files to the main directory
get_result()
{
    for build in $BUILDS
    do
        ls "$TESTDIR/r/$build" 2>/dev/null
        if [[ 0 -ne $? ]]
        then
            mkdir "$TESTDIR/r/$build"
        fi
        for test in $TESTS_TO_PASS
        do
            cp "$SRCDIR/$build/mysql-test$SUITEDIR/r/$test".{log,result} "$TESTDIR/r/$build" 2>/dev/null
        done
    done
}

# removes tests and results from MySQL source directories
cleanup()
{
    if [[ 1 -eq $CLEAN ]]
    then
        for build in $BUILDS
        do
            for test in $TESTS_TO_PASS
            do
                rm "$SRCDIR/$build/mysql-test$SUITEDIR/r/$test".{log,result} 2>/dev/null
                rm "$SRCDIR/$build/mysql-test$SUITEDIR/t/$test.test"
            done
        done
    fi
}

# shows results
show()
{
    for build in $BUILDS
    do
        echo "=====$build====="
        for test in $TESTS_TO_PASS
        do
            echo "=====$test====="
            cat "$TESTDIR/r/$build/$test".{log,result} 2>/dev/null
            echo
        done
        echo
    done
}

E_CDERROR=65

#usage
initialize
parse $@
copy
run
get_result
cleanup
show

exit 0

After I am finished with a test, I copy it to the archive directory, again with a script, named ar_test.sh:


$ cat ~/scripts/ar_test.sh
#!/bin/bash
# moves MySQL tests from t to archive directory and cleans up r directories

# prints usage information
usage ()
{
    echo "$VERSION"
    echo "
ar_test copies MySQL test files from t to archive folder

Usage: `basename $0` [-v] [-d dirname] [test ...]

Options:
    -d    directory, contains test files
    -v    print version
    -h    print this help
    "
}

# error exit
error()
{
    printf "$@" >&2
    exit $E_CDERROR
}

# creates default values
initialize()
{
    TESTDIR=/home/sveta/src/tests
    TESTS_TO_MOVE=""
    OLD_PWD=`pwd`
    VERSION="ar_test v0.2 (Dec 01 2011)"
}

# parses arguments/sets values to defaults
parse()
{
    while getopts "vhd:" Option
    do
        case $Option in
            v) echo "$VERSION"; shift;;
            h) usage; exit 0;;
            d) TESTDIR="$OPTARG"; shift;;
            *) usage; exit 0;;
        esac
    done

    TESTS_TO_MOVE="$@"
}

# copies tests to the archive directory
copy()
{
    if [[ "xx" = x"$TESTS_TO_MOVE"x ]]
    then
        cp "$TESTDIR"/t/* "$TESTDIR"/archive 2>/dev/null
    else
        for test in $TESTS_TO_MOVE
        do
            cp "$TESTDIR/t/$test".{test,opt,init,sql} "$TESTDIR"/archive 2>/dev/null
        done
    fi
}

# removes tests and results from r directories
cleanup()
{
    if [[ "xx" = x"$TESTS_TO_MOVE"x ]]
    then
        rm "$TESTDIR"/t/* 2>/dev/null
        rm "$TESTDIR"/r/*/* 2>/dev/null
    else
        for test in $TESTS_TO_MOVE
        do
            rm "$TESTDIR/t/$test".{test,opt,init,sql} 2>/dev/null
            rm "$TESTDIR/r/"*"/$test".{test,opt,init,sql} 2>/dev/null
        done
    fi
}

E_CDERROR=65

initialize
parse $@
copy
cleanup

exit 0

But the most important part: what do I do if I want to test on some specific machine which is not available at home? Fortunately, we have shared machines to run tests on, so I can simply move the tests to my network home directory, then choose an appropriate machine and run them. Since this is a BASH script and the test cases are in MTR format, this works on any operating system.


$ cat ~/scripts/scp_test.sh
#!/bin/bash
# copies MySQL tests to remote box

# prints usage information
usage ()
{
    echo "$VERSION"
    echo "
scp_test copies MySQL test files from t directory on local box to MySQL's XXX

Usage: `basename $0` [-v] [-d dirname] [-r user@host:path] [test ...]

Options:
    -d    directory, contains test files
    -r    path to test directory on remote server, default: USERNAME@MACHINE_ADDRESS:~/PATH/src/tests/t
    -v    print version
    -h    print this help
    "
}

# error exit
error()
{
    printf "$@" >&2
    exit $E_CDERROR
}

# creates default values
initialize()
{
    TESTDIR=/home/sveta/src/tests
    MOVETO='USERNAME@MACHINE_ADDRESS:~/PATH/src/tests/t'
    TESTS_TO_MOVE=""
    OLD_PWD=`pwd`
    VERSION="scp_test v0.2 (Dec 1 2011)"
}

# parses arguments/sets values to defaults
parse()
{
    while getopts "vhd:r:" Option
    do
        case $Option in
            v) echo "$VERSION"; shift;;
            h) usage; exit 0;;
            d) TESTDIR="$OPTARG"; shift;;
            r) MOVETO="$OPTARG"; shift;;
            *) usage; exit 0;;
        esac
    done

    TESTS_TO_MOVE="$@"
}

# copies tests to the remote directory
copy()
{
    if [[ "xx" = x"$TESTS_TO_MOVE"x ]]
    then
        scp "$TESTDIR"/t/* "$MOVETO"
    else
        for test in $TESTS_TO_MOVE
        do
            scp "$TESTDIR/t/$test".{test,opt,init,sql} "$MOVETO"
        done
    fi
}

E_CDERROR=65

initialize
parse $@
copy

exit 0

I wanted to put these on Launchpad, but got stuck with a name for the package. Does anybody have ideas?

Saturday Jan 12, 2013

My eighteen MySQL 5.6 favorite troubleshooting improvements

MySQL 5.6 is in the RC state now, which means it is going to be GA sooner or later.

This release contains a lot of improvements. However, since I am a support engineer, I am most amazed by those which make troubleshooting easier.

So here is the list of my favorite troubleshooting improvements.

1. EXPLAIN for UPDATE/INSERT/DELETE.


This is an extremely useful feature.

Although prior to version 5.6 we could, theoretically, get some kind of explain for these statements too, for example by converting DML queries to their SELECT equivalents, the optimizer can optimize them differently.

We could also execute the DELETE or UPDATE, then query the Handler_% status variables, but who wants to execute an update just for testing on a live database? And anyway, by querying the Handler_% variables we could only know whether some index was used, but could not identify which one.
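
Now you can simply ask the optimizer about the DML statement directly (a sketch; the table and columns are made up):

explain update t1 set f2 = 0 where f1 < 100;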

2. The INFORMATION_SCHEMA.OPTIMIZER_TRACE table.

This table contains traces of the last few queries; their number is specified by the configuration option optimizer_trace_limit.

Actually, we could get similar information prior to version 5.6: just use a debug server and start it with the option --debug. In this case the MySQL server creates a trace file where it writes information about the functions and methods executed during the server run. The problem with such files is that they are extremely verbose and large. It is hard to find the necessary information there if you are not a MySQL developer. As a support engineer I use them when I need to test a bug, and I do it as follows: create a very short script which executes as few queries as possible, then start the MySQL server with the debug option, run the test, and stop debugging. Even with such short single-threaded tests the resulting file is thousands of rows long. It is almost impossible to use such files in production.

With the optimizer trace we get similar information just for the optimizer, but in a much more compact form.
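
A minimal optimizer trace session looks like this:

set optimizer_trace='enabled=on';
select * from t1 where f1 < 100;  -- the query you want to analyze
select trace from information_schema.optimizer_trace;
set optimizer_trace='enabled=off';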

Hmm... I want a similar feature for all parts of the server!

3. EXPLAIN output in JSON format.

This is actually just syntactic sugar around the normal EXPLAIN output. It prints exactly the same information as the normal, table-view EXPLAIN, but it can be used for automation.
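
For example:

explain format=json select * from t1 where f1 < 100;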

4. New InnoDB tables in Information Schema.

The INNODB_METRICS table contains a lot of information about InnoDB performance, from InnoDB buffer usage up to the number of transactions and records.

The INNODB_SYS_* tables contain information about the InnoDB internal dictionary and its objects: particularly tables, fields, keys, etc.

Table INNODB_SYS_TABLESTATS contains information about InnoDB performance statistics.

The INNODB_BUFFER_POOL_* tables store information about InnoDB Buffer Pool usage.
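
For example, to look at the transaction-related counters:

select name, count, status
from information_schema.innodb_metrics
where subsystem = 'transaction';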

5. An option to log all InnoDB deadlocks into the error log file: innodb_print_all_deadlocks.

Currently we can use the InnoDB Monitor output for the same purpose, but it prints the latest deadlock only, not every one since server startup. This feature should be very handy for troubleshooting issues repeatable only in production.
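
The option is dynamic, so it can be enabled without a restart:

set global innodb_print_all_deadlocks = ON;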

6. Persistent statistics for InnoDB tables.

It was also OFF by default and could be turned ON with the help of the option innodb_analyze_is_persistent.


It was originally OFF. Now this option is named innodb_stats_persistent and is ON by default.

But this is a performance improvement, so how does it relate to troubleshooting?

In version 5.5, by default, InnoDB renews table statistics when you run ANALYZE TABLE and after each MySQL server restart; the optimizer uses these statistics to generate better execution plans. Unfortunately such frequent statistics renewal does not suit all data and tables. Sometimes fresh statistics can lead to the choice of a plan that is not the best. Now, if innodb_stats_persistent is ON, statistics are renewed only after ANALYZE TABLE and stay the same after a server restart.

In the past we sometimes got complaints from people who initially had a fast, perfectly running query, but whose performance decreased after inserting a few million rows. As you can imagine, rebuilding the table does not help in this case. For such cases we offered FORCE/IGNORE INDEX as a workaround, which usually meant customers had to rewrite dozens of queries.

Now we can start recommending to load the data for which the optimizer chooses the best plan, then run ANALYZE TABLE and never update the statistics again.
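
The workflow is then simply:

-- load representative data first, then:
analyze table t1;
-- with innodb_stats_persistent=ON these statistics survive restarts
-- and are not renewed behind your back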

7. InnoDB read-only transactions.

Every query on an InnoDB table is part of a transaction, regardless of whether you SELECT from or modify the table. For the TRANSACTION ISOLATION levels REPEATABLE-READ and SERIALIZABLE, InnoDB creates a snapshot of the data as it was read the first time. This is necessary for multiversion concurrency control. At the same time, such snapshots slow down transactions which are used only for reads. For example, many users complained about a catastrophic performance decrease after InnoDB reached 256 connections. And while for transactions which modify data such a slowdown is necessary for better stability, for read-only transactions it is not fair.

To solve this issue, version 5.6 introduced new access modifiers for transactions: READ ONLY and READ WRITE. The default is READ WRITE. We can change it with START TRANSACTION READ ONLY, with SET TRANSACTION READ ONLY queries, or via variables. The InnoDB team ran tests which showed that the use of READ ONLY transactions completely solves the 256-threads issue.

If InnoDB is running in autocommit mode, it considers a data snapshot unnecessary for SELECT queries and does not create one. This means SELECT performance in autocommit mode is the same as if READ ONLY transactions were used.
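
In SQL this looks like:

start transaction read only;
select count(*) from t1;
commit;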


The InnoDB team published nice graphs with benchmark results for this new feature in their blog. The blog post is available here.

8. Support for backup/restore of the InnoDB Buffer Pool during restart and on demand.

This feature can noticeably speed up the first few hours of running installations which use a large InnoDB Buffer Pool. Earlier, when an application with a large InnoDB Buffer Pool started, performance could be slower than it should be for some time, until queries filled the buffer. Now we don't have to wait until the application executes all the queries and fills up the pool: just back it up at shutdown, then restore it at startup or at any time while the MySQL server is running.
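
On demand this is controlled through system variables:

set global innodb_buffer_pool_dump_now = ON;  -- save the buffer pool content now
set global innodb_buffer_pool_load_now = ON;  -- restore it
-- innodb_buffer_pool_dump_at_shutdown and innodb_buffer_pool_load_at_startup
-- automate the same at shutdown and startup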

9. Multithreaded slave.

Prior to version 5.6, a slave could be catching up with the master very slowly, not because it ran on slow hardware, but due to its single-threaded nature. Now it can use up to 1024 parallel threads and catch up with the master more easily.
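
The number of threads is controlled by the slave_parallel_workers variable (the slave SQL thread has to be restarted for the change to take effect):

stop slave sql_thread;
set global slave_parallel_workers = 8;
start slave sql_thread;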

10. The possibility to execute events from the binary log with an N-second delay.

This feature is the opposite of the previous one. Why did I put it in my list of favorite troubleshooting improvements then? Because in addition to its main purposes, such as the possibility to roll back DROP, DELETE (and other) operations, we can also use this technique when the master sends updates which slow down parallel reads executed by the slave. For example, we can delay an expensive ALTER TABLE until the peak load on a web site has passed. Because this option is set up on a slave, we can spread such expensive queries among the slaves, so the application can use another slave for reads while one is busy.
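
The delay is configured on the slave (here one hour, as a sketch):

stop slave;
change master to master_delay = 3600;
start slave;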

11. Partial row images in RBR.


This feature can increase replication performance dramatically.

Today one of the most popular row-based replication performance issues is the long transfer time of large events. For example, if you have a row with a LONGBLOB column, any modification of this row, even of an integer field, requires sending the whole row. In other words, to modify 4 bytes the master sends up to 4G to the slave. Since version 5.6 it is possible to configure how rows are written in the case of row-based replication:

  • binlog_row_image=full mimics the 5.5 and earlier behavior: the full row is stored in the binary log file

  • binlog_row_image=minimal stores only the modified data

  • binlog_row_image=noblob stores the full row for all column types except blobs

12. GET DIAGNOSTICS statements and access to the Diagnostics Area.

The Diagnostics Area is a structure which is filled after each query execution. It contains two kinds of information: query results, such as the number of affected rows, and information about warnings and notes.

Now we can access the Diagnostics Area through GET DIAGNOSTICS statements.
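
For example, after a statement you can pull both statement and condition information:

get diagnostics @rows = ROW_COUNT, @conditions = NUMBER;
get diagnostics condition 1 @sqlstate = RETURNED_SQLSTATE, @msg = MESSAGE_TEXT;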

13. HANDLER processing is more like the SQL standard now.

I wrote about this twice already: here and here.

So I will not repeat my second post, just mention again that I am finally happy with how HANDLERs work now.

PERFORMANCE_SCHEMA improvements


These are huge improvements!

In version 5.5, when the performance schema was introduced, I'd say it was mostly designed for MySQL developers themselves, or at least for people who know the MySQL source code very well. Now the performance schema is for everybody.

14. New instruments to watch I/O operations.

We can even watch I/O operations for a specific table. There are also new tables aggregating these operations.

15. New events_statements_* tables instrument statements.

Now we can know what happens during query execution: things like the SQL text of the query, the event name corresponding to the particular code path, and statistics of the query execution.
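
For example, to see the last few statements executed on the server:

select event_name, sql_text, rows_examined, created_tmp_disk_tables
from performance_schema.events_statements_history
order by timer_start desc limit 5;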

16. Instruments for operation stages: the events_stages_* tables.


The EVENT_NAME field of these tables contains information similar to the content of SHOW PROCESSLIST, while the SOURCE field holds the exact source code row where the operation executed. This is very useful when you want to know what is going on.


I strongly recommend studying the events_statements_* and events_stages_* tables.

17. New digests: statements aggregated by account, user, host and so on.


These are very useful for working with historical data.

Of course, there are also the history tables, which we know since version 5.5; they exist for both events_stages and events_statements.

We can also filter information by users, sessions and tables.
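
For example, the digest summary shows the most frequent normalized statements:

select digest_text, count_star, sum_rows_examined
from performance_schema.events_statements_summary_by_digest
order by count_star desc limit 10;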

18. HOST_CACHE table.

This table contains information about client host names and IP addresses stored in the host cache, so that the MySQL server does not have to query the DNS server a second time.

I am not sure why it is in Performance Schema and not in Information Schema, but maybe this is only me.


I expect to make use of it when diagnosing connection failures.
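
For example:

select ip, host, sum_connect_errors
from performance_schema.host_cache;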

Monday Oct 01, 2012

Slides for MySQL Connect session "Managing and Troubleshooting MySQL for Oracle DBAs"

I just uploaded the presentation for the session "Managing and Troubleshooting MySQL for Oracle DBAs" here.


I also created a guide to EXPLAIN features (the URL of which I forgot to put on the slide). Here it is.

Wednesday Sep 19, 2012

That's all about nuances

When I sent the proposal for the session "Managing and Troubleshooting MySQL for Oracle DBAs" to the MySQL Connect conference org committee, it did not have any mention of Oracle in its name, but later I was asked to provide more details for former Oracle DBAs who want to use MySQL. I was fast, and I said "yes".

So my original aim, to teach people to troubleshoot MySQL, changed to teaching how different MySQL is from Oracle in its troubleshooting aspects. Although both RDBMSs have very much in common, they are definitely very different. So what I am going to speak about this time is the nuances: how MySQL stores data, how it manages locks, why its high availability solutions, MySQL Cluster and Replication, have the same names as Oracle's but work differently, and more. And, of course, I will tell you how to troubleshoot it all.

Friday Jul 13, 2012

Performance Schema in version 5.6 helps to prevent SQL Injection attacks

There have been a few prominent SQL Injection attacks lately.



Such break-ins are possible when an attacker finds that user data sent to the SQL server is not properly sanitized.

Usually a successful attack is preceded by queries which cause parse errors, such as this one:


mysql> select thread_id, sql_text from events_statements_history_long where event_name='statement/sql/er'ror;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'ror' at line 1

The easiest test to find out if a website can be attacked is to send, using a web form or the address bar, a value which contains a special symbol, say "'", like in the word "D'Artagnan", then analyse whether it was properly handled.



There are practices which web developers should use to avoid such situations. They include 100% sanitizing of all incoming data, allowing only expected values and rejecting everything else, or limiting the privileges of the database accounts which the web site uses. In the latter case, even if SQL injection happens, the damage it can cause is minimal.

But all these measures work at the design stage. What do you do if you are responsible for a large legacy project, where you cannot be 100% sure everything is handled properly?

In the old days we could only suggest logging queries into the general query log, then searching for those which look suspicious. This hard job can hardly be automated: the general query log does not contain information about the query return code, therefore you have to guess yourself which query could have caused a parse error, and not much automation can be applied.

In version 5.6 MySQL introduced new tables in the Performance Schema database. The performance schema in 5.6 became really user friendly, and I am falling in love with it. A bunch of these tables, events_statements_*, can be useful for debugging possible SQL injection attacks.

The events_statements_* tables keep queries which were executed in the past, together with information about how they were processed. This includes:



  • time spent waiting for table locks

  • how many temporary tables were created, split into disk-based and memory-based tables

  • which optimizer techniques were chosen

  • statistics based on these.



However, for the SQL injection discussion, we are interested in the diagnostic information, such as the returned SQLSTATE code and the warning and error information.

These fields can give us the list of queries which attackers used to probe their chances of breaking the website.

The performance schema has two special error-handling instruments: statement/com/Error for statements which were parsed successfully but rejected thereafter, and statement/sql/error for statements which were rejected at the parsing stage.

In the case of SQL injection we should examine the statements which were rejected at the parsing stage: statement/sql/error. They can easily be found with the help of this query:



mysql> select thread_id, sql_text from events_statements_history_long where event_name='statement/sql/error';
+-----------+-------------------------------------------------------------------------------------+
| thread_id | sql_text                                                                            |
+-----------+-------------------------------------------------------------------------------------+
|        16 | select * from events_statements_history_long wher event_name='statement/sql/error'  |
|        16 | select * from events_statements_history_long here event_name='statement/sql/error'  |
|        16 | select * from events_statements_history_long where event_name='statement/sql/er'ror |
+-----------+-------------------------------------------------------------------------------------+
3 rows in set (0.00 sec)


In this example I deliberately made errors in the queries I typed, but you get the idea. Usually an application does not produce such queries at all: they should all be fixed at the development stage, therefore periodically checking for such queries can be a good practice for detecting possible SQL injection attacks.

And to finish this topic, I should mention another tool which can help to prevent SQL injection attacks even before they reach the real database: Oracle Database Firewall. I'll quote part of its white paper here:

Oracle Database Firewall monitors data access, enforces access policies, highlights anomalies and helps protect against network based attacks originating from outside or inside the organization. Attacks based on SQL injection can be blocked by comparing SQL against the approved white list of application SQL. Oracle Database Firewall is unique and offers organizations a first line of defense, protecting databases from threats and helping meet regulatory compliance requirement.

With its unique feature of blocking access by analyzing all incoming SQL statements, your database can be really protected. And, amazingly, it supports MySQL out of the box.

References:

MySQL user manual - http://dev.mysql.com/doc/refman/5.6/en/performance-schema-statements-tables.html
Mark Leith's blog - http://www.markleith.co.uk/ps_helper/#statements_with_errors_or_warnings
Oracle Database Firewall - http://www.oracle.com/technetwork/products/database-firewall/overview/index.html

Wednesday Jul 04, 2012

Warning and error information in stored procedures revisited

Originally, the way to handle warnings and errors in a MySQL stored routine was designed as follows:


  • if a warning was generated during stored routine execution, and the routine had a handler for such a warning/error, MySQL remembered the handler, ignored the warning and continued execution

  • after the routine was executed, MySQL checked if there was a remembered handler and activated it, if any

This logic was not ideal and caused several problems, particularly:


  • it was not possible to choose the right handler for an instruction which generated several warnings or errors, because only the first one was chosen

  • conditions handled in the current scope were mixed up with conditions from a different scope

  • there were no generated warnings/errors in the Diagnostics Area, which is against the SQL Standard.

The first attempt to fix this was made in version 5.5. The patch left the Diagnostics Area intact after stored routine execution, but cleared it at the beginning of each statement which could generate warnings or work with tables. The Diagnostics Area was checked after stored routine execution.

This patch solved the issue with the order of condition handlers, but led to new issues. The most prominent was that an outer stored routine could see warnings which should have already been handled inside an inner stored routine, even though the latter had a handler. I even had to write a blog post about it.

And now I am happy to announce that this behaviour has changed for the third time.

Since version 5.6, the Diagnostics Area is cleared after the instruction leaves its handler.

This means that only one handler sees the condition it is supposed to process, and in the proper order. All the past problems are solved.
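
A small illustration (a sketch; it assumes a non-strict sql_mode, so the out-of-range insert produces a warning rather than an error):

create table t1(f1 tinyint);
delimiter //
create procedure p()
begin
  declare continue handler for sqlwarning select 'handled inside p' as note;
  insert into t1 values(128);  -- out of range, raises a condition handled here
end//
delimiter ;
call p();
show warnings;  -- in 5.6 the handled condition is no longer visible to the caller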

I am happy that my old blog post describing the weird behaviour in version 5.5 is no longer true.

About

The working blog of Sveta Smirnova, MySQL Senior Principal Support Engineer in the Bugs Analysis Support Group.
