Genericizing SQL - aka bustin' loose some DB triggerFu


#1

I love that word ‘genericizing’, i was gonna try generification or generifying but don’t you just think ‘genericizing’ sounds SO much better? but since i’m making up words, maybe i should tell you why I am and what prompted me to do so.

So, there I was, there I was, there I was in the Congo, actually I was at a prospect, talking to the lead DBA, looking at a set of detail metrics describing performance of individual SQL statement executions - the individual SQL statements looked like this

  select * from TABLE where ID = 1;
  select * from TABLE where ID = 2;
  select * from TABLE where ID = 3;
  ...
  select * from TABLE where ID = 100;

The DBA says to me "what would be REALLY, REALLY useful to me would be to show me performance profile of the generic SQL statement:

  select * from TABLE where ID = #;

for all instances of that genericized statement for the time interval. in the (now) infamous words of mc juggernaught (see how i did that) - “if you need tool, [the extrahop platform allows you to] make a tool!”

so, a bit of regex replacement magic (I TRIED to keep is simplish and maintainable) - most of the hard work is done in the following function:

function genericizeSql(stmt) {
    var rex00 = /[\x00-\x1F\uFFFD]+/gm;        // NON PRINTING CHARACTERS & \uFFFD - \357\277\275 - unicode replacement char
    var rex01a = /'.+?'/gm;                    // SINGLE QUOTED STRINGS
    var rex01b = /".+?"/gm;                    // DOUBLE QUOTED STRINGS
    var rex02  = /\b[0-9][0-9]*\b/gm,          // WORD BOUNDED NUMBERS
        rex03  = /(\s*,\s*#)\1{1,}/gm,         // SPACE COMMAs ???{1,}????
        rex04  = /'\?'(\s*,\s*'\?')\1{0,}/gm,  // CONDENSE COMMA SEPARATED IN CLAUSE LISTS
        rex05  = /\s\s+/gm;                    // CONDENSE SPACES
    if (stmt) {
        stmt = stmt.replace(rex00,  " "); 
        stmt = stmt.replace(rex01a, "'?'"); 
        if (!quotedIdentifiers) {  // icky-ness
            stmt = stmt.replace(rex01b, "'?'"); 
        }
        stmt = stmt.replace(rex02,  "#"); 
        stmt = stmt.replace(rex03,  "s"); 
        stmt = stmt.replace(rex04,  "'?'s");
        stmt = stmt.replace(rex05,  " ");
    }
    return stmt;
}

I then use the function on the coalesced SQL, shove it in the Flow store for (re)use on the RESPONSE side!!!

    Flow.store.sql  = DB.statement || DB.procedure || '***SQL Unavailable***';
    Flow.store.gsql = genericizeSql(Flow.store.sql);

I’ve written a trigger to help determine specific login errors that I modified here for a more generic purpose, but whether you are looking for correlation to errors, specific performance demographics, or want to log every statement to the ExtraHop Explore appliance, you now have a genericized SQL statement to do so…

I tested this on a couple MSSQL instances - YMMV - the regex’s should be clean enough to use on just about any relational database that supports ANSI SQL standards… but you may need to tweak and tune a bit to get it exactly where you need.
generifySQL.txt (3.7 KB)