In the previous post, we did make ag_labels() function by #plpgsql. In this time, we will make ag_sample() function that do sampling graph-data by limit N on an edge basis. Also this function has the optional parameter of whether using random-selection.
PL/pgSQL is a loadable procedural language for the PostgreSQL database system. [ref] It adds control structures to the SQL language and can perform complex computations. In #AgensGraph, Cypher procedure can be written by PL/pgSQL. Graph queries become also more complex as to need pre-processing. (Finding sub-query what you want to look up, translating to another graph having different topology and relations, so on.) PL/pgSQL is a basic and powerful analysis tool to graph analysists.
This is a screenshot of executing ag_sample(100) function in #AgensBrowser. It’s very convenient to not have to write a query. There is no need to worry about the structure of graph-data.
This is a meta-graph of this result. You get to know that all labels exists.
Let’s list the procedure of a function. It’s a kind of long story. But step by step is easy.
- get current “graph_path” setting
- get edges’ labels list of graph_path
- get size of edges per edges’ label
- get total size of all edges and calculate size-limit per edges’ label
- create temp table for accumulating sampling graph-data
- insert sample-data into temp table as many as each own limit
- return temp table as result of function
If you write these procedures in plpgsql, you will see:
-- DROP FUNCTION if exists ag_sample(); CREATE OR REPLACE FUNCTION ag_sample(total_limit int, use_rand boolean default false) RETURNS table (s vertex, r edge, e vertex) AS $$ DECLARE graph_path text; cur record; edge_size int; edges_cnt int := 0; edges_sum bigint := 0; BEGIN -- get graph_path from user environment SELECT setting into graph_path FROM pg_settings WHERE name='graph_path'; RAISE NOTICE 'graph_path => %', graph_path; -- if graph_path does not exists, return null if graph_path is null then return query SELECT WHERE false; end if; if total_limit edges.la_size for cur in select * from __edges loop EXECUTE 'SELECT count(id) FROM '||cur.la_name INTO edge_size; update __edges set la_size = edge_size where la_oid = cur.la_oid; edges_sum := edges_sum + edge_size; edges_cnt := edges_cnt + 1; end loop; -- update edge's ratio by edges_sum RAISE NOTICE 'edges = %, total_size = %', edges_cnt, edges_sum; update __edges set la_ratio = la_size::float4/edges_sum::float4, la_limit = case when floor(total_limit*la_size::float4/edges_sum::float4) edges.la_size for cur in select * from __edges loop RAISE NOTICE '%[%] = %(%) => %', cur.la_name, cur.la_oid, cur.la_size, cur.la_ratio, cur.la_limit; if use_rand=false then EXECUTE 'insert into __samples with tmp as ' || '(match (s)-[r:"'||cur.la_name||'"]->(e) return s,r,e limit '||cur.la_limit||' ) ' || 'select * from tmp;'; else EXECUTE 'insert into __samples with tmp as ' || '(match (s)-[r:"'||cur.la_name||'"]->(e) ' || 'with s, r, e, random() as ridx order by ridx limit '||cur.la_limit||' ' || 'return s, r, e) select * from tmp;'; end if; end loop; return query select * from __samples; END; $$ LANGUAGE plpgsql;
Let’s execute this function in pgAdmin4.
select * from ag_sample(100); -- same to ag_sample(100,false) select * from ag_sample(100, true); -- use random-selection
This is a screenshot of pgAdmin4 result.
In a case of using random-selection, every result would be changed by random. (for sorting random value, it takes more time)
If you want include specific labels, use part of this code with your modification.
Have a good day.