AgensGraph: graph sampling with random selection

In the previous post, we did make ag_labels() function by #plpgsql. In this time, we will make ag_sample() function that do sampling graph-data by limit N on an edge basis. Also this function has the optional parameter of whether using random-selection.

PL/pgSQL is a loadable procedural language for the PostgreSQL database system. [ref] It adds control structures to the SQL language and can perform complex computations. In #AgensGraph, Cypher procedure can be written by PL/pgSQL. Graph queries become also more complex as to need pre-processing. (Finding sub-query what you want to look up, translating to another graph having different topology and relations, so on.) PL/pgSQL is a basic and powerful analysis tool to graph analysists.

This is a screenshot of executing ag_sample(100) function in #AgensBrowser. It’s very convenient to not have to write a query. There is no need to worry about the structure of graph-data.

Visualization of ag_sample(100) function with no random in AgensBrowser

This is a meta-graph of this result. You get to know that all labels exists.

Let’s list the procedure of a function. It’s a kind of long story. But step by step is easy.

  1. get current “graph_path” setting
  2. get edges’ labels list of graph_path
  3. get size of edges per edges’ label
  4. get total size of all edges and calculate size-limit per edges’ label
  5. create temp table for accumulating sampling graph-data
  6. insert sample-data into temp table as many as each own limit
  7. return temp table as result of function

If you write these procedures in plpgsql, you will see:

-- DROP FUNCTION if exists ag_sample();

CREATE OR REPLACE FUNCTION ag_sample(total_limit int, use_rand boolean default false)
  RETURNS table (s vertex, r edge, e vertex) AS $$
  graph_path text;
  cur record;
  edge_size int;
  edges_cnt int := 0;
  edges_sum bigint := 0;       
  -- get graph_path from user environment
  SELECT setting into graph_path FROM pg_settings WHERE name='graph_path';
  RAISE NOTICE 'graph_path => %', graph_path;

  -- if graph_path does not exists, return null 
  if graph_path is null then 
    return query SELECT WHERE false;
  end if;

  if total_limit  edges.la_size
  for cur in select * from __edges loop
    EXECUTE 'SELECT count(id) FROM '||cur.la_name INTO edge_size;
    update __edges set la_size = edge_size where la_oid = cur.la_oid;
    edges_sum := edges_sum + edge_size;
    edges_cnt := edges_cnt + 1;                     
  end loop;

  -- update edge's ratio by edges_sum
  RAISE NOTICE 'edges = %, total_size = %', edges_cnt, edges_sum;
  update __edges set 
    la_ratio = la_size::float4/edges_sum::float4,
    la_limit = case when floor(total_limit*la_size::float4/edges_sum::float4)  edges.la_size
  for cur in select * from __edges loop
    RAISE NOTICE '%[%] = %(%) => %', cur.la_name, cur.la_oid, cur.la_size, cur.la_ratio, cur.la_limit;
    if use_rand=false then
      EXECUTE 'insert into __samples with tmp as '
         || '(match (s)-[r:"'||cur.la_name||'"]->(e) return s,r,e limit '||cur.la_limit||' ) '
         || 'select * from tmp;';
      EXECUTE 'insert into __samples with tmp as '
         || '(match (s)-[r:"'||cur.la_name||'"]->(e) '
         || 'with s, r, e, random() as ridx order by ridx limit '||cur.la_limit||' '
         || 'return s, r, e) select * from tmp;';
    end if;
  end loop;

  return query select * from __samples;
$$ LANGUAGE plpgsql;

Let’s execute this function in pgAdmin4.

select * from ag_sample(100); -- same to ag_sample(100,false)

select * from ag_sample(100, true);   -- use random-selection

This is a screenshot of pgAdmin4 result.

result of ag_sample(100) function
result of ag_sample(100,true) function

In a case of using random-selection, every result would be changed by random. (for sorting random value, it takes more time)

Visualization of ag_sample(1000, true) in AgensBrowser

If you want include specific labels, use part of this code with your modification.

Have a good day.

답글 남기기

아래 항목을 채우거나 오른쪽 아이콘 중 하나를 클릭하여 로그 인 하세요: 로고

WordPress.com의 계정을 사용하여 댓글을 남깁니다. 로그아웃 /  변경 )

Google+ photo

Google+의 계정을 사용하여 댓글을 남깁니다. 로그아웃 /  변경 )

Twitter 사진

Twitter의 계정을 사용하여 댓글을 남깁니다. 로그아웃 /  변경 )

Facebook 사진

Facebook의 계정을 사용하여 댓글을 남깁니다. 로그아웃 /  변경 )

%s에 연결하는 중

%d 블로거가 이것을 좋아합니다: