Welcome to the Tutorial of Probabilistic Logic Programming!

This tutorial is meant to introduce the reader to the amazing world of Probabilistic Logic Programming.

Introduction

Probabilistic Logic Programming (PLP) introduces probabilistic reasoning into Logic Programs in order to represent uncertain information. It has been receiving increased attention due to its applications, in particular in the field of Machine Learning.
In this tutorial we will show how to use cplint on SWISH, a web application for performing inference and learning on user-defined probabilistic logic programs. You will learn how to write a probabilistic logic program processable by cplint on SWISH, how to execute the different types of queries allowed by this application and how to perform learning.

cplint on SWISH is based on SWISH, a web framework for Logic Programming, and on cplint, a suite of programs for inference and learning with Logic Programs with Annotated Disjunctions (LPADs). It keeps the same syntax as cplint and, like cplint, it uses LPADs as the formalism to represent probabilistic logic programs.

The tutorial is structured as follows. The section Logic Program with Annotated Disjunction provides an introduction to LPADs. The section Writing an LPAD and Asking a Query with Exact Inference shows how to write an LPAD processable by cplint on SWISH and how to perform simple and conditional exact inference over LPADs. The sections Approximate Inference, Computing Expectations and Approximate Conditional Inference with Rejection Sampling and Metropolis-Hastings MCMC illustrate how to perform approximate inference over LPADs. The sections on unconditional and conditional queries over hybrid programs explain how to perform simple and conditional approximate inference over hybrid programs, i.e., programs where some of the random variables are continuous. The section Parameter Learning describes how to learn the parameters of an LPAD, whereas the section Structure Learning explains how to perform structure learning. Finally, the section How to Test a Program shows how to test a learned program.

Table of Contents


Logic Program with Annotated Disjunction

Definition

A Logic Program with Annotated Disjunction (LPAD) consists of a set of rules of the following form:

h_1 : a_1 ; ... ; h_n : a_n :- b_1, ..., b_m. 

where the h_i are atoms, the b_i are literals and the a_i are real numbers between 0 and 1 such that the sum of all a_i is 1. The set of elements h_i : a_i composes the head of a rule, while the set of b_i is the body. Disjunction in the head is represented with a semicolon and atoms in the head are separated from probabilities by a colon. If the head of a rule contains only one element h : 1, we can simply write this element as h, i.e. the clause takes the form of a normal Prolog clause. Therefore

h : 1 :- b_1, ..., b_m.

is equivalent to

h :- b_1, ..., b_m.

If the clause has an empty body, it can be represented like this:

h_1 : a_1 ; ... ; h_n : a_n. 

If the sum of all the a_i is smaller than 1, an extra disjunct null is assumed with probability 1 - sum(a_i). Therefore

h_1 : 0.5 ; h_2 : 0.2 :- b_1, ..., b_m. 

is equivalent to

null : 0.3 ; h_1 : 0.5 ; h_2 : 0.2 :- b_1, ..., b_m.
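
For instance, the following rule (a classic example in the style of Vennekens et al., see the References) states that if we toss a coin, it lands heads with probability 0.5 and tails with probability 0.5:

heads(Coin) : 0.5 ; tails(Coin) : 0.5 :- toss(Coin).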

Writing an LPAD and Asking a Query with Exact Inference

In this section the basic features of cplint on SWISH will be illustrated. You will learn:
  • how to write your first probabilistic logic program processable by cplint on SWISH that follows the syntax of LPADs;
  • how to submit a simple or a conditional query;
  • how to obtain the histogram of a query.

As an example, we consider a program modelling the fact that if somebody has the flu and the climate is cold, an epidemic or a pandemic may arise. We are uncertain about whether the climate is cold, but we know for sure that David and Robert have the flu.

Epidemic Example: Writing the Program Step by Step

The rule that we want to write is the one which states that, if somebody has the flu and the climate is cold, an epidemic arises with 60% probability, a pandemic arises with 30% probability, whereas we have a 10% probability that neither an epidemic nor a pandemic arises. We can write

epidemic : 0.6; pandemic : 0.3; null: 0.1 :- flu(_), cold.

As we said in Section Logic Program with Annotated Disjunction, the null atom can be implicit. Therefore the previous rule, without changing its meaning, can be written

epidemic : 0.6; pandemic : 0.3 :- flu(_), cold.

The following probabilistic fact says that the weather is cold with a 70% probability. Note that the null atom is implicit here as well.

cold : 0.7.

Now we assert that David and Robert have the flu.

flu(david).
flu(robert).

Epidemic Example: Full Program

The program is almost complete; what we need now is to load the library pita in order to perform exact inference (for further information about PITA see the References). Therefore we import this library with the built-in predicate use_module/1. So we need to write

:- use_module(library(pita)).

Moreover, after :- use_module(library(pita)) we need to write :- pita. in order to initialize the library, and the program should be enclosed by :- begin_lpad. and :- end_lpad. (at the beginning and at the end of the program, respectively). These goals are mandatory to initialize the inference system.

The full LPAD of this example is shown below.

% load the 'pita' library to perform inference
:- use_module(library(pita)).
:- pita.
% to be written before the program
:- begin_lpad.
epidemic : 0.6; pandemic : 0.3 :- flu(_), cold.
cold : 0.7.
flu(david).
flu(robert).
% to be written after the program
:- end_lpad.

How to Execute a Simple Query

Ok, we have our program, now what?! Now it’s time to submit some queries!

To query a program you must use the prob/2 predicate with the following syntax

prob(:Query:atom,-Probability:float).

Query is an atom that represents the query that we want to perform, while Probability is the variable that will contain the probability that Query is true (a float between 0 and 1).

For instance, let us ask for the probability that an epidemic arises. To know it we just have to submit the following query.

prob(epidemic, P).
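
For the program above, the answer is P = 0.588; how this value is obtained is explained at the end of this section.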

If you click on the example icon, a frame containing cplint on SWISH will open; you can then submit the query by clicking on the "Run" button. You can close the frame by clicking on the close icon.

How to Execute a Conditional Query

cplint on SWISH allows you to ask conditional queries with the predicate prob/3

prob(:Query:atom,:Evidence:atom,-Probability:float).

For instance, we can ask for the probability that an epidemic arises given that outside is cold

prob(epidemic, cold, P).
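
For the program above, the answer is P = 0.84: conditioning on cold removes the 0.7 factor contributed by the probabilistic fact cold : 0.7 (see the discussion at the end of this section).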

How to Execute a Query with Graphical Results

cplint on SWISH can show the probabilistic results of a query as histograms using the predicate bar/2.
The syntax is

bar(+Probability:float,-Chart:dict).

where Probability is the probability of the query and Chart is a dictionary that will contain a bar chart with two bars, one for the probability of the atom being true and one for the probability of it being false (1 - the true probability). After the submission of the query, a bar chart of the two values is plotted.

However, before submitting this kind of query, we need to specify that we want to use the renderer c3 by adding the following line before the :- begin_lpad. goal

:- use_rendering(c3).

Moreover, cplint on SWISH allows you to draw the Binary Decision Diagram (BDD) used to perform queries with exact inference. A BDD represents the explanations of a query.
The syntax is the following:

bdd_dot_string(:Query:atom,-DotString:string,-Var:list).

A solid edge indicates a 1-child, a dashed edge indicates a 0-child and a dotted edge indicates a negated 0-child. Each level of the BDD is associated with a variable of the form XI_J, indicated on the left: I indicates the multivalued variable index and J the index of the Boolean variable of I. The table Var contains the associations between the rule groundings and the multivalued variables: the first column contains the multivalued variable index, the second column contains the rule index, corresponding to its position in the program, and the last column contains the list of constants grounding the rule, each replacing a variable in the order of appearance in the rule.

However, before submitting bdd_dot_string/3, we have to specify that we want to use the renderers graphviz and table.

:- use_rendering(graphviz).
:- use_rendering(table,[header(['Multivalued variable index','Rule index','Grounding substitution'])]).

These rendering plugins are necessary to plot the BDD and to show the contents of Var as a table.

Therefore our program becomes

% load the 'pita' library to perform inference
:- use_module(library(pita)).
:- pita.
% allows to create graphical results
:- use_rendering(c3).
% the following renderers allow to correctly show the BDDs and the table of variable associations
:- use_rendering(graphviz).
:- use_rendering(table,[header(['Multivalued variable index','Rule index','Grounding substitution'])]).

% to be written before the program
:- begin_lpad.
epidemic : 0.6; pandemic : 0.3 :- flu(_), cold.
cold : 0.7.
flu(david).
flu(robert).
% to be written after the program
:- end_lpad.

Let us see the histogram of the first query of the example (simple inference)

prob(epidemic, P),bar(P,G).

If we want to see the histogram of the second query (conditional query)

prob(epidemic, cold, P),bar(P,G).

If we want to see the BDD of the first query of the example (simple inference)

bdd_dot_string(epidemic, BDD, Var).

This example shows that conclusions from different groundings of a rule are combined with a noisy or rule: the probability of an epidemic is obtained by combining with noisy or the conclusions of the two groundings of the rule, where the only variable is replaced by David or Robert. So the probability of an epidemic given that cold is true is 0.6+0.6-0.6*0.6 = 0.84. Since cold is itself true only with probability 0.7, the overall probability is 0.84*0.7 = 0.588.


Complete example: epidemic.pl


Approximate Inference

In this example we will show how to perform approximate inference with Monte Carlo sampling. In addition, as in the exact inference setting, it is possible to plot a histogram representing the probabilistic values.

To show these features we exploit the Markov chain example. We want to know how likely it is that an execution of a Markov chain starting from a start state (here s0) reaches a final state (here s3). The chains may be infinite, so the query may have an infinite number of explanations, and exact inference with PITA may not terminate. To ensure termination, we have two solutions. We may either fix a bound on the depth of the derivations of PITA by setting the parameters

:- set_pita(depth_bound,true).
:- set_pita(depth,<level of depth (integer)>).

(see exact inference variant of this example). Alternatively, MCINTYRE can be used.
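
For instance, the depth-bound approach might be instantiated as follows (the bound of 20 is only an illustrative value):

:- set_pita(depth_bound,true).
:- set_pita(depth,20).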

Here we will use the latter approach.

Markov Chain Example: Full Program

Below the full program of this example is shown

% load the library 'mcintyre' to perform approximate inference
:- use_module(library(mcintyre)).
% load the renderer 'c3' for graphical results
:- use_rendering(c3).
% initialize the library 'mcintyre'
:- mc.
% to be written before the program
:- begin_lpad.
reach(S, I, T) :-
  trans(S, I, U),
  reach(U, next(I), T).
reach(S, _, S).
trans(s0,S,s0):0.5; trans(s0,S,s1):0.3; trans(s0,S,s2):0.2.
trans(s1,S,s1):0.4; trans(s1,S,s3):0.1; trans(s1,S,s4):0.5.
trans(s4,_,s3).
% to be written after the program
:- end_lpad.

To execute queries we must use the predicate mc_prob/2.

mc_prob(:Query:atom,-Probability:float).

We ask for the probability that, starting at state s0 at instant 0, state s3 is reachable

mc_prob(reach(s0,0,s3),P).

If we want to see the probability histogram

mc_prob(reach(s0,0,s3),P),bar(P,G).

Sampling a Query

With MCINTYRE, you can also take a given number of samples with

mc_sample(:Query:atom,+Samples:int,-Probability:float).

For example this query

mc_sample(reach(s0,0,s3),1000,P).

samples reach(s0,0,s3) 1000 times and returns in P the estimated probability.
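
For instance, if the query succeeds in 430 of the 1000 sampled worlds, P will be 430/1000 = 0.43.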

We can obtain a bar chart of the samples with the predicate bar/2 (note: remember to load the renderer c3):

mc_sample(reach(s0,0,s3),1000,P),bar(P,G).

We can also sample arguments of queries with the predicate mc_sample_arg/4

mc_sample_arg(:Query:atom,+Samples:int,?Arg:var,-Values:list).

The predicate samples Query a number of Samples times. Arg should be a variable in Query. The predicate returns in Values a list of couples L-N where L is the list of values of Arg for which Query succeeds in a world sampled at random and N is the number of samples returning that list. If L is the empty list, it means that for that sample the query failed. If L is a list with a single element, it means that for that sample the query is determinate. If, in all couples L-N, L is a list with a single element, it means that the clauses in the program are mutually exclusive, i.e., that in every sample, only one clause for each subgoal has the body true.

So for example we may sample the argument S of reach(s0,0,S) with

mc_sample_arg(reach(s0,0,S),50,S,Values).

If we want to see the bar graph of this sampling we use the predicate argbar/2

argbar(+List:list,-Chart:dict).

For example

mc_sample_arg(reach(s0,0,S),50,S,List),argbar(List,Chart).

Moreover, we can sample arguments of queries with the predicate mc_sample_arg_first/4

mc_sample_arg_first(:Query:atom,+Samples:int,?Arg:var,-Values:list)

that returns in Values a list of couples V-N where V is the value of Arg returned as the first answer by Query in a world sampled at random and N is the number of samples returning that value. V is failure if the query fails. mc_sample_arg_first/4 differs from mc_sample_arg/4 because the former computes only the first answer of the query for each sampled world.

So for example we may sample 50 times the first answer for S in reach(s0,0,S) with

mc_sample_arg_first(reach(s0,0,S),50,S,Values).

Complete example: markov_chain.pl


Computing Expectations

With cplint on SWISH it is possible to compute the expectation of a random variable.
In this example we want to perform model checking of the Synchronous Leader Election Protocol expressed in Probabilistic Computation Tree Logic (PCTL).
Given a synchronous ring of N processes the Synchronous Leader Election Protocol is used to elect a leader (a uniquely designated processor) by sending messages around the ring. The protocol proceeds in rounds and is parametrised by a constant K. Each round begins by all processors (independently) choosing a random number (uniformly) from {1,…,K} as an id. The processors then pass their ids around the ring. If there is a unique id, then the processor with the maximum unique id is elected the leader, and otherwise the processors begin a new round.

Synchronous Leader Election Protocol example: full program

The full program of this example is

:- use_module(library(mcintyre)).

:- if(current_predicate(use_rendering/1)).
:- use_rendering(c3).
:- endif.
:- dynamic kr/1,num/1.
:- mc.

:- begin_lpad.

% State Formulae 
models(_S, tt,_Hist,_Limit,_Time).
models(S, prop(P),_Hist,_Limit,_Time) :-
    proposition(P, S).
models(S, and(F1, F2),Hist,Limit,Time) :-
    models(S, F1,Hist,Limit,Time), models(S, F2,Hist,Limit,Time).
models(S, or(F1, _F2),Hist,Limit,Time) :-
    models(S, F1,Hist,Limit,Time).
models(S, or(F1, F2),Hist,Limit,Time) :-
    \+ models(S, F1,Hist,Limit,Time),
    models(S, F2,Hist,Limit,Time).
models(S, not(F), Hist,Limit,Time) :-
    \+ models(S, F,Hist,Limit,Time).
models(S, prob_until(comp(Op, P), F1, F2),Hist,Limit,Time) :-
    mc_sample(pmodels(S, until(F1, F2),Hist,Limit,Time),20, Q),
    comp(Q, Op, P).
models(S, prob_next(comp(Op, P), F),Hist,Limit,Time) :-
    mc_sample(pmodels(S, next(F),Hist,Limit,Time),20, Q),
    comp(Q, Op, P).

comp(Q,>,P):-
  Q>P.

comp(Q,>=,P):-
  Q>=P.

comp(Q,<,P):-
  Q<P.

comp(Q,=<,P):-
  Q=<P.


% Path Formulae
pmodels(S,F):-
  pmodels(S,F,[],nolimit,0,_Time).

pmodels(S,F,Hist,Limit,Time):-
  pmodels(S,F,Hist,Limit,Time,_Time).

pmodels(S, until(_F1, F2),Hist,Limit,Time,Time) :-
    models(S, F2,Hist,Limit,Time),!.
    
pmodels(S, until(F1, F2),Hist0,Limit,Time0,Time) :-
    within_limit(Time0,Limit),
    models(S, F1,Hist0,Limit,Time0),
    ctrans(S, _, T, Hist0, Hist),!,
    Time1 is Time0+1,
    pmodels(T, until(F1,F2),Hist,Limit,Time1,Time).

pmodels(S, next(F), Hist,Limit,Time0,Time) :-
    within_limit(Time0,Limit),
    ctrans(S, _, T, Hist, _),!,
    Time is Time0+1,
    models(T, F,Hist,Limit,Time).

within_limit(_Time,nolimit):-!.

within_limit(Time,Limit):-
  Time<Limit.

bounded_eventually(Prop,Rounds):-
  num(N),
  B is Rounds*(N+1),
  eventually(Prop,B,_T).

eventually(Prop):-
  eventually(Prop,_T).

eventually(Prop,Rounds):-
  eventually(Prop,nolimit,T),
  num(N),
  Rounds is T/(N+1).

eventually(Prop,Limit,T) :-
    init(S),
    pctlspec(Prop, F),
    pmodels(S, F,[],Limit,0,T).


pctlspec(X, until(tt, prop(X))).
proposition(P, S) :- final(P, S).

final(elect, [_|L]) :-
    num(N),
    gen_elected_state(N, L).

gen_elected_state(J, L) :-
    (J==0
    ->    L=[]
    ;     L = [state(3,_,_,_)|Rest],
          J1 is J-1,
          gen_elected_state(J1,Rest)
    ).
    

% transitions
% module counter
% [read] c<N-1 -> (c'=c+1);
% reading
trans(counter, counter(C), read, counter(D),_S,H,H) :-
  num(N),
  C < N-1,
  D is C+1.

% [read] c=N-1 -> (c'=c);
% finished reading
trans(counter, counter(C), read, counter(C),_S,H,H) :-
  num(N),
  C =:= N-1.

% [done] u1 | u2 | u3 | u4 -> (c'=c);
% done
trans(counter, counter(C), done, counter(C),S,H,H) :-
  get_processid(P), 
  nonlocal(process(P,_), uniqueid, 1,S).
   

% [retry] !(u1 | u2 | u3 | u4) -> (c'=1);
% pick again reset counter 
trans(counter, counter(_C), retry, counter(1),S,H,H) :-
        findall(P,get_processid(P),PL),
    maplist(nl(S),PL).

% [loop] s1=3 -> (c'=c);
% loop (when finished to avoid deadlocks)
trans(counter, counter(C), loop, counter(C),S,H,H) :-
  nonlocal(process(1,_), state, 3,S).

% module process
% local state
% s1=0 make random choice
% s1=1 reading
% s1=2 deciding
% s1=3 finished

% [pick] s1=0 -> 1/K : (s1'=1) & (p1'=0) & (v1'=0) & (u1'=true) + ...;
% pick value
trans(process(_N,_Next), state(0,_,_,_), pick, state(1,1,R,R),_S,H,[pick(R)|H]) :-
  pick(H,R).

%read 
% [read] s1=1 &  u1 & !p1=v2 & c<N-1 -> (u1'=true) & (v1'=v2);
trans(process(_N,Next), state(1,1,_,P), read, state(1,1,V,P),S,H,H) :-
  nonlocal(counter, counter, C,S),
  num(CN),
  C < CN - 1,
  nonlocal(process(Next,_), value, V,S),
  P \== V.

% [read] s1=1 &  u1 &  p1=v2 & c<N-1 -> (u1'=false) & (v1'=v2) & (p1'=0);
trans(process(_N,Next), state(1,1,_,P), read, state(1,0,V,0),S,H,H) :-
  nonlocal(counter, counter, C,S),
  num(CN),
  C < CN - 1,
  nonlocal(process(Next,_), value, V,S),
  P == V.

% [read] s1=1 & !u1 &  c<N-1 -> (u1'=false) & (v1'=v2);
trans(process(_N,Next), state(1,0,_,P), read, state(1,0,V,P),S,H,H) :-
  nonlocal(counter, counter, C,S),
  num(CN),
  C < CN - 1,
  nonlocal(process(Next,_), value, V,S).
 
% read and move to decide 
% [read] s1=1 &  u1 & !p1=v2 & c=N-1 -> (s1'=2) & (u1'=true) & (v1'=0) & (p1'=0);
trans(process(_N,Next), state(1,1,_,P), read, state(2,1,0,0),S,H,H) :-
  nonlocal(counter, counter, C,S),
  num(CN),
  C =:= CN - 1,
  nonlocal(process(Next,_), value, V,S),
  P \== V.

% [read] s1=1 &  u1 &  p1=v2 & c=N-1 -> (u1'=false) & (v1'=0) & (p1'=0);
trans(process(_N,Next), state(1,1,_,P), read, state(2,0,0,0),S,H,H) :-
  nonlocal(counter, counter, C,S),
  num(CN),
  C =:= CN - 1,
  nonlocal(process(Next,_), value, V,S),
  P == V.

% [read] s1=1 & !u1 &  c=N-1 -> (s1'=2) & (u1'=false) & (v1'=0);
trans(process(_N,_Next), state(1,0,_,P), read, state(2,0,0,P),S,H,H) :-
  nonlocal(counter, counter, C,S),
  num(CN),
  C =:= CN - 1.

% done
% [done] s1=2 -> (s1'=3) & (u1'=false) & (v1'=0) & (p1'=0);
trans(process(_N,_Next), state(2,_,_,_), done, state(3,0,0,0),_S,H,H).

% retry
% [retry] s1=2 -> (s1'=0) & (u1'=false) & (v1'=0) & (p1'=0);
trans(process(_N,_Next), state(2,_,_,_), retry, state(0,0,0,0),_S,H,H).

% loop (when finished to avoid deadlocks)
% [loop] s1=3 -> (s1'=3);
trans(process(_N,_Next), state(3,U,V,P), loop, state(3,U,V,P),_S,H,H).

pick(H,V):-
  kr(K),
  K1 is K-1,
  PH is 1/K,
  findall(I,between(0,K1,I),L),
  foldl(pick_value(H,PH),L,(1,_),(_,V)).

pick_value(_H,_PH,_I,(P0,V0),(P0,V0)):-
  nonvar(V0).

pick_value(H,PH,I,(P0,V0),(P1,V1)):-
  var(V0),
  PF is PH/P0,
  (pick_fact(H,V0,PF)->
    P1=PF,
    V1=I
  ;
    P1 is P0*(1-PF),
    V1=V0
  ).

pick_fact(_,_,P):P.

%pick(H,0):0.5; pick(H,1):0.5.

ctrans(S, A, T, Hi, Ho) :-
    config(P),
    find_matching_trans(P,S,S,[],A,T,Hi,Ho).

find_matching_trans([], [], _CS, _PA, A, [], H,H) :- nonvar(A).
find_matching_trans([P|Ps], [S|Ss], CS, PA, A, [T|Ts],Hi,Ho) :-
    pick_trans(P, S, CS, PA, A, T, Hi,H1),
    find_matching_trans(Ps, Ss, CS, PA, A, Ts,H1,Ho).
find_matching_trans([P|Ps], [S|Ss], CS, PA, A, [S|Ts], Hi,Ho) :-
    % skip current process; but then all transitions enabled in the current
    % process will be prohibited for selection in later processes.
    enabled_trans(P,L),
    (nonvar(A) -> \+ member(A,L); true),
    append(L, PA, PA1),
    find_matching_trans(Ps, Ss, CS, PA1, A, Ts, Hi, Ho).

pick_trans(P, S, CS, PA, A, T, H0,H) :-
    etrans(P, S, PA, A, T,CS, H0,H).

etrans(P, S, PA, A, T, CS,H0,H) :-
    trans(P, S, A, T,CS,H0,H),
    A \= epsilon,
    \+ member(A, PA).

enabled_trans(P, L) :-
    setof(A, enabled_trans_in_process(P,A), L).

enabled_trans_in_process(P,A) :-
    clause(trans(P,_,A,_,_,_,_),_),
    A \= epsilon.

nonlocal(Proc, Var, Val,CS) :-
    getpid(Proc, Var, Pid, Idx),
    nth1(Pid, CS, State),
    arg(Idx, State, Val).
%   writeln(nonlocal_read(Proc, State, Var, Val)).

getpid(Proc, Var, Pid, Idx) :-
    config(Config),
    nth1(Pid, Config, Proc),
    nonlocal_access(Proc, Var, Idx).

get_processid(P):-
  num(N),
  between(1,N,P).

config([counter| L]) :-
    findall(P,get_processid(P),PL),
    maplist(neighbor,PL,L).

neighbor(I,process(I,J)) :-
    num(N),
    (I<N
  ->
      J is I+1
    ;   J = 1
    ).

%config([counter, process(1,2), process(2,3), process(3,4), process(4,1)]).

init(S) :-
    config(P),
    maplist(init_state,P,S).

init_state(counter, counter(1)).
init_state(process(_,_), state(0,0,0,0)).

nonlocal_access(counter, counter, 1).
nonlocal_access(process(_,_), state, 1).
nonlocal_access(process(_,_), uniqueid, 2).
nonlocal_access(process(_,_), value, 3).

nl(S,P):-
  nonlocal(process(P, _), uniqueid, 0,S).

num(4).
kr(4).


:- end_lpad.

graph_prob(G):-
  retract(num(N)),
  retract(kr(K)),
  assert(num(4)),
  assert(kr(2)),
  findall(L-P,
    (between(1,6,L),mc_sample(bounded_eventually(elect,L),100,P)),LV),
  G=c3{data:_{x:x, rows:[x-'Probability of leader elected within L rounds (N=4, K=2)'|LV]},%legend:_{show: false},
    axis:_{x:_{min:1,max:6,label:'L',padding:_{bottom:0.0,top:0.0}},
           y:_{label:'Probability',padding:_{bottom:0.0,top:0.0}}}},
  retract(num(4)),
  retract(kr(2)),
  assert(num(N)),
  assert(kr(K)).

graph_exp_rounds_n(G):-
  retract(num(NI)),
  retract(kr(KI)),
  assert(kr(3)),
  findall(N-E,
    (between(3,8,N),
     assert(num(N)),
     mc_expectation(eventually(elect,T),100,T,E),
     retract(num(N))),
    LV),
  G=c3{data:_{x:x, rows:[x-'Expected rounds to elect a leader (K=3)'|LV]},%legend:_{show: false},
    axis:_{x:_{min:3,max:8,label:'N',padding:_{bottom:0.0,top:0.0}},
           y:_{label:'Expected rounds',padding:_{bottom:0.0,top:0.0}}}},
  retract(kr(3)),
  assert(num(NI)),
  assert(kr(KI)).

graph_exp_rounds_k(G):-
  retract(num(NI)),
  retract(kr(KI)),
  assert(num(3)),
  findall(K-E,
    (between(3,8,K),
     assert(kr(K)),
     mc_expectation(eventually(elect,T),500,T,E),
     retract(kr(K))),
    LV),
  G=c3{data:_{x:x, rows:[x-'Expected rounds to elect a leader (N=3)'|LV]},%legend:_{show: false},
    axis:_{x:_{min:3,max:8,label:'K',padding:_{bottom:0.0,top:0.0}},
           y:_{label:'Expected rounds',padding:_{bottom:0.0,top:0.0}}}},
  retract(num(3)),
  assert(num(NI)),
  assert(kr(KI)).

How to Compute Expectations

We consider the problem of computing expectations. We would like to compute the expected number of rounds needed to elect a leader.
We can compute expectations with

mc_expectation(:Query:atom,+N:int,?Arg:var,-Exp:float).

that computes the expected value of Arg in Query by sampling. It takes N samples of Query and sums up the value of Arg for each sample. The overall sum is divided by N to give Exp.
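
For instance, if N is 3 and the sampled values of Arg are 10, 11 and 12, then Exp = (10+11+12)/3 = 11.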

An example of use of the above predicate is

mc_expectation(eventually(elect,T),1000,T,E).

that returns in E the expected number of rounds necessary to elect a leader, computed by taking 1000 samples.


Complete example: pctl_slep.pl


Approximate Conditional Inference with Rejection Sampling and Metropolis-Hastings MCMC

In this example we want to show how to perform conditional inference in an approximate way using sampling. In particular, we will show how to use rejection sampling and Metropolis-Hastings.

This example generatively defines a random arithmetic function. The problem is to predict the value returned by the function given one or two input-output pairs, i.e., to compute a conditional probability. This program is translated from an example written in the Church functional probabilistic programming language (see the References). Sampling is necessary as queries have an infinite number of explanations.

Random Arithmetic Function: Full Program

The full program of this example is

:- use_module(library(mcintyre)).

:- if(current_predicate(use_rendering/1)).
:- use_rendering(c3).
:- endif.

:- mc.

:- begin_lpad.

eval(X,Y):-
  random_fn(X,0,F),
  Y is F.

op(+):0.5;op(-):0.5.

random_fn(X,L,F):-
  comb(L),
  random_fn(X,l(L),F1),
  random_fn(X,r(L),F2),
  op(Op),
  F=..[Op,F1,F2].

random_fn(X,L,F):-
  \+ comb(L),
  base_random_fn(X,L,F).

comb(_):0.3.

base_random_fn(X,L,X):-
  identity(L).

base_random_fn(_X,L,C):-
  \+ identity(L),
  random_const(L,C).

identity(_):0.5.

random_const(L,0):0.1;random_const(L,1):0.1;random_const(L,2):0.1;
random_const(L,3):0.1;random_const(L,4):0.1;random_const(L,5):0.1;
random_const(L,6):0.1;random_const(L,7):0.1;random_const(L,8):0.1;
random_const(L,9):0.1.

:- end_lpad.

We know that the random function returns 3 for input 1 and we want to compute the probability that it returns 4 for input 2. We thus need to ask a conditional query and sampling is necessary as queries have an infinite number of explanations.
The simplest approach is to use rejection sampling: you first query the evidence and, if the query is successful, query the goal in the same sample; otherwise the sample is discarded.
This can be done with

mc_rejection_sample(:Query:atom,:Evidence:atom,+Samples:int,
  -Probability:float,+Options:list).

that takes Samples samples of Query given that Evidence is true.

Options is a list of options; the following are recognised:

  • successes(-Successes:int) Number of successes
  • failures(-Failures:int) Number of failures

An example of use of the above predicate is

mc_rejection_sample(eval(2,4),eval(1,3),1000,P,[]).

that performs rejection sampling of eval(2,4) given that eval(1,3) is true.

Differently from exact inference, in approximate inference the evidence can be a conjunction of atoms; so if you also know that eval(0,2) is true, you can use

mc_rejection_sample(eval(2,4),(eval(0,2),eval(1,3)),1000,P,[]).

and, as you can see, the query with more evidence is now almost certainly true.

In Metropolis-Hastings MCMC, a Markov chain is produced using the algorithm of Nampally and Ramakrishnan (see the References): after a sample, a number of sampled probabilistic choices are deleted and the others are retained for the next sample. The sample is accepted with probability min{1,N0/N1}, where N0 is the number of choices sampled in the previous sample and N1 is the number of choices sampled in the current sample. Metropolis-Hastings is usually much faster than rejection sampling because fewer samples are discarded.
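
The acceptance rule can be summarised by the following minimal Prolog sketch (the predicate name is ours; this is not the actual code of the mcintyre library):

% accept(N0,N1) succeeds with probability min{1, N0/N1}, where N0 and N1
% are the numbers of choices sampled in the previous and current sample
accept(N0, N1) :-
    AcceptProb is min(1, N0/N1),
    random(R),                     % R is uniform in [0,1)
    R < AcceptProb.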

To use Metropolis-Hastings, the following predicate is available

mc_mh_sample(:Query:atom,:Evidence:atom,+Samples:int,
  -Probability:float,+Options:list).

where Options is a list of options; the following are recognised:

  • mix(+Mix:int) The first Mix samples are discarded (mixing time), default value 0
  • lag(+Lag:int) lag between each sample: Lag sampled choices are forgotten, default value 1
  • successes(-Successes:int) Number of successes
  • failures(-Failures:int) Number of failures
  • bar(-BarChart:dict) BarChart is a bar chart of the possible values

For example

mc_mh_sample(eval(2,4),eval(1,3),10000,P,[]).

takes 10000 accepted samples and returns in P the estimated probability.

Sampling the Arguments

You can sample arguments of queries with rejection sampling and Metropolis-Hastings MCMC using
mc_rejection_sample_arg(:Query:atom,:Evidence:atom,+Samples:int,
  ?Arg:Var,-Values:list,+Options:list).
mc_mh_sample_arg(:Query:atom,:Evidence:atom,+Samples:int,
  ?Arg:Var,-Values:list,+Options:list).

that return the distribution of values for Arg in Query over Samples samples of Query, given that Evidence is true. The predicates return in Values a list of couples L-N where L is the list of values of Arg for which Query succeeds in a world sampled at random where Evidence is true and N is the number of samples returning that list of values.

For example, if we want to sample argument Y of eval(2,Y) given that eval(1,3) is true with Metropolis-Hastings MCMC

mc_mh_sample_arg(eval(2,Y),eval(1,3),1000,Y,V,[]).

The printed results are pretty ugly. To visualize the results of sampling arguments as a bar chart you can use argbar/2.

If we want to plot the bar chart of the previous query

mc_mh_sample_arg(eval(2,Y),eval(1,3),1000,Y,L,[]),argbar(L,C).

Computing Conditional Expectations

You can also compute conditional expectations with

mc_mh_expectation(:Query:atom,:Evidence:atom,+N:int,?Arg:var,-Exp:float,+Options:list).

as in

mc_mh_expectation(eval(2,Y),eval(1,3),1000,Y,E,[]).

that computes the expectation of argument Y of eval(2,Y) given that eval(1,3) is true by taking 1000 samples using Metropolis-Hastings MCMC.


Complete example: arithm.pl


Unconditional Queries over Hybrid Programs

Up to now we have considered only discrete random variables and discrete probability distributions. Here we consider Hybrid Probabilistic Logic Programs, where some of the random variables are continuous.

How can we consider continuous random variables and probability density functions, for example real variables following a Gaussian distribution?

cplint on SWISH with its sampling inference module allows the specification of density functions over arguments of atoms in the head of rules. To specify a probability density on an argument Var of an atom A you can use rules of the form

A : Density :- Body.

where Density is a special atom identifying a probability density on variable Var and Body (optional) is a regular clause body. The allowed Density atoms are:

  • uniform(Var,L,U): Var is uniformly distributed in \([L,U]\)

  • gaussian(Var,Mean,Variance): Var follows a Gaussian distribution with mean Mean and variance Variance

  • dirichlet(Var,Par): Var is a list of real numbers following a Dirichlet distribution with parameters \(\alpha\) specified by the list Par

  • gamma(Var,Shape,Scale): Var follows a gamma distribution with shape parameter Shape and scale parameter Scale.

  • beta(Var,Alpha,Beta): Var follows a beta distribution with parameters Alpha and Beta.

This syntax can also be used to describe discrete distributions, with either a finite or countably infinite support:

  • discrete(Var,D) or finite(Var,D): A is an atom containing variable Var and D is a list of couples Value:Prob assigning probability Prob to Value

  • uniform(Var,D): A is an atom containing variable Var and D is a list of values, each taking the same probability (1 over the length of D).

  • poisson(Var,Lambda): Var follows a Poisson distribution with parameter Lambda.

For example

g(X) : gaussian(X,0,1).

states that argument X of g(X) follows a Gaussian distribution with mean 0 and variance 1.
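
The discrete forms can be used in the same way; for instance, the following sketch, based on the uniform(Var,D) form above, states that argument X of die(X) takes each value from 1 to 6 with probability 1/6:

die(X) : uniform(X,[1,2,3,4,5,6]).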

In this section we will see how to perform simple queries over hybrid programs.
If an atom encodes a continuous random variable (such as g(X) above), asking the probability that a ground instantiation, such as g(0.3), is true is not meaningful, as the probability that a continuous random variable takes a specific value is always 0. In this case you are more interested in computing the distribution of X in a goal g(X), possibly after having observed some evidence.

Sampling the Arguments of Unconditional Queries over Hybrid Programs

If the query is unconditional, we can use approximate inference with Monte Carlo sampling as described in the Approximate Inference section. When you have continuous random variables, you may be interested in sampling arguments of goals representing continuous random variables. In this way you can build a probability density of the sampled argument. To do that you can use the predicate mc_sample_arg/4.
As an example let us consider the following program.

Mixture of two Gaussians: Full Program

A biased coin is thrown: if it lands heads, X in mix(X) is sampled from a Gaussian with mean 0 and variance 1; if it lands tails, X is sampled from a Gaussian with mean 5 and variance 2.

:- use_module(library(mcintyre)).
:- use_rendering(c3).
:- mc.
:- begin_lpad.
% a coin is thrown. The coin is biased: with probability 0.6 it lands heads,
% with probability 0.4 it lands tails
heads:0.6;tails:0.4. 
% X in g(X)  follows a Gaussian distribution with mean 0 and variance 1
g(X): gaussian(X,0,1).
% X in h(X)  follows a Gaussian distribution with mean 5 and variance 2
h(X): gaussian(X,5,2).
% if the coin lands heads, X in mix(X) is given by g(X)
mix(X) :- heads, g(X).
% if the coin lands tails, X in mix(X) is given by h(X)
mix(X) :- tails, h(X).
:- end_lpad.

For example we can now perform the query

mc_sample_arg(mix(X),1000,X,Values).

Values will contain a list of couples L-N where L is the list of values of X for which the query succeeds in a world sampled at random and N is the number of samples returning L.
Notice that, in every couple L-N, L will contain just one element and N will always be 1. This is because the random variable X is continuous and mix(X) always succeeds exactly once; therefore the predicate mc_sample_arg/4 will sample 1000 different worlds and every world will have a different value for X.
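
Values will therefore have a shape like the following (the numbers are purely illustrative):

[[-0.12]-1, [4.98]-1, [0.73]-1, ...]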

Of course, watching the resulting values as a list of numerical values could make you feel the pesky sensation of your eyes melting.
It would be better if we could use some kind of graphical feature to represent the results; in other words, we want to draw the probability density function of the argument. cplint on SWISH provides the predicate histogram/3, which comes in handy in these situations.

histogram(+List:list,-Chart:dict,+Options:list).

where Options is a list of options, the following are recognised:

  • min(+Min:float) the minimum value of domain, default value the minimum in List
  • max(+Max:float) the maximum value of domain, default value the maximum in List
  • nbins(+NBins:int) the number of bins for dividing the domain, default value 40

This predicate draws a vertical histogram of the samples in List dividing the domain in NBins bins. List must be a list of couples of the form L-W where L is a list of sampled values and W is its weight.
So we can use the output list Values of mc_sample_arg/4 as input list List for histogram/3. In our example, if we want to plot the probability density function of X, we can use

mc_sample_arg(mix(X),1000,X,_Values), histogram(_Values,Chart,[]).

Note: we wrote _Values instead of Values because we are not interested in printing the results of the sampling.


Complete example: gaussian_mixture.pl


Conditional Queries over Hybrid Programs

cplint on SWISH also allows you to execute conditional queries over hybrid programs.

Sampling the Arguments of Conditional Queries over Hybrid Programs

As in the previous section we are interested in sampling arguments of goals representing continuous random variables and drawing a probability density of the sampled argument.
To perform this kind of query we must distinguish three cases depending on what type of evidence we have:

  1. The evidence does not contain atoms with continuous random variables (the probability of evidence is different from 0).
  2. The evidence contains atoms with continuous random variables, but its probability is not zero.
  3. The evidence contains the grounding of atoms with continuous random variables (its probability is 0).
For the first two cases you can use the predicates mc_rejection_sample_arg/6 and mc_mh_sample_arg/6, the predicates for rejection sampling and Metropolis-Hastings MCMC (see the section Approximate Conditional Inference with Rejection Sampling and Metropolis-Hastings MCMC).

Case 1

Let us consider the same program as in the previous section.
We want to take 1000 samples of X in mix(X) given that heads is true, using rejection sampling and Metropolis-Hastings MCMC

mc_rejection_sample_arg(mix(X),heads,1000,X,Values,[]).
mc_mh_sample_arg(mix(X),heads,1000,X,Values,[lag(2)]).

To plot the distribution of the arguments we can use histogram/3.

mc_rejection_sample_arg(mix(X),heads,1000,X,_V), histogram(_V,Chart,[]).
mc_mh_sample_arg(mix(X),heads,1000,X,_V,[lag(2)]), histogram(_V,Chart,[]).

Case 2

Let us consider the same program as in the previous section.
We want to take 1000 samples of X in mix(X) given that X>2 is true, using rejection sampling, and draw a histogram with 40 bins representing the probability density of X. To do that we can ask the query

mc_rejection_sample_arg(mix(X),(mix(Y),Y>2),1000,X,_V,[]), histogram(_V,Chart,[]).

We can do the same by using Metropolis-Hastings MCMC

mc_mh_sample_arg(mix(X),(mix(Y),Y>2),1000,X,_Values,[lag(2)]), 
  histogram(_Values,Chart,[]).

Case 3

When you have evidence on ground atoms that have continuous values as arguments, you can use neither rejection sampling nor Metropolis-Hastings, as the probability of the evidence is 0; you need to use likelihood weighting to obtain samples of continuous arguments of a goal.
Let us take as an example the following program.

Posterior estimation in Bayesian models: Full Program

We are trying to estimate the true value of a Gaussian distributed random variable, given some observed data. The variance is known (2) and we suppose that the mean has a Gaussian distribution with mean 1 and variance 5. We take different measurements (e.g. at different times), indexed by an integer. Given that we observe 9 and 8 at indexes 1 and 2, how does the distribution of the random variable (the value at index 0) change with respect to the case of no observations?

:- use_module(library(mcintyre)).
:- use_rendering(c3).
:- mc.
:- begin_lpad.
% at time I we see X sampled from a Gaussian with mean M and variance 2.0
val(I,X) :- 
  mean(M),
  measurement(I,M,X).
% Gaussian distribution of the mean of the Gaussian of the variable
mean(M): gaussian(M,1.0, 5.0).
% Gaussian distribution of the variable
measurement(_,M,X): gaussian(X,M, 2.0).
:- end_lpad.

As in the previous cases we are interested in finding the posterior probability density function of an argument. To do that we can use the predicate

mc_lw_sample_arg(:Query:atom,:Evidence:atom,+N:int,?Arg:var,-ValList).

This predicate returns in ValList a list of couples V-W where V is a value of Arg for which Query succeeds and W is the weight computed by likelihood weighting according to Evidence (a conjunction of atoms is allowed here).
For example, given that we observe 9 and 8 at indexes 1 and 2, what is the distribution of the random variable (the value at index 0)?

mc_lw_sample_arg(val(0,X),(val(1,9),val(2,8)),1000,X,V).

Of course it would be better if we could plot the results in a graph, instead of just printing them (aah! my eyes!).
We cannot use histogram/3 because it takes as input a list of couples L-W where L is a list of values, whereas mc_lw_sample_arg/5 returns a list of couples V-W where V is a single value. In this case you can use densities/4

densities(+PriorList:list,+PostList:list,-Chart:dict,+Options:list)

This predicate draws a line chart of the density of two sets of samples, usually prior and post observations. The samples from the prior are in PriorList as couples L-W, where L is a list of values and W is its weight, while the samples from the posterior are in PostList as couples V-W where V is a value and W its weight. The same options as for histogram/3 are recognised. The lines are drawn dividing the domain into the number of bins provided as an option.
If we want to plot the densities of the random variable before and after observing 9 and 8, taking 1000 samples and dividing the domain into 40 bins, we can run

mc_sample_arg(val(0,Y),1000,Y,_V0), 
mc_lw_sample_arg(val(0,X),(val(1,9),val(2,8)),1000,X,_V),
densities(_V0,_V,Chart,[]).

Complete examples: gaussian_mixture.pl, gauss_mean_est.pl


Parameter Learning

In this section we will see how to learn the parameters given a background knowledge and an initial program. We take into account the Machines dataset.

Machines Example: Writing the Program Step by Step

To execute a learning algorithm the input source, a .pl file, must be divided into five parts:

  • preamble,
  • background knowledge, i.e., knowledge valid for all interpretations,
  • initial LPAD program for which you want to learn the parameters (optional),
  • language bias information,
  • example interpretations.

Here we will define a program step by step and then execute the algorithm EMBLEM, which learns the parameters of a given initial LPAD program.

For more information on how to perform learning see the cplint on SWISH manual (PDF version).

Preamble

In order to perform both EMBLEM and SLIPCOVER you need to load the library slipcover with the command

:- use_module(library(slipcover)).

After that you have to initialize slipcover with the command

:- sc.

Background Knowledge

Now that we have defined the preamble, we can specify the background knowledge with a fact of the form

bg(<list of terms representing clauses>).
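
For instance, a fragment of the background knowledge below could equivalently be given in the bg/1 form (a sketch; note the parentheses around each clause term):

bg([(component(C) :- replaceable(C)),
    (component(C) :- not_replaceable(C)),
    replaceable(gear)]).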

Alternatively, we can specify a set of clauses by including them in a section between the goals :- begin_bg. and :- end_bg.. We will use the latter approach.

:- begin_bg.
component(C):-
  replaceable(C).
component(C):-
  not_replaceable(C).
replaceable(gear).
replaceable(wheel).
replaceable(chain).
not_replaceable(engine).
not_replaceable(control_unit).
not_worn(C):-
  component(C),
  \+ worn(C).
one_worn:-
  worn(_).
none_worn:-
  \+ one_worn.
:- end_bg.

Initial Program

At this point we can define an initial program for which we want to learn the parameters. We can do it with a fact of the form

in(<list of terms representing clauses>).

Alternatively, you can specify an input program in a section between :- begin_in. and :- end_in.. We will use the latter method. Therefore in our example

:- begin_in.
class(sendback):0.5 :-
  worn(A),
  not_replaceable(A).

class(fix):0.6 :-
  worn(A),
  replaceable(A).

class(ok):0.5 :-
  not_worn(_A).
:- end_in.

Language Bias

The language bias part contains the declarations of the input and output predicates.

The typical input for EMBLEM will be a set of interpretations, i.e. sets of ground facts. Among the predicates for the input facts the user has to indicate which are the output predicates. Output predicates are declared as

output(<predicate>/<arity>).

In our example

output(class/1).

Input predicates are those whose atoms you are not interested in predicting.

You can declare closed world input predicates with

input_cw(<predicate>/<arity>).

For these predicates, the only true atoms are those in the interpretations and those derivable from them using the background knowledge; the clauses in the input/hypothesized program are not used to derive atoms for these predicates. Moreover, clauses in the background knowledge that define closed world input predicates and that call an output predicate in the body will not be used for deriving examples. In our example

input_cw(replaceable/1).
input_cw(not_replaceable/1).
input_cw(worn/1).
input_cw(not_worn/1).
input_cw(none_worn/0).

Besides closed world input predicates we can declare open world input predicates with

input(<predicate>/<arity>).

In our example we do not have open world input predicates.

Example Interpretations

The last part of the input file contains the data. You can specify data with two modalities: models and keys. In the models type, you specify an example model (or interpretation) as a list of Prolog facts initiated by begin(model(<name>)). and terminated by end(model(<name>)).. Alternatively, with the keys modality, you can directly write the facts and the first argument will be interpreted as a model identifier.
The two modalities, models and keys, can be mixed in the same source. If we use the model modality for the example/interpretation 1, we get

begin(model(1)).
class(sendback).
neg(class(fix)).
neg(class(ok)).
worn(gear).
worn(engine).
end(model(1)).

If we use this modality, the system asserts an int(<name>). fact for each model enclosed in begin(model(<name>)). and end(model(<name>))..
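
For instance, after the interpretation above is loaded, the fact int(1). is asserted.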

Instead, if we use the key modality, our example will be (note the first argument of each fact)

class(1,sendback).
neg(1,class(fix)).
neg(1,class(ok)).
worn(1,gear).
worn(1,engine).

If we use this modality, facts for int/1 are not asserted for interpretations, but can be explicitly added by the user.

Fold Division

After defining the examples/interpretations we must indicate how the examples are divided into folds with facts of the form:

fold(<fold_name>,<list of model identifiers>)

as for example

fold(train1,[1,2,3]).
fold(train2,[4,5,6,7,8,9,10]).

We can also define the folds intensionally, as in

fold(all,F) :- findall(I,int(I),F).

Machines: Full Dataset (for Parameter Learning)

The complete Machines input file is

% PREAMBLE %
:- use_module(library(slipcover)).
% use the renderer 'lpad'. It is not mandatory to use it, but it prints the learned clauses in a more readable way
:- use_rendering(lpad).
:- sc.
:- set_sc(depth_bound,false).
:- set_sc(neg_ex,given).
:- set_sc(megaex_bottom,15).
:- set_sc(max_iter,10).
:- set_sc(max_iter_structure,50).
:- set_sc(verbosity,1).

% BACKGROUND KNOWLEDGE %
:- begin_bg.
component(C):-
  replaceable(C).
component(C):-
  not_replaceable(C).
replaceable(gear).
replaceable(wheel).
replaceable(chain).
not_replaceable(engine).
not_replaceable(control_unit).
not_worn(C):-
  component(C),
  \+ worn(C).
one_worn:-
  worn(_).
none_worn:-
  \+ one_worn.
:- end_bg.
% INITIAL PROGRAM %
:- begin_in.
class(sendback):0.5 :-
  worn(A),
  not_replaceable(A).

class(fix):0.6 :-
  worn(A),
  replaceable(A).

class(ok):0.5 :-
  not_worn(_A).
:- end_in. 

% LANGUAGE BIAS %
% output predicates
output(class/1).
% input closed world predicates
input_cw(replaceable/1).
input_cw(not_replaceable/1).
input_cw(worn/1).
input_cw(not_worn/1).
input_cw(none_worn/0).

% EXAMPLES (interpretations) %
begin(model(1)).
class(sendback).
neg(class(fix)).
neg(class(ok)).
worn(gear).
worn(engine).
end(model(1)).

begin(model(2)).
class(ok).
neg(class(sendback)).
neg(class(fix)).
end(model(2)).

begin(model(3)).
class(fix).
neg(class(sendback)).
neg(class(ok)).
worn(gear).
end(model(3)).

begin(model(4)).
class(sendback).
neg(class(fix)).
neg(class(ok)).
worn(engine).
end(model(4)).

begin(model(5)).
class(fix).
neg(class(sendback)).
neg(class(ok)).
worn(gear).
worn(chain).
end(model(5)).

begin(model(6)).
class(fix).
neg(class(sendback)).
neg(class(ok)).
worn(wheel).
end(model(6)).

begin(model(7)).
class(sendback).
neg(class(fix)).
neg(class(ok)).
worn(wheel).
worn(control_unit).
end(model(7)).

begin(model(8)).
class(ok).
neg(class(sendback)).
neg(class(fix)).
end(model(8)).

begin(model(9)).
class(fix).
neg(class(sendback)).
neg(class(ok)).
worn(wheel).
worn(chain).
end(model(9)).

begin(model(10)).
class(sendback).
neg(class(fix)).
neg(class(ok)).
worn(engine).
worn(chain).
end(model(10)).

begin(model(11)).
class(sendback).
neg(class(fix)).
neg(class(ok)).
worn(engine).
worn(control_unit).
end(model(11)).

begin(model(12)).
class(fix).
neg(class(sendback)).
neg(class(ok)).
worn(chain).
worn(wheel).
worn(gear).
end(model(12)).

begin(model(13)).
class(sendback).
neg(class(fix)).
neg(class(ok)).
worn(chain).
worn(wheel).
worn(gear).
worn(engine).
end(model(13)).

begin(model(14)).
class(ok).
neg(class(sendback)).
neg(class(fix)).
end(model(14)).

begin(model(15)).
class(fix).
neg(class(sendback)).
neg(class(ok)).
worn(wheel).
worn(gear).
end(model(15)).

fold(all, F) :- findall(I,int(I),F).

Performing Parameter Learning

To execute the parameter learning algorithm EMBLEM, we need to ask a query of the form

induce_par(<list of folds>,P).

where <list of folds> is a list of the folds for training and P will contain the input program with updated parameters.

In our example we want to learn the parameters by using the fold which contains all the examples (all). Therefore

induce_par([all],P).

Complete example: mach.pl


For more information about how to perform learning with EMBLEM see the manual (or the PDF version) and the references on the About page.


Structure Learning

cplint on SWISH allows the user to perform structure learning. In this tutorial section we will learn how to do that.
We take into account the Machines dataset already illustrated in the previous section.

Note: the learning algorithms are available only if you use the Prolog editor.

Machines Example: Writing the Program Step by Step

To execute a learning algorithm the input source must be divided into five parts:

  • preamble,
  • background knowledge, i.e., knowledge valid for all interpretations,
  • initial LPAD (optional),
  • language bias information,
  • example interpretations.

Here we will define a program step by step and then execute the algorithm SLIPCOVER which learns the structure and the parameters given a background knowledge.
We keep the background knowledge, the initial LPAD, and the interpretations unchanged. However, in order to perform structure learning, we need to modify the preamble and the language bias.

For more information on how to perform learning see the cplint on SWISH manual (PDF version).

Preamble

We load the library slipcover, initialize it and then set some parameters. We use the same preamble as in the previous section; note in particular the settings :- set_sc(megaex_bottom,15). and :- set_sc(max_iter_structure,50)., which are relevant for structure learning. So the preamble becomes

:-use_module(library(slipcover)).
:-sc.
:- set_sc(depth_bound,false).
:- set_sc(neg_ex,given).
:- set_sc(max_iter,10).
:- set_sc(max_iter_structure,50).
:- set_sc(megaex_bottom,15).
:- set_sc(verbosity,1).

Language Bias

In the previous example we already defined the set of output and input predicates for the EMBLEM algorithm. However, now we want to learn new clauses (the structure) for the program; in order to do that we need to add mode declarations in the style of Progol.

To specify the atoms that can appear in the head of clauses you must use facts of the form

modeh(<recall>,<predicate>(<arg1>,...)).

To specify the atoms that can appear in the body of clauses you must use facts of the form

modeb(<recall>,<predicate>(<arg1>,...)).

where <recall> can be an integer or *. <recall> indicates how many atoms for the predicate specification are retained in the bottom clause during a saturation step. * stands for all those that are found.

We refer to the cplint on SWISH manual for further details.

In our example we use the following mode declarations

modeh(*,class(fix)).
modeh(*,class(ok)).
modeh(*,class(sendback)).

modeb(*,not_replaceable(-comp)).
modeb(*,replaceable(-comp)).
modeb(*,worn(-comp)).
modeb(*,not_worn(-comp)).
modeb(*,none_worn).

SLIPCOVER also requires Aleph-style facts for the determination/2 predicate that indicate which predicates can appear in the body of clauses. For our program we have

determination(class/1,replaceable/1).
determination(class/1,not_replaceable/1).
determination(class/1,worn/1).
determination(class/1,not_worn/1).
determination(class/1,none_worn/0).

For example the first determination/2 fact states that the predicate replaceable/1 can appear in the body of hypothesized clauses having class/1 in the head.

Machines Dataset: Full Program (for Structure Learning)

Below is the complete program

% PREAMBLE %
:- use_module(library(slipcover)).
% use the renderer 'lpad'. It is not mandatory to use it, but it prints the learned clauses in a more readable way
:- use_rendering(lpad).
:- use_rendering(c3).
:- sc.
:- set_sc(depth_bound,false).
:- set_sc(neg_ex,given).
:- set_sc(megaex_bottom,15).
:- set_sc(max_iter,10).
:- set_sc(max_iter_structure,50).
:- set_sc(verbosity,1).

% BACKGROUND KNOWLEDGE %
:- begin_bg.
component(C):-
  replaceable(C).
component(C):-
  not_replaceable(C).
replaceable(gear).
replaceable(wheel).
replaceable(chain).
not_replaceable(engine).
not_replaceable(control_unit).
not_worn(C):-
  component(C),
  \+ worn(C).
one_worn:-
  worn(_).
none_worn:-
  \+ one_worn.
:- end_bg.
% INITIAL PROGRAM %
:- begin_in.
class(sendback):0.5 :-
  worn(A),
  not_replaceable(A).

class(fix):0.6 :-
  worn(A),
  replaceable(A).

class(ok):0.5 :-
  not_worn(_A).
:- end_in. 

% LANGUAGE BIAS %
% output predicates
output(class/1).
% input closed world predicates
input_cw(replaceable/1).
input_cw(not_replaceable/1).
input_cw(worn/1).
input_cw(not_worn/1).
input_cw(none_worn/0).

modeh(*,class(fix)).
modeh(*,class(ok)).
modeh(*,class(sendback)).

modeb(*,not_replaceable(-comp)).
modeb(*,replaceable(-comp)).
modeb(*,worn(-comp)).
modeb(*,not_worn(-comp)).
modeb(*,none_worn).

determination(class/1,replaceable/1).
determination(class/1,not_replaceable/1).
determination(class/1,worn/1).
determination(class/1,not_worn/1).
determination(class/1,none_worn/0).

% EXAMPLES (interpretations) %
begin(model(1)).
class(sendback).
neg(class(fix)).
neg(class(ok)).
worn(gear).
worn(engine).
end(model(1)).

begin(model(2)).
class(ok).
neg(class(sendback)).
neg(class(fix)).
end(model(2)).

begin(model(3)).
class(fix).
neg(class(sendback)).
neg(class(ok)).
worn(gear).
end(model(3)).

begin(model(4)).
class(sendback).
neg(class(fix)).
neg(class(ok)).
worn(engine).
end(model(4)).

begin(model(5)).
class(fix).
neg(class(sendback)).
neg(class(ok)).
worn(gear).
worn(chain).
end(model(5)).

begin(model(6)).
class(fix).
neg(class(sendback)).
neg(class(ok)).
worn(wheel).
end(model(6)).

begin(model(7)).
class(sendback).
neg(class(fix)).
neg(class(ok)).
worn(wheel).
worn(control_unit).
end(model(7)).

begin(model(8)).
class(ok).
neg(class(sendback)).
neg(class(fix)).
end(model(8)).

begin(model(9)).
class(fix).
neg(class(sendback)).
neg(class(ok)).
worn(wheel).
worn(chain).
end(model(9)).

begin(model(10)).
class(sendback).
neg(class(fix)).
neg(class(ok)).
worn(engine).
worn(chain).
end(model(10)).

begin(model(11)).
class(sendback).
neg(class(fix)).
neg(class(ok)).
worn(engine).
worn(control_unit).
end(model(11)).

begin(model(12)).
class(fix).
neg(class(sendback)).
neg(class(ok)).
worn(chain).
worn(wheel).
worn(gear).
end(model(12)).

begin(model(13)).
class(sendback).
neg(class(fix)).
neg(class(ok)).
worn(chain).
worn(wheel).
worn(gear).
worn(engine).
end(model(13)).

begin(model(14)).
class(ok).
neg(class(sendback)).
neg(class(fix)).
end(model(14)).

begin(model(15)).
class(fix).
neg(class(sendback)).
neg(class(ok)).
worn(wheel).
worn(gear).
end(model(15)).

fold(all,F):- findall(I,int(I),F).
:- fold(all,F),
   sample(4,F,FTr,FTe),
   assert(fold(train,FTr)),
   assert(fold(test,FTe)).

Performing Structure Learning

To execute the structure learning algorithm SLIPCOVER (which also learns the parameters of the learned program), we need to execute a query of the form

induce(<list of folds>,P).

where <list of folds> is a list of the folds for training and P will contain the learned program.

In our example we want to learn the structure (and the parameters) by using the fold which contains all the examples (all). Therefore

induce([all],P).

Complete example: mach.pl


For more information about how to perform learning with SLIPCOVER see the cplint on SWISH manual (or the PDF version) and the References.


How to Test a Program

In this tutorial section we will see how to execute a test on a program. We take into account the Machines dataset already seen in the previous sections (Parameter Learning and Structure Learning).

A program can also be tested on a test set with a query of the form

?- test(<program>,<list_of_folds>,LL,AUCROC,ROC,AUCPR,PR).

where <program> is a list of terms representing clauses and <list_of_folds> is a list of folds. This returns the log likelihood of the test examples in LL, the Area Under the ROC Curve in AUCROC, a dictionary containing the list of points of the ROC curve (in the form of Prolog pairs x-y) in ROC, the Area Under the Precision-Recall Curve in AUCPR, and a dictionary containing the list of points of the PR curve in PR.

Let us suppose now that we have two disjoint folds of examples named train and test. We will now see how to test a (learned) program.

How to Test the Initial Program

We can test the input program on the fold test with a query of the form

?- in(P), test(P,[test],LL,AUCROC,ROC,AUCPR,PR).

How to Test a Program after Parameter Learning

Suppose we want to perform parameter learning on the initial program by using the train fold and then test the resulting program by using the test fold. Then we just have to run the query

?- induce_par([train],P), test(P,[test],LL,AUCROC,ROC,AUCPR,PR).

How to Test the Learned Program

Suppose we want to learn new clauses (i.e. we perform structure learning) by using the train fold and then test the resulting program by using the test fold. Then we just have to run the query

?- induce([train],P), test(P,[test],LL,AUCROC,ROC,AUCPR,PR).

Adding Renderers

It is possible to see the ROC and PR curves as graphs by including the renderer c3 before :- sc.. Moreover, we include the renderer lpad to have the output program pretty printed. Therefore we add the following commands in the preamble before :- sc..

:- use_rendering(c3).
:- use_rendering(lpad).

Dynamic Folds

We can intensionally create the fold containing all the examples with

fold(all,F):- findall(I,int(I),F).

We can dynamically create the folds train and test with the following command

:- fold(all,F),
   sample(4,F,FTr,FTe),
   assert(fold(train,FTr)),
   assert(fold(test,FTe)).

This last command should however be inserted after the input interpretations. As can be seen, it uses sample(N,List,Sampled,Rest), exported from the library slipcover, which samples N elements from List and returns the sampled elements in Sampled and the rest in Rest. If List has N elements or fewer, Sampled is equal to List and Rest is empty.
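
For example, with the 15 interpretations of our dataset, sample(4,F,FTr,FTe) puts 4 randomly chosen model identifiers in FTr (used here as the train fold) and the remaining 11 in FTe (the test fold).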

Performing Parameter Learning and Testing

If we want to learn the parameters of the initial program and then test the resulting program, we can use the following query

induce_par([train],P), test(P,[test],LL,AUCROC,ROC,AUCPR,PR).

Performing Structure Learning and Testing

If we want to learn a program and then test it, we can use the following query

induce([train],P), test(P,[test],LL,AUCROC,ROC,AUCPR,PR).

Complete example: mach.pl


For more information about how to perform learning see the cplint on SWISH manual (or PDF version).


References

  1. E. Bellodi and F. Riguzzi. Expectation Maximization over binary decision diagrams for probabilistic logic programs. Intelligent Data Analysis, 17(2):343-363, IOS Press, 2013.
  2. E. Bellodi and F. Riguzzi. Experimentation of an expectation maximization algorithm for probabilistic logic programs. Intelligenza Artificiale, 8(1):3-18, IOS Press, 2012.
  3. Elena Bellodi and Fabrizio Riguzzi. Structure learning of probabilistic logic programs by searching the clause space. Theory and Practice of Logic Programming, 15(2):169-212, Cambridge University Press, 2015.
  4. F. Riguzzi and T. Swift. The PITA system: Tabling and answer subsumption for reasoning under uncertainty. Theory and Practice of Logic Programming, 27th International Conference on Logic Programming (ICLP'11) Special Issue, 11(4-5), pages 433-449, 2011.
  5. Riguzzi F. Extended semantics and inference for the Independent Choice Logic. Log. J. IGPL 2009, 17(6), pages 589-629, 2009.
  6. A. Itai and M. Rodeh. Symmetry Breaking in Distributed Networks. Information and Computation, 88(1), 1990.
  7. A. Gorlin, C. R. Ramakrishnan, and S. A. Smolka. Model checking with probabilistic tabled logic programming. Theory and Practice of Logic Programming, 12(4-5), 2012.
  8. http://www.prismmodelchecker.org/casestudies/synchronous_leader.php
  9. A. Nampally and C. R. Ramakrishnan. Adaptive MCMC-Based Inference in Probabilistic Logic Programs. arXiv preprint arXiv:1403.6036, 2014.
  10. http://forestdb.org/models/arithmetic.html
  11. The ACE Data Mining System User’s Manual https://dtai.cs.kuleuven.be/ACE/doc/ACEuser-1.2.16.pdf
  12. J. Vennekens, S. Verbaeten, and M. Bruynooghe. Logic programs with annotated disjunctions. In International Conference on Logic Programming, volume 3131 of LNCS, pages 195-209. Springer, 2004.
  13. D. Nitti, T. De Laet,L. De Raedt. Probabilistic logic programming for hybrid relational domains. Mach. Learn. 103(3), 407–449 (2016)

Fabrizio Riguzzi

Associate Professor - University of Ferrara

Main developer of cplint on SWISH


Giuseppe Cota

PhD student - University of Ferrara

Contacts

For troubleshooting or any question about cplint on SWISH go to the forum:
Machine Learning @ Unife Forum
or send an email to:
Machine Learning @ Unife