Features¶
Sanity Check¶
pyungo will raise an error in the following situations:
Circular dependencies: The
Graph
need to be finite and cannot form a loop.All inputs needed to run a
Graph
are not provided.Input collision: An input name provided as data in the
Graph
has a conflict with at least of the output name.Duplicated outputs: Several nodes are giving output(s) that have the same name.
Add a Node
explicitely¶
While the simple example register nodes at import time with a decorator, it is possible to explicitely add a node a runtime. Here is the same example:
from formulas import f_my_function_1, f_my_function_2, f_my_function_3
graph = Graph()
graph.add_node(f_my_function_1, inputs=['d', 'a'], outputs=['e'])
graph.add_node(f_my_function_2, inputs=['c'], outputs=['d'])
graph.add_node(f_my_function_3, inputs=['a', 'b'], outputs=['c'])
res = graph.calculate(data={'a': 2, 'b': 3})
print(res)
Parallelism¶
When resolving the dag, pyungo figure out nodes that can be run in parallel.
When creating a graph, we can specify the option parallel=True
for running calculations
concurently when possible, using multiprocess
module. This package is not automatically installed with pyungo, and will need to be
installed manually if parallelism is used. We can specify the pool size when instantiating
the Node
. This will set the maximum number of processes that will be
launched. If 3 nodes can run in parallel and just 2 processes are used, pyungo will run
calculation on the first 2 nodes first and will run the last one as soon as a process
will be free.
Instantiating a Graph
with a pool of 5 processes for running
calculations in parralel:
graph = Graph(parallel=True, pool_size=5)
Note
Running functions in parallel has a cost. Python will spend time creating / deleting new processes. Parallelism is recommended when at least 2 concurrent nodes have heavy calculations which takes a significant amount of time.
Args, Kwargs, Constants¶
If a function registred in a Node
contains args
or kwargs
,
it is possible to define which data will be passed to them:
graph.add_node(
my_function,
inputs=['a', 'b'],
args=['c', 'd'],
kwargs=['e', 'f'],
outputs=['g']
)
Sometimes, we want one of the input to be defined as a constant:
@graph.register(inputs=['a', {'b': 2}], outputs=['c'])
def f_my_function(a, b):
return a + b
Then, only a
and b
will be needed when calling calculate
.
Input
and Output
objects¶
Inputs and outputs can be defined directly with their names, or with Input
/ Output
objects. This come in handy when there is extra behavior to be
attached to an input / output (e.g. Contracts
).
from pyungo.io import Input, Output
graph.add_node(
my_function,
inputs=[Input(name='a'), Input(name='b')],
outputs=[Output(name='g')]
)
Often, inputs are used multiple times across the nodes. In those cases, it is better to define
inputs only once (with their special features if any). It is possible to pass a list of
Input
/ Output
objects a Graph
:
from pyungo.io import Input, Output
inputs = [Input(name='a'), Input(name='b')]
outputs = [Output(name='c'), Output(name='d')]
graph = Graph(inputs, outputs)
graph.add_node(
my_function,
inputs=['a', 'b'],
outputs=['c']
)
graph.add_node(
my_other_function,
inputs=['c', 'b'],
outputs=['d']
)
Note
If inputs / outputs are explicitely provided to a graph, inputs / outputs defined in the nodes can only be strings.
Schema¶
Inputs validation is an important step to run a model with confidence. pyungo uses the JSON Schema specification through a Python library: jsonschema. The following is now possible:
schema = {
"type": "object",
"properties": {
"a": {"type": "number"},
"b": {"type": "number"}
}
}
graph = Graph(schema=schema)
@graph.register(
inputs=['a', 'b'],
outputs=['c']
)
def f_my_function(a, b):
return a + b
graph.calculate(data={'a': 1, 'b': '2'})
The calculation is going to fail as b
is of type string. It is better to catch this problem
early on before running the model. As we provided a schema saying we explicitely want b
to
be of type number
, the data validation against the schema will fail with the following error:
'2' is not of type 'number'
.
Name mapping¶
Often, the name of the data we get are different from the ones used in the functions / models / formulas. pyungo makes things easy providing a mapping feature. Here is an example:
graph = Graph()
@graph.register(
inputs=[Input('a', map='q'), Input('b', map='w')],
outputs=[Output('c', map='e')]
)
def f_my_function(a, b):
return a + b
res = graph.calculate(data={'q': 2, 'w': 3})
assert res == 5
assert graph.data['e'] == 5
Contracts¶
Sometimes we want to make sure a value meet specific criteria before moving forward. pyungo uses pycontracts for attaching contracts to inputs or outputs.
from pyungo.io import Input, Output
graph.add_node(
my_function,
inputs=[Input(name='a', contract='>0'), Input(name='b', contract='float')],
outputs=[Output(name='g', contract='float')]
)