Pymwp is a tool for automatically performing static analysis on programs written in C

Last update: Dec 02, 2022

Related tags

Overview

pymwp: MWP analysis in Python

pymwp is a tool for automatically performing static analysis on programs written in C, inspired by "A Flow Calculus of mwp-Bounds for Complexity Analysis". It analyzes resource usage and determines if a program's variables growth rates are no more than polynomially related to their inputs sizes. You can run our many examples on-line in our demo before installing it, or consult our list of supported C language features.

Documentation and Demo

Refer to statycc.github.io/pymwp for a documentation of our modules, an on-line demo as well as a presentation of our examples.

Installation

Install latest release from PyPI

pip install pymwp

How to Use

To analyze a C file, run:

pymwp path/to_some_file.c

For all available options and help, run:

pymwp

You can also use pymwp in a Python script:

from pymwp import Polynomial
from pymwp.matrix import identity_matrix, show

matrix = identity_matrix(3)
matrix[0][1] = Polynomial('m')
matrix[1][0] = Polynomial('w')
matrix[2][1] = Polynomial('p')

show(matrix)

See documentation for available modules and methods.

Running from source

If you want to use the latest stable version (possibly ahead of latest release), use the version from source following these steps.

Clone the repository

git clone https://github.com/statycc/pymwp.git

Set up Python environment

install required packages
```
pip install -q -r requirements.txt
```
Run the analysis

From project root run:
```
python -m pymwp path/to_some_file.c
```
for example:
```
python -m pymwp c_files/basics/if.c
```
for all available options and help, run:
```
python -m pymwp
```

Comments

TU `test_analyze_simple_infinite` does not pass on branch `simpl`

Test on infinite_2.c with fc2068cdedf9560879294b71f009ee780cf3ca86

int main(){
    int X0=1, X1=1;
    while(X1<10){
        X0 = X1*X0;
        X1 = X1+X0;
    }
}

Fail with :

>           assert str(relation.matrix[0][0].list[2]) == 'i.delta(0,0).delta(0,1)'
E           AssertionError: assert 'i.delta(1,0)' == 'i.delta(0,0).delta(0,1)'
E             - i.delta(0,0).delta(0,1)
E             + i.delta(1,0)

Maybe we should check by hand that output values ?

opened by ThomasRuby 23

Simplify sums

A small comment, but for instance

int main(){
    int y1, y2;
    y2 = y1 + y1;
}

gives

y2    |     +o  +o
y1    |     +o+w.delta(0,0)+p.delta(1,0)+w.delta(2,0)  +m

we could get rid of the +o in the lower-left corner.

opened by aubertc 18

Variable mismatch / ignored

The following program:

int main(){
    x2 = x3 + x1;
    x4 = x2;
}

gives

--- Affiche <relation_list.RelationList object at 0x7f54084e7ef0> ---
1:
x2    |     +o  +o  +o
x3    |     +o+w.delta(0,0)+m.delta(1,0)+p.delta(2,0)  +m  +o
x1    |     +o+w.delta(0,0)+p.delta(1,0)+m.delta(2,0)  +o  +m
--- FIN ---
[[0], [1], [2]]

As we can see, x4 is simply not accounted for.

opened by aubertc 15

Re-organize example folder
I've started to re-organize the example folder a bit with https://github.com/seiller/pymwp/commit/e261323f5f8d9d1ff50789881d2121810a5b8677

A couple of observations:

the "readme" at https://github.com/seiller/pymwp/blob/tests-examples/c_files/readme.md is very basics, we will have to expand it more "tutorial" style.

It would be profitable to have "the usual examples" of non-polynomial programs (typically, exponential computation),

I was a bit dumb and created this branch before https://github.com/seiller/pymwp/pull/20 was created. As a result, some tests are not correct, but I suspect they have been fixed since.

I have not changed the https://github.com/seiller/pymwp/tree/tests-examples/tests files, so I probably broke a couple of things.

We need to discuss the fact that most / some of those programs cannot be compiled.

As of now,

basics/declaration.c should give an identity matrix of dimension 1.

basics/assign_value.c should give a matrix containing 0.

basics/assign_variable.c should returns [m m // 0 0] (x / y).

basics/if_else.c should returns [m m // m m](x / y) I believe.

basics/if.c should returns [m m // 0 m](x / y) I believe.

basics/while_1.c should returns something (not 100% sure what)

not_infinite/exponent.c results in an empty matrix?

I have not checked the while_if.c, but I believe all the other examples are ok.
opened by aubertc 8

uncovered case for simple program

The current example we are using in the notes,

int main(){
    int X1;
    int X2;
    X1 + X2;
}

produces:

In compute_rel
uncovered case ! Create empty RelationList (function call) ?
Decl: X1, [], [], []
  TypeDecl: X1, []
    IdentifierType: ['int']
In compute_rel
uncovered case ! Create empty RelationList (function call) ?
Decl: X2, [], [], []
  TypeDecl: X2, []
    IdentifierType: ['int']
In compute_rel
uncovered case ! Create empty RelationList (function call) ?
BinaryOp: +
  ID: X1
  ID: X2
--- Affiche <relation_list.RelationList object at 0x7f3e314adf98> ---
1:

--- FIN ---
[[]]

It seems like valid C code to me and should be analyzed by our program. Am I missing something?

enhancement wontfix

opened by aubertc 7

delta_iter
delta_iter branch has custom version of Python standard lib itertools.product that is used to generate choices for eval.

TODO:

[x] debug delta_iter to make sure it works correctly

[x] add unit tests to show the above

[ ] there may be cases where standard lib itertools.product is more optimal, in which case should use it over delta_iter

[x] update relation.py to use delta_iter implementation

enhancement
opened by nkrusch 6
Ignoring declarations (updated 8/27)
Analyzing simple program like this yields empty result because we skip declaration statements and currently we pick up variables from when they are being used. This issue is to discuss the analysis output for such simple program.

https://github.com/seiller/pymwp/blob/252b5e3f9e159ab17990d0e16f65dd14187a1adc/c_files/basics/assign_value.c#L6-L8

We could display:

y y +o

but it requires adding some logic to handle declaration nodes in the AST, so that the presence of the variable is discovered and then to run the analysis to generate the matrix.

-or- we display empty output which is not very user-friendly.

-or- we still skip declaration nodes in the AST, and then if the analysis discovers no variables, we can say some alternative output message to indicate "there was nothing to analyze" (I am putting this in quotes because it needs better wording to be clear in meaning).

Comments?
opened by nkrusch 6
changes to eval and choice representation
Please review this changeset and run it on examples to evaluate its impact.

The evaluation procedure is explained with examples in choice.py so I will not repeat it here, but we can discuss it in more.

This approach also changes the representation of choices. Before:

[2,2,2,2,0,0,2], [2,2,2,2,0,1,2], [2,2,2,2,0,2,2], [2,2,2,2,1,0,2], [2,2,2,2,1,1,2], [2,2,2,2,1,2,2], [2,2,2,2,2,0,2], [2,2,2,2,2,1,2], [2,2,2,2,2,2,2]

Equivalent, but now more compact:

[[[2], [2], [2], [2], [0, 1, 2], [0, 1, 2], [2]]]

Which is read as: choose a vector (there could be multiple, it is a list of lists), then choose one of the remaining choices at each index; this will give a valid derivation.

This change has some practical implications:

faster eval+compact representation of choices enables analyzing programs with larger index (more assignments) and also reasonably displaying or writing those results to file

it enables evaluating programs like long.c, whose index is 13, which is also introduced in this changeset

it is now possible to evaluate the original dense.c (in tests/test_examples/complex_dense.c) -- it still takes ~1 min to analyze, but the evaluation step is instantaneous

the analysis phase is now more significant regarding performance

removes progress bar external dependency
opened by nkrusch 5
while_correction behavior
We should discuss the behavior of this method:

https://github.com/seiller/pymwp/blob/4250ab22bab7d7057c4949712ca35b9e76b50fd5/pymwp/relation.py#L130-L136

This is the only place where i may be introduced. It is run on each while loop node during the analysis (after computing fixpoint).

Some open questions:

What I don't get is that it seems that we are applying that correction to all the coef., and not only to the ones on the diagonal. What am I missing?

I think that if the coefficient is not m (meaning it is either 0, w, p or i), then we need to change it to i: as of now, I don't think that's what we do.

this expression is totally different if you change parentheses:

if mon.scalar == "p" or (mon.scalar == "w" and i == j)

to

if (mon.scalar == "p" or mon.scalar == "w") and i == j

the latter checks for diagonal scalars only.
opened by nkrusch 5
Add tools for profiling

This PR adds a utility for running cProfile on multiple examples. I have added docs to explain how to use it.

When I run on GH server (linux) the dense function hangs, but it should timeout after 30 sec. It terminates correctly for me. Please try it and let me know if it gets stuck on all Linux machines or just the build machine, I'm not sure.

opened by nkrusch 4
Profiling simplification
consider time as relative since it depends on # of cores

not able to profile dense when it does not terminate (until I figure out how to add some timeout limit)

for_body.c cannot be parsed and throws

"before" == master branch, "after" === simpl branch

Some comparisons ordered by internal time:

infinite_8.c

before: 13224063 function calls (10707799 primitive calls) in 5.163 seconds

| ncalls | tottime | percall | cumtime | percall | filename:lineno(function) | | --- | --- | --- | --- | --- | --- | 626833 | 0.623 | 0.000 | 0.927 | 0.000 | polynomial.py:296(compare) 158893/1577 | 0.616 | 0.000 | 1.135 | 0.001 | polynomial.py:352(sort_monomials) 6561 | 0.579 | 0.000 | 1.303 | 0.000 | polynomial.py:237(eval) 1907969/448430 | 0.451 | 0.000 | 0.636 | 0.000 | python3.7/json/encoder.py:277(_iterencode_list) 1607264 | 0.369 | 0.000 | 0.369 | 0.000 | semiring.py:69(sum_mwp) 1597239 | 0.360 | 0.000 | 0.360 | 0.000 | monomial.py:116(eval)

after: 2561855 function calls (2483457 primitive calls) in 1.328 seconds

| ncalls | tottime | percall | cumtime | percall | filename:lineno(function) | | --- | --- | --- | --- | --- | --- | 284672 | 0.238 | 0.000 | 0.534 | 0.000 | monomial.py:95(inclusion) 17709 | 0.206 | 0.000 | 0.760 | 0.000 | polynomial.py:76(inclusion) 565173 | 0.193 | 0.000 | 0.193 | 0.000 | monomial.py:79(contains) 334183 | 0.115 | 0.000 | 0.115 | 0.000 | semiring.py:69(sum_mwp) 56065 | 0.057 | 0.000 | 0.083 | 0.000 | polynomial.py:357(compare) 23335/1577 | 0.054 | 0.000 | 0.098 | 0.000 | polynomial.py:413(sort_monomials)

notinfinite_8.c

before: 15474866 function calls (13029729 primitive calls) in 6.015 seconds

| ncalls | tottime | percall | cumtime | percall | filename:lineno(function) | | --- | --- | --- | --- | --- | --- | 8451 | 1.409 | 0.000 | 3.085 | 0.000 | polynomial.py:237(eval) 3659175 | 0.910 | 0.000 | 0.910 | 0.000 | monomial.py:116(eval) 3662513 | 0.767 | 0.000 | 0.767 | 0.000 | semiring.py:69(sum_mwp) 124347/1987 | 0.503 | 0.000 | 0.918 | 0.000 | polynomial.py:352(sort_monomials) 1880855/440563 | 0.448 | 0.000 | 0.629 | 0.000 | python3.7/json/encoder.py:277(_iterencode_list) 357668 | 0.357 | 0.000 | 0.534 | 0.000 | polynomial.py:296(compare)

after: 2463339 function calls (2391520 primitive calls) in 1.308 seconds

| ncalls | tottime | percall | cumtime | percall | filename:lineno(function) | | --- | --- | --- | --- | --- | --- | 290047 | 0.240 | 0.000 | 0.546 | 0.000 | monomial.py:95(inclusion) 14298 | 0.210 | 0.000 | 0.777 | 0.000 | polynomial.py:76(inclusion) 576027 | 0.201 | 0.000 | 0.201 | 0.000 | monomial.py:79(contains) 363732 | 0.121 | 0.000 | 0.121 | 0.000 | semiring.py:69(sum_mwp) 62967 | 0.064 | 0.000 | 0.094| 0.000 | polynomial.py:357(compare) 21887/1987 | 0.050 | 0.000 | 0.089 | 0.000 | polynomial.py:413(sort_monomials)

To profile individual C file, the command is:

python -m cProfile -s tottime -m pymwp c_files/infinite_2.c

replace -m pymwp to profile using old Analysis.py (though it pauses for user input so might not work?)

replace c_files/infinite_2.c with whatever input you want to profile
opened by nkrusch 4
single variable assignment

Analyzing single variable assignment

https://github.com/statycc/pymwp/blob/662f1b9fadfe815e42499d1357d1698467ea29f9/c_files/basics/assign_variable.c#L6-L8

gives the matrix below and choices [0], [1], [2].

| | x | y | |-- | -- | -- | | x | o | o | | y | m | m |

I don't think the choice applies here? This is by rule E1.

opened by nkrusch 1
Is it the case that values *must* relates to an input?
The issue on declarations echoed with a more fundamental question that I have, and that is "must values inside the program relates to an input?".

Let me explain: as of now, we are interpreting x = <any value>; as "reset the coefficient for x to 0", because there is now no connection between the inputs and x. However, I'm not sure that this is permitted, for three reasons:

There is no rule for that case in the original paper, and as far as I can tell it's not even discussed,

In On Decidable Growth-Rate Properties of Imperative Programs, Ben-Amram "just" works on adding a "reset" instruction to a similar language and shows that it is not easy to do so,

When discussing constants, they write:

To deal with constants, just replace the program’s constants by variables and regard the replaced constants as input to these variables.

as if (my reading), values had to be linked to the program's input.

All of that being said, they have an example of re-initialization (Example 3.1)

C′′ ≡ X1 := 1; loop X 2 { X1 := X1 +X1 }

as if it was no big deal.

I may simply be confused. Thoughts?
opened by aubertc 3
Mixing two operations in one statement

example3_4.c gives an error, probably because there I am mixing two operations in one statement.

Originally posted here

The analysis makes some assumptions about left and right operand structure:

https://github.com/seiller/pymwp/blob/51bc64cfd9b3638de20a2e2671535a4975c2549f/pymwp/analysis.py#L165-L173

which prevents using the original expression X3 = X2 * X2 + X5; Adding some recursive step in binary op might allow analyzing this expression in original form (depending on what the AST looks like for this statement, I have not looked into it)?

If this is not worth fixing at the moment then we can add a note in the features table that binary op is for two operands and support for multiple is in progress.
enhancement

opened by nkrusch 1

for loop analysis is broken

During analysis of a for loop, the implementation call a non-existent method rels.conditionRel.

This was, at some point, a method of RelationList class, but going back as far as old_dfg_mwp_python I have not found the implementation.

Trying to resolve this:

what does this method do?
(independent of previous) is it necessary to call this method?

Current implementation with added comments:


        # create empty relation list
        rels = RelationList([])

        # recursively analyze children of current AST node, update index value
        for child in node.stmt.block_items:
            index, rel_list = compute_rel(index, child)
            
            # add to relation list (will resize matrix as needed)
            rels = rels.composition(rel_list)

        # run fixpoint    
        rels = rels.fixpoint()
        
        # ?????
        rels = rels.conditionRel(list_var(node.cond))

        # return updated index, relation list
        return index, rels

(actual source without comments)

list_var extends c_ast.NodeVisitor and returns node names, implementation: [1], [2]

bug

opened by nkrusch 2

Releases(0.2.1)

0.2.1(Oct 4, 2022)
What's Changed

Changes to eval and choice representation by @nkrusch in https://github.com/statycc/pymwp/pull/78

Add support for custom C headers by @nkrusch in https://github.com/statycc/pymwp/pull/79

Option to compute matrix on failure by @nkrusch in https://github.com/statycc/pymwp/pull/86

fix: example names to match paper by @nkrusch in https://github.com/statycc/pymwp/pull/87

Full Changelog: https://github.com/statycc/pymwp/compare/0.1.6...0.2.1
Source code(tar.gz)
Source code(zip)
profile-latest(Oct 4, 2022)
Commits

bdde0d2: edit readme (Neea)

0d76475: actually fix build (Neea)

Source code(tar.gz)
Source code(zip)
assign_expression.txt(11.03 KB)
assign_variable.txt(11.01 KB)
dense.txt(11.05 KB)
dense_loop.txt(11.04 KB)
example14.txt(11.02 KB)
example16.txt(11.02 KB)
example3_1_a.txt(11.03 KB)
example3_1_b.txt(11.04 KB)
example3_1_c.txt(11.01 KB)
example3_1_d.txt(11.08 KB)
example3_2.txt(11.08 KB)
example3_4.txt(11.10 KB)
example5_1.txt(10.98 KB)
example7_10.txt(11.05 KB)
example7_11.txt(11.04 KB)
example8.txt(11.05 KB)
explosion.txt(11.05 KB)
exponent_1.txt(11.08 KB)
exponent_2.txt(11.10 KB)
for_loop.txt(11.02 KB)
gcd.txt(11.11 KB)
if.txt(11.01 KB)
if_else.txt(11.00 KB)
infinite_2.txt(11.11 KB)
infinite_3.txt(11.09 KB)
infinite_4.txt(11.10 KB)
infinite_5.txt(11.07 KB)
infinite_6.txt(11.10 KB)
infinite_7.txt(11.09 KB)
infinite_8.txt(11.08 KB)
inline_variable.txt(11.03 KB)
long.txt(11.03 KB)
notinfinite_2.txt(11.05 KB)
notinfinite_3.txt(11.04 KB)
notinfinite_4.txt(11.06 KB)
notinfinite_5.txt(11.06 KB)
notinfinite_6.txt(11.07 KB)
notinfinite_7.txt(11.07 KB)
notinfinite_8.txt(11.07 KB)
simplified_dense.txt(11.06 KB)
while_1.txt(11.02 KB)
while_2.txt(11.00 KB)
while_if.txt(11.04 KB)
FSCD22(Jun 14, 2022)

pymwp at the time of FSCD 2022 paper.
Source code(tar.gz)
Source code(zip)
0.1.6(Feb 7, 2022)
What's Changed

Added gcd example by @nkrusch in https://github.com/statycc/pymwp/pull/73

Fix: change binary add vector to p p w by @nkrusch in https://github.com/statycc/pymwp/pull/75

Dependency upgrades

Python 3.10

pycparser 2.21

Full Changelog: https://github.com/statycc/pymwp/compare/0.1.5...0.1.6
Source code(tar.gz)
Source code(zip)
0.1.5(Oct 7, 2021)
What's Changed

infinity bug fix by @nkrusch in https://github.com/statycc/pymwp/pull/67

save result of all analyzed functions by @nkrusch in https://github.com/statycc/pymwp/pull/68

add command line flag to skip evaluation by @nkrusch in https://github.com/statycc/pymwp/pull/69

Full Changelog: https://github.com/statycc/pymwp/compare/0.1.4...0.1.5
Source code(tar.gz)
Source code(zip)
0.1.4(Oct 6, 2021)
What's Changed

refactor and simplify analysis methods by @nkrusch in https://github.com/statycc/pymwp/pull/61

example 3: Change scalar order to better match paper by @nkrusch in https://github.com/statycc/pymwp/pull/62

update mkdocs config by @nkrusch in https://github.com/statycc/pymwp/pull/64

rework example 5 / function call by @nkrusch in https://github.com/statycc/pymwp/pull/63

refactor monomial.list by @nkrusch in https://github.com/statycc/pymwp/pull/65

Full Changelog: https://github.com/statycc/pymwp/compare/0.1.3...0.1.4
Source code(tar.gz)
Source code(zip)
0.1.3(Sep 16, 2021)

Analysis run method returns infinity flag as output
Source code(tar.gz)
Source code(zip)
0.1.2(Sep 13, 2021)

Test release using automated workflow
Source code(tar.gz)
Source code(zip)
0.1.1(Aug 27, 2021)

This version with delta graph
Source code(tar.gz)
Source code(zip)
0.1.0(Sep 13, 2021)

Source code(tar.gz)
Source code(zip)
0.0.1(Jun 18, 2021)

first pypi
Source code(tar.gz)
Source code(zip)

Owner

Static Analyses of Program Flows: Types and Certificate for Complexity

GitHub Repository https://statycc.github.io/pymwp

Alarmer is a tool focus on error reporting for your application.

alarmer Alarmer is a tool focus on error reporting for your application. Installation pip install alarmer Usage It's simple to integrate alarmer in yo

20 Jul 03, 2022

Code audit tool for python.

966 Dec 29, 2022

A system for Python that generates static type annotations by collecting runtime types

MonkeyType MonkeyType collects runtime types of function arguments and return values, and can automatically generate stub files or even add draft type

4.1k Jan 02, 2023

Collects all accepted (partial and full scored) codes submitted within the given timeframe and saves them locally for plagiarism check.

Collects all accepted (partial and full scored) codes submitted within the given timeframe of any contest.

2 Dec 28, 2021

Find dead Python code

Vulture - Find dead code Vulture finds unused code in Python programs. This is useful for cleaning up and finding errors in large code bases. If you r

2.4k Jan 03, 2023

C/C++ Dependency Analyzer: a rewrite of John Lakos' dep_utils (adep/cdep/ldep) from

Version bêta d'un système pour suivre les prix des livres chez Books to Scrape, un revendeur de livres en ligne. En pratique, dans cette version bêta, le programme n'effectuera pas une véritable surv

125 Sep 21, 2022

This is a Python program to get the source lines of code (SLOC) count for a given GitHub repository.

2 Mar 10, 2022

Collection of library stubs for Python, with static types

typeshed About Typeshed contains external type annotations for the Python standard library and Python builtins, as well as third party packages as con

3.3k Jan 02, 2023

The strictest and most opinionated python linter ever!

wemake-python-styleguide Welcome to the strictest and most opinionated python linter ever. wemake-python-styleguide is actually a flake8 plugin with s

2.1k Jan 05, 2023

Typical: Fast, simple, & correct data-validation using Python 3 typing.

typical: Python's Typing Toolkit Introduction Typical is a library devoted to runtime analysis, inference, validation, and enforcement of Python types

170 Dec 26, 2022

coala provides a unified command-line interface for linting and fixing all your code, regardless of the programming languages you use.

"Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live." ― John F. Woods coala provides a

3.4k Jan 02, 2023

Pymwp is a tool for automatically performing static analysis on programs written in C

Related tags

Overview

pymwp: MWP analysis in Python

Documentation and Demo

Installation

How to Use

Running from source

Comments

infinite_8.c

notinfinite_8.c

Releases(0.2.1)

0.2.1(Oct 4, 2022)

What's Changed

profile-latest(Oct 4, 2022)

Commits

FSCD22(Jun 14, 2022)

0.1.6(Feb 7, 2022)

What's Changed

Dependency upgrades

0.1.5(Oct 7, 2021)

What's Changed

0.1.4(Oct 6, 2021)

What's Changed

0.1.3(Sep 16, 2021)

0.1.2(Sep 13, 2021)

0.1.1(Aug 27, 2021)

0.1.0(Sep 13, 2021)

0.0.1(Jun 18, 2021)

Owner

Static Analyses of Program Flows: Types and Certificate for Complexity

Alarmer is a tool focus on error reporting for your application.

Code audit tool for python.

A system for Python that generates static type annotations by collecting runtime types

Collects all accepted (partial and full scored) codes submitted within the given timeframe and saves them locally for plagiarism check.

Find dead Python code

C/C++ Dependency Analyzer: a rewrite of John Lakos' dep_utils (adep/cdep/ldep) from

This is a Python program to get the source lines of code (SLOC) count for a given GitHub repository.

Collection of library stubs for Python, with static types

The strictest and most opinionated python linter ever!

Typical: Fast, simple, & correct data-validation using Python 3 typing.

coala provides a unified command-line interface for linting and fixing all your code, regardless of the programming languages you use.

Unbearably fast O(1) runtime type-checking in pure Python.

CodeAnalysis - Static Code Analysis: a code comprehensive analysis platform

A static analysis tool for Python

A simple stopwatch for measuring code performance with static typing.

An app to show the total number of lines of code written by an user.

pycallgraph is a Python module that creates call graphs for Python programs.

A static type analyzer for Python code

🦔 PostHog is developer-friendly, open-source product analytics.

Turn your Python and Javascript code into DOT flowcharts