Visions provides an extensible suite of tools to support common data analysis operations

Overview

Visions


And these visions of data types, they kept us up past the dawn.

Visions provides an extensible suite of tools to support common data analysis operations including

  • type inference on unknown data
  • casting data types
  • automated data summarization

https://github.com/dylan-profiler/visions/raw/develop/docsrc/source/_static/side-by-side.png

Documentation

Full documentation can be found here.

Installation

You can install visions via pip:

pip install visions

Alternatives and more details can be found in the documentation.

Supported frameworks

These frameworks are supported out-of-the-box in addition to native Python types:

https://github.com/dylan-profiler/visions/raw/develop/docsrc/source/_static/frameworks.png

  • Numpy
  • Pandas
  • Spark

Contributing and support

Contributions to visions are welcome. For more information, please visit the Community contributions page. The GitHub issue tracker is used for bug reports, feature requests, and support questions.

Acknowledgements

This package is part of the dylan-profiler project. The package is a core component of pandas-profiling. More information can be found here. This work was partially supported by SIDN Fonds.

https://github.com/dylan-profiler/visions/raw/master/SIDNfonds.png

Comments
  • Numpy backend

    Numpy backend

    Summary

    This PR adds complete numpy backend support for the StandardSet of types. The type implementation is fully compatible with the pandas equivalent implementations with the exception of the object type.

    Caveats

    Whereas pandas supports Optional[int] and Optional[bool], numpy does not. To support those types by default, I was forced to make object completely disjoint from any other concrete type. A similar story plays out for datetime objects with timezones, which also default to object in numpy.

    opened by ieaves 8
  • API and Usage

    API and Usage

    (This is part of the review in openjournals/joss-reviews#2145)

    Hi @sbrugman,

    I am currently going through the package and I found it a very interesting project. The type inference that already exists in built-in Python is frequently not enough, and I often find myself writing my own functions for it on a case-by-case basis. So, it is nice to see that this is being done.

    However, I do find myself having quite a bit of trouble using the package effectively. Funnily enough, this is mostly caused by the (in my opinion) confusing naming schemes and the structure of the namespace. It may require a bit of effort to solve (not to mention that it might create incompatibilities with previous versions), but I believe that it will greatly improve the user experience when fixed.

    So, the main problem in my opinion is that almost all definitions are stored in their own subpackages in the visions.core subpackage, often separated as well over an implementations and a model subpackage. This means that in order to access a definition, let's say VisionsTypeset, I have to import this from visions.core.model.typeset, while pre-defined typesets I have to import from visions.core.implementations.typesets. In my opinion, it is incredibly confusing that these are stored in different subpackages/submodules, as they are related to the same thing, namely typesets, and I would expect to find all of these definitions in a visions.typesets subpackage. Preferably, all definitions the average user would use, should be available either at root (visions) or a single level deep (visions.xxx).

    I also noticed that almost all definitions have the word visions in their name. I get the feeling that the reason for this is to avoid namespace clashes when someone uses a wildcard import (from visions import *). However, as wildcard imports are heavily discouraged in the community, this leads to the user writing the word visions at least twice for every definition used (for example, using the visions_integer type requires me to write visions.visions_integer, which could be simplified to visions.integer or even visions.int).

    Finally, I am not entirely sure if this has to do with the online documentation being outdated as mentioned in #21, but according to the example here, a visions_standard_set object has a method called detect_type_series. In v0.2.3, this object neither has a method called detect_type_series nor type_detect_series (the name that the stand-alone function has in visions.core.functional), but instead it is called detect_series_type. If possible, could you check and make sure that the methods and stand-alone functions use consistent naming schemes?

    Please let me know if you have any questions.

    enhancement 
    opened by 1313e 7
  • Recommended stack overflow tag for questions

    Recommended stack overflow tag for questions

    I have a question about how to use the library. I considered opening an issue, but I see in the documentation that you recommend asking questions about how to use the package on Stack Overflow. Is there a tag that you'd suggest people use when asking questions there? I don't see anything with visions as a tag, but maybe I'm just the first person to ask a question over there.

    If you think visions would be a good tag choice it would make sense to update the stack overflow ask a question link to pre-populate the question with the tag (https://stackoverflow.com/questions/ask?tags=visions).

    Thanks!

    enhancement 
    opened by sterlinm 6
  • Automate building of documentation

    Automate building of documentation

    The documentation should be rebuilt at every merge. The differences caused by the documentation complicate code reviews. The steps to build the documentation are simple and can be automated.

    Suggested solution via Github Actions (e.g.): https://github.com/marketplace/actions/sphinx-build https://github.com/ammaraskar/sphinx-action-test/blob/master/.github/workflows/default.yml

    enhancement 
    opened by sbrugman 6
  • Please push an updated version to pypi to correct dependency on attrs not attr

    Please push an updated version to pypi to correct dependency on attrs not attr

    Describe the bug

    visions uses the @attr.s decorator, which comes from the attrs module, not the attr module. The master version of visions has the correct dependency, but the PyPI versions do not.

    Additional context

    When using pandas_profiling, which depends on visions, I got the following error:

    AttributeError: module 'attr' has no attribute 's'
    

    which led me to post this issue.

    bug 
    opened by proinsias 5
  • Version 0.7.5

    Version 0.7.5

    0.7.5 Includes:

    • Fixes to numpy backend for complex, object, email_address, URL, boolean
    • Support for new versions of pandas ABCIndex class (previously called ABCIndexClass)
    • Updated tests for numpy backend
    • Automated Github Actions unit tests on PR
    • Additional documentation
    opened by ieaves 4
  • fail to pass the test with 0.6.1 release

    fail to pass the test with 0.6.1 release

    Describe the bug

    The tests fail with the 0.6.1 release.

    To Reproduce

    Steps to reproduce the behavior:

    python setup.py build
    PYTHONPATH=build/lib pytest -v
    

    Expected behavior

    All tests pass.

    Additional context

    Error log:

    =================================== FAILURES ===================================
    _____________________ test_contains[file_mixed_ext x File] _____________________
    
    series = 0    /build/python-visions/src/visions-0.6.1/build/...
    1    /build/python-visions/src/visions-0.6.1/build/...
    2    /build/python-visions/src/visions-0.6.1/build/...
    Name: file_mixed_ext, dtype: object
    type = File, member = True
    
        @pytest.mark.parametrize(**get_contains_cases(series, contains_map, typeset))
        def test_contains(series, type, member):
            """Test the generated combinations for "series in type"
        
            Args:
                series: the series to test
                type: the type to test against
                member: the result
            """
            result, message = contains(series, type, member)
    >       assert result, message
    E       AssertionError: file_mixed_ext in File; expected True, got False
    E       assert False
    
    tests/typesets/test_complete_set.py:190: AssertionError
    _______________________ test_contains[image_png x File] ________________________
    
    series = 0    /build/python-visions/src/visions-0.6.1/build/...
    1    /build/python-visions/src/visions-0.6.1/build/...
    2    /build/python-visions/src/visions-0.6.1/build/...
    Name: image_png, dtype: object
    type = File, member = True
    
        @pytest.mark.parametrize(**get_contains_cases(series, contains_map, typeset))
        def test_contains(series, type, member):
            """Test the generated combinations for "series in type"
        
            Args:
                series: the series to test
                type: the type to test against
                member: the result
            """
            result, message = contains(series, type, member)
    >       assert result, message
    E       AssertionError: image_png in File; expected True, got False
    E       assert False
    
    tests/typesets/test_complete_set.py:190: AssertionError
    _______________________ test_contains[image_png x Image] _______________________
    
    series = 0    /build/python-visions/src/visions-0.6.1/build/...
    1    /build/python-visions/src/visions-0.6.1/build/...
    2    /build/python-visions/src/visions-0.6.1/build/...
    Name: image_png, dtype: object
    type = Image, member = True
    
        @pytest.mark.parametrize(**get_contains_cases(series, contains_map, typeset))
        def test_contains(series, type, member):
            """Test the generated combinations for "series in type"
        
            Args:
                series: the series to test
                type: the type to test against
                member: the result
            """
            result, message = contains(series, type, member)
    >       assert result, message
    E       AssertionError: image_png in Image; expected True, got False
    E       assert False
    
    tests/typesets/test_complete_set.py:190: AssertionError
    ___________________ test_contains[image_png_missing x File] ____________________
    
    series = 0    /build/python-visions/src/visions-0.6.1/build/...
    1    /build/python-visions/src/visions-0.6.1/build/...
    2       ...c/visions-0.6.1/build/...
    4                                                 None
    Name: image_png_missing, dtype: object
    type = File, member = True
    
        @pytest.mark.parametrize(**get_contains_cases(series, contains_map, typeset))
        def test_contains(series, type, member):
            """Test the generated combinations for "series in type"
        
            Args:
                series: the series to test
                type: the type to test against
                member: the result
            """
            result, message = contains(series, type, member)
    >       assert result, message
    E       AssertionError: image_png_missing in File; expected True, got False
    E       assert False
    
    tests/typesets/test_complete_set.py:190: AssertionError
    ___________________ test_contains[image_png_missing x Image] ___________________
    
    series = 0    /build/python-visions/src/visions-0.6.1/build/...
    1    /build/python-visions/src/visions-0.6.1/build/...
    2       ...c/visions-0.6.1/build/...
    4                                                 None
    Name: image_png_missing, dtype: object
    type = Image, member = True
    
        @pytest.mark.parametrize(**get_contains_cases(series, contains_map, typeset))
        def test_contains(series, type, member):
            """Test the generated combinations for "series in type"
        
            Args:
                series: the series to test
                type: the type to test against
                member: the result
            """
            result, message = contains(series, type, member)
    >       assert result, message
    E       AssertionError: image_png_missing in Image; expected True, got False
    E       assert False
    
    tests/typesets/test_complete_set.py:190: AssertionError
    _____________ test_inference[file_mixed_ext x File expected True] ______________
    
    series = 0    /build/python-visions/src/visions-0.6.1/build/...
    1    /build/python-visions/src/visions-0.6.1/build/...
    2    /build/python-visions/src/visions-0.6.1/build/...
    Name: file_mixed_ext, dtype: object
    type = File, typeset = CompleteSet, difference = False
    
        @pytest.mark.parametrize(**get_inference_cases(series, inference_map, typeset))
        def test_inference(series, type, typeset, difference):
            """Test the generated combinations for "inference(series) == type"
        
            Args:
                series: the series to test
                type: the type to test against
            """
            result, message = infers(series, type, typeset, difference)
    >       assert result, message
    E       AssertionError: inference of file_mixed_ext expected File to be True (typeset=CompleteSet)
    E       assert False
    
    tests/typesets/test_complete_set.py:317: AssertionError
    _____________ test_inference[file_mixed_ext x Path expected False] _____________
    
    series = 0    /build/python-visions/src/visions-0.6.1/build/...
    1    /build/python-visions/src/visions-0.6.1/build/...
    2    /build/python-visions/src/visions-0.6.1/build/...
    Name: file_mixed_ext, dtype: object
    type = Path, typeset = CompleteSet, difference = True
    
        @pytest.mark.parametrize(**get_inference_cases(series, inference_map, typeset))
        def test_inference(series, type, typeset, difference):
            """Test the generated combinations for "inference(series) == type"
        
            Args:
                series: the series to test
                type: the type to test against
            """
            result, message = infers(series, type, typeset, difference)
    >       assert result, message
    E       AssertionError: inference of file_mixed_ext expected Path to be False (typeset=CompleteSet)
    E       assert False
    
    tests/typesets/test_complete_set.py:317: AssertionError
    _______________ test_inference[image_png x Image expected True] ________________
    
    series = 0    /build/python-visions/src/visions-0.6.1/build/...
    1    /build/python-visions/src/visions-0.6.1/build/...
    2    /build/python-visions/src/visions-0.6.1/build/...
    Name: image_png, dtype: object
    type = Image, typeset = CompleteSet, difference = False
    
        @pytest.mark.parametrize(**get_inference_cases(series, inference_map, typeset))
        def test_inference(series, type, typeset, difference):
            """Test the generated combinations for "inference(series) == type"
        
            Args:
                series: the series to test
                type: the type to test against
            """
            result, message = infers(series, type, typeset, difference)
    >       assert result, message
    E       AssertionError: inference of image_png expected Image to be True (typeset=CompleteSet)
    E       assert False
    
    tests/typesets/test_complete_set.py:317: AssertionError
    _______________ test_inference[image_png x Path expected False] ________________
    
    series = 0    /build/python-visions/src/visions-0.6.1/build/...
    1    /build/python-visions/src/visions-0.6.1/build/...
    2    /build/python-visions/src/visions-0.6.1/build/...
    Name: image_png, dtype: object
    type = Path, typeset = CompleteSet, difference = True
    
        @pytest.mark.parametrize(**get_inference_cases(series, inference_map, typeset))
        def test_inference(series, type, typeset, difference):
            """Test the generated combinations for "inference(series) == type"
        
            Args:
                series: the series to test
                type: the type to test against
            """
            result, message = infers(series, type, typeset, difference)
    >       assert result, message
    E       AssertionError: inference of image_png expected Path to be False (typeset=CompleteSet)
    E       assert False
    
    tests/typesets/test_complete_set.py:317: AssertionError
    ___________ test_inference[image_png_missing x Image expected True] ____________
    
    series = 0    /build/python-visions/src/visions-0.6.1/build/...
    1    /build/python-visions/src/visions-0.6.1/build/...
    2       ...c/visions-0.6.1/build/...
    4                                                 None
    Name: image_png_missing, dtype: object
    type = Image, typeset = CompleteSet, difference = False
    
        @pytest.mark.parametrize(**get_inference_cases(series, inference_map, typeset))
        def test_inference(series, type, typeset, difference):
            """Test the generated combinations for "inference(series) == type"
        
            Args:
                series: the series to test
                type: the type to test against
            """
            result, message = infers(series, type, typeset, difference)
    >       assert result, message
    E       AssertionError: inference of image_png_missing expected Image to be True (typeset=CompleteSet)
    E       assert False
    
    tests/typesets/test_complete_set.py:317: AssertionError
    ___________ test_inference[image_png_missing x Path expected False] ____________
    
    series = 0    /build/python-visions/src/visions-0.6.1/build/...
    1    /build/python-visions/src/visions-0.6.1/build/...
    2       ...c/visions-0.6.1/build/...
    4                                                 None
    Name: image_png_missing, dtype: object
    type = Path, typeset = CompleteSet, difference = True
    
        @pytest.mark.parametrize(**get_inference_cases(series, inference_map, typeset))
        def test_inference(series, type, typeset, difference):
            """Test the generated combinations for "inference(series) == type"
        
            Args:
                series: the series to test
                type: the type to test against
            """
            result, message = infers(series, type, typeset, difference)
    >       assert result, message
    E       AssertionError: inference of image_png_missing expected Path to be False (typeset=CompleteSet)
    E       assert False
    
    tests/typesets/test_complete_set.py:317: AssertionError
    =============================== warnings summary ===============================
    tests/test_root.py::test_multiple_roots
      /build/python-visions/src/visions-0.6.1/build/lib/visions/typesets/typeset.py:88: UserWarning: {Generic} were isolates in the type relation map and consequently orphaned. Please add some mapping to the orphaned nodes.
        warnings.warn(message)
    
    tests/test_summarization.py::test_complex_missing_summary
      /usr/lib/python3.8/site-packages/numpy/core/_methods.py:47: ComplexWarning: Casting complex values to real discards the imaginary part
        return umr_sum(a, axis, dtype, out, keepdims, initial, where)
    
    -- Docs: https://docs.pytest.org/en/stable/warnings.html
    =========================== short test summary info ============================
    FAILED tests/typesets/test_complete_set.py::test_contains[file_mixed_ext x File]
    FAILED tests/typesets/test_complete_set.py::test_contains[image_png x File]
    FAILED tests/typesets/test_complete_set.py::test_contains[image_png x Image]
    FAILED tests/typesets/test_complete_set.py::test_contains[image_png_missing x File]
    FAILED tests/typesets/test_complete_set.py::test_contains[image_png_missing x Image]
    FAILED tests/typesets/test_complete_set.py::test_inference[file_mixed_ext x File expected True]
    FAILED tests/typesets/test_complete_set.py::test_inference[file_mixed_ext x Path expected False]
    FAILED tests/typesets/test_complete_set.py::test_inference[image_png x Image expected True]
    FAILED tests/typesets/test_complete_set.py::test_inference[image_png x Path expected False]
    FAILED tests/typesets/test_complete_set.py::test_inference[image_png_missing x Image expected True]
    FAILED tests/typesets/test_complete_set.py::test_inference[image_png_missing x Path expected False]
    ================= 11 failed, 8954 passed, 2 warnings in 15.94s =================
    

    see also the complete build log here

    bug 
    opened by hubutui 4
  • No module named 'shapely'

    No module named 'shapely'

    (This is part of the review in openjournals/joss-reviews#2145)

    When I try to execute the example given here, I get an error stating No module named 'shapely'. I see that this is a dependency of visions, but is only listed in the requirements_test.txt. You probably have to add this requirement to the requirements.txt as well.

    PS: Currently, the requirements of the package are listed both in their own separate file and in the setup.py file. To avoid confusion, it is probably better to use only one of them. You can read in a requirements file and use it in the setup.py file by using:

    # Get the requirements list
    with open('requirements.txt', 'r') as f:
        requirements = f.read().splitlines()
    

    Keep in mind that it is possible to link different requirements files together. For example, you can link requirements.txt and requirements_dev.txt together by adding the line -r requirements.txt to the top of requirements_dev.txt. This means that installing the requirements of requirements_dev.txt will use both files. This however won't work if you parse the file in a setup.py file. In that case, you can simply read both files and append them together if necessary.
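
    The "read both files and append them" approach can be sketched as a small helper that follows -r lines itself. This is a minimal sketch rather than anything taken from the visions setup.py: the name read_requirements is hypothetical, and it handles only the short -r form (not pip's long --requirement form or other pip options).

```python
from pathlib import Path


def read_requirements(path, _seen=None):
    """Return requirement lines from `path`, following `-r other.txt` includes."""
    path = Path(path).resolve()
    seen = set() if _seen is None else _seen
    if path in seen:  # guard against circular includes
        return []
    seen.add(path)

    requirements = []
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("-r"):
            # Included files are resolved relative to the including file.
            included = line[2:].strip()
            requirements.extend(read_requirements(path.parent / included, seen))
        else:
            requirements.append(line)
    return requirements
```

    The resulting list can be passed to install_requires in setup.py, so the linked files stay the single source of truth.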

    opened by 1313e 4
  • Bump pyarrow from 1.0.1 to 5.0.0

    Bump pyarrow from 1.0.1 to 5.0.0

    Bumps pyarrow from 1.0.1 to 5.0.0.

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    dependencies 
    opened by dependabot[bot] 3
  • all nulls should be inferred as generic

    all nulls should be inferred as generic

    I don't think this should be the expected behavior.

    In [41]: infer_type(pd.DataFrame({'x':['', '']}), StandardSet())
    Out[41]: {'x': DateTime}
    
    In [39]: infer_type(pd.DataFrame({'x':[None, None]}), StandardSet())
    Out[39]: {'x': Boolean}
    
    bug 
    opened by majidaldo 3
  • comma separator

    comma separator

    New functionality

    • comma separator handling for string digits
    • new utility functionality for working with missing values

    Major Proposed Changes

    • Integer should be a strict subset of Float
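
    To illustrate what comma separator handling for string digits could look like, here is a minimal sketch. It is not the code from this PR: the function name parse_grouped_number and the regular expression are assumptions, and it accepts only numbers whose digits are grouped in threes by commas.

```python
import re

# Digits grouped in threes by commas, with an optional sign and decimal part.
_GROUPED_NUMBER = re.compile(r"[+-]?\d{1,3}(,\d{3})+(\.\d+)?")


def parse_grouped_number(text):
    """Parse '1,234' or '1,234.5'; return None if the text is not comma-grouped."""
    text = text.strip()
    if not _GROUPED_NUMBER.fullmatch(text):
        return None
    plain = text.replace(",", "")
    # A value without a decimal part is an int; otherwise it is a float.
    return float(plain) if "." in plain else int(plain)
```

    Every string this sketch parses as an integer could also be read as a float, which is the subset relationship the proposed change describes.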
    opened by ieaves 3
  • Add CodeQL workflow for GitHub code scanning

    Add CodeQL workflow for GitHub code scanning

    Hi dylan-profiler/visions!

    This is a one-off automatically generated pull request from LGTM.com :robot:. You might have heard that we’ve integrated LGTM’s underlying CodeQL analysis engine natively into GitHub. The result is GitHub code scanning!

    With LGTM fully integrated into code scanning, we are focused on improving CodeQL within the native GitHub code scanning experience. In order to take advantage of current and future improvements to our analysis capabilities, we suggest you enable code scanning on your repository. Please take a look at our blog post for more information.

    This pull request enables code scanning by adding an auto-generated codeql.yml workflow file for GitHub Actions to your repository — take a look! We tested it before opening this pull request, so all should be working :heavy_check_mark:. In fact, you might already have seen some alerts appear on this pull request!

    Where needed and if possible, we’ve adjusted the configuration to the needs of your particular repository. But of course, you should feel free to tweak it further! Check this page for detailed documentation.

    Questions? Check out the FAQ below!

    FAQ


    How often will the code scanning analysis run?

    By default, code scanning will trigger a scan with the CodeQL engine on the following events:

    • On every pull request — to flag up potential security problems for you to investigate before merging a PR.
    • On every push to your default branch and other protected branches — this keeps the analysis results on your repository’s Security tab up to date.
    • Once a week at a fixed time — to make sure you benefit from the latest updated security analysis even when no code was committed or PRs were opened.

    What will this cost?

    Nothing! The CodeQL engine will run inside GitHub Actions, making use of your unlimited free compute minutes for public repositories.

    What types of problems does CodeQL find?

    The CodeQL engine that powers GitHub code scanning is the exact same engine that powers LGTM.com. The exact set of rules has been tweaked slightly, but you should see almost exactly the same types of alerts as you were used to on LGTM.com: we’ve enabled the security-and-quality query suite for you.

    How do I upgrade my CodeQL engine?

    No need! New versions of the CodeQL analysis are constantly deployed on GitHub.com; your repository will automatically benefit from the most recently released version.

    The analysis doesn’t seem to be working

    If you get an error in GitHub Actions that indicates that CodeQL wasn’t able to analyze your code, please follow the instructions here to debug the analysis.

    How do I disable LGTM.com?

    If you have LGTM’s automatic pull request analysis enabled, then you can follow these steps to disable the LGTM pull request analysis. You don’t actually need to remove your repository from LGTM.com; it will automatically be removed in the next few months as part of the deprecation of LGTM.com (more info here).

    Which source code hosting platforms does code scanning support?

    GitHub code scanning is deeply integrated within GitHub itself. If you’d like to scan source code that is hosted elsewhere, we suggest that you create a mirror of that code on GitHub.

    How do I know this PR is legitimate?

    This PR is filed by the official LGTM.com GitHub App, in line with the deprecation timeline that was announced on the official GitHub Blog. The proposed GitHub Action workflow uses the official open source GitHub CodeQL Action. If you have any other questions or concerns, please join the discussion here in the official GitHub community!

    I have another question / how do I get in touch?

    Please join the discussion here to ask further questions and send us suggestions!

    opened by lgtm-com[bot] 0
  • Sktime semantic data types for time series & vision

    Sktime semantic data types for time series & vision

    I've recently been made aware of this excellent and imo much needed library by @lmmentel.

    The reason is its similarity to the datatypes module of sktime, which introduces semantic typing for time series related data types - we distinguish "mtypes" (machine representations) and "scitypes" (scientific types, what visions calls semantic type). More details here as reference.

    A few questions for the visions devs:

    • time series are known to be a notoriously splintered field in terms of data representation, and even more when it comes to learning tasks (as in your ML example). Do you see visions moving in the direction of typing for ML?
    • would you have time to look into the sktime datatypes module and assess how similar this is to visions? If similar, we might be tempted to take a dependency on visions and contribute. Key features are mtype conversions, scitype inference, checks that also return metadata (e.g., number of time stamps in a series, which can be represented 4 different ways)
    enhancement 
    opened by fkiraly 7
  • src/visions/types/url.py passes non URLs

    src/visions/types/url.py passes non URLs

    src/visions/types/url.py does not correctly validate URLs.

    First, the example code (lines 14--19) from the docs does not return True:

    Python 3.9.4 (default, Apr  9 2021, 09:32:38)
    [Clang 10.0.0 ] :: Anaconda, Inc. on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import visions
    >>> from urllib.parse import urlparse
    >>> urls = ['http://www.cwi.nl:80/%7Eguido/Python.html', 'https://github.com/pandas-profiling/pandas-profiling']
    >>> x = [urlparse(url) for url in urls]
    >>> x in visions.URL
    False
    >>> x
    [ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html', params='', query='', fragment=''), ParseResult(scheme='https', netloc='github.com', path='/pandas-profiling/pandas-profiling', params='', query='', fragment='')]
    >>> import pkg_resources
    >>> pkg_resources.get_distribution("visions").version
    '0.7.4'
    

    Second, non URLs are passing:

    >>> urlparse('junk') in visions.URL
    True
    >>>
    

    The code should probably check something like the following for each element of x:

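        try:
            result = urlparse(x)
            # a valid URL needs both a scheme and a network location
            return all([result.scheme, result.netloc])
        except (TypeError, ValueError, AttributeError):
            # urlparse rejects malformed or non-string input
            return False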
    

    Finally, and this is a suggested enhancement, I think the behavior would be more useful if it handled raw strings and did the parsing internally without the caller having to supply a parser:

    >>> urls = ['http://www.cwi.nl:80/%7Eguido/Python.html', 'https://github.com/pandas-profiling/pandas-profiling']
    >>> urls in visions.URL
    True
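
    The suggested behavior (accept raw strings, parse internally, require a scheme and a network location) can be sketched with the standard library alone. This is a minimal sketch, not the visions implementation: the names is_url and all_urls are hypothetical.

```python
from urllib.parse import urlparse


def is_url(value):
    """Return True if `value` is a string with both a scheme and a network location."""
    if not isinstance(value, str):
        return False
    try:
        parsed = urlparse(value)
    except ValueError:  # raised for malformed input such as a broken IPv6 host
        return False
    return bool(parsed.scheme and parsed.netloc)


def all_urls(values):
    """Return True if every element of `values` passes is_url."""
    return all(is_url(v) for v in values)
```

    With this check the two example URLs pass and 'junk' is rejected. Note that all_urls returns True for an empty sequence, which a real type check may want to treat differently.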
    
    bug 
    opened by leapingllamas 3
  • How to check if a type is/is_not parent of another type ?

    How to check if a type is/is_not parent of another type ?

    I am following the "Problem type inference" example.


    From one dataframe, I have already built a list with the type of each column. Here is the type_list:

    [Discrete,
     Nominal,
     Discrete,
     Nominal,
     Nominal,
     Nominal,
     Nominal,
     Nominal,
     Nominal,
     Binary,
     Discrete,
     Discrete,
     Discrete,
     Nominal,
     Binary]
    

    type(type_list[0]) gives visions.types.type.VisionsBaseTypeMeta

    Now I want to check whether each type has Categorical or Numeric as a parent type.

    for column, t in zip(dataframe.columns, type_list):
        if is_type_parent_of_categorical(t):
            category_job(dataframe[column])

    # Binary is a child of Categorical
    is_type_parent_of_categorical(type_list[14])  # expected: True

    # Discrete is a child of Numeric
    is_type_parent_of_categorical(type_list[0])  # expected: False
    

    How should I implement is_type_parent_of_categorical?

    My workaround seems to work, but only because it relies on string comparison:

    def is_type_parent_of_categorical(visions_type):
        # Workaround: compare the type's string name against the known Categorical types
        type_str = str(visions_type)
        return type_str in ["Categorical", "Ordinal", "Nominal", "Binary"]
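
One way to avoid a hard-coded list of names is to walk the type hierarchy upward. The sketch below is self-contained and does not call visions: the `PARENT` mapping is written out by hand from the types named in this issue (the `Generic` root is an assumption), and `ancestors` and `is_descendant_of` are hypothetical helper names.

```python
# Hypothetical child -> parent edges mirroring the types named in this issue;
# this is NOT read from visions, it is written out by hand for illustration.
PARENT = {
    "Binary": "Categorical",
    "Nominal": "Categorical",
    "Ordinal": "Categorical",
    "Discrete": "Numeric",
    "Categorical": "Generic",
    "Numeric": "Generic",
}


def ancestors(type_name):
    """Yield every ancestor of type_name, nearest first."""
    current = PARENT.get(type_name)
    while current is not None:
        yield current
        current = PARENT.get(current)


def is_descendant_of(type_name, ancestor_name):
    """True if type_name equals ancestor_name or has it as an ancestor."""
    return type_name == ancestor_name or ancestor_name in ancestors(type_name)


type_list = ["Discrete", "Nominal", "Binary"]
for name in type_list:
    print(name, is_descendant_of(name, "Categorical"))
```

The same walk works for any ancestor, so one function covers both the Categorical and the Numeric check.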
    
    enhancement 
    opened by ttpro1995 2
  • function: 'lowest' common type

    Sometimes going through a whole array is not needed. You have the types of the subsets of the array and you just want to get a compatible data type for all subsets.

    A common scenario when assembling horrible csvs is that the same column is inferred as different types in different files. For example, int in one file and float in another should resolve to float (float <-- int). In the worst case, the result 'falls back' to string.
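
A minimal sketch of the requested behavior, assuming each type can be widened along a single chain toward a most general fallback. The `WIDENS_TO` mapping and all function names are hypothetical and written by hand; nothing here is taken from the visions API.

```python
from functools import reduce

# Hypothetical widening chain, written by hand: each type maps to the next
# more general type; String is the most general fallback.
WIDENS_TO = {"Integer": "Float", "Float": "String", "Boolean": "String", "String": None}


def lineage(type_name):
    """Return [type_name, parent, grandparent, ...] up to the root."""
    chain = []
    current = type_name
    while current is not None:
        chain.append(current)
        current = WIDENS_TO[current]
    return chain


def common_type(left, right):
    """Return the most specific type that both inputs can be widened to."""
    right_lineage = set(lineage(right))
    for candidate in lineage(left):
        if candidate in right_lineage:
            return candidate
    raise ValueError(f"no common type for {left!r} and {right!r}")


def lowest_common_type(type_names):
    """Fold common_type over the per-subset types."""
    return reduce(common_type, type_names)


print(lowest_common_type(["Integer", "Float", "Integer"]))
print(lowest_common_type(["Integer", "Boolean"]))
```

Only the already-inferred types of the subsets are compared, so the underlying arrays are never scanned again.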

    enhancement 
    opened by majidaldo 2
Releases(v0.7.5)
  • v0.7.5(Dec 5, 2021)

  • v0.7.4(Sep 27, 2021)

  • v0.7.2(Sep 27, 2021)

  • v0.7.1(Feb 4, 2021)

  • v0.7.0(Jan 5, 2021)

  • v0.6.4(Oct 17, 2020)

    • ENH: swifter apply for pandas backend
    • FIX: fix for issue #147
    • ENH: __version__ attribute made available
    • ENH: improved typing and CI
    • ENH: contrib types/typesets for a low-threshold contribution of types

  • v0.6.1(Oct 11, 2020)

    • ENH: Expose state using typeset.detect and typeset.infer
    • ENH: plotting of typesets improved
    • FIX: fix and test cases for #136
    • CLN: pre-commit with black, isort, pyupgrade, flake8
    • ENH: type relations are now accessible by type (e.g. Float.relations[Integer])

  • v0.6.0(Sep 22, 2020)

  • v0.5.1(Sep 22, 2020)

    • Introduce stateful type inference and casting
    • Expose test utils to users and fix diagnostic information
    • Integer consistency for the standard set
    • Use pd.BooleanDtype for newer versions of pandas
    • Latest black formatting
  • v0.5.0(Aug 16, 2020)

    API breaking changes:

    • migration to single dispatch on typeset methods
    • updated API to unify detect / infer / cast against Series and DataFrames
    • improvements to boolean type
  • v0.4.6(Jul 28, 2020)

  • v0.4.5(Jul 28, 2020)

  • v0.4.4(May 11, 2020)

  • v0.4.3(May 10, 2020)

  • v0.4.2(May 10, 2020)

    Support for Files and Images, rewritten summarization functions

    • Renamed ExistingPath to File
    • Renamed ImagePath to Image
    • Version bump to 0.4.2
    • Summaries: return series instead of dict
    • Categorical: unicode counts are now based on the original character distribution instead of the unique characters, which are used as an intermediate step for increased performance.
    • Categorical: aggregate functions are included for string length (min, max, mean, median).
    • Path: number of unique values for the path parts are returned
    • Image: make Exif and Hash calculations optional. Also return width, height and area.
    • File: in addition to the file_size, return creation, modification and access time (which were already returned).