Created by: sparkingdark
What is this Python project?
Hub - Fastest unstructured dataset management for TensorFlow/PyTorch by activeloop.ai. Stream & version-control data. Converts large data into a single NumPy-like array in the cloud, accessible from any machine.
Describe features.
- Store and retrieve large datasets with version control
- Collaborate as in Google Docs: multiple data scientists work on the same data in sync with no interruptions
- Access from multiple machines simultaneously
- Deploy anywhere - locally, on Google Cloud, S3, Azure, and Activeloop (by default - and for free!)
- Integrate with your ML tools like NumPy, Dask, Ray, PyTorch, or TensorFlow
- Create arrays as big as you want. You can store images as big as 100k by 100k!
- Keep the shape of each sample dynamic. This way you can store small and big arrays as one array.
- Visualize any slice of the data in a matter of seconds without redundant manipulations
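
To make the "dynamic shape" feature above more concrete, here is a minimal sketch of the general idea of keeping differently shaped samples in one flat array. It is not Hub's API or implementation: the `RaggedStore` class and its methods are hypothetical names made up for this illustration, and it uses only the Python standard library.

```python
from array import array
from math import prod


class RaggedStore:
    """Hypothetical illustration only; this is NOT Hub's API."""

    def __init__(self):
        self._data = array("d")  # one flat buffer holding every sample
        self._offsets = [0]      # boundaries of each sample inside the buffer
        self._shapes = []        # per-sample shape, e.g. (2, 3)

    def append(self, values, shape):
        # Reject samples whose value count does not match the declared shape.
        if len(values) != prod(shape):
            raise ValueError("shape does not match number of values")
        self._data.extend(values)
        self._offsets.append(len(self._data))
        self._shapes.append(tuple(shape))

    def __len__(self):
        return len(self._shapes)

    def __getitem__(self, i):
        if not 0 <= i < len(self):
            raise IndexError("sample index out of range")
        start, end = self._offsets[i], self._offsets[i + 1]
        # Return the sample's own shape together with its flat values.
        return self._shapes[i], list(self._data[start:end])


store = RaggedStore()
store.append([1.0, 2.0, 3.0, 4.0], shape=(2, 2))  # a small sample
store.append([5.0] * 6, shape=(2, 3))             # a larger sample
```

Each sample keeps its own shape while all values live in a single buffer, which is one simple way to read "small and big arrays as one array"; the project itself may well do this differently.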
What's the difference between this Python project and similar ones?
Enumerate comparisons.
It is more oriented toward deep learning and machine learning than similar projects, and it makes the data easy to handle.
Anyone who agrees with this pull request could submit an Approve review to it.