DataSpread combines the intuitiveness and flexibility of spreadsheets with the scalability and power of databases
Data Analytics for the 99%
Spreadsheets have found ubiquitous use by scientists, business and financial analysts, researchers, and lay users. However, spreadsheets cannot express complex operations (e.g., joins), cannot handle large datasets, do not support collaboration, and foster errors, redundancy, and stale data. Relational databases, on the other hand, are known to be powerful and scalable, but they are not flexible, intuitive, or interactive.
DataSpread addresses these limitations by holistically unifying spreadsheets with databases: it preserves the spreadsheet as the front end and uses a database as the back end.
DataSpread supports the holistic integration
of spreadsheets and database systems with the following novel features:
Flexible Storage Model.
Due to the variety of structures that can be found in spreadsheets, DataSpread uses
a flexible storage model that can adapt to any existing spreadsheet structure.
Unlike tables in a relational database, spreadsheets are ordered,
so DataSpread uses positional indexes to locate
and order data efficiently.
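To make the idea of a positional index concrete, here is a minimal sketch in Python. It is not DataSpread's implementation: the class name `PositionalIndex`, the `block_size` parameter, and the block-splitting scheme are assumptions chosen for illustration. The sketch stores row identifiers in small blocks and uses the block lengths as counts, so a row can be found by its position and a new row can be inserted without renumbering the rows that follow it.

```python
class PositionalIndex:
    """Illustrative positional index: maps a 0-based position to a row id.

    Hypothetical sketch, not DataSpread's actual data structure.
    """

    def __init__(self, block_size=4):
        self.block_size = block_size  # maximum rows per block before a split
        self.blocks = [[]]            # each block is a short list of row ids

    def __len__(self):
        return sum(len(block) for block in self.blocks)

    def get(self, pos):
        """Return the row id stored at position pos."""
        if pos < 0:
            raise IndexError("position out of range")
        for block in self.blocks:
            # The block length acts as a count that lets us skip whole blocks.
            if pos < len(block):
                return block[pos]
            pos -= len(block)
        raise IndexError("position out of range")

    def insert(self, pos, row_id):
        """Insert row_id so that it ends up at position pos."""
        if pos < 0:
            raise IndexError("position out of range")
        for i, block in enumerate(self.blocks):
            if pos <= len(block):
                block.insert(pos, row_id)
                if len(block) > self.block_size:
                    # Split an overfull block into two halves.
                    half = len(block) // 2
                    self.blocks[i:i + 1] = [block[:half], block[half:]]
                return
            pos -= len(block)
        raise IndexError("position out of range")


index = PositionalIndex(block_size=4)
for i in range(10):
    index.insert(i, f"r{i}")
index.insert(3, "new")  # rows after position 3 shift down; no ids are rewritten
```

A production index would arrange the counts in a tree so that lookups take logarithmic rather than linear time in the number of blocks; the flat list of blocks here only keeps the sketch short.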
Instead of computing formulae individually, DataSpread uses shared computation
to batch formulae computation, achieving order-of-magnitude speedups.
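As a rough illustration of why sharing computation across formulae helps, consider a column of running totals in which the cell in row i holds the sum of the first i input values. The Python sketch below is an assumption-laden example of the general idea, not DataSpread's algorithm, and both function names are hypothetical. Evaluating each formula on its own rescans the input every time, while the shared version reuses the previous total.

```python
from itertools import accumulate


def running_totals_individually(values):
    """Evaluate each running-total formula from scratch (quadratic work)."""
    return [sum(values[: i + 1]) for i in range(len(values))]


def running_totals_shared(values):
    """Evaluate all running totals in one pass, reusing the previous total."""
    return list(accumulate(values))


column_a = [3, 1, 4, 1, 5]
individual = running_totals_individually(column_a)
shared = running_totals_shared(column_a)
```

Both functions return the same totals; the shared version simply performs one addition per cell instead of re-adding the whole prefix.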
DataSpread prioritizes computation for the cells the user is currently viewing
over cells that are off-screen.
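One simple way to picture this prioritization is a scheduler that orders pending cells so that those inside the visible window are recomputed first. The sketch below is a hypothetical Python illustration; the function name `schedule`, the `(row, column)` cell tuples, and the `(top, left, bottom, right)` viewport format are all assumptions rather than DataSpread's API.

```python
def schedule(dirty_cells, viewport):
    """Order cells awaiting recomputation so visible cells come first.

    dirty_cells: iterable of (row, col) tuples.
    viewport: (top, left, bottom, right), inclusive bounds of the visible window.
    """
    top, left, bottom, right = viewport

    def is_visible(cell):
        row, col = cell
        return top <= row <= bottom and left <= col <= right

    # False sorts before True, so visible cells lead; ties break by position.
    return sorted(dirty_cells, key=lambda cell: (not is_visible(cell), cell))


pending = [(500, 0), (2, 1), (0, 0), (90, 3)]
order = schedule(pending, viewport=(0, 0, 20, 5))
```

Here the two cells inside the window, `(0, 0)` and `(2, 1)`, are scheduled ahead of the off-screen cells.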
DataSpread enables users to explore large tabular data via panning and zooming operations
and to get additional details about the data on demand via aggregation
operations, while maintaining the history and context of navigation.
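Zooming out over a large column can be thought of as replacing runs of rows with aggregate summaries. The following Python sketch shows one possible form of this under stated assumptions: the function name `zoom_out`, the fixed `bin_size`, and the choice of count and mean as aggregates are illustrative and not taken from DataSpread.

```python
def zoom_out(values, bin_size):
    """Summarize consecutive rows into bins with a count and a mean."""
    bins = []
    for start in range(0, len(values), bin_size):
        chunk = values[start:start + bin_size]
        bins.append({
            "rows": (start, start + len(chunk) - 1),  # inclusive row range
            "count": len(chunk),
            "mean": sum(chunk) / len(chunk),
        })
    return bins


overview = zoom_out([10, 20, 30, 40, 50], bin_size=2)
```

Each bin records the row range it covers, so zooming back in amounts to fetching only those rows.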
Due to these features, DataSpread can scale to billions of cells, providing interactive response times.
Release 0.3 (August 2017, released): Table functionality: create and link tables, add and delete rows and columns.
Release 0.1 (December 2016, released): Basic scalable spreadsheets: Support for very large (1+ billion) spreadsheets, basic spreadsheet functions, cloud deployment.
The current version of DataSpread (what we're calling DataSpread 2.0) is being developed by a team of undergraduate and graduate
students headed by Prof. Karrie Karahalios
and Prof. Aditya Parameswaran at the University of Illinois and at UC Berkeley.
The list of contributors includes (in alphabetical order):
Mangesh Bendre, Ti-Chung Cheng, Richard Lin, Kelly Mack, Joon Sung Park, Sajjadur Rahman, Tana Wattanawaroon, and Pingjing Yang.
Past contributors for previous versions include Kevin Chang, Neelan Coleman, Himel Dev, Yuyang Liu, Yu Lu, Bofan Sun, Vipul Venkataraman, Yiming Wang, Yining Wang, Ding Zhang, Xinyan Zhou, and Shichu Zhu.
Please reach out to Aditya Parameswaran (firstname.lastname@example.org) if you'd like to contribute to DataSpread or become
a beta tester!