DataSpread combines the intuitiveness and flexibility of spreadsheets with the scalability and power of databases
Data Analytics for the 99%
Spreadsheets have found ubiquitous use by scientists, business and financial analysts, researchers, and lay users. However, spreadsheets cannot express complex operations (e.g., joins), cannot handle large datasets, do not support collaboration, and foster errors, redundancy, and stale data. On the other hand, relational databases are well known to be powerful and scalable, but they are not flexible, intuitive, or interactive.
DataSpread addresses these limitations by holistically unifying spreadsheets with databases: it preserves the spreadsheet as the front end and uses a database as the back end.
DataSpread supports the holistic integration
of spreadsheets and database systems with the following novel features:
Flexible Storage Model.
Due to the variety of structures that can be found in spreadsheets, DataSpread uses
a flexible storage model that can adapt to any existing spreadsheet structure.
Because spreadsheets, unlike relational tables, are ordered,
DataSpread uses positional indexes to locate
and order data very efficiently.
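To illustrate what a positional index has to do, here is a minimal sketch in Python. It is not DataSpread's actual data structure: the class name `ChunkedPositionalIndex`, its methods, and the chunked-list layout are assumptions made for this example. The point it shows is that row positions are never stored, so inserting a row in the middle does not require renumbering the rows after it.

```python
from typing import Any, List, Tuple


class ChunkedPositionalIndex:
    """Hypothetical index mapping a 1-based row position to a row key.

    Positions are implied by the order of keys inside small chunks,
    so an insert only touches one chunk.
    """

    def __init__(self, chunk_size: int = 4) -> None:
        self.chunk_size = chunk_size
        self.chunks: List[List[Any]] = [[]]

    def __len__(self) -> int:
        return sum(len(chunk) for chunk in self.chunks)

    def _locate(self, pos0: int) -> Tuple[int, int]:
        """Return (chunk index, offset in chunk) for a 0-based position."""
        for i, chunk in enumerate(self.chunks):
            if pos0 < len(chunk):
                return i, pos0
            pos0 -= len(chunk)
        raise IndexError("position out of range")

    def get(self, pos: int) -> Any:
        """Return the key stored at 1-based row position `pos`."""
        i, offset = self._locate(pos - 1)
        return self.chunks[i][offset]

    def insert(self, pos: int, key: Any) -> None:
        """Insert `key` so it becomes row `pos`; later rows shift implicitly."""
        n = len(self)
        if not 1 <= pos <= n + 1:
            raise IndexError("position out of range")
        if pos == n + 1:  # append at the end
            i, offset = len(self.chunks) - 1, len(self.chunks[-1])
        else:
            i, offset = self._locate(pos - 1)
        chunk = self.chunks[i]
        chunk.insert(offset, key)
        if len(chunk) > self.chunk_size:  # split an overfull chunk in two
            mid = len(chunk) // 2
            self.chunks[i:i + 1] = [chunk[:mid], chunk[mid:]]


index = ChunkedPositionalIndex(chunk_size=2)
for position, key in enumerate(["r1", "r2", "r3", "r4", "r5"], start=1):
    index.insert(position, key)
index.insert(2, "new")  # "r2" through "r5" now sit one row lower
rows = [index.get(p) for p in range(1, len(index) + 1)]
```

This sketch walks the chunks linearly. Keeping per-chunk counts in a balanced tree would make lookups logarithmic in the number of rows, which is the usual refinement of this idea.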
Instead of computing formulae individually, DataSpread uses shared computation
to batch formulae computation, achieving order-of-magnitude speedups.
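As a hedged illustration of shared computation, the sketch below evaluates a batch of `SUM` formulae over one column in two ways: each formula on its own, and all of them from a single shared prefix-sum array. The function names and the choice of prefix sums are assumptions for this example, not a description of DataSpread's internals.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple

Range = Tuple[int, int]  # (start row, end row), 1-based and inclusive


def eval_individually(values: Sequence[int], ranges: Sequence[Range]) -> List[int]:
    """Evaluate each SUM(start:end) by rescanning its whole range."""
    return [sum(values[start - 1:end]) for start, end in ranges]


def eval_shared(values: Sequence[int], ranges: Sequence[Range]) -> List[int]:
    """Evaluate the same formulae from one prefix-sum array built once."""
    # prefix[i] holds values[0] + ... + values[i - 1]; prefix[0] is 0.
    prefix = [0, *accumulate(values)]
    return [prefix[end] - prefix[start - 1] for start, end in ranges]


column = [3, 1, 4, 1, 5]
formulae = [(1, 1), (1, 3), (2, 5), (1, 5)]
individual_results = eval_individually(column, formulae)
shared_results = eval_shared(column, formulae)
```

With n rows and one running-total formula per row, the individual approach touches on the order of n² cells, while the shared approach touches each cell once and then answers every formula with a single subtraction.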
DataSpread prioritizes computation for the cells the user is currently viewing
over cells that are off-screen.
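One simple way to express this kind of visible-first scheduling is a priority queue keyed by how far a cell lies from the current window. The sketch below is an assumption-laden illustration: the names `distance_to_viewport` and `evaluation_order` are hypothetical, and it ignores dependencies between formulae, which a real scheduler would have to respect.

```python
import heapq
from typing import Iterable, List, Tuple

Cell = Tuple[int, int]                # (row, column), 1-based
Viewport = Tuple[int, int, int, int]  # (top, bottom, left, right), inclusive


def distance_to_viewport(cell: Cell, viewport: Viewport) -> int:
    """Manhattan distance from a cell to the viewport; 0 if it is visible."""
    row, col = cell
    top, bottom, left, right = viewport
    d_row = max(top - row, 0, row - bottom)
    d_col = max(left - col, 0, col - right)
    return d_row + d_col


def evaluation_order(dirty: Iterable[Cell], viewport: Viewport) -> List[Cell]:
    """Order cells awaiting recomputation: visible first, then nearest."""
    heap = [(distance_to_viewport(cell, viewport), cell) for cell in dirty]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]


pending = [(500, 1), (12, 2), (1, 1), (10, 3)]
window = (10, 20, 1, 5)  # rows 10 to 20, columns 1 to 5 are on screen
order = evaluation_order(pending, window)
```

Ordering by distance rather than a plain visible/hidden flag means the cells the user is most likely to scroll to next are also computed early.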
DataSpread enables users to explore large tabular data via panning and zooming operations
and to get additional details about the data on demand via aggregation
operations, while maintaining the history and context of navigation.
Due to these features, DataSpread can scale to billions of cells while still providing interactive response times.
Release 0.3 (August 2017, released): Table functionality: create and link tables, add and delete rows and columns.
Release 0.1 (December 2016, released): Basic scalable spreadsheets: Support for very large (1+ billion) spreadsheets, basic spreadsheet functions, cloud deployment.
The development of DataSpread is being done in collaboration with the
analytics team at Yahoo! Champaign, along with beta-testers from the NIH-BD2K Center
at the University of Illinois and Mayo Clinic.
DataSpread is being developed by a team of undergraduate and graduate
students headed by Prof. Kevin Chang, Prof. Karrie Karahalios
and Prof. Aditya Parameswaran. The list of contributors includes (in alphabetical order):
Mangesh Bendre, Neelan Coleman, Himel Dev, Yuyang Liu, Yu Lu, Sajjadur Rahman, Bofan Sun, Vipul Venkataraman, Yiming Wang, Yining Wang, Tana Wattanawaroon, Ding Zhang, Xinyan Zhou, Shichu Zhu.
Please reach out to the lead PhD student, Mangesh Bendre (firstname.lastname@example.org), if you'd like to contribute to
DataSpread or become a beta tester!