Spreadsheets are used ubiquitously by scientists, business and financial analysts, researchers, and lay users. However, spreadsheets cannot express complex operations (e.g., joins), cannot handle large datasets, do not support collaboration, and foster errors, redundancy, and stale data. Relational databases, on the other hand, are well known to be powerful and scalable, but they are not flexible, intuitive, or interactive.
DataSpread addresses these limitations by holistically unifying spreadsheets with databases: it keeps the spreadsheet as the front end and uses a database as the back end.
DataSpread supports the holistic integration
of spreadsheets and database systems with the following novel features:
Flexible Storage Model.
Because spreadsheets contain a wide variety of structures, DataSpread uses
a flexible storage model that can adapt to any existing spreadsheet structure.
Because spreadsheets, unlike relational tables, are ordered,
DataSpread uses positional indexes to locate
and order data efficiently.
Instead of computing formulae individually, DataSpread uses shared computation
to batch formulae computation, achieving order-of-magnitude speedups.
DataSpread prioritizes computation for the cells the user is currently viewing
over cells that are outside the visible window.
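One way to picture this prioritization is a work queue in which cells that need recomputation are ordered by whether they fall inside the visible window. The sketch below is only an illustration under stated assumptions: the function name `schedule`, the `(row, col)` cell representation, and the inclusive `(top, left, bottom, right)` viewport tuple are all hypothetical and are not taken from DataSpread itself.

```python
import heapq


def schedule(dirty_cells, viewport):
    """Order cells awaiting recomputation so that visible cells come first.

    dirty_cells: iterable of (row, col) tuples (hypothetical representation).
    viewport: (top, left, bottom, right), inclusive bounds (an assumption).
    """
    top, left, bottom, right = viewport
    heap = []
    for seq, (row, col) in enumerate(dirty_cells):
        visible = top <= row <= bottom and left <= col <= right
        priority = 0 if visible else 1  # lower value is served earlier
        # seq keeps the original order among cells of equal priority
        heapq.heappush(heap, (priority, seq, (row, col)))

    order = []
    while heap:
        _, _, cell = heapq.heappop(heap)
        order.append(cell)
    return order


if __name__ == "__main__":
    pending = [(100, 1), (2, 2), (500, 3), (1, 1)]
    print(schedule(pending, viewport=(1, 1, 10, 5)))
```

With a window covering rows 1 to 10, the two visible cells are scheduled ahead of the two off-screen ones, so the user sees fresh values before background work finishes. A real system would also have to respect formula dependencies, which this sketch ignores.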
Thanks to these features, DataSpread can scale to billions of cells while still providing interactive response times.
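The positional indexes mentioned among the features above can be illustrated with a small structure that maps a row's current position to a stable row key, so that inserting or deleting a row does not require renumbering every stored row after it. The sketch below is a simplified, hypothetical example and not DataSpread's actual index: the class name `PositionalIndex`, its methods, and the block-based layout are assumptions chosen for brevity.

```python
class PositionalIndex:
    """Maps 0-based row positions to opaque row keys using small blocks.

    Hypothetical illustration only; block_size is an arbitrary parameter.
    """

    def __init__(self, block_size=4):
        self.block_size = block_size
        self.blocks = [[]]  # each block holds row keys in display order

    def __len__(self):
        return sum(len(block) for block in self.blocks)

    def _locate(self, pos):
        """Return (block index, offset within block) for an existing position."""
        if pos < 0:
            raise IndexError("position out of range")
        for b, block in enumerate(self.blocks):
            if pos < len(block):
                return b, pos
            pos -= len(block)  # skip the whole block using only its size
        raise IndexError("position out of range")

    def get(self, pos):
        b, off = self._locate(pos)
        return self.blocks[b][off]

    def insert(self, pos, key):
        """Insert key so that it ends up at position pos."""
        n = len(self)
        if not 0 <= pos <= n:
            raise IndexError("position out of range")
        if pos == n:  # append at the end of the last block
            b, off = len(self.blocks) - 1, len(self.blocks[-1])
        else:
            b, off = self._locate(pos)
        block = self.blocks[b]
        block.insert(off, key)
        if len(block) > self.block_size:  # split an overfull block in two
            mid = len(block) // 2
            self.blocks[b:b + 1] = [block[:mid], block[mid:]]

    def delete(self, pos):
        """Remove and return the key at position pos."""
        b, off = self._locate(pos)
        key = self.blocks[b].pop(off)
        if not self.blocks[b] and len(self.blocks) > 1:
            del self.blocks[b]  # drop a block that became empty
        return key
```

Only the block that receives the new row is modified on insert; rows in later blocks shift position implicitly because lookups count block sizes rather than reading stored row numbers. Lookup here scans the block list linearly, so a production-grade index would more plausibly keep these counts in a balanced tree to reach logarithmic time.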
Release 0.1 (December 2016, released): Basic scalable spreadsheets: Support for very large (1+ billion) spreadsheets, basic spreadsheet functions, cloud deployment.
Release 0.2 (April 2017, planned): Scalable, computable spreadsheets: Support for efficient formulae computation.
Release 0.3 (July 2017, planned): Scalable, computable, and collaborative spreadsheets: Support for collaboration and transactions.
DataSpread is being developed in collaboration with the
analytics team at Yahoo! Champaign, along with beta testers from the NIH-BD2K Center
at the University of Illinois and Mayo Clinic.
DataSpread is being developed by a team of undergraduate and graduate
students headed by Prof. Kevin Chang
and Prof. Aditya Parameswaran, along with collaborators, including Prof. Karrie Karahalios. The list of contributors includes (in alphabetical order):
Mangesh Bendre, Neelan Coleman, Himel Dev, Bofan Sun, Vipul Venkataraman, Yiming Wang, Yining Wang, Ding Zhang, Xinyan Zhou.
Please reach out to the lead PhD student, Mangesh Bendre (firstname.lastname@example.org), if you'd like to contribute to DataSpread
or become a beta tester!