DataSpread combines the intuitiveness and flexibility of spreadsheets with the scalability and power of databases.





Data Analytics for the 99%

Spreadsheets are used ubiquitously by scientists, business and financial analysts, researchers, and lay users. However, spreadsheets cannot express complex operations (e.g., joins), cannot handle large datasets, do not support collaboration, and foster errors, redundancy, and stale data. Relational databases, on the other hand, are well known to be powerful and scalable, but are not flexible, intuitive, or interactive.
DataSpread addresses these limitations by holistically unifying spreadsheets with databases, preserving spreadsheets as the front end and using databases as the back end.

Key Features

DataSpread supports the holistic integration of spreadsheets and database systems with the following novel features:
  • Flexible Storage Model. Due to the variety of structures that can be found in spreadsheets, DataSpread uses a flexible storage model that can adapt to any existing spreadsheet structure.
  • Positional Awareness. Because spreadsheets, unlike relational tables, are ordered, DataSpread uses positional indexes to locate and order data very efficiently.
  • Shared Computation. Instead of computing formulae individually, DataSpread batches formula computation so that work is shared, achieving order-of-magnitude speedups.
  • Lazy Computation. DataSpread prioritizes computation for the cells the user is currently viewing over those that are off-screen.
Due to these features, DataSpread can scale to billions of cells while providing interactive response times.
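
The lazy computation idea can be illustrated with a small Python sketch. This is not DataSpread's implementation, which this page does not describe. The class `LazySheet`, its methods, and the `(top, left, bottom, right)` viewport tuple are hypothetical names chosen for illustration, and the sketch assumes formulae with no dependencies on other cells.

```python
class LazySheet:
    """Toy sheet that computes visible formulae first (illustrative only)."""

    def __init__(self):
        self.formulas = {}  # (row, col) -> zero-argument callable
        self.values = {}    # (row, col) -> cached result
        self.pending = []   # cells awaiting computation, in insertion order

    def set_formula(self, cell, fn):
        # Register a formula and mark the cell as needing computation.
        self.formulas[cell] = fn
        self.values.pop(cell, None)
        if cell not in self.pending:
            self.pending.append(cell)

    def compute_visible(self, viewport):
        # Compute only pending cells inside the inclusive viewport bounds.
        top, left, bottom, right = viewport
        computed, still_pending = [], []
        for cell in self.pending:
            row, col = cell
            if top <= row <= bottom and left <= col <= right:
                self.values[cell] = self.formulas[cell]()
                computed.append(cell)
            else:
                still_pending.append(cell)
        self.pending = still_pending
        return computed

    def compute_background(self, budget):
        # Compute at most `budget` off-screen cells, e.g. during idle time.
        batch, self.pending = self.pending[:budget], self.pending[budget:]
        for cell in batch:
            self.values[cell] = self.formulas[cell]()
        return batch


sheet = LazySheet()
sheet.set_formula((0, 0), lambda: 1 + 1)
sheet.set_formula((500, 0), lambda: 2 * 3)   # far below the visible window
sheet.set_formula((1, 1), lambda: 10)

visible = sheet.compute_visible((0, 0, 20, 10))  # rows 0-20, columns 0-10
deferred = sheet.compute_background(budget=5)
```

In this sketch the user-facing cells are computed immediately, while the cell at row 500 waits until `compute_background` is called. A real system would also need to track dependencies between formulae, which is omitted here.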

Releases

Our version 0.1 release is out. Check it out now!
  • Release 0.1 (December 2016, released): Basic scalable spreadsheets: Support for very large (1+ billion) spreadsheets, basic spreadsheet functions, cloud deployment.
  • Release 0.2 (April 2017, planned): Scalable, computable spreadsheets: Support for efficient formulae computation.
  • Release 0.3 (July 2017, planned): Scalable, computable, and collaborative spreadsheets: Support for collaboration and transactions.

Development Partners

The development of DataSpread is being done in collaboration with the analytics team at Yahoo! Champaign, along with beta-testers from the NIH-BD2K Center at the University of Illinois and Mayo Clinic.

Papers

Contact Us

DataSpread is being developed by a team of undergraduate and graduate students headed by Prof. Kevin Chang and Prof. Aditya Parameswaran, along with collaborators, including Prof. Karrie Karahalios. The list of contributors includes (in alphabetical order): Mangesh Bendre, Neelan Coleman, Himel Dev, Bofan Sun, Vipul Venkataraman, Yiming Wang, Yining Wang, Ding Zhang, Xinyan Zhou.
Please reach out to the lead PhD student, Mangesh Bendre (bendre1@illinois.edu), if you'd like to contribute to DataSpread or become a beta tester!

With thanks to our funding sources: