DataSpread

DataSpread combines the intuitiveness and flexibility of spreadsheets with the scalability and power of databases.

Data Analytics for the 99%

Spreadsheets are used ubiquitously by scientists, business and financial analysts, researchers, and lay users. However, spreadsheets cannot express complex operations (e.g., joins), cannot handle large datasets, do not support collaboration, and foster errors, redundancy, and stale data. Relational databases, on the other hand, are well known to be powerful and scalable, but they are not flexible, intuitive, or interactive.

DataSpread addresses these limitations by holistically unifying spreadsheets with databases: it preserves the spreadsheet as the front end and uses a database as the back end.

Key Features

DataSpread supports the holistic integration of spreadsheets and database systems with the following novel features:

Flexible Storage Model. Due to the variety of structures that can be found in spreadsheets, DataSpread uses a flexible storage model that can adapt to any existing spreadsheet structure.

Positional Awareness. Spreadsheets, unlike relational tables, are ordered, so DataSpread uses positional indexes to locate and order data efficiently.

Shared Computation. Instead of computing formulae individually, DataSpread batches formula evaluation so that work is shared across formulae, achieving order-of-magnitude speedups.

Lazy Computation. DataSpread prioritizes computation for the cells the user is currently viewing over cells that are off-screen.

Navigation Panel. The navigation panel lets users explore large tabular data by panning and zooming, and get additional details on demand through aggregation operations, while preserving the history and context of their navigation.

With these features, DataSpread scales to billions of cells while maintaining interactive response times.
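To make the lazy computation idea concrete, here is a minimal Python sketch. It is an illustration only, not DataSpread's actual scheduler: the names `Viewport` and `schedule` are hypothetical, and the sketch assumes cells are identified by zero-based (row, column) pairs. It orders the cells awaiting recomputation so that those inside the user's current view are evaluated first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Viewport:
    """Hypothetical description of the on-screen cell range (inclusive bounds)."""
    top: int
    left: int
    bottom: int
    right: int

    def contains(self, row: int, col: int) -> bool:
        return self.top <= row <= self.bottom and self.left <= col <= self.right


def schedule(dirty_cells, viewport):
    """Order dirty (row, col) cells so that visible cells come first.

    The sort key is False for visible cells and True for off-screen ones,
    and False sorts before True. sorted() is stable, so cells keep their
    original relative order within each group.
    """
    return sorted(dirty_cells, key=lambda cell: not viewport.contains(*cell))


# Four cells need recomputation; only two of them are on screen.
dirty = [(500, 0), (2, 1), (10_000, 3), (0, 0)]
view = Viewport(top=0, left=0, bottom=49, right=9)
order = schedule(dirty, view)
print(order)  # [(2, 1), (0, 0), (500, 0), (10000, 3)]
```

A real system would also have to respect formula dependencies and re-prioritize as the user scrolls; the sketch leaves both out to keep the core idea visible.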

Releases

Release 0.5 (May 2019, released): Navigation panel, relational operators, and asynchronous computation.

Release 0.3 (August 2017, released): Table functionality: creating and linking tables, adding and deleting rows and columns.

Release 0.1 (December 2016, released): Basic scalable spreadsheets: Support for very large (1+ billion) spreadsheets, basic spreadsheet functions, cloud deployment.

Papers

Contact Us

The current version of DataSpread (what we're calling DataSpread 2.0) is being developed by a team of undergraduate and graduate students headed by Prof. Karrie Karahalios and Prof. Aditya Parameswaran at the University of Illinois and at UC Berkeley. The list of contributors includes (in alphabetical order): Mangesh Bendre, Ti-Chung Cheng, Richard Lin, Kelly Mack, Joon Sung Park, Sajjadur Rahman, Tana Wattanawaroon, and Pingjing Yang.

Past contributors for previous versions include Kevin Chang, Neelan Coleman, Himel Dev, Yuyang Liu, Yu Lu, Bofan Sun, Vipul Venkataraman, Yiming Wang, Yining Wang, Ding Zhang, Xinyan Zhou, and Shichu Zhu.

Please reach out to Aditya Parameswaran (adityagp@berkeley.edu) if you'd like to contribute to DataSpread or become a beta tester!

With thanks to our funding sources: