Activity schema is an open standard for data modeling and transformation in a data warehouse.
It is designed to make data modeling and analysis substantially simpler, faster, and more reliable.
Data is modeled using independent activities.
All warehouse data is in a single time-series table.
All plots and analyses for BI run against a single table.
An activity schema models all data in the warehouse as a single time-series table.
Data is built from independent activities instead of facts and dimensions.
Any activity can be combined with any other by using relationships in time instead of foreign keys, allowing for true ad hoc queries.
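To make the idea of a relationship in time concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the specification's own implementation: the `Activity` fields (`ts`, `customer`, `activity`), the helper name `first_after`, and the sample activity names are all hypothetical. The sketch pairs each occurrence of one activity with the same customer's first occurrence of another activity that happened later, using only the customer and the timestamp.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Activity:
    # Hypothetical minimal columns; a real activity stream may carry more.
    ts: datetime    # when the activity happened
    customer: str   # who did it
    activity: str   # what they did, e.g. "viewed_page"


def first_after(stream, primary, secondary):
    """Pair each `primary` row with the same customer's first later `secondary` row."""
    pairs = []
    for row in stream:
        if row.activity != primary:
            continue
        # The only link between the two activities is customer + time.
        later = [
            other for other in stream
            if other.activity == secondary
            and other.customer == row.customer
            and other.ts > row.ts
        ]
        match = min(later, key=lambda r: r.ts, default=None)
        pairs.append((row, match))
    return pairs


stream = [
    Activity(datetime(2024, 1, 1, 9, 0), "alice", "viewed_page"),
    Activity(datetime(2024, 1, 1, 9, 5), "alice", "completed_order"),
    Activity(datetime(2024, 1, 2, 8, 0), "bob", "viewed_page"),
    Activity(datetime(2024, 1, 3, 10, 0), "alice", "opened_email"),
]

for view, order in first_after(stream, "viewed_page", "completed_order"):
    print(view.customer, view.ts, "->", order.ts if order else None)
```

Calling `first_after(stream, "viewed_page", "opened_email")` reuses the same function with a different activity name. Nothing else in the query changes, because no foreign key ties the two activities together.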
Existing data modeling approaches, such as a star schema, have many layers of dependencies.
These are difficult to manage and maintain: the source of truth is not always clear, and the models are harder to debug and require more documentation to use.
One business concept per activity means fewer models to manage, understand, and maintain.
A single data layer makes tracing data provenance and debugging far easier.
No joins between models means no need to tie disparate source systems together.
Time-series modeling means incremental updates (rather than full rebuilds) by default.
Changes to source data typically only affect a single activity.
Fewer models, with one concept each, are far easier to document.
Each activity represents a single concept (like a 'page view' or 'completed order'), so it's always clear which to use
A standard data model means that queries don't have to be written by hand.
Time-based joins mean any activity can be queried and combined with another without defining foreign keys.
Because all activities are related in time, swapping one activity for another requires no structural changes to queries.
A standard data model means that any analysis can be reused across companies. A customer acquisition cost calculation for one company can be shared with another.
Queries run substantially faster against an activity stream table, which has fewer columns, requires fewer joins, and can be easily partitioned by activity or time.
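The point above about incremental updates can be sketched in a few lines of Python. This is one simple strategy, assumed here for illustration rather than taken from the specification: keep a per-activity high-water mark on the timestamp and append only newer rows. The function name `incremental_append` and the row layout are hypothetical, and the sketch ignores late-arriving rows whose timestamps fall before the mark.

```python
from datetime import datetime


def incremental_append(stream, source_rows, activity):
    """Append only `activity` rows newer than the latest one already in `stream`."""
    existing = [row["ts"] for row in stream if row["activity"] == activity]
    # Assumption: timestamps only move forward, so the max is a safe cutoff.
    high_water_mark = max(existing, default=datetime.min)
    new_rows = [
        {"ts": ts, "customer": customer, "activity": activity}
        for ts, customer in source_rows
        if ts > high_water_mark
    ]
    stream.extend(new_rows)
    return len(new_rows)


stream = [
    {"ts": datetime(2024, 1, 1), "customer": "alice", "activity": "completed_order"},
]
source = [
    (datetime(2024, 1, 1), "alice"),  # already loaded in a previous run
    (datetime(2024, 1, 2), "bob"),    # new since the last run
]

added = incremental_append(stream, source, "completed_order")
print(added, len(stream))
```

Because each activity is loaded independently, a second run with the same source appends nothing, and rows belonging to other activities in the stream are never touched.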