Jenga and the Art of Data-Intensive Ecosystems Maintenance
In EDA 2013, vol. RNTI-B-9, pp. 5-6
Software maintenance accounts for up to 60% of the resources spent on building and operating a software system. Data-intensive ecosystems, in which several software applications are tightly coupled to underlying data repositories, are no exception. In such environments, the impact of evolution is three-fold: (a) syntactic, meaning that the evolution of either the data repositories or the software that implements the operational processes of the data warehouse can lead to operational failures and crashes due to some form of syntactic incorrectness; (b) semantic, meaning that changes in a view or a data transformation module can lead to different semantics for the propagated data; and (c) performance-oriented, as different configurations of the ecosystem's components (be they data or software) lead to different performance for its operations. As in all data-intensive ecosystems, the evolution of the software constructs of a data warehouse environment can severely affect its operations. In this talk, we focus on two aspects of managing data warehouse evolution. First, we are interested in predicting the maintenance effort of ETL workflows. To this end, we present the findings of a case study on how a set of graph-theoretic metrics can be used to predict the evolution vulnerability of the components of ETL scenarios. Second, we are interested in supporting the graceful evolution of the ecosystem's components, and we present a method for the adaptation of ecosystems that assesses the potential impact of a change and rewrites the ecosystem's components in order to adapt to the change.
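
To make the two ideas above more concrete, the following is a minimal sketch, not the method presented in the talk. It assumes an ETL scenario can be modeled as a directed graph whose edges run from a data provider to its consumer. It computes simple degree counts as a stand-in for "graph-theoretic metrics" and a transitive reachability set as a stand-in for "potential impact of a change". All node names, function names, and the choice of metrics are hypothetical illustrations.

```python
from collections import defaultdict, deque

# Hypothetical ETL scenario: each pair is (provider, consumer).
EDGES = [
    ("src_orders", "clean_orders"),
    ("src_customers", "join_oc"),
    ("clean_orders", "join_oc"),
    ("join_oc", "dw_sales"),
    ("join_oc", "v_sales_report"),
]


def build_graph(edges):
    """Return (consumers, providers) adjacency maps for provider->consumer edges."""
    consumers = defaultdict(set)  # node -> nodes that read from it
    providers = defaultdict(set)  # node -> nodes it reads from
    for provider, consumer in edges:
        consumers[provider].add(consumer)
        providers[consumer].add(provider)
    return consumers, providers


def degree_metrics(edges):
    """Count, per node, how many providers it depends on and how many consumers it feeds."""
    consumers, providers = build_graph(edges)
    nodes = set(consumers) | set(providers)
    return {
        n: {"in": len(providers.get(n, ())), "out": len(consumers.get(n, ()))}
        for n in nodes
    }


def impacted_by(edges, changed):
    """Return every node reachable downstream of `changed` (breadth-first search)."""
    consumers, _ = build_graph(edges)
    seen = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for nxt in consumers.get(node, ()):
            if nxt not in seen and nxt != changed:
                seen.add(nxt)
                queue.append(nxt)
    return seen


if __name__ == "__main__":
    print(degree_metrics(EDGES)["join_oc"])
    print(sorted(impacted_by(EDGES, "clean_orders")))
```

In this toy model, a node with many providers is exposed to more upstream changes, and a node with many downstream dependents has a larger potential impact when it changes itself. Whether such counts actually predict maintenance effort is the empirical question the case study addresses.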