Ontology Driven Conceptualization of Context-Dependent Data Streams and Streaming Databases

Heterogeneous stream formats, related contexts, vocabularies and schema structures are key obstacles to sharing and extracting knowledge from stream databases. To resolve these heterogeneities, the key challenge is to provide a common semantic representation for context-dependent data stream formats along with streaming databases. To address these issues, this paper proposes an ontology driven formal semantics of context-dependent data streams together with a universal conceptualization of streaming databases. The novelty of this work lies in handling the heterogeneity, large volume and availability of streaming data such as web content and commercial broadcasting data. It also facilitates the recognition of evolving information from the semantic representation of data streams at the conceptual modelling level. Besides, the proposed conceptual model is flexible enough to represent finite partitions of a stream and thus helps in storing and subsequently querying data streams. The conceptualization is implemented using the ontology editing tool Protégé for the initial validation of the proposed set of formal semantics. Several crucial properties of the proposed conceptualization are specified in order to exhibit the benefits of the proposed work. The expressiveness of the proposed model is illustrated using a suitable case study.


Introduction
In recent years, with the advancement of information and web technology, several applications need to work with continuous data generating processes. Such data are dynamic, time sensitive and continuous in nature. Data generated from Web clicks, network monitoring, commercial broadcasting, sensor nets and stock quotes are a few examples of such data [7]. These types of data are considered as a stream (data stream) rather than static snapshots [6]. Distinct Data Stream Management Systems (DSMS) have been developed for processing and analysing these data streams. These DSMS were built because of the limitations of traditional data management systems in managing distinct data streams [2]. Hence, a well-organized model of data streams is the key requirement for proficient management of those data streams by DSMS. However, data streams have several exceptional characteristics, which make them difficult to model. Firstly, a data stream is usually defined as "an unbounded sequence of values continuously appended, each of which carries a time stamp that typically indicates when it has been produced" [6]. Secondly, in different applications these continuous data are represented in different ways such as a discrete signal, an event log or a combination of trained series [15]. Thirdly, rapid changes in the underlying contextual information of data streams generated in diverse domains have serious consequences for deriving useful decisions in complex real time applications [4]. Fourthly, distinct back-end databases ranging from strict schema-based (for example, Relational Databases) to flexible schema-based (for example, NoSQL Databases) are used to store these data streams in a structured, semi-structured or unstructured way. Finally, a fixed or flexible finite partition, called a window, is made from this continuous unbounded sequence when streams have to be stored in or retrieved from databases [11]. Hence, several challenges exist in efficiently modelling data streams so as to facilitate sharing of information related to data streams across different applications and DSMS. The first is how to represent a common description of heterogeneous data streams semantically and syntactically. The second is how the different surrounding contexts (contextual information) of data streams can be represented in a uniform way. The third is how evolving contexts of data streams can be recognized so that dynamically added contextual information of data streams is realized efficiently. Ontology is beneficial for handling these issues.
The key reason for applying ontology is that it can establish consensus on a unifying conceptualization of heterogeneous data stream formats and related contexts. An ontology is defined as a formal, explicit specification of a shared conceptualization in terms of concepts, the relationships present between those concepts and related axioms [8].
Existing research works primarily focus on semantic representations of the resources and devices that produce data streams. However, less attention has been paid to a uniform semantic representation of distinct context-dependent data streams and of heterogeneous streaming databases. In [1,3,7,15], the authors have described abstract semantics of streams. The authors in [7] have described an extensible framework that facilitates experimenting with different algorithms for data stream mining tasks. In [2,11], the authors have described a powerful operator algebra for data streams. Both of these approaches support multiple query languages and data models. The Semantic Sensor Network (SSN) ontology [5] represents a high-level general schema of sensor systems. IoT-A and IoT.est [12] provide an architectural base for the utilization and representation of domain knowledge in sensor networks with some services and test concepts. The Observation & Measurement (O&M) description of sensory data is specified as part of the Sensor Web Enablement (SWE) standards from the Open Geospatial Consortium (OGC) [9]. However, this description is based on XML (Extensible Markup Language), which has a weak semantic structure for expressing and describing a stream data ontology in more detail. Approaches based on the Semantic Sensor Web (SSW) add context information such as time and space to sensors. However, these approaches are mostly specific to a certain domain and thus do not offer a high level of abstraction [13]. Besides, none of these approaches has explored the representation of contextual information related to data streams and streaming databases.
To address the aforementioned challenges in modelling data streams, this paper makes an effort to provide precise semantics for data streams, related contexts and streaming databases. For this purpose, an ontology driven conceptualization of data streams along with their related contexts is devised. The novelties of the proposed ontology driven conceptual model are manifold. The proposed conceptualization provides generic semantics for modelling a variety of data streams, the resources producing those data streams and streaming databases. It further facilitates sharing and preserves strong interoperability among heterogeneous applications and DSMS. Next, the proposed conceptualization aids in recognizing static and evolving contextual information related to data streams along with a set of distinct relationships. This context-sensitive approach helps reduce the search space when querying data streams. Besides, the proposed conceptual model may assist in future in the extraction of new knowledge from data streams since it is ontology driven and hence based on the Open World Assumption (OWA) [8]. Moreover, it provides discreteness in continuous streaming by representing finite, indefinite, fixed and flexible partitions of data streams.

Proposed Ontology Driven Conceptual Model for Context-Dependent Data Streams
The proposed conceptual model formalizes a common set of constructs and relationships for conceptualization of context-dependent data streams and streaming databases.
The proposed model comprises three interrelated layers (Collection, Family and Stream) and their identifiable construct types. Besides, the constructs are related to each other through different relationships. The proposed model is specified axiomatically using both first order and higher order logic to represent the semantics of data-stream constructs and their interrelationships. The key constructs and distinct relationships of the proposed model are specified in Fig. 1. In this figure, the Collection, Family and Stream layers are represented using rectangles, rounded rectangles and ovals respectively. Details of the proposed model are specified in the following sections.
(b) Family Layer: This is the intermediate layer of the conceptual model. Families are the main identifiable constructs of this layer. This layer may be composed of a number of levels to reflect the continuous encapsulation of data. Further, the lowest level of the Family layer may be composed of semantically related data streams and their contexts.
Explanation: Here, FAllev, FAulev and FAlev denote Families in the bottommost level, in the topmost level and in any level respectively. IcntFA is the Inverse Containment relationship and CntFA is the Containment relationship. Axiom F17 in section 2.3 formalizes the Inverse Containment relationship. Further, primary_context() and auxiliary_context() are predicates representing the Primary Contexts and Auxiliary Contexts of a Stream. The related axioms are F3, F4, F5 and F6.
(c) Stream_Context Layer: This is the lowermost layer of the proposed conceptual model. Data streams are represented formally in this layer. A data stream is an indefinite ordered sequence of data points, each of which carries a time stamp. These data points can range from structured to unstructured types. Besides, these data points may be related to precise contextual information that characterizes the features of streams needed for interaction between users and applications. Detailed formalizations of Stream_Context are specified in section 2.2.
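
The three-layer encapsulation described above can be illustrated with a minimal Python sketch. It is not the paper's axiomatization: the class names follow the layer names in the text, while the field names (`members`, `families`, `points`) and the helpers `depth()` and `streams_of()` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class Stream:
    """Lowermost layer: an ordered sequence of (time_stamp, data_value) points."""
    name: str
    points: List[Tuple[float, object]] = field(default_factory=list)


@dataclass
class Family:
    """Intermediate layer: may nest further Families; the lowest level holds Streams."""
    name: str
    members: List[Union["Family", Stream]] = field(default_factory=list)

    def depth(self) -> int:
        """Number of Family levels at or below this Family (hypothetical helper)."""
        nested = [m.depth() for m in self.members if isinstance(m, Family)]
        return 1 + max(nested, default=0)


@dataclass
class Collection:
    """Topmost layer: encapsulates Families."""
    name: str
    families: List[Family] = field(default_factory=list)


def streams_of(node) -> List[Stream]:
    """Collect every Stream reachable from a Collection, Family or Stream."""
    if isinstance(node, Stream):
        return [node]
    children = node.families if isinstance(node, Collection) else node.members
    found: List[Stream] = []
    for child in children:
        found.extend(streams_of(child))
    return found
```

Nesting Families inside Families mirrors the multiple levels of the Family layer, with Streams appearing only as members of the lowest level.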

Conceptualization of Stream Context
Stream Context is the precise information used to describe a data stream and its surrounding concepts.
Explanation: Here, number_of_data_value() is a predicate implying that a data stream may be infinite; data_value() and time_stamp() are predicates implying data values and their corresponding time stamps respectively; the □ operator implies mandatory participation of the argument; and Union_of() is a predicate implying the union of its arguments. Axioms F4 and F6 formalize the Primary and Auxiliary Context respectively.
(a) Primary Context: This represents the basic information about a data stream. The basic information of a data stream mandatorily includes the data value at a specific time. The data value and its related specific time can collectively be called a Frame.
F4:
Explanation: Here, starting_time() and ending_time() are predicates implying the start time and end time of their respective arguments. Further, the predicate existence() implies the time duration for which the argument exists.
(b) Auxiliary Context: This context provides additional information relevant to the Primary Context. For example, assume that a humidity sensor generates a data stream of humidity values. Then location may be an auxiliary context related to the primary context humidity. Auxiliary Contexts can be of several types, as specified below.
F6:
Explanation: Here, pair() is a predicate implying the pairing of the respective Auxiliary Contexts.
(i) Segment Context: A Segment represents a finite partition of the data stream containing an ordered sequence of Primary Contexts, formed when the stream is to be stored in a database. The size of this partition may be fixed or flexible depending on the number of instances of a Frame. Axiom F5 formally represents a Frame. Axioms related to Segment Context are specified below. The proposed Auxiliary Contexts form a minimal set. More distinct Auxiliary Contexts may be appended to Primary Contexts based on design demand. Hence, the proposed conceptualization realizes both static and evolving contextual information.
(c) Finite Partition of Data Stream: A Segment represents a finite partition of an infinite data stream for storing the data stream in a database. Similarly, for retrieval purposes another finite partition of a data stream can be defined as a Window. The size of a Window may be fixed or flexible depending on the number of instances of time stamps. Later, different data-stream query operators can be defined on this Window. The axiom of Window is as follows.
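
To make the notions of Frame, Segment and Window concrete, the following Python sketch may help. It is an illustration under stated assumptions rather than the paper's axioms: a Frame is taken to be a (time stamp, data value) pair, a Segment is sized by a count of Frames, and a Window is interpreted here as a half-open time interval over stored Frames. The function names `segments` and `window` and their parameters are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Iterator, List


@dataclass(frozen=True)
class Frame:
    """A data value together with the specific time at which it was produced."""
    time_stamp: float
    data_value: Any


def segments(frames: Iterable[Frame], size: int) -> Iterator[List[Frame]]:
    """Yield Segments of `size` Frames each, in arrival order, for storage.

    Works on an unbounded iterable because it only buffers one Segment.
    The last Segment may be shorter if the input ends early.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    batch: List[Frame] = []
    for frame in frames:
        batch.append(frame)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def window(frames: Iterable[Frame], start: float, end: float) -> List[Frame]:
    """Return the Frames with start <= time_stamp < end, for retrieval.

    Assumes `frames` is finite (for example, Frames already stored).
    """
    return [f for f in frames if start <= f.time_stamp < end]
```

A flexible partition could be obtained by varying `size` between calls or by choosing `start` and `end` per query; the fixed-size case is shown only because it is the simplest to read.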

Relationships in Proposed Conceptual Model
The distinct constructs of the proposed conceptual model are interrelated. These relationships can be of two types: Inter-layer and Intra-layer [14]. Inter-layer relationships hold between dissimilar construct types of the three different layers. Intra-layer relationships hold between similar construct types of the same layer. Different relationships may be present within a data stream, between a data stream and its related contextual information, and in the layer hierarchy of streaming databases. These relationships may be Containment, Inverse Containment, Has_auxiliary_Context, Reverse_has_auxiliary_context, Sequence, and HasTime. The former two can be both Inter-layer and Intra-layer relationships, and the rest are all Intra-layer relationships.
(a) Containment (Cnt): Containment relationships can be present between two construct types when one encapsulates similar or different types of constructs.
(f) Has Time (HT): This relationship represents the connection between a data value and the time stamps of its existence. The axiom of this relationship is
Explanation: Here, Tm is a set of timestamps.
The proposed conceptual model thus represents formal and universal vocabularies of context-dependent data streams and streaming databases. The axioms of the stream layer specify the semantics of a data stream and its associated heterogeneous contexts. Likewise, through the entire layer hierarchy the proposed model is capable of representing a common conceptualization of different streaming databases ranging from strict to flexible schema-based ones. Thus, the proposed conceptualization deals with the heterogeneity issue of data streams. Further, the proposed conceptual model is at a high level of abstraction. Hence, the representation of large volumes of data streams can be managed efficiently at the conceptual level using the proposed conceptualization. Besides, dynamically added contextual information of the domain is recognized using the Has_auxiliary_Context and Inverse Containment relationships. In this way, the proposed conceptualization may facilitate in future the derivation of new knowledge from data streams. Further, the Sequence relationship realizes the rapid availability of data points in a data stream. Furthermore, the Segment and Window partitions realize discreteness within a continuous stream. Moreover, the proposed conceptual model is flexible, as it provides data streams with a flexible finite size through Segment and Window. It also recognizes the communication and available sequences between Segments or Windows of the same or multiple data streams. Several other related crucial features are described in section 5.
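
One way to picture how paired relationships such as Containment and Inverse Containment let newly added contexts be recognized is a small store of labelled edges. The Python sketch below is only an illustration and not the paper's formalization: the relationship names come from the text, while the `RelationStore` class, the `INVERSE` pairing table and the direction chosen for each inverse are assumptions of this sketch.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relationship, object)

# Relationship names are taken from the text; pairing them as inverses
# of each other in this table is an assumption of the sketch.
INVERSE = {
    "Containment": "Inverse_Containment",
    "Has_auxiliary_Context": "Reverse_has_auxiliary_context",
}


class RelationStore:
    """Hypothetical in-memory store of relationships between constructs."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()

    def add(self, subject: str, relation: str, obj: str) -> None:
        """Record a relationship and, where one is defined, its inverse."""
        self.triples.add((subject, relation, obj))
        inverse = INVERSE.get(relation)
        if inverse is not None:
            self.triples.add((obj, inverse, subject))

    def objects(self, subject: str, relation: str) -> List[str]:
        """Return, sorted, every object related to `subject` by `relation`."""
        return sorted(o for s, r, o in self.triples if s == subject and r == relation)
```

With such a store, adding an Auxiliary Context to a stream at run time immediately makes the stream reachable from that context through the reverse relationship, which is the kind of recognition of dynamically added context that the text attributes to these relationships.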

Protégé Implementation of the Proposed Model
In this section, the proposed meta-model is implemented using Protégé [10], an ontology editing tool based on OWL (Web Ontology Language). Protégé facilitates the representation of the formally expressed axiom set of the proposed conceptualization in OWL logic. It also provides a number of reasoners for automated inference on ontological theories expressed in OWL logic. OWL is based on Description Logic.

Illustration of the Proposed Conceptual Model
Consider an application that aims to determine whether a car driver is relaxed or stressed when the driver has to drive along a predefined route from a starting point to a specific destination and return to the starting point within a predefined time duration. Besides, drivers are warned about the remaining time to reach the destination. Five sensor signals, namely Heart Rate (HR), Finger Temperature (FT), Respiration Rate (RR), Carbon dioxide (CO2) and Oxygen Saturation (SpO2), have been recorded. This case study has been adopted from [4].
In this case study, all five sensor signals are stream data. The primary context of the Heart Rate (HR) sensor is the recorded heartbeat. Heartbeats have specific values at specific times. Besides, Heart Rate depends on the particular location of the driver. Similarly, the data recorded by the other sensors have specific values at specific times. Further, all of them have auxiliary contexts. For example, Heart Rate has the auxiliary contexts location, age, weight etc. According to this case study, the driver's recorded sensor data will be a Collection. The driver will be a Family. Further, the driver has five Primary Contexts: Heartbeat, Finger Temperature, Respiration Rate, Carbon dioxide and Oxygen Saturation. The nomenclature of key elements in the case study is as follows: (i) Collections are in bold letters; (ii) Families are in italic letters; (iii) Primary Stream Contexts are in lower-case letters; and (iv) Auxiliary Stream Contexts are in CAPITAL letters. In this section, the case study has been implemented using the ontology editing tool Protégé. Key constructs of the case study have been mapped to Protégé as specified in section 2.
(i) Abstraction and Reusability: The proposed conceptualization is at a high level of abstraction because it represents data streams independently of any domain. Hence, it can be reused in a large number of domains.
(ii) Adaptability: The proposed conceptualization is able to recognize evolving contextual information using Reverse_has_auxiliary_context. Thus, it is adaptable to a changing surrounding environment.
(iii) Flexibility: Using the proposed conceptualization, bounded, unbounded, fixed and flexible partitions of the sequence of data streams are represented through Frame, Segment and Window. In this way, the proposed conceptualization is flexible.
(iv) Interoperability: With the aid of generic formal semantics, the proposed conceptualization provides an interoperable, uniform representation of heterogeneous data streams and streaming databases.
(v) Productivity: The proposed conceptualization is productive, as this specification can maximize compatibility among heterogeneous data streams, streaming databases and applications.
(vi) Context Sensitivity: The proposed conceptualization is able to recognize the related contextual information of both data streams and resources. Thus, the proposed model is context sensitive. This further facilitates validation and analysis of data streams.

Conclusion and Future Work
This paper has proposed ontology driven common semantics for context-dependent data streams and heterogeneous streaming databases. The objective of the proposed work is to model data streams and related contexts in a uniform way so that strong interoperability can be sustained among heterogeneous applications utilizing data streams. The novelty of the proposed ontology driven conceptualization lies in supporting the realization of the continuous temporal nature of data streams, their static and evolving contexts, homogeneity across heterogeneous formats, and the rapid availability of data streams. The proposed conceptualization is capable of providing generic semantics for the contents of data streams and the resources producing those data streams. Further, the proposed conceptualization is flexible enough to represent discreteness within infinite data streams and to provide a choice of fixed or variable partitions of data streams for storage and retrieval purposes. In this way, the proposed conceptualization may facilitate in future the derivation of knowledge and decisions from data streams.
Future work will include semantic validation of the proposed ontology driven conceptualization of stream data. Further, an ontology driven formal specification of a query language for the retrieval of data streams is another important direction for future work.

Fig. 2. Ontological graph of the proposed Conceptual Model using the OntoGraf plug-in in Protégé
Fig. 3 displays the partial ontology graph of this case study, showing only the heart rate stream along with its auxiliary contexts. The graph is obtained through the OntoGraf plug-in of Protégé.

Fig. 3. The partial ontological graph displaying the Primary Context of Heart Rate with its Auxiliary Contexts, obtained through the OntoGraf plug-in in Protégé

Table 1. Mapping from Proposed Conceptual Model towards Protégé
Each Primary Context has values and related time stamps. Besides, all are related to Auxiliary Contexts such as location and body size. Key elements of this case study have been listed below.