I designed my first database in the late 1960s. In those days, databases were hierarchical, consisting of a number of separate files. There was no metadata (an IT buzzword meaning “data about data”) describing the relationships between files; the relationships were only implied by the commands written in the programming languages. Some languages, such as Algol (Algorithmic Language), were designed with this usage in mind, though other scientific languages such as Fortran (Formula Translation) were also popular.
A huge breakthrough came in 1970, when Ted Codd, a research scientist at IBM, conceived the idea of a database structure based on predicate logic (an extension of mathematical ideas put forward by George Boole in the 19th century). In essence, one can extract data from multiple tables by specifying that items of data in different tables are related by statements such as “equal to”, “not equal to”, “greater than”, “less than” and so forth. Unfortunately, it took until about 1978 before ordinary computers were large enough and fast enough to process these relationships. Eventually, however, SQL (Structured Query Language) was created and became the most widely used database programming language for the next 30 years.
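Codd's idea can be shown in miniature with a modern example. The sketch below uses Python's built-in sqlite3 module; the tables, columns and figures are invented for illustration only. The two tables carry no built-in relationship: rows are related purely by the logical predicates in the query ("equal to", "greater than").

```python
import sqlite3

# In-memory database; tables and data are invented for illustration.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE region (rid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE rainfall (rid INTEGER, year INTEGER, mm REAL);
    INSERT INTO region VALUES (1, 'Riverina'), (2, 'Pilbara');
    INSERT INTO rainfall VALUES (1, 2020, 420.0), (1, 2021, 510.0),
                                (2, 2020, 260.0), (2, 2021, 198.0);
""")

# Relate the two tables by predicates alone: rainfall.rid "equal to"
# region.rid, and mm "greater than" a threshold.
rows = con.execute("""
    SELECT region.name, rainfall.year, rainfall.mm
    FROM region, rainfall
    WHERE rainfall.rid = region.rid AND rainfall.mm > 400
    ORDER BY rainfall.year
""").fetchall()

for name, year, mm in rows:
    print(name, year, mm)
```

Note that nothing in the schema links the two tables; the relationship exists only in the query, which is the essence of Codd's relational model.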
We are now on the cusp of a new phase in database design. A great deal of current data is not in the alphanumeric forms that we are used to and which are the basis of SQL. With the advent of the internet, mobile phones and other technologies, a lot of data consists of pixels (e.g. pictures), waves (recorded conversations) and other esoteric formats. These could actually be stored in SQL databases as “binary large objects” (BLOBs), but the only processing available was storage and retrieval. The term “Big Data” has been coined to describe complex data which requires purpose-written procedures (usually called “functions” in database-speak) to extract meaning from it.
Another major development is a concept called “cloud computing”, where all of the data and database programming capability is contained in a huge collection of servers (the “cloud”) and the only functionality required on user-facing computers is to, in effect, ask a question and receive an answer. The advantage of this architecture is that data of different kinds, stored in different structures and managed by different database management software can all be accessed and cross-referenced as though it were all in a single database.
The database concepts to be discussed here will hopefully be able to take advantage of these new developments and many others which are not yet in widespread use (or even imagined!).
2. CORE DATABASE STRUCTURE.
Sustainable population is but one facet of the climate change issue and any analysis of this topic is going to require research and data collection on a great number of fronts, conducted over a substantial period of time. The analytical model must therefore be capable of (1) providing a platform which can accommodate all sorts of disparate data (some of which we have not even begun to imagine yet) and (2) cross-referencing that data in order to draw the appropriate conclusions.
As stated in earlier posts, the model for the determination of the populations that can be supported by various patterns of land usage is built upon a map of Australia overlaid by a grid where each cell is what we will call a basic land unit, or BLU. Each BLU will have a unique RID (record ID) and all data pertaining to that BLU, whether collected in the field or resulting from research will carry this RID. Then, data from different research projects within the same BLU can be compared and amalgamated, as can similar data from different BLUs. In addition, data from a block of adjacent BLUs can be amalgamated and laid over a map of Australia to provide a more comprehensive body of data from which policy decisions can be made.
The only data to be stored in the core database will be location information for each BLU and data defining the physical features of the BLU (rivers, mountains, etc.) All other data pertaining to an investigation will be stored in other databases, linked to the core database through the various BLU RIDs.
The specification for a BLU is a set of four GPS locations. Two are located on a specified lower line of latitude and two on an upper line of latitude. The two locations on the west side are located on a single line of longitude, as are the two on the east side. All BLUs north and south of any nominated BLU will share the same lines of longitude, but the north-south distance between the lines of latitude will vary, so that the areas of all BLUs will be the same. An area of 1 sq. km has been proposed. For convenience, or because of limitations in the accuracy of GPS measurements, this uniformity may not be achievable, in which case the area of the BLU should be included in the database record, along with the GPS values of the corners.
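The equal-area requirement can be sketched numerically. On a spherical Earth, the area of a cell bounded by two meridians and two parallels is R² · Δλ · (sin φ₂ − sin φ₁), so for a fixed longitude spacing the northern latitude of each BLU follows directly from its southern one. The function below is a sketch under a spherical-Earth assumption; the WGS-84 ellipsoid actually used by GPS would give slightly different figures, which is one reason the stored-area fallback above may be needed.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation

def northern_latitude(south_lat_deg, lon_width_deg, area_km2=1.0):
    """Return the latitude of a BLU's upper boundary such that the cell
    bounded by the two parallels and two meridians has the target area.
    Cell area = R^2 * dlon * (sin(lat2) - sin(lat1))."""
    dlon = math.radians(lon_width_deg)
    sin_lat2 = (math.sin(math.radians(south_lat_deg))
                + area_km2 / (EARTH_RADIUS_KM ** 2 * dlon))
    return math.degrees(math.asin(sin_lat2))

# Example: a column of BLUs starting at 35 degrees south, with meridians
# 0.011 degrees apart (roughly 1 km east-west at that latitude).
lat = -35.0
for _ in range(3):
    lat_next = northern_latitude(lat, 0.011)
    print(f"{lat:.6f} -> {lat_next:.6f}")
    lat = lat_next
```

The latitude bands grow very slightly taller as one moves away from the equator, which is exactly the behaviour described above: shared meridians, varying north-south spacing, constant area.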
The remaining tables in the core database will identify physical data, such as rivers, mountains, gorges, ridges, escarpments and the like. There will be a table identifying all structures of interest and each structure will be defined by one or more points, each identified by its GPS location. Some points will be standalone, while others will be linked together in sequences which we will call chains. Some points, such as mountain peaks and river beds will have altitude and depth values attached to them, the latter being a measure of depth below the surface or height above it. Where a feature such as a river is represented by chains of points, the order in which they are stored indicates the downstream direction.
Here are the specifications for structure types currently identified.
River. Three chains of GPS locations. The first represents the LHS boundary of the river, facing downstream. The second represents the RHS boundary similarly. The third chain runs down the middle of the river and each point has an altitude value attached to it, indicating the fall of the river from its source (or from the point where it enters the BLU) to the point where it leaves the BLU, plus a depth value, indicating the depth of the river at that point.
Escarpment. A single chain of points, with each point having an attached altitude, indicating the local high or low point. The order of the points in the chain is such that the RHS represents the steep side of the escarpment.
Hill. A single point with an attached altitude.
Chain of Hills. A single chain of points, similar to an escarpment, but without the implicit interpretation of a steep or shallow slope.
Lake. A chain of points, where the last point is joined back to the first. Deep points in the lake may be represented by individual points, each with an attached altitude.
Island. This is similar to a lake in format, but may contain other structures such as hills within it.
Seaboard. Similar to a lake but the chain remains open.
Data Point. A single point located anywhere in the BLU, possibly with attached altitude or height/depth values, or a research project identifier. Such points may be members of sets defining structures or may be discretionary items used to provide reference points in research projects, etc.
The structure names and types will be stored in a lookup table, so that new types can be added without modifying the main database tables.
The core database table structures are set out in the file CoreDatabaseTables.
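To make the point/chain/lookup scheme concrete, here is a minimal sketch in SQLite via Python. The table and column names are my own invention for illustration, not the actual definitions in CoreDatabaseTables.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Lookup table: a new structure type is a new row,
    -- not a schema change to the main tables.
    CREATE TABLE structure_type (type_id INTEGER PRIMARY KEY, name TEXT);

    -- One row per physical structure, tied to its BLU.
    CREATE TABLE structure (
        structure_rid INTEGER PRIMARY KEY,
        blu_rid       INTEGER,
        type_id       INTEGER REFERENCES structure_type(type_id)
    );

    -- Points, optionally grouped into ordered chains; for a river chain
    -- the sequence number encodes the downstream direction.
    CREATE TABLE point (
        point_rid     INTEGER PRIMARY KEY,
        structure_rid INTEGER REFERENCES structure(structure_rid),
        chain_no      INTEGER,   -- NULL for a standalone point
        seq_no        INTEGER,   -- order within the chain
        lat REAL, lon REAL,
        altitude REAL,           -- NULL when not applicable
        depth REAL               -- depth below the surface, or height above it
    );
""")

# Adding a new structure type is a data change only.
con.execute("INSERT INTO structure_type VALUES (1, 'River'), (2, 'Hill')")
```

The key design point is that every structure, however exotic, reduces to rows in the same three tables, so code that walks chains of points works unchanged when new types are added.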
The small size of BLU is deliberate, so that features which are limited in size, but are possibly important (e.g. habitats) can be accurately located. Very small areas can be defined by chains with appropriate identifiers. For some projects it may be convenient to aggregate BLUs into larger elements, which we will call Extended Land Units (ELUs). An ELU is defined as a set of symmetrically connected BLUs. Hence, an ELU2 will consist of 2 x 2 BLUs, an ELU3 will consist of 3 x 3 BLUs and so forth. An additional table can be created, quoting the GPS values of the corners held in common. As with a BLU, the GPS value of the south-east corner may constitute a unique reference (though ELUs may also carry their own names, to facilitate discussion).
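If we assume BLUs can be indexed by (row, column) positions in the grid, the BLU-to-ELU aggregation is simple integer arithmetic. The function names below are invented for this sketch.

```python
def elu_index(row, col, n):
    """Map a BLU grid position to the (row, col) index of the ELUn
    (the n x n block of BLUs) that contains it."""
    return (row // n, col // n)

def elu_members(elu_row, elu_col, n):
    """List the BLU grid positions making up one ELUn."""
    return [(elu_row * n + r, elu_col * n + c)
            for r in range(n) for c in range(n)]

# The BLU at grid position (5, 7) falls in ELU3 block (1, 2),
# and that block contains exactly 9 BLUs.
print(elu_index(5, 7, 3))
print(len(elu_members(1, 2, 3)))
```

Because the mapping is deterministic, ELU records need not be stored at all unless it is convenient to cache their corner coordinates or attach names to them, as suggested above.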
Note that no data other than physical structure data is stored in the core database. Vegetation, settlements and the like are objects for research and will be stored in their own databases, linked to the core database with references to the RIDs at the appropriate structural level. The data in the core database will always represent the current or basal situation. Thus, an intended modification of the surface (such as a deep excavation for an open-cut mine) would be represented within a research database which was essentially a copy of the core database. Then, if the modification did take place, its definition would be incorporated in the core database, while the records of the situation pertaining before the modification would be transferred to an archival database. The latter could then be used to track changes over time.
3. MANAGEMENT OF THE CORE DATABASE.
The intention is to provide maximum public access to the system, so that any person or organization can contribute (via their own database, but linked to the Core database) information which they may feel to be of use. For instance, it would be possible to conduct counts of birds, animals and plants for biodiversity research. Most of this work is conducted by amateurs, through school projects, interest clubs and the like. However, it is also open to corporations, farming and other business organisations to conduct business-oriented research which is of a confidential nature and link it to the Core database. This will provide a context for any information that they subsequently publish.
Management could be carried out by a suitable organization (the CSIRO, perhaps) or an academic committee. The importance of the data, particularly in the context of climate change, is such that there must be no risk of its being modified merely to serve commercial interests (though one might justify making land usage decisions which avoid significant damage to the interests of groups such as agriculturists in general). Representatives of commercial corporations could serve in a committee of management, provided that their interests were known and they excused themselves from decision-making which affected their interests directly.
4. THE LAND USAGE AND SUSTAINABLE POPULATION RESEARCH PROJECT.
In the most general sense, land usage evaluation is but one of the many lines of research and documentation that can be attached to the database (and thereafter become available for examination in relation to other attached endeavours). However, it will be discussed in more detail here because the raison d’etre for setting up the database was to facilitate research into the consequences of increasing population densities on a national scale. The ability to absorb greater numbers of people depends upon identifying (1) locations of new settlements and settlements capable of expansion, (2) the availability of sufficient food-producing land to support the population of each settlement and possibly provide income from external sales, (3) the availability of resources such as power and water and (4) suitable positioning of transport infrastructure, broadband towers and the like.
In designing the land usage database, it is assumed that each land usage is defined by a set of parameters that is unique in type and content. This would seem to imply that each land usage would need to be stored in its own table. However, the programming effort required to service such a structure and to make comparisons between different land usages would be huge, and the result inflexible. To facilitate the addition of new land usages and to standardise the methods of access to the data, the following methodology is proposed.
The Observations table holding the collected or computed data which will determine the land usage, will actually consist of a set of fields of different data types, with anonymous names, such as Integer-1, JPG-1 (a picture or graph), Text-1 and so forth. New fields, similarly named, can be added to this table without disturbing the labelling or content of existing fields. Records will be identified by the BLU, Structure, Chain and DP RIDs and a land usage code.
The data in the Observations table will be interpreted by a series of other tables which contain the necessary metadata. This metadata is used to drive the functions which will extract and process the data from the Observations table.
The first metadata table lists all possible land usages and for each usage maps each relevant anonymous field name in the Observations table to a parameter name which is meaningful in the definition of that particular land usage. This will allow computation formulas to be expressed in a meaningful way. Remember, however, that a particular field in the Observations table may have entirely different meanings in relation to different land usages and other fields may remain unused.
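The anonymous-field mapping can be sketched as a small dictionary-driven lookup. The land usages, field assignments and values below are invented examples, not real metadata.

```python
# First metadata table (sketch): for each land usage, map the anonymous
# Observations fields to meaningful parameter names. Note that the same
# anonymous field means different things under different usages.
FIELD_MAP = {
    "wheat":   {"Integer-1": "soil_depth_cm", "Float-1": "annual_rainfall_mm"},
    "grazing": {"Integer-1": "paddock_count", "Float-1": "pasture_yield_t_ha"},
}

def interpret(observation, land_usage):
    """Rename the anonymous fields of one Observations record according
    to the metadata for the given land usage; unmapped fields are unused."""
    mapping = FIELD_MAP[land_usage]
    return {mapping[f]: v for f, v in observation.items() if f in mapping}

obs = {"Integer-1": 40, "Float-1": 512.5, "Text-1": "unused here"}
print(interpret(obs, "wheat"))
# The same record read as a different usage takes on different meanings:
print(interpret(obs, "grazing"))
```

In a production system the FIELD_MAP would itself be a database table, so that registering a new land usage is a data change rather than a code change, in keeping with the lookup-table approach used elsewhere in this design.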
The second metadata table sets out profiles for practical land usages, listing the parameter ranges or single values which must be satisfied for a particular land usage to be feasible. Multiple profiles may be stored, defining optimal and sub-optimal usages. This is necessary because (for instance) a tract of land adjacent to or within a settlement may be useful in a number of different ways, but not all usages will be optimal at the same time. By looking at different combinations of usage, the overall best use of the land can be computed.
The Results table stores the profile data which matches the data in the Observations table and assigns rankings, so that different combinations with different degrees of optimality can be compared and decisions made to meet the criteria required by a particular land usage. For example, a railway construction project would obviously be looking for land where food-production usages would be significantly sub-optimal, but the underlying land could easily be excavated and compacted.
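A minimal sketch of the profile-matching step: each profile lists an inclusive range per parameter, an observation matches a profile if every range is satisfied, and matches are returned best rank first. All names, ranges and figures here are invented for illustration.

```python
# Second metadata table (sketch): per-parameter (low, high) ranges that
# must all be satisfied; 'rank' lets optimal and sub-optimal profiles
# for the same usage be compared.
PROFILES = [
    {"usage": "wheat", "rank": 1,  # optimal
     "ranges": {"annual_rainfall_mm": (450, 800), "soil_depth_cm": (30, 200)}},
    {"usage": "wheat", "rank": 2,  # sub-optimal
     "ranges": {"annual_rainfall_mm": (350, 450), "soil_depth_cm": (30, 200)}},
]

def matching_profiles(params):
    """Return the profiles whose every range is satisfied by the
    interpreted observation values, best rank first. A parameter missing
    from the observation fails the comparison (NaN compares false)."""
    hits = [p for p in PROFILES
            if all(lo <= params.get(name, float("nan")) <= hi
                   for name, (lo, hi) in p["ranges"].items())]
    return sorted(hits, key=lambda p: p["rank"])

print(matching_profiles({"annual_rainfall_mm": 512.5, "soil_depth_cm": 40}))
```

The Results table would then record, per BLU and usage, which profiles matched and at what rank, so that cross-usage comparisons such as the railway example reduce to queries over rankings.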
The data in the Observations table will be derived from research projects or field data gathering exercises which will be registered in their own right, so that the data can be used in projects other than that examining land usage (e.g. water supply planning).
In posting this document, I must stress that my proposals do not provide a complete scenario for sustainable population research. They should be considered as an outline or sketch to which I hope that many other people will provide detail or colour. After a 30-year career in IT, I am fairly sure that I have got the basics right, but I will readily concede that the technical and conceptual aspects of database design have moved on and that inputs from others more in tune with today’s marketplace will almost certainly be required.
With the increasing world population and degradation of food-growing areas, there is going to be huge demand in the future for research into methods of improving food production and making desert areas fertile again. The relatively small element size of the BLU makes it more likely that we can focus in on small areas which lend themselves to experimentation, but which could then be applied to other areas anywhere in the world. This knowledge could be sold or donated as appropriate to other countries, such as those in the Sahara region, Chile, the Middle East and Central Asia. The presence of food-growing capability however modest, would help to moderate the temptation to migrate.
Another idea is to look for hot spots of water supply in really arid areas and use them to develop what are, in effect, oases. In a very old episode of that excellent ABC rural affairs program ‘Landline’, a segment dealt with a fellow who replaced his glasshouse, shattered by hail, with a huge plastic structure in which all aspects of the environment were computer-controlled. He was able to get six harvests of legumes and tomatoes every year and, if I remember rightly, used only about 60,000 litres of water, which he constantly recycled.