(→Using NaturalEarth) |
(→Description of the file format) |
||
| (9 intermediate revisions by 3 users not shown) | |||
| Line 86: | Line 86: | ||
* Would benefit all vector formats, e.g. kml, ppx, shp | * Would benefit all vector formats, e.g. kml, ppx, shp | ||
| − | ==== Possible new file format ==== | + | ==== Possible new file format (Proposed by John) ==== |
Potential new Marble file format based on PNT: | Potential new Marble file format based on PNT: | ||
* 1st integer (32 bit): Latitude in arcseconds highest bit indicates new polygon starts: header information has to be read from 3rd integer | * 1st integer (32 bit): Latitude in arcseconds highest bit indicates new polygon starts: header information has to be read from 3rd integer | ||
| Line 97: | Line 97: | ||
* Roughly 533,202 x 8 bytes = 4 Mb for the country borders alone, not including internal border and coastline files | * Roughly 533,202 x 8 bytes = 4 Mb for the country borders alone, not including internal border and coastline files | ||
* If that's too much to ship, then ship the 1:50m dataset as the default and download the 1:10m dataset once online | * If that's too much to ship, then ship the 1:50m dataset as the default and download the 1:10m dataset once online | ||
| + | |||
| + | ==== Possible more size efficient new file format (Proposed by Torsten) ==== | ||
| + | |||
| + | For the Natural Earth Layer providing the Default data set at 0.5 arcminute resolution should be enough. | ||
| + | This fileformat allows for even better packed data than the PNT format. For detailed polygons at arcminute scale on average it should use only 33% of the amount used by PNT. | ||
| + | |||
| + | =====Description of the file format===== | ||
| + | |||
| + | In the fileformat initally a file header is provided that provides the file format version and the number of polygons stored inside the file. A Polygon starts with the Polygon Header which provides the feature id and the number of so called "absolute nodes" that are about to follow. Absolute nodes always contain absolute geodetic coordinates. The Polygon Header also provides a flag that allows to specify whether the polygon is supposed to represent a line string ("0") or a linear ring ("1"). | ||
| + | Each absolute node can be followed by relative nodes: These relative nodes are always nodes that follow in correct order inside the polygon after "their" absolute node. | ||
| + | Each absolute node specifies the number of relative nodes which contain relative coordinates in reference to their absolute node. So an absolute node provides the absolute reference for relative nodes across a theoretical area of 2x2 squaredegree-area (which in practice frequently might rather amount to 1x1 square degrees). | ||
| + | |||
| + | So much of the compression works by just referencing lat/lon diffs to special "absolute nodes". Hence the compression will especially work well for polygons with many nodes with a high node density. | ||
| + | |||
| + | The parser has to convert these relative coordinates to absolute coordinates. | ||
| + | |||
| + | =====File Structure===== | ||
| + | |||
| + | '''File header''' | ||
| + | |||
| + | * quint8: File format version | ||
| + | * quint32: Number of polygons contained in the file. | ||
| + | |||
| + | '''Polygon Header''' | ||
| + | |||
| + | * quint32: Feature id (either Natural Earth or Geonames). | ||
| + | * quint32: Number of parent node chunks to follow | ||
| + | * quint8: Flags: 1st bit: polygonIsClosed | ||
| + | |||
| + | '''Absolute node chunk''' | ||
| + | |||
| + | * qint16: Latitude in halfarcminutes (allowed range = [-10800;+10800 ] halfarcminutes = [-90;+90 ] degrees ) | ||
| + | * qint16: Longitude in halfarcminutes (allowed range = [-21600;+21600 ] halfarcminutes = [-180;+180 ] degrees ) | ||
| + | * qint16: Number of child node chunks to follow (equals "0" if there are no child nodes) | ||
| + | |||
| + | '''Relative node chunk''' | ||
| + | |||
| + | * qint8: Latitude-diff in halfarcminutes compared to the parent (allowed range = [-60;+60] arcminutes = [-1;+1] degrees) | ||
| + | * qint8: Longitude-diff in halfarcminutes compared to the parent (allowed range = [-60;+60] arcminutes = [-1;+1] degrees) | ||
==== Attribute Database ==== | ==== Attribute Database ==== | ||
| Line 109: | Line 148: | ||
to provide). This would allow look-ups via whatever code or ID is available, | to provide). This would allow look-ups via whatever code or ID is available, | ||
and we wouldn't be reliant on Geonames IDs staying constant or being online. | and we wouldn't be reliant on Geonames IDs staying constant or being online. | ||
| + | |||
| + | ==== Spatialite ==== | ||
| + | |||
| + | One option would be to integrate Spatialite and use this as both the data storage for the vectors and as the attribute database. Spatialite is an extension to SQLite implementing a Spatial SQL database. Among the feature this provides is a compact data storage format and the ability to import Shapefiles and CSV files, as well as access all the standard GEOS tools if installed. | ||
| + | |||
| + | There is a 20Mb zip file available for Natural Earth in Spatialite format, it is unclear how much of Natural Earth is contained in this. A minimal dataset could be shipped by default with the full dataset downloaded later. | ||
| + | |||
| + | Spatial SQL queries could return just those vectors currently in the viewport, but repeated reloading and redrawing could be inefficient. However this may also solve the Level-of-Detail problem. | ||
| + | |||
| + | The major downside is the dependencies which include SQLite, PROJ and GEOS so on a platform like Windows would require a larger monolithic binary which defeats the purpose of shipping slimmed down data. | ||
| + | |||
| + | More research is required here. It may not be a suitable option for the default Atlas view, but would be a very powerful extension for Marble to provide lightweight GIS-like functionality. | ||
=== Action Plan === | === Action Plan === | ||
| Line 114: | Line 165: | ||
A possible action plan is | A possible action plan is | ||
# Fix GeoPainter LinearRings which contain a pole not rendered correctly | # Fix GeoPainter LinearRings which contain a pole not rendered correctly | ||
| − | # Implement Douglas- | + | # Implement Douglas-Peucker reduction in GeoDataLineString |
# New PNT file format definition (with a different name, MBL?) | # New PNT file format definition (with a different name, MBL?) | ||
# Metadata file format definition | # Metadata file format definition | ||
# New GeoData PNT2 file loading code (convert old data). | # New GeoData PNT2 file loading code (convert old data). | ||
| − | # shp2pnt2 script to convert shp to new formats (using Perl::shp? there's | + | # shp2pnt2 script to convert shp to new formats (using Perl::shp? there's shp2xxx scripts out there we could copy?), including matching to Geonames ID |
| − | shp2xxx scripts out there we could copy?), including matching to Geonames ID | + | # split files into 'ship with', 'download asap', 'ghns' |
| − | # split files into 'ship with', 'download asap', 'ghns' | + | |
Later add simple shapefile loading to GeoData, maybe with attibute layer? | Later add simple shapefile loading to GeoData, maybe with attibute layer? | ||
Marble currently uses the very old and outdated MWDBII dataset for its vector base map (national borders, coastlines and so on), and we really need to replace it with more up-to-date data. However, MWDBII has two key advantages: it is very compact, which enables Marble to ship it by default, and the individual nodes carry a zoom level value, which speeds up drawing.
The current vector layer also has the disadvantage that it cannot be manipulated, either programmatically or by the user. This prevents it from being used for things such as KGeography or other educational uses where you would want to select and manipulate a geographic entity.
Improving the vector base maps would thus consist of two closely related parts:
The Natural Earth data set is a "public domain map dataset available at 1:10m, 1:50m, and 1:110 million scales. Featuring tightly integrated vector and raster data, with Natural Earth you can make a variety of visually pleasing, well-crafted maps with cartography or GIS software." This data set seems ideal as a replacement for the MWDBII.
Advantages:
Disadvantages:
The 1:10m dataset seems ideal as the base map in Marble as it provides a higher level of detail than the current MWDBII. The 1:110m dataset seems ideal for use in a country selector widget in kdelibs. The 1:50m dataset is less detailed than the current MWDBII so may be less useful.
Using the data in the default shapefile format is not considered desirable however:
The ideal solution would seem to be to convert the Natural Earth data into a more efficient file format and either calculate and store the zoom level attribute in the file or calculate it on file load. The full Natural Earth dataset would be converted, but Marble would ship only the minimal dataset required (approx. 4-5 MB?), with the remainder of the data being downloaded later via GHNS or as a separate package.
The main changes required to Marble will be in the vector layer itself, removing the old PNT file vector drawing code and implementing the new dataset using the new GeoData library vector support.
Two main issues will need to be solved here
Some possibilities for the file format are:
The zoom level problem can be solved by either:
The Douglas-Peucker algorithm could be applicable here.
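As a rough illustration of what Douglas-Peucker reduction does, here is a minimal Python sketch. It is not Marble code: the function names are hypothetical, coordinates are treated as plain planar (x, y) pairs rather than geodetic ones, and the distance is measured to the chord segment (a common variant of the classic point-to-line distance).

```python
import math

def segment_distance(p, a, b):
    """Planar distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def douglas_peucker(points, tolerance):
    """Return a reduced copy of points that keeps the overall shape within tolerance."""
    if len(points) < 3:
        return list(points)
    # Find the interior point that lies farthest from the chord first-last.
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = segment_distance(points[i], points[0], points[-1])
        if d > dmax:
            index, dmax = i, d
    if dmax <= tolerance:
        # Everything in between is close enough to the chord: drop it.
        return [points[0], points[-1]]
    # Otherwise keep the farthest point and reduce both halves recursively.
    left = douglas_peucker(points[:index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right

line = [(0, 0), (1, 0.05), (2, 0), (3, 2), (4, 0)]
reduced = douglas_peucker(line, 0.1)
```

With a tolerance of 0.1 the small wiggle at (1, 0.05) is dropped while the spike at (3, 2) survives. Running the reduction with several tolerances, one per zoom step, might be one way to derive a per-node zoom level; that is an assumption on top of what this page states.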
Some pros/cons to consider:
bytes/point compared to the PNT which is 745KB and contains 127,246 points = 5.85 bytes/point, which would suggest the NE data in PNT format would be half the size, so 6 MB in total. This could probably be further reduced by a light application of Douglas-Peucker.
overlapping features like rivers match exactly and other such niceties, applying the Douglas-Peucker algorithm might affect that.
release which could be a lot of effort, but an automated shp2pnt script could prove useful to allow apps/users to display their own shapefiles in a simple way.
Using GeoPainter and GeoDataLineString ("libgeodata"):
Potential new Marble file format based on PNT:
Applying this to the 1:10m dataset:
For the Natural Earth layer, providing the default data set at 0.5 arcminute resolution should be enough. This file format allows the data to be packed even more tightly than in the PNT format: for detailed polygons at arcminute scale it should on average need only about 33% of the space used by PNT.
The file starts with a file header that gives the file format version and the number of polygons stored in the file. Each polygon starts with a polygon header, which provides the feature id and the number of so-called "absolute nodes" that follow. Absolute nodes always contain absolute geodetic coordinates. The polygon header also carries a flag that specifies whether the polygon represents a line string ("0") or a linear ring ("1"). Each absolute node can be followed by relative nodes, which are the nodes that come next, in order, within the polygon after "their" absolute node. Each absolute node specifies how many relative nodes follow it; these store their coordinates as offsets from that absolute node. An absolute node thus serves as the reference for relative nodes across a theoretical area of 2x2 degrees (which in practice might frequently amount to more like 1x1 degrees).
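The grouping of nodes into absolute nodes and their relative followers could be sketched in Python as below. This is only an illustration under stated assumptions: the names `chunk_polygon` and `to_half_arcminutes` are hypothetical, a new absolute node is started greedily whenever an offset would leave the +/-60 arcminute range, and wrap-around at the dateline is ignored.

```python
MAX_DIFF = 120          # 60 arcminutes expressed in half-arcminutes
MAX_CHILDREN = 32767    # largest count a qint16 field can hold

def to_half_arcminutes(degrees):
    """Convert decimal degrees to integer half-arcminutes (120 per degree)."""
    return round(degrees * 120)

def chunk_polygon(nodes):
    """Group (lat, lon) nodes, given in half-arcminutes, into chunks.

    Returns a list of (abs_lat, abs_lon, relatives), where relatives holds
    (dlat, dlon) offsets measured from the chunk's absolute node.
    """
    chunks = []
    for lat, lon in nodes:
        if chunks:
            abs_lat, abs_lon, relatives = chunks[-1]
            dlat, dlon = lat - abs_lat, lon - abs_lon
            fits = abs(dlat) <= MAX_DIFF and abs(dlon) <= MAX_DIFF
            if fits and len(relatives) < MAX_CHILDREN:
                relatives.append((dlat, dlon))
                continue
        # First node, offset out of range, or chunk full: new absolute node.
        chunks.append((lat, lon, []))
    return chunks

nodes = [(0, 0), (10, -5), (120, 120), (121, 0), (300, 0), (310, 10)]
chunks = chunk_polygon(nodes)
```

In this example the fourth node is 121 half-arcminutes north of the first absolute node, so it cannot be stored as an offset and becomes an absolute node itself.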
Much of the compression thus comes from storing lat/lon diffs relative to these special "absolute nodes". The compression therefore works especially well for polygons that have many nodes at a high node density.
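To get a feeling for the numbers, the following Python sketch estimates the file size from the field widths of the proposed structure (quint8 + quint32 file header, quint32 + quint32 + quint8 polygon header, three qint16 per absolute node, two qint8 per relative node). The function name and the "nodes per absolute node" parameter are hypothetical; real data will have a varying number of relative nodes per absolute node.

```python
FILE_HEADER_BYTES = 1 + 4         # version + number of polygons
POLYGON_HEADER_BYTES = 4 + 4 + 1  # feature id + chunk count + flags
ABSOLUTE_NODE_BYTES = 2 + 2 + 2   # latitude + longitude + child count
RELATIVE_NODE_BYTES = 1 + 1       # latitude diff + longitude diff

def estimated_size(polygon_node_counts, nodes_per_absolute):
    """Estimate the file size in bytes.

    polygon_node_counts: number of nodes in each polygon.
    nodes_per_absolute: assumed average chunk length, i.e. one absolute
    node plus (nodes_per_absolute - 1) relative nodes.
    """
    total = FILE_HEADER_BYTES
    for count in polygon_node_counts:
        absolute = -(-count // nodes_per_absolute)  # ceiling division
        relative = count - absolute
        total += (POLYGON_HEADER_BYTES
                  + absolute * ABSOLUTE_NODE_BYTES
                  + relative * RELATIVE_NODE_BYTES)
    return total

size = estimated_size([1000], 100)
bytes_per_node = size / 1000
```

For a single dense polygon of 1000 nodes with 100 nodes per chunk this gives 2054 bytes, or about 2.05 bytes per node. Compared with the 5.85 bytes/point measured for the existing PNT file, that is roughly 35%, in line with the "about 33%" estimate. Sparse polygons, where most nodes have to be absolute, approach 6 bytes per node and gain nothing.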
The parser has to convert these relative coordinates to absolute coordinates.
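File header
* quint8: File format version
* quint32: Number of polygons contained in the file
Polygon Header
* quint32: Feature id (either Natural Earth or Geonames)
* quint32: Number of parent (absolute) node chunks to follow
* quint8: Flags: 1st bit: polygonIsClosed
Absolute node chunk
* qint16: Latitude in half-arcminutes (allowed range = [-10800;+10800] half-arcminutes = [-90;+90] degrees)
* qint16: Longitude in half-arcminutes (allowed range = [-21600;+21600] half-arcminutes = [-180;+180] degrees)
* qint16: Number of child (relative) node chunks to follow (equals "0" if there are no child nodes)
Relative node chunk
* qint8: Latitude-diff in half-arcminutes compared to the parent (allowed range = [-60;+60] arcminutes = [-1;+1] degrees)
* qint8: Longitude-diff in half-arcminutes compared to the parent (allowed range = [-60;+60] arcminutes = [-1;+1] degrees)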
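A reader for this layout could look roughly like the Python sketch below, which also converts the relative coordinates back to absolute ones. Several points are assumptions rather than part of the proposal: big-endian byte order (the QDataStream default), "1st bit" meaning the least significant bit of the flags byte, and the name `read_polygons`. The field order and widths follow the file header, polygon header, absolute node chunk and relative node chunk definitions of the proposed format.

```python
import struct

def read_polygons(data):
    """Parse a byte string in the proposed layout.

    Returns (version, polygons); each polygon is a dict with the feature
    id, the closed flag and a list of (lat, lon) nodes in decimal degrees.
    """
    offset = 0
    version, polygon_count = struct.unpack_from(">BI", data, offset)
    offset += 5
    polygons = []
    for _ in range(polygon_count):
        feature_id, chunk_count, flags = struct.unpack_from(">IIB", data, offset)
        offset += 9
        nodes = []
        for _ in range(chunk_count):
            # Absolute node: latitude, longitude, number of relative nodes.
            lat, lon, child_count = struct.unpack_from(">hhh", data, offset)
            offset += 6
            nodes.append((lat / 120.0, lon / 120.0))
            for _ in range(child_count):
                # Relative node: offsets from the absolute node just read.
                dlat, dlon = struct.unpack_from(">bb", data, offset)
                offset += 2
                nodes.append(((lat + dlat) / 120.0, (lon + dlon) / 120.0))
        polygons.append({
            "feature_id": feature_id,
            "closed": bool(flags & 1),  # assumed: lowest bit = polygonIsClosed
            "nodes": nodes,
        })
    return version, polygons

# A hand-built file: version 1, one closed polygon with feature id 42,
# one absolute node at (1.0, -2.0) degrees and one relative node.
blob = (struct.pack(">BI", 1, 1)
        + struct.pack(">IIB", 42, 1, 1)
        + struct.pack(">hhh", 120, -240, 1)
        + struct.pack(">bb", 60, -120))
version, polygons = read_polygons(blob)
```

The coordinates are divided by 120 because one degree equals 120 half-arcminutes.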
Metadata file:
Rather than the Geonames ID, we could just use the Natural Earth object ID, then a look-up file/table that matches the NE ID to the ISO / FIPS / whatever code (NE provides this in the metadata) and Geonames ID (which we would have to provide). This would allow look-ups via whatever code or ID is available, and we wouldn't be reliant on Geonames IDs staying constant or being online.
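Such a look-up table might be sketched in Python as follows. The column names are assumptions, and every ID and code in the sample rows is a made-up placeholder, not a real Natural Earth, ISO, FIPS or Geonames value.

```python
FIELDS = ("ne_id", "iso_a2", "fips", "geonames_id")

# Placeholder rows only: none of these values are real identifiers.
LOOKUP_ROWS = [
    (1001, "AA", "A1", 9000001),
    (1002, "BB", "B2", 9000002),
]

def build_indexes(rows):
    """Build one dictionary per column so any code can be used as the key."""
    records = [dict(zip(FIELDS, row)) for row in rows]
    return {field: {rec[field]: rec for rec in records} for field in FIELDS}

def find_ne_id(indexes, field, value):
    """Return the Natural Earth object ID for a code, or None if unknown."""
    record = indexes[field].get(value)
    return record["ne_id"] if record else None

indexes = build_indexes(LOOKUP_ROWS)
by_iso = find_ne_id(indexes, "iso_a2", "BB")
by_geonames = find_ne_id(indexes, "geonames_id", 9000001)
```

Because the Natural Earth ID is the primary key, a changed or missing Geonames ID only affects one column of the table and not the vector data itself.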
One option would be to integrate Spatialite and use it both as the data storage for the vectors and as the attribute database. Spatialite is an extension to SQLite that implements a Spatial SQL database. Among the features it provides are a compact data storage format and the ability to import Shapefiles and CSV files, as well as access to all the standard GEOS tools if installed.
There is a 20 MB zip file available for Natural Earth in Spatialite format, but it is unclear how much of Natural Earth it contains. A minimal dataset could be shipped by default, with the full dataset downloaded later.
Spatial SQL queries could return just those vectors currently in the viewport, but repeated reloading and redrawing could be inefficient. However this may also solve the Level-of-Detail problem.
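The viewport idea can be illustrated with Python's built-in `sqlite3` module. Note that this is a stand-in and not Spatialite itself: it uses plain SQLite with hypothetical bounding-box columns instead of Spatialite geometry columns and spatial indexes, and the table, column names and sample features are invented for the example.

```python
import sqlite3

def open_demo_db():
    """Create an in-memory table with one bounding box per feature."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE features ("
        "id INTEGER PRIMARY KEY, name TEXT, "
        "min_lon REAL, min_lat REAL, max_lon REAL, max_lat REAL)"
    )
    db.executemany(
        "INSERT INTO features VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, "feature A", -10.0, 35.0, 5.0, 45.0),
            (2, "feature B", 100.0, -10.0, 120.0, 5.0),
            (3, "feature C", 0.0, 40.0, 20.0, 55.0),
        ],
    )
    return db

def features_in_viewport(db, west, south, east, north):
    """Return ids of features whose bounding box overlaps the viewport."""
    rows = db.execute(
        "SELECT id FROM features "
        "WHERE max_lon >= ? AND min_lon <= ? "
        "AND max_lat >= ? AND min_lat <= ? "
        "ORDER BY id",
        (west, east, south, north),
    )
    return [row[0] for row in rows]

db = open_demo_db()
visible = features_in_viewport(db, -5, 30, 10, 50)
```

Caching the result and re-querying only when the viewport leaves the cached area might reduce the repeated reloading mentioned above; that is a suggestion, not something evaluated here. The simple overlap test also ignores viewports that cross the dateline.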
The major downside is the dependencies, which include SQLite, PROJ and GEOS. On a platform like Windows this would require a larger monolithic binary, which defeats the purpose of shipping slimmed-down data.
More research is required here. It may not be a suitable option for the default Atlas view, but would be a very powerful extension for Marble to provide lightweight GIS-like functionality.
A possible action plan is
Later add simple shapefile loading to GeoData, maybe with an attribute layer?
Key Natural Earth data files from v1.2, recent updates to 1.3 not included.
                              1:110m    1:50m     1:10m
                              ------   -------   --------
Admin level 0 countries       172 KB   1.36 MB    6.55 MB
Admin level 0 land borders     39 KB    301 KB     896 KB
Admin level 0 sea borders      12 KB     40 KB      79 KB
Admin level 0 disputed                   40 KB     157 KB
Admin level 1 regions          39 KB    339 KB    13.9 MB *
Admin level 1 land borders     16 KB     60 KB    4.82 MB
Coastlines                     79 KB    883 KB    2.15 MB
Rivers                         19 KB    420 KB    3.29 MB
Lakes                          10 KB    286 KB     786 KB
Glaciers                       13 KB    208 KB    1.23 MB
Dateline                       18 KB     18 KB      18 KB
Playas                                   18 KB     106 KB
Ice Shelves                             105 KB     211 KB
Minor Islands                                      449 KB
Reefs                                              171 KB
                              ------   -------   --------
                              417 KB   4.08 MB   34.03 MB
* level 1 regions are USA/Canada only at 110m and 50m, but whole world at 10m,
perfect for KGeography use :-)
Other useful files:
Physical Features Land 146 KB 1.50 MB 692 KB
Physical Features Sea 348 KB 836 KB 836 MB
Populated Places 347 KB 1.48 MB
Urban Areas 439 KB 3.48 MB
Bathymetry 11.64 MB
sovereign states. Includes dependencies (French Polynesia), map units (U.S. Pacific Island Territories) and sub-national map subunits (Corsica versus mainland Metropolitan France).