viernes, 23 de noviembre de 2012

Spliting a vector layer

A user in the QGIS mailing list asked a couple of days ago about how to split a points layer in n given layers containing a fraction of those points. As this can be solved in many different ways, I thought it would be a good idea for a blog post and I am proposing a couple of alternatives here, of course all of them using SEXTANTE.

You can use any layer you want to test this examples. I will be using the elevations points layer in the Baranja Hill dataset.



Alternative 1

The first way we can split a layer into n new layers is by randomly assigning each feature to a group, with a total of n possible options, and then splitting the layer based on that group. The first step can be easily performed using the great Pyculator plugin, which is already available from SEXTANTE.

Double click on its name and fill the fields in the parameters window as shown below



Here we are creating 10 different groups, but you can enter the number you prefer or need.

There is no guarantee that the number of points will be equal in each group, but since the random values follow an uniform distribution, it should more or less be similar, which is enough for this purpose.

Here is the resulting layer rendered using a classified scheme based on the new field we have created in the layer, which represents the group it belongs to.



Now we can just split the layer using that field, creating a script algorithm to do so.

Here is the code of that algorithm

#Definition of inputs and outputs
#==================================
##[Example scripts]=group
##input=vector
##class_field=field input
##output=output file

#Algorithm body
#==================================
from qgis.core import *
from PyQt4.QtCore import *
from sextante.core.SextanteVectorWriter import SextanteVectorWriter

# "input" contains the location of the selected layer.
# We get the actual object,
layer = sextante.getObjectFromUri(input)
provider = layer.dataProvider()
allAttrs = provider.attributeIndexes()
provider.select( allAttrs )
fields = provider.fields()
writers = {}

# Fields are defined by their names, but QGIS needs the index for the attributes map
class_field_index = provider.fieldNameIndex(class_field)

inFeat = QgsFeature()
outFeat = QgsFeature()
inGeom = QgsGeometry()
nFeat = provider.featureCount()
nElement = 0
writers = {}

while provider.nextFeature(inFeat):
  progress.setPercentage(int((100 * nElement)/nFeat))
  nElement += 1
  atMap = inFeat.attributeMap()
  clazz = atMap[class_field_index].toString()
  if clazz not in writers:
      outputFile = output + "_" + str(len(writers)) + ".shp"
      writers[clazz] = SextanteVectorWriter(outputFile, None, fields, provider.geometryType(), provider.crs() )
  inGeom = inFeat.geometry()
  outFeat.setGeometry(inGeom)
  outFeat.setAttributeMap(atMap)
  writers[clazz].addFeature(outFeat)

for writer in writers.values():
  sextante.load(writer.fileName)  
  del writer


A few things to comment here:
  • There are no declared outputs. That is because we are not going to generate a vector layer, but several of them, and we do not know in advance the number of new layers to create, since it depends on the input layer and the number of classes it has.
  • We have an input of type file. That is the base filename we use for the layers to create. WE could have used a string input as well, but using a file input will make it easier for the user to browse in the file system and select the filename to use.
  • We are loading layers explicitly. This is not needed (and should not be done) if the output is declared, but in this case we have to do it in case we want the layers to be loaded. Once again, this is a problem of the semantics of the algorithm, which are not well-defined, since the number of outputs is not know at design-time.
  • We cannot use this algorithm in the modeler. The semantic problems make it impossible to add it to a model, since it has no declared output.
Now execute the algorithm an use the points layer with the random field as the file to split, and that random field as the one to use for splitting into classes.


You will get your set of new layers, and they will be loaded into QGIS.

We can make it even easier to split a layer. How can we put this two algorithms into just one, which takes the layer to split and the number of final layers we want to create? Again, a bit of Python scripting can be used to create a new script algorithm, which wraps the two algorithm that are involved in this operation.

I will leave this a as an exercise for the reader ;-)

Alternative 2.

A different way of dividing a later is by tiling it. We can do this using two algorithms: one to create a tiling of polygons and another one to clip the layer using each one of the polygons.

First, let's create an algorithm to do the tiling and create a set of rectangles covering the area of the layer to split. A small change  in the script that we used to create the hex grid of the last post, and we have it. Here is the code:

##[Example scripts]=group
##input=vector
##numpolygons=number 10
##polygons=output vector
from sextante.core.QGisLayers import QGisLayers

input = sextante.getObjectFromUri(input)
centerx = (input.extent().xMinimum() + input.extent().xMaximum()) / 2
centery = (input.extent().yMinimum() + input.extent().yMaximum()) / 2
width = (input.extent().xMaximum() - input.extent().xMinimum())
cellsize = width / numpolygons
height = (input.extent().yMaximum() - input.extent().yMinimum())
sextante.runalg("mmqgisx:creategrid", cellsize, height, width, height, centerx, centery, 1, polygons)


This creates n horizontal tiles, but it can be easily adjusted to create a different design.

Save the algorithm as Create_tiling_from_vector_layer.py, and run it setting a total of 5 polygons to be created.



This is the result you will get.



Now, we can use the Clip algorithm, but instead of using it for the whole layer, we will use it with the run iteratively option for the clipping layer. That will cause the algorithm to be executed once for each polygon, just like we want. Remember that to active the iterative execution you have to click the button with the green icon at the right-hand side of the box where you select the layer.



Here you can see the result, with 5 different layers, each one rendered with a different color.


Alternative 3.

Use the GRASS v.kcv module. Very practical, but not as fun as the examples above :-)

Alternative 4

Use lassplit from LASTools, also available from SEXTANTE. Not a very good option, since you need las files to work with it. The point layer to split should be exported to a las file , then processed, and then the resulting ones should be converted back to shp. This last step can be done with the las2shp tool, used through the SEXTANTE batc processing interface to ease the task.

Note:

I found a couple of bugs in the SEXTANTE Pyculator implementation and in the MMQGISX Create grid algorithm when using rectangles. They are already fixed, but only in the GitHub version, so will need it to run these example

3 comentarios:

  1. Buenas tardes,

    Necesitamos crear una submuestra de puntos aleatorios (de una capa de puntos) que sean entorno al 80% del total. Necesitamos tener un grupo de puntos para calibración y otro para validación. Si seguimos los pasos aquí mostrados hemos pensado en crear 4 grupos y usar 4 de ellos para calibración y dos para validación...nos puede recomendar otro método para hacer esto mismo?

    ResponderEliminar
  2. perdón, una confusión, queríamos decir 3 grupos para calibración y 1 para validación.

    ResponderEliminar
  3. Hi, I'm trying to run your algorithm in QGIS 2.12.1-Lyon but am getting the error:
    No module named sextante.core.SextanteVectorWriter See log for more details

    Any ideas how to fix?

    ResponderEliminar