Watching a directory for file changes with Python

requirement

Watch an FTP folder for changes: whenever a new XML file is created or an existing one is modified, it needs to be parsed and its contents inserted into the database.

tools

  • Python 2.7
  • watchdog

Install from pip

pip install watchdog

Watchdog is a Python library and a set of shell utilities for monitoring file system events.

How to

First create the monitoring script. It will run as a daemon and observe any changes to the given directory. The script uses three modules/classes:

  • time, from the Python standard library, is used to sleep in the main loop
  • watchdog.observers.Observer is the class that watches for changes and dispatches each event to the specified handler
  • watchdog.events.PatternMatchingEventHandler is the class that receives the events dispatched by the observer and performs some action

watch_for_changes.py

import sys
import time
from watchdog.observers import Observer
from watchdog.events import PatternMatchingEventHandler

PatternMatchingEventHandler inherits from FileSystemEventHandler and exposes some useful methods:

Events are: modified, created, deleted, moved

  • on_any_event: if defined, will be executed for any event
  • on_created: Executed when a file or a directory is created
  • on_modified: Executed when a file is modified or a directory renamed
  • on_moved: Executed when a file or directory is moved
  • on_deleted: Executed when a file or directory is deleted.

Each of those methods receives the event object as its first parameter, and the event object has three attributes:

  • event_type
    'modified' | 'created' | 'moved' | 'deleted'
  • is_directory
    True | False
  • src_path
    path/to/observed/file

So to create a handler, just inherit from one of the existing handlers. For this example PatternMatchingEventHandler will be used to match only XML files.

To keep things simple I will put the file processing in a single method and implement only on_modified and on_created, which means that my handler will ignore any other events.

The patterns attribute is also defined, so that only files with the xml or lxml extensions are watched.

class MyHandler(PatternMatchingEventHandler):
    patterns = ["*.xml", "*.lxml"]

    def process(self, event):
        """
        event.event_type 
            'modified' | 'created' | 'moved' | 'deleted'
        event.is_directory
            True | False
        event.src_path
            path/to/observed/file
        """
        # the file will be processed here
        print event.src_path, event.event_type  # print for now, only for debugging

    def on_modified(self, event):
        self.process(event)

    def on_created(self, event):
        self.process(event)
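As a rough illustration of how patterns such as *.xml select files, the sketch below uses the standard library's fnmatch module as a stand-in. It is not watchdog's own matching code, and the helper name matches_any is made up for this example (written for Python 3).

```python
from fnmatch import fnmatch

# Same patterns as the handler above.
PATTERNS = ["*.xml", "*.lxml"]


def matches_any(path, patterns=PATTERNS):
    """Return True if path matches at least one shell-style pattern."""
    return any(fnmatch(path, pattern) for pattern in patterns)


# Only paths that match a pattern would be passed on to the handler methods.
candidates = ["/tmp/test.xml", "/tmp/feed.lxml", "/tmp/notes.txt"]
selected = [path for path in candidates if matches_any(path)]
```

Here selected keeps the .xml and .lxml paths and drops notes.txt, which is the same filtering the patterns attribute is meant to give the handler.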

With the above handler only creation and modification will be watched. Now the Observer needs to be scheduled.

if __name__ == '__main__':
    args = sys.argv[1:]
    observer = Observer()
    observer.schedule(MyHandler(), path=args[0] if args else '.')
    observer.start()

    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()

    observer.join()

You can set the keyword argument "recursive" to True in observer.schedule if you want to watch for files in subfolders.
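A minimal sketch of a recursive watch, assuming watchdog is installed. The temporary directory is only a stand-in for the real folder, and the observer is scheduled but not started, so nothing is actually watched until observer.start() is called.

```python
import tempfile

from watchdog.events import PatternMatchingEventHandler
from watchdog.observers import Observer

# A throwaway directory stands in for the real FTP folder.
watched_dir = tempfile.mkdtemp()

# Patterns can also be passed to the constructor instead of a class attribute.
handler = PatternMatchingEventHandler(patterns=["*.xml"])
observer = Observer()

# recursive=True extends the watch to every subfolder of watched_dir.
watch = observer.schedule(handler, path=watched_dir, recursive=True)
```

The object returned by schedule describes the watch; its is_recursive attribute tells whether subfolders are included.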

That's all that is needed to watch for modifications in the given directory. The script takes the path given as its first parameter, or the current directory by default.

python watch_for_changes.py /path/to/directory

Let it run in a shell, then open another shell or the file browser to change or create .xml files in /path/to/directory.

echo "testing" > /tmp/test.xml 

Since the handler is printing the results, the output should be:

rochacbruno@~/$ python watch_for_changes.py /tmp
/tmp/test.xml created
/tmp/test.xml modified
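The output above shows that a single write can produce both a created and a modified event, so process would run twice for the same file. One possible way to collapse such bursts is to remember when each path was last handled and skip repeats inside a short window. The class name Debouncer and the one-second window are assumptions for illustration, not part of watchdog.

```python
import time


class Debouncer(object):
    """Skip repeated events for the same path inside a short time window."""

    def __init__(self, window=1.0, clock=time.time):
        self.window = window    # seconds during which repeats are ignored
        self.clock = clock      # injectable so the logic can be tested
        self._last_seen = {}    # path -> timestamp of the last accepted event

    def should_process(self, path):
        now = self.clock()
        last = self._last_seen.get(path)
        if last is not None and now - last < self.window:
            return False
        self._last_seen[path] = now
        return True


# Simulated burst: two events 0.1 s apart, then a third one 2 s later.
fake_times = iter([100.0, 100.1, 102.0])
debouncer = Debouncer(window=1.0, clock=lambda: next(fake_times))
results = [debouncer.should_process("/tmp/test.xml") for _ in range(3)]
```

In the handler, process could start with a check such as "if not debouncer.should_process(event.src_path): return". The trade-off is that a genuine second modification arriving inside the window would be skipped as well, so the window should stay short.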

Now, to complete the script, we only need to implement in the process method the logic to parse the file and insert its contents into the database.

For example, if the XML file contains some data about the current track on a web radio:

<?xml version="1.0" encoding="ISO-8859-1" standalone="yes" ?> 
 <Pulsar>
  <OnAir>
     <media_type>default</media_type> 
     <media>
        <title1>JOVEM PAN FM</title1> 
        <title2>100,9MHz</title2> 
        <title3>A maior rede de radio do Brasil</title3> 
        <title4>00:00:00</title4> 
        <media_id1>#ID_Title#</media_id1> 
        <media_id2>#ID_SubTitle#</media_id2> 
        <media_id3>#ID_Album#</media_id3> 
        <hour>2013-12-07 11:44:32</hour> 
        <length>#Duration#</length> 
        <ISRC>#Code#</ISRC> 
    <id_singer>#ID_Singer#</id_singer>
    <id_song>#ID_Song#</id_song>
    <id_album>#ID_Album#</id_album>
    <id_jpg>#Jpg#</id_jpg>
     </media>
  </OnAir>
</Pulsar>

The easiest way to parse this small XML file is to use the xmltodict library.

pip install xmltodict

With the xmltodict.parse function the above XML will be returned as an OrderedDict:

OrderedDict([(u'Pulsar',
    OrderedDict([(u'OnAir',
        OrderedDict([(u'media_type', u'default'),
        (u'media', 
            OrderedDict([(u'title1', u'JOVEM PAN FM'),
                         (u'title2', u'100,9MHz'),
                         (u'title3', u'A maior rede de radio do Brasil'),
                         (u'title4', u'00:00:00'),
                         (u'media_id1', u'#ID_Title#'),
                         (u'media_id2', u'#ID_SubTitle#'),
                         (u'media_id3', u'#ID_Album#'),
                         (u'hour', u'2013-12-07 11:44:32'),
                         (u'length', u'#Duration#'),
                         (u'ISRC', u'#Code#'),
                         (u'id_singer', u'#ID_Singer#'),
                         (u'id_song', u'#ID_Song#'),
                         (u'id_album', u'#ID_Album#'),
                         (u'id_jpg', u'#Jpg#')]))]))]))])

Now we can just access that dict to create the record in the database, on the filesystem or somewhere else. Notice that I use the dict get method a lot to avoid KeyErrors.

with open(event.src_path, 'r') as xml_source:
    xml_string = xml_source.read()
    parsed = xmltodict.parse(xml_string)
    element = parsed.get('Pulsar', {}).get('OnAir', {}).get('media')
    if not element:
        return
    print dict(element)

and the output will be:

{u'hour': u'2013-12-07 11:44:32',
 u'title2': u'100,9MHz',
 u'id_album': u'#ID_Album#',
 u'title1': u'JOVEM PAN FM',
 u'length': u'#Duration#',
 u'title3': u'A maior rede de radio do Brasil',
 u'title4': u'00:00:00',
 u'ISRC': u'#Code#',
 u'id_song': u'#ID_Song#',
 u'media_id2': u'#ID_SubTitle#',
 u'media_id1': u'#ID_Title#',
 u'id_jpg': u'#Jpg#',
 u'media_id3': u'#ID_Album#',
 u'id_singer': u'#ID_Singer#'}

Much better than XPath, and in this particular case, where the xml_source is small, there will be no relevant performance issue.
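For comparison, here is roughly what reading the same media element looks like with the standard library's xml.etree.ElementTree and a path expression. XML_SOURCE is a shortened copy of the sample file, and the function name read_media is an assumption for this sketch (written for Python 3).

```python
import xml.etree.ElementTree as ET

# Trimmed version of the sample file shown earlier.
XML_SOURCE = """<Pulsar>
  <OnAir>
    <media_type>default</media_type>
    <media>
      <title1>JOVEM PAN FM</title1>
      <title3>A maior rede de radio do Brasil</title3>
      <hour>2013-12-07 11:44:32</hour>
    </media>
  </OnAir>
</Pulsar>"""


def read_media(xml_string):
    """Return the children of Pulsar/OnAir/media as a plain dict, or None."""
    root = ET.fromstring(xml_string)    # root is the <Pulsar> element
    media = root.find("OnAir/media")    # ElementTree supports a subset of XPath
    if media is None:
        return None
    return {child.tag: child.text for child in media}


media = read_media(XML_SOURCE)
```

The result is an ordinary dict keyed by tag name, close to what dict(element) gives with xmltodict, at the cost of writing the traversal by hand.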

Now we only need to get the values and populate the database. In my case I will use Redis DataModel as storage.

I will also use the magicdate module to automagically convert the date string to a datetime object.

import sys
import time
import xmltodict
import magicdate
from watchdog.observers import Observer
from watchdog.events import PatternMatchingEventHandler

from .models import Media


class MyHandler(PatternMatchingEventHandler):
    patterns=["*.xml"]

    def process(self, event):
        """
        event.event_type
            'modified' | 'created' | 'moved' | 'deleted'
        event.is_directory
            True | False
        event.src_path
            path/to/observed/file
        """

        with open(event.src_path, 'r') as xml_source:
            xml_string = xml_source.read()
            parsed = xmltodict.parse(xml_string)
            element = parsed.get('Pulsar', {}).get('OnAir', {}).get('media')
            if not element:
                return

            media = Media(
                title=element.get('title1'),
                description=element.get('title3'),
                media_id=element.get('media_id1'),
                hour=magicdate.magicdate(element.get('hour')),
                length=element.get('title4')
            )
            media.save()

    def on_modified(self, event):
        self.process(event)

    def on_created(self, event):
        self.process(event)


if __name__ == '__main__':
    args = sys.argv[1:]
    observer = Observer()
    observer.schedule(MyHandler(), path=args[0] if args else '.')
    observer.start()

    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()

    observer.join()
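If the hour field always arrives in the form shown in the sample file (2013-12-07 11:44:32), the standard library can do the conversion as well. The sketch below assumes that fixed format; parse_hour is a hypothetical helper name, and it returns None for unfilled placeholders such as #Duration# instead of raising.

```python
from datetime import datetime

# Format of the <hour> element in the sample file (an assumption for other feeds).
HOUR_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_hour(value):
    """Convert 'YYYY-MM-DD HH:MM:SS' to a datetime, or None if it does not fit."""
    if not value:
        return None
    try:
        return datetime.strptime(value, HOUR_FORMAT)
    except ValueError:
        return None


parsed = parse_hour("2013-12-07 11:44:32")
```

Returning None keeps the handler from crashing on a malformed file; whether a record with no date should still be saved is a decision for the model.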

That is my use case, but the example can be adapted to any kind of requirement.

Another useful module is Workflow by Massimo Di Pierro that creates workflows based on rules defined in a config file.

Flask Google Maps (plus: how to write a Flask extension)

Last week I was writing a talk to give at Google Developers Bus and I needed to show how to integrate Flask and the Google Maps API. As I did not find any extension for Google Maps, I decided to create one.

One of the best things about Flask is the way it is extended through Extensions and Blueprints. By the way, Blueprints are one of the best ideas I've seen in Python web frameworks; once you start working with Blueprints you want to use them everywhere (but unfortunately not every framework has an elegant way to be extended).

I will start by showing how the extension works and then I will explain how to build it from scratch.

Flask Google Maps

Template filter and a template global to create Google Maps from latitude and longitude

Installing

pip install flask-googlemaps

Loading in your app

from flask import Flask
from flask.ext.googlemaps import GoogleMaps
app = Flask(__name__)
GoogleMaps(app)

Using in templates

<div>
{{ googlemap('identifier', **params)}}
</div>

<div>
{{ googlemap("map_name", lat=-0.12, lng=-0.45, markers=[(lat, lng), (lat, lng)]) }}
</div>

Parameters:

- identifier: The name that will be used to identify your map (you can have multiple maps in one page)
- lat: Latitude to center the map
- lng: Longitude to center the map
- markers: A list of tuples, each tuple is a (lat, lng) marker (a pointer in the map)
- zoom: The zoom level of the map
- maptype: Google map type, TERRAIN or ROADMAP. Defaults to ROADMAP
- varname: The JS variable name to bind the map to, defaults to "map"
- style: CSS style to be appended to the map <div>
- cls: CSS class for the map <div>

TODO: In near future it will be possible to pass an address as argument

Example

The template

<body>
<h1>Flask Google Maps Example</h1>

<h2> Google Dev Bus - Rua Quatá, 255 </h2>
{% with %}

    {% set location=(-23.599097,-46.675903) %}
    {% set style="width:500px;height:500px;"%}

    {{
    googlemap(
        "simple-map",
        location.0, location.1,
        markers=[location,],
        style=style
        )
    }}

{% endwith %}
</body>

The output

[Image: screenshot of the rendered map]

Github

Screenshots and docs on Github.

https://github.com/rochacbruno/Flask-GoogleMaps

How to create a Flask Extension

Flask can be extended through two patterns: Extension and Blueprint.

An Extension is something like a complete plugin: a distribution containing models, views, templates, static files, template globals, filters and so on. Usually an Extension is built from Blueprints. A Blueprint is an app prototype: it defines how a piece of the app will look once registered, exposing resources and URL rules. A Blueprint can also access the running application to contribute things like config values and template filters.

In the Flask docs there is a great explanation on Extensions and Blueprints

Anatomy of an extension

An extension is just a Python package following the naming convention flask_anything. With packages named like that, Flask will automatically find them on the Python path via the ext proxy. So instead of from flask_anything import something you can do from flask.ext.anything import something; that way the code is clear and explicit, and you know you are dealing with a Flask extension. (The flask.ext proxy was deprecated in Flask 0.11 and removed in Flask 1.0, so on current versions import flask_anything directly.)

Flask-Anything

root folder
|__ flask_anything/
    |__ templates/
    |__ static/
    |__ __init__.py
    |__ some_module.py
    |__ *
|__ setup.py

That is it! You can write really anything you want and it will be available through from flask.ext.anything.some_module import FooBar

What is inside?

Usually extensions expose a main class which will be registered in your app. There is no rule, but there are some conventions. An example:

# flask.ext.anything.some_module.py

from flask import Blueprint, render_template

class Anything(object):
    def __init__(self, app=None, **kwargs):
        for k, v in kwargs.items():
            setattr(self, k, v)

        if app:
            self.init_app(app)

    def init_app(self, app):
        # here you get the app object and can do anything you want
        app.add_template_filter("pass a template filter function here")
        app.add_template_global("pass a template global here")

        app.add_url_rule("/anything/<arg>", view_func=self.something)

        #or even better you can register a Blueprint
        module = self.create_blueprint("module")
        another_module = self.create_another_blueprint("another_module")

        # then you can register many blueprints in the app
        app.register_blueprint(module)
        app.register_blueprint(another_module)


    def create_blueprint(self, blueprint_name):
        module =  Blueprint(blueprint_name, __name__, template_folder="a_path_to_relative_folder")
        module.add_app_template_filter(...)
        module.add_app_template_global(...)
        module.add_url_rule("/something/<argument>", view_func=self.some_view)
        return module

    def some_view(self, argument):
        context = {'argument': argument}
        return self.render(self.get_template_name, context)

    def render(self, *args, **kwargs):
        return render_template(*args, **kwargs)

    ...

By community convention, your extension's main class should receive its configuration in the __init__ method and should offer a lazy way to initialize through a method called init_app. It is also good practice to create methods for things like create_blueprint, register_blueprint and get_url_rules, and a render_template method inside your Blueprint is useful too. That is because others can extend your class and override them; for example, to use Flask-Themes it is useful to override the render_template method.
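To make the convention concrete, here is a small runnable sketch of an extension class with configuration in __init__, lazy initialization in init_app, a create_blueprint method and an overridable render method. The names Hello, greet and hello_view are invented for this example, and it assumes a Flask version that provides add_template_global and test_client.

```python
from flask import Blueprint, Flask, render_template_string


class Hello(object):
    """Tiny extension: one template global and one blueprint route."""

    def __init__(self, app=None, greeting="Hello"):
        self.greeting = greeting          # configuration is received here
        if app is not None:
            self.init_app(app)

    def init_app(self, app):
        # Lazy initialization: the app can be supplied after construction.
        app.add_template_global(self.greet, name="greet")
        app.register_blueprint(self.create_blueprint("hello"))

    def create_blueprint(self, blueprint_name):
        module = Blueprint(blueprint_name, __name__)
        module.add_url_rule("/hello/<name>", view_func=self.hello_view)
        return module

    def greet(self, name):
        return "%s, %s!" % (self.greeting, name)

    def hello_view(self, name):
        return self.render("{{ greet(name) }}", name=name)

    def render(self, source, **context):
        # Subclasses can override this to plug in another renderer.
        return render_template_string(source, **context)


app = Flask(__name__)
hello = Hello(greeting="Hi")
hello.init_app(app)

# Exercise the route without starting a server.
response = app.test_client().get("/hello/Flask")
```

Because the view goes through self.render, a subclass only needs to override that one method to change how every view of the extension is rendered.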

Using your extension/Blueprint

# your_app.py

from flask import Flask
from flask.ext.anything.some_module import Anything

app = Flask(__name__)

# option 1
Anything(app)

# option 2
anything = Anything()
anything.init_app(app)

With the above instantiation/init_app call your app will be manipulated by the extension, and the views, URLs, template filters etc. will be available.

There are more conventions to follow, such as state, namespace and resource access, but you can find all the information in the Flask docs.

If you have some idea to improve Flask-GoogleMaps, please comment!

Developing prototypes for startups with Python and web2py

On the 18th I gave a talk at Global Entrepreneurship Week, at Plug'n work. The idea of the talk was to show entrepreneurs who are starting to develop their ideas a way to build their prototypes (or even an MVP) using Python, web2py, Bootstrap and the browser.

Besides presenting Python and highlighting how easy it is, as well as all the power of web2py for this audience, I wanted to focus on a personal opinion: my aversion to the term "technical partner" ("sócio técnico") and how it sounds like a scam. And of course I wanted to show how any entrepreneur who can use a computer and has at least a notion of data structures (has used an Excel spreadsheet before) is able to develop their own prototype using web2py.

I plan to improve this material and maybe turn it into a video, and I am also available to give the same talk at other events, universities etc.

Here are the slides.

This video is part of lesson 4 of cursodepython.com.br