Using Python to get all the external links from a webpage

Based on Mark Pilgrim's book Dive Into Python.

Define the URL lister:

from sgmllib import SGMLParser

class URLLister(SGMLParser):
    def reset(self):                              
        SGMLParser.reset(self)
        self.urls = []

    def start_a(self, attrs):                     
        href = [v for k, v in attrs if k=='href']  
        if href:
            self.urls.extend(href)

Now the function that receives a URL, reads that page, and lists all the href attributes:

import urllib

def get_urls_from(url):
    # Python 2 only: sgmllib and urllib.urlopen do not exist in Python 3
    usock = urllib.urlopen(url)
    parser = URLLister()
    parser.feed(usock.read())
    usock.close()
    parser.close()
    # keep only the links that look absolute (external)
    return [item for item in parser.urls
            if item.startswith(('http', 'ftp', 'www'))]

OK, now you can call it:

from pprint import pprint
pprint(get_urls_from("http://www.rochacbruno.com.br"))

and you get:

['http://feeds.feedburner.com/rochacbruno',
 'http://www.python.org',
 'http://www.web2py.com',
 'http://www.djangoproject.com',
 'http://www.jquery.com',
 'http://www.postgresql.org',
 'http://www.linux.org',
 'http://www.DrBusca.com',
 'http://www.cursodepython.com.br',
 'http://facebook.com/rochacbruno',
 'http://twitter.com/rochacbruno',
 'http://linkedin.com/in/rochacbruno',
 'http://angel.co/rochacbruno',
 'http://foursquare.com/rochacbruno',
 'https://plus.google.com/u/0/116110204708544946953/posts',
 'https://kippt.com/rochacbruno',
 'http://about.me/rochacbruno',
 'http://www.movu.ca',
 'http://www.menuvegano.com.br',
 'http://www.web2pyslices.com',
 'http://github.com/rochacbruno',
 'http://www.web2py.com',
 'http://associacao.python.org.br',
 'http://www.python.org/psf/members/#nominated-members',
 'http://amazon.com/author/rochacbruno',
 'https://snipt.net/rochacbruno/blog-sidebar-2/',
 'https://snipt.net/',
 'http://rochacbruno.com.br/web-apps-that-worth-a-try/',
 'http://snipt.net',
 'http://snipt.net',
 'https://github.com/nicksergeant/snipt-old',
 'http://snipt.net/pro',
 'https://kippt.com/',
 'http://kippt.com',
 'http://kippt.com',
 'http://coolendar.com',
 'http://Coolendar.com',
 'http://pythonanywhere.com',
 'http://pythonanywhere.com',
 'http://rochacbruno.com.br/web-apps-that-worth-a-try/#disqus_thread',
 'http://rochacbruno.com.br/i-am-now-a-member-of-python-software-foundation/',
 'http://linkedin.com/in/rochacbruno',
 'http://www.cursodepython.com.br',
 'http://www.blouweb.com',
 'http://www.amazon.com/Bruno-Cezar-Rocha/e/B007KZBV4M',
 'http://pyfound.blogspot.com.br/2012/08/welcome-new-psf-members.html',
 'http://www.python.org/psf/members/',
 'http://rochacbruno.com.br/i-am-now-a-member-of-python-software-foundation/#disqus_thread',
 'http://rochacbruno.com.br/web2py-manage-users-and-membership-in-the-same-form/',
 'http://stackoverflow.com/questions/11992749/web2py-how-edit-user-profile-and-membership-in-one-view',
 'http://rochacbruno.com.br/web2py-manage-users-and-membership-in-the-same-form/#disqus_thread',
 'http://rochacbruno.com.br/lazy-dal-beta-working/',
 'http://rochacbruno.com.br/lazy-dal-beta-working/#disqus_thread',
 'http://rochacbruno.com.br/lazy-dal-attempt-3-pbreit/',
 'http://rochacbruno.com.br/lazy-dal-attempt-3-pbreit/#disqus_thread',
 'http://rochacbruno.com.br/open-links-which-points-outside-your-own-site-in-a-new-window/',
 'http://rochacbruno.com.br/open-links-which-points-outside-your-own-site-in-a-new-window/#disqus_thread',
 'http://rochacbruno.com.br/websockets-com-tornado-web2py-python-jquery/',
 'http://rochacbruno.com.br/websockets-com-tornado-web2py-python-jquery/#disqus_thread',
 'http://rochacbruno.com.br/loading-html-elements-dynamically-with-web2py-and-ajax/',
 'http://rochacbruno.com.br/loading-html-elements-dynamically-with-web2py-and-ajax/#disqus_thread',
 'http://rochacbruno.com.br/breaking-a-simple-captcha-with-26-lines-of-code/',
 'http://rochacbruno.com.br/breaking-a-simple-captcha-with-26-lines-of-code/#disqus_thread',
 'http://rochacbruno.com.br/sending-emails-with-python-and-gmail/',
 'http://rochacbruno.com.br/sending-emails-with-python-and-gmail/#disqus_thread']

This was based on some examples from the Dive Into Python book.
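
The code above runs only on Python 2, because sgmllib and urllib.urlopen were removed in Python 3. Here is a minimal sketch of the same idea for Python 3, assuming the standard-library html.parser module as a stand-in for SGMLParser. The function name get_urls_from_html is made up for this example, and it takes the HTML text directly so it can be tried without a network connection:

```python
from html.parser import HTMLParser


class URLLister(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        if tag == 'a':
            self.urls.extend(v for k, v in attrs if k == 'href' and v)


def get_urls_from_html(html):
    """Hypothetical helper: return the absolute-looking links found in html."""
    parser = URLLister()
    parser.feed(html)
    parser.close()
    return [u for u in parser.urls if u.startswith(('http', 'ftp', 'www'))]


sample = '<a href="http://www.python.org">Python</a> <a href="/about">About</a>'
print(get_urls_from_html(sample))
```

To use it on a real page, the HTML text could come from urllib.request.urlopen(url).read().decode('utf-8', 'replace').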

WEB APPS THAT ARE WORTH A TRY

I was a Google addict: I used Gmail drafts as bookmarks and to-do lists, and I used Google Calendar for every little appointment, even though I always found it too complicated for managing a simple schedule.

So, given the problems I had centralizing everything in a Google account, I tried a lot of web apps.

Finally, I think I've found the perfect set of web tools to rely on.


Snipt.net

Blogging and snippeting

I am not sure the word "snippeting" really exists; if not, I just invented it. Programmers are always snippeting somewhere, and there are a lot of tools on the web for doing it: gist, pastebin, paster and others. In 2009 I came across Snipt.net. At that time it was already an excellent tool, and I knew it was made with Python. Some time ago @nick opened the source code and created a brand new snipt.net.

This is the platform that powers the blog you are reading, and it is also a snippet repository. It has a very good API and a web editor, CodeMirror, which supports almost every language in the world (even Brainfuck). Like every good app with an API, it already has a Google Chrome plugin for instant snippeting and a plugin for the amazing Sublime Text 2 - you can blog and create snippets from your preferred code editor.

  • Blogging
    • Create a snippet, choose markdown as syntax, mark as public/blog post and done!
  • Snippeting
    • From the web using CodeMirror; from Chrome by selecting some text and right-clicking "send to snipt", or by using the snipt.net addon; from Sublime Text by selecting some text and sending it to snipt. You can also sync your Sublime snipts with snipt.net. (Note: the Sublime plugin is under development.)
Things I really like in snipt.net
  • It is fast and has a nice UX
  • Chrome addon for blogging/snippeting
  • Sublime Text plugin (which I am trying to improve)
  • The "snipt" based configuration for blog
  • Amazing support

Ok, shut up and take my money! Well, snipt.net is free: go there and create your account. However, @nick is working on a lot of new features and improvements, and to support his work you can sign up for a PRO account (only $4/month or $40/year). You will soon get a new PRO theme for your blog and very nice support, and you will be able to use your own domain.

Trust me, you can put your $40/year into this: Snipt.net/PRO

Things I would like to see on snipt soon
  • Android/iOS apps
  • Create snippets via e-mail
  • Better Sublime-Text integration
  • A Python command line client, maybe integrated with iPython
  • Better blog options (snipt based configs) for theme, header, footer, widgets etc.
  • Better search
  • RSS feeds for specific tags
  • Flat pages

You can support @nick in developing all those great things: go there and purchase your PRO account!


The Pinterest for Bookmarks!

I also tried many tools for bookmarking, and it always turned into hell: Delicious and its bad UX, ZooTool and its extremely heavy UX. Chrome sync and Firefox bookmarks did not help me either.

Then I found KIPPT. This is the Pinterest for bookmarks: you can use the addon in your browser, you can use the API, or you can keep items directly in the web interface.

The web interface is perfect: clean, easy to use and understand, with a Pinterest-like feed of the links you want to keep. It also has social features: you can share links, follow people and their link feeds, and comment on and like links kept by your friends. You can create shared collections so that the friends you allow can manage links in that collection. It is fully integrated with your Twitter favorites, GitHub starred repositories and other APIs. It has Android and iOS apps.

Found an interesting link? Go there and Kippt it.

The only thing I am missing is the ability to read my mind. It would be better if I could just think Keep this link on my Read Later list and it automagically kept it. But that is under development :)


Calendar finally is easy!

Do you really use Google Calendar? Do you always remember the appointments you created there? For me Google Calendar is too much. I want a place where I can just drop an email with the subject tomorrow 9am meeting, or send an IM from Gtalk saying 15/07/2012 10pm watch the final episode of my favourite program, or do the same in a clean web interface.

That tool is Coolendar, the best calendar I've ever used. Create your plans easily and receive reminders right in your e-mail or Gtalk. It is also a very nice TODO list: just send an IM or email to Coolendar with today Finish the reports #todo and it will be tagged as a TODO plan. Yes, it has Chrome and Firefox addons and also Android and iOS apps.

Obviously it has Sync for Google Calendar and iCal.

Things I miss in Coolendar

  • API, API, API
  • Command Line client
  • Shared calendars

Yes, I will put Python Anywhere in the web apps category, because it is more than a simple hosting platform. It is a powerful and complete environment for developers.

Everyone who loves Python wants to use Python anywhere, and the best tool for that task is obviously called Python Anywhere. What makes PA so wonderful?

  • Cloud based consoles, not only for Python but also for Bash and MySQL
  • You can share live console session with others (Yes, code together, you can even do an online DOJO)
  • It offers Python (2.6, 2.7, 3.2)
  • A bunch of batteries included in the Python installation
  • SSH access
  • web consoles
  • web browser and editor for files
  • Dropbox integration on your /home folder
  • Out of the box, one click install and deploy for web2py, Django and Flask
  • WSGI and access to wsgi.py, so you can use any framework
  • It is a good web host
  • Built in Cron-like task scheduler
  • It has a free account and very nice prices for paid accounts
  • API, API, API - A very good API
  • Sublime text plugin (you can edit your cloud files and start/stop your web apps from Sublime Text)

Go there, create a free account, and consider purchasing a paid plan. It is worth a try!

PythonAnywhere

I am now a member of Python Software Foundation

Now it's official, so I can blog about it here.

Massimo Di Pierro, the lead developer and creator of the web2py framework, which I started to use and contribute to 3 years ago, nominated me and Mariano Reingart to become Python Software Foundation members. On Aug 13th, 2012 the PSF held an election, and we were accepted as PSF elected members.

I am honored to have been nominated by Massimo, and also very happy to have been accepted. This is not just a personal recognition for me, but also a recognition of all the work done with web2py, and it highlights the importance of web2py within the Python community.

I guess the PSF members who voted read my CV, and obviously the review of my history took into account not only the work I do with web2py, but also the work I have been doing to promote and strengthen the use of Python in my country: participating in events, advocating for the language, teaching courses about it, and promoting it in the companies where I provide consulting services.

However, since my dedication to the Python community is mostly focused on the promotion and development of web2py, I assume the election took this into consideration.


Here is the announcement reproduced from the PSF blog:

Just the other day the Python Software Foundation held an election, the second and final one of the year, and the results are in! 18 new members were introduced, and the membership approved three new sponsor members. Please join us in welcoming all of them!
Candidates for PSF membership are nominated by an existing member for their work in the Python community. The membership is comprised of people from around the world and from many areas of the community.
These new members are selected from many different areas of the Python community. While some members are known for their contributions of code, many are known for their work to grow their local and regional communities. Some members are known for their work in educational workshops and conferences. It takes a diverse membership to ensure the success of a foundation steering a diverse community, so we're happy to have members of all types from all areas, both geographically and within the Python world.

Please join us in welcoming all of the new members to the Foundation!

Nick Barcet
Dana Bauer
James Blair
Thierry Carrez
Anand Chittipothu
Antonio Cuni
Anne Gentle
Noufal Ibrahim
Vish Ishaya
Christopher MacGown
Dave Malcolm
Joshua McKenty
Mark McLoughlin
Mariano Reingart
Bruno Rocha
Monty Taylor
Dean Troyer
Vicky Twomey-Lee

The following sponsor members were approved:

DreamHost
Globo.com
Hood Media GmbH

For the full PSF membership roster, please see http://www.python.org/psf/members/


Thank you Massimo, and PSF members. I really hope (and I will do my best) to repay this nomination with continuous work for the Python community.

web2py - manage users and membership in the same form

As requested by a Stack Overflow user:
http://stackoverflow.com/questions/11992749/web2py-how-edit-user-profile-and-membership-in-one-view

How to manage users and memberships in the same form

NOTE: You have to register the first admin user beforehand, because managing users and memberships requires being an admin.

For the purpose of the example we are going to use the file controllers/default.py accessible at the url localhost:8000/YOURAPP/default

A grid to list your users

This is the list admins will see when they hit http://..../default/list_users

The user list grid

1 - Put this in the default.py file

#@auth.requires_membership("admin") # uncomment to enable security 
def list_users():
    btn = lambda row: A("Edit", _href=URL('manage_user', args=row.auth_user.id))
    db.auth_user.edit = Field.Virtual(btn)
    rows = db(db.auth_user).select()
    headers = ["ID", "Name", "Last Name", "Email", "Edit"]
    fields = ['id', 'first_name', 'last_name', "email", "edit"]
    table = TABLE(THEAD(TR(*[B(header) for header in headers])),
                  TBODY(*[TR(*[TD(row[field]) for field in fields]) \
                        for row in rows]))
    table["_class"] = "table table-striped table-bordered table-condensed"
    return dict(table=table)

With generic views you will see this

2 - The edit link points to manage_user

Now, accessing http://..../default/list_users, you are going to see the grid showing all users. If you click the edit link in the grid, it goes to the manage_user function we referenced in btn = lambda row: A("Edit", _href=URL('manage_user', args=row.auth_user.id))

Create these two functions in the same controller

The user form
#@auth.requires_membership("admin") # uncomment to enable security 
def manage_user():
    user_id = request.args(0) or redirect(URL('list_users'))
    form = SQLFORM(db.auth_user, user_id).process()
    membership_panel = LOAD(request.controller,
                            'manage_membership.html',
                             args=[user_id],
                             ajax=True)
    return dict(form=form,membership_panel=membership_panel)

In the function above we create two objects: form, which is the form to edit the user object, and membership_panel, which is an ajax panel that loads manage_membership inside it and handles it via ajax.

Note that this function takes user_id from request.args(0); if it is not provided, it redirects back to list_users.

The membership panel
#@auth.requires_membership("admin") # uncomment to enable security 
def manage_membership():
    user_id = request.args(0) or redirect(URL('list_users'))
    db.auth_membership.user_id.default = int(user_id)
    db.auth_membership.user_id.writable = False
    form = SQLFORM.grid(db.auth_membership.user_id == user_id,
                       args=[user_id],
                       searchable=False,
                       deletable=False,
                       details=False,
                       selectable=False,
                       csv=False,
                       user_signature=False)
    return form

Note that in manage_membership we return the form directly, so we can insert it inside the ajax panel membership_panel.

The manage_user view

3 - Create an html file in YOURAPP/views/default/manage_user.html

{{extend 'layout.html'}}
<h4> Edit The user </h4>
{{=form}}
<hr>
<h4> User membership </h4>
{{=membership_panel}}

The end result

User Form

Add membership


Done, using web2py 2.0 (trunk)

Lazy DAL - Attempt 3 - Pbreit

Based on pbreit's request

On Wed, Aug 15, 2012 at 2:32 PM, pbreit wrote:
What would it take to set it up such that models are defined in mostly the same way as now but in "module" files and then imports are done in controllers/functions that need access to the table.

This file goes in modules/mymodels.py

# -*- coding: utf-8 -*-

from gluon.dal import DAL, Field
from gluon import current

DBURI = "sqlite://....."

TABLE_DEFINITIONS = {
    "owners": {
        "fields": [Field("name")],
        "kwargs": dict(format="%(name)s")       
    },
    "cars": {
        "fields": [Field("name"),
                   Field("owner", "reference owners")],
        "kwargs": dict(format="%(name)s")       
    }
}

class Models(object):
    def __init__(self):
        self.db = DAL(DBURI)

    @property 
    def tables(self):
        return self.db.tables

    def __call__(self, *args, **kwargs):
        return self.db(*args, **kwargs)

    def table_definer(self, tablename):   
      if not tablename in self.db.tables:
          fields = TABLE_DEFINITIONS.get(tablename, {}).get('fields', [])
          kwargs = TABLE_DEFINITIONS.get(tablename, {}).get("kwargs", {})
          return self.db.define_table(tablename, *fields, **kwargs)
      return self.db[tablename]

    def __getattr__(self, key):
      if hasattr(self.db, key):
          return getattr(self.db, key)
      elif key not in TABLE_DEFINITIONS.keys():
          raise AttributeError("attr not found")
      else:
          return self.table_definer(key)

Now in any controller controllers/default.py

from mymodels import Models
db = Models()

def list_owners():
    rows = db(db.owners).select()
    return dict(rows=rows)

I tested it and it works. The only caveat is that you are not going to call db.define_table; instead you put your model definitions in the TABLE_DEFINITIONS dict.
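
The lazy part of this approach is the __getattr__ hook: nothing is defined until an attribute is first requested. Here is a stripped-down sketch of that mechanism in plain Python, with no web2py involved. FakeDB and LazyModels are stand-ins invented for this illustration and are not part of gluon:

```python
TABLE_DEFINITIONS = {
    "owners": {"fields": ["name"]},
    "cars": {"fields": ["name", "owner"]},
}


class FakeDB(object):
    """Stand-in for DAL: it only remembers which tables were defined."""

    def __init__(self):
        self.defined = {}

    def define_table(self, tablename, *fields):
        self.defined[tablename] = list(fields)
        return self.defined[tablename]


class LazyModels(object):
    def __init__(self):
        self.db = FakeDB()

    def __getattr__(self, key):
        # Python calls this only when the normal attribute lookup fails
        if key not in TABLE_DEFINITIONS:
            raise AttributeError("attr not found: %s" % key)
        if key not in self.db.defined:
            # First access: define the table now
            self.db.define_table(key, *TABLE_DEFINITIONS[key]["fields"])
        return self.db.defined[key]


models = LazyModels()
print(sorted(models.db.defined))   # no table has been defined so far
models.owners                      # first access triggers the definition
print(sorted(models.db.defined))   # "owners" is defined, "cars" is still untouched
```

The point of the design is that a request which only touches owners never pays the cost of defining cars.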

Open links which point outside your own site in a new window

If you have a website where users can add content and links, and you want every existing link that points outside your own domain to open in a new window, you can automatically add target="_blank" to every such link on the page using plain JavaScript or jQuery.

Take care! This can take a while to process if your page has a lot of links.

Pure JavaScript

var links = document.links;

for (var i = 0, linksLength = links.length; i < linksLength; i++) {
   if (links[i].hostname != window.location.hostname) {
       links[i].target = '_blank';
   } 
}

If you're using jQuery

$(document.links).filter(function() {
    return this.hostname != window.location.hostname;
}).attr('target', '_blank');

WebSockets with Tornado, web2py, Python, jQuery

Comet messaging with Tornado and web2py

(in Portuguese)

Loading html elements dynamically with web2py and ajax

Put this code in a view /views/default/index.html

This only works if you have static/js/web2py.js, which normally comes with the welcome app.

<!-- begin -->
<button class="load_content" data-url="{{=URL('default', 'otherthing1')}}" data-target="ajax_container"> Click to load content1 </button>
<button class="load_content" data-url="{{=URL('default', 'otherthing2')}}" data-target="ajax_container"> Click to load content2 </button>
<button class="load_content" data-url="{{=URL('default', 'otherthing3')}}" data-target="ajax_container"> Click to load content3 </button>
<button class="load_content" data-url="{{=URL('default', 'otherthing4')}}" data-target="ajax_container"> Click to load content4 </button>


<div id="ajax_container"> <!-- CONTENT COMES HERE --> </div>

<script>
$(function () {
   $('.load_content').on('click', function (e) {
        var elem = $(this); // elem = $(e.target)
        var url = elem.attr("data-url");
        var target = elem.attr("data-target");
        web2py_ajax_page("GET", url, "", target);
        return false; // e.preventDefault()
      });
})
</script>
<!-- end -->

When the user clicks one of the buttons, its content is loaded via ajax into ajax_container.
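
The view expects the default controller to expose otherthing1 to otherthing4. The post does not show them, so the following is only a guess at their shape, assuming each action simply returns a fragment of HTML as a string (web2py sends a returned string to the browser as the response body):

```python
# controllers/default.py (hypothetical actions, not from the original post)

def otherthing1():
    # An action may return a plain string, which becomes the response body
    return "<p>This is content 1</p>"


def otherthing2():
    return "<p>This is content 2</p>"


def otherthing3():
    return "<p>This is content 3</p>"


def otherthing4():
    return "<p>This is content 4</p>"
```

In a real application each action would more likely build its content from the database or return a dict to be rendered by its own view.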

web2py routes

customizing routes in web2py

web2py comes with default URL routes. The default configuration uses the following pattern:

http://host:port/<application name>/<controller name>/<function name>/args/?vars=value

Sometimes we need to change it to a cleaner, prettier URL like

http://host:port/<controller name>/<function name>
or even
http://host:port/<function name>

Here is how to do it.

Put these lines in /web2py/routes.py. If you have an application called myapp, this would be:

# -*- coding: utf-8 -*-

routers = dict(
# base router
    BASE=dict(
        default_application='myapp',
    ),
    # app specific router
    myapp=dict(
        default_controller='home',
        default_function='index'
    )
)

logging = 'print'

# routes_onerror = [
#     (r'myapp/404', r'/myapp/static/fail404.html'),
#     (r'myapp/*', r'/myapp/static/fail.html'),
#     (r'*/404', r'/myapp/static/cantfind.html'),
#     (r'*/*', r'/myapp/error/index'),
# ]

error_message = ('<html><body>'
                  '<strong>ERROR DETECTED </strong>'
                  '<h1>%s</h1>'
                  '</body></html>')

error_message_ticket = ('<html><body><h1>Internal error</h1>Ticket issued:'
                         '<a href="/admin/default/ticket/%(ticket)s"'
                         ' target="_blank">%(ticket)s</a>'
                         '<h1>ERROR DETECTED</h1>'
                         '</body></html>')

Custom validator for web2py forms

web2py allows us to write custom form validators. By convention, validators in web2py are named in UPPERCASE.

Hi Pythonista! I know you are now wondering why this does not follow the PEP8 naming convention. OK, this is the web2py way: it breaks PEP8 a bit, but for a good reason. The naming convention prevents conflicts between, for example, an object named form and the FORM helper class.

Validators templates

There are two types of validators

  • VALIDATOR: It checks whether some data is valid or follows some pattern

  • TRANSFORMATION: It only takes the entered value and returns the transformed value

The patterns

class VALIDATOR(object):
    def __init__(self, error_message="SOMETHING WRONG"):
        self.error_message = error_message

    def __call__(self, value):
        error = None
        # CONDITION COMES HERE
        if "ERROR":
            error = self.error_message

        # IF error != None - value is invalid 
        return (value, error)

class TRANSFORMATION(object):
    def __init__(self, search, replace):
        self.search = search
        self.replace = replace

    def __call__(self, value):
        error = None
        try:
            # TRANSFORMATION COMES HERE
            value = value.replace(self.search, self.replace)
        except Exception:
            error = "Not possible to transform"
        return (value, error)

How to write my own validator

Follow the patterns above. Let's see some examples.

Validate if a zip-code starts with "051"

For some reason our system does not allow registrations from regions outside the "051" zip code area.

class IS_ALLOWED_ZIP_CODE(object):
    def __init__(self, zip_area, error_message="Zip code not allowed"):
        self.zip_area = zip_area
        self.error_message = error_message

    def __call__(self, value):
        error = None
        value = value.strip()
        if not value.startswith(self.zip_area):
            error = self.error_message
        return (value, error)

Now you can use this in your models

db.define_table("address",
    ...
    Field("zipcode", requires=IS_ALLOWED_ZIP_CODE("051"), notnull=True)
    ....)

As simple as that! Now your forms will show the "Zip code not allowed" error when a user tries to enter a zip code such as "04509-890".
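
Because a validator is just a callable that returns a (value, error) pair, it can be tried outside of any form. Here is a quick check in plain Python, repeating the class so the snippet stands alone (the sample zip codes are made up):

```python
class IS_ALLOWED_ZIP_CODE(object):
    def __init__(self, zip_area, error_message="Zip code not allowed"):
        self.zip_area = zip_area
        self.error_message = error_message

    def __call__(self, value):
        error = None
        value = value.strip()
        if not value.startswith(self.zip_area):
            error = self.error_message
        return (value, error)


validator = IS_ALLOWED_ZIP_CODE("051")
print(validator(" 05110-000 "))  # accepted: error is None and the value is stripped
print(validator("04509-890"))    # rejected: error carries the message
```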

Transform some text

class REPLACE_TEXT(object):
    def __init__(self, search, replace):
        self.search = str(search)
        self.replace = str(replace)

    def __call__(self, value):
        error = None
        try:
            value = value.replace(self.search, self.replace)
        except Exception:
            error = "Error replacing"
        return (value, error)

In the same way, you can now replace values in your forms.
Example: the user enters latitude and longitude with "," and you replace it with "."

replace_comma_with_dot = REPLACE_TEXT(",", ".")
db.define_table("address",
    Field("latitude", requires=replace_comma_with_dot),
    Field("longitude", requires=replace_comma_with_dot)
)

HOW DOES IT WORK?

web2py forms and the DAL validate_and_* methods have a step that goes through all the validators in the requires list
of every field in the table: it takes the entered value and calls the validator instance with it.
As you can see, the validator implements the __call__ magic method, which means it is executed when
an instance of the class is called. Yes, instances are callable.

class Foo(object):
    def __call__(self, *args):
        print("I've been called with some args %s" % str(args))

>>> foo = Foo()
>>> foo("web2py", "rocks")  # we are calling the instance directly
I've been called with some args ('web2py', 'rocks')
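
In rough terms, the framework walks the requires list, feeds each validator the value returned by the previous one, and stops at the first error. Here is a simplified sketch of that loop. It is not web2py's actual source, and run_validators, STRIP and IS_NOT_EMPTY_TEXT are names invented for this illustration:

```python
class STRIP(object):
    """Transformation: removes surrounding whitespace."""

    def __call__(self, value):
        return (value.strip(), None)


class IS_NOT_EMPTY_TEXT(object):
    """Validation: rejects empty strings."""

    def __init__(self, error_message="Enter a value"):
        self.error_message = error_message

    def __call__(self, value):
        if not value:
            return (value, self.error_message)
        return (value, None)


def run_validators(value, requires):
    # Accept a single validator or a list of them
    if not isinstance(requires, (list, tuple)):
        requires = [requires]
    for validator in requires:
        value, error = validator(value)
        if error:
            # Stop at the first validator that reports an error
            return (value, error)
    return (value, None)


print(run_validators("  web2py  ", [STRIP(), IS_NOT_EMPTY_TEXT()]))
print(run_validators("     ", [STRIP(), IS_NOT_EMPTY_TEXT()]))
```

The order of the list matters: here the transformation runs first, so a value made only of spaces is stripped and then rejected as empty.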

It is also possible to do some validation and transformation using compute for fields, and form events such as onvalidation and onaccepts.

That's it!