An intimate confession

I have a confession to make.
Some time ago I wrote a few small Groovy scripts to export my data from Posterous and Goodreads. At first it seemed easy. Then I wanted to add features: export more things, export to more formats… And what was bound to happen happened. One day (I think it was last October), I got fed up.
I got fed up with the stupid limitation I imposed on myself of putting everything in a single file.
I also got fed up with not being able to debug, with being stuck on a specific version of http-builder, and with not really being able to do what I wanted.
And I was starting to get lost in my closures.
So I thought about the problem at length, and made the decision that had to be made.
I'm stopping these Groovy scripts. Or rather, building on that experience, I'm doing a reboot and restarting the projects in my language of choice: Java.
I don't think Groovy is to blame, actually. But, like the free.fr developers who went from JSF to PHP in two days, I have my comfort zone. And it seems this comfort zone is well defined (in other words, I'm starting to ossify): Java, Maven, and preferably Eclipse (but I'll come back to that below, don't panic). So I'm slipping back into my slippers and getting back to work.
With one nuance, though. I figured that if I'm going to write a website scraper, I might as well have some claim, however small, to universality. So I'm now developing a kind of "thing" to which I give a configuration file (with the various logins, passwords and API access keys) and which, for each "known" site, will slurp up the interesting items. Pretty basic, all things considered.
So basic, in fact, that the thing I've had the most trouble understanding so far is the basic authentication mechanism with http-client. For the rest, let me just say that Jackson is seriously powerful at JSON/object mapping. One thing bothers me, though: it seems nobody has thought of using the "magic" powers of Java proxies to write access to a web API as an interface… something like this:
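The snippet that illustrated this idea is not present in this copy of the post. What follows is only a minimal sketch of how an interface could be backed by a Java dynamic proxy, not the original code. The `Get` annotation, the `BlogApi` interface, its path template and the `transport` function (a stand-in for the real HTTP client) are all hypothetical names made up for the illustration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.function.Function;

final class ApiProxySketch {

    /** Hypothetical annotation: the path template of a GET call. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Get {
        String value();
    }

    /** Hypothetical API description; not a real Posterous or Goodreads contract. */
    interface BlogApi {
        @Get("/sites/{0}/posts")
        String posts(String siteId);
    }

    /** Builds an implementation of the interface; 'transport' stands in for the HTTP client. */
    static <T> T create(Class<T> api, String baseUrl, Function<String, String> transport) {
        InvocationHandler handler = (proxy, method, args) -> {
            Get get = method.getAnnotation(Get.class);
            if (get == null) {
                // Methods without a path template (toString, equals...) are not handled here.
                throw new UnsupportedOperationException(method.getName());
            }
            // Replace {0}, {1}, ... in the template with the call arguments.
            String path = get.value();
            for (int i = 0; args != null && i < args.length; i++) {
                path = path.replace("{" + i + "}", String.valueOf(args[i]));
            }
            return transport.apply(baseUrl + path);
        };
        return api.cast(Proxy.newProxyInstance(
                api.getClassLoader(), new Class<?>[] {api}, handler));
    }

    public static void main(String[] args) {
        // A fake transport that only echoes the URL it would have fetched.
        BlogApi api = create(BlogApi.class, "http://example.invalid/api", url -> "GET " + url);
        System.out.println(api.posts("42"));
    }
}
```

With such a helper, supporting a new call would mostly mean adding an annotated method to the interface; mapping the response to objects (with Jackson, for instance) is left out of the sketch.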

With that, I can quickly write the useful parts of the site's API, and afterwards my code becomes super easy, don't you think?
Well, I think I'm going to do it, it's that simple.
On the other hand, the thing that really disappoints me is GitHub. I thought it was easy to create a repository there to put my code in, but actually no: every time, you have to go through that filth from the previous millennium, the command line, to create the initial branch with msysgit. Because obviously (or at least as far as I understand it) EGit can't do the first push to GitHub. And that, frankly, makes me want to drop this hyper-hyped "thing" and go back to plain Subversion (at Origo or Google Code). Note that I haven't tried with NetBeans yet.
Because yes, at the moment I'm using NetBeans a bit. I don't understand everything, and I really miss Eclipse's perspectives, but all in all there are good ideas in this IDE (for example in refactorings such as pull up, which can automatically create the abstract method declarations in the superclass).
Well, with all that, I should normally have the export coded quickly, right? RIGHT? No.
Because there's still work to do: fetching lots of data from lots of APIs, making sure all URLs are rewritten so they don't point back to the original sites, downloading attachments, and finally producing content (but for that, I think I'll go through some (read: StringTemplate) Java template engine). But all in all I'm confident, even if we'll see what the future holds on that front.

Posterous backup script v2

As a follow-up to the official Posterous blog post about the most wanted feature, here is great (or maybe not) news for you, users of the Posterous backup script.
I’ve just released v2.1 of the Posterous backup script.
Why such a tremendous version jump?
Because many things changed in that little script.

New and noteworthy
  • First, the script now uses the Posterous API, thanks to the greatness of Groovy HTTP Builder.
  • What’s more, I now export not only posts but also tags and pages, leading to an updated organization: each of your Posterous sites now has, under its very own folder, the following layout
  +-- pages
  |      +-- list of your user pages
  +-- tags
  |      +-- list of your tags; for each one, the generated page contains all associated posts
  +-- posts
  |      +-- list of all your posts
  +-- images
  |      +-- list of all images used in both your posts and pages. For each, both thumbnail and full-size image are downloaded
  +-- audio_files
  |      +-- list of all audio files used in both your posts and pages.
  +-- videos
  |      +-- list of all videos used in both your posts and pages.
  +-- posts.html           # a page listing all your posts
  +-- pages.html           # a page listing all your pages
  +-- tags.html            # a page listing all your tags

Currently, links are not exported. Do you want me to export them? If so, how?
  • Concerning page content, even if the layout is far simpler than the elegant layouts found on our beloved site, I tried to respect microformats recommendations as far as possible. As a consequence, posts are written using hAtom hentry, and user infos (in posts and comments) are written using hCard, which theoretically makes CSS skinning easy (even if I’ve not yet thought about it, I must confess).

The mandatory how-to

Concerning the usage guide of that script, nothing has changed (more or less) since the initial release of v1: you must still install both Java and Groovy. Since I’ve kept v1 of that script available, v2 is a separate download: get it at http://dl.dropbox.com/u/2753331/posterous_2.groovy and then run it with Groovy from the standard command line:

groovy posterous_2.groovy

which will show you the available options :

This is posterous export script v 2.1
2.1 is mainly due to the use of posterous api, instead of old-style http queries
You like that script ? You already use flattr ? Please go to http://flattr.com/thing/54243/Posterous-backup-script to show your appreciation
error: Missing required options: u, p, o
usage: groovy posterous.groovy
 -f,--forceRewrite     When present, all sites are regenerated. Existing
                       data on disk is totally wiped out
 -h,--help             provides full help and usage information
 -o,--output <arg>     An eventually existing output folder, where all
                       data will be output. Beware, if some data exists in
                       that folder, it may be overwritten.
 -p,--password <arg>   Unfortunatly one have to give its password to this
                       little script
 -s,--separeMedia      Separates media from posts directory. When set, all
                       medias are copied in subfolders of output folders
                       named as posts they’re associated with. This option
                       is by request of Eli Weinberg, with my best wishes.
 -u,--username <arg>   Sets posterous mail address here

Besides, you may have noticed some changes in the options: the -downloadThumbnails option has been removed, and a -forceRewrite option has been added to make sure previous files are removed.
As with version 1, you can still always launch the latest version of the script straight from its URL, but be sure to change the URL you use, otherwise the result won’t necessarily meet your expectations.

Troubleshooting guide ?

Naaah, there do not seem to be that many issues with that over-simple script written on the shoulders of giants.
Anyway, if you really want to ask: this script is slow, and it’s a known problem, but there is nothing I can do (except rewriting it in Java with heavy use of multi-threading to parallelize post and media downloads… well, OK, I’ve thought about that, but I won’t implement it before… v2.2).
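For the curious, the multi-threaded variant mentioned above could look roughly like the sketch below. It is not code from the script: `fetchAll`, its `fetch` parameter (standing in for whatever actually downloads one post or media file) and the thread count are assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class ParallelDownloadSketch {

    /** Runs 'fetch' on every URL using a fixed pool; results keep the order of 'urls'. */
    static <T> List<T> fetchAll(List<String> urls, Function<String, T> fetch, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit everything first so that downloads overlap.
            List<Future<T>> pending = new ArrayList<>();
            for (String url : urls) {
                Callable<T> task = () -> fetch.apply(url);
                pending.add(pool.submit(task));
            }
            // Then collect the results in submission order.
            List<T> results = new ArrayList<>();
            for (Future<T> future : pending) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while downloading", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a download failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // A fake 'download' that only measures the URL length, to stay offline.
        List<String> urls = Arrays.asList("http://example.invalid/a", "http://example.invalid/bb");
        System.out.println(fetchAll(urls, String::length, 4));
    }
}
```

The pool size is the knob to play with: too many threads mostly means hammering the remote server harder.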

Well, that’s badly made

This morning (well, this lunchtime, actually), I made a big decision.
I decided to stop spending my lunch breaks playing Urban Terror and to do more constructive things instead.
In fact, I had already cut my playing time considerably a few months ago, so that I could blog more regularly. I could of course pull a timeline out of my hat, but I’m not sure it would be worth it.
So I decided to cut my playing time further (pushed by the fact that my level is starting to plateau), and to get back to useful pursuits.
First among these is updating my various scripts. And obviously, first of all, the Posterous export, on which I need to perform two operations:
  1. Replace the custom output format I had created (vaguely inspired, it must be said, by DocBook) with a microformat well suited to the content.
  2. Use the Posterous API v2.
And on that second point, I’m already a bit stuck (yes, I’m starting with the second point because, while I’m at it, I might as well make my life easier and start by reading the data before writing it, which incidentally gives me a devastatingly powerful idea about using GPars and functional Groovy to speed the thing up).
Indeed, the Posterous people say you simply retrieve this identification token at a URL, and obviously it doesn’t work.
OK, let me go into detail, otherwise nobody will understand anything.
In my Groovy code, to connect, I simply use

 

 

And guess what Groovy tells me when I run this?

 

Well yes, it doesn’t work.
And apparently, that’s because Posterous uses preemptive basic authentication, whereas HttpClient (and therefore HTTPBuilder) doesn’t like that.
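"Preemptive" here means the credentials travel with the very first request, instead of waiting for the server to answer with a 401 challenge. A minimal sketch of that idea with the plain JDK (not with HttpClient or HTTPBuilder, and not the script's actual code) is shown below; the class and method names are made up, and the URL and credentials in `main` are placeholders.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class PreemptiveBasicAuthSketch {

    /** Builds the value of the Authorization header for HTTP basic authentication. */
    static String headerValue(String user, String password) {
        String credentials = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    /** Prepares (without sending) a request that carries the credentials from the start. */
    static HttpURLConnection prepare(String url, String user, String password) throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        // Setting the header ourselves is what makes the authentication preemptive.
        connection.setRequestProperty("Authorization", headerValue(user, password));
        return connection;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder values; nothing is sent over the network here.
        prepare("http://example.invalid/api", "someone@example.invalid", "secret");
        System.out.println(headerValue("someone@example.invalid", "secret"));
    }
}
```

Since the header is just Base64 of `user:password`, this only makes sense over a connection you trust.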

 

Which immediately pushes me to change my plan and start with its first part: exporting to nice, easily reusable HTML.
Then I’ll migrate to API v2 (and yes, on small projects, I don’t hesitate to change the order of tasks if it lets me get more things done faster).

Why posterous backup script execution takes

On Tue, Sep 7, 2010 at 1:13 PM, Daniel ***** <****> wrote:

Any idea why it takes so long (~30s, see below)?

 

Oh yes, man!
But first of all, forgive me for putting this reply on my blog, so that all the script users get to know this (as a consequence, I’ve added the post@posterous.com address to the "To" field, so don’t reply to all, unless you want our private discussion to become totally public ;-)).
And if I sound too much like a beginner to you, or too hard-core, forgive me again, as I’m quite bad at teaching anything.

 

Unfortunately, while coding in Groovy is damn fast, executing Groovy code, especially with some of the options I use, is waaaaaay slower than almost anything else. Let me detail them for you.

 

First of all, Groovy code runs in the Java VM, and before even the first element of my script is loaded, the Groovy execution environment has to load. In the ten years since I started coding in Java, this language has always been known for the completeness of its standard library, its ability to run everywhere (thanks to Sun’s work), and its damn slow startup time. Obviously, when running a scripting language like Groovy from source, that startup time can only get worse, as the posterous script has to be parsed and compiled to bytecode before anything runs.

 

Once the code is loaded in the Java Virtual Machine, some of the dependencies I use have to be resolved (namely nekohtml and Groovy http-builder).
Resolving dependencies is usually an enterprise thing, done by heavy-metal tools like Maven. I use those too often in Java code to go down that path in my Groovy code (and Groovy is too agile for that). Instead, I preferred the Groovy way: Grape, which resolves dependencies at runtime. Yes, at runtime, at the very moment you choose to run the script. As a consequence, once the script is loaded, each time it encounters a dependency declaration, it stops long enough to check that your ~/.groovy/grapes folder does contain the required dependency and to add the jar to your classpath. Which, again, is quite slow.

 

Finally, so that you can enter your arguments with ease, I use a CLI parser (provided by default with Groovy). This, obviously, takes some time.
And once all that work is done, you get to see this unfortunate execution time:

 

$ time Desktop/groovy-1.7.4/bin/groovy posterous.groovy
This is posterous export script v 0.6
You like that script ? You already use flattr ? Please go to http://flattr.com/thing/54243/Posterous-backup-script to show your appreciation
error: Missing required options: u, p, o
usage: groovy posterous.groovy -u email@posterous -p password -o
       outputFolder
 -d,--downloadThumbnails   When present, thumbnails as well as normal size
                           images are downloaded. This option is by
                           request of Eli Weinberg, with my best wishes.
 -h,--help                 provides full help and usage information
 -o,--output <arg>         An eventually existing output folder, where all
                           data will be output. Beware, if some data
                           exists in that folder, it may be overwritten.
 -p,--password <arg>       Unfortunatly one have to give its password to
                           this little script
 -s,--separeMedia          Separates media from posts directory. When set,
                           all medias are copied in subfolders of output
                           folders named as posts they’re associated with.
                           This option is by request of Eli Weinberg, with
                           my best wishes.
 -u,--username <arg>       Sets posterous mail address here
 -x,--xsl <arg>            Gives an XSL stylmesheet URL that will be put
                           in all generated files

real    0m26.524s
user    0m6.318s
sys     0m0.385s

Now that all is said, some questions arise:
  • Can I make my code faster? The easy reply is: YES, definitely. There are at least three obvious improvements: translate the code into Java, use static dependencies, and improve command-line parsing (obviously, going to Java would imply the two others, given the usual performance figures).
  • Will I make my code faster? My definitive (unfortunately) reply is NO. Going the performance way is obviously an interesting intellectual challenge, but I’ve already solved it many times and, believe me, it’s not that interesting when done for the tenth time. Furthermore, I don’t think backing up a Posterous account needs to be faster than light. As a consequence, even if I perfectly understand that you’re somewhat concerned about the script’s slowness, I won’t make it faster by performing all those modifications.
There is in fact another reason. This script is currently only 13 KB. Creating an equivalent Java application and packaging it would take hundreds of KB. Of course, you still download nekohtml and Groovy HTTP Builder. But you download them only once, even if the script is updated. As a consequence, I consider it a valid trade-off between features and weight.

 

Forgive me if my language seems a little rough in the sentences above, as I’m absolutely not trying to bully you. In fact, each time one of the Posterous backup script users sends me a mail, I consider myself happy, as these discussions allow me (like in this very case) to clarify some points that may not be clear. In other words, thank you for asking that question. I hope my reply was clear enough. If you have any question, remark, or declaration of love (I write that just for my usual blog readers, Daniel), don’t hesitate, I will be more than happy to reply.

 

Thanks again for that question.

Always launch latest version of posterous backup

Fed up with downloading and copying that Groovy script? Feel like you deserve the "bleeding edge" and all those new features I'm talking about?

Well, if so, let me give you a piece of advice: change your command line to replace the local copy of posterous.groovy with the online one. As an example, for my very own Posterous, the command line becomes

PS : Thanks to the StackOverflow community for that one.

More posterous backup script goodness

So, for all the users of my script (there seems to be at least three of you).

I've decided to highlight that script a little by giving it a specific tag and making this tag visible through a link in my navigation menu, which also gives you an RSS feed specific to that script (to stay up to date on it). This way you won't be bothered by news regarding my life, and will only get news concerning that small piece of code.

Stay tuned, I will soon give you some new powers (like the ones Eli and his father asked me for)… and some new questions.

Posterous backup, an upgrade

This message is an update to the previous one.
It is due to my users being true gentlemen, who want nothing more than a sweet script allowing them to do whatever kind of backup they want.

One of these users, Eli, asked me if it was possible to save images in another folder than the messages one.

As a matter of fact, I thought it would be even more convenient to have images stored in a folder named after the message.

As a consequence, I added a new flag to the Posterous backup command line: "-s". When using this flag, all images are stored in a folder named like the message URL.

When omitting it, the "historical" mode is kept, and all images are saved alongside the messages.

Obviously, you can download that script by following that link: download it immediately and without any kind of nagware.

Feel free to manifest your appreciation !

Posterous groovy bug due to bad HTTP Builder dependency.

I allow myself to post this on my posterous blog, as a follow-up to my message concerning the availability of the script.

 

2010/4/7 E*** B******* L*** <******>
>

> Hi Nicolas,

 

Hi, E***

>
> I just tried to use your script on my Ubuntu machine but it can not find the following class:
>
> unable to resolve class groovyx.net.http.URIBuilder
>
> Can you maybe tell me how I can install this class to be available in Java? I found it with Google but don’t know how to make it available for your script / the JVM.

 

We will have to dive deep into Groovy internals for that; I hope you’ll be able to follow the whole thing.

 

To get all the entries, the Posterous Groovy script makes use of the Groovy version of HTTP Builder: http://groovy.codehaus.org/modules/http-builder/

This dependency is resolved using a Groovy-specific mechanism: Grape. You can see the @Grab instruction for HTTP Builder at line 289:

 

@Grab(group='org.codehaus.groovy.modules.http-builder', module='http-builder', version='0.5.0-RC2')

 

I’ll guess you don’t know Groovy Grape at all. It’s a kind of "live" version of Ruby Gems: your program dependencies are expressed using annotations in the body of your program. When the Groovy runtime runs it, each time one of these annotations is encountered, Grape checks whether the dependency is already known (that is to say, already present as ~/.groovy/grapes/"group"/"module"/jars/"module.version"). If not, it is downloaded and its checksum is verified.
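To check by hand whether Grape already holds a dependency, the lookup described above can be approximated as in the sketch below. It is only an illustration: the helper is hypothetical, and the exact jar file name (`<module>-<version>.jar`) is an assumption about Grape's default cache layout that may not hold on every installation.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class GrapeCacheSketch {

    /** Where the jar of a grabbed dependency is assumed to sit under the user's home. */
    static Path cachedJar(Path userHome, String group, String module, String version) {
        return userHome.resolve(".groovy").resolve("grapes")
                .resolve(group).resolve(module).resolve("jars")
                .resolve(module + "-" + version + ".jar"); // assumed file name pattern
    }

    public static void main(String[] args) {
        Path home = Paths.get(System.getProperty("user.home"));
        // Coordinates taken from the @Grab instruction quoted above.
        Path jar = cachedJar(home, "org.codehaus.groovy.modules.http-builder",
                "http-builder", "0.5.0-RC2");
        System.out.println(jar + (Files.exists(jar)
                ? " is already in the cache"
                : " is not there, so it would have to be downloaded"));
    }
}
```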

 

Currently, the HTTP Builder website says on its front page

 

So I had a couple people mention that they had trouble using Grape to download RC3 due to a bad checksum on the POM file. I tried re-deploying it, (thinking it was a bad upload) and now I’m getting a _bunch_ of people saying they can’t download it!

As a work-around, please add the following line to ~/.groovy/grapeConfig.xml: <property name="ivy.checksums" value=""/>. This should allow Grape to ignore the checksum and download the file, until I’ve got the checksum problem resolved. Stay tuned!

 

May I suggest you perform the operation mentioned above? Note, however, that it seems to disable checksum verification on all jars, not only on the HTTP Builder one. So don’t forget to remove that line once the application has run successfully!
>

> Greetings from Germany,
>

Thanks for the feedback !

Posterous backup updated !

So, thanks to a comment from Richard, I discovered on Saturday that my little Posterous backup script did not back up private posts.
I initially found it weird, since I knew that my script set up the basic authentication mechanism correctly.
In fact, it was due to the way the Posterous API is built, I think. From what I’ve understood, there is no session maintained on the server side (weird choice, Sachin).
So, on Tom’s advice, I updated my script to ensure the basic authentication token is written on every request.
Now, Richard, the Posterous export should grab all your entries, even private ones!
Besides, I’ve also copied that script to git (since relying solely upon Dropbox’s survival does not seem enough to me). So, if you want to update it, it should be rather easy, no?

Posterous backup : beta reached !

For once, I’m writing in English, so as to reach the widest possible audience…

One easy way to backup all of your posterous blogs

Like anybody (and as the Posterous creators stated), I don’t want my data to be locked in an application, however smart Posterous currently is.
As a consequence, I’ve decided it was time to ensure my posts (this one like all the others) would remain even if Posterous collapsed (which I absolutely do not want).
For that, relying upon the Posterous API, I wrote a Groovy script that will back up all my posts, photos, sounds, and videos on the computer I use.

Organized backup

This backup is quite simply organized:
One folder for each site.
In each folder, entries keep the file name they have in Posterous, followed by a nice .xml extension.
Each media file associated with an entry uses the entry name, followed by _#number, where #number is the media number.
Note that I also do URI replacement in the entry, so that the backed-up media are used instead of the Posterous ones.
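The naming and URI replacement rules above can be sketched as follows. This is a Java illustration rather than the Groovy code of the script; the method names are made up, and the way the file extension is cut out of the URL is a simplifying assumption (a query string, for instance, would end up in the extension).

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MediaRewriteSketch {

    /** Maps each media URL to "<entry>_<index><extension>", in the order given. */
    static Map<String, String> localNames(String entryName, List<String> mediaUrls) {
        Map<String, String> names = new LinkedHashMap<>();
        for (int i = 0; i < mediaUrls.size(); i++) {
            String url = mediaUrls.get(i);
            int slash = url.lastIndexOf('/');
            int dot = url.lastIndexOf('.');
            // Keep the extension only when the last dot belongs to the file name.
            String extension = dot > slash ? url.substring(dot) : "";
            names.put(url, entryName + "_" + i + extension);
        }
        return names;
    }

    /** Replaces every known media URL in the entry body with its local file name. */
    static String rewrite(String body, Map<String, String> names) {
        String result = body;
        for (Map.Entry<String, String> name : names.entrySet()) {
            result = result.replace(name.getKey(), name.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> media = Arrays.asList("http://example.invalid/files/a.png");
        Map<String, String> names = localNames("the-big-band-theory", media);
        System.out.println(rewrite("<img src=\"http://example.invalid/files/a.png\"/>", names));
    }
}
```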

As an example, my own backup contains the following


+--knackfx.posterous.com
|   all-your-bases-are-belong-to-rest.xml
|   cant-wait-for-the-671.xml
|   knack-it.xml
|   netbeans-671-available-for-download.xml
|   rest-in-game.xml
|   serve-me-good-games.xml
|   whats-new-today.xml
|
+--riduidel.posterous.com
    il-mavait-prevenu-le-bougre.xml
    il-mavait-prevenu-le-bougre_0.png
    il-mavait-prevenu-le-bougre_1.jpg
    tester-lutilisation-de-la-memo.xml
    tester-lutilisation-de-la-memo_0.gif
    tester-lutilisation-de-la-memo_1.jpg
    the-big-band-theory.xml
    the-gaf-collection-collected.xml
    the-gaf-collection-collected_0.jpg
    the-gaf-collection-collected_1.jpg
    xkcd-movie-narrative-charts-0.xml

Well, in fact, I only show an excerpt here, since the backup generates 205 files on my machine.

How to use

Here comes the hardest part: installing this backup script. There are two little prerequisites, then the download itself:

  1. Install a recent Java (theoretically, your machine should already have a Java 5 or Java 6 compatible version, which you can check by issuing the java -version command in a shell)
  2. Install a recent Groovy (which is just a little harder).
  3. Download the posterous.groovy script (this is quite a bit easier). You can put this script anywhere on your disk.

Once all is downloaded, go to the script folder, where

groovy posterous.groovy

should output the following

This is posterous export script v 0.1
error: Missing required options: u, p, o
usage: groovy posterous.groovy -u email@posterous -p password -o outputFolder
 -h,--help             provides full help and usage information
 -o,--output <arg>     An eventually existing output folder, where all
                       data will be output. Beware, if some data exists in
                       that folder, it may be overwritten.
 -p,--password <arg>   Unfortunatly one have to give its password to this
                       little script
 -u,--username <arg>   Sets posterous mail address here

Note that the first run may show error messages, since Grape (the tool Groovy uses to pull dependencies from the interweb) has some issues downloading the dependencies used.

The result

So, as told before, this script generates a bunch of files on your machine.
Each of these files is an XML file containing all the info from Posterous (in another format that I plan to use to pull data from various websites).
This format seems to me quite legible :

<post>
  <title><!-- posterous title --></title>
  <date>2008-12-31T17:48:00.000+0100<!-- posterous post date in a quite standard form --></date>
  <author>
    <name><!-- author name --></name>
    <pic><!-- author pic --></pic>
  </author>
  <body><![CDATA[
    <!-- post body, protected from XML interpreting by the CDATA section -->
  ]]></body>
  <comments>
    <comment>
      <author>
        <name></name>
        <pic></pic>
      </author>
      <date></date>
      <body><![CDATA[
      ]]></body>
    </comment>
  </comments>
</post>

I think this format will allow you to run any further processing, as nothing seems to be lost from the original (except media file sizes, which you can see in your OS).
Additionally, note that only media stored by Posterous are downloaded, not Flickr images or YouTube videos.
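As an example of such further processing, here is a minimal sketch that reads the title and body back from one backup file with the XML parser bundled in the JDK. The element names come from the format shown above; the class and its two helpers are hypothetical, and the sample document in `main` is made up.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

final class BackupReaderSketch {

    /** Parses the content of one backup file and returns its root <post> element. */
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not a readable backup file", e);
        }
    }

    /** Text of the first direct child with that name (CDATA included), or null if absent. */
    static String childText(Element parent, String name) {
        for (Node node = parent.getFirstChild(); node != null; node = node.getNextSibling()) {
            if (node instanceof Element && name.equals(node.getNodeName())) {
                return node.getTextContent();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Element post = parse("<post><title>Hello</title>"
                + "<body><![CDATA[<p>Hi there</p>]]></body></post>");
        System.out.println(childText(post, "title"));
        System.out.println(childText(post, "body"));
    }
}
```

Looking only at direct children matters here, because comments carry their own body and author elements further down the tree.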

The license

This script uses a Creative Commons license: attribution, share-alike, no commercial use.

Give me feedback!

Any feedback you can send me will be greatly appreciated! Don’t hesitate to use the comments below to show your appreciation.