Sunday, July 3, 2011

Google+ Test

Seeing if this will let me post to my Google+ profile directly.

Sunday, September 26, 2010

Ripping a Website in One Line

I posted this as an answer to a question on "site-ripping" on another site, but I figured I'd mirror it here since I wrote it.

Question: "I need software that will rip a site via HTTP. It needs to download the images, HTML, CSS, and JavaScript as well as organize it in a file system."

Here's my answer: Use GNU wget.


wget -erobots=off --no-parent --wait=3 --limit-rate=20K -r -p \
-U "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)" \
-A htm,html,css,js,json,gif,jpeg,jpg,bmp http://example.com


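This runs in the console. It grabs the site recursively (-r) along with the images and stylesheets each page needs (-p), waits 3 seconds between requests, and caps the download rate at 20 KB/s so it doesn't hammer the site. It also ignores robots.txt (-erobots=off), never climbs above the starting directory (--no-parent), and sends a browser User-Agent string so the site doesn't cut you off with an anti-leech mechanism.

Note the -A parameter, which lists the file extensions you want to download. Add any others you need; png, for example, is not in the list above.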
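
For readability, the same command can also be spelled with wget's long option names. This is only a sketch: rip_site and the WGET variable are names made up for this example (setting WGET=echo previews the command without downloading anything), while the options are simply the long forms of the ones used above.

```shell
#!/bin/sh
# Sketch only: rip_site and WGET are hypothetical names for this example.
# Each long option maps to a short one from the one-liner:
#   --execute robots=off  = -erobots=off  (ignore robots.txt)
#   --recursive           = -r            (follow links)
#   --page-requisites     = -p            (fetch images/CSS a page needs)
#   --user-agent          = -U            (identify as a browser)
#   --accept              = -A            (file extensions to keep)
rip_site() {
    url=$1
    "${WGET:-wget}" \
        --execute robots=off \
        --no-parent \
        --wait=3 \
        --limit-rate=20K \
        --recursive \
        --page-requisites \
        --user-agent="Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)" \
        --accept=htm,html,css,js,json,gif,jpeg,jpg,bmp \
        "$url"
}
```

Call it as rip_site http://example.com once the function is defined in your shell.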

You can also add another pair of options,
-H -D domain1.com,domain2.com
to list additional domains to download from, in case the site hosts some kinds of files on a separate server. (-D only restricts which domains are followed; -H is what allows wget to leave the starting host at all.) There's no safe way to automate that for all cases, so you just have to try it and keep an eye on it.
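
As a rough illustration of the multi-domain case: per the wget manual, -D by itself does not turn on host spanning, so the sketch below pairs it with -H. The names rip_hosts and WGET, and the static.example.com host in the usage line, are made up for this example.

```shell
#!/bin/sh
# Sketch only: rip_hosts and WGET are hypothetical names for this example.
# -H (--span-hosts) lets the recursive download leave the starting host;
# -D (--domains) then limits it to the comma-separated list given.
rip_hosts() {
    domains=$1   # e.g. example.com,static.example.com
    url=$2
    "${WGET:-wget}" -erobots=off --no-parent --wait=3 --limit-rate=20K -r -p \
        -U "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)" \
        -H -D "$domains" \
        -A htm,html,css,js,json,gif,jpeg,jpg,bmp "$url"
}
```

Usage would look like rip_hosts example.com,static.example.com http://example.com, and it is still worth watching the output directory to see how far the crawl spreads.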

Wget is commonly preinstalled on Linux. It can also be compiled easily on most other Unix systems, or downloaded for Windows from the GnuWin32 project (GnuWin32 Wget).


Wednesday, April 22, 2009

Setting up standalone Oracle JDeveloper OC4J Server

This is how to set up the Oracle standalone OC4J server that comes with JDeveloper. There's no good quickstart guide, so I wrote one.


Assumptions

  1. Windows platform.

  2. Java and Apache ANT are installed and work from the command line.

  3. Oracle JDeveloper 10.1.2.0.0 installed at C:\jdev1012. If not, substitute your version and the directory it is installed in wherever it is referenced.

  4. Username and password for the OC4J server will be admin/admin. Make them whatever you want as long as you update the code that uses these.

  5. The application name is "whatever"; replace it with whatever you want.

  6. Your app server is installed on localhost, port 8888, and the RPC interface is on port 23791. These are the defaults; yours may vary.

  7. You can change the URL from "/yourURL" to whatever you want as long as you replace it in http-web-site.xml.

Set up server

C:\jdev1012\j2ee\home>java -jar oc4j.jar -install

Enter an admin username and password when prompted.

Start server

C:\jdev1012\jdev\bin>start_oc4j.bat

Deploy app to server

Ant target
<!-- JDeveloper install directory (see assumption 3); adjust to match yours. -->
<property name="jdev.home" value="C:/jdev1012"/>
<property name="oc4j.deploy.ormi" value="ormi://localhost:23791"/>
<property name="oc4j.deploy.username" value="admin"/>
<property name="oc4j.deploy.password" value="admin"/>

<target name="deploy">
  <!-- Runs OC4J's admin.jar, which deploys the EAR over ORMI. -->
  <java jar="${jdev.home}/j2ee/home/admin.jar" fork="yes">
    <arg value="${oc4j.deploy.ormi}"/>
    <arg value="${oc4j.deploy.username}"/>
    <arg value="${oc4j.deploy.password}"/>
    <arg value="-deploy"/>
    <arg value="-file"/>
    <arg value="whatever.ear"/>
    <arg value="-deploymentName"/>
    <arg value="whatever"/>
  </java>
</target>
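
The Ant target only runs admin.jar with a fixed argument list, so the same deployment can be sketched as a shell function (under Cygwin, for instance). The names deploy_app, JAVA and JDEV_HOME are made up for this example; the values are the ones from the assumptions above.

```shell
#!/bin/sh
# Sketch only: deploy_app, JAVA and JDEV_HOME are hypothetical names.
# The arguments mirror the <arg> elements of the Ant target one for one.
deploy_app() {
    jdev_home=${JDEV_HOME:-C:/jdev1012}
    "${JAVA:-java}" -jar "$jdev_home/j2ee/home/admin.jar" \
        ormi://localhost:23791 admin admin \
        -deploy -file whatever.ear -deploymentName whatever
}
```

Run deploy_app from the directory that contains whatever.ear, since the file name is relative.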

Bind to server

notepad c:\jdev1012\j2ee\home\config\http-web-site.xml

Add, inside the <web-site> element:

<web-app application="whatever" name="warfile name without .war" root="/yourURL">
</web-app>

Run

http://localhost:8888/yourURL
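
If you would rather check from the command line than open a browser, something like the following may do, assuming wget is available (under Cygwin, for example). The names check_app and WGET are made up for this example, and the default URL is the one from the assumptions above.

```shell
#!/bin/sh
# Sketch only: check_app and WGET are hypothetical names for this example.
# --spider asks wget to check the URL without saving anything;
# --tries=1 stops it from retrying when the server is down.
check_app() {
    url=${1:-http://localhost:8888/yourURL}
    if "${WGET:-wget}" --spider --quiet --tries=1 "$url"; then
        echo "up: $url"
    else
        echo "not responding: $url"
        return 1
    fi
}
```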

Conclusion

Look at that, up and running without 500 pages of Oracle "documentation."


Windows, Java and Cygwin: a Wrapper Script to Fix File Paths

At work I sometimes have to run Java code on both AIX and Windows. When I want to test things on Windows, I've found it helpful to run everything under Cygwin. Unfortunately, because Cygwin and the Windows JVM handle paths differently, things usually don't work right, so I wrote the script below. It doesn't handle everything; in particular, it doesn't work for referencing your Log4J config from the command line. But it works for my purposes.

In short, name this java.bat, put it first on your path under Cygwin, and it will fix MOST filenames so that they run properly.


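#!/bin/ksh
# Fix Java programs to run correctly under Cygwin by converting
# POSIX-style paths to Windows-style paths before calling java.

# Convert the classpath list (only if one is set; cygpath rejects an empty path).
if [[ -n "$CLASSPATH" ]]; then
    export CLASSPATH="$(cygpath -wp "$CLASSPATH")"
fi

x=0
lastvar=
for p in "$@"
do
    # Get the directory part of the parameter, if it has one.
    # "--" keeps dirname from treating options such as -cp as its own.
    ppath="$(dirname -- "$p" 2> /dev/null)"
    # Exclude the current directory.
    if [[ "$ppath" = "." ]]; then ppath=""; fi
    # Exclude it if it is not an existing directory.
    if ! test -d "$ppath"; then ppath=""; fi

    if [[ "$lastvar" = "-cp" || "$lastvar" = "-classpath" ]]
    then
        # The argument after -cp/-classpath is a path list.
        parms[x]="$(cygpath -wp "$p")"
    elif test -e "$p"
    then
        # An existing file or directory.
        parms[x]="$(cygpath -wp "$p")"
    elif [[ "$ppath" != "" ]]
    then
        # A file that does not exist yet, inside an existing directory.
        parms[x]="$(cygpath -wp "$p")"
    else
        # Anything else (class names, flags) is passed through unchanged.
        parms[x]="$p"
    fi

    lastvar="$p"
    x=$(( x + 1 ))
done

"$JAVA_HOME/bin/java" "${parms[@]}"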


If there is a better way to do this, let me know.
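
One possible extension, offered only as an untested sketch: arguments of the form -Dname=/some/path slip through the script because the argument as a whole is not an existing path. The helper below splits such an argument at the first "=" and converts just the value. The name fix_define_arg and the property -Dmy.config in the usage note are made up for this example, and whether a given library accepts the converted value (Log4J may expect a URL or resource name rather than a plain path) is a separate question.

```shell
#!/bin/sh
# Sketch only: fix_define_arg is a hypothetical helper, not part of the
# script above. It prints its argument, converting the value part of a
# -Dname=value argument when that value is an existing path.
fix_define_arg() {
    arg=$1
    case "$arg" in
        -D*=*)
            name=${arg%%=*}    # text before the first "=", e.g. -Dmy.config
            value=${arg#*=}    # text after the first "="
            if [ -e "$value" ]; then
                value=$(cygpath -w "$value")
            fi
            printf '%s=%s\n' "$name" "$value"
            ;;
        *)
            # Not a -Dname=value argument: pass it through unchanged.
            printf '%s\n' "$arg"
            ;;
    esac
}
```

Inside the loop, this could be tried as one more branch, for example parms[x]="$(fix_define_arg "$p")" for arguments that start with -D, such as -Dmy.config=/home/me/app.properties.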


First Post

Testing.
