Version 0.05, Updated 1/MAY/2009, two days before you, dear Fravia, passed away
Current version 0.05a slightly edited and published by his web friends on 10 May 2009
 
The educated seeker ready to seek
 
Powersearching for the educated seeker
(How to comb the web and other searching lore)

"let's fetch anything we fancy for free"



Fravia's talk at the SIGINT (Köln, Germany - 22-24 May 2009)
(just grep for SIGINT in /usr/include/asm/signal.h :-)
This file dwells @ http://www.searchlores.org/koeln_2009.htm



    ••     Introduction and caveat
    ••     The educated seeker
    ••     The educated seeker's box (basic weapons) √
    ••     How big is the web? (external link) √
    ••     How to comb the web (external link) √
    ••     A mighty, underrated weapon: rhetoric (Today's target)
    ••     Today's targets (effective ways of books searching)
    ••     Potpourri searches
    ••     Let's search elsewhere
    ••     Conclusions & recommendations
    ••     Assignment
    ••     Forms


red... means that some stuff will be added orally during the talk.
√... means that these parts have at least been checked and sometimes already completely updated


Introduction

Bet you didn't fully realize that *anything* is already in the public domain!

Please excuse my English, which is not even my first foreign language, and please excuse my utter disdain for political correctness. That -and my well-founded hatred of every attempt to commercialize the web- is one of the reasons I'm still among the few speakers who prefer to use a pseudonym :-)

Caveat

Some of the techniques and approaches described here might be illegal according to the specific legislation of the country you happen to (have to) live in. No matter if this legislation is utterly stupid and most probably made by corrupt and lobbied unwashed that wouldn't understand a web protocol if it bit their pants off, rule NUMBER ONE for the educated seeker is NEVER to violate any law. So check your own legislative constraints and always respect them.
Remember, however, that notwithstanding their pathetic attempts to "fence" the web nationally, the Internet remains a truly frontierless international adventure, whose very STRUCTURE was created in order to keep sharing information even in case of an atomic attack... petty politicians' "censorship" attacks against the web still seem "small fry" to me in comparison :-)
So if you still want to experiment, you can often apply (some of) the following tips in order to (try to) remain legit:
  • go forth and multiply
    Move to a less oppressive country (forget the alleged "freedom and democracy" rhetoric of the powers that rule you: "less oppressive" here means exactly that: less censorship, less intrusion upon your own data and so on. Use this list as a rough but useful guide). Leaving your own country might seem a haphazard, adventurous move (especially to young people) but it's mostly a quite interesting and refreshing deed. And you'll probably also find a better job (reverse engineers are welcome everywhere) and in many cases a better gastronomical and social habitat :-)

  • use a proxy
    Remain where you are and just use a proxy from a less oppressive country in order to snoop around. Hard to prove that you are doing something not legit: you are using a proxy (still legit even in the most oppressive and "patent hysteric" countries: more than half of the commercial advertisements and sniffing procedures that pollute the web are based on proxies kicking in without even telling you). Therefore proxies and people using them are NOT violating any law per se: since our censors will always allow their unrestricted use to their commercial masters (for sniffing purposes), they can't have everything: "le beurre, l'argent du beurre, le papier pour l'emballer et les faveurs de la bergère".

  • be invisible
    Learn as quickly as possible some sound anonymity techniques, and yet remember that there is NO true anonymity on a web where your provider knows how many freckles you have on your nose (and elsewhere), and will immediately and gladly deliver all your data and home address to anyone who claims to be "authorized" to have them. That's incidentally also a good reason to learn how to connect to open (or closed if legit where you live, heh) wifi access points on the fly.

  • be somebody else
    Merge among the zombies as much as you can. Have a windows partition on your box (in order to look like a moron: l'apparenza inganna), if possible have windows on a physically separate specific hard disk; Of course you'll keep no real personal data whatsoever there, but instead a completely fake -but credible- personality; Build it with a fake and boring blog, facebook and twitter presence, homepage with utterly banal photos of your cat/spouse/new car. Use *that* guy or chick to browse around and download stuff. Mirror it somewhere before viruses, rootkits and malware trash down that toy operating system and simply reinstall the whole bazaar every couple of weeks just in case.
    Surfing with windows and MSIE explorer is a slow nightmare, but your mimicry levels will be quite high.

  • your box shouldn't really exist in the records anyway
    Ça va sans dire: buy your box cash in another town and give fake data or none for the insurance (the cashier's ticket is insurance proof enough in the EU anyway). When browsing remember that there is a staggering amount of personal data on your box that you are delivering at every connection. While you can change some of this (for instance your wifi MAC using any MAC changer program), you can still be easily identified by anyone and his cat through the specific serial numbers of your hard disks, motherboard, graphics card and so on. In general the old rule applies: "On the web you never give real info about yourself, no matter how pressing they are, and you should always lie so much that your falsehood cannot possibly be outdone"



Thank you for inviting me. I do love the CCC and owe more than I'll ever be able to repay to my departed friend Wau Holland (whom I will anyway soon join wherever reversers choose to dwell after they leave for good)
I do like reversers, and crackers and hackers. Oh boy, I do. Always did. Sons of the light: they deserve some cosmic searching power. And some weapons. Let's see what we can do today.

We'll examine some rather effective searching techniques.
As a proof of concept we will search for some mp3s and some books. Today I suggest books about a very powerful weapon: rhetoric, one of the many useful sciences that are in full decadence in societies like ours, dedicated to the commercial exploitation of slaves, guinea pigs and zombies, whose useless lives find fulfillment in consuming useless stuff, working themselves to the bone in order to buy a car with a different color.
Of course you'll be able to apply similar book searching approaches to any other kind of books you might fancy. Don't worry, it doesn't need to be rhetoric :-)

Searching such simple targets (music and books) we will, maybe, at the same time demonstrate
  1. that everything is on the web;
  2. that you can search rather effectively using many innovative searching approaches;
  3. that some -ahem- "alternative searching paths" can be quite useful for the educated seeker.
Of course our searches are not going to be limited to just music and books.
Seekers can always find anything: whatever, whenever, duh. Any image, any journal, any film, any software you fancy is somewhere in the nether void, at your disposal... anything that may have been digitized is indeed somewhere out there (one of the basic, if 'optimistic', laws of searching). Should you not find it, don't worry, it is not missing: your searching strategy is wrong, try a different one.

MP3s and books searching is so easy...
Finding digitized files is like shooting the red cross.
But there's much more on the web: Solutions are there!

Not only are all kinds of "tangible" and "digitized" targets available to anyone, but all kinds of solutions are there too, at your disposal. And I don't mean just messageboard solutions on -say- how to port a proprietary driver to GNU/Linux.
I mean real concrete solutions for those "everyday" battles an educated seeker loves to fight. Here are just three examples, but the educated seeker will know how to apply them, mutatis mutandis, to anything he fancies:
  • STALKING: You want to counter a specific nasty politician? Say one of the many unwashed idiot MPs that want to introduce sniffing powers for private companies that want to sue torrent and file sharing downloaders?
    First of all it is easy to find, and already embarrassing enough, to document how he voted on past important topics. E.g.: abortion. Some countries (Italy, for instance) allow you to check every single vote on the fly.
    But with a little deeper seeking you'll find out (and document) all his nasty deeds during -say- the last 10 years. Will you find such "nasty deeds" at all? You bet: in order to have enough power now to propose draft legislation, your target most probably already went through all the smeary "sell myself" | "do favours to that crony" | "steal something here and there" practices that characterize most politicians of our oh so free society. Using a position of trust for dishonest gain has always been "l'esprit de la chose" for most politicians.
    A little stalking, some luring and trolling, and of course a moderate amount of searching skills, are all you need to (at least try to) send him to jail.

  • You want to counter and diminish those annoying ubiquitous advertisement panels that pollute your city?
    Someone, somewhere has posted experiences, best practices and ideas. Chances are that anyway 1) there is a single agency (illegally) monopolizing all billboard advertisements; 2) half of them are abusive and unpaid, or no longer paid for and still there; 3) both points 1 and 2 probably imply the connivance of local authorities (read good ole corruption).
    Easy to find out and document, and maybe you'll even be able to counter or annoy those responsible... once you know how to search.

  • You want to free once for all a nice square of your town from those stinking private cars?
    Someone, somewhere has already done it. Find the many methods available... I'll give you one of the most effective as example: buy a cheap 20 meter long stacheldraht (barbed wire) roll (available in any DIY "brico" shop), cut its barbed knots out, then walk around your preferred square/street/park seeding, using for instance with a loden coat (very useful kind of vestment: has "through" pockets that allow your hand to rummage underneath the coat) or just bike around with a holed box full of barbed knots on the backpack, seeding knots as you go. Rinse and repeat every week or two, like the good "semeuse": those small knots are quite effective tire killers. Most driving maniacs will soon stop polluting the area with their cars. Of course you shouldn't do anything like this if deemed illegal in your habitat. Please check first.
Synergistic cosmic power: Ideas, Methods, Techniques, Lines of attack, Tactics, Approaches, Strategies...red
Gimme a knot...

 
...and I'll seed a nice square without stinking cars

A propos mp3 and books: a small caveat: Somebody told me long ago -politely- that we should not download songs or books that are copyrighted, or have not yet been released in the public domain. OK. I can understand that, as strange as it sounds (because, silly me, I always understood copyright as "the right to copy").
Therefore we will today just have an innocent look at our targets on line. More generally, if your local political clowns forbid it, don't download forbidden fruits, please (and if you insist on doing it, do practice first some basic anonymity precautions... see the caveats above).

red (Torpark, Ubuntu on USB ---> Casper Cow, wardriving and downloading...). Note also that NOT all countries respect the patent mafia's conventions. In San Marino and Somalia, just to give two examples, you could download whatever to your heart's content. So just find a proxy from one of those countries and the proxy will download, fully legit, the kind of stuff you of course won't touch. The educated seeker does not need to violate any law. And laws, in our oh so nice society, are just made and carefully drafted in order to punish the unwashed and to allow anything for those in the know. So the educated seeker just knows how to find loopholes, even quicker and better than "those in the know"  :-)

Ah yes... but what is actually an "educated seeker"?


The educated seeker

Aut inveniam viam aut faciam!

What is an educated seeker?

A most rare and refined searcher. Able to defend himself against the many commercial beasts that pollute our web. He knows how to slay the moronic SEO spammers red..., how to melt inside the unwashed idiots' "twitter" and "facebook" sites to hide information red.... He knows how to pick the fruits of the forest, gliding in the shadows between protocols, and how to download all kinds of targets red....
He knows how to use usenet in order to find, or troll for, information. He knows the difference between a short term and a long term searching approach, how to mine the deep web and all the other basic techniques.

He also masters some advanced searching techniques like for instance combing, stalking and luring.

Even more important, he has some powerful weapons: his operating system, his browser (his "sword"), his anonymity and proxy knowledge. red...

Another most important weapon: the searchstring "arrows" that he carries in his quiver :-)

But the most important "characteristic" of the educated seeker is that he does contribute and brings his small (or big) crumb to the collectivity red....


The educated seeker's box

Estote parati!

browser     ••     OS     ••     tools



Please excuse if this might sound (and be) banal for many in this audience. Just ignore it :-)
Unfortunately I know very well how many friends are still using windows. Note also that using windows is perfectly sensible (in a virtual box) for cracking and gaming, especially gaming, for obvious reasons, due to our own mistaken choices in the past and notwithstanding the fantastic recent improvements of the amazing wine approach...


Your main Browser (should be Opera btw) is of paramount importance for searching purposes, and can do wonders, if correctly trimmed...
red (HOST file, Proxomitron...)


So WHY? Why "should your main browser be" Opera?
Well, there are many "philosophical" schools about that, and there are few things as boring and annoying as "browser wars" and biased browser aficionados who defend their own browser choices "no matter what". And yet I'm gonna do something dangerously similar right now: over the last 10 years I have gathered some experience with different browsers and (I believe) can tell you the following:
Opera is speed and opera is a configuration dream.
With SPEED I mean that there's no other browser on earth (bar elinks of course) that can give you better speed when searching and more generally browsing, especially when you take care to learn some fundamental keyboard shortcuts. This is no minor feat in a web that is getting more and more overloaded with frills and useless "graphic" content and where the real info must at times be dug out laboriously from some "dark" sites that dwell in the remaining web-caves, often under the horizon.
Granted: elinks offers even more speed, but, frankly, I have yet to see someone really using elinks for a complete session and not just to probe a specific site :-)
With CONFIGURATION DREAM I mean that there's no other browser on earth (bar maybe firefox with a TON of additional plug-ins) that will allow you to finetune the browser configuration to your own specific needs. Just have a look at the configuration options available for Opera (Tools → Preferences → Advanced AND Tools → Appearance) and compare by yourself with the configuration options available for Firefox out of the box (Edit → Preferences → Advanced).
I know, I know, there are some very good plug-ins (e.g. the "Live http headers" for firefox) but for seekers who continuously have to use different boxes on the fly it's a tad annoying to have to carry "their" finetuned firefox around on a stick or to finetune a new box's firefox on the fly, isn't it?
Then there's another fundamental aspect: anti-frills weapons (that, as stated, we need more and more simply to survive a browsing session among the commercial crapola and the sticky morasses of moronic frills). Opera's NO IMAGES ON THE FLY and Opera's BLOCK CONTENT options are excellent, out of the box solutions for this serious problem.

Just take this info cum grano salis and try it out on your own, then judge and decide by yourself! A seeker uses a main browser, but always keeps a couple of other browsers ready to check -whenever necessary- a specific or suspicious site. So you should imo learn well how to use at least a couple of browsers. Simple ones à la Galeon and more complex ones like Opera and Firefox. What you should ALWAYS avoid is MSIE, Microsoft explorer, that is more a sniffing malware than a browser: "Using MSIE is tantamount to walk barefooted in a Mine field" wrote +Alistair long ago :-)



Your Operating system (should be a GNU/Linux distro btw) is of paramount importance for searching purposes, and can do wonders, if correctly trimmed...
red (happy to waste money).


So WHY? Why "should your main operating system be" GNU/Linux?
Well, there are many "philosophical" schools about that, and there are few things as boring and annoying as "OS wars" and biased OS aficionados that defend "no matter what" their own choices. And yet I'm gonna do something dangerously similar right now:
GNU/Linux is SECURITY
D'you really want that awful "Norton" can of worms running in the background just to ask you after a short while to pay money? First of all it does not make sense even in windows, where you have plenty of BETTER programs for free, instead of losing time with the crap that a laptop/netbook/you name it offers nowadays when you leave the shop smirking for having paid for a license you don't need:

openoffice FULLY substitutes ms-office, especially when you take into account that open office is free and will always be free in all its frequent and quite interesting updates, while in 90% of cases (unless you really paid through the nose when you bought your box) MS-office has some "activation" moment and its "free" use lasts only 60 days.

Just the time to get acquainted with it and lull you into inertia: "Ah well, everybody is using MS-office, and I don't really need to go through the 'save into doc format' routine (that doesn't take any time at all btw), let's stick with what all morons use, after all for me 40/60/whatever euro mean nothing". Clearly the clowns at Microsoft feel the pressure of Richard Stallman and his glorious FSF!

Equally, AVG (for instance, but there's even better for windows) will cover your antivirus needs much better than the Norton intrusive and sniffing can of worms...

But you can easily find through some -ahem- "grayer areas" of the web complete symantec suites fully regged if you really insist and if it is legit to install them in your country (it is... in more countries than you would suspect)...

But the really interesting thing for those still asleep inside the windows inertia is that they don't even realize (as we all jolly well know) that once you switch to GNU/Linux you don't need an antivirus at all (well, at least atm). So only the most security concerned buffs care to install the (good, powerful and existing) GNU/Linux antiviruses...
Amazing, isn't it? People choose an inferior, less powerful, slower, bugs-encumbered system (windows xp and windows 7, we don't even examine the huge flop that was Vista) against a fantastic OS for seekers: GNU/Linux.

Now if you want my personal advice (but I don't want to start a useless distro-war), I would say: Start with a Debian-based distribution (e.g. Ubuntu), try Fedora and try Sabayon.

Your Main internet tools should be powerful, ready and their syntax well known to you. A good tracer, netstat and many other tools are of paramount importance for searching purposes, and can do wonders, if correctly trimmed...
red (mtr netstat-putan/putain...).

   
   
   
   
   
   
   
   
   

When searching, the educated seeker also always keeps in mind that the main search engines DO NOT overlap too much, and yet that together they cover (at best) only 1/4 of the web. This may be quite significant when deciding your search strategies.
Many clueless zombies consider "searching the web" tantamount to typing one term into google and then pressing enter. Such a simplistic approach is wrong, and not only for the "one-termness" of it. The real problem is that google covers only a small part of the web.
In order to access a bigger part of it you will need to use techniques that go from stalking to social engineering, through trolling and password breaking.
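
The low overlap between engines can be made tangible with a small sketch. Everything in it is an assumption for illustration: the engine names and the "u1".."u7" URLs are invented placeholders, not real results. It only shows how one might measure pairwise overlap and see what a single engine misses.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Overlap of two result sets: |A & B| / |A | B| (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_report(results: dict) -> dict:
    """Pairwise Jaccard overlap between the result sets of several engines."""
    return {
        (x, y): jaccard(results[x], results[y])
        for x, y in combinations(sorted(results), 2)
    }


def new_urls(results: dict, baseline: str) -> set:
    """URLs found by the other engines but missed by the baseline engine."""
    others = set().union(
        *(urls for name, urls in results.items() if name != baseline)
    )
    return others - results[baseline]


# Hypothetical result sets for one and the same query (placeholders only).
results = {
    "engine_a": {"u1", "u2", "u3", "u4"},
    "engine_b": {"u3", "u4", "u5", "u6"},
    "engine_c": {"u6", "u7"},
}

if __name__ == "__main__":
    for pair, score in overlap_report(results).items():
        print(pair, round(score, 2))
    print("missed by engine_a:", sorted(new_urls(results, "engine_a")))
```

A low score for every pair is the practical argument for launching the same query on several engines instead of trusting one.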
     



Let's analyze some simple music queries


Pulling some MP3-webbits out of the web
(a "Webbit" is a "Querystring Rabbit" out of a magician's hat)

A quick list of the most important operators:
Yahoo operators:
site: hostname: link: linkdomain: (links that points to one domain) url: intitle: inurl: (a specific keyword as part of indexed urls, example: inurl:searching)
intitle & inurl are VERY important parameters... nomen est omen: red images giotto5.jpg...

Google's operators:
site: allintitle: (all of the query words in the title) intitle: (that word in the title) allinURL: (all of the query words in the URL) inURL: (that word in the URL) cache: link: related: (pages that are "similar" to a specified web page) info: (google's info)

Altavista's most important operator:
NEAR (more on this later)

MSN Live's operators:
  • contains:
    Restricts results to sites that have links to the file type(s) you specify. For example, to search for websites that contain links to mp3 files, type music contains:mp3.

  • filetype:
    Returns only web pages created in the file format you specify. Live Search recognizes html, txt, and pdf extensions. Live Search also recognizes the extensions for primary Office document types. For example, to find reports created in PDF format, type your subject, followed by filetype:pdf. For example, type information filetype:pdf.

  • inanchor:, inbody:, intitle:, inurl:
    Returns pages that contain the specified term in the anchor, body, title, or web address of the site, respectively. Specify only one term per keyword. You can string multiple keyword entries as needed. For example, to find pages that contain google in the anchor, and the terms black and blue in the body, type inanchor:google inbody:black inbody:blue.

  • ip:
    Finds sites that are hosted by a specific IP address. The IP address must be a dotted quad address. Type the IP: keyword, followed by the IP address of the website. For example, type IP:80.83.47.151.

  • language:
    Returns web pages for a specific language. Specify the language code directly after the language: keyword.

  • link:
    Finds sites that have links to the specified website or domain. This is useful for determining who links to whom. Do not add a space between link: and the web address. For example, to find pages that contain the word games and that link to searchlores.org, type games link:searchlores.org

  • linkdomain:
    Finds sites that link to any page within the specified domain. Use this keyword to determine how many links are being made to a specific page, as well as how those links are made. For example, to see pages that link to searchlores, type linkdomain:searchlores.org.

  • linkfromdomain:
    Finds sites that are linked from the specified domain. Use this keyword to determine how many links are being made from a specific page, as well as how those links are made. For example, to see pages that are linked from my site, type linkfromdomain:fravia.com

  • loc:, location:
    Returns web pages from a specific country or region. Specify the country or region code directly after the loc: keyword. To focus on two or more languages, use a logical OR and group the languages. For example, "core python" (loc:RU OR loc:CN)

  • prefer:
    Adds emphasis on either a word or another operator. For example, type searching prefer:internet

  • site:
    Returns web pages that belong to the specified site. To focus on two or more domains, use a logical OR and group the domains. Do not add a space after the colon (:). You can use site search for web domains, top level domains, and directories that are not more than two levels deep. For example, to see web pages about media reporting from the BBC or CNN websites, type "media reporting" (site:bbc.co.uk OR site:cnn.com). You can also search for web pages that contain a specific search word on a site. For example, to find the library pages on searchlores, type site:www.searchlores.org/library

  • feed:
    Finds RSS or Atom feeds on a website. For example, to find RSS or Atom feeds about web searching, type feed:"web searching"

  • hasfeed:
    Finds web pages that contain an RSS or Atom feed on a website. You can add search words to narrow your search. For example, to find web pages on the Guardian website that contain RSS or Atom feeds about google, type site:www.guardian.co.uk hasfeed:google

  • url:
    Checks whether the listed domain or web address is in the Live Search index. Do not add a space between url: and the domain or web address. For example, to verify that searchlores is in the index, type url:searchlores.org

Most important MSNLive operator:
linkfromdomain: (an outbound links operator)
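
Since most of these operators share the same two rules (no space after the colon, alternatives grouped with OR inside parentheses), composing them can be sketched in a few lines. The helper names op, any_of, phrase and query are hypothetical, introduced only for this illustration; the example strings reproduce queries quoted in the operator list above.

```python
def op(name: str, value: str) -> str:
    """Join operator and value with no space after the colon.

    Multi-word values are wrapped in double quotes so they stay one term.
    """
    value = value.strip()
    if " " in value and not value.startswith('"'):
        value = f'"{value}"'
    return f"{name}:{value}"


def phrase(text: str) -> str:
    """Exact-phrase term."""
    return f'"{text}"'


def any_of(*terms: str) -> str:
    """Group alternatives with a logical OR, e.g. (site:a OR site:b)."""
    if len(terms) == 1:
        return terms[0]
    return "(" + " OR ".join(terms) + ")"


def query(*parts: str) -> str:
    """Assemble the final query string, skipping empty parts."""
    return " ".join(p for p in parts if p)


if __name__ == "__main__":
    print(query(phrase("media reporting"),
                any_of(op("site", "bbc.co.uk"), op("site", "cnn.com"))))
    print(query(op("inanchor", "google"),
                op("inbody", "black"), op("inbody", "blue")))
    print(op("feed", "web searching"))
```

Whether a given engine still honours a given operator is something to check on the engine itself; the sketch only takes care of the syntax.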



We'll now use as an example the intitle: operator.
The structure of the following old -and already "blunt" red- mp3s webbit, has various interesting characteristics, that may be used to exemplify general webbits' structures and purposes.

                1        2        3                  4                   5           6                     search engine
                group    title    format variants    index of in title   spamkiller  variable parameter
                                                                                     (guarantees length)
High precision  beatles  imagine  mp3 OR m4a OR ogg  intitle:"Index of"  -metallica  +"4.2M"               On google
High recall     lavigne           mp3 OR m4a OR ogg  intitle:Index.of    -beatles    +"4.4M"               On Yahoo
 
  1. The "group" (or singer name) is mandatory.
  2. Simply specifying a "title" adds precision and loses recall (precision and recall are -most of the time- inversely proportional). (This means that if you add your target's title you diminish excessive noise but may miss some target sites).
  3. The "format variants" will guarantee a broader spectrum. If the search engine you are using is heavily censored (as happens more and more often) just eliminate the mp3 parameter. Chances are that some (yet) uncensored m4a or some ogg file will be present inside our "real target" (mp3 censored music lists), and that these "ogg oddballs" will allow their retrieval. When they censor m4as we'll invent something else :-)
  4. The intitle:"index of" (or intitle:index.of, which is the same but avoids two key-presses) is mandatory, and -spammers notwithstanding- still allows fairly decent results. Of course the intitle: operator is to be used with google and yahoo; check the different operators for the other search engines, or just use a simpler (and more spammed) "index of" string snippet.
  5. The -metallica (or -beatles, or whatnots) serves as a spamkiller, because many clowns still try to fish zombies out of the knowledge web uploading huge lists of groups' names. If you're going for high recall, then re-launch the same query with a different singer acting as spamkiller.
  6. Finally we come to the LENGTH parameter, which not only guarantees the presence of at least some Megabyte heavy mp3, thus cutting away all the irrelevant noise of those bogus "index of" spammer sites with "small snippets" of music, but also can be varied ad libitum and will thus guarantee you hours of fishing pleasure. You may try for instance all variations in the range 1.6M - 6.4M. I suggest starting in the range 3.5-4.5 (which is a good signal ratio for mp3s) and moving upward if you are an optimist and downward if you are a pessimist.
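
The six slots described above can be assembled mechanically. The following is only a sketch: the function names webbit and size_sweep are hypothetical, the format list uses the m4a spelling of the longer webbit, and the default sweep simply takes the suggested 3.5-4.5 starting range in steps of 0.1.

```python
def webbit(group, title="", formats=("mp3", "m4a", "ogg"),
           spamkiller="", size=""):
    """Assemble the six-slot webbit as one query string."""
    parts = [
        group,                                    # 1. group or singer (mandatory)
        title,                                    # 2. title: more precision, less recall
        " OR ".join(formats),                     # 3. format variants
        'intitle:"Index of"',                     # 4. directory-listing marker
        f"-{spamkiller}" if spamkiller else "",   # 5. spamkiller
        f'+"{size}"' if size else "",             # 6. variable length parameter
    ]
    return " ".join(p for p in parts if p)


def size_sweep(start=3.5, stop=4.5, step=0.1):
    """Size strings "3.5M" ... "4.5M"; integer tenths avoid float drift."""
    lo, hi, st = round(start * 10), round(stop * 10), round(step * 10)
    return [f"{n / 10:.1f}M" for n in range(lo, hi + 1, st)]


if __name__ == "__main__":
    # High precision variant from the table above
    print(webbit("beatles", "imagine", spamkiller="metallica", size="4.2M"))
    # High recall: no title, one query per size in the sweep
    for size in size_sweep():
        print(webbit("lavigne", spamkiller="beatles", size=size))
```

Dropping "mp3" from the formats tuple gives the variant suggested in point 3 for heavily filtered engines.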


A more recent webbit:

-inurl:htm -inurl:html -inurl:jsp -inurl:php -inurl:pdf -inurl:asp -inurl:txt -inurl:shtml -inurl:phtml -inurl:cgi -intitle:free -intitle:download -intitle:archive +intitle:index+of/ +parent-directory +name +"last modified" +size +description (oasis OR shakira) (mp3 OR wma OR m4a) -download

Note the useful spamkiller -download and the fact that we search only for (mp3 OR wma OR m4a). You may add ogg files as well. In fact there's no point in searching those infamous Ipod's m4p. However, should you happen to download some of those Ipod's "fairplay infected" AAC files, you may use a program like JHymn to play them wherever and whenever despite their ridiculous patents, and/or converting them on the fly to a more useful (and unprotected) m4a format.

Here is a very simple, extremely short and yet quite useful query. You can launch this (or a very similar one) with any good search engine... even with google...
shakira "4.6m * snd"
"Grown up" humans will probably substitute "shakira" with -say- "bach" :-)
Note -again- the added "variable" parameter +"4.6m", which is quite important in order to reduce noise and spam (again: you can modulate as much as you fancy: 4.5, 4.6, 6.2 etc.)


Today's targets (rhetoric galore)


How to get inside libraries at night

Ah, Rhetoric! Maybe the most powerful weapon for a reverser. It allows you to see easily through the fog of media that always belong to someone with a biased agenda.
Some important, basic books are there for the taking, scattered on the deep web, our fathomless cornucopia of knowledge.
The most powerful ones offer imho a sound knowledge of rhetoric (especially euphemisms, which nowadays can even betray a rather comic sarcasm)
The following books about rhetoric will today serve for our "example queries".

Note that finding books on the web is extremely easy.

In fact there's even a direct relation between the "celebrity" of a target and how easy it will be to find it on the web.
For example: if you want hic et nunc "The Lord of the Rings", you just search for a passage of it: "They could see little, for the night was now so deep that they were hardly aware of the stems of trees before they stumbled against them.". See? Even the heavily censored MSNSearch gives us some results. This is true for all "celebrity level" books: "But few of any sort and none of name" and you immediately have Shakespeare's "Much Ado About Nothing" (and, I may add, you discover en passant a lot of book repositories as well).
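
The passage trick boils down to quoting a short run of consecutive words and sending it as one exact phrase. A minimal sketch follows, with assumptions stated plainly: passage_query and search_url are hypothetical helper names, https://search.example/search is a placeholder base URL, and "q" is merely a common name for the query parameter, not a fact about any particular engine.

```python
from urllib.parse import urlencode


def passage_query(passage: str, max_words: int = 12) -> str:
    """Quote the first max_words words so the engine matches them verbatim."""
    words = passage.replace('"', "").split()
    return '"' + " ".join(words[:max_words]) + '"'


def search_url(base: str, query: str, param: str = "q") -> str:
    """Append the URL-encoded query to a (placeholder) search endpoint."""
    return f"{base}?{urlencode({param: query})}"


if __name__ == "__main__":
    q = passage_query("But few of any sort and none of name")
    print(q)
    print(search_url("https://search.example/search", q))
```

A dozen words is usually plenty: long phrases get truncated by some engines, and very short ones stop being distinctive.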

Let's start our bookish search with a simple yahoo webbit: title:index title:of -originurlextension:htm -originurlextension:html -papers -copyright +rhetoric

Now let's limit the same search to pdf files, a very stupid format vis-à-vis html files, but preferred by self appointed "scientists": &vf=pdf

Hence finding well known books, copyrighted or not, is extremely easy. Yet today's rarer and more difficult "examples" will give us -I hope- an opportunity to investigate some alternative searching methods... some different ways of cutting the noise and getting the signal without just using google (or any other among the many main search engines).

The following three books will be our 'virtual targets'. One of them is even in German, which will suit many among those here today: the German rhetorical school is by far the most important one on the planet, after all (see the "assignment" below)
  1. Handbook of classical rhetoric in the Hellenistic period (330 BC -- AD 400) by Brill
    1997
    blah blah
  2. Historisches Woerterbuch der Rhetorik by Gert Ueding and Gregor Kalivoda Franz-Hubert Robling
    Max Niemeyer Verlag, (8 Baende) Tuebingen 1992 - 1994 -1996...
    blah blah
  3. Quarterly Journal "Rhetorica" by the International Society for the History of Rhetoric
    blah blah

Moreover there is such a wealth of free useful texts on the web that this fact alone (not the fact that you can easily find all these patented books for free) makes one wonder whether nowadays it makes any sense at all to "buy" a book.

Again a caveat: the following queries are just a proof of concept, showing how we could search for books... the specific "quarries" we are using (in this case our three "rhetoric" books) don't matter that much: you'll be able to adapt the following approaches to OTHER, different, targets of yours... replace for instance "rhetoric" with -say- "python" or "digital photography" or "assembly" and you'll obtain a different complete library instead.
The approaches and the techniques we are examining together are important, the targets themselves are infinite.

Proximity galore

Another interesting side effect of a correct web-seeking approach is that often, when searching for something, you will find on the same servers many other targets related to your topic, targets that you did not even know existed. While this is true for many different targets and not only for books, we still call this the "being inside the library" effect. Imagine you are in a library, but not filling out a request form at the counter: imagine you are physically retrieving a book inside a library, with shelves and shelves of books around you, within your immediate reach. Thus you can scan with your eyes all other books in the proximity: books, more or less related to the topic you are searching for, that are, imagine again, physically located on the same shelf, next to your target book and that you can therefore also pick up and examine at leisure. Well this is by all means possible (in fact even more easily than in an actual library) in our virtual netherworld!
Gee... the amazing power of knowing how to search!
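The "being inside the library" effect has a very simple mechanical side: once you hold the URL of one target, the "shelves" around it are the parent directories of that URL. A minimal Python sketch, assuming the server exposes its directories at all; the function name shelves and the example.org address are hypothetical.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def shelves(url):
    """Return the directory URLs above a found file, nearest shelf first."""
    parts = urlsplit(url)
    path = posixpath.dirname(parts.path)
    found = []
    while True:
        directory = path.rstrip("/") + "/"
        found.append(urlunsplit((parts.scheme, parts.netloc, directory, "", "")))
        if path in ("", "/"):
            break  # we have reached the root of the server
        path = posixpath.dirname(path)
    return found


for shelf in shelves("http://example.org/library/rhetoric/handbook.pdf"):
    print(shelf)
# http://example.org/library/rhetoric/
# http://example.org/library/
# http://example.org/
```

Each of these addresses is a place worth looking at by hand: the nearest one is the shelf your target sits on, the others are the rest of the library.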

Enough theory! A good webbit for our rhetoric books? Here is a "regional" one (say: China): rhetoric site:.cn "index+of"

Now let's leave theory and enter practice...

Today's targets (effective ways of books searching)


"A posse... ad esse" :-)


Yep: searching for books on the web is -most of the time- extremely simple, especially if you have the exact title, the name of the author, the ISBN number and/or some snippets of text.

As of April 2009, I would suggest the following "basic" methods:
  1. General global book searches red
  2. Filesharing (à la rapidshare) searches red
  3. Known good repositories searches red
  4. Googlebooks and Amazon cracking searches red
  5. IRC channel searches red


Potpourri searches


All together now!



We can also search all our titles together with our nice "potpourri" approach.
At times simply guessing that interesting places MUST have all your targets on the same page can be useful for combing purposes... in order to find these interesting places and also your targets :-)
Here is a "potpourri" example: on yahoo: "Core Python Programming" "Python Cookbook" "Python Essential Reference" (note the &vst=.org&vs=.org&n=100 snippet in the search string)
and on google: "Core Python Programming" "Python Cookbook" "Python Essential Reference" (note the &as_sitesearch=.org&num=100 snippet in the search string)
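The two potpourri strings above follow one recipe: quote every title, put them all in a single query, then add the site and result-count parameters. A Python sketch of the google variant follows. The as_sitesearch and num parameters come from the snippet quoted above; the base URL, the q parameter and the function names are assumptions made for illustration, and engines change their parameters without notice.

```python
from urllib.parse import urlencode


def potpourri_query(titles):
    """Quote each title so that all of them must appear on the same page."""
    return " ".join(f'"{title}"' for title in titles)


def google_potpourri_url(titles, site=".org", num=100):
    """Build a search URL restricted to one kind of site, many results per page."""
    params = {
        "q": potpourri_query(titles),
        "as_sitesearch": site,  # restrict results, e.g. to .org or .edu
        "num": num,             # results per page
    }
    return "https://www.google.com/search?" + urlencode(params)


titles = ["Core Python Programming", "Python Cookbook", "Python Essential Reference"]
print(potpourri_query(titles))
print(google_potpourri_url(titles))
```

Calling google_potpourri_url(titles, site=".edu") gives the same potpourri for "edu" sites instead of "org" ones.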

When you search the web, the biggest problem is noise. Your target, your signal, will often be half-drowned underneath it.
And today's web has a lot of commercial noise.
If you search for an image for instance, say a picture of a famous painter, you will immediately find a gazillion spammers who want to sell you those very images you could easily find for free.
So you'll find thousands of low resolution images, or images defaced with an ugly watermark, put on line by "snake oil" sellers. So "cutting the noise" is crucial.
This holds true for everything, and for books as well. Hence we must "clean up" our queries a little, and, as the title of this conference: "searching underneath the commercial web" implies, a simple approach is for instance to limit our searches to "edu" and "org" sites.

The first query above will give us this link, which brings us into this subdirectory, and the second query will give us this "blocked link", yet through google's cached copy we will still be able to find this nice Russian site.
(Note in this example the importance of cached copies. In fact many important search engines offer them: Ask, MSNSearch, Yahoo, Google, Alexa, Baidu, Gigablast...).

So, that's it: as you have seen, any and every book is at seekers' disposal.

(Caveat: Download your targets only if you are positively sure they have been released into the public domain. Else limit yourself to reading them on line. In general real seekers do not need to waste their hard disk space downloading doubtful stuff, downloads that could -ludicrously enough- even be construed as 'illegal' by the patent holders' mafias and their political lackeys. But downloading is not necessary! Seekers will always find again and again on the fly -and consult on line- whatever they fancy).

Ok, ok. It was too easy. Much too easy. In order to continue, let's imagine for a moment that it would NOT have been so incredibly easy to find these books through such simple searches. Imagine we didn't find them. Imagine we will not find them again on those URLs.
After all, the web is a quicksand, and the specific locations where we found our targets will probably quickly disappear after today's talk :-)


Look Ma: no google!
Go for the format, go for the name, do it like the lamers or search elsewhere


Sprinkles of cosmic searching power

Let's find again the same three targets WITHOUT using the same simple querystrings