Version 0.05, Updated 1/MAY/2009, two days before you, dear Fravia, passed away
Current version 0.05a slightly edited and published by his web friends on 10 May 2009
(How to comb the web and other searching lore)
"let's fetch anything we fancy for free"
Fravia's talk at the SIGINT (Köln, Germany - 22-24 May 2009)
(just grep for SIGINT in /usr/include/asm/signal.h :-)
This file dwells @ http://www.searchlores.org/koeln_2009.htm

•• Introduction and caveat √
•• The educated seeker √
•• The educated seeker's box (basic weapons) √
•• How big is the web? (external link) √
•• How to comb the web (external link) √
•• A mighty, underrated weapon: rhetoric (Today's target)
•• Today's targets (effective ways of book searching) √
•• Potpourri searches
•• Let's search elsewhere
•• Conclusions & recommendations √
•• Assignment
•• Forms

√ means that these parts have at least been checked and sometimes already completely updated
Caveat
Some of the techniques and approaches described here might be illegal according to the specific legislation of the country you happen to (have to) live in. No matter if this legislation is utterly stupid and most probably made by corrupted and lobbied unwashed that wouldn't understand a web protocol if it bit their pants off, rule NUMBER ONE for the educated seeker is NEVER to violate any law. So check your own legislative constraints and always respect them. Remember, however, that notwithstanding their pathetic attempts to "fence" the web nationally, the Internet remains a truly frontierless international adventure, whose very STRUCTURE was created in order to still manage to share information in case of an atomic attack... petty politicians' "censorship" attacks against the web still seem "small fry" to me in comparison :-) So if you still want to experiment, you can often apply (some of) the following tips in order to (try to) remain legit:
MP3 and book searching is so easy... Finding digitized files is like shooting the red cross. But there's much more on the web: Solutions are there! Not only are all kinds of "tangible" and "digitized" targets available to anyone, but all kinds of solutions are also there, at your disposal. And I don't mean just messageboard solutions on -say- how to port a proprietary driver to GNU/Linux. I mean real concrete solutions for those "everyday" battles an educated seeker loves to fight. Here are just three examples, but the educated seeker will know how to apply them, mutatis mutandis, to anything he fancies:
Gimme a knot... and I'll seed a nice square without stinking cars
Please excuse me if this sounds (and is) banal for many in this audience. Just ignore it :-) Unfortunately I know very well how many friends are still using Windows. Note also that using Windows is perfectly sensible (in a virtual box) for cracking and gaming, especially gaming, for obvious reasons, due to our own mistaken choices in the past and notwithstanding the fantastic recent improvements of the amazing Wine approach...
When searching, the educated seeker also always considers the fact that the main search engines DO NOT overlap too much, and yet that together they cover (at best) only 1/4 of the web. This may be quite significant when deciding your search strategies. Many clueless zombies consider "searching the web" tantamount to typing one term into google and then hitting enter. Such a simplistic approach is wrong, and not only for the "one-termness" of it. The real problem is that google covers only a small part of the web. In order to access a bigger part of it you will need to use techniques that range from stalking to social engineering, through trolling and password breaking.
Yahoo operators:
site:
hostname:
link:
linkdomain: (links that point to one domain)
url:
intitle:
inurl: (a specific keyword as part of indexed URLs, example: inurl:searching)
intitle: and inurl: are VERY important parameters... nomen est omen (the name says it all).

Google's operators:
site:
allintitle: (all of the query words in the title)
intitle: (that word in the title)
allinurl: (all of the query words in the URL)
inurl: (that word in the URL)
cache:
link:
related: (pages that are "similar" to a specified web page)
info: (google's info)

Altavista's most important operator: NEAR (more on this later)

MSN Live's operators:
contains: Restricts results to sites that have links to the file type(s) you specify. For example, to search for websites that contain links to mp3 files, type music contains:mp3.
filetype: Returns only web pages created in the file format you specify. Live Search recognizes html, txt, and pdf extensions, as well as the extensions for primary Office document types. For example, to find reports created in PDF format, type your subject followed by filetype:pdf, as in information filetype:pdf.
inanchor:, inbody:, intitle:, inurl: Return pages that contain the specified term in the anchor, body, title, or web address of the site, respectively. Specify only one term per keyword. You can string multiple keyword entries together as needed. For example, to find pages that contain google in the anchor, and the terms black and blue in the body, type inanchor:google inbody:black inbody:blue.
ip: Finds sites that are hosted by a specific IP address. The IP address must be a dotted quad address. Type the IP: keyword, followed by the IP address of the website. For example, type IP:80.83.47.151.
language: Returns web pages for a specific language. Specify the language code directly after the language: keyword.
link: Finds sites that have links to the specified website or domain. This is useful for determining who links to whom. Do not add a space between link: and the web address. For example, to find pages that contain the word games and that link to searchlores.org, type games link:searchlores.org
linkdomain: Finds sites that link to any page within the specified domain. Use this keyword to determine how many links are being made to a specific page, as well as how those links are made. For example, to see pages that link to searchlores, type linkdomain:searchlores.org.
linkfromdomain: Finds sites that are linked from the specified domain. Use this keyword to determine how many links are being made from a specific page, as well as how those links are made. For example, to see pages that are linked from my site, type linkfromdomain:fravia.com
loc:, location: Return web pages from a specific country or region. Specify the country or region code directly after the loc: keyword. To focus on two or more countries or regions, use a logical OR and group the codes. For example, "core python" (loc:RU OR loc:CN)
prefer: Adds emphasis to either a word or another operator. For example, type searching prefer:internet
site: Returns web pages that belong to the specified site. To focus on two or more domains, use a logical OR and group the domains. Do not add a space after the colon (:). You can use site search for web domains, top level domains, and directories that are not more than two levels deep. For example, to see web pages about media reporting from the BBC or CNN websites, type "media reporting" (site:bbc.co.uk OR site:cnn.com). You can also search for web pages that contain a specific search word on a site. For example, to find the library pages on searchlores, type site:www.searchlores.org/library
feed: Finds RSS or Atom feeds on a website. For example, to find RSS or Atom feeds about web searching, type feed:"web searching"
hasfeed: Finds web pages that contain an RSS or Atom feed on a website. You can add search words to narrow your search. For example, to find web pages on the Guardian website that contain RSS or Atom feeds about google, type site:www.guardian.co.uk hasfeed:google
url: Checks whether the listed domain or web address is in the Live Search index. Do not add a space between url: and the domain or web address. For example, to verify that searchlores is in the index, type url:searchlores.org

Most important MSNLive operator: linkfromdomain: (an outbound links operator)
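The operator rules above (no space after the colon, quotes around multi-word values, a logical OR inside parentheses to group several domains or regions) are easy to get wrong when queries are assembled by a bot rather than by hand. What follows is only a minimal sketch of how such strings could be composed: the helper names `op`, `any_of`, `phrase` and `build_query` are hypothetical, invented here for illustration, and nothing guarantees that a given engine still honours a given operator.

```python
def phrase(text):
    """Wrap a multi-word search phrase in double quotes."""
    return f'"{text}"'


def op(name, value):
    """Build one 'operator:value' token, with no space after the colon.

    Values containing spaces are quoted, as in feed:"web searching".
    """
    value = str(value).strip()
    if " " in value:
        value = phrase(value)
    return f"{name}:{value}"


def any_of(name, values):
    """Group several values of the same operator with a logical OR."""
    return "(" + " OR ".join(op(name, v) for v in values) + ")"


def build_query(*parts):
    """Join search words and operator tokens into one query string."""
    return " ".join(p for p in parts if p)


if __name__ == "__main__":
    # The examples quoted in the operator list above, rebuilt piece by piece.
    print(build_query(phrase("media reporting"),
                      any_of("site", ["bbc.co.uk", "cnn.com"])))
    print(build_query(phrase("core python"), any_of("loc", ["RU", "CN"])))
    print(build_query("games", op("link", "searchlores.org")))
    print(op("feed", "web searching"))
```

Keeping each operator as a separate token makes it simple to loop over a list of domains or region codes and throw the same query at each of them in turn.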
(guarantees length)
Go for the format, go for the name, do it like the lamers or search elsewhere
You can search ftp, you can go local, or even better: regional. You can zap IRC channels and explore uncommon search engines
Going "regional" is ALWAYS a very good idea when searching. We have already seen how adding a simple .ru to our queries can help. But why Russia? WHERE should we search? Which are the, how should I say, "less copyright-obsessed" countries? Here you can see an interesting "piracy subdivision" published this summer by The Economist.
We may as well use these 'scarecrow' data (produced by US-lobbyist Robert Holleyman's "Business Software Alliance" in order to scrape together some money) for our own purposes... And look! As you can see, Vietnam, Zimbabwe, Indonesia, China, Pakistan, Kazakhstan, Ukraine, Cameroon, Russia, Bolivia, Paraguay and Algeria seem to have a more relaxed attitude towards patent holders. Good to know :-) Here are the relevant country codes: .vn, .zw, .id, .cn, .pk, .kz, .ua, .cm, .ru, .bo, .py and .dz, codes that we could use to restrict searches only and/or especially to such relaxed places. Of course some of these countries are just tiny local niches, with next to no activity and extremely weak signals, and can be ignored: throwing our clever queries at -say- Zimbabwe or Cameroon we'll probably just be wasting our (or our bots') precious searching time. Let's say that -in general- .vn (Vietnam), .id (Indonesia), .cn (China), .pk (Pakistan), .ua (Ukraine) and .ru (Russia) look promising enough. We may add -out of our experience- Iran, Korea, Bulgaria and India (.ir, .kr, .bg and .in).

So let's go local: let's visit China, where we can find, among hundreds of others, for instance this link, that requires just some guessing capacity (or some understanding of Chinese :-) Of course we should also have a look in Vietnam, in Russia/Ukraine (where we will at once retrieve our Target and as many other programming books as you fancy), and here is how you would search in KOREA or in RUSSIA using MSN Search. Caveat: this was all just academically speaking, duh. Once again: seekers don't need to download anything from the web, since they can always find their targets again and again if and when needed :-P

Searching through IRC channels and blogs can be -for specific targets- quite useful. However the signal/noise ratio is quite bad on these channels, and therefore IRC-searching and blog-searching are -in many cases- a waste of time compared to more effective searching techniques.
After all, and behind the hype, blogs are just messageboards where only the Author can start a thread, and IRC channels need, in order to be useful, a lot of social engineering. I'll just direct you to some blog search engines and to some IRC search engines like this one. Nuff said.

At times simply switching to less known (but quite interesting) search engines can cut the mustard. Here's a related search with kartoo and here's another search using gigablast. Finally, since we are speaking of a programming language, we may also have a look at the recent google codesearch: return lang:python gives 283000 scripts, enough for some serious studying. Same with MSN Search macros.

So we found our targets again and again using a palette of different searching colours. These are all paths that lead into the forest, and you'll be able to find many more on your own. Now let's go back to the theory.

Quaeras ut possis, quando non quis ut velis (seek as you can, since you cannot seek as you wish)

Try out things! Nothing beats personal experimentation on the web. You have a good searching box? A good browser? Know all the various techniques to get to your info "from behind"? Now it's time to try your hand at some difficult search, and to document and describe your efforts (and your mistakes) so that others can take advantage of them (and help you improve further). It is ONLY through collaboration and sharing of information that we can continue to stay (well) ahead of the commercial morons and the beastly SEOs. Some advice:
Nil perpetuum, pauca diuturna sunt (nothing lasts forever, few things last long)

An easy assignment for this evening (just in order to practice the various techniques explained today, lest you forget everything by tomorrow morning): find a fundamental text: Lausberg, Handbook of Literary Rhetoric. This search should take you some time. It is NOT easy at all. You'll probably have to use some stalking, luring and of course combing techniques and approaches. But you'll find a LOT of interesting stuff -and book repositories- on the way there. That's for sure, future seekers!

And now I'm finished. Thank you for your patience. Any questions?

SEARCHING THE PAST (DISAPPEARED SITES)
http://webdev.archive.org/ ~ The 'Wayback' machine at Alexa: explore the Net as it was! Alternatively, learn how to navigate through [Google's cache]! Alternatively, a new "preservation" project from Webcapture, the International Internet Preservation Consortium, is coming along.

A quick tour of the main search engines...

Uhhh.. almost forgot, a small book-searching present for those that solved the assignment (all other lazy scoundrels shouldn't even look): finding quickly (some minor) rhetoric related books