Factbites
 Where results make sense
About us   |   Why use us?   |   Reviews   |   PR   |   Contact us  

Topic: HtDig


In the News (Thu 17 Dec 09)

  
  Ht-//dig - Wikipedia, the free encyclopedia
The correct title of this article is ht://Dig.
ht://Dig is a free software system for indexing and searching a finite set of sites or an intranet and is licensed under the GNU General Public License.
ht://Dig is used by the GNU Project's website, gnu.org.
en.wikipedia.org /wiki/Htdig   (144 words)

  
 Htdig Overview   (Site not responding. Last check: 2007-10-12)
Htdig is a type of search robot that retrieves HTML documents using the HTTP protocol and gathers information from these documents, which can later be used to search them.
In this process, the htdig acts as a regular Web user, except that it follows all hyperlinks that it comes across within the domain on which it needs to gather information.
When reading the documentation on www.htdig.org, realize that htdig is only one component of the larger ht://dig system. Therefore, we recommend that you familiarize yourself with ht://dig as a whole before focusing on technical issues regarding only htdig.
www.verio.com /support/documents/view_article.cfm?doc_id=3912   (797 words)
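
The crawling behaviour described in this snippet (fetch a page, extract its hyperlinks, follow only those on the same domain) can be illustrated with a minimal Python sketch. This is not ht://Dig's own code: the `crawl` function, the `LinkExtractor` class, the `fetch` callable, and the `example.test` pages are all hypothetical names chosen for illustration, and an in-memory dictionary stands in for real HTTP requests.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch):
    """Breadth-first crawl that stays on the host of start_url.

    `fetch` is a hypothetical callable mapping a URL to HTML text,
    or None when the page cannot be retrieved.
    Returns the URLs that were fetched, in visit order.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment.
            target, _fragment = urldefrag(urljoin(url, href))
            # Follow the link only if it stays on the starting host.
            if urlparse(target).netloc == host and target not in seen:
                seen.add(target)
                queue.append(target)
    return visited


# Tiny in-memory "site" standing in for real HTTP requests (assumed data).
PAGES = {
    "http://example.test/": '<a href="/a.html">A</a> <a href="http://other.test/">X</a>',
    "http://example.test/a.html": '<a href="/">home</a> <a href="b.html#top">B</a>',
    "http://example.test/b.html": "no links here",
}

print(crawl("http://example.test/", PAGES.get))
```

The link to `other.test` is discarded because its host differs from the starting host, which mirrors the "within the domain" restriction in the snippet.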

  
 ht://Dig: Configuration file attributes
htdig will only record those descriptions which are shorter than this length.
When htdig is run in initial mode, documents which were referred to but could not be accessed should probably be removed, and hence this option should then be set to TRUE. However, if htdig is run to update the database, this may cause documents on a server which is temporarily unavailable to be removed.
This directive tells htdig to ensure a server has had a delay (in seconds) from the beginning of the last connection.
www1.appstate.edu /htdig/attrs.html   (4298 words)
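
The per-server delay described above, measured from the beginning of the last connection, might be sketched in Python as follows. This is only an illustration of the idea, not ht://Dig's implementation: the `ServerDelay` class and its `wait_turn` method are hypothetical names, and the clock and sleep functions are injectable so the example runs without real waiting.

```python
import time


class ServerDelay:
    """Space out connections to each server by a minimum number of seconds,
    counted from the start of the previous connection to that server."""

    def __init__(self, delay_seconds, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay_seconds
        self.clock = clock
        self.sleep = sleep
        self.last_start = {}  # server name -> start time of last connection

    def wait_turn(self, server):
        """Block until `server` may be contacted again; return seconds waited."""
        waited = 0.0
        if server in self.last_start:
            remaining = self.last_start[server] + self.delay - self.clock()
            if remaining > 0:
                self.sleep(remaining)
                waited = remaining
        # Record when this connection begins, for the next call.
        self.last_start[server] = self.clock()
        return waited


# Demonstration with a fake clock so nothing actually sleeps (assumed values).
fake_now = [0.0]
limiter = ServerDelay(
    10,
    clock=lambda: fake_now[0],
    sleep=lambda seconds: fake_now.__setitem__(0, fake_now[0] + seconds),
)
print(limiter.wait_turn("www.example.test"))  # first contact, no wait
fake_now[0] = 4.0                             # 4 seconds pass
print(limiter.wait_turn("www.example.test"))  # waits out the rest of the delay
```

Because the delay is tracked per server, a request to a different server is not held back by a recent connection elsewhere.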

  
 Re: htdig: Htdig and wwwoffle
Hi, I have now installed htdig (3.08b2) and tried to use it, this raises more questions.
If htdig could read from stdin then it would be better.
I was planning to use the search forms etc. that you provide, this would ensure that htdig gets publicity when it is used as part of wwwoffle as well as me not having to understand exactly how it works and what the command line parameters are.
www.mail-archive.com /htdig@htdig.org/msg07640.html   (470 words)

  
 ht://Dig: Configuration file attributes
Set this to tell ht://Dig to access files only through the local filesystem, for URLs matching the patterns in the local_urls or local_user_urls attribute.
Ht://Dig version 3.1.3 and later include a work-around for this bug such that when acroread is the parser, and the -pairs option is not given, the second parameter will be the output directory rather than the output file name.
If htdig is run with the -l option and interrupted, it will write out its progress to this file.
search.uni-heidelberg.de /htdig/local/attrs.html   (7856 words)

  
 ht://Dig Frequently Asked Questions
You can either mail the ht://Dig mailing list at or better yet, report it to the bug database, which ensures it won't become lost amongst all of the other mail on the list.
htdig needs a fairly modern version of db, which is why it ships with one that works.
If htdig and htmerge have run to completion, and the problem still occurs, this is usually an indication of a corrupted database.
search.uni-heidelberg.de /htdig/local/FAQ.html   (4583 words)

  
 ht://Dig Installation and Usage
ht://Dig is a free web crawler and search engine developed at San Diego State.
ht://Dig also gives a very useful report on links which returned errors; this can be used to check your site for broken links.
I expect that most installations will not be interested in getting down to that level, but information on how to do it is in the htsearch documentation.
ls.berkeley.edu /~tom/htdig.html   (1151 words)

  
 HTDIG installation
Ht://Dig is an indexing and search engine useful for a specific domain or intranet that is capable of searching the World Wide Web.
The htdig web site addresses this problem, saying this message results from the fact the index is not building (duh!) and advising that you check the verbose output to see why the site can't be indexed.
We tried to fix this by altering the db.words.db file (we even tried adding words manually!), replacing it with a copy, and even dumping it altogether and making a new one that was empty in hopes that running rundig again would populate it.
www.ils.unc.edu /~mercv/inls183/Oldstuff/ex8htdig.htm   (942 words)

  
 ht://Dig Frequently Asked Questions
You can either mail the ht://Dig mailing list at or better yet, report it to the bug database, which ensures it won't become lost amongst all of the other mail on the list.
In ht://Dig's terminology, the settings in its configuration files are called configuration attributes, to distinguish them from command line options, CGI input parameters and template variables.
If htdig seems to be missing the last part of a large directory or document, see question 5.1.
www.utexas.edu /local/doc/htdig/FAQ.html   (9345 words)

  
 htdig   (Site not responding. Last check: 2007-10-12)
htstat − returns statistics on the document and word databases, much like the -s option to htdig or htmerge.
Htdig retrieves HTML documents using the HTTP protocol and gathers information from these documents which can later be used to search these documents.
This manual page was written by Robert Ribnitz, based on the HTML documentation of ht://Dig.
www.math.ucla.edu /computing/docindex/htdig-man-81.html   (168 words)

  
 Library HtDig Project
RightNow Technologies is a strong participant in the HtDig Group and is leading the development of HtDig 4.0.
HtDig 3.2b6 (with RNT patch 5) with libhtdig, libhtdigphp
CLucene is a candidate for incorporation into HtDig 4.0 to replace the core search/store code.
opensource.rightnow.com /htdig.php   (164 words)

  
 htdig(1): HTML documents for ht://Dig search ... - Linux man page
Tells htdig to append .work to database files, causing a second copy of the database to be built.
Tells htdig to send the supplied username and password with each HTTP request.
It is updated and maintained by Robert Ribnitz and based on the HTML documentation of ht://Dig.
www.die.net /doc/linux/man/man1/htdig.1.html   (606 words)

  
 A way to use htdig with php
This page is an htdig contribution that will show you a way to set up htdig so that you can use it with php.
ht://dig (www.htdig.org) is a popular server side tool to index websites so that they can be searched without ads.
If you are hosting with us, then this is much simpler; please follow the instructions at our htdig with webhosting information page.
www.computerengineering.ca /a_way_to_use_htdig_with_php   (1044 words)

  
 Searching Web Sites with ht://Dig
Nevertheless, it can be considered a very powerful engine offering a wide range of options that are easy to configure and which are all superbly commented on the htdig homepage.
The fuzzy search uses the endings script and the endings dictionary to create a database from all the words found, with which all possible forms of a word in the relevant language are recognized and included in the search results.
Unfortunately, the ht://Dig site uses frames, which means that most of the links below will take you to the respective frame only but without displaying the necessary navigation bar on the left.
www.suse.de /za/private/support/online_help/howto/htdig/index.html   (2550 words)

  
 Installing htdig   (Site not responding. Last check: 2007-10-12)
Htdig works with the Web server apache to create a database of terms that people can use to search your website.
Unfortunately, in the default installation on many Linux distributions htdig contains hard-coded paths indicating that the htdig files are to be found in weird locations.
The compilation script still may ignore the specified parameters and build a copy of htdig that expects a configuration file to be in some weird location, such as the /usr/local/htdig/conf directory.
brneurosci.org /linuxsetup78.html   (439 words)

  
 Forum OpenACS Q&A: Response to htDig
Since the address is the localhost, the public and the h4x0r should not be able to gain access to your system.
Your strategy should work, but may have small problems: since htDig won't maintain cookie state, every time anything wants the user id it will have to go through a code patch that encounters your "fix".
Point htdig towards your proxy, and have your proxy httpget the actual pages and return them to htdig.
openacs.org /forums/message-view?message_id=25313   (241 words)

  
 Implementing and Using HtDig
Htdig is a tool that provides search functionality for your web site.
Htdig includes programs that will search and index your site.
While the htdig index is running, your site will not have search capability.
www.oekosoft.ch /services/htdig.html   (250 words)

  
 ht://Dig: Where to get it
The current beta release of ht://Dig is 3.2.0b6.
The ht://Dig source releases are available from multiple sources around the world.
The latest documentation of ht://Dig is always available at http://www.htdig.org/ or any of the ht://Dig mirrors.
www.htdig.org /where.html   (171 words)

  
 ht://Dig: Configuration file attributes
Even with this attribute set, htdig still strips out all white space (leading, trailing and embedded), except that space characters embedded within the URL will be encoded as %20.
This enables htdig to index documents with URL schemes it does not understand, or to use more advanced authentication for the documents it is retrieving.
Determines whether htdig will continue to index URLs from a server after an attempted connection to the server fails as "no host found" or "host not found (port)." If set to false, htdig will try every URL from that server.
www.math.temple.edu /doc/packages/htdig/htdoc/attrs.html   (7073 words)
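
The whitespace handling described in the first sentence of this snippet can be illustrated with a short Python sketch. It is a reading of the snippet, not ht://Dig's code: the `clean_url` function and its `allow_spaces` flag are hypothetical names, and the behaviour with the flag off (every whitespace character removed) is an assumption inferred from the word "still".

```python
import string


def clean_url(raw, allow_spaces=False):
    """Strip whitespace from a URL along the lines the snippet describes.

    With allow_spaces False (assumed default), every whitespace character
    is removed. With allow_spaces True, leading and trailing whitespace is
    removed, embedded space characters become %20, and other embedded
    whitespace (tabs, newlines) is removed.
    """
    trimmed = raw.strip(string.whitespace)
    out = []
    for ch in trimmed:
        if ch == " ":
            if allow_spaces:
                out.append("%20")
        elif ch in string.whitespace:
            continue  # drop tabs, newlines and similar characters
        else:
            out.append(ch)
    return "".join(out)


print(clean_url("  http://example.test/my file.html\n", allow_spaces=True))
print(clean_url("  http://example.test/my file.html\n"))
```

The first call keeps the embedded space as `%20`; the second drops it entirely.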

  
 Forum OpenACS Q&A: htDig
htDig will still create the index, but when site users attempt to go to the protected pages from the search results, they are then required to login.
If you can customize it you should be able to identify your htDig and setup robot detection to allow htDig to see those parts of your site.
Htdig has been around for quite some time; I am sure they have done this, or someone out there on the net has done this.
openacs.org /forums/message-view?message_id=25311   (591 words)

  
 Debian -- htdig
The ht://Dig system is a complete World Wide Web indexing and searching system for a small domain or intranet.
Instead it is meant to cover the search needs of a single company, campus, or even a particular subsection of a website.
Please note that ht://Dig is a resource-hog, with respect to processor usage, when indexing.
packages.debian.org /unstable/web/htdig   (232 words)

  
 ASPN : HtDig::Config 1.01   (Site not responding. Last check: 2007-10-12)
ht://Dig allows you to specify a configuration file to use when beginning an indexing run or doing a search, thus allowing you to maintain multiple databases of indexed web pages.
The Config object's main job is to help you keep track of all the site configuration files you have, using a sort of "registry" to keep track of sites you've registered with ConfigDig.
If use_env_path is specified, the PATH environment variable is used to search for the htdig executable and the base htdig directory is determined using its location.
aspn.activestate.com /ASPN/CodeDoc/HtDig-Config/Config.html   (421 words)

  
 ISS X-Force Database: htdig-htsearch-infinite-loop(7262): ht://dig htsearch.cgi allows a remote attacker to cause ...
The ht://dig program is a Web indexing and searching system for intranets and small domains.
The ht://dig program in some Linux distributions is vulnerable to a denial of service attack, caused by a vulnerability in the htsearch CGI.
A remote attacker can pass a -c parameter to ht://dig and specify a file such as /dev/zero to cause the system to enter an infinite loop.
xforce.iss.net /xforce/xfdb/7262   (487 words)

  
 Xaraya :: Installing HTDig to use for Xaraya Searching
That is, if you search for "loading after a core dump" a page all about "loading trucks to dump nuclear reactor cores" might come up and get a high rating when your user is not interested in nuclear waste at all.
If you are doing a first time htdig install on a bitkeeper repository, go to section 2.
#
# Since ht://Dig does not (and cannot) parse every document type, this
# attribute is a list of strings (extensions) that will be ignored during
# indexing.
www.xaraya.com /index.php/documentation/203   (2802 words)

  
 ht://Dig: Release notes
htdig now will display a list of errors when the statistics option (-s) is used.
htdig now sends the 'Referer:' header in HTTP requests so that any link errors will be logged in the server's log files.
The verbose display of htdig was enhanced to show '+' for a link that will be followed and '-' for a link that was discarded.
www.ou.edu /www/stevebackup/src/htdig-3.0.8b1/htdoc/RELEASE.html   (1590 words)

  
 ht://Dig - SearchTools Report
ht://Dig: Recognized META information How to set ht://Dig to recognize meta keywords, email addresses and other information.
List of more than 35 ht://Dig installations, including number of servers, of documents, of words; update frequency; number of hits per day; index size, primary use (intranet, educational, etc.), and problems.
Ht://Dig is open source and free, but can be hard to configure, slow to index and search large indexes, and provides minimal monitoring features.
www.searchtools.com /tools/htdig.html   (809 words)

  
 htdig Search
Note: When reading the documentation on www.htdig.org, realize that htdig is only one component of the larger ht://dig system.
Therefore, we recommend that you familiarize yourself with ht://dig as a whole before focusing on technical issues regarding only htdig.
The htdig FAQ also indicates how to restrict searches to certain folders, and other features.
www.teamits.com /internet/support/htdig.html   (455 words)

  
 ht://Dig for MPE/iX
ht://Dig is a freeware package for indexing and searching web sites, similar to what well-known services like Lycos, Altavista or Google provide.
This version of ht://Dig for MPE/iX is a successor to my initial port, which still can be found at the 3k.com ftp site.
Besides being based on a more recent version of the original ht://Dig freeware sources, it no longer uses the libbsd library from Jazz, and is thus available with full source code now.
www.editcorp.com /Personal/Lars_Appel/htdig   (160 words)

  
 htdig Log Analyzer
Sawmill is a htdig log analyzer (it also supports 722 other log formats).
It can process log files in htdig format, and generate dynamic statistics from them, analyzing and reporting events.
Sawmill can parse htdig logs, import them into a SQL database (or its own built-in database), aggregate them, and generate dynamically filtered reports, all through a web interface.
www.sawmill.net /formats/htdig.html   (186 words)

  
 Linux Headquarters: htdig Installation and Configuration: Internet Search Engine
htdig is a webpage search engine licensed under the GNU Public License.
Once you have htdig installed, you must make a few changes to the configuration file and the HTML templates into which the search results are embedded.
The paths may be different if you choose to change the paths of them in your configuration file.
www.linuxheadquarters.com /howto/webserver/htdig.shtml   (431 words)

  
 SourceForge.net: htdig-general   (Site not responding. Last check: 2007-10-12)
htdig results from php with total control over the displayed results,
the htdig database and the displaying of the results.
The main idea is to request to htdig the results with a minimum and very
sourceforge.net /mailarchive/message.php?msg_id=19304821   (815 words)

Copyright © 2005-2007 www.factbites.com Usage implies agreement with terms.