Frequently Asked Questions
Q) I have run the Perl scripts from the command line
and I don't see anything happening. Is it working?
A) The Perl script is most likely working. The scripts do
not print any progress to the screen during the import.
When the import is complete you will be returned to the
command prompt. We advise running the import scripts in
a separate 'screen' so that if your shell process terminates
or disconnects, the Perl script carries on working in the
background. You can check the progress by looking at the
database directly and seeing the records being added.
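One way to look at the database from a second shell is to count the rows that have arrived so far. The following is a minimal sketch under stated assumptions, not part of DWodp pro: the table name 'Link' is taken from the search queries later in this FAQ, while the database name 'dwodp' and the user 'dbuser' in the usage line are placeholders for your own settings.

```shell
#!/bin/sh
# Build the SQL that counts the rows imported so far into a table.
# $1: table name (e.g. Link, the table used by the search queries in this FAQ)
progress_query() {
    printf 'SELECT COUNT(*) FROM %s;' "$1"
}

# Print the current row count once, using the standard mysql client.
# $1: database name, $2: MySQL user, $3: table name (all placeholders)
show_progress() {
    mysql -u "$2" -p -N -e "$(progress_query "$3")" "$1"
}

# Usage (hypothetical names):
#   show_progress dwodp dbuser Link
```

Running show_progress twice, a few minutes apart, should give a larger number the second time while the import is still working.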
Q) How do I detach the import process using a 'screen'?
A) On Linux/Unix, if you have the program called screen
(run 'locate screen' to see) then you can force the import
process to run in the background on the server on another
screen so that you can quit your SSH or Telnet session without
affecting the import process.
To run the import process of content.rdf.u8 in a screen,
for example:
screen -A -m -d -S odp ./content.pl
To retrieve the screen use
screen -r odp
To detach from the screen and return to the shell, press
CTRL+A, then D
Q) I don't have the Perl DBI modules. How can I get
them?
A) The Perl DBI modules can be found at http://www.cpan.org/.
The MySQL documentation on the Perl API will give you more
details on the DBI and DBD::mysql interfaces and where to
get them.
Q) How long should it take to import the RDF files?
A) This varies from server to server but our initial import
took approx 60 minutes for structure.rdf.u8 and 16 hours
for content.rdf.u8. On a clean server with no other running
processes this may be much faster.
Q) How do I update to a new RDF?
A) We don't have an update script. Unfortunately you will
have to empty the database and re-import.
Q) How do I know if my server supports PHP?
A) Upload a file called phpinfo.php with the following in
it:
<?php
phpinfo();
?>
Call this file from your web browser and you should see
information about your current PHP installation. If you
simply see this code again, PHP is probably not supported.
You can also download our test scripts to run a few tests
on your server.
Q) What version of PHP do I need to use DWodp pro?
A) DWodp pro requires PHP version 4.1.0 or higher. This
software will not work on older versions of PHP because
of fundamental changes in the global variable handling.
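If you have shell access, the version requirement can also be checked from the command line. The sketch below assumes GNU coreutils (for 'sort -V') and, in the usage line, that a 'php' command-line binary is installed; neither is stated anywhere in this FAQ, and the function name version_ge is purely illustrative.

```shell
#!/bin/sh
# Return success (0) when version $1 is greater than or equal to version $2.
# Relies on 'sort -V' (version sort), which is a GNU coreutils extension.
version_ge() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Usage (assumes the php command-line binary is on your PATH):
#   version_ge "$(php -r 'echo PHP_VERSION;')" 4.1.0 && echo "PHP is new enough"
```

If no command-line binary is available, the version reported by the phpinfo.php page described above gives the same information.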
Q) Does it matter if register_globals is disabled in
PHP?
A) No. DWodp pro was designed using the new superglobal
arrays, so it doesn't matter whether register_globals is
disabled or not.
Q) Does it matter if safe_mode or open_basedir is set
in PHP?
A) It shouldn't matter. So long as you have shell access
to run the Perl import scripts, the PHP side of the
program does not rely on any of these settings being on or
off.
Q) Can I run the Perl scripts from the browser?
A) We don't recommend or support it and we have not tested
it. As these scripts run for extended periods of time it
is likely that they will time out if run through a web browser.
They should be run from the shell.
Q) I don't have access to the shell through SSH or Telnet,
how can I use DWodp pro?
A) If you don't have access to SSH or Telnet then you might
like to consider using DWodp live, which queries live ODP
data from the ODP site. We recommend DWodp pro only for
server owners or dedicated server clients. Because of the
size of the database created and the load the initial import
process places on a server, many virtual server hosts would
not accept this software.
Q) I'm using 'short mode', how do I get rid of the index.php
part and just make it look like a directory?
A) If you are using Apache you can do this by removing the
file extension of index.php (e.g. rename it to 'directory'
with no file extension) and then creating a .htaccess file
in the same directory containing the following:
<Files "directory">
ForceType application/x-httpd-php
</Files>
DirectoryIndex directory
Q) How do I change the templates?
A) All of the template files for DWodp pro are contained
within the templates directory with .tpl file extensions:
- attribution.tpl - The ODP attribution
- category.tpl - The category template page when browsing
through the directory
- copyright.tpl - The Dominion Web copyright footer (must
be included in HTML output)
- footer.tpl - The page footer
- header.tpl - The page header
- mainpage.tpl - Directory category index
- search_noresult.tpl - Error message displayed when a
search returns no results
- searchbox.tpl - The search box
There are a number of variables you can use across these
files:
- [sitetitle] - The title of your directory
- [searchbox] - The contents of searchbox.tpl (should
only be used in header or footer)
- [breadcrumb] - The breadcrumb trail through the directory
- [attribution] - The ODP attribution code (must appear
somewhere on EVERY page)
- [copyright] - The copyright code (must appear somewhere
on EVERY page)
- [imagedirectory] - The local image directory for DWodp
- [mainlink="/link/"]Content[/mainlink] - Code for mainpage.tpl
to render a top level category
- [link="/link/"]Content[/link] - Code for mainpage.tpl
to render a second level category
- [dwodp] - The current script name (used mainly for the
search box)
- [currentcat] - The current category (mainly used for
the ODP attribution)
- [options] - Search box options to allow you to restrict
the search to the current category or a global search
(should only be used in searchbox.tpl)
The file category.tpl has its own set of replacement
variables, which should be self-explanatory.
Q) I want to restrict my directory to a category level
and not the main page. How do I do this?
A) In ./includes/config.inc.php simply set the $rootcategory
variable, e.g. for /Arts set $rootcategory = "/Arts";
Always remember to include a forward slash at the beginning.
You should not include a forward slash at the end of the
directory name.
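The two slash rules above are easy to get wrong when typing a value by hand, so a small check before editing the config file can help. This is only an illustrative sketch: the function name valid_rootcategory is hypothetical, and rejecting a bare "/" is an assumption of the sketch, since this FAQ does not say how that value is treated.

```shell
#!/bin/sh
# Return success (0) when $1 follows both rules:
# a forward slash at the beginning and none at the end.
valid_rootcategory() {
    case "$1" in
        */)  return 1 ;;  # ends with a slash (this also rejects a bare "/")
        /?*) return 0 ;;  # starts with a slash followed by a name
        *)   return 1 ;;  # no leading slash, or empty
    esac
}

# Usage:
#   valid_rootcategory "/Arts" && echo "ok to use as \$rootcategory"
```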
Q) What files do I need to configure?
A) For the import process you will need to edit both structure.pl
and content.pl. The readme.txt file in the import directory
will explain more on this. For the PHP script you should
only need to edit ./includes/config.inc.php to get DWodp
pro up and running, but you may wish to edit the template
files in the 'templates' directory. You will also need to
create the initial database using the supplied SQL file
called dwodp.sql (also included in the import directory).
Q) Where can I find the ODP RDF files?
A) http://rdf.dmoz.org/rdf/
Q) Why does your search use a LIKE statement for $searchstring%?
Surely %$searchstring% would produce a better result?
A) It is quite correct that %$searchstring% would produce
a better result. However, the search would be very slow because
MySQL cannot make use of the index on the title field when
the pattern begins with a wildcard.
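To see the difference on your own server, you can ask MySQL how it would run each form of the query. The sketch below only prints two EXPLAIN statements for you to pipe into the mysql client. The table 'Link' and the column 'title' come from the queries in this FAQ; the function name explain_like, the example term, and the database and user names in the usage line are placeholders.

```shell
#!/bin/sh
# Print EXPLAIN statements for a prefix search and a contains search.
# $1: search term (placeholder; assumed to contain no quote characters)
explain_like() {
    printf "EXPLAIN SELECT * FROM Link WHERE title LIKE '%s%%';\n" "$1"
    printf "EXPLAIN SELECT * FROM Link WHERE title LIKE '%%%s%%';\n" "$1"
}

# Usage (hypothetical database and user names):
#   explain_like linux | mysql -u dbuser -p dwodp
```

With an index on title, the first statement can be answered by a lookup on that index, while the second has to examine every row because the leading wildcard rules the index lookup out.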
In version 1.1 we added an optional search process using
MySQL's full text search capabilities (requires MySQL 3.23.23
or higher) on both the title and description of websites. This
returns results in rank order and produces much faster
results. The decision to run full text mode or standard
mode (above) must be taken at installation, as it would take
a very long time to change once the data has been imported
into the database.
Q) Which search mode should I be using, standard or
full text or full text title only mode?
A) Search mode 1 (standard mode): If you have
a version of MySQL below 3.23.23 then you can only use the
standard search mode. However, if you are running MySQL
3.23.23 or newer, you can take advantage of full
text searching.
Search mode 2 (full text mode) searches on
both the title and description of sites. In production tests
the full text mode is only faster for SINGLE keyword searches.
The import process will take longer but will produce much
better results, ordered by rank. This decision must be taken
before you install the software. If you have MySQL 4.0.1
or higher you could amend the code to use boolean mode, which
will massively speed up the full text search because MySQL's
default is OR searching rather than AND. As we don't use
MySQL 4.0.1 on our production servers yet, we haven't included
or tested this process; however, you might like to try the
following.
Replace the following code in index.php:
$totalresults = mysql_result(mysql_query("SELECT count(title) FROM Link
WHERE MATCH (title,description) AGAINST ('$searchstring')
$no_adult $whichcatsql"),0);
$result = mysql_query("SELECT title, description, page, parentTopic,
MATCH (title,description) AGAINST ('$searchstring') AS score FROM Link
WHERE MATCH (title,description) AGAINST ('$searchstring')
$no_adult $whichcatsql limit $start, 25");
With this code:
// Collapse double spaces, trim, then split the input into keywords
$searchstring = trim(str_replace("  ", " ", $searchstring));
$search_arry = explode(" ", $searchstring);
// Prefix every keyword with "+" so boolean mode requires all of them (AND)
$new_search = "";
foreach ($search_arry as $key => $value) {
    if ($value !== "") {
        $new_search .= "+" . $value . " ";
    }
}
$totalresults = mysql_result(mysql_query("SELECT count(title) FROM Link
WHERE MATCH (title,description) AGAINST ('$new_search' IN BOOLEAN MODE)
$no_adult $whichcatsql"),0);
$result = mysql_query("SELECT title, description, page, parentTopic,
MATCH (title,description) AGAINST ('$new_search' IN BOOLEAN MODE) AS score
FROM Link
WHERE MATCH (title,description) AGAINST ('$new_search' IN BOOLEAN MODE)
$no_adult $whichcatsql limit $start, 25");
Search mode 3 (full text title only mode)
searches on only the title field. Out of all three imports
this was by far the quickest, executing in under 12 hours.
Again this requires MySQL 3.23.23 or higher, but
the search is MUCH faster than mode 2 for multiple keyword
searches (although still slower than mode 1).
In conclusion, if you require speed for your searches we
recommend search mode 1 (standard); for accuracy we recommend
search mode 3 (full text title only).
Search mode 3 is available from version 1.1.1.