Sequence Database Setup: Top Tips
  Always test a new database configuration
  Check the statistics file after a new database has been compressed
  Be selective when locking databases into memory
  Move sequence databases to an empty drive
  Include a date or version stamp in the database filename
  Choose accession strings carefully
  Don't forget the taxonomy files
 
 
Always test a new database configuration
	Use the test facility in the database maintenance utility before applying the changes 
	to mascot.dat. This checks the parse rules against the first five and last five entries in the database.
	It will also pick up problems with paths, illegal characters in names, etc.
	 
Check the statistics file after a new database has been compressed
	In particular, verify that the number of entries is reasonable and that no entries are
	reported as "too long". (If any are, increase the value of MaxSequenceLen in
	mascot.dat.) If
	taxonomy is defined, look at the fraction of entries with no taxonomy. It is rare
	to have 100% success with taxonomy, but a failure rate greater than 1% is a cause for concern.
	One possible cause is that the taxonomy indexes are out of date.
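
	The 1% rule of thumb is simple arithmetic on two numbers read from the statistics file. The sketch below assumes you have copied those two counts out by hand; the function names and the idea of passing the counts as arguments are illustrative only and are not part of Mascot.

```python
def taxonomy_failure_rate(total_entries: int, entries_without_taxonomy: int) -> float:
    """Return the fraction of database entries that have no taxonomy assigned."""
    if total_entries <= 0:
        raise ValueError("total_entries must be positive")
    if not 0 <= entries_without_taxonomy <= total_entries:
        raise ValueError("entries_without_taxonomy must be between 0 and total_entries")
    return entries_without_taxonomy / total_entries


def taxonomy_needs_attention(total_entries: int,
                             entries_without_taxonomy: int,
                             threshold: float = 0.01) -> bool:
    """True when the failure rate exceeds the threshold (1% by default)."""
    return taxonomy_failure_rate(total_entries, entries_without_taxonomy) > threshold


if __name__ == "__main__":
    # Made-up counts for illustration: 2,500 of 500,000 entries lack taxonomy (0.5%).
    print(taxonomy_needs_attention(500_000, 2_500))
```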
	 
Be selective when locking databases into memory
	All databases should be memory mapped, because this makes access much faster. But, unless you have a very large amount of RAM,
	only the smaller databases, which are searched regularly, should be locked in memory. If you try to lock a 
	database in memory and there isn't enough room, the operation simply fails and no harm is done. The real problem is
	when there is just enough RAM to lock the database, but very little left over for Mascot searches and other 
	applications. Searches will be very slow, the disk will thrash, and eventually the system
	is likely to crash or hang.
	 
Move sequence databases to an empty drive
	Running out of disk space can be a problem when you have several large databases plus a growing collection of search result files.
	The databases don't have to be in subdirectories of mascot/sequence; new ones can easily be placed on other drives. If space is
	running low, and you want to move your existing database files, the general procedure is: 
	 - Stop the Mascot Monitor service / daemon
	 - Move the mascot/sequence directory or selected database subdirectories to the new drive
	 - Using the database maintenance utility, update the affected database paths
	 - Start the Mascot Monitor service / daemon
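
	Before the move step in the procedure above, it can be worth confirming that the target drive really has room. The following is a minimal sketch using only the Python standard library. The function names and the 20% headroom factor are assumptions chosen for illustration; nothing here is provided by Mascot, and the paths you pass in are your own.

```python
import os
import shutil


def directory_size(path: str) -> int:
    """Total size in bytes of all regular files below path (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def has_room_for(source_dir: str, target_dir: str, headroom: float = 1.2) -> bool:
    """True if target_dir's drive has free space for source_dir plus headroom.

    headroom=1.2 asks for 20% more than the current size; this is an
    arbitrary illustrative margin, not a Mascot recommendation.
    """
    needed = directory_size(source_dir) * headroom
    free = shutil.disk_usage(target_dir).free
    return free >= needed
```

	Call has_room_for with the existing database directory and any existing directory on the new drive, and only go ahead with the move if it returns True.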
 
Include a date or version stamp in the database filename
	Most sequence databases are growing rapidly, and it can be useful to have a record of which database version 
	a particular search was run against. Another reason is that Mascot requires the old and new copies of a 
	database to have different filenames if it is going to perform an automatic update without interrupting 
	ongoing searches.
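
	As an illustration of stamping a filename, the sketch below inserts a date between the base name and the extension, so that each new copy of a database gets a distinct name. The underscore-plus-YYYYMMDD scheme, the function name and the example filename are assumptions made for this example, not a format that Mascot prescribes.

```python
from datetime import date
from pathlib import Path


def stamped_filename(original: str, stamp: date) -> str:
    """Insert a YYYYMMDD date stamp before the file extension."""
    path = Path(original)
    return f"{path.stem}_{stamp:%Y%m%d}{path.suffix}"


if __name__ == "__main__":
    # Hypothetical filename; prints SwissProt_20240115.fasta
    print(stamped_filename("SwissProt.fasta", date(2024, 1, 15)))
```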
	 
Choose accession strings carefully
	Some database accession strings have a constant component. For example, NCBI unique identifiers look like
	"gi|123456" and IPI accessions look like "IPI:IPI00140098.1". It's not a good idea to include 
	unnecessary constant components, such as the "IPI:" prefix, because this contains no useful 
	information. It just makes the Mascot files a bit larger and the reports a bit longer. (In Mascot 2.0 and earlier, whenever
	you set up a new database with long accession strings, remember to check that the worst-case length is less than the value 
	of MaxAccessionLen in mascot.dat. If you have to increase MaxAccessionLen, you must then rebuild the 
	compressed files for all databases.)
	 On the other hand, it can be risky to remove prefixes altogether. If you plan to reduce 
	"IPI:IPI00140098.1" to "00140098.1" or "140098", you must make sure that this is still a unique 
	identifier within the database. Also, whenever you merge two databases together, having a purely numeric
	accession greatly increases the chance of having duplicates.
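
	One way to check uniqueness before committing to a shortened accession is to apply the shortening to every accession and look for collisions. The sketch below does this with ordinary Python regular expressions, which are not Mascot parse rule syntax. The function names, the patterns and the second accession in the example are made up for illustration.

```python
import re
from collections import Counter
from typing import Iterable, List


def shorten(accession: str, pattern: str) -> str:
    """Return the first capture group of pattern found in accession."""
    match = re.search(pattern, accession)
    if match is None:
        raise ValueError(f"pattern does not match accession: {accession!r}")
    return match.group(1)


def duplicates_after_shortening(accessions: Iterable[str], pattern: str) -> List[str]:
    """Return the shortened accessions that occur more than once, sorted."""
    counts = Counter(shorten(accession, pattern) for accession in accessions)
    return sorted(short for short, n in counts.items() if n > 1)


if __name__ == "__main__":
    # The second accession is invented to show a collision.
    full = ["IPI:IPI00140098.1", "IPI:IPI00140098.2"]
    print(duplicates_after_shortening(full, r"IPI:(IPI\d+\.\d+)"))  # version kept
    print(duplicates_after_shortening(full, r"IPI:(IPI\d+)"))       # version dropped
```

	An empty list means the shortened form is still unique for the accessions supplied. Any non-empty result names the identifiers that would collide.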
	
	 Another consideration is linking to external sources for full text reports. In the IPI case, 
	you need to choose "IPI00140098" if you plan to link to the EBI SRS server.
	 
Don't forget the taxonomy files
	If database entries contain taxonomy information, Mascot can use this as a filter during a search. Many of the
	most popular databases, such as Swiss-Prot and NCBI nr, include taxonomy. To determine taxonomy accurately, 
	Mascot requires database-specific supporting files. Details of these can be found in the help pages
	for the individual databases. Note that these supporting files have to be downloaded into the taxonomy 
	directory, not into the sequence database directory. Also, some files need to be unpacked (using tar) as well as
	uncompressed.
	