| PC Hardware for Mascot Server

Any recent, high specification PC containing either Intel or AMD processor(s) should make a 
suitable platform for Mascot. If you are buying a new PC, then a dual processor system, or 
one which can be upgraded to dual processors, will be a good investment. Systems with more 
than two processors usually carry a substantial price premium. If you plan to do high 
throughput work, and need to run Mascot on more than two processors, a cluster of dual 
processor boxes will usually offer the most cost-effective solution.

If you don't have time to read the whole of this document, then simply choose a system 
with the highest speed processor(s) that are readily available, at least 2GB of RAM, and 
one or more 200GB IDE hard disks. 
The two main PC processor manufacturers are Intel and AMD. Matrix Science does not 
currently support Mascot on other PC processors.

We have observed excellent scalability with dual processor systems running under 
Microsoft Windows and Linux. That is, throughput from a dual processor system comes 
very close to double that obtained from a single processor. However, we cannot predict 
or guarantee the scalability of Mascot on hardware configurations that have not been 
specifically tested. 
The main factor affecting Mascot performance is processor clock speed. However, clock speeds cannot be compared directly across different architectures. 
For example, a system with a 1.8GHz AMD Opteron will run a Mascot search 
in about the same time as another system with a 3.2GHz Intel Xeon processor.
For any given processor type, the search speed will be proportional to 
processor speed unless:

  - Disk access becomes a bottleneck, possibly because the FASTA sequence 
    database has to be read into memory from disk (see section on RAM below), or
  - The processor cache is too small and causes a bottleneck between processor 
    and memory.

Intel and AMD both released their first dual core processors in early 2005. 
Mascot searches can run nearly twice as fast on a dual core processor as on 
a single core processor. For a dual core processor, Mascot is licensed on a "per socket" basis.
 Mascot 2.0 and later for Windows and Linux has full support for dual 
core Intel processors. For example, a single processor Mascot license will 
use both cores on a dual core Intel Pentium D. In this case, the number of 
threads must be set to 2 using the database maintenance utility. 
 Dual core AMD Opteron systems are supported in Mascot 2.1.02 and 
later (Windows and Linux). As with the Intel processors, the number of 
threads should be set to 2 times the number of licenses using the database 
maintenance utility.
 On a cluster system, Mascot 2.1 or later (Windows or Linux) is required 
to make full use of either AMD or Intel dual core nodes. The nodelist.txt 
file should specify the number of physical CPUs (sockets). Mascot will 
automatically create the correct number of threads on the node.
Many of the recent Intel and AMD processors are "64 bit" or, in Intel terms, 
have "Intel EM64T" technology. 32 bit applications can also run on these 
processors. All versions of Mascot will run on 64 bit Linux, but Mascot 2.2 
or later is required for 64 bit Windows. For earlier versions of Mascot, 
you must install standard 32 bit Windows. 

Hyper-Threading (HT) is available on Intel processors, but not AMD. It works by 
duplicating certain sections of the processor - those that store the architectural 
state - but not duplicating the main execution resources. 
This allows a Hyper-Threading equipped processor to pretend to be two 
"logical" processors to the host operating system, allowing the operating 
system to schedule two threads or processes simultaneously.

When HT is enabled, each physical core will be visible to Mascot as 2 logical "processors". 
So, for example, a single physical Xeon 5000 processor with dual cores 
and HT will appear to have 4 CPUs. In this case, the number of threads 
should be set to 4 using the database maintenance utility.
 Hyper-threading can give up to a 12% performance increase. It is not equivalent to
a true multi-core processor. 
 Versions prior to Mascot 2.0 did not support HT, so for those versions HT 
had to be disabled in the BIOS.
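
The thread settings quoted above for dual core and Hyper-Threading processors all follow the same pattern, which can be sketched as a small calculation. This is only an illustration that generalises the examples in the text: the function name `mascot_thread_count` and its parameters are hypothetical and are not part of Mascot. The result is the value you would enter by hand in the database maintenance utility on a single system; on a cluster node, Mascot creates the threads automatically.

```python
def mascot_thread_count(licensed_sockets: int,
                        cores_per_socket: int = 1,
                        hyper_threading: bool = False) -> int:
    """Estimate the thread count for a single Mascot server.

    Assumption: threads = licensed sockets x cores per socket x logical
    processors per core, which reproduces the examples given in the text.
    """
    if licensed_sockets < 1 or cores_per_socket < 1:
        raise ValueError("socket and core counts must be at least 1")
    # With HT enabled, each core is presented as two logical processors.
    logical_per_core = 2 if hyper_threading else 1
    return licensed_sockets * cores_per_socket * logical_per_core


if __name__ == "__main__":
    # Single license, dual core Pentium D (no HT)
    print(mascot_thread_count(1, cores_per_socket=2))
    # Single license, dual core Xeon 5000 with HT enabled
    print(mascot_thread_count(1, cores_per_socket=2, hyper_threading=True))
```

For the two cases in the text, a dual core Pentium D and a dual core Xeon 5000 with HT, the sketch yields 2 and 4 threads respectively.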
An on-board memory cache is used by the CPU to reduce the average time to access the 
main memory. The cache uses faster memory to store copies of 
the data from the most frequently used main memory locations. As long as 
most memory accesses are to the cache, the processor will 
be able to run at nearly full speed. If the cache is too small, then the 
processor will often be waiting for data from main memory, and searches 
will run more slowly. For this reason, we don't recommend the Intel Celeron 
processors, which have a rather small cache. However, it seems that there 
is only about a 5% performance increase when going from 512kB cache to 
1MB. We don't have figures for going from 1MB to 2MB cache.
  Matrix Science does not support Itanium processors. Although Mascot will 
run in the 32 bit compatibility mode, the performance will be very poor. 
It will also be necessary to install a 32 bit version of Perl, which may be 
non-trivial. A test port of Mascot to native Itanium gave 
poor performance compared with Pentium processors, so Matrix Science 
has no plans to provide an Itanium version.
We have experienced some issues with the Athlon 64 processor when trying to 
lock databases into memory under Windows, and therefore don't recommend 
this processor.
The current (January 2006) range of available processors is listed below, Intel first, then AMD: 
    | Name | Fastest speed (GHz) | Max CPUs per system | Dual core | 64 bit | Hyper-threading | L2 cache size (per core) | FSB speed (MHz) |  
    | Extreme Edition | 3.73 | 1 | Yes | Yes | Yes | 2MB | 1066 |  
    | Pentium D | 3.40 | 1 | Yes | Yes | No | 2MB | 800 |  
    | Pentium 4 Extreme edition | 3.73 | 1 | No | Yes | Yes | 2MB | 1066 |  
    | Pentium 4 with HT | 3.8 | 1 | No | Yes | Yes | 1MB | 800 |  
    | Pentium 4 no HT | 3.06 | 1 | No | No | No | 512kB | 533 |  
    | Xeon 5000 series | 3.8 | 2 | No | Yes | Yes | 2MB | 800 |  
    | Xeon 5000 series dual core | 3.73 | 2 | Yes | Yes | Yes | 2MB | 1066 |  
    | Xeon 7000 series | 3.33 | 8 | Yes | Yes | Yes | 2MB | 800 |  
    | Celeron D | 3.33 | 1 | No | Yes | No | 256kB | 533 |  
    | Core™2 Duo/Extreme | 2.93 | 1 | Yes | Yes | No | 2MB | 1066 |  

    | Name | Fastest speed (GHz) | Max CPUs per system | Dual core | 64 bit | Hyper-threading | L2 cache size (per core) | FSB speed (MHz) |  
    | Opteron Dual Core e.g. Model 880 | 2.4 | 8 | Yes | Yes | N/A | 1MB | 2400 |  
    | Opteron Single Core e.g. Model 852 | 2.8 | 8 | No | Yes | N/A | 1MB | 2800 |  
    | Athlon 64 X2 | 2.4 | 1 | Yes | Yes | N/A | 1MB | 2400 |  

RAM requirements are strongly dependent on the selection of databases you plan to search.
Mascot Monitor makes a compressed copy of each FASTA database, in which the title lines 
have been removed and the sequence strings have been packed in a byte efficient manner. 
The compressed copy of each database is mapped into RAM and, if there is sufficient room, 
can even be locked into memory. 
 When a search calls for a database that is not in memory, the search duration is increased 
by the time taken to read the database from disk. For a search that takes longer than a
couple of minutes, this additional time will be negligible. For a short search, 
for example a PMF or an MS/MS search of a few spectra, reading from disk may take 
longer than the search itself. 
 Databases should always be memory mapped, even though a system might not have sufficient 
physical RAM to hold them all. Memory mapping only consumes virtual address space, and enables 
the file to be accessed more efficiently. However, it doesn't guarantee that a particular 
database will be in memory when a search calls for it; some other process may have kicked 
it out. So, it may be advantageous to lock a small, frequently searched database into memory, 
guaranteeing that it is always resident in RAM.
 Whether you have sufficient RAM to lock a database in memory can be estimated from the size
of the FASTA file. For a protein database, the required RAM is roughly 80% of the FASTA file size, while 
for a nucleic acid database it is roughly 50%. Some examples are given in the following table, 
but the comprehensive sequence databases increase significantly in size every month.
 
  
    | Database | FASTA (MB) | RAM (MB) | Compression |  
    | Swiss-Prot | 161 | 134 | 1 : 0.83 |  
    | NCBInr | 1,599 | 1,343 | 1 : 0.84 |  
    | EST_others | 13,644 | 6,898 | 1 : 0.50 |  

You also need to allow approximately 60 MB for the operating system 
(Windows) and at least 150 MB for each executing Mascot search. Therefore, 
to search NCBInr and Swiss-Prot, in January 2006, 2GB RAM is only just 
sufficient, and with the databases growing every month, this will soon 
not be enough. We do not recommend that NCBInr or MSDB be 
locked into memory except when using a cluster.
 In practice, it is rarely sensible for a database as large as EST_others 
to be locked in memory. Being composed of short stretches of nucleic acid sequence, 
it is not suitable for peptide mass fingerprint searches, and tends to be 
used as a database of last resort for large searches, where the overhead 
of reading it from disk represents only a small part of the total search 
time.
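
The RAM rules of thumb in this section can be combined into a rough estimate. The sketch below is a back-of-envelope calculation, not a Mascot tool: the function names are hypothetical, the ratios (80% for protein, 50% for nucleic acid), the 60 MB operating system allowance and the 150 MB per search are taken from the text, and the FASTA sizes in the example are the January 2006 figures from the table.

```python
PROTEIN_RATIO = 0.8      # compressed size / FASTA size, protein database
NUCLEIC_RATIO = 0.5      # compressed size / FASTA size, nucleic acid database
OS_OVERHEAD_MB = 60      # allowance for the operating system (Windows)
PER_SEARCH_MB = 150      # minimum allowance per executing search


def compressed_size_mb(fasta_mb: float, nucleic: bool = False) -> float:
    """Approximate RAM needed to hold one compressed database."""
    ratio = NUCLEIC_RATIO if nucleic else PROTEIN_RATIO
    return fasta_mb * ratio


def estimate_ram_mb(databases, concurrent_searches: int = 1) -> float:
    """Estimate total RAM for a list of (fasta_mb, is_nucleic) pairs."""
    db_total = sum(compressed_size_mb(size, nucleic)
                   for size, nucleic in databases)
    return db_total + OS_OVERHEAD_MB + PER_SEARCH_MB * concurrent_searches


if __name__ == "__main__":
    # NCBInr (1,599 MB) and Swiss-Prot (161 MB), one search running
    needed = estimate_ram_mb([(1599, False), (161, False)])
    print(f"Estimated RAM: {needed:.0f} MB")
```

With these inputs the estimate comes to roughly 1.6 GB, slightly below the figure obtained from the measured sizes in the table (about 1.7 GB), which is consistent with 2GB being only just sufficient.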
The Mascot program files require very little disk space in comparison to 
the sequence databases and the accumulating result files.

For the sequence databases, you will need to maintain free disk space 
of the order of 3 times the largest database. This is because, during a 
database update, there may be the current FASTA file, reference file and 
its associated compressed files plus the equivalent for the incoming 
database. Mascot also keeps a copy of one previous database. Current 
(January 2006) disk requirements for the common databases are:
 
  
    | Database | Total size of files (GB) | Max disk space (GB) |  
    | Swiss-Prot | 1 | 3 |  
    | NCBInr | 3 | 9 |  
    | MSDB | 5 | 15 |  
    | EST_human | 7 | 21 |  
    | EST_mouse | 4 | 12 |  
    | EST_others | 21 | 63 |  

It would not be unreasonable to allow 120GB for databases, and this 
could grow to 200GB within 2 years. However, it is unusual to require 
all three EST databases.
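
The disk figures in the table can be reproduced with a short calculation. This is a minimal sketch under stated assumptions: the names `DATABASE_SIZES_GB`, `max_disk_space_gb` and `total_disk_budget_gb` are hypothetical, the sizes are the January 2006 values from the table, and the factor of 3 is applied to each database, as the "Max disk space" column does.

```python
# Total size of files per database in GB (January 2006 figures from the table)
DATABASE_SIZES_GB = {
    "Swiss-Prot": 1,
    "NCBInr": 3,
    "MSDB": 5,
    "EST_human": 7,
    "EST_mouse": 4,
    "EST_others": 21,
}


def max_disk_space_gb(total_size_gb: float, factor: int = 3) -> float:
    """Space to reserve for one database: current, incoming and previous copies."""
    return total_size_gb * factor


def total_disk_budget_gb(names) -> float:
    """Sum the reserved space over the chosen databases."""
    return sum(max_disk_space_gb(DATABASE_SIZES_GB[name]) for name in names)


if __name__ == "__main__":
    print(total_disk_budget_gb(DATABASE_SIZES_GB))                    # all six
    print(total_disk_budget_gb(["Swiss-Prot", "NCBInr", "MSDB"]))     # no EST
```

Summing all six databases gives 123 GB, which matches the suggested allowance of about 120GB; leaving out the three EST databases reduces the budget to 27 GB.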
 The space needed for result files depends on the overall search profile 
and on how long results are to remain on-line. Individual result file sizes 
range from 20 kB for a peptide mass fingerprint search through to several 
hundreds of MB for a large LC-MS/MS dataset.
 Disk drives are very inexpensive, and most PCs support up to four IDE 
devices. It is difficult to have too much disk space, especially if you plan 
to search databases similar in size to dbEST.
 If any databases are not memory mapped, short searches may be disk I/O bound, 
and a fast disk (e.g. fast wide SCSI) or a disk array (e.g. RAID) can then 
become an important factor in maximising throughput.
The following versions of Windows are supported: 
  
    | Operating System | Max CPU | Max RAM (GB) |  
    | 2000 Professional | 2 | 4 |  
    | 2000 Server | 4 | 4 |  
    | 2000 Advanced Server | 8 | 8 |  
    | 2000 Data Center | 32 | 32 |  
    | XP Professional | 2 | 4 |  
    | 2003 Web Edition | 2 | 2 |  
    | 2003 Standard Edition | 4 | 4 |  
    | 2003 Enterprise Edition | 8 | 32 |  
    | 2003 Data Center Edition | 32 | 64 |  
    | XP Professional - 64 bit edition | 2 | 16 |  
    | 2003 Standard Edition - 64 bit | 4 | 32 |  
    | 2003 Enterprise Edition - 64 bit | 8 | 2048 |  
    | 2003 Data Center Edition - 64 bit | 64 | 2048 |  

Windows NT4 SP6 is currently supported, but support will be discontinued in 2007. Microsoft Windows XP Home is not supported.

Mascot will run on most Linux distributions, but is only tested in-house on:
  - RedHat 8.0
  - RedHat Fedora Core 2
  - RedHat Fedora Core 4
  - RedHat Enterprise Linux 3
  - Debian 3.1
  - SUSE Linux 9.3

If the number of processors (sockets) in the system is greater than the number of
CPUs in the Mascot license, the kernel needs to be 2.6 or later.
Mascot requires a web server for administration and interactive use. 
In the case of Windows, Microsoft Internet Information Server (IIS) 
is the obvious choice unless you are committed to some other package. 
IIS is bundled with Windows 2000, XP Professional and Windows 2003 Server.

The Mascot installation program automatically configures IIS versions 
4 and later. 
 Apache is a good choice for Linux. Apache Version 2.0 can also be 
used under Windows.
 Running a web browser on the same PC as the web server can take 
a surprising amount of processor time, so search times may suffer. 
If the same PC is also used for instrument control and data acquisition, 
you may need to adjust job priorities using Windows Task Manager to 
ensure that the instrument gets adequate priority.
A Mascot licence for 4 or more processors automatically supports operation on 
a cluster of systems connected by a dedicated 100 Base-T or Gigabit LAN. 
A cluster offers several advantages over a single, multi-processor system: 
  - Mass market, reliable, low cost PC hardware can be used.
  - The cluster can be incrementally expanded as workload increases.
  - The RAM required to map sequence databases is distributed 
    across multiple systems, circumventing the limits of a single system.
  - The limited bandwidth of the PC bus is effectively multiplied 
    by the number of systems in the cluster. |