DSPAM v3.10.2
COPYRIGHT (C) 2002-2012 DSPAM Project
http://dspam.sourceforge.net/

LICENSE

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as
published by the Free Software Foundation, either version 3 of the
License, or (at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.

CREDITS

Original Work By
Lead development until 3.8.0: Jonathan A. Zdziarski <jonathan@nuclearelephant.com>
Lead development after 3.8.0: Stevan Bajic <stevan@bajic.ch>
PostgreSQL driver: Rustam Aliyev <rustam@azernews.com>
External Lookup module: Hugo Monteiro <hugo.monteiro@fct.unl.pt>
Various:
Feb/2006 Cove Schneider <cove@wildpackets.com>
Jan/2006 Norman Maurer <nm@byteaction.de>

Is your name missing? Let us know with a reference to your commit, and we'll
add you to the list.

COPYRIGHT

As of 12 January 2009 the copyright is owned by the DSPAM Project, represented
by a team of people, including:
Alexander Prinsier
Dov Zamir
Hugo Monteiro
Ion-Mihai Tetcu
Paul Cockings
Stevan Bajic

TABLE OF CONTENTS

General DSPAM Information

1.0 About DSPAM
1.1 Installation and Configuration
1.2 Testing
1.3 Troubleshooting
1.4 DSPAM Tools
1.5 Agent Commandline Arguments

Advanced DSPAM functionality

2.0 Linking with libdspam
2.1 Configuring groups
2.2 External Inoculation Theory
2.3 Client/Server Mode
2.4 LMTP
2.5 DSPAM User Preferences
2.6 Fallback Domains
2.7 External User Lookup

Miscellaneous

3.0 Bugs, Feature Requests
3.1 Ports / Packages
3.2 GIT Access

1.0 ABOUT DSPAM

DSPAM is an open-source, freely available anti-spam solution designed to combat
unsolicited commercial email using advanced statistical analysis. In short,
DSPAM filters spam by learning what spam is and isn't. It does this by learning
each user's individual mail behavior. This allows DSPAM to provide
highly accurate, personalized filtering for each user, even on a large system,
and makes it an administratively maintenance-free solution capable of learning
each user's email behavior with very few false positives.

While DSPAM is focused on spam filtering, many have found alternative uses for
it in all types of two-concept document classification.

DSPAM is rapidly gaining a large support community and is being used in many
large-scale implementations. Contributions to the project are welcome via the
dspam-dev mailing list or in the form of financial contributions.

Many of the foundational principles incorporated into this software were
drawn from Paul Graham's white paper on combatting spam, which can be
found at http://paulgraham.com/spam.html. Much research and development has
resulted in many new approaches being added to the DSPAM project as well,
some of which are explained in white papers on the DSPAM home page.

DSPAM can be implemented as a total solution, or as a library: developers may
link their projects against the DSPAM core engine (libdspam) in accordance with
the AGPL license agreement. This enables developers to incorporate libdspam as
a "drop-in" for instant spam filtering within their applications, such as mail
clients, other anti-spam tools, and so on.

PLEASE NOTE: DSPAM and libdspam are distributed under the AGPL license, not the
LGPL. Commercial licensing is available for those who seek to redistribute
DSPAM or some of DSPAM's components/libraries in their non-GPL products.
Please contact us for more information about commercial licensing.

The DSPAM package is split up into the following pieces:

DSPAM AGENT

The DSPAM agent is the command center for all shell and daemon operations.
If you're using DSPAM as a filtering solution, this is the 'dspam' (or 'dspamc')
binary you're likely going to be talking to via the commandline.

LIBDSPAM: CORE ENGINE

The DSPAM core processing engine, also known as libdspam, provides all critical
spam filtering functions. The engine is embedded into other DSPAM components
(such as the agent) and is responsible for the actual filtering logic.
If you're not a developer, you don't need to be concerned with this component,
as it is automatically compiled in with the build.

WEB UI

The Web UI (User Interface) is designed to allow end users to review their
spam quarantine, history, and graphs, and to delete their spam permanently.
They can also optionally use the quarantine to perform all of their training.
The UI also includes some basic administrative tools to change settings and
manage user quarantines.

TOOLS

Some basic tools have been provided to manage dictionaries, automate
corpus feeding, and perform other diagnostic operations related to DSPAM.
Some of these include dspam_train, dspam_stats, and dspam_dump.

HISTORY OF COPYRIGHT

Original work was done by Jonathan A. Zdziarski.

In 2006 the copyright was handed over to Sensory Networks.

In 2009 Sensory Networks handed over the full copyright to the DSPAM Project,
represented by a team of people, including:
Alexander Prinsier
Dov Zamir
Hugo Monteiro
Ion-Mihai Tetcu
Paul Cockings
Stevan Bajic

1.1 INSTALLATION

IMPLEMENTATION OPTIONS

There are many different ways to deploy DSPAM onto an existing network. The
most popular approaches are:

1. As a delivery agent proxy

When your mail server is ready to deliver mail to a user's mailbox, it calls
a delivery agent of some sort. On most UNIX systems, this is procmail, maildrop,
mail.local, or a similar tool. When used as a delivery proxy, the DSPAM agent
is called in place of your existing agent; put another way, it masquerades
as the local delivery agent. DSPAM then processes the message and calls
the /real/ delivery agent to pass the good mail into the user's mailbox,
quarantining the bad mail. DSPAM can optionally tag and deliver both spam
and legitimate mail.

In the diagram below, MTA refers to the Mail Transfer Agent, or your mail server
software: Postfix, Sendmail, Exim, etc. LDA refers to the Local Delivery
Agent: Procmail, Maildrop, etc.

BEFORE:

    [MTA] ---> [LDA] ---> (User's Mailbox)

AFTER:

    [MTA] ---> [DSPAM] ---> [LDA] ---> (User's Mailbox)
                  \
                   \--> [Quarantine]
    [End User] ------> [Web UI]

2. As a POP3 Proxy

If you don't want to tinker with your existing mail server setup, DSPAM can
be combined with one of a few open source programs designed to act as a POP3
proxy. This means spam is filtered whenever the user checks their mail,
rather than when it is delivered. The benefit is that you can set up
a small machine on your network that connects to your existing mail server,
so no integration is needed. It also lets your users point their mail client
at the proxy whenever they want filtering. The drawback to this approach is
that the POP3 protocol has no way to tell the mail client that a message is
spam, so the user will have to download the spam (tagged, of course).

BEFORE:

    [End User] ---> [POP3 Server]

AFTER:

    [End User] ---> [POP3 Proxy] <--> [DSPAM]
                         \
                          \--> [POP3 Server]

3. As an SMTP Relay

Newer versions of DSPAM include features that allow it to function more
easily as an SMTP relay. An SMTP relay sits in front of your existing mail
server (requiring no integration). To use an SMTP relay, the MX records for
your domains are repointed to the relay machine running DSPAM. DSPAM then
relays the good (and optionally bad) mail to the existing SMTP server. This
allows you to use DSPAM even with a Windows-based destination mail server,
as no integration is necessary. See doc/relay.txt for one example of how to
do this with Postfix.

BEFORE:

    { Internet } ---> [Company Mail Server]

AFTER:

    { Internet } ---> [ Inbound SMTP Relay ] ---> [Company Mail Server]
                      (    MTA <> DSPAM    ) SMTP
                          \                   or
                           \--> [Quarantine] LMTP
    [End User] ------> [Web UI]

UPGRADING DSPAM

Please see the file UPGRADING.

FRESH INSTALLATION

0. PREREQUISITES

DSPAM can use one of many different backends to store its information, and
you will need to decide on one and install the appropriate software before
you can build DSPAM. The following storage backends are presently available:

       Driver        Requirements
    -------------------------------------------------------------------------
    T  mysql_drv     MySQL client libraries (and a server to connect to)
    T  pgsql_drv     PostgreSQL client libraries (and a server to connect to)
       sqlite_drv    SQLite v2.7.7 or above (scheduled for removal)
       sqlite3_drv   SQLite v3.x
    *T hash_drv      None (self-contained hash-based driver)

    Legend:
    *  Default storage driver
    T  Thread-safe (required for running DSPAM in server daemon mode)
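Packagers who script a daemon-mode build can check the thread-safety column of the table above before calling configure. The sketch below is a hypothetical POSIX shell helper. Its name is an assumption, and it is not a tool shipped with DSPAM. It succeeds only if every driver in a comma-separated --with-storage-driver list is marked T in the table.

```sh
#!/bin/sh
# Hypothetical helper (not part of DSPAM): return 0 only if every driver
# in a comma-separated list is marked "T" (thread-safe) in the table above.
drivers_are_thread_safe() {
    rest="$1,"                      # trailing comma simplifies the loop
    while [ -n "$rest" ]; do
        drv=${rest%%,*}             # text before the first comma
        rest=${rest#*,}             # text after the first comma
        case $drv in
            mysql_drv|pgsql_drv|hash_drv) ;;   # marked T in the table
            *) return 1 ;;                     # sqlite_drv, sqlite3_drv, empty or unknown
        esac
    done
    return 0
}

if drivers_are_thread_safe "mysql_drv,hash_drv"; then
    echo "thread-safe: suitable for --enable-daemon"
fi
```

Empty list entries are treated as failures, so a typo such as "mysql_drv,,hash_drv" is also rejected.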

In general, MySQL is one of the faster solutions with a smaller storage
footprint, and is well suited for both small and large-scale implementations.

The hash driver (inspired by Bill Yerazunis' CRM Sparse Spectra algorithm)
is the fastest solution by far and requires no dependencies. It supports
an auto-extend feature to grow the file size as needed and is very
compact. It does, however, lack some features (such as merged
groups support) and uses a lot of memory to mmap() each user's data.

Also note that a database created with the hash driver is currently not safe
to move between 32-bit and 64-bit systems, or between big-endian and
little-endian systems.

Documentation for any additional setup of your selected storage driver can
be found in the doc/ directory. You'll need to follow any steps outlined in
the storage driver documentation before continuing.

You can download MySQL from http://www.mysql.com.
You can download PostgreSQL from http://www.postgresql.org.
You can download SQLite from http://www.sqlite.org.

1. CONFIGURATION

DSPAM uses autoconf, so configuration is fairly standardised with other
UNIX-based software:

    ./configure [options]

DSPAM supports the configuration options below. Generally, the default
configuration is more than acceptable, so it's a good idea not to tweak too
many settings unless you know what you are doing.

PATH SWITCHES

--prefix=DIR
Specify an alternative root prefix for installation. The default is
/usr/local. This does not affect the location of dspam.conf (which
defaults to /etc); use --sysconfdir= for that.

--sysconfdir=DIR
Specify an alternative home for the dspam.conf file. The default is /etc.

--with-dspam-home=DIR
Specify an alternative DSPAM home for installation. This can also be
changed in dspam.conf, but it is convenient to set on the configure line.
The default is $prefix/var/dspam, or /usr/local/var/dspam.

--with-logdir=DIR
Specify an alternative log directory. The default is $dspam_home/log. Do
not set this to /var/log unless DSPAM has permission to write to
that directory.

FILESYSTEM SCALE

The default filesystem scale is "small-scale", which writes each user to
their own directory in the top-level DSPAM home data directory. (In the
paths below, $HOME refers to the DSPAM home directory.) The following two
switches allow the scale to be changed to be more suitable for larger
installations.

--enable-large-scale
Switch for large-scale implementation. User data will be stored as
$HOME/data/u/s/user instead of $HOME/data/user.

--enable-domain-scale
Switch for domain-scale implementation. When used, DSPAM expects
username@domain to be passed in as the user id, and user data will be
stored as $HOME/data/example.org/user and $HOME/opt-in/example.org/user.dspam
instead of $HOME/data/user.
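The three layouts can be compared side by side with a small shell sketch. It is an illustration only, not DSPAM's own path code. The function name is hypothetical, and the DSPAM home is passed in explicitly. This section does not say how large-scale handles user names shorter than two characters, so the sketch simply rejects them as an assumption.

```sh
#!/bin/sh
# Hypothetical illustration of the per-user data directory for each
# filesystem scale described above (not DSPAM's implementation).
# $1 = scale (small|large|domain), $2 = DSPAM home, $3 = user id
dspam_user_dir() {
    scale=$1
    home=$2
    user=$3
    case $scale in
        small)
            printf '%s/data/%s\n' "$home" "$user" ;;
        large)
            # The first and second characters become two extra directory levels.
            # Assumption: names shorter than two characters are rejected here.
            [ "${#user}" -ge 2 ] || return 1
            c1=$(printf '%s' "$user" | cut -c1)
            c2=$(printf '%s' "$user" | cut -c2)
            printf '%s/data/%s/%s/%s\n' "$home" "$c1" "$c2" "$user" ;;
        domain)
            # username@domain -> data/domain/username
            case $user in
                ?*@?*) ;;
                *) return 1 ;;
            esac
            printf '%s/data/%s/%s\n' "$home" "${user#*@}" "${user%%@*}" ;;
        *)
            return 1 ;;
    esac
}
```

For example, `dspam_user_dir large /usr/local/var/dspam user` prints /usr/local/var/dspam/data/u/s/user, which matches the $HOME/data/u/s/user layout above.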

INTEGRATION SWITCHES

--with-storage-driver=DRIVER[,DRIVER2[...,DRIVERN]]
Specify your storage driver selection(s). A storage driver is a driver
written specifically for DSPAM to store tokens, signature data, and
perform other storage-specific operations. The default driver is hash_drv.
The following drivers have been provided:

    mysql_drv:   MySQL Drivers
    pgsql_drv:   PostgreSQL Drivers
    sqlite_drv:  SQLite v2.x Drivers (scheduled for removal)
    sqlite3_drv: SQLite v3.x Drivers
    hash_drv:    Self-Contained Hash Database

If you are a packager, or wish to have multiple drivers built for any
other reason, you may specify multiple drivers by separating them with
commas. This will cause the storage driver specified in dspam.conf to be
dynamically loaded at runtime rather than statically linked. If you wish
to build only one driver, but dynamically, then specify it twice, as in
--with-storage-driver=mysql_drv,mysql_drv.

If you will be compiling DSPAM to operate as a server daemon or to deliver
via SMTP/LMTP, you will need to use a thread-safe driver (outlined in the
chart earlier in this document).

You may also need to use some of the driver-specific configure flags
(discussed in the DRIVER SPECIFIC CONFIGURE SWITCHES section below).

--disable-trusted-user-security
Administrators who wish to disable trusted user security may do so by
using this configure flag. This will cause DSPAM to treat each user as
if they were "trusted", which could potentially allow them to execute
arbitrary commands on the server via DSPAM. Because of this, administrators
should only use this option on a closed server, or should configure their
DSPAM binary to be executable only by users who can be trusted. This
option SHOULD NOT be used as a solution to your MTA dropping privileges
prior to calling DSPAM. Instead, see the "Understanding Trusted User
Security" section of this document.

--enable-homedir
When enabled, instead of checking for $HOME/$USER/opt-in/$USER[.dspam|.nodspam],
DSPAM will check for a .dspam or .nodspam file in the user's home directory.
DSPAM will also store each user's data in ~/.dspam when this option is
enabled. Because of this, DSPAM will automatically install and run setuid
root so that it can read each user's home directory.

Note:

This option is incompatible with most implementations of the Web UI,
since it requires access to read each user's home directory. Therefore,
only use this option if you will not be using the Web UI or plan on
doing something asinine like running it as root.

--enable-daemon
Builds DSPAM with support for daemon mode, and builds the associated dspamc
thin client. POSIX threads (pthreads) are required to build for daemon mode,
and the storage driver used must be thread-safe.

DRIVER SPECIFIC CONFIGURE SWITCHES

Some storage drivers have their own custom configuration switches:

mysql_drv:
--with-mysql-includes=DIR
Specify a path to the MySQL includes.

--with-mysql-libraries=DIR
Specify a path to the MySQL libraries
(currently links to -lmysqlclient, and also -lcrypto on some systems).

--enable-virtual-users
Tells DSPAM to create virtual user ids. Use this if your users don't
actually exist on the system (e.g. in /etc/passwd, if using a password
file).

--enable-preferences-extension
MySQL supports the preferences extension, which stores user preferences
in MySQL instead of flat files (the built-in method).

--disable-mysql4-initialization
If you are compiling libdspam for use with a third-party application,
and the third-party application makes its own calls to libmysqlclient,
you should use this option to disable libdspam's initialization and
cleanup of libmysqlclient, and allow the application to manage this.
This option suppresses libdspam's calls to mysql_server_init and
mysql_server_end.

Note:

Please see the file doc/mysql_drv.txt for more information
about configuring the mysql_drv storage driver.

pgsql_drv:
--with-pgsql-includes=DIR
Specify a path to the PgSQL includes.

--with-pgsql-libraries=DIR
Specify a path to the PgSQL libraries
(currently links to -lpq, and netlibs on some systems).

--enable-virtual-users
Tells DSPAM to create virtual user ids. Use this if your users don't
actually exist on the system (e.g. in /etc/passwd, if using a password
file).

--enable-preferences-extension
PostgreSQL supports the preferences extension, which stores user
preferences in PostgreSQL instead of flat files (the built-in method).

Note:

Please see the file doc/pgsql_drv.txt for more information about
configuring the pgsql_drv storage driver.

sqlite_drv:
sqlite3_drv:
--with-sqlite-includes=DIR
Specify a path to the SQLite includes.

--with-sqlite-libraries=DIR
Specify a path to the SQLite libraries.

DEBUGGING SWITCHES

--enable-debug
Turns on support for debugging output. This option allows you to turn on
debugging messages for all or some users by editing dspam.conf or setting
--debug on the commandline. Enabling debug in configure only compiles in
support for debugging; it must still be activated using one of the
methods described above. Debugging support itself doesn't use up very
many additional resources, so it should be safe to leave enabled on
non-enterprise-class systems.

--enable-verbose-debug
Turns on extremely verbose debugging output. --enable-debug is implied.
Never use this on production builds!

Note:

When verbose debug is compiled in, DSPAM performs many additional
mathematical calculations regardless of whether or not it has been
activated. You shouldn't use --enable-verbose-debug for production
builds unless you have serious issues you can't resolve.

FEATURE ACTIVATION

--enable-clamav
Enables support for Clam Antivirus. DSPAM can interface directly with
clamd to perform virus scanning and can be configured to react in
different ways to viruses. See dspam.conf for more information.

ADDITIONAL CONFIGURATION OPTIONS

The remainder of the configuration options are located in dspam.conf, which
is installed in sysconfdir (default: /usr/local/etc) upon a make install.
It is generally a good idea to review dspam.conf and make any necessary
changes prior to using DSPAM.

2. BUILDING AND INSTALLING

After you have run configure with the correct options, build and install
DSPAM by running:

    make && make install

Note:

If you are a developer wanting to link to the core engine of DSPAM,
libdspam will be built during this process. Please see the
example.c file for examples of how to link to and use libdspam. Static
and dynamic libraries are built in the .libs directory. The needed headers
will be installed in $prefix/include/dspam.

3. PERMISSIONS

In the typical UNIX environment, you'll need to worry about the following
permissions:

The CGI User: This is the user your web server (most likely Apache) is
running as. This is commonly 'nobody' or 'web'. You can find this in
Apache's httpd.conf by searching for 'User'. The CGI user will need
the ability to access the following components of DSPAM:
- Ability to execute the dspam binary
- Ability to read and write to dspam_home/data/
- Trusted user permissions in dspam.conf ("Trust [username]")
- The execution 'Group' used must match the group dspam is running as
  (this is typically 'mail', 'dspam', or similar)

The MTA User: This is the user your mail server software is running as when
it executes DSPAM. This is usually daemon, mail, exim, etc. This is
typically different from the user the MTA runs and polices itself as, to
avoid security problems. Consult your MTA's documentation for more info.
The MTA user will require:
- The ability to execute the dspam binary
- Trusted user permissions in dspam.conf ("Trust [username]")

Systems Administrators: In order to perform administrative functions,
systems administrators will require:
- The ability to execute dspam-related binaries
- Trusted user permissions in dspam.conf ("Trust [username]")

Note:

If the MTA is communicating with DSPAM via LMTP (explained later), then
execution permissions are not necessary.

Note about FreeBSD:

FreeBSD's default MTA user is 'mailnull'.
FreeBSD's default delivery agent also changes its uid, so in order
to call it, dspam must be installed as setuid root to work on the
commandline properly. This is done automatically on install.

Understanding Trusted User Security

DSPAM has tighter security for untrusted users on the system to prevent
them from touching other users' data or passing arbitrary commands to the
delivery agent DSPAM calls. "Trusted User Security" is a simple system
whereby any unsafe functions are not available to a user calling dspam
unless they are within dspam.conf's trusted user list.

Local non-privileged users should be able to use DSPAM without any problems
while remaining untrusted, as long as they behave. For example, an untrusted
user cannot set their DSPAM username to any name other than their own
username. Untrusted users are also limited to the delivery options set by the
system administrator, and cannot redirect how DSPAM delivers mail.

A list of trusted users is maintained in dspam.conf. This list should
include the trusted users who should be allowed to set the dspam user,
passthru parameters, and other information that would be potentially
dangerous for a malicious user to be able to set. You'll need to ensure
that your CGI user, MTA user, and system administrators are on the list.

4. MAIL SERVER INTEGRATION

As previously mentioned, there are three popular ways to implement DSPAM:

As a delivery proxy:
The default approach integrates DSPAM directly with the mail server and
filters spam as mail comes in. Please see the appropriate instructions
in doc/ pertaining to your MTA.

As a POP3 proxy:
This alternative approach implements a POP3 proxy: users connect to the
proxy to check their email, and email is filtered as it is being
downloaded. The POP3 proxy is a much easier approach, as it
requires much less integration work with the mail server (and is ideal
for implementing DSPAM on Exchange, etc.). Please see the file
doc/pop3filter.txt.

As an SMTP Relay:
DSPAM can be configured as an SMTP relay, a.k.a. an appliance. You
can set it up to sit in front of your real mail server and then point
your MX records at it. DSPAM will then pass along the good mail to
your real SMTP server. See doc/relay.txt for more information. The
example provided uses Postfix and MySQL.

Trusted users and the MTA

If you are using an MTA that changes its userid to match the destination
user before calling DSPAM, you won't be able to provide pass-thru
arguments to DSPAM (these are the commandline arguments that DSPAM in turn
passes to the local delivery agent, in such a configuration).
You will need to pre-configure the "default" pass-thru arguments in DSPAM.
This can be done by declaring an untrusted delivery agent in dspam.conf.
When DSPAM is called by an untrusted user, it will automatically force their
DSPAM user id and the passthru delivery agent arguments specified in dspam.conf.

This information will override any passthru commandline parameters
specified by the user. For example:

    UntrustedDeliveryAgent "/bin/mail -d $u"

The variable $u informs DSPAM that you would like the destination username
to be used in the position where $u is specified, so when DSPAM calls your
LDA for user 'bob', it will call it with:

    /bin/mail -d bob
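The following shell sketch makes the substitution concrete. DSPAM performs this expansion internally. The function below is a hypothetical stand-in for illustration, not DSPAM code, and it replaces only the literal two-character sequence $u.

```sh
#!/bin/sh
# Hypothetical sketch: replace every literal "$u" in a delivery agent
# template with the destination username, as described above.
expand_delivery_agent() {
    rest=$1
    user=$2
    out=
    while :; do
        case $rest in
            *'$u'*)
                out=$out${rest%%'$u'*}$user   # text before the first $u, then the user
                rest=${rest#*'$u'}           # continue after that $u
                ;;
            *)
                out=$out$rest
                break
                ;;
        esac
    done
    printf '%s\n' "$out"
}

expand_delivery_agent '/bin/mail -d $u' bob    # prints: /bin/mail -d bob
```

Every occurrence is replaced, so a template that uses $u more than once receives the username in each position.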

5. ALIASES

There are essentially two different ways a user might train DSPAM. The first
is by using the Web UI, which allows them to retrain via the "History"
tab. This works quite well, as users must visit the Web UI occasionally
to review their quarantine anyway (and reverse any false positives). We'll
discuss this shortly in section 1.1.8.

The more common approach to training, discussed here, is to allow users to
simply forward their spam to an email address where DSPAM can analyze and
learn it. DSPAM uses a signature-based system, where a serial number of
sorts is appended to each email processed by DSPAM. DSPAM reads this serial
number when the user forwards (or bounces) a message to what is called their
"spam email address". The serial number points to temporary information
stored on the server (for 14 days by default) containing all of the
information necessary for DSPAM to relearn the message. This is necessary
in order to relearn the exact message DSPAM originally processed.

Note:

If you are using an IMAP-based system, Web-based email, or another form of
email management where the original messages are stored on the server in
pristine format, you can turn this signature feature off by setting
"TrainPristine on" in dspam.conf. DSPAM will then train on the message
you provide it, which MUST be identical to the original message in order
to retrain properly.

Because DSPAM learns each user's specific email behavior, it's necessary
to identify the user in order to program their specific filtering database.
This can be done in one of three ways:

The Simple Way:

If you are using the MySQL or PgSQL storage drivers, the original
numeric user id can be embedded in the signature, so that only one
central spam alias is needed for the entire system. To configure
this, uncomment the appropriate UIDInSignature option in dspam.conf:

    # MySQLUIDInSignature on
    # PgSQLUIDInSignature on

Now all you'll need is a single system-wide alias, and DSPAM will train
the appropriate user when it sees the signature. An example of an alias
might look like:

    spam:"|/usr/local/bin/dspam --user root --class=spam --source=error"

Similarly, you may also wish to have a false-positive alias for users who
prefer to tag spam rather than quarantine it:

    notspam:"|/usr/local/bin/dspam --user root --class=innocent --source=error"

Note:

The 'root' user represents any active dspam user. It is necessary to
supply a username on the commandline or DSPAM will exit with
an error; however, the user will be changed internally once the signature
is read.

The Kind-of-Simple Way:

If you're not using one of the above storage drivers, the next easiest
way to configure aliases is to have DSPAM parse the 'To:' header of the
message and use a catch-all subdomain to direct all mail into DSPAM for
retraining. You can then instruct your users to email addresses like
'spam-bob@relearn.example.org'. The ParseToHeaders option (available
in dspam.conf) will parse the To: header of forwarded messages and
set the username to either 'bob' or 'bob@relearn.example.org', depending
on how it is configured. DSPAM can also set the training mode to either
"learn spam" or "learn notspam", depending on whether the user specified
a spam- or notspam- address in the To: header.

This is ideal if you don't want to set up a separate alias for each user
on your system (The Hard Way). If you're fortunate enough to have a
mail server that can perform regular expression matching, you can set up
your system without a subdomain, and just use addresses like
spam-bob@example.org. For the rest of us, it will be necessary to set up
a subdomain catch-all directly into DSPAM. For example:

    @relearn.example.org "|/usr/local/bin/dspam"

Don't forget to set the appropriate ParseToHeaders and related options in
dspam.conf as well. More specific instructions can be found in dspam.conf
itself. In most cases, the following will suffice:

    ParseToHeaders on
    ChangeUserOnParse user
    ChangeModeOnParse on
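The address handling described above can be sketched in shell. This is a hypothetical approximation for illustration only, not DSPAM's parser, and it rests on three assumptions. First, the spam- or notspam- prefix sits at the start of the local part. Second, notspam maps to the "innocent" class, as in the --class=innocent alias shown earlier. Third, an invented second argument chooses between the 'bob' and 'bob@relearn.example.org' forms, in place of reading ChangeUserOnParse.

```sh
#!/bin/sh
# Hypothetical sketch of the To: address handling described above.
# $1 = retraining address, e.g. spam-bob@relearn.example.org
# $2 = "user" keeps only the name (bob); any other value keeps the whole
#      address (bob@relearn.example.org). This argument is invented.
# Prints "<class> <username>", or returns 1 for unrecognized addresses.
parse_retrain_address() {
    addr=$1
    case $addr in
        spam-?*)    class=spam;     name=${addr#spam-} ;;
        notspam-?*) class=innocent; name=${addr#notspam-} ;;
        *)          return 1 ;;
    esac
    if [ "$2" = user ]; then
        name=${name%%@*}            # drop everything from the first @
    fi
    printf '%s %s\n' "$class" "$name"
}

parse_retrain_address spam-bob@relearn.example.org user     # spam bob
parse_retrain_address notspam-bob@relearn.example.org full  # innocent bob@relearn.example.org
```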

The Old Way (A.K.A. The Hard Way):

If neither of the easy ways is possible, you're stuck with doing it
the hard way. This means you'll need a separate spam alias (and notspam
alias, if users are tagging mail) for each user. To do this, you will
need to create an email address for each user, so that DSPAM can
analyze and learn for that specific user. For example:

    spam-bob: "|/usr/local/bin/dspam --user bob --class=spam --source=error"

You will end up having one alias per mail user on the system, or two if you
do not use DSPAM's CGI quarantine (an additional one using notspam-). Be
sure the aliases are unique and each username matches the name after the
--user flag. A tool called dspam_genaliases has been provided. This tool
will read the /etc/passwd file and write out a dspam aliases file that can
be included in your master aliases table.
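The sketch below gives a rough idea of what such a generated file contains. It reads a passwd-format file and prints one spam- and one notspam- alias per account, in the same format as the spam-bob example above. It is a hypothetical stand-in, not the shipped dspam_genaliases tool. The minimum-UID cutoff used to skip system accounts and the /usr/local/bin/dspam path are assumptions to adjust for your system.

```sh
#!/bin/sh
# Hypothetical stand-in for dspam_genaliases (not the real tool).
# $1 = passwd-format file, $2 = lowest UID to include (assumed cutoff
# for skipping system accounts). Writes aliases to standard output.
gen_dspam_aliases() {
    awk -F: -v min="$2" -v bin=/usr/local/bin/dspam '
        $3 >= min {
            printf "spam-%s: \"|%s --user %s --class=spam --source=error\"\n", $1, bin, $1
            printf "notspam-%s: \"|%s --user %s --class=innocent --source=error\"\n", $1, bin, $1
        }' "$1"
}
```

For example, `gen_dspam_aliases /etc/passwd 1000 > dspam.aliases` writes a file you can review before including it in your master aliases table. Drop the notspam- lines if your users rely on the CGI quarantine instead.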
	707
	708	To report spam, the user should be instructed to forward each spam to
	709	spam-user@yourhost
	710
	711	It doesn't really matter what you name these aliases, so long as the flags
	712	being passed to dspam are correct for each user. It might be a good idea
	713	to create an alias custom to your network, so that spammers don't forward
	714	spam into it. For example, notspam-yourcompany-bob or something.
	715
	716	Note About Security:
	717
	718	You might be wondering if a user can forward a spam to another user's
	719	address, or whether a spammer can forward a spam to another user's
	720	notspam address. The answer is "no". The key to all mail-based retraining
	721	is the signature embedded in each email. The signature is stored with
	722	each user's own user id, and so not only does the incoming message have
	723	to bear a valid signature, but it also has to be stored on the system with
	724	the correct user id. This prevents any kind of alias abuse.
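The reasoning can be illustrated with a toy model in which stored signatures are keyed by both user id and signature id. The class and method names below are hypothetical and do not reflect how DSPAM's storage drivers are written; the point is only that a retraining request succeeds when the signature exists under the same user the alias retrains.

```python
class SignatureStore:
    """Toy signature store keyed by (user, signature id). Hypothetical."""

    def __init__(self):
        self._sigs = {}

    def save(self, user, sig_id, training_data):
        self._sigs[(user, sig_id)] = training_data

    def retrain(self, alias_user, sig_id):
        # The lookup uses the alias owner's id only, so a signature that
        # was stored for some other user (or never stored) is not found.
        data = self._sigs.get((alias_user, sig_id))
        if data is None:
            raise LookupError(f"no signature {sig_id!r} for user {alias_user!r}")
        return data


store = SignatureStore()
store.save("bob", "4b11c532158749980119923", {"tokens": ["free", "offer"]})
store.retrain("bob", "4b11c532158749980119923")  # found: retraining proceeds
# store.retrain("alice", "4b11c532158749980119923") raises LookupError
```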
	725
	726	6. NIGHTLY MAINTENANCE AND HOUSEKEEPING CRONS
	727
	728	Non-SQL Based Nightly Purge
	729
	730	If you are NOT running a SQL-based solution, then you should configure
	731	dspam_clean to run under cron nightly. This clean tool will read all
	732	signature databases and purge signatures that are older than 14 days
	733	(configurable), purge abandoned tokens, and remove unimportant tokens.
	734	Without this tool, old signatures will continue to pile up.
	735	Be sure the user running cleanup has full read/write permissions on the
	736	DSPAM data files.
	737
	738	0 0 * * * /usr/local/bin/dspam_clean [options]
	739
See the dspam_clean description in section 1.4 for more information.
	741
	742	SQL-Based Nightly Purge
	743
SQL-based storage drivers include a nightly SQL script that performs the
same basic tasks as dspam_clean, much faster and with more finesse.
Instructions for each driver's purge functions can be found in the
driver's README (doc/[driver].txt). Most SQL drivers include a purge
script in the src/tools.[driver] directory. For example:
	750
0 0 * * * mysql --user=[user] --password=[pass] [db] < /path/to/purge-4.1.sql
	752
	753	Log Rotation
	754
The system and user logs can grow fairly quickly, even though generating
graphs really only requires the last two to three weeks of data. You can
configure a nightly log cleanup using dspam_logrotate:
	758
	759	0 0 * * * dspam_logrotate -a 30 -d /usr/local/var/dspam/data
	760
	761	7. NOTIFICATIONS
	762
	763	DSPAM is capable of sending three different notifications to users:
	764
	765	- A "First Run" message sent to each user when they receive their first
	766	message through DSPAM.
	767
- A "First Spam" message sent to each user when they receive their first
spam.

- A "Quarantine Full" message sent to each user when their quarantine box
exceeds 2MB in size (note: the 2MB limit is hardcoded in DSPAM).
	773
	774	These notifications can be activated by copying the txt/ directory from the
	775	distribution into DSPAM's home (by default /usr/local/var/dspam). You can
	776	alter the location of this directory by setting "TxtDirectory" in dspam.conf.
	777
	778	Example:
	779	/usr/local/var/dspam/txt/firstrun.txt
	780	/usr/local/var/dspam/txt/firstspam.txt
	781	/usr/local/var/dspam/txt/quarantinefull.txt
	782
	783	You will want to modify these templates prior to installing them to reflect the
	784	correct email addresses and URLs (look for 'example.org').
	785
NOTE: The quarantine warning is reset when the user clicks "Delete All", but
not when they use "Delete Selected". Users who don't wish to receive further
reminders should use the "Delete Selected" function instead of
"Delete All".
	790
You'll also need to set "Notifications" to "on" in dspam.conf.
	792
	793	8. THE WEB UI
	794
	795	The Web UI (CGI client) can be run from any executable location on
	796	a web server, and detects its user's identity from the REMOTE_USER
	797	environment variable. This means you'll need to use HTTP password
	798	authentication to access the CGI (Any type of authentication will work,
	799	so long as Apache supports the module). This is also convenient in that you
	800	can set up authentication using almost any existing system you have.
The only catch is that the usernames must match the actual
DSPAM usernames used on the system. A copy of the shadow password file
will suffice for most common installs.
	804
	805	The accompanying files in the webui/ folder should be copied into your
	806	document root and cgi-bin, as specified.
	807
	808	Note:
	809
Some authentication mechanisms are case insensitive and will
authenticate the user regardless of the case in which the username is
typed. DSPAM, on the other hand, is case sensitive, and the case of the
username used must match the case on the system. If you suffer from this
authentication problem, and are certain all of your users' usernames are
in lowercase, you can add the following line of code to the CGI right
after the call to &ReadParse:
	817
	818	$ENV{'REMOTE_USER'} = lc($ENV{'REMOTE_USER'});
	819
	820	The CGI will need to function in the same group as the dspam agent in order
	821	to work with the files in dspam_home. The best way to do this is to create
	822	a separate virtualhost specifically for the CGI and assign it to run in the
	823	MTA group using Apache's suexec. If you are using procmail, additional
	824	configuration may also be necessary (see below).
	825
	826	Note:
	827
	828	Apache users do NOT take on the identity of the groups specified in
	829	/etc/group so you will need to specifically assign the group in
	830	httpd.conf.
	831
	832	Note about Procmail:
	833
	834	Because the DSPAM Web UI is a CGI script, DSPAM will not retain its
	835	setuid privileges when called. If you are running procmail, this will
	836	become a problem as procmail requires root privileges to deliver. The
	837	easiest hack around this is to create a procmail.dspam binary and make it
	838	setuid root, then make it executable only by the mail group (or
	839	whatever group DSPAM and the CGI run in).
	840
The DSPAM Web UI has a minimal configuration inside the configure.pl script.
You'll want to check and make sure all of the settings are correct. In
most cases, the only settings that need changing are the large-scale
or domain-scale flags.
	845
	846	BEFORE PROCEEDING:
Check again that the CGI user from Apache's httpd.conf has been
added as a trusted user in dspam.conf.
	849
	850	Default Preferences
	851
	852	Now would be a good time to set the system's default preferences. This can
	853	be done using the dspam_admin tool. For example:
	854
	855	dspam_admin ch pref default trainingMode TEFT
	856	dspam_admin ch pref default spamAction quarantine
	857	dspam_admin ch pref default spamSubject "[SPAM]"
	858	dspam_admin ch pref default enableWhitelist on
dspam_admin ch pref default showFactors off
	860
	861	The default preferences are used for any users who have not yet set their
	862	own preferences. You can also control which preferences the user may
	863	override by changing the "AllowOverride" settings in dspam.conf.
	864
By default, the parameters specified on the commandline (if any) will be
used. If, however, a preference is found for the particular user, that
preference will override the commandline.
	868
	869	GD Graphing Library
	870
If you plan on leaving DSPAM's logging function enabled, and would like to
produce pretty graphs for your users, the graph.cgi script requires the
following to be installed on your machine:

- GD Graphics Library (http://www.boutell.com/gd/)
compiled with PNG support

- The following Perl modules:
	879	(http://www.perl.com/CPAN/modules/by-module/GD/)
	880
	881	. GD
	882	. GD-Graph3d
	883	. GDGraph
	884	. GDTextUtil
	885	. CGI
	886
	887	Typically this can be accomplished on the commandline:
	888
	889	perl -MCPAN -e 'install GD::Graph3d'
	890
	891	Configuring Administrators
	892
	893	Once you've configured the Web UI, you'll want to edit the 'admins' file to
	894	contain a list of users who are permitted to use the administration suite.
	895
	896	Configuring Sub-Administrators / Domain Level Administrators
	897
	898	It is possible to delegate the management of users to a list of sub-admins/
	899	domain level admins. To accomplish that you should edit the 'subadmins'
	900	file to contain a list of sub-admins/domain level admins which are permitted
	901	to switch their username while using the DSPAM control center.
	902
	903	Opt-In/Out
	904
	905	If you would like your users to be able to opt in/out of DSPAM filtering,
	906	add the correct option to the nav_preferences.html template, depending on
	907	your configuration (for example, if you have an opt-in system, you'll want to
	908	add the opt-in option). Note: This currently only works with the preferences
	909	extension, and not drop files.
	910
	911	<INPUT TYPE=CHECKBOX NAME=optIn $C_OPTIN$>
	912	Opt into DSPAM filtering
	913
	914	<INPUT TYPE=CHECKBOX NAME=optOut $C_OPTOUT$>
	915	Opt out of DSPAM filtering
	916
	917	1.2 TESTING
	918
	919	If you've installed from an RPM, there's a good chance that the packager
	920	went to the trouble of testing already. If you're building from sources,
	921	however, you'll need to find a way to ensure your configuration isn't broken.
	922
	923	Most software packages are supplied with a test suite to determine if the
	924	software is functioning properly. Since DSPAM's correct function relies
	925	primarily on having the correct permissions and mail server configuration,
	926	a test script fails to provide the level of testing required for such a
	927	package. The following exercise has been provided to test dspam's correct
	928	functioning on your system. This exercise does not test the Web UI, but only
	929	the core dspam agent.
	930
	931	Before running the test, you should have completed section 1.1's instructions
	932	for compiling and installing dspam as well as configured your mail server
	933	to support dspam.
	934
	935	1. Create a new user account on your system. It is important that this be a
	936	new account to prevent any unrelated email from being delivered during
	937	testing. Be sure to configure a spam alias for the test account.
	938
	939	2. Send a short (10 words or less) email to the account, and pick it up
	940	using your favorite mail client.
	941
3. Run dspam_stats [username] on the server. You should see a value of 1
for "TN" (true negatives), as shown below:
	944
	945	dspam-test 0 TP 1 TN 0 FN 0 FP
	946
	947	If you receive an error such as "unable to open /usr/local/var/dspam... for
	948	reading", then the dspam agent is not configured correctly. The problem
	949	could exist in either your mail server configuration or one or more of the
permissions on the directory or agent. Check your configuration and
permissions, and repeat this step until you get the expected results.
	952
	953	4. Run dspam_dump [username] to get a complete list of tokens and their
	954	statistics. Each token should have an I: (innocent) hit count of 1. The
	955	tokens will be represented as 64-bit values, for example:
	956
	957	3126549390380922317 S: 0 I: 1 LH: Mon Aug 4 11:40:12 2003
	958	13884833415944681423 S: 0 I: 1 LH: Mon Aug 4 11:40:12 2003
	959	14519792632472852948 S: 0 I: 1 LH: Mon Aug 4 11:40:12 2003
	960	8851970219880318167 S: 0 I: 1 LH: Mon Aug 4 11:40:12 2003
	961
	962	To view statistics for a particular token, run dspam_dump [username] [token]
	963	where token is the plain-text token value. For example:
	964
	965	% dspam_dump bill FREE
	966	7717766825815048192 S: 00265 I: 00068 P: 0.7358
	967
5. Forward the test message to the spam alias you've created for the test
account. Allow enough time for the message to be processed.
	970
	971	6. Run dspam_stats [username] on the server again. Now, the value for TN
	972	should be zero and the value for FN (false negatives) should be 1 as shown
	973	below:
	974
	975	dspam-test 0 TP 0 TN 1 FN 0 FP
	976
	977	If this is not the case, check the group permissions of the dspam agent as
	978	well as the permissions your MTA uses when piping to aliases.
	979
7. Run dspam_dump [username] again. Make sure that _EVERY_ token now has an
I: of 0 and an S: of 1:
	982
	983	3126549390380922317 S: 1 I: 0 LH: Mon Aug 4 11:44:29 2003
	984	13884833415944681423 S: 1 I: 0 LH: Mon Aug 4 11:44:29 2003
	985	14519792632472852948 S: 1 I: 0 LH: Mon Aug 4 11:44:29 2003
	986	8851970219880318167 S: 1 I: 0 LH: Mon Aug 4 11:44:29 2003
	987
If any tokens do not have an S: of 1 and an I: of 0, the dspam signature
was not found in the email, which can have a number of causes.
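If you repeat this exercise often, the checks in steps 3, 6 and 7 can be scripted. The Python sketch below parses output in exactly the formats shown above (the one-line stats summary and the dump lines); the function names are hypothetical, and if your version of dspam_stats or dspam_dump formats its output differently, the parsing will need adjusting.

```python
import re


def parse_stats_line(line):
    """Parse 'user N TP N TN N FN N FP' into (user, {label: count})."""
    fields = line.split()
    user, rest = fields[0], fields[1:]
    # The remaining fields alternate: count, label, count, label, ...
    counts = {label: int(value) for value, label in zip(rest[0::2], rest[1::2])}
    return user, counts


# Matches the leading "token S: n I: n" part of a dspam_dump line.
DUMP_RE = re.compile(r"^(\d+)\s+S:\s*(\d+)\s+I:\s*(\d+)")


def parse_dump(text):
    """Return {token: (spam_hits, innocent_hits)} from dspam_dump output."""
    tokens = {}
    for line in text.splitlines():
        m = DUMP_RE.match(line.strip())
        if m:
            tokens[m.group(1)] = (int(m.group(2)), int(m.group(3)))
    return tokens


def all_retrained(tokens):
    """Step 7 check: there is at least one token and each has S: 1, I: 0."""
    return bool(tokens) and all(hits == (1, 0) for hits in tokens.values())
```

For step 6, for example, you would expect parse_stats_line to report an FN count of 1 and a TN count of 0.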
	991
	992	1.3 TROUBLESHOOTING
	993
	994	Problem: No files are being created in the user directory
Solution: Check the permissions of the user directory. It must be
writable both by the user the dspam agent runs as and by the
CGI user.
	998
	999	Problem: False positives are never being delivered
	1000	Solution: Your CGI most likely doesn't have the privileges required by
	1001	the LDA to deliver the messages. Make sure the CGI user is in
	1002	the correct group. Also consider setting the dspam agent to
	1003	setuid or setgid with the correct permissions.
	1004
	1005	Problem: My database is getting huge!
	1006	Solution: DSPAM's default training mode is TEFT. On top of this, the
	1007	purging defaults are very lax. You might consider switching to
	1008	TOE (Train-on-Error) mode training if you require a minimal
	1009	database. If you are willing to sacrifice accuracy for disk space,
	1010	disabling the 'chain' tokenizer from dspam.conf will prevent
	1011	the use of multi-word (chained) tokens, which will also cut your
	1012	database size considerably. You may also consider more frequent
calls to dspam_clean -p to purge neutral data, which makes up the
majority of most databases.
	1015
	1016	For more help, please see the DSPAM FAQ at http://dspam.sourceforge.net.
	1017
	1018	1.4 DSPAM TOOLS
	1019
	1020	A few useful tools have been provided to make DSPAM management a bit easier.
	1021	These tools include:
	1022
	1023	dspam_admin - A tool used to perform specific administrative functions. These
	1024	functions are usually included as part of an extensions package (such as
	1025	the preferences extension). Available functions are listed in the tool's
	1026	usage output.
	1027
	1028	dspam_train - Used to train and test a corpus of ham and spam (in maildir
	1029	format).
	1030	Syntax: dspam_train [username] [spam_dir] [nonspam_dir]
	1031	where username is the username of the user to apply the training to, and
	1032	the two dirs represent directories containing messages in individual
	1033	files (e.g. maildir/corpus format). dspam_train can be used on an existing
	1034	user's database, to further improve accuracy, or to train from scratch.
It also provides a solid test jig for testing the efficiency and accuracy
of a test corpus against the filter.
NOTE: dspam_train automatically balances training of the corpus to
ensure both spam and nonspam are trained based on the ratio of
spam/nonspam. This means that if you have twice as much spam as nonspam,
two spam will be trained for every nonspam.
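One way to achieve this kind of ratio-based balancing is to interleave the two corpora so that each class is consumed in proportion to its size. The sketch below is a hypothetical illustration of that idea, not dspam_train's actual code; it only computes a training order and does not call DSPAM.

```python
def balanced_order(spam, ham):
    """Interleave two message lists in proportion to their sizes.

    At each step, pick the class whose fraction already consumed is
    lower (i/len(spam) vs j/len(ham), compared by cross-multiplying).
    """
    order = []
    i = j = 0
    ns, nh = len(spam), len(ham)
    while i < ns or j < nh:
        if j >= nh or (i < ns and i * nh <= j * ns):
            order.append(("spam", spam[i]))
            i += 1
        else:
            order.append(("ham", ham[j]))
            j += 1
    return order
```

With four spam and two nonspam messages, this yields spam, ham, spam, spam, ham, spam: roughly two spam trained for every nonspam.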
	1041
	1042	dspam_dump - Dumps a DSPAM dictionary. This can be used to view the
	1043	entire contents of a user's dictionary, or used in combination
	1044	with grep to view a subset of data. Syntax: dspam_dump [username] [token]
	1045	where username is the DSPAM user's username. If a token is specified,
	1046	statistics only for that token will be printed.
	1047
dspam_clean - Performs nightly housecleaning by deleting old or useless
data from user data. If using the hash driver (hash_drv), please use
cssclean instead (see doc/README.cssclean).
	1051
	1052	dspam_clean performs the following operations:
	1053
1. Using the -s flag, dspam_clean will purge stale signatures. If an age
is specified, for example -s14, it overrides the default age. Specifying
an age of 0 will delete all signatures for the users processed.
	1058
	1059	2. Using the -p flag, dspam_clean will delete all tokens from a user's
	1060	database whose probability is between 0.35 and 0.65 (fairly neutral,
useless tokens) that are older than the default age. If an age is specified,
for example -p30, it overrides the default age. It
	1063	is a good idea to use this type of clean with an age of 0 on users after
	1064	a lot of corpus training.
	1065
	1066	3. Using the -u flag, dspam_clean will delete all unused tokens from a
	1067	user's database. There are four different types of unused tokens:
	1068
	1069	- Tokens which have not been used for a long time
	1070	- Tokens which have a total hit count below 5
	1071	- Tokens which have only one spam hit
	1072	- Tokens which have only one innocent hit
	1073
	1074	Ages may be overridden by specifying a format such as -u30,15,10,10
	1075	where each number represents the respective age. Specifying an age of
	1076	zero will delete all unused tokens in the category. Defaults are set in
	1077	dspam.conf.
	1078
	1079	Optionally, usernames may be specified to override the default behavior of
	1080	processing all users.
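The selection rules for -s and -p can be sketched as simple filters. This is an illustration under stated assumptions, not dspam_clean's implementation (which works against the storage driver): the Token class and function names are hypothetical, the 0.35/0.65 bounds are treated as inclusive, and "older than the age" is taken as "at least that many days", so that an age of 0 selects everything in the category as described above.

```python
from dataclasses import dataclass


@dataclass
class Token:
    probability: float  # spam probability, e.g. the P: value from dspam_dump
    age_days: float     # days since the token was last hit (hypothetical field)


def neutral_purge(tokens, max_age_days):
    """Sketch of -p: return the tokens that survive the neutral purge.

    A token is purged when its probability is neutral (0.35..0.65,
    inclusive here) and it is at least max_age_days old. With
    max_age_days=0, every neutral token is purged.
    """
    return [t for t in tokens
            if not (0.35 <= t.probability <= 0.65 and t.age_days >= max_age_days)]


def stale_signatures(sig_ages_days, max_age_days):
    """Sketch of -s: signature ids at least max_age_days old (0 = all)."""
    return [sig for sig, age in sig_ages_days.items() if age >= max_age_days]
```

The -u categories could be modeled the same way, one filter per age in -u30,15,10,10, but the exact hit-count tests for each category are not spelled out here, so they are left out of the sketch.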
	1081
	1082	Examples:
	1083
	1084	Process all users on the system using all clean operations:
	1085	dspam_clean -s -p15 -u90,30,15,15
	1086
Delete all signatures for users 'dick' and 'jane':
	1088	dspam_clean -s0 dick jane
	1089
	1090	Perform a post-corpus training clean on user 'spot':
	1091	dspam_clean -p0 -u0,0,0,0 spot
	1092
	1093	Run dspam_clean with all default options, all clean modes enabled, on all
	1094	users on the system:
	1095	dspam_clean -s -p -u
	1096
	1097	NOTE: You may wish to only run certain cleaning modes depending on the type
	1098	of storage driver you are using. For example, the MySQL storage driver
	1099	includes a script which performs signature and unused token operations,
	1100	leaving only probability operations as useful. If you are using a SQL-based
	1101	storage driver, it is strongly recommended that you use the maintenance
	1102	scripts wherever possible for optimum efficiency.
	1103
	1104	dspam_stats - Displays the spam statistics for one or all users on the system.
	1105	Syntax: dspam_stats [username]. If no username is provided, all users
	1106	will be displayed. Displays TP (true positives), TN (true negatives),
	1107	FN (false negatives), and FP (false positives).
	1108
	1109	dspam_genaliases - Reads the /etc/passwd file and outputs a dspam aliases
	1110	table which can be included in the master aliases table. You may try
	1111	Art Sackett's generate_dspam_aliases tool at
	1112	http://www.artsackett.com/freebies/generate_dspam_aliases/ if you need
	1113	some better functionality. This will eventually be merged in as a
	1114	replacement for the existing tool.
	1115
dspam_merge - Merges multiple users' dictionaries into one user's
dictionary (the dictionaries being merged from are not modified). This
can be used to create a seeded dictionary for a new user, or to copy a
single user's dictionary to a new file. This is great for building global
dictionaries, but consumes a lot of time and disk space.
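Conceptually, a merge combines per-token hit counts from several dictionaries into a new one. The Python sketch below assumes the counts are simply summed per token; that is an assumption made for illustration, not a description of dspam_merge's internals, and a real merge would also need to combine the per-user message totals.

```python
def merge_dictionaries(*dictionaries):
    """Sum (spam_hits, innocent_hits) per token into a new dictionary.

    Each input maps token -> (spam_hits, innocent_hits). The inputs are
    left untouched, mirroring the note that the source users'
    dictionaries are not affected.
    """
    merged = {}
    for d in dictionaries:
        for token, (s, i) in d.items():
            ms, mi = merged.get(token, (0, 0))
            merged[token] = (ms + s, mi + i)
    return merged
```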
	1121
	1122	1.5 AGENT COMMANDLINE ARGUMENTS
	1123
	1124	The DSPAM agent (dspam) recognizes the following commandline arguments:
	1125
	1126	--user [user1 user2 ... userN]
	1127	Specifies the destination user(s) of the incoming message. DSPAM then
	1128	processes the message once for each user individually. If the message is to
	1129	be delivered, the $u (or %u) parameters of the arguments string will be
	1130	interpolated for the current user being processed.
	1131
--class=[spam|innocent]
Tells DSPAM that the message being presented has already been classified by
the user. This flag should be used when a misclassification has occurred,
when the user is corpus-feeding a message, or when an inoculation is being
presented. This flag must be used in conjunction with the --source flag.
Providing no classification invokes DSPAM's standard behavior, which is to
determine the message's nature on its own.
	1139
--source=[error|corpus|inoculation]
	1141	Wherever --class is used, the source of the user-provided
	1142	classification must also be provided. The source is very important and
	1143	dramatically affects DSPAM's training behavior:
	1144
	1145	error: The message being presented was a message previously misclassified
	1146	by DSPAM. When 'error' is provided as a source, DSPAM requires that
	1147	the DSPAM signature be present in the message, and will use the
	1148	signature to recall the original training metadata. If the signature
	1149	is not present, the message will be rejected. In this source mode,
	1150	DSPAM will also decrement each token's previous classification's
	1151	count as well as the user totals.
	1152
	1153	You should use error only when DSPAM has made an error in
	1154	classifying the message, and should present the modified version of
	1155	the message with the DSPAM signature when doing so.
	1156
	1157	corpus: The message being presented is from a mail corpus, and should be
	1158	trained as a new message, rather than re-trained based on a
	1159	signature. The message's full headers and body will be analyzed and
	1160	the correct classification will be incremented, without its
	1161	opposite being decremented.
	1162
	1163	You should use corpus only when feeding messages in from corpus, not
	1164	for correcting errors.
	1165
	1166	inoculation: The message being presented is in pristine form, and should
	1167	be trained as an inoculation. Inoculations are a more
	1168	intense mode of training designed to cause DSPAM to
	1169	train the user's metadata repeatedly on previously unknown
tokens, in an attempt to vaccinate the user against future
messages similar to the one being presented.
	1172
	1173	You should use inoculation only on honeypots and the like.
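The difference between the error and corpus sources can be sketched on per-token counts. This is a simplified, hypothetical model: it updates only (spam, innocent) hit counts for the given tokens, ignores the user totals, clamps decrements at zero (an assumption), and does not model inoculation, whose exact repetition behavior is not specified here. In a real error retrain the tokens and original class would come from the signature.

```python
def retrain(counts, tokens, new_class, source, old_class=None):
    """Adjust per-token (spam_hits, innocent_hits) for a user-supplied class.

    source="corpus": increment new_class only.
    source="error":  increment new_class and decrement old_class.
    """
    if source not in ("corpus", "error"):
        raise ValueError(f"unsupported source: {source}")
    if source == "error" and old_class is None:
        raise ValueError("error retraining needs the original class from the signature")
    idx = {"spam": 0, "innocent": 1}
    for tok in tokens:
        hits = list(counts.get(tok, (0, 0)))
        hits[idx[new_class]] += 1
        if source == "error":
            # Undo the original (wrong) training; clamped at zero here.
            hits[idx[old_class]] = max(0, hits[idx[old_class]] - 1)
        counts[tok] = tuple(hits)
    return counts
```

Retraining an innocent-classified token as spam with source "error" turns S: 0 I: 1 into S: 1 I: 0, which is the change checked for in step 7 of the testing exercise.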
	1174
--deliver=[spam,[innocent|nonspam],summary,stdout]
	1176	Tells DSPAM to deliver the message if its result falls within the criteria
	1177	specified. For example, --deliver=innocent or --deliver=nonspam will cause
	1178	DSPAM to only deliver the message if its classification has been determined
	1179	as innocent. Providing --deliver=innocent,spam or --deliver=nonspam,spam will
	1180	cause DSPAM to deliver the message regardless of its classification. This flag
	1181	provides a significant amount of flexibility for nonstandard implementations,
where false positives may not be delivered but spam is, and so on.
	1183
summary : Deliver (to stdout) a summary identical to the output of message
	1185	classification:
	1186	X-DSPAM-Result: User; result="Innocent"; class="Innocent";
	1187	probability=0.0000; confidence=1.00;
	1188	signature=4b11c532158749980119923
	1189
stdout : A shortcut for --deliver=innocent,spam --stdout
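Tools that front-end the agent may want to read the summary back. The sketch below parses an X-DSPAM-Result line in the format shown above into the user name and a dictionary of fields. It assumes the summary arrives as a single, possibly folded, header line; the function name is hypothetical and all values are returned as strings.

```python
def parse_dspam_result(header):
    """Parse an X-DSPAM-Result summary into (user, {field: value})."""
    name, _, value = header.partition(":")
    if name.strip().lower() != "x-dspam-result":
        raise ValueError("not an X-DSPAM-Result header")
    # Fields are ';'-separated; strip() also removes header-folding newlines.
    parts = [p.strip() for p in value.split(";") if p.strip()]
    user, fields = parts[0], {}
    for part in parts[1:]:
        key, _, val = part.partition("=")
        fields[key.strip()] = val.strip().strip('"')
    return user, fields
```

For the example above, the result would be the user "User" with result and class both "Innocent", probability "0.0000", confidence "1.00" and the signature string.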
	1191
	1192	--stdout
	1193	If the message is indeed deemed "deliverable" by the --deliver flag, this
	1194	flag will cause DSPAM to deliver the message to stdout, rather than
	1195	the configured delivery agent.
	1196
	1197	--process
Tells DSPAM to process the message. This is the default behavior, and the
flag is implied unless --classify is used, but specifying it explicitly is a
good idea to avoid ambiguity.
	1201
	1202	--classify
	1203	Tells DSPAM only to classify the message, and not make any writes to the
	1204	user's metadata or attempt to deliver/quarantine the message.
	1205
	1206	NOTE: The output of the classification is specific to the user, not including
	1207	the output of any groups they might be affiliated with, so it is
	1208	entirely possible that the message would be caught as spam by the group,
	1209	even if it didn't appear in the classification. If you want to get
	1210	the classification for the GROUP, use the group name as the user
	1211	instead of an individual.
	1212
	1213	--signature=[signature]
	1214	For some implementations, the admin may wish to pass the signature in
	1215	via commandline instead of allowing DSPAM to find it on its own. This is
	1216	especially useful when front-ending the agent with other tools. Using this
option will set the active signature and will also skip reading stdin.
	1218
--mode=[toe|tum|teft|notrain|unlearn]
	1220	Configures the training mode to be used for this process:
	1221
	1222	teft: Train-Everything. Trains on all messages processed. This is
	1223	a very thorough training approach and should be considered the
	1224	standard training approach for most users. TEFT may, however,
	1225	prove too volatile on installations with extremely high per-user
traffic, or may not scale well on systems with extremely large
	1227	user-bases. In the event that TEFT is proving ineffective, one of
	1228	the other modes is recommended.
	1229
	1230	NOTE: Until a user reaches 100 innocent messages in their
	1231	metadata, train-on-error will also be teft-based, even if
	1232	otherwise specified on the commandline.
	1233
	1234	toe: Train-on-Error. Trains only on a classification error, once the
	1235	user's metadata has matured to 2500 innocent messages. This
	1236	training mode is much less resource intensive, as only occasional
	1237	metadata writes are necessary. It is also far less volatile than
	1238	the TEFT mode of training. One drawback, however, is that TOE only
	1239	learns when DSPAM has made a mistake - which means the data is
	1240	sometimes too static, and unable to "ease into" a different type of
	1241	behavior.
	1242
	1243	tum: Train-until-Mature. This training mode is a hybrid between the other
	1244	two training modes and provides a great balance between volatility
	1245	and static metadata. TuM will train on a per-token basis only
	1246	tokens which have had fewer than 50 "hits" on them, unless an error
	1247	is being retrained in which case all tokens are trained. This
	1248	training mode provides a solid core of stable tokens to keep
	1249	accuracy consistent, but also allows for dynamic adaptation to any
	1250	new types of email behavior a user might be experiencing. It is a
	1251	balance of resources as well, as only less-than-mature tokens are
	1252	written to the database. NOTE: You should corpus train before
	1253	using tum.
	1254
	1255	notrain: No training. Do not train the user's data, and do not keep totals.
	1256	This should only be used in cases where you want to process mail for
	1257	a particular user (based on a group, for example), but don't want
	1258	the user to accumulate any learning data.
	1259
	1260	unlearn: Unlearn original training. Use this if you wish to unlearn a
previously learned message. Be sure to specify --source=error and
set --class to the classification the message was originally
learned under. If not using TrainPristine, this will require the
original signature from training.
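The per-token rules described for teft, toe, tum and notrain can be summarized in a short sketch that decides which tokens a message would write. The function name and inputs are hypothetical: token_hits maps each token to its total hits so far, and toe_mature is passed in as a flag rather than hard-coding a maturity threshold. unlearn is not modeled.

```python
def tokens_to_train(mode, token_hits, is_error=False, toe_mature=True):
    """Return the tokens that would be written under the given --mode.

    token_hits maps token -> total hits (spam + innocent) so far.
    """
    if mode == "notrain":
        return []
    if mode == "teft":
        return list(token_hits)
    if mode == "toe":
        # Before the user's data has matured, TOE behaves like TEFT.
        return list(token_hits) if (is_error or not toe_mature) else []
    if mode == "tum":
        if is_error:
            return list(token_hits)  # errors retrain every token
        return [t for t, hits in token_hits.items() if hits < 50]
    raise ValueError(f"unsupported mode: {mode}")
```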
	1265
	1266	RECOMMENDATIONS:
	1267	In general, it is recommended that users begin with TEFT. If a user
is experiencing a spam ratio between 75% and 85%, they may benefit from
Train-until-Mature mode. If a user is experiencing over 90% spam, then
	1270	Train-on-Error mode should make a noticeable improvement in accuracy.
	1271	It eventually boils down to what works best for your users. There is
	1272	no reason a system could not be configured (with a script) to
	1273	analyze a user's *.stats file and determine the best training mode
	1274	for that user.
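A script of that kind might map a user's spam ratio onto the recommendations above as follows. This is a sketch under assumptions: it takes message counts directly rather than reading a *.stats file (whose format is not described here), counts spam as TP + FN and innocent as TN + FP, and returns None for ratios between 85% and 90%, which the recommendations do not cover.

```python
def recommend_mode(tp, tn, fn, fp):
    """Suggest a training mode from a user's classification counts."""
    spam, innocent = tp + fn, tn + fp
    total = spam + innocent
    if total == 0:
        return "teft"  # new users begin with TEFT
    ratio = spam / total
    if ratio > 0.90:
        return "toe"
    if 0.75 <= ratio <= 0.85:
        return "tum"
    if ratio < 0.75:
        return "teft"
    return None  # 85-90%: no recommendation given
```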
	1275
	1276	--feature=[no,wh,tb=N]
	1277	Specifies the features that should be activated for this filter instance.
	1278	The following features may be used individually or combined using a comma
	1279	as a delimiter:
	1280
	1281	no: Bayesian Noise Reduction (BNR). Bayesian Noise Reduction kicks in
	1282	at 2500 innocent messages and provides an advanced progressive
	1283	noise logic to reduce Bayesian Noise (wordlist attacks) in
	1284	spams. BNR is not for everyone, and so users should try it out
	1285	after they've trained to see if it helps improve accuracy.
	1286
	1287	tb=N: Sets the training loop buffering level.
	1288	Training loop buffering is the amount of statistical sedation
	1289	performed to water down statistics and avoid false positives
	1290	during the user's training loop. The training buffer sets the
buffer sensitivity, and should be a number from 0 (no buffering
	1292	whatsoever) to 10 (heavy buffering). The default is 5, half of
	1293	what previous versions of DSPAM used.
	1294	To avoid dulling down statistics at all during the training loop,
	1295	set this to 0. This feature should be disabled if you're not
	1296	paranoid about false positives, as it does increase the number of
	1297	spam misses significantly during training.
	1298
	1299	wh: Automatic whitelisting. DSPAM will keep track of the entire
	1300	"From:" line for each message received per user, and automatically
	1301	whitelist messages from senders with more than 10 innocent
	1302	messages and zero spams. Once the user reports a spam from the
sender, automatic whitelisting will be deactivated
	1304	for that sender. Since DSPAM uses the entire "From:" line, and
	1305	not just the sender's email address, automatic whitelisting is
	1306	a very safe approach to improving accuracy during initial training.
	1307
	1308	NOTE: None of the present features are necessary when the source is "error",
	1309	because the original training data is used from the signature to
	1310	retrain, instantiating whatever features (such as whitelisting) were
	1311	active at the time of the initial classification. Since BNR is only
	1312	necessary when a message is being classified, the
	1313	--feature flag can be safely omitted from error source calls.
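The automatic whitelisting rule described under wh can be sketched as a per-user counter keyed on the full From: line. The class below is a hypothetical illustration of the stated rule (more than 10 innocent messages and zero spams), not DSPAM's implementation.

```python
class AutoWhitelist:
    """Per-user sender tracking keyed on the entire From: line."""

    def __init__(self, threshold=10):
        self.threshold = threshold
        self.innocent = {}
        self.spam = {}

    def record(self, from_line, is_spam):
        counts = self.spam if is_spam else self.innocent
        counts[from_line] = counts.get(from_line, 0) + 1

    def is_whitelisted(self, from_line):
        # More than `threshold` innocent messages and no spam at all;
        # a single reported spam disables whitelisting for this From: line.
        return (self.innocent.get(from_line, 0) > self.threshold
                and self.spam.get(from_line, 0) == 0)
```

Because the key is the whole From: line, a forged message that only reuses the bare address (for example "bob@example.org" instead of "Bob <bob@example.org>") does not inherit the whitelisting.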
	1314
	1315	--daemon
Puts DSPAM in daemon mode; i.e. DSPAM acts like a server when started with
	1317	this parameter. See section 2.3 for more information about daemon mode.
	1318
	1319	2.0 LINKING WITH LIBDSPAM
	1320
	1321	Developers are able to link to the DSPAM core engine (libdspam) to provide
	1322	"drop-in" spam-filtering for their applications. Examples of the libdspam
	1323	API can be found in the example.c file included with this distribution.
	1324
	1325	<COMMERCIAL LICENSING>
	1326
	1327	IF YOUR PROJECT USES THE LIBDSPAM API, A GPL-COMPATIBLE OPEN SOURCE LICENSE
	1328	IS REQUIRED IN ORDER TO REDISTRIBUTE. IF YOU ARE DEVELOPING A CLOSED-SOURCE
	1329	APPLICATION OR APPLICATION THAT DOES NOT CONFORM TO GPL STANDARD, YOU MAY
	1330	NOT REDISTRIBUTE ANY APPLICATIONS USING LIBDSPAM WITHOUT A COMMERCIAL
	1331	LICENSE.
	1332
	1333	Please contact project administrators paulcockings@users.sourceforge.net
	1334	or sbajic@users.sourceforge.net for information about commercial licensing.
	1335
	1336	</COMMERCIAL LICENSING>
	1337
	1338	To link to libdspam, follow the instructions for compiling and installing
	1339	DSPAM. When compiled, the libdspam static and shared libraries are also
	1340	built. This library contains all the functions necessary to use dspam's
	1341	filtering in your application.
	1342
	1343	Your application will also need to link to the correct storage driver
	1344	libraries. If you are using libdspam in a multithreaded application, you
	1345	will need to either use a thread-safe storage driver or control access to
	1346	libdspam using a mutex lock.
	1347
	1348	If you are using libdspam in a multithreaded environment, each thread will
	1349	require its own DSPAM context. Fortunately, you can attach the same
	1350	database handle to each context using dspam_attach(). See the man page for
	1351	more information.
	1352
	1353	To build with the dspam API, you will also need the header files from
	1354	the distribution. You can copy these to /usr/include/dspam for ease of
use, and then compile with -I/usr/include/dspam.
	1356
	1357	Please see example.c for API examples.
	1358
	1359	If you are interested in linking libdspam with your project and have
	1360	questions or concerns, please contact the dspam-devel@lists.sourceforge.net
	1361	mailing list.
	1362
	1363	2.1 CONFIGURING GROUPS
	1364
	1365	Groups enable a group of users to share information.
	1366
	1367	To create groups, you'll want to create a group configuration file. The location
	1368	of this file is defined as GroupConfig in dspam.conf, and defaults to
	1369	/usr/local/var/dspam/group. The format of the file is:
	1370
	1371	group1:type:user1,user2,user3
	1372	group2:type:*globaluser
	1373
	1374	DSPAM will read this file upon startup and determine if the user fits into
	1375	any particular group.
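
The line format above can be illustrated with a small C sketch that splits one GroupConfig line into its name, type and member-list fields. This is not DSPAM's own parser: struct group_entry, parse_group_line() and the field size limits are hypothetical. The sketch assumes the type field never contains a ':' (combined types such as "shared,managed" use a comma).

```c
#include <assert.h>
#include <string.h>

/* One parsed GroupConfig line (hypothetical struct for this sketch). */
struct group_entry {
    char name[64];
    char type[64];      /* e.g. "shared", "shared,managed", "inoculation" */
    char members[512];  /* raw comma-separated member list */
};

/* Copy len bytes of src into dst (capacity dstsz) and NUL-terminate it.
   Returns 0 on success, -1 if the field does not fit. */
static int copy_field(char *dst, size_t dstsz, const char *src, size_t len)
{
    if (len >= dstsz)
        return -1;
    memcpy(dst, src, len);
    dst[len] = '\0';
    return 0;
}

/* Split "name:type:members" at the first two colons.
   Returns 0 on success, -1 on a malformed or oversized line. */
int parse_group_line(const char *line, struct group_entry *out)
{
    assert(line != NULL && out != NULL);
    const char *c1 = strchr(line, ':');
    const char *c2 = c1 ? strchr(c1 + 1, ':') : NULL;
    if (c1 == NULL || c2 == NULL || c1 == line)
        return -1;                      /* missing field or empty name */

    /* The member list runs to the end of the line, minus any newline. */
    size_t mlen = strcspn(c2 + 1, "\r\n");

    if (copy_field(out->name, sizeof out->name, line, (size_t)(c1 - line)) != 0 ||
        copy_field(out->type, sizeof out->type, c1 + 1, (size_t)(c2 - c1 - 1)) != 0 ||
        copy_field(out->members, sizeof out->members, c2 + 1, mlen) != 0)
        return -1;
    return 0;
}
```

Splitting only at the first two colons leaves the member list intact, so it can be walked as a comma-separated list afterwards.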
	1376
	1377	DSPAM supports the following group types:
	1378
	1379	SHARED
	1380	Enables users with similar email behavior to share the same dictionary
	1381	while still maintaining a private quarantine box. The benefits of this
type of group are faster learning and a single shared spam alias. Shared
	1383	groups can have both positive and negative effects on accuracy. If a shared
	1384	group consists of users with similar, predictable email behavior, the users
	1385	in the group can benefit from a larger dictionary of spam and faster
	1386	learning (especially for newcomers in the group). If a group consists of
	1387	users with different email behavior, however, the users in the group will
	1388	experience poor spam filtering and a higher number of false positives.
	1389
NOTE: The SQL-based storage drivers support shared groups, with one caveat:
	1391	If you are NOT enabling "virtual users" support, you will need to create
	1392	an actual user on your system named after each group you create.
	1393
A shared group can also be made 'managed'. Using the group type
'SHARED,MANAGED' causes the group to share a single quarantine mailbox,
which can be managed by the group's administrator (that is, the user named
after the group). This enables one individual to monitor the quarantine for
the entire group; however, that individual may also be able to view personal
emails that were falsely marked as spam. For this reason, managed groups
should only be used where this is not an issue.
	1401
	1402	NOTE: Use the dspam_stats tool to keep an eye on the effectiveness of
	1403	shared groups. If a shared group experiences poor performance, find
	1404	the users whose email behavior is inconsistent with that of the group
	1405	and remove them from the group.
	1406
	1407	The format for a shared or shared,managed group is:
	1408
	1409	group1:shared:user1,user2,userN
	1410	group2:shared,managed:user1,user2,userN
	1411	group3:shared:*@example.org
	1412	group4:shared:*
	1413
The group name (in the example above 'group1', 'group2', 'group3', 'group4')
can be anything you like. If the shared group is managed, the group name (in
the example above 'group2') is used by DSPAM as the shared group
administrator.
	1418
The user/member list for a shared group allows the following syntax:
user1 : Exact match of the user named "user1"
* : Match any user
*@example.org : Match any user whose username ends in '@example.org'.
A wildcard is only recognized directly before the '@'
character. You cannot use something like '*user' to include
users 'infouser', 'testuser', 'dummyuser', etc.
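
The three member rules can be expressed as a small matching function. This is a sketch that assumes each pattern is matched on its own; shared_member_matches() is a hypothetical name, not part of libdspam.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True if user ends with suffix. */
static bool ends_with(const char *user, const char *suffix)
{
    size_t ulen = strlen(user), slen = strlen(suffix);
    return ulen >= slen && strcmp(user + ulen - slen, suffix) == 0;
}

/* Match one shared-group member pattern against a username. */
bool shared_member_matches(const char *pattern, const char *user)
{
    assert(pattern != NULL && user != NULL);
    if (strcmp(pattern, "*") == 0)
        return true;                          /* "*": any user */
    if (pattern[0] == '*' && pattern[1] == '@')
        return ends_with(user, pattern + 1);  /* "*@domain": suffix match */
    return strcmp(pattern, user) == 0;        /* anything else is literal */
}
```

Note that a pattern such as '*user' falls through to the literal comparison, which is why it cannot be used to match 'infouser'.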
	1426
	1427	INOCULATION
	1428	An inoculation group allows users to maintain their own private dictionaries
	1429	with their own spam alias, but all members of the group will inoculate other
	1430	members with spams they manually forward into their alias. This allows users
	1431	to report spams to one another and maintain their own private dictionary.
	1432	Another advantage to this is that users do not necessarily have to share the
	1433	same email behavior.
	1434
	1435	VERSATILE LANGUAGE INOCULATION MESSAGES
	1436
A new Internet-Draft has been released to the public that proposes a message
format standard for sending inoculation data via email:

http://tools.ietf.org/html/draft-spamfilt-inoculation-01
http://tools.ietf.org/html/draft-yerazunis-spamfilt-inoculation-03

This will allow users on different servers, and even users of different
anti-spam tools, to share inoculation information with one another.
	1445
	1446	DSPAM presently implements support for this message standard with the
	1447	following limitations:
	1448
	1449	- Only inbound inoculation messages are supported. DSPAM does not yet send
	1450	out inoculations using this message format. This should not be confused
	1451	with local inoculation, which is supported.
	1452
- The message/inoculation format is the only inoculation type presently
supported. Support for text/inoculation and multipart/inoculation is planned.

- Presently, the only supported authentication mechanism is MD5
verification codes (checksums).
	1458
	1459	Any unsupported inoculations will simply be dropped.
	1460
A list of sender identities and authentication information can be set up in
the file [username].inoc, or in a .inoc file in the user's home directory if
homedir-dotfiles is enabled. The format of this file is:
	1464
	1465	sender1:shared secret
	1466	sender2:shared secret
	1467
	1468	Each sender should specify the correct sender id when sending an
	1469	inoculation, and should generate their checksum based on the shared secret
	1470	established between both parties.
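
Looking up the shared secret for one sender could work like the following sketch. It only locates the secret; computing and comparing the MD5 checksum is left out. inoc_find_secret() is a hypothetical helper, and it assumes one "sender:secret" pair per line, with the secret running to the end of the line (so it may contain spaces).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Scan an open .inoc file for "sender:shared secret" and copy the secret
   for the given sender into out. Returns 0 if found, -1 otherwise. */
int inoc_find_secret(FILE *fp, const char *sender, char *out, size_t outsz)
{
    char line[1024];
    size_t slen = strlen(sender);

    assert(fp != NULL && out != NULL && outsz > 0);
    while (fgets(line, sizeof line, fp) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';        /* strip the newline */
        /* The sender id must match exactly and be followed by ':'. */
        if (strncmp(line, sender, slen) == 0 && line[slen] == ':') {
            const char *secret = line + slen + 1;
            if (strlen(secret) >= outsz)
                return -1;                          /* buffer too small */
            strcpy(out, secret);
            return 0;
        }
    }
    return -1;
}
```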
	1471
	1472	NOTE: Users should only be added to an inoculation group after their initial
	1473	learning period, to avoid potential false positives due to lack of data.
	1474
The format for an inoculation group is:
	1476
	1477	group1:inoculation:user1,user2,userN
	1478	group2:inoculation:user3,user4,userN
	1479
The group name (in the example above 'group1', 'group2') can be anything you
like. It is not used by DSPAM and does not even have to be unique.
	1482
	1483	The user/member list for inoculation group allows the following syntax:
	1484	user1 : Exact match of user with the name "user1"
	1485
	1486	CLASSIFICATION
Classification groups allow a group of users to network their results
together. If DSPAM is uncertain whether a message is spam or nonspam for
a group member, the other members of the group are queried. If another member
believes the message to be spam, it will be marked as spam. DSPAM queries
the members one by one and stops as soon as one member reports that it
believes the message is spam.
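
The query order described above amounts to a short-circuiting loop. In this sketch, enum verdict, member_query_fn and classify_with_group() are hypothetical, and stub_query() is a demonstration stand-in for a real dictionary lookup. The text does not say what happens when no member reports spam; the sketch assumes the message is then treated as innocent.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical verdict type for this sketch. */
enum verdict { VERDICT_INNOCENT, VERDICT_SPAM, VERDICT_UNCERTAIN };

/* Hypothetical hook: ask one member's dictionary about a message. */
typedef enum verdict (*member_query_fn)(const char *member, const char *msg);

/* Resolve a result for one group member. Only an uncertain result consults
   the other members, in order, stopping at the first that reports spam. */
enum verdict classify_with_group(enum verdict own,
                                 const char *const *others, size_t n,
                                 member_query_fn query, const char *msg)
{
    assert(query != NULL);
    if (own != VERDICT_UNCERTAIN)
        return own;
    for (size_t i = 0; i < n; i++)
        if (query(others[i], msg) == VERDICT_SPAM)
            return VERDICT_SPAM;       /* first spam report ends the search */
    return VERDICT_INNOCENT;           /* ASSUMPTION: no spam report -> innocent */
}

/* Demonstration stand-in for a real dictionary lookup: counts its calls and
   pretends that only member "carol" recognizes the message as spam. */
static int stub_calls = 0;

static enum verdict stub_query(const char *member, const char *msg)
{
    (void)msg;
    stub_calls++;
    return strcmp(member, "carol") == 0 ? VERDICT_SPAM : VERDICT_INNOCENT;
}
```

With the members "bob", "carol" and "dave", an uncertain message is resolved as spam after two queries; "dave" is never consulted.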
	1493
	1494	The format for a classification group is:
	1495
	1496	group1:classification:user1,user2,userN
	1497	group2:classification:user3,user4,userN
	1498
The group name (in the example above 'group1', 'group2') can be anything you
like. It is not used by DSPAM and does not even have to be unique.

The user/member list for a classification group allows the following syntax:
user1 : Exact match of user with the name "user1"
	1504
	1505	GLOBAL
Global groups allow DSPAM to provide "SpamAssassin-style out-of-the-box
filtering" for all new users until they have built their own useful
dictionaries. A global group is created by adding a CLASSIFICATION
group definition (see above) and prefixing the group member/user with a '*'.
	1510
	1511	The format for a global classification group is:
	1512
	1513	groupname:classification:*globaluser
	1514
This will automatically add user globaluser as a classification peer to all
users. Any user who has fewer than 1000 innocent messages or fewer than 250
spam messages in their corpus, or whose filter is uncertain (confidence less
than 0.65) about a particular message, will consult the globaluser dictionary
for an answer.
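
The consultation rule above reduces to a single predicate. This sketch hard-codes the thresholds quoted in the text and reads "less than 1000 innocent messages or 250 spam messages" as two independent conditions; consult_global_user() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

/* Thresholds quoted in the text (hard-coded for illustration). */
#define GLOBAL_MIN_INNOCENT   1000
#define GLOBAL_MIN_SPAM        250
#define GLOBAL_MIN_CONFIDENCE  0.65

/* Decide whether a user's result should be checked against the global user. */
bool consult_global_user(long innocent_learned, long spam_learned,
                         double confidence)
{
    assert(innocent_learned >= 0 && spam_learned >= 0);
    if (innocent_learned < GLOBAL_MIN_INNOCENT || spam_learned < GLOBAL_MIN_SPAM)
        return true;                            /* corpus still too small */
    return confidence < GLOBAL_MIN_CONFIDENCE;  /* filter is uncertain */
}
```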
	1520
The global group user (in this case 'globaluser') will need to be trained
using a corpus, the dspam_merge tool, or other means. It is treated just as
any other user on the system.
	1525
The group name (in the example above 'groupname') can be anything you like. It
is not used by DSPAM and does not even have to be unique.
	1528
NOTE: Be sure to set your global user's preferences so that trainingMode
is set to TOE. This will prevent the purge tools you use from emptying the
global user's dictionary within 90 days.
	1532
	1533	MERGED
Merged groups are similar to global groups in that the entire system uses a
single global user as a parent. The difference is that the merged group's data
is merged with the individual user's training data at run-time, instead of
DSPAM switching between the two. This allows the merged group to be treated as
a base dataset for all users, and provides quicker learning and correction
than global groups. It is recommended that merged groups be used only with
TOE-mode training so that only corrective data is stored, but systems with
ample disk space may wish to run in TUM mode to learn each user's behavior
dynamically.
	1543
	1544	The group's data is merged with the user's data in real-time, so if you have:
	1545
	1546	Group : Viagra = 10 Spam Hits, 0 Innocent Hits
	1547	User1 : Viagra = 5 Spam Hits, 15 Innocent Hits
	1548	User2 : Viagra = 20 Spam Hits, 1 Innocent Hits
	1549
	1550	Then the token is loaded as:
	1551	User1 : Viagra = 15 Spam Hits, 15 Innocent Hits = 0.50 (50%) = neutral
	1552	User2 : Viagra = 30 Spam Hits, 1 Innocent Hits
	1553
No data is written to the group by DSPAM; only the user's data is written.
The user's data then offsets the group's data without affecting other users.
Because of the way this data is merged, it is not recommended that you update
the merged group with more than a handful of messages periodically, as any
change to the group affects the statistics of every user.
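
The run-time merge in the example is a per-token addition of counts. The sketch below reproduces the figures above; struct token_hits, merge_token() and naive_spam_ratio() are hypothetical. The ratio spam/(spam+innocent) is used only to reproduce the 0.50 figure; DSPAM's actual token probability calculation may differ.

```c
#include <assert.h>

/* Spam/innocent hit counts for one token (hypothetical struct). */
struct token_hits {
    long spam;
    long innocent;
};

/* Combine group and user counts at load time; nothing is written back. */
struct token_hits merge_token(struct token_hits group, struct token_hits user)
{
    struct token_hits merged = { group.spam + user.spam,
                                 group.innocent + user.innocent };
    return merged;
}

/* Naive spam ratio, for illustrating the example only. */
double naive_spam_ratio(struct token_hits t)
{
    long total = t.spam + t.innocent;
    assert(total > 0);
    return (double)t.spam / (double)total;
}
```

Merging the group's {10, 0} with User1's {5, 15} gives {15, 15} and a ratio of exactly 0.5; merging with User2's {20, 1} gives {30, 1}.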
	1559
	1560	The format for a merged group is:
	1561
	1562	group1:merged:user1,user2,userN
	1563	group2:merged:user3,user4,userN
	1564
	1565	The group name (in the example above 'group1', 'group2') can be anything you
	1566	like and represents the name of the group user to merge with all members of
	1567	the group. DSPAM will use that group name (in the example above 'group1',
	1568	'group2') and merge at run-time the tokens from that group name with the tokens
	1569	of the user (if the user is member of the merged group).
	1570
The user/member list for a merged group allows the following syntax:
user1 : exact match of the user named "user1"
-user1 : exclude the user named "user1"
* : match any user
*@example.org : match users whose username ends in "@example.org".
A wildcard is only recognized directly before the '@'
character. You cannot use something like '*user' to include
users 'infouser', 'testuser', 'dummyuser', etc.
-*@example.org : exclude users whose username ends in "@example.org".
A wildcard is only recognized directly before the '@'
character. You cannot use something like '-*user' to exclude
users 'infouser', 'testuser', 'dummyuser', etc.
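
A member list mixing inclusions and exclusions can be evaluated in one pass over the comma-separated entries. The text does not say how DSPAM orders these rules; this sketch assumes that any matching exclusion wins regardless of its position in the list. merged_member_of() and pattern_matches() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Match one pattern of length plen (without a leading '-') against user. */
static bool pattern_matches(const char *p, size_t plen, const char *user)
{
    size_t ulen = strlen(user);
    if (plen == 1 && p[0] == '*')
        return true;                                   /* "*" */
    if (plen >= 2 && p[0] == '*' && p[1] == '@')       /* "*@domain" */
        return ulen >= plen - 1 &&
               memcmp(user + ulen - (plen - 1), p + 1, plen - 1) == 0;
    return ulen == plen && memcmp(user, p, plen) == 0; /* exact name */
}

/* Walk a list such as "*@example.org,-bob@example.org".
   ASSUMPTION: a matching exclusion ('-' prefix) always wins. */
bool merged_member_of(const char *list, const char *user)
{
    bool included = false;
    const char *p = list;

    assert(list != NULL && user != NULL);
    while (*p != '\0') {
        size_t len = strcspn(p, ",");
        bool exclude = (len > 0 && p[0] == '-');
        const char *pat = exclude ? p + 1 : p;
        size_t plen = exclude ? len - 1 : len;

        if (plen > 0 && pattern_matches(pat, plen, user)) {
            if (exclude)
                return false;          /* excluded: stop immediately */
            included = true;
        }
        p += len;
        if (*p == ',')
            p++;                       /* skip the separator */
    }
    return included;
}
```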
	1583
NOTE: Merged Groups are great for providing out-of-the-box adaptive filtering,
but allowing users to build their own data from scratch will still
result in the best possible accuracy in the long run.
	1587
NOTE: Be sure to set your group user's preferences so that trainingMode is
set to TOE. This will prevent the purge tools you use from emptying the
group user's dictionary within 90 days.
	1591
	1592	RESTRICTIONS!
	1593
A user can simultaneously be a member of multiple classification or global
groups and multiple inoculation groups, but a user who belongs to any
classification, global, or inoculation group cannot also be a member of a
shared or shared,managed group.

A user cannot be a member of:
	1600	* both a classification group and a global group
	1601	* multiple merged groups
	1602	* multiple shared or shared,managed groups
	1603	* both a shared group or shared,managed group and a merged group
	1604
	1605	2.2 EXTERNAL INOCULATION THEORY
	1606
	1607	Bill Yerazunis recently expressed his theory of inoculation on an anti-spam
	1608	development list, using the term "vaccination":
	1609
	1610	"Part of the problem is that spam isn't stationary, it evolves. That
	1611	pesky .1% error rate is in some part due to the base mutation rate of spam
	1612	itself. Maybe the answer is "vaccination". Vaccination is using _one_
	1613	person's misery be used to generate some protective agent that protects the
	1614	rest of the population; only the first person to get the spam actually has
	1615	to read it.
	1616
	1617	My expectation is this: say you have ten friends, and you all agree to share
	1618	your training errors. Each of you will (statistically) expect to be the
	1619	first to see a new mutation of spam about 9% of the time; the other ten
	1620	friends in this group will have their bayesian filter trained preemptively
	1621	to prevent this. Net result: you get a tenfold decrease in error rate -
	1622	down to 99.99% accuracy. With a hundred such (trusted) friends, you may be
	1623	down to 99.999% accuracy."
	1624
	1625	DSPAM has taken this concept and rolled it into support for what we call
	1626	"inoculation groups" providing the exact functionality Bill describes. This
	1627	could be considered an "internal inoculation" practice.
	1628
On top of this, DSPAM has been designed to support external inoculation as
a complement to internal inoculation. Instead of relying on your internal
circle of friends to inoculate you, you rely on external elements (namely,
spammers themselves) to inoculate you.
	1633
	1634	The theory behind external inoculation is this: why put _anyone_ through
	1635	the misery of being the first to receive a new spam when you can have
the spammers themselves send it directly to you? On top of this,
	1637	external inoculation can be combined with internal inoculation by taking
	1638	the spam you received externally and inoculating your friends with it
	1639	internally.
	1640
	1641	Inoculation is a little different from learning, as inoculation causes
	1642	tokens to be given additional hit counts in an attempt to learn from a
	1643	single email. As a result, any form of inoculation should _only_ be
	1644	attempted after an initial learning phase (perhaps when your filtering
	1645	accuracy exceeds 99.0%). DSPAM inoculates like this:
	1646
1. Every token that does not already exist in the database, or that has
fewer than two hits, is hit five times.
	1649
	1650	2. All other tokens are hit twice.
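
The two inoculation rules can be written as a per-token increment. This sketch assumes a "hit" means incrementing the token's counter for the inoculated class (spam, for a spam inoculation) and that a token absent from the database starts at zero; inoculation_hits() and inoculate_tokens() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Extra hits for one token under the two rules above.
   existing_hits is 0 for a token not yet in the database. */
long inoculation_hits(long existing_hits)
{
    assert(existing_hits >= 0);
    return existing_hits < 2 ? 5 : 2;
}

/* Apply the rules to a batch of token counters in place. */
void inoculate_tokens(long *hits, size_t n)
{
    for (size_t i = 0; i < n; i++)
        hits[i] += inoculation_hits(hits[i]);
}
```

Counters of {0, 1, 2, 40} become {5, 6, 4, 42}: rare tokens get a large boost, while well-known tokens are only nudged.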
	1651
	1652	External inoculation is accomplished by creating a covert, external alias
	1653	that is configured to automatically inoculate your dictionary from any
	1654	messages it receives. The covert alias can then be published onto a series
	1655	of public newsgroups and websites where it is sure to be harvested by
a spammer's tools. One could even proactively subscribe oneself to
several different opt-in spam lists, etcetera.
	1658
	1659	The first step is to configure an alias. To do this you would use something
	1660	like:
	1661
bob_c: "|/path/to/dspam --process --class=spam --source=inoculation --user bob"
	1663
The '_c' in bob_c stands for 'covert'. We must use a covert alias because if
we use something obvious like 'bob-spam', harvester tools will automatically
strip the -spam off and spam your real account.
	1667
	1668	Once the alias is set up, make sure this alias gets out only on lists where
	1669	harvesters will grab it, and nobody will send legitimate email to it.
	1670	It may even be a good idea to put it at the bottom of your tagline in all
	1671	your publicly archived emails, something like...
	1672
	1673	Spammers, send me mail here: bob_c@example.org
	1674
	1675	Finally, you can multiply the effects of this by sharing an inoculation
	1676	group with your friends. If all of your friends have a public covert
alias, then you will all be able to inoculate each other should one of you
	1678	receive a spam to the account. What a great way to train your filter!
	1679
	1680	On top of this, should external inoculation become commonplace to the
	1681	point where harvesters are picking up an equal amount of them as legitimate
	1682	email addresses, spammers will start to realize that harvesters are just
	1683	plain too dumb to tell the difference (the spammers themselves couldn't tell
if mine was or not). In the best-case scenario, this could put an end to
harvester bots, making them obsolete as counter-productive tools.
	1686
	1687	2.3 CLIENT/SERVER MODE
	1688
DSPAM supports two different modes of operation. In standard operating
mode, the DSPAM agent is called by the MTA (or proxy), and each agent process
operates independently, establishing its own connection to a database and
performing delivery on its own. The second operating mode, client/server mode,
allows the DSPAM agent to act more like a thin client, connecting to the
DSPAM server process, which then does all the work of analyzing and delivering
or quarantining the message. The advantages of using DSPAM in client/server
mode are:
	1697
	1698	- Maintaining a set of stateful database connections (within the server),
	1699	which should enhance performance on some systems by eliminating the need
	1700	to establish a new database connection for every message processed.
	1701
	1702	- Providing a central point of processing. Having one server perform all
	1703	processing and delivery, while having multiple thin clients on your mail
	1704	servers may be more desirable than having multiple agents performing
	1705	processing and delivery on all your servers.
	1706
	1707	- The DSPAM server speaks LMTP, which some implementations may be able to
take advantage of, eliminating the need for the DSPAM client altogether.
	1709
	1710	- Having a single multithreaded daemon should use less memory and other
	1711	resources than having independently operating clients.
	1712
	1713	If you've already got DSPAM set up, client/server mode won't require any
	1714	changes to your mail server's configuration - it's completely transparent.
	1715
	1716	The DSPAM agent can be compiled with client/server support by configuring
	1717	with --enable-daemon. You will need to use a multithread-safe storage driver
	1718	(presently mysql_drv, pgsql_drv and hash_drv are supported). Once you have
	1719	compiled with daemon support, you'll need to modify your dspam.conf to
	1720	provide the settings necessary for client/server mode:
	1721
	1722	ServerHost 127.0.0.1
	1723
The host to listen on. By default this setting is commented out, which
causes DSPAM to listen on all available interfaces.
	1726
	1727	ServerPort 24
	1728
	1729	The port to listen on. The default is 24, the LMTP port.
	1730
	1731	ServerQueueSize 32
	1732
	1733	The maximum number of connections which may remain backlogged before they
	1734	are accepted.
	1735
	1736	ServerPass.Relay1 "secret"
	1737	ServerPass.Relay2 "password"
	1738
Each client relay allowed to connect should have its own password, defined
here as ServerPass.[ident] as shown above.
	1741
	1742	The DSPAM server can listen on either a network socket or a local unix
	1743	domain socket. If you're running the client and server on the same machine,
	1744	a domain socket should be used as it eliminates additional overhead. To use
	1745	a domain socket, you'll also need to add the following option:
	1746
	1747	ServerDomainSocketPath "/tmp/dspam.sock"
	1748
	1749	Once you've configured the server config, you'll want to set the client
	1750	configuration on all client machines. If you are using network sockets,
	1751	set the following to appropriate values:
	1752
	1753	ClientHost 127.0.0.1
	1754	ClientPort 24
	1755
	1756	Or if using a domain socket:
	1757
	1758	ClientHost /tmp/dspam.sock
	1759
	1760	In both cases, you'll need to set the client's authentication ident:
	1761
	1762	ClientIdent "secret@Relay1"
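
The ClientIdent combines the password and the relay ident that the server looks up in its ServerPass.[ident] entries. A sketch of splitting it apart follows; split_client_ident() is hypothetical, and it assumes the split happens at the last '@' so that a password could itself contain '@'.

```c
#include <assert.h>
#include <string.h>

/* Split "secret@Relay1" into password and relay ident.
   ASSUMPTION: split at the last '@'. Returns 0 on success, -1 otherwise. */
int split_client_ident(const char *ident, char *pass, size_t passsz,
                       char *relay, size_t relaysz)
{
    assert(ident != NULL && pass != NULL && relay != NULL);
    const char *at = strrchr(ident, '@');
    if (at == NULL || at == ident || at[1] == '\0')
        return -1;                         /* missing password or relay */
    size_t plen = (size_t)(at - ident);
    size_t rlen = strlen(at + 1);
    if (plen >= passsz || rlen >= relaysz)
        return -1;                         /* buffers too small */
    memcpy(pass, ident, plen);
    pass[plen] = '\0';
    memcpy(relay, at + 1, rlen + 1);       /* includes the terminating NUL */
    return 0;
}
```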
	1763
	1764	Now you're ready to go. To start the DSPAM server, run:
	1765
	1766	dspam --daemon &
	1767
	1768	Or alternatively, if you have debugging enabled:
	1769
	1770	dspam --debug --daemon &
	1771
The DSPAM agent can then be called exactly as in standard
(non-client/server) mode, with --client added to its parameters. The
client settings will be loaded from dspam.conf, and the agent will act as
a thin client instead. Running dspam without --client will cause DSPAM to
revert to its normal non-daemon behavior and establish database connections
on its own. For example:
	1778
	1779	dspam --client --user dick jane --deliver=innocent -d %u
	1780
	1781	Alternatively, if you'd like to use a thinner client, dspamc is identical
	1782	to the dspam binary in behavior, but has been stripped down to only include
	1783	the lightweight client.
	1784
	1785	dspamc --user dick jane --deliver=innocent -d %u
	1786
	1787	The conversation that takes place between the client/server is LMTP-based,
	1788	and will look like this:
	1789
	1790	SERVER> 220 DSPAM DLMTP 3.10.0 Authentication Required
	1791	CLIENT> LHLO Relay1
	1792	SERVER> 250-PIPELINING
	1793	SERVER> 250-ENHANCEDSTATUSCODES
	1794	SERVER> 250-DSPAMPROCESSMODE
	1795	SERVER> 250 SIZE
	1796	CLIENT> MAIL FROM: <secret@Relay1> DSPAMPROCESSMODE="--deliver=innocent -d %u"
	1797	SERVER> 250 2.1.0 OK
	1798	CLIENT> RCPT TO: dick
	1799	SERVER> 250 2.1.5 OK
	1800	CLIENT> RCPT TO: jane
	1801	SERVER> 250 2.1.5 OK
	1802	CLIENT> DATA
	1803	SERVER> 354 Enter mail, end with "." on a line by itself
	1804	CLIENT> Subject: Cheap Viagra!
	1805	CLIENT>
	1806	CLIENT> Click Here: http://www.cheapviagra.example.org
	1807	CLIENT> .
	1808	SERVER> 250 2.0.0 <dick> Message accepted for delivery: INNOCENT
	1809	SERVER> 250 2.0.0 <jane> Message accepted for delivery: SPAM
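
In the exchange above, the client carries its ident in MAIL FROM and its own agent arguments in a DSPAMPROCESSMODE parameter, while each user becomes a separate RCPT TO. The sketch below formats that MAIL FROM line exactly as shown in the transcript. format_dlmtp_mail_from() is hypothetical, and since the transcript shows no escaping rules, it simply rejects arguments containing a double quote.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format the DLMTP MAIL FROM command seen in the transcript above.
   Returns the length written, or -1 if args contains '"' or the buffer
   is too small. */
int format_dlmtp_mail_from(char *buf, size_t bufsz,
                           const char *client_ident, const char *args)
{
    assert(buf != NULL && client_ident != NULL && args != NULL);
    if (strchr(args, '"') != NULL)
        return -1;                 /* no quoting/escaping in this sketch */
    int n = snprintf(buf, bufsz, "MAIL FROM: <%s> DSPAMPROCESSMODE=\"%s\"",
                     client_ident, args);
    if (n < 0 || (size_t)n >= bufsz)
        return -1;                 /* output error or truncation */
    return n;
}
```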
	1810
	1811	Optionally, if you'd like the clients to perform delivery, you can use
	1812	DSPAM's --stdout or --classify functionality to obtain a dump of the message
	1813	or results, respectively. From there, it's up to you and your MTA to
	1814	deliver the message. The DSPAM client will output the results to stdout in
	1815	this case, just as it would in standard operating mode.
	1816
	1817	Once the server is running, its configuration can be reloaded with a SIGHUP.
	1818	When the daemon is reloaded, the following occurs:
	1819
	1820	- The daemon stops listening for new requests
	1821	- All threads are allowed to finish processing and exit
	1822	- All connections to the database are closed
	1823	- The dspam.conf configuration is reloaded
	1824	- All connections to the database are re-opened
	1825	- The daemon starts listening for new requests
	1826
	1827	This allows database and listener configurations to also be reloaded from
	1828	dspam.conf without the need to interrupt the process.
	1829
	1830	NOTE: During the period of time the daemon is reloading, client connections
will fail. Depending on how the MTA reacts, this may cause messages to
be deferred (requeued) or bounced.
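
A common POSIX pattern for this kind of reload is a SIGHUP handler that only sets a flag, which the daemon's main loop polls before running the steps listed above. The sketch shows only that signaling part, not DSPAM's actual daemon code; on_sighup(), install_sighup_handler() and take_reload_request() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Set asynchronously by the handler; read by the main loop. */
static volatile sig_atomic_t reload_requested = 0;

static void on_sighup(int signo)
{
    (void)signo;
    reload_requested = 1;   /* only async-signal-safe work in a handler */
}

/* Install the SIGHUP handler. Returns 0 on success, -1 on failure. */
int install_sighup_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sighup;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGHUP, &sa, NULL);
}

/* Polled by the main loop: returns 1 once after a SIGHUP, at which point the
   caller would stop listening, drain threads, reload and reconnect. */
int take_reload_request(void)
{
    if (!reload_requested)
        return 0;
    reload_requested = 0;
    return 1;
}
```

Keeping the handler down to a single flag assignment avoids calling non-reentrant code (database drivers, configuration parsing) from signal context.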
	1833
	1834	2.4 LMTP
	1835
	1836	DSPAM supports LMTP both on the front-end and back-end (delivery). This
	1837	section will briefly provide instructions for configuring either or both of
	1838	these advanced options.
	1839
	1840	LMTP (AND SMTP) DELIVERY
	1841
	1842	DSPAM supports LMTP delivery for admins who would prefer to use this instead
	1843	of local delivery. While LMTP delivery doesn't _require_ operating in
	1844	daemon mode, it is necessary to compile DSPAM with --enable-daemon to take
	1845	advantage of LMTP delivery. To configure LMTP delivery, perform the following
	1846	steps:
	1847
	1848	1. Compile DSPAM with --enable-daemon to enable LMTP delivery code
	1849
	1850	2. Configure your DeliveryHost and DeliveryIdent in dspam.conf. Set
DeliveryProto based on whether you would like to deliver via LMTP or SMTP.
	1852
NOTE: If you would like to deliver to different hosts based on domain,
	1854	specify DeliveryHost.example.org as the configuration directive. Use
	1855	DeliveryPort.example.org to specify a port for the delivery.
	1856
	1857	3. Add the --lmtp-recipient flag to the arguments passed into DSPAM. This is
	1858	used to specify the destination address for the message. For example, in
	1859	postfix:
	1860
	1861	--lmtp-recipient=${recipient}
	1862
DSPAM will then connect to the specified host and deliver using a standard
LMTP dialog like:
	1865
	1866	LHLO [ident]
	1867	MAIL FROM:<> SIZE=[message_length]
	1868	RCPT TO: <recipient>
	1869	DATA
	1870	[Message]
	1871	.
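
The client side of that dialog can be sketched as follows. This is an illustration rather than DSPAM's delivery code: lmtp_write_client_side() is hypothetical, server replies are not read (a real client must check each reply code before continuing), and SIZE is simply the length of the input string. It writes CRLF line endings and doubles any leading '.' on a message line, as SMTP and LMTP require.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write the client commands of an LMTP delivery to out.
   Returns 0 on success, -1 on a stream error. */
int lmtp_write_client_side(FILE *out, const char *ident,
                           const char *recipient, const char *message)
{
    assert(out && ident && recipient && message);
    fprintf(out, "LHLO %s\r\n", ident);
    fprintf(out, "MAIL FROM:<> SIZE=%zu\r\n", strlen(message));
    fprintf(out, "RCPT TO:<%s>\r\n", recipient);
    fputs("DATA\r\n", out);

    const char *line = message;
    while (*line != '\0') {
        size_t len = strcspn(line, "\n");
        size_t wlen = (len > 0 && line[len - 1] == '\r') ? len - 1 : len;
        if (line[0] == '.')
            fputc('.', out);                /* dot-stuffing */
        fwrite(line, 1, wlen, out);
        fputs("\r\n", out);                 /* normalize to CRLF */
        line += len;
        if (*line == '\n')
            line++;
    }
    fputs(".\r\n", out);                    /* end of DATA */
    return ferror(out) ? -1 : 0;
}
```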
	1872
	1873	LMTP SERVER
	1874
	1875	DSPAM supports a "daemon" mode where it will sit and listen for inbound
	1876	connections. Depending on how the server is configured, DSPAM can speak
	1877	either standard LMTP (for interaction with a mail server, such as postfix)
	1878	or DLMTP (DSPAM LMTP) which is a proprietary implementation of LMTP between
	1879	the DSPAM client and server. If you plan on calling DSPAM from the commandline
	1880	via dspamc, but wish to have a stateful daemon perform processing, then
	1881	you'll want to use the "dspam" server mode. If you want to call DSPAM by
	1882	having your mail server connect to it via LMTP, then you'll need to specify
	1883	the "standard" server mode.
	1884
	1885	The ServerMode can be set in dspam.conf. Each mode has its own custom
	1886	tweaks and configurations that will need to be set in dspam.conf.
	1887
	1888	"dspam" mode settings.
	1889	In "dspam" mode, you'll need to set up authentication for each dspam client
	1890	relay. This involves configuring the relay ident and password. Examples are
	1891	provided.
	1892
	1893	"dspam" mode notes.
	1894	In dspam mode, only the dspam client will be connecting to your LMTP server.
	1895	This can be dspamc (a thin-client) or the dspam binary. In either case,
	1896	you'll need to specify --client to tell DSPAM to act as a client. DLMTP
	1897	allows the client to pass in any commandline arguments provided, so it should
function identically to running it as a dedicated (non-stateful)
process.
	1900
	1901	"standard" mode settings.
	1902	In "standard" mode, you will need to configure the ServerParameters flag to
	1903	reflect the commandline parameters you would normally want to pass to DSPAM.
	1904
	1905	"standard" mode notes.
Make sure that each recipient you send via LMTP is unique to a specific
user. This means that all of your aliases should be
	1908	resolved before the MTA relays to DSPAM. Because DSPAM uses the addresses in
	1909	the RCPT TO as usernames, _not_ resolving any aliases will result in
	1910	multiple databases being created for one user. Since the signature will be
	1911	different for each user, and since the message must be processed
	1912	differently for each user, DSPAM demultiplexes a multi-recipient email. This
	1913	means that while it can receive an email with multiple RCPT TO's specified, it
	1914	will perform delivery individually.
	1915
	1916	"auto" mode setting.
	1917	If you would like to support both connecting MTAs and remote dspam client
	1918	processes (such as for inoculations), you can set the server mode to auto,
	1919	which will base its dialect on the ident supplied in the LHLO. If the LHLO
	1920	ident matches an ident in dspam.conf's ServerPass section, the server will
	1921	default to DLMTP. Otherwise, DSPAM will assume the client is a standard
	1922	LMTP client and speak standard LMTP.
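
The "auto" mode decision described above is a lookup of the LHLO ident among the configured relay idents. In this sketch the ServerPass idents are passed as a plain array, exact case-sensitive comparison is assumed, and enum dialect and auto_mode_dialect() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum dialect { DIALECT_LMTP, DIALECT_DLMTP };

/* DLMTP if the LHLO ident names a configured ServerPass relay,
   standard LMTP otherwise. */
enum dialect auto_mode_dialect(const char *lhlo_ident,
                               const char *const *pass_idents, size_t n)
{
    assert(lhlo_ident != NULL);
    for (size_t i = 0; i < n; i++)
        if (strcmp(lhlo_ident, pass_idents[i]) == 0)
            return DIALECT_DLMTP;      /* known DSPAM client relay */
    return DIALECT_LMTP;               /* assume an ordinary MTA */
}
```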
	1923
	1924	LOCAL DELIVERY WITH LMTP FRONT-END
	1925
	1926	In some circumstances, you may want to relay to DSPAM via LMTP, but have
	1927	DSPAM deliver via LDA. In these cases, you may use the following
	1928	conventions in your ServerParameters configuration:
	1929
	1930	%r - The RCPT TO passed in via LMTP
	1931	%s - The MAIL FROM passed in via LMTP
	1932
	1933	In both cases, the content provided between < > is what is actually used.
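
The substitution can be sketched as a template expansion that takes only the text between '<' and '>' (or the whole value if it has no brackets). expand_server_params() is a hypothetical helper, and other %-sequences are passed through unchanged so that DSPAM-level placeholders survive.

```c
#include <assert.h>
#include <string.h>

/* Append the part of addr between '<' and '>' to dst at *pos.
   Returns -1 if dst would overflow. */
static int append_addr(char *dst, size_t dstsz, size_t *pos, const char *addr)
{
    const char *lt = strchr(addr, '<');
    const char *start = lt ? lt + 1 : addr;
    const char *gt = strchr(start, '>');
    size_t len = gt ? (size_t)(gt - start) : strlen(start);
    if (*pos + len >= dstsz)
        return -1;
    memcpy(dst + *pos, start, len);
    *pos += len;
    return 0;
}

/* Expand %r (RCPT TO) and %s (MAIL FROM) in a ServerParameters template. */
int expand_server_params(char *dst, size_t dstsz, const char *tmpl,
                         const char *rcpt_to, const char *mail_from)
{
    size_t pos = 0;
    assert(dst && dstsz > 0 && tmpl && rcpt_to && mail_from);
    for (const char *p = tmpl; *p != '\0'; p++) {
        if (p[0] == '%' && (p[1] == 'r' || p[1] == 's')) {
            if (append_addr(dst, dstsz, &pos, p[1] == 'r' ? rcpt_to : mail_from))
                return -1;
            p++;                        /* skip the letter after '%' */
            continue;
        }
        if (pos + 1 >= dstsz)
            return -1;
        dst[pos++] = *p;                /* copy everything else verbatim */
    }
    dst[pos] = '\0';
    return 0;
}
```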
	1934
	1935	2.5 DSPAM USER PREFERENCES
	1936
	1937	Preferences are settings that can be configured globally in dspam.conf or
	1938	for individual users via the dspam_admin command.
	1939
trainingMode { TOE | TUM | TEFT | NOTRAIN }
	1941	How DSPAM should train messages it analyzes. See section 1.5 --mode
	1942	(default:teft, see dspam.conf)
	1943
spamAction { quarantine | tag | deliver }
	1945	What to do with spam. The tag and deliver options both deliver, but tag
	1946	adds a special prefix to the subject, whereas deliver merely sets
	1947	X-DSPAM-Result. (default:quarantine)
	1948
	1949	spamSubject
	1950	A customized subject to prefix when spamAction=tag. (default:[SPAM])
	1951
	1952	statisticalSedation { 0 - 10 }
	1953	The level of dampening during training (0-10, 0 = no dampening, default:0)
	1954
enableBNR { on | off }
Enables or disables Bayesian noise reduction (default:off)
	1957
enableWhitelist { on | off }
	1959	Enables or disables automatic whitelisting (default:on)
	1960
signatureLocation { message | headers }
Where to place the DSPAM signature. Placement affects how messages must be
forwarded back to DSPAM for retraining. (default:message)
	1964
tagSpam / tagNonspam { on | off }
	1966	Adds a tagline to the end of a message based on its classification; useful
	1967	for things such as "Scanned by your ISP example.org". If set to on, the file
	1968	msgtag.spam and/or msgtag.nonspam will be looked for in "TxtDirectory"
	1969	(see dspam.conf) and appended to appropriate messages.
	1970
	1971	NOTE: Signed messages will not be tagged in this fashion
	1972
showFactors { on | off }
Whether to include an X-DSPAM-Factors header listing the decision-making
factors (clues). NOTE: This can break RFC compliance in some cases, and
should only be used for debugging. (default:off)
	1977
optIn / optOut { on | off }
	1979	Depending on whether the system is opt-in or opt-out, sets the user's
	1980	membership. If user is opted out (or not opted in), mail will be delivered
	1981	by DSPAM without being processed.
	1982
	1983	whitelistThreshold { Integer }
	1984	Overrides the default number of times a From: header has been seen before
	1985	it is automatically whitelisted. (default:10)
	1986
makeCorpus { on | off }
	1988	When activated, a maildir-style corpus is maintained in the user's data
	1989	directory (DSPAM_HOME/DATA/USERNAME), suitable for future retraining or
	1990	other analysis. (default:off)
	1991
storeFragments { on | off }
When activated, the first 1k of each message is temporarily stored on
the server for reference via the webui's history function. (default:off)
	1995
localStore { directory name }
	1997	Overrides the directory name used for the user's dspam data directory. This
	1998	is useful when using recipient addresses as usernames, as it will allow
	1999	all addresses belonging to a specific user to be written to a single
	2000	webui directory. (default:username)
	2001
processorBias { on | off }
	2003	Overrides the "bias" setting in dspam.conf, which biases mail as
	2004	innocent. (default:on, see dspam.conf)
	2005
fallbackDomain { on | off }
Allows a dspam user ("@example.org") to be marked as a fallback user for
the entire domain, so if the destination dspam user does not exist in
the database, the fallback user's database will be used. (default:off)
NOTE: You will need to set "FallbackDomains on" in dspam.conf to use this.
	2012
trainPristine { on | off }
Overrides the default signature mode and treats messages as if they were
in pristine format when retraining. This requires all retraining to use
the original message that was processed, as no dspam signature is stored
for pristine training. (default:off)
	2018
optOutClamAV { on | off }
	2020	Opts out of ClamAV virus scanning (if ClamAV is directly integrated with
	2021	dspam via dspam.conf). (default:off)
	2022
ignoreRBLLookups { on | off }
Overrides the "Lookup" setting in dspam.conf, which looks up senders' IP
addresses in a Realtime Blackhole List (RBL). (default:off)
	2026
RBLInoculate { on | off }
	2028	Overrides the "RBLInoculate" setting in dspam.conf, which inoculates mail
	2029	as spam if lookup result is positive. (default: depending on dspam.conf)
	2030
NOTE: This user preference takes precedence over the one set in dspam.conf.
If a user does not set this preference to on or off, whatever is set in
dspam.conf is used for that user.
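
The precedence rule in the NOTE above can be illustrated with a short
sketch: a user preference explicitly set to on or off wins, and otherwise
the dspam.conf value applies. This is a hypothetical Python illustration
only. The function name effective_setting and the use of plain dictionaries
for the user's preferences and for dspam.conf are assumptions, not DSPAM's
internal API.

```python
from typing import Mapping, Optional


def effective_setting(
    user_prefs: Mapping[str, str],
    dspam_conf: Mapping[str, str],
    name: str,
) -> Optional[str]:
    """Return the value of setting `name` that applies to one user.

    Hypothetical sketch: a user preference set to "on" or "off" overrides
    dspam.conf; otherwise the dspam.conf value is used (None if unset).
    """
    value = user_prefs.get(name)
    if value in ("on", "off"):
        return value
    return dspam_conf.get(name)


if __name__ == "__main__":
    conf = {"RBLInoculate": "on"}  # hypothetical dspam.conf content
    print(effective_setting({"RBLInoculate": "off"}, conf, "RBLInoculate"))  # off
    print(effective_setting({}, conf, "RBLInoculate"))  # on
```

Any value other than on or off is treated as "not set" in this sketch, so
the dspam.conf value is used for it.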

2.6 FALLBACK DOMAINS

Fallback domains allow you to default some or all users of a particular
domain to a single domain user; this lets you set preferences (including
opting out of filtering entirely) for users based on domain name. Any user
unknown to DSPAM is defaulted to the domain user of the domain it belongs
to, provided that domain is designated as a fallback domain. This means
that you can create bob@example.org and alice@example.org with their own
databases and preferences, but also default all other users to
@example.org. Alternatively, you could create just the domain user without
any other users and default all users to @example.org.

To use fallback domains, you'll first need to activate this feature in
dspam.conf:

FallbackDomains on

Next, you'll need to create a dspam user for each domain you wish to use
as a fallback domain, for example @example.org. Depending on your
implementation, this may be a simple insert into dspam_virtual_uids, or the
user may be created automatically when you set its preferences.

Finally, designate that special user as a fallback domain by setting a
preference:

dspam_admin ch pref @example.org fallbackDomain on

Any mail coming in for that domain that does _not_ match a known user in
dspam will now fall back to this user; you can then set specific preferences
for it or even opt the domain user out entirely. Alternatively, you can
create a domain-based database for filtering mail specific to that domain,
just as you would for a normal user.
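
The lookup order described in this section can be sketched as follows.
This is a hypothetical Python illustration, not DSPAM's code: the name
resolve_dspam_user, the set of known users and the per-user preference
dictionaries are assumptions made for the example. What DSPAM then does
with an unknown user that has no fallback is outside the scope of this
sketch.

```python
from typing import AbstractSet, Mapping


def resolve_dspam_user(
    recipient: str,
    known_users: AbstractSet[str],
    user_prefs: Mapping[str, Mapping[str, str]],
    fallback_domains_enabled: bool,
) -> str:
    """Return the dspam user whose data and preferences apply to `recipient`.

    Hypothetical sketch of the lookup order in section 2.6:
      1. a known user matching the recipient is used as-is;
      2. otherwise, if FallbackDomains is on and "@domain" is a known user
         whose fallbackDomain preference is "on", that domain user is used;
      3. otherwise the recipient is returned unchanged.
    """
    if recipient in known_users:
        return recipient
    _, at, domain = recipient.rpartition("@")
    if fallback_domains_enabled and at and domain:
        domain_user = "@" + domain
        prefs = user_prefs.get(domain_user, {})
        if domain_user in known_users and prefs.get("fallbackDomain") == "on":
            return domain_user
    return recipient


if __name__ == "__main__":
    users = {"bob@example.org", "alice@example.org", "@example.org"}
    prefs = {"@example.org": {"fallbackDomain": "on"}}
    print(resolve_dspam_user("bob@example.org", users, prefs, True))    # bob@example.org
    print(resolve_dspam_user("carol@example.org", users, prefs, True))  # @example.org
    print(resolve_dspam_user("carol@example.org", users, prefs, False)) # carol@example.org
```

Both switches are checked: the dspam.conf "FallbackDomains" setting and the
per-user "fallbackDomain" preference on the domain user.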

2.7 EXTERNAL USER LOOKUP

External User Lookup has two major applications. First, it allows DSPAM to
validate the supplied username in setups where users are opted in by
default and the MTA does no prior recipient checking. In those cases, DSPAM
can be configured not to create user entries automatically, which keeps
nonexistent users out of the DSPAM database.
The other application is username rewriting/mapping. This is needed when
you want to map several email addresses (aliases) to a single user
account, or when you integrate DSPAM into systems where users' email
addresses or usernames can change. It lets you define alternate, static
identifiers so that users keep their DSPAM dictionaries across username or
email address changes, without any dictionary maintenance.

Currently, there are three modes of operation and two backend lookup
drivers. The mode is set using the ExtLookupMode directive; the available
values are:

verify - Verifies that the supplied username exists in the lookup backend.
If it cannot be verified, DSPAM will not create the user entry in its
backend facilities.

map - Does NOT verify that the supplied username exists in the lookup
backend. It does, however, try to use the lookup backend to map (rewrite)
the username. If a map/rewrite is available, DSPAM uses the retrieved
username instead of the supplied one. If there is no map/rewrite available,
DSPAM uses the supplied username and creates the respective entries in its
backend.

strict - Enforces both verify AND map modes: it rewrites the username if a
rewrite is available, and creates the user entry in its backend only if
the map/rewrite was successful.
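
The three ExtLookupMode values can be compared side by side in a small
sketch. It is a hypothetical Python illustration: apply_ext_lookup and the
lookup callable (standing in for the LDAP or Program driver) are assumed
names, a return value of None means "do not create a user entry", and
"exists in the backend" is assumed to mean that the driver returned a
result.

```python
from typing import Callable, Optional

# Stand-in for the LDAP or Program driver: returns the mapped username,
# or None when the backend has no entry for the supplied name.
Lookup = Callable[[str], Optional[str]]


def apply_ext_lookup(mode: str, username: str, lookup: Lookup) -> Optional[str]:
    """Return the username DSPAM should use, or None if no entry is created.

    Hypothetical sketch of the three ExtLookupMode values.
    """
    mapped = lookup(username)
    if mode == "verify":
        # Existence check only: the supplied name is never rewritten.
        return username if mapped is not None else None
    if mode == "map":
        # No verification: rewrite when possible, else keep the supplied name.
        return mapped if mapped is not None else username
    if mode == "strict":
        # Both: a successful rewrite is required before an entry is created.
        return mapped
    raise ValueError(f"unknown ExtLookupMode: {mode!r}")


if __name__ == "__main__":
    # Hypothetical directory: two aliases mapped to one static identifier.
    directory = {"jdoe@example.org": "u1001", "john.doe@example.org": "u1001"}
    for mode in ("verify", "map", "strict"):
        print(mode,
              apply_ext_lookup(mode, "john.doe@example.org", directory.get),
              apply_ext_lookup(mode, "nobody@example.org", directory.get))
```

In this sketch only map mode falls back to the supplied username; verify
never rewrites, and strict requires a successful rewrite.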

Only two backend lookup drivers are available at the moment: LDAP and
Program. The LDAP driver allows DSPAM to query an LDAP server for a custom
attribute, defined by the ExtLookupLDAPAttribute directive. The query can
be fine-tuned using the ExtLookupQuery directive to provide a standard LDAP
filter, where %u is replaced by the username provided to DSPAM. A literal
percent sign can be used if escaped with another % sign, i.e., %% produces
a single % in the query filter.
The Program driver exists because not everyone uses LDAP. In this case,
the ExtLookupServer directive defines the custom program/script call, with
its arguments. Here too, %u stands for the provided username, and a
literal % is written by escaping the percent sign with another '%'. With
the Program driver, DSPAM uses whatever the program/script prints on the
first line of its output.
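
The %u/%% substitution shared by ExtLookupQuery and ExtLookupServer, and the
Program driver's use of the first output line, can be sketched as below.
This is a hypothetical Python illustration: the function names are
assumptions, and copying any other % sequence through unchanged is an
assumption about DSPAM's behaviour rather than documented fact.

```python
def expand_lookup_template(template: str, username: str) -> str:
    """Replace %u with `username` and %% with a literal %.

    Hypothetical sketch; any other % sequence is copied through unchanged,
    which is an assumption rather than documented DSPAM behaviour.
    """
    out = []
    i = 0
    while i < len(template):
        if template[i] == "%" and i + 1 < len(template):
            nxt = template[i + 1]
            if nxt == "u":
                out.append(username)
                i += 2
                continue
            if nxt == "%":
                out.append("%")
                i += 2
                continue
        out.append(template[i])
        i += 1
    return "".join(out)


def first_output_line(output: str) -> str:
    """Return the first line a Program driver printed, without its newline."""
    lines = output.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    # Hypothetical ExtLookupQuery filter and ExtLookupServer command line.
    print(expand_lookup_template("(mail=%u)", "bob@example.org"))
    print(expand_lookup_template("/usr/local/bin/lookup %u 100%%", "bob"))
    print(first_output_line("u1001\nignored second line\n"))
```

Because the input is scanned left to right, "%%u" yields the literal text
"%u" rather than the username. The sketch performs no LDAP filter escaping
of the username.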


3.0 BUGS, FEATURE REQUESTS

Please use our Bug Tracker on the SourceForge project page at
http://sourceforge.net/projects/dspam for the list of currently known bugs
and the proper reporting procedure.

In the same place you can request new features via the Feature Request
Tracker.

Please note that nothing under contrib/ is officially supported by the
DSPAM Project; it is supported by the respective authors. However, to help
the authors and to ease integration with DSPAM and release procedures, we
provide a bug tracker for each script/plugin at the same URL.

3.1 PORTS / PACKAGES

The DSPAM Project does not provide binary packages of DSPAM. Each
OS/distribution has its own contributors, who know their distribution's
policies, special guidelines, testing procedures, etc. best.

For packages/ports for various distributions, see the DSPAM Wiki at
http://sourceforge.net/apps/mediawiki/dspam/index.php?title=Main_Page or
read http://dspam.sourceforge.net for details.

If you wish to port DSPAM to another OS/distro/platform and need help, or
have patches you would like merged into the repository, please write to
the dspam-devel@lists.sourceforge.net mailing list.


Note:

In order to keep DSPAM unencumbered by intellectual property abuses, all
external contributors to the project are asked to release any rights to the
submission. This keeps the DSPAM project a healthy, unencumbered GPL project.
Please accompany your patch, code, or other submission with the following
statement. By submitting a patch to the project, you agree to be bound by
the terms of this statement whether or not it is specifically included in
the submission; however, we still require that it be attached to the
submission:

The author or authors of this submission hereby release any and all
copyright interest in this code, documentation, or other materials
included to the DSPAM project and its primary governors. We intend this
relinquishment of copyright interest in perpetuity of all present and
future rights to said submission under copyright law.

3.2 GIT ACCESS

The DSPAM source tree can be downloaded via read-only git access using the
following command:

git clone git://dspam.git.sourceforge.net/gitroot/dspam/dspam
