source: npl/mailserver/dspam/dspam-3.10.2/README @ c5c522c

Last change on this file since c5c522c was c5c522c, checked in by Edwin Eefting <edwin@datux.nl>, 9 years ago
initial commit, transferred from cleaned syn3 svn tree

1	DSPAM v3.10.2
2	COPYRIGHT (C) 2002-2012 DSPAM Project
3	http://dspam.sourceforge.net/
4
5	LICENSE
6
7	This program is free software: you can redistribute it and/or modify
8	it under the terms of the GNU Affero General Public License as
9	published by the Free Software Foundation, either version 3 of the
10	License, or (at your option) any later version.
11
12	This program is distributed in the hope that it will be useful,
13	but WITHOUT ANY WARRANTY; without even the implied warranty of
14	MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
15	GNU Affero General Public License for more details.
16
17	You should have received a copy of the GNU Affero General Public License
18	along with this program. If not, see <http://www.gnu.org/licenses/>.
19
20	CREDITS
21
22	Original Work By
23	Lead development till 3.8.0: Jonathan A. Zdziarski <jonathan@nuclearelephant.com>
24	Lead development after 3.8.0: Stevan Bajic <stevan@bajic.ch>
25	PostgreSQL driver: Rustam Aliyev <rustam@azernews.com>
26	External Lookup module: Hugo Monteiro <hugo.monteiro@fct.unl.pt>
27	Various:
28	Feb/2006 Cove Schneider <cove@wildpackets.com>
29	Jan/2006 Norman Maurer <nm@byteaction.de>
30
31	Your name is missing? Let us know with a reference to your commit, and we'll
32	add you to the list.
33
34	COPYRIGHT
35
36	As of 12 January 2009 the copyright is owned by the DSPAM Project, represented
37	by a team of people, including:
38	Alexander Prinsier
39	Dov Zamir
40	Hugo Monteiro
41	Ion-Mihai Tetcu
42	Paul Cockings
43	Stevan Bajic
44
45	TABLE OF CONTENTS
46
47	General DSPAM Information
48
49	1.0 About DSPAM
50	1.1 Installation and Configuration
51	1.2 Testing
52	1.3 Troubleshooting
53	1.4 DSPAM Tools
54	1.5 Agent Commandline Arguments
55
56	Advanced DSPAM functionality
57
58	2.0 Linking with libdspam
59	2.1 Configuring groups
60	2.2 External Inoculation Theory
61	2.3 Client/Server Mode
62	2.4 LMTP
63	2.5 DSPAM User Preferences
64	2.6 Fallback Domains
65	2.7 External User Lookup
66
67	Miscellaneous
68
69	3.0 Bugs, Feature Requests
70	3.1 Ports / Packages
71	3.2 GIT Access
72
73	1.0 ABOUT DSPAM
74
75	DSPAM is an open-source, freely available anti-spam solution designed to combat
76	unsolicited commercial email using advanced statistical analysis. In short,
77	DSPAM filters spam by learning what spam is and isn't. It does this by learning
78	each user's individual mail behavior. This allows DSPAM to provide
79	highly-accurate, personalized filtering for each user on even a large system
80	and provides an administratively maintenance-free solution capable of learning
81	each user's email behaviors with very few false positives.
82
83	While DSPAM is focused around spam filtering, many have found alternative
84	uses for all types of two-concept document classification.
85
86	DSPAM is rapidly gaining a large support forum and being used in many large-
87	scale implementations. Contributions to the project are welcome via the
88	dspam-dev mailing list or in the form of financial contributions.
89
90	Many of the foundational principles incorporated into this software were
91	contributed by Paul Graham's white paper on combatting spam, which can be
92	found at http://paulgraham.com/spam.html. Much research and development has
93	resulted in many new approaches being added onto the DSPAM project as well,
94	some of which are explained in white papers on the DSPAM home page.
95
96	DSPAM can be implemented as a total solution, or as a library that developers
97	link into their own projects via the DSPAM core engine (libdspam), in accordance
98	with the AGPL license agreement. This enables developers to incorporate libdspam as
99	a "drop-in" for instant spam filtering within their applications - such as mail
100	clients, other anti-spam tools, and so on.
101
102	PLEASE NOTE: DSPAM and libdspam are distributed under the AGPL license, not the
103	LGPL. Commercial licensing is available for those who seek to redistribute
104	DSPAM or some of DSPAM's components/libraries in their non-GPL products.
105	Please contact us for more information about commercial licensing.
106
107	The DSPAM package is split up into the following pieces:
108
109	DSPAM AGENT
110
111	The DSPAM agent is the command center for all shell and daemon operations.
112	If you're using DSPAM as a filtering solution, this is the 'dspam' (or dspamc)
113	binary you're likely going to be talking to via commandline.
114
115	LIBDSPAM: CORE ENGINE
116
117	The DSPAM core processing engine, also known as libdspam, provides all critical
118	spam filtering functions. The engine is embedded into other dspam components
119	(such as the agent) and is responsible for the actual filtering logic.
120	If you're not a developer, you don't need to be concerned with this component
121	as it is automatically compiled in with the build.
122
123	WEB UI
124
125	The Web UI (User Interface) is designed to allow end-users to review their
126	spam quarantine and history, graphs, and to delete their spam permanently.
127	They can also optionally use the quarantine to perform all of their training.
128	The UI also includes some basic administrative tools to change settings and
129	manage user quarantines.
130
131	TOOLS
132
133	Some basic tools have been provided to manage dictionaries, automate
134	corpus feeding, and perform other diagnostic operations related to DSPAM.
135	Some of these include dspam_train, dspam_stats, and dspam_dump.
136
137	HISTORY OF COPYRIGHT
138
139	Original work was done by Jonathan A. Zdziarski.
140
141	In 2006 the copyright was handed over to Sensory Networks.
142
143	In 2009 Sensory Networks handed over the full copyright to the DSPAM Project,
144	represented by a team of people, including:
145	Alexander Prinsier
146	Dov Zamir
147	Hugo Monteiro
148	Ion-Mihai Tetcu
149	Paul Cockings
150	Stevan Bajic
151
152	1.1 INSTALLATION
153
154	IMPLEMENTATION OPTIONS
155
156	There are many different ways to deploy DSPAM onto an existing network. The
157	most popular approaches are:
158
159	1. As a delivery agent proxy
160
161	When your mail server gets ready to deliver mail to a user's mailbox it calls
162	a delivery agent of some sort. On most UNIX systems, this is procmail, maildrop,
163	mail.local, or a similar tool. When used as a delivery proxy, the DSPAM agent
164	is called in place of your existing agent - or better put, it can masquerade
165	as the local delivery agent. DSPAM then processes the message and will call
166	the /real/ delivery agent to pass the good mail into the user's mailbox,
167	quarantining the bad mail. DSPAM can optionally tag and deliver both spam
168	and legitimate mail.
169
170	In the diagram below, MTA refers to Mail Transfer Agent, or your mail server
171	software: Postfix, Sendmail, Exim, etc. LDA refers to the Local Delivery
172	Agent: Procmail, Maildrop, etc.
173
174	BEFORE:
175
176	[MTA] ---> [LDA] ---> (User's Mailbox)
177
178	AFTER:
179
180	[MTA] ---> [DSPAM] ---> [LDA] ---> (User's Mailbox)
181	              \
182	               \--> [Quarantine]
183	[End User] ------> [Web UI]
184
185	2. As a POP3 Proxy
186
187	If you don't want to tinker with your existing mail server setup, DSPAM can
188	be combined with one of a few open source programs designed to act as a POP3
189	proxy. This means spam is filtered whenever the user checks their mail,
190	rather than when it is delivered. The benefit to this is that you can set up
191	a small machine on your network that will connect to your existing mail server,
192	so no integration is needed. It also allows your users to arbitrarily point their
193	mail client at it if they desire filtering. The drawback to this approach is
194	that the POP3 protocol has no way to tell the mail client that a message is
195	spam, and so the user will have to download the spam (tagged, of course).
196
197	BEFORE:
198
199	[End User] ---> [POP3 Server]
200
201	AFTER:
202
203	[End User] ---> [POP3 Proxy] <--> [DSPAM]
204	                      \
205	                       \--> [POP3 Server]
206
207	3. As an SMTP Relay
208
209	Newer versions of DSPAM have seen features that allow it to function more
210	easily as an SMTP relay. An SMTP relay sits in front of your existing mail
211	server (requiring no integration). To use an SMTP relay, the MX records for
212	your domains are repointed to the relay machine running DSPAM. DSPAM then
213	relays the good (and optionally bad) mail to the existing SMTP server. This
214	allows you to use DSPAM with even a Windows-based destination mail server
215	as no integration is necessary. See doc/relay.txt for one example of how to
216	do this with Postfix.
217
218	BEFORE:
219
220	{ Internet } ---> [Company Mail Server]
221
222	AFTER:
223
224	{ Internet } ---> [ Inbound SMTP Relay ] ---> [Company Mail Server]
225	( MTA <> DSPAM ) SMTP
226	\ or
227	\--> [Quarantine] LMTP
228	[End User] ------> [Web UI]
229
230	UPGRADING DSPAM
231
232	Please see the file UPGRADING
233
234	FRESH INSTALLATION
235
236	0. PREREQUISITES
237
238	DSPAM can use one of many different backends to store its information, and
239	you will need to decide on one and install the appropriate software before
240	you can build DSPAM. The following storage backends are presently available:
241
242	Driver Requirements
243	-------------------------------------------------------------------------
244	T mysql_drv: MySQL client libraries (and a server to connect to)
245	T pgsql_drv: PostgreSQL client libraries (and a server to connect to)
246	sqlite_drv: SQLite v2.7.7 or above (scheduled for removal)
247	sqlite3_drv: SQLite v3.x
248	*T hash_drv: None (Self-Contained Hash-Based Driver)
249
250	Legend:
251	* Default storage driver
252	T Thread-safe (Required for running DSPAM in server daemon mode)
253
254	In general, MySQL is one of the faster solutions with a smaller storage
255	footprint and is well suited for both small and large-scale implementations.
256
257	The hash driver (inspired by Bill Yerazunis' CRM Sparse Spectra algorithm)
258	is the fastest solution by far and requires no dependencies. It supports
259	an auto-extend feature to grow the file size as needed and is very
260	fast and compact. It does however lack some features (such as merged
261	groups support) and uses a lot of memory to mmap() users.
262
263	Also note that a database created with the hash driver is currently not safe
264	to move between 32/64 bit systems or big/little endian systems.
265
266	Documentation for any additional setup of your selected storage driver can
267	be found in the doc/ directory. You'll need to follow any steps outlined in
268	the storage driver documentation before continuing.
269
270	You can download MySQL from http://www.mysql.com.
271	You can download PostgreSQL from http://www.postgresql.com.
272	You can download SQLite from http://www.sqlite.org.
273
274	1. CONFIGURATION
275
276	DSPAM uses autoconf, so configuration is fairly standardised with other
277	UNIX-based software:
278
279	./configure [options]
280
281	DSPAM supports the configuration options below. Generally, the default
282	configuration is more than acceptable, so it's a good idea not to tweak too
283	many settings unless you know what you are doing.
284
285	PATH SWITCHES
286
287	--prefix=DIR
288	Specify an alternative root prefix for installation. The default is
289	/usr/local. This does not affect the location of dspam.conf (which
290	defaults to /etc). Use --sysconfdir= for this.
291
292	--sysconfdir=DIR
293	Specify an alternative home for the dspam.conf file. The default is /etc.
294
295	--with-dspam-home=DIR
296	Specify an alternative DSPAM home for installation. This can alternatively
297	be changed in dspam.conf, but is convenient to do on the configure line.
298	The default is $prefix/var/dspam, or /usr/local/var/dspam.
299
300	--with-logdir=DIR
301	Specify an alternative log directory. The default is $dspam_home/log. Do
302	not set this to /var/log unless DSPAM will have permissions to write to
303	the directory.
304
305	FILESYSTEM SCALE
306
307	The default filesystem scale is "small-scale", and writes each user to
308	its own directory in the top-level DSPAM home data directory.
309	The following two switches allow the scale to be changed to be more
310	suitable for larger installations.
311
312	--enable-large-scale
313	Switch for large-scale implementation. User data will be stored as
314	$HOME/data/u/s/user instead of $HOME/data/user
315
316	--enable-domain-scale
317	Switch for domain-scale implementation. When used, DSPAM expects
318	username@domain to be passed in as the user id and user data will be
319	stored as $HOME/data/example.org/user and $HOME/opt-in/example.org/user.dspam
320	instead of $HOME/data/user
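
As a rough illustration of the three layouts described above, the following
shell sketch prints where a user's data directory would land under each scale.
The helper name dspam_data_path and its arguments are made up for this example
and are not part of DSPAM. The large-scale branch assumes the two directory
levels are simply the first two characters of the user name, as the
$HOME/data/u/s/user example suggests, and that user names have at least two
characters.

```shell
#!/bin/sh
# Illustrative only: dspam_data_path is a hypothetical helper, not a DSPAM tool.
# Usage: dspam_data_path SCALE DSPAM_HOME USERID
dspam_data_path() {
    scale=$1 home=$2 id=$3
    case $scale in
        small)
            # Default layout: one directory per user under data/.
            printf '%s/data/%s\n' "$home" "$id" ;;
        large)
            # Assumed: first and second characters become directory levels.
            c1=$(printf '%s' "$id" | cut -c1)
            c2=$(printf '%s' "$id" | cut -c2)
            printf '%s/data/%s/%s/%s\n' "$home" "$c1" "$c2" "$id" ;;
        domain)
            # Expects user@domain; the domain part becomes a directory level.
            printf '%s/data/%s/%s\n' "$home" "${id##*@}" "${id%@*}" ;;
        *)
            echo "unknown scale: $scale" >&2
            return 1 ;;
    esac
}

dspam_data_path small  /usr/local/var/dspam user              # .../data/user
dspam_data_path large  /usr/local/var/dspam user              # .../data/u/s/user
dspam_data_path domain /usr/local/var/dspam user@example.org  # .../data/example.org/user
```

The point of the deeper layouts is simply to keep any single directory from
holding every user on a large installation.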
321
322	INTEGRATION SWITCHES
323
324	--with-storage-driver=DRIVER[,DRIVER2[...,DRIVERN]]
325	Specify your storage driver selection(s). A storage driver is a driver
326	written specifically for DSPAM to store tokens, signature data, and
327	perform other proprietary operations. The default driver is hash_drv.
328	The following drivers have been provided:
329
330	mysql_drv: MySQL Drivers
331	pgsql_drv: PostgreSQL Drivers
332	sqlite_drv: SQLite v2.x Drivers (scheduled for removal)
333	sqlite3_drv: SQLite v3.x Drivers
334	hash_drv: Self-Contained Hash Database
335
336	If you are a packager, or wish to have multiple drivers built for any
337	reason you may specify multiple drivers by separating them with commas.
338	This will cause the storage driver specified in dspam.conf to be
339	dynamically loaded at runtime rather than statically linked. If you wish
340	to build only one driver, but dynamically, then specify it twice as in
341	--with-storage-driver=mysql_drv,mysql_drv.
342
343	If you will be compiling DSPAM to operate as a server daemon or to deliver
344	via SMTP/LMTP, you will need to use a thread-safe driver (outlined in the
345	chart earlier in this document).
346
347	You may also need to use some of the driver-specific configure flags
348	(discussed in the DRIVER SPECIFIC CONFIGURATION OPTIONS section below).
349
350	--disable-trusted-user-security
351	Administrators who wish to disable trusted user security may do so by
352	using this configure flag. This will cause DSPAM to treat each user as
353	if they were "trusted" which could allow them to potentially execute
354	arbitrary commands on the server via DSPAM. Because of this, administrators
355	should only use this option on either a closed server, or configure their
356	DSPAM binary to be executable only by users who can be trusted. This
357	option SHOULD NOT be used as a solution to your MTA dropping privileges
358	prior to calling DSPAM. Instead, see the TRUSTED SECURITY section of this
359	document.
360
361	--enable-homedir
362	When enabled, instead of checking for $HOME/$USER/opt-in/
363	$USER[.dspam|.nodspam], DSPAM will check for a .dspam|.nodspam file in the
364	user's home directory. DSPAM will also store each user's data in ~/.dspam
365	when this option is enabled. Because of this, DSPAM will automatically
366	install and run setuid root so that it can read each user's home directory.
367
368	Note:
369
370	This function is incompatible with most implementations of the Web UI,
371	since it requires access to read each user's home directory. Therefore,
372	only use this option if you will not be using the Web UI or plan on
373	doing something asinine like running it as root.
374
375	--enable-daemon
376	Builds DSPAM with support for daemon mode, and builds associated dspamc
377	thin client. Pthreads is required to build for daemon mode and the
378	storage driver used must be thread-safe.
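
Since daemon mode needs a thread-safe driver, a quick sanity check before
running configure can save a rebuild. The sketch below is only an illustration:
daemon_safe is a hypothetical helper, and its list of thread-safe drivers is
copied from the chart in the PREREQUISITES section, so it would need updating
if that chart changes. It prints a suggested configure line and does not run
configure itself.

```shell
#!/bin/sh
# Hypothetical pre-flight check, not part of DSPAM. The thread-safe list
# below mirrors the driver chart in the PREREQUISITES section.
daemon_safe() {
    # $1 is a comma-separated driver list, as given to --with-storage-driver.
    old_ifs=$IFS
    IFS=,
    set -f                      # no filename globbing while splitting
    for drv in $1; do
        case $drv in
            mysql_drv|pgsql_drv|hash_drv) ;;
            *)
                IFS=$old_ifs; set +f
                echo "$drv is not marked thread-safe" >&2
                return 1 ;;
        esac
    done
    IFS=$old_ifs; set +f
    return 0
}

drivers=mysql_drv,hash_drv
if daemon_safe "$drivers"; then
    echo "./configure --with-storage-driver=$drivers --enable-daemon"
fi
```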
379
380	DRIVER SPECIFIC CONFIGURE SWITCHES
381
382	Some storage drivers have their own custom configuration switches:
383
384	mysql_drv:
385	--with-mysql-includes=DIR
386	Specify a path to the MySQL includes
387
388	--with-mysql-libraries=DIR
389	Specify a path to the MySQL libraries
390	(Currently links to -lmysqlclient, also -lcrypto on some systems)
391
392	--enable-virtual-users
393	Tells DSPAM to create virtual user ids. Use this if your users don't
394	actually exist on the system (e.g. in /etc/passwd if using a password
395	file)
396
397	--enable-preferences-extension
398	MySQL supports the preferences extension, which stores user preferences
399	in mysql instead of flat files (the built-in method)
400
401	--disable-mysql4-initialization
402	If you are compiling libdspam for use with a third party application,
403	and the third party application makes its own calls to libmysqlclient,
404	you should use this option to disable libdspam's initialization and
405	cleanup of libmysqlclient, and allow the application to manage this.
406	This option suppresses libdspam's calls to mysql_server_init and
407	mysql_server_end.
408
409	Note:
410
411	Please see the file doc/mysql_drv.txt for more information
412	about configuring the mysql_drv storage driver.
413
414	pgsql_drv:
415	--with-pgsql-includes=DIR
416	Specify a path to the PgSQL includes
417
418	--with-pgsql-libraries=DIR
419	Specify a path to the PgSQL libraries
420	(Currently links to -lpq, and netlibs on some systems)
421
422	--enable-virtual-users
423	Tells DSPAM to create virtual user ids. Use this if your users don't
424	actually exist on the system (e.g. in /etc/passwd if using a password
425	file)
426
427	--enable-preferences-extension
428	Postgres supports the preferences extension, which stores user
429	preferences in pgsql instead of flat files (the built-in method)
430
431	Note:
432
433	Please see the file doc/pgsql_drv.txt for more information about
434	configuring the pgsql_drv storage driver.
435
436	sqlite_drv:
437	sqlite3_drv:
438	--with-sqlite-includes=DIR
439	Specify a path to the SQLite includes
440
441	--with-sqlite-libraries=DIR
442	Specify a path to the SQLite libraries
443
444	DEBUGGING SWITCHES
445
446	--enable-debug
447	Turns on support for debugging output. This option allows you to turn on
448	debugging messages for all or some users by editing dspam.conf or setting
449	--debug on the commandline. Enabling debug in configure only adds support
450	for debug to be compiled in; it must still be activated using one of the
451	options prescribed above. Debugging support itself doesn't use up very
452	many additional resources, so it should be safe to leave enabled on
453	non-enterprise class systems.
454
455	--enable-verbose-debug
456	Turns on extremely verbose debugging output. --enable-debug is implied.
457	Never use this on production builds!
458
459	Note:
460
461	When verbose debug is compiled in, DSPAM performs many additional
462	mathematical calculations regardless of whether or not it's been
463	activated. You shouldn't use --enable-verbose-debug for production
464	builds unless you have serious issues you can't resolve.
465
466	FEATURE ACTIVATION
467
468	--enable-clamav
469	Enables support for Clam Antivirus. DSPAM can interface directly with
470	clamd to perform virus scanning and can be configured to react in
471	different ways to viruses. See dspam.conf for more information.
472
473	ADDITIONAL CONFIGURATION OPTIONS
474
475	The remainder of configuration options are located in dspam.conf, which
476	is installed in sysconfdir (default: /usr/local/etc) upon a make install.
477	It is generally a good idea to review dspam.conf and make any changes
478	necessary prior to using DSPAM.
479
480	2. BUILDING AND INSTALLING
481
482	After you have run configure with the correct options, build and install
483	DSPAM by performing:
484
485	make && make install
486
487	Note:
488
489	If you are a developer wanting to link to the core engine of dspam,
490	libdspam will be built during this process. Please see the
491	example.c file for examples of how to link to and use libdspam. Static
492	and dynamic libraries are built in the .libs directory. Needed headers
493	will be installed in $prefix/include/dspam.
494
495	3. PERMISSIONS
496
497	In the typical UNIX environment, you'll need to worry about the following
498	permissions:
499
500	The CGI User: This is the user your web server (most likely Apache) is
501	running as. This is commonly 'nobody' or 'web'. You can find this in
502	Apache's httpd.conf by searching for 'User'. The CGI user will need
503	the ability to access the following components of DSPAM:
504	- Ability to execute the dspam binary
505	- Ability to read and write to dspam_home/data/
506	- Trusted user permissions in dspam.conf ("Trust [username]")
507	- The execution 'Group' used must match the group dspam is running as
508	(this is typically 'mail', 'dspam', or similar)
509
510	The MTA User: This is the user your mail server software is running as when
511	it executes DSPAM. This is usually daemon, mail, exim, etc. This is
512	typically different from the user the MTA runs and polices itself as, to
513	avoid security problems. Consult your MTA's documentation for more info.
514	The MTA user will require:
515	- The ability to execute the dspam binary
516	- Trusted user permissions in dspam.conf ("Trust [username]")
517
518	Systems Administrators: In order to perform administrative functions,
519	systems administrators will require:
520	- The ability to execute dspam-related binaries
521	- Trusted user permissions in dspam.conf ("Trust [username]")
522
523	Note:
524
525	If the MTA is communicating with DSPAM via LMTP (explained later), then
526	execution permissions are not necessary.
527
528	Note about FreeBSD:
529
530	FreeBSD's default MTA user is 'mailnull'
531	FreeBSD's default delivery agent also changes its uid, and so in order
532	to call it, dspam must be installed as setuid root to work on the
533	commandline properly. This is done automatically on install.
534
535
536	Understanding Trusted User Security
537
538	DSPAM has tighter security for untrusted users on the system to prevent
539	them from touching other users' data or passing arbitrary commands to the
540	delivery agent DSPAM calls. "Trusted User Security" is a simple system
541	whereby any unsafe functions are not available to a user calling dspam
542	unless they are within dspam.conf's trusted user list.
543
544	Local non-privileged users should be able to use DSPAM without any problems
545	while remaining untrusted, as long as they behave. For example, an untrusted
546	user cannot set their DSPAM username to any name other than their username.
547	Untrusted users are also limited to the delivery options set by the
548	system administrator, and cannot redirect how DSPAM delivers mail.
549
550	A list of trusted users is maintained in dspam.conf. This file should
551	include a list of trusted users who should be allowed to set the dspam user,
552	passthru parameters, and other information that would be potentially
553	dangerous for a malicious user to be able to set. You'll need to ensure
554	that your CGI user, MTA user, and system administrators are on the list.
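
One way to double-check the list is a small script that looks for a user on a
Trust line. This is a sketch under assumptions: is_trusted is a hypothetical
helper, it assumes one "Trust username" entry per line as in the notation used
above, and the sample file and user names are invented for the demonstration.

```shell
#!/bin/sh
# is_trusted CONF USER: succeed if CONF contains a line "Trust USER".
# Hypothetical helper for illustration; not shipped with DSPAM.
is_trusted() {
    awk -v u="$2" '$1 == "Trust" && $2 == u { found = 1 }
                   END { exit !found }' "$1"
}

# Demonstration against a throwaway file (user names are examples only).
conf=$(mktemp)
cat > "$conf" <<'EOF'
# Trusted users
Trust root
Trust mail
Trust nobody
EOF

for u in root mail nobody bob; do
    if is_trusted "$conf" "$u"; then
        echo "$u: trusted"
    else
        echo "$u: NOT trusted"
    fi
done
rm -f "$conf"
```

Against a real installation the first argument would be the path to your
dspam.conf, and the users to check would be your CGI user, MTA user, and
administrators.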
555
556	4. MAIL SERVER INTEGRATION
557
558	As previously mentioned, there are three popular ways to implement DSPAM:
559
560	As a delivery proxy:
561	The default approach integrates DSPAM directly with the mail server and
562	filters spam as mail comes in. Please see the appropriate instructions
563	in doc/ pertaining to your MTA.
564
565	As a POP3 proxy:
566	This alternative approach implements a POP3 proxy where users
567	connect to the proxy to check their email, and email is filtered when
568	being downloaded. The POP3 proxy is a much easier approach, as it
569	requires much less integration work with the mail server (and is ideal
570	for implementing DSPAM on Exchange, etcetera). Please see the file
571	doc/pop3filter.txt.
572
573	As an SMTP Relay:
574	DSPAM can be configured as an SMTP relay, a.k.a appliance. You
575	can set it up to sit in front of your real mail server and then point
576	your MX records at it. DSPAM will then pass along the good mail to
577	your real SMTP server. See doc/relay.txt for more information. The
578	example provided uses Postfix and MySQL.
579
580	Trusted users and the MTA
581
582	If you are using an MTA that changes its userid to match the destination
583	user before calling DSPAM, you won't be able to provide pass-thru
584	arguments to DSPAM (these are the commandline arguments that DSPAM in turn
585	passes to the local delivery agent, in such a configuration).
586	You will need to pre-configure the "default" pass-thru arguments in DSPAM.
587	This can be done by declaring an untrusted delivery agent in dspam.conf.
588	When DSPAM is called by an untrusted user, it will automatically force their
589	DSPAM user id and use the passthru delivery agent arguments specified in dspam.conf.
590
591	This information will override any passthru commandline parameters
592	specified by the user. For example:
593
594	UntrustedDeliveryAgent "/bin/mail -d $u"
595
596	The variable $u informs DSPAM that you would like the destination username
597	to be used in the position $u is specified, so when DSPAM calls your LDA
598	for user 'bob', it will call it with:
599
600	/bin/mail -d bob
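
The substitution itself is simple enough to mimic in a few lines of shell,
which can be handy for checking what command line a given template would
produce. The function below is a sketch only: expand_agent is a hypothetical
name, and it makes no claim about how DSPAM performs the expansion internally.

```shell
#!/bin/sh
# expand_agent TEMPLATE USER: replace every literal "$u" in TEMPLATE with USER.
# Hypothetical helper that only mimics the documented behaviour.
expand_agent() {
    rest=$1 user=$2 out=
    while :; do
        case $rest in
            *\$u*)
                # Copy the text before the first "$u", then the user name.
                out=$out${rest%%\$u*}$user
                rest=${rest#*\$u} ;;
            *)
                out=$out$rest
                break ;;
        esac
    done
    printf '%s\n' "$out"
}

expand_agent '/bin/mail -d $u' bob
```

Note the single quotes around the template in the call: they keep the shell
from expanding $u before the function sees it.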
601
602	5. ALIASES
603
604	There are essentially two different ways a user might train DSPAM. The first
605	is by using the Web UI, which allows them to retrain via the "History"
606	tab. This works quite well, as users must visit the Web UI occasionally
607	to review their quarantine anyway (and reverse any false positives). We'll
608	discuss this shortly in section 1.1.8.
609
610	The more common approach to training, discussed here, is to allow users to
611	simply forward their spam to an email address where DSPAM can analyze and
612	learn it. DSPAM uses a signature-based system, where a serial number of
613	sorts is appended to each email processed by DSPAM. DSPAM reads this serial
614	number when the user forwards (or bounces) a message to what is called their
615	"spam email address". The serial number points to temporary information
616	stored on the server (for 14 days by default) containing all of the
617	information necessary for DSPAM to relearn the message. This is necessary
618	in order to relearn the exact message DSPAM originally processed.
619
620	Note:
621
622	If you are using an IMAP based system, Web-based email, or other form of
623	email management where the original messages are stored on the server in
624	pristine format, you can turn this signature feature off by setting
625	"TrainPristine on" in dspam.conf. DSPAM will then use the message itself
626	that you provide it to train, which MUST be identical to the original
627	message in order to retrain properly.
628
629	Because DSPAM learns each user's specific email behavior, it's necessary
630	to identify the user in order to program their specific filtering database.
631	This can be done in one of three ways:
632
633	The Simple Way:
634
635	If you are using the MySQL or PgSQL storage drivers, the original
636	numeric user id can be embedded in the signature, requiring only one
637	central spam alias to be necessary for the entire system. To configure
638	this, uncomment the appropriate UIDInSignature option in dspam.conf:
639
640	# MySQLUIDInSignature on
641	# PgSQLUIDInSignature on
642
643	Now all you'll need is a single system-wide alias, and DSPAM will train
644	the appropriate user when it sees the signature. An example of an alias
645	might look like:
646
647	spam: "|/usr/local/bin/dspam --user root --class=spam --source=error"
648
649	Similarly, you may also wish to have a false-positive alias for users who
650	prefer to tag spam rather than quarantine it:
651
652	notspam: "|/usr/local/bin/dspam --user root --class=innocent --source=error"
653
654	Note:
655
656	The 'root' user represents any active dspam user. It is necessary to
657	supply a username on the commandline or DSPAM will bail on
658	an error, however the user will be changed internally once the signature
659	is read.
660
661	The Kind-of-Simple Way:
662
663	If you're not using one of the above storage drivers, the next easiest
664	way to configure aliases is to have DSPAM parse the 'To:' header of the
665	message and use a catch-all subdomain to direct all mail into DSPAM for
666	retraining. You can then instruct your users to email addresses like
667	'spam-bob@relearn.example.org'. The ParseToHeaders option (available
668	in dspam.conf) will parse the To: header of forwarded messages and
669	set the username to either 'bob' or 'bob@relearn.example.org', depending
670	on how it is configured. DSPAM can also set the training mode to either
671	"learn spam" or "learn notspam" depending on whether the user specified
672	a spam- or notspam- address in the To: header.
673
674	This is ideal if you don't want to set up a separate alias for each user
675	on your system (The Hard Way). If you're fortunate enough to have a
676	mail server that can perform regular expression matching, you can set up
677	your system without a subdomain, and just use addresses like
678	spam-bob@example.org. For the rest of us, it will be necessary to set up
679	a subdomain catch-all directly into DSPAM. For example:
680
681	@relearn.example.org "|/usr/local/bin/dspam"
682
683	Don't forget to set the appropriate ParseToHeaders and related options in
684	dspam.conf as well. More specific instructions can be found in dspam.conf
685	itself. In most cases, the following will suffice:
686
687	ParseToHeaders on
688	ChangeUserOnParse user
689	ChangeModeOnParse on
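
To make the address convention concrete, here is a small shell sketch of the
split that the text above describes: the spam- or notspam- prefix selects the
training class, and the remainder yields the user name. It is an illustration
only. parse_retrain_addr is a hypothetical helper, the mode name "full" for
keeping the whole address is an assumption (check dspam.conf for the values
ChangeUserOnParse actually accepts), and the class names are the ones used
with --class elsewhere in this section.

```shell
#!/bin/sh
# parse_retrain_addr ADDRESS [user|full]
# Prints "<class> <username>" for a retraining address, or fails if the
# address carries neither prefix. Hypothetical helper, not DSPAM code.
parse_retrain_addr() {
    addr=$1 mode=${2:-user}
    case $addr in
        spam-*)    class=spam;     rest=${addr#spam-} ;;
        notspam-*) class=innocent; rest=${addr#notspam-} ;;
        *)         return 1 ;;
    esac
    case $mode in
        user) name=${rest%@*} ;;   # keep only the part before the '@'
        full) name=$rest ;;        # keep the whole remaining address
        *)    return 1 ;;
    esac
    printf '%s %s\n' "$class" "$name"
}

parse_retrain_addr spam-bob@relearn.example.org
parse_retrain_addr notspam-bob@relearn.example.org full
```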
690
691	The Old Way (A.K.A. The Hard Way)
692
693	If neither of the easy ways is possible, you're stuck with doing it
694	the hard way. This means you'll need a separate spam alias (and notspam
695	alias, if users are tagging mail) for each user. To do this, you will
696	need to create an email address for each user, so that DSPAM can
697	analyze and learn for that specific user. For example:
698
699	spam-bob: "|/usr/local/bin/dspam --user bob --class=spam --source=error"
700
701	You will end up having one alias per mail user on the system, two if you
702	do not use DSPAM's CGI quarantine (an additional one using notspam-). Be
703	sure the aliases are unique and each username matches the name after the
704	--user flag. A tool has been provided called dspam_genaliases. This tool
705	will read the /etc/passwd file and write out a dspam aliases file that can
706	be included in your master aliases table.
707
708	To report spam, the user should be instructed to forward each spam to
709	spam-user@yourhost
710
711	It doesn't really matter what you name these aliases, so long as the flags
712	being passed to dspam are correct for each user. It might be a good idea
713	to create an alias custom to your network, so that spammers don't forward
714	spam into it. For example, notspam-yourcompany-bob or something.
715
716	Note About Security:
717
718	You might be wondering if a user can forward a spam to another user's
719	address, or whether a spammer can forward a spam to another user's
720	notspam address. The answer is "no". The key to all mail-based retraining
721	is the signature embedded in each email. The signature is stored with
722	each user's own user id, and so not only does the incoming message have
723	to bear a valid signature, but it also has to be stored on the system with
724	the correct user id. This prevents any kind of alias abuse.
725
726	6. NIGHTLY MAINTENANCE AND HOUSEKEEPING CRONS
727
728	Non-SQL Based Nightly Purge
729
730	If you are NOT running a SQL-based solution, then you should configure
731	dspam_clean to run under cron nightly. This clean tool will read all
732	signature databases and purge signatures that are older than 14 days
733	(configurable), purge abandoned tokens, and remove unimportant tokens.
734	Without this tool, old signatures will continue to pile up.
735	Be sure the user running cleanup has full read/write permissions on the
736	DSPAM data files.
737
738	0 0 * * * /usr/local/bin/dspam_clean [options]
739
740	See the dspam_clean description in section 1.4 for more information.
741
742	SQL-Based Nightly Purge
743
744	SQL-Based solutions include a nightly SQL script to perform the same basic
745	tasks as dspam_clean, and it does it much faster and with more finesse.
746	You can find instructions about each driver's purge functions in
747	the driver's README (doc/[driver].txt) for performing nightly
748	maintenance. Most SQL drivers will include a purge script in the
749	src/tools.[driver] directory. For example:
750
751	0 0 * * * mysql --user=[user] --pass=[pass] [db] < /path/to/purge-4.1.sql
752
753	Log Rotation
754
755	The system log and user logs can fill up fairly quickly, when all that's
756	really needed to generate graphs are the last two to three weeks of data.
757	You can configure a nightly log cleanup using dspam_logrotate:
758
759	0 0 * * * dspam_logrotate -a 30 -d /usr/local/var/dspam/data
760
761	7. NOTIFICATIONS
762
763	DSPAM is capable of sending three different notifications to users:
764
765	- A "First Run" message sent to each user when they receive their first
766	message through DSPAM.
767
768	- A "First Spam" message sent to each user when they receive their first
769	spam
770
771	- A "Quarantine Full" message sent to each user when their quarantine box
772	is > 2MB in size (note: the 2MB limit is hardcoded in DSPAM).
773
774	These notifications can be activated by copying the txt/ directory from the
775	distribution into DSPAM's home (by default /usr/local/var/dspam). You can
776	alter the location of this directory by setting "TxtDirectory" in dspam.conf.
777
778	Example:
779	/usr/local/var/dspam/txt/firstrun.txt
780	/usr/local/var/dspam/txt/firstspam.txt
781	/usr/local/var/dspam/txt/quarantinefull.txt
782
783	You will want to modify these templates prior to installing them to reflect the
784	correct email addresses and URLs (look for 'example.org').
785
786	NOTE: The quarantine warning is reset when the user clicks 'Delete All', but
787	is not reset if they use "Delete Selected". If the user doesn't wish to
788	receive reminders, they should use the "Delete Selected" function instead
789	of "Delete All".
790
791	You'll need to also set "Notifications" to "on" in dspam.conf.
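
Before turning notifications on, it can help to confirm that the templates are
in place and have been edited. The Python sketch below checks for the three
file names listed above and for the 'example.org' placeholder; the function
names are made up for this example and the default path is the one quoted in
this section.

```python
from pathlib import Path

TEMPLATES = ("firstrun.txt", "firstspam.txt", "quarantinefull.txt")

def missing_templates(txt_dir="/usr/local/var/dspam/txt"):
    """Return the notification template names not present in txt_dir."""
    base = Path(txt_dir)
    return [name for name in TEMPLATES if not (base / name).is_file()]

def unedited_templates(txt_dir="/usr/local/var/dspam/txt", marker="example.org"):
    """Return templates that still contain the placeholder domain."""
    base = Path(txt_dir)
    return [
        name for name in TEMPLATES
        if (base / name).is_file()
        and marker in (base / name).read_text(errors="replace")
    ]
```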
792
793	8. THE WEB UI
794
795	The Web UI (CGI client) can be run from any executable location on
796	a web server, and detects its user's identity from the REMOTE_USER
797	environment variable. This means you'll need to use HTTP password
798	authentication to access the CGI (any type of authentication will work,
799	so long as Apache supports the module). This is also convenient in that you
800	can set up authentication using almost any existing system you have.
801	The only catch is that you'll need the usernames to match the actual
802	DSPAM usernames used on the system. A copy of the shadow password file
803	will suffice for most common installs.
804
805	The accompanying files in the webui/ folder should be copied into your
806	document root and cgi-bin, as specified.
807
808	Note:
809
810	Some authentication mechanisms are case insensitive and will
811	authenticate the user regardless of the case they type it in. DSPAM,
812	on the other hand, is case sensitive and the case of the username used
813	will need to match the case on the system. If you suffer from this
814	authentication problem, and are certain all of your users' usernames are
815	in lowercase, you can add the following line of code to the CGI right
816	after the call to &ReadParse...
817
818	$ENV{'REMOTE_USER'} = lc($ENV{'REMOTE_USER'});
819
820	The CGI will need to function in the same group as the dspam agent in order
821	to work with the files in dspam_home. The best way to do this is to create
822	a separate virtualhost specifically for the CGI and assign it to run in the
823	MTA group using Apache's suexec. If you are using procmail, additional
824	configuration may also be necessary (see below).
825
826	Note:
827
828	Apache users do NOT take on the identity of the groups specified in
829	/etc/group so you will need to specifically assign the group in
830	httpd.conf.
831
832	Note about Procmail:
833
834	Because the DSPAM Web UI is a CGI script, DSPAM will not retain its
835	setuid privileges when called. If you are running procmail, this will
836	become a problem as procmail requires root privileges to deliver. The
837	easiest hack around this is to create a procmail.dspam binary and make it
838	setuid root, then make it executable only by the mail group (or
839	whatever group DSPAM and the CGI run in).
840
841	The DSPAM Web UI has a minimal configuration inside the configure.pl script.
842	You'll want to check and make sure all of the settings are correct. In
843	most cases, the only settings you will need to change are the large-scale
844	or domain-scale flags.
845
846	BEFORE PROCEEDING:
847	Check and make sure (again) that the CGI user from Apache's httpd.conf is
848	added as a trusted user in dspam.conf.
849
850	Default Preferences
851
852	Now would be a good time to set the system's default preferences. This can
853	be done using the dspam_admin tool. For example:
854
855	dspam_admin ch pref default trainingMode TEFT
856	dspam_admin ch pref default spamAction quarantine
857	dspam_admin ch pref default spamSubject "[SPAM]"
858	dspam_admin ch pref default enableWhitelist on
859	dspam_admin ch pref default showFactors off
860
861	The default preferences are used for any users who have not yet set their
862	own preferences. You can also control which preferences the user may
863	override by changing the "AllowOverride" settings in dspam.conf.
864
865	By default, the parameters specified on the commandline will be used (if
866	any). If, however, a preference is found for the particular user, that
867	preference will override the commandline.
868
869	GD Graphing Library
870
871	If you plan on leaving DSPAM's logging function enabled, and would like to
872	produce pretty graphs for your users, the graph.cgi script requires the
873	following be installed on your machine:
874
875	- GD Graphics Library (http://www.boutell.com/gd/)
876	Compile with png support
877
878	- The following PERL modules:
879	(http://www.perl.com/CPAN/modules/by-module/GD/)
880
881	. GD
882	. GD-Graph3d
883	. GDGraph
884	. GDTextUtil
885	. CGI
886
887	Typically this can be accomplished on the commandline:
888
889	perl -MCPAN -e 'install GD::Graph3d'
890
891	Configuring Administrators
892
893	Once you've configured the Web UI, you'll want to edit the 'admins' file to
894	contain a list of users who are permitted to use the administration suite.
895
896	Configuring Sub-Administrators / Domain Level Administrators
897
898	It is possible to delegate the management of users to a list of sub-admins/
899	domain level admins. To accomplish that you should edit the 'subadmins'
900	file to contain a list of sub-admins/domain level admins which are permitted
901	to switch their username while using the DSPAM control center.
902
903	Opt-In/Out
904
905	If you would like your users to be able to opt in/out of DSPAM filtering,
906	add the correct option to the nav_preferences.html template, depending on
907	your configuration (for example, if you have an opt-in system, you'll want to
908	add the opt-in option). Note: This currently only works with the preferences
909	extension, and not drop files.
910
911	<INPUT TYPE=CHECKBOX NAME=optIn $C_OPTIN$>
912	Opt into DSPAM filtering
913
914	<INPUT TYPE=CHECKBOX NAME=optOut $C_OPTOUT$>
915	Opt out of DSPAM filtering
916
917	1.2 TESTING
918
919	If you've installed from an RPM, there's a good chance that the packager
920	went to the trouble of testing already. If you're building from sources,
921	however, you'll need to find a way to ensure your configuration isn't broken.
922
923	Most software packages are supplied with a test suite to determine if the
924	software is functioning properly. Since DSPAM's correct function relies
925	primarily on having the correct permissions and mail server configuration,
926	a test script fails to provide the level of testing required for such a
927	package. The following exercise has been provided to test dspam's correct
928	functioning on your system. This exercise does not test the Web UI, but only
929	the core dspam agent.
930
931	Before running the test, you should have completed section 1.1's instructions
932	for compiling and installing dspam as well as configured your mail server
933	to support dspam.
934
935	1. Create a new user account on your system. It is important that this be a
936	new account to prevent any unrelated email from being delivered during
937	testing. Be sure to configure a spam alias for the test account.
938
939	2. Send a short (10 words or less) email to the account, and pick it up
940	using your favorite mail client.
941
942	3. Run dspam_stats [username] on the server. You should see a value of 1
943	for "TN" (true negatives) as shown below:
944
945	dspam-test 0 TP 1 TN 0 FN 0 FP
946
947	If you receive an error such as "unable to open /usr/local/var/dspam... for
948	reading", then the dspam agent is not configured correctly. The problem
949	could exist in either your mail server configuration or one or more of the
950	permissions on the directory or agent. Check your configuration and
951	permissions, and repeat this step until the correct results are experienced.
952
953	4. Run dspam_dump [username] to get a complete list of tokens and their
954	statistics. Each token should have an I: (innocent) hit count of 1. The
955	tokens will be represented as 64-bit values, for example:
956
957	3126549390380922317 S: 0 I: 1 LH: Mon Aug 4 11:40:12 2003
958	13884833415944681423 S: 0 I: 1 LH: Mon Aug 4 11:40:12 2003
959	14519792632472852948 S: 0 I: 1 LH: Mon Aug 4 11:40:12 2003
960	8851970219880318167 S: 0 I: 1 LH: Mon Aug 4 11:40:12 2003
961
962	To view statistics for a particular token, run dspam_dump [username] [token]
963	where token is the plain-text token value. For example:
964
965	% dspam_dump bill FREE
966	7717766825815048192 S: 00265 I: 00068 P: 0.7358
967
968	5. Forward the test message to the spam alias you've created for the test
969	account. Provide enough time for the message to have processed.
970
971	6. Run dspam_stats [username] on the server again. Now, the value for TN
972	should be zero and the value for FN (false negatives) should be 1 as shown
973	below:
974
975	dspam-test 0 TP 0 TN 1 FN 0 FP
976
977	If this is not the case, check the group permissions of the dspam agent as
978	well as the permissions your MTA uses when piping to aliases.
979
980	7. Run dspam_dump [username] again. Make sure that _EVERY_ token now has an
981	I: of zero and a S: of 1:
982
983	3126549390380922317 S: 1 I: 0 LH: Mon Aug 4 11:44:29 2003
984	13884833415944681423 S: 1 I: 0 LH: Mon Aug 4 11:44:29 2003
985	14519792632472852948 S: 1 I: 0 LH: Mon Aug 4 11:44:29 2003
986	8851970219880318167 S: 1 I: 0 LH: Mon Aug 4 11:44:29 2003
987
988	If you have some tokens that do not have an S: of 1 or an I: of 0, the dspam
989	signature was not found on the email, and this could be due to a lot of
990	things.
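
The checks in steps 3, 6 and 7 can be scripted. The Python sketch below assumes
the tool output looks like the sample lines in this section (the real column
layout may differ), and the function names are hypothetical.

```python
import re

def parse_stats(line):
    """Parse 'user 0 TP 1 TN 0 FN 0 FP' into (user, {'TP': 0, ...})."""
    user, *rest = line.split()
    counts = {label: int(value) for value, label in zip(rest[0::2], rest[1::2])}
    return user, counts

# token id, then the spam (S:) and innocent (I:) hit counts
DUMP_RE = re.compile(r"^(\d+)\s+S:\s*(\d+)\s+I:\s*(\d+)")

def parse_dump(text):
    """Yield (token, spam_hits, innocent_hits) for each dspam_dump-style line."""
    for line in text.splitlines():
        match = DUMP_RE.match(line.strip())
        if match:
            yield int(match.group(1)), int(match.group(2)), int(match.group(3))

def retrained_as_spam(text):
    """True if there is at least one token and every token shows S: 1, I: 0."""
    rows = list(parse_dump(text))
    return bool(rows) and all(s == 1 and i == 0 for _, s, i in rows)
```

Feeding the step 7 listing to retrained_as_spam() should return True, while
the step 4 listing should return False.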
991
992	1.3 TROUBLESHOOTING
993
994	Problem: No files are being created in the user directory
995	Solution: Check the permissions of the user directory. The user
996	directory must be writable by the user the dspam agent is running
997	as, as well as by the CGI user.
998
999	Problem: False positives are never being delivered
1000	Solution: Your CGI most likely doesn't have the privileges required by
1001	the LDA to deliver the messages. Make sure the CGI user is in
1002	the correct group. Also consider setting the dspam agent to
1003	setuid or setgid with the correct permissions.
1004
1005	Problem: My database is getting huge!
1006	Solution: DSPAM's default training mode is TEFT. On top of this, the
1007	purging defaults are very lax. You might consider switching to
1008	TOE (Train-on-Error) mode training if you require a minimal
1009	database. If you are willing to sacrifice accuracy for disk space,
1010	disabling the 'chain' tokenizer in dspam.conf will prevent
1011	the use of multi-word (chained) tokens, which will also cut your
1012	database size considerably. You may also consider more frequent
1013	calls to dspam_clean -p to purge neutral data, which comprises a
1014	majority of most databases.
1015
1016	For more help, please see the DSPAM FAQ at http://dspam.sourceforge.net.
1017
1018	1.4 DSPAM TOOLS
1019
1020	A few useful tools have been provided to make DSPAM management a bit easier.
1021	These tools include:
1022
1023	dspam_admin - A tool used to perform specific administrative functions. These
1024	functions are usually included as part of an extensions package (such as
1025	the preferences extension). Available functions are listed in the tool's
1026	usage output.
1027
1028	dspam_train - Used to train and test a corpus of ham and spam (in maildir
1029	format).
1030	Syntax: dspam_train [username] [spam_dir] [nonspam_dir]
1031	where username is the username of the user to apply the training to, and
1032	the two dirs represent directories containing messages in individual
1033	files (e.g. maildir/corpus format). dspam_train can be used on an existing
1034	user's database, to further improve accuracy, or to train from scratch.
1035	It also provides a solid test jig for measuring the efficiency and accuracy
1036	of the filter against a test corpus.
1037	NOTE: dspam_train will automatically balance training of the corpus to
1038	ensure both spam and nonspam are trained based on the ratio of
1039	spam/nonspam. This means if you have twice as much spam as nonspam,
1040	two spam will be trained for every nonspam.
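
To picture the balancing, here is a minimal Python sketch that interleaves two
message lists at their integer ratio. It illustrates the behavior described in
the note, not dspam_train's actual code; the function name and the handling of
leftovers are assumptions.

```python
def balanced_order(spam, nonspam):
    """Interleave two message lists so the larger class is fed at its integer ratio."""
    if not spam or not nonspam:
        return [("spam", m) for m in spam] + [("nonspam", m) for m in nonspam]
    (big_label, big), (small_label, small) = sorted(
        (("spam", spam), ("nonspam", nonspam)),
        key=lambda pair: len(pair[1]),
        reverse=True,
    )
    ratio = len(big) // len(small)
    order, pos = [], 0
    for msg in small:
        # feed `ratio` messages of the larger class, then one of the smaller
        order += [(big_label, m) for m in big[pos:pos + ratio]]
        pos += ratio
        order.append((small_label, msg))
    order += [(big_label, m) for m in big[pos:]]  # leftovers when the ratio is inexact
    return order
```

With four spam and two nonspam messages, the order is two spam, one nonspam,
two spam, one nonspam.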
1041
1042	dspam_dump - Dumps a DSPAM dictionary. This can be used to view the
1043	entire contents of a user's dictionary, or used in combination
1044	with grep to view a subset of data. Syntax: dspam_dump [username] [token]
1045	where username is the DSPAM user's username. If a token is specified,
1046	statistics only for that token will be printed.
1047
1048	dspam_clean - Performs nightly housecleaning by deleting old or useless
1049	data from user data. If using the hash driver (hash_drv) please use
1050	cssclean instead (see doc/README.cssclean).
1051
1052	dspam_clean performs the following operations:
1053
1054	1. Using the -s flag, dspam_clean will purge stale signatures. If an age is
1055	specified, for example -s14, the age defined as the
1056	default will be overridden. Specifying an age of 0 will delete all
1057	signatures for the users processed.
1058
1059	2. Using the -p flag, dspam_clean will delete all tokens from a user's
1060	database whose probability is between 0.35 and 0.65 (fairly neutral,
1061	useless tokens) that fall beyond the default age. If an age is specified,
1062	for example -p30, the age defined as the default will be overridden. It
1063	is a good idea to use this type of clean with an age of 0 on users after
1064	a lot of corpus training.
1065
1066	3. Using the -u flag, dspam_clean will delete all unused tokens from a
1067	user's database. There are four different types of unused tokens:
1068
1069	- Tokens which have not been used for a long time
1070	- Tokens which have a total hit count below 5
1071	- Tokens which have only one spam hit
1072	- Tokens which have only one innocent hit
1073
1074	Ages may be overridden by specifying a format such as -u30,15,10,10
1075	where each number represents the respective age. Specifying an age of
1076	zero will delete all unused tokens in the category. Defaults are set in
1077	dspam.conf.
1078
1079	Optionally, usernames may be specified to override the default behavior of
1080	processing all users.
1081
1082	Examples:
1083
1084	Process all users on the system using all clean operations:
1085	dspam_clean -s -p15 -u90,30,15,15
1086
1087	Delete all of user 'dick' and 'jane's signatures:
1088	dspam_clean -s0 dick jane
1089
1090	Perform a post-corpus training clean on user 'spot':
1091	dspam_clean -p0 -u0,0,0,0 spot
1092
1093	Run dspam_clean with all default options, all clean modes enabled, on all
1094	users on the system:
1095	dspam_clean -s -p -u
1096
1097	NOTE: You may wish to only run certain cleaning modes depending on the type
1098	of storage driver you are using. For example, the MySQL storage driver
1099	includes a script which performs signature and unused token operations,
1100	leaving only probability operations as useful. If you are using a SQL-based
1101	storage driver, it is strongly recommended that you use the maintenance
1102	scripts wherever possible for optimum efficiency.
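
The flag syntax described above (-s, -p, -u with optional ages, followed by
optional usernames) can be summarized with a small Python sketch. It only
mirrors the syntax as documented here; the function names are hypothetical,
requiring exactly four -u ages is an assumption, and so is treating the
0.35-0.65 band as inclusive.

```python
def parse_clean_args(argv):
    """Split dspam_clean-style arguments into per-mode ages and a user list.

    An age of None means "use the default from dspam.conf".
    """
    plan = {"s": False, "p": False, "u": False, "ages": {}, "users": []}
    for arg in argv:
        if arg[:2] in ("-s", "-p", "-u"):
            mode, value = arg[1], arg[2:]
            plan[mode] = True
            if not value:
                plan["ages"][mode] = None
            elif mode == "u":
                ages = [int(n) for n in value.split(",")]
                if len(ages) != 4:
                    raise ValueError("-u expects four comma-separated ages")
                plan["ages"][mode] = ages
            else:
                plan["ages"][mode] = int(value)
        else:
            plan["users"].append(arg)  # anything else is treated as a username
    return plan

def is_neutral(probability, low=0.35, high=0.65):
    """Tokens in this probability band are the ones the -p mode targets."""
    return low <= probability <= high
```

For example, parse_clean_args(["-s0", "dick", "jane"]) reports signature
purging with age 0 for the two named users and leaves -p and -u off.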
1103
1104	dspam_stats - Displays the spam statistics for one or all users on the system.
1105	Syntax: dspam_stats [username]. If no username is provided, all users
1106	will be displayed. Displays TP (true positives), TN (true negatives),
1107	FN (false negatives), and FP (false positives).
1108
1109	dspam_genaliases - Reads the /etc/passwd file and outputs a dspam aliases
1110	table which can be included in the master aliases table. You may try
1111	Art Sackett's generate_dspam_aliases tool at
1112	http://www.artsackett.com/freebies/generate_dspam_aliases/ if you need
1113	some better functionality. This will eventually be merged in as a
1114	replacement for the existing tool.
1115
1116	dspam_merge - Merges multiple users' dictionaries together into one user's
1117	dictionary (the source users' dictionaries are not modified). This can be used to create
1118	a seeded dictionary for a new user, or to copy a single user's dictionary
1119	to a new file. This is great for building global dictionaries, but
1120	crunches a lot of time and disk.
1121
1122	1.5 AGENT COMMANDLINE ARGUMENTS
1123
1124	The DSPAM agent (dspam) recognizes the following commandline arguments:
1125
1126	--user [user1 user2 ... userN]
1127	Specifies the destination user(s) of the incoming message. DSPAM then
1128	processes the message once for each user individually. If the message is to
1129	be delivered, the $u (or %u) parameters of the arguments string will be
1130	interpolated for the current user being processed.
1131
1132	--class=[spam|innocent]
1133	Tells DSPAM that the message being presented has already been classified by
1134	the user. This flag should be used when a misclassification has occurred,
1135	when the user is corpus-feeding a message, or an inoculation is being
1136	presented. This flag must be used in conjunction with the --source flag.
1137	Providing no classification invokes DSPAM's standard behavior, which is to determine
1138	the message's nature on its own.
1139
1140	--source=[error|corpus|inoculation]
1141	Wherever --class is used, the source of the user-provided
1142	classification must also be provided. The source is very important and
1143	dramatically affects DSPAM's training behavior:
1144
1145	error: The message being presented was a message previously misclassified
1146	by DSPAM. When 'error' is provided as a source, DSPAM requires that
1147	the DSPAM signature be present in the message, and will use the
1148	signature to recall the original training metadata. If the signature
1149	is not present, the message will be rejected. In this source mode,
1150	DSPAM will also decrement each token's previous classification's
1151	count as well as the user totals.
1152
1153	You should use error only when DSPAM has made an error in
1154	classifying the message, and should present the modified version of
1155	the message with the DSPAM signature when doing so.
1156
1157	corpus: The message being presented is from a mail corpus, and should be
1158	trained as a new message, rather than re-trained based on a
1159	signature. The message's full headers and body will be analyzed and
1160	the correct classification will be incremented, without its
1161	opposite being decremented.
1162
1163	You should use corpus only when feeding messages in from corpus, not
1164	for correcting errors.
1165
1166	inoculation: The message being presented is in pristine form, and should
1167	be trained as an inoculation. Inoculations are a more
1168	intense mode of training designed to cause DSPAM to
1169	train the user's metadata repeatedly on previously unknown
1170	tokens, in an attempt to vaccinate the user from future
1171	messages similar to the one being presented.
1172
1173	You should use inoculation only on honeypots and the like.
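
The difference between the error and corpus sources can be shown on a single
token's hit counter. The Python sketch below is a simplified model of the
behavior described above, not DSPAM's code; the function name and the counter
layout are assumptions, and inoculation is deliberately left out.

```python
def apply_training(counts, new_class, source):
    """Update a {'spam': n, 'innocent': n} hit counter for one token.

    'error'  : move one hit from the previous (wrong) class to new_class.
    'corpus' : add one hit to new_class and leave the other class alone.
    """
    if new_class not in ("spam", "innocent"):
        raise ValueError("class must be 'spam' or 'innocent'")
    other = "innocent" if new_class == "spam" else "spam"
    updated = dict(counts)  # do not modify the caller's dictionary
    if source == "error":
        updated[other] = max(0, updated[other] - 1)
        updated[new_class] += 1
    elif source == "corpus":
        updated[new_class] += 1
    else:
        raise ValueError("this sketch models only 'error' and 'corpus'")
    return updated
```

This matches the testing exercise in section 1.2, where retraining an
innocent message as spam with source=error turns "S: 0 I: 1" into "S: 1 I: 0".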
1174
1175	--deliver=[spam,[innocent|nonspam],summary,stdout]
1176	Tells DSPAM to deliver the message if its result falls within the criteria
1177	specified. For example, --deliver=innocent or --deliver=nonspam will cause
1178	DSPAM to only deliver the message if its classification has been determined
1179	as innocent. Providing --deliver=innocent,spam or --deliver=nonspam,spam will
1180	cause DSPAM to deliver the message regardless of its classification. This flag
1181	provides a significant amount of flexibility for nonstandard implementations,
1182	where false positives may not be delivered but spam is, and so on.
1183
1184	summary : Deliver (to stdout) a summary identical to the output of message
1185	classification:
1186	X-DSPAM-Result: User; result="Innocent"; class="Innocent";
1187	probability=0.0000; confidence=1.00;
1188	signature=4b11c532158749980119923
1189
1190	stdout : Is a shortcut for --deliver=innocent,spam --stdout
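
For front-end scripts, the --deliver rule and the summary line can be handled
as in the following Python sketch. It assumes the summary has the shape shown
above; the function names are hypothetical, and treating 'summary' as never
delivering the message itself is an assumption.

```python
def should_deliver(result, deliver):
    """Decide delivery for a result ('spam' or 'innocent') given a --deliver value."""
    wanted = {
        ("innocent" if item == "nonspam" else item)  # nonspam is an alias
        for item in deliver.split(",") if item
    }
    if "stdout" in wanted:  # documented shortcut for innocent,spam
        wanted |= {"innocent", "spam"}
    return result.lower() in wanted

def parse_summary(header):
    """Parse an X-DSPAM-Result summary line into (user, {key: value})."""
    _, _, body = header.partition(":")
    user, *pairs = [part.strip() for part in body.split(";") if part.strip()]
    fields = {}
    for pair in pairs:
        key, _, value = pair.partition("=")
        fields[key.strip()] = value.strip().strip('"')
    return user, fields
```

The parsed fields are kept as strings, so a caller that needs the probability
as a number has to convert it.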
1191
1192	--stdout
1193	If the message is indeed deemed "deliverable" by the --deliver flag, this
1194	flag will cause DSPAM to deliver the message to stdout, rather than
1195	the configured delivery agent.
1196
1197	--process
1198	Tells DSPAM to process the message. This is the default behavior, and the
1199	flag is implied unless --classify is used, but it is a good idea to use it to
1200	avoid ambiguity.
1201
1202	--classify
1203	Tells DSPAM only to classify the message, and not make any writes to the
1204	user's metadata or attempt to deliver/quarantine the message.
1205
1206	NOTE: The output of the classification is specific to the user, not including
1207	the output of any groups they might be affiliated with, so it is
1208	entirely possible that the message would be caught as spam by the group,
1209	even if it didn't appear in the classification. If you want to get
1210	the classification for the GROUP, use the group name as the user
1211	instead of an individual.
1212
1213	--signature=[signature]
1214	For some implementations, the admin may wish to pass the signature in
1215	via commandline instead of allowing DSPAM to find it on its own. This is
1216	especially useful when front-ending the agent with other tools. Using this
1217	option will set the active signature and will also forego reading of stdin.
1218
1219	--mode=[toe|tum|teft|notrain|unlearn]
1220	Configures the training mode to be used for this process:
1221
1222	teft: Train-Everything. Trains on all messages processed. This is
1223	a very thorough training approach and should be considered the
1224	standard training approach for most users. TEFT may, however,
1225	prove too volatile on installations with extremely high per-user
1226	traffic, or prove not very scalable on systems with extremely large
1227	user-bases. In the event that TEFT is proving ineffective, one of
1228	the other modes is recommended.
1229
1230	NOTE: Until a user reaches 2500 innocent messages in their
1231	metadata, train-on-error will also be teft-based, even if
1232	otherwise specified on the commandline.
1233
1234	toe: Train-on-Error. Trains only on a classification error, once the
1235	user's metadata has matured to 2500 innocent messages. This
1236	training mode is much less resource intensive, as only occasional
1237	metadata writes are necessary. It is also far less volatile than
1238	the TEFT mode of training. One drawback, however, is that TOE only
1239	learns when DSPAM has made a mistake - which means the data is
1240	sometimes too static, and unable to "ease into" a different type of
1241	behavior.
1242
1243	tum: Train-until-Mature. This training mode is a hybrid between the other
1244	two training modes and provides a great balance between volatility
1245	and static metadata. TuM will train on a per-token basis only
1246	tokens which have had fewer than 50 "hits" on them, unless an error
1247	is being retrained in which case all tokens are trained. This
1248	training mode provides a solid core of stable tokens to keep
1249	accuracy consistent, but also allows for dynamic adaptation to any
1250	new types of email behavior a user might be experiencing. It is a
1251	balance of resources as well, as only less-than-mature tokens are
1252	written to the database. NOTE: You should corpus train before
1253	using tum.
1254
1255	notrain: No training. Do not train the user's data, and do not keep totals.
1256	This should only be used in cases where you want to process mail for
1257	a particular user (based on a group, for example), but don't want
1258	the user to accumulate any learning data.
1259
1260	unlearn: Unlearn original training. Use this if you wish to unlearn a
1261	previously learned message. Be sure to specify --source=error and
1262	--class to whatever the original classification the message was
1263	learned under. If not using TrainPristine, this will require the
1264	original signature from training.
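
The mode rules above can be condensed into one decision function. This Python
sketch is a reading of the descriptions in this section, not the agent's
logic: the thresholds are the figures quoted for toe and tum, the function
name is hypothetical, unlearn is not modeled, and treating notrain as "never
train, even on errors" is an assumption.

```python
TOE_MATURITY = 2500   # innocent messages before toe stops training on everything
TUM_MATURE_HITS = 50  # hit count at which tum treats a token as mature

def should_train_token(mode, is_error, innocent_messages, token_hits):
    """Return True if a token would be trained under the mode rules quoted above."""
    if mode == "notrain":
        return False
    if mode not in ("teft", "toe", "tum"):
        raise ValueError(f"unsupported mode: {mode}")
    if is_error:
        return True  # a retrained error updates every token
    if mode == "teft":
        return True
    if mode == "toe":
        return innocent_messages < TOE_MATURITY  # behaves like teft until mature
    return token_hits < TUM_MATURE_HITS  # tum: only immature tokens
```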
1265
1266	RECOMMENDATIONS:
1267	In general, it is recommended that users begin with TEFT. If a user
1268	is experiencing a 75-85% spam ratio, they may benefit from
1269	Train-until-Mature mode. If a user is experiencing over 90% spam, then
1270	Train-on-Error mode should make a noticeable improvement in accuracy.
1271	It eventually boils down to what works best for your users. There is
1272	no reason a system could not be configured (with a script) to
1273	analyze a user's *.stats file and determine the best training mode
1274	for that user.
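
A script of that kind could look like the Python sketch below. It takes totals
in the TP/TN/FN/FP form printed by dspam_stats rather than reading a *.stats
file, since that file's format is not described here. The function names are
hypothetical, and falling back to teft for ratios outside the two quoted
ranges (including the 85-90% gap) is an assumption.

```python
def spam_ratio(counts):
    """Share of spam among all messages, from TP/TN/FN/FP-style totals."""
    spam = counts.get("TP", 0) + counts.get("FN", 0)  # caught plus missed spam
    total = spam + counts.get("TN", 0) + counts.get("FP", 0)
    return spam / total if total else 0.0

def recommend_mode(ratio):
    """Map a spam ratio (0.0-1.0) to the training mode suggested above."""
    if ratio > 0.90:
        return "toe"
    if 0.75 <= ratio <= 0.85:
        return "tum"
    return "teft"  # assumption: everything else, including the 85-90% gap
```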
1275
1276	--feature=[no,wh,tb=N]
1277	Specifies the features that should be activated for this filter instance.
1278	The following features may be used individually or combined using a comma
1279	as a delimiter:
1280
1281	no: Bayesian Noise Reduction (BNR). Bayesian Noise Reduction kicks in
1282	at 2500 innocent messages and provides an advanced progressive
1283	noise logic to reduce Bayesian Noise (wordlist attacks) in
1284	spams. BNR is not for everyone, and so users should try it out
1285	after they've trained to see if it helps improve accuracy.
1286
1287	tb=N: Sets the training loop buffering level.
1288	Training loop buffering is the amount of statistical sedation
1289	performed to water down statistics and avoid false positives
1290	during the user's training loop. The training buffer sets the
1291	buffer sensitivity, and should be a number between 0 (no buffering
1292	whatsoever) and 10 (heavy buffering). The default is 5, half of
1293	what previous versions of DSPAM used.
1294	To avoid dulling down statistics at all during the training loop,
1295	set this to 0. This feature should be disabled if you're not
1296	paranoid about false positives, as it does increase the number of
1297	spam misses significantly during training.
1298
1299	wh: Automatic whitelisting. DSPAM will keep track of the entire
1300	"From:" line for each message received per user, and automatically
1301	whitelist messages from senders with more than 10 innocent
1302	messages and zero spams. Once the user reports a spam from the
1303	sender, automatic whitelisting will be deactivated
1304	for that sender. Since DSPAM uses the entire "From:" line, and
1305	not just the sender's email address, automatic whitelisting is
1306	a very safe approach to improving accuracy during initial training.
1307
1308	NOTE: None of the present features are necessary when the source is "error",
1309	because the original training data is used from the signature to
1310	retrain, instantiating whatever features (such as whitelisting) were
1311	active at the time of the initial classification. Since BNR is only
1312	necessary when a message is being classified, the
1313	--feature flag can be safely omitted from error source calls.
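
As an illustration of the --feature syntax and the whitelisting rule, here is
a small Python sketch. It covers only the three feature names documented in
this section (the agent itself may accept others); the function names and the
dictionary keys are hypothetical, and the threshold is the figure quoted
above.

```python
def parse_features(value):
    """Parse a --feature value such as 'no,wh,tb=5' into a settings dict."""
    settings = {"noise_reduction": False, "whitelist": False, "training_buffer": None}
    for item in filter(None, value.split(",")):
        if item == "no":
            settings["noise_reduction"] = True
        elif item == "wh":
            settings["whitelist"] = True
        elif item.startswith("tb="):
            level = int(item[3:])
            if not 0 <= level <= 10:
                raise ValueError("tb must be between 0 and 10")
            settings["training_buffer"] = level
        else:
            raise ValueError(f"unknown feature: {item}")
    return settings

def auto_whitelisted(innocent_from_sender, spam_from_sender, threshold=10):
    """Whitelist rule as described: more than `threshold` innocent, zero spam."""
    return innocent_from_sender > threshold and spam_from_sender == 0
```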
1314
1315	--daemon
1316	Puts DSPAM in daemon mode; i.e. DSPAM acts like a server when started with
1317	this parameter. See section 2.3 for more information about daemon mode.
1318
1319	2.0 LINKING WITH LIBDSPAM
1320
1321	Developers are able to link to the DSPAM core engine (libdspam) to provide
1322	"drop-in" spam-filtering for their applications. Examples of the libdspam
1323	API can be found in the example.c file included with this distribution.
1324
1325	<COMMERCIAL LICENSING>
1326
1327	IF YOUR PROJECT USES THE LIBDSPAM API, A GPL-COMPATIBLE OPEN SOURCE LICENSE
1328	IS REQUIRED IN ORDER TO REDISTRIBUTE. IF YOU ARE DEVELOPING A CLOSED-SOURCE
1329	APPLICATION OR APPLICATION THAT DOES NOT CONFORM TO GPL STANDARD, YOU MAY
1330	NOT REDISTRIBUTE ANY APPLICATIONS USING LIBDSPAM WITHOUT A COMMERCIAL
1331	LICENSE.
1332
1333	Please contact project administrators paulcockings@users.sourceforge.net
1334	or sbajic@users.sourceforge.net for information about commercial licensing.
1335
1336	</COMMERCIAL LICENSING>
1337
1338	To link to libdspam, follow the instructions for compiling and installing
1339	DSPAM. When compiled, the libdspam static and shared libraries are also
1340	built. This library contains all the functions necessary to use dspam's
1341	filtering in your application.
1342
1343	Your application will also need to link to the correct storage driver
1344	libraries. If you are using libdspam in a multithreaded application, you
1345	will need to either use a thread-safe storage driver or control access to
1346	libdspam using a mutex lock.
1347
1348	If you are using libdspam in a multithreaded environment, each thread will
1349	require its own DSPAM context. Fortunately, you can attach the same
1350	database handle to each context using dspam_attach(). See the man page for
1351	more information.
1352
1353	To build with the dspam API, you will also need the header files from
1354	the distribution. You can copy these to /usr/include/dspam for ease of
1355	use, and then use -I/usr/include/dspam
1356
1357	Please see example.c for API examples.
1358
1359	If you are interested in linking libdspam with your project and have
1360	questions or concerns, please contact the dspam-devel@lists.sourceforge.net
1361	mailing list.
1362
1363	2.1 CONFIGURING GROUPS
1364
1365	Groups enable a group of users to share information.
1366
1367	To create groups, you'll want to create a group configuration file. The location
1368	of this file is defined as GroupConfig in dspam.conf, and defaults to
1369	/usr/local/var/dspam/group. The format of the file is:
1370
1371	group1:type:user1,user2,user3
1372	group2:type:*globaluser
1373
1374	DSPAM will read this file upon startup and determine if the user fits into
1375	any particular group.
1376
1377	DSPAM supports the following group types:
1378
1379	SHARED
1380	Enables users with similar email behavior to share the same dictionary
1381	while still maintaining a private quarantine box. The benefits of this
1382	type of group are faster learning, and sharing a single spam alias. Shared
1383	groups can have both positive and negative effects on accuracy. If a shared
1384	group consists of users with similar, predictable email behavior, the users
1385	in the group can benefit from a larger dictionary of spam and faster
1386	learning (especially for newcomers in the group). If a group consists of
1387	users with different email behavior, however, the users in the group will
1388	experience poor spam filtering and a higher number of false positives.
1389
1390	NOTE: The SQL-based storage drivers support shared groups, with one caveat:
1391	If you are NOT enabling "virtual users" support, you will need to create
1392	an actual user on your system named after each group you create.
1393
1394	On top of shared group support, a shared group can also be made to be
1395	'managed'. Using the group type 'SHARED,MANAGED' will cause the group to
1396	share a single quarantine mailbox which could be managed by the group's
1397	administrator (aka: the group name). This would enable one individual to
1398	monitor quarantine for the entire group, however personal emails marked as
1399	false positives could potentially be viewed as well. For this reason,
1400	managed groups should only be used when this is not an issue.
1401
1402	NOTE: Use the dspam_stats tool to keep an eye on the effectiveness of
1403	shared groups. If a shared group experiences poor performance, find
1404	the users whose email behavior is inconsistent with that of the group
1405	and remove them from the group.
1406
1407	The format for a shared or shared,managed group is:
1408
1409	group1:shared:user1,user2,userN
1410	group2:shared,managed:user1,user2,userN
1411	group3:shared:*@example.org
1412	group4:shared:*
1413
1414	The group name (in the example above 'group1', 'group2', 'group3', 'group4')
1415	can be anything you like. If you set the shared group to be managed then the
1416	groupname (in the example above 'group2') will be used by DSPAM as the shared
1417	group administrator.
1418
1419	The user/member list for a shared group allows the following syntax:
1420	user1 : Exact match of user with the name "user1"
1421	* : Match any user
1422	*@example.org : Match any user having '@example.org' at the end of their
1423	username. The matching only works for the '@' character.
1424	You cannot use something like '*user' to include user
1425	'infouser', 'testuser', 'dummyuser', etc.
1426
1427	INOCULATION
1428	An inoculation group allows users to maintain their own private dictionaries
1429	with their own spam alias, but all members of the group will inoculate other
1430	members with spams they manually forward into their alias. This allows users
1431	to report spams to one another and maintain their own private dictionary.
1432	Another advantage to this is that users do not necessarily have to share the
1433	same email behavior.
1434
1435	VERSATILE LANGUAGE INOCULATION MESSAGES
1436
1437	A new Internet-Draft has been released to the public:
1438
1439	http://tools.ietf.org/html/draft-spamfilt-inoculation-01
1440	http://tools.ietf.org/html/draft-yerazunis-spamfilt-inoculation-03
1441
1442	It proposes a message format standard for sending inoculation data via email.
1443	This will allow users on different servers, and even using different
1444	anti-spam tools, to share inoculation information with one another.
1445
1446	DSPAM presently implements support for this message standard with the
1447	following limitations:
1448
1449	- Only inbound inoculation messages are supported. DSPAM does not yet send
1450	out inoculations using this message format. This should not be confused
1451	with local inoculation, which is supported.
1452
1453	- The message/inoculation format is the only inoculation type presently
1454	supported. text/inoculation and multipart/inoculation coming soon.
1455
1456	- The only supported authentication mechanism is presently md5 verification
1457	codes/checksums.
1458
1459	Any unsupported inoculations will simply be dropped.
1460
1461	A list of identities and authentication information can be set up in the file
1462	[username].inoc or in the user's home directory in a .inoc file if
1463	homedir-dotfiles is enabled. The format of this file is:
1464
1465	sender1:shared secret
1466	sender2:shared secret
1467
1468	Each sender should specify the correct sender id when sending an
1469	inoculation, and should generate their checksum based on the shared secret
1470	established between both parties.
1471
1472	NOTE: Users should only be added to an inoculation group after their initial
1473	learning period, to avoid potential false positives due to lack of data.
1474
1475	The format for an inoculation group is:
1476
1477	group1:inoculation:user1,user2,userN
1478	group2:inoculation:user3,user4,userN
1479
1480	The group name (in the example above 'group1', 'group2') can be anything you
1481	like. It is not used by DSPAM and does not even have to be unique.
1482
1483	The user/member list for an inoculation group allows the following syntax:
1484	user1 : Exact match of user with the name "user1"
1485
1486	CLASSIFICATION
1487	Classification groups allow a group of users to network their results
1488	together. If DSPAM is uncertain of whether a message is spam or nonspam for
1489	a group member, all other members of the group are queried. If another member
1490	believes the message to be spam, it will be marked as spam. DSPAM queries
1491	the members one by one and stops as soon as a member reports that it believes
1492	the message is spam.
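
The one-by-one querying described above can be sketched in Python as follows.
This is a simplified illustration rather than DSPAM's implementation: each peer
is modeled as a hypothetical function that returns True if that member's filter
believes the message is spam.

```python
from typing import Callable, Iterable

def group_verdict(peer_classifiers: Iterable[Callable[[str], bool]],
                  message: str) -> bool:
    """Return True (spam) as soon as any peer believes the message is spam."""
    for believes_spam in peer_classifiers:
        if believes_spam(message):
            return True  # stop here; remaining peers are not queried
    return False
```

Because the loop returns on the first positive answer, the order of the members
only affects how many peers are consulted, not the verdict.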
1493
1494	The format for a classification group is:
1495
1496	group1:classification:user1,user2,userN
1497	group2:classification:user3,user4,userN
1498
1499	The group name (in the example above 'group1', 'group2') can be anything you
1500	like. It is not used by DSPAM and does not even have to be unique.
1501
1502	The user/member list for a classification group allows the following syntax:
1503	user1 : Exact match of user with the name "user1"
1504
1505	GLOBAL
1506	Global groups allow DSPAM to provide a "SpamAssassin type out-of-the-box
1507	filtering" for all new users until they have built their own useful
1508	dictionaries. A global group can be created by adding a CLASSIFICATION
1509	group definition (see above) and prefixing the group member/user with a '*'.
1510
1511	The format for a global classification group is:
1512
1513	groupname:classification:*globaluser
1514
1515	This will automatically add user globaluser as a classification peer to all
1516	users. Any user who has fewer than 1000 innocent messages or 250 spam messages
1517	in their corpus, or whose filter is uncertain (confidence less than 0.65)
1518	about a particular message will consult the globaluser dictionary for an
1519	answer.
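
The thresholds above can be written as a small predicate. This Python sketch is
an illustration only; the function name is made up, and it assumes that falling
short on either corpus count is enough to trigger the lookup.

```python
def should_consult_global(innocent_count: int, spam_count: int,
                          confidence: float) -> bool:
    """Decide whether to ask the global user, per the thresholds above."""
    # Assumption: being below either corpus threshold counts as "still new".
    still_learning = innocent_count < 1000 or spam_count < 250
    uncertain = confidence < 0.65
    return still_learning or uncertain
```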
1520
1521	The Global group user (in this case 'globaluser') will need to be trained
1522	using a corpus, the dspam_merge tool, or other means. The Global
1523	group user (in this case 'globaluser') is treated just as any other user on
1524	the system.
1525
1526	The group name (in the example above 'groupname') can be anything you like. It
1527	is not used by DSPAM and does not even have to be unique.
1528
1529	NOTE: Be sure to set your global user's preferences so that trainingMode
1530	is set to TOE. This will prevent the purge tools you use from
1531	emptying the global user's data in 90 days.
1532
1533	MERGED
1534	Merged groups are similar to global groups in that the entire system uses a
1535	single global user as a parent. What's different is that the merged group is
1536	merged with the individual user's training data at run-time, instead of
1537	switching between the two. This allows the merged group to be treated like a
1538	base dataset for all users, and provides for quicker learning and correction
1539	than the previous approach. It is recommended that merged groups be used only with
1540	TOE-mode training so that only corrective data is stored, but systems with
1541	ample disk space may wish to run in TUM mode to learn the user's behavior
1542	dynamically.
1543
1544	The group's data is merged with the user's data in real-time, so if you have:
1545
1546	Group : Viagra = 10 Spam Hits, 0 Innocent Hits
1547	User1 : Viagra = 5 Spam Hits, 15 Innocent Hits
1548	User2 : Viagra = 20 Spam Hits, 1 Innocent Hits
1549
1550	Then the token is loaded as:
1551	User1 : Viagra = 15 Spam Hits, 15 Innocent Hits = 0.50 (50%) = neutral
1552	User2 : Viagra = 30 Spam Hits, 1 Innocent Hits
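
The arithmetic in this example can be reproduced with a few lines of Python.
This is only a sketch: the simple spam/(spam + innocent) ratio mirrors the 0.50
figure shown above and is not meant to represent DSPAM's full token
probability calculation. The function names are invented for the example.

```python
from typing import Tuple

Counts = Tuple[int, int]  # (spam hits, innocent hits)

def merge_counts(group: Counts, user: Counts) -> Counts:
    """Add the group's hit counts to the user's at load time."""
    return (group[0] + user[0], group[1] + user[1])

def spam_ratio(counts: Counts) -> float:
    """Simple ratio of spam hits to all hits, as in the example above."""
    spam, innocent = counts
    return spam / (spam + innocent)

group = (10, 0)
for name, user in {"User1": (5, 15), "User2": (20, 1)}.items():
    merged = merge_counts(group, user)
    print(name, merged, round(spam_ratio(merged), 2))
```

Nothing is written back to the group here; only the in-memory view of each
user's token changes, which matches the behavior described in this section.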
1553
1554	No data is written to the group by DSPAM; only the user's data. This then
1555	offsets the group's data without affecting other users. Because of the way
1556	this data is merged, it's not recommended that you update the merged group
1557	with more than a handful of messages periodically, as it affects how all
1558	stats are defined for each user.
1559
1560	The format for a merged group is:
1561
1562	group1:merged:user1,user2,userN
1563	group2:merged:user3,user4,userN
1564
1565	The group name (in the example above 'group1', 'group2') can be anything you
1566	like and represents the name of the group user to merge with all members of
1567	the group. DSPAM will use that group name (in the example above 'group1',
1568	'group2') and merge at run-time the tokens from that group name with the tokens
1569	of the user (if the user is member of the merged group).
1570
1571	The user/member list for merged group allows the following syntax:
1572	user1 : exact match of user with the name "user1"
1573	-user1 : exclude user with the name "user1"
1574	* : match any user
1575	*@example.org : match users having "@example.org" at the end of their
1576	username. The matching only works for the '@' character.
1577	You cannot use something like '*user' to include user
1578	'infouser', 'testuser', 'dummyuser', etc.
1579	-*@example.org : exclude users having "@example.org" at the end of their
1580	username. The matching only works for the '@' character.
1581	You cannot use something like '-*user' to exclude user
1582	'infouser', 'testuser', 'dummyuser', etc.
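
The member syntax above can be illustrated with a short Python sketch. It is
not taken from the DSPAM source. In particular, it assumes that a matching
exclusion always wins regardless of where it appears in the list, which this
README does not specify. The function names are hypothetical.

```python
from typing import Iterable

def pattern_matches(pattern: str, user: str) -> bool:
    """Match one member pattern (without a leading '-') against a username."""
    if pattern == "*":
        return True
    if pattern.startswith("*@"):
        # Only the '*@domain' form is a wildcard; '*user' is not supported.
        return user.endswith(pattern[1:])
    return pattern == user

def is_member(patterns: Iterable[str], user: str) -> bool:
    """Return True if user is included and not excluded by the member list."""
    included = False
    for pattern in patterns:
        if pattern.startswith("-"):
            # Assumption: any matching exclusion wins, regardless of order.
            if pattern_matches(pattern[1:], user):
                return False
        elif pattern_matches(pattern, user):
            included = True
    return included
```

For example, the list "*@example.org,-bob@example.org" would include
alice@example.org but not bob@example.org under these assumptions.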
1583
1584	NOTE: Merged Groups are great for providing out-of-the-box adaptive filtering,
1585	but allowing users to build their own data from scratch will still
1586	result in the best possible accuracy in the long run.
1587
1588	NOTE: Be sure to set your group user's preferences so that trainingMode is
1589	set to TOE. This will prevent the purge tools you use from emptying the
1590	group user's data in 90 days.
1591
1592	RESTRICTIONS!
1593
1594	A user can simultaneously be a member of multiple classification / global
1595	groups and multiple inoculation groups, but a user cannot be a member of a
1596	classification / global group or an inoculation group and, at the same time,
1597	of a shared or shared,managed group.
1598
1599	A user cannot be a member of:
1600	* both a classification group and a global group
1601	* multiple merged groups
1602	* multiple shared or shared,managed groups
1603	* both a shared group or shared,managed group and a merged group
1604
1605	2.2 EXTERNAL INOCULATION THEORY
1606
1607	Bill Yerazunis recently expressed his theory of inoculation on an anti-spam
1608	development list, using the term "vaccination":
1609
1610	"Part of the problem is that spam isn't stationary, it evolves. That
1611	pesky .1% error rate is in some part due to the base mutation rate of spam
1612	itself. Maybe the answer is "vaccination". Vaccination is using _one_
1613	person's misery be used to generate some protective agent that protects the
1614	rest of the population; only the first person to get the spam actually has
1615	to read it.
1616
1617	My expectation is this: say you have ten friends, and you all agree to share
1618	your training errors. Each of you will (statistically) expect to be the
1619	first to see a new mutation of spam about 9% of the time; the other ten
1620	friends in this group will have their bayesian filter trained preemptively
1621	to prevent this. Net result: you get a tenfold decrease in error rate -
1622	down to 99.99% accuracy. With a hundred such (trusted) friends, you may be
1623	down to 99.999% accuracy."
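
The numbers in the quote can be checked with a little arithmetic. The Python
sketch below is an illustration under two simplifying assumptions that are not
stated in the quote: every member of the group is equally likely to be the
first to see a new spam, and an error shared by one member fully protects all
the others.

```python
def first_to_see_share(friends: int) -> float:
    """Chance that you, out of yourself plus `friends`, see a new spam first."""
    return 1 / (friends + 1)

def accuracy_with_sharing(base_error: float, friends: int) -> float:
    """Accuracy if you only err on the spams you are the first to see."""
    return 1 - base_error * first_to_see_share(friends)

# Base error rate of 0.1% (99.9% accuracy), as in the quote.
print(f"{first_to_see_share(10):.1%}")
print(f"{accuracy_with_sharing(0.001, 10):.4%}")
print(f"{accuracy_with_sharing(0.001, 100):.4%}")
```

With ten friends the share is 1/11, or roughly 9%, which is where the "tenfold
decrease" in the quote comes from.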
1624
1625	DSPAM has taken this concept and rolled it into support for what we call
1626	"inoculation groups" providing the exact functionality Bill describes. This
1627	could be considered an "internal inoculation" practice.
1628
1629	On top of this, DSPAM has been designed to support external inoculation as
1630	a complement to internal inoculation. This is where, instead of having your
1631	internal circle of friends inoculate you, you rely on external elements - namely
1632	spammers themselves - to inoculate you.
1633
1634	The theory behind external inoculation is this: why put _anyone_ through
1635	the misery of being the first to receive a new spam when you can have
1636	the spammers themselves send it directly to you? On top of this,
1637	external inoculation can be combined with internal inoculation by taking
1638	the spam you received externally and inoculating your friends with it
1639	internally.
1640
1641	Inoculation is a little different from learning, as inoculation causes
1642	tokens to be given additional hit counts in an attempt to learn from a
1643	single email. As a result, any form of inoculation should _only_ be
1644	attempted after an initial learning phase (perhaps when your filtering
1645	accuracy exceeds 99.0%). DSPAM inoculates like this:
1646
1647	1. Every token that doesn't already exist in the database, or has fewer
1648	than two hits, will be hit five times.
1649
1650	2. All other tokens are hit twice.
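
The two rules above can be sketched in Python like this. It is a simplified
model, not DSPAM's code: the dictionary stands in for the token database, "hits"
is taken to mean the spam hit count, and each token is assumed to be counted
once per message. All names are invented for the example.

```python
from typing import Dict, Iterable

def inoculate(spam_hits: Dict[str, int], tokens: Iterable[str]) -> None:
    """Apply the inoculation rule above to a token -> spam-hit-count map."""
    # Assumption: a token repeated in the message is only counted once.
    for token in set(tokens):
        current = spam_hits.get(token, 0)
        # New or rarely seen tokens (fewer than two hits) get five hits;
        # every other token gets two.
        spam_hits[token] = current + (5 if current < 2 else 2)

hits = {"viagra": 10, "cheap": 1}
inoculate(hits, ["viagra", "cheap", "newtoken"])
print(hits)
```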
1651
1652	External inoculation is accomplished by creating a covert, external alias
1653	that is configured to automatically inoculate your dictionary from any
1654	messages it receives. The covert alias can then be published onto a series
1655	of public newsgroups and websites where it is sure to be harvested by
1656	a spammer's tools. One could even proactively subscribe oneself to
1657	several different opt-in spam lists, et cetera.
1658
1659	The first step is to configure an alias. To do this you would use something
1660	like:
1661
1662	bob_c: "|/path/to/dspam --process --class=spam --source=inoculation --user bob"
1663
1664	The 'c' in bob_c is for 'covert'. We must use a covert alias because if we
1665	use something obvious like 'bob-spam', harvester tools will automatically
1666	strip the -spam off and spam your real account.
1667
1668	Once the alias is set up, make sure this alias gets out only on lists where
1669	harvesters will grab it, and nobody will send legitimate email to it.
1670	It may even be a good idea to put it at the bottom of your tagline in all
1671	your publicly archived emails, something like...
1672
1673	Spammers, send me mail here: bob_c@example.org
1674
1675	Finally, you can multiply the effects of this by sharing an inoculation
1676	group with your friends. If all of your friends have a public covert
1677	alias, then you will all be able to inoculate each other should one of you
1678	receive a spam to the account. What a great way to train your filter!
1679
1680	On top of this, should external inoculation become commonplace to the
1681	point where harvesters are picking up as many covert aliases as legitimate
1682	email addresses, spammers will start to realize that harvesters are just
1683	plain too dumb to tell the difference (the spammers themselves couldn't tell
1684	whether mine was one or not). This could, in the best case, put an end to
1685	harvester bots, making them obsolete as counter-productive tools.
1686
1687	2.3 CLIENT/SERVER MODE
1688
1689	DSPAM supports two different modes of operation. In standard operating
1690	mode, the DSPAM agent is called by the MTA (or proxy) and each agent process
1691	performs independently, establishing its own connection to a database and
1692	performing delivery on its own. The second operating mode, client/server mode,
1693	allows the DSPAM agent to act more like a thin client, connecting to the
1694	DSPAM server process which then does all the work of analyzing and delivering
1695	or quarantining the message. The advantages to using DSPAM in client/server
1696	mode are:
1697
1698	- Maintaining a set of stateful database connections (within the server),
1699	which should enhance performance on some systems by eliminating the need
1700	to establish a new database connection for every message processed.
1701
1702	- Providing a central point of processing. Having one server perform all
1703	processing and delivery, while having multiple thin clients on your mail
1704	servers may be more desirable than having multiple agents performing
1705	processing and delivery on all your servers.
1706
1707	- The DSPAM server speaks LMTP, which some implementations may be able to
1708	take advantage of, eliminating the need for the DSPAM client altogether.
1709
1710	- Having a single multithreaded daemon should use less memory and other
1711	resources than having independently operating clients.
1712
1713	If you've already got DSPAM set up, client/server mode won't require any
1714	changes to your mail server's configuration - it's completely transparent.
1715
1716	The DSPAM agent can be compiled with client/server support by configuring
1717	with --enable-daemon. You will need to use a multithread-safe storage driver
1718	(presently mysql_drv, pgsql_drv and hash_drv are supported). Once you have
1719	compiled with daemon support, you'll need to modify your dspam.conf to
1720	provide the settings necessary for client/server mode:
1721
1722	ServerHost 127.0.0.1
1723
1724	The host to listen on. The default is to leave this setting commented out,
1725	which causes DSPAM to listen on all available interfaces.
1726
1727	ServerPort 24
1728
1729	The port to listen on. The default is 24, the LMTP port.
1730
1731	ServerQueueSize 32
1732
1733	The maximum number of connections which may remain backlogged before they
1734	are accepted.
1735
1736	ServerPass.Relay1 "secret"
1737	ServerPass.Relay2 "password"
1738
1739	Each client server allowed to connect should have its own password. They
1740	can be defined here.
1741
1742	The DSPAM server can listen on either a network socket or a local unix
1743	domain socket. If you're running the client and server on the same machine,
1744	a domain socket should be used as it eliminates additional overhead. To use
1745	a domain socket, you'll also need to add the following option:
1746
1747	ServerDomainSocketPath "/tmp/dspam.sock"
1748
1749	Once you've configured the server config, you'll want to set the client
1750	configuration on all client machines. If you are using network sockets,
1751	set the following to appropriate values:
1752
1753	ClientHost 127.0.0.1
1754	ClientPort 24
1755
1756	Or if using a domain socket:
1757
1758	ClientHost /tmp/dspam.sock
1759
1760	In both cases, you'll need to set the client's authentication ident:
1761
1762	ClientIdent "secret@Relay1"
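
The relationship between ServerPass.<ident> on the server and ClientIdent on
the client can be illustrated with a short Python sketch. The "password@ident"
layout is taken from the examples above; how the server actually compares the
values is an assumption, and the function name is hypothetical.

```python
from typing import Dict

def check_client_ident(server_pass: Dict[str, str], client_ident: str) -> bool:
    """Check a 'password@ident' string against ServerPass.<ident> entries."""
    password, sep, ident = client_ident.rpartition("@")
    if not sep:
        return False  # no '@', so there is no ident to look up
    expected = server_pass.get(ident)
    return expected is not None and expected == password

# Mirrors: ServerPass.Relay1 "secret" and ServerPass.Relay2 "password"
server_pass = {"Relay1": "secret", "Relay2": "password"}
print(check_client_ident(server_pass, "secret@Relay1"))
```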
1763
1764	Now you're ready to go. To start the DSPAM server, run:
1765
1766	dspam --daemon &
1767
1768	Or alternatively, if you have debugging enabled:
1769
1770	dspam --debug --daemon &
1771
1772	The DSPAM agent can then be called just as in standard (non-client/server)
1773	mode, with --client added to the set of parameters. With --client, the
1774	client settings are loaded from dspam.conf and the agent acts as a thin
1775	client. Running dspam without --client causes DSPAM to revert to its normal
1776	non-daemon behavior and establish database connections on its own. For
1777	example:
1778
1779	dspam --client --user dick jane --deliver=innocent -d %u
1780
1781	Alternatively, if you'd like to use a thinner client, dspamc is identical
1782	to the dspam binary in behavior, but has been stripped down to only include
1783	the lightweight client.
1784
1785	dspamc --user dick jane --deliver=innocent -d %u
1786
1787	The conversation that takes place between the client/server is LMTP-based,
1788	and will look like this:
1789
1790	SERVER> 220 DSPAM DLMTP 3.10.0 Authentication Required
1791	CLIENT> LHLO Relay1
1792	SERVER> 250-PIPELINING
1793	SERVER> 250-ENHANCEDSTATUSCODES
1794	SERVER> 250-DSPAMPROCESSMODE
1795	SERVER> 250 SIZE
1796	CLIENT> MAIL FROM: <secret@Relay1> DSPAMPROCESSMODE="--deliver=innocent -d %u"
1797	SERVER> 250 2.1.0 OK
1798	CLIENT> RCPT TO: dick
1799	SERVER> 250 2.1.5 OK
1800	CLIENT> RCPT TO: jane
1801	SERVER> 250 2.1.5 OK
1802	CLIENT> DATA
1803	SERVER> 354 Enter mail, end with "." on a line by itself
1804	CLIENT> Subject: Cheap Viagra!
1805	CLIENT>
1806	CLIENT> Click Here: http://www.cheapviagra.example.org
1807	CLIENT> .
1808	SERVER> 250 2.0.0 <dick> Message accepted for delivery: INNOCENT
1809	SERVER> 250 2.0.0 <jane> Message accepted for delivery: SPAM
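
To show how the client side of this conversation is put together, here is a
Python sketch that builds the list of lines a client would send. It is modeled
only on the transcript above and does not open a connection. The function name
is made up, and the dot-stuffing of body lines is an assumption borrowed from
SMTP rather than something this README states.

```python
from typing import List

def dlmtp_client_lines(ident: str, password: str, recipients: List[str],
                       process_mode: str, message: str) -> List[str]:
    """Build the client side of the conversation shown above."""
    lines = [
        f"LHLO {ident}",
        f'MAIL FROM: <{password}@{ident}> DSPAMPROCESSMODE="{process_mode}"',
    ]
    lines += [f"RCPT TO: {rcpt}" for rcpt in recipients]
    lines.append("DATA")
    for body_line in message.splitlines():
        # Assumption: dot-stuffing as in SMTP, so a lone "." in the body
        # is not mistaken for the end-of-data marker.
        lines.append("." + body_line if body_line.startswith(".") else body_line)
    lines.append(".")
    return lines

for line in dlmtp_client_lines("Relay1", "secret", ["dick", "jane"],
                               "--deliver=innocent -d %u",
                               "Subject: Cheap Viagra!\n\nClick Here"):
    print("CLIENT>", line)
```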
1810
1811	Optionally, if you'd like the clients to perform delivery, you can use
1812	DSPAM's --stdout or --classify functionality to obtain a dump of the message
1813	or results, respectively. From there, it's up to you and your MTA to
1814	deliver the message. The DSPAM client will output the results to stdout in
1815	this case, just as it would in standard operating mode.
1816
1817	Once the server is running, its configuration can be reloaded with a SIGHUP.
1818	When the daemon is reloaded, the following occurs:
1819
1820	- The daemon stops listening for new requests
1821	- All threads are allowed to finish processing and exit
1822	- All connections to the database are closed
1823	- The dspam.conf configuration is reloaded
1824	- All connections to the database are re-opened
1825	- The daemon starts listening for new requests
1826
1827	This allows database and listener configurations to also be reloaded from
1828	dspam.conf without the need to interrupt the process.
1829
1830	NOTE: During the period of time the daemon is reloading, client connections
1831	will fail. Depending on how the MTA reacts, this may cause messages to
1832	fall back to queue or to bounce.
1833
1834	2.4 LMTP
1835
1836	DSPAM supports LMTP both on the front-end and back-end (delivery). This
1837	section will briefly provide instructions for configuring either or both of
1838	these advanced options.
1839
1840	LMTP (AND SMTP) DELIVERY
1841
1842	DSPAM supports LMTP delivery for admins who would prefer to use this instead
1843	of local delivery. While LMTP delivery doesn't _require_ operating in
1844	daemon mode, it is necessary to compile DSPAM with --enable-daemon to take
1845	advantage of LMTP delivery. To configure LMTP delivery, perform the following
1846	steps:
1847
1848	1. Compile DSPAM with --enable-daemon to enable LMTP delivery code
1849
1850	2. Configure your DeliveryHost and DeliveryIdent in dspam.conf. Set
1851	DeliveryProto based on whether you would like to deliver via LMTP or SMTP.
1852
1853	NOTE: If you would like to deliver to different hosts based on domain,
1854	specify DeliveryHost.example.org as the configuration directive. Use
1855	DeliveryPort.example.org to specify a port for the delivery.
1856
1857	3. Add the --lmtp-recipient flag to the arguments passed into DSPAM. This is
1858	used to specify the destination address for the message. For example, in
1859	postfix:
1860
1861	--lmtp-recipient=${recipient}
1862
1863	DSPAM will then connect to the specified host, and deliver using a standard
1864	LMTP conversation that looks like:
1865
1866	LHLO [ident]
1867	MAIL FROM:<> SIZE=[message_length]
1868	RCPT TO: <recipient>
1869	DATA
1870	[Message]
1871	.
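
The per-domain DeliveryHost and DeliveryPort directives mentioned in step 2 can
be pictured with a small Python sketch. The dictionary stands in for dspam.conf.
Falling back to the plain DeliveryHost and DeliveryPort entries when no
per-domain entry exists is an assumption, as is the function name.

```python
from typing import Dict, Tuple

def pick_delivery_target(config: Dict[str, str],
                         recipient: str) -> Tuple[str, str]:
    """Return (host, port) for a recipient, preferring per-domain settings."""
    domain = recipient.rpartition("@")[2]
    # Assumption: fall back to the unsuffixed directives if no
    # DeliveryHost.<domain> / DeliveryPort.<domain> entry is present.
    host = config.get(f"DeliveryHost.{domain}", config.get("DeliveryHost", ""))
    port = config.get(f"DeliveryPort.{domain}", config.get("DeliveryPort", ""))
    return host, port
```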
1872
1873	LMTP SERVER
1874
1875	DSPAM supports a "daemon" mode where it will sit and listen for inbound
1876	connections. Depending on how the server is configured, DSPAM can speak
1877	either standard LMTP (for interaction with a mail server, such as postfix)
1878	or DLMTP (DSPAM LMTP) which is a proprietary implementation of LMTP between
1879	the DSPAM client and server. If you plan on calling DSPAM from the commandline
1880	via dspamc, but wish to have a stateful daemon perform processing, then
1881	you'll want to use the "dspam" server mode. If you want to call DSPAM by
1882	having your mail server connect to it via LMTP, then you'll need to specify
1883	the "standard" server mode.
1884
1885	The ServerMode can be set in dspam.conf. Each mode has its own custom
1886	tweaks and configurations that will need to be set in dspam.conf.
1887
1888	"dspam" mode settings.
1889	In "dspam" mode, you'll need to set up authentication for each dspam client
1890	relay. This involves configuring the relay ident and password. Examples are
1891	provided.
1892
1893	"dspam" mode notes.
1894	In dspam mode, only the dspam client will be connecting to your LMTP server.
1895	This can be dspamc (a thin-client) or the dspam binary. In either case,
1896	you'll need to specify --client to tell DSPAM to act as a client. DLMTP
1897	allows the client to pass in any commandline arguments provided, so it should
1898	function identically to running it as a dedicated (non-stateful)
1899	process.
1900
1901	"standard" mode settings.
1902	In "standard" mode, you will need to configure the ServerParameters flag to
1903	reflect the commandline parameters you would normally want to pass to DSPAM.
1904
1905	"standard" mode notes.
1906	One thing to watch out for is that the recipient you're sending via LMTP is
1907	unique to a specific user. This means that all of your aliases should be
1908	resolved before the MTA relays to DSPAM. Because DSPAM uses the addresses in
1909	the RCPT TO as usernames, _not_ resolving any aliases will result in
1910	multiple databases being created for one user. Since the signature will be
1911	different for each user, and since the message must be processed
1912	differently for each user, DSPAM demultiplexes a multi-recipient email. This
1913	means that while it can receive an email with multiple RCPT TO's specified, it
1914	will perform delivery individually.
1915
1916	"auto" mode setting.
1917	If you would like to support both connecting MTAs and remote dspam client
1918	processes (such as for inoculations), you can set the server mode to auto,
1919	which will base its dialect on the ident supplied in the LHLO. If the LHLO
1920	ident matches an ident in dspam.conf's ServerPass section, the server will
1921	default to DLMTP. Otherwise, DSPAM will assume the client is a standard
1922	LMTP client and speak standard LMTP.
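
The decision made in "auto" mode amounts to a single lookup, sketched here in
Python. The function name and the string labels it returns are invented for
this illustration.

```python
from typing import Iterable

def choose_dialect(lhlo_ident: str, server_pass_idents: Iterable[str]) -> str:
    """Pick the dialect for 'auto' mode from the ident given in LHLO."""
    return "DLMTP" if lhlo_ident in set(server_pass_idents) else "LMTP"
```

With ServerPass.Relay1 and ServerPass.Relay2 defined, a client announcing
"LHLO Relay1" would be treated as a dspam client, while an MTA announcing its
own hostname would get standard LMTP.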
1923
1924	LOCAL DELIVERY WITH LMTP FRONT-END
1925
1926	In some circumstances, you may want to relay to DSPAM via LMTP, but have
1927	DSPAM deliver via LDA. In these cases, you may use the following
1928	conventions in your ServerParameters configuration:
1929
1930	%r - The RCPT TO passed in via LMTP
1931	%s - The MAIL FROM passed in via LMTP
1932
1933	In both cases, the content provided between < > is what is actually used.
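
The substitution can be sketched in Python as follows. This is an illustration
of the two placeholders only, not DSPAM's implementation. The function names
are hypothetical, and any parameter string passed in is up to your own
ServerParameters setting.

```python
import re

def address_in_brackets(value: str) -> str:
    """Return the text between '<' and '>', or the stripped value if absent."""
    match = re.search(r"<([^>]*)>", value)
    return match.group(1) if match else value.strip()

def expand_server_parameters(template: str, rcpt_to: str,
                             mail_from: str) -> str:
    """Substitute %r and %s in a ServerParameters-style string."""
    values = {"%r": address_in_brackets(rcpt_to),
              "%s": address_in_brackets(mail_from)}
    # A single pass avoids re-expanding text that came from a substitution.
    return re.sub(r"%[rs]", lambda m: values[m.group(0)], template)
```

Other placeholders, such as %u, are left untouched by this sketch.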
1934
1935	2.5 DSPAM USER PREFERENCES
1936
1937	Preferences are settings that can be configured globally in dspam.conf or
1938	for individual users via the dspam_admin command.
1939
1940	trainingMode { TOE | TUM | TEFT | NOTRAIN }
1941	How DSPAM should train messages it analyzes. See section 1.5 --mode
1942	(default:teft, see dspam.conf)
1943
1944	spamAction { quarantine | tag | deliver }
1945	What to do with spam. The tag and deliver options both deliver, but tag
1946	adds a special prefix to the subject, whereas deliver merely sets
1947	X-DSPAM-Result. (default:quarantine)
1948
1949	spamSubject
1950	A customized subject to prefix when spamAction=tag. (default:[SPAM])
1951
1952	statisticalSedation { 0 - 10 }
1953	The level of dampening during training (0-10, 0 = no dampening, default:0)
1954
1955	enableBNR { on | off }
1956	Enables or disables bayesian noise reduction (default:off)
1957
1958	enableWhitelist { on | off }
1959	Enables or disables automatic whitelisting (default:on)
1960
1961	signatureLocation { message | headers }
1962	Where to place the DSPAM signature. Placement affects forwarding approach.
1963	(default:message)
1964
1965	tagSpam / tagNonspam { on | off }
1966	Adds a tagline to the end of a message based on its classification; useful
1967	for things such as "Scanned by your ISP example.org". If set to on, the file
1968	msgtag.spam and/or msgtag.nonspam will be looked for in "TxtDirectory"
1969	(see dspam.conf) and appended to appropriate messages.
1970
1971	NOTE: Signed messages will not be tagged in this fashion
1972
1973	showFactors { on | off }
1974	Whether to include an X-DSPAM-Factors header including decision-making
1975	factors (clues). NOTE: This can break RFC in some cases, and should only
1976	be used for debugging. (default:off)
1977
1978	optIn / optOut { on | off }
1979	Depending on whether the system is opt-in or opt-out, sets the user's
1980	membership. If user is opted out (or not opted in), mail will be delivered
1981	by DSPAM without being processed.
1982
1983	whitelistThreshold { Integer }
1984	Overrides the default number of times a From: header has been seen before
1985	it is automatically whitelisted. (default:10)
1986
1987	makeCorpus { on | off }
1988	When activated, a maildir-style corpus is maintained in the user's data
1989	directory (DSPAM_HOME/DATA/USERNAME), suitable for future retraining or
1990	other analysis. (default:off)
1991
1992	storeFragments { on | off }
1993	When activated, the first 1k of each message is temporarily stored on
1994	the server for reference via the webui's history function. (default:off)
1995
1996	localStore { on | off }
1997	Overrides the directory name used for the user's dspam data directory. This
1998	is useful when using recipient addresses as usernames, as it will allow
1999	all addresses belonging to a specific user to be written to a single
2000	webui directory. (default:username)
2001
2002	processorBias { on | off }
2003	Overrides the "bias" setting in dspam.conf, which biases mail as
2004	innocent. (default:on, see dspam.conf)
2005
2006	fallbackDomain { on | off }
2007	Allows a dspam user ("@example.org") to be marked as a fallback user for
2008	the entire domain, so if the destination dspam user does not exist in
2009	the database, the fallback user's database will be used. The
2010	dspam.conf "FallbackDomains" setting must also be "on". (default:off)
2011	NOTE: You will need to set "FallbackDomains on" in dspam.conf to use this.
2012
2013	trainPristine { on | off }
2014	Overrides the default signature mode and treats messages as if they were
2015	in pristine format when retraining. This requires all retraining to use
2016	the original message that was processed as no dspam signature is stored
2017	for pristine training. (default:off)
2018
2019	optOutClamAV { on | off }
2020	Opts out of ClamAV virus scanning (if ClamAV is directly integrated with
2021	dspam via dspam.conf). (default:off)
2022
2023	ignoreRBLLookups { on | off }
2024	Overrides the "Lookup" setting in dspam.conf, which looks up senders' IP
2025	addresses in a Realtime Blackhole List (RBL). (default:off)
2026
2027	RBLInoculate { on | off }
2028	Overrides the "RBLInoculate" setting in dspam.conf, which inoculates mail
2029	as spam if lookup result is positive. (default: depending on dspam.conf)
2030
2031	NOTE: This user preference has higher weight than the one set in dspam.conf.
2032	If you don't set this user preference to on/off then whatever is set in
2033	dspam.conf will be used for every user.
2034
2035	2.6 FALLBACK DOMAINS
2036
2037	Fallback domains allow you to default some or all users for a particular
2038	domain to a single domain user; this allows you to set preferences (including
2039	opting out of filtering entirely) for users based on domain name. Any user
2040	who does not exist as a known user to DSPAM will be defaulted to the
2041	domain it belongs to if it is designated as a fallback domain. This
2042	means that you can create bob@example.org and alice@example.org with their own
2043	databases and preferences, but also default all other users to @example.org.
2044	Alternatively, you could create just the domain without any other users and
2045	default all users to @example.org.
2046
2047	To use fallback domains, you'll first need to activate this feature in
2048	dspam.conf:
2049
2050	FallbackDomains on
2051
2052	Next, you'll need to create a dspam user for each domain you wish to use
2053	as a fallback domain. For example, @example.org. Depending on your
2054	implementation, this may be a simple insert into dspam_virtual_uids or may
2055	be created automatically when setting a user's preferences.
2056
2057	Finally, designate that special user as a fallback domain by setting a
2058	preference:
2059
2060	dspam_admin ch pref @example.org fallbackDomain on
2061
2062	Any mail coming in for that domain that does _not_ match a known user in
2063	dspam will now fall back to this user; you can then set specific preferences
2064	or even opt out the entire user. Alternatively, you can create a domain-based
2065	database for filtering mail specific to that domain, just as you would a
2066	normal user.
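
The fallback behavior described in this section can be summarized in a short
Python sketch. It is a simplified model rather than DSPAM's code: known_users
stands in for the users DSPAM already knows, fallback_pref for the
fallbackDomain preference of each domain user, and all names are invented for
the example.

```python
from typing import Dict, Set

def resolve_dspam_user(recipient: str, known_users: Set[str],
                       fallback_pref: Dict[str, bool],
                       fallback_domains_enabled: bool = True) -> str:
    """Return the DSPAM user whose data and preferences should be used."""
    # Known users always keep their own database and preferences.
    if recipient in known_users or not fallback_domains_enabled:
        return recipient
    _, sep, domain = recipient.rpartition("@")
    domain_user = "@" + domain
    # Unknown users fall back only if the domain user has fallbackDomain on.
    if sep and fallback_pref.get(domain_user, False):
        return domain_user
    return recipient
```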
2067
2068	2.7 EXTERNAL USER LOOKUP
2069	External User Lookup has two major applications. It allows DSPAM to validate
2070	the supplied username in setups where users are opted in by default, and there
2071	is no prior recipient checking from the MTA. In those cases, it can be configured
2072	not to automatically create the user entries in the DSPAM system and thus spare
2073	you from polluting the DSPAM database with nonexistent users.
2074	The other application is when you need username rewriting/mapping. That will
2075	happen when you need to map several email addresses (aliases) into a single
2076	user account or when you wish to integrate DSPAM into systems where the users'
2077	email addresses or usernames can change. This will allow you to define alternate
2078	static identifiers while still keeping the users' DSPAM dictionaries, across
2079	username/email address change, without dictionary maintenance.
2080
2081	Currently, there are three different modes of operation and two backend lookup
2082	drivers. The mode can be set using the ExtLookupMode directive and the available
2083	possibilities are:
2084
2085	verify - It will verify that the supplied username exists in the lookup backend.
2086	If the username cannot be verified, DSPAM will not create the user entry in its
2087	backend facilities.
2088
2089	map - It will NOT verify that the supplied username exists in the lookup backend.
2090	It will, though, try to use the lookup backend to map (rewrite) the username. If
2091	a map/rewrite is available, it will use the retrieved username instead of
2092	the supplied one. If no map/rewrite is available, DSPAM will use the
2093	supplied username and create the respective entries in its backend.
2094
2095	strict - It will enforce both verify AND map modes: it will rewrite the
2096	username if a rewrite is available, and will only create the user entry
2097	in its backend system if there was a successful map/rewrite.
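
The difference between the three modes may be easier to see as a small
decision function. The Python below is a sketch of the behaviour described
above, not DSPAM's code: apply_ext_lookup and the lookup callable are
hypothetical names, and lookup is assumed to return the backend's answer for
a username, or None when the backend has no entry.

```python
def apply_ext_lookup(mode, username, lookup):
    """Return (effective_username, may_create_user) for one lookup mode.

    lookup -- hypothetical callable: backend result for username, or None.
    """
    result = lookup(username)

    if mode == "verify":
        # Keep the supplied name; create the user only if the backend knows it.
        return username, result is not None

    if mode == "map":
        # Rewrite when a mapping exists; the user is created in either case.
        return (result if result is not None else username), True

    if mode == "strict":
        # Rewrite when a mapping exists; create only after a successful map.
        if result is not None:
            return result, True
        return username, False

    raise ValueError("unknown ExtLookupMode: %r" % mode)


# Hypothetical backend: one alias mapped to a static identifier.
backend = {"bob.alias@example.org": "uid1001"}

for mode in ("verify", "map", "strict"):
    print(mode, apply_ext_lookup(mode, "bob.alias@example.org", backend.get))
    print(mode, apply_ext_lookup(mode, "ghost@example.org", backend.get))
```

For the unknown address, only map mode still allows the user entry to be
created; verify and strict both refuse.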
2098
2099	Only two backend lookup drivers are available at the moment: LDAP and Program.
2100	The LDAP driver allows DSPAM to query an LDAP server for a custom attribute, defined
2101	by the ExtLookupLDAPAttribute directive. The query can be refined using the
2102	ExtLookupQuery directive to provide a standard LDAP filter, where %u will be replaced
2103	by the username provided to DSPAM. A literal percent sign can be used if escaped with
2104	another % sign, i.e., %% will match % in the query filter.
2105	The Program driver exists because this seemed a neat feature and not everyone
2106	uses LDAP. In this case, the ExtLookupServer directive is used to define
2107	the custom program/script call, with the respective arguments. Here too, %u can
2108	be used to stand for the provided username, and a literal % is obtained by escaping
2109	the percent sign with another '%'. Using the Program driver, DSPAM will use
2110	the first line of output from the program/script execution.
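
The %u / %% substitution and the "first line of output" rule can be
illustrated with the Python sketch below. It mimics the documented behaviour
only. The helper names expand_placeholders and program_lookup are
hypothetical, and splitting the command with shlex is an assumption of this
sketch; DSPAM may tokenize the ExtLookupServer value differently.

```python
import shlex
import subprocess


def expand_placeholders(template, username):
    """Replace %u with username and %% with a literal %, in a single pass."""
    out = []
    i = 0
    while i < len(template):
        if template[i] == "%" and i + 1 < len(template):
            nxt = template[i + 1]
            if nxt == "u":
                out.append(username)
                i += 2
                continue
            if nxt == "%":
                out.append("%")
                i += 2
                continue
        # Any other character (including a lone %) is copied unchanged.
        out.append(template[i])
        i += 1
    return "".join(out)


def program_lookup(command_template, username):
    """Run the lookup program and return the first line it prints, or None."""
    # Split first, then expand each argument, so that a username containing
    # spaces or quotes cannot turn into extra arguments.
    args = [expand_placeholders(a, username)
            for a in shlex.split(command_template)]
    proc = subprocess.run(args, capture_output=True, text=True, check=False)
    lines = proc.stdout.splitlines()
    first = lines[0].strip() if lines else ""
    return first or None


# Example filter template in the style of an ExtLookupQuery value.
print(expand_placeholders("(&(objectClass=person)(mail=%u))", "bob@example.org"))
print(expand_placeholders("discount=100%% for %u", "bob"))
```

A script used with the Program driver therefore only needs to print the
resulting username on its first output line; anything after that line is
ignored according to the description above. The sketch does not escape special
LDAP filter characters in the username.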
2111
2112
2113	3.0 BUGS, FEATURE REQUESTS
2114
2115	Please use our Bug Tracker on the sourceforge project page at
2116	http://sourceforge.net/projects/dspam for the current known bugs list and
2117	proper reporting procedure.
2118
2119	In the same place you can request new features via the Feature Request Tracker.
2120
2121	Please note that everything under contrib/ is not officially supported by the
2122	DSPAM Project but by the respective authors; however, in order to help the
2123	authors, facilitate integration with DSPAM and release procedures, we provide
2124	a bug tracker for each script/plugin at the same URL.
2125
2126	3.1 PORTS / PACKAGES
2127
2128	The DSPAM Project does not provide binary packages of DSPAM. Each
2129	OS/distribution has its own contributors, who know their distribution's
2130	policy, special guidelines, testing procedures, etc.
2131
2132	Take a look at the DSPAM Wiki for packages/ports for various distributions located
2133	at http://sourceforge.net/apps/mediawiki/dspam/index.php?title=Main_Page or read
2134	http://dspam.sourceforge.net
2135
2136	If you wish to port DSPAM to another OS/distro/platform and need help, or have
2137	patches you would like merged into the repo, please email the
2138	dspam-devel@lists.sourceforge.net mailing list.
2139
2140
2141	Note:
2142
2143	In order to keep DSPAM unencumbered by intellectual property abuses, all
2144	external contributors to the project are asked to release any rights to the
2145	submission. This keeps the DSPAM project a healthy, unencumbered GPL project.
2146	Please accompany your patch, code, or other submission with the following
2147	statement. By submitting a patch to the project, you agree to be bound by
2148	the terms of this statement whether it is specifically included in the
2149	submission or not; however, we still require that it be attached to the
2150	submission:
2151
2152	The author or authors of this submission hereby release any and all
2153	copyright interest in this code, documentation, or other materials
2154	included to the DSPAM project and its primary governors. We intend this
2155	relinquishment of copyright interest in perpetuity of all present and
2156	future rights to said submission under copyright law.
2157
2158	3.2 GIT ACCESS
2159
2160	The DSPAM source tree can be downloaded via read-only git access using the
2161	following commands:
2162
2163	git clone git://dspam.git.sourceforge.net/gitroot/dspam/dspam
