Use hashcash to combat spam

Charming Python: Use hashcash to combat spam
Content:


Basic hashcash knowledge
How does hashcash play a role in email
Why does this work?
Other hashcash applications
General hashcash and my contribution
Conclusion
References
About the author

To send spam, you have to pay the price

Level: Intermediate

David Mertz, Ph.D. (mertz@gnosis.cx)
Developer, Gnosis Software, Inc.
November 2004

Hashcash is a clever system based on the widely used SHA-1 algorithm. It requires the requester to perform a parameterizable amount of work, while the result remains "cheap" for an evaluator to check. In other words, the sender has to do some real work to put content into your inbox. You can certainly use hashcash to prevent spam, but it also has other applications, including preventing wiki spam and speeding up distributed parallel applications. In this article, you will meet David's Python-based hashcash implementation.

The hashcash.org web site (see References) describes the main function of the hashcash system as an anti-spam protocol:

Hashcash is a denial-of-service countermeasure tool. At present, it is mainly used to help hashcash users avoid losing email to content-based and blacklist-based anti-spam systems.

However, I think this technology has broad applicability beyond email. This article introduces its use in mail filtering and suggests applications in other areas. It also describes my implementation of hashcash in Python (apparently the first current Python version to be released), which is now included on the hashcash.org site. David McNab has created a Python implementation of a protocol that is not quite the same as hashcash, and other developers have created incomplete Python versions of hashcash.

However, before starting these topics, let's review what hashcash is.

Basic hashcash knowledge
Hashcash was inspired by the idea that some mathematical results are difficult to discover but easy to verify. A well-known example is factoring a large number (especially one with few factors). Multiplying some numbers to obtain their product is cheap (CPU cycles are money, after all), but finding those factors in the first place is far more expensive.

The RSA public-key cryptosystem is based on this property of factorization. If a respondent can answer a factoring challenge, it shows that he has done a lot of work (or has secretly obtained the factors from the person who generated the composite).

For interactive challenges, factoring is sufficient. Suppose I have an online resource and would like you to pay a token price for it. I can send you a message saying, "I will let you have this resource as soon as you factor this number." People who are not serious will not get my resource. Only those who prove their interest by spending some CPU cycles to answer the challenge will.
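The asymmetry behind such a challenge can be illustrated with a small Python sketch. It is only an illustration and not part of hashcash: the function name `smallest_factor` and the two primes are assumptions chosen for the example, and naive trial division stands in for real factoring algorithms.

```python
def smallest_factor(n: int) -> int:
    """Return the smallest prime factor of n (n >= 2) by trial division."""
    if n % 2 == 0:
        return 2
    candidate = 3
    while candidate * candidate <= n:
        if n % candidate == 0:
            return candidate
        candidate += 2
    return n  # no divisor found, so n itself is prime


# Posing the challenge costs one multiplication.
p, q = 10007, 10009          # two small primes, picked only for illustration
challenge = p * q

# Answering the challenge costs thousands of trial divisions.
factor = smallest_factor(challenge)

# Checking the answer costs one division.
assert challenge % factor == 0
```

With primes of realistic size the gap between posing, answering, and checking grows enormously, which is the property the challenge relies on.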

Non-interactive challenges
However, some resources cannot easily be guarded by interactive negotiation.

My email inbox is a resource that I care a great deal about. Unsolicited messages take up some of my disk space and bandwidth, and worst of all, they take up my attention. I don't mind strangers writing to me, but I would like them to show at least a little seriousness about contacting me personally. At the very least, I don't want them to be spammers who send the same message to me and millions of others in the hope that a few of us will buy a product or fall for a scam.

To enable non-interactive "payment", hashcash lets me publish a standard challenge: to send me mail, you must include a valid hashcash stamp in your message headers. Specifically, the stamp must contain my recipient address.

Hashcash poses its challenge by asking "minters" to generate strings (stamps) that, when hashed with the Secure Hash Algorithm (SHA-1), produce hashes with a number of leading zero bits. The number of leading zeros is the bit value of a particular stamp. Given the uniformity and cryptographic strength of SHA-1, the only known way to find a hashcash stamp with a given bit value b is to run SHA-1 about 2^b times on average.

Verifying a stamp, however, takes only a single SHA-1 computation. For email, the currently recommended value is 20 bits. To find a valid stamp, the sender must make about a million attempts (2^20 = 1,048,576), which takes less than one second with a recent CPU and a compiled application, and only a few seconds on older machines.
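The difference between minting and verifying can be sketched in a few lines of Python. This is a minimal illustration rather than the hashcash tool itself: the names `leading_zero_bits`, `find_suffix`, and `check`, the hexadecimal counter used as a suffix, and the demo prefix are all assumptions made for the example.

```python
import hashlib
from itertools import count
from typing import Tuple


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    total = 0
    for byte in digest:
        if byte == 0:
            total += 8
            continue
        total += 8 - byte.bit_length()
        break
    return total


def find_suffix(prefix: str, bits: int) -> Tuple[str, int]:
    """Minting: try counters until SHA-1(prefix + suffix) has `bits` leading zeros.

    Returns the suffix and the number of SHA-1 runs it took (about 2**bits on average).
    """
    for tries in count(1):
        suffix = format(tries - 1, "x")
        digest = hashlib.sha1((prefix + suffix).encode("ascii")).digest()
        if leading_zero_bits(digest) >= bits:
            return suffix, tries


def check(prefix: str, suffix: str, bits: int) -> bool:
    """Verifying: a single SHA-1 run."""
    digest = hashlib.sha1((prefix + suffix).encode("ascii")).digest()
    return leading_zero_bits(digest) >= bits


# 12 bits keeps the demo fast even in pure Python; 20 bits needs roughly 256 times more work.
suffix, tries = find_suffix("demo-challenge:", 12)
print(suffix, tries, check("demo-challenge:", suffix, 12))
```

The number of tries varies from run to run of the search space, but verification always costs exactly one hash.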

We have only just started on the basics of hashcash, but before continuing, let's take a look at how strong the SHA algorithm is.

How strong is SHA?
In an event widely noted in the cryptographic community, a hash collision was announced (see References for a link to the email from Pascal Junod, which gives details of the actual collision). The attack used takes about 2^51 steps, far fewer than the 2^80 steps (and storage space) that a brute-force collision search would be expected to need according to the "birthday paradox" (see References for more about the birthday paradox and how it applies to hash functions).

Before worrying too much about what this attack means for hashcash, remember two points. First, the method attacks SHA-0, not SHA-1 (at least not yet). Second, even on the fastest current CPUs, 2^51 steps still require more than 9 CPU years. Even if a similar method could be applied to SHA-1, constructing a false collision would be no cheaper than minting a large number of 20-bit (or even 40-bit) hashcash stamps.
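The 2^80 figure follows from the birthday bound: for a hash with n output bits, a brute-force collision is expected after roughly 2^(n/2) evaluations, and both SHA-0 and SHA-1 produce 160-bit digests. The short sketch below only restates that arithmetic; the function name is an assumption made for the example.

```python
def birthday_bound_log2(hash_bits: int) -> int:
    """Base-2 logarithm of the approximate work for a brute-force collision."""
    return hash_bits // 2


brute_force_log2 = birthday_bound_log2(160)   # SHA-0 and SHA-1 digests are 160 bits
attack_log2 = 51                              # figure quoted above for the SHA-0 attack

# How many times cheaper the reported attack is than brute force.
speedup = 2 ** (brute_force_log2 - attack_log2)
print(brute_force_log2, speedup)
```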

Now, back to our earlier discussion.

Hashcash (version 1) format
Having a particular SHA-1 hash value is not enough. We also want a stamp to be specific to the requested resource: a stamp minted for mertz@gnosis.cx should not be usable for someuser@yahoo.com. Otherwise, a spammer could mint a single high-bit stamp and use it everywhere.

In addition, once a stamp has been minted for my address, I do not want every spammer who wants to send me mail to be able to share it. Hashcash therefore adds two further measures (or at least recommends them as part of the protocol):

  • First, the stamp carries a date. A user may decide that stamps older than a given period are invalid.
  • Second, a hashcash client may (and probably should) implement a double spend database.

In the double spend database, each stamp can be used only once; if it is received a second time, it is considered invalid (much like a postage stamp being canceled after use). Specifically, a hashcash (version 1) stamp looks like this:

1:bits:date:resource:ext:salt:suffix

The stamp contains seven fields.

  1. The version number (version 0 is simpler, but has some limitations).
  2. The claimed bit value. If the stamp's hash does not actually have the claimed number of leading zero bits, the stamp is invalid.
  3. The date (and time) when the stamp was generated. Stamps dated in the future or too far in the past can be considered invalid.
  4. The resource for which the stamp was generated. It may be an email address, but it may also be a URI or another named resource.
  5. Extensions that specific applications may require. Any additional data can be stored here, but so far this field is usually empty.
  6. A random salt that distinguishes this stamp from any other stamp generated for the same resource on the same date. For example, two different people can reasonably send mail to the same address on the same day, and the second message should not fail just because I use a double spend database. If each sender uses a random salt, the complete stamps will differ.
  7. The suffix is the part the algorithm actually works on. Given the first six fields, the minter must try many successive suffix values to produce a stamp whose hash has the desired number of leading zeros.
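The seven fields can be pulled apart with a simple split on colons. The sketch below is an assumption-laden illustration rather than the parser used by any hashcash tool: the `Stamp` type and `parse_stamp` function are hypothetical names, and the code assumes that no field (for example a URI used as the resource) itself contains a colon.

```python
from typing import NamedTuple


class Stamp(NamedTuple):
    version: str
    bits: int
    date: str
    resource: str
    ext: str
    salt: str
    suffix: str


def parse_stamp(text: str) -> Stamp:
    """Split a version 1 stamp into its seven colon-separated fields."""
    fields = text.split(":")
    if len(fields) != 7:
        raise ValueError("expected 7 fields, got %d" % len(fields))
    version, bits, date, resource, ext, salt, suffix = fields
    if version != "1":
        raise ValueError("unsupported stamp version: %r" % version)
    return Stamp(version, int(bits), date, resource, ext, salt, suffix)


stamp = parse_stamp("1:20:040927:mertz@gnosis.cx::odVZhQMP:7ca28")
print(stamp.bits, stamp.date, stamp.resource, repr(stamp.ext), stamp.salt, stamp.suffix)
```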

Now let's take a look at how hashcash works in email.

How does hashcash play a role in email
In an ideal world, all senders would include hashcash stamps in their messages, and recipients would check their validity on receipt. In real life, hashcash is not yet widely used. However, using hashcash (as a sender or as a receiver) has no adverse effect: using it in email does no harm.

To add a stamp to an outgoing message, you only need to add a header to the mail: one X-Hashcash header for each To: or Cc: recipient. For example, someone who wants to send me a message might include a header like the following among the RFC 2822 headers:

X-Hashcash: 1:20:040927:mertz@gnosis.cx::odVZhQMP:7ca28

Obviously, an MUA (mail user agent), filter, or MTA (mail transport agent) should do this rather than requiring users to do it by hand. Still, doing it manually is not difficult, at least while experimenting. First, check the hash of the stamp as follows:

$ echo -n 1:20:040927:mertz@gnosis.cx::odVZhQMP:7ca28 | sha
00000b50b85a61e7ba8ac4d5fed317c737706ae5

Note the leading zeros (each hexadecimal digit is 4 bits, so five zero digits give 20 bits). Of course, you also need to check that the resource is one you recognize (for example, one of your recipient addresses), that the stamp has not been used before, and that the date is current. In addition, a valid stamp must have at least as many leading zeros as it claims (though you can enforce your own minimum price for letting mail through: 20 bits is the semi-standard value, which may eventually change with Moore's Law).
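The checks just listed can be combined into one function. The following is a minimal sketch, not the verification logic of any particular mail tool: `check_stamp`, its parameters, the in-memory set standing in for a double spend database, and the 28-day expiry window are all assumptions chosen for illustration.

```python
import hashlib
from datetime import date, datetime, timedelta


def has_zero_bits(digest: bytes, bits: int) -> bool:
    """True if the digest starts with at least `bits` zero bits."""
    total_bits = len(digest) * 8
    if not 0 <= bits <= total_bits:
        return False
    return int.from_bytes(digest, "big") >> (total_bits - bits) == 0


def check_stamp(stamp, my_addresses, spent, today, min_bits=20, max_age_days=28):
    """Return True, and record the stamp as spent, only if every check passes."""
    fields = stamp.split(":")
    if len(fields) != 7 or fields[0] != "1":
        return False
    _, bits_text, stamp_date, resource, _ext, _salt, _suffix = fields
    if len(stamp_date) < 6:
        return False
    try:
        bits = int(bits_text)
        minted = datetime.strptime(stamp_date[:6], "%y%m%d").date()
    except ValueError:
        return False
    if bits < min_bits:
        return False                      # claims too little work
    if resource not in my_addresses:
        return False                      # minted for someone else
    if minted > today or today - minted > timedelta(days=max_age_days):
        return False                      # dated in the future, or expired
    if stamp in spent:
        return False                      # double spend
    digest = hashlib.sha1(stamp.encode("utf-8")).digest()
    if not has_zero_bits(digest, bits):
        return False                      # hash does not show the claimed work
    spent.add(stamp)
    return True
```

A caller would keep `spent` in persistent storage and prune stamps once they are older than the expiry window, since expired stamps are rejected by the date check anyway.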

Why does this work?
It takes only a few seconds to generate a 20-bit stamp. If you send a few dozen emails a day, that cost is negligible. Spammers who want to send millions of messages, however, cannot afford several extra seconds of CPU time per message: there are only 86,400 seconds in a day. Even if spammers use zombie machines infected by trojans, requiring hashcash stamps should at least reduce the volume each zombie can send. Verifying a stamp, of course, takes only a tiny fraction of a second.
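The arithmetic behind this argument is easy to reproduce. In the sketch below, the function names and the sample costs (one or three seconds per stamp, one million messages) are assumptions chosen for illustration, not measurements.

```python
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400


def max_stamps_per_day(seconds_per_stamp: float, cpus: int = 1) -> int:
    """Upper bound on stamps that `cpus` machines can mint in one day."""
    return int(cpus * SECONDS_PER_DAY / seconds_per_stamp)


def cpu_days_needed(messages: int, seconds_per_stamp: float) -> float:
    """CPU days a sender must spend to stamp `messages` messages."""
    return messages * seconds_per_stamp / SECONDS_PER_DAY


# An ordinary user: a few dozen messages cost well under a minute or two of CPU time.
print(cpu_days_needed(36, 3.0) * SECONDS_PER_DAY, "seconds")

# A spammer: one million messages at 3 seconds each.
print(round(cpu_days_needed(1_000_000, 3.0), 1), "CPU days")
print(max_stamps_per_day(3.0), "stamps per CPU per day")
```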

On the other hand, adding hashcash generation and verification to your own MUA has no negative effect on anyone else (unlike some other anti-spam methods). For recipients who do not use the protocol, the stamps are just extra headers that they can easily ignore. For senders who do not add hashcash stamps, a receiver that checks X-Hashcash: headers simply has nothing to verify. If a sender does not add a stamp, you are no worse off for checking; you are just no better off.

A good MUA or spam filtering system can whitelist messages that carry valid hashcash stamps. SpamAssassin, in fact, assigns a more positive score the more valid hashcash bits a stamp has. In my opinion, whitelisting based on hashcash is an improvement over interactive challenge systems such as TMDA: the challenge message cannot get lost on its way back, and the sender cannot forget to respond to it. The response to the challenge is in the original message (as a hashcash stamp).

Other hashcash applications
Hashcash is most useful for non-interactive challenges, but there is no reason it cannot be used in interactive contexts as well. As more tools add support for hashcash, especially multi-purpose applications such as the Mozilla suite, it will become easier to use hashcash both interactively and non-interactively.

For example, if the Thunderbird mail client gains an API call for computing hashcash, its sibling tool, the Firefox web browser, should be able to answer interactive challenges directly with the same API that generates hashcash stamps.

What is wiki?
Wiki is "the simplest online database that could possibly work". It supports hyperlinks and has a simple text syntax for creating new pages and crosslinks between pages on the fly.

A wiki is server software that lets users freely create and edit web page content using any browser. It provides an "open editing" service that encourages an unusual form of group communication: it allows every user to edit not only the content of pages but also the organization of contributions to the site.

For more information about wiki, see the "What Is wiki" link in references.

Protecting wikis
Wikis sometimes suffer abuse similar to spam, and hashcash looks like a good solution in this non-email context. Because wikis are usually open for anyone to edit, one of the scourges of the wiki community is wiki-crawling vandal programs that add irrelevant commercial links to wiki sites.

One wiki I help maintain was vandalized recently, which forced us to respond in an undesirable way by requiring all posters to have a user account. Accounts are granted to everyone on equal terms, based on replying to an automated email challenge with a message proving receipt of a random key. Still, requiring accounts at all runs fundamentally against the spirit of a wiki.

Adding a hashcash challenge does not prevent automated vandalism of wiki sites, but it can slow it down. If vandalizing a site takes many seconds rather than a fraction of a second, crawling wikis to add junk becomes much less attractive. In fact, I think it would be a good idea to use a bit value greater than 20 in such an application; perhaps 24 or 28 bits would be a reasonable burden (logged-in users could still be exempted).

You might think that an ordinary time delay on accepting wiki edits would have a similar effect, but there is a flaw in that reasoning. Vandals can parallelize their attacks. For example, if each site adds a 5-second delay, the vandal program can use those 5 seconds to start modifying other wikis on its list. Hashcash, by contrast, demands actual CPU time, so an attacker cannot parallelize the work away on a single CPU.

Wiki challenges can be interactive or non-interactive. A site can direct the user to a challenge screen before directing the user to the actual edit screen; a random resource can be generated as the challenge protecting that screen.

However, a better way is to make this requirement non-interactive. For example, in an existing wiki system, you can use a URL similar to the following to edit a resource:

http://somewhere.net/wiki?action=edit&id=SomeTopic

In a hypothetical wiki protected by hashcash, a different URL might be used, for example:

http://somewhere.net/wiki?stamp=1:24:040928:SomeTopic:edit:KG4E9PaK2VLjKM2Z:0000Zbrc

The wiki server can verify this stamp before permitting the edit, yet no account needs to be created and no personal information disclosed in order to edit. Double spend checks and expiration checks (perhaps with a short validity period) further guard the edit action. For me, generating the URL above is not difficult; I use the following command:

hashcash -mCb 24 -x edit SomeTopic

In general, though, to reduce latency, a web browser might mint such stamps in the background. For example, the URL above could be created and cached while I am reading the resource at:

http://somewhere.net/wiki?SomeTopic

Edit stamps for other pages linked from the current wiki page might be cached as well.
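On the server side, checking such an edit URL could look roughly like the following Python sketch. It is a hypothetical illustration, not code from any wiki engine: the function names, the `stamp` query parameter handling, and the 24-bit minimum are assumptions, and the date and double spend checks described above are left out to keep the example short.

```python
import hashlib
from urllib.parse import parse_qs, urlsplit


def stamp_from_url(url):
    """Pull the `stamp` query parameter out of an edit URL, if present."""
    values = parse_qs(urlsplit(url).query).get("stamp")
    return values[0] if values else None


def topic_if_edit_allowed(url, min_bits=24):
    """Return the topic name if the URL carries a valid edit stamp, else None."""
    stamp = stamp_from_url(url)
    if stamp is None:
        return None
    fields = stamp.split(":")
    if len(fields) != 7 or fields[0] != "1" or fields[4] != "edit":
        return None
    try:
        bits = int(fields[1])
    except ValueError:
        return None
    if not min_bits <= bits <= 160:
        return None
    digest = hashlib.sha1(stamp.encode("utf-8")).digest()
    if int.from_bytes(digest, "big") >> (160 - bits) != 0:
        return None                       # hash does not show the claimed work
    return fields[3]                      # the resource field names the topic


url = "http://somewhere.net/wiki?stamp=1:24:040928:SomeTopic:edit:KG4E9PaK2VLjKM2Z:0000Zbrc"
print(stamp_from_url(url))
```

Note that a suffix containing `+` would need URL encoding, because query-string parsing turns a bare `+` into a space.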

Checking CPU resources
An interactive application of hashcash might be found in distributed processing tasks. Projects such as the Great Internet Mersenne Prime Search (GIMPS) and SETI@home, and tasks such as protein folding and cipher cracking, to name just a few, sometimes borrow large numbers of volunteer machines. Each volunteer simply downloads some code, runs it as part of the larger task, and sends intermediate results back to a central server. Such tasks make good use of idle CPU cycles.

Almost all the distributed tasks I know of allow anyone to join. However, it is not hard to imagine a task with coordination requirements in which a node that cannot finish its work within the expected time frame does more harm to the overall computation than good.

In such a case, each participating node would need a minimum CPU speed. Hashcash provides a fairly general CPU benchmark, although it measures speed with one specific type of computation; SHA-1 is a fairly typical mathematical computation. If participating nodes already have hashcash installed (rather than some custom software tool), answering a hashcash challenge can serve as a "you must be this tall to enter this ride" check.

The way to verify CPU capability is to require a high bit value within a short time. Only a fast enough CPU can answer the challenge. For this to work, the resource name must be supplied semi-interactively; otherwise, participants could mint a stamp in advance and give it a later date to create the illusion that it was minted quickly.

For example, a fast Pentium III or G4 can generate a 20-bit stamp in less than one second, but a Pentium II or G3 cannot. Suppose we set a 32-bit challenge that a candidate machine must answer within one hour. The requester sends a message saying, "Send me a challenge"; the coordinating server responds: "The time is 040927124732; the challenge resource is a37tqk." If the server receives a correct stamp within the hour, that is, before 13:47:32 that afternoon, the requester qualifies for access to the resource.
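The exchange just described can be sketched as two small server-side functions. This is a hypothetical illustration of the proposed protocol, not an existing implementation: the function names, the dictionary layout, and the use of `secrets.token_hex` to pick the random resource are all assumptions.

```python
import hashlib
import secrets
from datetime import datetime, timedelta


def issue_challenge(now, bits=32, window=timedelta(hours=1)):
    """Pick a fresh random resource and remember when the answer is due."""
    return {
        "issued": now,
        "resource": secrets.token_hex(3),   # six random hex characters
        "bits": bits,
        "deadline": now + window,
    }


def accept_answer(challenge, stamp, received_at):
    """The stamp must match the challenge, arrive in time, and hash correctly."""
    bits = challenge["bits"]
    expected_prefix = "1:{}:{:%y%m%d%H%M%S}:{}:".format(
        bits, challenge["issued"], challenge["resource"])
    if not stamp.startswith(expected_prefix) or stamp.count(":") != 6:
        return False
    if received_at > challenge["deadline"]:
        return False                        # too slow, or minted on a slow CPU
    digest = hashlib.sha1(stamp.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (160 - bits) == 0
```

Because the resource is random and tied to the time of issue, a candidate cannot mint the stamp ahead of time; the only way to answer before the deadline is to do the work quickly.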

Obviously, the protocol I propose cannot guarantee that every node will actually complete its work. Even the fastest machine may suffer a power failure, and users may change their minds about running the distributed software. But a node can at least demonstrate that it is plausibly qualified.

General hashcash and my contribution
In hashcash, the choice of specific fields and delimiters is somewhat arbitrary; in fact, hashcash version 0 uses different fields than version 1. These choices are good ones. However, I think of "actual hashcash" as just one member of a family that we might call "generalized hashcash". That is, given any challenge string, one can reasonably demand: "Give me a suffix such that when challenge+suffix is hashed, it produces a b-bit collision." Real hashcash is just one instance of this general challenge.

Now, there is such a thing as being too general. Creating many incompatible hashcash-like protocols does not really benefit anyone. For example, there is a "hashcash" Python implementation that uses a challenge protocol resembling hashcash (and probably of equal cryptographic value), but it is nearly impossible to use it to generate an actual hashcash stamp.

Therefore, I decided to write a Python implementation of hashcash that is actually compatible. It even accepts command-line switches similar to those of the hashcash tool (though it is probably most useful as a module imported by other applications). Even with a little help from Psyco, the Python version runs about 10 times slower than the optimized C version. But compared with C, it still wins on flexibility.

Besides being correct, my hashcash.py module provides an internal function _mint() and a public function mint(). The latter generates a real hashcash version 1 stamp. That is the one you should use.

The former, _mint(), does the underlying work of finding a generalized hashcash suffix. You probably should not use it, but if you want to (and make sure you use it with care), it is there for you.
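The split between a generalized search and a version 1 wrapper can be sketched as follows. This is not the code of hashcash.py: the names `general_mint` and `mint_v1`, their parameters, the hexadecimal counter suffix, and the alphanumeric salt are assumptions made to illustrate the idea.

```python
import hashlib
import secrets
import string
from datetime import datetime


def general_mint(challenge: str, bits: int) -> str:
    """Generalized hashcash: find a suffix so that SHA-1(challenge + suffix)
    starts with `bits` zero bits. Works for any challenge string."""
    counter = 0
    while True:
        suffix = format(counter, "x")
        digest = hashlib.sha1((challenge + suffix).encode("utf-8")).digest()
        if int.from_bytes(digest, "big") >> (160 - bits) == 0:
            return suffix
        counter += 1


def mint_v1(resource: str, bits: int = 20, ext: str = "", salt_chars: int = 8, now=None) -> str:
    """Build the first six version 1 fields, then let general_mint find the seventh."""
    now = now or datetime.now()
    alphabet = string.ascii_letters + string.digits
    salt = "".join(secrets.choice(alphabet) for _ in range(salt_chars))
    challenge = "1:{}:{:%y%m%d}:{}:{}:{}:".format(bits, now, resource, ext, salt)
    return challenge + general_mint(challenge, bits)


# 12 bits keeps the demo quick; pure Python is slow at the usual 20 bits.
print(mint_v1("mertz@gnosis.cx", bits=12))
```

Keeping the search separate from the field layout is what makes experiments with variant challenge formats possible, while the wrapper guarantees that ordinary callers get a stamp in the standard format.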

In unusual contexts, hashcash variants may be useful. In any case, I would like the C tool to have similar switches, even if the man page warned about why you probably shouldn't use them, so that it too could find generalized hashcash suffixes. Hackers like to get inside things.

Conclusion
I hope this article has given you a general idea of the possible applications of hashcash. I think the challenge protocol described above is an extremely clever concept. The difficulty lies in getting more tools to handle hashcash stamps more seamlessly.

Quite a few MUAs, MTAs, and spam filtering tools already do a good job with hashcash, but there are still clear gaps. Almost no non-email applications use hashcash. Still, I find the concept compelling.

If this concept grows in importance, it will provide a way of regulating access to electronic resources that is fully compatible with free software and open standards, and that does not saddle us with digital restrictions management (DRM), the commodification of information, or the usual privacy leaks.

References

  • For more information, see the original article on the developerworks global site.

  • Visit the hashcash.org web site.
  • David's favorite reference, Wikipedia, has an entry on hashcash. To learn about wikis, start with "What Is Wiki".
  • The birthday paradox is a result that runs counter to everyday intuition. Read more about it on Wikipedia.
  • For more information about the SHA-0 collision, see Pascal Junod's email in the cryptography mailing list archive.
  • The tutorial Cryptography Introduction: Part 1 (developerWorks) describes cryptography and its technical, mathematical, and conceptual foundations and terminology. Cryptography Introduction: Part 2 and Part 3 (developerWorks) continue the course.
  • For a comprehensive look at utilities for filtering spam, see Use SpamAssassin to eliminate spam (developerWorks, 2002).
  • Tagged Message Delivery Agent (TMDA) is a spam filtering tool based on whitelists rather than blacklists. Hashcash can be integrated with TMDA.
  • Download David's hashcash.py module and script, a Python implementation of hashcash version 1.
  • To learn more about Python, read David's other articles on developerWorks in the complete list of Charming Python columns.
  • In Roaming charges: trouble everyday (developerWorks, October 2004), Larry Loeb describes hash collisions and examines the Secure Hash Algorithm.
  • Enhancing e-mail security with S/MIME describes in detail the role of SHA-1 as the hash algorithm in the S/MIME email security protocol.
  • Lessons in Secure Messaging Using Domino 6 (developerWorks, July 2004) gives another view of the key role SHA-1 plays as a hash algorithm.
  • Order Linux books for sale at a discount in the developer bookstore Linux column.
  • You can download free beta versions of IBM middleware products that run on Linux from the Speed-start your Linux app area on developerWorks. These include WebSphere Studio Site Developer, WebSphere SDK for Web Services, WebSphere Application Server, DB2 Universal Database Personal Developers Edition, Tivoli Access Manager, and Lotus Domino Server. To get started more quickly, see the how-to articles and technical support for each product.
  • Join the developerworks community by joining developerworks blogs.
  • In the developerworks Linux area, you can find more references for Linux developers.
About the author
David Mertz is Turing complete, but probably would not pass the Turing Test. To learn more about his life, visit his personal homepage. He has been writing the developerWorks columns Charming Python and XML Matters since 2000. See also his book, Text Processing in Python.
