A simple cluster task scheduling framework

Source: Internet
Author: User
Tags: flock

Task scheduling on the back end of a server cluster is something many websites and compute-intensive solutions rely on frequently.

This article does not discuss job splitting and scheduling at the map/reduce level. The scheduling framework designed here aims only at the following features:

1) Lightweight: the code framework and the implementation principles are simple, and deployment is easy.

2) Scalable: in theory, both the number of machines in the cluster and the number of tasks executed on each machine can be increased.

3) Oriented to business modules: the tasks defined by the business are specific and fine-grained. The framework does not help split tasks or workflows; it accepts only tasks of the finest granularity.

Implementation principle:

1) All computing nodes (a computing node here is a program instance) have equal status.

2) A task exists as a file, and computing nodes grab tasks through a shared file system.

3) All computing nodes run permanently and scan for task files continuously.

4) The business system issues a task simply by creating a file.
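Principle 4 says that issuing a task amounts to creating a file. The following is a minimal sketch of what the business side might do. The helper name submit_task, the .tmp suffix, and the write-then-rename step are assumptions made for illustration and are not part of the original framework. Writing under a temporary name first is one way to keep a worker from picking up a half-written task, because a worker only looks at files that carry the task extension.

```python
import os
import tempfile


def submit_task(task_dir, task_name, content, ext=".txt"):
    """Publish a task file in task_dir and return its final path.

    Hypothetical helper: the original article only says that the
    business system "generates a file".
    """
    final_path = os.path.join(task_dir, task_name + ext)
    # Create the temporary file in the same directory so that the rename
    # below stays within one file system.
    fd, tmp_path = tempfile.mkstemp(suffix=".tmp", dir=task_dir)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(content)
        # Only after the content is complete does the file get the task
        # extension that workers scan for.
        os.rename(tmp_path, final_path)
    except Exception:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return final_path
```

A call such as submit_task("/mnt/tasks", "video-123-cut", "payload") would leave a file named video-123-cut.txt in the shared directory; the directory and task name here are made up.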

We call a computing node a worker. The main logic of a worker is as follows:

    while (true) {
        if (find(unfinished task file from a previous run) || find(new task file)) {
            append the extension ".<local IP>.<instance number>" to the file name
            process the task
            move the task file to the finish directory
        }
    }
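The step that lets equal workers share one directory is the rename: a worker claims a task by appending its own IP and instance number to the file name. The sketch below isolates that step. The function names claim_task and is_mine are hypothetical, and the sketch assumes that the shared file system lets only one of several competing renames of the same source file succeed.

```python
import os


def claim_task(filepath, ip, uuid):
    """Try to claim a new task file; return the claimed path or None."""
    claimed = "%s.%s.%s" % (filepath, ip, uuid)
    try:
        os.rename(filepath, claimed)
    except OSError:
        # The file is gone, most likely because another worker renamed
        # it first, so this worker simply skips it.
        return None
    return claimed


def is_mine(filename, uuid):
    """True if the last extension of filename is this instance's number."""
    return os.path.splitext(filename)[1] == "." + uuid
```

After a restart, a worker can use a check like is_mine to find the tasks it claimed earlier but never finished, which corresponds to the first branch of the loop above.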

A Python implementation follows for reference.

    # encoding=utf8
    '''
    Created on 2011-9-24
    @author: chenggong
    Worker base class
    '''
    import os
    import re
    import time
    from optparse import OptionParser

    import filelocker


    class WorkerBase(object):
        def __init__(self):
            self.patten = ".*"
            self.taskexname = ".txt"

        def set_task_patten(self, patten):
            self.patten = patten

        def set_task_exname(self, exname):
            self.taskexname = "." + exname.replace(".", "")

        def dowork(self, filename, content):
            # Overridden by derived classes; return True on success
            pass

        def tasklogic(self, filepath):
            success = False  # stays False if dowork raises
            with open(filepath, "r") as filehandle:
                # Try to lock the task without blocking; raises if it is already locked
                filelocker.lock(filehandle, filelocker.LOCK_EX | filelocker.LOCK_NB)
                try:
                    self.log("normal", 0, "start executing task %s" % filepath)
                    fsname = os.path.basename(filepath).split(".")[0]
                    success = self.dowork(fsname, filehandle.read())
                except Exception as e:
                    self.log("warning", 0, "exception not caught by the derived class: %s" % str(e))
                filelocker.unlock(filehandle)
            # Strip the extensions from the file name only, not from the directory part
            basepath = os.path.join(os.path.dirname(filepath),
                                    os.path.basename(filepath).split(".")[0])
            while True:
                try:
                    if success:
                        self.log("normal", 0, "task %s ended: succeeded" % filepath)
                        finishfile = basepath + ".finish"
                        if os.path.exists(finishfile):
                            self.log("warning", 0, "the .finish file of this task already exists; overwriting it")
                            os.remove(finishfile)
                        os.rename(filepath, finishfile)
                    else:
                        self.log("normal", 0, "task %s ended: failed" % filepath)
                        errorfile = basepath + ".error"
                        if os.path.exists(errorfile):
                            self.log("warning", 0, "the .error file of this task already exists; overwriting it")
                            os.remove(errorfile)
                        os.rename(filepath, errorfile)
                    break
                except Exception as e:
                    self.log("error", 0, "failed to rename the task file after execution; the file system is abnormal or the task file is damaged! except=%s" % str(e))
                    time.sleep(5)

        def start(self):
            # params
            taskdir = self.options.dir
            uuid = self.options.uuid
            ip = self.options.ip
            # main loop
            while True:
                try:
                    for f in os.listdir(taskdir):
                        filepath = os.path.join(taskdir, f)
                        taskname = os.path.basename(filepath).split(".")[0]
                        try:
                            if not re.match(self.patten, taskname):
                                continue
                        except re.error:
                            self.log("fatal", 0, "patten=%s, invalid regular expression" % self.patten)
                            return
                        fex = os.path.splitext(f)[1]
                        if fex == "." + uuid:  # my task
                            self.log("normal", 0, "found uncompleted task %s" % str(f))
                            try:
                                self.tasklogic(filepath)
                            except Exception:
                                self.log("warning", 0, "failed to lock the task; it may already be locked, and uuid=%s may have been started more than once!" % uuid)
                                continue
                        elif fex == self.taskexname:  # new task
                            try:
                                os.rename(filepath, "%s.%s.%s" % (filepath, ip, uuid))
                                self.tasklogic("%s.%s.%s" % (filepath, ip, uuid))
                            except Exception:
                                self.log("warning", 0, "failed to claim task file %s; it may be locked or taken by another worker" % filepath)
                                continue
                except Exception:
                    self.log("error", 0, "failed to access task folder %s; the network may be disconnected" % taskdir)
                    time.sleep(30)

        def log(self, level, typeid, msg):
            logdir = self.options.log
            if not os.path.exists(logdir):
                os.mkdir(logdir)
            filename = time.strftime('%Y-%m-%d', time.localtime(time.time())) + ".log"
            t = time.strftime('%H:%M:%S', time.localtime(time.time()))
            filepath = os.path.join(logdir, filename)
            with open(filepath, "a") as f:
                filelocker.lock(f, filelocker.LOCK_EX)  # blocking lock
                logmsg = "[%8s] [%s] [%s] [%d] %s" % (t, self.options.uuid, level, typeid, msg)
                f.write(logmsg + "\n")
                f.flush()  # make sure the line is written before the lock is released
                filelocker.unlock(f)
            print(logmsg)

        def set_options(self, options):
            parser = OptionParser()
            for opt in options:
                parser.add_option(opt['option'], dest=opt['value'])
            # public options
            parser.add_option("-d", dest="dir")
            parser.add_option("-i", dest="ip")
            parser.add_option("-u", dest="uuid")
            parser.add_option("-l", dest="log")
            (self.options, argvs) = parser.parse_args()

The cross-platform file lock module used above, filelocker (the listing is portalocker.py):

    # encoding=utf8
    # portalocker.py - Cross-platform (posix/nt) API for flock-style file locking.
    #                  Requires Python 2.6 or better.
    """Cross-platform (posix/nt) API for flock-style file locking.

    Synopsis:

       import portalocker
       file = open("somefile", "r+")
       portalocker.lock(file, portalocker.LOCK_EX)
       file.seek(12)
       file.write("foo")
       file.close()

    If you know what you're doing, you may choose to

       portalocker.unlock(file)

    before closing the file, but why?

    Methods:

       lock( file, flags )
       unlock( file )

    Constants:

       LOCK_EX
       LOCK_SH
       LOCK_NB

    Exceptions:

        LockException

    Notes:

    For the 'nt' platform, this module requires the Python Extensions for Windows.
    Be aware that this may not work as expected on Windows 95/98/ME.

    History:

    I learned the win32 technique for locking files from sample code
    provided by John Nielsen <nielsenjf@my-deja.com> in the documentation
    that accompanies the win32 modules.

    Author: Jonathan Feinberg <jdf@pobox.com>,
            Lowell Alleman <lalleman@mfps.com>
    Version: $Id: portalocker.py 5474 2008-05-16 20:53:50Z lowell $
    """

    __all__ = [
        "lock",
        "unlock",
        "LOCK_EX",
        "LOCK_SH",
        "LOCK_NB",
        "LockException",
    ]

    import os


    class LockException(Exception):
        # Error codes:
        LOCK_FAILED = 1


    if os.name == 'nt':
        import win32con
        import win32file
        import pywintypes
        LOCK_EX = win32con.LOCKFILE_EXCLUSIVE_LOCK
        LOCK_SH = 0  # the default
        LOCK_NB = win32con.LOCKFILE_FAIL_IMMEDIATELY
        # is there any reason not to reuse the following structure?
        __overlapped = pywintypes.OVERLAPPED()
    elif os.name == 'posix':
        import errno
        import fcntl
        LOCK_EX = fcntl.LOCK_EX
        LOCK_SH = fcntl.LOCK_SH
        LOCK_NB = fcntl.LOCK_NB
    else:
        raise RuntimeError("PortaLocker only defined for nt and posix platforms")

    if os.name == 'nt':
        def lock(file, flags):
            hfile = win32file._get_osfhandle(file.fileno())
            try:
                win32file.LockFileEx(hfile, flags, 0, -0x10000, __overlapped)
            except pywintypes.error as exc_value:
                # error: (33, 'LockFileEx', 'The process cannot access the file
                # because another process has locked a portion of the file.')
                if exc_value.args[0] == 33:
                    raise LockException(LockException.LOCK_FAILED, exc_value.args[2])
                else:
                    # Q:  Are there exceptions/codes we should be dealing with here?
                    raise

        def unlock(file):
            hfile = win32file._get_osfhandle(file.fileno())
            try:
                win32file.UnlockFileEx(hfile, 0, -0x10000, __overlapped)
            except pywintypes.error as exc_value:
                if exc_value.args[0] == 158:
                    # error: (158, 'UnlockFileEx', 'The segment is already unlocked.')
                    # To match the 'posix' implementation, silently ignore this error
                    pass
                else:
                    # Q:  Are there exceptions/codes we should be dealing with here?
                    raise

    elif os.name == 'posix':
        def lock(file, flags):
            try:
                fcntl.flock(file.fileno(), flags)
            except IOError as exc_value:
                # IOError: [Errno 11] Resource temporarily unavailable
                if exc_value.errno == errno.EAGAIN:
                    raise LockException(LockException.LOCK_FAILED, exc_value.strerror)
                else:
                    raise

        def unlock(file):
            fcntl.flock(file.fileno(), fcntl.LOCK_UN)

    if __name__ == '__main__':
        from time import time, strftime, localtime
        import sys

        log = open('\\\\10.1.10.254\\storage\\log.txt', "a+")
        lock(log, LOCK_EX)
        timestamp = strftime("%m/%d/%Y %H:%M:%S\n", localtime(time()))
        log.write(timestamp)
        print("Wrote lines. Hit enter to release lock.")
        dummy = sys.stdin.readline()
        log.close()
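The worker depends on the non-blocking form of the lock: a second attempt to lock a task file must fail at once instead of waiting. The sketch below shows that behaviour on POSIX with the standard fcntl module called directly, not through the module listed above. The helper name try_lock is hypothetical, and the sketch assumes a local file system where flock is supported.

```python
import errno
import fcntl


def try_lock(fileobj):
    """Return True if an exclusive flock was taken without blocking."""
    try:
        fcntl.flock(fileobj.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except OSError as exc:
        # EAGAIN (or EACCES on some systems) means that someone else
        # already holds the lock; anything else is a real error.
        if exc.errno in (errno.EAGAIN, errno.EACCES):
            return False
        raise
```

Two independent open() calls on the same file behave like two competing workers here: while the first handle holds the lock, try_lock on the second one returns False, and it succeeds once the first lock is released with fcntl.LOCK_UN or the first handle is closed.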

Workerbase usage example

    # encoding=utf8
    '''
    Created on 2011-9-24
    @author: chenggong
    Worker sample
    '''
    from workerbase import WorkerBase
    import time


    # Derive from WorkerBase
    class SampleWorker(WorkerBase):
        # dowork method
        # filepath: task file name
        # content: task file content
        def dowork(self, filepath, content):
            print("dowork file=%s content=%s" % (filepath, content))
            print("doing...")
            # Configured parameters are available through self.options.XXXX
            print("myparam=%s %s" % (self.options.test1, self.options.test2))
            time.sleep(2)
            # How to submit a log entry
            self.log("debug", 0, "logs can be submitted in this way")
            # Return True on success and False on failure
            return True


    '''
    Basic command line parameters. The call must contain at least the following:
    -d task folder
    -l log output folder
    -i local IP
    -u UUID
    '''
    if __name__ == "__main__":
        # Instantiate
        sampleworker = SampleWorker()
        # Set the task file name pattern. If it is not set, every file name matches.
        # With the pattern below, sampleworker picks up all files named like XXX-xxx-cut.
        sampleworker.set_task_patten(".*-.*-cut")
        # Set the task file extension. If it is not set, the default is txt.
        sampleworker.set_task_exname("txt")
        # Set your own parameters
        sampleworker.set_options([{"option": "-a", "value": "test1"},
                                  {"option": "-b", "value": "test2"}])
        # Start the main loop
        sampleworker.start()
