IABSE Task Group 10 (super-long span bridge aerodynamics) has the mandate to create a standard procedure for validating the methodologies and software programs used for stability and buffeting response analyses of super-long span bridges. Precise estimates of structural stability and of the response to strong winds are critical for the successful design of long-span bridges. Task Group 10 covers several important problems related to its mandate, including: the review and verification of methods developed and adopted by researchers and bridge designers; the definition of guidelines and sample tests for the verification and calibration of analytical procedures; the identification of fundamental problems in the computation methods; and the relevant input and output data. Since the beginning of its work, the task group has developed a three-step benchmark, each step divided into sub-steps that address fundamental problems. Step 1 is a numerical comparison of the results obtained with the different models adopted by the task group members. Using the same inputs, the flutter stability and the buffeting response of both a deck sectional model and a full bridge are studied. Step 2 will be a comparison of predicted results with experimental wind tunnel tests, and Step 3 will be a validation against full-scale measurements. This paper presents the results of Step 1, highlighting the critical issues and differences found when comparing results. The response of a three-degree-of-freedom bridge deck is presented in terms of both aeroelastic stability and buffeting response. The results are intended to serve as a reference for the validation of methodologies and software programs that compute the wind response of bridges.